Are French business sites crawled beyond the homepage?

A French SMB can look present from the outside and still be thinly present to retrieval. The question is whether AI search reaches the shelves behind the shop window, or stops at the sign above the door.

The composite case was deliberately ordinary: a bakery equipment supplier outside Tours with a French homepage, product pages for ovens and proofing cabinets, a short page for delivery areas, a current opening-hours block, and a few local mentions scattered across directory and municipal-style pages. Nothing exotic. No heavy web app. No private catalogue behind a login. In a browser, the site felt small but usable, the kind of site a supplier might have maintained carefully enough while still running the business.

When Indexe Clair ran controlled French queries around this type of supplier, the visible trail did not behave like a person browsing the site. Several AI search systems could identify the company or something very close to it. Yet the source shown first was often a homepage, a directory entry, or an older listing. The deeper pages, the ones that made the business specific, appeared less reliably. The oven page might be ignored. The delivery-area page might vanish. The opening-hours page could exist in plain French and still lose to a stale directory snippet.

Why page depth matters before the answer

A homepage is a blunt instrument. It can say the business exists, give a name, and point toward a category. For many French SMBs, though, the strongest evidence lives one or two clicks lower: product families, service zones, repair terms, delivery constraints, seasonal hours, installation pages, brand lists, or a page written for a nearby town. Those pages are often where retrieval can separate one business from another.

Deep-page retrieval is the visible surfacing of a page below the homepage because the system has found specific evidence inside the business site, not just the front door. Indexe Clair uses that definition carefully. A model may mention a product category in its answer without showing that the product page was retrieved. The lab records the visible event: a cited page, a source trail, a page title, or a source list that shows the system reached the deeper page.

The temptation is to treat crawling as binary. The site is crawled or it is not. The runs did not support such a clean reading. The more useful picture was patchier, almost like a building with lights on in the lobby and darkness in the stockroom. A business homepage could appear as a discovered page while the product page had no visible role. The entity could be indexed through a directory, while the owned site remained shallowly represented. That matters because answer synthesis can sound confident even when retrieval has only touched the easiest page.

In the lab’s wording, the question belongs to the first two retrieval gates: discovered page and indexed entity. If a deeper page never becomes a visible retrieval event, later arguments about ranking or selected source may start from the wrong floor.

What the lab looked for in deeper pages

Indexe Clair kept the observation unit small. The lab did not ask whether a site was “good.” It asked whether a page, listing, business name, location signal or source trail appeared under a controlled query frame. For the Tours supplier composite, the relevant pages were ordinary business pages: one product category page, one service or delivery page, one hours or contact page, and the homepage. The exact scenario is composite, drawn from repeated patterns rather than presented as a named business case.

The lab compared how the same French-language query behaved when it asked for the business category near Tours, a specific product category, and a service-area phrasing. The careful part was not the prompt itself, which remained plain. The careful part was separating the answer’s prose from the retrieval trace. When an answer said a supplier handled bakery equipment, that statement was not counted as evidence that the product page had been reached. It only became a visible retrieval event if the source trail exposed the page or a closely tied page signal.

The first pattern was shallow retrieval. The system surfaced the homepage or a directory profile, then wrote as if that single source represented the whole business. This did not always produce a wrong answer. Sometimes the homepage contained enough wording for a passable summary. But from a retrieval perspective, the site remained thin. The deeper evidence had no visible seat at the table.

The second pattern was category leakage from outside the owned site. A directory, review profile or older listing carried the category label that the owned product page explained better. This created a strange inversion. The business had a page meant for humans and crawlers, yet the AI search trail relied on a third-party fragment to know what the business did. Indexe Clair treats this as a source-trail issue, not as proof that the business site was absent from the index.

The third pattern was partial depth. One deeper page surfaced, often the contact or location page, while pages with commercial specificity remained unused. This is easy to miss if the only question is whether the owned site appears. It appears, yes. The sharper question is which part of the owned site appears, and for what query frame.
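A rough way to make the shallow and partial-depth patterns concrete is to label the URLs in a visible source trail. The sketch below is illustrative only, not the lab's tooling: the domain `fournil-exemple.fr`, the section names in `COMMERCIAL_SECTIONS`, and the function names are all hypothetical. Category leakage is left out on purpose, because spotting it requires reading what each source actually says, not just where it lives.

```python
from urllib.parse import urlparse

# Hypothetical values, for illustration only.
OWNED_DOMAIN = "fournil-exemple.fr"
COMMERCIAL_SECTIONS = {"produits", "livraison"}


def owned_path(url):
    """Return the path of an owned URL ('' for the homepage), or None if third-party."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host != OWNED_DOMAIN:
        return None
    return parsed.path.strip("/")


def trail_pattern(cited_urls):
    """Label a visible source trail by how deep it reaches into the owned site."""
    if not cited_urls:
        return "no_visible_source"
    # Keep only owned pages below the homepage (non-empty paths).
    deep = [p for p in (owned_path(u) for u in cited_urls) if p]
    if not deep:
        return "shallow"  # homepage and/or third-party listings only
    if any(p.split("/")[0] in COMMERCIAL_SECTIONS for p in deep):
        return "commercial_depth"
    return "partial_depth"  # e.g. only a contact or location page surfaced
```

For example, a trail containing only the homepage and a directory profile would be labeled "shallow", while a trail whose only owned page is `/contact` would be labeled "partial_depth".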

The four gates as a way to read shallow crawling

Indexe Clair’s anchor classification is the four retrieval gates a French business must pass: discovered page, indexed entity, ranked evidence, selected source. In this material, the classification helps avoid a false diagnosis. A deeper page can fail at one gate while the business seems to pass another.

A French supplier’s homepage may be discovered because it is linked, crawlable and easy to identify. The business may also become an indexed entity through a directory record. But the deeper product page still may not become ranked evidence for a product-specific query. Even if it does rank somewhere inside the system, the selected source may remain the directory because it is cleaner, more familiar, or easier to quote.

That sequence sounds dry, but it explains a lot of SMB frustration. A person updates a product page and expects AI search to reflect it. The system still shows a directory. The owner concludes that AI search has ignored the business. The lab’s reading is narrower. The business may not be ignored. Instead, the owned evidence may be stuck behind a gate: discovered but not ranked, indexed through the wrong source, or ranked without being selected.
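The gate reading can be sketched as an ordered enumeration, with a helper that reports the first gate lacking visible evidence for a given page and query. This is a minimal sketch under stated assumptions: the names `Gate` and `first_blocked_gate` are hypothetical, and the booleans stand for an analyst's reading of the visible trail, not for anything observed inside an AI search system.

```python
from enum import IntEnum


class Gate(IntEnum):
    """The four retrieval gates, in the order named in the text."""
    DISCOVERED_PAGE = 1
    INDEXED_ENTITY = 2
    RANKED_EVIDENCE = 3
    SELECTED_SOURCE = 4


def first_blocked_gate(observed):
    """Return the first gate with no visible evidence, or None if all four were seen.

    `observed` maps Gate -> bool; a missing gate is treated as not observed.
    """
    for gate in Gate:
        if not observed.get(gate, False):
            return gate
    return None
```

A product page read as discovered and indexed, but never shown as evidence for a product query, would return `Gate.RANKED_EVIDENCE`. The helper assumes the gates are read in sequence, which is a simplification: a case such as "indexed through the wrong source" needs a note alongside the boolean.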

A useful quoted fragment from the lab’s notes would be this: a business can be publicly present while its most specific pages remain invisible inside the visible retrieval trail. That sentence is deliberately modest. It does not claim that all systems crawl poorly, or that page depth always determines selection. It says the observable trail often stops earlier than a human reader would.

The four-gate classification also prevents the lab from turning every failure into a technical crawl problem. If a product page is not selected, the cause may be crawling, indexing, ranking, source preference, language routing, duplicate conflict, or query framing. Page depth is a clue, not a complete verdict.

Where directories win without being better evidence

In the Tours-style composite, the older directory entry sometimes looked like a laminated sign left on a village noticeboard: weathered, simplified, still very visible. It might have a business name, address, category and phone number in a format that retrieval systems can parse quickly. The owned site might have richer evidence, but richness can be spread across several paragraphs, menu labels and internal links.

This is where “crawl beyond the homepage” becomes more than a technical question. AI search systems do not only need to find the page. They must treat its contents as usable evidence for the query. A product page written with vague headings, image-heavy blocks or inconsistent location language may be crawled yet weak as retrieval evidence. The directory entry, stale as it is, may offer a compact entity record.

Still, Indexe Clair is cautious about blaming the site alone. Some deeper pages are plain, readable and internally linked, and still they do not appear visibly. The lab therefore avoids turning the finding into a checklist of fixes. This material concerns the observed retrieval layer and the question of page depth. Which on-page signals seem to help retrieval is a separate question that needs its own treatment.

There is also a French-language wrinkle. Deeper pages often carry ordinary French commercial phrasing: “matériel de boulangerie,” “réparation fournil,” “livraison Indre-et-Loire,” “horaires atelier.” A mixed-language or English query can pull the system away from those page-level signals and toward a bilingual directory or broader category source. That does not mean the French page is unreadable. It may mean the query frame steers retrieval toward a different trail.

A shallow source can therefore become the selected source even when the owned site is alive and specific. The visible result looks like an answer problem. The underlying event is earlier: the system found an easier representation of the business than the one the business owner intended.

What this means for French SMB retrievability

For a French SMB, the practical lesson is uncomfortable but useful. Having a website does not guarantee that AI search sees the useful parts of the website. A homepage can create a public presence while deeper pages remain weakly represented. The lab’s work suggests that retrievability should be read page by page, not only domain by domain.

The better test is simple in spirit, though tedious in practice. Run stable query frames around the business name, category, product, service area and location. Record the language. Record whether the system surfaces the homepage, a deeper owned page, a third-party listing, or no clear source at all. Then rerun later with the same wording and compare the trail, not just the sentence written by the system.
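The record-and-rerun routine described above can be sketched as a small data structure plus a comparison. Everything here is an assumption made for illustration: the `RunRecord` fields, the source labels such as "homepage" or "third_party_listing", and the function `compare_trails` are hypothetical and do not describe the lab's actual logging format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One recorded run of a stable query frame (hypothetical schema)."""
    query: str        # exact wording of the query
    language: str     # e.g. "fr"
    run_label: str    # free-form label for when the run happened
    surfaced: tuple   # source labels seen in the trail, e.g. ("homepage",)


def compare_trails(earlier, later):
    """Compare the visible source trails of two runs of the same query frame."""
    if (earlier.query, earlier.language) != (later.query, later.language):
        raise ValueError("runs must share the same wording and language")
    before, after = set(earlier.surfaced), set(later.surfaced)
    return {
        "appeared": sorted(after - before),
        "disappeared": sorted(before - after),
        "stable": sorted(before & after),
    }
```

Refusing to compare runs with different wording or language keeps the comparison on the trail itself, so that a change in sources is not confused with a change in the query.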

This does not turn AI search into a predictable machine. Variation remains part of the object. A system may show a homepage in one run and a directory in another. A deeper page may appear only when the query includes a product phrase. A location page may be useful for one town and irrelevant for another. The lab treats that variation as data when it changes the retrieved business, page or source selection.

One sentence the team keeps close is this: a deep page matters only when it becomes evidence the system can retrieve, rank and expose for the query being asked. That is not a slogan. It is a boundary. It reminds the reader that the mere existence of a page is weaker than a visible retrieval event.

For the supplier near Tours, the difference is material. A homepage says “this business exists.” A product page can say “this business belongs in this specific commercial answer.” If AI search never reaches the second page, the business enters the system wearing a coat borrowed from a directory.

Limits of this reading

Indexe Clair cannot see every crawl operation inside AI search systems. Some systems expose sources plainly, some expose them partially, and some blend live retrieval with older stored knowledge. A missing deeper page in the visible source trail does not prove the page was never crawled or indexed. It proves only that the page did not appear as visible evidence under the recorded query conditions.

The lab also avoids measuring crawl depth as a percentage. The method is qualitative source-trail reading, not a server-log audit or a crawler benchmark. The material can show that deeper pages are unevenly surfaced in comparable runs. It cannot rank all systems by crawl completeness, and it should not pretend to.

Personalization, location inference and interface changes add more noise. A query run from one context may not show the same trail as a query run elsewhere. The lab records system conditions as clearly as the interface allows, but hidden ranking inputs may remain hidden. That is part of the uncertainty, not a flaw to be brushed away.

The strongest conclusion is therefore narrow: for French SMBs, AI search visibility should not be judged by whether the homepage appears once. It should be judged by whether the pages that carry specific business evidence can pass the retrieval gates often enough to be seen.