Do AI systems route bilingual business queries away from French pages?

A French page can be present and crawlable and still lose the retrieval path when a query nudges the system toward another language trail. The question is where that turn happens: at discovery, entity matching, ranking, or final source selection.

The odd case is not the one where a business has no web presence. Indexe Clair starts with the more irritating case: a French company has a working site, service pages, address, opening hours, a few directory entries, and enough regional mentions to look alive. Then an English-language AI search query about the same category surfaces an English directory stub, a review profile, or a mixed listing before the company’s own French page.

In one composite scenario used by the lab, a bakery equipment supplier near Tours has product pages in French, short local mentions, and a stale directory entry that has partly translated category wording. A French query finds the owned site in some runs. An English query about “bakery equipment supplier near Tours” more often leans toward the directory trail. A mixed query, with the town name in French and the category in English, behaves like a coin that has been filed on one edge. Sometimes it lands on the business. Sometimes it slides toward whatever source has the easier language bridge.

The query frame is not a cosmetic layer

A query frame is the fixed wording, language, location and intent used to run a comparison. In ordinary search work, people often treat language as a surface choice: French in, French results; English in, English results. AI search makes that assumption less safe. The language can become a routing signal. It may change which source trails are considered close enough to answer the question, not just which words appear in the final answer.

Indexe Clair separates three moments that are easily blended together. The first is whether a French page can be discovered at all. The second is whether the system connects that page to the business entity. The third is whether that source is ranked and selected strongly enough to appear in the answer trail. In bilingual retrieval, those three moments do not always move together. A system may know that the business exists, but the selected source for an English query may still be a directory because the directory gives the category in a form that matches the query more neatly.

Language routing in AI search is the selection of one evidence trail over another because the query frame makes language act as a relevance signal. That definition matters because the failure can look like a language problem while actually being a source-selection problem. The French page is not necessarily invisible. It may be present as weak evidence while another record is easier to rank.

The lab is cautious here. It does not claim that English queries always disadvantage French pages. Some English prompts retrieve the owned French site cleanly, especially when the business name is precise and the location is unambiguous. The weaker pattern appears in category queries: “supplier,” “repair service,” “near Tours,” “around Lyon,” “professional bakery ovens,” and other phrases where the system must infer which French terms belong to the same business intent. That is where bilingual routing becomes a narrow little hinge.

Where the French page falls out of the trail

The composite Tours supplier shows the pattern in a stripped-down way. In French, the category phrasing lines up with the owned site: matériel de boulangerie, équipements, fourniture professionnelle, Tours, Indre-et-Loire. The site is not beautiful. It has a product page that loads slowly and a contact page where the address repeats in a slightly different form. Still, it gives the system enough crawlable French text to work with.

In English, the signal gets rearranged. The owned site may not say “bakery equipment supplier” in a literal English phrase. A directory entry might. A review profile might use translated category labels. A map-style listing may put “bakery equipment” beside the town in a structured field. The system may then retrieve the directory not because it is fresher or more accurate, but because it offers a compressed bilingual handle. It is the shelf label in a storage room: not the best object, just the easiest one to grab.

The lab classifies these cases through the four retrieval gates a French business must pass — discovered page, indexed entity, ranked evidence, selected source. A French page may pass the first gate because it can be crawled. The business may pass the second gate because the name and address form an indexed entity. The owned page may even enter ranked evidence for a French query. But the final gate can shift in an English or mixed query if another source gives the system a simpler language match.

This is why “the AI found the business” can be a sloppy statement. Found where? Through which source? Under what query frame? A system that answers with the correct business name but cites a stale bilingual directory has not shown the same retrieval behavior as one that selects the current owned French page. Both may look like visibility in a quick screenshot. Under the lab’s method, they are different retrieval events.
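One way to keep the four gates apart in a run log is to note which gates the owned page passed under each query frame, then report the earliest one it missed. The sketch below is illustrative only: the `Gate` enum, the `first_failed_gate` helper and the two sample runs are hypothetical names and data, not the lab's own tooling.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Gate(IntEnum):
    """The four retrieval gates, in the order the article lists them."""
    DISCOVERED_PAGE = 1
    INDEXED_ENTITY = 2
    RANKED_EVIDENCE = 3
    SELECTED_SOURCE = 4


def first_failed_gate(passed: Iterable[Gate]) -> Optional[Gate]:
    """Return the earliest gate that was not passed, or None if all four were."""
    passed_set = set(passed)
    for gate in Gate:  # iterates in definition order
        if gate not in passed_set:
            return gate
    return None


# Hypothetical observations for one owned French page under two query frames.
french_run = [Gate.DISCOVERED_PAGE, Gate.INDEXED_ENTITY,
              Gate.RANKED_EVIDENCE, Gate.SELECTED_SOURCE]
english_run = [Gate.DISCOVERED_PAGE, Gate.INDEXED_ENTITY, Gate.RANKED_EVIDENCE]

for label, run in (("fr", french_run), ("en", english_run)):
    failed = first_failed_gate(run)
    print(label, failed.name if failed else "all gates passed")
```

Read this way, "the AI found the business" becomes a statement about a specific gate under a specific query frame rather than a single yes or no.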

A small flaw usually reveals the split. The model names the business correctly but gives the old opening hours. Or it keeps the right town but imports an English category that the company does not use on its site. Or it links the entity to the directory address while the owned page has the corrected postal line. These little bruises are useful. They show which trail carried the answer.

Mixed-language prompts create their own traps

Mixed prompts are tempting because real users write them. A marketer in London may search in English for a French supplier. A French founder may type an English AI-search phrase because the tool’s interface feels global. A local agency may combine “near me,” a French city, and a French business category. These are not artificial prompts. They are messy, normal prompts.

Indexe Clair treats them as separate query frames rather than watered-down versions of French or English. A mixed prompt can create a hybrid relevance problem. The system must decide whether the English part names the business category, whether the French part names the location, and whether available sources should be translated, matched semantically, or replaced by already bilingual records. A directory with a poor translation can then become more retrievable than a French page with stronger business evidence.

The pattern is especially visible when the business category has several plausible translations. “Repair service” might map to réparation, dépannage, service après-vente, maintenance, atelier, or a trade-specific term. “Supplier” may become fournisseur, grossiste, distributeur, or matériel professionnel depending on context. If the owned site uses one French term and a directory uses another, a mixed prompt may pull retrieval toward the source that happens to bridge the English phrasing.

The Lyon peri-urban repair service, another composite scenario used by the lab, shows the same tension from the other side. A small repair firm outside Lyon has clear French service pages and municipal mentions. It is visible when the query names its suburb and trade in French. But a mixed query such as “appliance repair near Lyon dépannage électroménager” can pull larger city directories or chain listings forward. The system appears to read “near Lyon” as a stronger organizing frame than the suburb, and the English word “repair” gives large listings a ready-made category match.

There is a human reason this matters. A business owner may test AI search once in French, see the owned site appear, and assume the retrieval layer is healthy. Another buyer may search in English and reach a different source trail altogether. The business did not vanish. The route changed.

What the lab records when a language route shifts

The lab does not rely on the final wording alone. It records the query frame, the apparent source trail, the language of selected sources, the business name variant, the location signal, and whether the selected evidence came from an owned page, directory, review profile, regional mention or mixed listing. This is a slow way to read AI search. It is also the only way to avoid mistaking a fluent answer for a stable retrieval path.
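The fields listed above can be held in a small record so that later runs are compared like for like. This is a minimal sketch under stated assumptions: the class names, field names and the sample values (including the placeholder business name) are hypothetical, chosen only to mirror the items the paragraph lists.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class SourceType(Enum):
    OWNED_PAGE = "owned page"
    DIRECTORY = "directory"
    REVIEW_PROFILE = "review profile"
    REGIONAL_MENTION = "regional mention"
    MIXED_LISTING = "mixed listing"


@dataclass(frozen=True)
class QueryFrame:
    """Fixed wording, language, location and intent used for a comparison."""
    wording: str
    language: str   # "fr", "en" or "mixed"
    location: str
    intent: str


@dataclass(frozen=True)
class Observation:
    """One recorded retrieval event for one query frame."""
    frame: QueryFrame
    source_trail: Tuple[str, ...]   # apparent sources, in the order shown
    source_language: str            # language of the selected source
    business_name_variant: str
    location_signal: str
    selected_source: SourceType


def used_owned_page(obs: Observation) -> bool:
    """True when the selected evidence came from the business's own page."""
    return obs.selected_source is SourceType.OWNED_PAGE


# Hypothetical record for the composite Tours scenario, English frame.
english_obs = Observation(
    frame=QueryFrame(
        wording="bakery equipment supplier near Tours",
        language="en",
        location="Tours",
        intent="find a supplier",
    ),
    source_trail=("directory entry",),
    source_language="bilingual",
    business_name_variant="Example Supplier",  # placeholder, not a real firm
    location_signal="Tours",
    selected_source=SourceType.DIRECTORY,
)

print(used_owned_page(english_obs))
```

Freezing the records is a design choice: an observation is a log entry, and later interpretation should not overwrite what was seen.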

A useful observation might be modest: the French prompt selected the company’s service page; the English prompt selected a directory; the mixed prompt named the company but kept the directory’s category wording. That is not a grand law about all French SMBs. It is a visible retrieval event, and it can be compared with later runs. If the same split appears again under comparable conditions, the lab treats it as a pattern worth interpreting.

One pattern is source-language substitution. The system has access to a French owned page, but it selects a source with English or bilingual labels because the query is English. Another is entity-language splitting, where the business name remains stable but the category, address or service description comes from a separate record. A third is geographic-language drift: the query’s English framing collapses a peri-urban location into a larger nearby city, even though the French query keeps the smaller place.

Those are qualitative types, not scores. Indexe Clair does not assign a business a bilingual retrieval grade. The classification is meant to keep the mechanism visible. A page can be present but not selected. A listing can be selected but stale. A business can be retrieved through its name while its owned page remains absent from the visible trail. The difference is not academic housekeeping; it changes what a business should inspect.
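The three types can be written as simple checks that compare a French run with an English or mixed run of the same business. The sketch below is an assumption-laden illustration: the `Run` fields, the `drift_patterns` function and the sample data are hypothetical, and the checks only flag that a pattern co-occurs with a query frame. They cannot show that language caused the selection, and they return labels, not a grade.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    query_language: str      # "fr", "en" or "mixed"
    selected_is_owned: bool  # was the owned page the selected source?
    selected_language: str   # labels of the selected source: "fr", "en", "bilingual"
    business_name: str       # name as it appeared in the answer
    name_source: str         # record that supplied the business name
    detail_source: str       # record that supplied category, address or description
    place_in_answer: str     # place name the answer used


def drift_patterns(french: Run, other: Run) -> List[str]:
    """Label which of the three qualitative patterns appear in `other`."""
    patterns = []
    # Source-language substitution: the owned page is reachable (the French
    # run selected it) but a non-French frame selects English/bilingual labels.
    if (other.query_language != "fr" and french.selected_is_owned
            and not other.selected_is_owned
            and other.selected_language in ("en", "bilingual")):
        patterns.append("source-language substitution")
    # Entity-language splitting: the name is stable, details come from elsewhere.
    if (other.business_name == french.business_name
            and other.detail_source != other.name_source):
        patterns.append("entity-language splitting")
    # Geographic-language drift: the place named differs from the French run.
    if other.place_in_answer != french.place_in_answer:
        patterns.append("geographic-language drift")
    return patterns


# Hypothetical data loosely modeled on the composite Lyon scenario.
french = Run("fr", True, "fr", "Example Repair", "owned page", "owned page", "suburb")
english = Run("en", False, "bilingual", "Example Repair", "directory", "directory", "Lyon")
mixed = Run("mixed", True, "fr", "Example Repair", "owned page", "directory", "suburb")

print(drift_patterns(french, english))
print(drift_patterns(french, mixed))
```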

What follows for French SMB evidence

The first practical implication is uncomfortable: translation is not the whole answer. Adding an English paragraph to a French site may help a query bridge, but the lab would not treat it as a guaranteed fix. If the page remains hard to crawl, internally isolated, inconsistent with directory records, or vague about location, a bilingual snippet may become another loose signal rather than a stronger trail.

The more grounded task is to make the entity connection less fragile across languages. A French page that names the business consistently, states the category in crawlable text, repeats the location clearly, links to relevant service or product pages, and avoids hiding key evidence in images gives AI search more to match. Where a business genuinely serves English-speaking buyers, an English explanation can help, but it should sit beside the French evidence rather than replace it. The system needs a bridge, not a costume.

The lab also watches whether the selected source changes when prompts move from category-first to name-first. A name-first English prompt may retrieve the owned French page because the entity is explicit. A category-first English prompt may select a directory because the category match is easier there. That contrast is often more revealing than either run alone. It shows whether the business is indexed as an entity but weakly ranked as evidence for a category.
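The name-first versus category-first contrast reduces to a two-by-two reading of which source was selected in each run. A minimal sketch follows, assuming each run has already been reduced to a source label; the function name, the label strings and the wording of the returned readings are hypothetical.

```python
OWNED = "owned page"  # hypothetical label for the business's own page


def read_contrast(name_first_selected: str, category_first_selected: str) -> str:
    """Describe what a name-first / category-first pair of runs suggests."""
    name_ok = name_first_selected == OWNED
    category_ok = category_first_selected == OWNED
    if name_ok and not category_ok:
        # The case the paragraph describes: entity known, category evidence weak.
        return "indexed as an entity, weakly ranked as category evidence"
    if name_ok and category_ok:
        return "owned page selected under both prompt types"
    if category_ok:
        return "owned page selected for the category-first prompt only"
    return "owned page not selected under either prompt type"


print(read_contrast("owned page", "directory"))
```

The reading is only as good as the pairing: both prompts should share language and location so that the prompt type is the only thing that changes.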

For agencies and SMB owners, the useful question is not “Can AI search understand French?” That is too broad, almost theatrical. The sharper question is: when a buyer describes this French business in another language, which source becomes the system’s bridge? If the bridge is a stale listing, the owned site may need clearer bilingual category evidence, stronger internal links, and cleaner consistency with external records. If the bridge is a review profile, the issue may be authority or structure. If the bridge changes from run to run, the lack of repeatability is itself the finding.

Limits of the language-routing reading

This material cannot show how every AI search system handles French-language evidence internally. Some systems expose sources plainly; others show a finished answer with little trail. Personalization may be partly hidden. Live retrieval can mix with older stored knowledge. A run that looks like language routing may also include freshness, popularity, structured data, or geographic signals that the interface does not reveal.

Indexe Clair therefore treats bilingual drift as an observed pattern, not a universal rule. The same French page can be selected under one English query and missed under another. A directory can win because of language, but also because it is easier to parse, more internally structured, or more strongly connected to the business entity. The lab can record which source surfaced; it cannot always prove the private ranking reason behind that selection.

The method is still useful because it keeps the question small enough to test. Run the same business through French, English and mixed query frames. Record the source trail, not just the prose. Watch whether the business name, category and location travel together or split across sources. The answer may not be tidy. It often looks like a drawer of old labels. But if the French owned page only appears when the prompt speaks its exact language, that is a retrieval problem worth naming.
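The test described above can be tallied with very little machinery once each run has been reduced to the source that was selected. The sketch below assumes that reduction has already been done by hand; the function names, the frame keys and the sample runs are hypothetical, not recorded results.

```python
from collections import Counter
from typing import Dict, List

OWNED = "owned page"  # hypothetical label for the business's own page


def summarize_runs(runs: Dict[str, List[str]]) -> Dict[str, dict]:
    """For each query frame, count selected sources and flag run-to-run stability."""
    summary = {}
    for frame, selected in runs.items():
        counts = Counter(selected)
        summary[frame] = {
            "sources": dict(counts),
            "stable": len(counts) == 1,  # same source selected in every run
        }
    return summary


def owned_only_in_french(runs: Dict[str, List[str]]) -> bool:
    """True when the owned page is selected in French runs and nowhere else."""
    in_french = OWNED in runs.get("fr", [])
    elsewhere = any(OWNED in selected
                    for frame, selected in runs.items() if frame != "fr")
    return in_french and not elsewhere


# Hypothetical repeated runs for one business under three query frames.
runs = {
    "fr": ["owned page", "owned page", "owned page"],
    "en": ["directory", "directory", "directory"],
    "mixed": ["owned page", "directory", "directory"],
}

summary = summarize_runs(runs)
for frame, info in summary.items():
    print(frame, info)
print(owned_only_in_french(runs))
```

In this made-up sample the French and English frames are each stable but point at different sources, while the mixed frame is unstable. Both outcomes are worth recording as they stand.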