Indexe Clair.


Research note 11

Which French sources do AI search engines choose first?

AI search source choice for French business queries is best read as a retrieval pattern, not a simple hierarchy of trust. Indexe Clair observes which source trail is selected first (owned site, directory, review page, regional mention, official record or mixed listing) and compares that choice against the business evidence left unused.

Recorded by Camille Varenne, March 19, 2026

The first source in an AI search answer is not always the most accurate source. Often it is the source that gives the retrieval layer the cleanest, most selectable business object.

In a composite scenario, a supplier near Tours has an owned French website with product pages, a contact page and clear category language. It also has a directory profile, a few review traces, a small local mention and an old listing where the opening hours no longer match. When a controlled query asks for a bakery equipment supplier near Tours, the answer does not always begin with the owned site. Sometimes it starts from the directory. Sometimes it uses the owned site for description and the directory for identity. Sometimes the source trail is visible enough to see the splice.

The same shape appears in the composite Lyon repair service, with a different accent. The business has a crawlable service site and a municipal mention, but city-framed queries pull larger competitors and review profiles into the first visible trail. The independent firm is not absent from public evidence. It is simply not the easiest source to select first under every query. Indexe Clair studies that first choice because it often decides which version of the business the reader meets.

Source choice is an event, not a moral ranking

It is tempting to read source selection as a judgment of quality. If an AI search system cites a directory before the company’s own site, the directory must have been considered more reliable. Sometimes that may be true in a limited sense. More often, the lab reads a narrower event: the system selected one visible evidence trail for this query, under these conditions, while leaving other traces lower or unused.

Source selection — in Indexe Clair’s terms — is the moment when an AI search system uses one source trail as visible evidence while leaving other available traces unused or less prominent, because that source has become the answer’s retrievable path. The definition avoids a common mistake. Selection does not prove that the source is correct, complete or preferred by the business. It proves that the source became usable inside the system’s retrieval and answer process.

For French SMB queries, the selected source can be an owned website, a directory profile, a review page, a regional or municipal mention, an official record, a search-result surface, or a mixed listing that carries parts of several records. The source that appears first may hold the business name cleanly but the hours badly. Another may describe the service well but omit the precise commune. A third may be current but too thin to act as the main evidence.

This is why the lab does not sort sources into good and bad too quickly. A directory can be stale and still useful for entity recognition. An owned site can be accurate and still hard to parse. A review page can confirm activity but distort category. A regional mention can strengthen geography while saying little about current services. The source trail is a workshop bench with mismatched tools on it. The system picks one tool first; that does not mean it is the best tool for every cut.

Owned sites often carry truth, directories often carry structure

In many French SMB cases, the owned website is where the freshest operational truth sits. It has the updated service description, the corrected phone number, the product category the owner actually wants to sell. But owned sites vary wildly. Some are clear and crawlable. Others hide core information in images, use thin headings, bury location text, or split business identity across pages with inconsistent wording.

Directories, by contrast, often package the business as a neat object: name, address, category, phone, map relation, sometimes reviews or hours. That package can be attractive to a retrieval layer. The problem is that structure and freshness are not the same property. A directory may preserve an old category long after the owned site has changed. It may keep a former address because the duplicate record was never reconciled. It may rank because it is legible, not because it is right.

Indexe Clair’s source-trail readings often find this split. The owned page contains the best local nuance. The directory contains the cleanest entity shell. AI search systems may then select the shell and write a sentence that sounds like it came from the business itself. When sources are exposed, the mismatch can be examined. When sources are hidden, the answer has to be treated with more caution.

The lab’s concern is not to defend owned websites out of loyalty. There are owned sites that give retrieval very little to work with. A homepage with a logo, a slideshow and “contactez-nous” may be less useful than a directory profile that plainly says “réparation électroménager à Vénissieux.” For retrieval, a modest sentence can beat a polished page if the modest sentence carries the entity, category and place clearly.

Review pages, regional mentions and official records play different roles

Review profiles enter French business retrieval in a different way. They can signal activity, customer language, category and geography. They may be especially visible when a query contains local intent or when the business’s own site is weak. But review pages can also create noise. A repair service may be described by customers using informal category words. A supplier may be discussed through a product nickname rather than its official service line. The system may retrieve the business through the crowd’s vocabulary, then misread that vocabulary as the company’s main category.

Regional and municipal mentions are usually thinner but useful in another way. They can anchor geography. A page from a town, a local association or a regional article may not contain full business details, yet it confirms that the business belongs to a place. In peri-urban cases, that place signal matters. The Lyon repair service may be pulled toward the city center unless surrounding commune references are visible enough to survive the query frame.

Official records sit in a more complicated position. They can support entity existence, name consistency and registration signals, but they are not always friendly to ordinary service queries. A source can be official and still not answer what a customer asks. If a query asks for a bakery equipment supplier’s current pickup hours, an official record may verify the company but not the useful operational detail. AI search may therefore use official evidence as background while selecting a directory or owned page for the visible answer.

The lab reads these sources as roles, not ranks. Owned sites often hold operational detail. Directories often hold structured identity. Review pages carry lived category language. Regional mentions help with place. Official records support existence and naming. The selected first source depends on which role the system needs for the query, and which source is easiest to retrieve.

The four gates explain why first choice varies

Indexe Clair places source choice inside the four retrieval gates a French business must pass: discovered page, indexed entity, ranked evidence, selected source. The classification is qualitative, and it is useful precisely because different source types can win at different gates.

An owned site may be discovered and even ranked for a direct brand query, but a directory may provide the indexed entity that a broader category query leans on. A review page may not be the business’s ideal description, yet it ranks as evidence because it matches the words people use in the query. A regional mention may never become the selected source for a service description, but it can help a location signal survive. The first visible source is the end of a chain, not the whole chain.

This is where the lab’s source-trail practice slows down the reading. Suppose a French query retrieves the Tours supplier through a directory first. The immediate reaction is to say the directory outranked the owned site. That may be too simple. The owned site might be discovered, but its product pages may not be tied clearly enough to “fournisseur matériel boulangerie.” The entity may be indexed through the directory. The directory may rank for the category-town phrase. Then the directory becomes the selected source. Each gate adds a small bend.
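The four-gate chain can be pictured as an ordered sequence in which a source only counts at a later gate if it cleared every earlier one. The Python sketch that follows is a hypothetical encoding, not the lab's tool. The `Gate` enum, the `furthest_gate` helper and the per-gate yes/no values are assumptions, and the lab's own reading is qualitative rather than a binary score. The readings for the Tours composite are illustrative, not measured.

```python
from __future__ import annotations

from enum import IntEnum


class Gate(IntEnum):
    """Hypothetical encoding of the four retrieval gates, in chain order."""
    DISCOVERED_PAGE = 1
    INDEXED_ENTITY = 2
    RANKED_EVIDENCE = 3
    SELECTED_SOURCE = 4


def furthest_gate(passed: dict[Gate, bool]) -> Gate | None:
    """Return the last gate cleared before the chain breaks (None if never discovered).

    A gate only counts if every earlier gate was also cleared, mirroring the
    idea that the first visible source is the end of a chain.
    """
    reached = None
    for gate in Gate:  # iterates members in definition order
        if not passed.get(gate, False):
            break
        reached = gate
    return reached


# Illustrative readings for the Tours composite (assumed, not measured).
tours_trails = {
    "owned site": {Gate.DISCOVERED_PAGE: True, Gate.INDEXED_ENTITY: False},
    "directory": {gate: True for gate in Gate},
}

for source, gates in tours_trails.items():
    gate = furthest_gate(gates)
    print(source, "->", gate.name if gate else "not discovered")
# owned site -> DISCOVERED_PAGE
# directory -> SELECTED_SOURCE
```

Reading the result as "where the chain stopped" rather than "which source is better" keeps the distinction the note draws: the owned site was found, but lost at the entity gate, before ranking or selection were even in play.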

For the Lyon composite, a larger chain competitor sometimes appears first because the query frame asks for a broad city service. The independent repair site may be indexed as an entity, but its evidence may not rank strongly for the larger urban frame. A review profile or chain location page then becomes the selected source. The issue is not only source type. It is the fit between source, query language and geography.

First source selection is the visible tip of earlier retrieval decisions: what was found, what became an entity, and what ranked.

That sentence is one of the lab’s core cautions. It prevents a reader from treating the first cited source as the whole explanation. A first source tells the team where the visible answer began. It does not, by itself, reveal every source the system knew, ignored or held lower in the ranking.

How the lab compares source trails without making a league table

The question in this note's title asks which French sources AI search engines choose first. Indexe Clair resists turning that into a universal league table. The lab does not claim that one system always prefers directories or that another always prefers owned sites. The evidence is too conditional. Query language, business category, location, source freshness and interface exposure all change what can be seen.

Instead, the lab compares runs by source role and conflict type. For the same business scenario, the team records whether the first visible trail is the owned site, a directory, a review profile, a regional mention, an official record or a mixed source. Then they compare what that source carried: current hours, stale address, category label, geography, business name, product detail, service area. The question becomes richer than “who cited what.” It becomes “which part of the business did the selected source make retrievable?”
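The comparison by source role can be sketched as a small record per trail plus a few set operations against the owned evidence. This is a minimal Python sketch, assuming each run is summarised as the set of business details a trail carried and which of those were stale. `SourceTrail`, `made_retrievable` and the detail labels are hypothetical names, the owned site's versions are assumed to be current unless marked stale, and the Tours values are illustrative, not recorded results.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SourceTrail:
    """One visible trail in a run (hypothetical record shape)."""
    role: str  # e.g. "owned site", "directory", "review profile"
    carried: set[str] = field(default_factory=set)  # business details the trail exposed
    stale: set[str] = field(default_factory=set)    # details it exposed in an outdated form


def made_retrievable(selected: SourceTrail, owned: SourceTrail) -> dict[str, list[str]]:
    """Compare what the selected trail carried against the owned evidence."""
    return {
        "carried_by_selected": sorted(selected.carried),
        "stale_in_selected": sorted(selected.stale & selected.carried),
        # Details the owned site carried that the selected trail did not carry at all.
        "owned_evidence_left_unused": sorted(owned.carried - selected.carried),
        # Details stale in the selected trail but present (and not marked stale) on the owned site.
        "fresher_on_owned_site": sorted((selected.stale & owned.carried) - owned.stale),
    }


# Illustrative Tours run: the directory is the first visible trail.
directory = SourceTrail(
    "directory",
    carried={"business name", "address", "category label", "opening hours"},
    stale={"opening hours"},
)
owned_site = SourceTrail(
    "owned site",
    carried={"business name", "category label", "opening hours", "product detail", "service area"},
)

for key, details in made_retrievable(directory, owned_site).items():
    print(f"{key}: {', '.join(details)}")
```

The set differences keep the note's question in view: not only what the selected source said, but which owned evidence it bypassed and which details it carried in an older form.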

The lab also looks at language routing. A French prompt may select the owned French site, while an English prompt selects a directory or a translated listing. A mixed-language prompt can produce a hybrid path: French place names, English category terms, and a source that was written for neither situation cleanly. This is not a side issue in France. Many business traces are French-first, while some AI search behavior appears more comfortable with sources that package entities in broadly parseable formats.

There is a quiet trap here. If a system chooses a directory first, the business may assume the owned site has failed. Sometimes it has. Other times, the owned site is visible under a more precise frame, but the directory wins broad discovery. The lab tries to preserve that difference because it changes the next question. A site that is invisible needs one kind of investigation. A site that is visible but not selected needs another.

What source choice tells a French business owner

For a business owner or agency, the first useful reading is comparative. Place the selected source beside the owned evidence and ask what the system gained from choosing it. Did it get the business name more cleanly? The address? A category? Reviews? A map relation? A short crawlable description? The answer can be uncomfortable because the stale source may be wrong in one detail and still clearer in another.

If the owned site loses source selection, the response is not simply to add more promotional copy. Retrieval usually needs plain evidence: crawlable category language, consistent business name, visible location terms, internally linked service pages, and a source trail that does not contradict itself across public records. This is an interpretation drawn from observed mechanisms. It is not a promise that changing one page will move one system.

The lab’s practical stance is restrained. A French SMB should not try to make every public source identical in a brittle way. Public evidence naturally differs. A review page and an owned service page serve different purposes. But conflicts that touch name, address, hours, category and geography are more likely to distort source selection than differences in tone. Those are the seams where the retrieval layer can catch.
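The distinction between identity conflicts and differences in tone can be made concrete with a small check across public records. This is a minimal Python sketch, assuming each source has been reduced to a dictionary of plain-text fields; the field names, the `find_conflicts` and `normalise` helpers, and the business "Atelier Exemple" are hypothetical. Exact comparison after case and spacing normalisation is a crude stand-in for the lab's qualitative reading; real reconciliation of addresses or hours formats would need more than this.

```python
from __future__ import annotations

# Fields the note names as the seams most likely to distort source selection.
HIGH_RISK_FIELDS = ("name", "address", "hours", "category", "geography")


def normalise(value: str) -> str:
    """Crude normalisation (assumption): ignore case and repeated spaces only."""
    return " ".join(value.casefold().split())


def find_conflicts(records: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return, per high-risk field, the values by source when sources disagree.

    Fields outside HIGH_RISK_FIELDS (taglines, descriptions) are deliberately ignored.
    """
    conflicts: dict[str, dict[str, str]] = {}
    for fld in HIGH_RISK_FIELDS:
        values = {source: rec[fld] for source, rec in records.items() if fld in rec}
        if len({normalise(v) for v in values.values()}) > 1:
            conflicts[fld] = values
    return conflicts


# Hypothetical public records for an illustrative repair business.
records = {
    "owned site": {"name": "Atelier Exemple", "hours": "8h-18h",
                   "geography": "Vénissieux", "description": "Devis gratuit, réparation rapide"},
    "directory": {"name": "ATELIER  EXEMPLE", "hours": "9h-17h",
                  "geography": "Vénissieux", "description": "Service de qualité"},
}

print(find_conflicts(records))
# {'hours': {'owned site': '8h-18h', 'directory': '9h-17h'}}
```

The name difference disappears after normalisation and is not flagged, the hours disagreement is, and the free-text descriptions are never compared, in line with the point that tone matters less than name, address, hours, category and geography.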

For agencies, this reading also changes reporting. Saying “AI cited Pages Jaunes” or “AI did not cite the owned site” is too thin. A stronger report identifies the query frame, the selected source, the unused owned evidence, the conflict, and the gate where the business appears to lose selection. That gives a reader something to test again.

Limits of the first-source question

This material cannot prove the private reasons an AI search system selected one source over another. The lab can see visible source trails only where interfaces expose them. Some answers may be shaped by sources that are not shown. Live retrieval may mix with cached knowledge. Personalization, location settings and interface changes can alter results in ways that are difficult to fully control.

The lab also avoids broad claims about all French businesses. The observed patterns come from qualitative source-trail reading around business categories with scattered public evidence: owned sites, directories, reviews, municipal or sector pages and regional mentions. A restaurant, repair service, industrial supplier and medical office may all present different retrieval shapes. One sample cannot carry the whole country.

Even so, the first selected source remains worth studying. It is the point where the retrieval layer becomes visible to the reader. If the source is current, clear and close to the business, the answer has better footing. If the source is stale, duplicate or only loosely tied to the location, the answer may still sound confident while leaning on the wrong piece of the public record. Indexe Clair’s conclusion is simple enough to test: ask which source was selected first, then ask which better evidence had to be bypassed for that to happen.

Camille Varenne
responsible for the record
Indexe Clair · France · March 19, 2026