January 29, 2026

People Love It, Models Ignore It


You publish a page that solves a real problem. It reads clean, it has examples, and it covers the edge cases. You would happily hand it to a customer.

Then you ask an AI platform the exact question that page answers, and your page never shows up. No citation, no link, no paraphrase. Just omitted.

That moment is new. Not because platforms give different answers; most people already accept that as reality. The shift is deeper: human relevance and model utility can diverge.

If you are still using “quality” as a single universal standard, you will misdiagnose why content fails in AI answers, and you will waste time fixing the wrong things.

The Utility Gap is the simplest way to name the problem.

Image Credit: Duane Forrester

What The Utility Gap Is

This gap is the distance between what a human considers relevant and what a model considers useful for producing an answer.

Humans read to understand. They tolerate warm-up, nuance, and narrative. They will scroll to find the one paragraph that matters, and they often make a decision only after seeing most or all of the page.

A retrieval plus generation system works differently. It retrieves candidates, it consumes them in chunks, and it extracts signals that let it complete a task. It does not need your story, just the usable parts.

That difference changes how “good” works.

A page can be excellent for a human and still be low-utility to a model. It can be technically visible, indexed, and credible, and still fail the moment a system tries to turn it into an answer.

This is not just a theory we are exploring; research already separates relevance from utility in LLM-driven retrieval.

Why Relevance Is No Longer Universal

Many standard IR ranking metrics are intentionally top-heavy, reflecting a long-standing assumption that user utility and examination probability diminish with rank. In RAG, retrieved items are consumed by an LLM, which typically ingests a set of passages rather than scanning a ranked list like a human, so classic position discounts and relevance-only assumptions can be misaligned with end-to-end answer quality. (I’m over-simplifying here, as IR is far more complex than one paragraph can capture.)

A 2025 paper on retrieval evaluation for LLM-era systems makes this explicit. It argues that classic IR metrics miss two big misalignments: the position discount differs for LLM consumers, and human relevance does not equal machine utility. It introduces an annotation scheme that measures both helpful passages and distracting passages, then proposes a metric called UDCG (Utility and Distraction-aware Cumulative Gain). The paper also reports experiments across multiple datasets and models, with UDCG improving correlation with end-to-end answer accuracy versus traditional metrics.
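To make the idea concrete, here is a toy sketch of utility-and-distraction-aware scoring in Python. This is not the paper’s UDCG formula, just an illustration of the underlying idea under assumed labels and weights: helpful passages add to the score, distracting passages subtract from it, and everything else contributes nothing.

```python
# Toy illustration of utility-and-distraction-aware scoring.
# NOT the actual UDCG metric from the paper -- only the underlying idea.

def toy_utility_score(passages, distraction_penalty=1.0):
    """passages: list of labels, each 'helpful', 'distracting', or 'neutral'."""
    score = 0.0
    for label in passages:
        if label == "helpful":
            score += 1.0
        elif label == "distracting":
            score -= distraction_penalty
    return score

# A retrieved set containing a distractor can score no better than a smaller,
# cleaner set, even though it "contains more relevant content."
print(toy_utility_score(["helpful", "helpful", "distracting"]))  # 1.0
print(toy_utility_score(["helpful", "neutral"]))                 # 1.0
print(toy_utility_score(["helpful", "helpful"]))                 # 2.0
```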

The marketer takeaway is blunt. Some content is not merely ignored. It can reduce answer quality by pulling the model off-track. That is a utility problem, not a writing problem.

A related warning comes from NIST. Ian Soboroff’s “Don’t Use LLMs to Make Relevance Judgments” argues you should not substitute model judgments for human relevance judgments in the evaluation process. The mapping is not reliable, even when the text output feels human.

That matters for your strategy. If relevance were universal, a model could stand in for a human judge, and you would get stable results, but you do not.

The Utility Gap sits right in that space. You cannot assume that what reads well to a person will be treated as useful by the systems now mediating discovery.

Even When The Answer Is Present, Models Do Not Use It Consistently

Many teams hear “LLMs can take long context” and assume that means “LLMs will find what matters.” That assumption fails often.

“Lost in the Middle: How Language Models Use Long Contexts” shows that model performance can degrade sharply based on where relevant information appears in the context. Results often look best when the relevant information is near the beginning or end of the input, and worse when it sits in the middle, even for explicitly long-context models.

This maps cleanly to content on the web. Humans will scroll. Models may not use the middle of your page as reliably as you expect. If your key definition, constraint, or decision rule sits halfway down, it can become functionally invisible.

You can write the right thing and still place it where the system does not consistently use it. This means that utility is not just about correctness; it’s also about extractability.
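If you want to see this effect for yourself, a rough probe is to place the same key fact at the start, middle, and end of a long block of filler text and ask the same question each time. The sketch below only assembles the three test prompts; ask_model is a hypothetical placeholder for whichever surface or API you actually test against.

```python
# Rough "lost in the middle" probe: same fact, three positions, same question.
# ask_model() is a hypothetical placeholder -- wire it to whatever model or API you use.

KEY_FACT = "The warranty covers water damage only if the device was registered within 30 days."
QUESTION = "Does the warranty cover water damage?"
FILLER = ("This paragraph is generic background text that does not answer the question. " * 40).strip()

def build_context(position: str) -> str:
    if position == "start":
        return f"{KEY_FACT}\n\n{FILLER}\n\n{FILLER}"
    if position == "middle":
        return f"{FILLER}\n\n{KEY_FACT}\n\n{FILLER}"
    return f"{FILLER}\n\n{FILLER}\n\n{KEY_FACT}"  # end

for position in ("start", "middle", "end"):
    prompt = f"{build_context(position)}\n\nQuestion: {QUESTION}"
    # answer = ask_model(prompt)  # hypothetical call; compare answers across positions
    print(position, len(prompt), "characters")
```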

Proof In The Wild: Same Intent, Different Utility Target

This is where the Utility Gap moves from research to reality.

BrightEdge published research comparing how ChatGPT and Google AI approach visibility by industry. In healthcare, BrightEdge reports 62% divergence and gives an example that matters to marketers because it shows the system choosing a path, not just an answer. For “how to find a doctor,” the report describes ChatGPT pushing Zocdoc while Google points toward hospital directories. Same intent. Different route.

A related BrightEdge report frames this as a broader pattern, especially in action-oriented queries, where the platform pushes toward different decision and conversion surfaces.

That is the Utility Gap showing up as behavior. The model is selecting what it considers useful for task completion, and those choices can favor aggregators, marketplaces, directories, or a competitor’s framing of the problem. Your high-quality page can lose without being wrong.

Portability Is The Myth You Have To Drop

The old assumption was simple: build a high-quality page, win in search, and you win in discovery. That is no longer a safe assumption.

BCG describes the shift in discoverability and highlights how measurement is moving from rankings to visibility across AI-mediated surfaces. Their piece includes a claim about low overlap between traditional search and AI answer sources, which reinforces the idea that success does not transfer cleanly across systems.

Profound published a similar argument, positioning the overlap gap as a reason top Google visibility does not guarantee visibility in ChatGPT.

Method matters with overlap studies, so treat these numbers as directional signals rather than fixed constants. Search Engine Land published a critique of the broader trend of SEO research being over-amplified or generalized beyond what its methods can support, including discussion of overlap-style claims.

You do not need a perfect percent to act. You just need to accept the principle. Visibility and performance are not portable by default, and utility is relative to the system assembling the answer.

How You Measure The Utility Gap Without A Lab

You do not need enterprise tooling to start, but you do need consistency and intent discipline.

Start with 10 intents that directly impact revenue or retention. Pick queries that represent real customer decision points: choosing a product category, comparing options, fixing a common issue, evaluating safety or compliance, or selecting a provider. Focus on intent, not keyword volume.

Run the exact same prompt on the AI surfaces your customers use. That might include Google Gemini, ChatGPT, and an answer engine like Perplexity. You are not looking for perfection, just repeatable differences.

Capture four things each time (a minimal logging sketch follows the list):

  • Which sources get cited or linked.
  • Whether your brand is mentioned (cited, mentioned, paraphrased, or omitted).
  • Whether your preferred page appears.
  • Whether the answer routes the user toward or away from you.
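A simple way to keep those captures consistent is one small record per prompt run. The sketch below is one possible shape, assuming Python and illustrative field names rather than any standard schema.

```python
# Minimal sketch for logging one prompt run on one AI surface.
# Field names are illustrative assumptions, not a standard schema.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class UtilityGapObservation:
    intent: str                       # the customer decision point you are testing
    surface: str                      # e.g., "ChatGPT", "Google Gemini", "Perplexity"
    run_date: date
    cited_sources: list = field(default_factory=list)  # which sources get cited or linked
    brand_treatment: str = "omitted"  # "cited", "mentioned", "paraphrased", or "omitted"
    preferred_page_appears: bool = False
    routes_toward_us: bool = False    # does the answer send the user toward or away from you

obs = UtilityGapObservation(
    intent="how to find a doctor",
    surface="ChatGPT",
    run_date=date.today(),
    cited_sources=["zocdoc.com"],
    brand_treatment="omitted",
    preferred_page_appears=False,
    routes_toward_us=False,
)
```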

Then, score what you see. Keep the scoring simple so you will actually do it. A practical scale looks like this in plain terms, with a scoring sketch after the list:

  • Your content clearly drives the answer.
  • Your content appears, but plays a minor role.
  • Your content is absent, and a third party dominates.
  • The answer conflicts with your guidance or routes users somewhere you do not want them to go.
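A minimal sketch of turning that scale into numbers might look like this. The point values are arbitrary assumptions; what matters is applying them the same way every month.

```python
# Simple scoring of observations against the four-level scale above.
# The numeric values are arbitrary assumptions; consistency over time is what counts.

SCALE = {
    "drives_answer": 3,           # your content clearly drives the answer
    "minor_role": 2,              # your content appears, but plays a minor role
    "absent_third_party": 1,      # your content is absent, and a third party dominates
    "conflicts_or_misroutes": 0,  # the answer conflicts with your guidance or routes users away
}

def score_intent(observations: list[str]) -> float:
    """Average score for one intent across the AI surfaces you tested."""
    return sum(SCALE[label] for label in observations) / len(observations)

# One intent, tested on three surfaces:
baseline = score_intent(["minor_role", "absent_third_party", "drives_answer"])
print(f"Utility Gap baseline score: {baseline:.2f}")  # 2.00
```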

That becomes your Utility Gap baseline.

When you repeat this monthly, you track drift. When you repeat it after content changes, you can see whether you reduced the gap or merely rewrote words.

How You Reduce The Utility Gap Without Turning Your Site Into A Checklist

The goal is not to “write for AI.” The goal is to make your content more usable to systems that retrieve and assemble answers. Most of the work is structural.

Put the decision-critical information up front. Humans accept a slow ramp. Retrieval systems reward clean early signals. If the user’s decision depends on three criteria, put those criteria near the top. If the safest default matters, state it early.

Write anchorable statements. Models often assemble answers from sentences that look like stable claims. Clear definitions, explicit constraints, and direct cause-and-effect phrasing increase usability. Hedged, poetic, or overly narrative language can read well to humans and still be hard to extract into an answer.

Separate core guidance from exceptions. A common failure pattern is mixing the main path, edge cases, and product messaging inside one dense block. That density increases distraction risk, which aligns with the utility and distraction framing in the UDCG work.

Make context explicit. Humans infer, but models benefit when you state assumptions, geography, time sensitivity, and prerequisites. If guidance changes based on region, access level, or user type, say so clearly.

Treat mid-page content as fragile. If the most important part of your answer sits in the middle, promote it or repeat it in a tighter form near the beginning. Long-context research shows position can change whether information gets used.
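A rough way to spot fragile placement is to split a page into chunks and check how far down your decision-critical statement first appears. The sketch below assumes fixed-size character chunks and substring matching, which is cruder than what real retrieval systems do, but it is enough to flag guidance buried mid-page.

```python
# Rough extractability check: split a page into chunks and find where a key claim lands.
# Fixed-size chunks and substring matching are simplifying assumptions.

def chunk_text(text: str, chunk_size: int = 800) -> list[str]:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def claim_position(page_text: str, key_claim: str) -> float | None:
    """How far down the page the claim first appears (0.0 = top, 1.0 = bottom)."""
    chunks = chunk_text(page_text)
    for index, chunk in enumerate(chunks):
        if key_claim.lower() in chunk.lower():
            return index / max(len(chunks) - 1, 1)
    return None  # the claim is never stated directly, which is a red flag on its own

page = (
    "Intro paragraph about our brand story. " * 30
    + "The warranty covers water damage only if the device was registered within 30 days. "
    + "More narrative and testimonials. " * 30
)
# If the claim first appears past the halfway mark, promote it or restate it near the top.
print(claim_position(page, "registered within 30 days"))  # 0.5, roughly mid-page
```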

Add primary sources when they matter. You are not doing this for decoration. You are giving the model and the reader evidence to anchor trust.

This is content engineering, not gimmicks.

Where This Leaves You

The Utility Gap is not a call to abandon traditional SEO. It is a call to stop assuming quality is portable.

Your job now runs in two modes at once. Humans still need great content. Models need usable content. Those needs overlap, but they are not identical. When they diverge, you get invisible failure.

That changes roles.

Content writers cannot treat structure as a formatting concern anymore. Structure is now part of performance. If you want your best guidance to survive retrieval and synthesis, you have to write in a way that lets machines extract the right thing, fast, without getting distracted.

SEOs cannot treat “content” as something they optimize around at the edges. Technical SEO still matters, but it no longer carries the whole visibility story. If your primary lever has been crawlability and on-page hygiene, you now have to understand how the content itself behaves when it is chunked, retrieved, and assembled into answers.

The organizations that win will not argue about whether AI answers differ. They will treat model-relative utility as a measurable gap, then close it together, intent by intent.


This post was originally published on Duane Forrester Decodes.

Featured Image: LariBat/Shutterstock


