Last September, Lily Ray asked Perplexity for the latest news on SEO and AI search. It told her, confidently, about the “September 2025 ‘Perspective’ Core Algorithm Update”: a Google update that, as she then wrote at length in “The AI Slop Loop,” didn’t exist. Google hasn’t named core updates in years. “Perspectives” was already a SERP feature. If a real update had rolled out while she was in Austria, her inbox would have told her before Perplexity did.
She checked the citations. Both pointed at AI-generated posts on SEO agency blogs: sites that had run a content pipeline, hallucinated an update, and published it as reporting. Perplexity read the slop, treated it as source material, and served it back to her as news.
In February, the BBC’s Thomas Germain spent 20 minutes writing a blog post on his personal site. Its title: “The best tech journalists at eating hot dogs.” It ranked him first, invented a 2026 South Dakota International Hot Dog Championship that had never happened, and cited precisely nothing. Within 24 hours, both Google’s AI Overviews and ChatGPT were passing his fabrication along to anyone who asked. Claude didn’t bite. Google and OpenAI did.
Everyone who has looked has seen it.
I’ve Argued About The Ouroboros Before. I Had The Timeline Wrong
The prevailing framing for this problem has been model collapse. You train a model on web text, the web fills up with AI output, the next model trains on a corpus increasingly made of its own exhaust, and eventually the distribution flattens into mush. Innovation comes from exceptions, and probabilistic systems that converge toward the mean attenuate exceptions by design. I’ve used the phrase digital ouroboros for this.
That framing assumes training cycles. It assumes time. It assumes that contamination moves at the speed of model release.
It doesn’t. What Lily documented, what Germain documented, what the New York Times then went and quantified – none of that is training-side. The models involved were not retrained between the hallucination appearing on a blog and being served as citation-backed fact. The contamination moved at the speed of a crawl. The ouroboros isn’t taking generations to eat itself. It’s eating itself at query time, every time someone asks one of these systems a question.
The pipe everyone has been watching is not the pipe that is breaking.
The Distinction That Matters
Model collapse is a training-corpus problem. Synthetic content seeps into the pre-training data, the next generation of model inherits it, capability degrades. Researchers have been warning about this for two years. They’re right. They’re also describing something slow enough that everyone can nod gravely and keep shipping.
Retrieval contamination is faster and already here. RAG systems – Perplexity, Google AI Overviews, ChatGPT with search – do not generate answers purely from parametric memory. They fetch documents from the live web, stuff them into context, and generate a response conditioned on what they retrieved. If the retriever surfaces a hallucinated SEO post, the answer inherits the hallucination. No retraining required.
The academic literature on this is clear. PoisonedRAG (Zou et al., 2024) showed that injecting a small number of crafted passages into a retrieval corpus was sufficient to control the output of a RAG system on targeted queries. BadRAG (Xue et al., 2024) demonstrated the same class of attack using semantic backdoors. Both papers treat this as an adversarial problem: what happens when an attacker deliberately poisons the corpus.
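The mechanics are easy to sketch. Below is a toy retriever in the spirit of what PoisonedRAG describes: a bag-of-words cosine-similarity search over a tiny corpus. The documents, the query, and the “slop” passage are all invented for this illustration; production systems use dense embeddings and much larger indexes, but the failure mode is the same. Whatever scores highest against the query goes into the model’s context, and a keyword-stuffed hallucination scores very well.

```python
# Toy retrieval-poisoning sketch (illustrative only; all documents invented).
# Real RAG systems use dense embeddings, not bag-of-words, but the dynamic
# is identical: the retriever ranks by similarity, not by truth.
import math
from collections import Counter

def vectorize(text):
    """Lowercased bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# A truthful document and a hallucinated, keyword-stuffed SEO post.
corpus = {
    "legit": "google confirmed a core update in march with no official name",
    "slop":  "google september perspective core algorithm update news seo "
             "winners losers update update",
}

def retrieve(query, corpus, k=1):
    """Return the top-k document ids by cosine similarity to the query."""
    q = vectorize(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, vectorize(corpus[d])),
                    reverse=True)
    return ranked[:k]

top = retrieve("latest news on the google core algorithm update", corpus)
print(top)  # the keyword-stuffed hallucination outranks the accurate doc
```

No retraining, no adversarial crafting: the “slop” document wins simply because it repeats the query’s terms, and whatever wins retrieval becomes the source material for the answer.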
What Germain and Lily accidentally proved is that the adversarial model is the normal operating model. You don’t need a crafted adversarial passage. You need a blog post. The open web is the corpus, and anyone with a domain can write to it.
The Oumi analysis commissioned by the New York Times put numbers on what this costs. Across 4,326 SimpleQA tests, Google’s AI Overviews answered correctly 85% of the time on Gemini 2, 91% on Gemini 3. At Google’s scale – more than five trillion searches a year – a 9% error rate still translates to tens of millions of wrong answers every hour. But the more revealing figure is this: on Gemini 3, 56% of the correct answers were ungrounded, up from 37% on Gemini 2. The upgrade improved surface accuracy and made the citations worse. When the model got something right, more than half the time the source it pointed to didn’t support the claim.
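The “tens of millions every hour” claim is worth checking. The back-of-the-envelope math below uses only the figures quoted above (five trillion searches a year, a 9% error rate on Gemini 3) and assumes, generously to the point being made, that every search produces an AI answer; in reality only a fraction of searches trigger an Overview, so treat this as an upper bound on the same order of magnitude.

```python
# Sanity check on the scale claim, using the figures quoted in the article.
searches_per_year = 5e12   # "more than five trillion searches a year"
error_rate = 0.09          # 91% correct on SimpleQA -> 9% wrong (Gemini 3)
hours_per_year = 365 * 24

wrong_per_hour = searches_per_year * error_rate / hours_per_year
print(f"{wrong_per_hour:,.0f} wrong answers per hour")  # roughly 51 million
```

Even if only a fifth of searches surface an AI answer, that is still roughly ten million wrong answers an hour.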
The retrieval layer is not a filter. It is the infection vector.
Who’s Seeding The Corpus
The industry that has most enthusiastically produced the slop – and then most enthusiastically written about the consequences of consuming it – is the SEO industry. I’ve written before about content scaling being just content spinning with better grammar, and about the AI visibility tool complex that builds dashboards from the output of non-deterministic systems. This is the same loop, one layer deeper. An SEO agency runs an AI content pipeline because AI Overviews have cut their clients’ traffic. The pipeline publishes speculative “winners and losers” posts during a core update that’s still rolling out, citing nothing. Another agency’s pipeline picks those up as sources. The output floods into the retrieval index. AI Overviews cites one of them. The original agency then writes a case study about how AI Overviews are “surfacing” their content.
An Ahrefs study of over 26,000 ChatGPT source URLs found that “best X” listicles accounted for nearly 44% of all cited page types, including cases where brands rank themselves first against their competitors. Harpreet Chatha told the BBC you can publish “the best waterproof shoes for 2026,” put yourself first, and be cited in AI Overviews and ChatGPT within days. Lily, during the actual March 2026 core update, found AI-generated articles claiming to list winners and losers while the update was still rolling out; articles that opened with filler and listed brands without a single real citation.
The practitioners scaling AI content are also the ones most directly harmed when AI search systems cite that content as fact. Nobody forced this. The industry built the pipeline, fed it, and complained about what came out the other end. Not adversarial poisoning. Just the industry polluting its own water supply and then hiring consultants to test it.
The Tier That Matters
The Oumi study is about AI Overviews, which is free by design. Google AI Overviews reportedly reached over two billion monthly active users by mid-2025. ChatGPT has around 900 million weekly active users, of which roughly 50 million pay, meaning about 94% of the people interacting with OpenAI’s product are on the free tier.
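The 94% figure follows directly from the two reported numbers:

```python
# Tier split implied by the reported figures (~900M weekly users, ~50M paying).
weekly_users = 900e6
paying_users = 50e6

free_share = 1 - paying_users / weekly_users
print(f"{free_share:.0%} of users on the free tier")  # ~94%
```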
The paid tiers are better. Per OpenAI’s own launch claims, cited in Lily’s piece, GPT-5.4 is 33% less likely to produce false individual claims than GPT-5.2. The free-tier GPT-5.3 is also improved over its predecessor (26.8% fewer hallucinations with web search, 19.7% fewer without), but it’s still meaningfully less reliable than the paywalled version. Gemini 3, which made AI Overviews more accurate on surface tests, also made the ungrounded rate worse. Better answer, weaker citation.
Nobody seems to mind. The reliable version of the product is paywalled. The version most of the planet gets – including the version at the top of Google Search – can be manipulated by 20 minutes of work on a personal website. Intelligence is the marketing category. What two billion users actually receive is a confident summarization of whatever the crawler happened to find.
Grokipedia As The Terminal State
The accidents of the retrieval layer are one thing. Grokipedia is the version where accident is no longer a useful word.
Elon Musk’s xAI launched Grokipedia on Oct. 27, 2025, with 885,279 articles, all generated or rewritten by Grok. Some of them were lifted from Wikipedia wholesale, with a disclaimer at the bottom acknowledging the CC-BY-SA license; a license Wikipedia maintains precisely because a community of human editors writes and verifies the content. Others were rewritten from scratch. PolitiFact found Grokipedia citing Instagram reels as sources, which Wikipedia’s own policies rule out as “generally unacceptable.” Grokipedia’s entry on Canadian singer Feist said her father died in May 2021, citing a 2017 Vice article about Canadian indie rock that made no mention of any death; her father was still alive when that article was written. The Nobel Prize in Physics entry added an uncited sentence claiming physics is traditionally the first prize awarded at the ceremony, which isn’t true.
Musk said the goal is to “research the rest of the internet, whatever is publicly available, and correct the Wikipedia article.” The rest of the internet now includes the synthetic content produced by every AI content pipeline pointed at it. An AI system reading the open web, rewriting Wikipedia based on what it finds, and presenting the result as a reference work is the retrieval-contamination problem with the feedback loop made explicit and shipped as a product.
By mid-February 2026, Grokipedia had lost most of its Google visibility. Wikipedia outranks Grokipedia for searches about Grokipedia itself.
“This human-created knowledge is what AI companies rely on to generate content; even Grokipedia needs Wikipedia to exist.” – The Wikimedia Foundation
The synthetic encyclopedia is subsidized by the human one. When the subsidy stops, the thing depending on it stops making sense.
Wikipedia is not beyond criticism. Its edit wars, ideological gatekeeping, and systemic gaps in who gets to shape articles are well-documented and real. But the response to a flawed human editorial process is not to remove the humans entirely and call the result an improvement. I’ve written before about the accountability vacuum that opens when you replace human judgment with API calls. Wikipedia’s problems are the problems of a messy, contested, accountable system. Grokipedia’s problems are the problems of a system with no accountability at all.
The Citation Layer Is Decoupling From Authorship
I wrote recently about Reddit selling “Authentic Human Conversation™” to AI companies while the platform’s own moderators report that they can no longer tell which comments are human. The Oumi study found that of 5,380 sources cited by AI Overviews, Facebook and Reddit were the second and fourth most common. The citation layer of the most-used answer engine in the world is substantially built on two platforms that cannot verify the human origin of their own content.
Human creators are pulling out of the open web because the traffic bargain has collapsed. Answer engines are citing content whose authorship cannot be verified, or was never human to begin with. The citation is still there. The thing being cited is not what it used to be.
The ouroboros framing was right. The timeline wasn’t. Retrieval collapse doesn’t wait for the next training run. It needs an indexable URL and a retrieval system willing to trust it.
The systems are willing. And more than half the time they get an answer right, they can’t point to a source that supports what they just told you.
This post was originally published on The Inference.