
If AI Eats the Web, Who Feeds AI?

Sun Apr 19 — AI can answer faster than search, but speed is not the same as sustainability. If creators lose traffic, revenue, and incentive, the open knowledge loop starts to break.
#AI #Data #Strategy #Internet #Platform-Economics

I’m not an early AI adopter.

I did not really start using ChatGPT until my last job, when colleagues said they leaned heavily on it.

I also know the panic around it all too well. Because of my music hobbies, I have been around plenty of conversations where people are deeply concerned about what AI might do to creative work.

At first, I used it more like a better Google. I understood it might not always be correct, but I liked that it cut out the noise, the snotty comments, and a lot of the distraction that comes with the internet as a whole.

That is part of the appeal.

Yesterday, at the pub during our monthly catch-up, one friend whose opinion I respect mentioned he had adopted Codex and was loving it. He is an exceptional full-stack developer, so that surprised me a bit.

Not in a bad way. More because he may already be using these tools more heavily than I am.

We ended up discussing how I see AI products being implemented inside organizations. In that world, data is constantly being created and changed: projects, reports, ideas, planning, decisions, revisions. The flow of data in and out of the business is both the incentive and the justification for the system to exist.

But when I look at my own usage as a model, there is a harder question underneath the convenience:

If AI becomes the main way people consume knowledge, what happens to the people and platforms that produce it?

That is not just an AI question.

It is a data supply chain question.

The old loop

For years, the web worked on a rough but understandable bargain.

People wrote things.
Search engines helped others find them.
Traffic came back to the source.
That traffic created some mix of money, reputation, authority, community, or opportunity.

It was not perfect. Far from it.

But the loop basically worked:

create → publish → discover → visit → reward

That reward mattered. It is why people kept writing blog posts, documenting obscure technical fixes, maintaining specialist forums, publishing local research, and putting useful ideas into public view.

A lot of the internet’s value came from people who were not trying to build empires. They were just adding signal.

The new risk

AI changes the shape of that loop.

Now the answer can arrive without the visit.

The model reads, summarizes, restructures, and returns the useful bit before the user ever reaches the website that produced it. The user gets speed. The model gets utility. But the original source may get very little back.

That changes the economics.

The loop starts to look more like this:

create → publish → ingest → answer elsewhere

That is where the problem starts.

Because if enough value is captured by the answer layer, the upstream producers begin to lose incentive to keep producing. Less traffic. Less ad revenue. Less authority. Less reason to publish in public at all.

And once that starts, the whole system becomes weaker.
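
To make that concrete, here is a deliberately crude sketch of the loop as a toy model. Every number in it is invented for illustration; nothing is measured. One variable stands in for publishing volume, an answer layer intercepts some fraction of visits before they reach the source, and next year's output drifts toward whatever the remaining reward can sustain.

    # Toy model of the create → publish → discover → visit → reward loop.
    # Illustrative only: every parameter here is made up, not measured.

    def simulate(years: int, intercept_rate: float) -> float:
        """Relative publishing volume after an answer layer intercepts
        a fixed fraction of visits before they reach the source."""
        publishing = 1.0  # relative volume of new public content
        for _ in range(years):
            visits = publishing * (1.0 - intercept_rate)  # traffic that still arrives
            reward = visits  # money, reputation, community, lumped into one number
            # Next year's output drifts toward what the reward can sustain.
            publishing = 0.7 * publishing + 0.3 * reward
        return publishing

    for rate in (0.0, 0.3, 0.6):
        print(f"interception {rate:.0%}: publishing after 10 years = "
              f"{simulate(10, rate):.2f}")

The exact figures mean nothing. The shape is the point: with no interception the loop sustains itself, and with any persistent interception, output decays year on year, faster as the rate climbs.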

This is not really about websites

This is about incentives.

Data people should recognize this immediately.

If you over-extract from a system without maintaining the source, quality drops. You might not see it on day one. In fact, the system can look highly efficient for quite a while. Outputs still appear. Dashboards still populate. Queries still run.

But upstream, the foundations are thinning out.

That is the real risk here.

Not that AI suddenly becomes useless.

Not that the internet vanishes overnight.

The real risk is slower and more structural:

AI may reduce the incentive to produce the human-made knowledge it depends on.

That is a supply problem.

Human signal is not infinite

A lot of useful knowledge on the internet is not polished corporate publishing.

It is odd forum posts.
Repair notes.
Half-broken personal blogs (like this one :-) ).
Niche explainers.
Specialist mailing lists.
Unfashionable documentation.
People who know one exact thing and took the time to write it down.

Those things are incredibly valuable.

They are also fragile.

They do not always survive if no one visits, no one cites, and no one benefits from maintaining them. The biggest loss would not be mainstream facts. Those will keep getting reproduced.

The first losses would be the strange, practical, hard-won corners of the web.

And that is where a surprising amount of truth lives.

The synthetic feedback loop

There is another problem too.

If AI-generated content fills more of the public web, future AI systems risk learning from a world increasingly shaped by earlier AI output. That creates a feedback loop.

The danger is not just errors. It is flattening.

Less originality.
Less texture.
Less disagreement.
Less lived experience.
More confident wording built on recycled phrasing.

A system can look polished while becoming brittle.
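
One crude way to see the flattening: imagine each generation of a model learning only from the previous generation's output, and, like most generators, favoring its most typical output over its tails. The sketch below is a one-dimensional caricature of that loop, with every number invented; the only thing worth watching is the spread.

    # Toy synthetic feedback loop: each generation trains only on the
    # previous generation's output and keeps the "safe" middle of it.
    # A one-dimensional caricature, not a claim about any real model.
    import random
    import statistics

    random.seed(0)

    # Generation 0: "human" data with real variety.
    data = [random.gauss(0.0, 1.0) for _ in range(1000)]

    for generation in range(6):
        print(f"gen {generation}: spread = {statistics.stdev(data):.2f}")
        mu, sigma = statistics.fmean(data), statistics.stdev(data)
        # Sample from the fitted model, then drop the unusual tails and
        # keep only the high-probability middle 80%.
        samples = sorted(random.gauss(mu, sigma) for _ in range(1250))
        data = samples[125:-125]

The average barely moves while the variety drains out, generation by generation. Nothing becomes obviously wrong; everything becomes the same. That is the brittleness hiding under the polish.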

That should sound familiar to anyone who has worked with poor data pipelines. Clean-looking output is not the same thing as healthy source data.

You can absolutely produce neat downstream artifacts from an upstream mess.

For a while.

The likely future is not the death of the web

I do not think websites disappear.

I think value moves.

The open web gets mined harder.
The best material gets protected.
More knowledge shifts behind paywalls, private communities, enterprise tools, licensed datasets, newsletters, closed platforms, and harder-to-extract formats.

That means the future may not belong to whoever has access to the most pages.

It may belong to whoever has access to the best human signal.

That is a different game.

And it has real consequences for data strategy, product design, and trust.

Why this matters beyond AI

This is bigger than chatbots.

It is about whether the internet remains a place that rewards people for making useful things public.

If the answer becomes “not really,” then public knowledge shrinks, private knowledge grows, and the common layer that made the modern web so powerful gets weaker.

That matters for:

  • search
  • journalism
  • technical writing
  • research
  • education
  • open communities
  • independent experts
  • every future AI model built on public knowledge

You cannot keep draining value from the upstream forever and expect the source to stay healthy.

That is not how systems work.

The Data & Grit view

The easiest mistake with AI is to see only the output.

The harder, more useful work is to examine the chain behind it.

Where does the value originate?
Who maintains the source?
What incentives keep the signal alive?
What happens when convenience breaks the reward loop?

That is the real story.

AI is not just a smarter answer machine.

It is a new extraction layer sitting on top of a knowledge ecosystem that still depends on humans to create the raw material. If that ecosystem stops rewarding the humans who make it valuable, the quality of the whole thing eventually falls.

Maybe not today.

Maybe not this quarter.

But with systems, the bill always comes due in the end.

Conclusion

The internet used to reward publishing.

AI risks rewarding extraction.

And any platform shift that weakens the creation of original human knowledge will eventually weaken the quality of machine intelligence built on top of it.

That is the tension.

Inside an organization, this problem is less severe. The company has a direct incentive to keep its own data useful. Internal reports, project notes, documentation, decisions, and operational records all feed value back to the business. They are updated, corrected, expanded, and challenged because accuracy matters and somebody is responsible for maintaining them.

That creates a healthier loop.

create → use → correct → improve

In that environment, AI can sit on top of a system with a built-in reason to stay current.

The wider internet does not work like that.

The open web depends on a looser bargain. People publish because there is some mix of traffic, recognition, community, revenue, or opportunity coming back to them. If AI takes more of the value at the answer layer while returning less to the source, that bargain starts to break.

And once it breaks, the public knowledge layer becomes thinner.

Not because people suddenly stop knowing things.

Because fewer people have reason to keep publishing, updating, correcting, and maintaining that knowledge in public.

That is why this matters.

Internal AI can thrive on data with a clear owner, a clear feedback loop, and a clear incentive to improve.

The open internet has no single owner, no guaranteed maintenance cycle, and no built-in promise that the people creating the value will be rewarded for it.

So the risk is not that AI stops being useful.

It is that it becomes very good at consuming a public knowledge system that has weaker and weaker reasons to stay rich, accurate, and alive.


Gareth Winterman