On AI bots…

This article highlights a way AI is changing the web that is both subtle and monumental.

Until now, search engines crawled your site, and you allowed it, because it was mutually beneficial. You allowed them, even encouraged them, to consume your content, and in return they sent you traffic. You wanted them on your site, because putting up with their crawling generally meant MORE visitors over time.

Not so with AI. Now AI companies consume your content to train their models, which then repackage it as their own original “thoughts.” This means that letting AI bots consume your content probably means LESS traffic to your site over time. Using your content, they can answer questions themselves, without ever revealing that it was your content that made the answer possible.

Crawlers used to advertise and promote your content. Now some crawlers pre-empt and hide your content.

This breaks an implicit, but key, part of the social contract that made the web work. What happens now?

Does this make the lowly, humble robots.txt file the new frontier in the AI wars? What happens when crawlers decide to ignore robots.txt entirely? Are site owners gearing up for a battle against the AI crawlers? Is there a nascent market for products to “protect” your content against AI ingestion? Are we going to see tools and methods to “obfuscate” content before AI can ingest it? Are content creators going to generate “poison pills” or “content honeypots” for AI crawlers to stumble on?

These are truly fascinating times.

The rise and fall of robots.txt

For decades, a humble text file governed the behavior of web scrapers. But as the AI industry grows, the social contract of robots.txt is falling apart.
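
For anyone who has never looked inside one, here is a rough sketch of what that contract amounts to in practice, using Python's standard urllib.robotparser module. The crawler names (ExampleAIBot, ExampleSearchBot) and the example.com URL are made up for illustration; real AI crawlers publish their own user-agent strings.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: turn away one (made-up) AI crawler,
# welcome everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return what a rule-following crawler would conclude for this URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some-post"  # placeholder URL
    print(may_fetch("ExampleAIBot", url))      # the blocked crawler
    print(may_fetch("ExampleSearchBot", url))  # any other crawler
```

Notice where the check runs: inside the crawler's own code. Nothing on the server enforces it, so a crawler that never calls the check simply fetches the page anyway, which is exactly why this has always been a social contract rather than a lock.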