Crawlers are Stupid

By Deane Barker

Note

In a prior version of this post, I used the word “spider” instead of “crawler.” However, in the years since, “spider” has fallen out of usage, so I edited it.

I’ve been monitoring the 404s on this site. I changed our URL pattern a while back, so I have a page that catches all the 404s, resolves the old pattern against the new one, and then redirects. Anything that doesn’t resolve gets logged, and I have an RSS feed where I can watch them all.
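For anyone who wants to try something similar, here is a minimal Python sketch of that catch-all idea. It is not the code running on this site. The old URL pattern, the ID-to-slug lookup, and the names `resolve_legacy_url` and `handle_not_found` are all assumptions made up for illustration, and the in-memory list stands in for whatever log feeds the RSS.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical old URL pattern, e.g. /archives/2004/05/000123.html
LEGACY_PATTERN = re.compile(r"^/archives/(\d{4})/(\d{2})/(\d+)\.html$")

# Hypothetical lookup from old numeric IDs to new slugs
SLUGS_BY_LEGACY_ID = {"000123": "crawlers-are-stupid"}

# Stand-in for a real log; unresolved paths end up here
unresolved_log: List[str] = []


def resolve_legacy_url(path: str) -> Optional[str]:
    """Map an old-style path to its new location, or None if it can't be mapped."""
    match = LEGACY_PATTERN.match(path)
    if match is None:
        return None
    year, month, legacy_id = match.groups()
    slug = SLUGS_BY_LEGACY_ID.get(legacy_id)
    if slug is None:
        return None
    return f"/{year}/{month}/{slug}/"


def handle_not_found(path: str) -> Tuple[int, Optional[str]]:
    """Redirect if the old pattern resolves; otherwise log the path and return a 404."""
    target = resolve_legacy_url(path)
    if target is not None:
        return 301, target
    unresolved_log.append(path)
    return 404, None
```

Calling `handle_not_found("/archives/2004/05/000123.html")` would hand back a permanent redirect to the new URL, while a junk path falls through to the log.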

Which brings me to my point: Web crawlers are pretty stupid. Ninety-nine percent of 404s to this site are from crawlers. They’re looking for URLs that:

I’ve also noticed a lot of one-off crawlers that I’ve never seen before. They come out of colleges a lot, it seems.

And, of course, there are hack attempts galore. Attempts to exploit the XML-RPC vulnerability that was revealed a few months ago are pretty common, and I get scads of long, long requests for things in _vti directories.

That said, monitoring your 404s is a really handy thing to do, as it alerts you to a lot of problems. We have over 4,500 entries now, and by watching bad requests, I find out all the time about bad links, missing images, etc. It’s a good, simple way to get an extra leg up on fighting content rot.
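If you log the referrer along with each bad request, a few lines of Python can surface the ones worth fixing. This is only a rough sketch: the `(path, referrer)` entry shape and the `summarize_404s` name are assumptions for illustration, not how the log here is actually stored.

```python
from collections import defaultdict


def summarize_404s(entries):
    """Group logged 404s by path, counting hits and collecting referrers.

    `entries` is an iterable of (path, referrer) pairs; an empty referrer
    means the request arrived without one.
    """
    summary = defaultdict(lambda: {"hits": 0, "referrers": set()})
    for path, referrer in entries:
        summary[path]["hits"] += 1
        if referrer:
            summary[path]["referrers"].add(referrer)
    # Most-requested missing paths first
    return sorted(summary.items(), key=lambda item: item[1]["hits"], reverse=True)
```

A missing path with referrers from your own pages is probably a bad link or missing image you can fix. One with no referrer at all is more likely a crawler guessing.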

But don’t think the crawlers are the smart ones. You’d think since they were programmed by (supposed) professionals, and have everything in a database somewhere, that they’d be pretty on top of things. My experience, however, indicates that a bunch of two-year-olds mashing on the keyboard would probably come up with more valid URLs than your average web crawler.
