Sun Oct 23 06:35:44 EDT 2011

server log fun

I was obsessively poring over my web logs, as usual, when I noticed something unusual.

- - [20/Oct/2011:06:15:08 -0400] "GET /aboutlogo.html HTTP/1.1" 200 415 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322)"
- - [20/Oct/2011:06:15:11 -0400] "GET /style.css HTTP/1.1" 200 265 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322)"

This is odd, since aboutlogo.html hasn't actually been linked anywhere on the site for at least a year. It's a relic of an old version of the front page, which I forgot to delete. Unless this visitor maintains the world's most boring bookmark collection, this means that it's a web spider, refreshing a stored link.

Now, I see a lot of stealth-spiders, but it's rare to see one that's clever enough to request the CSS as well, like a real human. Google and Yahoo will occasionally do it, so they can generate accurate screenshots, but generally nobody else bothers, or they don't give themselves away like this. (It didn't ask for the favicon, which is understandable, since that Expires in 2037, but it also didn't ask for the image on that page, which was a bit of a blunder.) Let's run a WHOIS on the IP, and see who owns it.

NetRange: -
NetType: Direct Assignment
RegDate: 1997-03-31

And nslookup says:

Now that's interesting.

What's (very mildly) alarming is that they didn't ask for a robots.txt. That /16 is apparently used by bingbot, so it's entirely possible that they requested my robots.txt officially, got a 404, and concluded that I don't give a shit.

Which is true, of course, I give no shits about web spiders; but there's a lot of hysterical pansies on the Internet who hate it when people actually look at the stuff they've published publicly. And, of course, crawling without checking robots.txt contradicts Microsoft's stated policy.

So, who knows.
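
For anyone who'd rather not do this by eyeball, here's a rough Python sketch of the same check: parse the combined-format log lines, collect which paths each client asked for, and see whether it ever requested /robots.txt. It's only a sketch under assumptions. The client address 192.0.2.10 and the 192.0.0.0/16 range are placeholders (the first is a documentation-only address), not the real ones from my logs, and the function names are made up for illustration.

```python
import ipaddress
import re
from collections import defaultdict

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def paths_by_client(lines):
    """Map each client host to the set of paths it requested."""
    seen = defaultdict(set)
    for line in lines:
        entry = parse_line(line)
        if entry:
            seen[entry["host"]].add(entry["path"])
    return seen


def skipped_robots(paths):
    """True if the client never asked for /robots.txt."""
    return "/robots.txt" not in paths


def in_network(host, cidr):
    """True if host is an IP address inside the given CIDR block."""
    try:
        return ipaddress.ip_address(host) in ipaddress.ip_network(cidr)
    except ValueError:
        # host was a name rather than an address, or cidr was malformed
        return False


# Placeholder client address; the request lines mirror the ones in the post.
AGENT = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; "
         "SLCC1; .NET CLR 1.1.4322)")
SAMPLE = [
    '192.0.2.10 - - [20/Oct/2011:06:15:08 -0400] '
    '"GET /aboutlogo.html HTTP/1.1" 200 415 "-" "%s"' % AGENT,
    '192.0.2.10 - - [20/Oct/2011:06:15:11 -0400] '
    '"GET /style.css HTTP/1.1" 200 265 "" "%s"' % AGENT,
]

if __name__ == "__main__":
    for host, paths in paths_by_client(SAMPLE).items():
        print(host, sorted(paths),
              "no robots.txt" if skipped_robots(paths) else "asked for robots.txt",
              "in range" if in_network(host, "192.0.0.0/16") else "not in range")
```

Note that this only tells you what one address did. If a crawler fetches robots.txt from some other machine in the same block, you'd have to check every client in the range before concluding anything.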
