Yahoo Slurp Crawler.... not playing by the rules?
Whoa... I just caught Slurp in somewhere it shouldn't be...
http://files.unitedbimmer.com/ub.c/k...-printable.png
Now, first of all, United Bimmer's robots.txt file has this in it: [code]Disallow: /forums/printhread.php[/code] And second of all, the ONLY link to that page is from the actual thread itself, and that link has a rel="nofollow" attribute on it, so even if I didn't have the robots.txt entry, it still shouldn't follow it. Why/how is it crawling there? :eyecrazy |
Christ - a link jump loop!
|
Link jump loop?
Edit: Ah, but after an iteration or two, the spider should figure it out and default out. Not good, but not the end of the world either. |
Okay :type
|
Wow, stupid mistake.
Notice how in the robots.txt file I restricted "printhread.php"? Wouldn't it make more sense if I restricted "printthread.php"? haha :lol |
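For anyone who wants to catch this kind of typo before a bot does, here is a rough sketch using Python's standard urllib.robotparser. The "User-agent: *" line, the example.com host, and the ?t=1 query string are assumptions for illustration, not copied from the real robots.txt; only the Disallow path comes from the post above.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Assumed rule sets: the User-agent line is a guess, the Disallow paths
# are the misspelled and corrected versions from the thread.
TYPO_RULES = "User-agent: *\nDisallow: /forums/printhread.php\n"
FIXED_RULES = "User-agent: *\nDisallow: /forums/printthread.php\n"

# Hypothetical URL standing in for the printable-thread page.
URL = "http://example.com/forums/printthread.php?t=1"

if __name__ == "__main__":
    print("typo rule blocks Slurp: ", is_blocked(TYPO_RULES, "Slurp", URL))
    print("fixed rule blocks Slurp:", is_blocked(FIXED_RULES, "Slurp", URL))
```

With the misspelled path the rule never matches the real script name, so the page counts as allowed; with the corrected path it is blocked.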
Yep, that would do it. Although 99% of crawlers obey the robots.txt file, every once in a while there's one that ignores it. You might want to add a referrer check to that script to prevent a loop.
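One way a referrer check could look, as a minimal sketch only: the printable page is served only when the Referer header points at a thread page on the same site. The function name, the example.com host, and the /forums/showthread.php path are all assumptions for illustration; the actual forum script is PHP, and this is not how vBulletin does it. Keep in mind that Referer is optional and easy to forge, so this only stops well-behaved clients from wandering in.

```python
from urllib.parse import urlparse


def allow_printable(referer, site_host="example.com",
                    thread_path="/forums/showthread.php"):
    """Hypothetical check: serve the printable page only for visitors
    arriving from a thread page on our own host."""
    if not referer:
        # Many crawlers send no Referer header at all; turn those away.
        return False
    parts = urlparse(referer)
    return parts.hostname == site_host and parts.path == thread_path


if __name__ == "__main__":
    print(allow_printable("http://example.com/forums/showthread.php?t=1"))
    print(allow_printable("http://example.com/forums/printthread.php?t=1"))
    print(allow_printable(None))
```

A request that arrives from the printable page itself (the loop case) or with no Referer at all gets refused, which breaks the cycle without touching normal visitors who click through from the thread.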
|
Yeah, I watch all the bots that crawl us very closely (mostly to ensure performance and efficiency), as right now there's only 12, so it's easy to keep tabs on them.
If I ever see one violating robots.txt, I'd probably ban it... but we get 65% of all our search engine traffic from Yahoo, so I'd rather keep it happy. :) |
Copyright © 2005-2013 UnitedBimmer.com