United Bimmer Community - BMW Forum

komodo 03-08-2006 11:28 PM

Yahoo Slurp Crawler.... not playing by the rules?
 
Whoa... I just caught Slurp somewhere it shouldn't be...
http://files.unitedbimmer.com/ub.c/k...-printable.png

Now, first of all, United Bimmer's robots.txt file has this in it:
[code]Disallow: /forums/printhread.php[/code]

And second of all, the ONLY link to that page is from the actual thread itself, and that link has a rel="nofollow" attribute on it, so even if I didn't have the robots.txt entry, the crawler still shouldn't follow it.

Why/how is it crawling there? :eyecrazy

witeshark 03-08-2006 11:47 PM

Christ - a link jump loop!

komodo 03-08-2006 11:53 PM

Link jump loop?

Edit: Ah, but after an iteration or two, the spider should figure it out and default out. Not good, but not the end of the world either.

witeshark 03-08-2006 11:56 PM

Okay :type

komodo 03-09-2006 12:46 AM

Wow, stupid mistake.

Notice how in the robots.txt file I restricted "printhread.php"? Wouldn't it make more sense if I restricted "printthread.php"? haha :lol
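
For anyone who wants to catch this kind of typo before a crawler does, here is a minimal sketch using Python's standard urllib.robotparser. It feeds the misspelled rule and the corrected rule to the parser and asks whether the printable page is blocked. The "User-agent: *" line, the exact URL, and the helper name is_blocked are assumptions for illustration; the real robots.txt may group its rules differently.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines disallow url for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from a list of lines, no network access
    return not parser.can_fetch(user_agent, url)


# Assumption: the rule sits under a catch-all "User-agent: *" group.
TYPO_RULES = ["User-agent: *", "Disallow: /forums/printhread.php"]
FIXED_RULES = ["User-agent: *", "Disallow: /forums/printthread.php"]

# Assumed URL of the printable page, built from the script name in this thread.
PRINT_URL = "http://www.unitedbimmer.com/forums/printthread.php"

if __name__ == "__main__":
    print("typo rule blocks it: ", is_blocked(TYPO_RULES, "Slurp", PRINT_URL))
    print("fixed rule blocks it:", is_blocked(FIXED_RULES, "Slurp", PRINT_URL))
```

Rules are matched as path prefixes, so a single missing letter means the rule matches nothing the site actually serves, and the crawler is free to fetch the page.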

jms 03-09-2006 09:23 AM

Yep, that would do it. Some crawlers do ignore the robots.txt file, though: 99% respect it, but every once in a while there's one that doesn't. You might want to add a referrer check to it to prevent a loop.
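
A rough sketch of what such a referrer check could look like, in Python for illustration only (the forum itself runs on vBulletin, and this is not its code). The function name referrer_is_allowed and its parameters are hypothetical. The idea is to serve the printable page only when the request came from another page on the same site, and not from the printable script itself. Keep in mind that the Referer header is optional and easy to fake, so this only discourages badly behaved bots.

```python
from urllib.parse import urlparse


def referrer_is_allowed(referrer, allowed_host, blocked_path):
    """Return True only if referrer is on allowed_host and is not blocked_path.

    referrer     -- value of the HTTP Referer header, or None if absent
    allowed_host -- hostname of our own site (assumed value below)
    blocked_path -- path prefix of the printable script itself
    """
    if not referrer:
        # No Referer header: typical for crawlers requesting URLs directly.
        return False
    parsed = urlparse(referrer)
    if parsed.hostname != allowed_host:
        return False
    # Refuse requests that arrive from the printable page itself (the loop case).
    return not parsed.path.startswith(blocked_path)


if __name__ == "__main__":
    thread = ("http://www.unitedbimmer.com/forums/geek-chat/"
              "6717-yahoo-slurp-crawler-not-playing-rules.html")
    print(referrer_is_allowed(thread, "www.unitedbimmer.com",
                              "/forums/printthread.php"))
```

The trade-off is that visitors whose browsers strip the Referer header would also be refused, so a check like this works better as a fallback behind a correct robots.txt than as the main defense.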

komodo 03-09-2006 10:48 AM

Yeah, I watch all the bots that crawl us very closely (mostly to ensure performance and efficiency). Right now there are only 12, so it's easy to keep tabs on them.

If I ever see one violating robots.txt, I'd probably ban it... but we get 65% of all our search engine traffic from Yahoo, so I'd rather keep it happy. :)


All times are GMT -5.