2

I ran into a problem while writing a robots.txt for my site.

Searching on Google, I found discussions about crawlers that honor robots.txt and crawlers that do not. How can I block the ones that ignore it? Can I do this with .htaccess, or is there another way?

j0k

2 Answers

2

If some crawlers are not following your robots.txt rules, you will need to ban them by IP. Listing their user agents in robots.txt does nothing if they ignore its rules in the first place.
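As a rough sketch of what an IP ban could look like in .htaccess, assuming Apache 2.4 with mod_authz_core and an AllowOverride setting that permits authorization directives. The addresses below are placeholders from the documentation ranges, not real crawler IPs; substitute the ones you see in your access log.

```apache
# Allow everyone except the listed addresses (placeholder IPs).
<RequireAll>
    Require all granted
    # A single misbehaving crawler address
    Require not ip 192.0.2.10
    # A whole range, in CIDR notation
    Require not ip 198.51.100.0/24
</RequireAll>
```

Blocked clients receive a 403 Forbidden response before any of your scripts run.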

Anagio
0

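Simple: ban them all, using PHP and a regular expression. For example:

if (isset($_SERVER['HTTP_USER_AGENT'])
    && preg_match('/badbot1|badbot2|badbot3/i', $_SERVER['HTTP_USER_AGENT'])) {
    // Known bad bot: refuse the request and stop the script.
    header('HTTP/1.1 403 Forbidden');
    exit();
}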

The header statement is optional. Without it, the bot simply gets an empty response with the default 200 status instead of a 403.

Be careful: never start or end the list of bots with a pipe "|". A leading or trailing pipe adds an empty alternative, which matches every user agent, so you would ban all your traffic! Use "badbot1|badbot2|badbot3".

Never "|badbot1|badbot2|badbot3" and never "badbot1|badbot2|badbot3|".
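To illustrate the pipe pitfall, here is a small sketch. The helper `build_bot_pattern` is a hypothetical name, not part of PHP; it assumes you keep the bot names in an array and drops empty entries before joining them, so an empty alternative cannot slip into the pattern.

```php
<?php
// Hypothetical helper: builds a case-insensitive pattern from a list of
// bot names, skipping empty entries so no empty alternative is created.
function build_bot_pattern(array $names): ?string
{
    $clean = [];
    foreach ($names as $name) {
        $name = trim($name);
        if ($name !== '') {
            // Escape regex metacharacters and the "/" delimiter.
            $clean[] = preg_quote($name, '/');
        }
    }
    // Return null when there is nothing to match against.
    return $clean ? '/' . implode('|', $clean) . '/i' : null;
}

$ua = 'Mozilla/5.0 (ordinary browser)';

// Trailing pipe: the empty alternative matches any string, so this is 1.
$broken = preg_match('/badbot1|badbot2|/i', $ua);

// The helper drops the empty entry, so an ordinary browser is not matched.
$pattern = build_bot_pattern(['badbot1', 'badbot2', '']);
$safe = preg_match($pattern, $ua);
```

Here `$broken` is 1 and `$safe` is 0 for the same ordinary user agent, which is exactly the difference between banning everyone and banning only the listed bots.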

Good luck

Mike Niner