  #5  
20th July 2006, 19:04
pAuL1974
Re: Forum trawled by search engine bots... why?

Quote:
Originally Posted by The_Godfather
There is nothing you can do to stop Google bots. They scan forums for their search engine (hence why specific Google searches lead you to this site).

Unless Anne makes this place private (which is unlikely given the nature of this site (self-help for SAers to find)) - they will continue to scan the forums.

My understanding is that a robots.txt file (Wikipedia link and example below) tells a web spider which parts of a site it should not access.

Whilst a web spider is under no obligation to obey the robots.txt request, a reputable spider will comply and only crawl where the site owner wishes it to. If spiders are still allowed on the main site, SAUK will still get search engine hits to attract new members.

Some people do put e-mail addresses into search engines, so anyone posting details here in the MSN topic, for example, can be easily found. From there it's only a few clicks to reveal what may be some very personal and revealing posts. I changed my username here immediately on joining for this reason; posting my MSN details on the forum was an oversight on my part.

http://en.wikipedia.org/wiki/Robots.txt

Example of a robots.txt file that asks spiders not to crawl four directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
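
For anyone curious how a well-behaved spider would read those rules, here is a rough sketch using Python's standard urllib.robotparser module. The bot name "ExampleBot" and the three paths are made up for illustration; they are not real SAUK URLs, and a real crawler may apply the rules somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# The same example rules as above, held in a string for the demo.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the (hypothetical) bot may crawl the given path."""
    return parser.can_fetch(agent, path)


parser = build_parser(ROBOTS_TXT)

# Hypothetical paths, chosen only to show one allowed and two blocked cases.
for path in ("/index.html", "/private/diary.html", "/images/logo.png"):
    print(path, "->", "allowed" if allowed(parser, path) else "disallowed")
```

The check is entirely voluntary on the spider's side: nothing here blocks a request, it only tells a polite crawler which paths the site owner has asked it to skip.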