robots.txt - Question
| khabri 2007-10-21, 6:18 pm |
| Hello:
I read about robots exclusion here http://www.robotstxt.org/wc/exclusion.html
I am wondering how search engine bots are implemented. Let's assume I
have Disallow: /foobar in my robots.txt. On the main page of my site,
I link to content at, say, /foobar/pictures.html
So will the search engine bot index /foobar/pictures.html or not? If
not, does that mean it keeps the information it read from robots.txt
for the entire duration of the crawl?
Thank you for your time.
| |
| Mark Goodge 2007-10-21, 6:18 pm |
| On Sun, 21 Oct 2007 18:42:41 -0000, khabri put finger to keyboard and
typed:
>Hello:
>
>I read about robots exclusion here http://www.robotstxt.org/wc/exclusion.html
>
>I am wondering how are search engine bots implemented. Lets assume, I
>have Disallow: /foobar in the robots.txt. On the main page of my site,
>I link to content say /foobar/pictures.html
>
>So will the search engine bot index /foobar/pictures.html or not ?
It won't, if it correctly follows the standards.
>If
>not, does it mean that during the entire period of crawling, it
>maintains the information that it has read in robots.txt ?
It should cache the contents of robots.txt at the start of every crawl
and obey it thereafter, until it next checks it.
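
To illustrate the prefix matching described above, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt contents, the "ExampleBot" user agent and the allowed() helper are assumptions made up for this example; a real crawler may apply the rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents matching the example in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /foobar
"""

# Parse the rules once, as a bot would at the start of a crawl.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if the (hypothetical) bot may fetch the given path."""
    return parser.can_fetch(agent, path)


# Disallow rules are matched as path prefixes, so everything that
# starts with /foobar is excluded, including /foobar.html.
for path in ("/foobar/pictures.html", "/foobar.html", "/index.html"):
    print(path, allowed(path))
```

Because the parsed rules live in the parser object, the bot can consult them for every link it finds without fetching robots.txt again.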
Mark
--
http://www.BritishSurnames.co.uk - What does your surname say about you?
"All I want is to find an easier way to get out of our little heads"
| |
| Nikita the Spider 2007-10-22, 10:18 pm |
| In article <1192992161.373908.214400@y27g2000pre.googlegroups.com>,
khabri <khabri@XXXXXXXXXX> wrote:
> Hello:
>
> I read about robots exclusion here http://www.robotstxt.org/wc/exclusion.html
>
> I am wondering how are search engine bots implemented. Lets assume, I
> have Disallow: /foobar in the robots.txt. On the main page of my site,
> I link to content say /foobar/pictures.html
>
> So will the search engine bot index /foobar/pictures.html or not ?
Mark Goodge is correct; a well-behaved bot should not access that
file.
> If
> not, does it mean that during the entire period of crawling, it
> maintains the information that it has read in robots.txt ?
It's up to the bot how often it re-reads your robots.txt file. Note
that you can send an 'Expires' header along with your robots.txt file
and it *should* be respected. (No guarantees, though!)
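
As a rough sketch of how a bot might honor that header, the Python below caches the parsed rules and re-fetches them only after the 'Expires' time has passed. The RobotsCache class, the fetch callable and the 24-hour fallback are all assumptions invented for this illustration, not a description of how any particular search engine works.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from urllib.robotparser import RobotFileParser

# Assumed fallback lifetime when no usable Expires header is sent.
DEFAULT_TTL = timedelta(hours=24)


def expiry_from_headers(headers, now):
    """Return the time at which a cached robots.txt should be re-read."""
    value = headers.get("Expires")
    if value:
        try:
            expires = parsedate_to_datetime(value)
            if expires.tzinfo is None:
                # Treat a date without a zone as UTC.
                expires = expires.replace(tzinfo=timezone.utc)
            return expires
        except (TypeError, ValueError):
            pass  # Malformed date: fall back to the default lifetime.
    return now + DEFAULT_TTL


class RobotsCache:
    """Hypothetical per-site cache of parsed robots.txt rules."""

    def __init__(self, fetch):
        # fetch() is an assumed callable returning (headers_dict, body_text).
        self._fetch = fetch
        self._parser = None
        self._expires = None

    def can_fetch(self, agent, path, now=None):
        now = now or datetime.now(timezone.utc)
        # Re-read robots.txt on first use or once the cached copy expired.
        if self._parser is None or now >= self._expires:
            headers, body = self._fetch()
            self._parser = RobotFileParser()
            self._parser.parse(body.splitlines())
            self._expires = expiry_from_headers(headers, now)
        return self._parser.can_fetch(agent, path)
```

Passing the fetch function in keeps the sketch independent of any HTTP library; a real bot would plug in its own downloader there.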
Good luck
--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more