







Author robots.txt - Question
khabri

2007-10-21, 6:18 pm

Hello:

I read about robots exclusion here http://www.robotstxt.org/wc/exclusion.html

I am wondering how search engine bots are implemented. Let's assume I
have Disallow: /foobar in the robots.txt. On the main page of my site,
I link to content, say /foobar/pictures.html

So will the search engine bot index /foobar/pictures.html or not? If
not, does it mean that during the entire period of crawling, it
maintains the information that it has read in robots.txt?

Thank you for your time.

Mark Goodge

2007-10-21, 6:18 pm

On Sun, 21 Oct 2007 18:42:41 -0000, khabri put finger to keyboard and
typed:

>Hello:
>
>I read about robots exclusion here http://www.robotstxt.org/wc/exclusion.html
>
>I am wondering how search engine bots are implemented. Let's assume I
>have Disallow: /foobar in the robots.txt. On the main page of my site,
>I link to content, say /foobar/pictures.html
>
>So will the search engine bot index /foobar/pictures.html or not?


It won't, if it correctly follows the standards.
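
As a rough illustration of the prefix matching involved, here is a minimal sketch using Python's standard urllib.robotparser. The host www.example.com, the agent name ExampleBot and the helper is_allowed are made up for the example; real search engine bots use their own parsers and may differ in the details.

```python
from urllib.robotparser import RobotFileParser

# The rule from the question. The "User-agent: *" line is an assumption;
# the original post only quotes the Disallow line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /foobar
"""


def is_allowed(path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # www.example.com is a placeholder host, not taken from the thread.
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/foobar/pictures.html", "/foobar", "/index.html"):
        print(path, is_allowed(path))
```

Disallow values are matched as path prefixes, so /foobar covers everything underneath it, including /foobar/pictures.html.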

>If
>not, does it mean that during the entire period of crawling, it
>maintains the information that it has read in robots.txt ?


It should cache the contents of robots.txt at the start of every crawl
and obey it thereafter, until it next checks it.
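
A minimal sketch of that caching idea, assuming a hypothetical fetch callable that returns the robots.txt text for a host. The class name CachedRobots, the one-hour default and the injected clock are illustrative choices, not how any particular bot does it.

```python
import time
from urllib.robotparser import RobotFileParser


class CachedRobots:
    """Keep one parsed robots.txt per host; re-read it after max_age seconds."""

    def __init__(self, fetch, max_age=3600.0, clock=time.monotonic):
        self._fetch = fetch        # hypothetical callable: host -> robots.txt text
        self._max_age = max_age    # assumed refresh interval, in seconds
        self._clock = clock        # injectable so the cache can be tested
        self._cache = {}           # host -> (time fetched, parser)

    def _parser_for(self, host):
        now = self._clock()
        entry = self._cache.get(host)
        if entry is None or now - entry[0] > self._max_age:
            # First visit, or the cached copy is too old: read it again.
            parser = RobotFileParser()
            parser.parse(self._fetch(host).splitlines())
            entry = (now, parser)
            self._cache[host] = entry
        return entry[1]

    def allowed(self, agent, host, path):
        """Check a path against the cached rules for this host."""
        return self._parser_for(host).can_fetch(agent, "http://" + host + path)
```

Every URL found during the crawl is checked against the cached rules, so robots.txt is fetched once per refresh interval and not once per page.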

Mark
--
http://www.BritishSurnames.co.uk - What does your surname say about you?
"All I want is to find an easier way to get out of our little heads"

Nikita the Spider

2007-10-22, 10:18 pm

In article <1192992161.373908.214400@y27g2000pre.googlegroups.com>,
khabri <khabri@XXXXXXXXXX> wrote:

> Hello:
>
> I read about robots exclusion here http://www.robotstxt.org/wc/exclusion.html
>
> I am wondering how search engine bots are implemented. Let's assume I
> have Disallow: /foobar in the robots.txt. On the main page of my site,
> I link to content, say /foobar/pictures.html
>
> So will the search engine bot index /foobar/pictures.html or not?


Mark Goodge is correct; correctly programmed bots should not access that
file.

> If
> not, does it mean that during the entire period of crawling, it
> maintains the information that it has read in robots.txt ?


How often a bot re-reads your robots.txt file is up to the bot. Note
that you can send an 'Expires' header along with your robots.txt file
and it *should* be respected. (No guarantees, though!)
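
For what it's worth, here is a small sketch of both ends of that, using only the Python standard library. The function names expires_header and seconds_until_refetch are made up for the example, and the one-day lifetime in the usage note is just an assumption; how you attach the header depends on your web server.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_header(now, lifetime):
    """Server side: build an HTTP-date value for the Expires header.

    `now` must be a timezone-aware UTC datetime.
    """
    return format_datetime(now + lifetime, usegmt=True)


def seconds_until_refetch(expires_value, now):
    """Bot side: how long a cached robots.txt may still be used."""
    expires = parsedate_to_datetime(expires_value)
    return max(0.0, (expires - now).total_seconds())


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    value = expires_header(now, timedelta(days=1))
    print("Expires:", value)
    print("re-read in", seconds_until_refetch(value, now), "seconds")
```

A bot that honours the header would keep its cached copy until the stated time has passed and only then fetch robots.txt again.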

Good luck

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more