

Home > Archive > Webmaster forum > September 2005 > How do I keep google from spidering?






How do I keep Google from spidering?
Proteus

2005-09-26, 6:49 pm

I have some directories on a private website containing files I would
prefer the general public not access, and I would prefer that they not be
listed on Google. Besides perhaps password-protecting them, is there some
way to keep them out of search engines?

Richard Grove

2005-09-26, 6:49 pm

Proteus wrote:
> I have some directories on a private website containing files I would
> prefer the general public not access, and I would prefer that they not be
> listed on Google. Besides perhaps password-protecting them, is there some
> way to keep them out of search engines?
>



http://www.google.co.uk/search?q=robots.txt


Richard Grove
www.redeyemedia.co.uk
www.shopmaker.co.uk
Proteus

2005-09-26, 6:49 pm

On Mon, 26 Sep 2005 16:01:28 +0100, Richard Grove wrote:

> Proteus wrote:
>
>
> http://www.google.co.uk/search?q=robots.txt
>
>
> Richard Grove
> www.redeyemedia.co.uk
> www.shopmaker.co.uk


Thanks. I just made the robots.txt file shown below, which I am now going
to upload to the /public_html directory on my website. I assume that
/public_html is the "root" folder that robots.txt belongs in; if not, could
somebody please tell me where to put it? When I FTP to my site there is a
root in the Linux/Unix sense, but I am guessing that the /public_html folder
inside that root is the one that counts as the root for robots.txt.

# Robots.txt file created by http://www.webtoolcentral.com
# For domain: http://sparlo.net
# No robot will spider the domain
User-agent: *
Disallow: /
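A quick way to see what the file above tells a well-behaved crawler is Python's standard `urllib.robotparser` module. This is a minimal sketch, not how Google itself reads the file: it parses the robots.txt text shown above and asks whether a crawler may fetch two pages. The page paths are hypothetical examples, and the `robots_url` helper is illustrative; it only reflects the rule that crawlers look for robots.txt at the root of the host (here `http://sparlo.net/robots.txt`), so the file has to end up in whichever directory the web server serves as `/`, which on many shared hosts is /public_html.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# The robots.txt posted above; comment lines are ignored by the parser.
ROBOTS_TXT = """\
# Robots.txt file created by http://www.webtoolcentral.com
# For domain: http://sparlo.net
# No robot will spider the domain
User-agent: *
Disallow: /
"""


def robots_url(page_url: str) -> str:
    """Return the one place a crawler looks for robots.txt: the host root."""
    return urljoin(page_url, "/robots.txt")


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical example URLs on the site.
for url in ["http://sparlo.net/", "http://sparlo.net/files/report.pdf"]:
    print(robots_url(url), parser.can_fetch("Googlebot", url))
```

Both lines should print `http://sparlo.net/robots.txt False`, since `Disallow: /` matches every path and `User-agent: *` applies to every crawler.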


Charles Sweeney

2005-09-26, 6:49 pm

Proteus wrote

> On Mon, 26 Sep 2005 16:01:28 +0100, Richard Grove wrote:
>
>
> Thanks. I just made the robots.txt file shown below, which I am now going
> to upload to the /public_html directory on my website. I assume that
> /public_html is the "root" folder that robots.txt belongs in; if not, could
> somebody please tell me where to put it? When I FTP to my site there is a
> root in the Linux/Unix sense, but I am guessing that the /public_html folder
> inside that root is the one that counts as the root for robots.txt.
>
> # Robots.txt file created by http://www.webtoolcentral.com
> # For domain: http://sparlo.net
> # No robot will spider the domain
> User-agent: *
> Disallow: /


Bear in mind that only obedient robots will obey the robots.txt file.
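The reason only obedient robots are affected is that the robots.txt check lives entirely in the crawler's own code; the web server does not enforce it. The following minimal Python sketch makes that explicit. `should_fetch` and the `RudeBot` user agent are hypothetical names for illustration, and the parsing uses the standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser

# The site's robots.txt rules, as posted earlier in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def should_fetch(url: str, agent: str, honor_robots: bool) -> bool:
    """Decide whether a crawler will request `url`.

    The robots.txt check happens only if the crawler chooses to run it.
    """
    if not honor_robots:
        # A bot that never consults robots.txt simply fetches the page.
        return True
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


print(should_fetch("http://sparlo.net/", "Googlebot", honor_robots=True))  # False
print(should_fetch("http://sparlo.net/", "RudeBot", honor_robots=False))   # True
```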

--
Charles Sweeney
http://CharlesSweeney.com
Proteus

2005-09-26, 6:50 pm

On Mon, 26 Sep 2005 16:01:33 +0000, Charles Sweeney wrote:
...
> Bear in mind that only obedient robots will obey the robots.txt file.


Yeah, I am sure spammers have their own nefarious bots that ignore the
robots.txt file. But hopefully I can reduce bandwidth usage on my site by
keeping my stuff off the most popular search engines. I might also
password-protect some folders. Can search engine spiders get into
directories on hosted webspace if those directories have password
protection (.htaccess) enabled?

William Tasso

2005-09-26, 6:50 pm

Writing in news:alt.www.webmaster
From the safety of the No thank you cafeteria
Charles Sweeney <me@charlessweeney.com> said:

> Proteus wrote
>
>
> Bear in mind that only obedient robots will obey the robots.txt file.


Where do I find out how /robots.txt files work?
http://www.robotstxt.org/wc/faq.html#robotstxt

Surely listing sensitive files is asking for trouble?
http://www.robotstxt.org/wc/faq.html#nosecurity
--
William Tasso

Lovely are recruiting citizens - http://citizensrequired.com/
DoobieDo

2005-09-26, 6:50 pm

In article <Xns96DDAD3BFB6FBmecharlessweeneycom@130.133.1.4>,
me@charlessweeney.com says...
> Proteus wrote
>
>
> Bear in mind that only obedient robots will obey the robots.txt file.
>
>

Stick an index.html file in your private directory that redirects back
to the public pages, and access your hidden files directly by URL...?
David Dyer-Bennet

2005-09-26, 6:50 pm

Proteus <proteus@uselessemail.net> writes:

> I have some directories on a private website containing files I would
> prefer the general public not access, and I would prefer that they not be
> listed on Google. Besides perhaps password-protecting them, is there some
> way to keep them out of search engines?


Yes. Look up "robots.txt" files. There are also robots meta tags you
could add to each page, but I generally find it easier to centralize
that issue in one place, and be able to block a subtree with a single
line rather than having to put something on every page.
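To illustrate the "block a subtree with a single line" point, here is a minimal Python sketch using the standard `urllib.robotparser` module. The `/private/` directory and the `example.com` URLs are hypothetical. The per-page alternative is a `<meta name="robots" content="noindex, nofollow">` tag in each page's `<head>`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one Disallow line covers a whole directory tree.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs: a public page, the private directory, and a file below it.
urls = [
    "http://example.com/index.html",
    "http://example.com/private/",
    "http://example.com/private/docs/notes.txt",
]
for url in urls:
    status = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{url}: {status}")
```

Rules are prefix matches on the URL path, so the trailing slash matters: `Disallow: /private/` leaves `/private` (no slash) crawlable, while `Disallow: /private` would also block something like `/private-notes.html`.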

These are, of course, purely advisory; nothing *compels* any spider to
conform to them. But all the major search engines have always done so
since they were introduced. Depending on the degree of guarantee of
privacy you're looking for, that might not be considered sufficient.
--
David Dyer-Bennet, <mailto:dd-b@dd-b.net>, <http://www.dd-b.net/dd-b/>
RKBA: <http://noguns-nomoney.com/> <http://www.dd-b.net/carry/>
Pics: <http://dd-b.lighthunters.net/> <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>
Proteus

2005-09-26, 6:51 pm

On Mon, 26 Sep 2005 13:01:15 -0500, David Dyer-Bennet wrote:
...
> These are, of course, purely advisory; nothing *compels* any spider to
> conform to them. But all the major search engines have always done so
> since they were introduced. Depending on the degree of guarantee of
> privacy you're looking for, that might not be considered sufficient.


I just want to reduce bandwidth used by the general public; there is
nothing really private to hide. If I can keep Google searchers out, that
should help quite a bit, so if Google's bots respect robots.txt, that
should suffice, along with Yahoo's.
David Dyer-Bennet

2005-09-26, 10:21 pm

Proteus <proteus@uselessemail.net> writes:

> On Mon, 26 Sep 2005 13:01:15 -0500, David Dyer-Bennet wrote:
> ..
>
> I just want to reduce bandwidth used by the general public; there is
> nothing really private to hide. If I can keep Google searchers out, that
> should help quite a bit, so if Google's bots respect robots.txt, that
> should suffice, along with Yahoo's.


That should work out fine, then.
--
David Dyer-Bennet, <mailto:dd-b@dd-b.net>, <http://www.dd-b.net/dd-b/>


Copyright 2003 - 2008 forum4designers.com