robots.txt question
|
|
| 3_Putt 2006-11-19, 7:51 pm |
| I am planning a web-based game where people have to figure out what is
on my website based on clues and then they get points for that.
I know that I can keep certain directories from being indexed with a
robots.txt file, but there is no way to keep the players from seeing
the dir names in that robots.txt file, right ?
If they have those dir names, is there any way they can list or access
the contents ? If so, how can I block that ?
| |
| Otto de Voogd 2006-11-19, 7:51 pm |
| On Tue, 07 Nov 2006 09:47:25 -0800, 3_Putt wrote:
> I am planning a web-based game where people have to figure out what is
> on my website based on clues and then they get points for that.
>
> I know that I can keep certain directories from being indexed with a
> robots.txt file, but there is no way to keep the players from seeing
> the dir names in that robots.txt file, right ?
>
> If they have those dir names, is there any way they can list or access
> the contents ? If so, how can I block that ?
You can also put a "noindex" meta tag in the <head> section of pages that
you do not want indexed, instead of using robots.txt. This is especially
useful for files whose location you do not want to reveal.
<meta name="robots" content="noindex">
Of course this only works with search engines that respect this
directive. Ultimately, only requiring a userid and password would really
prevent pages from being indexed.
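As a rough illustration of how a well-behaved crawler might honour that tag, here is a minimal Python sketch using only the standard library. The class name RobotsMetaParser, the helper has_noindex and the sample page are made up for this example; real search engines use their own parsers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html_text):
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives


# Hypothetical page used only for this demonstration.
page = '<html><head><meta name="robots" content="noindex"></head><body>clue</body></html>'
print(has_noindex(page))
```

Note that the check is voluntary on the crawler's side: nothing stops a program from skipping it, which is why the tag does not protect the page from players.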
--
Otto de Voogd
http://www.7is7.com/ - My Projects
http://www.StatEye.com/ - Website Statistics
| |
| Ken Sims 2006-11-19, 7:52 pm |
| Hi -
On 7 Nov 2006 09:47:25 -0800, "3_Putt" <bx306@yahoo.com> wrote:
>I know that I can keep certain directories from being indexed with a
>robots.txt file, but there is no way to keep the players from seeing
>the dir names in that robots.txt file, right ?
You don't have to put the whole directory name in robots.txt, just
enough of the leading characters to select only the files and
directories desired.
Example:
Disallow: /d
will disallow all files and directories off the root that start with
d. If you wanted to disallow file/directory dumbhead but not
file/directory dumbbutt, you'd need to have at least:
Disallow: /dumbh
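The prefix behaviour can be checked with Python's standard urllib.robotparser module. This is only a sketch: the "User-agent: *" line, the bot name "ExampleBot" and the page paths are assumptions added for the example, since the post shows only the Disallow line.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; only the Disallow line comes from the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dumbh
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path):
    """Return True if a compliant robot may fetch the given path."""
    return parser.can_fetch("ExampleBot", path)


# Disallow rules are matched as plain prefixes of the URL path.
print(allowed("/dumbhead/page.html"))
print(allowed("/dumbbutt/page.html"))
```

A short prefix hides the full directory name from anyone reading robots.txt, but it also narrows down the guesses, so it obscures rather than protects.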
--
Ken
http://www.kensims.net/
| |
|
| Ken Sims wrote:
> Hi -
>
> On 7 Nov 2006 09:47:25 -0800, "3_Putt" <bx306@yahoo.com> wrote:
>
>
> You don't have to put the whole directory name in robots.txt, just
> enough of the leading characters to select only the files and
> directories desired.
>
> Example:
>
> Disallow: /d
>
> will disallow all files and directories off the root that start with
> d. If you wanted to disallow file/directory dumbhead but not
> file/directory dumbbutt, you'd need to have at least:
>
> Disallow: /dumbh
>
> --
> Ken
> http://www.kensims.net/
Also, make sure you have directory index listings turned off, and put a
default page in each directory - e.g. index.html that redirects them to
wherever.
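Turning off directory listings is a server setting and depends on the web server in use, so it is not shown here. The second half of the advice, a default page in every directory, could be scripted roughly as follows. The function name add_default_pages, the redirect target and the page template are all assumptions for this sketch.

```python
import os
import tempfile

# Hypothetical placeholder page that sends visitors somewhere else.
REDIRECT_PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="refresh" content="0; url={target}">
<title>Redirecting</title>
</head>
<body><a href="{target}">Continue</a></body>
</html>
"""


def add_default_pages(doc_root, target="/"):
    """Write index.html into every directory under doc_root that lacks one.

    Returns the list of files that were created.
    """
    created = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        if "index.html" not in filenames:
            index_path = os.path.join(dirpath, "index.html")
            with open(index_path, "w", encoding="utf-8") as handle:
                handle.write(REDIRECT_PAGE.format(target=target))
            created.append(index_path)
    return created


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "clues", "level2"))
    for path in add_default_pages(demo_root, target="/start.html"):
        print(os.path.relpath(path, demo_root))
```

Existing index.html files are left alone, so the script can be re-run safely after adding new directories.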
--
dp
| |
| Matt Probert 2006-11-19, 7:53 pm |
| On 7 Nov 2006 09:47:25 -0800, "3_Putt" <bx306@yahoo.com> wrote:
>I am planning a web-based game where people have to figure out what is
>on my website based on clues and then they get points for that.
>
>I know that I can keep certain directories from being indexed with a
>robots.txt file, but there is no way to keep the players from seeing
>the dir names in that robots.txt file, right ?
>
>If they have those dir names, is there any way they can list or access
>the contents ? If so, how can I block that ?
>
Oh dear. No.
If you want to block access, you need to password protect directories.
PERIOD.
robots.txt is simply a text file that SUGGESTS where you might LIKE
robots to avoid.
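In practice password protection is usually switched on in the web server's own configuration. Purely to show what the check amounts to, here is a minimal Python sketch that validates an HTTP Basic "Authorization" header. The function names and the credentials "player" / "s3cret" are invented for the example.

```python
import base64
import hmac


def parse_basic_auth(header_value):
    """Return (user, password) from an Authorization header, or None."""
    if not header_value or not header_value.startswith("Basic "):
        return None
    try:
        decoded = base64.b64decode(header_value[6:], validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes.
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


def is_authorized(header_value, expected_user, expected_password):
    """Compare supplied credentials with the expected ones in constant time."""
    credentials = parse_basic_auth(header_value)
    if credentials is None:
        return False
    user, password = credentials
    user_ok = hmac.compare_digest(user.encode(), expected_user.encode())
    pass_ok = hmac.compare_digest(password.encode(), expected_password.encode())
    return user_ok and pass_ok


# Hypothetical credentials, for demonstration only.
header = "Basic " + base64.b64encode(b"player:s3cret").decode("ascii")
print(is_authorized(header, "player", "s3cret"))
print(is_authorized(None, "player", "s3cret"))
```

Unlike robots.txt, this check runs on the server for every request, so it applies to robots and human visitors alike.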
Matt
--
Woe to him that willfully innovates, while ignorant of the constant.
http://www.probertencyclopaedia.com
| |
| William Tasso 2006-11-19, 7:53 pm |
| Fleeing from the madness of the Posted via Supernews,
http://www.supernews.com jungle
Ken Sims <ng3122@kensims.#nospam#.net.invalid> stumbled into
news:alt.www.webmaster
and said:
> ...
> You don't have to put the whole directory name in robots.txt, just
> enough of the leading characters to select only the files and
> directories desired.
>
> Example:
>
> Disallow: /d
>
> will disallow all files and directories off the root that start with
> d.
Well, I didn't know that - many thanks for the heads-up.
--
William Tasso
http://williamtasso.com/words/what-is-usenet.asp
| |
| johngohde@naturalhealthperspective.com 2006-11-19, 7:53 pm |
|
3_Putt wrote:
>
> I know that I can keep certain directories from being indexed with a
> robots.txt file, but there is no way to keep the players from seeing
> the dir names in that robots.txt file, right ?
That simply is NOT true. GigaBlast, a rather big SE, has been ignoring
my robots.txt file. So, I certainly think that motivated gamers would
check out all their options to cheat.
Your best bet is simply to use an encrypted database or simply
encrypted files to hide anything that needs to stay hidden.
| |
|
|
| johngohde@naturalhealthperspective.com 2006-11-19, 7:55 pm |
| William Tasso wrote:
> Fleeing from the madness of the Chaos jungle
> William Tasso <SpamBlocked@tbdata.com> stumbled into news:alt.www.webmaster
> and said:
>
>
> ... and now 'verified by the validator @:
> http://tool.motoricerca.info/robots-checker.phtml
>
> btw: anyone have any comment on that tool?
It claimed that my file has some errors in it, while Google says it
passes.
Mainly, it said that blank lines were not valid.
But, my crawl rate code passed. :)
| |
| William Tasso 2006-11-19, 7:55 pm |
| Fleeing from the madness of the http://groups.google.com jungle
johngohde@naturalhealthperspective.com
<johngohde@naturalhealthperspective.com> stumbled into
news:alt.www.webmaster
and said:
> William Tasso wrote:
>
> It claimed that my file has some errors in it, while Google says it
> passes.
google has a robots.txt validator? or are you quoting from empirical
evidence?
> Mainly, it said that blank lines were not valid.
>
> But, my crawl rate code passed. :)
Oh good <g>
--
William Tasso
http://williamtasso.com/words/what-is-usenet.asp
| |
| Nikita the Spider 2006-11-19, 7:55 pm |
| In article <1162993626.802031.84480@h54g2000cwb.googlegroups.com>,
"johngohde@naturalhealthperspective.com"
<johngohde@naturalhealthperspective.com> wrote:
> 3_Putt wrote:
>
> That simply is NOT true. GigaBlast, a rather big SE, has been ignoring
> my robots.txt file. So, I certainly think that motivated gamers would
> check out all their options to cheat.
>
> Your best bet is simply to use an encrypted database or simply
> encrypted files to hide anything that needs to stay hidden.
John,
I had a look at your robots.txt file. Are you aware that wildcarded
filenames like "*.jpg" are not part of the robots.txt standard? If those
are the lines that GigaBlast is ignoring, that might be the reason why.
Google supports them, but I have not read of any other search engine
that does so.
Just FYI
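To make the difference concrete, here is a small Python sketch contrasting the plain prefix matching of the original robots.txt convention with a wildcard-style match. The wildcard rules used here ("*" matches any run of characters, a trailing "$" anchors the end of the path) are an assumption modelled on how such extensions are commonly described; the function names and paths are made up.

```python
import re


def prefix_match(rule, path):
    """Original convention: the rule is a literal prefix of the path."""
    return path.startswith(rule)


def wildcard_match(rule, path):
    """Assumed extension: '*' matches anything, trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with '.*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        pattern += "$"
    return re.match(pattern, path) is not None


path = "/images/clue.jpg"
# A strict prefix matcher looks for a literal '*' and finds none.
print(prefix_match("/*.jpg", path))
# A wildcard-aware matcher treats the same rule as a pattern.
print(wildcard_match("/*.jpg", path))
```

A crawler that only implements prefix matching would therefore treat a line such as "Disallow: /*.jpg" as matching nothing, which looks the same from the outside as ignoring it.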
--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
| |
| johngohde@naturalhealthperspective.com 2006-11-19, 7:55 pm |
| William Tasso wrote:
> Fleeing from the madness of the http://groups.google.com jungle
> johngohde@naturalhealthperspective.com
> <johngohde@naturalhealthperspective.com> stumbled into
> news:alt.www.webmaster
> and said:
>
>
> google has a robots.txt validator? or are you quoting from empirical
> evidence?
Part of Google's Webmasters Tools.
https://www.google.com/webmasters/sitemaps/siteoverview
The only reason that I am validated with Google is that I do NOT want
to be discriminated against for NOT having a sitemap.
| |
| johngohde@naturalhealthperspective.com 2006-11-19, 7:55 pm |
| Nikita the Spider wrote:
>
> John,
> I had a look at your robots.txt file. Are you aware that wildcarded
> filenames like "*.jpg" are not part of the robots.txt standard?
Guess what? I can read, too
Just thought that my mother might want to know, that I really do NOT
care about how lame and totally out of date the robots.txt standards
are. I certainly do not care enough to individually list out all my
graphics files, at this juncture of my life.
> If those
> are the lines that GigaBlast is ignoring, that might be the reason why.
> Google supports them, but I have not read of any other search engine
> that does so.
What does that change regarding my point? Absolutely nothing!!! No
matter what the reason, search engines do NOT always follow the
robots.txt file. Nothing is stopping a gamer from crawling any site
personally themselves.
And, I most certainly am NOT going to spend six months of my life to
see if GigaBlast is ignoring my entire robots.txt file because it doesn't
like a couple lines of it ... you are dreaming.
Google ignores my crawl rate specification. The robots.txt file is MOST
certainly the place to specify the desired crawl rate. And, Google doesn't
like those few lines of code. So, why would GigaBlast be any different?
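The post does not show the crawl rate lines themselves. Assuming they use the non-standard "Crawl-delay" directive, this Python sketch shows how a parser that chooses to support it can read the value with the standard urllib.robotparser module. The file content and the bot name "ExampleBot" are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; Crawl-delay is an extension, not part of
# the original robots.txt convention.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns None when no delay applies to the user agent.
delay = parser.crawl_delay("ExampleBot")
print(delay)
print(parser.can_fetch("ExampleBot", "/private/answers.html"))
```

Whether a given crawler reads the directive at all is up to that crawler, so one engine skipping it says little about how another engine treats the rest of the file.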
| |
| Nikita the Spider 2006-11-19, 7:55 pm |
| In article <1163158221.940994.310210@k70g2000cwa.googlegroups.com>,
"johngohde@naturalhealthperspective.com"
<johngohde@naturalhealthperspective.com> wrote:
> Nikita the Spider wrote:
>
> Guess what? I can read, too
Hi John,
I've seen a number of tutorials that purport to teach people how to
construct a robots.txt file but actually misinform them. The robots.txt
syntax checker that complained about blank lines (as mentioned elsewhere
in this thread) is a good example. I thought you might have been led
astray by one of these bad tutorials.
> Just thought that my mother might want to know, that I really do NOT
> care about how lame and totally out of date the robots.txt standards
> are. I certainly do not care enough to individually list out all my
> graphics files, at this juncture of my life.
>
>
> What does that change regarding my point? Absolutely nothing!!! No
> matter what the reason, search engines do NOT always follow the
> robots.txt file. Nothing is stopping a gamer from crawling any site
> personally themselves.
>
> And, I most certainly am NOT going to spend six months of my life to
> see if GigaBlast is ignoring my entire robots.txt file because it doesn't
> like a couple lines of it ... you are dreaming.
Why all the hostility? I didn't challenge your assertion that robots
don't always respect robots.txt and even if I had done so, the nasty
attitude and snide remarks are unwarranted.
*plonk*
--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
| |
| johngohde@naturalhealthperspective.com 2006-11-19, 7:57 pm |
|
Nikita the Spider wrote:
> Why all the hostility? I didn't challenge your assertion that robots
> don't always respect robots.txt and even if I had done so, the nasty
> attitude and snide remarks are unwarranted.
>
> *plonk*
Good riddance!
Mothers, and busybody creeps, need NOT reply to any of my posts.
I am NOT the person asking the question, idiot.
| |
| Charles Sweeney 2006-11-19, 7:58 pm |
| 3_Putt wrote
> I am planning a web-based game where people have to figure out what is
> on my website based on clues
LOL! I've seen a thousand websites like that!
--
Charles Sweeney
http://CharlesSweeney.com
|
|
|
|