robots.txt for https but not http
| me@helpmefigurethisout.com 2005-07-29, 4:17 am |
| This is in reference to the following post:
http://groups-beta.google.com/group...325a14f33434694
Both in the post and on google's help pages, it says that if your https
and http go to different directories, you will need to put a robots.txt
file in each one. Is this true for all search engines, or just google?
The reason I am asking is because I have a web site with two different
directories, one secure at https://www.mywebsite.com (maps to
/usr/wwws/) and one non-secure at http://www.mywebsite.com (maps to
/usr/www/). I want the non-secure site to allow indexing by search
engines and the secure site to block indexing. I placed a robots.txt in
the secure directory /usr/wwws/ to disallow all indexing, but I am
afraid if a spider follows a link from the non-secure site to the
secure site it will read the robots.txt and interpret it as meaning "no
indexing of www.mywebsite.com", whether that be the https or http
protocol. Should I be worried about this or do other crawlers operate
similarly to google?
| |
| William Tasso 2005-07-29, 7:52 am |
| Writing in news:alt.www.webmaster
From the safety of the http://groups.google.com cafeteria
<me@helpmefigurethisout.com> said:
> This is in reference to the following post:
>
> http://groups-beta.google.com/group...325a14f33434694
>
> Both in the post and on google's help pages, it says that if your https
> and http go to different directories, you will need to put a robots.txt
> file in each one. Is this true for all search engines, or just google?
It's true for any user agent (UA) that reads and honours the content of robots.txt.
> The reason I am asking is because I have a web site with two different
> directories, one secure at https://www.mywebsite.com (maps to
> /usr/wwws/) and one non-secure at http://www.mywebsite.com (maps to
> /usr/www/). I want the non-secure site to allow indexing by search
> engines and the secure site to block indexing.
robots.txt does not block anything; it is only a request that well-behaved
crawlers choose to honour.
> I placed a robots.txt in
> the secure directory /usr/wwws/ to disallow all indexing, but I am
> afraid if a spider follows a link from the non-secure site to the
> secure site it will read the robots.txt and interpret it as meaning "no
> indexing of www.mywebsite.com", whether that be the https or http
> protocol.
They are effectively completely different web sites, however ...
> Should I be worried about this or do other crawlers operate
> similarly to google?
You'd have to ask them.
Your site(s) will receive visits from all manner of bots that pay no
attention to the robots.txt and from others that specifically query the
file to find your disallowed content.
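
To illustrate the "completely different web sites" point, here is a minimal
Python sketch of how a well-behaved crawler might keep one set of robots.txt
rules per scheme-and-host pair. The two robots.txt bodies, the bot name
"ExampleBot", and the helpers build_parsers and may_fetch are all hypothetical
and made up for this example; only urllib.robotparser and urllib.parse come
from the standard library. It is an assumption about how a polite crawler
could behave, not a description of what any particular search engine does.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, one per origin (scheme + host).
# http  -> served from /usr/www/robots.txt  (allow everything)
# https -> served from /usr/wwws/robots.txt (disallow everything)
ROBOTS_BY_ORIGIN = {
    ("http", "www.mywebsite.com"): "User-agent: *\nDisallow:\n",
    ("https", "www.mywebsite.com"): "User-agent: *\nDisallow: /\n",
}


def build_parsers(robots_by_origin):
    """Parse each robots.txt body into its own RobotFileParser."""
    parsers = {}
    for origin, body in robots_by_origin.items():
        parser = RobotFileParser()
        parser.parse(body.splitlines())
        parsers[origin] = parser
    return parsers


def may_fetch(parsers, user_agent, url):
    """Look up the rules for the URL's own origin, then ask that parser."""
    parts = urlsplit(url)
    parser = parsers.get((parts.scheme, parts.netloc))
    if parser is None:
        # Assumption for this sketch: no known robots.txt means "allowed".
        return True
    return parser.can_fetch(user_agent, url)


PARSERS = build_parsers(ROBOTS_BY_ORIGIN)
print(may_fetch(PARSERS, "ExampleBot", "http://www.mywebsite.com/index.html"))
print(may_fetch(PARSERS, "ExampleBot", "https://www.mywebsite.com/account"))
```

Under these assumptions the first call prints True and the second prints
False: the "Disallow: /" in the https file has no effect on http URLs because
the rules are keyed by origin. A bot that ignores robots.txt is, of course,
unaffected by either file.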
--
William Tasso
** Business as usual
| |
| www.1-script.com 2005-07-29, 7:24 pm |
| me@helpmefigurethisout.com wrote:
> Both in the post and on google's help pages, it says that if your https
> and http go to different directories, you will need to put a robots.txt
> file in each one. Is this true for all search engines, or just google?
No need to worry about robots.txt for the secure part of your site. No URL
that starts with 'https' is going to be indexed anyways. Unless it's a
malicious robot, of course, but then, again, a malicious robot is not
going to worry about robots.txt either.
Cheers,
Dmitri
http://www.1-script.com/download.php
Free Search Engine Scripts
| |
| Toby Inkster 2005-07-29, 11:21 pm |
| www.1-script.com wrote:
> No need to worry about robots.txt for the secure part of your site. No URL
> that starts with 'https' is going to be indexed anyways.
Plenty of search engines index HTTPS sites.
--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Copyright 2003 - 2008 forum4designers.com