Robots.txt and IRMs
Justin
10-15-1999, 01:38 PM
main.com - main account
redirect.com ==> main.com/redirect/
If you place something like "Disallow: /redirect/" in your robots.txt file, it will only affect main.com. Placing another robots.txt within main.com/redirect/ will affect only redirect.com, since that directory is its document root.
Hope this helps.
------------------
Justin Nelson
FutureQuest Support
Terra
10-15-1999, 01:49 PM
It's not always obvious how control files for IRMs work...
Hmmm, this is difficult to explain in ASCII... ;)
acme.com <-- primary account
binford.com <-- IRM (acme.com/binford)
Now, if any page on acme.com (acme.com/index.html, say) links to acme.com/binford/index.html, then you would need to add a line to the acme.com/robots.txt file excluding the /binford directory...
If you want a robots.txt file for binford.com, then just place one in '/big/dom/xacme/www/binford', as a spider will only see that file when indexing binford.com...
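To make the per-host behaviour concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt contents and the ROBOTS_BY_HOST table are assumptions made up for illustration (they are not the actual files on any account); the point is only that a spider picks the rules by the host in the URL it is fetching.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, keyed by the host a spider would
# request them from. binford.com has no rules of its own here.
ROBOTS_BY_HOST = {
    "acme.com": "User-agent: *\nDisallow: /binford/\n",
    "binford.com": "",
}

def can_fetch(url: str, agent: str = "*") -> bool:
    """Check a URL against the robots.txt of the host named in that URL."""
    host = urlsplit(url).netloc
    parser = RobotFileParser()
    parser.parse(ROBOTS_BY_HOST.get(host, "").splitlines())
    return parser.can_fetch(agent, url)

# Same files on disk, but two different URLs and two different rule sets:
print(can_fetch("http://acme.com/binford/index.html"))  # blocked by acme.com's rules
print(can_fetch("http://binford.com/index.html"))       # acme.com's rules never consulted
```

The exclusion in acme.com/robots.txt only matters for URLs that start with acme.com; a spider visiting binford.com asks for binford.com/robots.txt instead.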
---
Now we enter .htaccess files:
Apache looks for .htaccess files in every directory on the filesystem path to the requested file, not just in the directory being served...
If you have an .htaccess file in '/big/dom/xacme/www/.htaccess' then when a request comes in for binford.com, Apache will see the www/.htaccess file and use that as well as www/binford/.htaccess...
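A rough Python sketch of that lookup, for illustration only: the helper name htaccess_candidates is made up, and it simplifies Apache's real behaviour by starting at a given top directory and ignoring AllowOverride settings. It just lists the .htaccess paths that would be considered for a request served out of a given directory.

```python
from pathlib import PurePosixPath

def htaccess_candidates(top: str, request_dir: str) -> list[str]:
    """List .htaccess paths from `top` down to `request_dir`, in order.

    Simplified model: the real server may start higher up the tree and
    honours AllowOverride, which is not modelled here.
    """
    top_path = PurePosixPath(top)
    # Raises ValueError if request_dir is not inside top.
    relative = PurePosixPath(request_dir).relative_to(top_path)
    paths = [top_path / ".htaccess"]
    current = top_path
    for part in relative.parts:
        current = current / part
        paths.append(current / ".htaccess")
    return [str(p) for p in paths]

# A request for binford.com is served from www/binford, so both files apply:
for path in htaccess_candidates("/big/dom/xacme/www", "/big/dom/xacme/www/binford"):
    print(path)
```

Nothing in this lookup depends on which domain name was requested, only on where the files live on disk.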
In conclusion:
robots.txt is URI dependent
.htaccess is not URI dependent, and is found by searching the directory tree...
Unless you understand how Apache works, it can be difficult to explain to someone how .htaccess and robots.txt are accessed... I hope the above sheds enough light to fill in the blanks...
I learned it the hard way by reading the manual and observation... ;)
--
Terra
--Sometimes the most easy concepts can become garbled beyond all recognition--
FutureQuest
tedloh
10-15-1999, 01:54 PM
I take this to mean that, even though root/irm is listed in the robots.txt file in the root directory, irm can still be indexed by a spider, because that directory is its own root and contains no robots.txt file.
Thanks. Was just wondering why one of my sites had not been indexed yet even though some places say they spider and update within a day or two.
[add]
Just saw your post, Terra. That explained exactly what I wanted to know (because I don't have that binford-type link on my site anyhow) - so in other words it is not the robots.txt file in my root affecting any search positioning.
In addition, the explanation about .htaccess was useful - now I will be sure to be careful when adding a file extension.
------------------
Ted (Chief Do-It-All)
Tygre Systems Co Ltd
Bangkok, Thailand, Land of Smiles :) :)
http://www.tygresystems.com (work in progress)
ted@tygresystems.com
[This message has been edited by tedloh (edited 10-15-99)]
tedloh
10-16-1999, 12:11 AM
Simple question, since I can't figure it out for sure.
If I have an IRM which is mapped to a directory below my main root,
i.e. www.mappeddir.com = root/irm,
and robots.txt has a line which disallows root/irm, will a spider coming to visit www.mappeddir.com be affected by the robots.txt file in the root directory?