Non-www 301s, and Abs urls


CMJC
05-16-2005, 09:26 PM
I'd appreciate a response from FQ Staff, but thought it'd be valuable to have the reply public.

In using jargon to ask, I don't want to appear more proficient than I am, so please bear in mind I'm no techie.

Rationale:

Now that Google has perfectly formed bullet-holes in both feet following the Allegra and Web Accelerator debacles, I want my sites to be as glitch-free as possible for Googlebot et al.

So I've decided to pay the slight load-time overhead of switching to all absolute urls (or effectively absolute urls such as (root)/folder/file.htm).

The other concern I have is avoiding duplicate crawls (and subsequent duplicate penalty) to www and non-www urls.

Question:

FQ servers collect both www and non-www calls to my account, so do I need to add a 301 to my root .htaccess that sends all non-www calls to the appropriate www url?

In effect, I'm asking how FQ currently re-routes such calls. Is it a 301 (which seems very unlikely)? If not, should I use a site-wide 301 in my root .htaccess to be certain?

Colin

PS. If you've not been watching the tremors at Mountain View, it's a great time to sell G-stock.

PPS. Very unusual: my site has been inaccessible for the last 90 minutes (calls time out; only FTP connects). I assume it's something to do with the Rocko memory upgrade and my local proxy updates. I'll wait before filing a ticket.

sheila
05-16-2005, 10:42 PM
As we have responded to CMJC in email from the Service Desk...
The sites on ROCKO are all operating fine and there is no outage. The following KnowledgeBase FAQ contains tips on how to proceed when you have trouble accessing your site:
http://service.FutureQuest.net/?_a=knowledgebase&_j=questiondetails&_i=32

CMJC
05-16-2005, 11:16 PM
Yep, I confirm it's definitely due to my local IP, and not FQ's excellent network.

Thanks Sheila!

sheila
05-16-2005, 11:56 PM
FutureQuest does not "reroute" such requests. It is possible that bots would crawl with and without "www" as separate spidering sessions. (Whether that has any effect on your search engine rankings...I would be unable to speculate.)

To send all requests to one version or the other (I think most people opt for the "www" version), you would need to do one of the following:

- 301 Redirect, as you have suggested
- Hard code one version or the other into the links on your site. This would have the effect of redirecting any site visitor, including bots, upon the first internal link they followed.
- Rewrite engine (for advanced users only!!!)
- Other? (am I forgetting something?)

All of the above options have been discussed previously in these forums, and a search of the forums should, in theory, turn them up.

Andilinks
05-17-2005, 12:18 AM
If you've not been watching the tremors at Mountain View, it's a great time to sell G-stock.
Yes, there have been apparent upheavals this month, and the Web Accelerator was a misstep in my opinion, but I think it is hardly time for a panic. I'd hold GOOG.

Andi

CMJC
05-17-2005, 02:03 AM
Ta Sheila,

Oh, I assumed FQ already re-routed non-www calls.
If it doesn't give away any 'house secrets', how do you 'catch' non-www calls and send them to the correct file, then?

Not to worry about my main point, you've given me the info I needed.

I'll dig out and get the mod-rewrite code set up properly to deal with it. (Too many to use individual 301 redirects)

Yes, it definitely does affect how bots, even Googlebot, handle a site, as I know to my cost.

It takes ages weeding out the non-www listings in G, which G counts as duplicate content and 'penalises'.

It's time to do some late-spring cleaning of my sloppy internal links too: all absolute from now on.

Thanks again Sheila. (My ISP still won't let me see my site! 6hrs + ??)

Colin

PS. Sell before everyone finds out. G Staff are selling...

CMJC
05-17-2005, 02:48 AM
I didn't start this thread with the intention of eliciting Staff help on mod_rewrite code, merely to find out how FQ re-routes non-www calls.

FQ doesn't re-route such calls, so I need to use mod_rewrite in my root .htaccess to achieve this.

Here's the code I'm about to use:

RewriteCond %{HTTP_HOST} ^mysite\.com
RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]


BUT: I hesitate because I'm not sure if that will clash with how FQ set up subdomains on my account. I've got two. But I don't think using these mod_rewrite rules will affect them. (However, I may need to specifically exclude my subdomains in the rule.)

Can FQ Staff confirm one way or t'other please?

Colin

PS. I understand this is at my own risk. I just need a bit of hand-holding.

To reiterate: I want to permanently redirect all calls for non-www to the www url, without affecting any calls to my subdomains, so that bots will crawl only the www urls and not the non-www ones.
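
A minimal sketch of the rule above as a complete fragment, for illustration only. It assumes the file sits in the directory that serves as the web root, and that mysite.com is the placeholder domain used in this thread. The `$` anchor, the optional port, and the `[NC]` flag are additions that were not in the original post. Because the pattern is anchored to the start of the Host header, names such as www.mysite.com or a subdomain like sub.mysite.com (a hypothetical name) do not match, so requests to them should be left alone.

```apache
# Placeholder domain: mysite.com stands in for the real one.
RewriteEngine On

# True only when the Host header is the bare domain, optionally followed
# by a port. [NC] makes the comparison case-insensitive (an addition).
# www.mysite.com and sub.mysite.com do not start with "mysite.com",
# so the condition is false for them and the rule below is skipped.
RewriteCond %{HTTP_HOST} ^mysite\.com(:[0-9]+)?$ [NC]

# Send the same path to the www host with a permanent (301) redirect.
# L stops further rewrite processing for this request.
RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]
```

With this in place, a request for http://mysite.com/filename.htm should be answered with a 301 pointing to http://www.mysite.com/filename.htm, provided the file really is in the web root.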

Terra
05-17-2005, 06:55 PM
The RewriteRule you are planning to use should work ok... ;)

--
Terra
sysAdmin
FutureQuest

CMJC
05-17-2005, 09:01 PM
Ta Terra!

CMJC
05-17-2005, 10:20 PM
The RewriteRule you are planning to use should work ok... ;)
RewriteCond %{HTTP_HOST} ^mysite\.com
RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]

Whoops! Using this rule in my root .htaccess file, I can access the correct page using http://www.mysite.com/filename.htm type urls, but when I try the non-www form (http://mysite.com/filename.htm) I get a 404 with this showing in the browser address box:

http://www.mysite.com/www/filename.htm

So I've removed the rule.
Is it something to do with how my subdomains are created at FQ?

Colin

PS: Here's the relevant error log report:

File does not exist: /big/dom/xmy_domain/www/www/filename.htm

Terra
05-18-2005, 12:30 AM
Is it something to do with how my subdomains are created at FQ?
Nope...

You will need to do 1 of 2 things:
1) If the .htaccess is in the /big/dom/xdom directory, you will need to use the 'RewriteBase' directive... The 'www/www' is a dead giveaway...
2) Move the RewriteRule into a 'www/.htaccess' file...

--
Terra
--I always wondered what was behind Curtain #3--
FutureQuest
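
The 'www/www' symptom can be traced through by hand. The sketch below is an annotated reading of it, assuming (from the error log earlier in the thread) that the web root is /big/dom/xmy_domain/www and that the .htaccess holding the rule sits one level above it. In a per-directory context, mod_rewrite matches the pattern against the path with the .htaccess file's own directory stripped off, so from the parent directory the captured text still begins with www/. The second rule is one untested way to compensate while leaving the file where it is; it is an illustration, not something Terra or Colin posted.

```apache
# File: /big/dom/xmy_domain/.htaccess (one level ABOVE the web root "www")
#
# Request:      http://mysite.com/filename.htm
# Mapped file:  /big/dom/xmy_domain/www/filename.htm
# Pattern sees: www/filename.htm   (this directory's path is stripped)
#
# As posted, $1 = "www/filename.htm", so the redirect target becomes
# http://www.mysite.com/www/filename.htm, the 404 reported in the thread:
#
#   RewriteCond %{HTTP_HOST} ^mysite\.com
#   RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]

RewriteEngine On

# Untested alternative (an assumption): match the leading "www/" outside
# the capture group so that it is not copied into the new URL.
RewriteCond %{HTTP_HOST} ^mysite\.com
RewriteRule ^www/(.*)$ http://www.mysite.com/$1 [R=301,L]
```

Placing the original rule in www/.htaccess instead sidesteps the issue, because there the stripped path is just filename.htm.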

CMJC
05-18-2005, 02:13 AM
Ta Terra!

Which of those options would be most efficient (ie. for Rocko and bots)?

My .htaccess file is in the /big/dom/xdom directory (which I take to be 'my root' at FQ).
There is no .htaccess file in my main www directory.

And (pushing the envelope a tad...) if keeping the .htaccess file in the /big/dom/xdom directory is most efficient, would this code be correct to use within it at FQ to achieve the point of this thread?


Options +FollowSymLinks
RewriteEngine on
RewriteBase /www
RewriteCond %{HTTP_HOST} ^mysite\.com
RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]

Colin

Terra
05-18-2005, 02:55 AM
#2 is most efficient, because where it sits now, any /cgi-bin access will have to run through all of your Apache directives...

The RewriteRule you have looks sane and should be safe...

--
Terra
--insanity is in the eye of the beholder--
FutureQuest

CMJC
05-18-2005, 03:36 AM
Ta Terra!

I'm not keen on shifting my .htaccess from /big/dom/xdom to www in one move because I've got so many other rules inside it, and I'm sure there'd be serious repercussions. Though I take your hint about cgi-bin processing being slowed by it.

As a test, I left it there and made a new .htaccess just for the www folder, with the above rule in it, and it didn't work. A DNS error page shows and the www/www error is logged.

My sick puppy, I know.

I'll go away and try to work it out.

Thanks for your help.

Colin

Andilinks
05-18-2005, 12:41 PM
Did you sell yet?

GOOG: 237.50, up 4.37 (1.87%)
Bid: 237.47
Ask: 237.50
Day's Low & High: 233.52 - 238.80
Open: 233.61
Previous Close: 233.13
52-week range: 95.96 to 233.45
Source: http://finance.lycos.com/qc/stocks/quotes.aspx?symbols=NASDAQ:GOOG

edit: the goofy smiley was accidental but appropriate to a 1.87% rise, so I leave it

CMJC
05-18-2005, 04:38 PM
*Chuckle* Wouldn't touch the stuff dear chap, gambling's for losers, n'besides, Englishmen are not even allowed to own the scrip.

Great time to sell, leave some up for the punters, and form an orderly queue behind the G Staff.

Terra and anyone interested:

I decided to split my root htaccess into two, it was over 16kb, and put one in the www directory with just the code in this thread, and it seems to work OK so far. Though I've not looked at my error logs closely yet.

I've kept the main htaccess in my root, but it's only 6kb now, easier on the cgi-bin jobs.

Colin

Andilinks
05-18-2005, 05:29 PM
Wouldn't touch the stuff dear chap, gambling's for losers...
But you're not above spreading rumors, claiming to have insider info. You do sound like an embittered loser, there are so many left over from 1999...

Though at that price selling is probably not a bad idea.

Andi

CMJC
05-19-2005, 11:29 AM
Not a rumour: Common knowledge.

The two founders of G sold a chunk of their holding over a month ago. Check the back issues, and you'll see.

Nor am I bitter, nor have I lost. (The investments I make never decrease or fade, and have perfectly divine rewards.)

You missed the point.

I don't touch scrip of any kind.

Share dealing is gambling, and gambling is for losers.

Carry on, and you'll see.

Tara!

TVB
05-19-2005, 11:43 AM
Colin,

Is the item you left in your post at http://aota.net/forums/showthread.php?postid=133246#post133246 the #2 one that Terra confirmed in his subsequent post?

I'm reading this thread and taking avid note, with the same goal in mind.

Thanks,

Betsy

CDarklock
05-20-2005, 02:53 PM
Share dealing is gambling, and gambling is for losers.

Share dealing is gambling in much the same way HTML is black magic: not only is it in no way similar, it's a lot easier and less interesting... if you bother to understand it.

Andilinks
06-06-2005, 03:51 AM
Non-www 301s, and Abs urls

GoogleGuy's post #7 here: http://www.webmasterworld.com/forum30/29720.htm suggests that this very thing be done, but the above code in my .htaccess file does nothing. Or does it just redirect spiders? That would be odd.

It seems to me that the URL without the www should simply go to a "file not found", since virtually everyone needs Google and their algorithm penalizes duplicate content across www and non-www.

The whole world has to dance to the tune of the 500 lb. Googorilla. Having either www or non www work is a nice feature in some other world, it is a penalty in this one.

Andi