Bandwidth help again
anthony
07-19-2006, 04:40 AM
All this month I have been working on the bandwidth problem for my site, www.studio93.it. The site gets almost 2000 hits a day and has a lot of news. I put in an .htaccess to block image theft, which has worked. Also, when news photos are uploaded, a thumbnail is created and called in the news articles on the site instead of the larger photo; the larger photo is only seen when reading the article in question. In my robots.txt I have blocked the Yahoo and Google image engines, which is working. I am allowed 10 GB a month, but as of today I am at 8 GB already. I have got it down from last month, but I need to get lower. I also have an RSS feed; does that have anything to do with the bandwidth?
thanks anthony
Arthur
07-19-2006, 06:49 AM
If you take a look at the Top URLs in your stats, you'll see that a very large part of the transfer (about 4.5 GB) is coming from all images and image.php.
All of your thumbnail images seem to be served by image.php, which means, among other things, that these images are not cached by the user's browser. On each and every page the user visits, the thumbnail images are requested again from the server, even if the images are the same.
The best way to take advantage of browser caches is to make the content static, i.e. linking directly to the image files, instead of funneling them through image.php. If that is not an option, you should have image.php provide "freshness information", such as a last-modified date and preferably also an expiration date and cache-control information.
This is also reflected in the file types that were served. Over 50% were PHP files and only 28% were JPG (GIF 13%). About 90% of the hits are for graphical content, so that ratio is skewed.
Search the web for "cacheability" and you'll find lots of information on the subject. Well over 75% of the server responses (http://service.futurequest.net/kb31) were "200"; only about 20% were "304" (in other words, read from the browser cache). If you can get these numbers more balanced, you'll save a lot of bandwidth/transfer.
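To make the "200" versus "304" difference concrete, here is a minimal sketch of how a script such as image.php could decide between the two. It is not the site's actual code: the function names are hypothetical, and it assumes a newer PHP than the 4.x series (typed parameters).

```php
<?php
// Sketch only: decide between "200" (send the image again) and "304"
// (the browser's cached copy is still valid). Names are hypothetical.

/** Formats a Unix timestamp as an HTTP date, always expressed in GMT. */
function httpDate(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

/**
 * Returns 304 when the date the browser sent in If-Modified-Since is not
 * older than the file's modification time, otherwise 200.
 */
function cacheStatusFor(int $fileMtime, ?string $ifModifiedSince): int
{
    if ($ifModifiedSince === null) {
        return 200; // first visit: the browser has nothing cached yet
    }
    $since = strtotime($ifModifiedSince);
    if ($since === false) {
        return 200; // unparseable date: play safe and resend the image
    }
    return ($fileMtime <= $since) ? 304 : 200;
}

// Possible use inside an image script, before any output is sent:
//   $mtime  = filemtime($path);
//   $status = cacheStatusFor($mtime, $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null);
//   header('Last-Modified: ' . httpDate($mtime));
//   if ($status === 304) { http_response_code(304); exit; }
//   readfile($path);
```

A 304 response carries no image data, which is where the transfer saving comes from.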
99.1% of your referrers are self-referential, so you do not have a bandwidth theft problem, or at least not a big one.
Although you can get a lot of information from the stats as provided by FutureQuest, if you need to have more detailed information, you may want to consider purchasing a commercial stats analyzer.
The numbers I gave are based on this month's data only. I was aided by Sawmill.
-Arthur
anthony
07-19-2006, 09:25 AM
A
Thanks, I am calling the images for the articles directly now, e.g.
http://www.studio93.it/news_foto/sm/2861d5120dc5fff9de38d739ba0b36a0.jpg
There really isn't any need to use the image.php file if the .htaccess in the directory blocks anyone trying to take the GIF or JPG files.
I still have to find some info on cacheability and read up on it, thanks.
anthony
anthony
07-19-2006, 11:45 AM
A
I came up with this. We update the radio about once a day, at 0900 or 1000 local. I wasn't sure about the time zone, but I put GMT, which is 9 hours behind us.
Let me know if you think I am on the right track.
I am in Italy.
thanks
anthony
<?php
ob_start();
?>
<?php Header("Cache-Control: Public");
Header("Pragma: Public");
header("Last-Modified: ".gmdate("D, d M Y H:i:s",getlastmod())." GMT");
$inserted = date ("D , j M Y ");
header("Expires: $inserted 23:59:59 GMT"); // end of the current day
header("Refresh: 60;");
?>
the page here
<?php
# Make a Content-Length: header
header('Content-Length: ' . ob_get_length());
ob_end_flush();
?>
Arthur
07-19-2006, 04:14 PM
"public" in cache-control headers is only for pages that are protected by HTTP authentication, so I don't believe that's what you want (I can be wrong of course). "max-age" is probably more appropriate, which you could set to 24 hours or less for daily updates. Also, "must-revalidate" and "proxy-revalidate" may be of use to you.
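As a rough illustration of combining these directives, the sketch below builds a single Cache-Control value. The function name is hypothetical and the 24-hour figure is just the example from above. One thing worth knowing: PHP's header() replaces an earlier header of the same name by default, so the directives are best sent together on one line.

```php
<?php
/**
 * Builds one Cache-Control value from a max-age in seconds plus optional
 * flag directives such as "must-revalidate". Hypothetical helper.
 */
function cacheControlValue(int $maxAgeSeconds, array $flags = []): string
{
    // Negative lifetimes make no sense, so clamp them to zero.
    $directives = array_merge(['max-age=' . max(0, $maxAgeSeconds)], $flags);
    return implode(', ', $directives);
}

// Possible use in a page script, before any output is sent:
//   header('Cache-Control: ' . cacheControlValue(24 * 3600, ['must-revalidate']));
```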
Italy is GMT+2, the servers are in GMT-4. But that shouldn't matter, because you're basing the last-modified date on the last-modified date of the file and that date automatically is correct.
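A small sketch of why the time zone does not matter here: gmdate() formats a Unix timestamp in GMT whatever the local zone is. The helper name and the example timestamp are made up, 'America/New_York' is only an assumed stand-in for "GMT-4", and date_default_timezone_set() needs PHP 5.1 or later.

```php
<?php
/** Formats a timestamp (e.g. from getlastmod() or filemtime()) as a GMT HTTP date. */
function lastModifiedValue(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

$stamp = gmmktime(3, 33, 41, 7, 20, 2006); // an arbitrary example instant

date_default_timezone_set('Europe/Rome');      // Italy, GMT+2 in summer
$fromItaly = lastModifiedValue($stamp);

date_default_timezone_set('America/New_York'); // assumed GMT-4 zone in summer
$fromServer = lastModifiedValue($stamp);

// $fromItaly and $fromServer hold the same string: the local zone is ignored.
```

By contrast, date() formats in the local zone, so a value built with date() and labelled "GMT" can be off by the zone offset.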
-Arthur
anthony
07-20-2006, 12:29 AM
A, I went this way and checked it with a validator; it seems OK. Do I need the header("Expires: $inserted 23:59:59 GMT"); line, or can I delete it?
thanks
T
<?php
header("Cache-Control: must-revalidate");
Header("Cache-Control: max-age=82800");
Header("Pragma: max-age");
header("Last-Modified: ".gmdate("D, d M Y H:i:s",getlastmod())." GMT");
$inserted = date ("D , j M Y ");
header("Expires: $inserted 23:59:59 GMT"); // end of the current day
header("Refresh: 60;");
?>
tony
<?php
# Make a Content-Length: header
header('Content-Length: ' . ob_get_length());
ob_end_flush();
?>
Arthur
07-20-2006, 04:55 AM
The "Expires" header I would keep, but I would probably set it to 06:00:00 GMT (because you're updating at 7 or 8am GMT).
I would get rid of the "Refresh" header. Although it is not part of the official HTTP specification, most browsers seem to use it. It reloads the page after the specified time, which is probably not what you want if you want to save on bandwidth/transfer.
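For the Expires suggestion, here is one possible way to compute "the next 06:00:00 GMT" instead of hard-coding a time of day. This is only a sketch: the function name and the default hour parameter are assumptions, and the syntax needs a newer PHP than 4.x.

```php
<?php
/**
 * Returns an Expires value for the next $hourGmt:00:00 GMT after $now:
 * today if that moment is still ahead, otherwise tomorrow.
 */
function nextExpiry(int $now, int $hourGmt = 6): string
{
    $expiry = gmmktime(
        $hourGmt, 0, 0,
        (int) gmdate('n', $now), // month
        (int) gmdate('j', $now), // day of month
        (int) gmdate('Y', $now)  // year
    );
    if ($expiry <= $now) {
        $expiry += 24 * 3600; // today's cut-off has passed: use tomorrow's
    }
    return gmdate('D, d M Y H:i:s', $expiry) . ' GMT';
}

// Possible use, before any output is sent:
//   header('Expires: ' . nextExpiry(time()));
```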
Here are the headers the server currently sends when loading the main index page:
Date: Thu, 20 Jul 2006 07:53:31 GMT
Server: Apache
X-Powered-By: PHP/4.3.10
Cache-Control: max-age=82800
Pragma: Public
Last-Modified: Thu, 20 Jul 2006 03:33:41 GMT
Expires: Thu , 20 Jul 2006 23:59:59 GMT
refresh: 180;
Content-Length: 29079
Content-Type: text/html
-Arthur
anthony
07-20-2006, 06:53 AM
Thanks A, I did the changes. We'll see what happens; I'm sure it will be better, and I will let you know. If anyone wants to validate their page, I found this site:
http://www.web-caching.com/cacheability.html
thanks anthony
Arthur
07-20-2006, 07:11 AM
It looks good, the headers are now:
Date: Thu, 20 Jul 2006 10:00:44 GMT
Server: Apache
X-Powered-By: PHP/4.3.10
Cache-Control: max-age=82800
Pragma: max-age
Last-Modified: Thu, 20 Jul 2006 09:58:58 GMT
Expires: Thu , 20 Jul 2006 11:00:59 GMT
Content-Length: 28939
Content-Type: text/html
You should probably get rid of the extra space after "Thu" and "2006" in the Expires header. You can do that by changing
$inserted = date ("D , j M Y ");
to
$inserted = date ("D, j M Y");
(no space after the D and Y). The Pragma line is missing a value for max-age.
Don't expect miracles from the caching, but it should help. Also, I'm sure you'll have noticed how much faster the pages load now, compared to how it was before without the caching.
-Arthur
PS. The server response headers were made visible by the Web Developer Toolbar in Firefox, in case anyone was wondering
anthony
07-20-2006, 07:34 AM
Thanks again A, you are right it really loads fast now!
anthony
anthony
07-20-2006, 12:11 PM
Arthur, after we put in new news today, from 0900 to 1500 local, I noticed that on the first page we could not see the new articles: because of the cache we were seeing the old page. So if we have to put in new articles that show up on the first page over a span of 6-8 hours during the day, what should I do?
thanks anthony
Arthur
07-20-2006, 12:18 PM
That is something you will need to decide yourself: whether you want the front page cached or not, for how long, or whether you just want the images and articles to be cached, etc. You need to create a strategy that works best for your site.
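Purely as an illustration of what such a strategy might look like, the sketch below picks a cache lifetime per kind of content. The paths, patterns and lifetimes are invented for the example and are not recommendations for this particular site.

```php
<?php
/**
 * Picks a cache lifetime in seconds from the request path.
 * All patterns and lifetimes here are illustrative assumptions.
 */
function maxAgeFor(string $path): int
{
    if (preg_match('/\.(jpe?g|gif|png)$/i', $path)) {
        return 7 * 24 * 3600; // images: assumed to change rarely
    }
    if ($path === '/' || $path === '/index.php') {
        return 10 * 60;       // front page: new articles appear during the day
    }
    return 24 * 3600;          // everything else, e.g. individual articles
}

// Possible use, before any output is sent:
//   header('Cache-Control: max-age=' . maxAgeFor($_SERVER['REQUEST_URI']));
```

A short lifetime on the front page trades a little extra transfer for fresher news, while the long lifetime on images keeps most of the saving.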
-Arthur