This article was last updated Tuesday, 8 February 2005. View a newer version at my site, hardanswers.net/dynamic-webpage-caching
Caching
Most browsers and many intermediate servers cache web content to speed up its delivery and display. Having your content cached will usually make your site appear faster and more responsive, as well as lower your server’s bandwidth requirements. Based on this, you would think that everyone would have his or her site set up to allow caching, but there’s a downside. Dynamic content is not usually suited to caching. Consider a login system: you probably don’t want a browser to cache the site in its logged-in state, because you would like the page to be refreshed every time someone attempts to access it, so you can ensure they’re still logged in.
For this reason, PHP pages are generally not cached by default: PHP sends none of the headers that caches rely on, and if you use sessions it sends headers that explicitly disable caching. Unfortunately, this means that even if your site is suitable for caching, if you’ve used PHP, the chances are it won’t be cached. Every time a user goes to a page, that page will be reloaded, with all the extra time and overhead that requires, making your site seem slower than it need be. Fortunately, it’s possible to use PHP to enable caching of your pages in an intelligent manner.
Caching is controlled by the HTTP headers exchanged between the browser and the server. The basic logic is quite simple: if permitted to cache, a cache will store the page for a specified time. This is controlled by the “Cache-Control” and “Expires” headers that the server sends with its response.
In addition to this, when checking for new versions of the page, most browsers will send an “If-Modified-Since” and/or “If-None-Match” header, if a “Last-Modified” and/or “ETag” header was present in the HTTP headers received from the web server. If either of these request headers indicates that the page hasn’t been modified, the server can return a “304 Not Modified” response with no body, and the browser will continue to use and display its current copy of the page, thus saving server bandwidth and making the site appear more responsive.
The first step to getting your pages cached is to send the appropriate headers:
ETag
An “ETag” header contains a “strong” identifier, that is, an identifier that is unique not only for a particular page or resource, but for the current state of that particular page or resource. In other words, if the identifier has changed, then the associated page or resource has also changed in some way. I generate the identifier by taking an MD5 hash of the filename and its last modified date. This way, if either the filename or last modified date is different, the ETag will also be different.
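The MD5 approach described above could be sketched roughly as follows. The helper name make_etag() is hypothetical, and wrapping the hash in double quotes is an assumption based on the HTTP rule that an ETag value is a quoted string.

```php
<?php
// Hypothetical helper: builds an ETag value from a file's path and its
// last-modified timestamp, following the MD5 approach described in the text.
function make_etag(string $path, int $lastModified): string
{
    // HTTP expects the ETag value to be a quoted string.
    return '"' . md5($path . $lastModified) . '"';
}
```

In a page, this might be used as header('ETag: ' . make_etag(__FILE__, filemtime(__FILE__))); so that the identifier changes whenever the script file is modified.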
Last-Modified
The “Last-Modified” header simply contains the time and date the resource in question was last modified.
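HTTP dates are always expressed in GMT in a fixed format. A minimal sketch of formatting a timestamp that way, assuming a hypothetical helper named http_date():

```php
<?php
// Hypothetical helper: formats a Unix timestamp as an HTTP date,
// e.g. "Tue, 08 Feb 2005 00:00:00 GMT".
function http_date(int $timestamp): string
{
    // gmdate() formats in GMT regardless of the server's time zone.
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}
```

For the current script, the header could then be sent with header('Last-Modified: ' . http_date(filemtime(__FILE__)));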
Cache-Control
The “Cache-Control” header instructs modern caches on how they should behave, although it is worth noting that older caches may not obey this field. “Cache-Control” can take a variety of values, such as “private” and “no-cache”, but the one we are interested in is “public”. A “public” directive in a Cache-Control header indicates that the resource may be cached by any cache, which is what we want. “Private” indicates that the response should only be cached by non-shared caches (such as your local browser). “No-cache”, despite its name, does not forbid storing a copy: it requires the cache to check with the server before reusing the copy. The value that forbids storing the page or resource anywhere is “no-store”.
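As a rough illustration of choosing between these values, the sketch below picks “private” for pages that differ per user and “public” otherwise. The helper name and the single boolean flag are assumptions made for the example, not part of any PHP API.

```php
<?php
// Hypothetical helper: chooses a Cache-Control value for a page.
function cache_control_value(bool $userSpecific): string
{
    // Pages that differ per user (for example, behind a login) should be
    // kept out of shared caches; everything else may be cached anywhere.
    return $userSpecific ? 'private' : 'public';
}
```

A page that is the same for every visitor could then send header('Cache-Control: ' . cache_control_value(false));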
Expires
The Expires header gives the date and time after which a response is considered stale, that is, after which a cached copy of a page should no longer be considered valid. In other words, the Expires header indicates how long caches should store a cached copy of a page. Here we indicate that pages can be cached for one month from the current date, by specifying their expiry as a date one month in the future.
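One way to compute an expiry date one month ahead is sketched below. The helper name expires_value() is hypothetical; the calculation is done in UTC so that the result does not depend on the server's time zone setting.

```php
<?php
// Hypothetical helper: returns an HTTP date one month after $now.
function expires_value(int $now): string
{
    // A DateTimeImmutable built from "@timestamp" is always in UTC.
    $expires = (new DateTimeImmutable('@' . $now))->modify('+1 month');
    return $expires->format('D, d M Y H:i:s') . ' GMT';
}
```

The header itself could then be sent with header('Expires: ' . expires_value(time()));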
The next step is to check if the page has been modified when a request is made to the server, and if not, return a “304 Not Modified” status and stop any further processing. This is simply done with two PHP if statements:
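A minimal sketch of such a check, with one if statement per request header. The function name is hypothetical, and the request headers are passed in as an array shaped like $_SERVER so the logic can be tried in isolation. It treats a match on either header as “not modified”, as described above; stricter implementations let If-None-Match take precedence when both are present.

```php
<?php
// Hypothetical helper: decides whether the client's cached copy is current.
// $server is an array shaped like $_SERVER; $etag is the current quoted ETag;
// $lastModified is the resource's modification time as a Unix timestamp.
function is_not_modified(array $server, string $etag, int $lastModified): bool
{
    // First check: the ETag the client stored matches the current one.
    if (isset($server['HTTP_IF_NONE_MATCH'])
        && trim($server['HTTP_IF_NONE_MATCH']) === $etag) {
        return true;
    }

    // Second check: the resource has not changed since the client's copy.
    if (isset($server['HTTP_IF_MODIFIED_SINCE'])) {
        $since = strtotime($server['HTTP_IF_MODIFIED_SINCE']);
        if ($since !== false && $since >= $lastModified) {
            return true;
        }
    }

    return false;
}
```

When is_not_modified($_SERVER, $etag, $lastModified) returns true, the page would send header('HTTP/1.1 304 Not Modified'); and then call exit; so that no body is generated.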
There’s one further caveat: headers must be sent before any other output. This generally means that this code must go at the very top of your PHP script, before anything else. One way around this is to buffer your page on the server before outputting it, using ob_start(). I also use ob_start() to provide gzip compression, which further increases the responsiveness of plain-text files by transmitting them compressed, and decreases the server bandwidth used even more.
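Buffering with optional gzip compression might look like the sketch below. The helper name is hypothetical; ob_gzhandler is the output callback provided by PHP's zlib extension, so the sketch falls back to a plain buffer when it is unavailable or cannot be started.

```php
<?php
// Hypothetical helper: starts an output buffer, using gzip compression
// when PHP's zlib extension provides the ob_gzhandler callback.
function start_page_buffer(): void
{
    if (function_exists('ob_gzhandler') && ob_start('ob_gzhandler')) {
        return; // compressing buffer started
    }
    ob_start(); // fall back to a plain, uncompressed buffer
}
```

After calling start_page_buffer(), output produced with echo is held on the server, so header() can still be called until the buffer is flushed, for example with ob_end_flush().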
Putting all this together gives us:
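A minimal sketch of how the steps above might fit together, and not necessarily the original listing. All function names are hypothetical. The 304 check runs before the output buffer is started, so that a “not modified” response is sent with no body.

```php
<?php
// Builds the four caching headers discussed above for one resource.
function build_cache_headers(string $path, int $lastModified, int $now): array
{
    $format  = 'D, d M Y H:i:s';
    $expires = (new DateTimeImmutable('@' . $now))->modify('+1 month');

    return [
        'ETag'          => '"' . md5($path . $lastModified) . '"',
        'Last-Modified' => gmdate($format, $lastModified) . ' GMT',
        'Cache-Control' => 'public',
        'Expires'       => $expires->format($format) . ' GMT',
    ];
}

// Returns true when either conditional request header shows that the
// client's cached copy is still current.
function client_copy_is_current(array $server, array $headers, int $lastModified): bool
{
    if (($server['HTTP_IF_NONE_MATCH'] ?? null) === $headers['ETag']) {
        return true;
    }
    if (isset($server['HTTP_IF_MODIFIED_SINCE'])) {
        $since = strtotime($server['HTTP_IF_MODIFIED_SINCE']);
        if ($since !== false && $since >= $lastModified) {
            return true;
        }
    }
    return false;
}

// Sends the headers, answers with 304 when possible, and otherwise starts
// a (gzip-compressed, if available) output buffer for the page body.
function serve_with_caching(string $path): void
{
    $lastModified = (int) filemtime($path);
    $headers = build_cache_headers($path, $lastModified, time());

    foreach ($headers as $name => $value) {
        header($name . ': ' . $value);
    }

    if (client_copy_is_current($_SERVER, $headers, $lastModified)) {
        header('HTTP/1.1 304 Not Modified');
        exit;
    }

    if (!(function_exists('ob_gzhandler') && ob_start('ob_gzhandler'))) {
        ob_start();
    }
}
```

A page would call serve_with_caching(__FILE__); as its first statement, before producing any output.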
For more information on:
- PHP headers, see http://www.php.net/manual/en/function.header.php
- HTTP headers, see http://www.freesoft.org/CIE/RFC/2068/155.htm
- ob_start(), see http://www.php.net/manual/en/function.ob-start.php
- Viewing the HTTP headers sent by a server, see http://web-sniffer.net/