<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Hotlink Prevention</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style type="text/css">
<!--
-->
</style>
</head>
<body>
<p style="color:#FFF;font-weight:bold;background:#FFC000;padding:1em 1ex;text-shadow:0 0 3px #000">This article was last updated Wednesday, 6 July 2005. View a newer version at my site, <a style="color:inherit" href="http://hardanswers.net/hotlink-protection">hardanswers.net/hotlink-protection</a></p>
<h1>Hotlink Prevention </h1>
<p>“Hotlinking” (sometimes called “link hijacking”) means linking directly
to a file or resource on a site, rather than to the page
the site owner intended. The most common example is when
someone embeds an image in their own page but, rather than hosting the image themselves,
links directly to the copy on your site. When people visit that page,
they are viewing your image and using your bandwidth to do so,
usually without knowing that it comes from your site. </p>
<p>There are two main methods of preventing hotlinking, and both have flaws.
The most common involves checking the referrer. Most browsers send a “referrer”
(the <acronym title="Hyper Text Transfer Protocol (world wide web protocol)">HTTP</acronym> header itself is spelled “Referer”) with each request.
For a link that was followed, this is the address of the page containing the link.
For an image embedded in a page, it is the address of the page the image is
embedded in. The check itself is simple: if the referrer is from your
site, you allow access to files such as images, and if not, you
block the request. The main problem is that you generally need to allow blank
referrers, because some browsers don’t send a referrer at all, and a request made by typing
an address directly into a browser has none. Some more
paranoid people also send fake referrers, in the belief that referrers are some
type of privacy risk. </p>
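<p>As a rough illustration of the referrer check described above, here is a minimal
sketch in <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym>.
The function name is_allowed_referrer and the decision to accept the “www.” variant of the host
are assumptions made for this example, not part of the method detailed later in this article. </p>
<p> referrer check (sketch)<br />
<textarea cols="82" rows="29" readonly="readonly" title="Referrer check sketch">// hypothetical helper, not part of hotlink.php
// $referrer is the raw Referer header value ('' if none was sent)
// $own_host is your site's host name, for example 'nedmartin.org'
function is_allowed_referrer($referrer, $own_host)
{
  // blank referrers are allowed, for the reasons given above
  if ($referrer === '')
  {
    return true;
  }
  // extract the host part of the referring address
  $host = parse_url($referrer, PHP_URL_HOST);
  if (!is_string($host))
  {
    return false;
  }
  $host = strtolower($host);
  $own_host = strtolower($own_host);
  // accept the site itself and its www. variant (an assumption)
  return $host === $own_host || $host === 'www.'.$own_host;
}

// example use in a script that serves images
$referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if (!is_allowed_referrer($referrer, 'nedmartin.org'))
{
  header('HTTP/1.1 403 Forbidden');
  exit();
}
</textarea>
</p>
<p>Because blank referrers have to be let through, anyone who strips the header gets past
this check, which is the flaw mentioned above. </p>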
<p>The other, more complex way, which I will detail here, involves the use of
sessions. It is almost infallible, although it still has its flaws. </p>
<p>The logic behind this idea is simple: when a user first visits your site,
you start a session. A session is a simple method of maintaining information
about a specific user across a period of time. <acronym title="Hyper Text Transfer Protocol (world wide web protocol)">HTTP</acronym> itself
has no way of doing this, because each <acronym title="Hyper Text Transfer Protocol (world wide web protocol)">HTTP</acronym> request
is handled independently of every other <acronym title="Hyper Text Transfer Protocol (world wide web protocol)">HTTP</acronym> request,
so a server-side method is required. Here we use <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym>’s
built-in session handling. </p>
<p>Having started a session when the user first visited your site, you simply
check, every time that user attempts to access a specific type of file, for
example an image, whether or not that user has a valid session. An exception
is made for certain robots, such as Google’s <a href="http://www.google.com/bot.html" title="External Site | Googlebot">Googlebot</a>. For these we simply
check that their user agent matches one of a list of known robots’ user agents. </p>
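<p>The robot exception can be sketched on its own as follows. The function name is_known_robot
is an assumption made for this example, and the robot names are simply those used in the hotlink.php
listing further down. Bear in mind that the user agent string is supplied by the client, so this
check can be spoofed. </p>
<p> robot check (sketch)<br />
<textarea cols="82" rows="16" readonly="readonly" title="Robot check sketch">// hypothetical helper: returns true when the user agent string
// contains one of the listed robot names (case-insensitive)
function is_known_robot($user_agent)
{
  // names taken from the list used in hotlink.php
  $robots = array('GoogleBot', 'AltaVista', 'ia_archive', 'Inktomi',
    'Lycos', 'Jeeves', 'Slurp', 'Scooter', 'W3C_Validator');
  foreach ($robots as $robot)
  {
    if (stripos($user_agent, $robot) !== false)
    {
      return true;
    }
  }
  return false;
}
</textarea>
</p>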
<p>We use Apache’s .htaccess and mod_rewrite to rewrite all requests for
specific file types, in this case jpg, png, gif, swf and mp3. This means that
when a user requests, for example, nedmartin.org/picture.jpg, this is internally
rewritten to nedmartin.org/hotlink.php?file=picture.jpg. The .htaccess code
used to do this is below: </p>
<p> .htaccess<br />
<textarea name="textarea" cols="82" rows="3" readonly="readonly" title=".htaccess code"># Prevent hotlinking - remember to manually update mime_types in hotlink.php
RewriteRule ^hotlink\.php - [L]
RewriteRule ^(.+\.(jpg|png|gif|swf|mp3))$ hotlink.php?file=$1 [L]
</textarea>
</p>
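<p>To get a feel for which request paths the second rule catches, the same pattern can be tried
out with <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym>’s
preg_match. This is only an illustration: the function name rewrite_target is made up for this
sketch, and Apache, not <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym>,
performs the real rewriting. </p>
<p> rewrite illustration (sketch)<br />
<textarea cols="82" rows="12" readonly="readonly" title="Rewrite illustration sketch">// hypothetical helper that mimics the second RewriteRule
// $path is the request path relative to the directory
// containing the .htaccess file, for example 'picture.jpg'
function rewrite_target($path)
{
  if (preg_match('/^(.+\.(jpg|png|gif|swf|mp3))$/', $path, $matches))
  {
    return 'hotlink.php?file='.$matches[1];
  }
  // the pattern did not match, so the request is left unchanged
  return $path;
}
</textarea>
</p>
<p>Note that the pattern is case-sensitive, so a request for PICTURE.JPG would not be rewritten
unless the rule is given the [NC] flag or the upper-case extensions are added to the pattern. </p>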
<p>With all requests for these resources now passing through hotlink.php,
and assuming you are actually using <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym>,
you need to create the hotlink.php file itself. There’s one small issue
with this method. By default, when a session is started, <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym> will
send headers to ensure that dynamically generated content is not cached. This
is not what you want when you’re actually sending non-dynamic content,
such as images, so we have to modify the headers sent. The commented code is
as follows: </p>
<p> hotlink.php<br />
<textarea name="textarea2" cols="82" rows="123" readonly="readonly" title="PHP Code">// get the hotlink.php?file=blah
$file = $_GET['file'];
// ensure the $file is a valid file
// one may wish to add further checking, such as check the directory, here
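// reject anything that is not an existing file of an allowed type
$file_ext = pathinfo($file, PATHINFO_EXTENSION);
if(!is_file($file) or
!in_array($file_ext, array('jpg', 'gif', 'png', 'swf', 'mp3')))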
{
// if the file is not a valid file
// send 404 Not Found header and exit
header("HTTP/1.1 404 Not Found");
exit();
}
// get the last modified time of the file
$mtime = filemtime($file);
// format the last modified time into an HTTP compliant format
// example Mon, 22 Dec 2003 14:16:16 GMT
$gmt_mtime = gmdate('D, d M Y H:i:s', $mtime).' GMT';
// send an Etag
header('ETag: "'.md5($mtime.$file).'"');
// check if last modified date is the same as that sent
if(isset($_SERVER['HTTP_IF_MODIFIED_SINCE']))
{
if ($_SERVER['HTTP_IF_MODIFIED_SINCE'] == $gmt_mtime)
{
header('HTTP/1.1 304 Not Modified');
exit();
}
}
// check if Etag is the same as that sent
if (isset($_SERVER['HTTP_IF_NONE_MATCH']))
{
if (str_replace('"', '', stripslashes($_SERVER['HTTP_IF_NONE_MATCH']))
== md5($mtime.$file))
{
header("HTTP/1.1 304 Not Modified");
exit();
}
}
// send headers to ensure resource is cached
session_cache_limiter('public');
// 30 * 24 * 60 minutes or one month
session_cache_expire(43200);
// start the session so we have access to session variables
session_start();
// must send this header here to overwrite header produced by session
header('Last-Modified: '.$gmt_mtime);
// check if the session is set or the user is in list of allowed user_agents
// such as Googlebot
if(isset($_SESSION['user']) || (isset($_SERVER['HTTP_USER_AGENT']) and
preg_match('/GoogleBot|AltaVista|ia_archive|Inktomi|Lycos|Jeeves|Slurp'
.'|Scooter|W3C_Validator/i', $_SERVER['HTTP_USER_AGENT']) > 0))
{
// get the file extension of the file
// pathinfo copes with directories and with names containing several dots
$file_ext = pathinfo($file, PATHINFO_EXTENSION);
// set an appropriate mime_type based on the file type
switch($file_ext)
{
case 'jpg':
$mime_type = 'image/jpeg';
break;
case 'gif':
$mime_type = 'image/gif';
break;
case 'png':
$mime_type = 'image/png';
break;
case 'swf':
$mime_type = 'application/x-shockwave-flash';
break;
case 'mp3':
$mime_type = 'audio/mpeg';
break;
}
// send the mime_type
header('Content-Type: '.$mime_type);
// send the file itself
readfile($file);
}
// session is not set so user is invalid
else
{
// write a log file of the invalid access attempt
if (substr($_SERVER['REQUEST_URI'],-5) != '/none')
{
// open file
// saved format will be "path/name.2004.04.hotlink.csv"
$fp=fopen('path/name.'.gmdate('Y.m').'.hotlink.csv','a');
// write data
fwrite($fp,
$_SERVER['REQUEST_URI'].','
.date("Y-m-d\TH:i:s").','
.$_SERVER['REMOTE_ADDR'].','
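// referrer and user agent are optional headers, so check they exist
.(isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '').','
.(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '')."\n"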
);
// close file
fclose($fp);
}
// overwrite default headers with non-caching headers
header('ETag: "'.md5(time().$file).'"');
header('Last-Modified: '.gmdate('D, d M Y H:i:s', time()).' GMT');
header('Cache-Control: must-revalidate');
header('Expires: '.gmdate('D, d M Y H:i:s', time()).' GMT');
// return 403 Forbidden header
header('HTTP/1.1 403 Forbidden');
}
</textarea>
</p>
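<p>The two cache checks near the top of hotlink.php can also be read as a single yes-or-no
question: has the browser already got this exact version of the file? The sketch below restates
that logic as one function so it can be tried in isolation. The function name is_not_modified and
its parameters are assumptions made for this example, and it compares the date as an exact
string, just as the listing above does. </p>
<p> cache check (sketch)<br />
<textarea cols="82" rows="23" readonly="readonly" title="Cache check sketch">// hypothetical helper restating the cache checks in hotlink.php
// $mtime is the file's modification time as a Unix timestamp
// $if_modified_since and $if_none_match are the raw request header
// values, or null when the browser did not send them
function is_not_modified($file, $mtime, $if_modified_since, $if_none_match)
{
  // same formats as used in hotlink.php
  $gmt_mtime = gmdate('D, d M Y H:i:s', $mtime).' GMT';
  $etag = md5($mtime.$file);
  // the browser's copy has the same last modified date
  if ($if_modified_since !== null and $if_modified_since === $gmt_mtime)
  {
    return true;
  }
  // the browser's copy has the same Etag (quotes removed)
  if ($if_none_match !== null
    and str_replace('"', '', $if_none_match) === $etag)
  {
    return true;
  }
  return false;
}
</textarea>
</p>
<p>When this returns true, the script can send “304 Not Modified” and exit without reading the
file at all, which is what saves the bandwidth. </p>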
<p>The other thing you must do is ensure that a session is created when a user
first visits your site. This is surprisingly simple if your site is designed
so that a piece of code can easily be included in every page.
Add the following code to every page on your site, before any output is sent: </p>
<p>
<textarea cols="82" rows="8" readonly="readonly" title="Session Start Code">// use this line if you want to use only cookies
// see below for more information
// ini_set('session.use_only_cookies',1);
session_start();
if(!isset($_SESSION['user']))
{
$_SESSION['user'] = time();
}
</textarea>
</p>
<p>Unfortunately, there’s one major flaw with all of this. By default, <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym> will
attempt to store the session identifier, generally a large pseudo-random
number, in a cookie. Cookies have received some bad press, so some people now
consider them a privacy risk and block them. A session cookie
is not a privacy risk; it is a simple method used to ensure that session data
is propagated across different pages, and there is no need to block it. However,
because people do block cookies, <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym> has
a secondary method it can use should cookies fail: <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym> adds
“?PHPSESSID=bigPseudoRandomNumber” to every link in your page. This works well,
because when a user clicks on the link, <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym> can
read the session identifier from it. It has one major
drawback, though. Search engines such as Google don’t support cookies, so when
they visit your site they will see every link with a large random number appended.
The next time they visit, they will see the same links with a different
random number appended. This is likely to confuse them: they may decide
that your links have all changed, and your site will drop into the nether regions
of search-engine-land, never to be seen again. If this is likely to be a problem
with your site, you should use the following setting to force <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym> to
use cookies only. Be aware, though, that when using cookies only, anyone who
refuses, or doesn’t support, cookies will not be able to view the protected
resources on your site. </p>
<p>To force <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym> to
use cookie-only session handling, either add the line “session.use_only_cookies=1”
to a php.ini file in the main root directory, or place “ini_set('session.use_only_cookies',1);”
directly before calling “session_start()”. </p>
<p>For more information on:</p>
<ul>
<li> <acronym title="Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym> sessions,
see <a href="http://www.php.net/session" title="External Site | www.php.net">http://www.php.net/session</a></li>
<li>.htaccess, see <a href="http://httpd.apache.org/docs/howto/htaccess.html" title="External Site | httpd.apache.org">http://httpd.apache.org/docs/howto/htaccess.html</a></li>
<li>mod_rewrite, see <a href="http://httpd.apache.org/docs/mod/mod_rewrite.html" title="External Site | httpd.apache.org">http://httpd.apache.org/docs/mod/mod_rewrite.html</a></li>
<li>Robots, see <a href="http://www.robotstxt.org/" title="External Site | www.robotstxt.org">http://www.robotstxt.org/</a></li>
</ul>
</body>
</html>
Last updated Tuesday, 20 November 2012