Peter's Blog

Redefining the Impossible

Mod_rewrite Referrer (aka Referer) Spam Blocking


This is a Python script for testing .htaccess mod_rewrite rules that block referrer spam. I just hate the idea of these parasites sucking up my bandwidth.

#
# Test .htaccess
#
import urllib.error
import urllib.request

#
# Site to test
#
strSite = "http://www.petersblog.org"

#
# Bad referrers: should fail
#
strBadReferrers = [
    "http://www.blah.info",
    "http://blah.info",
    "http://any.blah.info",
    "http://www.blah.info/",
    "http://www.blah.info/this/should/still/fail"
]

#
# Good referrers: should pass
#
strGoodReferrers = [
    "http://www.google.com",
    "http://www.google.com/search?q=tecrep-inc.net",
    strSite,                    # allow internal referrer in
    strSite + "/node/123",
    ""                          # no referrer
]

def TestReferrer(strReferrer):
    "Test whether a referrer is allowed in: True if so"
    try:
        request = urllib.request.Request(strSite)
        if strReferrer != "":
            # The HTTP header name really is spelt "Referer"
            request.add_header("Referer", strReferrer)
        urllib.request.urlopen(request).read()
        return True
    except urllib.error.HTTPError:
        # A 403 from the [F] rule (or any other HTTP error) lands here
        return False

#
# Test bad referrers.
#
for strReferrer in strBadReferrers:
    if TestReferrer(strReferrer):
        print("Failed: allowed %s in" % strReferrer)
    else:
        print("Passed: didn't allow %s in" % strReferrer)

#
# Test good referrers.
#
for strReferrer in strGoodReferrers:
    if TestReferrer(strReferrer):
        print("Passed: allowed %s in" % strReferrer)
    else:
        print("Failed: didn't allow %s in" % strReferrer)

I find the following format for mod_rewrite referrer blocking to be effective.

RewriteCond %{HTTP_REFERER} ^http://[^/]*blah\.net($|/.*$) [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://[^/]*blah\.com($|/.*$) [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://[^/]*blahblah\.org($|/.*$) [NC]
RewriteRule ^.* - [F]
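
As the block list grows, writing these lines by hand gets tedious. The following is a minimal sketch of generating them from a list of domains. The function name build_referrer_rules is my own invention for illustration. It escapes the dots in each domain and puts [NC] on every condition, which is an assumption about what you want rather than a requirement.

```python
def build_referrer_rules(domains):
    """Return mod_rewrite lines that forbid requests referred by any of `domains`."""
    lines = []
    for index, domain in enumerate(domains):
        # Escape the dots so "." only matches a literal dot in the referrer.
        escaped = domain.replace(".", r"\.")
        pattern = r"^http://[^/]*" + escaped + r"($|/.*$)"
        # Every condition except the last is chained to the next with OR.
        is_last = index == len(domains) - 1
        flags = "[NC]" if is_last else "[NC,OR]"
        lines.append("RewriteCond %{HTTP_REFERER} " + pattern + " " + flags)
    if lines:
        # One rule after all the conditions: answer with 403 Forbidden.
        lines.append("RewriteRule ^.* - [F]")
    return "\n".join(lines)


print(build_referrer_rules(["blah.net", "blah.com", "blahblah.org"]))
```

Paste the printed output into .htaccess, then run the test script against the site to confirm the rules behave as expected.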

Any comment spam that happens to get through is a good source of sites to block. Note that a single rule such as:

RewriteCond %{HTTP_REFERER} ^http://[^/]*blah\.com($|/.*$) [NC]
RewriteRule ^.* - [F]

will catch all the following permutations:

* http://www.blah.com/
* http://www.foo.blah.com/
* http://www.foo.bar.blah.com/
* http://www.foo.bar.blah.com/still/not/allowed
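
The list above can be checked offline before touching the server. This is a rough sketch that uses Python's re module as a stand-in for Apache's regex engine, with the dot in the domain escaped and re.IGNORECASE playing the part of [NC]. The two engines agree on a pattern this simple, but treat it as a sanity check rather than proof. The name is_blocked is hypothetical.

```python
import re

# The RewriteCond pattern, with the dot escaped; IGNORECASE mimics [NC].
BLOCKED = re.compile(r"^http://[^/]*blah\.com($|/.*$)", re.IGNORECASE)


def is_blocked(referrer):
    """True if the referrer would match the RewriteCond pattern."""
    return BLOCKED.search(referrer) is not None


for url in [
    "http://www.blah.com/",
    "http://www.foo.blah.com/",
    "http://www.foo.bar.blah.com/",
    "http://www.foo.bar.blah.com/still/not/allowed",
    "http://www.google.com/search?q=blah.com",
]:
    print(url, "blocked" if is_blocked(url) else "allowed")
```

Two things to be aware of: because [^/]* matches any run of host characters, the pattern also blocks a host such as notblah.com, and because it starts with ^http:// it does not match https referrers.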

Note: I've often seen 'referrer' spelt 'referer', and that is how it is spelt in the .htaccess rules (and in the HTTP header itself), but Google's define: search assures me that 'referrer' is the correct spelling. 'Referer' sounds to me more like a smoker of certain narcotic substances.

