This is a Python script for testing .htaccess mod_rewrite rules that block referrer spam. I just hate the idea of these parasites sucking my bandwidth.
    #
    # Test .htaccess (Python 3)
    #
    import urllib.error
    import urllib.request

    #
    # Site to test
    #
    strSite = "http://www.petersblog.org"

    #
    # Bad referrers: should fail
    #
    strBadReferrers = [
        "http://www.blah.info",
        "http://blah.info",
        "http://any.blah.info",
        "http://www.blah.info/",
        "http://www.blah.info/this/should/still/fail"
    ]

    #
    # Good referrers: should pass
    #
    strGoodReferrers = [
        "http://www.google.com",
        "http://www.google.com/search?q=tecrep-inc.net",
        strSite,                # allow internal referrer in
        strSite + "/node/123",
        ""                      # no referrer
    ]

    def TestReferrer(strReferrer):
        "Test whether a referrer is allowed in: True if so"
        try:
            request = urllib.request.Request(strSite)
            if strReferrer != "":
                # The HTTP header really is spelt "Referer"
                request.add_header("Referer", strReferrer)
            opener = urllib.request.build_opener()
            opener.open(request).read()
            return True
        except urllib.error.HTTPError:
            # Any HTTP error status (including 403 Forbidden) counts as blocked
            return False

    #
    # Test bad referrers.
    #
    for strReferrer in strBadReferrers:
        if TestReferrer(strReferrer):
            print("Failed: allowed %s in" % strReferrer)
        else:
            print("Passed: didn't allow %s in" % strReferrer)

    #
    # Test good referrers.
    #
    for strReferrer in strGoodReferrers:
        if TestReferrer(strReferrer):
            print("Passed: allowed %s in" % strReferrer)
        else:
            print("Failed: didn't allow %s in" % strReferrer)
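The script above counts any HTTP error as a block, so a 404 or a 500 would also be reported as "didn't allow". Since the [F] flag makes Apache answer 403 Forbidden, a stricter check could look only for that status. Here is a minimal sketch in Python 3; `referrer_status` and `is_blocked` are hypothetical helper names, not part of the original script, and the 10 second timeout is an arbitrary assumption.

```python
import urllib.error
import urllib.request


def referrer_status(url, referrer=""):
    """Return the HTTP status code the server sends for url with this referrer."""
    request = urllib.request.Request(url)
    if referrer:
        # The HTTP header is spelt "Referer"
        request.add_header("Referer", referrer)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; the code is on the error
        return err.code


def is_blocked(url, referrer=""):
    """True only if the server answered 403 Forbidden."""
    return referrer_status(url, referrer) == 403
```

With this, a call such as `is_blocked("http://www.petersblog.org", "http://www.blah.info/")` should be true only when the rewrite rule itself fired, not when the site happens to be broken for some other reason.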
I find the following format for mod_rewrite referrer blocking to be effective.
RewriteCond %{HTTP_REFERER} ^http://[^/]*blah\.net($|/.*$) [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://[^/]*blah\.com($|/.*$) [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://[^/]*blahblah\.org($|/.*$) [NC]
RewriteRule ^.* - [F]
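Typing these conditions by hand gets tedious once the list grows, so the block could be generated from a plain list of domains. The following is only a sketch: `rewrite_block_rules` is a made-up helper name, it puts [NC] on every condition, and it escapes the dots in each domain so that they match literally, which are choices of this example rather than anything required by the format.

```python
import re


def rewrite_block_rules(domains):
    """Return .htaccess lines that forbid requests referred from the given domains."""
    lines = []
    for index, domain in enumerate(domains):
        # Every condition except the last one is chained to the next with OR
        is_last = index == len(domains) - 1
        flags = "[NC]" if is_last else "[NC,OR]"
        # re.escape turns "blah.net" into "blah\.net" so the dot is literal
        pattern = "^http://[^/]*" + re.escape(domain) + "($|/.*$)"
        lines.append("RewriteCond %{HTTP_REFERER} " + pattern + " " + flags)
    if lines:
        # One rule at the end answers 403 Forbidden if any condition matched
        lines.append("RewriteRule ^.* - [F]")
    return lines


if __name__ == "__main__":
    print("\n".join(rewrite_block_rules(["blah.net", "blah.com", "blahblah.org"])))
```

The printed lines can be pasted into .htaccess and then checked with the test script.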
Any comment spam that happens to get through is a good source of sites to block. Note that one rule such as:
RewriteCond %{HTTP_REFERER} ^http://[^/]*blah\.com($|/.*$) [NC]
RewriteRule ^.* - [F]
will catch all the following permutations:
* http://www.blah.com/
* http://www.foo.blah.com/
* http://www.foo.bar.blah.com/
* http://www.foo.bar.blah.com/still/not/allowed
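The pattern can also be tried offline before it goes anywhere near the server. This sketch uses Python's `re` module as a stand-in for Apache's regex engine, which is an assumption that holds for a simple pattern like this one; `is_blocked_referrer` is a hypothetical name, and `re.IGNORECASE` plays the part of the [NC] flag.

```python
import re

# Same pattern as the RewriteCond, with the dot escaped; IGNORECASE mimics [NC]
BLOCK_PATTERN = re.compile(r"^http://[^/]*blah\.com($|/.*$)", re.IGNORECASE)


def is_blocked_referrer(referrer):
    """True if the referrer would satisfy the RewriteCond pattern."""
    return BLOCK_PATTERN.search(referrer) is not None


if __name__ == "__main__":
    for referrer in [
        "http://www.blah.com/",
        "http://www.foo.blah.com/",
        "http://www.foo.bar.blah.com/",
        "http://www.foo.bar.blah.com/still/not/allowed",
        "http://www.google.com/search?q=blah.com",
    ]:
        print(referrer, is_blocked_referrer(referrer))
```

One thing this makes easy to spot: because `[^/]*` matches any run of characters in the host name, an unrelated host such as notblah.com is caught as well, so a blocked name should be distinctive enough not to be the tail of a site worth keeping.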
Note: I've seen 'referrer' spelt 'referer' a lot, and it is spelt that way in the .htaccess rules (and in the HTTP header itself), but Google's define: search assures me I'm spelling it right. 'Referer' sounds to me more like a smoker of certain narcotic substances.

