Peter's Blog

Redefining the Impossible

Items filed under htaccess


awTags was adding an entry to the navigation menu called 'My Tags'. This irritated me because it was shown to anonymous users and was the only reason for the navigation menu to appear for them. 'awtags.module' has no option to control it, so I changed the source so that the entry only appears for logged-in users:

/*
 * Implementation of hook_menu
 */
function awTags_menu($may_cache) {
  global $user;

  $items = array();

  if ($may_cache) {

    // pcw: only logged in users can have 'my tags'
    if ($user->uid) {
      // /usertags/tags (my tags)
      $items[] = array(
        'path' => "usertags/$user->uid",
        'title' => t('my tags'),
        'access' => user_access('access tags'),
        'callback' => '_awtags_page',
        'callback arguments' => $user->uid,
        'type' => MENU_DYNAMIC_ITEM);
    }

    // ... rest of function unchanged.
}

I tested this in IE, where I am anonymous (like all IE users), and saw no change. I had forgotten to flush the damn Drupal cache for the umpteenth time: the menus are cached. I took the time to knock up a PHP script to flush the cache for me so I don't have to fiddle with the mysql command line:

<?php
include_once 'includes/bootstrap.inc';
include_once 'includes/common.inc' ;

db_query('DELETE FROM {cache}');

echo( "Done");

?>

Save the above in a file called FlushCache.php on your server and just open it in a browser to flush the cache. It may be advisable to set up your .htaccess so that only you can access the file:

<Files "FlushCache.php">
  order deny,allow
  deny from all
  allow from [my ip address]
</Files>


mod_rewrite is a module for the Apache web server that allows web requests to be rewritten according to rules listed in a .htaccess file.


Filed under: htaccess mod_rewrite


My server is being hit by attempts to submit trackback spam, which is particularly annoying as I don't have trackback. By default Drupal formats up a full web page with a fancy 'page not found' line for a 404 error (page not found). To save server time and bandwidth, I reject these requests with a minimal 403 page instead, so I've put this at the top of my .htaccess file:

ErrorDocument 403 /fail.html

fail.html is a minimal HTML file containing just the string 'Error 403'. It should be little enough load for the server:

<head>
<title>Error</title>
</head>
<body>
Error 403
</body>

This is added to the mod_rewrite rules:

#
# Reject any attempt to submit trackback spam
#
RewriteRule ^(.*)trackback(.*)$ - [F]

Any URL with 'trackback' in it is rejected with the minimal 403 error.
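
To see what the pattern catches, here is a minimal Python 3 sketch that applies the same regular expression to a few request paths. The sample paths are invented for illustration, and Python's re module only approximates Apache's regex engine, though it should behave the same for a pattern this simple.

```python
import re

# Same pattern as the RewriteRule above. In a .htaccess file Apache
# matches against the request path without its leading slash.
TRACKBACK = re.compile(r"^(.*)trackback(.*)$")

def is_rejected(path):
    """Return True if the rule would answer this path with a 403."""
    return TRACKBACK.search(path) is not None

# Hypothetical paths, made up for this example.
for path in ["trackback/123", "node/42/trackback", "node/42", "blog/feed/1"]:
    print(path, "->", "403" if is_rejected(path) else "passed on")
```

Because the pattern allows anything before and after the word, it rejects a path with 'trackback' anywhere in it, not just paths that start with it.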


Filed under: drupal htaccess mod_rewrite


I hadn't been worrying much about comment spam recently: I had been banning it successfully with my .htaccess rules, because every access attempt had a spammish referrer link.

Once past the .htaccess block, the spammers found a bug in my comment captcha mod: if they just posted without first reading the page, the session was never set up. I've modified the captcha code to cope with this. Note that this is still Drupal 4.5.1; I haven't upgraded to 4.5.2 yet.


Filed under: captcha drupal htaccess


This is a python script to test out .htaccess mod_rewrite rules to block referrer spam. I just hate the idea of these parasites sucking my bandwidth.

#
# Test .htaccess
#
import httplib
import urllib2

#
# Site to test
#
strSite = "http://www.petersblog.org"

#
# Bad referrers: should fail
#
strBadReferrers = [
    "http://www.blah.info",
    "http://blah.info",
    "http://any.blah.info",
    "http://www.blah.info/",
    "http://www.blah.info/this/should/still/fail"
]

#
# Good referrers: should pass
#
strGoodReferrers = [
    "http://www.google.com",
    "http://www.google.com/search?q=tecrep-inc.net",
    strSite,                    # allow internal referrer in
    strSite + "/node/123",
    ""                          # no referrer
]

def TestReferrer(strReferrer):
    "Test whether a referrer is allowed in: True if so"
    try:
        request = urllib2.Request(strSite)
        if strReferrer != "":
            request.add_header("referer", strReferrer)
        opener = urllib2.build_opener()
        data = opener.open(request).read()
        return True
    except urllib2.HTTPError:
        return False

#
# Test bad referrers.
#
for strReferrer in strBadReferrers:
    if TestReferrer(strReferrer):
        print "Failed: allowed %s in" % strReferrer
    else:
        print "Passed: didn't allow %s in" % strReferrer

#
# Test good referrers.
#
for strReferrer in strGoodReferrers:
    if TestReferrer(strReferrer):
        print "Passed: allowed %s in" % strReferrer
    else:
        print "Failed: didn't allow %s in" % strReferrer

I find the following format for mod_rewrite referrer blocking to be effective.

RewriteCond %{HTTP_REFERER} ^http://[^/]*blah\.net($|/.*$) [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://[^/]*blah\.com($|/.*$) [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://[^/]*blahblah\.org($|/.*$) [NC]
RewriteRule ^.* - [F]

A good source for a list of sites to block can be found in any comment spam that happens to get through. Note that one rule such as:

RewriteCond %{HTTP_REFERER} ^http://[^/]*blah\.com($|/.*$) [NC]
RewriteRule ^.* - [F]

will catch all the following permutations:

* http://www.blah.com/
* http://www.foo.blah.com/
* http://www.foo.bar.blah.com/
* http://www.foo.bar.blah.com/still/not/allowed
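
The same check can be tried offline before touching the server. This is a rough Python 3 sketch, not the way Apache evaluates it: the helper name referrer_blocker is made up, the dot in the domain is escaped so it only matches a literal dot, and re.IGNORECASE stands in for the [NC] flag. The last two test URLs are my own additions to show the edges of the pattern.

```python
import re

def referrer_blocker(domain):
    """Build a pattern in the same shape as the RewriteCond above."""
    # re.escape makes the dot in the domain literal; [NC] becomes IGNORECASE.
    return re.compile(
        r"^http://[^/]*" + re.escape(domain) + r"($|/.*$)", re.IGNORECASE
    )

blocked = referrer_blocker("blah.com")

for ref in [
    "http://www.blah.com/",
    "http://www.foo.blah.com/",
    "http://www.foo.bar.blah.com/",
    "http://www.foo.bar.blah.com/still/not/allowed",
    "http://www.google.com/search?q=blah.com",  # domain only in the query string
    "http://notblah.com/",                      # [^/]* accepts any prefix
]:
    print(ref, "->", "blocked" if blocked.search(ref) else "allowed")
```

Note that the pattern only looks at the host part, so a search engine query that mentions the domain still gets through, while any host name that merely ends in the blocked name is caught as well.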

Note: I've seen referrer spelt 'referer' a lot and it is spelt this way in the .htaccess rules but google define assures me I'm spelling it right: referer sounds to me more like a smoker of certain narcotic substances.



I've been keeping an eye on my visitor logs to see how much my domain name problems have affected my traffic. According to Statcounter, visits had been climbing, but yesterday there was a sudden dip. The Awstats logs provided by Site5 show no such dip.

I've seen a number of such dips in the Statcounter logs: their servers do not appear to be the most reliable. This is not a big complaint, I use them for free, more of a lamentation. Their professional service is too expensive for my simple ego brushing needs, $9 a month, but if I was paying that I would not want drop-outs approximately once a week.

The main advantage of Statcounter for me is that it counts visitors who have javascript enabled so it is essentially counting human beings rather than crawlers and referrer spam bots. It is also easy to set it up to ignore my own IP address. The Drupal statistics module does not have this feature but I could simply use phpmyadmin or another generic mysql database report generation tool to filter the drupal logs in any way I desire. The statistics module does list external referrers in reverse chronological order so it is useful for updating .htaccess referrer exclusion lists.



I use XML-RPC to update my blog from email, using the Python xmlrpclib module to invoke the metaWeblog interface in Drupal.

The XML-RPC interface can be accessed easily as follows:

>>> import xmlrpclib
>>> oDrupal = xmlrpclib.ServerProxy( 'http://www.my.web.site.com/xmlrpc.php')
>>> oDrupal.metaWeblog.getRecentPosts( '', '<username>', '<password>', 25)

From my server logs it seems that someone else has been trying to get in here, so I've closed the door by editing the .htaccess file to prevent entry from the outside world:

  RewriteCond %{REMOTE_ADDR}       !^123\.123\.123\.123$
  RewriteRule ^xmlrpc\.php          -   [F]

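The above says that only the IP address 123.123.123.123 has access to the file xmlrpc.php. The IP address is the address of my host. Note that in a .htaccess file the rule pattern is matched against the path without its leading slash, and the dots are escaped so they only match literal dots.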
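
The logic of that condition and rule pair can be sketched in a few lines of Python 3. This is only an illustration under assumptions: the function name is_forbidden is made up, 123.123.123.123 is the same placeholder address as above, and the path is written the way a .htaccess file sees it, with no leading slash. Apache actually tests the rule pattern first and the condition second, but the outcome is the same.

```python
import re

ALLOWED = re.compile(r"^123\.123\.123\.123$")  # placeholder address from the rule
XMLRPC = re.compile(r"^xmlrpc\.php")           # path as seen from .htaccess

def is_forbidden(remote_addr, path):
    """Forbid xmlrpc.php unless the request comes from the allowed address."""
    # The leading '!' in the RewriteCond means "address does NOT match".
    not_allowed = ALLOWED.search(remote_addr) is None
    return not_allowed and XMLRPC.search(path) is not None

# Hypothetical requests for illustration.
print(is_forbidden("10.0.0.1", "xmlrpc.php"))         # outsider asking for xmlrpc.php
print(is_forbidden("123.123.123.123", "xmlrpc.php"))  # the allowed host
print(is_forbidden("10.0.0.1", "node/1"))             # outsider, ordinary page
```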


Filed under: blog drupal htaccess php python


Added an atom feed to keep whatever keeps trying to find one happy. I did this as follows:

  • Installed atom module. Zero documentation.
  • Enabled module in drupal
  • Added this to .htaccess file:
    RewriteRule atom.xml atom/feed/1
    

The link is also here.

Regarding any Atom v RSS wars, I'm probably on the RSS side as the tools I have used (Python Desktop Server, Drupal) have supported it by default and they work for me. Atom is for those Blogger.com instant boilerplate blog folk.



I noticed in my Drupal logs that google is looking for a file called rss.xml on my site:

09/10/2004 - 13:23  404 error: 'rss.xml' not found	Anonymous

I am eager to keep google happy but how to create such a file? I had a brainwave and added a mod_rewrite rule to my .htaccess file:

RewriteRule rss.xml blog/feed/1

So any attempts to access rss.xml trigger the link that my RSS buttons point to. Note that I have 'Clean Urls' turned on.
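
Roughly, the rule works like the Python 3 sketch below: if the pattern is found anywhere in the requested path, the whole path is swapped for the feed path. The function name rewrite and the sample paths are made up, and this ignores everything else Apache does (other rules, flags, query strings).

```python
import re

# Pattern and target taken from the RewriteRule above. The pattern is
# not anchored and its dot is not escaped, so it matches loosely.
PATTERN = re.compile(r"rss.xml")
TARGET = "blog/feed/1"

def rewrite(path):
    """Return the path the request would be mapped to."""
    if PATTERN.search(path):
        return TARGET  # the whole path is replaced, not just the match
    return path

# Hypothetical request paths for illustration.
for path in ["rss.xml", "node/1", "files/rss-xml-notes"]:
    print(path, "->", rewrite(path))
```

The last path shows the loose matching: it is rewritten too, because the unescaped dot matches the hyphen. Writing the pattern as ^rss\.xml$ would restrict the rule to exactly that file name.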

Come back google, I'm waiting for you!



Getting Drupal to run on Site5 was not entirely straightforward. I used the fantastico script thing to install it but I got 403 errors whenever I tried to access the site. The error told me that I could not access /index.php. This was resolved by putting the following in my .htaccess file:

Options ExecCGI

Then I was still getting a 403 error on the directory /. The error log said:

[Thu Sep  9 06:07:36 2004] [error] [client 80.88.204.40] Options FollowSymLinks or
SymLinksIfOwnerMatch is off which implies that RewriteRule directive is forbidden:
/home/bisiand/public_html/403.shtml

So I changed .htaccess as follows:

# Set some options
Options -Indexes
Options +ExecCGI
Options +FollowSymLinks
Options +SymLinksIfOwnerMatch

Drupal started working but I kept getting the following errors on each page:

warning: Cannot modify header information - headers already sent

This was because I had been editing the Drupal conf.php file using Site5's NetAdmin tool, and whenever I saved the file a blank line was added to the end. PHP was treating this as content to be output, which sends the headers before Drupal gets a chance to set them. I had to download the conf.php file, edit it in Vim to delete the blank lines and upload it again.
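
A quick way to spot this kind of problem is to look at what follows the final '?>' in the file. Here is a small Python 3 sketch; the function name trailing_output is made up, and it is deliberately naive (it just looks for the last '?>' in the text, so a '?>' inside a string in a file with no closing tag would fool it). It relies on the fact that PHP swallows a single newline directly after the closing tag, so only anything beyond that is sent to the browser.

```python
def trailing_output(php_source):
    """Return the text PHP would output after the final '?>', or '' if none."""
    end = php_source.rfind("?>")
    if end == -1:
        return ""  # no closing tag: nothing follows the code
    tail = php_source[end + 2:]
    # PHP eats one newline immediately after the closing tag.
    if tail.startswith("\r\n"):
        tail = tail[2:]
    elif tail.startswith("\n"):
        tail = tail[1:]
    return tail

# Hypothetical file contents for illustration.
clean = "<?php $a = 1;\n?>\n"
broken = "<?php $a = 1;\n?>\n\n"  # extra blank line added by an editor

print(repr(trailing_output(clean)))
print(repr(trailing_output(broken)))
```

Anything other than an empty string for a configuration file means output will start before the headers are set.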

Trying to modify the Drupal theme, I then got an error from marvin.theme about no base class to inherit from. To fix this I had to move the directories themes/marvin and themes/unconed to the /tmp directory to hide them. There may be a fix to get them going but I don't really care.

After this, everything was fine.

