Peter's Blog

Redefining the Impossible

Items filed under drupal


I've missed blogging while I've been working on PetersBlogger. My old drupal blog was limping along: while administering the site it kept logging me out. I can admit now that I was running an old version of drupal (4.7?) which I couldn't upgrade due to module dependencies (mainly awtags). The old code wasn't running that smoothly under php5 despite initial appearances.

I'm happy with my new blog platform: it's running really nicely on my slicehost slice and I've learnt an awful lot about rails while developing it. I will try to do a braindump of rails experiences in forthcoming posts (cross fingers).

Drupal is a fine CMS and I'm using Drupal 5 on my company intranet. My thoughts on a drupal vs ruby on rails debate would be:

  • drupal/php/apache is easier to deploy than a rails/ruby/mongrel/nginx deploy: I moved my drupal site between about four different hosts and did it each time in less than an hour. A rails setup is more convoluted (especially mongrel_cluster). However, this site is running much faster under rails than it was under drupal. The difference is probably down to php and apache being more mainstream and slightly more refined and hence easy to set up. I'm not saying rails is a total pain, just that if I only had ten minutes in which to deploy a site I would be reaching for drupal. If I only wanted to install the source and a few modules and never touch any coding then I would be happy with drupal.
  • I love ruby and rails development. There is no comparison: php seems to me as much a bastard language as visual basic 6. Ruby was cleanly developed as an object orientated language, php is having object orientation grafted on as an afterthought.
  • If I look through my drupal source I see no unit testing. Ruby/Rails has unit testing built in and I'm completely sold on it. I don't think I will ever trust any code (especially code I write) if it hasn't been unit tested. Some interesting aspects of unit testing:
    • I understand now the 'test driven' approach to development where you design a module's api by first roughing out how the tests will work: the unit test is your first experience of using the new api. It's during testing that you learn how nice an api will be.
    • If the documentation of a new ruby/rails tool is dubious, look in the unit tests to see how the author intended it to be used. If there are no tests then run.
    • If I run into a tricky bug, it's better to first reproduce it in unit tests, fix it there and then be happy that the bug will never recur. It's easier to debug code in a unit test (essentially a command line application) than on a live website.
  • all ruby on rails applications follow basically the same architecture. It is fairly easy to figure out how a new application works. Every php application is different. I know I'm comparing a web framework to a programming language but none of the php applications I use (drupal, phpmyadmin) or am familiar with (wordpress) share a common architecture. New php application to maintain? New learning curve.
  • my new site is being hammered with attempts to post to /comments/reply, even though it doesn't use that url for comment submission. The comment spammers know drupal and know where to test the locks. If any of them ever bother to try to find my comment submission url they will only find the same captcha that protected my drupal site (which was much easier to implement in rails, simply as a validation on the model).
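
As a rough illustration of the "reproduce the bug in a unit test first" habit described above, here is a minimal sketch. It is in python rather than ruby, and both the `make_slug` helper and the double-space bug are made up for the example.

```python
import unittest


def make_slug(title):
    """Hypothetical helper: turn a post title into a url-friendly slug."""
    # split() with no arguments collapses runs of whitespace, which is the
    # fix for the imagined bug where "a  b" came out as "a--b".
    return "-".join(title.lower().split())


class MakeSlugTest(unittest.TestCase):
    def test_simple_title(self):
        self.assertEqual(make_slug("Hello World"), "hello-world")

    def test_double_space_regression(self):
        # Written first, while the bug was still present, so it failed
        # until the fix went in. It now guards against the bug recurring.
        self.assertEqual(make_slug("Hello  World"), "hello-world")
```

Saved as a module, this can be run from the command line with `python -m unittest`, which is the "essentially a command line application" convenience mentioned above.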


This blog was offline for a while yesterday. One of the perils of shared hosting is that you have to share the mysql database with other people who tend to use up all the available connections etc and an admin has to sort it all out.

The blog was left broken and visitors were advised that the drupal sessions table was corrupted and it needed repairing. How nice of drupal to blab secrets like the name of the database. Repairing the database through the site5 interface didn't work, it didn't even list the sessions table in the results. Fixing it turned out to be quite simple, log into mysql and go:

REPAIR TABLE SESSIONS;

The sessions table had 25,000 entries in it! Since this is old session data and not at all essential (I have the only active account) I zapped it:

DELETE FROM SESSIONS;

Fixed.
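
A gentler alternative to zapping the whole table would be to delete only stale rows. The sketch below is one way to do that from python. It assumes the table is named `sessions` and has a unix-time `timestamp` column, and the `purge_old_sessions` helper is hypothetical; the cursor is expected to be a DB-API cursor such as the one MySQLdb provides.

```python
import time


def purge_old_sessions(cursor, max_age_days=30, now=None):
    """Delete session rows older than max_age_days; return the cutoff used."""
    if now is None:
        now = time.time()
    cutoff = int(now - max_age_days * 24 * 60 * 60)
    # Assumption: sessions.timestamp holds seconds since the epoch.
    # MySQLdb uses %s placeholders and does the quoting itself.
    cursor.execute("DELETE FROM sessions WHERE timestamp < %s", (cutoff,))
    return cutoff
```

Run from a nightly cron job, something like this should stop the table growing to tens of thousands of rows in the first place.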

Apart from server maintenance, problems with mysql seem to be the main reason for downtime on this blog. Should I move it across to my new server where all the connections are mine and mine alone? If I did I would probably want to make sure I was backing the blog up: I wouldn't want to lose three years and 43 weeks, 1480 articles, a detailed history of my World of Warcraft adventures and the internet's primary cuprin0l fence sprayer vitriol page. Site5 do automatic backups, backing up on the new server would be totally down to me. Fortunately I can back it up to... my site5 account!


Filed under: drupal mysql

3 Comments

There are three ways to blog from a pocketpc that I have found:

  1. open your blog in a web browser and edit it as normal. This doesn't work so well in practice unless your web page has been designed with a css style sheet that works nicely on a pda. I tried modifying my drupal theme accordingly and made it presentable but it's not up to data entry.
  2. try using an application for posting to blogs. There are a couple of these about but those that I tried were buggy or had tiny little edit boxes.
  3. write the article and submit by email. This is how I am posting this, I use phatnotes, a notetaking application that can send the notes via email. I don't use pocket outlook for the simple reason that it doesn't seem to save sent mail so I cannot edit the article and send it again to modify it. I use my old mailbot script on the server to capture posts and poke them into drupal.

So far this has worked ok and I can compose posts offline and upload them at my leisure.

Disadvantages:

  • No preview unless I go online, hence I keep posts simple.
  • cutting and pasting urls is fiddly and I cannot be bothered with it.
  • my script adds tags by scanning the database for existing tags, searching the post for the tag words and adding the tags that it finds. This can give irrelevant tags and I cannot define new tags. I need a neat way to specify tags in the post.
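
The tag scanning described in the last point can be sketched roughly as below. This is not the mailbot script itself: `find_tags` is a hypothetical helper, and it assumes the existing tags have already been read from the database into a list of single-word strings.

```python
import re


def find_tags(post_text, known_tags):
    """Return the known tags that appear as whole words in the post."""
    words = set(re.findall(r"[a-z0-9]+", post_text.lower()))
    return sorted(tag for tag in known_tags if tag.lower() in words)


# A passing mention is enough to trigger a tag, which is where the
# irrelevant tags come from.
print(find_tags("I am no python expert but my drupal theme works",
                ["drupal", "mysql", "python"]))
```

Because only tags already in the database are considered, there is no way to introduce a new tag from the post, which is the other limitation noted above.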

Advantages:

  • the convenience of whipping my pda out and having a quick blog. I don't have to go upstairs and get my laptop.
  • can lay on settee and blog in comfort.

2 Comments

What's the single most tedious thing about blogging: having to press the 'Add Blog Item ' button. I think I've hit on why taking notes with EverNote is fast: there is a blank note at the bottom of the screen that you can just start typing in.

I've realised that this could be done in a web app, simply by having a box on every page ready to start typing in. Google mail does something like this already: there is a little text box at the bottom of each message that you can select and start typing a reply: when selected it resizes itself and formatting toolbars and stuff appear. It's all very AJAXy and slick. Most of the time you can ignore it because it's not too big, once you explicitly start using it all the associated tools appear.

Compare to posting in drupal where there's all kinds of stuff filling the page: Input format, date edit boxes, categories, tags, I have to scroll down two pages to find the 'preview' and 'submit' buttons.

Here's a rough outline of the quick blogging features:

  • Regular blog page appears with a textarea box ready for typing in
  • Title is first line (if preceded with a - or something), then a list of tags line, each tag preceded with a dot or something similarly lightweight.
  • Rest of post is in wilki format.
  • Big button that uses AJAX to generate a preview which appears under the text box
  • After preview a post button appears.

Everything goes in the one textarea box, no need to tab or click between controls. Now I know what I want, how to implement it?
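
One possible starting point for the server side is a small parser that splits the single textarea into its parts. The sketch below assumes a "-" prefix for the title line and a "." prefix for each tag, as suggested in the outline; `parse_quick_post` is a hypothetical name and the body is passed through untouched for the wiki formatter to deal with.

```python
def parse_quick_post(text):
    """Split one textarea's worth of text into (title, tags, body)."""
    lines = text.splitlines()
    title = ""
    tags = []

    # Optional title: a first line starting with "-".
    if lines and lines[0].startswith("-"):
        title = lines.pop(0)[1:].strip()

    # Optional tags line: words each prefixed with a dot, e.g. ".drupal .blogging".
    if lines and lines[0].startswith("."):
        tags = [word[1:] for word in lines.pop(0).split()
                if word.startswith(".") and len(word) > 1]

    # Everything left over is the post body.
    body = "\n".join(lines).strip()
    return title, tags, body
```

A post with neither a title line nor a tags line comes back with an empty title and an empty tag list, so plain notes still work.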

I am growing disillusioned with EverNote, mainly because of the buggy handling of formatting: if you mark something bold, for example, it has an annoying habit of not turning the bold off, you have to fiddle around selecting a big block and turning it all off explicitly (very much like Microsoft Word). I'm happy to use markup to make things bold, it's simple and understandable.

I want all my notes on a server where I can get to them from anywhere.


Filed under: blogging drupal evernote wilki

8 Comments

I knocked up a quick python script to scan my drupal watchdog list for comment spammers. The log covers the last week. In total there were 1250 spam attempts from 448 distinct ip addresses.

All these comment spams pretend to come from Windows XP, IE 6 so they cannot be filtered out by user agent.

My hack to the comment module to prevent urls being submitted generates watchdog messages and this script looks for these.

Here is the script:

    import MySQLdb

    o = MySQLdb.connect( '127.0.0.1', 'me', 'secret')

    o.select_db( 'drupal_db')

    c = o.cursor()

    # Only the watchdog entries written by the comment module are of interest.
    c.execute( """select message, hostname from watchdog
                  where message like 'Comment:%'""")

    oBadGuys = {}    # hostname -> number of blocked attempts
    oGoodGuys = {}   # hostname -> number of comments accepted

    while 1:
        oRow = c.fetchone()
        if not oRow:
            break

        strMessage, strSender = oRow

        if strMessage.startswith( 'Comment: attempted'):
            oBadGuys[strSender] = oBadGuys.get( strSender, 0) + 1

        if strMessage.startswith( 'Comment: added'):
            oGoodGuys[strSender] = oGoodGuys.get( strSender, 0) + 1

    #
    # Good guys manage to submit comments without problems.
    # Remove them from the bad guy list.
    #
    for strKey in oGoodGuys.keys():
        if strKey in oBadGuys:
            print(strKey + ' is not so bad')
            del oBadGuys[strKey]

    nTotal = 0

    for strKey, nCount in oBadGuys.items():
        print("%s %d" % (strKey, nCount))
        nTotal += nCount

    print("%d spams from %d bad guys" % (nTotal, len(oBadGuys)))

I must get on with my turbogears based blog so I can do more about this. Drupal logging is a bit lame: doesn't log referrer or user agent which might be useful, have to cross reference with apache logs. There is more that I can do to make it harder to suck my bandwidth but my php is not strong enough and it's more fun to do in python (won't wear my $ key out).
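
The cross-referencing with the apache logs could look something like the sketch below. It assumes the server writes the standard "combined" log format, and `agents_by_host` is a hypothetical helper that takes the log lines plus the set of bad-guy addresses found by a script like the one above.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def agents_by_host(log_lines, hosts):
    """Map each host of interest to the set of user agents it sent."""
    found = {}
    for line in log_lines:
        match = COMBINED.match(line)
        if match and match.group("host") in hosts:
            found.setdefault(match.group("host"), set()).add(match.group("agent"))
    return found
```

Lines that do not match the pattern (for example requests containing escaped quotes) are simply skipped, which is good enough for spotting trends.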


Filed under: captcha drupal python spam


My site is being really hammered by comment spammers today but not one has got through thanks to my policy of refusing to allow comments containing urls to be submitted (not even for moderation: I moderate all comments, I found deleting comment spam to be tedious as well as annoying).

It is a simple hack to the drupal comment module but it is very effective. Ok, I could get spam without urls in but what's the point, apart from vandalism? They still go into the moderation queue and get deleted.

And when people do want to post urls they soon figure out how to get around the block. If they cannot do that then their comments are probably not worth consideration anyway.

I'd give the details of the hack here but it gives the spammers a clue. If you are interested then email me.

Update: following on from the vast surge in comment spam attempts (three or four a minute, 24/7), statcounter tells me people are searching for drupal captchas. I have given up on these, something about drupal states, redirects, session management or whatever stops them working reliably. The spam comment check is just part of the comment validation, there is nothing much that can go wrong with it, it is just straight if/then/else code.
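
To show how little is involved in a check of this kind, here is a generic sketch in python. It is not the actual hack to the comment module (which stays private): the pattern and the `validate_comment` name are assumptions made for illustration.

```python
import re

# Deliberately crude: anything that looks like a link or a bare www host.
URL_PATTERN = re.compile(r"(https?://|ftp://|www\.)", re.IGNORECASE)


def validate_comment(body):
    """Return an error message if the comment should be refused, else None."""
    if not body.strip():
        return "Comment is empty."
    if URL_PATTERN.search(body):
        return "Comments containing urls are not accepted."
    return None
```

A check like this holds no state and needs no session, so it avoids the moving parts that made the captcha modules unreliable.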

In a way the spam check is a captcha (Completely Automated Public Turing Test to Tell Computers and Humans Apart), you can still get through if you show some smarts. It doesn't use graphics so it doesn't look cool and it doesn't shut out blind people.

The comment spam is coming from a range of ip addresses, maybe an array of compromised pc's (thanks Microsoft). Each 'failure' page is using some of my 10G/month bandwidth. I'll have to keep an eye out and see what kind of impact this is having. It could be even worse than inktomi slurps bots doing 100M of crawling a month and not directing anyone here through their search results.


Filed under: captcha drupal spam

9 Comments

Had some users complain about drupal generating narrow little comment boxes in version 4.6.4. drupal was generating 'cols="70"' instead of 'cols="70"' in the html.

Traced it to the function include/common.inc/form_textarea which handles cols in a different way to rows and messes it up. Worse still, it looks as if someone has done this on purpose with of course no mention why. I changed the code so cols are handled in the same way as rows and I'll wait to see what has broken.

Here is my version, reformatted to avoid the last line being 413 columns long (what's that smell?).

    function form_textarea($title, $name, $value,
                           $cols, $rows, $description = NULL,
                           $attributes = NULL, $required = FALSE) {
    // pcw  $cols = $cols ? ' cols="'. $cols .'"' : '';
      $pre = '';
      $post = '';

      // optionally plug in a WYSIWYG editor
      foreach (module_list() as $module_name) {
        if (module_hook($module_name, 'textarea')) {
          $pre  .= module_invoke($module_name, 'textarea', 'pre', $name);
          $post .= module_invoke($module_name, 'textarea', 'post', $name);
        }
      }

      return theme('form_element',
                   $title,
                   $pre .'<textarea wrap="virtual"'
                       .' cols="' . check_plain($cols).'"'
                       .' rows="'. check_plain($rows)
                       .'" name="edit['. $name .']" id="edit-'
                       . $name
                       .'" class="'
                       . _form_get_class('textarea',
                                     $required,
                                     _form_get_error($name))
                       .'"'
                       . drupal_attributes($attributes)
                       .'>'
                       . check_plain($value)
                       .'</textarea>'
                       . $post,
                   $description,
                   'edit-' . $name,
                   $required,
                   _form_get_error($name));
    }

Filed under: drupal


Since I started with statcounter very nearly a year ago, it tells me my site has served up 100,214 pages to 72,521 visitors.

I have quickly become bored with Google Analytics: I think its appeal is more for marketing departments and powerpoint/pie chart enthusiasts. I like the raw detail that statcounter gives me in an easily accessible way. I can see what people are searching for and sometimes it even inspires me to update articles to be more useful.

I still think their counting is buggy: returning visitors seems unreliable, as if page refreshing counts as a return visit, so I take it with a pinch of salt and follow the trends. The numbers are roughly on a par with google analytics.

Statcounter is easier to install: it can go anywhere on the page, whereas Google Analytics has to go in the header. For drupal this means editing the page template rather than sticking it in a block.


Filed under: drupal google statcounter

5 Comments

Been looking at Mochikit, a javascript utility library, and it's good enough to remove my cynicism about javascript.

I had to play with the AJAX stuff, the ability for javascript in a browser to send a request to a server and the server to respond with data for the javascript to poke into the existing page. The big advantage of this is that the server does not have to send back a whole page for display, so the overall effect is smoother and faster.

Mochikit makes this easier and hides some of the problems with different browsers (i.e. bugs in IE). It won't work on old broken browsers but there are enough modern browsers going for free that I don't really care.

I've put my example online here but I may remove it if it gets abused or I may rearrange things and break it. In this example you type something in the box and click the link. Whatever you type is sent to the server, processed and sent back. The browser then puts the new version at the bottom of the page. Not a big deal but it's not what it does, it's the way that it does it.

Here is the code.

First, the web page:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html>
        <head>
            <title>Peter plays with Mochikit</title>
            <script type="text/javascript" src="MochiKit.js"></script>
            <script type="text/javascript" src="Peter.js"></script>
        </head>
        <body>
            <h1>
                Peter plays with Mochikit
            </h1>
            <form name="Enter Stuff">
                <label>Enter something:</label>
                <input id="blah" name="user" value="fred"/>
            </form>
            <a href="javascript:void(0)" onclick="onDoit()">Click Me</a>
            <div id="putithere"></div>
        </body>
    </html>

Then the javascript (Peter.js):

    // Called with the server's response: parse the JSON and show it on the page.
    var gotMetadata = function (oData) {
        var payload = evalJSONRequest( oData);
        replaceChildNodes( "putithere", P( null, payload.what));
    };

    var metadataFetchFailed = function (err) {
        alert( "The metadata for MochiKit.Async could not be fetched");
    };

    function onDoit() {
        var xmlHttpReq = getXMLHttpRequest();

        xmlHttpReq.open( "POST", "Peter.py", true);
        xmlHttpReq.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
        // encodeURIComponent, unlike escape, also encodes "+" so it survives form decoding.
        var d = sendXMLHttpRequest( xmlHttpReq, "blah=" + encodeURIComponent(getElement( "blah").value));
        d.addCallbacks(gotMetadata, metadataFetchFailed);
    }

Then the python cgi that runs on the server (Peter.py):

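    #!/usr/bin/python

    import cgi

    form = cgi.FieldStorage()
    strBlah = form.getfirst( "blah", "")

    # A CGI response is a header block, then a blank line, then the body.
    print("Content-Type: text/plain")
    print("")

    # Escape backslashes first, then double quotes, so the value stays
    # inside the JSON string.
    print("""{ "what": "It says %s" }"""
          % strBlah.replace( '\\', '\\\\').replace( '"', '\\"'))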

This also requires the packed Mochikit library which is available in the Mochikit distribution (/packed/Mochikit/Mochikit.js). All the files can go in the same directory on the server. Obviously you have to get cgi working with python.

My main motivation for using this is to see if I can optimise the preview facility in this blog: I would like more-or-less instant preview generation. This is definitely achievable, the only problem being getting mochikit to work with php rather than python, that's where I am in unknown territory. I might cheat and use xml-rpc from python to drupal.

The only downside of Mochikit that I can see is that it is a 96k download which makes dial-up users suffer even more. You have my sympathy.


Filed under: drupal mochikit python

6 Comments

If my rss subscribers are getting a flood of seemingly duplicate postings it is because I decided to reformat my posts a tad to emphasise the awtags links below the node bodies. I edited the awtags source to change the word 'tags:' to the more informative 'Related Topics:' and I edited the awTags_TagLinks css style to delineate the links from the node body:

    .awTags_TagLinks {
        padding: 5px;
        margin: 20px 10px 10px 10px;
        border-top: 1px solid black;
    }

    .sticky .awTags_TagLinks {
        visibility: hidden;
        display: inline;
    }

This also hides awtags from sticky nodes which I use at the top of tag descriptions.


Filed under: awtags blogging drupal rss