Peter's Blog

Redefining the Impossible

Items filed under python


I have a two-and-a-half-year-old python/turbogears web application that is fairly mission critical yet thoroughly software-rotted. It was built with a delicate mix of python eggs running on python 2.4 and ubuntu hardy. Any attempt to upgrade any part of it yielded obscure errors.

A recent disk failure on another server had me worrying about this state of affairs: how would I migrate it to a new server? Then an idea hit me: copy all the python2.4 site-packages into the application's own directory on the new server and set its path up to use them, rather than try to install the correct hotch-potch of system libraries.

This didn't work immediately, as the site-packages directory was full of broken symbolic links to the python setuptools and mysql packages. Installing these from ubuntu packages didn't work because the installation directories had changed between ubuntu hardy and jaunty.

In the end I avoided the need for setuptools by unpacking all the python eggs, including the mysql stuff, into bare sets of python source. This worked surprisingly well: problem solved.
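A minimal sketch of that unpack-and-bundle approach, written for a modern Python 3 rather than 2.4 (zipfile's extractall and the with statement arrived later). It assumes the eggs are zipped .egg files collected in one directory; the function names and directory layout are made up for illustration, and unzipped (directory) eggs would simply be copied instead.

```python
import os
import sys
import zipfile


def unpack_eggs(egg_dir, lib_dir):
    """Extract every zipped .egg in egg_dir into lib_dir as plain source.

    Egg metadata under EGG-INFO/ is skipped, leaving only the importable
    packages and modules. Returns the names of the eggs unpacked.
    """
    if not os.path.isdir(lib_dir):
        os.makedirs(lib_dir)
    unpacked = []
    for name in sorted(os.listdir(egg_dir)):
        path = os.path.join(egg_dir, name)
        # Directory eggs are not zip files; they can just be copied instead
        if not (name.endswith(".egg") and zipfile.is_zipfile(path)):
            continue
        with zipfile.ZipFile(path) as egg:
            members = [m for m in egg.namelist() if not m.startswith("EGG-INFO/")]
            egg.extractall(lib_dir, members)
        unpacked.append(name)
    return unpacked


def use_bundled_libs(lib_dir):
    """Put the application's own library directory ahead of site-packages."""
    lib_dir = os.path.abspath(lib_dir)
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
```

The application's start-up script would call use_bundled_libs() before importing anything else, so the bundled copies win over whatever happens to be installed on the server.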

Now I had a working blob of code and libraries that only needed python2.4 and its standard library to work. I quickly shoved it into git in all its functional glory.

The experience made me appreciate the concept of freezing gems with a rails app:

  • everything goes into git with minimal external dependencies
  • no worry that (for example) the gem for a specific version of mysql is no longer available anywhere when redeploying (the sqlite gem on windows is precarious).
  • quicker redeploy on a clean server: no run server/get missing gem error/install gem/ run server... loop.

I find that every rails update tends to break at least one thing, and I only discover the breakage when I am doing a panicky server rebuild.


Filed under: python ruby


I blogged long ago about validating users against a windows domain server using winbind. The users were logging into a web application running on a linux box and winbind allowed their main windows passwords to be used.

Well, two problems have come up since those distant days:

1) Through two upgrades of the linux installation (dapper drake->feisty fawn) the python winbind module has become broken when trying to use python version 2.4. It may work with the latest python 2.5 but I am using an old version of turbogears that relies on many old libraries, not all of which are still available in version 2.4 builds. I've tried rebuilding stuff but there seems to be a deep dependency problem between the python module and the winbind library.

2) I want to use the same trick in ruby but I cannot find a winbind gem.

Solution: shell out to a command line tool:

Python version:

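import os

# os.system returns the command's exit status, and ntlm_auth exits with 0
# when the credentials are accepted, so success means "== 0"
bOk = os.system( 'ntlm_auth --username="%s" --password="%s"' % (strUserName.strip(), strPassword.strip())) == 0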

Ruby version:

bOk = system( "ntlm_auth --username=\"#{strUserName.strip}\" --password=\"#{strPassword.strip}\"")

This is slightly horrible, not just because it runs a shell but also because the password is passed on the command line, where other users of the machine may be able to see it (in ps output, for example).

This seems to work from within a fastcgi-based web application. I'm not sure mod_python would allow system calls, but I think I prefer fastcgi anyway as it's less flaky.

UPDATE: it is slightly more horrible because it doesn't escape special characters in the username or password: for example, it is broken if the password contains an ampersand.
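One way round the escaping problem, sketched for a modern Python 3: pass the arguments to subprocess as a list, so no shell is involved and quotes, ampersands and dollar signs in the password reach ntlm_auth untouched. The function names are my own, and like the ruby version it relies on ntlm_auth exiting with status 0 on success. Note that the password still appears on ntlm_auth's command line, where other local users may be able to see it.

```python
import subprocess


def ntlm_auth_argv(user_name, password):
    """Build the ntlm_auth command as an argument list (no shell involved)."""
    return [
        "ntlm_auth",
        "--username=" + user_name.strip(),
        "--password=" + password.strip(),
    ]


def check_domain_password(user_name, password):
    """Return True if ntlm_auth accepts the credentials (exit status 0)."""
    try:
        return subprocess.call(ntlm_auth_argv(user_name, password)) == 0
    except OSError:
        # ntlm_auth is not installed or not on the PATH
        return False
```

Because each list element is handed to ntlm_auth as a separate argument, a password containing an ampersand, a quote or a dollar sign arrives exactly as typed.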


Filed under: linux python ruby


Not sure any of this is recommended practice, but it appears to work so I'm noting it here.

Overriding Array Methods

class MyArray
  #
  # Initialise array
  #
  def initialize
    @zog = []
  end

  #
  # Read from array
  #
  def [](x)
    return @zog[x]
  end

  #
  # Assign to array
  #
  def []=(x,y)
    @zog[x] = y
  end
end

Used thus:

irb(main):021:0* a = MyArray.new
=> #<MyArray:0x30d8074 @zog=[]>
irb(main):024:0> a[0] = 1
=> 1
irb(main):025:0> a[0]
=> 1
irb(main):026:0> a[1] = 2
=> 2
irb(main):027:0> a[0]
=> 1
irb(main):028:0> a[1]
=> 2
irb(main):029:0>

Interesting that ruby doesn't seem to be fussy about array indices:

irb(main):031:0* g = []
=> []
irb(main):032:0> g[5] = 2
=> 2
irb(main):033:0> g[2]
=> nil

Contrast to python:

>>> a = []
>>> a[5] = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range

I think if I had the choice I would rather have my wrist slapped: the ruby way looks like an opportunity for obscure bugs.
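For comparison, a sketch of the same MyArray class in python, where [] and []= become __getitem__ and __setitem__. This version opts in to ruby's padding on assignment (filling the gap with None) but keeps the wrist-slapping IndexError for reads past the end; it only handles integer indices.

```python
class MyArray(object):
    """Python version of the ruby MyArray: [] and []= map to dunder methods."""

    def __init__(self):
        self._zog = []

    def __getitem__(self, index):
        # Reads past the end still raise IndexError, as for a plain list
        return self._zog[index]

    def __setitem__(self, index, value):
        # Integer indices only. Assigning past the end pads with None first,
        # mimicking ruby; negative indices are left to the list to check.
        if index >= len(self._zog):
            self._zog.extend([None] * (index + 1 - len(self._zog)))
        self._zog[index] = value

    def __len__(self):
        return len(self._zog)


a = MyArray()
a[0] = 1
a[5] = 2
print(a[2])    # None
print(len(a))  # 6
```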

Method Missing

It took me ages to work this out but it's simple. You have a class and you want to hook into calls to any undefined methods. Maybe you want to determine the method names at runtime. The hook is called 'method_missing':

class Klack
  def method_missing( strName, *args)
    print "Calling undefined method #{strName} with arguments #{args.inspect}"
  end
end

irb(main):054:0> k = Klack.new
=> #<Klack:0x309dd84>
irb(main):055:0> k.wibble( :wobbles, :banana)
Calling undefined method wibble with arguments [:wobbles, :banana]=> nil

args appears to be a simple array holding the arguments the method was called with.

Note how useful the .inspect method is. It's the equivalent of repr in python: it turns just about anything into a human-readable string. Also, gotta love how easy it is to insert the values of variables into strings.

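UPDATE: this does seem to make for very fragile code. Just about any problem in your method_missing method can lead to a stack overflow, presumably because calling an undefined method (a typo, say) from inside method_missing triggers method_missing all over again. Or maybe it's a rails problem?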
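The nearest python equivalent of method_missing is __getattr__, which is called only when normal attribute lookup fails. A sketch: python intercepts the attribute lookup rather than the call, so __getattr__ has to return a function, and this version returns the message instead of printing it. It shares the fragility: touching an undefined attribute of self inside __getattr__ calls __getattr__ again, and the recursion ends in a RecursionError.

```python
class Klack(object):
    """Python analogue of ruby's method_missing, using __getattr__."""

    def __getattr__(self, name):
        # Only called when normal lookup fails, so real attributes and
        # methods are unaffected
        if name.startswith("__"):
            # Leave special names alone: copy, pickle and friends probe for them
            raise AttributeError(name)

        def missing(*args):
            return "Calling undefined method %s with arguments %r" % (name, args)

        return missing


k = Klack()
print(k.wibble("wobbles", "banana"))
```

One side effect worth knowing: hasattr(k, "anything") is now true for every ordinary name.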

Wots the colon?

The colon thing is handy:

b = :blah

:blah is a symbol. It's a bit like a string constant: easier to type than a string, and less complex in its implementation. You have to be careful with them though:

irb(main):058:0> b = :blah
=> :blah
irb(main):059:0> b[1]
NoMethodError: undefined method `[]' for :blah:Symbol
        from (irb):59
irb(main):060:0> b == 'blah'
=> false
irb(main):061:0> b == :blah
=> true
irb(main):062:0> b.to_s == 'blah'
=> true

They don't have the common string methods, and :blah is NOT equal to 'blah'.

Regular Expression Gotcha

The =~ operator for testing a string against a regular expression returns the offset of the match.

irb(main):086:0> o = "poop1" =~ /poop(\d+)(=?)/
=> 0

In this case the result is zero as the match starts at character offset zero. The operator would return nil if there was no match.

If you want access to the expression match object then this does the job:

irb(main):088:0> o = /poop(\d+)(=?)/.match( "poop1")
=> #<MatchData:0x3085748>
irb(main):089:0> o[1]
=> "1"
irb(main):090:0>

o is the match object, where you can examine the juicy details of the match. These are also available in global variables such as $1, $~ etc, but everyone knows that using global variables is sloppy.
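In python the re module always hands back the match object (or None), and the offset that ruby's =~ returns is available from it. A short sketch using the same pattern:

```python
import re

# re.search returns a match object, or None if there is no match.
# .start() is the offset ruby's =~ returns; .group(n) plays the role of $n.
match = re.search(r"poop(\d+)(=?)", "poop1")
if match is not None:
    print(match.start())   # 0
    print(match.group(1))  # 1
```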

Zero is True

The =~ result works nicely with an if statement to detect a match because of the strangest ruby design decision: zero is true:

irb(main):095:0> if 0
irb(main):096:1>  print 'true'
irb(main):097:1> else
irb(main):098:1*  print 'false'
irb(main):099:1> end
true=> nil

nil is false but 0 is true.

This is totally at odds with python, where 0 and None are both False (with a capital F). I think even BASIC treats 0 as false.
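Python's truthiness rules are as described above, but python has its own offset trap running the other way: str.find returns 0 for a match at the start and -1, which is true, for no match at all. A sketch showing why an explicit test is safer:

```python
# 0, None, "" and empty containers are all false in python
print(bool(0), bool(None), bool(""), bool([]))  # False False False False

# str.find returns the offset of the match: 0 when the match is at the
# start, and -1 (which is true!) when there is no match at all
text = "poop1"
print(text.find("poop"))  # 0
print(text.find("zzz"))   # -1

# So compare explicitly rather than relying on truthiness...
if text.find("poop") != -1:
    print("found")
# ...or, more idiomatically, use the in operator
if "poop" in text:
    print("found")
```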

Processing Arrays

The collect method looks useful for writing one-liners:

irb(main):103:0> [1,2,3].collect { |x| x + 1}
=> [2, 3, 4]

Each item in the array is processed by the block and the results returned in a new array. To me this is more readable than the python version:

[ i + 1 for i in [1,2,3]]

Filed under: noob python ruby


So how does one go about mining the information in the World of Warcraft Armory? Inspired by applications like the Warcraft Signature Generator I had to find out.

Here is a python script to interrogate the armory. This will download an xml document containing all the information about a character.

# Interrogate wow armory

import urllib2
import xml.dom.minidom

#
# Character we are nosing into
#
strRealm = 'Aerie Peak'
strCharacter = 'Pookypoo'

#
# Look the data up in the european armory. Change this for US.
# (Parentheses let the expression continue onto the next line.)
#
strUrl = ("http://armory.wow-europe.com/character-sheet.xml?r=%s&n=%s" %
          (strRealm.replace( ' ', '+'), strCharacter))

#
# Open url.
# Need to specify firefox as user agent as this makes the server return an XML file.
# If this is not done we get html.
#
oOpener = urllib2.build_opener()
oOpener.addheaders = [
   ('user-agent',
    'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-GB; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4'),
   ]

req = urllib2.Request( strUrl)
strFile = oOpener.open(req).read()

#
# Now have raw xml file, can print it if interested
#
# print strFile

#
# Use xml dom to parse the file
#
oDoc = xml.dom.minidom.parseString( strFile)

#
# Quick hack to display certain attributes of certain elements.
#
strAttributes = (( 'character', 'level'),
                 ( 'character', 'guildName'),
                 ( 'baseStats', 'agility'))

for strElement, strAttribute in strAttributes:
    oElement = oDoc.getElementsByTagName( strElement)[0]
    strValue = oElement.getAttribute( strAttribute)
    print strElement, strAttribute, strValue

This particular script will display the character's level, guild name and agility. There is much more information in there; it's just a matter of finding a use for it. I might get around to automatically updating the character levels shown on this site, as the sig generator that inspired this gets hammered a lot and is not totally reliable. UPDATE: not criticising here, it's just a very popular site with heavy loading. The tools there are very nice and more than my imagination could come up with.
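The parsing half of the script can be tried offline. A sketch for Python 3 using the same minidom calls on a small made-up document: the sample XML only mimics the element and attribute names the script asks for and is not a real armory response, and the ('talents', 'spec') entry is a hypothetical extra. Unlike the script, it skips missing elements rather than failing with an IndexError.

```python
import xml.dom.minidom

# Made-up sample: it only mimics the element and attribute names the
# script asks for, it is not a real armory response
SAMPLE = """<?xml version="1.0"?>
<page>
  <characterInfo>
    <character name="Pookypoo" level="70" guildName="Example Guild"/>
    <baseStats agility="150"/>
  </characterInfo>
</page>"""

WANTED = (('character', 'level'),
          ('character', 'guildName'),
          ('baseStats', 'agility'),
          ('talents', 'spec'))  # not in the sample, so it is skipped


def read_attributes(xml_text, wanted):
    """Return {(element, attribute): value} using the first matching element."""
    doc = xml.dom.minidom.parseString(xml_text)
    found = {}
    for element_name, attribute_name in wanted:
        elements = doc.getElementsByTagName(element_name)
        if elements:  # skip missing elements instead of raising IndexError
            found[(element_name, attribute_name)] = elements[0].getAttribute(attribute_name)
    return found


for (element_name, attribute_name), value in sorted(read_attributes(SAMPLE, WANTED).items()):
    print(element_name, attribute_name, value)
```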


Filed under: games python warcraft wow


Right now I'm working in Python yet I am already missing Ruby Blocks.

What I am doing is sending commands to firmware and processing the results. I have common code like this:

def DoCommand():
   try:
      SendCommand()
      oResults = GetResults()
      #
      # Process the results
      #
   except:
      ProcessError()

Different types of command yield different results and I have to duplicate all this with different result handling code. In ruby I could do it like this:

def DoCommand()
   begin
      SendCommand()
      oResults = GetResults()
      #
      # Process the results
      #
      yield( oResults)
   rescue
      ProcessError()
   end
end

and call it like this

DoCommand() { |oResults| print oResults }

i.e. I pass the result handling code as a block and the block is executed within the exception handler.

There are many other ways I could structure this but using blocks seems like a nice efficient way to do it using a minimum of typing. For example, in python I could do it like this:

def DoCommand( oResultHandler):
   try:
      SendCommand()
      oResults = GetResults()
      #
      # Process the results
      #
      oResultHandler( oResults)
   except:
      ProcessError()

def HandleOneKindOfResult( oResults):
  print oResults

DoCommand( HandleOneKindOfResult)

I have to define multiple functions like HandleOneKindOfResult and think up names for them, whereas in ruby these are just nameless blocks.

So why am I doing what I am doing in python and not ruby? Two reasons:

  • short deadline
  • I am using wxPython, pyserial and py2exe and don't know of, or have experience with, any ruby equivalents.

Filed under: python ruby


After the short-lived Ruby on Rails Daily News theme died after three episodes I have been quiet on the subject. I have used Ruby on Rails for an Intranet application and I find I really like it. The system has a great depth of design, as if every little thing has been thought about, yet it is all done in a very simple way. Need a new static page on your website? Just create a .rhtml file for it, give the file the right name and it will be there on your site, no wiring required. Modifying your database schema? Want to add a column to a table? Create a migration that will both add the column and delete it if you change your mind. Once the migration is loaded the column is there in your mysql, sqlite or whatever database and also available in your object model.

The application I have built has made the users happy but needed what seems like a tiny amount of code. I haven't used the Ruby language much but you do seem to be able to express yourself more succinctly than you can in python, mainly due to the use of blocks.

My main reason for picking up Ruby-on-Rails apart from it being mainstream and commercial was that I can buy books about it and the Pragmatic Programmers books are very good, both "Agile Web Development with Ruby on Rails" and "Programming Ruby".

While the python web development platforms Django and Turbogears have no mature blogging packages ready (I can only find various "works in progress") Ruby on Rails has at least two, Typo and Mephisto. Typo comes with InstantRails and Mephisto appears to have a very small codebase given what it does. Both could be the basis for a CMS and I am very tempted to go this route since the Ruby-on-Rails platform is robust and well documented and the Ruby language is at least as good as Python.

Summary: Ruby on Rails is worth learning ruby for.


Filed under: python rails ruby


I don't normally put GUIs on my python scripts, but I am doing something for a client and cannot rely on them using the command line. All my script needs is the name of a directory to work with.

The following is a distillation of how to show the dialog for browsing for a directory under python and the win32com module.

from win32com.shell import shell

oNeedlesslyComplex = shell.SHBrowseForFolder(0, # parent HWND
                                None, # root PIDL.
                                "Choose directory to convert", # dialog title
                                0, # flags
                                None, # callback function
                                None) # 'data' param for the callback

if oNeedlesslyComplex[0] is None:
    #
    # Cancel pressed: there is no folder to convert.
    #
    strPath = None
else:
    #
    # Get selected folder from weird return value.
    #
    strPath = shell.SHGetPathFromIDList(oNeedlesslyComplex[0])

win32com is powerful but little documented.


Filed under: python windows


Been doing some python programming. Since my hassles I am now getting to grips with python 2.5, and the first thing I did required a fast database, so I reached for sqlite. Looking on the pysqlite website, there was no build for python 2.5. Odd. On a hunch (something half remembered) I simply tried it from the python 2.5 console:

import sqlite3

and there it was: it comes with the python standard library. Cool. The interface follows the standard python DB-API, so it's easy for me to use.

import sqlite3

oDB = sqlite3.connect( "c:\\desktop\\my database")
oCursor = oDB.cursor()
# 'table' is an SQL keyword, so a real table name is needed here
oCursor.execute( "SELECT * FROM mytable")
while 1:
    oRow = oCursor.fetchone()
    if not oRow:
        break
    print oRow

It's fast, the job I am doing involves creating tens of thousands of records and sqlite is managing about 60,000 in a minute (1.1GHz AMD cpu). This is writing to disk files, creating the database in memory wasn't much faster (probably because my pc has only 512M and is swapping).
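The post doesn't say how the records were inserted, but a common way to keep sqlite fast for bulk loads is executemany inside a single transaction, so there is one commit rather than one per row. A sketch for Python 3, with a made-up people table:

```python
import sqlite3


def bulk_insert(db_path, rows):
    """Insert (firstname, surname) rows in one transaction; return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS people (firstname TEXT, surname TEXT)")
        # executemany reuses one prepared statement for every row, and the
        # 'with' block commits once at the end rather than once per row
        with conn:
            conn.executemany("INSERT INTO people VALUES (?, ?)", rows)
        return conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
    finally:
        conn.close()


count = bulk_insert(":memory:", (("Fred%d" % i, "Bloggs") for i in range(60000)))
print(count)  # 60000
```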

Sqlite is a straight database library: python is calling it directly via function calls. There is no inter-process communication, no COM overhead and no network overhead; this is pretty much as fast as databases get.

It is still SQL, so there is still the overhead of parsing the SQL before the database can execute it. Something I am getting into is this syntax:

strFirstName = "Fred"
strSurname = "Bloggs"

oCursor.execute( """SELECT * FROM mytable
                    WHERE firstname = ? AND surname = ?""",
                   (strFirstName, strSurname))

The ?s get replaced with the parameters in the tuple, in order. This is great as it saves a lot of messing around with quoting strings and escapes and whatever, and it makes you more resistant to sql injection attacks, should that worry you at all. It should also be faster than building sql strings holding the parameter values, passing them to sqlite and having it parse them out again. Everybody wins.
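A self-contained sketch of the placeholder style against an in-memory database (the people table and its rows are invented for the example). The two conditions are joined with AND, and neither the apostrophe in O'Brien nor an injection attempt needs any escaping, because the values never become part of the SQL text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (firstname TEXT, surname TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Fred", "Bloggs"), ("Seamus", "O'Brien")])

# Each ? is filled from the tuple in order
rows = conn.execute("SELECT firstname FROM people WHERE surname = ? AND firstname = ?",
                    ("O'Brien", "Seamus")).fetchall()
print(rows)  # [('Seamus',)]

# A classic injection string is treated as an ordinary (non-matching) value
sneaky = conn.execute("SELECT firstname FROM people WHERE surname = ?",
                      ("x' OR '1'='1",)).fetchall()
print(sneaky)  # []
```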


Filed under: python


Just found out that my new site5 server has python 2.4.3. This is Good News. Last time I had a site5 server it was running python 2.2, which was getting a bit ancient. Generators, which have been widely adopted, became a standard feature in 2.3 (in 2.2 they needed a __future__ import). The latest version is 2.5, which is (flame bait) not exactly a must-have; >= 2.3 will do.

Another job for my todo list is to see if I can get django or turbogears running through cgi, but I am not optimistic.

Should I get desperate, site5 offer ruby-on-rails hosting but that would involve selling my soul in a manner akin to adopting asp.net


Filed under: django python site5 turbogears


Needed to recover a password from my FileZilla settings. It transpires that this is not very difficult. The passwords are not strongly encoded which some regard as a security flaw but the developers seem to acknowledge that if you want security, don't trust a computer.

#
# Dump filezilla site manager, including account name, host, user and password.
#
import _winreg

def DecodePassword( strPass):
    """Decode a filezilla password"""
    strKey = "FILEZILLA1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"

    nPassLen = len(strPass) // 3
    nOffset = nPassLen % len(strKey)

    strDecodedPass = ""

    for i in range(nPassLen):
        c = int(strPass[i * 3:(i * 3) + 3])
        c2 = ord(strKey[(i + nOffset) % len(strKey)])
        c3 = chr((c ^ c2))

        strDecodedPass += c3

    return strDecodedPass

#
# Walk through registry, decoding site details.
#
oReg = _winreg.ConnectRegistry( None, _winreg.HKEY_CURRENT_USER)
oLicenceKey = _winreg.OpenKey( oReg, r'SOFTWARE\FileZilla\Site Manager')

nIndex = 0
while 1:
    try:
        strSite = _winreg.EnumKey( oLicenceKey, nIndex)
    except EnvironmentError:
        break

    oSiteKey = _winreg.OpenKey( oLicenceKey, strSite)
    strHost = _winreg.QueryValueEx( oSiteKey, u'Host')[0].encode( 'ascii')
    strUser = _winreg.QueryValueEx( oSiteKey, u'User')[0].encode( 'ascii')
    strPassword = DecodePassword( _winreg.QueryValueEx( oSiteKey, u'Pass')[0].encode( 'ascii'))

    print strSite, strHost, strUser, strPassword

    nIndex += 1
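Because the scheme is just XOR against a fixed key, it is trivially reversible. The sketch below pairs a Python 3 copy of DecodePassword with an encoder I derived by inverting it (not taken from the FileZilla source; it assumes ASCII passwords) and checks the round trip:

```python
KEY = "FILEZILLA1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def decode_password(encoded):
    """Python 3 version of DecodePassword above."""
    length = len(encoded) // 3
    offset = length % len(KEY)
    chars = []
    for i in range(length):
        code = int(encoded[i * 3:i * 3 + 3])
        chars.append(chr(code ^ ord(KEY[(i + offset) % len(KEY)])))
    return "".join(chars)


def encode_password(password):
    """Inverse of decode_password, derived from it (assumes ASCII input).

    Each character is XORed with a key character and written as three
    decimal digits; the key offset depends on the password length.
    """
    offset = len(password) % len(KEY)
    return "".join(
        "%03d" % (ord(ch) ^ ord(KEY[(i + offset) % len(KEY)]))
        for i, ch in enumerate(password)
    )


print(encode_password("a"))                        # 040
print(decode_password(encode_password("secret")))  # secret
```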

Filed under: filezilla python


I still want to set up a system that will automatically record tv programs for playback on my pocketpc. For various reasons this is becoming more important: in a nutshell, I can only take so much ITV.

Been trying to set up mythtv on a new pc I have acquired. I have tried various windows PVR packages in the past but none of them were attractive, being flaky and annoying (including meedio, which is now available for free as yahoo go). Mythtv was attractive because of its flexibility.

I did a clean kubuntu install on the box, which went quite smoothly, and after some exploration I established that the kernel already supported my hauppauge nova-t usb. I found an application called kaffeine, already installed, which was able to display tv, and this essentially Just Worked out-of-the-box, albeit with lip-sync issues (probably because it needs the proprietary nvidia drivers).

Still desiring mythtv, I found an ubuntu repository that has mythtv packages for version 0.18 and installed that. The ubuntu package readme says something to the effect that whoever set it up had no experience with mythtv and didn't know what he was doing. Thanks for the warning. I set it up using the mythtv-setup application, started the mythtv backend and tried to start mythweb. I am mainly interested in mythweb as a way of remotely scheduling recordings; I care not about using the myth frontend.

Mythweb wouldn't work: it complained about being a different version to the backend. The php code appears to display this error if there are any problems communicating with the backend, but I decided I wouldn't mess around getting an old version to work; I would build the latest, 0.19. After a few hours of installing dependent packages it built, but when I tried running it I got a segmentation fault.

At this point I gave up on mythtv: I just don't have the time to nurse it into life. mythtv seems bloated and fragile. There is the option of knoppmyth, a dedicated mythtv distribution, but I'm not sure how cutting edge it is, whether it is any good as a general-purpose linux distribution, or whether its kernel will support my tv card without my having to fiddle with compiling it.

I had a brief look at freevo, but sourceforge was down (what an advert for oss). I did find out that freevo uses command-line tools to do the recording, so I am currently investigating that approach: knocking up simple python scripts to do just what I want. I would rather debug these than mythtv (hell is other people's source code).

Incidentally, I was browsing through some ruby source yesterday and for a few files there I was wondering whether ruby had a comment character.



Wing IDE 2.1 beta is out and... it has vi keystrokes! A python IDE with vi keystrokes: it just doesn't get any better.

Since I started using Wing IDE I have not looked back. Annoyances:

  • Even on a >1GHz processor it feels sluggish
  • It has an indentation analysis feature that examines the indentation in a file. Copy some dubiously indented code from somewhere, paste it into your beautifully crafted code, and it decides to reindent your code with a tabstop of 8 or something daft.
  • ugly gtk user interface
  • when editing anything other than python the tab key does nothing and I have to press ctrl-shift-dot to indent. Still, with vi keystrokes >> seems to work.

But for python development, even django and turbogears, I wouldn't be without it.


Filed under: python vim wingide


Was moved to try indexing the company Intranet using Google Desktop Search. I downloaded the kongulo plugin which offers to do this.

It turns out that this is a command-line python program that scrapes a web site for links and submits each one to the google desktop indexing engine.

Well, it was broken: it kept coming up with the error:

pywintypes.com_error: (-2147352567, 'Exception occurred.',
(0, 'GoogleDesktopSearch.EventFactory.1', 'Component not registered',
None, 0, -2147221502), None)

Going through the developer SDK, this appears to be because the APIs have changed and no-one has bothered to update kongulo.

I changed the registration code at the end of kongulo.py as follows to fix this:

  try:
    # Register with GDS.  This is a one-time operation and will return an
    # error if already registered.  We cheat and just catch the error and
    # do nothing.
#    obj.RegisterComponent(_GUID,
    hr = obj.StartComponentRegistration( _GUID,
             ['Title', 'Kongulo', 'Description', 'A simple web spider that '
              'lets you keep copies of web sites in your Google Desktop Search '
              'index.', 'Icon', '%SystemRoot%\system32\SHELL32.dll,134'])

    oInt = obj.GetRegistrationInterface( "GoogleDesktop.EventRegistration")
    hr = oInt.RegisterPlugin( _GUID)

    oInt = obj.GetRegistrationInterface( "GoogleDesktop.IndexingRegistration")
    hr = oInt.RegisterIndexingPlugin( _GUID)

    oErr = obj.FinishComponentRegistration() # pcw
    # TODO Provide an unregistration mechanism.
  except pywintypes.com_error:
    # TODO narrow to only the error that GDS returns when component
    # already registered
    pass

I find it odd that Google Desktop Search doesn't natively index intranets (or specified web sites): having to hack command-line python scripts to do it is hardly user friendly. It might be that they want people to buy Google Mini boxes for £2000 a pop rather than hand out free tools.

Maybe they are evil after all?

Incidentally, this:

obj.UnregisterComponent( _GUID)

is how to unregister kongulo, as mentioned in the TODO (TODO is a programming term that means 'this needs doing but I can only summon the strength to press four keys').


Filed under: google python


I updated a server from ubuntu hoary hedgehog to breezy badger. I followed this and it went smoothly enough, although it took a number of

sudo apt-get upgrade
sudo apt-get dist-upgrade
sudo apt-get update

cycles to stop it installing new packages.

Afterwards my turbogears projects were dead: the apache2 error log files said:

[Mon Feb 20 13:01:21 2006] mod_python: (Re)importing module 'mpcp'
[Mon Feb 20 13:01:24 2006] mod_python: (Re)importing module 'mpcp'
[Mon Feb 20 13:01:24 2006] child pid 26833 exit signal Segmentation fault (11)
[Mon Feb 20 13:01:25 2006] child pid 26832 exit signal Segmentation fault (11)
[Mon Feb 20 13:01:25 2006] mod_python: (Re)importing module 'mpcp'
[Mon Feb 20 13:01:26 2006] mod_python: (Re)importing module 'mpcp'
[Mon Feb 20 13:01:27 2006] child pid 26834 exit signal Segmentation fault (11)
[Mon Feb 20 13:01:27 2006] mod_python: (Re)importing module 'mpcp'
[Mon Feb 20 13:01:28 2006] child pid 26835 exit signal Segmentation fault (11)
[Mon Feb 20 13:01:29 2006] child pid 26836 exit signal Segmentation fault (11)
[Mon Feb 20 13:01:42 2006] mod_python: (Re)importing module 'mpcp'
[Mon Feb 20 13:01:43 2006] mod_python: (Re)importing module 'mpcp'
[Mon Feb 20 13:01:44 2006] child pid 29065 exit signal Segmentation fault (11)
[Mon Feb 20 13:01:44 2006] mod_python: (Re)importing module 'mpcp'

From googling, the problem seems to be due to mod_php and mod_python using different versions of the mysql library. By simplifying my project I established that it was indeed due to mysql and not expat, which is another possible cause. The suggested solutions all involved recompiling something. I decided the simplest thing to recompile would be MySQLdb, the python mysql module, as it is the smallest component and I didn't fancy trying to rebuild mod_php. I downloaded the source for MySQLdb, looked through its setup.py file and found this line:

mysqlstatic = eval(os.getenv('mysqlstatic', 'False'))

which implies that MySQLdb can use a static library: excellent. By default it is using a dynamic library which presumably is a different version to the one mod_php is using. I defined the environment variable with

export mysqlstatic=True

and I ran

python setup.py build

This died because it couldn't find something called mysql_config. I installed the libmysqlclient12-dev package with

sudo apt-get install libmysqlclient12-dev

to get this and it built ok. Note that there are three libmysqlclient packages, 10, 12 and 14. 12 appears to be a match for the version of mysql on the server (4.0.24: latest ubuntu is lagging debian somewhat).

I uninstalled the ubuntu MySQLdb package to avoid any future problems and then installed the MySQLdb module with

sudo python setup.py install

I tested the library from the python interactive shell and no problems. I rebooted the apache server and problem solved.

I'll have to keep an eye on this: I can no longer rely on the Ubuntu package system.


Filed under: python turbogears ubuntu


Been fighting with unpacking using the python struct module. I try this:

>>> struct.unpack( "lh", '123456')
(875770417, 13877)
>>>

which unpacks a long (l, 4 bytes) and a short (h, 2 bytes) from the six byte string '123456'. It works and I am happy.

But if I try to get another long I get problems:

>>> struct.unpack( "lhl", '1234567890')
Traceback (most recent call last):
  File "<string>", line 1, in <string>
struct.error: unpack str size does not match format
>>>

I've asked for another long, four more bytes, I've made the string four characters longer but it complains. Why?

Well, it seems that by default the struct module uses 'native' mode, which lays the data out the way the C compiler would: each field is aligned as the compiler would align it, so on windows/intel a long has to start on a 4-byte (32-bit) boundary. This is a feeble attempt to squeeze a negligible amount of speed from the cpu at the expense of wasted memory and programming frustration. After the 2-byte short, two padding bytes are skipped before the next long, so the short effectively consumes four bytes of the string being unpacked: but only if something that needs alignment follows it, as no padding is added at the end.

To get the struct module to work as you would expect, you have to add an '=' at the start, which keeps the native byte order but switches to standard sizes with no alignment:

>>> struct.unpack( "=lhl", '1234567890')
(875770417, 13877, 809056311)
>>>

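Strings are caught out in a similar way: an 's' field needs no alignment itself, but in native mode padding is inserted after it whenever the next field needs alignment, so on windows '3sl' takes eight bytes rather than seven.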

Naughty python library, doing obscure compiler optimisations by default.
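A few checks with struct.calcsize make the layouts visible. A sketch for Python 3 (byte strings need a b prefix there); the native size is printed rather than relied on, because it depends on the platform:

```python
import struct

data = b"1234567890"

# '=' keeps native byte order but uses standard sizes and no alignment;
# '<' does the same with little-endian order spelled out
print(struct.calcsize("=lhl"))      # 10
print(struct.unpack("<lhl", data))  # (875770417, 13877, 809056311)

# Native mode (the default, '@') aligns the second long, so its size is
# platform dependent: 12 on Windows, more where a long is 8 bytes
print(struct.calcsize("lhl"))

# If the padding is part of a real file format, spell it out with 'x'
print(struct.calcsize("<lh2xl"))    # 12
```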


Filed under: python


Had a problem in my Turbogears app when trying to set a cookie. It worked nicely in development, running on the cherrypy server, but it didn't work in production when running under mod_python/apache. Eventually I established that it was because my http response was trying to send two cookies (my cookie and the session cookie) and only the last one was actually being sent to the browser.

I pinned it down to the mpcp script that I am using to interface mod_python with cherrypy. It has this code:

elif tv is str:
    req.headers_out[header] = value

which only allows one 'Set-Cookie' header to be defined: later ones overwrite the original.

Changing the code to the following fixes the problem:

elif tv is str:
    if header.lower() != 'set-cookie':
        req.headers_out[header] = value
    else:
        req.headers_out.add( 'Set-Cookie', value)
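The same replace-versus-append distinction shows up in the standard library's wsgiref.headers.Headers class. This sketch uses that class rather than mod_python's table, purely to illustrate why the second Set-Cookie vanished; the cookie values are made up:

```python
from wsgiref.headers import Headers

# Assigning by key replaces any existing header of the same name,
# which is how the first cookie got lost
replaced = Headers([])
replaced["Set-Cookie"] = "session_id=abc"
replaced["Set-Cookie"] = "my_cookie=1"
print(replaced.get_all("Set-Cookie"))  # ['my_cookie=1']

# add_header appends instead, so each cookie gets its own header line
appended = Headers([])
appended.add_header("Set-Cookie", "session_id=abc")
appended.add_header("Set-Cookie", "my_cookie=1")
print(appended.get_all("Set-Cookie"))  # ['session_id=abc', 'my_cookie=1']
```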


Have a desire to generate Excel spreadsheets from python. Under windows I can use Excel itself via COM if I want a slow, fragile, non-thread-safe solution, but what about linux? I could use CSV but I want merged cells and fancy stuff. I searched and found ye olde spreadsheet format called SYLK, but not only does excel display these files with numbers for both rows and columns (R1C1 instead of A1), which will confuse everyone, but a google for SYLK reveals it is primarily the name of a sexual lubricant.

My first thought was to use openoffice.org, as it has the pyuno bridge to control it from python and it can generate excel format files, but looking into it further, setting up an openoffice server looks like another complex and fragile minefield.

So I looked for a native python library to do it and found two: pyxlwriter and pyExcelerator. The pyxlwriter web site says the project has been stopped and to use pyExcelerator, so I gave that a try. I ran the formula demo and it created a spreadsheet, but when I opened it in Excel 97 all the cells were blank. The formulas were in it but needed manual recalculating (F2 and enter, one by one).

I downloaded pyxlwriter and tried that, and it was OK. So I went through the source code of both libraries to find the difference. It boils down to this line in BiffRecords.py of pyExcelerator (version 0.6.3a, line 1579):

self._rec_data = struct.pack('<3HQHL', row, col, xf_index,
                              0xFFFF000000000003, 0, 0) + rpn

Changing it to the following fixes the problem:

self._rec_data = struct.pack("<HHHdHL", row, col, xf_index, 0x00, 0x03, 0) + rpn
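My reading of why the change works, based on the published description of the BIFF file format rather than on pyExcelerator itself, so treat the field meanings as assumptions: a FORMULA record starts with a fixed 20-byte header of row, column and XF index (three 16-bit values), an 8-byte cached result, 16-bit option flags and 4 unused bytes, followed by the formula tokens (rpn). In the original format string the 3 sits inside the cached-result field (as far as I can tell, bytes 03 00 00 00 00 00 FF FF mark the cached result as an empty string) and the option flags are 0. The replacement stores a 0.0 result and sets option bits 0x0001 (recalculate always) and 0x0002 (calculate on load), which would explain why Excel now computes the formulas. This Python 3 sketch packs both headers and reads back the option flags; the xf_index of 15 is arbitrary.

```python
import struct

# Fixed part of a BIFF FORMULA record as described above (my own field names):
# row, col, xf_index, cached result (double), option flags, unused.
HEADER = '<HHHdHL'
ALWAYS_CALC = 0x0001   # assumed meaning of option bit 0
CALC_ON_LOAD = 0x0002  # assumed meaning of option bit 1


def formula_header(row, col, xf_index, options):
    """Pack the 20-byte header that precedes the formula's token stream."""
    return struct.pack(HEADER, row, col, xf_index, 0.0, options, 0)


def option_flags(header_bytes):
    """Return the 16-bit option flags field, which starts at byte offset 14."""
    return struct.unpack_from('<H', header_bytes, 14)[0]


original = struct.pack('<3HQHL', 0, 0, 15, 0xFFFF000000000003, 0, 0)
patched = formula_header(0, 0, 15, ALWAYS_CALC | CALC_ON_LOAD)

print(len(original), len(patched))                    # 20 20
print(option_flags(original), option_flags(patched))  # 0 3
```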

If I can trust this library then this should be a nice fast thread-safe solution.


Filed under: python


Turbogears supports two configurations, development and production, each with its own configuration file. This is handy because I can use different databases for development and production and avoid breaking the production system while developing.

I've just moved the development database to the same server as the production database, because using mysql instead of sqlite makes it easier for me to upgrade tables: dump the SQL, run "tg-admin sql drop; tg-admin sql create" and reimport the old SQL.

I just tried this on the production system and fried the development database. Why? It seems that turbogears decides whether it is running in development or production mode based on the existence of a setup.py file in the project directory. My deployment copies all the files from the development directories on windows to the production directories on the server; I don't use python setuptools. Hence my production server had a copy of setup.py and so was deciding to use the development database.

I don't want to start hacking the turbogears source, so I've hidden the setup.py file on the production server and modified the rsync script that I use to upload the project so that it doesn't upload setup.py.
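The rule is simple enough to sketch. The function below is a hypothetical reconstruction of the behaviour described above, not TurboGears code: choose_config and the dev.cfg/prod.cfg names are assumptions. An explicitly named config file always wins; otherwise the mere presence of setup.py selects the development config, which is exactly what bit me. Written for Python 3.

```python
import os


def choose_config(project_dir, explicit=None):
    """Hypothetical reconstruction of the dev/prod choice described above.

    An explicitly named config file always wins; otherwise the presence of
    setup.py in the project directory is taken to mean 'development'.
    The dev.cfg/prod.cfg file names are assumptions.
    """
    if explicit is not None:
        return explicit
    if os.path.exists(os.path.join(project_dir, 'setup.py')):
        return os.path.join(project_dir, 'dev.cfg')
    return os.path.join(project_dir, 'prod.cfg')
```

Naming the config explicitly on the production box removes the dependence on which files happen to have been copied across.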


Filed under: python turbogears


Extracts from the awstats logs, bandwidth used by various visitors to this site:

crawler.bloglines.com   353.02 MB
MSIECrawler             567.41 MB
Inktomi Slurp           135.79 MB
Googlebot                59.86 MB

Apparently MSIECrawler is IE sucking down the entire contents of the site. A couple of people seem to have done this; what is the point? Is this site that interesting? Are the spam blogs (copies of legitimate blogs full of links to p0ker sites) using IE for their scraping technology? My attitude to the spammers turns from annoyance to pity.

Bloglines is getting a bit carried away: 353 MB just downloading RSS feeds.

Inktomi Slurp is still slurping and not returning any visitors from search results.

Googlebot drives 95% of my traffic so 59M is acceptable.

Here is my latest crack at apache log file analysis in python:

#
# Apache log file analysis.
#
import re
import datetime

#
# Regular expression for parsing apache log file.
#
oLogRE = re.compile( r'''([\d.]+).*\s+                # host
                  [^\s]+\s+                        # identd (usually -)
                  [^\s]+\s+                        # authenticated user (usually -)
                  \[(.*?)\]\s+                     # when
                  "(.*?)\s+(.*)\s+(.*)"\s+         # method, path, protocol
                  (\d+)\s+                         # HTTP status code
                  ([^\s]+)\s+                      # Size in bytes (- if none)
                  "(.*?)"\s+                       # Referrer
                  "(.*?)"                          # Agent
''', re.VERBOSE)

LOG_Who = 0
LOG_When = 1
LOG_How = 2
LOG_What = 3
LOG_Protocol = 4
LOG_Error = 5
LOG_Size = 6
LOG_Referrer = 7
LOG_Agent = 8

def ScanFile( strFile):
    """
    Scan apache log file and return hits.
    """
    for strLine in open( strFile):
        oMatch = oLogRE.match( strLine)
        if oMatch:
            yield( oMatch.groups())
        else:
            print 'Reject: %s' % strLine

def GatherBy( oHits, nField):
    """
    Gather hits from list of hits into a dictionary keyed
    by unique values of a specific field.
    """
    oDict = {}

    for oHit in oHits:
        oKey = oHit[nField]
        if oKey in oDict:
            oDict[oKey].append( oHit)
        else:
            oDict[oKey] = [oHit]

    return oDict

def FilterBy( oHits, nField, strFilter):
    """
    Yield hits whose specific field matches the regular expression strFilter.
    """
    oRE = re.compile( strFilter)

    for oHit in oHits:
        if oRE.search( oHit[nField]):
            yield( oHit)

def FilterByDate( oHits,
                  oStartDate,
                  oEndDate = datetime.date.today() + datetime.timedelta(1)):
    """
    Filter hits >= Start Date and < End Date
    """
    oRE = re.compile( r'(\d+)/(\w+)/(\d+).*')

    for oHit in oHits:
        strDate = oHit[LOG_When]
        strDay, strMonth, strYear = oRE.match( strDate).groups()

        nDay = int( strDay)
        nMonth = ['Jan', 'Feb', 'Mar',
                  'Apr', 'May', 'Jun',
                  'Jul', 'Aug', 'Sep',
                  'Oct', 'Nov', 'Dec'].index( strMonth) + 1
        nYear = int( strYear)

        oDate = datetime.date( nYear, nMonth, nDay)

        if oDate >= oStartDate and oDate < oEndDate:
            yield( oHit)

def AnalyseBy( oHits, nField, bJustSummary = False):
    """
    Print hits by unique values of a specific field
    and generate counts and bytes for each unique value.
    """
    oDict = GatherBy( oHits, nField)

    oKeys = oDict.keys()

    oKeys.sort()

    nGrandTotalCounts = 0
    nGrandTotalBytes = 0

    for oKey in oKeys:
        nCount = len( oDict[oKey])

        nTotal = 0

        for oHit in oDict[oKey]:
            strSize = oHit[LOG_Size]
            if strSize != '-':
                nTotal += int(strSize)

        if not bJustSummary:
            print oKey, nCount, nTotal

        nGrandTotalCounts += nCount
        nGrandTotalBytes += nTotal

    print "Unique items: %d, Total Hits: %d, Total Bytes: %d" % (len(oKeys),
                                                                 nGrandTotalCounts,
                                                                 nGrandTotalBytes)

oStartDate = datetime.date.today() - datetime.timedelta( 8 ) # week yesterday
oEndDate = datetime.date.today() - datetime.timedelta( 1 )  # yesterday

oAllHits = list( FilterByDate( ScanFile( 'c:\\Desktop\\access.log'),
                               oStartDate, oEndDate))
oAllHits.extend( list( FilterByDate( ScanFile( 'c:\\Desktop\\access.log.1'),
                                     oStartDate, oEndDate)))

print "User Agents"
AnalyseBy( oAllHits, LOG_Agent, True)

print "All Hosts (hence all usage)"
AnalyseBy( oAllHits, LOG_Who, True)

print "Hits from bloglines"
#
# Determine different bloglines feeds and analyse each one
#
for strFeed, oFeedHits in GatherBy( FilterBy( oAllHits,
                                              LOG_Agent,
                                              'Bloglines'),
                                    LOG_What).items():
    #
    # Now analyse by agent:  agent includes number of subscribers
    # so we see subscribers per feed.
    print "Bloglines feed %s" % strFeed
    AnalyseBy( oFeedHits, LOG_Agent)

print "Hits from MSIECrawler by host"
AnalyseBy( FilterBy( oAllHits, LOG_Agent, 'MSIECrawler'), LOG_Who)

print "Hits from Inktomi/yahoo slurp"
AnalyseBy( FilterBy( oAllHits, LOG_Agent, 'Slurp'), LOG_Agent)

Example output for the week ending yesterday. Notes:

  • 659,628,604 bytes served in a week!
  • Slurp took 45,786,926 bytes
  • The MSIECrawler user took 49,980,154 bytes
  • I've got more bloglines subscribers than I thought.
  • Just how many RSS feed URLs does drupal provide?
User Agents
Unique items: 497, Total Hits: 76052, Total Bytes: 659628604

All Hosts (hence all usage)
Unique items: 3301, Total Hits: 76052, Total Bytes: 659628604

Hits from bloglines
Bloglines feed /blog/1/feed
Bloglines/3.0-rho (http://www.bloglines.com; 1 subscriber) 252 13730570
Bloglines/3.0-rho (http://www.bloglines.com; 3 subscribers) 256 13947306
Bloglines/3.0-rho (http://www.bloglines.com; 5 subscribers) 252 13730570
Bloglines/3.0-rho (http://www.bloglines.com; 7 subscribers) 256 13947306
Unique items: 4, Total Hits: 1016, Total Bytes: 55355752
Bloglines feed /blog/feed
Bloglines/3.0-rho (http://www.bloglines.com; 1 subscriber) 242 13189698
Unique items: 1, Total Hits: 242, Total Bytes: 13189698
Bloglines feed /atom/feed
Bloglines/3.0-rho (http://www.bloglines.com; 3 subscribers) 256 1449434
Unique items: 1, Total Hits: 256, Total Bytes: 1449434
Bloglines feed /tags/18/feed
Bloglines/3.0-rho (http://www.bloglines.com; 1 subscriber) 256 12427520
Unique items: 1, Total Hits: 256, Total Bytes: 12427520
Bloglines feed /blog/feed/1
Bloglines/3.0-rho (http://www.bloglines.com; 5 subscribers) 252 0
Unique items: 1, Total Hits: 252, Total Bytes: 0
Bloglines feed /taxonomy/term/5/0/feed
Bloglines/3.0-rho (http://www.bloglines.com; 1 subscriber) 242 3832796
Unique items: 1, Total Hits: 242, Total Bytes: 3832796
Bloglines feed /node/feed
Bloglines/3.0-rho (http://www.bloglines.com; 3 subscribers) 256 13959338
Unique items: 1, Total Hits: 256, Total Bytes: 13959338
Bloglines feed /rss.xml
Bloglines/3.0-rho (http://www.bloglines.com; 3 subscribers) 256 0
Bloglines/3.0-rho (http://www.bloglines.com; 7 subscribers) 256 0
Unique items: 2, Total Hits: 512, Total Bytes: 0
Bloglines feed /tags/3/feed
Bloglines/3.0-rho (http://www.bloglines.com; 1 subscriber) 250 12999000
Unique items: 1, Total Hits: 250, Total Bytes: 12999000

Hits from MSIECrawler by host
81.159.46.223 1784 49980154
Unique items: 1, Total Hits: 1784, Total Bytes: 49980154

Hits from Inktomi/yahoo slurp
Mozilla/5.0 (compatible; Yahoo! Slurp China;) 58 800642
Mozilla/5.0 (compatible; Yahoo! Slurp;) 2066 44986284
Unique items: 2, Total Hits: 2124, Total Bytes: 45786926
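For comparison, the per-field totals in the summary above (hits and bytes for each unique value of a field) can also be accumulated with collections.defaultdict while parsing. This is a Python 3 sketch, not the script I ran: it assumes Apache's combined log format, uses named groups instead of the LOG_* indices, and the sample lines are made up.

```python
import re
from collections import defaultdict

# Apache "combined" format: host ident user [when] "request" status size "referrer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]*)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def totals_by(lines, field):
    """Return (hits, bytes) dictionaries keyed by each unique value of field."""
    hits = defaultdict(int)
    sizes = defaultdict(int)
    for line in lines:
        match = LOG_RE.match(line)
        if not match:
            continue  # not a combined-format line
        key = match.group(field)
        hits[key] += 1
        size = match.group('size')
        sizes[key] += 0 if size == '-' else int(size)  # '-' means no body sent
    return dict(hits), dict(sizes)


sample = [
    '10.0.0.1 - - [10/Oct/2005:13:55:36 +0100] "GET /rss.xml HTTP/1.1" 200 5120 "-" "Bloglines/3.0-rho"',
    '10.0.0.2 - - [10/Oct/2005:13:56:01 +0100] "GET / HTTP/1.1" 304 - "-" "Googlebot/2.1"',
    '10.0.0.1 - - [10/Oct/2005:14:00:00 +0100] "GET /rss.xml HTTP/1.1" 200 5120 "-" "Bloglines/3.0-rho"',
]
hits, sizes = totals_by(sample, 'agent')
for agent in sorted(hits):
    print(agent, hits[agent], sizes[agent])
```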


Sessions in TurboGears are easy enough: the documents are very clear.

This ran pretty much first time. During development, using cherrypy's internal server, the sessions were all lost whenever I restarted the server, which was frustrating as it meant I had to log into my web app every time.

When I moved the app to the production server (mod_python) my sessions appeared to die after about 30 seconds. I think this is because apache restarts its worker processes from time to time (to clean up any zombie or misbehaving processes), so anything held in a process's memory is lost.

I studied the cherrypy docs, which are typically vague and incomplete. Together with the cherrypy source code, they suggest that cherrypy defaults to storing sessions in RAM, so they are deleted whenever the process dies. The fix boils down to putting the following in the config file:

sessionFilter.on = True
sessionFilter.storageType='File'
sessionFilter.storagePath='/var/www/wherever/sessions'

This stores the sessions in files on the server. It may well work on windows too; I haven't tried it yet.
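The reason file storage helps is that each session lives outside the process, so a restarted (or different) apache process can read it back. A minimal Python 3 sketch of the idea, not cherrypy's implementation: one pickle file per session id in a directory. FileSessionStore and its methods are hypothetical names, and it does no locking, which a real multi-process server would need.

```python
import os
import pickle
import tempfile


class FileSessionStore:
    """Keep each session's data in its own pickle file, keyed by session id."""

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def _file(self, session_id):
        # Session ids come from cookies, so keep only safe characters.
        safe = ''.join(ch for ch in session_id if ch.isalnum())
        return os.path.join(self.path, 'session-' + safe)

    def save(self, session_id, data):
        with open(self._file(session_id), 'wb') as f:
            pickle.dump(data, f)

    def load(self, session_id):
        try:
            with open(self._file(session_id), 'rb') as f:
                return pickle.load(f)
        except FileNotFoundError:
            return {}  # unknown session: start with an empty one


# Two separate store objects stand in for two apache processes.
directory = tempfile.mkdtemp()
FileSessionStore(directory).save('abc123', {'user': 'peter'})
print(FileSessionStore(directory).load('abc123'))  # {'user': 'peter'}
```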

From the source it seems cherrypy only supports database sessions in postgresql, not mysql, sadly. Still, files may well be more efficient for something as simple as this.

I must say I like the way TurboGears has two setup files, one for development and one for production. This makes it very easy for me to develop under windows and deploy on linux. I can use sqlite on windows and mysql on linux. I have needed to create separate startup files as well (on windows to use cherrypy with a debugger, on linux to get it working with mod_python) but once this is done the code is pretty much identical (apart from using samba.winbind under linux to validate users).


Filed under: cherrypy python turbogears


This lists which files are open, and by which user, on a Windows 2000 server. I once did this in Visual Basic, but here is an untainted python version.

import win32com.client

oFso = win32com.client.GetObject( "WinNT://ServerName/LanmanServer")

for oResource in oFso.resources():
    #
    # this sometimes dies for no obvious reason: this is
    # COM after all.
    #
    try:
        strUser = oResource.user
    except Exception:
        strUser = ""

    #
    # Skip resources with no user and those held by machine
    # accounts (names ending in '$').
    #
    if strUser != "" and strUser[-1] != '$':
        print strUser, oResource.Path

Replace 'ServerName' with the name of your server.

This is useful if someone has an excel or access file open on the company intranet and no-one else can open it: we can find the culprit.


Filed under: python windows


I knocked up a quick python script to scan my drupal watchdog list for comment spammers. The log covers the last week. In total there were 1250 spam attempts from 448 distinct ip addresses.

All these comment spams pretend to come from Windows XP, IE 6, so they cannot be filtered out by user agent.

My hack to the comment module to prevent urls being submitted generates watchdog messages and this script looks for these.

Here is the script:

import MySQLdb

o = MySQLdb.connect( '127.0.0.1', 'me', 'secret')

o.select_db( 'drupal_db')

c = o.cursor()

c.execute( """select message, hostname from watchdog
              where message like 'Comment:%'""")

oBadGuys = {}
oGoodGuys = {}

while 1:
    oRow = c.fetchone()
    if not oRow:
        break

    strMessage, strSender = oRow

    if strMessage.startswith( 'Comment: attempted'):
        oBadGuys[strSender] = oBadGuys.get( strSender, 0) + 1

    if strMessage.startswith( 'Comment: added'):
        oGoodGuys[strSender] = oGoodGuys.get( strSender, 0) + 1

#
# Good guys manage to submit comments without problems.
# Remove them from the bad guy list.
#
for strKey in oGoodGuys.keys():
    if strKey in oBadGuys:
        print strKey + ' is not so bad'
        del oBadGuys[strKey]

nTotal = 0

for strKey, nCount in oBadGuys.items():
    print strKey, nCount
    nTotal += nCount

print "%d spams from %d bad guys" % (nTotal, len(oBadGuys))

I must get on with my turbogears based blog so I can do more about this. Drupal logging is a bit lame: it doesn't log the referrer or user agent, which might be useful, so I have to cross-reference with the apache logs. There is more that I can do to make it harder to suck my bandwidth, but my php is not strong enough and it's more fun to do in python (won't wear my $ key out).
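The counting part of the spam script above can be written more compactly with collections.Counter, which also turns the 'remove the good guys' step into a short loop. This Python 3 sketch works on rows already fetched from the watchdog table rather than replacing the MySQL query; the sample rows and addresses are invented, and only the 'Comment: attempted' and 'Comment: added' prefixes come from the script.

```python
from collections import Counter


def tally_spammers(rows):
    """Count blocked comment attempts per host, ignoring any host that has
    also managed to add a legitimate comment."""
    bad = Counter()
    good = set()
    for message, host in rows:
        if message.startswith('Comment: attempted'):
            bad[host] += 1
        elif message.startswith('Comment: added'):
            good.add(host)
    for host in good:
        bad.pop(host, None)  # good guys are not so bad
    return bad


rows = [
    ('Comment: attempted to post a url', '10.0.0.1'),
    ('Comment: attempted to post a url', '10.0.0.1'),
    ('Comment: attempted to post a url', '10.0.0.2'),
    ('Comment: added', '10.0.0.2'),
]
bad = tally_spammers(rows)
print("%d spams from %d bad guys" % (sum(bad.values()), len(bad)))  # 2 spams from 1 bad guys
```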


Filed under: captcha drupal python spam


statcounter tells me this site is linked from the 'Who uses it' page of the cheetah template website. Yes, I do use it, and I like it a lot. Since I started using TurboGears I have been using kid, but only because TurboGears didn't originally support Cheetah (Cheetah support is being added via template plugins).

What I like most about cheetah is that most of the time things just work the way you think they will and you can get on with your coding, instead of working out why things don't work. kid also gives me that feeling, but cheetah is more powerful: kid is mainly aimed at XML/HTML and can do plain text, but cheetah is truly agnostic.

Kid and Cheetah share the ${variable} method of replacing text in the template with python expressions. This is very easy and powerful, although it goes against the doctrine of separating code from display. In django templates you can only substitute variables, you cannot call functions. I feel there is a fine line between python code that says 'object.name' and 'object.name()'. If you can explain one to a web designer, is it so hard to explain the other? Anyway, for my projects I am the web designer.

If there is one thing I would change about kid it would be to somehow add an else term to the if statement: I end up duplicating the test code and adding a not. The more duplication there is, the less maintainable it is.

Kid templates can be previewed as html very nicely, which is useful when creating the layout of a site. If you preview kid templates as html you get placeholder text where your dynamic content will go (e.g. 'User's name goes here'), whereas cheetah would show the python expression that generates the content (e.g. '${UserName}'). However, once you start working on the dynamic content you can only preview by passing it through the template engine anyway. One of my projects generates a complex table, the columns and rows being generated dynamically, and there the kid preview is pretty useless.

Kid's other advantage is that it will gripe if your xhtml is not well formed, e.g. you miss out a closing tag. This has saved me a couple of times but there are other tools to validate your output.

Apparently there is a tantalising new version of cheetah out. Must have a look.


Filed under: cheetah kid python turbogears


For a long time now I have been contemplating how to validate users on a linux box against a Windows 2000 Active Directory domain. What I mean is, how to use the user names and passwords from the Windows server, without having to set up a duplicate password database on the linux box? I had been under the impression that this meant setting up a linux ldap server and migrating everything to that (as the Windows 2000 version of ldap is deliberately non-interoperable).

I looked into it again today and discovered that from python it is actually very easy to validate users via the python samba support:

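import samba.winbind

class LoginFailed(Exception):
    """Raised when the domain rejects the user name or password."""
    pass

try:
    if samba.winbind.auth_plaintext( 'Domain\\%s' % strUser, strPassword) != 0:
        raise LoginFailed( 'login failed')
except samba.winbind.error:
    raise LoginFailed( 'login failed')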

Samba finds the domain controller and validates the user name and password. What could be easier?

This is fine for intranet activities but if you want users logging into your linux box you have to go the ldap route to give them unix groups, login shells etc. Good luck with that.


Filed under: linux python


Still umming and arring about django vs turbogears. I knocked up an online questionnaire for our intranet in TurboGears in a couple of days and it is now running. I'm starting another intranet program so I thought I'd give django a try. Once again I had to go through the tutorial to remember all the steps to set the project up. I fiddled with batch files to set up environment variables and everything, hacked the source to get it to run under WingIDE, started on the database model, all the while seeming to struggle with django itself. I got the admin screens up and running and was figuring out how to stop a password being displayed in a password field. As far as I could see, this involved editing the templates for the admin screen. It's about here that I got tired and went back to turbogears as I couldn't be bothered to tackle the django template system.

In about the same amount of time with turbogears, the project was underway and I was focused on getting the main template running. There is something about turbogears that is to me more 'pythonic' in terms of things working the way you expect and not having to waste time figuring out how to do things. kid Just Works, it is fundamentally simple and keeps out of the way, leaving you to get on with your work.

SQLObject does not appear to handle relations as nicely as the Django ORM: in django it appears I can delete an object and have the associated many-to-many mapping in the database deleted automatically, whereas SQLObject does not delete it automatically, and neither the documentation nor the tests show me how to delete it explicitly. There is a 'remove' method in there, but the object model is too complex for me to work out how to reach it. The SQLObject documentation is comprehensive but is essentially a list of examples (not a bad thing); it does not seem to include a detailed api listing. I will have to fiddle around at the interactive prompt and see if I can figure out the incantation. Then again, the django ORM api is currently being made more intuitive. If I was going to do a lot of database work I'd be tempted to use my own db wrapper: I'd have to write SQL but I don't care, I know how to write SQL, I don't know how to use these libraries.
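For the record, the 'write my own SQL' route makes the many-to-many clean-up explicit. A minimal Python 3 sketch using the standard library's sqlite3 module and an invented schema (person, grp and the person_grp join table are hypothetical, and this is not how SQLObject or the django ORM do it): delete the mapping rows and then the object, inside one transaction.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE grp (id INTEGER PRIMARY KEY, name TEXT);
    -- join table for the many-to-many relation
    CREATE TABLE person_grp (person_id INTEGER, grp_id INTEGER);
    INSERT INTO person VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO grp VALUES (1, 'admins'), (2, 'users');
    INSERT INTO person_grp VALUES (1, 1), (1, 2), (2, 2);
""")


def delete_person(conn, person_id):
    """Remove a person together with their rows in the join table."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("DELETE FROM person_grp WHERE person_id = ?", (person_id,))
        conn.execute("DELETE FROM person WHERE id = ?", (person_id,))


delete_person(conn, 1)
print(conn.execute("SELECT * FROM person_grp").fetchall())  # [(2, 2)]
```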

I have looked through the tests for both projects as example code and in both the tests are a bit cursory. The django tests are far more readable than the SQLObject tests (doctestish vs unittestish). The django tests also benefit from being commented.

Right now I'm back with turbogears as the project I am working on will not have many relationships and they will rarely need deleting.

Members of both projects have left comments on this blog, making me feel guilty about picking one over the other. I like them both equally, choosing one over the other is hard. Right now I'm further along the TurboGears learning curve which for me is not as steep as the Django learning curve.


Filed under: django python turbogears


So I develop an online questionnaire (and learn to spell that word) for our intranet in turbogears. It takes maybe a day in total to have a nice sqlite database driven app that walks you through the questions, lets you go back and forth, handles errors and is themed exactly like the main (drupal) intranet. Really nice.

After developing under windows I casually copy the files to the ubuntu intranet server and try installing it in a virtual path under the root domain (e.g. intranet.com/subdirectory). Then I mess around for hours because the mod_python support for cherrypy appears to be hacked on, the support for mapping applications to virtual paths in turbogears using server.webpath is broken, and cherrypy support for virtual paths boils down to this hack, which it states is buggy but doesn't say what the bugs are! The main bug I see is that IT DOESN'T MAKE ANY DIFFERENCE.

Argh.

I can get the app to work but only the index, none of the other methods in the controller are recognised.

While googling for this I came across an interesting django vs turbogears review here. He makes an interesting point: django sites survive slashdotting. I look at the cherrypy site and see where cherrypy is being used: um, sorry but these sites don't look all that impressive compared to the django equivalents.

So I'm disillusioned with cherrypy: while the raw api is clean, it looks a bit hacky and the documentation is poor ('in progress') by comparison to the other elements of the turbogears stack. I have no idea whether django supports virtual paths but it Just Worked under mod_python. Should I switch back to that, given that they are working on cleaning up their database api? I'd be tempted to stick with kid even if I went back to django, like this guy.

Dunno what to do, almost tempted to learn ruby, I'm that fed up with it all.

UPDATE: It took me a few days to figure the problem out: I had an extra .htaccess file I didn't know was there, causing mischief and interfering with whatever I did in the main config file. I've got rid of it and turbogears/cherrypy is working fine.

The turbogears server.webpath="/subdir" does not seem to be working (as mentioned in the turbogears issue tracker) but if I hack my controllers.py file I can work around it:

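import turbogears
from turbogears import controllers

class subdir:
    @turbogears.expose(html="tgpcw.templates.welcome")
    def index(self):
        import time
        return dict(now=time.ctime())

class Root(controllers.Root):
    # mount the subdir controller so /subdir/ is served by subdir.index
    subdir = subdir()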
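The workaround relies on the way cherrypy maps URLs to objects: each path segment is looked up as an attribute of the previous object, so mounting an instance of subdir as Root.subdir makes /subdir/ reachable. A much-simplified Python 3 sketch of that traversal; the dispatch function, the expose stand-in and the SubDir/Root classes are hypothetical, and real cherrypy handles arguments, default methods and much more.

```python
def expose(func):
    """Mark a method as reachable from a URL (a stand-in for the real decorator)."""
    func.exposed = True
    return func


class SubDir:
    @expose
    def index(self):
        return 'subdir index'

    @expose
    def page(self):
        return 'subdir page'


class Root:
    subdir = SubDir()  # mounted at /subdir

    @expose
    def index(self):
        return 'root index'


def dispatch(root, path):
    """Walk the object tree one URL segment at a time; fall back to index."""
    node = root
    for segment in [s for s in path.split('/') if s]:
        if segment.startswith('_'):
            return None  # never expose private attributes
        node = getattr(node, segment, None)
        if node is None:
            return None  # would be a 404
    if not callable(node):
        node = getattr(node, 'index', None)
    if node is None or not getattr(node, 'exposed', False):
        return None
    return node()


root = Root()
print(dispatch(root, '/'))             # root index
print(dispatch(root, '/subdir/'))      # subdir index
print(dispatch(root, '/subdir/page'))  # subdir page
print(dispatch(root, '/missing'))      # None
```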

I enabled this in the apache2 config file thusly:

<Location "/hands">
    SetHandler mod_python
    PythonHandler mpcp
    PythonDebug On
    PythonPath "['/var/www/tgpcw'] + sys.path"
    PythonOption cherrysetup tgpcw_start::mp_setup

    AllowOverride All
    Order deny,allow
    Deny from all
    Allow from all
</Location>

where tgpcw is the name of my turbogears stuff.


Filed under: django python turbogears


I have noticed more and more projects using trac so I decided to install it and give it a try. It is really nice. It does the following:

  • provides a web interface to subversion source repositories. The web interface allows you to look at different revisions of files, do diffs between revisions, all good stuff. You cannot commit changes, update or do anything with local working copies of files but these are just a batch file away.
  • it provides a wiki so you can document your project however you like. The wiki markup supports links to files in subversion, change sets and the like so you have no excuses for not describing the grand picture anywhere.
  • it provides a bug tracking database which is like bugzilla but cleaner and simpler.

It was easy to set up as a debian package, just a matter of installing the trac package and running the trac-admin command to create a new trac project. Tell that where your subversion repository is and you are away.

The more I use subversion, the more I like it. Not having to check files out is really nice and cuts down on the hassle: just edit any file.

Commercial development requires more formal documentation than a wiki but I do feel there could be a role for informal documentation attached to the source code: useful documentation, not the stuff that is only there to keep the QA department happy.

Oh, did I mention trac was written in python?

If I could integrate an email archive and a development blog into trac, I would have the fount of all knowledge.


Filed under: python subversion trac


Coding away with TurboGears and found that glorious nirvana where whatever I tried Just Worked. Kid, SQLObject and cherrypy are nice and clean and turbogears has generated a nice boilerplate framework. Going through the code of the various libraries while debugging, there is magic in the way they work, but it is not causing me big troubles as the APIs they present are nice and clean. Python is such a lovely language; I always felt it would be great for web development and now it feels like I have the tools.

I commented previously on SQLObject's support for introspection and I was wrong: the help command (possibly docstrings in general?) is broken and generates an exception. The online documents are fairly comprehensive, though.

I have a heavy cold and hence had one of those tortured nights with recurring dreams. I kept seeing Kid's XML element tree sitting in memory, branches being swapped in and out and the resulting document changing in wonderful ways. It was not a bad dream. I woke up thinking it was a great way to manipulate a document compared to primitive string substitution. I have never really tried DOM models before, being reluctant to use all that memory, but now that memory is cheap I shouldn't be so cautious. Playing with SAX parsers is a pain; if I have to write another state machine I'll scream.

Another discovery: the TurboGears command `tg-admin shell` somehow picked up IPython and I was able to play with the database api while admiring the pretty colours.

It was nice until I tried porting a theme from php and ran into an annoying works-in-ie-but-not-firefox problem. There are some things that even the nicest development environments cannot help with. I tried using html-kit to clean up the template code and it is indeed easy to preview kid templates in a browser, they Just Work.

Tip for the Day: although kid templates are xml it is better to tell WingIDE they are html as it avoids a limitation of the xml syntax highlighting where it treats Processing Instructions as errors and marks most of the file as a syntax error, making the editor run pretty sluggishly.



Had turbogears hanging while debugging under WingIDE but working fine from the command line. One nice thing about using a decent debugger like WingIDE is I can press the pause button and see what it is doing when it appears to be hung.

It turns out it was in a kid function called 'relativize' that looks like this:

def relativize(self, file, path):
    from os.path import normpath, join, dirname, abspath, split, sep
    head, tail = (dirname(abspath(file)), '')
    parts = path.split(sep)
    paths = self.paths
    while 1:
        if head in paths or head == '/':
            return join(*parts)
        head, tail = split(head)
        parts.insert(0, tail)

It is trying to turn an absolute path into a relative path. The code was stuck in the while loop because 'head' was never 'in paths' or == '/'. The '/' would never match as this was windows, which uses '\' as a file separator (the root is 'C:\'). 'in paths' was failing because the path was in there as 'c:\Project\Path' but head was equal to 'C:\Project\Path': different capitalisation on the C. Tracing back, the upper case version was being returned by os.getcwd, the lower case version by the python package module. To cut a long story short, I had set the initial debug directory in WingIDE to 'c:\Project\Path' with a lower case c, and changing it to an upper case C in Wing fixed the problem.
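Both halves of the problem can be reproduced on any platform with the standard library's ntpath module, which implements os.path for Windows paths. A plain comparison treats the two spellings of the same directory as different, normcase makes them equal, and splitting the drive root returns the root itself, so a loop that only stops at '/' never ends. The paths are the examples from this post; the sketch is Python 3.

```python
import ntpath

from_getcwd = 'C:\\Project\\Path'   # upper case drive letter, as os.getcwd returned
from_package = 'c:\\Project\\Path'  # lower case, as the package module recorded

paths = [from_package]
print(from_getcwd in paths)  # False: the 'in paths' test fails
print(ntpath.normcase(from_getcwd) in [ntpath.normcase(p) for p in paths])  # True

# At the drive root, split() hands back the root unchanged, so 'head' stops
# changing and never equals '/'.
print(ntpath.split('C:\\'))  # ('C:\\', '')
```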

Conclusion: Must bear in mind that kid may be flaky on Windows.

UPDATE: it broke again, sadly. I resorted to fixing kid thusly:

def relativize(self, file, path):
    from os.path import normpath, join, dirname, abspath, split, sep
    from os.path import normcase  # pcw
    # pcw: normalise case after abspath, which may prepend the current directory
    head, tail = (normcase(dirname(abspath(file))), '')
    parts = path.split(sep)
    paths = [normcase( strPath) for strPath in self.paths]  # pcw
    while 1:
        # pcw: stop at the filesystem root ('/' or e.g. 'c:\\'), where
        # split() no longer shortens head
        if head in paths or dirname(head) == head:
            return join(*parts)
        head, tail = split(head)
        parts.insert(0, tail)

Filed under: kid python turbogears


I've been giving TurboGears a try as a means of reimplementing this blog in python. I have been trying django but I felt the need to try something else, mainly prompted by frustrations with the database api.

TurboGears is similar in the tools that it provides for web application development. It is essentially an amalgam of a number of existing python projects:

kid
a templating system based on XML.
cherrypy
web application framework, essentially maps urls to python method calls
SQLObject
an SQL database wrapper
Mochikit
a javascript library which I have tried before

Turbogears itself provides 'glue' to put these together. It has tools for creating project files, setting up and using the database etc. The TurboGears administration script creates a boilerplate application that can be running in no time. It all feels very similar to django, the main thing missing I can see is the administration screens but they are working on something called 'catwalk' which I think will do this. At the end of the day there are many database front ends that can provide administration (phpmyadmin, webmin mysql module etc) which are not so end-user friendly but good enough for me.

The libraries that turbogears has chosen appear to be very good in their own right. Their designs are clean and they are well documented. Each has greater depth than its django equivalent.

Kid looks powerful. It is an interesting mix of XML and python: your kid templates can be used like python modules; they compile to .pyc files and so are presumably only parsed once. You can define XML snippets of boilerplate code, such as an html list, and call them like functions from elsewhere in your template. As it is XML based, if your template is not XML compliant the XML parser (expat at the lowest level) slaps your wrist. An interesting side effect of its pythonic nature is that if a problem appears in the python embedded in your template, Wing IDE's debugger will stop on that line. Useful! Kid does not support inheritance like django's template system; instead it effectively gives you macro templates, which could be very powerful. The syntax is clearer than django's, you can use python code inline (like cheetah), and the template substitution allows you to use the same substitution twice (if, for example, you want to use the same title string in two places, header and page title): a pedantic limitation in django. Looking through the documentation, Kid shows great attention to XHTML compliance and the generation of conformant XHTML. This is not the greatest concern to me: I don't lose sleep over whether my XHTML is standards compliant, I am more worried about whether browsers will display it properly, but it is nice to know that someone somewhere has thought about all this for me.

SQLObject looks very clean and the objects can be introspected nicely. You work with the classes you define directly and not with metaclasses that appear from nowhere. SQLObject has also introduced me to SQLite, a really nice, simple SQL database library. It looks very solid, certainly good enough for development without the hassle of getting mysql running. It would probably be good enough for production use for me, if I can get it to produce SQL dumps for backup (which I prefer to backing up binaries). Certainly I will be using it in the places where I can't be bothered to set up mysql (users, permissions, passwords, yawn).
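On the SQL-dump question: the sqlite3 module in Python's standard library has Connection.iterdump(), which yields the whole database as SQL statements, much like the sqlite3 shell's .dump command. A small Python 3 sketch; the post table is made up, and in practice the dump string would be written to a .sql file.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO post (title) VALUES (?)", ("Hello, world",))
conn.commit()

# iterdump() yields SQL text statements; joined up, this is what would go
# into a .sql backup file instead of copying the binary database file.
dump = '\n'.join(conn.iterdump())

# Check that the dump round-trips by loading it into a fresh database.
restored = sqlite3.connect(':memory:')
restored.executescript(dump)
print(restored.execute("SELECT id, title FROM post").fetchall())  # [(1, 'Hello, world')]
```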

CherryPy: TurboGears creates enough boilerplate code that I haven't needed to look at the documents for this yet. It maps urls to method calls simply enough. One thing I like about django is its regular expression based url mapping, which is totally flexible, but there is nothing to stop me adding such a mapping layer.
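Such a layer need not be much code. A minimal Python 3 sketch of django-style regular-expression URL mapping in front of plain Python callables; url_patterns, resolve and the two views are hypothetical names, and the views are placeholders.

```python
import re


def post_list(request):
    return 'list of posts'


def post_detail(request, year, slug):
    return 'post %s from %s' % (slug, year)


# (pattern, view) pairs, tried in order; named groups become keyword arguments.
url_patterns = [
    (re.compile(r'^$'), post_list),
    (re.compile(r'^blog/(?P<year>\d{4})/(?P<slug>[\w-]+)/$'), post_detail),
]


def resolve(path, request=None):
    """Return the response of the first view whose pattern matches path."""
    path = path.lstrip('/')
    for pattern, view in url_patterns:
        match = pattern.match(path)
        if match:
            return view(request, **match.groupdict())
    return None  # no match: a real server would send a 404


print(resolve('/'))                       # list of posts
print(resolve('/blog/2005/turbogears/'))  # post turbogears from 2005
print(resolve('/nowhere'))                # None
```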

I've ported the basics of my blog (the theme I use in drupal, lists of posts etc) to TurboGears. I had to study the Kid documentation to get the TurboGears generated code to use the head from the master.kid file so I could put my css import in one fixed place. By default the head here is ignored. The solution to this conundrum is to modify master.kid like this:

<head py:match="item.tag=='{http://www.w3.org/1999/xhtml}head'">
    <meta content="text/html; charset=UTF-8" http-equiv="content-type" py:replace="''"/>
    <title>Peter's Better Blog | ${strPageTitle}</title>
    <link rel="stylesheet" href="/static/css/style.css" />
</head>

i.e. turn it into a match template. I had the django inheritance thing in my head (not a million miles away from the drupal phptemplate engine) and I had to read the kid documentation to realise it simply works by sequentially replacing blocks of XML, in this case any later head block is replaced by the block above.
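The same idea can be imitated with the standard library's xml.etree.ElementTree: find each element that satisfies a match test and replace it with a new subtree built from the matched element. This is only a Python 3 sketch of the concept, not how Kid is implemented: apply_match and master_head are hypothetical helpers, it uses the same {http://www.w3.org/1999/xhtml}head test as the template above, and taking the title from the matched head (rather than from strPageTitle) is an assumption for the demo.

```python
import xml.etree.ElementTree as ET

XHTML = '{http://www.w3.org/1999/xhtml}'


def apply_match(root, matches, replace):
    """Replace every element for which matches(elem) is true with replace(elem)."""
    for parent in list(root.iter()):  # snapshot, so replacements are not re-visited
        for index, child in enumerate(list(parent)):
            if matches(child):
                parent[index] = replace(child)
    return root


def master_head(item):
    """Build the master head, keeping the page's own title text."""
    head = ET.Element(XHTML + 'head')
    title = ET.SubElement(head, XHTML + 'title')
    title.text = "Peter's Better Blog | " + item.findtext(XHTML + 'title', default='')
    ET.SubElement(head, XHTML + 'link', rel='stylesheet', href='/static/css/style.css')
    return head


page = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Home</title></head><body><p>Hello</p></body></html>'
)
apply_match(page, lambda e: e.tag == XHTML + 'head', master_head)
print(page.find(XHTML + 'head/' + XHTML + 'title').text)  # Peter's Better Blog | Home
```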