I knocked up a quick Python script to scan my Drupal watchdog log for comment spammers. The log covers the last week. In total there were 1250 spam attempts from 448 distinct IP addresses.
All these comment spams pretend to come from IE 6 on Windows XP, so they cannot be filtered out by user agent.
My hack to the comment module to prevent URLs being submitted generates watchdog messages, and this script looks for them.
Here is the script:
    import MySQLdb

    o = MySQLdb.connect( '127.0.0.1', 'me', 'secret')

    o.select_db( 'drupal_db')

    c = o.cursor()

    c.execute( """select message, hostname from watchdog
                  where message like 'Comment:%'""")

    oBadGuys = {}
    oGoodGuys = {}

    while 1:
        oRow = c.fetchone()
        if not oRow:
            break

        strMessage, strSender = oRow

        if strMessage.startswith( 'Comment: attempted'):
            oBadGuys[strSender] = oBadGuys.get( strSender, 0) + 1

        if strMessage.startswith( 'Comment: added'):
            oGoodGuys[strSender] = oGoodGuys.get( strSender, 0) + 1

    #
    # Good guys manage to submit comments without problems.
    # Remove them from the bad guy list.
    #
    for strKey in oGoodGuys.keys():
        if strKey in oBadGuys:
            print strKey + ' is not so bad'
            del oBadGuys[strKey]

    nTotal = 0

    for strKey, nCount in oBadGuys.items():
        print strKey, nCount
        nTotal += nCount

    print "%d spams from %d bad guys" % (nTotal, len(oBadGuys))
I must get on with my TurboGears-based blog so I can do more about this. Drupal's logging is a bit lame: it doesn't record the referrer or user agent, which might be useful, so I have to cross-reference with the Apache logs. There is more that I can do to make it harder to suck my bandwidth, but my PHP is not strong enough and it's more fun to do in Python (won't wear my $ key out).
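The cross-referencing with the Apache logs could look something like the sketch below. It is only a rough idea, not something I have run against the real logs: it assumes the access log is in Apache's standard "combined" format, it is written in Python 3 syntax (unlike the script above), and the names `LOG_LINE` and `agents_for_hosts` as well as the sample log lines are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout: Apache "combined" log format, i.e.
# host ident user [time] "request" status bytes "referrer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'\d{3} \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def agents_for_hosts(log_lines, hosts):
    """Map each host in `hosts` to the set of (referrer, user agent)
    pairs it sent, according to the given access log lines."""
    seen = defaultdict(set)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not in combined format; skip rather than guess
        host = match.group('host')
        if host in hosts:
            seen[host].add((match.group('referrer'), match.group('agent')))
    return dict(seen)


# Made-up sample lines; in practice these would come from the access log
# file, and the host set from the keys of oBadGuys in the script above.
sample_lines = [
    '192.0.2.1 - - [10/Oct/2005:13:55:36 +0000] "POST /comment/reply/12 HTTP/1.0" '
    '200 2326 "http://example.com/node/12" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '198.51.100.7 - - [10/Oct/2005:13:56:01 +0000] "GET /node/12 HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (X11; Linux i686)"',
]

for host, pairs in agents_for_hosts(sample_lines, {'192.0.2.1'}).items():
    for referrer, agent in sorted(pairs):
        print(host, referrer, agent)
```

The regular expression is deliberately simple and will truncate a user agent that contains an escaped double quote, which is probably good enough for a quick look at what the spammers send.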

