I like to listen to the Daily Source Code podcast while rowing. I've always liked talk radio and this is like talk radio with F words.
The podcast files are about 20 MB each and take a few minutes to download on my 750k broadband connection. I wanted my Ubuntu box to download them automatically so they would be ready for me (as was the original vision of podcasting).
I decided to knock up a Python script to do it, as I didn't want a GUI tool and the only other script I found was written in Ruby :sick:. The script downloads the RSS feed from the podcasting site and looks for mp3 files, downloading any that it finds. It remembers which files it has downloaded, so you can listen to them and delete them and they won't be downloaded again. It doesn't play the files; that's done by Totem.
Other podcasts can be added easily enough.
import xml.parsers.expat
import re
import os
import subprocess


class FeedParser:

    def __init__(self):
        self.oElementStack = []
        self.bItem = False
        self.oItem = None

    def Parse(self, strFeed, strRETitle, strTargetDir):
        # Only items whose title matches this pattern are downloaded.
        self.oRETitle = re.compile(strRETitle)
        self.strTargetDir = strTargetDir
        self.strDBFile = os.path.join(strTargetDir, '.pypodder.db')

        # Load the list of GUIDs already downloaded, one per line.
        try:
            with open(self.strDBFile) as oFile:
                self.oDB = oFile.read().split('\n')
        except IOError:
            self.oDB = []

        p = xml.parsers.expat.ParserCreate()
        p.StartElementHandler = self.start_element
        p.EndElementHandler = self.end_element
        p.CharacterDataHandler = self.char_data

        # Fetch the feed with wget and hand the raw data to expat.
        strRSS = subprocess.check_output(['wget', '-q', '-O', '-', strFeed])
        p.Parse(strRSS, True)

    def start_element(self, name, attrs):
        # Push the element with an empty list to collect its text.
        self.oElementStack.append([name, []])

        if name == 'item':
            self.bItem = True
            self.oItem = {}
        elif name == 'enclosure':
            # The download URL is an attribute, not element text.
            strUrl = attrs.get('url')
            if strUrl and self.bItem:
                self.oItem['enclosure'] = strUrl

    def end_element(self, name):
        strElement, strData = self.oElementStack.pop()

        if strElement != name:
            raise ValueError("Element mismatch: %s != %s" % (name, strElement))

        if strElement != 'item':
            # Store the text of each child of <item>. Skip <enclosure>:
            # its URL came from the attribute and its empty text would
            # otherwise overwrite it.
            if self.bItem and strElement != 'enclosure':
                self.oItem[strElement] = "".join(strData).strip()
            return

        # End of an item: download it if the title matches.
        if self.oRETitle.match(self.oItem.get('title', '')):
            # Prefer the enclosure; fall back to the item link.
            strUrl = self.oItem.get('enclosure') or self.oItem.get('link', '')

            if strUrl and strUrl[-4:].lower() == '.mp3':
                # Identify the episode by GUID, or by URL if it has none.
                strGuid = self.oItem.get('guid') or strUrl

                if strGuid not in self.oDB:
                    # wget saves into its working directory, so run it
                    # in the target directory.
                    nStatus = subprocess.call(['wget', '-q', strUrl],
                                              cwd=self.strTargetDir)
                    if nStatus == 0:
                        strFileName = os.path.join(self.strTargetDir,
                                                   os.path.basename(strUrl))
                        print('Downloaded file %s' % strFileName)

                        # Record the GUID so it is not fetched again.
                        self.oDB.append(strGuid)
                        with open(self.strDBFile, 'w') as oFile:
                            oFile.write("\n".join(self.oDB))

        self.oItem = None
        self.bItem = False

    def char_data(self, data):
        # expat may deliver one element's text in several chunks.
        self.oElementStack[-1][1].append(data)


if __name__ == '__main__':
    FeedParser().Parse(
        "http://radio.weblogs.com/0001014/categories/dailySourceCode/rss.xml",
        "Daily Source Code for.*", "/home/peter/DailySourceCode/")
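If you want to poke at the feed-parsing step on its own, without wget or the network, something like the following should do. It is only a rough sketch: the sample feed, the `example.com` URL and the `find_enclosures` function are all made up for illustration and are not part of the script above. It uses the same three expat handlers to pull the enclosure URL out of each item.

```python
import xml.parsers.expat

# Hypothetical two-item feed, invented for this example.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Daily Source Code for Monday</title>
      <guid>episode-1</guid>
      <enclosure url="http://example.com/audio/episode1.mp3" type="audio/mpeg"/>
    </item>
    <item>
      <title>Show notes only</title>
      <guid>notes-1</guid>
    </item>
  </channel>
</rss>
"""


def find_enclosures(rss_text):
    """Return (title, url) pairs for every item that has an enclosure."""
    results = []
    state = {'item': None, 'in_title': False}

    def start_element(name, attrs):
        if name == 'item':
            state['item'] = {'title': '', 'url': None}
        elif state['item'] is not None:
            if name == 'title':
                state['in_title'] = True
            elif name == 'enclosure':
                # The URL lives in an attribute, not in the element text.
                state['item']['url'] = attrs.get('url')

    def end_element(name):
        if name == 'title':
            state['in_title'] = False
        elif name == 'item':
            item = state['item']
            if item['url']:
                results.append((item['title'].strip(), item['url']))
            state['item'] = None

    def char_data(data):
        # Text can arrive in several chunks, so accumulate it.
        if state['in_title'] and state['item'] is not None:
            state['item']['title'] += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(rss_text, True)
    return results


for title, url in find_enclosures(SAMPLE_RSS):
    print('%s -> %s' % (title, url))
```

The second item has no enclosure, so only the first one is reported.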
I've set up cron to run the script at 5:11pm every day, just before I get home from work for a row before eating (I don't recommend half an hour of rowing on a full stomach).
crontab -e
11 17 * * * /usr/bin/python /home/peter/pypodder.py
Update: I have altered the script above. There are three main changes:
- It now uses wget to do the downloading, as it is more robust than urllib2, which had a tendency to time out.
- It now uses the proper Daily Source Code RSS feed rather than Adam Curry's Weblog, as the latter sometimes got the file names wrong.
- The history of what has been downloaded is now a simple text file, making it easy to delete lines if necessary.
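The text-file history is simple enough to try out by itself. Here is a minimal sketch of the idea, assuming one GUID per line; the `load_history` and `remember` helpers are hypothetical names for illustration, and the demo writes to a temporary directory rather than the real download folder.

```python
import os
import tempfile


def load_history(path):
    """Return the GUIDs already downloaded (empty list if no file yet)."""
    try:
        with open(path) as history_file:
            return [line for line in history_file.read().split('\n') if line]
    except IOError:
        return []


def remember(path, history, guid):
    """Add guid to the history and rewrite the file, one GUID per line."""
    if guid not in history:
        history.append(guid)
    with open(path, 'w') as history_file:
        history_file.write('\n'.join(history))


# Try it in a throwaway directory so no real history is touched.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, '.pypodder.db')

history = load_history(demo_path)
remember(demo_path, history, 'episode-1')
remember(demo_path, history, 'episode-2')
reloaded = load_history(demo_path)
print(reloaded)
```

To make the script fetch an episode again, delete its line from the file in any text editor.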