Generic reposter — automatically post any feed items to any blog
February 02, 2007
I took a break from my Objective-C adventures and solved a problem today that has been bugging me for quite a while now. Why can’t you post everything to everywhere? :)
No, really. Everything.
Huh?
It’s quite handy actually. Consider this: if you’re active on the Internet, you have many feeds going. You have your blog, or maybe several, on WordPress, Vox, TypePad, Blogger, LiveJournal. You have your del.icio.us and ma.gnolia.com bookmarks, Google Reader shared items, Flickr photos, who knows what else.
If you want to follow your friends, you have to collect all of their feeds and keep track of them, and conversely they have to track your own zillion feeds. This is inconvenient: I want one feed per person, with everything aggregated there.
Some tools already exist that can do parts of this. For example, del.icio.us has a handy little thing in the backend that can automatically post a collection of your latest bookmarks to your blog every day. Or you can suck your Flickr photos, del.icio.us links and some other things into your FeedBurner feed using their backend interface.
But these are only bits and pieces. For example, Google Reader is a very good tool for running a shared-links blog, but it can’t be aggregated in FeedBurner. And I don’t think it’s too easy to automatically repost Flickr stuff. If you use FeedBurner, you won’t have a real “web version” of your aggregated content available; it exists only in some obscure feed and is lost in the void as soon as it’s superseded by newer content.
So I thought to myself… what a wonderful world… er, no. But almost. Because on one hand, we have all these nice RSS and Atom feeds that pretty much every serious service can spit out these days. And on the other hand, we have nice APIs for posting to blogs. So if you want to aggregate all your content in one place, maintain a proper unified web archive of it and provide a feed, there must be a way to just grab all these other feeds and set up a meta-blog where they all end up nicely together, re-broadcast for all the world.
And so I wrote a script for myself to do just that.
Yea yea.. but what IS it, then?
Generic reposter is just a little Python script that asks no more of you than a simple configuration file specifying the feeds that you want to aggregate and the meta-blog endpoint to post them to. You then run it however often you want; once a day should be fine for most. It spiders all the feeds, grabs all the new items that it hasn’t seen yet, and then publishes a single post with their titles on the given blog.
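A minimal sketch of that flow, not the actual script. The names `run_once`, `fetch_items` and `publish` are made up for illustration, and feed items are assumed to be plain dicts with `guid`, `title` and `link` keys.

```python
def run_once(feed_urls, fetch_items, seen_guids, publish):
    """Collect unseen items from every feed and publish them as one post.

    fetch_items(url) -> list of dicts with "guid", "title" and "link" keys
    publish(title, body) -> sends one post to the meta-blog
    Both callables are hypothetical and supplied by the caller.
    Returns the number of items that were posted.
    """
    new_items = []
    batch_guids = set()
    for url in feed_urls:
        for item in fetch_items(url):
            guid = item["guid"]
            # Skip items posted on an earlier run or already in this batch.
            if guid in seen_guids or guid in batch_guids:
                continue
            batch_guids.add(guid)
            new_items.append(item)

    if not new_items:
        return 0  # nothing new, so do not publish an empty post

    lines = ["%s: %s" % (item["title"], item["link"]) for item in new_items]
    publish("New items", "\n".join(lines))

    # Remember the GUIDs only after publishing succeeded.
    seen_guids.update(batch_guids)
    return len(new_items)
```

Passing the fetching and posting functions in as arguments keeps the loop itself free of any particular feed library or blog API.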
The advantage of using such a custom script instead of the built-in features of third-party tools is that in the course of reposting, you can re-format your content in whatever way you like. For example, posting del.icio.us links to YouTube.. er.. wait, the other way, YouTube links to del.icio.us (am I the only one getting confused by all this crossposting going on :s ) is otherwise very cool, but if you repost them on your blog, you see only the link and not the actual video. A custom reposter lets you expand the link into the actual inline video.
You can see the Reposter in action on Misc Random, where I’m re-posting my del.icio.us feed and Google Reader shared items. The del.icio.us items are currently missing their descriptions/notes, which I’m hoping to fix like any second now, but on the other hand, the YouTube video links are expanded to play inline, which saves you a precious click to the YouTube site and lets you watch the videos right then and there.
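As an illustration of that kind of expansion, here is a rough sketch that turns a YouTube watch link into embed markup. The function names, the default size and the iframe markup are all assumptions; the post doesn’t say what markup the real script emits.

```python
import re
from urllib.parse import urlparse, parse_qs


def youtube_video_id(url):
    """Return the video id of a youtube.com/watch?v=... URL, else None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host != "youtube.com" or parsed.path != "/watch":
        return None
    ids = parse_qs(parsed.query).get("v")
    if not ids:
        return None
    # Accept only characters that are safe to paste into markup.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", ids[0]):
        return None
    return ids[0]


def youtube_embed(url, width=425, height=350):
    """Return inline-player HTML for a YouTube link, or None otherwise."""
    video_id = youtube_video_id(url)
    if video_id is None:
        return None
    return ('<iframe width="%d" height="%d" '
            'src="https://www.youtube.com/embed/%s"></iframe>'
            % (width, height, video_id))
```

A reposter could call `youtube_embed` on every item link and fall back to a plain link whenever it returns `None`.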
You are a leet haxor! Cool that you can write all this from scratch… I’ll name my kids after you
Oh no, please don’t. I’m a lazy bastard and the first thing I always do when planning things like this is to see if someone else has already done most of the work for me. Which luckily was indeed the case here.
For reading feeds in Python, look no further than feedparser.org. I think it’s (one of) the best Python libraries for reading feeds. Clean, simple yet powerful, works. What more could you ask for?
For posting, I found mtsend.py by Scott Yang, which defines a nice class for posting using the Movable Type/WordPress API. Thanks, Scott.
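I won’t guess at mtsend.py’s own interface here. As a swapped-in illustration, the same kind of post can be sketched with Python’s standard `xmlrpc.client` and the MetaWeblog `metaWeblog.newPost` call, which Movable Type and WordPress both accept. The helper names and the endpoint are hypothetical.

```python
import xmlrpc.client


def new_post_request(blog_id, username, password, title, body, publish=True):
    """Build the XML-RPC request body for metaWeblog.newPost (no network)."""
    content = {"title": title, "description": body}
    params = (blog_id, username, password, content, publish)
    return xmlrpc.client.dumps(params, methodname="metaWeblog.newPost")


def send_post(endpoint, blog_id, username, password, title, body):
    """Send the post to a blog's XML-RPC endpoint and return the new post id.

    `endpoint` is whatever XML-RPC URL your blog exposes; it is an
    assumption of this sketch, not something taken from the script.
    """
    server = xmlrpc.client.ServerProxy(endpoint)
    content = {"title": title, "description": body}
    return server.metaWeblog.newPost(blog_id, username, password, content, True)
```

`new_post_request` is only there to show what goes over the wire; `send_post` is the part a cron job would actually call.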
So all I did was to put two and two together and build the thing that spiders the feeds, keeps track of what it has already posted so as not to post things twice, and then does the posting. The data store uses the simplest method I could think of: a simple Python list of feed item GUIDs kept in a pickled local data file. Should be fine as long as you don’t attempt to run multiple copies of this at the same time. No need for mega-data-multitier-server-SQL-storage.
And then put this thing in crontab on your favourite webhost and done you are.
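A pickled list of GUIDs like that could look roughly like the sketch below. The function names and the temp-file-then-rename step are my own choices for illustration, not details from the script.

```python
import os
import pickle


def load_seen(path):
    """Return the list of GUIDs already posted, or [] on the first run."""
    if not os.path.exists(path):
        return []
    with open(path, "rb") as handle:
        return pickle.load(handle)


def save_seen(path, guids):
    """Write the GUID list to disk, replacing the previous file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as handle:
        pickle.dump(list(guids), handle)
    # Rename last, so a crash mid-write leaves the old file intact.
    os.replace(tmp_path, path)
```

As the text says, this is only safe with a single running copy: two processes loading and saving the same file would overwrite each other’s changes.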
Can I have it? Please? (wasntme)
Not now. The code is buggy and ugly as fuck and I’d just make an arse out of myself if I published it in its current state. I’ll clean it up, make configuring it really work (which it doesn’t) and perhaps then.
TODO
Here are some little things that I’ll do with this in the short run.
- fix code so that I wouldn’t be ashamed of publishing it, and then publish it
- make (X)HTML valid. Primarily this means entity escaping (& -> &amp; etc).
- del.icio.us – add description and (maybe) tag support, description/notes is more interesting
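For the escaping item above, Python’s standard `html.escape` covers the basics. The helper name `link_html` below is made up for illustration.

```python
from html import escape


def link_html(url, title):
    """Return an anchor tag with both the URL and the title entity-escaped."""
    return '<a href="%s">%s</a>' % (
        escape(url, quote=True),     # attribute value: also escape quotes
        escape(title, quote=False),  # text content: &, < and > are enough
    )
```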
Let me know if you find this interesting, or if you’d like to get it and maybe chime in on the development.