May 5, 2009

rss/atom feed parsing/generating

I'm working on a little personal webapp that deals with RSS and ATOM feeds and thought I'd share my decision for the library I will be using.

First, let's talk about a couple main requirements.

I need to be able to parse both feed types and store them in a format-agnostic state, so that I can later merge entries from feeds of different types.
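As a rough sketch of what an "agnostic state" could look like, the Java below keeps only the fields an RSS 2.0 item and an Atom 1.0 entry have in common, then merges two feeds newest-first. The names `FeedMerge`, `Entry`, and `merge` are made up for illustration and are not part of any library mentioned in this post; a real model would carry more fields (author, summary, id).

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class FeedMerge {
    /** Hypothetical format-neutral entry: fields shared by RSS items and Atom entries. */
    static final class Entry {
        final String title;
        final String link;
        final Instant published;
        final String sourceFormat; // e.g. "rss_2.0" or "atom_1.0"; informational only

        Entry(String title, String link, Instant published, String sourceFormat) {
            this.title = title;
            this.link = link;
            this.published = published;
            this.sourceFormat = sourceFormat;
        }
    }

    /** Combine entries from two feeds of any type, sorted newest first. */
    static List<Entry> merge(List<Entry> a, List<Entry> b) {
        List<Entry> all = new ArrayList<>(a);
        all.addAll(b);
        all.sort(Comparator.comparing((Entry e) -> e.published).reversed());
        return all;
    }

    public static void main(String[] args) {
        List<Entry> rss = List.of(new Entry("From RSS", "http://example.com/rss/1",
                Instant.parse("2009-05-01T10:00:00Z"), "rss_2.0"));
        List<Entry> atom = List.of(new Entry("From Atom", "http://example.com/atom/1",
                Instant.parse("2009-05-04T10:00:00Z"), "atom_1.0"));
        for (Entry e : merge(rss, atom)) {
            System.out.println(e.published + "  " + e.title + "  (" + e.sourceFormat + ")");
        }
    }
}
```

Once everything is reduced to the same `Entry` shape, merging is just a sort, which is the main reason to want the abstraction in the first place.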

I'll be dealing with a lot of feeds, so naturally, performance needs to be there. This includes support for some type of caching or intelligent updater.
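One common reading of "intelligent updater" is HTTP conditional GET: remember the `ETag` and `Last-Modified` values a server sent, send them back on the next poll, and skip parsing when the server answers 304. The sketch below assumes that reading and a modern JDK (`java.net.http`, Java 11+). The class and method names (`ConditionalFetch`, `remember`, `requestFor`, `hasNewContent`) are hypothetical, and this is not a description of how any particular library does it.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.HashMap;
import java.util.Map;

class ConditionalFetch {
    /** Validators saved from the last successful fetch of one feed. */
    static final class Validators {
        final String etag;         // may be null if the server sent none
        final String lastModified; // may be null if the server sent none

        Validators(String etag, String lastModified) {
            this.etag = etag;
            this.lastModified = lastModified;
        }
    }

    private final Map<String, Validators> seen = new HashMap<>();

    /** Record the validators returned with a 200 response for this feed URL. */
    void remember(String url, String etag, String lastModified) {
        seen.put(url, new Validators(etag, lastModified));
    }

    /** Build a GET request, adding conditional headers when validators are known. */
    HttpRequest requestFor(String url) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url)).GET();
        Validators v = seen.get(url);
        if (v != null) {
            if (v.etag != null) builder.header("If-None-Match", v.etag);
            if (v.lastModified != null) builder.header("If-Modified-Since", v.lastModified);
        }
        return builder.build();
    }

    /** Only a 200 carries a fresh body; a 304 means the cached copy is still good. */
    static boolean hasNewContent(int statusCode) {
        return statusCode == 200;
    }

    public static void main(String[] args) {
        ConditionalFetch cache = new ConditionalFetch();
        String url = "http://example.com/feed.xml"; // placeholder URL
        cache.remember(url, "\"abc123\"", "Tue, 05 May 2009 10:00:00 GMT");
        System.out.println(cache.requestFor(url).headers().map());
    }
}
```

With a lot of feeds, most polls should come back as 304 with an empty body, so the parser only runs for feeds that actually changed.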

So, really I wanted a library that:

- abstracted RSS 2.0/ATOM 1.0 details (a bonus if it did other versions)
- was reasonably fast
- had some type of caching or intelligent updater
- parsed robustly and liberally, and generated conservatively
- was well documented

Initially, I looked around for ruby libraries and found some decent stuff.

feedtools was the first gem I looked at and it did fairly well. It abstracted RSS & ATOM and had caching. Unfortunately, it uses REXML, which I hear is up to 60 times slower than libxml these days. Worse, the project isn't being maintained anymore. It was passed on to someone else, but he seems to have left it too and started rAtom.
I also read good things about feedzirra. It boasts blazing performance, which is always cool, but I ended up finding another solution before trying it.

It was after feedtools' failure that I decided to expand my search beyond just Ruby. I came across some candidates in Python and PHP, but the one that caught my eye and seems to pass preliminary proofs of concept is Java's Rome.

I was pretty sure that school would be the last time I used Java, but it turns out I was wrong. And happily wrong.

Rome is mature. Very mature. It supports the earliest versions of RSS and ATOM, in a beautifully abstract way; performance-wise it's faster than REXML (not saying much); it has an intelligent updater module; it seems robust in preliminary tests, though I'm still investigating; and lastly, those Java folks like their documentation. It is kinda bland to read, but whatever. Extra bonus: it recently hit 1.0 (March 2009), which suggests it is still under active development and has a community around it.

I actually thought this would be a good opportunity to start trying out JRuby on Rails, but I'm hitting a few kinks with Rails 2.3.2 and my JRuby 1.1.6. I think I need to upgrade, but after looking at the potential headaches in deployment, I will leave this battle for another day. Right now, I just want to get moving on my project.

So, hopefully Rome turns out to be as good as I think it is. Interested to hear thoughts/comments!
