I've been working on a little Syndication Feed Reader in Python that I am calling Schwag (don't ask, don't tell), although the name is more of a placeholder for some other really cool name ™. It is less of a Reader, really, and more of a Normalizer/Converter/Aggregator with some light reading facilities. It uses Mark Pilgrim's Ultra Liberal Feed Parser to support all flavors of RSS / RDF / Atom feeds (even the bad ones). The feeds are normalized into Atom 0.3 format and dumped to disk. A light templating system lets you apply XSLT [1] transformations to the normalized feed to produce different representations (e.g. RSS 0.9x, RSS 1.0 / RDF, RSS 2.0, XHTML, etc.). So the basic idea is to have a feeder component that handles retrieval and normalization to a common format, and a templating component that provides pluggable representations of the normalized feed. This may sound semi-complex, but the code is fairly simple because the feed parsing and the XSLT machinery are handled by Mark's piece and libxml/libxslt, respectively. My code just kind of introduces the two to each other.
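Schwag's own normalization code isn't reproduced here, so what follows is only a sketch of the idea under stated assumptions: it uses Python's standard-library ElementTree instead of libxml, and it assumes the parsed feed arrives as a plain dict loosely shaped like Feed Parser output (a `feed` dict plus an `entries` list of string-valued dicts). Real parser output carries more fields than this. The helper names (`normalize_to_atom03`, `_text`, `_link`) are hypothetical. The point is simply that whatever flavor comes in, one Atom 0.3 `<feed>` comes out.

```python
import xml.etree.ElementTree as ET

# Atom 0.3 namespace (the pre-RFC 4287 draft format).
ATOM03_NS = "http://purl.org/atom/ns#"
# Serialize Atom elements with a default (unprefixed) namespace.
ET.register_namespace("", ATOM03_NS)


def _q(tag):
    """Qualify a local tag name with the Atom 0.3 namespace."""
    return "{%s}%s" % (ATOM03_NS, tag)


def _text(parent, tag, value):
    """Append <tag>value</tag> to parent, skipping empty values."""
    if value:
        ET.SubElement(parent, _q(tag)).text = value


def _link(parent, href):
    """Append an Atom 0.3 alternate link, skipping empty hrefs."""
    if href:
        ET.SubElement(parent, _q("link"),
                      {"rel": "alternate", "type": "text/html", "href": href})


def normalize_to_atom03(parsed):
    """Build an Atom 0.3 <feed> element from a feedparser-style dict.

    `parsed` is assumed to look like {"feed": {...}, "entries": [...]}
    with plain string values (a hypothetical, simplified shape).
    """
    info = parsed.get("feed", {})
    feed = ET.Element(_q("feed"), {"version": "0.3"})
    _text(feed, "title", info.get("title"))
    _link(feed, info.get("link"))
    _text(feed, "modified", info.get("modified"))
    for item in parsed.get("entries", []):
        entry = ET.SubElement(feed, _q("entry"))
        _text(entry, "title", item.get("title"))
        _link(entry, item.get("link"))
        # Fall back to the link when the source format has no stable id.
        _text(entry, "id", item.get("id") or item.get("link"))
        # Fall back to the modification date when no issue date is given.
        _text(entry, "issued", item.get("issued") or item.get("modified"))
        _text(entry, "modified", item.get("modified"))
        _text(entry, "summary", item.get("summary"))
    return feed
```

Serializing with `ET.tostring(normalize_to_atom03(parsed), encoding="unicode")` gives the document that would be written to disk and handed to the XSLT templates.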

The whole feed normalization / transformation thing is cool in and of itself but nothing special, really. The ideas here have been talked about before (although the specifics are a little different). Where it gets interesting, IMO, is when you get into organization and aggregation. I've decided to maintain the list of source feeds as an XBEL [2] document. This is kind of a bastardization of the format, but oh well, it is actually perfect for what I need. XBEL is a simple little XML vocabulary for describing browser bookmarks. It has the concept of Folders, Bookmarks, and Aliases. You see where this is going, right? You organize feeds into Folders and can also use Aliases to link a feed into multiple Folders. Good? OK.

I am using the Folder concept for more than just simple organization, however, and this is where I think I may be on to something half-way useful. Put simply, folders provide aggregation points: the feeder aggregates the entries of all feeds in a folder into an “Index Feed”. To push this concept a little further, the folder aggregation is performed recursively. An Index Feed contains entries from feeds in its immediate folder as well as from feeds in all descendant (in the XPath sense) folders. One result of this is that the Index Feed for the root folder is an aggregate of all available feeds. You can then drill down into subfolders to limit/filter the aggregation.
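The recursive folder aggregation can be sketched with ElementTree as well. This is a hypothetical illustration, not Schwag's code: it assumes bookmarks carry `id` attributes, that aliases point only at bookmarks, and that each entry has a `modified` timestamp in one uniform ISO 8601 UTC format, so string order equals time order. `feeds_in_folder`, `index_feeds`, and `merge_entries` are made-up names.

```python
import xml.etree.ElementTree as ET


def feeds_in_folder(folder, bookmarks_by_id):
    """Return the feed URLs in `folder` and all of its descendant folders.

    Bookmarks contribute their href; <alias ref="..."/> elements are
    resolved against bookmarks_by_id so a feed can live in several folders.
    """
    urls = []
    for child in folder:
        if child.tag == "bookmark":
            urls.append(child.get("href"))
        elif child.tag == "alias":
            target = bookmarks_by_id.get(child.get("ref"))
            if target is not None:
                urls.append(target.get("href"))
        elif child.tag == "folder":
            # Recurse: a folder's Index Feed includes every descendant feed.
            urls.extend(feeds_in_folder(child, bookmarks_by_id))
    # Drop duplicates (e.g. a feed aliased below its own folder), keep order.
    return list(dict.fromkeys(urls))


def index_feeds(xbel_text):
    """Map each folder's title path to the feeds its Index Feed aggregates."""
    root = ET.fromstring(xbel_text)
    by_id = {b.get("id"): b for b in root.iter("bookmark") if b.get("id")}
    result = {(): feeds_in_folder(root, by_id)}  # () is the root folder

    def walk(folder, path):
        for sub in folder.findall("folder"):
            sub_path = path + (sub.findtext("title", default=""),)
            result[sub_path] = feeds_in_folder(sub, by_id)
            walk(sub, sub_path)

    walk(root, ())
    return result


def merge_entries(entries_by_feed, urls):
    """Combine the entries of several feeds into one list, newest first.

    Assumes every "modified" value uses the same ISO 8601 UTC format
    (e.g. 2004-01-31T12:00:00Z), so string comparison is chronological.
    """
    merged = [e for url in urls for e in entries_by_feed.get(url, [])]
    return sorted(merged, key=lambda e: e.get("modified", ""), reverse=True)
```

With this layout, `index_feeds(...)[()]` lists every feed in the document (the root Index Feed), while a path such as `("Python", "XML")` narrows it to that subtree; `merge_entries` then turns a URL list into the Index Feed's entry list.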

Along with normalized Atom feed generation, the feeder component dumps out an XBEL file in each generated directory containing the XBEL fragment for the corresponding folder. You can apply XSLT transformations to these as well; I'm currently using this to generate OPML [3] and XHTML representations of the folder index. So the idea of treating each Folder as a partitioning device carries over here too.
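Schwag does this step with XSLT; the sketch below swaps in plain Python (ElementTree) purely to show the mapping from an XBEL folder fragment to OPML. It assumes the widespread subscription-list convention of `type="rss"` and `xmlUrl` attributes on OPML `<outline>` elements, maps nested folders to nested outlines, and resolves aliases only to bookmarks inside the same fragment. `xbel_to_opml` and `_outline` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def _outline(parent, node, by_id):
    """Translate one XBEL node into OPML <outline> elements under parent."""
    if node.tag == "alias":
        # Aliases resolve only to bookmarks in this fragment; others are skipped.
        node = by_id.get(node.get("ref"))
        if node is None:
            return
    title = node.findtext("title", default="")
    if node.tag == "bookmark":
        ET.SubElement(parent, "outline",
                      {"text": title, "type": "rss", "xmlUrl": node.get("href", "")})
    elif node.tag == "folder":
        group = ET.SubElement(parent, "outline", {"text": title})
        for child in node:
            _outline(group, child, by_id)
    # Other XBEL children (title, desc, info, ...) produce no outline.


def xbel_to_opml(xbel_text):
    """Convert an XBEL folder fragment into an OPML 1.0 document string."""
    root = ET.fromstring(xbel_text)
    # Index bookmarks only, so an alias can never recurse into a folder cycle.
    by_id = {b.get("id"): b for b in root.iter("bookmark") if b.get("id")}
    opml = ET.Element("opml", {"version": "1.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = root.findtext("title", default="")
    body = ET.SubElement(opml, "body")
    for child in root:
        _outline(body, child, by_id)
    return ET.tostring(opml, encoding="unicode")
```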

At the end of the day, you end up with a system that takes an XBEL document as input and produces a directory structure containing normalized / converted feeds as well as aggregated index feeds. These are all plain files and directories, so exposing them via your favorite web server is straightforward. I've written XSLT for RSS 0.91, RSS 1.0/RDF, and XHTML on the feed side and for OPML and XHTML on the index side. Plugging in new representations for either side is pretty trivial: all you need is an XSLT that takes an Atom 0.3 document (for feeds) or an XBEL folder fragment (for indexes) as its source.
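A rough sketch of the directory-building side, again hypothetical rather than Schwag's actual layout: it mirrors the folder hierarchy as directories named by a slug of each folder's title and writes that folder's XBEL fragment to an `index.xbel` file (an assumed file name) in each one. The normalized feeds and Index Feeds would be written into the same directories. Sibling folders whose titles slug to the same name would collide, so a real version would need to disambiguate them.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def slug(title):
    """Turn a folder title into a filesystem-friendly directory name."""
    name = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return name or "untitled"


def write_folder_tree(folder, out_dir):
    """Mirror the XBEL folder hierarchy as directories under out_dir.

    Each directory receives an index.xbel (hypothetical name) holding
    that folder's fragment; feed files would be written alongside it.
    Returns the list of directories written, parents before children.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    ET.ElementTree(folder).write(str(out_dir / "index.xbel"),
                                 encoding="utf-8", xml_declaration=True)
    written = [out_dir]
    for sub in folder.findall("folder"):
        sub_dir = out_dir / slug(sub.findtext("title", default=""))
        written.extend(write_folder_tree(sub, sub_dir))
    return written
```

Usage would be along the lines of `write_folder_tree(ET.parse("feeds.xbel").getroot(), "htdocs/feeds")`, after which the output directory can be served as-is.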

This is all very experimental right now, and while I'm using the system for day-to-day reading, I'm also breaking stuff and performing major restructuring of code and concepts very often. The system is definitely not without its problems. I plan on blogging the success and failure of various approaches in moderate detail over the next couple of months. I haven't even put a distribution together yet, but please feel free to browse the sources or grab a tarball if you're interested in really early, often broken applications. I will have to find a home for the project eventually (I'm trying to avoid SourceForge if possible), as all of this is hosted off a P300 sitting in my living room with only a humble Road Runner pipe. I imagine I will get more serious about this when I think of a name or the code starts stabilizing, whichever comes first. In the meantime, please leave comments or shoot me an email if you're interested.

[1] XSLT : http://www.w3.org/TR/xslt
[2] XBEL : http://pyxml.sourceforge.net/topics/xbel/
[3] OPML : http://opml.scripting.com/


Discuss

  1. I'm pretty interested in Schwag's features but can't access your CVS website.

    Ismael Olea on Tuesday, January 09, 2007 at 04:39 PM
