XML Pull-chaining with Python

So this is pretty crazy. I’m messing around with ElementTree (which has been nothing less than perfect) and trying to get it to act like a xml.dom.pulldom/XmlTextReader style pull-parser. But I’d like to be able to assemble a chain of generator producing/consuming functions (or other callable) so that the file can be read, parsed, filtered/mutated, encoded, and written all incrementally.

Check it out:

import sys
import pulltree    # that's what I'm working on :)

def upper_filter(source):
    for (ev, item) in source:
        if ev == pulltree.CHARACTERS:
            item = item.upper()
        yield (ev, item)

reader = pulltree.reader(sys.stdin)
filter = upper_filter(reader)
writer = pulltree.writer(filter, sys.stdout)

for (ev, item) in writer:
    pass

C-z

$ echo "<hello>world</hello>" | python test_filter.py 
<hello>WORLD</hello>

That felt good. More functional than a chain of SAX XMLFilters, almost as efficient, and muuuuch perdier.

Something like this might work someday soon:

import urllib2
from pulltree

XINCLUDE = '{http://www.w3.org/2001/XInclude}include'

def xinclude_filter(source):
    events = iter(source)
    for (event, item) in events:
        if event == pulltree.START_ELEMENT and elm.tag == XINCLUDE:
           href = item.attrib['href']
           for woot in pulltree.reader(urllib2.urlopen(href))
               yield woot
           pulltree.eat(elm, events) # eat events to the end of the element
        yield (ev, elm)

Granted, that’s as basic an XInclude processor could be and still be useful but you get the point.