So this is pretty crazy. I'm messing around with ElementTree (which has been nothing less than perfect) and trying to get it to act like an xml.dom.pulldom/XmlTextReader-style pull parser. But I'd also like to be able to assemble a chain of generator-producing/consuming functions (or other callables) so that the file can be read, parsed, filtered/mutated, encoded, and written, all incrementally.

Check it out:

import sys
import pulltree    # that's what I'm working on :)

def upper_filter(source):
    for (ev, item) in source:
        if ev == pulltree.CHARACTERS:
            item = item.upper()
        yield (ev, item)

reader = pulltree.reader(sys.stdin)
filter = upper_filter(reader)
writer = pulltree.writer(filter, sys.stdout)

for (ev, item) in writer:
    pass

C-z

$ echo "<hello>world</hello>" | python test_filter.py
<hello>WORLD</hello>

That felt good. More functional than a chain of SAX XMLFilters, almost as efficient, and muuuuch perdier.
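Since pulltree isn't something you can import yet, here is a rough sketch of the same reader/filter/writer chain using the standard library's xml.dom.pulldom as a stand-in. The `reader`, `writer`, and `run` names are assumptions made up for this sketch, not pulltree's API, and the writer ignores attributes and namespaces to keep things short.

```python
import io
from xml.dom import pulldom
from xml.sax.saxutils import escape

def reader(stream):
    """Yield (event, node) pairs from a file-like object, one at a time."""
    for event, node in pulldom.parse(stream):
        yield event, node

def upper_filter(source):
    """Upper-case character data and pass everything else through."""
    for event, node in source:
        if event == pulldom.CHARACTERS:
            node.data = node.data.upper()
        yield event, node

def writer(source, out):
    """Serialize events to `out` as they arrive, then pass them along.

    Simplification for this sketch: attributes and namespace
    declarations are not written.
    """
    for event, node in source:
        if event == pulldom.START_ELEMENT:
            out.write("<%s>" % node.tagName)
        elif event == pulldom.END_ELEMENT:
            out.write("</%s>" % node.tagName)
        elif event == pulldom.CHARACTERS:
            out.write(escape(node.data))
        yield event, node

def run(xml_text):
    """Hypothetical helper: push a string through the whole chain."""
    out = io.StringIO()
    for _ in writer(upper_filter(reader(io.StringIO(xml_text))), out):
        pass  # nothing to do; draining the generator drives the pipeline
    return out.getvalue()

print(run("<hello>world</hello>"))
```

Nothing happens until the last generator in the chain is drained, which is why the empty `for` loop is there: each event is read, filtered, and written before the next one is pulled.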

Something like this might work someday soon:

import urllib2
import pulltree

XINCLUDE = '{http://www.w3.org/2001/XInclude}include'

def xinclude_filter(source):
    events = iter(source)
    for (event, item) in events:
        if event == pulltree.START_ELEMENT and item.tag == XINCLUDE:
            href = item.attrib['href']
            # splice the included document's events into the stream
            for woot in pulltree.reader(urllib2.urlopen(href)):
                yield woot
            pulltree.eat(item, events) # eat events to the end of the element
        else:
            yield (event, item)

Granted, that's about as basic as an XInclude processor could be and still be useful, but you get the point.
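The "eat events to the end of the element" step is the only fiddly part, so here is a minimal sketch of it, again on top of xml.dom.pulldom rather than pulltree. The `eat` helper here is a guess at what such a function might do (count nesting depth until the matching end tag), and `load` is a hypothetical callable that maps an href to XML text, standing in for the URL fetch.

```python
from xml.dom import pulldom

XINCLUDE_NS = "http://www.w3.org/2001/XInclude"

def eat(events):
    """Consume events up to and including the end of the current element."""
    depth = 1  # we are already inside the element whose start we just saw
    for event, node in events:
        if event == pulldom.START_ELEMENT:
            depth += 1
        elif event == pulldom.END_ELEMENT:
            depth -= 1
            if depth == 0:
                return

def xinclude_filter(source, load):
    """Replace each xi:include element with the events of the document it names.

    `load` is an assumption for this sketch: any callable taking an href
    and returning the included document as a string.
    """
    events = iter(source)
    for event, node in events:
        is_include = (event == pulldom.START_ELEMENT
                      and node.namespaceURI == XINCLUDE_NS
                      and node.localName == "include")
        if is_include:
            included = pulldom.parseString(load(node.getAttribute("href")))
            for inc_event, inc_node in included:
                # drop the included document's own start/end markers
                if inc_event not in (pulldom.START_DOCUMENT,
                                     pulldom.END_DOCUMENT):
                    yield inc_event, inc_node
            eat(events)  # skip the rest of the xi:include element
        else:
            yield event, node

docs = {"part.xml": "<b>hi</b>"}
main = ('<a xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<xi:include href="part.xml"/></a>')
started = [node.tagName
           for event, node in xinclude_filter(pulldom.parseString(main),
                                              docs.__getitem__)
           if event == pulldom.START_ELEMENT]
print(started)
```

Because `eat` and the filter's `for` loop share one iterator, whatever `eat` consumes is simply never seen by the loop, so nothing inside the include element leaks downstream.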

