Stream Peekaboo With Python

The Python standard library provides a reasonably adequate module for reading delimited data streams, and there are modules available for reading everything from XLS and DIF documents to MARC data. One deficiency of many of these modules is their inability to deal gracefully with messy data; in the real world data is never clean, never correctly structured, and you are lucky if it is accurate even on the rare occasion that it is correctly encoded.

For example, when Python's CSV reader meets a garbled line in a file it throws an exception and stops, and you're done. The exception does not tell you which record could not be parsed; all you have is a traceback. Perhaps you can look at the last record in the output and guess that the error lies one record beyond that... maybe.

Fortunately, most of these modules work with file-like objects. As long as the object they receive properly implements iteration, they will work. Using this strength it is possible to implement a Peekaboo on the input stream, which lets us see the unit of work currently being processed, or even pre-mangle that chunk before the parser receives it.
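The idea can be sketched roughly as follows. This is only a minimal illustration, not a finished implementation: the class name Peekaboo, its attributes current and line_number, the optional mangle callback, and the helper read_rows are all names invented here. The reader is created with strict=True so that a badly quoted field raises csv.Error.

```python
import csv
import io


class Peekaboo:
    """Wrap any iterable of lines and remember the most recent one.

    Hypothetical helper: 'current', 'line_number' and 'mangle' are
    names chosen for this sketch, not part of any library.
    """

    def __init__(self, stream, mangle=None):
        self._stream = iter(stream)
        self._mangle = mangle      # optional callable: line -> line
        self.current = None        # last line handed to the consumer
        self.line_number = 0       # count of lines handed out so far

    def __iter__(self):
        return self

    def __next__(self):
        line = next(self._stream)  # StopIteration propagates at end of input
        self.line_number += 1
        if self._mangle is not None:
            line = self._mangle(line)  # pre-mangle the chunk if asked to
        self.current = line
        return line


def read_rows(stream, mangle=None):
    """Parse CSV rows; on failure return the rows so far plus the culprit."""
    peek = Peekaboo(stream, mangle)
    reader = csv.reader(peek, strict=True)  # strict: bad quoting raises csv.Error
    rows = []
    try:
        for row in reader:
            rows.append(row)
    except csv.Error as exc:
        # The wrapper still holds the line the parser choked on.
        return rows, (peek.line_number, peek.current, str(exc))
    return rows, None


if __name__ == "__main__":
    data = io.StringIO('1,"ok",x\n2,"b"c,y\n3,"fine",z\n')
    rows, problem = read_rows(data)
    print(rows)
    print(problem)
```

Because the parser only ever sees the wrapper, the same approach should work with any module that just iterates over its input. One caveat: when a quoted field spans several physical lines, current holds only the last line read, not the whole record.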

BIE is dead and gone.

Sadly, it appears that the BIE project is now dead and gone. The SourceForge project has been deleted, and so the mailing list is gone with it. The BIE GPL site is also gone. So it appears the slate has been swept clean and Open Source has lost a truly one-of-a-kind solution. Shame upon the new owner, or un-owner, of the BIE code for this train wreck.

UPDATE: There is now OIE! The OpenGroupware Integration Engine can run many BIE BPML routes with drag-n-drop compatibility. And Python instead of Java.

