
originally: http://www.oclc.org/research/software/oai/2page.htm

2PageOAI

"Amazing! Simply, bloody amazing!!" –Art Rhyno, University of Windsor

These Python scripts demonstrate how short a compliant OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) repository and harvester can be. The files are listed under Distribution below.

These programs have been tested with Python 2.2.2 and 2.2.3, but should work with any Python 2.2 or later. They are completely self-contained: they use only the XML and HTTP libraries included in the standard Python distribution, so no additional libraries are needed. The Web server used by the repository is built in.
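
As a rough illustration of what "built-in" can mean, the sketch below serves a single OAI-PMH-style endpoint using only the standard library. It is not the 2PageOAI code: it uses the Python 3 module names rather than the Python 2.2 ones, the names `respond`, `Handler`, and `serve` are hypothetical, and the returned XML is only a fragment, not a complete OAI-PMH response.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from xml.sax.saxutils import escape

# The six request verbs defined by OAI-PMH 2.0.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def respond(query):
    """Return an XML fragment for a parsed query dict (illustrative only)."""
    verbs = query.get("verb", [])
    # A missing, repeated, or unknown verb maps to the badVerb error code.
    if len(verbs) != 1 or verbs[0] not in VERBS:
        return '<error code="badVerb">Illegal or missing verb</error>'
    # Placeholder: a real repository would build the full response here.
    return "<!-- %s would be handled here -->" % escape(verbs[0])


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = respond(parse_qs(urlparse(self.path).query)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the server until interrupted; the port is an arbitrary choice."""
    HTTPServer(("localhost", port), Handler).serve_forever()
```

Calling `serve()` and requesting `http://localhost:8000/?verb=Bogus` would return the `badVerb` fragment. Keeping the response logic in a plain function, separate from the handler, makes it testable without opening a socket.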

The primary limitations are that only oai_dc records are supported and that the whole database is parsed at initialization. The header information for each record is kept in a list, with each entry occupying approximately 1 KB. Reading the database this way takes some time and precludes dynamic changes to the records. As far as we know, the code is completely compliant with OAI-PMH (approximately 19 of the program's 106 lines are dedicated to catching OAI error conditions). Resumption tokens are stateless.
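
One way to make resumption tokens stateless is to encode everything needed to continue a list request in the token itself, so the server remembers nothing between requests. The sketch below assumes an in-memory list of headers like the one described above; the token format, the page size, and the function names are hypothetical and may differ from what 2PageOAI actually does.

```python
from urllib.parse import parse_qs, urlencode

PAGE_SIZE = 2  # hypothetical; kept small for illustration


def encode_token(offset, prefix):
    """Pack the resume position and metadata prefix into the token."""
    return urlencode({"offset": offset, "metadataPrefix": prefix})


def decode_token(token):
    """Recover (offset, prefix); raise ValueError for a malformed token."""
    fields = parse_qs(token, strict_parsing=True)
    try:
        return int(fields["offset"][0]), fields["metadataPrefix"][0]
    except KeyError:
        raise ValueError("token is missing a required field")


def list_page(headers, prefix="oai_dc", token=None):
    """Return one page of headers plus the token for the next page, if any."""
    offset = 0
    if token is not None:
        offset, prefix = decode_token(token)
    page = headers[offset:offset + PAGE_SIZE]
    next_offset = offset + PAGE_SIZE
    # No token is issued once the list is exhausted.
    if next_offset < len(headers):
        return page, encode_token(next_offset, prefix)
    return page, None
```

A harvester would pass each returned token back unchanged until it receives none. In a repository, a `ValueError` from `decode_token` would naturally map to the protocol's `badResumptionToken` error.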

Some of the coding methods used in these programs might qualify as 'clever programming,' something generally avoided in Python as a matter of style. At any rate, these programs are not presented as exemplars of good Python style -- a number of tricks are used to keep them short. As formatted here, the repository prints in two pages from my XEmacs without any lines wrapping, other than the long XML and copyright strings at the end of the program. In defense of this style, we have found that debugging a two-page program is very easy.

License

Distribution

View: Readme      
Download: Harvester Repository    

Contact

Please forward comments and suggestions to Thom Hickey (hickey@oclc.org).