This is still an incomplete draft version of this work plan.
Our goal is to develop the workflow to move digital image data and associated metadata from Minnesota into the HathiTrust, demonstrate that workflow by moving a defined set of images into HathiTrust, and work with HathiTrust to define the appropriate display for these images in that system. We have until the end of November to accomplish this goal, with December to report on progress.
Accomplishing this will require attention to five stages of processing: extracting master images and metadata from current repositories, wrapping these binaries and associated metadata into appropriate packages, transferring these packages to HathiTrust, seeing that these packages are ingested by HathiTrust, and providing for display of these images at HathiTrust and retrieval of the masters via API calls.
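The wrapping stage can be illustrated with a minimal sketch. It assumes a package is a zip file holding one master image, its metadata, and a checksum manifest. The function names, file names, and JSON layout here are hypothetical, chosen only for illustration. The actual package format will be set by the specifications we develop with HathiTrust.

```python
import hashlib
import io
import json
import zipfile


def wrap_package(item_id, image_bytes, metadata):
    """Bundle one master image and its metadata into an in-memory zip package.

    The layout (master.tif, metadata.json, manifest.json) is an assumption
    for illustration, not a HathiTrust requirement.
    """
    checksum = hashlib.sha256(image_bytes).hexdigest()
    manifest = {"id": item_id, "files": {"master.tif": checksum}}
    buffer = io.BytesIO()
    # Store without compression so the master image bytes are kept as-is.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as package:
        package.writestr("master.tif", image_bytes)
        package.writestr("metadata.json", json.dumps(metadata, sort_keys=True))
        package.writestr("manifest.json", json.dumps(manifest, sort_keys=True))
    return buffer.getvalue()


def verify_package(package_bytes):
    """Recompute the checksum of each file listed in the manifest."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        manifest = json.loads(package.read("manifest.json"))
        return all(
            hashlib.sha256(package.read(name)).hexdigest() == expected
            for name, expected in manifest["files"].items()
        )
```

A check like `verify_package` could run before transfer and again after receipt, so that a damaged package is caught before ingest.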
We plan to address a variety of content types, from simple continuous tone images, to compound objects made up of a series of images in a certain structural relationship, to images containing text and associated optical character recognition (OCR) derived text, to structurally complex documents with associated text, like newspapers. Demonstration content would include the roughly 50,000 MDL Reflections images, a 20,000 image subset of the MHS collection management system, and the images from one newspaper prepared by the MHS for the National Digital Newspaper Program (NDNP). We expect a total of between 50,000 and 100,000 images transferred.
After a brief start-up phase during September, this plan calls for us to enter an iterative development and testing phase through November, to be followed by a documentation and reporting phase in December.
Start Up (September)
The PM will complete the work plan and requirements for the project. The sponsors will meet to review the work plan, and the requirements will be shared with sponsors for feedback, though we likely won't have time for another meeting. Note that the sponsors include representation from the HathiTrust, so their input will be included in the work plan and requirements.
The PM will work with the coordinator and staff of MHS to assemble an accurate count of the items we intend to attempt preserving.
Although we won't yet have specifications for metadata wrapping and ingest to HT during start up, our software developer will begin work on the extraction stage for content originating in MDL Reflections.
An initial set of specifications regarding simple continuous tone images will mark the conclusion of the start up phase.
Development and testing (October/November)
Development will address each type of content in sequence, in order of increasing difficulty. This means that if there is slippage, the more difficult content types will not be addressed or at least not be fully addressed. These content types will be:
- simple continuous tone images from MDL Reflections (through October 22)
- compound textual objects from MDL Reflections (through November 5)
- simple continuous tone images from MHS (through November 20)
- a newspaper from MHS already prepped to NDNP standards (through November 30)
Specifications for a given content type will be determined before we start the development phase for that content type. Once the specifications are ready, the software developer will work on the extraction, wrapping, and transfer stages of the workflow. Meanwhile we expect HT will consider the ingest and display implications of that content type, working with the PM and the software developer to prepare for ingestion. We expect the actual process to be quite fluid, with the software developer creating and testing workflow scenarios that meet our specifications many times during each cycle. A final data transfer and ingest of each content type will be done at the tail end of each content type period.
The PM will work with HT on specifications for the content type "on deck" while development proceeds on the current content type.
The software developer, coordinator, and PM will meet weekly during the development and testing phase. We will pull HT staff and the digital preservation consultant into the conversation as needed, likely every other week. The sponsors will have access to our project workspace and we will schedule meetings at the end of each month to bring everyone up to date on status and answer questions.
Documentation (December)
During the first two weeks of December the PM and digital preservation consultant will draft a report summarizing the technological and organizational lessons of the prototype. We will describe what we have accomplished, where we exceeded or fell short of expectations, and how this model of preservation compares economically to other options available to the MDL. The report will also include recommendations for next steps.
The draft will be available for review on 12/13 and feedback will be used to generate a final version by 12/20.
The PM will be available for consultation on next steps at the beginning of January 2011.