Wiki Archive 🙊
MDL ImagePreservationDraft

Image Preservation Draft

This draft report has been developed as part of the preservation collaborative project of the Minnesota Digital Library. This version of the report will be reviewed by a small set of stakeholders, then revised for a large group of stakeholders, and finally revised with their input for the use of the MDL in further planning.

The 6 July 2010 draft is very rough and already got some thorough critique.

If you have Apple's Pages program, use this version: MDLp-needs-draft.pages.

Otherwise I recommend you use the PDF version: MDLp-needs-draft.pdf.

The 13 July 2010 draft is almost ready to go. Please whack away!

If you have Apple's Pages program, use this version: MDLp-needs-draft2.pages.

Otherwise I recommend you use the PDF version: MDLp-needs-draft2.pdf.

The 16 July 2010 draft shared for our meeting.

Only here in the PDF version: MDLp-needs-100716.pdf.

The 28 July 2010 draft, incorporating meeting feedback.

If you have Apple's Pages program, use this version: MDLp-needs-draft3.pages.

Otherwise I recommend you use the PDF version: MDLp-needs-draft3.pdf.

Feedback from 23 July 2010 meeting

Notes from the feedback session are in RTF format: notes-100723.rtf.

The final report was released on 4 August 2010.

If you have Apple's Pages program, use this version: MDLp-needs-100804.pages.

Otherwise I recommend you use the PDF version: MDLp-needs-100804.pdf.

Or you can review the document at Google Docs.

Or peek at the developing matrix.

These are some remaining notes from earlier drafts...

CONTENT

Jason: We need to be clear about what constitutes an ‘image’ to the folks at HathiTrust. My recollection was that John Wilkin was focused on content (photograph, postcard, photo negative) as opposed to file format (e.g. TIFF as opposed to MP3 or AVI). Need to clarify the scope of the project—how much textual material is included?

Agreed. I think HT is actually not at all interested in newspapers, for instance, at this stage. My point in including this spectrum anyway is to get them to say that in a broader forum and have everyone understand the resulting narrowing of scope that would force. This describes what we'd like to do, but it likely won't describe what we end up doing.

Jason: 50,000 is too high a number for MDL Reflections if we mean image-only.

OK, just give me what you would expect is a reasonable number and I'll use it.

By the way, does the 100,000 number sound reasonable to you all? I just pulled it out of a hat.

Jason: Two parts to this content identification. 1) identifying the collections and establishing the baseline parameters needed to play; how to structure the information—asset and metadata—in order to facilitate valid transfer into HathiTrust, and 2) actually programming and building that transformation sequence—as we move to various systems across the state (ContentDM, FileMaker, Drupal, etc.) do we perceive the need to build one-off mechanisms to do this transformation? Or is our role to provide instructions on transformation and then centrally manage verification of them?

I think this gets ahead of the demonstration project a bit. For the demonstration we are choosing collections close at hand where we will certainly build the "mechanisms to do this transformation." These questions will be important ones for us to keep in mind as we proceed with the demonstration, because we would certainly need answers for any broader Minnesota-wide project.

Jason: The University Libraries would be interested in including content from its UMedia collections, particularly if we are an active development partner.


The U will definitely be a development partner, so I'll add the UMedia collection to the draft. Any thoughts on numbers?

Bob: U Media Archive? Vivarium at CSBSJU? Will any legacy-funded image collection be ready in time? We could put these on the list, but I’m happy with the priorities as described in the draft. I’m very doubtful we’d have anything significant from the legacy projects by late fall: the two big projects that I can recall, GLO field notes and the DNR Conservation Volunteer, won’t be ready.

WORKFLOW

John: Is part of this also to determine how much of the ingest workflow must be centralized to, and thus dependent on, HT staff management? We may potentially be talking about a very different set of needs and preferences from the GRIN/GROOVE-based workflow.

Absolutely, though I thought this would be part of the discussion and didn't need to be addressed in writing yet. HT has a nice OAIS slide, and one question is where do we fit? Are we off to the left, or somewhere embedded within the ingest tasks?

John: HT already has an ImageIdentifier scheme in place (see deep within http://www.lib.umich.edu/files/UMichDigitizationSpecifications20070501.pdf). In fact, I think we already see a number of HT preservation policy commitments in these specifications. A question may be whether or not this framework is immutable.

The ImageIdentifiers described in this document look local to Michigan and I'm pretty sure do not describe what we would need to provide to HT. That could be a question for clarification at the meeting, if we want to get that detailed then.

As for the policy commitments implicit in the rest of the document... we should definitely broach this topic with HT in the room. Even the simple specs for JP2 and TIFF documents would probably be violated by our masters. We need to understand whether that is OK.

GOVERNANCE

Jason: Do we foresee needing to overhaul governance MDL-wide? Will the need for more focused and well-defined governance for the preservation piece also need to be seen in other areas? Perhaps out of scope for this work, but decisions made here on preservation governance will clearly have an impact in other areas.

Yes, I think this is out of scope for the document at hand, but still a good question to air.
