MHS CloudServicesProposal

Digital Preservation and Cloud Services Proposal

This is a preliminary proposal that has not yet been reviewed by anyone and may change after feedback from MNHS. It has not been approved or accepted in any way. Feel free to leave comments at the end of the document.

Editor's notes

These notes are only for the editor of this document. Others should avert their eyes!

Background

As described in the MHS request for proposals:

The Minnesota Historical Society (“the Society”) is facing exponential growth in storage required for preservation of digital collections consisting of images, audio, video, text, and other formats. The nature of preservation content is that it must be stored and maintained for future generations in perpetuity. It is not used in transactional applications and it requires little access beyond what is needed for preservation activities such as integrity checks and format migrations. Most of the content is high value -- unique and irreplaceable -- and as such, the files must remain unchanged.

The Society’s digital collections are growing exponentially and will likely approach 140 TB by 2015 (at a projected rate of increase of 10-20 TB/year). We are trying to decide the costs and benefits of increasing our existing storage architecture versus purchasing vendor storage services such as cloud storage (“cloud services”). The purpose of this Request for Proposal is to seek consulting services that will explore and document factors to help us determine costs and benefits.

The MHS request is quite clear about the deliverables and timeline, so this proposal is, for the most part, a reflection of that request.

Scope

The analysis will include the following vendors:

  1. Amazon S3
  2. Amazon Glacier
  3. Google Durable Reduced Availability Storage
  4. Tessella Preservica
  5. Visi Cloud Services
  6. SDSC Cloud Storage
  7. DuraCloud
  8. IBM SmartCloud for State and Local Government & Education
  9. possibly the University of Minnesota, though we need to discuss that

The professional organizations whose recommendations would be reviewed include:

  1. The Library of Congress
  2. The American Library Association
  3. The Association of Moving Image Archivists
  4. The Society of American Archivists
  5. possibly the Association of Research Libraries and/or the Council on Library and Information Resources, but we can discuss that

This scope would be reviewed and amended at the initial meeting with MHS.

Initial Meeting with MHS

The background in the request is quite helpful, but some elements surprise me. For example, there is no mention of the University of Minnesota as a potential partner in or source for cloud storage. I know that MHS and UMN have worked closely on this topic in the past, and so I need to understand the full intentions of this effort in order to provide the clearest possible input for MHS.

This input would best be provided in person and early in the process, which is why I appreciate that the request suggests an initial meeting with MHS staff. In addition to discussing process, scope, and deliverables, as described in the request, I would also like to discuss the priorities of MHS with respect to the cloud storage attributes outlined in the request. Some of these attributes should be fairly easy to assess (cost per TB, for example), while others will be more slippery from vendor to vendor (retention in case of business failure). Given the time allowed for the development of this report, some data may not be obtainable from some vendors. I want to make sure I understand which data are most critical to the MHS decision-making process.

It would also be helpful to receive from MHS staff a list of any professional organizations whose recommendations they want included in this analysis. I can develop a list on my own, but I don't want to miss sources you already know you wish to have incorporated.

To meet the overall deadline for this report, this initial meeting should take place before 25 January 2013.

Mid-Point Check-In with MHS

As I work with vendors to fill in the matrix of attributes that MHS would like illuminated about their services, a number of questions are sure to come to the surface. I will need an opportunity to meet with MHS staff to share progress and ask for guidance on how to resolve these questions. For example, I expect that many comparisons of contractual and other "fuzzy" attributes will be somewhat apples-to-oranges. We will have data from each vendor, but making those data points comparable to one another may require judgement and some assumptions. MHS staff can help me understand how they would like that judgement applied.

I would also require at least one contact within MHS to whom I could address questions as the report preparation progresses. Given the tight timeline here, a smooth exchange of information on an ongoing basis will be critical to a successful outcome.

The mid-point check-in should take place during the week of 11 February 2013.

Presentation of Written Findings

My report will be presented as findings both on this wiki site and in shared Google Documents. Both can be restricted to MHS staff or shared openly, as MHS prefers. I expect to prepare one page per vendor on this wiki, along with a Google Spreadsheet summarizing comparable items in a matrix. The cost matrix example in the request would be a starting point for that spreadsheet, but I believe other items of comparison could also be included.

I will use what I glean from professional organizations to inform the elements of the comparison and also to prepare a summary of recommendations in a separate document on the wiki. This will, as much as possible, link back to the sources of these recommendations.

Of the elements of the report requested by MHS, I expect the least attention will go to the cloud storage activities being used or tested in other academic, library, and cultural memory organizations. That is a vast domain, and given the time and effort involved in gathering the data MHS has requested above, I may not be able to do justice to this item. I will provide an overview, but I imagine it will lack depth. This is a consequence of my being a one-person operation. If this element is vital, MHS may find larger consultancies better suited to its request.

This work would be completed by 1 March 2013 and I would be available the following week for a final meeting to review the findings.

Costs

As a consultant, I do not charge by the hour. These services will cost $18,000. The first $5,000 would be invoiced after the completion of the initial meeting (no later than the end of January 2013), and the balance of $13,000 would be invoiced in March 2013, after the report is submitted to MHS. I am a sole proprietorship and do not plan on subcontracting any of this work.

Each of the three meetings (initial, mid-point, and final) would last two hours or less. I live and work in Saint Paul, so I expect these meetings would be held in person at MHS, with no additional support needed beyond reimbursement for parking at MHS.

There may be some minor communications charges required for contacting vendors around the country. However, this is unlikely given that most communications can be handled via email, Google Voice, and Skype, all of which are free. Still, MHS should reserve $1,000 for these or other incidental expenses should they arise and be documented.

Past Work

I have been part of a number of projects with MHS in the past, though some of the key players from those projects, Michael Fox and Robert Horton, are no longer at the Society. These projects include the development of an RFI and assembly of vendor responses for the Great Rivers Network Solicitation (requirements, use cases, etc.) in 2009, and a number of Minnesota Digital Library initiatives, including a project to explore digital image preservation needs in 2010 and another to investigate digital preservation options in 2011.

In addition to staff still at MHS, those familiar with my work include:

  • Keith Ewing, St. Cloud State University, Interim Dean, Learning Resources Services
  • Robert Horton, IMLS, Associate Deputy Director for Library Services
  • John Butler, UMN Libraries, Associate University Librarian for Data & Technology

Eric Celeste

Eric brings over 20 years of library and 30 years of technology experience to his consulting. At MIT Eric shepherded the creation of DSpace, open source digital repository management software developed with HP and now deployed at hundreds of institutions worldwide. At the University of Minnesota Libraries he encouraged the development of the UThink blog service, a wiki-based staff intranet, LibData, and the University Digital Conservancy. He works with non-profit institutions on appropriate uses of technology for informing, communicating, and collaborating with their constituencies.
