originally: http://www.oclc.org/programs/ourwork/past/digpresmetadata/report.htm
Final Report on Preservation Metadata for Digital Master Files

May 1998

Background

Digital materials are increasingly important in the development of research collections. In particular, the preservation and reformatting community is in the process of incorporating digitization into its repertoire along with microfilming efforts. A significant component of creating and managing digital collections is ensuring that the information essential to their continued use is preserved in an accessible form.

The Working Group on Preservation Issues of Metadata was constituted in May 1997 as a first step in the process of addressing this issue. The group was asked to identify the descriptive data elements that should be associated with digital master files that have preservation-based intent. It is a commonplace that metadata serves many purposes, but to date the main emphasis has been on defining elements essential for discovery and retrieval. Consequently, the starting place for the group was to examine two prominent metadata systems that purport to offer a set of "core" elements necessary for discovery of resources: the Dublin Core elements and the Program for Cooperative Cataloging's USMARC-based core record standard. The group decided to specify the elements extra to these core element lists that are important to serve preservation needs for digital masters. The list of data elements below is the result of this process.

Simultaneously, another group, the RLG Working Group On Preservation and Reformatting Information, was examining the mechanism for sharing of preservation information through the medium of the USMARC record. Consequently, the metadata working group also took care to ensure that its recommendations would be compatible with the work of this other group.

Scope

Since the concept of metadata takes in a lot of territory, the Working Group had to begin by defining the constraints that should govern the scope of its activity:

Technological constraints

Given that the relevant technologies are in a state of ongoing and rapid development and that digitization efforts are still evolving in many respects, the group limited its task as follows:

- The Working Group concluded that it is premature to make recommendations concerning the way that preservation information should be stored. Such information may be included in a header of a digital file, it may exist in some separate but linked format, or it may be incorporated in a USMARC cataloging record that may or may not be linked to a corresponding digital file.

Format constraints

The Working Group also limited itself to a consideration of data elements that describe digital image files. Doing so allowed the group to address the most significant need within a timeframe short enough to be meaningful. Members also agreed that it would be most efficient to constitute other specialist groups to supplement the list of data elements, adding elements for other formats (e.g., audio files, moving images) as the need becomes more pressing.

Functional constraints

Members of the Working Group noted that information that is not specifically related to preservation tasks may be of potential interest to the preservation community. For example, copyright and use restriction information can be crucial and might appropriately be recorded at the time that preservation staff are creating the digital master. Members concluded that since the scope of such information often exceeds preservation needs, it should more appropriately be dealt with by other specialist groups. However, data elements that might serve other purposes as well are included as long as they address a core preservation information need.

Supporting recommendations

As a result of the considerations above, the group endorses the following recommendations:

- Institutions should be encouraged to share their efforts to apply the element set with the rest of the community.
- The current list of data elements should be supplemented with elements deemed necessary for other formats (e.g., audio files, moving images).
- The RLG PRESERV Advisory Council should continue to monitor and liaise with the Society of Motion Picture and Television Engineers (SMPTE) in its efforts to develop a universal preservation format and to define a comprehensive data dictionary (in order to ensure that such a data dictionary represents preservation needs).
- The RLG PRESERV Advisory Council should monitor and liaise as appropriate with other specialist groups concerned with delineating metadata elements to serve specific needs that are also of interest to the preservation community (e.g., copyright information).

The following list of sixteen elements represents information that the working group deems crucial to the continued viability of a digital master file. Institutions may exceed this list or not, but the Working Group recommends that all the enumerated elements that are relevant to a specific file be recorded.

Since it is recognized that these elements may be recorded according to the specifications of any one of a number of metadata systems, no effort has been made to specify syntax. The list below, including examples, is meant to provide a semantic framework only. The format of the examples is intended to be illustrative, not prescriptive. In order to demonstrate how the list might be used, possible implementations are included in the attached appendices.

1. Date
DEFINITION: Date file is created

2. Transcriber
DEFINITION:

3. Producer
DEFINITION:

4. Capture device
DEFINITION: Indicate make and model of digital camera or scanner

5. Capture details
DEFINITION 1 (Capture device is a scanner): Name scanner software, including version information; give scanner settings, gamma correction, and other relevant details pertaining to scanning

6. Change history
DEFINITION: A record of modifications made to the file, and significant versions generated, identifying the person/institution who made them and the date they were made.

7. Validation key
DEFINITION: A mechanism, usually consisting of a number, that allows one to verify that an electronically transmitted file is what it purports to be, i.e., that the file is what is described in the metadata. At the simplest level, such a key might consist of the number of lines in a file (similar to the way that one indicates the number of pages that are transmitted via fax). Especially prevalent is the use of a checksum, which is an algorithm based on a manipulation of the sum of the bits that make up a file to yield a number that serves as a unique identifier for that file.

8. Encryption
DEFINITION: Technique by which data is scrambled before transmission in order to ensure privacy. Encrypted data must be unscrambled (decrypted) by the receiver. If a file is encrypted, the type of encryption should be indicated.

9. Watermark
DEFINITION: Indicate whether or not some bits in the file have been altered in order to create a "digital fingerprint" that can serve to establish ownership of an image and prevent unauthorized use.

10. Resolution (e.g., pixel dimensions, dpi, ppi)
DEFINITION: Traditionally determined by the number of pixels used to represent the scanned image, expressed as pixel dimensions, pixels per inch or dots per inch. Current research into the use of Modulation Transfer Function (MTF - a function of the spatial wave number) to measure resolution should allow a more objective numerical value to be assigned as the measurement.

11. Compression
DEFINITION: Indicate whether or not the file has been compressed (i.e., reduced in size), and if it has, identify the level and method of compression.

12. Source
DEFINITION: Describe physical characteristics of the source such as its size, condition, and its place in the chain (e.g., original, copy, or copy of a copy). Include information about modifications made to the source to enable better digitization. For images of photographs and digitized microforms, include image type (i.e., positive or negative image).

13. Color
DEFINITION: Indicate pixel depth.

14. Color management
DEFINITION: Identify system, if any, that is used to improve consistency of color across capture, display and output of an image.

15. Color bar/Gray scale bar
DEFINITION: Indicate presence or absence of either and, if present, identify the type.

16. Control targets
DEFINITION: Include information about targets included in the scanned file for purposes of quality control, calibration, verification, etc.

Presented below is an effort to incorporate the metadata
elements enumerated in the body of the report into a Dublin Core record
template. Some data elements have been created as extensions to
currently agreed Dublin Core metadata elements and are tagged as RLG
(for RLG Preservation Metadata) elements rather than DC elements for
illustrative purposes.

This example is not intended to be prescriptive, but to
suggest directions that might be explored further and experimented with
more extensively. There are undoubtedly a number of alternative ways to
embed preservation metadata into Dublin Core records, ranging from
simple links to associated files to more elaborate container
architectures. Shared experiments in this direction and continued
discussion among the members of the preservation community might be
especially fruitful in developing future guidelines.

DC.Title: [Title of digitized item]
RLG.Form.Capture: [Make and model of scanner or digital camera and relevant capture details]

Note: Alternatively, instead of Source use the Relation element to identify print version:

DC.Relation

The templates below offer maps of the 16 Preservation Metadata Elements (described previously) to a USMARC record. Bracketed numbers correspond to the list of the 16 recommended data elements. Please note the following points:

Please also note that the RLG Working Group on
Preservation and Reformatting Information, which is explicitly
concerned with the USMARC record, has prepared a discussion paper for
ALA's Machine Readable Bibliographic Information (MARBI) Committee
which would extend the 007 in order to include in coded form much of
the information that must otherwise be included in variable data
fields. That working group is also preparing examples demonstrating a
potential standard configuration of the 533 field that could be used in
conjunction with the extended 007. The adoption of these proposals
would considerably simplify the addition of information corresponding
to the recommended preservation metadata elements.

Appendix 3: XML implementation

The model below shows how the conservation elements designated in the report might be configured in a simple XML record. The model record below would, of course, reflect the specifications of a DTD, which is not reproduced in this report. Note that the model below does not conform to the RDF specification, which would provide another significant way to present the requisite conservation data in XML format.

Model XML record incorporating preservation metadata elements

‹RLG.SOURCE_TITLE›[Title of item that is digitized]‹/RLG.SOURCE_TITLE›
‹/RLG.DIGITIZED_VERSION›
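
As a rough illustration only, the Python sketch below shows one way a record along the lines of the model above might be assembled, together with a validation key of the kind described in element 7. Several things here are assumptions rather than part of the report: the root element RLG.RECORD, the element names RLG.COMPRESSION, RLG.COLOR and RLG.VALIDATION_KEY, and the choice of SHA-256 as the checksum are all hypothetical, since the report specifies neither a DTD nor a checksum algorithm. Only RLG.SOURCE_TITLE and RLG.DIGITIZED_VERSION come from the model record.

```python
import hashlib
import xml.etree.ElementTree as ET


def validation_key(data):
    """Return a hex digest of the file bytes.

    SHA-256 is an assumed choice; the report does not name an algorithm.
    """
    return hashlib.sha256(data).hexdigest()


def build_record(source_title, master_bytes, fields):
    """Assemble a simple XML record for one digital master file.

    RLG.RECORD and RLG.VALIDATION_KEY are hypothetical names, as are any
    names passed in `fields`; no DTD is reproduced in the report.
    """
    root = ET.Element("RLG.RECORD")
    ET.SubElement(root, "RLG.SOURCE_TITLE").text = source_title
    version = ET.SubElement(root, "RLG.DIGITIZED_VERSION")
    for name, value in fields.items():
        ET.SubElement(version, name).text = value
    key = ET.SubElement(version, "RLG.VALIDATION_KEY")
    key.text = validation_key(master_bytes)
    return root


def verify(record, master_bytes):
    """Check that the file bytes still match the key stored in the record."""
    stored = next(record.iter("RLG.VALIDATION_KEY"), None)
    if stored is None:
        return False
    return stored.text == validation_key(master_bytes)


if __name__ == "__main__":
    master = b"placeholder bytes standing in for a digital master file"
    record = build_record(
        "[Title of item that is digitized]",
        master,
        {"RLG.COMPRESSION": "none", "RLG.COLOR": "24-bit"},
    )
    print(ET.tostring(record, encoding="unicode"))
    print("unchanged file verifies:", verify(record, master))
    print("altered file verifies:", verify(record, master + b"!"))
```

Storing the key inside the record is only one option. As the report notes under technological constraints, preservation information might equally live in a file header or in a separate linked record, and the same check would apply in either case.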