Preparing Digital Surrogates for RLG Cultural Materials
Recommendations for digitizing
Recommended background
Digital conversion service bureaus
Suggestions for existing surrogates
Recommendations for digitizing
These are general recommendations, not absolute requirements. Since each digitization project is unique, there may be very good reasons for using alternative quality guidelines or for choosing a different approach (although we strongly recommend that you be consistent within a project). For RLG-funded digitization, please discuss intended variations from these recommendations with Ricky.Erway@oclc.org. For other projects, consider this guidance in the context of your organizational requirements and proceed accordingly.
Images
- Avoid device-specific color spaces, formats, headers, etc.
- Size and save page images at 1:1 scale to the dimensions of the original pages.
- To judge sharpness, view images on the monitor at 100 percent (i.e., one screen pixel for each captured pixel of the image) and evaluate an area of the image that contains fine details and edges.
- Be sure the whole image area (with edges) has been
scanned and no part of it has been cropped.
- Scan the image in the correct orientation or correct the image orientation in postprocessing.
- Avoid skew by placing the originals squarely on the scanner. Rescan a skewed image rather than rotating it after scanning, since rotating by an arbitrary angle resamples, and so degrades, the image.
- Check for artifacts such as dropout lines or pixels,
banding, lack of uniformity, poor color registration, aliasing,
flaring, and contouring.
Textual materials
- You can create just a digital image, or a digital image and a machine-readable text.
- Determine in advance whether blank pages will be scanned.
- Create images that meet or exceed these characteristics:
Source | Resolution and bit depth
Printed texts and/or line drawings | 600 dpi, 1-bit
Grayscale, halftone, and other black-and-white illustrations | 300 dpi, 8-bit
Color illustrated texts | 300 dpi, 24-bit
Rare/early printed texts | 300 dpi, 8- or 24-bit
- Use TIFF version 5.0 or 6.0 with Intel (little-endian) byte order, either uncompressed or with lossless compression (ITU Group 4 for 1-bit images, LZW for 8- or 24-bit images).
- RGB and PhotoYCC are both acceptable color spaces for digital masters.
- For machine-readable text, key in the text or use OCR, save it as ASCII or Unicode (e.g., UTF-8), preferably corrected to at least 99.995% accuracy, and encode it (e.g., as specified in TEI Text Encoding in Libraries: Guidelines for Best Encoding Practices, Version 1.0, Digital Library Federation, July 1999).
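These guidelines do not say how OCR accuracy should be measured. One common convention, assumed here, is character accuracy: one minus the number of character edits (insertions, deletions, substitutions) needed to turn the OCR output into a verified transcription, divided by the length of that transcription. The minimal Python sketch below uses hypothetical function names and is not an RLG-specified procedure:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                   # delete ch_a
                current[j - 1] + 1,                # insert ch_b
                previous[j - 1] + (ch_a != ch_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_accuracy(ocr_text: str, reference: str) -> float:
    """Share of the reference transcription reproduced correctly (assumed metric)."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(ocr_text, reference)
    return max(0.0, 1.0 - errors / len(reference))


def meets_target(ocr_text: str, reference: str, target: float = 0.99995) -> bool:
    """Compare against the 99.995% figure recommended above."""
    return character_accuracy(ocr_text, reference) >= target
```

At the 99.995% target, a 100,000-character text may contain at most five character errors.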
Pictorial materials
Use these resolutions whether scanning from originals or intermediates:
Source | Resolution and bit depth
Black-and-white photos | 400 dpi, 8-bit
Color photos | 400 dpi, 24-bit
Slides or small negatives | Effective resolution of 400 dpi, 8- or 24-bit
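Because images are saved at 1:1 scale, the pixel dimensions of a master follow directly from the original's size and the scanning resolution, and the bit depth then fixes the amount of raw image data. The Python sketch below makes that arithmetic explicit; the function names are hypothetical, and the sizes ignore file headers and compression:

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel size of a 1:1 scan of an original measured in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Raw image data size, excluding headers; each row is padded to a whole
    byte, as TIFF does for 1-bit images."""
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px
```

For example, an 8x10-inch color print scanned at 400 dpi, 24-bit, is 3200 x 4000 pixels, or about 38.4 MB of raw image data.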
- Use TIFF version 5.0 or 6.0 with Intel (little-endian) byte order, either uncompressed or with lossless (LZW) compression.
- RGB and PhotoYCC are both acceptable color spaces for digital masters.
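In TIFF files, "Intel" byte order is recorded by the first two bytes, "II" (Motorola files begin with "MM"), and the compression scheme is stored in tag 259 of the image file directory (1 = none, 4 = CCITT/ITU Group 4, 5 = LZW). The following Python sketch, using only the standard library, checks those two fields. The function names are hypothetical. It reads only the first image, does not verify that Group 4 is paired with 1-bit images, and does not handle BigTIFF:

```python
import struct

COMPRESSION_TAG = 259  # TIFF "Compression" field
# Compression codes these guidelines accept for masters (TIFF 6.0 values).
ACCEPTED_COMPRESSION = {1: "uncompressed", 4: "ITU Group 4", 5: "LZW"}


def tiff_header_info(data: bytes) -> tuple[str, int]:
    """Return the byte-order marker ('II' or 'MM') and the compression code
    stored in the first image file directory (IFD) of a TIFF file."""
    marker = data[:2]
    if marker == b"II":
        fmt = "<"  # Intel: little-endian
    elif marker == b"MM":
        fmt = ">"  # Motorola: big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(fmt + "HI", data[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    (entry_count,) = struct.unpack(fmt + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        (tag,) = struct.unpack(fmt + "H", data[start:start + 2])
        if tag == COMPRESSION_TAG:
            # A single SHORT value sits in the first 2 bytes of the 4-byte value field.
            (code,) = struct.unpack(fmt + "H", data[start + 8:start + 10])
            return marker.decode("ascii"), code
    return marker.decode("ascii"), 1  # an absent tag means no compression


def meets_master_recommendation(data: bytes) -> bool:
    """True for Intel byte order with no compression, Group 4, or LZW."""
    marker, code = tiff_header_info(data)
    return marker == "II" and code in ACCEPTED_COMPRESSION


def check_file(path: str) -> bool:
    """Read a whole file and check it (simple, but memory-hungry for large masters)."""
    with open(path, "rb") as f:
        return meets_master_recommendation(f.read())
```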
Audio
Where these recommendations offer a choice, make your decision based on the nature of the original. For example, spoken-word conversion requirements are sometimes lower than those for other recorded sound such as music. However, an old music recording may not merit such high-quality capture as an excellent spoken-word recording.
Master file (96 or 48 kHz; 24 bits):
- Bitstream: uncompressed PCM
- Configuration: monophonic or stereo, depending on the characteristics of the source item
- Sampling frequency: 96 or 48 kHz, depending on the characteristics of the source item; 24-bit word length (in some cases, 44.1 kHz/16-bit suffices)
- File format: WAVE
- Enhancement: none, or as determined by the contributor
Service file (MP3, i.e., MPEG-1/2 Audio Layer III):
- Bitstream: MP3
- Quality: data rate of 192 or 128 kbit/s, as determined by the contributor
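The data rate of uncompressed PCM is the sampling frequency times the word length times the number of channels, which makes the storage cost of the master and service choices above easy to compare. A short Python sketch with hypothetical function names (sizes exclude WAVE/MP3 headers):

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def bytes_per_hour(bit_rate: int) -> int:
    """Storage for one hour of audio at a constant bit rate."""
    return bit_rate * 3600 // 8


master = pcm_bit_rate(96_000, 24, 2)  # 96 kHz, 24-bit stereo master
service = 128_000                     # 128 kbit/s MP3 service file
print(bytes_per_hour(master), bytes_per_hour(service))
```

An hour of 96 kHz/24-bit stereo is about 2.07 GB, against about 57.6 MB for a 128 kbit/s MP3 service copy.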
Motion
Master file | Uncompressed component digital video bitstream with 4:2:2 chroma sampling. Note: the data rate for standard-definition 4:2:2 component video is 270 Mbit/s.
Service file | Compressed MPEG-2 files at pixel dimensions and data rates determined by the contributor, possibly ranging from 1.2 Mbit/s to 15 Mbit/s.
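The 270 Mbit/s figure follows from the ITU-R BT.601 sampling structure for standard-definition component video. Luma is sampled at 13.5 MHz and each of the two color-difference signals at 6.75 MHz, giving 27 million samples per second; at 10 bits per sample this yields 270 Mbit/s. The figure covers the whole serial stream, including blanking intervals. The Python sketch below reproduces the arithmetic and compares storage per hour with the MPEG-2 service range; the function names are hypothetical:

```python
LUMA_HZ = 13_500_000   # Y sampling rate (ITU-R BT.601)
CHROMA_HZ = 6_750_000  # Cb and Cr sampling rate each, i.e. 4:2:2


def component_422_bit_rate(bits_per_sample: int = 10) -> int:
    """Serial data rate of 4:2:2 standard-definition component video."""
    samples_per_second = LUMA_HZ + 2 * CHROMA_HZ  # 27 million
    return samples_per_second * bits_per_sample


def gigabytes_per_hour(bit_rate: float) -> float:
    """Decimal gigabytes needed for one hour at a constant bit rate."""
    return bit_rate * 3600 / 8 / 1e9


print(gigabytes_per_hour(component_422_bit_rate()))  # uncompressed master
print(gigabytes_per_hour(15_000_000))                # MPEG-2 at 15 Mbit/s
```

An hour of uncompressed master video therefore needs roughly 121.5 GB, compared with 6.75 GB at the top of the MPEG-2 service range.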
Complex digital objects
- When digitizing component parts of an object, take care to maintain their relationships. For example, when capturing an album, consider: What is the relationship of the parts to the whole? Should each page be captured separately, or should two pages be captured at once? Do the album pages have intrinsic significance, or is it sufficient to capture the images from each page? Is there a relationship between the spreads that should be maintained, or is an indication of sequence enough to recreate the experience of looking through the album?
- Provide structural metadata for complex digital objects to allow navigation within the object. Preferably, use the Metadata Encoding and Transmission Standard (METS). If you do not use METS, include in the record a link to the text file (if there is one) and to the start and end images.
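METS structural metadata centers on a structMap whose nested div elements point, via fptr, to files listed in a fileSec. The Python sketch below uses the standard library's ElementTree to build a skeletal METS document for a book's page images. It is an assumption-laden illustration, not a profile RLG required. The USE, TYPE, and LABEL values and the FILE0001-style IDs are hypothetical, the output has not been validated against the METS schema, and a real record would normally also carry descriptive and administrative sections:

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns: str, tag: str) -> str:
    """Qualified name in ElementTree's {namespace}tag form."""
    return f"{{{ns}}}{tag}"


def minimal_mets(object_id: str, pages: list[tuple[str, str]]) -> str:
    """Return a skeletal METS document for one complex object.

    `pages` lists (file_name, label) pairs in reading order. The fileSec
    lists each image file; the structMap orders them as pages of one object.
    """
    root = ET.Element(q(METS_NS, "mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "master"})
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"), {"TYPE": "physical"})
    whole = ET.SubElement(struct_map, q(METS_NS, "div"), {"TYPE": "book", "LABEL": object_id})

    for order, (file_name, label) in enumerate(pages, start=1):
        file_id = f"FILE{order:04d}"  # hypothetical ID scheme
        file_el = ET.SubElement(file_grp, q(METS_NS, "file"), {"ID": file_id})
        ET.SubElement(file_el, q(METS_NS, "FLocat"), {
            "LOCTYPE": "URL",
            q(XLINK_NS, "href"): f"{object_id}/{file_name}",
        })
        page = ET.SubElement(whole, q(METS_NS, "div"),
                             {"TYPE": "page", "ORDER": str(order), "LABEL": label})
        ET.SubElement(page, q(METS_NS, "fptr"), {"FILEID": file_id})

    return ET.tostring(root, encoding="unicode")


print(minimal_mets("mas014", [("001000c.tif", "cover"), ("004000g.tif", "title page")]))
```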
File naming
- Use file-naming schemes that are compatible across platforms and systems. Keep names short. Use only lowercase letters a-z, the digits 0-9, and these special characters: . _ - (period, underscore, and hyphen). Do not use spaces or any other special characters.
- Prefer a numbering scheme that reflects numbers already used in an existing cataloging system; if scanning precedes cataloging, use serial file names that will be incorporated into the catalog record.
- When developing a file-naming scheme, make sure you understand the whole project. How many images will be scanned? Will they be stored in different directories? Are the files part of larger complex objects?
- Use standard file extensions (e.g., .tif, .wav, .mp3, .mpg, .rm, .txt, .sgm, .xml), in lowercase only.
- Make sure the file references in your descriptive records match the file names (the extension may be omitted only if it is the same for every image). The case of file references in the descriptive records must match the case of the actual file names.
- Replicate the directory structure as referenced in the descriptive records.
- Don't overload directories with too many files.
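The character and extension rules above are easy to check mechanically before files are submitted. This Python sketch is one way to do it. The extension set simply copies the examples listed above, and the 31-character cap is an arbitrary, hypothetical stand-in for "keep names short", since the guidelines set no number:

```python
import os
import re

# Lowercase letters, digits, period, underscore, and hyphen; nothing else.
ALLOWED_NAME = re.compile(r"[a-z0-9._-]+")
# Extensions taken from the examples in the guidelines (not an exhaustive list).
STANDARD_EXTENSIONS = {".tif", ".wav", ".mp3", ".mpg", ".rm", ".txt", ".sgm", ".xml"}


def naming_problems(file_name: str, max_length: int = 31) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if not ALLOWED_NAME.fullmatch(file_name):
        problems.append("contains characters other than a-z, 0-9, '.', '_', '-'")
    if len(file_name) > max_length:
        problems.append(f"longer than {max_length} characters")
    ext = os.path.splitext(file_name)[1]
    if ext not in STANDARD_EXTENSIONS:
        problems.append(f"extension {ext or '(none)'} is not in the standard list")
    return problems


for name in ["001000c.tif", "Page 1.TIF", "scan.jpeg"]:
    print(name, naming_problems(name))
```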
Naming a collection of thousands of simple objects (e.g., a photographic collection)
- Subdivide them in a meaningful way (by series or group) or by chunk (same prefix, or in groups of a thousand).
- Use the reproduction, accession, or a serial number as the stem of the file name.
- Add a code for special features:
b for back, if scanning information on the back of a print
d for a detail of a larger image
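The rules above can be combined into a small path builder. The Python sketch below assumes a hypothetical collection prefix "ph", six-digit zero-padded serial numbers as stems, and one subdirectory per block of 1,000 serial numbers. The guidelines fix none of these choices; only the b and d codes come from the list above:

```python
def photo_image_path(serial: int, feature: str = "", chunk_size: int = 1000,
                     prefix: str = "ph") -> str:
    """Return 'directory/file' for one image of a large photographic collection.

    Images are grouped into directories of `chunk_size` serial numbers; the
    optional feature code is 'b' (back of print) or 'd' (detail).
    """
    if feature not in ("", "b", "d"):
        raise ValueError(f"unknown feature code: {feature!r}")
    directory = f"{prefix}{serial // chunk_size:03d}"
    return f"{directory}/{prefix}{serial:06d}{feature}.tif"


print(photo_image_path(12345))       # front of print 12345
print(photo_image_path(12345, "b"))  # its back
```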
Naming a complex object such as a book
- Create a directory for each object, using an identifying string for the object as the directory name.
- If there is a text file for the whole object (e.g., an SGML file), use the same string in its file name.
- For page image file names, use a sequential image number followed by the printed page number (when present), both padded with leading zeros, to fit the pattern "cccpppf", where:
- "ccc" is the image control number. These first three digits assign a sequential number to each image of the book. The first image from the book is assigned control number 001; it reproduces the book cover. Control number 002 might be the illustrated end paper, 003 might be a title page, etc., depending on the book. If a document-start target is provided, scan it and give it the file name stem 000000. If missing pages are encountered, scan a "missing page" target and assign it the relevant control number.
- "ppp" is the printed page number. These next three digits carry the actual printed page number with leading zeros. If the number is Roman, give its Arabic equivalent. If there is no printed page number, use 000.
- The optional final character ("f" in the pattern) is a code for special features:
g: Title Page (if the work has more than one, indicate the main title page)
n: Table of Contents (if more than one page, indicate all pages)
l: List of Illustrations (if more than one page, indicate all pages)
f: Illustration (not a page image that includes an illustration, but an additional image cropped to include only the illustration)
x: Index (if more than one page, indicate all pages)
y: Missing page or other irregularity target
Example: a book with the ID "mas 014" would be in a directory named mas014; it might contain these files:
mas014/mas014.sgm
mas014/000000.tif (target)
mas014/001000c.tif (cover)
mas014/002000.tif
mas014/003000.tif
mas014/003000f.tif (illus)
mas014/004000g.tif (title page)
mas014/005000.tif
mas014/006000n.tif (contents)
mas014/007000n.tif (contents cont.)
mas014/008003.tif (first numbered page)
etc.
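The "cccpppf" pattern can be generated rather than typed by hand. In the Python sketch below, the function names are hypothetical. Roman page numbers are converted to their Arabic equivalents, a missing page number becomes 000, and the feature code is passed through unchecked:

```python
from typing import Optional, Union

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'xiv' to its Arabic value (14)."""
    values = [ROMAN_VALUES[ch] for ch in numeral.lower()]  # KeyError if invalid
    total = 0
    for current, following in zip(values, values[1:] + [0]):
        # A smaller numeral before a larger one is subtracted (e.g. 'iv' = 4).
        total += -current if current < following else current
    return total


def page_image_name(control: int, printed_page: Optional[Union[int, str]] = None,
                    feature: str = "", ext: str = ".tif") -> str:
    """Build a 'cccpppf' file name for one page image of a book."""
    if printed_page is None:
        page = 0                          # no printed page number
    elif isinstance(printed_page, str):
        page = roman_to_int(printed_page)  # Roman page number
    else:
        page = printed_page
    if not (0 <= control <= 999 and 0 <= page <= 999):
        raise ValueError("control and page numbers must fit in three digits")
    return f"{control:03d}{page:03d}{feature}{ext}"


print(page_image_name(1, None, "c"))  # cover
print(page_image_name(8, 3))          # first numbered page
```

Calling `page_image_name(0)` gives the document-start target name, 000000.tif.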
Naming a manuscript collection
- Create a directory for the collection, and subdirectories for each series, box, and/or folder.
- If there is a text file for the whole collection (or for each series, box, or folder), use the same string in its file name and place it in that directory.
- Page image file names consist of a sequential image number with leading zeros. Since folders generally contain fewer than a thousand pages, a three-digit number is sufficient. If a document-start target is provided, scan it and give it the file name stem 000.
- Assign a code for special features:
b for the back side of a page
s for the start of a new document. Because documents and pages are not equivalent, mark where a new document (report, letter, etc.) begins by adding an s to the end of the file name of each image that represents the start of a new document.
Example: a manuscript collection with the collection identifier stw would be in the directory stw, with the following subdirectories and files:
stw/corresp/81/23/23.sgm
stw/corresp/81/23/001s.tif (first page)
stw/corresp/81/23/002.tif
stw/corresp/81/23/003.tif
stw/corresp/81/23/004s.tif (start of new document)
stw/corresp/81/23/005.tif
etc.
stw/corresp/81/24/001s.tif
etc.
stw/reports/01/01/001s.tif
etc.
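Within one folder, names depend only on scanning order and on which images begin a new document. The Python sketch below generates them, using hypothetical function names. It handles only the s code; the b code for backs of pages is left out because the guidelines do not say how it combines with s:

```python
def folder_image_names(document_starts: list[bool], ext: str = ".tif") -> list[str]:
    """Name the page images of one folder in scanning order.

    document_starts[i] is True when image i+1 begins a new document; those
    images get the 's' code. Three digits allow up to 999 images per folder.
    """
    if len(document_starts) > 999:
        raise ValueError("more than 999 images: a three-digit number is not enough")
    return [f"{n:03d}{'s' if starts else ''}{ext}"
            for n, starts in enumerate(document_starts, start=1)]


def folder_path(collection: str, *levels: str) -> str:
    """Join the collection, series, box, and folder identifiers into a path."""
    return "/".join((collection, *levels))


base = folder_path("stw", "corresp", "81", "23")
for name in folder_image_names([True, False, False, True, False]):
    print(f"{base}/{name}")
```

The usage lines reproduce the first five stw/corresp/81/23 files in the example above.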
Submission
Choose one of these methods:
- On media: ISO 9660 CDs or TAR on DLT.
- For RLG to pick up via FTP: provide access to the directory structure as referenced in the records.
- By FTP to RLG: copy the file directory structure referenced in the records.
Sources
Recommended background for digitizing decisions
Selection
Dan Hazen, Jeffrey Horrell, and Jan Merrill-Oldham, Selecting Research Collections for Digitization. Council on Library and Information Resources, August 1998 (decision matrix).
Selecting Library and Archive Collections for Digital Reformatting. Proceedings from an RLG Symposium Held November 5-6, 1995, in Washington, DC.
Outsourcing
RLG Guidelines for Creating a Request for Proposal for Digital Imaging Services (PDF). RLG, 1997 (May 1998).
RLG Model Request for Information for Digital Imaging Services (PDF). RLG, 1997.
RLG Model Request for Proposal for Digital Imaging Services (PDF). RLG, 1997.
Cost estimating
RLG Worksheet for Estimating Digital Reformatting Costs (PDF). RLG, 1997 (May 1998).
Imaging
Anne R. Kenney and Oya Y. Rieger, Moving Theory into Practice: Digital Imaging for Libraries and Archives. RLG, 2000 (see RLG Programs Books and Reports).
Guides to Quality in Visual Resource Imaging. Digital Library Federation (DLF) and RLG, 2000.
Steven Puglia, "The Costs of Digital Imaging Projects", RLG DigiNews vol. 3, no. 5 (October 15, 1999).
Imaging halftones: Anne R. Kenney and Louis H. Sharpe II, "Illustrated Book Study: Digital Conversion Requirements of Printed Illustrations", The Library of Congress Preservation (July 1999).
Imaging from microfilm: Louis H. Sharpe II, et al., Library of Congress Manuscript Digitization Demonstration Project Final Report. October 1998.
Selection, preparation, capture, metadata, archiving: Joint RLG and NPO Preservation Conference: Guidelines for Digital Imaging. September 1998.
RLG Working Group on Preservation Issues of Metadata, Final Report. RLG, May 1998.
Franziska Frey, "Digital Imaging for Photographic Collections: Foundations for Technical Standards", RLG DigiNews vol. 1, no. 3 (December 15, 1997).
Howard Besser and Jennifer Trant, An Introduction to Imaging. Getty Information Institute, 1995.
Text
TEI: The TEI Guidelines. TEI, 2001.
TEI Text Encoding in Libraries: Guidelines for Best Encoding Practices, Version 1.0. Digital Library Federation, July 1999.
Alan Morrison, Michael Popham, and Karen Wilkander, Creating and Documenting Electronic Texts: A Guide to Good Practice. AHDS Guides to Good Practice, 1998.
Audio
Bruce Fries with Marty Fries, The MP3 and Internet Audio Handbook. TeamCom Books, 2000: Chapter 11, "A Digital Audio Primer", and Chapter 12, "Digital Audio Formats".
Motion
Dave Anderson, The PC Technology Guide: Digital Video (2002).
Digital conversion service bureaus
RLG did not endorse these service providers,
but received positive reports from those who had used them.
Apex CoVantage ePublishing Solutions
120 Presidents Plaza
198 Van Buren Street
Herndon, VA 20170
Phone: 703-709-3000
Fax: 703.709.0333
E-mail: info@apexcovantage.com
Contacts: Margaret Boryczka or Tom O'Brien
text conversion, SGML markup, EAD
Backstage Library Works
1180 South 800 East
Orem, Utah 84097
Phone: 800-316-2759
Fax: 801.356.8220
E-mail: jmoore@bslw.com
Contact: Jodi Moore, Marketing Manager
on-site/off-site scanning; text/prints/transparencies/realia; oversize; bound; data conversion; metadata processing; OCR
Bar-Hama Blumenthal Digital Photography
450 Park Avenue
Suite 2702
New York, NY 10022
Phone: 212-400-3281
Fax: 212.400.3293
E-mail: ardon@barhama.com
Contacts: Ardon Bar-Hama or George Blumenthal
on-site, high-resolution digital photography of rare books & manuscripts
Boston Photo Imaging
20 Newbury Street
Boston, MA 02116
Phone: 617-267-4086
Fax: 617.267.8711
Contact: David Sempberger
photo scanning
DCL
Data Conversion Laboratory, Inc.
61-18 190th St., 2nd Floor
Fresh Meadows, NY 11365
Phone: 718-357-8700
Fax: 718.357.8776
Contact: Shavy Schwimmer, convert@dclab.com
scanning, OCR and text entry, SGML
Direct Data Capture Ltd (UK and NY)
73 B Ormskirk Business Park
New Court Way
Ormskirk, Lancashire
L39 2YT, UK
Phone: 01695 570707
E-mail: brett@ddcltd.co.uk
bound volume/microfilm scanning, text conversion
Higher Education Digitisation Service
University of Hertfordshire
College Lane
Hatfield, Hertfordshire
AL10 9AB UK
Phone: +44 1707 286078
E-mail: heds@herts.ac.uk
digitization of all manner of originals
Innodata
Innodata Content Services
Three University Plaza
Hackensack, New Jersey 07601
Phone: 201-488-1200
Fax: 201.488.9099
Contact: Joan Meyer, joan_meyer@inod.com,
or Steven Keyes, steven_keyes@inod.com,
or Jan Palmen
data aggregation and conversion, XML transformation,
OCR, and image scanning
Input Solutions, Inc (ISI)
Gaithersburg, MD
Phone: 301-948-6620
Contact: John Solomon
scanning and conversion, microfilm, oversize, text, SGML
JJT, Inc.
Corporate Headquarters, R&D & Production Center
26 Howland St.
Plymouth, MA 02360
Phone: 508-747-9889
Fax: 508-747-9289
Email: info@jjt.com
JJT, Inc.
New York Production Center
231 W. 29th Street
Suite 701
New York, NY 10001
Phone: 212-594-5106
Email: atroncale@jjt.com
Contact: Anthony Troncale
high-quality digital reproductions of pictorial works,
including line and photographic images and manuscripts; specializing in
conversion of large collections
Kirtas Technologies, Inc.
7620 Omnitech Place
Victor, New York 14564-9782
Phone: (585) 924-2420, ext. 3008
E-mail: mmaxwell@kirtas.com
Contact: Michael Maxwell, Director of Worldwide Sales
Non-destructive, high-quality, inexpensive bound-document scanning (on- and off-site) of books, journals, magazines, lab notebooks, etc., with OCR and metadata capture capabilities
Luna Imaging, Inc.
3542 Hayden Ave., Bldg. One
Culver City, CA 90232-2413
Phone: 310-452-8370
Fax: 310.452.8389
E-mail: sales@luna-img.com
film and print scanning, direct digital photography,
image editing and post-production, on-site services, image
studio/workflow consulting
Northern Micrographics
2004 Kramer Street
LaCrosse, Wisconsin 54602
Contact: Tom Ringdahl, tringdahl@normicro.com
scanning from paper or film
Preservation Resources
9 Commerce Way
Bethlehem, PA 18017
Phone: 800-773-7222
or 610-758-8700
Fax: 610.758.9700
Contact: presres@oclc.org
microfilm scanning
Saztec International
6700 Corporate Dr.
Kansas City, MO 64120
Phone: 816-483-6900
Fax: 816.241.4966
text conversion, SGML
Systems Integration Group, Inc.
9701 Philadelphia Court
Building 17, Suite A
Lanham, Maryland 20706
Phone: 301-731-3900
Fax: 301.731.3907
on-site/off-site document scanning, text conversion, SGML
Two Cat Digital, Inc.
14717 Catalina Street
San Leandro, CA 94577
Phone: 510-940-2670
Fax: 510.940.2632
Contact: Howard Brainen, howard@twocatdigital.com
film and print scanning, direct digital photography,
image editing, bulk image processing services, automated systems, image
databases, on-site services, digital imaging consulting
Suggestions if you've already created your digital surrogates
The following were suggestions for the quality and format of files that had already been digitized. They were not requirements.
2D images
Formats and compression: In general, you'll probably want to keep a TIFF version of the image (Tagged Image File Format, version 5 or 6 with Intel byte order), either uncompressed or with lossless compression (ITU Group 4 for black-and-white images, LZW for grayscale or color), but a JPEG-compressed image will suffice for contribution to RLG Cultural Materials. Alternatively, PhotoCD images may meet your local needs, and JPEGs can be created from those images for contribution to RLG Cultural Materials.
Source | Resolution
Black-and-white text and line art | 300-600 dpi, bitonal
Halftone illustrations | 300-400 dpi, 8 bpp or 24 bpp
Oversized (e.g., maps or posters) | 300 dpi; bitonal, 8 bpp, or 24 bpp
Manuscript page images | 300-400 dpi, 8 bpp (24 bpp for color, tinted, or discolored originals)
35mm photographic negatives or slides (reverse polarity if negative) | 3000 pixels in long dimension, 8 bpp or 24 bpp
Photographic prints and transparencies (4x5, 6x8, 8x10) | 4000-6000 pixels in long dimension, 8 bpp or 24 bpp
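The last two rows specify pixel counts rather than dpi. The scanning resolution they imply depends on the size of the original: a 35mm frame is 36 x 24 mm, so 3000 pixels along its 36 mm side requires about 2117 pixels per inch. The Python sketch below does this conversion with exact fractions to avoid floating-point rounding at whole numbers. The function name is hypothetical:

```python
import math
from fractions import Fraction

MM_PER_INCH = Fraction(254, 10)  # exactly 25.4


def required_ppi(target_pixels: int, long_side_mm: float) -> int:
    """Minimum scanning resolution (pixels per inch) that yields
    target_pixels along the original's long side."""
    side_inches = Fraction(str(long_side_mm)) / MM_PER_INCH
    return math.ceil(target_pixels / side_inches)


print(required_ppi(3000, 36))  # 35mm frame, 36 mm long side
```

By the same arithmetic, a 4x5-inch transparency (127 mm long side) needs 800 to 1200 ppi to reach 4000 to 6000 pixels.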
Text
Source | Quality | Format | Encoding (optional)
Printed page (OCR or rekey) | 99.95% accuracy as compared to original | 7- or 8-bit ASCII | HTML, XML, SGML, RTF
Compound document in Portable Document Format (PDF) | Text and images as indicated above | PDF |
Audio and motion
Formats and compression: Any of the following are acceptable: Microsoft Wave (.wav), MPEG (.mp3, .mpg, .mpeg), Audio Video Interleave for Windows (.avi), QuickTime (.qt, .mov), RealMedia (.rm, .ra, .ram).
Source | Quality
Spoken word | 11-22 kHz sampling, 16-bit, mono
Music | 44.1 kHz sampling, 16-bit, stereo
Video | 320x240, 30 fps, 1.2 Mbit/s