originally: http://www.oclc.org/research/publications/about.htm

About the OCLC Research Publications Repository

Scope

This repository contains works produced, sponsored, or submitted by OCLC Research. In general, the works are research-oriented and are in the subject area of library and information science. Many items describe OCLC Research projects, activities, and programs and were originally published by OCLC, while others are from peer-reviewed scholarly journals.

The repository contains metadata (MARC, Dublin Core) about publications and, whenever available and permitted, a link to the full digital text of items described.

The repository is under construction. At present the repository contains:

  • 700 metadata records (out of 919 items published by OCLC staff since 1979)
  • links to the full text of 303 items.

It contains current publications back to 1997, all "born digital" publications, and at least 75% of OCLC Research's corpus of work.

A complete bibliography of OCLC Research publications is available at www.oclc.org/research/publications/2000-2009.htm.

Access

The records are available for searching on the OCLC Research Web site. The search interface is a working prototype, and your comments are welcome.

The metadata records may also be searched via SRW/U.
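A minimal sketch of what a URL-based (SRU) search request might look like. The parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) come from the SRU 1.1 protocol; the base URL and the `dc.title` index are assumptions made for illustration, since this page does not give the actual endpoint or its supported indexes.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: the real SRW/U base URL is not stated on this page.
BASE_URL = "http://example.org/sru/publications"


def build_sru_url(cql_query, start=1, maximum=10, base_url=BASE_URL):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query string."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,          # CQL; index names depend on the server
        "startRecord": start,        # SRU record positions are 1-based
        "maximumRecords": maximum,
    }
    return base_url + "?" + urlencode(params)


# 'dc.title' is an assumed index name, used only as an example.
url = build_sru_url('dc.title = "metadata"', maximum=5)

# Decode the query string again to confirm what would be sent.
sent = parse_qs(urlsplit(url).query)
print(url)
print(sent["query"][0])
```

The response to such a request is an XML document of matching records, which a client can page through by increasing `startRecord`.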

The metadata records may be harvested as an OAI file; please feel free to harvest them.
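For readers who have not harvested before, here is a minimal sketch of one harvesting step, assuming the repository answers standard OAI-PMH `ListRecords` requests with `oai_dc` records. The base URL and the sample response are made up for illustration; only the verb, the argument names, and the XML namespaces come from the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None, prefix="oai_dc"):
    """Build a ListRecords request; a resumptionToken replaces metadataPrefix."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (titles, next_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in root.iterfind(".//dc:title", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    # An absent or empty token means this was the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


# Made-up response page, shaped like an OAI-PMH ListRecords reply.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

titles, token = parse_page(SAMPLE)
next_url = list_records_url("http://example.org/oai", token)  # hypothetical base URL
```

A full harvester would fetch `next_url`, parse it the same way, and repeat until no resumption token is returned.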

The MARC records are in WorldCat, and are searchable in FirstSearch and search engines participating in OCLC's Open WorldCat Program.

The full text of an article is linked from its metadata record wherever permitted by the copyright holder. Your library may be able to supply items that are not linked, either from the library's collection or by interlibrary loan.

The repository also has an RSS feed and supports OpenURLs. (Please note that these are prototypes and that, on an interim basis, we refresh our full database regularly, approximately weekly.)
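As an illustration of what an OpenURL for a journal article can look like, the sketch below builds one in the key/encoded-value style of the Z39.88-2004 standard. Which OpenURL version the prototype accepts is not stated here, so that choice, the resolver address, and the citation values are all assumptions.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_openurl(resolver, atitle, jtitle, aulast, year):
    """Build a Z39.88-2004 key/encoded-value OpenURL for a journal article."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", atitle),   # article title
        ("rft.jtitle", jtitle),   # journal title
        ("rft.aulast", aulast),   # first author's family name
        ("rft.date", year),
    ]
    return resolver + "?" + urlencode(params)


# Hypothetical resolver address and made-up citation, for illustration only.
openurl = build_openurl(
    "http://example.org/openurl",
    atitle="An example article",
    jtitle="Journal of Examples",
    aulast="Doe",
    year="2004",
)
fields = parse_qs(urlsplit(openurl).query)
print(openurl)
```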

OCLC technologies and services used (AKA shameless self-promotion)

Data preparation

OCLC Library Technical Services (LTS) Custom Cataloging cataloged the backlog of items on our bibliography into WorldCat. (Some items had already been cataloged by OCLC members.)

OCLC Preservation Service Center staff scanned/digitized paper items (items for which we had permission to archive) and applied Optical Character Recognition (OCR) to create searchable PDF files.

The OCLC Digital Archive warehouses the OCLC-scanned items.

Access preparation

Pears Database Software was used to create OCLC Research's publications database; it was populated with records from WorldCat using a Z39.50 client.

SRW/U Software provides the search interface, as well as an XSLT process that applies a style sheet to make the records available to an OAI repository.
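The style sheet itself is not reproduced on this page. As a rough stand-in, the sketch below uses Python's ElementTree rather than XSLT to show the target shape of such a transformation: a record wrapped in the `oai_dc` container that OAI-PMH requires every repository to support. The input dictionary and its values are hypothetical; only the two namespace URIs are standard.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes when serializing.
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def to_oai_dc(record):
    """Serialize {dc_element_name: [values]} as an oai_dc XML string."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, values in record.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Made-up record, standing in for metadata pulled from the database.
example = {
    "title": ["A hypothetical report"],
    "creator": ["Doe, Jane"],
    "date": ["2004"],
}
xml_out = to_oai_dc(example)
print(xml_out)
```

An XSLT style sheet does the same job declaratively, which suits a pipeline where the records are already XML.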

The ERRoL family of services is used to create the records and massage them into the OAI repository. To do so, it relies on OAICat and OCLC's SRW/U-to-OAI Gateway Service, and it uses the University of Illinois at Urbana-Champaign's OAI Registry to publicize the availability of the OAI repository. The ERRoL service has OAI, OpenURL, RSS, and Search APIs. The OCLC Research Publications Repository's interface plugs into ERRoL's Search API.

The work and data flows were pretty complex, but they were made magically simple by the use of Z39.50, Pears Software, SRW/U Software, and the ERRoL family of services. Pears and SRW/U are open source, and they and ERRoL are available from the OCLC Research web page.

DSpace, Greenstone, ePrints, and other institutional repository software (including, we must add, CONTENTdm) are excellent and useful tools! We didn't use one of these simply because they were more than we needed for our collection and our workflow.

Behind the scenes

Who is creating the metadata?

Many repositories are intended for distributed description and submission, i.e. authors submit and describe their own work. This is not one of those: it is not a repository created and maintained by authors. The experience of many institutional repositories suggests that authors are unlikely to have the time, inclination, or habit to catalog their own work, so we used professional catalogers and, where needed for older materials, professional scanners.

The life cycle of our metadata

  • Research staff developed a bibliography of titles written or sponsored by OCLC Research.
  • We collected copies of those works or links to the digital versions of those works.
  • Various OCLC members had cataloged older items on our bibliography. OCLC Library Technical Services (LTS) Custom Cataloging cataloged the remaining backlog of items on our bibliography into WorldCat.
  • OCLC Research staff sought archiving permissions where necessary (for older materials).
  • OCLC Preservation Service Center staff scanned/digitized paper items (items for which we had permission to archive).
  • OCLC Library staff put the scanned items in the OCLC Digital Archive, linking them to the WorldCat records.
  • We tracked the status of these steps for every citation, in an Excel spreadsheet.
  • OCLC Research staff pulled the metadata from WorldCat, and put the metadata in the OR publications database.
  • Research staff made the data available as an OAI repository.
  • Database and web interface design work went on in parallel as the metadata and archive were built.
  • The process continues with each new research publication. On an ongoing basis, we run an automated script against WorldCat to refresh our Pears-based database. The ERRoL service automatically, transparently, and instantly serves up these records as if they were OAI, complete with ERRoL's attendant value-added services such as a Search API, RSS feed, and prototype OpenURL API.

Here's the right-brained version:

[Figure: OCLC Research Publications Repository flowchart]

What did we learn along the way?

Intellectual heritage is indeed a fragile thing. In the 25 years of OCLC Research's existence, OCLC has already lost a handful of items: we discovered citations for items originally published on paper for which even the authors have no copies.

It is easier to preserve one's copyright from the get-go than to get it after the fact. Seeking permissions, even for older materials, was one of our hardest tasks. Nowadays OCLC Research's policy is that OCLC should retain copyright, secure archiving rights up front, and publish in open-access-friendly journals whenever feasible.

"Clean data is more fun to work with than dirty data," in the words of our database expert. He was glad to receive professionally cataloged records.