originally: http://www.oclc.org/research/projects/publisherns/default.htm
Publisher name server
Research project idea
OCLC Research will prototype a service which:
- Resolves ISBN prefixes to publisher names
- Resolves variant publisher names to a preferred form
- Captures and makes available for use various attributes of individual publishers. The specifics are TBD, but the following are anticipated:
  - Location of publisher
  - Language(s) of materials published
  - Genre(s)/format(s) of materials published
  - Dominant subject domain(s) of the publisher's output
  - Parent company and subsidiaries
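Resolving an ISBN to a publisher can be sketched as a longest-prefix match against a table of known publisher prefixes. This is a minimal illustration under stated assumptions, not the project's method: the table rows are placeholders, and `PREFIX_TABLE`, `isbn13_body`, and `resolve_publisher` are hypothetical names. Registration-group and registrant elements vary in length (they are defined by range data published by the International ISBN Agency), so rather than parse that data, the sketch tries candidate prefixes from longest to shortest. ISBN-10 input is converted to the ISBN-13 form by prepending 978 and dropping the old check digit, so one table serves both forms.

```python
from typing import Optional

# Hypothetical prefix table (illustrative placeholders, not real registrations).
# Keys are digit-only ISBN-13 prefixes: "978"/"979" + registration group + registrant.
PREFIX_TABLE: dict[str, str] = {
    "9780306": "Hypothetical Publisher A",
    "97819999": "Hypothetical Publisher B",
    "9791000": "Hypothetical Publisher C",
}


def isbn13_body(raw: str) -> str:
    """Return the first 12 digits of the ISBN-13 form of an ISBN-10 or ISBN-13.

    Hyphens and spaces are ignored. The check digit is dropped (and not
    validated) because prefix lookup does not depend on it.
    """
    s = "".join(ch for ch in raw.upper() if ch.isdigit() or ch == "X")
    if len(s) == 10:
        body = "978" + s[:9]  # ISBN-10 -> ISBN-13: prepend 978, drop old check digit
    elif len(s) == 13:
        body = s[:12]
    else:
        raise ValueError(f"not a 10- or 13-digit ISBN: {raw!r}")
    if not body.isdigit():
        raise ValueError(f"misplaced 'X' in ISBN: {raw!r}")
    return body


def resolve_publisher(raw_isbn: str, table: dict[str, str]) -> Optional[str]:
    """Return the publisher name for the longest matching prefix, or None."""
    body = isbn13_body(raw_isbn)
    # Prefix lengths vary, so try every candidate length from longest to shortest.
    for length in range(len(body), 0, -1):
        name = table.get(body[:length])
        if name is not None:
            return name
    return None


print(resolve_publisher("0-306-40615-2", PREFIX_TABLE))      # Hypothetical Publisher A
print(resolve_publisher("978-1-9999-1234-5", PREFIX_TABLE))  # Hypothetical Publisher B
print(resolve_publisher("978-1-00-000000-0", PREFIX_TABLE))  # None
```

In a working service the table would be populated from authoritative registrant data; the lookup logic stays the same regardless of where the rows come from.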
Research goals
The primary deliverable of the project is a service that will support advanced collection intelligence by:
1. facilitating the reliable clustering of collected objects based on their issuing entity (as determined from metadata about the objects), and
2. gaining intelligence about the nature of individual publishers, which can in turn be used alone or together with other data sources (e.g., usage logs, holdings) to yield collection intelligence and to reveal acquisition patterns and user behavior.
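In its simplest form, clustering collected objects by issuing entity means grouping records under the preferred form of their publisher name. The sketch below assumes a variant-to-preferred mapping already exists; `AUTHORITY`, `RECORDS`, and `cluster_by_publisher` are hypothetical names with placeholder data, not part of any OCLC service.

```python
from collections import defaultdict

# Hypothetical authority map from variant publisher names to a preferred form.
AUTHORITY: dict[str, str] = {
    "Hypothetical Pr.": "Hypothetical Press",
    "Hypothetical Press, Inc.": "Hypothetical Press",
}

# Hypothetical catalog records: an identifier and the publisher as transcribed.
RECORDS: list[dict[str, str]] = [
    {"id": "r1", "publisher": "Hypothetical Press"},
    {"id": "r2", "publisher": "Hypothetical Pr."},
    {"id": "r3", "publisher": "Hypothetical Press, Inc."},
    {"id": "r4", "publisher": "Other House"},
]


def cluster_by_publisher(
    records: list[dict[str, str]], authority: dict[str, str]
) -> dict[str, list[str]]:
    """Group record ids under the preferred publisher name.

    Names missing from the authority map are kept as transcribed, so
    unresolved variants form their own clusters instead of being dropped.
    """
    clusters: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        name = rec["publisher"]
        clusters[authority.get(name, name)].append(rec["id"])
    return dict(clusters)


print(cluster_by_publisher(RECORDS, AUTHORITY))
# {'Hypothetical Press': ['r1', 'r2', 'r3'], 'Other House': ['r4']}
```

Keeping unmatched names as their own clusters makes gaps in the authority data visible rather than silently hiding those records.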
The primary high-level requirement is for the service to achieve acceptable reliability in resolving:
- ISBN prefixes to publisher names
- Variant publisher names to a preferred form
  - Primary emphasis: names in Latin script
  - As time and resources allow: names in other scripts
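One common way to resolve Latin-script variants is to reduce each name to a comparison key and look the key up in an index of preferred forms. The sketch below uses that key-normalization technique as an assumption; it is not described in the project materials. The stopword and designator lists, `name_key`, `PREFERRED`, `preferred_form`, and the example names are all hypothetical.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical matching rules; a real service would need to tune them per language.
STOPWORDS = {"and", "the"}
TRAILING_DESIGNATORS = {"inc", "ltd", "llc", "co", "corp", "gmbh"}


def name_key(name: str) -> str:
    """Reduce a Latin-script publisher name to a comparison key."""
    # Decompose accented letters and drop the combining marks: "É" -> "E".
    decomposed = unicodedata.normalize("NFKD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    text = base.casefold().replace("&", " and ")
    # Split on runs of punctuation/whitespace; drop empty tokens and stopwords.
    tokens = [t for t in re.split(r"[\W_]+", text) if t and t not in STOPWORDS]
    # Drop corporate designators only where they end the name.
    while tokens and tokens[-1] in TRAILING_DESIGNATORS:
        tokens.pop()
    return " ".join(tokens)


# Hypothetical authority file: preferred forms indexed by their comparison key.
PREFERRED = {name_key(p): p for p in ["Éditions Exemple", "Example & Sons"]}


def preferred_form(name: str) -> Optional[str]:
    """Return the preferred form for a variant name, or None if unmatched."""
    return PREFERRED.get(name_key(name))


print(preferred_form("EDITIONS EXEMPLE"))           # Éditions Exemple
print(preferred_form("The Example and Sons, Ltd"))  # Example & Sons
print(preferred_form("Unrelated House"))            # None
```

Keys like these trade precision for recall: two distinct publishers can reduce to the same key, so any matching of this kind would have to be checked against the reliability standards the project sets for its data.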
Although the impetus to undertake the project is chiefly to facilitate collection intelligence investigations and services, the prototype service is expected to have value to a wide range of parties inside OCLC, including units engaged in activities such as:
- Content acquisition/licensing: the service may be useful for revealing publishers with desirable output/consumption patterns.
- Metadata processing: "publisher" can be a valuable match point for duplicate record resolution and other activities.
This project will likely have synergies with OCLC Research's FRBR-related activities, such as xISBN, and may itself prove instrumental as a tool in other current or future OCLC Research activities.
Success will be measured in two ways:
- Mechanical: The delivery of a working prototype service
- Data reliability: The various associations made in the database must be complete and reliable in accordance with specified standards
The project will be considered complete when:
- The prototype service has been built and delivered
- The data delivered are complete and reliable in accordance with specified standards
- Alternatively, if the proposed service cannot be built satisfactorily within the time allowed and with the resources available, the project will be considered concluded once a formal determination to that effect has been made and the project is discontinued.
Research methodology
The project will adopt two primary research modes:
- Consultation: Experts within OCLC will be consulted as specifications are written, to ensure the best possible results. Additionally, in anticipation that the service might prove useful beyond the bounds of the research project, input will be sought about non-research requirements and the relative value of various data that might be included in the database.
- Prototyping/trial-and-error: Interested OCLC staff will be invited to test and provide feedback on the prototype.
Timing
This will be a twelve-month project, divided into three phases:
- Phase 1: Resolve ISBN prefixes to publisher names
- Phase 2: Resolve variant publisher names to a preferred form
- Phase 3: Capture and make available for use various attributes of individual publishers. The specifics of this phase will be determined as work progresses, but the following are anticipated:
  - location of publisher
  - language(s) of materials published
  - genre(s)/format(s) of materials published
  - dominant subject domain(s) of the publisher's output
  - parent company and subsidiaries
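The Phase 3 attributes could be held in a simple per-publisher record, with parent links that allow rolling an imprint up to its top-level company. This is a hypothetical data model, since the project leaves the specifics to be determined: `PublisherProfile`, `ultimate_parent`, `PROFILES`, and the example names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PublisherProfile:
    """Hypothetical record for the attributes anticipated in Phase 3."""
    preferred_name: str
    location: Optional[str] = None
    languages: list[str] = field(default_factory=list)
    genres_formats: list[str] = field(default_factory=list)
    subject_domains: list[str] = field(default_factory=list)
    parent: Optional[str] = None  # preferred name of the parent company, if any
    subsidiaries: list[str] = field(default_factory=list)  # preferred names


def ultimate_parent(name: str, profiles: dict[str, PublisherProfile]) -> str:
    """Follow parent links upward and return the top-level company's name.

    Stops at a publisher with no recorded parent, at a name with no profile,
    or just before revisiting a name (so bad data cannot cause an endless loop).
    """
    seen = {name}
    current = name
    while True:
        profile = profiles.get(current)
        if profile is None or profile.parent is None or profile.parent in seen:
            return current
        current = profile.parent
        seen.add(current)


PROFILES = {
    "Imprint X": PublisherProfile("Imprint X", languages=["English"], parent="Division Y"),
    "Division Y": PublisherProfile("Division Y", parent="Group Z", subsidiaries=["Imprint X"]),
    "Group Z": PublisherProfile("Group Z", location="Hypothetical City", subsidiaries=["Division Y"]),
}

print(ultimate_parent("Imprint X", PROFILES))  # Group Z
```

Storing both `parent` and `subsidiaries` is redundant (either can be derived from the other); the sketch keeps both only to mirror the attribute list above.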
Resources
- Connaway, Lynn Silipigni, and Timothy J. Dickey. 2008. "Data Mining, Advanced Collection Analysis, and Publisher Profiles: An Update on the OCLC Publisher Name Authority File." Presentation given at the XXVIII Annual Charleston Conference, 7 November 2008, Charleston, South Carolina (USA). Available online at: http://www.oclc.org/research/presentations/connaway/charleston2008.ppt (.ppt: 761K/33 slides).
- Connaway, Lynn Silipigni, and Akeisha Heard. 2005. "Publisher Name Authority Project: An Attempt to Enhance Data Mining for Collection Analysis & Comparison." Presentation given at the XXV Annual Charleston Conference, 4 November 2005, Charleston, South Carolina (USA). Available online at: http://www.oclc.org/research/presentations/connaway/charleston2005.ppt (.ppt: 183K/39 slides).
- Connaway, Lynn Silipigni, and Akeisha Heard. 2005. "Publisher Name Authority Project: An Attempt to Enhance Data Mining for Collection Analysis & Comparison, A Selective Bibliography." Available online at: http://www.oclc.org/research/projects/publisherns/bibliography.pdf (.pdf: 22K/3 pp.).
Team members