Beta test of WorldCat Identities project
Note: This project has been completed.
The problem: There is no one source of information for users to uniquely identify a person or corporate body. National authority files, identifying entities from published works, provide scant information and do not include the scripts that the author or corporate body itself may have used. The LC/NACO authority file, for example, is designed for librarians, and the practice of distinguishing between authors by birth dates is insufficient for most users.
Users generally have no access to any one resource that illustrates the history and works by and about persons and corporate bodies that may be known by a variety of names depending on location. Researchers need a tool to support discovery of publication "pedigrees" (to establish the authority and relevance of a title) and the ability to disambiguate publisher names. Data mining of institutional resources, in addition to WorldCat, can also help institutions manage names across resources and institutions.
The prototype solution: WorldCat Identities aims to
address end users' need to uniquely identify authors, both persons and corporate bodies. WorldCat Identities compiles information from a variety of
resources, including information data-mined from WorldCat, to
illustrate the history and works by and about persons and corporate
bodies that may be known by a variety of names depending on location.
The twenty million identities covered are more than in any other
resource currently available.
Ninety-five staff from twenty-one RLG Programs partners participated in a beta test from February 1 through April 30, 2007 to evaluate the resource and its potential to provide the information needed to uniquely identify a "creative entity" among many similar ones within different contexts and systems. Feedback came primarily from librarians, rather than from end users. The Publication Timeline, which graphically illustrates the publication dates of works by the author (including those published posthumously) and works about the author, was particularly praised.
General feedback was positive:
"This is a very creative and impressive database with much potential for a wide variety of users ... All in all this is an exciting product, but, more importantly, a useful one." (Susan Flanagan, Getty Research Institute)
"WorldCat Identities is one of the most exciting
things I've seen from OCLC. It takes a giant step toward crossing the
gap between what authority files are meant to do and what users really
want from them." (Stephen Hearn, U. Minnesota)
"This is a very exciting-looking project that should
allow for new perspectives on 'resource discovery.'" (William Kopycki,
U. Pennsylvania)
"The Publication Timeline gives an interesting
snapshot view of scholarship over time, at least as applied to a single
person." (Daniel Mack, Pennsylvania State)
"The WorldCat Identities beta tool may have its roots
in the 'authority file' mode. But, adding the data-mining tools, it
begins to transform into something larger. I think these are exactly
the kinds of things we need to be building." (Martin Schreiner, Harvard
University)
"I found [the Publication Timeline] one of the most
intriguing features in Identities. It suggests a whole new discipline
of 'bio-bibliometrics.' For example, compare the timelines of Aristotle
and Plato to visualize the course of medieval and modern philosophy or
compare the different course of the careers and reputations of Byron
and Shelley. Admiral Nelson's timeline shows a
peak around 1905, which interestingly coincides with the naval arms
race of the early 20th century and the centenary of the Battle of
Trafalgar." (Richard Wakefield, British Library)
Enhancements made during the beta test period, primarily
due to feedback from beta testers:
- Corporate names were added.
- Subject headings or genres, when present, were added
to the list of identities retrieved to differentiate authors with
similar names (e.g., John Adams the composer from John Adams the
President).
- Subject headings and genres were linked to retrieve
other identities within the same subject and genre. The links retrieve
"identity clouds" of the top 100 authors in the subject or genre as
well as an alphabetical list of all identities associated with a
specific subject or heading.
- Colors were added to the Publication Timeline to
differentiate works by a person during the author's lifetime from works
published posthumously and from works written about the author.
- The ability to retrieve publications for particular
years in the Timeline was added.
- The Timeline was refined by discarding unknown dates,
publication dates before the author's birth date, and large date
ranges.
- The HTML titles and links were improved to enhance
rankings by Web search engines.
- Related Names were improved to lead to searches where
both the Identity and the Related Name appear.
- Roles were made to link to the WorldCat records they
came from.
- A "More" option was added to expand the list of
citations by and about to 20.
- Links to Wikipedia were doubled, to 50,000.
- Links to the German national authority file were
added. (The names established in Germany may be one of the "alternate
names.")
- Icons were added to the results list to differentiate
between personal identities and corporate identities, and the icons for
controlled identities (those that link to the LC/NACO authority file) were colored.
- Name format was reverted to first name, last name on
retrieval of an Identity ("Works by Norman Mailer" rather than "Works
by Mailer, Norman").
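The Timeline refinements listed above (discarding unknown dates, publication dates before the author's birth date, and large date ranges, then coloring the remaining works by category) can be illustrated with a minimal Python sketch. This is not the WorldCat Identities implementation: the Work record, the classify function, the category names, and the 10-year cutoff for a "large" date range are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff: the source does not say how large a "large" range is.
MAX_RANGE_YEARS = 10


@dataclass
class Work:
    """Hypothetical record for one publication on an author's timeline."""
    title: str
    start_year: Optional[int]    # None means the date is unknown
    end_year: Optional[int]      # equals start_year for a single-year date
    about_author: bool = False   # True for works about, not by, the author


def classify(work: Work, birth_year: Optional[int],
             death_year: Optional[int]) -> Optional[str]:
    """Return a timeline category, or None if the work is discarded."""
    if work.start_year is None or work.end_year is None:
        return None  # unknown date
    if work.end_year - work.start_year > MAX_RANGE_YEARS:
        return None  # date range too wide to place on the timeline
    if birth_year is not None and work.start_year < birth_year:
        return None  # dated before the author's birth
    if work.about_author:
        return "about"
    if death_year is not None and work.start_year > death_year:
        return "posthumous"
    return "lifetime"


if __name__ == "__main__":
    sample = [
        Work("Early poems", 1850, 1850),
        Work("Collected letters", 1900, 1900),
        Work("A critical study", 1905, 1905, about_author=True),
        Work("Undated pamphlet", None, None),
    ]
    for w in sample:
        print(w.title, "->", classify(w, birth_year=1800, death_year=1870))
```

Each category returned here would map to one color on the Timeline; works for which classify returns None are simply left off.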
The beta testers pointed out several areas that need to
be improved:
- WorldCat Identities reflects cataloging variations in
WorldCat itself, resulting in duplicate entries for the same author. We
are working on better normalization that can reduce the number of
duplicates.
- Subject headings in WorldCat can also vary (e.g.,
French author vs. French novelist), excluding identities who write in
the same subject area. We will be adding multiple subject headings that
will provide a more comprehensive view of the subjects the authors
write in.
- Context is lacking. We will add a brief explanation
of what WorldCat Identities is on the home page with a link to
documentation of the contents and sources. We will add a More About
link to explain how Audience Level is derived.
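As an illustration of the first item above, the following Python sketch shows one generic way that name normalization can collapse cataloging variations into a single key. It is an assumption about the general technique, not a description of the normalization OCLC planned or used: the normalize_name and group_duplicates functions are hypothetical, and the sample headings are made up for the example. Dates are kept in the key so that different authors with the same name remain distinct.

```python
import unicodedata
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Build a comparison key: strip diacritics, case, and punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks so that accented and unaccented forms match.
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces, then collapse runs of whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())


def group_duplicates(names):
    """Group headings that share the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    headings = [
        "Brontë, Charlotte, 1816-1855.",
        "Bronte, Charlotte, 1816-1855",
        "MAILER, Norman",
        "Mailer,  Norman.",
    ]
    for key, variants in group_duplicates(headings).items():
        print(key, "<-", variants)
```

A key of this kind only catches variation in case, punctuation, spacing, and diacritics; variant spellings or transliterations of the same name would still need other matching methods.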
Other enhancements suggested by beta testers that are
under consideration:
- Possibly adding these "useful links": Internet Movie
Database (IMDb), home Web pages of institutions cited.
- Adding formats (in addition to languages) of the
works represented (e.g., Brad Pitt's works are likely to be movies,
Prince's works are likely to be musical recordings).
- Refining the presentation of Uniform Titles, with the
title (field 245) alongside. For example, Dostoyevski's most widely
held works would be listed with their English as well as their Russian
titles. Or for Aaron Copland, "Fanfare for the common man; Rodeo;
Appalachian spring (suite)" would appear instead of "Orchestra Music.
Selections."
- Making Audience Level selectable to adjust results to
only children's literature or to specialists.
WorldCat Identities was incorporated into WorldCat.org in November 2007. Clicking on the About the Author(s) link under the Details tab brings you to the WorldCat Identities page for the author.
RLG Programs partner beta testers for WorldCat Identities
- British Library
- Columbia University
- Getty Research Institute
- Harvard University
- Indiana University
- Library of Congress
- National Library of Australia
- New York Public Library
- New York University
- Pennsylvania State University
- Princeton University
- Rutgers, The State University of New Jersey
- Swiss National Library
- University of California, Berkeley
- University of California, Los Angeles
- University of Michigan
- University of Minnesota
- University of Oxford
- University of Pennsylvania
- University of Washington
- Yale University
RLG Programs project liaisons
Thomas Hickey
Chief Scientist
hickey@oclc.org
Karen Smith-Yoshimura
Program Officer
karen_smith-yoshimura@oclc.org