originally: http://www.oclc.org/research/projects/audience/default.htm

Audience Level

This research project explores using library holdings data in WorldCat to calculate audience-level indicators for books represented in the WorldCat database, based on the types of libraries that hold the titles.

Background

There are a variety of ways to characterize library materials. One is the type of reader believed to be interested in a particular item. Such an indicator, generally known as the audience level, is potentially useful for a variety of activities, including the development of new ways to improve information relevance for retrieval, reference services (including readers' advisory) and collection development. Audience-level filters could be implemented in existing retrieval systems to assist users in finding content based on their information needs.

Methodology

Determining a monograph's audience level is difficult because no bibliographic practice or standard requires this information in the bibliographic record. The exceptions are the fixed field in the Machine-Readable Cataloging (MARC) record and the Library of Congress Subject Headings (LCSH) subdivisions often used to identify juvenile literature and fiction. As a result, many bibliographic records have no direct indication of the target audience for the item represented.

Recognizing that different types of libraries typically serve different populations, OCLC researchers considered whether library types could be related to audience levels. They decided to explore whether the pattern of holdings of materials in WorldCat might be leveraged to provide an audience-level indicator.

OCLC researchers hypothesized that audience level could be inferred from the types of library holding the material, if the holdings symbols were weighted by a numeric code for library type.

OCLC's WorldCat database provides an excellent data source for this project because it contains more than 50 million bibliographic records and a billion holding locations.

The fixed field in the Machine-Readable Cataloging (MARC) record includes a "Target Audience" indicator (008/22), described as: "The intellectual level of the audience for which the item is intended." The following table lists these codes and the audiences they represent, along with the audience level we assigned to each code.

If the Target Audience indicator exists in a title’s MARC record, the title is assigned the Audience Level as indicated in this table.

MARC code   Description                                   Audience Level
a           preschool                                     0.0
b           primary (K - 3)                               0.1
c           elementary and junior high (grades 4 - 8)     0.15
j           juvenile (through age 15 or grade 9)          0.15
d           secondary (grades 9 - 12)                     0.25
e           adult                                         N/A
f           specialized                                   N/A
g           general                                       N/A
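
The lookup described by this table can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the names MARC_AUDIENCE_LEVELS and level_from_marc are hypothetical, and representing the N/A rows (and a missing or unrecognized code) as None is an assumption made here for convenience.

```python
from typing import Optional

# Audience levels for MARC 008/22 codes, copied from the table above.
# Codes e, f and g are listed as N/A, so they map to None (an assumption
# about representation, not something the source specifies).
MARC_AUDIENCE_LEVELS = {
    "a": 0.0,   # preschool
    "b": 0.1,   # primary (K - 3)
    "c": 0.15,  # elementary and junior high (grades 4 - 8)
    "j": 0.15,  # juvenile (through age 15 or grade 9)
    "d": 0.25,  # secondary (grades 9 - 12)
    "e": None,  # adult
    "f": None,  # specialized
    "g": None,  # general
}


def level_from_marc(code: Optional[str]) -> Optional[float]:
    """Return the table's audience level, or None when none is available."""
    if code is None:
        return None
    # Blank or unrecognized codes also yield None.
    return MARC_AUDIENCE_LEVELS.get(code.strip().lower())


print(level_from_marc("j"))  # 0.15
print(level_from_marc("e"))  # None
```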

If the Target Audience indicator does not exist, an audience level is calculated for the title based on the library holdings data attached to the bibliographic record.

Each bibliographic record in OCLC has some number of holdings symbols attached to it. These symbols represent the individual libraries that are said to "hold" the item represented by the record.

Researchers determined the type of library for each holdings symbol in the database. They used four main categories: Association of Research Libraries (ARL) members, academic (non-ARL), public, and school. Library symbols that did not fit into one of these groups were discarded.

After the library type of each holdings symbol was determined, researchers assigned a weight to each library type:

Library type   Weight
ARL            1.0
Academic       0.67
Public         0.33
School         0.0

Once the weights were assigned, researchers constructed an indication of audience level by averaging the weights of the holdings symbols on the record. The formula for this averaging is:

[ (Number of ARL holdings symbols on the record * 1.0)
+ (Number of academic-library holdings symbols on the record * 0.67)
+ (Number of public-library holdings symbols on the record * 0.33)
+ (Number of school-library holdings symbols on the record * 0.0) ]
/ (Total number of holdings symbols on the record)
= The average library-type weight of libraries holding the item.

For example, say we have a record with the following holdings symbols:

1 ABC DEF GHI JKL MNO

where 1 is the OCLC number for the item, and ABC, DEF, etc. are the holdings symbols. Suppose ABC, DEF, and GHI are ARL libraries, JKL is an academic (non-ARL) library, and MNO is a public library. The formula used to determine audience level for this item would be:

[(3 * 1.0) + (1 * 0.67) + (1 * 0.33)] / 5 = 0.8.
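
As a rough illustration, the averaging step can be written as a short Python function. The names LIBRARY_TYPE_WEIGHTS and audience_level are hypothetical, as are the lowercase type labels. The sketch also assumes that symbols of an unrecognized type are dropped before averaging, so that the denominator counts only the classified symbols. The input reproduces the arithmetic above: three symbols weighted 1.0, one weighted 0.67 and one weighted 0.33.

```python
from typing import Iterable, Optional

# Library-type weights from the table above (labels are illustrative).
LIBRARY_TYPE_WEIGHTS = {
    "arl": 1.0,
    "academic": 0.67,
    "public": 0.33,
    "school": 0.0,
}


def audience_level(library_types: Iterable[str]) -> Optional[float]:
    """Average the library-type weights of the holdings on one record.

    Types outside the four categories are discarded (assumption: they do
    not count toward the total). Returns None if nothing usable remains.
    """
    weights = [
        LIBRARY_TYPE_WEIGHTS[t] for t in library_types if t in LIBRARY_TYPE_WEIGHTS
    ]
    if not weights:
        return None
    return sum(weights) / len(weights)


# Five holdings weighted 1.0, 1.0, 1.0, 0.67 and 0.33.
example = audience_level(["arl", "arl", "arl", "academic", "public"])
print(round(example, 2))  # 0.8
```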

Furthermore, we can use this method to determine the audience level of a FRBR work by finding all of the items in that work and computing the average (weighted by holdings) of each of their respective audience levels. For example, consider a workset containing the following items:

OCLC number   Holdings   Audience level
1             5          0.80
2             10         0.76
3             7          0.94

Where {1,2,3} are the OCLC numbers, {5,10,7} are the holdings counts that were used to compute the audience level, and {.8,.76,.94} are the respective audience levels of each item. The average audience level for the work would then be computed by:

[(5 * 0.8) + (10 * 0.76) + (7 * 0.94)] / (5 + 10 + 7) = 0.826
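
The same calculation, as a minimal Python sketch. The function name work_audience_level and the (holdings count, audience level) tuple layout are assumptions made for illustration, not part of the project's published method.

```python
from typing import Iterable, Optional, Tuple


def work_audience_level(items: Iterable[Tuple[int, float]]) -> Optional[float]:
    """Holdings-weighted mean of item-level audience levels.

    Each item is a (holdings_count, audience_level) pair. Returns None
    when the total holdings count is zero.
    """
    items = list(items)
    total_holdings = sum(count for count, _ in items)
    if total_holdings == 0:
        return None
    weighted_sum = sum(count * level for count, level in items)
    return weighted_sum / total_holdings


# The three-item workset from the example above.
workset = [(5, 0.8), (10, 0.76), (7, 0.94)]
print(round(work_audience_level(workset), 3))  # 0.826
```

Because the function only needs counts and levels, the same sketch would apply to any group of records, such as a collection, provided each record carries its holdings count.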

This approach can be used to calculate overall audience-level measures for collections or other groups of records.

The overall audience-level assessment for the WorldCat database itself is 0.63.

A wrinkle

We believe this approach produces interesting and usable results. For example:

Title                                                        Author            ISBN         Audience Level
Operations Research for Libraries and Information Agencies   Kraft & Boyce     012424520X   0.78
The Kite Runner                                              Khaled Hosseini   1573222453   0.43
The Da Vinci Code                                            Dan Brown         0385504209   0.43
Harry Potter and the Sorcerer's Stone                        J.K. Rowling      0590353403   0.15

These values, which are for the FRBR work, are approximately what one would expect.

Of course, we need to remember what this approach measures. For example, if one were to assign a 'reading level' to Nietzsche's Thus Spake Zarathustra (ISBN 0394608089), one might expect it to be high, perhaps 0.8 or higher. However, we return a score of 0.61.

As a classic of philosophy this title has a wide potential audience, and is widely represented in public, academic and ARL collections. The manifestation-level records display audience-level measures ranging from 0.33 to 1.0.

OCLC researchers continue to explore ways to account for and manage such distributional effects.

Why we are working on this

The findings from this research will benefit the development of new ways to improve information relevance for retrieval, reference services (including readers' advisory) and collection development.

Audience-level filters could be implemented in existing retrieval systems to assist users in finding content based on their information needs.

This effort is one of several data mining projects whereby OCLC Research seeks to extract intelligence from the data we have, and use it in different ways that provide value to libraries.

Feedback

This approach gives an indication of audience level. Is it useful? How could it be used? We are interested in your ideas! Please let us know what you think.
