An index routine is declared as one of the parameter settings for an index definition within a Pears database description configuration file. It determines how the OCLC SiteSearch Pears software extracts index terms from input data and how the software acts on that data (e.g., handling punctuation, extracting codes). Index routines create index terms in one of two basic formats: keyword, where each word is its own index entry, or phrase, where the entire contents of a field form a single index entry. Pears provides a wide range of index routines for both keyword and phrase indexes. The Words and Phrase routines are the most commonly used.
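The difference between the two formats can be illustrated with a minimal Python sketch. It is not Pears code: the function names are hypothetical, and splitting on whitespace stands in for the delimiter handling that the real routines perform.

```python
from typing import List


def keyword_terms(field: str) -> List[str]:
    """Keyword format: each word in the field becomes its own index entry.

    Splitting on whitespace is a simplifying assumption; the real Words
    routine splits on a configurable delimiter list.
    """
    return field.split()


def phrase_terms(field: str) -> List[str]:
    """Phrase format: the whole field becomes a single index entry."""
    return [field]


title = "Gone with the Wind"
print(keyword_terms(title))  # four separate entries, one per word
print(phrase_terms(title))   # one entry holding the whole field
```

A keyword index built this way answers searches for any single word in the field, while a phrase index matches only the field as a whole.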
The following table lists the current index routines used by Pears to extract terms from input data to build indexes:
Routine |
Description |
Words Routines |
ORG.oclc.pears.Words |
Extends Phrase and extracts and stores individual terms from fields in a record
The following table lists parameters that you can use with the Words routine to more specifically define how it extracts terms to build an index:
Parameters |
delimiters=\t\n\r+-=<>(){}[]:;/\\\"!? |
extraDelimiters= |
removeDelimiters= |
minWordLength= |
maxWordLength= |
maxWords= |
|
ORG.oclc.pears.PluralWords |
Extends Words to stem plural endings from terms as they are extracted so that only the singular form of the term is stored in the index |
ORG.oclc.pears.StopwordEnforcer |
Ensures that stop words are not stored as terms in an index |
ORG.oclc.pears.SmartWords |
Extends PluralWords to ensure that terms are greater than two characters in length |
ORG.oclc.pears.WordsMinusBoundPhrases |
Allows you to declare open and closed boundaries (such as quotation marks) to identify data within a phrase that is to be ignored during the extraction process
Note: When using this indexing routine, you must also use the bounds parameter within the index definition. Values for the bounds parameter must be declared in character pairs. An example would be: bounds = "" . In this example, anything between double quotes would not be indexed.
|
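The effect of the bounds parameter described for WordsMinusBoundPhrases can be sketched in Python as follows. This is only an illustration of the documented behavior (data between a declared open and close character is ignored). The function names are hypothetical, whitespace splitting stands in for the Words delimiter handling, and dropping everything after an unclosed opening character is an assumption.

```python
from typing import List


def strip_bound_phrases(text: str, bounds: str) -> str:
    """Remove every span enclosed by an open/close pair listed in bounds."""
    if len(bounds) % 2 != 0:
        raise ValueError("bounds must be declared in character pairs")
    # Map each opening character to its closing character.
    closers = {bounds[i]: bounds[i + 1] for i in range(0, len(bounds), 2)}
    kept = []
    closing = None  # closing character being waited for while inside a span
    for ch in text:
        if closing is not None:
            if ch == closing:
                closing = None  # span ends; the closing character is dropped too
            continue
        if ch in closers:
            closing = closers[ch]  # span starts; skip until the closer appears
            continue
        kept.append(ch)
    return "".join(kept)


def words_minus_bound_phrases(text: str, bounds: str) -> List[str]:
    """Extract keyword terms from whatever remains outside the bounded spans."""
    return strip_bound_phrases(text, bounds).split()


print(words_minus_bound_phrases('The film "Gone with the Wind" premiered', '""'))
```

With bounds set to a pair of double quotes, the quoted title contributes no terms to the index, while the words around it are still extracted.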
Phrase Routines |
ORG.oclc.pears.Phrase |
Creates simple bound phrases by extracting the contents of a field as a single index term
The following table lists parameters that you can use with the Phrase routine to more specifically define how it extracts terms to build an index:
Parameter | Description |
Collapse=<list of characters> | Removes any of the characters in the list from the field |
ExtraTrimChars=<list of characters> | Adds the list of characters to the default list of trimChars for the current index only |
TrimChars=<list of characters> | Removes any of the characters on the list from the beginning or end of the field (default set: ' & . , : *) |
MaxLength=<number> | Shortens the field to the specified number of characters |
StartOffset=<number> | Ignores the first specified number of characters in the field. Note: The offset is performed before any other trim or collapse rules are applied. |
ExtraIndex=<index ID> | Any terms extracted for this index are also sent to the specified index ID. |
indicator1=<list of characters> | Requires that indicator1 for this field must have a value from the specified list of characters. Note: This can be used only with MARC-like records. |
indicator2=<list of characters> | Requires that indicator2 for this field must have a value from the specified list of characters. Note: This can be used only with MARC-like records. |
indicators=<list of character pairs> | Requires that the two indicators must have a value from the specified list of character pairs. Note: This can be used only with MARC-like records. |
notIndicator1=<list of characters> | Requires that indicator1 for this field must not have a value from the specified list of characters. Note: This can be used only with MARC-like records. |
notIndicator2=<list of characters> | Requires that indicator2 for this field must not have a value from the specified list of characters. Note: This can be used only with MARC-like records. |
notIndicators=<list of character pairs> | Requires that the two indicators must not have a value from the specified list of character pairs. Note: This can be used only with MARC-like records. |
NonFilingIndicator1=true | Value of the first indicator determines the number of characters to remove from the beginning of the field |
NonFilingIndicator2=true | Value of the second indicator determines the number of characters to remove from the beginning of the field |
Example: |
Since titles often have a trailing slash that needs to be removed...
[title] index=1 routine=ORG.oclc.pears.IndexRoutines.Phrase tagpath=245/1 extratrimchars=/ nonFilingIndicator2=true |
|
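How several of the Phrase parameters combine can be sketched in Python. This is an illustration rather than the Pears implementation: the function and argument names are hypothetical. Only the rule that the offset is applied first comes from the parameter table. The order of the remaining steps, and the trimming of surrounding spaces, are assumptions.

```python
from typing import Optional

# Default trim set as listed for TrimChars: ' & . , : *
DEFAULT_TRIM_CHARS = "'&.,:*"


def phrase_term(
    field: str,
    start_offset: int = 0,
    collapse: str = "",
    trim_chars: str = DEFAULT_TRIM_CHARS,
    extra_trim_chars: str = "",
    max_length: Optional[int] = None,
) -> str:
    """Build one phrase index term from the contents of a field."""
    # StartOffset: documented as applied before any trim or collapse rule.
    text = field[start_offset:]
    # Collapse: remove the listed characters wherever they occur.
    text = "".join(ch for ch in text if ch not in collapse)
    # TrimChars plus ExtraTrimChars: strip from both ends only.
    # Stripping spaces as well is an assumption made for this sketch.
    text = text.strip(trim_chars + extra_trim_chars + " ")
    # MaxLength: shorten the field to the given number of characters.
    if max_length is not None:
        text = text[:max_length]
    return text


# A title field with the trailing slash mentioned in the example above.
print(phrase_term("The adventures of Tom Sawyer /", extra_trim_chars="/"))
```

Passing `start_offset=4` in the same call would also drop the leading article, which is similar in effect to a non-filing indicator value of 4, although the sketch does not read MARC indicators.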
MARC Routines |
ORG.oclc.pears.MarcBibliographicLevel |
Extends Words to find the bibliographic byte in the leader string in a Marc record and generates an index term based on the code that it finds there
Bib Level Code | Type of Material | Index Term Returned |
a | analytic monograph | analytic |
b | analytic serial | analytic |
m | monograph | monograph |
s | serial | serial |
c | collection | collection |
d | subunit | subunit |
Example: |
[BibLevel] index=1 routine=ORG.oclc.pears.IndexRoutines. \ MarcBibliographicLevel tagpath=0 startOffset=1 |
|
ORG.oclc.pears.MarcFormat |
Extends Words to find the record type and bibliographic bytes in the leader string in a Marc record and generates an index term based on the codes that it finds in those two places
Record Type | Bibliographic Level Code | Abbreviation | Type of Material |
a, t | m, c, a, d | bks | Books |
e, f | any | map | Maps |
p, b | any | mix | Mixed Materials |
m | any | com | Computer Files |
c, d | any | sco | Scores |
any | s, b | ser | Serials |
i, j | any | rec | Sound Recordings |
g, k, o, r | any | vis | Visual Material |
The following is a parameter that you can use with the MarcFormat routine to more specifically define how it functions:
Parameter |
Description |
DebugMarcFormat=<true/false> |
Turns on internal debugging |
Example: |
To extract material type from a MARC leader . . .
[format] index=1 routine=ORG.oclc.pears.IndexRoutines. \ MarcFormat tagpath=0 startOffset=1
|
|
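The two-code lookup that the MarcFormat table describes can be sketched in Python as a list of rules. The names are hypothetical and this is not the Pears implementation. The table does not say which row wins when more than one matches (for example, record type e with bibliographic level s), so the sketch assumes that rows are tried in table order and the first match is used.

```python
from typing import Optional

ANY = None  # marker for a column that accepts any code

# (record type codes, bibliographic level codes, index term), in table order.
FORMAT_RULES = [
    (set("at"), set("mcad"), "bks"),
    (set("ef"), ANY, "map"),
    (set("pb"), ANY, "mix"),
    (set("m"), ANY, "com"),
    (set("cd"), ANY, "sco"),
    (ANY, set("sb"), "ser"),
    (set("ij"), ANY, "rec"),
    (set("gkor"), ANY, "vis"),
]


def marc_format_term(record_type: str, bib_level: str) -> Optional[str]:
    """Return the format abbreviation for a record type / bib level pair."""
    for types, levels, abbreviation in FORMAT_RULES:
        type_ok = types is ANY or record_type in types
        level_ok = levels is ANY or bib_level in levels
        if type_ok and level_ok:
            return abbreviation  # first matching row wins (assumption)
    return None  # no row in the table covers this combination


print(marc_format_term("a", "m"))  # language material, monograph
print(marc_format_term("a", "s"))  # language material, serial
```

Keeping the rules in a list rather than a dictionary makes the assumed precedence explicit and easy to reorder if the real routine resolves overlaps differently.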
ORG.oclc.pears.MarcTypeOfMaterial |
Extends Words to find the type of material byte in the leader string in a Marc record and generates an index term based upon what it finds there
Type Code | Abbreviation | Type of Material |
a, t | bks | Books |
e, f | map | Maps |
p | mix | Mixed Materials |
m | com | Computer Files |
c, d | sco | Scores |
s | ser | Serials |
i, j | rec | Sound Recordings |
g, k, o, r | vis | Visual Materials |
The following is a parameter that you can use with the MarcTypeOfMaterial routine to more specifically define how it functions:
Parameter |
Description |
DebugMarcTypeOfMaterial=<true/false> |
Turns on internal debugging |
Example: |
To extract material type from MARC 006 . . .
[materialtype] index=1 routine=ORG.oclc.pears.IndexRoutines. \ MarcTypeOfMaterial tagpath=6
|
|
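The single-code lookup listed for MarcTypeOfMaterial can be sketched in Python as a plain dictionary. The names are hypothetical, and returning None for a code that the table does not list is an assumption, since the source does not describe that case.

```python
from typing import Optional

# Type code -> index term, transcribed from the MarcTypeOfMaterial table.
TYPE_OF_MATERIAL = {
    "a": "bks", "t": "bks",
    "e": "map", "f": "map",
    "p": "mix",
    "m": "com",
    "c": "sco", "d": "sco",
    "s": "ser",
    "i": "rec", "j": "rec",
    "g": "vis", "k": "vis", "o": "vis", "r": "vis",
}


def type_of_material_term(type_code: str) -> Optional[str]:
    """Return the abbreviation indexed for a type code, or None if unlisted."""
    return TYPE_OF_MATERIAL.get(type_code)


for code in ("a", "k", "z"):
    print(code, type_of_material_term(code))
```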
Number Routines |
ORG.oclc.pears.Numbers |
Extends the Words routine and only extracts digit strings |
ORG.oclc.pears.LCCardNumber |
Extends Phrase to convert the LCCard number field in a Marc record into a searchable term |
Date Routines |
ORG.oclc.pears.PublicationDate |
Extends the Words routine in order to extract and normalize the publication date field in a Marc record |
Language Routines |
ORG.oclc.pears.ISO639Language |
Works with the HandleChinaMarc record handling routine to change Chinese two-character language codes into their English equivalent search terms |
ORG.oclc.pears.MarcLanguage |
Extends Words to convert the Marc three-letter language codes into English equivalent search terms
The following is a parameter that you can use with the MarcLanguage routine to more specifically define how it functions:
Parameter |
Description |
DebugMarcLanguage=<true/false> |
Turns on internal debugging |
Example: |
To extract language from the MARC 008 field . . .
[language] index=1 routine=ORG.oclc.pears.IndexRoutines. \ MarcLanguage tagpath=8 startOffset=35
|
|
Miscellaneous Routines |
ORG.oclc.pears.IndexRoutines |
Abstract class that contains base methods for extracting index terms
Note: The Phrase routine implements IndexRoutines and all other Pears indexing routines extend Phrase.
|