originally: http://www.oclc.org/research/projects/schematrans/seel/tutorial/section1.htm

Section 1: Seel in a Nutshell

Seel and crosswalks

Seel was inspired by the need to make the information in a crosswalk machine-processable. It can be interpreted as a computational model for a crosswalk.

The basic unit in a Seel script is a map, which pairs two elements to be translated: Dublin Core Title maps to MARC 245, ONIX Contributor maps to MARC 100, or whatever else is required by a metadata conversion task.

If we think of a crosswalk as a table in which each row pairs an element in one standard with its equivalent in another, a Seel map roughly corresponds to the information contained in a single row. Just as a crosswalk table usually has more than one row, a Seel translation usually consists of a linear series of maps.

There are more parallels between Seel maps and crosswalks. For example, Seel is a descriptive language, whose purpose is to encode true assertions, just as a row in a crosswalk might assert that the equivalence between MARC 245 $a and Dublin Core "title" is true or valid. And just as metadata experts typically develop a crosswalk table one row at a time, so we develop a Seel script one map at a time, until we have a translation that achieves the desired coverage. Furthermore, just as each row in a table is independent and self-contained, so, too, is a Seel map.

This design has certain advantages, which we discuss in Section 5. But for the newcomer to Seel, it means that the process of deciphering and writing Seel never gets too intimidating because it's rarely necessary to puzzle over more than about a screenful of code at a time.

Morfrom

Seel operates on records expressed in the Morfrom structure. The name "Morfrom" is a palindrome, a made-up word intended to suggest a reversible operation. The input to Seel is a Morfrom record in the source semantics; the output is the corresponding Morfrom record in the target semantics. If a Seel script is carefully written, it is possible to produce a "round-trip" translation that reconstructs the original record, a point we will discuss in more detail in Sections 4 and 5.

Here is a simple MARC record coded in Morfrom:

<record>
  <header>
    <schema namespace="http://www.loc.gov/MARC21/slim" name="marc"/>
  </header>
  <field name="100">
    <field name="a"><value>Franz Kafka</value></field>
  </field>
  <field name="245">
    <field name="i1"><value>0</value></field>
    <field name="i2"><value>0</value></field>
    <field name="a"><value>The Trial</value></field>
  </field>
</record>

As this example shows, a record is enclosed in <record> tags and consists of at least two elements: a <header>, which specifies a namespace, and one or more <field> elements, whose "name" attribute identifies element names in the native format. In this example, the subfields and indicators are nested one level inside their MARC fields, but fields can be nested to whatever depth is required to preserve the hierarchical structure of the native record. The data at the leaf node, the unique information in the record, is enclosed in <value> tags. Since it's a mouthful to talk about leaf nodes, leaf-node data, Morfrom <value> tags, or <value> elements nested inside <field> elements, we've adopted the practice of saying Morfrom values, or just values, when referring to this data.
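The following minimal Python sketch, which uses only the standard library's `xml.etree.ElementTree`, reads the Morfrom record above and lists each value together with the path of field names that leads to it. The function name `morfrom_values` is our own invention for illustration; it is not part of any Seel or Morfrom software.

```python
import xml.etree.ElementTree as ET

MORFROM = """<record>
  <header>
    <schema namespace="http://www.loc.gov/MARC21/slim" name="marc"/>
  </header>
  <field name="100">
    <field name="a"><value>Franz Kafka</value></field>
  </field>
  <field name="245">
    <field name="i1"><value>0</value></field>
    <field name="i2"><value>0</value></field>
    <field name="a"><value>The Trial</value></field>
  </field>
</record>"""


def morfrom_values(record):
    """Return (path, value) pairs for every Morfrom value, in document order.

    A path is the tuple of <field> names leading from the record down to the
    field that holds the <value>. (Hypothetical helper for illustration.)
    """
    pairs = []

    def walk(field, path):
        path = path + (field.get("name"),)
        value = field.find("value")
        if value is not None:
            pairs.append((path, value.text or ""))
        for child in field.findall("field"):
            walk(child, path)

    # Only <field> children of <record> carry data; the <header> is skipped.
    for top in record.findall("field"):
        walk(top, ())
    return pairs


record = ET.fromstring(MORFROM)
print(record.find("header/schema").get("namespace"))
for path, value in morfrom_values(record):
    print(" ".join(path), "=", value)
```

Because every value sits at the end of a path of named fields, the same few lines work for any depth of nesting and for any native format coded in Morfrom.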

Morfrom is similar to MARC-XML but more flexible. Unlike in MARC-XML, the granularity of the data is determined by the needs of your application, not by the specification of a widely used standard. For example, in the record shown above, the indicators i1 and i2 have the same logical status as subfield codes: they are children of the field named "245," just as "a" is. But if the application demanded it, the indicators could be collapsed into a single field or omitted altogether. And unlike MARC-XML, Morfrom is a template for representing records in many formats, not just MARC.

The translation process is greatly simplified when it operates on a standardized syntax like Morfrom, as we will show in this tutorial, especially in Section 5. But to be useful, the result must eventually be converted to the native syntax of a standard. For example, the Morfrom record shown above often gets converted to a more familiar-looking MARC record like this:

100    $a Franz Kafka
245 00 $a The Trial

The conversion is accomplished by two processes: a Morfrom reader, which converts a native source record to Morfrom, and a Morfrom writer, which converts the translated Morfrom record to the native output format. Morfrom readers and writers are outside the scope of the Seel tutorial, but we have working samples for the metadata formats we have studied and a set of utilities for streamlining their creation. A document type definition (DTD) for Morfrom is also available.

So, the bottom line is this: once your data is in the Morfrom syntax, Seel can take it from there.

A Seel translation

We can illustrate the Seel idiom by walking through a simple translation that consists of a single map. The purpose of this section is to give you an idea of what a small chunk of real Seel code looks like. Subtleties will be dealt with in later sections.

Since Seel is expressed in an XML syntax, the standard XML prologue is required. The top-level element is <translation>, and its first child should be the <header> element. The <header> should have two children, <sourceschema> and <targetschema>, which refer to the two metadata schemas being mapped. Each should have a namespace attribute with the URI of the relevant schema and a name attribute with a name for the schema.

The namespaces tie the translation to the records it processes. The namespace in <sourceschema> should be identical to the namespace in the <header><schema> element of the input Morfrom record, and the Morfrom record produced by the translation will contain a <header><schema> element with the namespace given in the Seel <targetschema> element.
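To illustrate how the two schema declarations relate to the records, here is a hypothetical Python check, not taken from any Seel implementation. It compares the input record's namespace with <sourceschema> and builds the <header> for the output record from <targetschema>. The name `output_header` and the choice to raise `ValueError` on a mismatch are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

SEEL_HEADER = """<translation>
  <header>
    <sourceschema namespace="http://purl.org/dc/elements/1.1/" name="Dublin Core"/>
    <targetschema namespace="http://www.loc.gov/marc/bibliographic/ecbdhome.html" name="MARC"/>
  </header>
</translation>"""

DC_RECORD = """<record>
  <header>
    <schema namespace="http://purl.org/dc/elements/1.1/" name="Dublin Core"/>
  </header>
  <field name="title"><value>After the fall</value></field>
</record>"""


def output_header(translation, record):
    """Check the record against <sourceschema>; return the output <header>.

    Hypothetical helper: raises ValueError if the record's namespace differs
    from the namespace the translation expects.
    """
    source = translation.find("header/sourceschema")
    target = translation.find("header/targetschema")
    record_ns = record.find("header/schema").get("namespace")
    if record_ns != source.get("namespace"):
        raise ValueError(f"record namespace {record_ns!r} does not match "
                         f"sourceschema {source.get('namespace')!r}")
    # The output record is labeled with the target schema's namespace and name.
    header = ET.Element("header")
    ET.SubElement(header, "schema",
                  namespace=target.get("namespace"), name=target.get("name"))
    return header


header = output_header(ET.fromstring(SEEL_HEADER), ET.fromstring(DC_RECORD))
print(ET.tostring(header, encoding="unicode"))
```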

Here is a sample header for a translation from Dublin Core to MARC:

<header>
  <sourceschema namespace="http://purl.org/dc/elements/1.1/" name="Dublin Core"/>
  <targetschema namespace="http://www.loc.gov/marc/bibliographic/ecbdhome.html" name="MARC"/>
</header>

To complete the translation, one or more <map> elements must follow the header block.

In the map we'll develop in this section, the Dublin Core "Title" element is translated to the corresponding MARC element, 245 $a. Here is a first draft of the Seel code:

Map 1.1

<translation>
  <header>
    <sourceschema namespace="http://purl.org/dc/elements/1.1/" name="Dublin Core"/>
    <targetschema namespace="http://www.loc.gov/marc/bibliographic/ecbdhome.html" name="MARC"/>
  </header>
  <map id="map:01">
    <source>
      <mainpath>
        <step name="title"/>
      </mainpath>
    </source>
    <target>
      <mainpath>
        <step name="245"/>
        <step name="a"/>
      </mainpath>
    </target>
  </map>
</translation>

In English, the Seel map says something like this: the source element title is mapped to the target element 245 a. Technically speaking, each <step name="xx"/> statement corresponds to a <field name="xx"> element in a data record coded in the Morfrom syntax. The <source> element picks out a path in an existing record, while the <target> element constructs the corresponding path in a new record. The data at the leaf node of the source path, or the source value, is automatically transferred to the leaf node of the target, where it becomes the target value. No explicit Seel code is required to make the transfer happen.

Stripped of bells and whistles, this is the essential Seel operation: a map between elements in two standards is turned into executable code by aligning equivalent paths in the source and the target, and the value at the bottom of the source path is then copied to the target.
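The essential operation can be sketched in a few lines of Python. This is a hypothetical, much-simplified interpreter, assuming name-only <step> elements and no <context>; `apply_map` and its helpers are our own names, not part of a real Seel processor. It walks the source path and, for every value it finds, builds a fresh target path and copies the value to its leaf. The record header is left out for brevity.

```python
import xml.etree.ElementTree as ET

MAP = """<map id="map:01">
  <source><mainpath><step name="title"/></mainpath></source>
  <target><mainpath><step name="245"/><step name="a"/></mainpath></target>
</map>"""

RECORD = """<record>
  <header><schema namespace="http://purl.org/dc/elements/1.1/" name="Dublin Core"/></header>
  <field name="title"><value>After the fall</value></field>
</record>"""


def step_names(mainpath):
    """Names of the <step> children of a <mainpath>, in order."""
    return [step.get("name") for step in mainpath.findall("step")]


def source_values(record, names):
    """Values at the leaf of every source path matching the step names."""
    fields = [record]
    for name in names:
        fields = [child for f in fields for child in f.findall("field")
                  if child.get("name") == name]
    return [f.findtext("value", default="") for f in fields]


def apply_map(map_el, record, out):
    """Copy each source value to a newly built target path under `out`."""
    src = step_names(map_el.find("source/mainpath"))
    tgt = step_names(map_el.find("target/mainpath"))
    for value in source_values(record, src):
        node = out
        for name in tgt:  # one <field> per target <step>
            node = ET.SubElement(node, "field", name=name)
        ET.SubElement(node, "value").text = value


out = ET.Element("record")
apply_map(ET.fromstring(MAP), ET.fromstring(RECORD), out)
print(ET.tostring(out, encoding="unicode"))
```

Note that a new target path is built for every matching source value, so under this sketch a record with two titles would yield two 245 fields.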

For the sake of completeness, we should also point out the "id" attribute on the <map> element. It records a unique identifier for the map, intended for internal processing, which will be used to support future enhancements.

To see the effects of Map 1.1, consider a Morfrom record with the usual header information but just one field:

<record>
  <header>
    <schema namespace="http://purl.org/dc/elements/1.1/" name="Dublin Core"/>
  </header>
  <field name="title"><value>After the fall</value></field>
</record>

Map 1.1 produces a translation that looks like this:

<record>
  <header>
    <schema namespace="http://www.loc.gov/marc/bibliographic/ecbdhome.html" name="MARC"/>
  </header>
  <field name="245">
    <field name="a"><value>After the fall</value></field>
  </field>
</record>

If MARC were this simple, our Seel map would be complete. But a complete MARC 245 field also contains two indicators, whose values depend on how the title is cataloged: in MARC 21, the first indicator records whether a title added entry is made, and the second gives the number of nonfiling characters to skip when the title is sorted. In our example, both indicators must be created and set to "0." In Seel, indicators can be created with a <context> element nested under the <target> element. Two <equals> elements construct the data, one for each indicator. The fragment in Map 1.2 accomplishes the desired effect:

Map 1.2

<context>
  <equals>
    <path place="before"><step name="i1"/></path>
    <value>0</value>
  </equals>
  <equals>
    <path place="before"><step name="i2"/></path>
    <value>0</value>
  </equals>
</context>

This code says that when a "245 a" path is created in the target, additional context is required for a complete translation. In this case, two new paths are created with the specified values and placed in the target before the "245 a" path.
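Here is a sketch of how place="before" might be handled, again in hypothetical Python rather than code from a Seel processor. Each <equals> element becomes a new field holding the given value, inserted just ahead of the leaf of the main target path. The sketch assumes single-step context paths and handles only place="before"; `apply_context` is an illustrative name.

```python
import xml.etree.ElementTree as ET

CONTEXT = """<context>
  <equals>
    <path place="before"><step name="i1"/></path>
    <value>0</value>
  </equals>
  <equals>
    <path place="before"><step name="i2"/></path>
    <value>0</value>
  </equals>
</context>"""


def apply_context(context, parent, leaf):
    """Insert each <equals> path as a sibling placed just before `leaf`.

    Hypothetical sketch: only single-step paths and place="before" are handled.
    """
    for equals in context.findall("equals"):
        path = equals.find("path")
        if path.get("place") != "before":
            raise NotImplementedError("only place='before' is sketched here")
        field = ET.Element("field", name=path.find("step").get("name"))
        ET.SubElement(field, "value").text = equals.findtext("value")
        # Recompute the leaf's index each time so fields keep document order.
        parent.insert(list(parent).index(leaf), field)


# Target path already built by the main map: 245 > a, holding the source value.
f245 = ET.Element("field", name="245")
leaf = ET.SubElement(f245, "field", name="a")
ET.SubElement(leaf, "value").text = "After the fall"

apply_context(ET.fromstring(CONTEXT), f245, leaf)
print([f.get("name") for f in f245])
```

Inserting each new field immediately before the leaf, in document order, puts the indicators in the order i1, i2, followed by a.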

Map 1.3 represents the complete map.

Map 1.3

<map id="map:01">
  <source>
    <mainpath>
      <step name="title"/>
    </mainpath>
  </source>
  <target>
    <mainpath>
      <step name="245"/>
      <step name="a"/>
    </mainpath>
    <context>
      <equals>
        <path place="before"><step name="i1"/></path>
        <value>0</value>
      </equals>
      <equals>
        <path place="before"><step name="i2"/></path>
        <value>0</value>
      </equals>
    </context>
  </target>
</map>

Given the same source record we used as input to Map 1.1, the new map produces this translation (the record header is omitted):

<field name="245">
  <field name="i1"><value>0</value></field>
  <field name="i2"><value>0</value></field>
  <field name="a"><value>After the fall</value></field>
</field>

A Morfrom writer would use this field to generate the MARC line 245 00 $a After the fall.
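Morfrom writers are outside the scope of this tutorial, but a toy version for a single field shows the idea. The following Python function is a hypothetical sketch, not one of the working writers mentioned earlier: it treats children named i1 and i2 as indicators (written as blanks when absent) and every other child as a subfield code followed by its value.

```python
import xml.etree.ElementTree as ET

FIELD = """<field name="245">
  <field name="i1"><value>0</value></field>
  <field name="i2"><value>0</value></field>
  <field name="a"><value>After the fall</value></field>
</field>"""


def marc_display_line(field):
    """Render one Morfrom MARC field as a display line like '245 00 $a ...'.

    Hypothetical sketch: missing indicators are written as blanks.
    """
    ind1 = field.findtext("field[@name='i1']/value", default=" ")
    ind2 = field.findtext("field[@name='i2']/value", default=" ")
    subfields = [
        f"${child.get('name')} {child.findtext('value', default='')}"
        for child in field.findall("field")
        if child.get("name") not in ("i1", "i2")
    ]
    return f"{field.get('name')} {ind1}{ind2} " + " ".join(subfields)


print(marc_display_line(ET.fromstring(FIELD)))
```

Applied to the 100 field of the Kafka record, which has no indicators, the same function writes two blanks in their place, matching the 100 line of the MARC display shown earlier.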

The result is closer to a good field in a MARC record, but the map needs one more change before it's done. As it stands, Map 1.3 will translate every "title" element in the source record to a new 245 $a. This produces the correct result when the input record contains only one title, but many records also contain subtitles, alternative titles, series titles, and so on. Since the 245 field is non-repeatable, we want to make sure that only one gets created. We can do this by saying that only the first title element should be mapped to 245. This is expressed in Seel by adding a "position" attribute to the <step> element, as shown in Map 1.4.

Map 1.4

<map id="map:01">
  <source>
    <mainpath>
      <step name="title" position="1"/>
    </mainpath>
  </source>
  <target>
    <mainpath>
      <step name="245"/>
      <step name="a"/>
    </mainpath>
    <context>
      <equals>
        <path place="before"><step name="i1"/></path>
        <value>0</value>
      </equals>
      <equals>
        <path place="before"><step name="i2"/></path>
        <value>0</value>
      </equals>
    </context>
  </target>
</map>

This map will select only the first "title" element in a source record. Any other title elements would be ignored unless the translation contained a second map with the same source path, except that its <step> element would specify position=">1" instead of position="1".
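The position test can be sketched as follows. This Python is hypothetical: it assumes that positions are counted from 1 among same-named sibling fields and supports only an exact number or the ">n" form used above; a real Seel processor may define position differently. The second and third sample titles are invented for illustration.

```python
import xml.etree.ElementTree as ET

RECORD = """<record>
  <header><schema namespace="http://purl.org/dc/elements/1.1/" name="Dublin Core"/></header>
  <field name="title"><value>After the fall</value></field>
  <field name="title"><value>A play in two acts</value></field>
  <field name="title"><value>Collected plays</value></field>
</record>"""


def matches_position(index, position):
    """True if the 1-based `index` satisfies a position like "1" or ">1"."""
    if position is None:  # no position attribute: every occurrence matches
        return True
    if position.startswith(">"):
        return index > int(position[1:])
    return index == int(position)


def select_step(parents, name, position=None):
    """Children named `name` under each parent, filtered by position."""
    selected = []
    for parent in parents:
        same_name = [f for f in parent.findall("field") if f.get("name") == name]
        selected += [f for i, f in enumerate(same_name, start=1)
                     if matches_position(i, position)]
    return selected


record = ET.fromstring(RECORD)
first = select_step([record], "title", "1")
rest = select_step([record], "title", ">1")
print([f.findtext("value") for f in first])
print([f.findtext("value") for f in rest])
```

Under this reading, the two position values partition the titles: exactly one goes to the map that builds the 245, and the remainder are left for whatever map handles the other occurrences.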

Summary

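We have now developed a minimal but functioning Seel translation, and we have covered a lot of ground to get to this point. We have described the major technical details of a Seel script: the <header>, which names the source and target schemas; the <map> element, whose <source> and <target> paths are built from <step> elements; the <context> element, whose <equals> children add fixed-value paths such as MARC indicators; and the position attribute, which selects particular occurrences of a repeated element.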

Stepping back from the details, we have also demonstrated something about the internal logic and motivation for Seel. Our example gets us into some fairly arcane features of the MARC standard. For example, a MARC record consists of fields, subfields, and indicators arranged in a particular order. And one of those fields, the 245, holds the information that other metadata standards commonly represent as a title. If a source record has more than one title, the first instance has to be handled differently from the others when it is translated to MARC. These idiosyncrasies are managed by a small set of operations that work on Morfrom, a standardized record with a simple structure that makes heavy use of attribute values to represent arbitrary metadata. The Morfrom record, coupled with a Seel translation script, defines a problem space for metadata translation that is elegant and powerful, yet simple.

We now turn to a more detailed discussion of the <context> element.