originally: http://www.oclc.org/research/projects/schematrans/seel/tutorial/section3.htm

Section 3: Paths

This section describes the implementation of paths in Seel maps. In our earlier discussion, we showed some fairly straightforward translations from a subtree in a source record with one or two branches to a subtree in a target record with the same structure. But here we generalize a bit and show that Seel is flexible enough to refer to any part of a record, as long as it's coded in the Morfrom syntax.

The translation path

Before getting into examples, we need to introduce some more precise language, focusing on the Seel "path" statements. To make the discussion concrete, we'll refer to Map 3.1, an abbreviated version of Map 2.2 in Section 2. This map describes elements <b037> PersonNameInverted and <b043> TitlesAfterNames as children of <contributor> in a source ONIX record. They are mapped to elements <a> and <c>, respectively, which are children of 100 Personal Name in the target MARC record.

Three Seel elements are required to pick out these nodes: <mainpath>, <path>, and <step>. Loosely speaking, <mainpath> identifies the highest node, or the translation root node, of the subtree that participates in the map, either as a source or a target. The <path> element is an arbitrary subtree of the mainpath. And each <step> element, which is optional under <mainpath> and required under <path>, refers to an actual field in the Morfrom record. The last <step> in the translation path points to the reference field in a Morfrom record, where the data to be translated resides.

More technically, <mainpath> establishes an anchor relative to the record root (the <record> element in Morfrom syntax). The nth <step> in the <mainpath> names the Morfrom <field> element that is n nodes removed from the root. For example, in the <source> block of Map 3.1, <contributor> is named in the first <step> under <mainpath>. Accordingly, <contributor> is a direct descendant of <record>. The <contributor> element is also defined by these statements as the translation root node, from which the fields named in the associated <path> elements descend.

The <path> element has the same internal structure as <mainpath>. In Map 3.1, each <path> contains only one <step>, so we can infer: 1) that the elements named there are direct descendants of the translation root node, and 2) that these elements are siblings. Thus the <source> block of Map 3.1 describes an ONIX record with a <field name="contributor"> element and two children, <field name="b037"> PersonNameInverted and <field name="b043"> TitlesAfterNames, as our sample record shows.

The same logic creates the target subtree with MARC <100> as the root and the siblings <a> and <c> as reference fields. More literally, the pid ("path id") attributes ensure that <field name="contributor"><field name="b037"> is translated to <field name="100"><field name="a">, while <field name="b043"> is translated to <field name="c">. In short, the effect of Map 3.1 is to align two subtrees with exactly the same structure and transfer the reference field in the source translation path to the corresponding location in the target.

Map 3.1

<map id="map:00">
  <source>
    <mainpath>
      <step name="contributor"/>
      <path pid="1"><step name="b037"/></path>
      <path pid="2"><step name="b043"/></path>
    </mainpath>
  </source>
  <target>
    <mainpath>
      <step name="100"/>
      <path pid="1"><step name="a"/></path>
      <path pid="2"><step name="c"/></path>
    </mainpath>
  </target>
</map>
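
As a rough illustration of the alignment that Map 3.1 describes, the sketch below shows a source fragment and the target fragment it would correspond to, written in the Morfrom <record>/<field>/<value> notation used elsewhere in this section. The name and title values are invented for illustration, and the exact serialization produced by a Morfrom writer may differ.

```xml
<!-- Source: ONIX record in Morfrom syntax (data values are hypothetical) -->
<record>
  <field name="contributor">
    <field name="b037"><value>Smith, Jane</value></field>
    <field name="b043"><value>Ph.D.</value></field>
  </field>
</record>

<!-- Target: MARC record in Morfrom syntax, as Map 3.1 would align it -->
<record>
  <field name="100">
    <field name="a"><value>Smith, Jane</value></field>
    <field name="c"><value>Ph.D.</value></field>
  </field>
</record>
```

The two subtrees have the same shape: one translation root with two sibling reference fields, paired by pid.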

Now, let's generalize the discussion. In Map 3.2, the source and target subtrees differ in complexity because the reference field at <source><mainpath> is two steps away from the <record> root, while the reference field at <target><mainpath> is just one. Map 3.2 is a model for creating a flatter record from a more structured record, as might be required when a rich format such as MARC-XML must be translated to a less detailed format such as Simple Dublin Core.

Map 3.2

<map id="map:00">
  <source>
    <mainpath>
      <step name="100"/>
      <step name="a"/>
    </mainpath>
  </source>
  <target>
    <mainpath>
      <step name="creator"/>
    </mainpath>
  </target>
</map>
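
The flattening described by Map 3.2 can be sketched with a pair of record fragments. The author name is a made-up value, and the fragments assume the same Morfrom <record>/<field>/<value> notation shown elsewhere in this section.

```xml
<!-- Source: MARC record in Morfrom syntax; the reference field is two steps below <record> -->
<record>
  <field name="100">
    <field name="a"><value>Smith, Jane</value></field>
  </field>
</record>

<!-- Target: Dublin Core record in Morfrom syntax; the reference field is one step below <record> -->
<record>
  <field name="creator"><value>Smith, Jane</value></field>
</record>
```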

Click here to see a complete MARC-Dublin Core Seel translation and its effect on a sample record.

Map 3.3 shows how to describe a hierarchical structure in the target record that doesn't correspond to a contiguous subtree in a source record. This map depicts part of a hypothetical but plausible translation from MARC to LOM, a standard widely used for describing learning objects. Most of a LOM record is devoted to the annotation of details about the recommended technical and educational context for the learning object being described. But it also has an introductory section, headed by the <general> tag, which lists elements similar to those found in a bibliographic description.

The <target> block of Map 3.3 constructs a LOM <general> node from a list of MARC elements. The single <step> element in <mainpath> defines a direct descendant of the <record> root and names it "general". Each of the six paths contains one or more <step> elements that lead to the LOM elements "identifier", "title", "description", and "keyword". In <path pid="1">, two steps define the path that would eventually be rendered by a Morfrom writer as <general><identifier><entity>, so the reference field, "entity", is two nodes beyond the translation root. In the other paths, the reference field is only one node beyond the translation root.

This much of Map 3.3 should look roughly familiar from the discussion of Maps 3.1 and 3.2. But the <source> block has a couple of new wrinkles. First, notice that the <mainpath> element doesn't start with a <step>. It's just a list of <path> elements. It has this unexpected structure because the elements that must be mapped to the LOM <general> node come from all over the MARC record. In other words, the translation root node is <record>, not a descendant of <record>, as was the case in the other two maps discussed in this section.

Another new wrinkle appears in <path pid="6">, where the attribute on the first <step> is "name_re" instead of the expected "name". The "name_re" attribute identifies a regular-expression string instead of the literal strings found in the other <step> elements. The pattern "6.." matches a 6 followed by any two characters, so the path at pid="6" specifies that subfield "a" of every MARC field whose tag starts with 6 maps to LOM <general><keyword>. This is Seel's way of saying that MARC 6xx fields contain some sort of subject term.

Map 3.3

<map id="map:08">
  <source>
    <mainpath>
      <path pid="1"><step name="856"/><step name="u"/></path>
      <path pid="2"><step name="245"/><step name="a"/></path>
      <path pid="3"><step name="245"/><step name="b"/></path>
      <path pid="4"><step name="246"/><step name="a"/></path>
      <path pid="5"><step name="520"/><step name="a"/></path>
      <path pid="6"><step name_re="6.."/><step name="a"/></path>
    </mainpath>
  </source>
  <target>
    <mainpath>
      <step name="general"/>
      <path pid="1"><step name="identifier"/><step name="entity"/></path>
      <path pid="2"><step name="title"/></path>
      <path pid="3"><step name="title"/></path>
      <path pid="4"><step name="title"/></path>
      <path pid="5"><step name="description"/></path>
      <path pid="6"><step name="keyword"/></path>
    </mainpath>
  </target>
</map>
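
To show how scattered MARC fields are gathered under one LOM node, here is a sketch of a partial source record and the <general> subtree that Map 3.3 would describe for it. All data values are hypothetical, only four of the six paths are exercised, and the order of the children under "general" is an assumption rather than something the map itself guarantees.

```xml
<!-- Source: MARC record in Morfrom syntax (data values are hypothetical) -->
<record>
  <field name="245">
    <field name="a"><value>Introduction to cells</value></field>
  </field>
  <field name="520">
    <field name="a"><value>An interactive lesson on cell structure.</value></field>
  </field>
  <!-- "650" is matched by name_re="6.." in path pid="6" -->
  <field name="650">
    <field name="a"><value>Cytology</value></field>
  </field>
  <field name="856">
    <field name="u"><value>http://example.org/cells</value></field>
  </field>
</record>

<!-- Target: LOM record in Morfrom syntax, with every path rooted at "general" -->
<record>
  <field name="general">
    <field name="identifier">
      <field name="entity"><value>http://example.org/cells</value></field>
    </field>
    <field name="title"><value>Introduction to cells</value></field>
    <field name="description"><value>An interactive lesson on cell structure.</value></field>
    <field name="keyword"><value>Cytology</value></field>
  </field>
</record>
```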

Relative paths

The examples we have discussed in this section show how to construct aligned subtrees in a Seel map. First, <mainpath> elements define translation roots for the source and target. Paths to the data affected by the translation must then be defined from this starting point, using <path> elements containing one or more <step>s. Once the source and target paths are aligned, the reference field in the source is transferred to the target.

This is the default use of paths in Seel, and it's pretty straightforward as long as the map has no complex conditions. But we've already seen maps in Sections 1 and 2 that describe conditional translations. And they involve <path>s embedded in <context> elements, which we glossed over a bit. Now we have the framework to discuss them in a more sophisticated way and connect them to the current discussion.

Before wading through the details, we need to emphasize a major point. A path originating from a <mainpath> element is an absolute path that gives a complete list of instructions for locating a piece of data in a Morfrom record. A relative path can also be defined, which veers off an absolute path and is used for expressing a condition of some sort. So, by definition, absolute paths always originate from <mainpath> elements, while relative paths always originate from <context> elements. Absolute paths are required because a map always involves an alignment between subtrees in the source and target. But relative paths are optional because many translations are valid without any special circumstances.

Map 3.4 shows a conditional translation with a relative path. It is conceptually similar to Map 2.1, or for that matter, the very first complete map we discussed, Map 1.3. In the ONIX-to-MARC crosswalk, the <othertext><d104> Text element contains descriptions of ancillary objects, such as summaries, reviews, tables of contents, companion Web sites, and so on. These structures are fundamental to ONIX records, so Seel has to be able to handle them correctly. The <othertext><d104> elements are usually (but not always) mapped to a MARC 520 Summary field. But the interpretation of the <d104> element (and therefore, decisions about how to map it) hinges on the value of <d102> TextTypeCode, which is a sibling of <d104>, as our sample record shows.

Map 3.4 shows the Seel code for a portion of the map from <othertext><d104> to MARC 520. In this simplified example, the value of the first MARC indicator is set to 2 if the value of <d102>TextTypeCode is 32, an ONIX code that labels the "OtherText" as a summary.

The <mainpath> code in the <source> and <target> blocks should look pretty familiar. In the source, the first <step> element identifies <othertext> as the translation root node, which has three children: <d104> Text, <d107> Author, and <d108> TextSourceTitle. These paths align in the target with a MARC 520 field containing subfields <a>, <r>, and <t>, respectively. To express the conditional relationships in this map, we need a <context> element on the path with pid="1". Though we discussed contexts in Section 2 of this tutorial, we can now talk more precisely about how this code operates by describing the interaction of relative and absolute paths.

Let's work our way carefully through this path. Starting from familiar-looking code in the <target>, we see that the <context> describes the circumstance in which the following record fragment is created:

<field name="520"><field name="i1"><value>2</value></field><field name="a">...

As in the previous examples in this section, the code in the <target> block defines an absolute path that starts with the translation root node--here, "520"--and ends with the reference field, "a." But this map also has a <context><equals> block that describes a relative path with the <step name="i1"> element. This is a new path that will be created in the translation. It will be positioned in the record with respect to the reference field defined in the absolute path according to the directive named in the "place" attribute on the <path> element that contains the step. Since the reference field is named "a" and the value of "place" is "before," the "i1" field appears before the "a" field in the translated record, the correct result. And since the relative path is newly constructed, a value must be assigned to it. This is the purpose of the <value> element right below the <path> element.

The same logic applies to the source, except that the relative path is used not to construct, but to query. The path we just described will get created if a source record contains this fragment:

<field name="othertext"><field name="d102"><value>32</value></field>...

The translation root node for pid=1 is <field name="othertext"> and the reference field is <field name="d104">. But as we said above, <field name="d102">, the node containing the data to be checked, is not in the current path. It's a sibling. The desired location is specified using the "from" attribute on the <path> element in the context block, which defines a location in the source record relative to the reference field. Here it points to a position one step up and over, using a notation familiar to users of Unix directory manipulation tools. The same notation can be used to identify any arbitrary position in a record, given the reference field as an anchor.

Map 3.4

<map id="map:00">
  <source>
    <mainpath>
      <step name="othertext"/>
      <path pid="1"><step name="d104"/></path>
      <path pid="2"><step name="d107"/></path>
      <path pid="3"><step name="d108"/></path>
    </mainpath>
    <context pid="1">
      <equals>
        <path from=".."><step name="d102"/></path>
        <value>32</value>
      </equals>
    </context>
  </source>
  <target>
    <mainpath>
      <step name="520"/>
      <path pid="1"><step name="a"/></path>
      <path pid="2"><step name="r"/></path>
      <path pid="3"><step name="t"/></path>
    </mainpath>
    <context pid="1">
      <equals>
        <path place="before"><step name="i1"/></path>
        <value>2</value>
      </equals>
    </context>
  </target>
</map>
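
Putting the two contexts together, the following sketch shows a source fragment that satisfies the condition in Map 3.4 and the target fragment that would result. The summary text is a hypothetical value, and only the path with pid="1" is exercised.

```xml
<!-- Source: ONIX record in Morfrom syntax (summary text is hypothetical) -->
<record>
  <field name="othertext">
    <!-- Sibling of the reference field, reached through <path from=".."> -->
    <field name="d102"><value>32</value></field>
    <!-- Reference field for pid="1" -->
    <field name="d104"><value>A short summary of the book.</value></field>
  </field>
</record>

<!-- Target: MARC record in Morfrom syntax -->
<record>
  <field name="520">
    <!-- Created by the target context and placed before "a" -->
    <field name="i1"><value>2</value></field>
    <field name="a"><value>A short summary of the book.</value></field>
  </field>
</record>
```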

Morfrom values

We've covered a lot of ground in this section, but we need to discuss one more example before finishing our discussion of paths. The four maps described above illustrate different ways to refer to a subtree in the source and target of a translation. But a Seel map actually does two jobs: it builds a tree structure and it populates that structure with data enclosed in <value> tags in a Morfrom record. Seel is designed so that once there is an alignment between the source and target translation paths, the data at the specified path in the source is transferred to the corresponding path in the target without explicitly having to say so.

But in some circumstances, it is necessary to make an explicit reference to a Morfrom value. For example, consider Map 3.5, which maps ONIX <a196> RecordSourceIdentifier to MARC 040 Cataloging Source. Most of the map should look familiar, but in this particular crosswalk, the data in the source translation path appears twice. (As metadata experts, we have our suspicions about this particular map, but it's a nice pedagogical example.) As in earlier maps, the <context> element in the <target> produces a path named "a" with indicators preceding it. But the last four lines of the context also create a sibling named "c," place it after "a," and copy the data from the source translation path into it using the <transfer> element. Since there is only one translation path in the <source> block, the reference to the appropriate subtree in the source record is unambiguous.

Map 3.5

<map id="map:01">
  <source>
    <mainpath>
      <step name="a196"/>
    </mainpath>
  </source>
  <target>
    <mainpath>
      <step name="040"/><step name="a"/>
    </mainpath>
    <context>
      <equals>
        <path place="before"><step name="i1"/></path>
        <value>#</value>
      </equals>
      <equals>
        <path place="before"><step name="i2"/></path>
        <value>#</value>
      </equals>
      <equals>
        <transfer/>
        <path place="after"><step name="c"/></path>
      </equals>
    </context>
  </target>
</map>
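
The effect of the <transfer> element can be sketched with one more pair of fragments. The identifier value is invented, and the relative order of the two indicator fields is an assumption. The map states only that both precede "a" and that "c" follows it.

```xml
<!-- Source: ONIX record in Morfrom syntax (identifier value is hypothetical) -->
<record>
  <field name="a196"><value>EXAMPLE-SOURCE-01</value></field>
</record>

<!-- Target: MARC record in Morfrom syntax; the source value appears in both "a" and "c" -->
<record>
  <field name="040">
    <field name="i1"><value>#</value></field>
    <field name="i2"><value>#</value></field>
    <field name="a"><value>EXAMPLE-SOURCE-01</value></field>
    <field name="c"><value>EXAMPLE-SOURCE-01</value></field>
  </field>
</record>
```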

Summary

In this section, we described the Seel syntax for referring to paths in the source and target of a translation. With absolute and relative references, it is possible to refer to data at any location in a Morfrom record by starting with a translation root defined in <mainpath> and taking a series of steps down the path toward the unique data.

To readers who are familiar with XSLT, the Seel path syntax might be reminiscent of XPath, the World Wide Web Consortium standard for identifying arbitrary paths in XML documents. The Seel path language is different in two respects. First, it works only on Morfrom documents. Second, unlike XPath, the Seel path language can construct a path as well as refer to one, as we have shown in the Seel maps in this section. As a result, paths in the source and the target of a Seel translation can be expressed in the same way, which ensures that the critical information in a Seel map is both explicitly labeled and symmetric. The corresponding XSLT encoding lacks these important properties. We discuss the implications of Seel's design in Section 5.