GRN Functional Requirements Draft

This is a draft intended for further revision by MHS staff.

These functional requirements and questions are derived from the use cases developed with MHS staff. We are still learning what is possible, so this list should not be taken as firm; rather, it is our attempt to lay out what we currently understand our requirements to be and to seek input from vendors, who have their own experience to draw from. Items that we know may be difficult and may not be critical to success in this application are marked “(desirable)”. Other items are questions asking vendors to describe how the system they propose would meet a given challenge; these are marked “(describe)”.

This map of the functional requirements may also help readers keep track of the whole while working through the parts.

These requirements are also available as plain text and as an OPML document for outlining software.

1. system must be able to ingest data

  • 1.1. from a variety of sources, including

    • 1.1.1. data from partner sites not under MHS control

    • 1.1.2. data from databases

      • 1.1.2.1. that adhere to well-defined standards, including

        • 1.1.2.1.1. OAI (desirable)

        • 1.1.2.1.2. EAD (desirable)

        • 1.1.2.1.3. EAC (desirable)

        • 1.1.2.1.4. MARC (desirable)

        • 1.1.2.1.5. DC (desirable)

      • 1.1.2.2. and also some that don’t adhere to common standards, such as our own “nominal indexes” that hold records about people’s names, births, and deaths

    • 1.1.3. data from websites

      • 1.1.3.1. the ability to parse certain web pages, such as those containing directory information (desirable)

    • 1.1.4. data from documents

      • 1.1.4.1. spreadsheets (desirable)

      • 1.1.4.2. PDF

      • 1.1.4.3. text documents

      • 1.1.4.4. photos (desirable)

      • 1.1.4.5. multi-media (desirable)

      • 1.1.4.6. XML (desirable)

    • 1.1.5. data from feeds (desirable)

  • 1.2. from individuals leaving commentary about items in the system (desirable)

    • 1.2.1. which would be persistently linked to those items even as the system is updated and reindexed (desirable)
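As an illustration of requirement 1.1.2.1, the Python sketch below shows one way a harvester might read Dublin Core records exposed over OAI-PMH, the protocol usually meant by “OAI”. It is a minimal sketch under stated assumptions, not a proposed design: the repository URL is hypothetical, the sample response is invented, and the helper names (`list_records_url`, `parse_records`) are illustrative only.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a (hypothetical) partner repository."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})


def parse_records(xml_text):
    """Flatten each oai_dc record into {"id": ..., element_name: [values, ...]}."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        record_id = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        fields = {}
        dc = rec.find(".//oai_dc:dc", NS)
        if dc is not None:
            for child in dc:
                # child.tag looks like "{http://purl.org/dc/elements/1.1/}title"
                name = child.tag.split("}", 1)[1]
                fields.setdefault(name, []).append((child.text or "").strip())
        records.append({"id": record_id, **fields})
    return records


# Invented sample response, standing in for a live ListRecords reply.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample diary</dc:title>
          <dc:creator>Unknown</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

records = parse_records(SAMPLE)
```

A production harvester would also follow OAI-PMH resumption tokens to page through large result sets, and would run on the harvest schedule discussed in 5.1.1.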

2. users must be able to query the system

  • 2.1. using full-text search techniques familiar to users from Google (describe)

  • 2.2. using fielded search techniques, which are critical to searching our nominal indexes (such as birth and death records) (describe)

    • 2.2.1. results from fielded searches should also be sortable by these fields

  • 2.3. with the ability to refine searches further by searching again within the result set (desirable)

  • 2.4. results should include navigable facets for refining searches (desirable)

  • 2.5. results must be returned quickly

    • 2.5.1. no page refresh should take longer than 2 seconds (desirable)

  • 2.6. system should be able to incorporate results from federated searches (searches executed on remote search engines)

    • 2.6.1. should support the Z39.50 protocol for federated searching (desirable)

  • 2.7. system should be able to save search queries for the user
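Requirements 2.2 through 2.4 can be illustrated with a small in-memory Python sketch: a fielded search over nominal-index entries, sorting by a field, searching again within the result set, and counting facet values. The record fields, helper names, and sample entries are all hypothetical; a real system would delegate this work to its search engine rather than scan records in memory.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NominalRecord:
    """One entry in a hypothetical birth/death nominal index."""
    surname: str
    given_name: str
    event: str  # "birth" or "death"
    year: int
    town: str


def fielded_search(records, sort_by=None, **criteria):
    """Keep records whose fields equal every criterion (case-insensitive), optionally sorted by a field."""
    def matches(rec):
        return all(str(getattr(rec, field)).lower() == str(value).lower()
                   for field, value in criteria.items())

    hits = [rec for rec in records if matches(rec)]
    if sort_by is not None:
        hits.sort(key=lambda rec: getattr(rec, sort_by))
    return hits


def facet_counts(records, field):
    """Count results per value of a field, e.g. to drive a facet sidebar."""
    return Counter(getattr(rec, field) for rec in records)


# Invented sample data.
RECORDS = [
    NominalRecord("Olson", "Anna", "birth", 1881, "Lakeview"),
    NominalRecord("Olson", "Erik", "death", 1902, "Lakeview"),
    NominalRecord("Olson", "Karin", "birth", 1879, "Riverton"),
    NominalRecord("Berg", "Anna", "birth", 1881, "Lakeview"),
]

hits = fielded_search(RECORDS, sort_by="year", surname="olson")  # Karin, Anna, Erik
births = fielded_search(hits, event="birth")  # search within results: Karin, Anna
towns = facet_counts(hits, "town")  # Lakeview: 2, Riverton: 1
```

Passing a previous result list back into `fielded_search` is the simplest model of “searching again within the result set” (2.3).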

3. system must present results in a form comprehensible to the user

  • 3.1. any single query will return results from a variety of data sources that tell the user alternative stories; describe how the system would draw clear boundaries around each of these stories so that the user can choose where to focus to best effect (describe)

    • 3.1.1. for example, how would the system sense the context in which the user is searching and present results appropriate to that context, including (describe)

      • 3.1.1.1. people finder searches

      • 3.1.1.2. history finder searches (desirable)

      • 3.1.1.3. place finder searches (desirable)

      • 3.1.1.4. site searches

  • 3.2. when appropriate, the system should provide direct viewing of objects in the result set (desirable)

    • 3.2.1. for example, images should be presented as thumbnails rather than as a list of titles (desirable)

  • 3.3. partners should be able to see results presented in a view “skinned” with their own organizational identity

4. the system should be able to provide web services

  • 4.1. site maps should be available for crawlers (desirable)

  • 4.2. a search API for web developers (desirable)

    • 4.2.1. provide “raw” results (minus HTML niceties) (desirable)

      • 4.2.1.1. XML results (desirable)

      • 4.2.1.2. JSON results (desirable)

  • 4.3. the system must be easily integrated into the web ecosystem

    • 4.3.1. for example, queries should be expressed in the URL so they can be easily turned into deep links on other sites
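To make 4.2.1 and 4.3.1 concrete, the Python sketch below encodes a query as a URL that can serve as a deep link, decodes it again, and returns “raw” JSON results without HTML. The endpoint `https://search.example.org/search`, the parameter names (`q`, `surname`, `format`), and the helper names are hypothetical assumptions, not part of any existing API.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_ENDPOINT = "https://search.example.org/search"  # hypothetical endpoint


def query_to_url(params):
    """Encode a query as a shareable deep-link URL."""
    # Sort parameters so the same query always yields the same URL.
    return SEARCH_ENDPOINT + "?" + urlencode(sorted(params.items()))


def url_to_query(url):
    """Recover single-valued query parameters from a deep link."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


def raw_results(query, hits):
    """Serialize results as plain JSON (no HTML) for a search API."""
    return json.dumps({"query": query, "count": len(hits), "results": hits}, sort_keys=True)


url = query_to_url({"q": "civil war", "surname": "Olson", "format": "json"})
```

Keeping the whole query in the URL means a partner site can link straight to a result page, and the same URL with a different `format` value could return XML or JSON instead of HTML.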

5. describe the tools the system would provide (describe)

  • 5.1. for managing workflow tasks

    • 5.1.1. updating harvest schedules

    • 5.1.2. testing changes in a test system

    • 5.1.3. building and modifying filters for different data types

    • 5.1.4. building and modifying crosswalks for fielded data

  • 5.2. the system should allow partners to initiate certain changes (describe)

    • 5.2.1. how would partners change their harvest schedule

    • 5.2.2. how would partners modify the skin used to present results in 3.3

    • 5.2.3. how would partners suggest modifications in the mapping of their data
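A crosswalk (5.1.4) can be as simple as a table mapping a source's field names onto a target schema. The Python sketch below maps hypothetical nominal-index columns onto Dublin Core elements and reports any fields it could not map, the kind of output partners might review when suggesting changes to the mapping of their data (5.2.3). The column names, the chosen mappings, and the sample record are assumptions for illustration.

```python
# Hypothetical crosswalk: partner nominal-index columns -> Dublin Core elements.
CROSSWALK = {
    "full_name": "title",
    "event_date": "date",
    "event_place": "coverage",
    "record_type": "type",
    "source_archive": "source",
}


def apply_crosswalk(record, crosswalk):
    """Map source fields to target elements; return (mapped record, unmapped field names)."""
    mapped = {}
    unmapped = []
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped.append(field)  # flag for staff or partner review
        elif value not in (None, ""):
            mapped.setdefault(target, []).append(str(value))
    return mapped, unmapped


# Invented sample record from a partner's index.
mapped, unmapped = apply_crosswalk(
    {
        "full_name": "Olson, Anna",
        "event_date": "1881-03-02",
        "event_place": "Lakeview",
        "record_type": "birth",
        "certificate_no": "A-1234",
    },
    CROSSWALK,
)
```

Keeping the crosswalk as data rather than code is one way to let staff build and modify crosswalks (5.1.4) without redeploying the system.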

6. system management information must be available to administrators and partners

  • 6.1. site and query statistics

  • 6.2. harvest schedule (desirable)

7. the system must provide documentation and assistance for administrators and partners

  • 7.1. where would guidelines for data formats be found (describe)

    • 7.1.1. non-standard data would need particularly clear definitions; where would those be housed (describe)

  • 7.2. where would API documentation be found (describe)

  • 7.3. where would we place documentation for partners (describe)

8. the system must be sustainable

  • 8.1. describe the costs of maintaining the system (describe)

  • 8.2. describe the maintenance tasks involved in running the system (describe)

    • 8.2.1. how often are upgrades made available, and how hard are they to install (describe)

    • 8.2.2. how easily can the system be enhanced (describe)

      • 8.2.2.1. content

        • 8.2.2.1.1. workflow for content updates (describe)

        • 8.2.2.1.2. how much downtime is involved in re-indexing portions of the collection (describe)

        • 8.2.2.1.3. how long would a complete re-indexing take and what circumstances might require this step (describe)

      • 8.2.2.2. what would be required to add a new resource into the system (describe)

      • 8.2.2.3. what would be required to accommodate evolving standards (describe)

  • 8.3. what support is available for the system (describe)

    • 8.3.1. professional support (describe)

    • 8.3.2. user community forums and other gathering spots (describe)

  • 8.4. how open is the architecture of the system (describe)

    • 8.4.1. under what licenses are components of the system distributed (describe)

    • 8.4.2. what restrictions would inhibit our sharing the system with partners (describe)