MDL WideAreaSearch

Investigating Wide Area Search

NOTE: This is a proposal that builds on the Social Side of Reflections project. It expands on the notion of providing SSR data for indexing within Reflections. Comments welcome (see below).

The MDL wants to explore "universal access" during the current grant period. In MDL-speak, "universal access" means a way to search a wide range of digital collections, including MDL's own and the digital object collections of other historical societies and campuses. Since "universal access" also has connotations of "access for the disabled" and "usability," I don't want to use the term. I will call this endeavor "wide area search" instead.

The first demonstration of wide area search would be building a search that covers all MDL systems. Since MDL now includes Reflections and the SSR, what would a search that incorporates both sets of data look like? This would be a unified MDL search.

The next demonstration of wide area search would incorporate non-MDL collections at affiliated organizations. If a historical society or school has a digital object collection, could we include their data in a single search that touches MDL collections and their own? This would be a true wide area search.

Both of these objectives could be reached through the careful application of a tool such as Google's Custom Search Engine (Google CSE). However, the Google CSE has two drawbacks with regard to a wide area search of collections like these. First, the Google system is designed to direct the user to the source of the indexed text that led to the hit, rather than to some other related record. Second, Google systems display data only from the record indexed.

If we want a more sophisticated result, one that allows multiple sources of data to direct users to a single record and one that displays some special data for each record, then we need to dig more deeply into the problem. Allowing multiple sources to point to one record would allow (for example) both the original object and the social-side commentary about that object to point back to the object's record on Reflections. Allowing the display of particular data would allow, for example, the thumbnail of an image to be displayed with all records that are attached to the same original item.
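The idea of several sources pointing to one record can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a proposed design: the `Hit` and `RecordResult` types, the `group_hits` function, and the idea that every hit already carries the URL of the record it belongs to are all hypothetical and do not describe any existing MDL, Reflections, or SSR interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hit:
    """One search hit from one source (all fields are hypothetical)."""
    source: str                          # system that produced the hit, e.g. "Reflections" or "SSR"
    record_url: str                      # record the hit should lead the user to
    snippet: str                         # text that matched the query
    thumbnail_url: Optional[str] = None  # thumbnail, if this source knows one


@dataclass
class RecordResult:
    """All hits that point at the same record, shown as one result."""
    record_url: str
    thumbnail_url: Optional[str] = None
    hits: List[Hit] = field(default_factory=list)


def group_hits(hits: List[Hit]) -> List[RecordResult]:
    """Merge hits that share a record_url, keeping first-seen order."""
    grouped = {}
    for hit in hits:
        result = grouped.get(hit.record_url)
        if result is None:
            result = RecordResult(record_url=hit.record_url)
            grouped[hit.record_url] = result
        result.hits.append(hit)
        # Any source that supplies a thumbnail lends it to the whole group.
        if result.thumbnail_url is None and hit.thumbnail_url:
            result.thumbnail_url = hit.thumbnail_url
    return list(grouped.values())
```

In this sketch, a hit on an original object and a hit on social-side commentary about that object would collapse into a single result that links to the object's record and shows its thumbnail, which is the behavior a plain Google CSE result list does not provide.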

Minimum Deliverables

A Google CSE implementation that demonstrates the concept of wide area search and allows at least a broad search across MDL systems and those of at least two non-MDL partners.

A mock-up illustration of a more sophisticated search which could be used to describe the concept and inspire solutions.

A report of the options for the more sophisticated wide area search: systems available, their strengths and weaknesses, architectural implications, and their suitability for MDL.

Additional Possible Deliverable

If a suitable candidate system is found and the MDL management team agrees to an implementation, then the research report described in the minimum deliverables could be forgone in favor of developing a working prototype of a more sophisticated wide area search.

Timeline

Work on this project would get under way on April 16. The initial period would require research of the Google CSE and discussions with non-MDL partners about the accessibility of their systems to web crawlers. To work with the Google CSE, sites will have to provide a way to navigate to every item in their collection and provide complete metadata on some indexable page. A Google CSE implementation would be available in May.

Once the Google CSE is in place, the effort would move to exploring more sophisticated searching options. This would include some discussion of options with others at the DLF meeting in late April and with MHS about the IDOL system they are bringing online. This would be a "discovery" process, seeking solutions but not guaranteed to find them. A research report and, if feasible, a prototype "sophisticated" system would be delivered by mid-June 2008.

Costs

This project will cost no more than $16,000, which may include a working prototype of a wide-area search mechanism. If no working prototype of a sophisticated wide-area search is developed and delivered, the cost will be less than $16,000. The first $6,000 will be billed upon delivery of the Google CSE system. A further charge of not more than $10,000 will be billed upon delivery of the research report on searching options and, if feasible, delivery of a prototype wide-area search tool.

A final invoice would be due by 20 June 2008.

Eric Celeste

Eric brings over 15 years of library and 25 years of technology experience to his consulting. At MIT Eric shepherded the creation of DSpace, open source digital repository management software developed with HP and now deployed at hundreds of institutions worldwide. At the University of Minnesota Libraries he encouraged the development of the UThink blog service, a wiki-based staff intranet, LibData, and the University Digital Conservancy. He works with non-profit institutions on appropriate uses of technology for informing, communicating, and collaborating with their constituencies.
