Planning an OSS-ILS at Duke

Perhaps adding some perspective to the issue of library-related resources and finding aids vs. new Internet resources and tools is the latest Pew study, which shows that Internet-savvy Generation Y users (age 18-30) are the largest group of library users, a finding that challenges the assumption that libraries are losing relevance in the Internet age.

As for thinking about the role of libraries (and archives), the following article about Canadian archives and the concern about the “lost decades” of electronic information shows that we need to keep in mind that finding aids are only useful insofar as the resources they point to are collected, processed, stored, organized, and made accessible. That is the age-old task of libraries and archives, one that advanced technology might be expected to ease, but perhaps every advance brings new problems. fa8-9acf-61afd87adfbd&k=28075

Thomas Brenndorfer, B.A., M.L.I.S.

Am I nuts? I’m looking for outcomes of using NG library sites vs “old” library sites.

I say “NG library sites” to mean the “one-stop shop” the library puts up on the Internet–with info about the lib’s services, the “stuff” it holds, guides to “using it”, and ways to get to information pros/librarians, etc. based on ideas or discussions which have been touched on here.

I guess I’d like some help in finding literature to:

1) Find results for a survey taken by “gurus” in “the field” about what kind of stuff would be in this one-stop shop

2) Find results of some comparative tests with a NG prototype library site vs. the “old” library site/OPAC/databases. (To find out if it really was better and how it was better.) And, perhaps, at the root of the matter, did users use this NG library site more than the alternatives (e.g., Google)?

I see people implementing Scriblio, Endeca, etc. but what are the measurable outcomes?

I guess I would like things like:

  • Is the time spent in finding a library book in the OPAC reduced when:

    - … there is faceted search vs. no facets?
    - … there is user tagging vs. no user tagging?
    - … there is “other users who checked out this book also checked out ____” vs. when there is none?

  • Is the “give-up” ratio the same when either or all of these features are in place? Is the repeat usage (or satisfaction?) greater when either or all of these features are in place?

  • Is there more user loyalty, or satisfaction (or “using Google less”) when users are exposed to these “NG” tools for, say, 60 days?

Any pointers would be welcome =)
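
One way the measures asked about above (time spent finding a book, and the “give-up” ratio) might be computed from logged search sessions is sketched below. This is only an illustration under stated assumptions: the `Session` record, its field names, and the variant labels "old" and "ng" are all hypothetical, and no real study data is implied.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Session:
    """One observed search session (hypothetical log record)."""
    variant: str                      # assumed labels, e.g. "old" or "ng"
    seconds_to_find: Optional[float]  # None means the user gave up


def summarize(sessions):
    """Return, per variant, the median time-to-find and the give-up ratio."""
    by_variant = {}
    for s in sessions:
        by_variant.setdefault(s.variant, []).append(s)

    summary = {}
    for variant, group in by_variant.items():
        # Only sessions that ended with the item found contribute a time.
        found = [s.seconds_to_find for s in group if s.seconds_to_find is not None]
        summary[variant] = {
            "sessions": len(group),
            "median_seconds_to_find": median(found) if found else None,
            "give_up_ratio": (len(group) - len(found)) / len(group),
        }
    return summary
```

Comparing `summarize(...)["old"]` with `summarize(...)["ng"]` gives the raw difference; whether that difference is meaningful would still need a proper significance test and a comparable user population.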

We’ve been linking from our DVD list to the IMDB for a few years now. It’s not in the catalog record, I just stick the title after;mx=20;q=
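
A minimal sketch of that “stick the title after the query string” approach follows. The full search prefix is not given in this message, so `SEARCH_PREFIX` below is a placeholder on example.org, not the real IMDB address, and `title_search_link` is a hypothetical helper name.

```python
from urllib.parse import quote_plus

# Placeholder only: the real prefix is truncated in the message above.
SEARCH_PREFIX = "https://example.org/find?mx=20;q="


def title_search_link(title: str, prefix: str = SEARCH_PREFIX) -> str:
    """Build a search link by appending the URL-encoded title to a prefix."""
    return prefix + quote_plus(title.strip())
```

Because the link is a search rather than a stable identifier, ambiguous titles will land on a results page rather than on a single film.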


It usually works, although it often drops you in a “disambiguation page”. Mind you, we have separated out our DVDs into their own list, since Voyager is a bit crappy for browsing DVDs:


On Nov 8, 2007 2:24 PM, Jeremy Dunck <jdunck> wrote:
> On Nov 8, 2007 12:43 PM, Andrew Gray <shimgray> wrote:
> > On 08/11/2007, Kevin Kierans <kevink> wrote:
> ...
> > Seems a pretty good idea for films, especially if you don't index
> > things like rating or genre. (We're about to do a large DVD
> > acquisition spree - I might well recommend we catalogue in IMDB links,
> > now you mention it!)
>
> A published cross-reference of identifier systems would be useful, and
> a fairly cheap by-product of such indexing.
>
> > IMDB is a well-established site, and is certainly the best out there
> > for its subject. It's fairly comprehensive; I don't think you need to
> > worry about the content you'll be linking out to.
> >
>
> If interested, all of IMDB's freely-available data is available for
> download and direct use. (There is a for-pay service of IMDB as well;
> that data is not freely downloadable.)
> <>
>
> And here are licensing terms for that data:
> <>
>
> Also, here is a linking guide:
> <>
>
> I suspect that much indexing effort could be saved by automatically
> making a list of candidates for each holding and having a worker
> review those choices.
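
The candidate-list idea in the quoted message could be sketched roughly as below, using fuzzy string matching from Python's standard `difflib` module. This is an assumption about how such a list might be built, not anything the posters described: the function names, the 0.6 similarity cutoff, and the plain lists of titles are all hypothetical, and a real run would match against the downloaded data set on more than the title alone.

```python
from difflib import get_close_matches


def candidate_matches(holding_title, external_titles, limit=3, cutoff=0.6):
    """Return up to `limit` external titles that resemble a holding title."""
    # Compare case-insensitively, but hand back the original spellings.
    by_lowered = {}
    for title in external_titles:
        by_lowered.setdefault(title.lower(), title)
    hits = get_close_matches(
        holding_title.lower(), list(by_lowered), n=limit, cutoff=cutoff
    )
    return [by_lowered[hit] for hit in hits]


def review_queue(holdings, external_titles):
    """Map each holding to its candidate list, for a person to confirm."""
    return {h: candidate_matches(h, external_titles) for h in holdings}
```

Holdings that come back with an empty list, or with several near-identical candidates, are the ones a reviewer would need to look at most closely.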

The discussion so far has more than once meandered a bit and run into less than fruitful controversies. Let me try to outline where I think it should be going, i.e., what questions the debate should try to answer. These questions can probably be discussed presently without too much speculation. This list is not meant to be exhaustive or to reflect priorities.

  1. Where are current LIS OPACs actually deficient? Can a checklist be drawn up against which to match the specifics of actual systems? (And post results to the vendors)

  2. Are there ways to improve the situation significantly without a revolution in rules and/or formats? Keeping in mind that we can do precious little to improve the quality and content of legacy data. Or can we? With what money?

  2a. Is the current situation so bad we should consider a breach in consistency of metadata in beginning something very new?

  3. Will global collaboration and standardization help a lot? Keeping in mind that it is the actual documents that readers want, not the metadata, and that transborder ILL is a costly and time-consuming option. It is the local collections, the stuff that readers can at once lay their hands on, that matter the most to them. Or is it not? Projects like VIAF, however, aim at improving search options, not at amassing more title data in ever larger files!

  4. Can new partnerships be forged to open up new opportunities? I mention here the Google-WorldCat alliance which might be extended into more functions on the local level, to enhance search options for local collections. Keeping in mind libraries have no options to do significant amounts of local digitization plus OCR plus full-text indexing themselves. GoogleBooks, on the other hand, definitely could profit from inclusion of more library catalog metadata. OPACs and G.B. could make great complements; neither of them profits from going its way alone.

  5. Can ToC harvesting and indexing be done collaboratively, sharing the results, internationally, on a large scale - to provide local catalogs with extra fodder for indexing without extra manual input? Considering that ToC data are probably the next best thing to full-text, and supposedly full of relevant terms for searching. Arguably, even better than the full text? (In fact, the occurrence of a term in the ToC is likely to enhance ranking in GoogleBooks - or if not, it should.)

  6. What new communication functions should catalogs be able to support? Surely they ought to speak XML, but keep in mind that XML is no replacement for MARC, only for ISO2709. Is there an XML Schema that is likely to advance to standard and thus worthwhile to invest in (to make the LIS “speak” and “understand” it)?

  7. Can AI products help improve legacy data and quality of searching?

  8. Are there AI products that can provide new input for catalogs to augment or replace human input (considering results of 1.)? Can this input augment or improve or revolutionize authority control in the near term? Classification? LCSH? Or something new altogether?

  9. Will catalogs of the future need index browsing as an extra option? If yes, just for authority data (names, subjects) or also for descriptive data (title strings, keywords, series titles)?

  10. Will RDA be a step in the right direction? Will it be more than that or less? Or is FRBR rather an academic concept with on the whole not too much impact on real-world search situations?
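
To illustrate the point above that XML replaces only the ISO2709 carrier and not MARC itself, here is a minimal sketch that writes MARC-style content (leader, control fields, tagged data fields with indicators and subfields) into the MARCXML element layout. The namespace and element names follow the Library of Congress MARCXML schema; the function name `to_marcxml` and the tuple-based record layout are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Library of Congress MARCXML ("slim") schema.
MARCXML_NS = "http://www.loc.gov/MARC21/slim"


def to_marcxml(leader, control_fields, data_fields):
    """Serialize MARC-style fields as a MARCXML <record> string.

    control_fields: list of (tag, value)
    data_fields:    list of (tag, ind1, ind2, [(subfield_code, value), ...])
    """
    ET.register_namespace("", MARCXML_NS)
    record = ET.Element(f"{{{MARCXML_NS}}}record")
    ET.SubElement(record, f"{{{MARCXML_NS}}}leader").text = leader
    for tag, value in control_fields:
        field = ET.SubElement(record, f"{{{MARCXML_NS}}}controlfield", {"tag": tag})
        field.text = value
    for tag, ind1, ind2, subfields in data_fields:
        field = ET.SubElement(
            record,
            f"{{{MARCXML_NS}}}datafield",
            {"tag": tag, "ind1": ind1, "ind2": ind2},
        )
        for code, value in subfields:
            sub = ET.SubElement(field, f"{{{MARCXML_NS}}}subfield", {"code": code})
            sub.text = value
    return ET.tostring(record, encoding="unicode")
```

The tags, indicators and subfield codes pass through unchanged; only the byte-level packaging differs, which is why a schema like this does not by itself change what the catalog can express.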

Bernhard Eversberg

From: Tim Spalding <tim LIBRARYTHING.COM>

I have no opinion on funding aggregation and I’m squishy on open-source pledges. I think the question is more basic:

If libraries paid their tech people better, they’d get better ones to start with, and retain the good ones longer.

So, if that’s true, what barriers—financial, institutional, cultural—prevent that from coming to pass?

Now, I’m going to be my own “on the other hand.” Actually, as Paul Graham argues*, the best hackers aren’t really motivated by money—unless it’s a life-changing amount. Although he was talking about private-sector wages—the difference between 80k and 130k, for example—there’s still something there. Good hackers care about their freedom on the job (and the amount of bs they have to deal with), the problems they’re given and the tools they get to use. In those respects too, libraries are more severely disadvantaged.


* “Great programmers are sometimes said to be indifferent to money. This isn’t quite true. It is true that all they really care about is doing interesting work. But if you make enough money, you get to work on whatever you want, and for that reason hackers are attracted by the idea of making really large amounts of money. But as long as they still have to show up for work every day, they care more about what they do there than how much they get paid for it. Economically, this is a fact of the greatest importance, because it means you don’t have to pay great hackers anything like what they’re worth. A great programmer might be ten or a hundred times as productive as an ordinary one, but he’ll consider himself lucky to get paid three times as much. … But it’s also because money is not the main thing they want.”

On 8/31/07, Dan Scott <denials> wrote:
> On 31/08/2007, Casey Durfee <casey> wrote:
> > I would like to see a bounty/pledge board for open source library software.
> > Libraries could pledge monetary support or staff resources towards
> > particular projects or features. Pooling money together would attract more
> > interest from developers outside of libraries, which would be especially
> > beneficial for a project like Evergreen, which is written in programming
> > languages there's not a lot of library world expertise or interest in.
>
> Eh? Most of the code in Evergreen is Perl, which is pretty
> library-world friendly (although it's also pretty heavily OO in
> places, which could be perceived as unfriendly). There's some C, but
> that's largely infrastructure that doesn't need to be touched by most
> mortals; and then the remainder is Python (for a not-yet-primetime
> configuration interface). There's some Java coming (for acquisitions
> support), but there's a fair bit of that in the library world too.
>
> That being said, I like the idea of a bounty / pledge board...
>
> --
> Dan Scott
> Laurentian University

My two cents on naming: “LibraryThing” was a joke. To be precise, it was a joke on Lovecraftian prose: “The Thing in the Library!” (hence the logo font). Initial reactions were all negative: what a stupid name! Then it turned. Now we’ve even spawned other “thing” names.

So, names are irrelevant. You could call the catalog “pickle” and it would work. Or take the art-tagging project “Steve.” It doesn’t mean a damn thing. But if it succeeds, we’ll all be “steveing” right and left.

The thing that doesn’t work about OPAC is that libraries still behave as if they were in charge of delivering an “Online [sic] Public access [sic] Catalog.”


Speak of the devil, I appear. Anyway, I feel empowered to boil my perspective down as simply as possible:

I think digitization and now social media are multiplying the finding options and cutting into the value of the core “librarian” options. The pie has gotten a lot bigger, but librarians have the same quantity of apples as they used to have, and some of what they have isn’t as attractive in light of the newer options.

It seems to me there are a few defensive approaches which I think of as “dodges,” even if they may contain some truth:

  • The new ways are not really any good; people are just crazy and/or stupid when they prefer them.

  • Librarians need to communicate their value more forcefully; people are just ignorant.

  • The new tools are fine, but you need a librarian to show you how to use them right.

I propose two basic strategies which are NOT dodges:

  1. Figure out what you do that digitization and social media don’t or can’t do well. These are powerful, world-changing trends, but they don’t solve every problem. Tagging, for example, is a wonderful way to do some things, but not all. Figuring out what’s “chick lit” is tagging at its best. Complex, controlled hierarchical finding is something tagging doesn’t do well, and which still has value. Focus on what you do best; you’ll find you do it even better.

  2. Get aggressive about blending what you’ve done with the new stuff to create something with the value of both. Instead of ignoring social media, allow tagging and subjects to play together. Put librarian-created reading lists alongside patron lists, bibliography lists, etc.

Or go past mashups to use what’s new to electrify what’s old. Faceted browsing is one excellent example: taking the data you already have and using it in wonderful, previously impossible ways. Or take LibraryThing’s recommendation system, which is based not on social media but on a statistical analysis of the patterns in DDC, LCC and LCSH. In a similar vein I have proposed ways of adding relevancy ranking to otherwise unranked subject headings, and creating a new DDC which takes social media patterns into account but maintains some of the strengths of a formal, stable system.
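
As a rough illustration of how faceted browsing reuses data a catalog already holds, the sketch below counts the values of chosen fields across a result set and then narrows the set by one selected value. It is not how any particular product does it; the record layout (plain dictionaries), the field names, and both function names are hypothetical.

```python
from collections import Counter


def facet_counts(records, fields):
    """Count how often each value occurs in each facet field."""
    counts = {field: Counter() for field in fields}
    for record in records:
        for field in fields:
            value = record.get(field)
            # A field may hold one value or several (e.g. multiple subjects).
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                if v is not None:
                    counts[field][v] += 1
    return counts


def narrow(records, field, value):
    """Keep records whose field equals, or contains, the chosen facet value."""
    kept = []
    for record in records:
        v = record.get(field)
        if v == value or (isinstance(v, (list, tuple)) and value in v):
            kept.append(record)
    return kept
```

Each click on a facet is then just `narrow(...)` followed by a fresh `facet_counts(...)` on what remains, which is why existing format, language and subject data is enough to drive the display.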

In sum, I think librarians need to think hard and realistically about what they do best, and think creatively about how their tools can be enhanced by new approaches.


> Whatever the considerable benefits of browse displays (I
> read, and took to heart Thomas Mann’s comments), the fact
> remains that, when I look at our search log stats, users (as
> opposed to librarians) simply do NOT browse (and it’s not for
> lack of instruction).

I’m convinced that the underlying “problem” with our OPACs (from a usability perspective) is that they are sold once to librarians, rather than many times to end users. If each user were making an individual purchase decision, OPACs would have quickly evolved to meet their needs. I believe ILS vendors (whom we often unfairly blame) are quite capable of producing an awesome OPAC. But the vendors are building OPACs to meet our (i.e., librarians’) perceived needs, because vendors are smart and are in business to make money and they understand that *we* are the ones writing that big check every 10-15 years or so. As Selden points out, OPAC features that are important/essential to us are often ones that our users couldn’t care less about, despite all our well-meaning instruction.

And that is assuming that OPAC functionality/usability is even a prime consideration in the purchase decision of an ILS. Very often that’s not the case, as acquisitions, cataloging, or circulation module features drive the decision and the OPAC is an afterthought. If we want to find out who’s responsible for sucky OPACs, the first place we need to look is in the mirror [1].

On the bright side, products like VUFind, Primo, AquaBrowser, and Endeca unbundle the OPAC from the ILS, giving us a chance to atone for past ILS purchase decisions (which can’t easily be undone). One of the problems inherent in an ILS-bundled OPAC is that the 10-15 year (give or take) ILS replacement cycle does not allow for significant changes to what quickly becomes a calcified code base. I’m particularly excited about Andrew Nagy’s recently released open-source OPAC; with VUFind, the library-land development community has a golden opportunity to craft an OPAC that genuinely meets our users’ needs. However, doing so will require that we resist the temptation to create the ideal OPAC for *librarians*, and instead focus on creating an OPAC that meets our *users’* search needs. I think that would be an OPAC that doesn’t require instruction (however well-meaning) or require an initial search page that is 80% search tips.

Just my opinion…

– Michael

[1] Karen Schneider asks: “But the interesting questions are: Why don’t online catalog vendors offer true search in the first place? and Why [don’t we] demand it? Save the time of the reader!” I would answer that vendors don’t offer it, and we don’t demand it, because the ILS (OPAC) check-writers have other priorities. See: Karen Schneider, How OPACs Suck, Part 1

Michael Doran, Systems Librarian
University of Texas at Arlington

> I’m convinced that the underlying “problem” with our OPACs (from a usability perspective) is that they are sold once to librarians, rather than many times to end users. If each user was making an individual purchase decision, OPACs would have quickly evolved to meet their needs. I believe ILS vendors (who we often unfairly blame) are quite capable of producing an awesome OPAC. But the vendors are building OPACs to meet our (i.e. librarians) perceived needs, because vendors are smart and are in business to make money and they understand that *we* are the ones writing that big check every 10-15 years or so.

I used to work in K-8 educational publishing, specifically the technology side of it. The dynamic was very similar. We didn’t write software for students or the teachers or even the schools. We made it for the state textbook committees. In the case of technology, we were just a piece of a much larger thing, so we did it for some technology subcommittee whose role was basically to approve or disapprove. We called our work a “checklist item.” And we did it once every 6-8 years. Between approvals, we had a monopoly. Oh, and because so much money was on the line, the textbook committees would demand little changes that made every state’s or even district’s textbook different from the others, massively reducing economies of scale. Not surprisingly, the technology that comes with your third-grader’s textbook is generally crap.

The situation with OPACs is much the same. Libraries make multi-year purchases. The systems are monoliths, so something like the OPAC can become a checklist item. Libraries want systems with “their” tweaks. And the decision-making is a few steps removed from the people who will use it most.


> Has anyone on this list made that suggestion, ever? I do not believe
> so. And I do not believe that is true.

Okay, I’ll bite. It *is* true. If librarians do not innovate, the relative value of their work will decline.

Rather, it will *continue* to decline. The web has made it easier for regular people to get all sorts of information they once needed a library to get. This is a *great* thing. But it does pose dangers to librarian jobs. And it threatens what’s more important—the many things that librarians do that computers can’t, and (pace Alexander) never will be able to do.

Broadly speaking, I see three responses at work:

  1. Ignoring change. However we feel about it, this is NOT an irrational strategy. Libraries are not going to vanish any time soon. Change is difficult and expensive. Funding doesn’t seem in immediate danger, people still use libraries a lot, and I’m retiring in a few years.

  2. Attacking change. To many librarians, the world has simply gone mad; the internet has no editor, Google is lousy, Wikipedia doesn’t work. The web and libraries are at odds, and libraries are better. By this view, librarians must merely dig in their heels and make people understand how *wrong* they are. Libraries must get better at “explaining” and “promoting” their services to the nitwits. At the margins, this is a good thing. But only at the margins. The world has not gone mad.

  3. Embracing change. Needless to say, I think this is the future and the only sure way to protect librarian jobs and what’s best about librarianship.

I see a lot of opportunities. Personally, I want to spend the next five years helping libraries leverage what they’ve got—a bunch of great, unexploited data—and help them steal a little fire from Google and MySpace through social networking, algorithmics and so forth.

But I also see obstacles. The worst are institutional, structural. In general—yes, there are exceptions—libraries aren’t built to change rapidly, to take risks, to open up to outsiders. There are some good things in the predominant structural characteristics of libraries—non-profit and generally arms-length funding, strong hierarchies that favor tenure over merit, sharp lines between specialties, limited labor fluidity, terrible pay, powerful unions, powerful member organizations—but they are not likely to foster rapid change.

If there is a strong point here, it is libraries’ historical devotion to openness and cooperation. There is something old and something very new in this. What makes libraries great is, in a sense, what makes Linux great.

Ultimately, this is why I despise closed data so much. Libraries are behind the curve in so many ways. They are, in my opinion, losing. If they’re not going to take advantage of their greatest cultural asset… well, how the heck do you expect to win anyway?


On 5/24/07, Alexander Johannesen <alexander.johannesen> wrote:
> Hiya,
>
> > Ted P Gemberling wrote:
> > > If we buy too quickly into the idea that computers and full texts will
> > > solve all the problems, some of our jobs really might be lost.
>
> On 5/25/07, Jonathan Rochkind <rochkind> wrote:
> > Has anyone on this list made that suggestion, ever? I do not believe
> > so.
>
> Actually, I think it's been said here a number of times (I know I
> have), especially for certain things. But all things in context, of
> course.
>
> > And I do not believe that is true.
>
> Why not? As software becomes smarter, computing becomes cheaper and
> the amount of information becomes unhumanly incomprehensible, what is
> it that librarians can do that computers can't? Think 5, 10 or 20
> years into the future, and tell me why librarians, as they are, should
> hope to have a job still.