Sunday, November 11, 2007

On being in bed with Google - in defence of the Google library project

The Google library project has come in for some criticism over the restrictions (and non-disclosure agreements) Google placed on the libraries taking part. Paul Courant, library chief at the University of Michigan, which is involved in the project, says the critics have got it wrong.

"One of the things that surprises me most about reactions to the Google Library Project is that smart people whom I respect seem to think that the only reason that a university library would be involved with Google is because, in some combination, its leadership is stupid, evil, or at best intellectually lazy. To the contrary, although I may be proved wrong, I believe that the University of Michigan (and the other partner libraries) and Google are changing the world for the better. Four years from now, all seven million volumes in the University of Michigan Libraries will have been digitized – the largest such library digitization project in history. Google Book Search and our own MBooks collection already provide full-text access to well over a hundred thousand public domain works, and make it possible to search for keywords and phrases within hundreds of thousands more in-copyright materials. This access is altering the way that we do research. At least as important, the project is itself an experiment in the provision and use of digitized print collections in large research libraries. I do not see how we can discover the best ways to use such collections without experiments at this scale. In sum, I believe that our library is doing exactly what it should do in the best interests of scholarship and our users, now and in the future.

So I’m puzzled when people ask, “How could serious libraries be doing this? How could they abdicate their responsibilities as custodians of the world’s knowledge by offering their collections up as a sacrifice on the altar of corporate power? Why don’t they join the virtuous ranks of the Open Content Alliance partners, who pay thousands of dollars to digitize books at a rate of tens of thousands of volumes a year?” It seems like those who ask such questions have little appreciation of what Michigan and the other Google partners are actually up to.

Google is on pace to scan over 7 million volumes from U-M libraries in six years at no cost to the University. As part of our arrangement with Google, they give us copies of all the digital files, and we can keep them forever. Our only financial outlay is for storage and the cost of providing library services to our users. Anyone who searches U-M’s library catalog, Mirlyn, can access the scanned files via our MBooks interface. That’s right, anyone. (Copyright law constrains what we can display in full text, and what we can offer only for searching, but we share as much as we can consistent with prudent interpretations of the law.) For an example of an MBook, take a look at The Acquisitive Society by R. H. Tawney.

In a recent New York Times article about mass digitization projects, Brewster Kahle was quoted as saying: “Scanning the great libraries is a wonderful idea, but if only one corporation controls access to this digital collection, we’ll have handed too much control to a private entity.”

I agree with him. I’m an economist with a particular interest in public goods, which is how I came to be involved with libraries in the first place. Libraries have a long and honorable history of preserving information and making it accessible. Moreover, even at their best, for-profit institutions cannot be expected to serve general public interests when those interests run counter to those of their shareholders. So I would be distressed if a single corporation controlled access to the collections of the great academic libraries, just as I find it troubling, on a smaller scale, that a handful of publishers control access to much of the current scientific literature.

But Google has no such control. After Google scans a book, they return the book to the library (like any other user), and they give us a copy of the digital file. Google is not the only entity controlling access to the collection – the University of Michigan and other partner libraries control access as well. Except we don’t think of it as controlling access so much as providing it."

Siva Vaidhyanathan has a response, which also includes a follow-up response from Courant in the comments. Vaidhyanathan says:

"Sadly, Paul does not actually address the real-world consequences of the Google project:

• He dismisses serious search problems as temporary, yet fails to confront the problem that Google cannot and will not explain the factors and standards that put one book above another in search results.

• As users discover poorly-scanned files on the Google index, how can they alert Google to the problem? Why does nothing in the contract between Michigan and Google include quality-control standards or methods?

• How do we know this index will last for decades? What image file system is Google using and what ensures its preservation?

• How is the "library copy," that electronic file that Michigan and others receive as payment for allowing Google to exploit their treasures, NOT an audacious infringement of copyright? It violates both the copyright holder's right to copy and right to distribute. Doesn't a university library have an obligation to explain this?

• What about user confidentiality? Why have universities failed to make a stand on this issue?

I look forward to responses from Paul and others. I have been waiting two years for them, of course. And all I get is the silence created by non-disclosure agreements.

BTW, should public university librarians be signing non-disclosure agreements about their core services?"

Courant is again robust in his response, and it is good to see the debate getting a public airing. Thanks to Peter Suber for the link.
