November 24, 2007
Conditions for the Digital Library of Alexandria
I have been in the middle of a major rethink of search engines' efforts to digitize books. When those efforts started, I enthusiastically celebrated their potential to tame information overload. But major research librarians are now questioning search engines' practices here:
Several major research libraries have rebuffed offers from Google and Microsoft to scan their books into computer databases, saying they are put off by restrictions these companies want to place on the new digital collections. The research libraries, including a large consortium in the Boston area, are instead signing on with the Open Content Alliance [OCA], a nonprofit effort aimed at making their materials broadly available.
As the article notes, "many in the academic and nonprofit world are intent on pursuing a vision of the Web as a global repository of knowledge that is free of business interests or restrictions."
As noble as I think this project is, I doubt it can ultimately compete with the monetary brawn of a Google. And why should delicate old books get scanned 3 or 4 times by duplicative efforts of Google, Microsoft, the OCA, and who knows what other private competitor? I also worry that a fragmented archiving system might create a library of Babel. So what is to be done?
My new position is: leverage current copyright challenges to Google's book search program to guarantee that it serves the public interest. Here's how that might work:
Google’s plans to scan and index hundreds of thousands of copyrighted books have provoked extraordinary public controversy and private litigation. This project aims to archive and provide text-based indexing for an enormous number of books. Google’s scanning of copyrighted books is prima facie infringement, but Google is presently asserting a fair use defense. The debate has largely centered on the rival property rights of Google and the owners of the copyrights of the books it would scan and edit.
Given Google’s alliance with some of the leading libraries in the world, journalistic narratives have largely portrayed the Google Book Search project as an untrammeled advance in public access to knowledge. However, other libraries are beginning to question the restrictive terms of the contracts that Google strikes when it agrees to scan and create a digital database of a library’s books. While each library is guaranteed access to the books it agrees to have scanned, it is not guaranteed access to the entire index of scanned works.
Those restrictive terms foreshadow potential future restrictions on, and tiering of, Google’s book search services. Well-funded libraries may pay a premium to gain access to all sources; lesser institutions may be left to scrounge among digital scraps. If permitted to become prevalent, such tiered access to information would threaten to rigidify and reinforce existing inequalities in access to knowledge, and life chances. Such tiering divides society into two groups–those who can afford to access the information, and those who cannot. To the extent that the latter group’s relative poverty is not its own fault, information tiering inequitably subjects it to yet another disadvantage, whereby others’ wealth can be leveraged into status, educational, or occupational advantage.
Given the diciness of the fair use case for projects like Google Book Search, courts should condition the legality of such archiving of copyrighted content on universal access to the contents of the resulting database. Landmark cases like Sony v. Universal have set a precedent for taking such broad public interests into account in the course of copyright litigation. Given the importance of “commerciality” in the first of the four fair use factors, suspicion of tiered access could also be figured into that prong of the test. A more ambitious (if less likely) solution would require Congress to set such terms in a legislative settlement of the issue.
However the matter is ultimately settled, any outcome in favor of dominant categorizers should be conditioned on their maintaining open access to search results. Such a condition would help assure that the type of “tiered access” common for legal resources would not further pervade the networked world. If Google’s proposed extension of the fair use defense succeeds, such a holding should be limited to current versions of the services that conduce to a common informational infrastructure. To the extent it or other search engines limit access to parts of their index, their public-spirited defenses of their archiving and indexing projects are suspect.
PS: For more thoughts on the future of digital archiving, see Diane Leenheer Zimmerman's Can Our Culture Be Saved?
PPS: This post is part of a series, which starts here.
Photo Credit: ekornblut, Wall of Library of Alexandria.
Posted by Frank Pasquale at November 24, 2007 08:11 PM
Comments
Your position is exactly backwards. Far and away the biggest threat to the universally accessible library is hold-up from copyright holders. They can kill the library. A dominant search engine can only delay or hamper it.
Duplication of scanning efforts is not a problem. If we spend 10x as much as we should making 5 scanned copies of everything, so what? It's still money well spent, and good search engines will still iron out any inconsistencies. And overspending on the digitization project is a far far easier way to prevent a tiered access future than trying to scan once and set exactly the right conditions on it. We should all be pushing as hard as we can for scanning and searching to be an unambiguous fair use, so that lots of institutions get into the scan+search business. (Keep in mind, too, that the costs of scanning are continually falling, whereas the stock of things that need to be scanned is not growing at anywhere near the same rate.)
In general, you're too eager to see search markets as natural monopolies. They aren't. Universal search does have high costs and thus high barriers to entry, but smaller pieces of the search market are still cheap to play in. Especially given mobility of users, Google's dominance today isn't a function of unique market factors forcing concentration; it's a result of a Schumpeterian breakthrough in search technology that Google spearheaded and is still milking. Another paradigm shift in how things are done could dethrone it in the space of a few years -- and it's quite possible that that shift could be to something not under the control of any one company.
There are serious issues that large search engines present. We should face and address those issues. But for a lot of the deeper problems you worry about in search, there are more pressing present dangers from other powerful entities. That's the case with neutrality (the incumbent broadband ISPs) and it's the case with book scanning (the publishers).
Posted by: James Grimmelmann at November 25, 2007 12:51 AM
James,
Some responses:
1. You say: "[T]here are more pressing present dangers from other powerful entities. That's the case with neutrality (the incumbent broadband ISPs) and it's the case with book scanning (the publishers)."
Agreed. Google book search without the conditions I've advanced above would be better than a Google book search contingent on a million licensing deals.
The real question is whether conditions like mine would scuttle the project. And I don't think they are that burdensome. They can also be pared down; for example, there is a much greater societal need for indices of scholarship to be free and open access than, say, indices of Danielle Steel books.
2. You say "smaller pieces of the search market are still cheap to play in." I agree, but I think a) any one of those people is still pretty reliant on Google to route it customers and b) Google does not need to monopolize that space to still be a dominant force that deserves scrutiny.
Do you really think someone else is poised to make a "Schumpeterian breakthrough" on general-purpose search?
3. The falling costs of scanning are a good argument for multiple scanning enterprises. And yes, the risk of one inaccurate or incomplete scan should be factored in against the risk of harming a book via multiple scans. But I still think that this project is such a small part of the overall Google business plan that even if the conditions I've proposed were applied, they would not significantly deter the project.
Nobody is saying "Google can't advertise on the index"--that's their core business plan. I'm just saying, don't try to make money off tiered access--just as Google says to the carriers when it lobbies for net neutrality.
Posted by: Frank at November 25, 2007 04:29 PM
And my meta-responses:
"The real question is whether conditions like mine would scuttle the project. And I don't think they are that burdensome."
Perhaps not as an end-result, but your means of getting there -- through current copyright challenges to doing the project at all -- is playing with fire.
"[A]ny one of these people is still pretty reliant on Google to route it customers"
Maybe, maybe not. Dopplr mostly isn't, Abebooks mostly isn't, Altlaw mostly isn't. The principal Google searches they care about are navigational queries on their own names.
"Do you really think someone else is poised to make a 'Schumpeterian breakthrough' on general-purpose search?"
Yes. The key will be to redefine the problem; my best guess is that whatever comes next will be significantly user-generated.
Posted by: James Grimmelmann at November 26, 2007 07:43 AM









