Google and Its Enemies

The much-hyped project to digitize 32 million books sounds like a good idea. Why are so many people taking shots at it?

Dec 10, 2007, Vol. 13, No. 13 • By JONATHAN V. LAST

And here lies Google's dilemma: Out-of-copyright books account for about one-sixth of all titles. Most books--75 percent of them--are in copyright, but out of print. Only about 10 percent of all books are both copyrighted and in print. Google has decided to get around this problem of copyright protection by simply ignoring it: forging ahead and scanning books, regardless of their copyright status. If a book is in the public domain, its full text is displayed to users, but if the book is protected, then Google shows users only a "snippet" of the text surrounding the search result. It is relevant to note that "snippet" is Google's word and is intentionally not a legal term; how much text is displayed is entirely at Google's discretion.

Concerned by this imposition on the copyright, authors and publishers began complaining to Google in mid-2005. That August, Google announced that it would suspend the scanning of copyrighted works for three months so as to allow copyright holders to "opt out" of the program and keep their works out of the database. A month later, the Authors Guild filed suit in New York's Second Circuit on the grounds of copyright infringement; a month after that, a group of publishers filed a separate suit on similar grounds.

Many of the publishers party to this suit were also, coincidentally, working with Google under the Partner Program. The publishers are seeking only to stop Google from scanning books without explicit permission; the Authors Guild seeks damages as well. As the Guild's Paul Aiken told the New Yorker, "Google is doing something that is likely to be very profitable for them, and they should pay for it. It's not enough to say that it will help the sales of some books. If you make a movie of a book, that may spur sales, but that doesn't mean you don't license the books." Both cases are winding their way slowly through the courts.

Google has, as they say, all the right enemies. Anytime the ALA, Microsoft, France, a trade guild, and a bunch of trial lawyers are lined up on one side of an argument, the other side is going to look extremely attractive. And there is a seductive appeal to the idea of Google Book Search, to the dream of having millions of books at your fingertips. Yet there are aspects of the project that should give us pause.

Google's Wal-Mart-like obsession with secrecy does not engender trust in either its practices or arguments. As silly as most of Jean-Noël Jeanneney's broadside against Google is, it's easy to see why a book search without transparency of either its data set or its search algorithm would be suspicious and not obviously objective. Page and Brin admitted as much in the research paper that became the foundation of Google, "Anatomy of a Large-Scale Hypertextual Web Search Engine." They wrote:

The goals of the advertising business model do not always correspond to providing quality search to users. .  .  . For this type of reason and historical experience with other media, we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of consumers.

Free-market competition should lessen this concern, of course. And, as previously mentioned, a number of competitors to Google have materialized. But Google's principal advantage is that its competitors have abided by the letter of intellectual property law and not scanned copyrighted materials without the express permission of the owners. Google's willingness to flout the law is the actual source of its competitive advantage.

To defend this advantage, Google has adopted a legal defense aimed straight at copyright law. The defense is multipronged, but the two most startling aspects relate to the establishment of the "opt out" option for copyright owners and Google's claim of a transformative nature to the Book Search. Each challenges the current understanding of the copyright in a fundamental way.

Google maintains that by giving copyright owners the chance to opt out of the program, it has performed due diligence with respect to the copyright. This turns traditional law--which stipulates that someone wanting to use copyrighted material must seek and receive affirmative permission--on its head. Yet Google has found a slim precedent in the 2006 case Field v. Google.

Blake Field sued Google for copying and caching 51 works from his website. The court ruled in Google's favor, citing in particular the ease of Google's "opt out" feature, but the decision was based in part on dubious grounds. The court said that Field had "invited" Google's spiders--web robots which crawl through the Internet cataloguing and indexing pages for a search engine--by not including code on his website which discouraged them. In other words, by not telling Google to stay away, Field was asking to have his copyright violated. It's the intellectual property version of "She wore a red dress to the bar on Saturday night."

In another part of the decision, the court ruled that Field's works were only a thimbleful of the "billions" Google had copied, and, presumably, Google had cached many of those without permission, too. The sheer volume of the copying provides them cover, since no one entry stands out in the sea. The violation of one copyright is a crime, the violation of 20 million is a statistic. There's an evident weakness in Google's citing this legal argument: In the relatively closed system of Google Book Search, most of the entries will likely be from protected works used without permission. In the Field decision, moreover, the court made much of the fact that works were copied by automated spiders and that there was "no evidence of any market for Field's works." Neither is true in the case of the book-scanning project.

The Internet has become, like the 17th-century printing press, incapable of observing copyrights. In the same way the printing press encouraged the mass production of books and magazines and newspapers, the Internet cries out for the distribution of all information--everything from blog entries to pictures to books. And as it distributes all of this information, it exerts a leveling force that diminishes the value of everything it touches. There is no reason that the Internet, unlike the printing press before it, should be exempt from the same protections of creative value. Yet, this is what Google's defense would achieve.

If the copyright protection is shifted so that it must be invoked--precisely what Google's "opt out" policy establishes--it will become the burden of holders. They will have to find and petition all those using their works to cease and desist. Georgetown Law professor Jonathan Band dismisses this concern in the course of a measured, intriguing defense of Google in the journal Plagiary. Band writes, "As a practical matter .  .  . only a small number of search engine firms have the resources to engage in digitization programs on the scale of Google's Library Project." But this is an odd argument: So long as only Google infringes on the copyright, then it should be allowed to do so, because opting out will only be a burden if everyone else is allowed to infringe on the copyright, too.

The second, larger aspect of Google's defense is that Google Book Search is a "transformative work," which would provide for the fair use of previously copyrighted material. It might seem obvious that creating an index of protected works--whose primary value and advantage lies in the number of works in the set--and simply allowing users to search it, is not "transformative." Google Book Search is in important ways similar to LexisNexis, the search database which catalogues newspaper, wire service, and magazine articles. LexisNexis pays content providers for the right to include their material, even though all it does is aggregate that material and render it searchable. The copyright protection of this material was solid enough that the Supreme Court decided in favor of freelance writers who sought compensation for this electronic reuse of their materials in the 2001 case New York Times Co. v. Tasini.

Tasini is not perfectly on-point because LexisNexis gives the full text of written works to paying customers where Google is proposing to give only snippets to its users. Here Google finds redoubt in the 2003 case Kelly v. Arriba Soft. Photographer Leslie Kelly sued Arriba Soft because its search engine copied photographs posted on his website, created thumbnail-sized versions of them, and placed them in its search index. The Ninth Circuit found that Arriba's copying and usage met fair-use standards because the searchable thumbnails constituted a transformed work. (The court also voiced the red dress and thimble arguments that would be later brought to bear in Field.)

This ruling would seem to offer comfort to Google because there is some similarity between Kelly's thumbnail images and the snippets of copyrighted books Google is giving away--both are abstractions of larger works and neither eliminates the need for the original. It assumes, however, that the violation of the copyright occurs when Google gives material to the user. In reality, the infringement occurs when Google scans and archives an entire book without permission. It is the presence of millions of these whole, copyrighted books inside Google's database that creates commercial opportunities, albeit indirect ones, for the company. If Google Book Search included only works in the public domain, it would be almost indistinguishable from its competitors.

Google has tried to sidestep this problem by promising not to run advertisements on the snippet-delivering pages of copyrighted books. But the presence of the protected works in the database is what renders the ad space on the public domain book pages so valuable. And Google's promise of access to millions and millions of protected works is what creates the commercial opportunity for the rest of the project. If the courts do not recognize this principle, Google will have changed the landscape of intellectual property law.

So where does Google go from here? The lawsuits fall in the Second Circuit. If the court finds against Google, it may produce a conflict with the Ninth Circuit, a conflict the Supreme Court may decide to resolve. It's also possible that Google will buy its way out of the problem and make a deal with the publishers and the Authors Guild. There is additional incentive because such a settlement could function as a high barrier to entry and keep the competing enterprises from beginning to use protected works.

If the courts were to find against Google, however, the Book Search would likely die on the vine. As Georgetown's Band notes, it would be extremely difficult to construct a licensing regime for books modeled on the ASCAP/BMI models for musical compositions. And if Google were to try to go legit, the transaction costs of identifying, locating, and contacting copyright holders to seek permission could easily stretch to tens of billions of dollars. Band puts the best guess in the neighborhood of $25 billion.

Yet even if Google finds a way to realize its dreams, it's unclear exactly how useful the Book Search would ever be for the average user. Is there value in seeing "snippets" of this or that text? The only way the project could really achieve its goal of disseminating knowledge to the masses would be by ignoring copyrights and putting all texts into the public domain. Which is, of course, what the logic of the Internet ultimately wants. "Information wants to be free," according to one of the web's founding mantras.

If Google were a different company, with a different set of motivating principles, it might well have constructed its Library project along the lines of Apple's iTunes model--that is, it would have spent time and money not on perfecting a mass scanning operation designed to gobble up as many pages as possible per hour, but on securing the rights to a large catalogue of books which it could then sell as downloads. After all, it's not as though the current delivery mechanism for books is in any way optimal.

But this concept is beyond its ken. Google's corporate philosophy is based on the model which brought it success: organizing and giving away other people's content, creating space for advertisements in the process. The enormous success Google found with that model in the search engine business spurred it to try to impose it in every arena. In the Google worldview, content is individually valueless. No one page is more important than the next; the value lies in the page view. And a page view is a page view, regardless of whether the page in question has a picture of a cat, a single link to another site, or the full text of Freakonomics. When all you're selling is ad space, the value shifts from the content to the viewer. And ultimately the content is valued at nothing.

And here, finally, is the larger problem posed by Google's actions. Books are not in any important sense user-centric. Whether or not a book has readers matters little. Books stand on their own, over time, as ideas and creations. In the world of books, it is the ideas and the authors that matter most, not the readers. That is why the copyright exists in the first place, to protect the value of these created works, a value which Google is trying mightily to deny.

As much as any other American business, Google is the corporate embodiment of the Internet's first principles. And as with so much else on the Internet, the promise of Google Book Search lies somewhere off on the horizon, while the dangers it poses today are very real.

Jonathan V. Last is a staff writer at THE WEEKLY STANDARD.