Google and Its Enemies

The much-hyped project to digitize 32 million books sounds like a good idea. Why are so many people taking shots at it?

Dec 10, 2007, Vol. 13, No. 13 • By JONATHAN V. LAST

In 1998 Larry Page and Sergey Brin founded a company called Google, about which you likely know quite a bit. The outgrowth of work Page and Brin began in 1996 on hypertextual search engines, Google has moved from darling little high-concept innovator to Microsoft-like behemoth in record time. Google employs over 15,000 people, has a stock price hovering near $700 a share, and is the all-powerful advertising and search force on the Internet. It is gradually pushing and purchasing its way into entertainment, business software, and even the cellular telephone market.

Before Page and Brin started Google, however, they were graduate students working on Stanford's Digital Library Technologies project, which sought to digitally store and catalogue books, newspapers, and scholarly journals. Page, in particular, seems to carry a torch for this endeavor. In 2002 he approached his alma mater, the University of Michigan, about digitizing the library. It was the birth of the Google Library Project, one of the most ambitious undertakings in the history of the written word. It was also a move that would create for Google--a company obsessed with its own beneficence--a crowd of enemies.

In July 2004, Google began quietly scanning and digitizing Michigan's library. Five months later, in December 2004, the company officially announced the "Google Print for Libraries" project. (After the effort hit snags and received some bad press, it was rebranded "Google Book Search.") Google partnered with five major libraries--Michigan, Stanford, Harvard, Oxford's Bodleian, and the New York Public Library--in an attempt to scan the pages of 15 million volumes. These digital books would be kept and indexed in a Google database, which would be made available, for free, to the public.

The scope has changed in the intervening years. Initially Google planned to scan the 15 million books in six years. That projection was revised upwards to more than 20 million books, and the New Yorker recently reported that Google is now aiming to scan at least 32 million books, besting the number of titles in the largest bibliographic database, WorldCat. It hopes to finish within ten years. As one Googlehead told the New Yorker's Jeffrey Toobin, "I think of Google Books as our moon shot."

It remains to be seen how realistic this goal is. Google will not divulge how many books it is scanning currently, or how many titles are already in its database, which went live to the public in May 2005 at books.google.com. To get a rough sense of things, the University of Michigan library has 7 million volumes and Google estimates it will have annexed them all by 2013, noting that it is scanning tens of thousands of books each week. Google will not reveal how it scans the books. As for the cost, this too is closely guarded by Google. In a similar venture, Microsoft is spending $2.5 million to scan 100,000 books; if that scale were to hold, Google might spend as much as $800 million.
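The $800 million figure is a straight extrapolation, and it can be checked in a few lines. The sketch below is only an illustration: it assumes Microsoft's per-book cost would carry over unchanged to Google and uses the 32 million target reported by the New Yorker. The function name is invented for this example.

```python
def projected_scan_cost(ref_cost_usd: float, ref_books: int, target_books: int) -> float:
    """Scale a known scanning budget linearly to a larger number of books."""
    cost_per_book = ref_cost_usd / ref_books  # Microsoft: $2.5M / 100,000 books
    return cost_per_book * target_books


if __name__ == "__main__":
    # Figures quoted in the article; the linear scaling is an assumption.
    estimate = projected_scan_cost(2_500_000, 100_000, 32_000_000)
    print(f"${estimate:,.0f}")  # $25 per book times 32 million books
```

At $25 a book, the original 15 million target would have come to $375 million on the same assumption.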

Google has also expanded its list of library partners to include 13 additional libraries, ranging from the Bavarian State Library to the University of Virginia. Most of the agreements are private, so it is unclear what the participating institutions get from the deal, other than a digital copy of books they already own. For Google, the potential upside must seem enormous: The ebook movement of a few years ago failed but the Holy Grail of the digital library movement remains a massive archive of books, all searchable, which can be accessed from anywhere on the planet. Already a company called OnDemandBooks has created a machine called "Espresso" which can take the digital text of a book, print it, and bind it into soft cover in about four minutes. The commercial promise--and downright coolness--of Google's undertaking staggers the mind. Which is why many recent accounts of the project, from Toobin's to Jason Epstein's in the New York Review of Books to Michael Hirschorn's in the Atlantic, vibrate with fidgety, egg-headed excitement.

Not everyone is thrilled, though. As a class, users seem underwhelmed by the product itself, poking fun on blogs at the page-scans, the titles included, and the odd results that appear in response to search queries. Google's book-reader interface is unwieldy: It is difficult to navigate through the books; what may be read is full of poorly explained limits; and "page unavailable" messages often appear in the middle of books. Some books are presented without advertisements. Others have ads embedded in the browser window, which appear to run on a keyword algorithm similar to Google's AdWords service. The entry for Mark Twain's Life on the Mississippi, for instance, carries ads for sightseeing tours on the Mississippi River and a volume from Twain's collected works.

Nor is everyone pleased by the idea of Google's online library. Just three days after Google announced the project, the president of the American Library Association took to the pages of the Los Angeles Times to proclaim the superior value of bricks-and-mortar libraries and caution against irrational Google exuberance: "This latest version of Google hype will no doubt join taking personal commuter helicopters to work and carrying the Library of Congress in a briefcase on microfilm as 'back to the future' failures, for the simple reason that they were solutions in search of a problem."

Competitors have also appeared. Amazon.com has scanned hundreds of thousands of books which can be accessed on the website and last month introduced its version of the ebook, called the "Kindle." As of now, it makes available 90,000 books for purchase and download. In 2005, Microsoft and the Alfred P. Sloan Foundation formed the Open Content Alliance, in conjunction with such institutions as the Boston Public Library and Johns Hopkins University. Google's chief competitor in the search engine business, Yahoo!, provides web hosting for the OCA. The publisher HarperCollins announced that it would scan 20,000 of its titles and provide the texts to all search engines, gratis.

On a much grander scale, the governments of China and India joined with the Library of Alexandria and eight U.S. universities on a "Million Book Project." They are moving aggressively: China has 18 digitization centers up and running, India has 22. Part of this consortium, Carnegie Mellon's "Universal Library," already has about 500,000 books digitized.

In Europe, the reaction to Google was striking. Jean-Noël Jeanneney, president of the Bibliothèque Nationale de France, wrote an op-ed that became a book, Google and the Myth of Universal Knowledge. It principally attacked Google's library project as a piece of Anglo-Saxon cultural imperialism. Jeanneney's book, which has been translated into several languages and sold briskly, is full of irritatingly French clichés. He laments the Monica Lewinsky affair and shakes his head in bewilderment at George W. Bush's reelection. At one point he worries that "English .  .  . if not contained, will become ever more dominant," because of projects such as Google Book Search. He did, however, prod some Europeans into taking Google seriously. The French Ministry of Culture has signed up some 30 libraries to its own digital library project. European governments are even contemplating the creation of a state-owned search engine--the embryonic project is called "Quaero"--with an eye toward competing with Google. The model Jeanneney cites for this endeavor is Airbus.

And then there are the lawsuits. The Google Library is composed of two different tracks, the "Partner Program" (originally called the "Publisher Program") and the "Library Project." Under the Partner Program, authors and publishers can volunteer their works for inclusion in the Google database. In return, they're given a portion of the revenue Google generates from ads that appear on pages featuring their books. A number of authors and major publishers have joined up, including Simon & Schuster, Penguin, and McGraw-Hill. Books scanned under the Partner Program will not give viewers access to the full text, but rather to a few pages on either side of the search result.

The legal problems lie with the Library Project. Copyright has its foundations in English law and the Licensing Act of 1662. The falling costs of printing had created rampant book piracy in England. Concerned that such behavior would blunt creativity and harm the book business, Charles II established a register of licensed books to protect authors and publishers. A hundred years later, the copyright was the only right the Founding Fathers gauged important enough to recognize explicitly in the Constitution itself. In the intervening years, it has evolved somewhat. Today, works published before 1923 are generally in the public domain. There are exceptions and complexities, but works published after 1978 are protected by copyright for 70 years from the author's death. As for works published between 1923 and 1978, they were given an original copyright protection of 28 years from first publication and another 67 years of protection upon renewal of the copyright. Got that?
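For anyone who did not quite get that, the three rules can be written out as a small function. This is a rough sketch of the simplified scheme described above, not a statement of the law: it ignores the "exceptions and complexities," treats 1978 itself as part of the middle period (an assumption about the boundary), and uses a function name made up for the illustration.

```python
from typing import Optional


def last_year_of_protection(pub_year: int,
                            author_death_year: Optional[int] = None,
                            renewed: bool = True) -> Optional[int]:
    """Return the final year of copyright, or None if the work is public domain."""
    if pub_year < 1923:
        return None  # published before 1923: generally public domain
    if pub_year > 1978:
        if author_death_year is None:
            raise ValueError("terms after 1978 depend on the author's death year")
        return author_death_year + 70  # 70 years from the author's death
    # 1923 through 1978: 28 years, plus 67 more if the copyright was renewed
    return pub_year + 28 + (67 if renewed else 0)


if __name__ == "__main__":
    print(last_year_of_protection(1883))                 # Life on the Mississippi
    print(last_year_of_protection(1950))                 # renewed mid-century title
    print(last_year_of_protection(1950, renewed=False))  # same title, never renewed
```

The middle case shows why so many books are in copyright but out of print: a renewed title from 1950 stays protected for 95 years, long after most books have left the shelves.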

And here lies Google's dilemma: Out-of-copyright books account for about one-sixth of all titles. Most books--75 percent of them--are in copyright, but out of print. Only about 10 percent of all books are both copyrighted and in print. Google has decided to get around this problem of copyright protection by simply ignoring it: forging ahead and scanning books, regardless of their copyright status. If a book is in the public domain, its full text is displayed to users, but if the book is protected, then Google shows users only a "snippet" of the text surrounding the search result. It is relevant to note that "snippet" is Google's word and is intentionally not a legal term; how much text is displayed is entirely at Google's discretion.

Concerned by this imposition on the copyright, authors and publishers began complaining to Google in mid-2005. That August, Google announced that it would suspend the scanning of copyrighted works for three months so as to allow copyright holders to "opt out" of the program and keep their works out of the database. A month later, the Authors Guild filed suit in federal district court in New York, within the Second Circuit, on the grounds of copyright infringement; a month after that, a group of publishers filed a separate suit on similar grounds.

Many of the publishers party to this suit were also, coincidentally, working with Google under the Partner Program. The publishers are seeking only to stop Google from scanning books without explicit permission; the Authors Guild seeks damages as well. As the Guild's Paul Aiken told the New Yorker, "Google is doing something that is likely to be very profitable for them, and they should pay for it. It's not enough to say that it will help the sales of some books. If you make a movie of a book, that may spur sales, but that doesn't mean you don't license the books." Both cases are winding their way slowly through the courts.

Google has, as they say, all the right enemies. Anytime the ALA, Microsoft, France, a trade guild, and a bunch of trial lawyers are lined up on one side of an argument, the other side is going to look extremely attractive. And there is a seductive appeal to the idea of Google Book Search, to the dream of having millions of books at your fingertips. Yet there are aspects of the project that should give us pause.

Google's Wal-Mart-like obsession with secrecy does not engender trust in either its practices or arguments. As silly as most of Jean-Noël Jeanneney's broadside against Google is, it's easy to see why a book search without transparency of either its data set or its search algorithm would be suspicious and not obviously objective. Page and Brin admitted as much in the research paper that became the foundation of Google, "Anatomy of a Large-Scale Hypertextual Web Search Engine." They wrote:

The goals of the advertising business model do not always correspond to providing quality search to users. .  .  . For this type of reason and historical experience with other media, we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of consumers.

Free-market competition should lessen this concern, of course. And, as previously mentioned, a number of competitors to Google have materialized. But Google's principal advantage is that its competitors have abided by the letter of intellectual property law and not scanned copyrighted materials without the express permission of the owners. Google's willingness to flout the law is the actual source of its competitive advantage.

To defend this advantage, Google has adopted a legal defense aimed straight at copyright law. The defense is multipronged, but the two most startling aspects relate to the establishment of the "opt out" option for copyright owners and Google's claim of a transformative nature to the Book Search. Each challenges the current understanding of the copyright in a fundamental way.

Google maintains that by giving copyright owners the chance to opt out of the program, it has performed due diligence with respect to the copyright. This turns traditional law--which stipulates that someone wanting to use copyrighted material must seek and receive affirmative permission--on its head. Yet Google has found a slim precedent in the 2006 case Field v. Google.

Blake Field sued Google for copying and caching 51 works from his website. The court ruled in Google's favor, citing in particular the ease of Google's "opt out" feature, but the decision was based in part on dubious grounds. The court said that Field had "invited" Google's spiders--web robots which crawl through the Internet cataloguing and indexing pages for a search engine--by not including code on his website which discouraged them. In other words, by not telling Google to stay away, Field was asking to have his copyright violated. It's the intellectual property version of "She wore a red dress to the bar on Saturday night."
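What "code on his website which discouraged them" can look like is easy to show. One common convention is a robots.txt file, which a well-behaved spider consults before copying anything. The sketch below uses Python's standard urllib.robotparser module to ask whether a given spider may fetch a page. The helper name and the example.com address are invented for the illustration, and nothing here is meant to describe the specific mechanism at issue in Field.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether robots.txt rules permit this spider to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # read the rules from text, no network
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/essay.html"  # hypothetical address
    # A site with no rules at all: the spider treats silence as consent.
    print(crawler_allowed("", "Googlebot", page))
    # A site that tells every spider to stay away.
    print(crawler_allowed("User-agent: *\nDisallow: /", "Googlebot", page))
```

The default is the whole argument in miniature: a site that says nothing is treated as open, and only an explicit instruction keeps the spider out.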

In another part of the decision, the court ruled that Field's works were only a thimbleful of the "billions" Google had copied, and, presumably, Google had cached many of those without permission, too. The sheer volume of the copying provides them cover, since no one entry stands out in the sea. The violation of one copyright is a crime, the violation of 20 million is a statistic. There's an evident weakness in Google's citing this legal argument: In the relatively closed system of Google Book Search, most of the entries will likely be from protected works used without permission. In the Field decision, moreover, the court made much of the fact that works were copied by automated spiders and that there was "no evidence of any market for Field's works." Neither is true in the case of the book-scanning project.

The Internet has become, like the 17th-century printing press, incapable of observing copyrights. In the same way the printing press encouraged the mass production of books and magazines and newspapers, the Internet cries out for the distribution of all information--everything from blog entries to pictures to books. And as it distributes all of this information, it exerts a leveling force that diminishes the value of everything it touches. There is no reason that the Internet, unlike the printing press before it, should be exempt from the same protections of creative value. Yet, this is what Google's defense would achieve.

If the copyright protection is shifted so that it must be invoked--precisely what Google's "opt out" policy establishes--it will become the burden of holders. They will have to find and petition all those using their works to cease and desist. Georgetown Law professor Jonathan Band dismisses this concern in the course of a measured, intriguing defense of Google in the journal Plagiary. Band writes, "As a practical matter .  .  . only a small number of search engine firms have the resources to engage in digitization programs on the scale of Google's Library Project." But this is an odd argument: So long as only Google infringes on the copyright, then it should be allowed to do so, because opting out will only be a burden if everyone else is allowed to infringe on the copyright, too.

The second, larger, aspect of Google's defense is that Google Book Search is a "transformative work," which would provide for the fair use of previously copyrighted material. It might seem obvious that creating an index of protected works--whose primary value and advantage lies in the number of works in the set--and simply allowing users to search it, is not "transformative." Google Book Search is in important ways similar to LexisNexis, the search database which catalogues newspaper, wire service, and magazine articles. LexisNexis pays content providers for the right to include their material, even though all it does is aggregate that material and render it searchable. The copyright protection of this material was solid enough that the Supreme Court decided in favor of freelance writers who sought compensation for this electronic reuse of their materials in the 2001 case New York Times Co. v. Tasini.

Tasini is not perfectly on-point because LexisNexis gives the full text of written works to paying customers where Google is proposing to give only snippets to its users. Here Google finds redoubt in the 2003 case Kelly v. Arriba Soft. Photographer Leslie Kelly sued Arriba Soft because its search engine copied photographs posted on his website, created thumbnail-sized versions of them, and placed them in its search index. The Ninth Circuit found that Arriba's copying and usage met fair-use standards because the searchable thumbnails constituted a transformed work. (The court also voiced the red dress and thimble arguments that would be later brought to bear in Field.)

This ruling would seem to offer comfort to Google because there is some similarity between Kelly's thumbnail images and the snippets of copyrighted books Google is giving away--both are abstractions of larger works and neither eliminates the need for the original. It assumes, however, that the violation of the copyright occurs when Google gives material to the user. In reality, the infringement occurs when Google scans and archives an entire book without permission. It is the presence of millions of these whole, copyrighted books inside Google's database that creates commercial opportunities, albeit indirect ones, for the company. If Google Book Search included only works in the public domain, it would be almost indistinguishable from its competitors.

Google has tried to sidestep this problem by promising not to run advertisements on the snippet-delivering pages of copyrighted books. But the presence of the protected works in the database is what renders the ad space on the public domain book pages so valuable. And Google's promise of access to millions and millions of protected works is what creates the commercial opportunity for the rest of the project. If the courts do not recognize this principle, Google will have changed the landscape of intellectual property law.

So where does Google go from here? The lawsuits fall in the Second Circuit. If the court finds against Google, it may produce a conflict with the Ninth Circuit, a conflict the Supreme Court may decide to resolve. It's also possible that Google will buy its way out of the problem and make a deal with the publishers and the Authors Guild. There is additional incentive because such a settlement could function as a high barrier to entry and keep the competing enterprises from beginning to use protected works.

If the courts were to find against Google, however, the Book Search would likely die on the vine. As Georgetown's Band notes, it would be extremely difficult to construct a licensing regime for books modeled on the ASCAP/BMI models for musical compositions. And if Google were to try to go legit, the transaction costs of identifying, locating, and contacting copyright holders to seek permission could easily stretch to tens of billions of dollars. Band puts the best guess in the neighborhood of $25 billion.

Yet even if Google finds a way to realize its dreams, it's unclear exactly how useful the Book Search would ever be for the average user. Is there value in seeing "snippets" of this or that text? The only way the project could really achieve its goal of disseminating knowledge to the masses would be by ignoring copyrights and putting all texts into the public domain. Which is, of course, what the logic of the Internet ultimately wants. "Information wants to be free," according to one of the web's founding mantras.

If Google were a different company, with a different set of motivating principles, it might well have constructed its Library project along the lines of Apple's iTunes model--that is, it would have spent time and money not on perfecting a mass scanning operation designed to gobble up as many pages as possible per hour, but on securing the rights to a large catalogue of books which it could then sell as downloads. After all, it's not as though the current delivery mechanism for books is in any way optimal.

But this concept is beyond its ken. Google's corporate philosophy is based on the model which brought it success: organizing and giving away other people's content, creating space for advertisements in the process. The enormous success Google found with that model in the search engine business spurred it to try to impose it in every arena. In the Google worldview, content is individually valueless. No one page is more important than the next; the value lies in the page view. And a page view is a page view, regardless of whether the page in question has a picture of a cat, a single link to another site, or the full text of Freakonomics. When all you're selling is ad space, the value shifts from the content to the viewer. And ultimately the content is valued at nothing.

And here, finally, is the larger problem posed by Google's actions. Books are not in any important sense user-centric. Whether or not a book has readers matters little. Books stand on their own, over time, as ideas and creations. In the world of books, it is the ideas and the authors that matter most, not the readers. That is why the copyright exists in the first place, to protect the value of these created works, a value which Google is trying mightily to deny.

As much as any other American business, Google is the corporate embodiment of the Internet's first principles. And as with so much else on the Internet, the promise of Google Book Search lies somewhere off on the horizon, while the dangers it poses today are very real.

Jonathan V. Last is a staff writer at THE WEEKLY STANDARD.