MIKE WENDLAND: U-M's entire library to be put on Google
Billion-dollar project will move text of 7 million volumes online
December 14, 2004
BY MIKE WENDLAND
FREE PRESS COLUMNIST
Google, the ubiquitous Internet search engine, is taking the University of Michigan's library from Ann Arbor to the world.
U-M and the California-based information company will announce an agreement today under which the complete text of all 7 million volumes in U-M's library will be digitized -- that is, turned into a computer-readable format -- and made instantly searchable by anyone using Google.
The massive project means that within a few years, people doing research about practically anything -- whether for a scholarly paper, a high school project or a family tree -- will be able to consult U-M's collections online almost as easily as they could if they were sitting in the landmark library building on the university's central campus.
It is the largest such digital scanning project ever undertaken, and one that promises to take online searching far beyond the traditional Web pages, news and shopping sites that make up most searches today.
"This project signals an era when the printed record of civilization is accessible to every person in the world with Internet access," said U-M President Mary Sue Coleman. "It is an initiative with tremendous impact today and endless future possibilities."
Besides digitizing U-M's massive collection, Google plans to scan parts of other research libraries, including those at Harvard, Stanford, Oxford University in England and the New York Public Library. Those projects are much smaller in scope than Google's plans for U-M. At Harvard, for example, only 40,000 of the university's 15 million volumes will be digitized.
U-M's library, often ranked among the nation's top 10 research collections, has been a leader in the drive to convert printed information into digital form, which scholars say will preserve fragile items and make it easier for researchers to find the information they want.
During the past several years, the university has scanned about 22,000 volumes, one of the most ambitious digital efforts among U.S. universities. When Google offered technology that could handle the entire collection, U-M jumped at the opportunity.
Google has a strong connection to Ann Arbor: Larry Page, one of the company's two founders, is a graduate of U-M's engineering school. He was the first recipient of the University of Michigan Alumni Society's recent engineering graduate award.
The size of the U-M undertaking is staggering. It involves the use of new technology developed by Google that greatly speeds the digitizing process. Without that technology -- which Google won't discuss in detail -- the task would be impossible, says John Wilkin, the U-M associate librarian who is heading the project.
"Going as fast as we can with the traditional means of doing this, it would take us about 1,600 years to do all 7 million volumes," he said. "Google will do it in six years."
Under the agreement, the library will get a digital copy of every book scanned. With those copies, the library can prepare special research projects, virtual exhibitions and more relevant scholarly and academic material for its students and faculty.
"If we were to do this job ourselves, it would probably cost us $600 million," Wilkin said. "That's just the human cost of preparing the material for scanning, packing it up and sending it out to vendors and then quality-control checking of the results. This is easily a billion-dollar effort."
Although a few sample volumes were to be made available online today to highlight the project, significant amounts of material from the library won't be online until mid-2005. All 7 million volumes should be digitized into the Google database sometime shortly after 2010.
For Google, digitizing the collection is part of an effort called Google Print (http://print.google.com), in which the popular search site is working to create digital databases of books, reports, manuscripts and other printed materials. The goal is for Web users accessing the search site to be able to type in a phrase or key words and be presented with direct access to in-depth research and literary material.
The prospect of expanding that effort to include U-M's 7 million items has researchers buzzing.
"It's a noble effort, and a huge undertaking," said Gary Price, editor of ResourceShelf (www.resourceshelf.com), a site geared toward information professionals. "But it's so huge a project that the concern I have is that people may be lost in a sea of possible links."
Price said he believes the project will lead to similar efforts by Microsoft and Yahoo.
"Both of them have the money and the expertise to do this," Price said, "and there are a lot more libraries around the country. They won't want Google to have this kind of an advantage over them."
Google refuses to say how many people will be at U-M doing the digitizing work. "All we can say is this is a very large project, and we will be working on it aggressively," said Susan Wojcicki, Google's director of program management.
What users will see when they search the U-M collection online depends upon whether the information is still covered by copyright. For older items, users will be able to search for and read every word on each page of a book or document. But for material under copyright, the university will put a short synopsis of the material online, with information that links to the publisher or libraries where the work can be obtained.