Mass Digitization Projects: Celebration and Challenges
Presented to the 2nd ICUDL
Alexandria, Egypt
by
Dr. Gloriana St. Clair
Carnegie Mellon University
Thesis
• Mass digitization projects are creating a revolution in information retrieval.
• Focusing human attention must be the new research agenda.
Main Points
• History and current state
– Million Book Project
– Google Print/Book
– Open Content Alliance
• Challenges
– Technology
– Metadata
– Legal issues
• What’s next
– Organization for learning
Million Book Project
• Began in 2000
• Universal Library project
• Free to read
• Out-of-copyright; scanned with permission
• 800,000 volumes
• Funding: NSF, India, China, Internet Archive
• Pilot project in Qatar in 2007
Google Print/Book Search
• Began in 2004
• Google and a half dozen partners
• Search yields snippets, then buy or borrow book…
• Presumes its strategy respects copyright
• Funding: online advertising
• Google’s ease of use shapes expectations
Open Content Alliance (OCA)
• Alliance of non-profits and universities, including Million Book Project
• Led by Brewster Kahle, Internet Archive
• Targets in-copyright books
• Digitizes onsite in libraries
Challenges
• Technology
• Metadata
• Legal issues
Technology
• Proprietary equipment: Google, OCA
• Changing standards: grayscale, color
• Bandwidth v. cost: images v. OCR’d text
• Readability v. cost: corrected v. uncorrected
Metadata
• MARC/WorldCat: access issues; in English
• Creating on the fly: inaccurate
• Native cataloging: various standards
• Traditional cataloging: cost; suitability
• Non-book formats: no standards
Legal Issues
• Copyright is our biggest constraint
– In much of the world, a book is in copyright for the life of the author + 70 years
– U.S. book copyright renewal records are searchable online (thanks to Michael Lesk)
– Verifying copyright is time consuming and expensive
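As a rough illustration of the life + 70 rule above, here is a minimal sketch of the arithmetic. It assumes the term runs through the end of the 70th calendar year after the author's death, so a work becomes free on January 1 of the following year. The function names are hypothetical, and the sketch ignores everything that makes real verification slow and costly: jurisdiction-specific terms, U.S. renewal records, multiple authors, and works for hire.

```python
from datetime import date
from typing import Optional

# Term taken from the slide ("life of the author + 70 years").
# Assumption: protection runs to the end of the calendar year.
TERM_YEARS = 70


def first_public_domain_year(author_death_year: int,
                             term_years: int = TERM_YEARS) -> int:
    """Return the first calendar year in which the work is out of copyright."""
    return author_death_year + term_years + 1


def is_scanning_candidate(author_death_year: int,
                          today: Optional[date] = None) -> bool:
    """Rough first-pass filter only; not a substitute for a rights check."""
    today = today or date.today()
    return today.year >= first_public_domain_year(author_death_year)


if __name__ == "__main__":
    # Hypothetical authors who died in 1930 and 1950, checked as of 2007.
    for death_year in (1930, 1950):
        print(death_year, is_scanning_candidate(death_year, date(2007, 1, 1)))
```

A filter like this can only sort titles into "probably free" and "needs a rights check"; the expensive step remains the check itself.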
Copyright Strategies
• MBP: Approaches publishers to digitize entire out-of-print (o.p.) holdings, not title-by-title.
• Google: Publishers sued over snippets. Now pushes users to analog books.
• 1st ICUDL: Michael Shamos proposed machine summarization as a way to deliver content without breaking copyright.
What Will Happen to Books?
“What will happen to books?
Reader, Take heart!
(Publisher, be very, very afraid.)
Internet search engines
will set them free.”
—Kevin Kelly, 2006
What’s Next
• How will this digital repository contribute to learning, help create new knowledge and build a better future?
“Learning takes place in the head
of the student, and depends entirely
on the activities of the student.”
—Herbert A. Simon, 2002
Technology + Learning Theory
• Competition for time, attention
– Develop expert systems to assist selection
• Mastery of a discipline is now impossible
– Sampling
– Problem-solving
– Just-in-time learning
Organizing Information
• Selected search
– Discipline-specific gateways and portals
• Pattern recognition
– IF-THEN sequences
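One way to picture IF-THEN sequences assisting selection is a small rule engine in which each rule's conclusion can trigger the next rule. The sketch below is only an illustration of that idea, not a system described in the talk; the rule names, facts, and portal labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Facts], bool]   # the IF part
    action: Callable[[Facts], Facts]     # the THEN part: new facts to add


def run_rules(facts: Facts, rules: List[Rule]) -> Tuple[Facts, List[str]]:
    """Fire each rule at most once, repeating passes until nothing new fires."""
    facts = dict(facts)
    fired: List[str] = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.name not in fired and rule.condition(facts):
                facts.update(rule.action(facts))
                fired.append(rule.name)
                changed = True
    return facts, fired


# Hypothetical rules: the first depends on a fact that the second produces.
RULES = [
    Rule("novice-material",
         lambda f: "gateway" in f and f.get("level") == "novice",
         lambda f: {"material": "survey texts"}),
    Rule("pick-gateway",
         lambda f: f.get("discipline") == "history",
         lambda f: {"gateway": "history-portal"}),
]

if __name__ == "__main__":
    result, fired = run_rules({"discipline": "history", "level": "novice"}, RULES)
    print(fired)
    print(result)
```

Chaining rules this way keeps each selection step small and inspectable, which matters when the goal is to direct a learner's limited attention rather than to return everything that matches.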
Presenting Knowledge
• Creation of a dynamic pedagogy
– Engage students
– Relate concepts
– Focus on learners, learning styles
Conclusions
• Much to celebrate
– TEST BED: Critical mass of digital materials for scholars, for computer science research.
– NEW FACES, NEW IDEAS: Involvement of new partners, launching of new projects.
– ICUDL: An international group that faces similar problems and concerns, works together, and shares solutions.
Conclusions
• Next, most difficult challenge: Focusing human attention
– Selecting information
– Presenting information
– Enabling learning
• ICUDL: Let’s look forward to celebrating that victory as well, as partners.
Thank You
Dr. Gloriana St. Clair
Dean of University Libraries
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213-3890 U.S.A.