Posted on 27-Mar-2015
Stephen Rhind-Tutt, President
ITHAKA Sustainable Scholarship Conference
September 2011
The Challenge
By 2020 the Web will contain…???
• 90% of published works prior to 1923
• Majority of works published to 2020
• > 20 billion pages of e-mail, phone logs, databases, blogs and websites (currently 12 billion)
• > 10 billion photographs
• > 40 million pages of facsimiles of manuscripts
• > 50 million audio files
• > 500 million video files
A Darwinian environment
– SilverPlatter MEDLINE (>$10m in sales)
– Royalties to the NLM (<$200k)
– Seven other vendors also making $$$
– SilverPlatter ERIC ($1.5m in sales)
– Royalties to Dept. of Education (<$100k)
– Many other vendors
– SilverPlatter SEC Online
– No royalties going back to the SEC
What I remember of the environment in the early 1990s
– PubMed provides free access to the world
– ERIC offered free to the world
– SEC filings offered free to the world
– What’s happened to the vendors?
Environment in 2011
– Ovid and others continue to profit from public domain MEDLINE
– New entrants: SilverChair, Collexis…
– SEC filings continue to sell: Bloomberg, Yahoo and many new entrants
– Aries Systems moved into publisher services
– CSC provides free access to ERIC for all under a 5-year, $29m contract
Environment in 2011
What’s going on?
This is a commodity…
This is not a commodity
Information isn’t a commodity!
Black & White
Grayscale
24 bit color
48 bit color
100 dpi
600 dpi
JPG
TIFF
Citation
MARC Record
Dirty OCR
99.995% rekeying
Semantic Indexing
Thumbnails
100 dpi
Page
Collection
Letter
Facsimiles
Transcriptions
EAD Finding Aid
Repository
Mobile Web
TCP-IP
Information isn’t a commodity
Source: Data, Information, Knowledge, and Wisdom, Gene Bellinger, Durval Castro, Anthony Mills. http://www.systems-thinking.org/
Who, What, When, Where?
Therefore
Why?
Evolution of tasks
Fading Growing
Typesetting
Printing
Print monograph
Print directory
Public domain reprints
Simple, one database search
Rare and unpublished material
Linking
Licensing
Free materials
Semantic indexing
Process integration
Unified search software
Workflow tools
Warehousing
Community building
Asset management
Commissioning?
Editorial?
Quality?
Selection?
Speed?
With literally billions of pages…
What tools will we need?
• Beyond paper
• Higher editorial value
• High functionality
• Semantically organized
• More comprehensive
• Individually customizable
• Discipline, community centric
• Web/network centric
• Add value to public domain
– Rare, hard to find materials
– Contextual essays and supporting material
– Semantic Indexing
– Unique functionality
• Go beyond public domain
– Publish copyright material
– Persuade publishers to release key content for electronic publication
– Commission new material ourselves
ASP experience…
The American Civil War Research Database
Great functionality
Women and Social Movements
• Collaboration with the Center for the Historical Study of Women and Gender at SUNY Binghamton and ASP
• Original site is free – new content is for fee.
• Usage across the free site dipped only slightly – more usage following commercial launch.
• Added video, audio, > 200k pages, new functionality.
Be of the web
Music
Newspapers
Websites
Monographs
Primary Works
Journals
Building the network…
Unhelpful
• Legal warnings not to link
• Changing links constantly
• Disabling links
• No permanent URLs
• No crawling
• Randomly changing URLs
• Insisting on one interface and one access point
• Unattached pages
Helpful
• Visibility
• Permanent URLs
• RSS feeds
• OpenURL
• Design for multiple interfaces
• Open to crawling
• Published open APIs
• Welcome linking
• Ask others to do the same
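As one illustration of the "Helpful" list, here is a minimal Python sketch of an OpenURL-style link that other sites and library systems could point at. It is not taken from the talk. The resolver address, the function name `build_openurl` and the sample title, author and date are all hypothetical placeholders. The keys follow the OpenURL 1.0 (Z39.88-2004) key/encoded-value format for books.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical resolver address (an assumption for illustration only);
# each library or publisher would supply its own.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def build_openurl(title, author, year, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 key/encoded-value link describing a book."""
    params = [
        ("url_ver", "Z39.88-2004"),                    # OpenURL version tag
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),  # referent is a book
        ("rft.btitle", title),                         # book title
        ("rft.au", author),                            # author
        ("rft.date", year),                            # publication date
    ]
    # urlencode percent-encodes values so the link is safe to embed anywhere.
    return base + "?" + urlencode(params)


# Placeholder metadata, not a real record.
link = build_openurl("Example Civil War Diary", "Jane Doe", "1863")
print(link)

# Any other system can parse the same link back into its metadata.
parsed = parse_qs(urlsplit(link).query)
print(parsed["rft.btitle"][0])
```

Because the link carries the item's description rather than a location inside one interface, it can stay stable while the hosting site changes. That also supports the permanent URL and "welcome linking" points above.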
A Darwinian environment