Making Silent Voices Heard
Stephen Rhind-Tutt, President
Charting Vanishing Voices Workshop
June 29, 2012
1. About Alexander Street
2. The Challenge
3. The Nature of Virtual Space
4. Examples from Alexander Street
5. Partnerships and Collaboration
Agenda
1. About ASP
• Founded in 2000 by executives who used to work for Chadwyck-Healey, SilverPlatter, Wolters-Kluwer, Gale and Wilson.
• Headquartered just outside Washington DC, USA
• Offices in Stevenage, England; Shanghai, China; Kuala Lumpur, Malaysia; Sydney, Australia; Brazil; New Zealand
• 3,000 customers
• 2,500 licensors
About Alexander Street Press
Making silent voices heard…
Collaboration
More examples
2. The Challenge
The Challenge
By 2020 the web will have
• > 5 Bn users (currently 2.3 Bn - 37% of the world)
• > 90% of published works prior to 1923
• > Most works published to 2020
• > 4 Billion websites (currently 555m, 71% growth p.a.)
• > 1 Trillion photographs (Facebook adds 300m daily)
• > 100 Million pages of facsimiles of manuscripts
• > 100 Million audio files
• > 1 Billion video files (YouTube adds 72 hrs every minute)
Preservation and Access
• More than 6,500 endangered languages
• Countless cultural artifacts, audio, video, texts
• Hidden collections
• (Personal) archives
• Field Notes
• Data sets
• Little or no cataloging
• Mostly undigitized
• Decaying film and audio formats
• Increasing opportunities to embellish (HD-video, 3-D models, social annotation etc)
How are we going to do all of this?
3. The Nature of Virtual Space
“You must consult the laws of nature… You say, ‘What do you want, brick?’ and the brick says to you, ‘I like an arch,’ and you say to brick, ‘Look, I want one too, but arches are expensive…’ Brick says, ‘I like an arch’…”
“Honor the material you use”
Louis Kahn (1979)
The nature of virtual space…
• Steel – High cost to create, strong, easy to stamp shapes, medium weight…
• Wood – Low cost to create, moderately strong, needs to be crafted, light weight…
• Glass – Medium cost to create, weak, easy to craft, transparent
• The Web - ?
Understanding the medium
Nature of electronic publications
• Atomic
• Interconnected
• Interdependent
• The link matters more than the object
• Pliable
• Evolving quickly
• Unlimited in size
[Diagram: a cluster of pages]
Understanding the medium
0111010011010000101101101000101110100010001110101010101010101011111010101010101111101011100100011101
Binary
Machine Code
Assembly Code
Programming languages
C++, PERL, VB, etc…
Understanding the medium
Communications Protocols – TCP-IP, Modems
Display Standards – Super VGA
Font Standards – Postscript
Plug-in standards – Java
Browser Standards – IE 7.0
Document formats - PDF
Mark-up Standards – SGML, XML, HTML
Image Standards – JPG, TIFF, etc, etc
Understanding the medium
Phone standards – 3G, 4G, 5G
Foursquare
Twitter – local, custom, news
Network protocols – 802.11
Map Standard - Google Maps, Open Map
iOS, Android
Devices – Nook, Kindle, iPad
Video Standards – H.264, Silverlight, Flash
Evolving quickly
• Processing speed – by 2015, machines will be 4 times more powerful than today’s.
• Storage space – by 2015, 20 Terabytes of storage (8 Bn pages) will cost under $100
• More than 90% of the developed world will have Web access
• Significant improvements in the developing world
• Phone bandwidth > 1.5 Mb/s
On current trends…
Evolving quickly
On current trends…
Year    Hard Disk Size (MB)
1988    20
1990    40
1991    80
1993    160
1994    320
1996    640
1997    1,280
1999    2,560
2000    5,120
2002    10,240
2003    20,480

Year    Hard Disk Size (MB)
2000    20,000
2002    40,000
2003    80,000
2005    160,000
2006    320,000
2008    640,000
2009    1,280,000
2011    2,560,000
2012    5,120,000
2014    10,240,000
2015    20,480,000
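Both tables follow the same simple rule: capacity doubles at each step, and the steps alternate between two years and one year apart. A minimal Python sketch of that rule, assuming exact doubling; the function name `project_capacity` and its parameters are illustrative and do not come from the talk:

```python
def project_capacity(start_year, start_mb, doublings, gaps=(2, 1)):
    """Return (year, size_mb) rows, doubling the size at every step.

    The year advances by the values in `gaps`, cycled in order, which
    reproduces the alternating two-year / one-year spacing in the tables.
    """
    rows = [(start_year, start_mb)]
    year, size = start_year, start_mb
    for step in range(doublings):
        year += gaps[step % len(gaps)]
        size *= 2
        rows.append((year, size))
    return rows


# Rebuild the second table: 20,000 MB in 2000, doubled ten times.
for year, size_mb in project_capacity(2000, 20_000, 10):
    print(f"{year}  {size_mb:>12,}")
```

Ten doublings multiply the starting figure by 1,024, which is why each table ends roughly a thousand times larger than it began.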
Where we’re headed…
After Data, Information, Knowledge, and Wisdom, Gene Bellinger, Durval Castro, Anthony Mills. http://www.systems-thinking.org/
Who, What, When, Where?
Therefore
Why?
Value in the electronic world is about...
Understanding electronic products
“The manner in which or the efficiency with which something reacts or fulfills its intended purpose”
Webster’s Unabridged
What do we need to do?
• Comprehensive - everything on the network
• Everyone on the network
• Local and personal (unique verified identity)
• Ubiquitous access (everywhere, all devices)
• High quality (peer review)
• Workflow integration and analysis (deep links to relevant content and tools)
• Maximize efficiencies (easy ingestion and dissemination)
• Real time currency
Devices
Inbound Discovery
Quality – Bandwidth, Encodes, # of pixels, Sampling
Tools – Transcripts, Subtitles, Chaptering, Translation, Usage Stats
Permissions – Privacy, Permissions, Anonymity, Shibboleth
Indexing – MARC, Semantic, Controlled vocabularies
Outbound Discovery – API Harvesting, Promotion, Conferences, Adsense, E-mail, Mailings
Ingestion – Scanning, Uploading, Data Crosswalking
Community – Peer Review, Crowdsource, Annotation, Playlists
Producing – Filming, Recording, Licensing, Writing, Commissioning
Evolution of tasks
Fading / Growing
Typesetting
Printing
Compiling Directories
Simple, One database Search
Rare and unpublished material
Inbound discovery
Republishing public domain
Process integration
Workflow tools & apps
Warehousing
Community Building
Outbound discovery
Automated ingestion and tagging
Human tagging
Permissions
Evolution of tasks
Fading / Growing
Typesetting
Printing
Compiling Directories
Simple, One database Search
Rare and unpublished material
Inbound discovery
Licensing?
Republishing public domain
Process integration
Workflow tools & apps
Warehousing
Community Building
Outbound discovery
Automated ingestion and tagging
Human tagging
Commissioning?
Editorial?
Quality?
Selection?
Permissions
Marketing?
4. Examples
Searchability
Make video searchable…
30 minutes of news = 12 double-spaced pages
5 minutes to read in depth
2 minutes to scan
Great functionality
Let it be embedded in courses
Annotation
Studio
Inbound discovery
Be of the web
Music Newspapers
Websites
Monographs
Primary Works
Journals
Library Branded Interface
Embeddable Search Box
Major Collections Individual Titles
Federated Search Engines
Make it accessible widely…
Indexing, discovery and analysis
The strain on keyword search…
Questions
• Google: Martin Luther King – 8.3m hits (2005), 32.5m (2012)
• Google Scholar: 202k hits, options to restrict:
– Article
– Legal document
– Date range (year published)
– Patent or Citation
‘Semantic’ Indexing
Collection
Series
Book or Volume
Chapter
Page
Word
Where? When?
What? Who?
Traditional indexing >
‘Semantic’ indexing >
Increases in Utility
Access – “Do you have the book titled…”
Keyword Search – All mentions of ‘Star Wars’
Fielded Search – All mentions of ‘Star Wars’ in texts about Reagan published in 1985
Semantic Search – All mentions of ‘Star Wars’ by Reagan in speeches he delivered in 1985
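One way to picture the difference between these levels is to run the same question against a tiny collection at each level. The Python sketch below is illustrative only: the `Item` fields, the placeholder records and the three function names are assumptions made for this example, not Alexander Street's data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Item:
    item_id: int
    text: str                      # full text (placeholder wording)
    about: str                     # catalogue field: subject of the text
    year_published: int            # catalogue field
    speaker: Optional[str]         # semantic field: who delivered the words
    genre: str                     # semantic field: "speech", "article", ...
    year_delivered: Optional[int]  # semantic field


# Invented placeholder records, for illustration only.
ITEMS = [
    Item(1, "... Star Wars ...", "Reagan", 1985, "Reagan", "speech", 1985),
    Item(2, "... Star Wars ...", "Reagan", 1985, None, "article", None),
    Item(3, "... Star Wars ...", "Cinema", 1977, None, "article", None),
    Item(4, "... budget ...", "Reagan", 1985, "Reagan", "speech", 1985),
]


def keyword_search(term: str) -> List[int]:
    """Every item whose text mentions the term."""
    return [i.item_id for i in ITEMS if term.lower() in i.text.lower()]


def fielded_search(term: str, about: str, year_published: int) -> List[int]:
    """Keyword match narrowed by catalogue fields."""
    return [i.item_id for i in ITEMS
            if term.lower() in i.text.lower()
            and i.about == about
            and i.year_published == year_published]


def semantic_search(term: str, speaker: str, genre: str,
                    year_delivered: int) -> List[int]:
    """Keyword match narrowed by who said it, in what form, and when."""
    return [i.item_id for i in ITEMS
            if term.lower() in i.text.lower()
            and i.speaker == speaker
            and i.genre == genre
            and i.year_delivered == year_delivered]


print(keyword_search("Star Wars"))                             # [1, 2, 3]
print(fielded_search("Star Wars", "Reagan", 1985))             # [1, 2]
print(semantic_search("Star Wars", "Reagan", "speech", 1985))  # [1]
```

Each level needs more indexing work up front, and each returns a shorter, more relevant list.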
• Identify and divide texts into content elements (e.g. letter, diary entry…)
• Identify key concepts for these elements (e.g. authors, sources, battles, encounters…)
• Index both elements and associated concepts
• Integrate to form a cohesive whole
• Unique ways of browsing through concepts
• Unique ways to ask questions
What is Semantic Indexing?
Semantic Indexing…
Encounter – Encounter Name, Cultural Groups, Estimated # of people, Start year, Start month, Start day, Location, Expedition, Encounter Type, Fatalities, Etc…
Author – Name, Date of birth, Place of birth, Date of death, Place of death, Nationality, Religion, Sexual Orientation, Occupation, Etc…
Source – Source, Editor/Translator, Original Language, Publisher, Publication Date, Publication Place, Subject of Work, Etc…
Document – Text, Author ID, Encounter ID, Source ID, Date, Subject, Age writing, Etc…
Semantic Indexing…
“Show me writings by Jesuits, originally written in French, that discuss trade involving the Huron.”
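A question like this can only be answered when each document is linked to its author, source and encounter records. The sketch below, in Python, shows the idea under stated assumptions: the class and field names loosely follow the Encounter, Author, Source and Document lists on the slide, treating "Jesuit" as an occupation is a guess, and every record is an invented placeholder rather than real data from Early Encounters in North America.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Author:
    author_id: int
    name: str
    occupation: str


@dataclass(frozen=True)
class Source:
    source_id: int
    title: str
    original_language: str


@dataclass(frozen=True)
class Encounter:
    encounter_id: int
    name: str
    cultural_groups: Tuple[str, ...]


@dataclass(frozen=True)
class Document:
    doc_id: int
    author_id: int
    encounter_id: int
    source_id: int
    subjects: Tuple[str, ...]


# Invented placeholder records, keyed by ID, for illustration only.
AUTHORS = {1: Author(1, "Author A", "Jesuit"),
           2: Author(2, "Author B", "Trader")}
SOURCES = {1: Source(1, "Source X", "French"),
           2: Source(2, "Source Y", "English")}
ENCOUNTERS = {1: Encounter(1, "Encounter P", ("Huron",)),
              2: Encounter(2, "Encounter Q", ("Group B",))}
DOCUMENTS = [
    Document(10, 1, 1, 1, ("Trade",)),     # satisfies all four conditions
    Document(11, 1, 2, 1, ("Trade",)),     # different cultural group
    Document(12, 2, 1, 1, ("Trade",)),     # author is not a Jesuit
    Document(13, 1, 1, 2, ("Trade",)),     # source not originally French
    Document(14, 1, 1, 1, ("Religion",)),  # different subject
]


def find_documents(occupation: str, language: str,
                   subject: str, cultural_group: str) -> List[Document]:
    """Follow each document's IDs to its linked records and test all four."""
    return [
        d for d in DOCUMENTS
        if AUTHORS[d.author_id].occupation == occupation
        and SOURCES[d.source_id].original_language == language
        and subject in d.subjects
        and cultural_group in ENCOUNTERS[d.encounter_id].cultural_groups
    ]


matches = find_documents("Jesuit", "French", "Trade", "Huron")
print([d.doc_id for d in matches])  # [10]
```

None of the four conditions is a keyword in the text itself; each is a controlled field on a linked record, which is what lets the question be asked at all.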
Early Encounters in North America
Fauna and Flora
Geophysical, Natural Phenomena
Peoples
Personal & Cultural Events
Specific entry points for American Indian Studies…
Encounter database
Early Encounters in North America
• More than a way to answer questions
• A framework by which users can be guided to understand, explore, discover and learn.
• A route-map to guide users through data - saving time and effort.
• The intellectual fabric by which information should be organized…
• Answers questions that cannot be asked elsewhere
• Discipline specific
• Oriented towards the user and the content
• At the ‘right’ level
• Thoroughly controlled
• Metadata should be open
Semantic Indexing…
Outbound discovery
Higher value linkages…
Loosely Held Tightly Held
Free Websites
Loosely integrated
Tightly integrated
Refuse to License
License widely
License widely and be a Licensor
• Higher value links
• Semantic indexing and keyword searching of more than 3,000 oral history collections.
• Represents the personal histories of some 300,000 people.
• Value:
– Context
– Selection
– Search Power
– Licensed material
– Integration
Higher value linkages…
Context and Selection
Search Power
Organized Results
Building the network…
Unhelpful
• Legal warnings not to link
• Changing links constantly
• Disabling links
• No permanent URLs
• No crawling
• Randomly changing URLs
• Insisting on one interface and one access point
• Unattached pages

Helpful
• Visibility
• Permanent URLs
• RSS feeds
• OpenURL, Open Metadata
• Design for multiple interfaces
• Open to crawling
• Published open APIs
• Welcome linking
• Ask others to do the same
5. Partnerships & Collaboration
Where will the £££ come from?
JSTOR – $52m revenues in 2010
American Memory
Women and Social Movements
• Collaboration between the Center for the Historical Study of Women and Gender at SUNY Binghamton and ASP
• Original site is free; new content is for a fee.
• Usage across the free site dipped only slightly, with more usage following the commercial launch.
• Added video, audio, > 200k pages, new functionality.
• We’re engaged in a leviathan task
• Money is needed
• For-fee content can sit alongside open content
• Publishers can help
• Need for collaboration and openness
Summary
• It will all be available in digital form
• It will not cost too much
• Many more people will use it
• It will be enriched through better display, better integration, better links, better context, etc.
Good for publishers
Good for academics
Good for “society”
Where we’re headed…
www.alexanderstreet.com