Innovation and the STM publisher of the future
Bradley P. Allen, Elsevier Labs
Innovation Session, SSP IN Conference 2011
Arlington, VA, USA
2011-09-19
Peak physical media
• “Music Sales”, New York Times, 1 August 2009. http://www.nytimes.com/imagepages/2009/08/01/opinion/01blow.ready.html
• “Initial Circs per student”, William Denton, 31 January 2011. http://www.miskatonic.org/2011/01/31/initial-circs-student
• “Rise of e-book Readers to Result in Decline of Book Publishing Business”, Steven Mather, iSuppli, 28 April 2011. http://www.isuppli.com/Home-and-Consumer-Electronics/News/Pages/Rise-of-e-book-Readers-to-Result-in-Decline-of-Book-Publishing-Business.aspx
A simple model of the evolution of publishing
Print era: 1600s – 1980
• Packaged as books and articles
• Physically distributed
• Access and discovery through libraries
Digital Library era: 1980 – 2010s
• Packaged as books and articles
• Digitally distributed
• Access and discovery through search engines
Platform-as-a-Service era: 2010s
• Packaged as apps and APIs
• Digitally distributed
• Access and discovery through social networks
Facets of STM publishing in the PaaS era
Acquisition
Extract, Load and Transform
Enhancement
Indexing
Discovery and Access
Composition
Delivery
Submitting
Crawling
Syndicating
Formatting
Mapping
Cleansing
Indexing
Querying
Updating
Storing
Annotating
Subject tagging
Classification
Entity recognition
Author
Supplier
Web site
Typesetter
Automated process
Subject matter expert
Search engine
Content repository
Entity registry
Product catalog
Editor
Reviewer
User
Designer
Developer
E-book
Mobile app
Mobile-enhanced Web site
API
Entity extraction
Fact extraction
Clustering
Aggregating
Ordering
Summarizing
Filtering
Analysis
Data science
Rendering
Design
Publishing
Accessing
Retrieving
Deleting
Entity Activity
Process Type
Article
Book
Media object
Entity record
Asset metadata
Relational metadata
Provenance metadata
Usage metadata
Taxonomy
Ontology
User-generated content
Content Type
STM publishing as business intelligence
Surajit Chaudhuri, Umeshwar Dayal, and Vivek Narasayya. 2011. An overview of business intelligence technology. Commun. ACM 54, 8 (August 2011), 88-98. http://doi.acm.org/10.1145/1978542.1978562
Some scenarios to compare the two digital eras
Scenario: A new medical term relevant to an emerging healthcare issue (e.g. a new type of avian flu virus) needs to be incorporated into a search index immediately
• Digital Library era: Organizational governance issues about how taxonomies are updated, coupled with manually intensive workflows and ad hoc approaches to content tagging, inhibit rapid response
• Platform-as-a-Service era: A single, automated and standardized taxonomy management and content enhancement workflow allows rapid and timely update of search applications
Scenario: Application developers want to mash up epidemiological data with medical journal articles to create a topic-specific Web resource
• Digital Library era: Data silos without easy means of programmatic access by developers, coupled with governance and business model questions, inhibit data reuse
• Platform-as-a-Service era: Content API and single-point-of-access repository allow data and content to be accessed, discovered and reused across multiple applications
Scenario: Digital library developers want to stage content into a single repository for unified search index generation
• Digital Library era: Duplication of core content leads to synchronization and quality control issues
• Platform-as-a-Service era: Consolidation of duplicate repositories into a single point of truth across all content, accessible and discoverable through a Content API, eliminates the need for duplication and synchronization
Scenario: Third-party solutions providers want to integrate content (e.g. tagged medical journal articles, medical taxonomies) into point-of-care solutions
• Digital Library era: No standards, no APIs for point-of-care content integration across all content and data
• Platform-as-a-Service era: Standards and APIs that scale across multiple partners, for all content types, for all delivery formats
Scenario: Publishers want to deliver their content to tablets and e-readers in delivery formats that take advantage of the displays and interaction modalities on those devices
• Digital Library era: No clear standard or approach for targeting emerging eReader and tablet devices; multiple and divergent approaches lead to siloed solutions and duplication of effort
• Platform-as-a-Service era: Web and industry standards for eReader and tablet devices, supported as part of standard automated processing into delivery channel-specific formats, regularly updated and exposed through a Content API
Scenario: Journal publisher wants to integrate content enhancements across multiple subject matter areas to add value to products leveraging Article of the Future technology
• Digital Library era: No single point of access to content enhancements, no standards for content enhancement suppliers and partners to deliver enhancements for integration
• Platform-as-a-Service era: Easy access to multiple opportunities for content enhancements embedded in standard next-generation article formats and provided using standard content enhancement formats
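To make the first scenario above concrete, here is a minimal sketch of an automated taxonomy-and-tagging pass. It is not taken from the talk: the `Taxonomy` class, the `tag_document` and `build_index` functions, the sample documents and the naive substring matching are all hypothetical and chosen for illustration only. The point is that when tagging is one automated step, adding a term to the taxonomy is enough for the search index to pick it up on the next rebuild.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    """Hypothetical taxonomy: maps a preferred term to its lowercase labels."""
    terms: dict[str, set[str]] = field(default_factory=dict)

    def add_term(self, preferred: str, synonyms: set[str] | None = None) -> None:
        labels = {preferred.lower()} | {s.lower() for s in (synonyms or set())}
        self.terms[preferred] = labels


def tag_document(text: str, taxonomy: Taxonomy) -> set[str]:
    """Return preferred terms whose label or synonym occurs in the text.

    Naive substring matching is an assumption made to keep the sketch short.
    """
    lowered = text.lower()
    return {
        preferred
        for preferred, labels in taxonomy.terms.items()
        if any(label in lowered for label in labels)
    }


def build_index(docs: dict[str, str], taxonomy: Taxonomy) -> dict[str, set[str]]:
    """Rebuild an inverted index (term -> document ids) in one automated pass."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for term in tag_document(text, taxonomy):
            index.setdefault(term, set()).add(doc_id)
    return index


if __name__ == "__main__":
    taxonomy = Taxonomy()
    taxonomy.add_term("influenza", {"flu"})
    docs = {"a1": "A study of seasonal flu.", "a2": "Report on H5N1 cases."}
    print(build_index(docs, taxonomy))
    # A new term is added once, centrally; the next rebuild tags existing content.
    taxonomy.add_term("avian influenza A(H5N1)", {"H5N1"})
    print(build_index(docs, taxonomy))
```

A production workflow would replace the substring match with proper entity recognition and update the index incrementally, but the control flow (change the taxonomy, re-run one pipeline) stays the same.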
Goals for the publisher of the future
• Craft content acquisition, production and management systems that support with equal capability and flexibility a broad range of content types and delivery channels
• Make it easy for authors, editors and reviewers to work with bundles of content and data in the aggregate
• Make it easy to discover and access, across all content assets, information in fragments smaller than the unit of publication
• Then make it easy to aggregate and compose these fragments into new products and services
• Leverage the tremendous power of Web architectural standards and formats to increase the ease of content integration and interoperability
New requirements for content management
• Broad range of content types
– Must treat as first-class objects video, audio, images, datasets, metadata and knowledge organization systems in addition to articles and books
• Standards-based
– Web-standard formats to support ease of integration and interoperability
• Fine-grained
– Must be decomposable into and addressable in fragments smaller than the unit of publication; e.g., down to the level of specific words, phrases, images, table cells in articles or book chapters, key frames and segments in videos
• Discoverable
– Must be easily located across all levels of granularity
• Accessible
– Must be easily accessed through content creation, retrieval, update and deletion (CRUD) services
• Flexible
– New content types and associated schemas must be easily added through configuration
• Reusable
– It must be efficient for product developers to aggregate and compose content fragments into new products
• Modifiable
– Support the enhancement and correction of content at any time following creation
• Broad range of delivery formats
– Content standards and services must support fulfillment, delivery and presentation across desktop, notebook, tablet and mobile computing devices
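The "fine-grained", "accessible" and "reusable" requirements above can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not a description of any Elsevier system: the `ContentStore` class, the `object_id#fragment_id` address syntax, and the `compose` helper are hypothetical. It shows CRUD operations over content objects whose fragments (a table cell, a video key frame) are individually addressable and can be composed into a new product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContentStore:
    """Hypothetical in-memory store; each object is a dict of named fragments."""
    objects: dict[str, dict[str, str]] = field(default_factory=dict)

    def create(self, object_id: str, fragments: dict[str, str]) -> None:
        if object_id in self.objects:
            raise KeyError(f"{object_id} already exists")
        self.objects[object_id] = dict(fragments)

    def retrieve(self, address: str) -> str | dict[str, str]:
        """Accept 'object_id' (whole object) or 'object_id#fragment_id'."""
        object_id, _, fragment_id = address.partition("#")
        obj = self.objects[object_id]
        return obj[fragment_id] if fragment_id else dict(obj)

    def update(self, address: str, value: str) -> None:
        """Correct or enhance a single fragment at any time after creation."""
        object_id, _, fragment_id = address.partition("#")
        if not fragment_id:
            raise ValueError("update needs a fragment address")
        self.objects[object_id][fragment_id] = value

    def delete(self, object_id: str) -> None:
        del self.objects[object_id]


def compose(store: ContentStore, addresses: list[str]) -> list:
    """Aggregate fragments from several objects into a new ordered product."""
    return [store.retrieve(address) for address in addresses]


if __name__ == "__main__":
    store = ContentStore()
    store.create("article-1", {"title": "On flu", "table1.r2c3": "42"})
    store.create("video-7", {"keyframe-3": "frame at 00:01:12"})
    store.update("article-1#table1.r2c3", "43")
    print(compose(store, ["article-1#table1.r2c3", "video-7#keyframe-3"]))
```

Using one address syntax for every content type is the design choice that matters here: articles, videos and datasets all go through the same four operations, so new content types need configuration rather than new services.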
Leveraging Web standards for sharing
1. Use URIs to name things
2. Use HTTP URIs so they can be looked up
3. Return useful data when things are looked up
4. Include links to other things in the returned data
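The four rules above can be sketched in a few lines. This is a toy model under stated assumptions: the `example.org` URIs and the `WEB`, `look_up`, `links` and `crawl` names are hypothetical, the dictionary stands in for real HTTP lookups so that nothing touches the network, and the `"@id"` key is borrowed loosely from the JSON-LD convention for marking a link.

```python
from __future__ import annotations

# Rules 1 and 2: every thing is named by an HTTP URI (hypothetical ones here).
WEB: dict[str, dict] = {
    "http://example.org/article/1": {
        "type": "Article",
        "title": "A sample article",
        "author": {"@id": "http://example.org/person/1"},
    },
    "http://example.org/person/1": {
        "type": "Person",
        "name": "A. Author",
        "wrote": {"@id": "http://example.org/article/1"},
    },
}


def look_up(uri: str) -> dict:
    """Stand-in for an HTTP GET: return useful data for a URI (rule 3)."""
    return WEB[uri]


def links(record: dict) -> list[str]:
    """Collect the URIs of other things that a record points at (rule 4)."""
    return [
        value["@id"]
        for value in record.values()
        if isinstance(value, dict) and "@id" in value
    ]


def crawl(start: str) -> set[str]:
    """Follow links from one URI to discover everything reachable from it."""
    seen: set[str] = set()
    frontier = [start]
    while frontier:
        uri = frontier.pop()
        if uri in seen:
            continue
        seen.add(uri)
        frontier.extend(links(look_up(uri)))
    return seen


if __name__ == "__main__":
    print(sorted(crawl("http://example.org/article/1")))
```

Because each record carries links rather than embedded copies, a client that knows a single URI can discover related things without any prior knowledge of the publisher's schema.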
“Linked data is just a term for how to publish data on the web while working with the web. And the web is the best architecture we know for publishing information in a hugely diverse and distributed environment, in a gradual and sustainable way.”
Tennison J, 2010. Why Linked Data for data.gov.uk? http://www.jenitennison.com/blog/node/140
Shotton D, Portwin K, Klyne G, Miles A, 2009. Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article. PLoS Comput Biol 5(4): e1000361. doi:10.1371/journal.pcbi.1000361
Relational metadata
From books and articles to evolving research objects
[Diagram labels. Stages: Acquire; Transform, Enhance, Compose; Deliver. Objects: Article, Entity record, Media object, Relational metadata. Overall label: Linked data]
Leveraging consumer Web innovations
• Emergent technologies driven by consumer Web applications emphasize design choices that focus on delivering cheap, robust and scalable Web applications
– Schemaless document stores provide read/write at Web scale with support for analytics
  • For more dynamic, fine-grained content and linked data
  • For easier usage and citation analysis, bibliometrics and scientometrics
– Web application development frameworks that leverage HTML5/CSS/JS to deliver across desktops, notebooks, tablets and smartphones
– Deploying in the cloud and moving scale-out from development to operations to reduce time-to-market and cost of failure for emerging, niche publishing opportunities
• As we shift to the Platform-as-a-Service era, these features become an important part of the STM publishing technology stack
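As a rough illustration of why schemaless document stores suit usage and citation analysis, the sketch below keeps heterogeneous records in a plain Python list and aggregates over them. The records, field names and the `citation_counts` and `usage_counts` functions are hypothetical, and a real deployment would use an actual document database rather than a list; the sketch only shows that analytics can run over documents that do not share one schema.

```python
from collections import Counter

# Hypothetical records; note that they do not share one schema.
documents = [
    {"id": "article-1", "type": "article", "title": "Sample article", "cites": ["article-2"]},
    {"id": "article-2", "type": "article", "title": "Earlier work"},
    {"id": "video-1", "type": "media", "duration_s": 95, "cites": ["article-2", "article-1"]},
    {"id": "event-1", "type": "usage", "target": "article-2", "action": "download"},
    {"id": "event-2", "type": "usage", "target": "article-2", "action": "view"},
]


def citation_counts(docs):
    """Count incoming citations, tolerating documents without a 'cites' field."""
    return Counter(target for doc in docs for target in doc.get("cites", []))


def usage_counts(docs):
    """Count usage events per target document."""
    return Counter(doc["target"] for doc in docs if doc.get("type") == "usage")


if __name__ == "__main__":
    print(citation_counts(documents))
    print(usage_counts(documents))
```

New content types (here, a media object with a `duration_s` field) can be added without migrating existing records, which is the flexibility the slide attributes to this class of store.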
Examples from Elsevier: Linked Data Repository
Examples from Elsevier: SciVal
Examples from Elsevier: SciVerse
The publisher of the future as lean startup
• This stuff is not just for big publishers
• These are the tools that new consumer Internet businesses are using to create new products and services today… quickly and on the cheap
• Smaller publishers and societies can use lean startup techniques to drive app and API design and development starting from existing web presences and third-party APIs
Example: Impact metrics in Klout
Example: Content acquisition using GitHub
Example: SciVerse/Mendeley integration
Challenges for the publisher of the future
• When content can be mashed up at a fine level of granularity using multiple third-party APIs, what are the rights associated with the resulting product? What are the appropriate business models?
• What standards should there be for research objects?
• Who gets credit for research objects? How is impact determined and reputation managed?
• What is an acceptable trade-off between content flexibility and high-touch presentation design?
In summary
• STM publishing is only beginning the transition from print to online
• Articles and books are no longer sufficient containers for scholarly communication
• Tools to effect this change come from the consumer Internet and the business intelligence worlds
• Publishers of the future will leverage the best practices emerging around these tools to create innovative new products to serve their communities