One Scientist’s Wish List for Scientific Publishers
Philip E. Bourne
University of California San Diego
www.sdsc.edu/pb
http://www.slideshare.net/pebourne
Conference of ACS Editors
Jan 6, 2012, La Jolla CA
1
My {Biased} Perspective
• I am a domain scientist (computational molecular biology)
• I co-direct the RCSB Protein Data Bank
• I co-founded and am EIC of one of the Public Library of Science (PLoS) open access journals
– There must be a business model
• I co-founded a company, SciVee Inc., that is attempting to leverage the perceived changes in scholarly communication
• I support a small academic scholarly communication group
2
What Drives Me….
The Story of Meredith
3
Meredith was Successful Because….
• She is an exceptional individual
• Much of the data and knowledge she needed she had to beg for; she used the UCSD library
• She easily got access to computer time
I want others who are less smart, like myself, to be successful too.
4
What is Wrong Today?
• Formal science communication:
– Occurs too slowly
– Reaches too few people
– Costs too much
– Ignores the data
– Is very hard to reproduce
• Is stuck in the era of the printing press – we need to move Beyond the PDF and use the power of the medium
https://sites.google.com/site/beyondthepdf/
http://www.force11.org
5
It’s a Start…
• We have scratched the surface, but have yet to explore the core
• But look what we could do…
6
(Figure callouts, in order)
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
My Dream
1. User reads a paper (one view of the info)
2. Clicks on a figure which can be analyzed
3. Clicking the figure gives a composite database + journal view
4. This takes you to yet more papers or databases
The Knowledge and Data Cycle
7
Literature
Data
Methods
The Dream Can Only be Realized in the Context of the Complete Research Enterprise
8
My Current Reality
http://www.flickr.com/photos/51282757@N05/5585299226/lightbox/
9
Let Me Try and Illustrate What is Possible Through Our Own Little Contributions
10
Much of What We Try and Do Leverages Open Access Content
11
2008 Open Access: Taking Full Advantage of the Content PLoS Comp. Biol. 4(3) e1000037
Literature: OA Biosciences
• Meredith certainly benefited from the OA literature, but not enough …
• … we have a long way to go in several respects
PubMed Central Contents Nov 8, 2011
PubMed Contents Nov 8, 2011
Literature: Semantically Enriched Version of PubMed Central
http://biolit.ucsd.edu; NAR 2008 36(S2) W385-389
13
Literature: So What has Happened with This?
• The value seems to be in RESTful services that enable databases to recognize where in the literature their data are referenced
• We use this in the PDB in the following way….
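As a rough illustration of what "recognizing where in the literature data are referenced" can involve, here is a minimal Python sketch that scans a passage for PDB-style identifiers and builds a lookup link for each. It is not the BioLit or RCSB implementation. The function name and the regular expression are assumptions made for illustration, the pattern is a heuristic (a digit followed by three alphanumerics, at least one of them a letter) that a real service would validate against the database, and the URL prefix is taken from the PDB literature link shown in this talk.

```python
import re

# Heuristic for classic PDB identifiers: one digit followed by three
# alphanumerics, at least one of which is a letter (so years are skipped).
PDB_ID = re.compile(r"\b[1-9](?=[A-Za-z0-9]{0,2}[A-Za-z])[A-Za-z0-9]{3}\b")

# URL prefix as shown in the talk; treated here as a plain string.
LITERATURE_URL = "http://www.rcsb.org/pdb/explore/literature.do?structureId="


def find_pdb_mentions(text):
    """Return (identifier, character offset, link) for each candidate mention."""
    mentions = []
    for match in PDB_ID.finditer(text):
        pdb_id = match.group(0).upper()
        mentions.append((pdb_id, match.start(), LITERATURE_URL + pdb_id))
    return mentions


if __name__ == "__main__":
    passage = "The structure 1TIM (triosephosphate isomerase) was solved before 2008."
    for pdb_id, offset, link in find_pdb_mentions(passage):
        print(pdb_id, offset, link)
```

The offsets are what let a database point back to the exact place in the text where its entry is mentioned.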
14
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
Data & Literature: Integration
15
BMC Bioinformatics 2010 11:220
Literature: Word Add-in for Authors
• Allows authors to add metadata as they write, before they submit the manuscript
• Authors are assisted by automated term recognition
– OBO ontologies
– Database IDs
• Metadata are embedded directly into the manuscript document via XML tags, OOXML format
– Open
– Machine-readable
• Open source, Microsoft Public License
http://www.codeplex.com/ucsdbiolit
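To make the idea of embedding metadata as XML tags concrete, here is a minimal Python sketch that wraps recognized terms in a sentence with an element carrying an identifier. It produces plain XML, not the OOXML markup the Word add-in writes, and the element name `term`, the attribute `ref`, and the function name are all hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET


def tag_terms(sentence, terms):
    """Wrap the first occurrence of each recognized term in a <term> element.

    `terms` maps a surface string to an identifier, e.g. a database ID.
    Element and attribute names are hypothetical, not the add-in's schema.
    """
    root = ET.Element("sentence")
    root.text = ""
    last = None  # most recently added <term> element
    pos = 0      # end of the text consumed so far
    hits = sorted((sentence.find(t), t) for t in terms if t in sentence)
    for start, term in hits:
        if start < pos:
            continue  # skip a term that overlaps one already tagged
        gap = sentence[pos:start]
        if last is None:
            root.text += gap
        else:
            last.tail = gap
        last = ET.SubElement(root, "term", {"ref": terms[term]})
        last.text = term
        pos = start + len(term)
    rest = sentence[pos:]
    if last is None:
        root.text += rest
    else:
        last.tail = rest
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(tag_terms("Structure 1TIM binds ATP.", {"1TIM": "PDB:1TIM"}))
```

Because the tags travel inside the manuscript, a publisher or database can read the identifiers without re-running term recognition.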
16
BMC Bioinformatics 2010 11:103.
Literature: Knowledge Discovery
Immunology Literature
Cardiac Disease Literature
Shared Function
17
Let's Turn Our Attention to Data ….
18
Data: Thoughts from the Biosciences
• Growth – DNA sequence data are doubling every 5 months
• NLM/NCBI is amazing
• Other great resources – but a slow realization they are appearing stovepiped
• Data are undervalued (more on this)
• We need a data registry for deep search and comparison
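To put the doubling claim in perspective, a quick calculation helps. This short Python sketch assumes only the 5-month doubling time quoted above and steady exponential growth; the function name is made up for illustration.

```python
def growth_factor(months, doubling_months=5.0):
    """Multiplicative growth after `months`, given a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    print(f"1 year:  x{growth_factor(12):.2f}")
    print(f"5 years: x{growth_factor(60):.0f}")
```

Under that assumption the data grow a little more than fivefold each year and 2^12 = 4096-fold in five years, which is the scale any registry for deep search would have to absorb.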
19
Data: Data Journals – Lots of Talk
• Nature Publishing Group
• F1000
• PLoS at Large
• eLife
• Global Biodiversity Information Facility
• Some fields do it naturally – Astronomy, Earth Sciences …
http://www.astro.caltech.edu/~pls/astronomy/archives.html
20
Data: PLoS Data Pages
• To be introduced later this year
• Shared metadata description?
• Data in Dryad or equivalent
• History could repeat itself – JMS revisited
21
.. And Finally Methods
22
Methods: Is Reproducibility a Myth?
• My views of reproducibility:
– We all express the importance, but the only time it is tested is when something is truly novel or error is suspected
– Reproducibility covers a spectrum of meaning – by whom and with how much effort
– The longer the time lag the less likely something is reproducible
Nature Reviews Drug Discovery 10, 643-644 (September 2011)
23
PLoS Comp Biol Software
• Requires source be deposited in an open source public repository
• Encourages a copy of record be deposited with the article
• Requires that the reviewer be able to test the software if they wish - implies data, documentation, test parameters and output be provided for checking
Motivation: S. Veretnik, J.L. Fink, and P.E. Bourne 2008 Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 4(7): e1000136
24
Methods: Workflow Tools Might be the Answer?
Taverna
Wings
25
Methods: Workflows - Our Own Experience
• It's hard and embarrassing
• We have a working prototype using Wings
• I can feel the potential productivity gains
• My students are more doubtful
• It's been a lot of fun and will enable us to improve our processes regardless of the workflow system itself
26
Methods: Yes The Workflow is Real
27
Methods: Problems with Publishing Workflows
• Workflows are not linear
• Workflow : paper is not 1:1
• Confidentiality
• Peer review
• Infrastructure
• Community acceptance
• Reward system
• No publisher seems willing to touch them
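One way to see why non-linear workflows fit poorly into a linear paper: a workflow is better described as a dependency graph, and a graph usually admits more than one valid execution order. The Python sketch below uses the standard-library `graphlib` module (Python 3.9+); the step names are hypothetical and are not taken from the Wings prototype.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
WORKFLOW = {
    "align_structures": {"fetch_structures"},
    "cluster_sequences": {"fetch_sequences"},
    "write_report": {"align_structures", "cluster_sequences"},
}


def execution_order(steps):
    """Return one valid order in which the steps could be run."""
    return list(TopologicalSorter(steps).static_order())


if __name__ == "__main__":
    print(execution_order(WORKFLOW))
```

The two fetch-and-analyze branches are independent, so a narrative methods section has to pick an arbitrary order that the workflow itself does not impose.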
28
And Now for Something Completely Different
http://www.scivee.tv
29
Experiments in Rich Media Mashups
http://www.scivee.tv
30
Pubcast – Video Integrated with the Full Text of the Paper
http://www.scivee.tv
31
Products
Application   Product                        Primary Customers
Journals      PubCast                        Journals, publishers, societies
Meetings      PosterCast, SlideCast          Societies, conference orgs.
Comm.         PaperCast, Podcast, SlideCast  Societies, journals
Education     PosterCast, SlideCast          Societies, universities
Books         BookCast                       Publishers, book sellers
Rich Media as Scholarship
http://www.scivee.tv
32
Podium Capture (Mac, PC; Android, iPhone, Windows Phone 7)
Step 1: presenter starts PowerPoint
Step 2: presenter starts recording on smart phone
Step 3: presenter stops recording and initiates upload
Step 4: slides are uploaded
Step 5: slides and podcast are automatically synchronized
Step 6: listener plays back synchronized presentation
(Diagram labels: Slides, Podcast, Sync File, Website)
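The talk does not describe the sync file format, so the following Python sketch is only a guess at the simplest thing that could work: a sorted list of the times (in seconds) at which each slide first appears, which playback can search to decide which slide to show for a given audio position. The function name and the data layout are hypothetical.

```python
from bisect import bisect_right


def slide_at(sync_times, seconds):
    """Return the index of the slide on screen at `seconds` into the audio.

    `sync_times` is a sorted list holding the start time of each slide;
    positions before the first entry map to the first slide.
    """
    index = bisect_right(sync_times, seconds) - 1
    return max(index, 0)


if __name__ == "__main__":
    sync_times = [0.0, 42.5, 90.0]  # hypothetical slide-change timestamps
    for position in (0.0, 60.0, 200.0):
        print(position, "-> slide", slide_at(sync_times, position) + 1)
```

A binary search keeps the lookup cheap even for long talks, and it also handles a listener jumping to an arbitrary point in the recording.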
http://www.scivee.tv
33
Nothing is Going to Change Unless The Reward System Changes
The Right Thing To Do Reward
P.E. Bourne 2011 Ten Simple Rules for Getting Ahead as a Computational Biologist in Academia. PLoS Comp. Biol. 7(1) e1002001.
34
Interim Solution: Use the Traditional Reward System
The Wikipedia Experiment – Topic Pages
• Identify areas of Wikipedia that relate to the journal and are missing or are stubs
• Develop a Wikipedia page in the sandbox
• Have a Topic Page Editor Review the page
• Publish the copy of record with associated rewards
• Release the living version into Wikipedia
35
In Summary:
I Do Not Want to Do Any of This – I Want You to Do It
P.E. Bourne 2010 What Do I Want from the Publisher of the Future? PLoS Comp Biol 6(5): e1000787
36
What Does That Mean? The “Publisher” becomes Part of the Scientific Workflow
Scientist
Idea
Experiment
Data
Conclusions
Publish
Laboratory
Publisher
Maybe The Line is Somewhere Else?
uzar.wordpress.com
37
Maybe The Line is Somewhere Else?
Scientist
Idea
Experiment
Data
Conclusions
Publish
Laboratory
Publisher
Institution?
Lab Notebook
?
38
Maybe The Line is Somewhere Else??
39