Rin goble-published
Data sharing
Data management
The SysMO-SEEK Story
Professor Carole Goble FREng FBCS CITP
University of Manchester
13 teams, 91 institutes, 300 scientists
Multi-site, multi-disciplinary
Each of three years' duration
Data generation
Data consumption
Data analysis
Data management: Local – Shared – Long term
Pan European Systems Biology
http://www.sysmo.net
Legacy
Own data solutions: wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial … files and spreadsheets.
Suspicion
Extreme caution over sharing. Modeller vs experimentalist tribalism.
Dynamics
Many institutions, many projects, overlapping memberships, changing membership. Projects ending, starting, carrying on the same, carrying on differently.
Skills
Expert scientists, inexpert informaticians. Few resources.
Data
Patchy standards, incomparable data, afterthought.
Scientist Lab Collaborators Competitors
Programm
ePublished
Post-Publication
Pre-Publication
Data mine-ing
“my impression of researchers, and I can criticize myself in this, is that we’re much more interested in sharing data when we mean sharing somebody else’s as opposed [to] sharing ours.”
E-infrastructure - taking forward the strategy, RIN report, 2010
Rewards
Competitive advantage. Adoption. Kudos & Credit. Help. Fame. Reputation.
Risks
Being scooped. Scrutiny. Misinterpretation. Cost. Blame. Reputation.
Nature 461, 145 (10 September 2009)
1. Sharing
“It’s not ready yet”
“I need to get (another) publication first”
“We don’t have the resources or skills to prepare it for others, esp. now we finished that project”
“It’s faster/easier to do it myself, and will keep the credit/control too”
“It’s not described enough to be usable”
“I don’t trust the quality. It’s not reliable enough. It’s too noisy.”
“Others won’t use it properly.”
“It’s not worth my while”
“They are my competitors!!”
Pseudo Sharing
2. Preparation for Use
Curation
Standards
Reusability
Reproducibility
Accountability & Quality
Data discipline
Silo busting
CIMR Core Information for Metabolomics Reporting
MIABE Minimal Information About a Bioactive Entity
MIACA Minimal Information About a Cellular Assay
MIAME Minimum Information About a Microarray Experiment
MIAME/Env MIAME / Environmental transcriptomic experiment
MIAME/Nutr MIAME / Nutrigenomics
MIAME/Plant MIAME / Plant transcriptomics
MIAME/Tox MIAME / Toxicogenomics
MIAPA Minimum Information About a Phylogenetic Analysis
MIAPAR Minimum Information About a Protein Affinity Reagent
MIAPE Minimum Information About a Proteomics Experiment
MIARE Minimum Information About a RNAi Experiment
MIASE Minimum Information About a Simulation Experiment
MIENS Minimum Information about an ENvironmental Sequence
MIFlowCyt Minimum Information for a Flow Cytometry Experiment
MIGen Minimum Information about a Genotyping Experiment
MIGS Minimum Information about a Genome Sequence
MIMIx Minimum Information about a Molecular Interaction Experiment
MIMPP Minimal Information for Mouse Phenotyping Procedures
MINI Minimum Information about a Neuroscience Investigation
MINIMESS Minimal Metagenome Sequence Analysis Standard
MINSEQE Minimum Information about a high-throughput SeQuencing Experiment
MIPFE Minimal Information for Protein Functional Evaluation
MIQAS Minimal Information for QTLs and Association Studies
MIqPCR Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM Minimal Information Required In the Annotation of biochemical Models
MISFISHIE Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
STRENDA Standards for Reporting Enzymology Data
TBC Tox Biology Checklist
BioPAX: Biological Pathways Exchange http://www.biopax.org/
FuGE Functional Genomics Experiment
MGED: Microarray Experimental Conditions
http://www.mibbi.org/index.php/MIBBI_portal
Minimum Information for Biological and Biomedical Investigations
Metadata Minefield
http://usefulchem.wikispaces.com/page/code/EXPLAN001
http://www.mygrid.org.uk/tools/taverna/
Publishing Process
models
software
methods
scripts
http://openwetware.org
standard operating procedures
Community Curation Responsibility
Blue Collar Science
John Quackenbush
Difficult and time consuming
Poor Credit or Reward
Shabby Career Paths & Prospects
3. Credit Crisis
• Reward sharing, curation and reuse rather than reinvention.
• Credit. Attribution. Citation.
• For software, methods and standards too.
• Technical (DataCite.org).
• Cultural (Respected policy).
• Institutional.
• Funding bodies.
4. Infrastructure, Capability & Capacity
• Three year PhD/project cycle
• Local data control
• Realistic paths to adoption by busy people.
• Spreadsheets, wikis, catalogues and yellow pages.
• Content and Tools
http://www.biosharing.org
Identity Management
Sharednames, DataCite, LSID, DOIs, ORCID
5. Data Ecosystem
Resources
6. Sustained Resources
• Three year projects.
• Three year lifespan of data (and its software).
• Sunsets and Sustains
• Reinvention rewarded
• Institution.
• Funding councils.
• Funding panels.
• Publishers
• Libraries
• National data centres
• International data centres
Free. Like Puppies
Incentives. Sensitivity to Behaviours
Infrastructure
Community building
Trusted service
Coordination
Governance
Policy
Capability
Community Integration
A Partnership
• Software engineers
• Computational scientists
• Experimental Scientists
• Domain informaticians
• Service providers
• Funding agencies
• But the community credit crisis continues….
Summary
• Science is a complex social activity undertaken by tribes of people and dominated by trust issues.
• Infrastructure has to be there and fit for purpose but it's not the real problem.
• Need a cultural shift (on all sides) that truly honours data.