Enabling Open Science: Data Discoverability, Access and Use · Europe PMC • Abstracts: 30 million...
Enabling Open Science: Data Discoverability, Access and Use
Jo McEntyre
Head of Literature Services
www.ebi.ac.uk
About EMBL-EBI
• Part of the European Molecular Biology Laboratory
• International, non-profit research institute
• Europe’s hub for biological data services and research
A stable, freely available, shared repository
Europe PMC
• Abstracts: 30 million
• Full-text articles: 3 million
• Article citation counts
• Grants
• ORCIDs
• Semantic annotation
• Data citations
• Data integration
Europe PMC is a member of the PMC International Collaboration.
Funded by 26 European funders of life science research
Degrees of Data
Unstructured/semi-structured
Structured
Added Value
Metadata
A picture of a graph
A spreadsheet of my results
A record in a DNA sequence database
A graphical display of a genome
Article: a narrative with citations, pictures and attachments
The scent of information
“It's healthy to remember that users are selfish, lazy, and ruthless in applying their cost-benefit analyses.”
Information Foraging, by Jakob Nielsen, June 30, 2003
http://www.nngroup.com/articles/information-scent/
• Data placement: where people will find it
• Scent alone is not scientific …
“retaining files and being prepared to share them” ≠ accessible data
1. Discoverability through accessibility
• Deposit in a public/open database
• Where possible, structured archive (e.g. PDB, ENA) >> unstructured archive (e.g. Zenodo, Figshare)
• Uniquely identify it: PID, Accession number, DOI, ROI
• Give it context: metadata (and more)
• All of the above = citable =
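As a rough illustration of "uniquely identify it", the sketch below sorts an identifier string into a coarse scheme. The regular expressions are simplified assumptions (real DOI, PDB and INSDC identifiers have more variants), and `classify_identifier` is a hypothetical helper, not part of any EMBL-EBI service.

```python
import re

# Simplified, assumed patterns; real identifier schemes allow more variants.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")               # e.g. 10.1234/example.dataset
PDB_PATTERN = re.compile(r"^[0-9][A-Za-z0-9]{3}$")           # classic 4-character PDB ID
INSDC_PATTERN = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})$")  # e.g. BN000065


def classify_identifier(identifier: str) -> str:
    """Return a coarse label for the identifier scheme, or 'unknown'."""
    value = identifier.strip()
    if DOI_PATTERN.match(value):
        return "doi"
    if PDB_PATTERN.match(value):
        return "pdb"
    if INSDC_PATTERN.match(value):
        return "insdc"
    return "unknown"
```

For example, `classify_identifier("BN000065")` yields `"insdc"`, while a free-text description of a file yields `"unknown"`, which is the practical difference between an identified record and a file that is merely retained.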
Data Citation Principles
• Data as legitimate, citable products of research
• Attribution and credit
• Cited as evidence for a claim
• Unique identification
• Access
• Persistence
• Specificity and verifiability (provenance)
• Interoperability and flexibility
https://www.force11.org/datacitation
2. Discoverability through structured data
• Structured data is one of the true enablers of life science
- Discovery of homology between genes across species
- Predicting function based on protein folds
• Structured data can be cross-analysed and compared by algorithm, and it encourages the development of new products and tools
Discoverability through structured data
Metadata – critical to discoverability
Generic: title, submitters, date, file format, version.
Supports: citation, basic search
Example: Wagner F.F., 23-APR-2002, TPA: Homo sapiens SMP1 gene, RHD gene and RHCE gene, INSDC, 14-NOV-2006 (Rel. 89, Last updated, Version 7). BN000065
Specific: organism, tissue, assay, page number …
Supports: deep search, analysis, computation
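The split between generic and specific metadata can be sketched as a small record type. This is a minimal illustration, not the schema of any real archive: `DataRecord`, its field names and `deep_search` are hypothetical, and the `organism` value is an assumption read off the record title. The generic fields are enough to rebuild the citation string shown on this slide; the specific fields are what a deeper, field-level search would filter on.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataRecord:
    # Generic metadata: enough for citation and basic search.
    accession: str
    submitter: str
    submitted: str
    title: str
    source: str
    last_updated: str
    release: int
    version: int
    # Specific metadata: domain fields used for deep search and analysis.
    specific: Dict[str, str] = field(default_factory=dict)

    def citation(self) -> str:
        """Build a citation string from the generic fields only."""
        return (
            f"{self.submitter}, {self.submitted}, {self.title}, {self.source}, "
            f"{self.last_updated} (Rel. {self.release}, Last updated, "
            f"Version {self.version}). {self.accession}"
        )


def deep_search(records: List[DataRecord], **criteria: str) -> List[DataRecord]:
    """Keep records whose specific metadata matches every criterion."""
    return [
        r for r in records
        if all(r.specific.get(key) == value for key, value in criteria.items())
    ]


record = DataRecord(
    accession="BN000065",
    submitter="Wagner F.F.",
    submitted="23-APR-2002",
    title="TPA: Homo sapiens SMP1 gene, RHD gene and RHCE gene",
    source="INSDC",
    last_updated="14-NOV-2006",
    release=89,
    version=7,
    specific={"organism": "Homo sapiens"},  # assumed from the title
)
print(record.citation())
print(deep_search([record], organism="Homo sapiens"))
```

Without the `specific` dictionary the record is still citable, but a query such as "all human records" has nothing to match against.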
Structured data is good value for money
Annual cost of generating new protein structure data in labs around the world
Annual cost of maintaining it in a central database
3. Discoverable through added value
• In-built QA: otherwise you couldn’t do this!
Literature-Data Integration
Quality control and trustworthiness
Pre-publication
• QA: a range of activities, from “is this file valid?” to “is it described adequately/richly by metadata?”
Post-publication
• Data are immutable: time and repetition are required to show outliers.
• Openness: enabling assessment & reuse
• Provenance & connectivity with related data, methods, code, tools, articles …
Text and Data Mining (TDM)
• The more structure there is, the more reusable data becomes
• TDM builds bridges between unstructured (including articles) and structured data worlds
• Such analyses can drive adoption of more structure …
• Open Access infrastructure is a critical enabler
Graphic kindly supplied by the National Centre for Text Mining (NaCTeM), UK
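One very small form of the bridge between articles and structured data is pulling identifier mentions out of free text. The sketch below is an assumption-laden toy, not how Europe PMC or NaCTeM actually mine text: the patterns are simplified, `mine_identifiers` is a hypothetical helper, and the sample sentence and DOI are invented for illustration.

```python
import re

# Simplified, assumed patterns; production text mining uses richer rules
# and dictionaries to avoid false positives.
ACCESSION_RE = re.compile(r"\b[A-Z]{2}\d{6}\b")
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def mine_identifiers(text: str) -> dict:
    """Collect accession-like and DOI-like strings mentioned in free text."""
    accessions = sorted(set(ACCESSION_RE.findall(text)))
    # Trim sentence punctuation that the greedy DOI pattern may swallow.
    dois = sorted({match.rstrip(".,;)") for match in DOI_RE.findall(text)})
    return {"accessions": accessions, "dois": dois}


# Hypothetical sentence, as it might appear in an article's methods section.
sentence = (
    "The RHD locus annotation is available as BN000065 "
    "(see also doi 10.1234/example.dataset)."
)
print(mine_identifiers(sentence))
```

Each hit can then be resolved against the structured archive, turning a passing mention in an article into a link to the record itself.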
BioStudy EBI
BioStudy database for unstructured data
Diagram labels: Study, Publications, Ontologies, Data files, Metadata, Other DBs
Making data discoverable
Labs around the world deposit data and we…
• Archive it
• Classify it
• Share it with other data providers
• Analyse, add value and integrate it
• …provide tools to help researchers use it
A collaborative enterprise
Elixir: An international distributed infrastructure for
• Data
• Standards
• Tools
• Compute
• Training
• Industry
Some … of many … Open Questions
• What can stakeholders do to capitalize on the research investment?
• Incentives and credit for good data management
• Can all data outputs be peer reviewed (like articles)? How far does the article model apply to data?
• Negative results?
• What are we prepared to spend: costs—benefits?
• How does this apply to big data?
• What is the public interest in discoverable data?