Science Bioinformatics Data Resources
8/12/2019 Science Bioinformatics Data Resources
Science Data Resources: From Astronomy to Bioinformatics
June 10/11, 2014
Michelle Hudson, Science & Social Science Data Librarian
Kristin Bogdan, Science & Social Science Data Librarian
Kayleigh Bohémier, Science Research Support Librarian for Astronomy, Geology & Geophysics, and Physics
Rolando Garcia-Milian, Biomedical Sciences Research Support
A Brief Overview of Data in the Sciences
Examples of Data Questions
Where would you find these?
Spectroscopy on jars found in wrecks
Spectra of M31
ApoE structure
Ice cores
Genomic sequences for extinct mammals
Types of Data
Observational: data captured in real time; irreplaceable (e.g., sensor readings, telescope images, geologic samples)
Experimental: data from lab equipment; expensive to reproduce (e.g., gene sequences)
Simulation: data generated from models (the models are more important than the data)
Derived or compiled: data put together from other information (e.g., 3D models, compiled databases)
Formats of Data
Documents, spreadsheets, lab notebooks, questionnaires, survey responses, health indicators, audio recordings, video recordings, protein and gene sequences, images, films, spectra, slides, artifacts, specimens, samples, models, algorithms, scripts, software code, etc.
General Resources
data.gov: http://www.data.gov/
DataONE: http://www.dataone.org/
NCBI: http://www.ncbi.nlm.nih.gov/
EBI: http://www.ebi.ac.uk/
FigShare: http://figshare.com/
Dryad: http://datadryad.org/
PLOS ONE: http://www.plosone.org/
Data journals: http://mlibrarydata.wordpress.com/2014/05/09/data-journals/
Research guide: http://guides.library.yale.edu/sciencedata
Astronomy, or, Massively Open Online Archives
Astronomy Data
Self-collected
Data.NASA.gov
US Virtual Observatory (US VO)
ApJ Supplement
Research centers and collaborations
Figshare
Astronomy Dataverse
GitHub
Screenshot: Virtual Observatory Data Explorer search for M31
Government Data: NASA Data Processing Levels
Ranked 0-4
Level 0 is raw data
Level 1 has been processed and error-corrected
This is, incidentally, where conspiracy theories come from
Level 2 data may contain derived parameters
Levels 3+ have further processing
http://data.nasa.gov/
http://archive.eso.org/cms.html
http://vao.stsci.edu/portal/Mashup/Clients/Portal/DataDiscovery.html
http://archive.stsci.edu/
Data from Researchers
Astrophysical Journal Supplement:
http://dx.doi.org/10.1088/0067-0049/212/2/26
http://dx.doi.org/10.1088/0067-0049/212/2/19
http://dx.doi.org/10.1088/0067-0049/212/1/6
Project web pages (e.g., Kepler)
Intermediary data products (e.g., code, analyzed data sets) remain a problem
Olausen, S. A., & Kaspi, V. M. (2014). Table 2 from The McGill Magnetar Catalog. The Astrophysical Journal Supplement Series, 212(1), 1-22. doi:10.1088/0067-0049/212/1/6
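The DOIs above use the legacy dx.doi.org resolver. The DOI Foundation now recommends https://doi.org/. A script can normalize any of these forms into a resolver link. The following is a minimal sketch; the helper name doi_to_url is hypothetical and is not part of any library.

```python
from urllib.parse import quote

# Prefixes a DOI is commonly written with; stripped before rebuilding the link.
_DOI_PREFIXES = ("doi:", "https://doi.org/", "http://doi.org/",
                 "https://dx.doi.org/", "http://dx.doi.org/")


def doi_to_url(doi: str) -> str:
    """Normalize a DOI (bare, doi:..., or dx.doi.org link) to an https://doi.org/ URL."""
    doi = doi.strip()
    for prefix in _DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Leave "/" unescaped: DOI suffixes routinely contain slashes.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.1088/0067-0049/212/1/6"))
    print(doi_to_url("http://dx.doi.org/10.1088/0067-0049/212/2/26"))
```

Stripping a known prefix and re-adding one canonical prefix keeps citations and data links consistent, whichever form the source used.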
Harvard's Astronomy Dataverse: a solution for intermediate-stage data products
Physics: A Little Less Open, But Everyone Knows Where to Find It
Reference Data
National Nuclear Data Center at Brookhaven National Laboratory: http://www.nndc.bnl.gov/
Department of Energy Data Explorer: http://www.osti.gov/dataexplorer/
MCPlots (Monte Carlo plots reference for HEP): http://mcplots.cern.ch/
Image: Monte Carlo plot reference from MCPlots
Experimental Data
Durham HepData Project
Reactions database, with data from active experiments and data reviews: http://durpdg.dur.ac.uk/HEPDATA/REAC
Example: http://durpdg.dur.ac.uk/view/ins1297226
Experimental data releases
IceCube: http://icecube.wisc.edu/science/data
And then, we have the data grids.
Geoscience Data Resources
Kinds of Geoscience Data
Geospatial
Rocks and Minerals
Economic Geology
Paleobiology
Climate History
Geochemistry
Physical Samples
Image credit: USGS, via Wikimedia Commons: http://commons.wikimedia.org/wiki/File:Seismograph_Pinatubo.jpg
Physical Samples as Data
Identified as data in NSF guidelines
Different analyses = new data
New techniques developed over time
Repositories for samples
Specific metadata required
Image: Géry, Parent. 7 May 2011. Peronopsis interstrictus. Retrieved from the Wikimedia Commons at http://commons.wikimedia.org/wiki/File%3APeronopsis_interstrictus_White%2C_1874_2.jpg
Geo/Paleo Sample Repositories/Registries
International Geo Sample Number (IGSN): http://www.geosamples.org/
Peabody Museum: http://peabody.yale.edu/collections/search-collections
PaleoBioDB: http://paleobiodb.org/#/
Other Resources for Geoscience
USGS Earth Explorer: http://earthexplorer.usgs.gov/
Data.Gov: http://www.data.gov/
GeoGratis: http://geogratis.cgdi.gc.ca/
Morphobank: http://morphobank.org/
EarthCube: http://earthcube.org/
CINERGI: http://workspace.earthcube.org/cinergi
Problem: Rapid Growth of Biomedical Data
GenBank statistics: http://www.ncbi.nlm.nih.gov/genbank/genbankstats-2008/
Figure: Samples submitted to the Gene Expression Omnibus database, 2000-2012 (millions). Compiled from GEO historical data: http://www.ncbi.nlm.nih.gov/geo/summary/?type=history
Problem: Growth of the Biomedical Literature
Figure: Number of records in PubMed over time (millions). Compiled from PubMed: http://www.ncbi.nlm.nih.gov/pubmed
Huge volume (PubMed: 23,132,342 citations)
High diversity
High quality (peer review)
Users overwhelmed by long lists of search results
One third of PubMed queries return 100 or more citations (Islamaj Dogan et al., 2009)
Problem: Querying the Biomedical Literature
Querying the biomedical literature becomes more difficult
Medical Subject Headings (MeSH)
Filters
Boolean operators
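PubMed searches that combine MeSH headings, filters, and Boolean operators can also be run programmatically through NCBI's E-utilities. The sketch below only builds an ESearch URL and does not send the request. The helper name is hypothetical, and the query terms are illustrative assumptions. The query pairs a brand name with its generic name (Cerebyx is the brand name of fosphenytoin) and uses PubMed's [MeSH Terms] and [dp] (publication date) field tags.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_esearch_url(term: str, retmax: int = 20) -> str:
    """Build an E-utilities ESearch URL for a PubMed query string."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    # urlencode escapes the quotes, brackets, and spaces in the query.
    return ESEARCH + "?" + urlencode(params)


# Illustrative query: Boolean operators, a MeSH heading, and a date-range filter.
term = ('(Cerebyx OR fosphenytoin) AND "Respiratory Insufficiency"[MeSH Terms] '
        'AND 2000:2014[dp]')
url = pubmed_esearch_url(term)
print(url)
```

ESearch returns the matching PubMed IDs as XML by default, or as JSON with retmode=json. NCBI asks clients without an API key to stay under three requests per second.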
Information Retrieval vs Information Extraction
Figure (modified from OpenHelix): a query for EGFR
Information retrieval retrieves documents/records.
Information extraction extracts facts from those records:
T14D inhibited EGF receptor internalization
EGFR regulates tumor cell proliferation
EGFR is expressed in SCCHN
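A toy sketch using the three example sentences above makes the distinction concrete. Retrieval returns whole records that contain the query string, while extraction pulls (subject, relation, object) triples out of them. The regular-expression pattern is a deliberately simplistic, hypothetical stand-in for the NLP and ontology-based extraction that real tools use.

```python
import re

records = [
    "T14D inhibited EGF receptor internalization",
    "EGFR regulates tumor cell proliferation",
    "EGFR is expressed in SCCHN",
]


def retrieve(query, docs):
    """Information retrieval: return whole records that mention the query string."""
    q = query.lower()
    return [d for d in docs if q in d.lower()]


# Toy "subject relation object" pattern; real extractors use NLP and ontologies.
FACT = re.compile(r"^(?P<subj>\S+) (?P<rel>regulates|is expressed in|inhibited) (?P<obj>.+)$")


def extract(docs):
    """Information extraction: return (subject, relation, object) triples."""
    facts = []
    for d in docs:
        m = FACT.match(d)
        if m:
            facts.append((m["subj"], m["rel"], m["obj"]))
    return facts


print(retrieve("EGFR", records))
print(extract(records))
```

The plain string search for EGFR misses the first record, which says "EGF receptor". This is the synonym problem that controlled vocabularies are meant to address.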
Alternative Tools for Mining the Biomedical Literature
Alternative tools for mining the biomedical literature combine:
Statistical methods
Ontologies / controlled vocabularies
Natural language processing tools
Visualization tools
Benefit: less time needed to discover meaningful results.
Alternative Mining Tools for the Biomedical Literature
Alternative Tools for Mining the Biomedical Literature
Main gene query
Associated proteins/genes
Synonyms
Medical terminology (MeSH)
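The following is a minimal sketch of this kind of query expansion. It assumes the synonym list has already been looked up, for example from Entrez Gene. The gene symbol and its synonyms are ORed together, and the result is combined with a MeSH heading. The helper name is hypothetical. ERBB1 and HER1 are published aliases of EGFR, and "Cell Proliferation" is used only as an example heading.

```python
def expand_query(gene, synonyms, mesh_terms=()):
    """OR a gene symbol with its synonyms, then AND in each MeSH heading."""
    names = [gene, *synonyms]
    gene_block = " OR ".join(f'"{name}"[Title/Abstract]' for name in names)
    query = f"({gene_block})"
    for heading in mesh_terms:
        query += f' AND "{heading}"[MeSH Terms]'
    return query


print(expand_query("EGFR", ["ERBB1", "HER1"], ["Cell Proliferation"]))
```

ORing synonyms widens recall for the gene itself. ANDing a controlled-vocabulary heading narrows the results back to the topic of interest.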
Alternative Tools for Mining the Biomedical Literature
Linked to the Entrez Gene and OMIM databases
Workshop: Novel Online Tools for Mining the Biomedical Literature
Case 1: Few Results in the Biomedical Literature
Searching for novel genes
Case 2: Few Results in the Biomedical Literature
Searching for side effects of drugs: Cerebyx respiratory failure
Case 2: Few Results in the Biomedical Literature
Phenotypic information, such as side effects, can be used to infer molecular interactions and to hint at new uses for marketed drugs (Campillos et al., 2008).
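Campillos et al. (2008) compared drugs by a weighted side-effect similarity score. The sketch below substitutes plain Jaccard similarity, a simpler measure that is not their method, to show the underlying idea: drugs with overlapping side-effect profiles may share a molecular target. The drug names and side-effect sets are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared side effects divided by all side effects of the two drugs."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical side-effect profiles, for illustration only.
side_effects = {
    "drug_A": {"nausea", "dizziness", "rash", "somnolence"},
    "drug_B": {"nausea", "dizziness", "somnolence", "ataxia"},
    "drug_C": {"headache", "rash"},
}


def similar_pairs(profiles, threshold=0.5):
    """Return drug pairs whose side-effect similarity meets the threshold, best first."""
    pairs = []
    for (d1, s1), (d2, s2) in combinations(profiles.items(), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            pairs.append((d1, d2, score))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


print(similar_pairs(side_effects))
```

In the Campillos et al. approach, a high-similarity pair of drugs with different known targets is treated as a hint that they may share a target.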
Data Annotation / Integration / Visualization Tools: Genome Browsers
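A genome browser such as the UCSC Genome Browser (Kent et al., 2002) can be opened at a specific gene or region by URL. The following is a minimal sketch. It assumes the public hgTracks page accepts db (genome assembly) and position parameters, as in links shared from the browser. The helper name and the default assembly hg38 are assumptions.

```python
from urllib.parse import urlencode

UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_link(position: str, assembly: str = "hg38") -> str:
    """Build a UCSC Genome Browser link for a gene symbol or chrom:start-end range."""
    return UCSC_HGTRACKS + "?" + urlencode({"db": assembly, "position": position})


print(ucsc_link("EGFR"))
print(ucsc_link("chr7:1000-2000", assembly="hg19"))
```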
Workshop: Novel Online Tools for Mining the Biomedical Literature
Contextualizing Data/Results in Biomedical Knowledge
Resulting list of genes upregulated after treatment of prostate cancer cells with vitamin D (VitD)
Microarray data obtained from the Gene Expression Omnibus (GEO) repository were analyzed with the GEO2R statistical software.
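GEO2R runs the limma package's moderated statistics on GEO series. The sketch below is a much simpler stand-in that shows only the fold-change step. It flags genes whose mean log2 expression rises by at least a threshold after treatment. The gene labels and expression values are hypothetical, and the sketch computes no p-values or multiple-testing correction.

```python
from statistics import mean

# Hypothetical log2-scale expression values (three replicates per group).
control = {
    "gene_a": [5.0, 5.5, 4.5],
    "gene_b": [7.0, 7.25, 6.75],
    "gene_c": [6.0, 6.5, 5.5],
    "gene_d": [4.0, 4.5, 3.5],
}
treated = {
    "gene_a": [7.0, 7.5, 6.5],
    "gene_b": [7.25, 7.5, 7.0],
    "gene_c": [5.0, 5.5, 4.5],
    "gene_d": [5.0, 5.5, 4.5],
}


def upregulated(control, treated, min_log2fc=1.0):
    """Return (gene, log2 fold change) for genes up by >= min_log2fc, largest first."""
    hits = []
    for gene, ctrl_values in control.items():
        # On log2-scale data the fold change is a difference of group means.
        log2fc = mean(treated[gene]) - mean(ctrl_values)
        if log2fc >= min_log2fc:
            hits.append((gene, log2fc))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(upregulated(control, treated))
```

A log2 fold change of 1.0 corresponds to a doubling of expression. A real analysis would also rank genes by adjusted p-value, as GEO2R does.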
Contextualizing Data/Results in Biomedical Knowledge
References
Campillos M*, Kuhn M*, Gavin AC, Jensen LJ, Bork P. Drug target identification using side-effect similarity. Science. 2008 Jul 11;321(5886):263-6. http://www.ncbi.nlm.nih.gov/pubmed/18621671
Islamaj Dogan R, Murray GC, Névéol A, Lu Z. Understanding PubMed user search behavior. Database (Oxford). 2009. http://www.ncbi.nlm.nih.gov/pubmed/20157491
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. The human genome browser at UCSC. Genome Res. 2002 Jun;12(6):996-1006. http://www.ncbi.nlm.nih.gov/pubmed/12045153
Kuhn M, Campillos M, Letunic I, Jensen LJ, Bork P. A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol. 2010;6:343. Epub 2010 Jan 19. http://sideeffects.embl.de/drugs/56338/
Rindflesch TC, et al. Semantic MEDLINE: An advanced information management application for biomedicine. Information Services & Use. 2011;31:15-21. http://lhncbc.nlm.nih.gov/system/files/pub-lhncbc-2011-109.pdf