Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications
![Page 1: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/1.jpg)
Next Generation Cancer Data Discovery, Access, and
Integration Using Prizms and Nanopublications
Jim McCusker (@jpmccu), Timothy Lebo (@timrdf), Michael Krauthammer,
and Deborah McGuinness (@dlmcguinness)
![Page 2: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/2.jpg)
What we’re trying to fix
From: Data Sharing and Management SNAFU in 3 Acts
![Page 3: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/3.jpg)
What we’re trying to fix
What is the content of the field called “SAM1”?
Ah yes, SAM1 is the level of CXCR4 expression.
From: Data Sharing and Management SNAFU in 3 Acts
![Page 4: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/4.jpg)
What we’re trying to fix
That is logical if you think about it.
And what is the content of the field called “SAM2”?
From: Data Sharing and Management SNAFU in 3 Acts
![Page 5: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/5.jpg)
What we’re trying to fix
… What is the content of the field called “SAM2”?
I don’t remember.
From: Data Sharing and Management SNAFU in 3 Acts
![Page 6: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/6.jpg)
Life Science data seems to start its life very scruffy.
![Page 7: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/7.jpg)
5 Levels of Data Sharing, from scruffy to neat
Level 1: Basic data sharing (who, what, when, where, why)
Level 2: Automated Conversion (computable RDF representations)
Level 3: Semantic enhancement (human-enhanced RDF representations)
Level 4: Semantic eScience (use of vocabularies with formal semantics)
Level 5: Community-Based Standards (consensus use of preferred ontologies)
![Page 8: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/8.jpg)
The Prizms Architecture
![Page 9: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/9.jpg)
Prizms User Interactions
![Page 10: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/10.jpg)
Provenance of Prizms
[Diagram: Prizms, the Prizms nodes healthdata.tw.rpi.edu and lod.melagrid.org, and Linking Open Govt. Data, connected by prov:wasDerivedFrom edges]
More Prizms Nodes: https://github.com/timrdf/prizms/wiki/Prizms-Nodes
![Page 11: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/11.jpg)
5 Levels in Prizms
Level 1: Basic data sharing (CKAN dataset metadata + datapubs)
Level 2: Automated Conversion (Prizms raw conversions)
Level 3: Semantic Conversion (Prizms enhanced conversions)
Level 4: Semantic eScience (Level 3 + NCBO ontology recommender + similar tools)
Level 5: Community-Based Standards (Level 4 + vocabulary reuse analysis)
![Page 12: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/12.jpg)
Level 1: Basic Data Sharing
CKAN (Comprehensive Knowledge Archive Network) and Datapubs
![Page 13: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/13.jpg)
What is CKAN?
• A data portal for all kinds of data
• Link or upload
• Linked Data-friendly
• Link to:
  o Files
  o APIs
  o SPARQL endpoints
  o Metadata
  o Publications
  o Visualizations…
![Page 14: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/14.jpg)
data.melagrid.org: a portal for melanoma data
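Because the portal is a CKAN instance, its dataset metadata should also be reachable through CKAN's Action API. The sketch below is illustrative only: it assumes the portal is served over `http://`, that it exposes the standard `package_show` action, and that the dataset is registered under the name `exome-variants-in-melanoma`. The sample response is made up to show the shape of a reply; it is not output from the real portal.

```python
import json
from urllib.parse import urlencode


def package_show_url(portal: str, dataset_id: str) -> str:
    """Build the CKAN Action API URL that returns one dataset's metadata."""
    query = urlencode({"id": dataset_id})
    return f"{portal.rstrip('/')}/api/3/action/package_show?{query}"


def resource_links(response_text: str) -> list[tuple[str, str]]:
    """Extract (format, url) pairs from a package_show JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    resources = payload["result"].get("resources", [])
    return [(r.get("format", ""), r.get("url", "")) for r in resources]


# Made-up response body, shaped like a package_show reply, for illustration only.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "name": "exome-variants-in-melanoma",
        "resources": [
            {"format": "xls", "url": "http://example.org/exome_aa_variants_final.xls"},
        ],
    },
})

if __name__ == "__main__":
    # Assumed portal address and dataset name; adjust for a real deployment.
    print(package_show_url("http://data.melagrid.org", "exome-variants-in-melanoma"))
    print(resource_links(SAMPLE_RESPONSE))
```

Fetching the URL with any HTTP client and passing the body to `resource_links` would list the files, APIs, or endpoints linked from the dataset.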
![Page 15: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/15.jpg)
What is a Datapub?
Viewing Relations, Attributes, and Entities in RDF (VRAER): dl.dropboxusercontent.com/u/9752413/dils2013/exome-variants-in-melanoma.ttl
Nodes:
• exome-variants-in-melanoma, a Nanopublication
• assertion, a Assertion
• provenance, a Provenance
• attribution, a Attribution
• supporting, a Supporting
Edges: hasAssertion, hasProvenance, hasAttribution, hasSupporting
Groth et al., 2010
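The structure in this diagram can be sketched as a handful of triples. This is a minimal illustration, not the authors' tooling: the node and edge names come from the slide, but the wiring is an assumption (that `hasAssertion` and `hasProvenance` leave the nanopublication, and `hasAttribution` and `hasSupporting` leave the provenance node). The `Triple` type and both function names are hypothetical, and namespace prefixes are omitted.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def datapub_skeleton(name: str) -> list[Triple]:
    """Return the structural triples of a datapub called `name`."""
    parts = {
        "assertion": "Assertion",
        "provenance": "Provenance",
        "attribution": "Attribution",
        "supporting": "Supporting",
    }
    triples = [Triple(name, "a", "Nanopublication")]
    triples += [Triple(part, "a", cls) for part, cls in parts.items()]
    # Assumed wiring: the slide lists these edges but not their endpoints.
    triples += [
        Triple(name, "hasAssertion", "assertion"),
        Triple(name, "hasProvenance", "provenance"),
        Triple("provenance", "hasAttribution", "attribution"),
        Triple("provenance", "hasSupporting", "supporting"),
    ]
    return triples


def to_lines(triples: list[Triple]) -> str:
    """Render triples as Turtle-like lines (prefixes left out for brevity)."""
    return "\n".join(f"{t.subject} {t.predicate} {t.obj} ." for t in triples)


if __name__ == "__main__":
    print(to_lines(datapub_skeleton("exome-variants-in-melanoma")))
```

Keeping the assertion separate from attribution and supporting information is what lets a dataset description be cited and credited independently of the data file it points to.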
![Page 16: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/16.jpg)
Anatomy of a Datapub: Assertion
Viewing Relations, Attributes, and Entities in RDF (VRAER): http://dl.dropboxusercontent.com/u/9752413/dils2013/exome-variants-in-melanoma-assertion.ttl
Edges: IMT, homepage, distribution
exome_aa_variants_final.xls, a Distribution
accessURL: exome_aa_variants_final.xls
xls
value: xls
Variant data from "Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma", a Dataset
description: Variant data from M. Krauthammer, Y. Kong, B. Ha, P. Evans, A. Bacchiocchi, J.P. McCusker, E. Cheng, M.J. Davis, G. Goh, M. Choi, S. Ariyan, D. Narayan, K. Dutton-Regester, A. Capatana, E.C. Holman, M. Bosenberg, M. Sznol, H.M. Kluger, D.E. Brash, D.F. Stern, M.A. Materin, R.S. Lo, S. Mane, S. Ma, K.K. Kidd, N.K. Hayward, R.P. Lifton, J. Schlessinger, T.J. Boggon, and R. Halaban, Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma. Nature Genetics, 2012. in press.
**Tab 1: Description** This worksheet contains a description of the variant calling method.
**Tab 2: SNVs** This worksheet contains automatically called somatic non-silent SNVs in matched melanoma samples. Annotations from MU2A.
**Tab 3: InDels** This worksheet contains automatically called somatic InDels in matched melanoma samples. Annotations from VEP.
**Tab 4: Splice Site Variants** This worksheet contains automatically called somatic splice site variants in matched melanoma samples. Annotations from VEP.
**Tab 5: Additional mutations** This worksheet contains additional somatic mutations. These mutations are either inferred in unmatched samples (see Methods overview above), or have been Sanger-validated via PCR amplified products, after manual inspections of sequencing reads. Annotations from MU2A/VEP.
Nomenclature --------
**SNV:** Single Nucleotide Variant
**DNV:** Dinucleotide Variant
**DNV*:** Two SNVs affecting the same codon, at positions 1 and 3 of the codon
**TNV:** Trinucleotide Variant
**Parentheses in genotype calls:** Nucleotides that appear in parentheses are true variant calls in tumor which have not been called somatic by the automatic pipeline. These variants are shown if another position in the same codon has a somatic call. The corresponding SNP position, if known, is also shown.
**InDel:** Insertions and Deletions
**HGVS:** Human Genome Variation Society variant format
**COSMIC:** Catalogue of Somatic Mutations - http://www.sanger.ac.uk/perl/genetics/CGP/cosmic/
**SNP:** This column provides SNP-IDs if available for any the mutated positions in tumors
**PhyoP:** Computation of p-values for conservation or acceleration (http://compgen.bscb.cornell.edu/phast/faq.php). Data from UCSC genome browser.
References ------
**MU2A:** Garla V, Kong Y, Szpakowski S, Krauthammer M. MU2A--reconciling the genome and transcriptome to determine the effects of base substitutions. Bioinformatics. 2011 Feb 1;27(3):416-8. Epub 2010 Dec 12. PubMed PMID: 21149339; PubMed Central PMCID: PMC3031033.
**VEP:** McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010 Aug 15;26(16):2069-70. Epub 2010 Jun 18. PubMed PMID: 20562413; PubMed Central PMCID: PMC2916720.
keyword: exome-sequencing, homo-sapiens
identifier: exome-variants-in-melanoma
![Page 17: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/17.jpg)
Anatomy of a Datapub: Attribution, Evidence
Viewing Relations, Attributes, and Entities in RDF (VRAER): http://dl.dropboxusercontent.com/u/9752413/dils2013/exome-variants-in-melanoma-attribution.ttl
contributor
creator
exome-variants-in-melanoma
rights: cc-by
James McCusker
mbox: mailto:[email protected]
Michael Krauthammer
mbox: mailto:[email protected]
Attribution
Evidence
![Page 18: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/18.jpg)
Citing a Dataset using Datapubs
![Page 19: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/19.jpg)
Citing a Dataset using Datapubs
![Page 20: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/20.jpg)
Levels 2-3: Automated Conversion, Semantic Conversion
Prizms raw conversions, enhanced conversions
![Page 21: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/21.jpg)
Prizms RDF Converter
smart, naïve bootstrap
"Hawaii","Alii Garden Market Place", "75-6129 Alii Drive", "Kailua-Kona", "96740", "-155.9819183", "19.61436844"
ds4383:thing_1367 raw:column_1 "Hawaii"; raw:column_2 "Alii Garden Market Place"; raw:column_3 "75-6129 Alii Drive"; raw:column_4 "Kailua-Kona"; raw:column_5 "96740"; raw:column_6 "-155.9819183"; raw:column_7 "19.61436844" .
ds4383:thing_1367 con:preferredURI ds4383:farmersMarket_1367 .
ds4383:farmersMarket_1367 a ds4383_vocab:FarmersMarket; con:address :address_1367; dcterms:title "Alii Garden Market Place"; wgs:lat 19.6; wgs:long -155.9 .
:address_1367 a con:Address; con:stateOrProvince typed_state:Hawaii; con:street "75-6129 Alii Drive"; con:city "Kailua-Kona"; con:zip "96740" .
typed_state:Hawaii a ds4383_vocab:State; dcterms:identifier "Hawaii"; rdfs:label "Hawaii"; owl:sameAs <http://sws.geonames.org/5855797/>, govtrackusgov:HI, dbpedia:Hawaii .
[Figure labels: enhancement; Time; Domain Expertise; SemWeb Expertise]
Lebo et al., 2012
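The two conversion stages on this slide can be approximated in a few lines. This is a rough sketch, not the Prizms converter itself: the function names and the `ENHANCEMENTS` table are hypothetical stand-ins for the converter's enhancement parameters, and treating column 6 as longitude and column 7 as latitude is an assumption based on the values in the sample row.

```python
import csv
import io


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def raw_convert(csv_text: str, prefix: str = "ds4383", first_id: int = 1) -> list[str]:
    """Naive conversion: every cell becomes a raw:column_N string literal."""
    # skipinitialspace tolerates the blank after each comma in the sample row.
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    statements = []
    for offset, row in enumerate(rows):
        subject = f"{prefix}:thing_{first_id + offset}"
        pairs = [
            f'raw:column_{n} "{escape_literal(cell)}"'
            for n, cell in enumerate(row, start=1)
        ]
        statements.append(f"{subject} " + "; ".join(pairs) + " .")
    return statements


# Hypothetical enhancement table: column number -> (predicate, is_numeric).
ENHANCEMENTS = {
    2: ("dcterms:title", False),
    6: ("wgs:long", True),   # assumed to be longitude
    7: ("wgs:lat", True),    # assumed to be latitude
}


def enhanced_convert(row: list[str], subject: str, rdf_type: str) -> str:
    """Enhanced conversion: named predicates and numeric values for mapped columns."""
    pairs = [f"a {rdf_type}"]
    for n, (predicate, numeric) in sorted(ENHANCEMENTS.items()):
        cell = row[n - 1]
        if numeric:
            float(cell)  # raises ValueError if the cell is not a number
            pairs.append(f"{predicate} {cell}")
        else:
            pairs.append(f'{predicate} "{escape_literal(cell)}"')
    return f"{subject} " + "; ".join(pairs) + " ."


if __name__ == "__main__":
    line = ('"Hawaii","Alii Garden Market Place", "75-6129 Alii Drive", '
            '"Kailua-Kona", "96740", "-155.9819183", "19.61436844"')
    print(raw_convert(line, first_id=1367)[0])
    row = next(csv.reader(io.StringIO(line), skipinitialspace=True))
    print(enhanced_convert(row, "ds4383:farmersMarket_1367", "ds4383_vocab:FarmersMarket"))
```

The raw stage needs no knowledge of the data, which is why it can run automatically; the enhancement table is where domain and Semantic Web expertise enter over time.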
![Page 22: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/22.jpg)
Prizms Benefits
Prizms has worked with:
• BFO/IAO/OBI
• SIO
• RDF Data Cube Vocabulary
• PROV
• VOID
• FOAF
• etc.
For free, you get:
• Provenance at dataset and triple levels
• Automatic source/dataset/version URI generation
• Automated conversion as data changes
![Page 23: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/23.jpg)
Future Work: Supporting Levels 4-5
Level 1: Basic data sharing (CKAN dataset metadata + datapubs)
Level 2: Automated Conversion (Prizms raw conversions)
Level 3: Semantic Conversion (Prizms enhanced conversions)
Level 4: Semantic eScience (Level 3 + NCBO ontology recommender + similar tools)
Level 5: Community-Based Standards (Level 4 + vocabulary reuse analysis)
✔ ✔ ✔ ✔ ✔
![Page 24: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/24.jpg)
Publishing Custom Linked Data Using LODSPeaKr
• Custom templates for RDF and HTML
• Templates driven by rdf:type
• Web-based template editor
• Embed easy-to-generate visualizations
![Page 25: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/25.jpg)
Conclusions
• Prizms is an infrastructure for sharing data on many levels of sophistication
• Good support for Level 1-3 Data Sharing
• Initial support for Level 4-5 Data Sharing
• Didn't just make life science data better, it made future Linked Data better!
• More to be done, but lots of progress
![Page 26: Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications](https://reader036.fdocuments.us/reader036/viewer/2022062405/554e81ecb4c9054a698b5528/html5/thumbnails/26.jpg)
Thanks!
• Rensselaer Polytechnic (Tetherless World):
  o Alvaro Graves
  o John Erickson
  o The LOGD Team
• The Open Knowledge Foundation Network (OKFN)
• Yale University:
  o Ruth Halaban
  o Tobias Kuhn
• Grant support from:
  o Yale SPORE in Skin Cancer
  o Semantic Sea Ice Interoperability Initiative