Data Driven Innovation - Interoperable Genebanks (Tech Track Session)
Uploaded by richard-finkers
Data Driven Innovation
Interoperability Tech Track (#agridata)
18 & 19 March 2015, Wageningen (@rfinkers)
Outline
Introduction “Interoperable Genetic Diversity”
Concept “Bring Your Own Data” party
Aim BYOD Green Genetics?
Outcome BYOD Green Genetics
Hands on
Climate change & Social disruption
Photograph: AFP/Getty Images, http://www.theguardian.com/commentisfree/2015/mar/08/guardian-view-climate-change-social-disruption#img-1
Select a genetically diverse collection
Legacy databases (e.g. Uniprot)
Genome Sequence & Genome Annotation
Genome Variation Data (re-sequencing collections) & SNP annotation
Accession Passport Information
Accession Phenotype Information
Interoperable Genetic Diversity
Genebanks should utilize genomics data
● But should not store them!
Genomics studies should make variant data available
● But need access to passport and characterization & evaluation data.
Breeders need tools to access diversity
Finkers, van Hintum et al. 2014 DOI: 10.1017/S1479262114000689
Genebank(s)
Genomics provider(s)
Intermezzo: Linked Open Data
Standardization makes the information interoperable
● Controlled vocabularies
● Machine readable
● Can all be queried by a single question vs. visiting many websites
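The "single question" idea can be illustrated with a minimal sketch. It assumes two sources that publish facts as (subject, predicate, object) triples and share accession identifiers. All URIs, predicate names and values here are invented for illustration; they are not taken from CGN or any genomics provider.

```python
# Hypothetical genebank source: passport data keyed by an accession identifier.
genebank_triples = {
    ("ex:accession/A1", "ex:countryOfOrigin", "Peru"),
    ("ex:accession/A2", "ex:countryOfOrigin", "Ecuador"),
}

# Hypothetical genomics source: variant data that reuses the same identifiers.
genomics_triples = {
    ("ex:accession/A1", "ex:hasVariantIn", "Solyc10g085020"),
    ("ex:accession/A2", "ex:hasVariantIn", "Solyc02g067880"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def origins_of_accessions_with_variant(triples, gene_id):
    """One question over the pooled data: origin of accessions with a variant in gene_id."""
    result = {}
    for accession, _, _ in match(triples, predicate="ex:hasVariantIn", obj=gene_id):
        for _, _, country in match(triples, subject=accession, predicate="ex:countryOfOrigin"):
            result[accession] = country
    return result


# Because both sources use the same vocabulary, merging is a plain set union.
merged = genebank_triples | genomics_triples
print(origins_of_accessions_with_variant(merged, "Solyc10g085020"))
# prints {'ex:accession/A1': 'Peru'}
```

In practice this role is played by RDF stores and a query language such as SPARQL. The sketch only shows why shared identifiers and vocabularies let one query span several providers.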
Interoperable Genetic Diversity (2)
Implications:
● Data can be stored at many different locations, but can be found by computers
● Newly published information (in the correct format) will be included automatically
● Tools can be written to answer dedicated questions, such as assessing allelic variation or supporting collection management
Finkers, van Hintum et al. 2014 DOI: 10.1017/S1479262114000689
Genebank(s)
Genomics provider(s)
Interdisciplinary Approach Needed
Need for Data Scientists & Domain Experts
Genebanks
Genomics provider(s)
Format: Bring your own Data Workshop
1. Users define the question(s)
2. Users and linked data experts define concepts and ontologies
3. Experts help to create linked data and formulate the query
Bring Your Own Data Workshop
More Info: http://www.dtls.nl/fair-data/byod/
Data owners
Domain Experts
Trainers
Linked Data Experts
Select a genetically diverse collection
Legacy databases (e.g. Uniprot)
Genome Sequence & Genome Annotation
Genome Variation Data (re-sequencing collections) & SNP annotation
Accession Passport Information
Accession Phenotype Information
Summary
Blueprint “Interoperable Genetic Diversity” shown
BYOD resulted in interoperable data which could be queried
● Request your own BYOD?
Public <-> Private integration possible
Select a genetically diverse collection
Legacy databases (e.g. Uniprot)
Genome Sequence & Genome Annotation
Genome Variation Data (re-sequencing collections) & SNP annotation
Accession Passport Information
Accession Phenotype Information
Questions?
Acknowledgements:
BYOD team
Theo van Hintum & Frank Menting (CGN)
Denis Guryunov & Martijn van Kaauwen (prototype)
et al.
HaploSmasher Hands On Session
HaploSmasher Prototype:
● Genomic regions as input: SL2.40ch03:10000..10200
● Solyc gene identifiers: Solyc10g085020
● Filter SNPs on impact type (SnpEff): HIGH, MODERATE, LOW, MODIFIER
● No input validation yet: use the correct notation and existing Solyc gene IDs
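Since the prototype does not validate input, a user-side check of the two notations may help. The sketch below is not part of HaploSmasher. The patterns are inferred only from the two examples on this slide (an assumption), and the function name parse_query is hypothetical. It checks notation only and cannot tell whether a gene ID actually exists.

```python
import re

# Pattern inferred from the slide example "SL2.40ch03:10000..10200" (assumption).
REGION_RE = re.compile(r"^(SL\d+\.\d+ch\d{2}):(\d+)\.\.(\d+)$")
# Pattern inferred from the slide example "Solyc10g085020" (assumption).
GENE_RE = re.compile(r"^Solyc\d{2}g\d{6}$")


def parse_query(text):
    """Classify a query string as a genomic region or a Solyc gene ID.

    Raises ValueError when the notation matches neither pattern.
    """
    text = text.strip()
    region = REGION_RE.match(text)
    if region:
        start, end = int(region.group(2)), int(region.group(3))
        if start > end:
            raise ValueError(f"Region start {start} lies after end {end}")
        return {"type": "region", "sequence": region.group(1), "start": start, "end": end}
    if GENE_RE.match(text):
        return {"type": "gene", "gene_id": text}
    raise ValueError(f"Unrecognised query notation: {text!r}")


print(parse_query("SL2.40ch03:10000..10200"))
print(parse_query("Solyc10g085020"))
```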
HaploSmasher
Query CGN FAIRdata graph
● The prototype currently only generates links to CGN passport data
● Graph data for three CGN accessions is available in our test set
Example queries
http://www.plantbreeding.wur.nl/hs/
Also, explore variation data & Linked resources
● http://www.tomatogenome.net
Examples:
● Beta-tubulin: Solyc10g085020
● HIGH & MODERATE vs. ALL effects
● Glutamate dehydrogenase: Solyc05g052100
● Uridine kinase: Solyc02g067880
● Magnesium chelatase: Solyc04g015750
HaploSmasher examples:
Conserved housekeeping genes:
● Beta-tubulin Solyc10g085020 439 AA
● 1 SNP (HIGH & MODERATE effect), two haplotypes
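Why the choice of impact filter changes the number of haplotypes can be shown with a small sketch. It assumes one allele per accession per SNP, and a haplotype defined simply as the combination of alleles at the retained SNPs. This is an illustrative simplification, not necessarily how HaploSmasher groups accessions, and the positions, alleles and accession names are invented toy data.

```python
from collections import defaultdict


def haplotypes(snps, genotypes, impacts):
    """Group accessions by their alleles at the SNPs whose impact is in `impacts`."""
    kept_positions = [snp["pos"] for snp in snps if snp["impact"] in impacts]
    groups = defaultdict(list)
    for accession, alleles in genotypes.items():
        key = tuple(alleles[pos] for pos in kept_positions)
        groups[key].append(accession)
    return dict(groups)


# Invented toy data: two SNPs with different impact annotations.
snps = [
    {"pos": 101, "impact": "MODERATE"},
    {"pos": 250, "impact": "MODIFIER"},
]
genotypes = {
    "acc1": {101: "A", 250: "C"},
    "acc2": {101: "A", 250: "T"},
    "acc3": {101: "G", 250: "T"},
}

strict = haplotypes(snps, genotypes, {"HIGH", "MODERATE"})
all_effects = haplotypes(snps, genotypes, {"HIGH", "MODERATE", "LOW", "MODIFIER"})
print(len(strict))       # prints 2: one retained SNP with two alleles
print(len(all_effects))  # prints 3: the MODIFIER SNP splits one group further
```

With a single retained biallelic SNP there can be at most two groups, which matches the pattern of one SNP and two haplotypes on the slide.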
HaploSmasher examples:
● Beta-tubulin Solyc10g085020 439 AA
● 136 SNPs (all SnpEff impact types)
● Part of haplotype groups:
HaploSmasher examples:
● Uridine kinase Solyc02g067880
● 23 SNPs (HIGH, MODERATE)
● Example haplotype groups: