KnetMiner - Knowledge Network Miner
Mining biological knowledge networks for gene-phenotype discovery
Keywan Hassani-Pak
http://knetminer.rothamsted.ac.uk/
Plant and Animal Genomes Conference 2017
@KnetMiner
The Genotype to Phenotype Challenge
Genotype: SNPs and Indels
Omics: includes any ‘omics
Phenotype: Flowering, Defence, Development, Stress tolerance
Biological Knowledge Network
1. Methods to assemble and visualise an integrated knowledge network of the cell
2. Methods to use the knowledge network to translate genotype to phenotype
• Free and open source
• Data warehousing using a graph-database
• Platform to integrate public and private datasets in various formats
• Provides a GUI, CLI and APIs for reproducible data integration workflows
Ondex – Data Integration Platform
Ondex: www.ondex.org
The approach is generic and works similarly for other species
Let’s get a GWAS dataset…
http://plants.ensembl.org/biomart
#SNP=66,816 | #Gene=27,502 | #Phenotype=107
… transform into a network
[Figure: network schema in which each SNP is linked to a nearby gene ("close to") and to a phenotype ("associated"), and genes are linked to each other ("interacts")]
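The transformation of a GWAS table into a network can be sketched in a few lines of Python. This is a minimal illustration, not the Ondex import workflow: it assumes each row already pairs a SNP with a nearby gene and an associated phenotype, and the relation names close_to and associated, the Association record and the example identifiers are all hypothetical.

```python
from collections import namedtuple

# Hypothetical row layout: one SNP-gene-phenotype association from a GWAS table.
Association = namedtuple("Association", ["snp", "gene", "phenotype"])


def gwas_to_edges(rows):
    """Convert GWAS rows into de-duplicated typed edges (source, relation, target)."""
    edges = set()
    for row in rows:
        edges.add((row.snp, "close_to", row.gene))         # SNP lies near the gene
        edges.add((row.snp, "associated", row.phenotype))  # SNP associated with the trait
    return sorted(edges)


# Made-up example rows; identifiers are placeholders, not real data.
rows = [
    Association("snp1", "geneA", "Days to flowering"),
    Association("snp1", "geneA", "Leaf number"),
    Association("snp2", "geneB", "Days to flowering"),
]
for edge in gwas_to_edges(rows):
    print(edge)
```

Using a set means a SNP reported for several phenotypes still yields a single close_to edge to its gene.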
… add biological interactions
• Gene-GO
• Gene-Phenotype (gene knock-out or overexpression; text mining of publications)
• Gene-Publication
• Gene-Pathway
• Homology to yeast
• Homology to crops (wheat)
… finally add other open linked data
>500,000 nodes
>1,500,000 links
Genome-scale knowledge network
Relationships in Crop Knowledge Networks
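[Figure: example relations around a gene in a crop knowledge network, grouped as Genes, Homology, Annotations, Genetics (QTL, GWAS, Marker), Interactions and Phenotype. Relation types shown: encodes, ortholog (41% identity, EnsemblCompara), domain, involved_in (GO), phenotype (TO), text-mining, published, and a GWAS association with P-value 10^-8. Evidence examples: Inferred from Mutant Phenotype (PMID: 15598800); text-mined sentence "Mutations in TTG2 cause phenotypic defects [in] seed color pigmentation." (PMID: 17766401)]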
• Methods needed to evaluate millions of relationships in the knowledge network, prioritise genes and extract relevant subnetworks
• Interactive and exploratory tools needed to enable knowledge discovery and decision making
• Interpretation should be the task of domain experts, i.e. biologists!
How can we search and interpret this much information?
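Extracting a relevant subnetwork around a gene can be illustrated with a breadth-first search over typed edges. The sketch below is one simple way to do it, not KnetMiner's own algorithm; the function name extract_subnetwork, the hop limit and the toy network are hypothetical.

```python
from collections import defaultdict, deque


def extract_subnetwork(edges, seed, max_hops=2):
    """Breadth-first search from `seed`, ignoring edge direction.

    Returns the hop distance of every node reached within `max_hops`
    and the edges traversed on the way.
    """
    neighbours = defaultdict(list)
    for edge in edges:
        src, _, dst = edge
        neighbours[src].append((dst, edge))
        neighbours[dst].append((src, edge))

    depth = {seed: 0}
    queue = deque([seed])
    kept = set()
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for other, edge in neighbours[node]:
            kept.add(edge)
            if other not in depth:
                depth[other] = depth[node] + 1
                queue.append(other)
    return depth, sorted(kept)


# Toy network: geneA reaches a phenotype only indirectly, via a protein interaction.
edges = [
    ("geneA", "encodes", "proteinA"),
    ("proteinA", "interacts", "proteinB"),
    ("geneB", "encodes", "proteinB"),
    ("geneB", "has_phenotype", "late flowering"),
    ("geneC", "encodes", "proteinC"),
]
depth, sub = extract_subnetwork(edges, "geneA", max_hops=4)
print(depth["late flowering"], len(sub))
```

The hop limit is the main design knob: a small limit keeps the subnetwork readable, while a larger one reveals indirect gene-phenotype links such as the one above.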
KnetMiner – Systematic and evidence-based gene discovery
http://knetminer.rothamsted.ac.uk
KnetMiner Software Architecture
• Web Browser: views built with DHTML, JavaScript, Java Applet and Flash
• KnetMiner Client: Servlets and JSP pages on Apache Tomcat; exchanges HTML, JSON, XML and images with the browser over HTTP via Ajax
• KnetMiner Server: multithreaded Java server, connected to the client via a Java socket and to the knowledge graph DB via the Ondex API
Major improvements to the user-interface.
Re-implemented Java Applet and Flash components in JavaScript.
Now compatible with most operating systems and touch devices.
Which associations (genes) are worth following up? Often a highly subjective decision
How is genotype translated to phenotype? Often involves multi-omics interactions
KnetMiner search interface
KnetMiner Outputs
Use Case 1 – Mining GWAS and QTL data
• 96 or 192 Arabidopsis inbred lines
• Genotyped: 250,000 SNPs
• 107 phenotypes measured
https://arapheno.1001genomes.org/study/1/
  o Flowering
  o Defence
  o Ionomics
  o Developmental
• Wilcoxon and EMMA (controls for population structure) statistical tests
GWAS of 107 Phenotypes in Arabidopsis
Atwell et al., Nature 2010
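As an illustration of the simpler of the two tests, the Wilcoxon rank-sum test for a single SNP compares trait values between lines carrying each allele. The sketch below assumes homozygous inbred lines coded 0/1 and uses the normal approximation; it does not control for population structure, which is what the EMMA mixed model adds. The function name and trait values are made up.

```python
import math


def rank_sum_test(phenotypes, genotypes):
    """Wilcoxon rank-sum (Mann-Whitney U) test of one biallelic SNP against a trait.

    genotypes holds 0 or 1 per inbred line (lines treated as homozygous, so two groups).
    Returns (U, z) for allele-1 lines, using the normal approximation
    without a tie correction in the variance.
    """
    n = len(phenotypes)
    order = sorted(range(n), key=lambda idx: phenotypes[idx])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and phenotypes[order[j + 1]] == phenotypes[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share their mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    n1 = sum(1 for g in genotypes if g == 1)
    n0 = n - n1
    if n0 == 0 or n1 == 0:
        raise ValueError("both allele groups must be non-empty")
    rank_sum = sum(r for r, g in zip(ranks, genotypes) if g == 1)
    u = rank_sum - n1 * (n1 + 1) / 2
    mean = n0 * n1 / 2
    sd = math.sqrt(n0 * n1 * (n + 1) / 12)
    return u, (u - mean) / sd


# Made-up trait values for six lines; allele-1 lines have the larger values.
u, z = rank_sum_test([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0, 0, 0, 1, 1, 1])
print(u, round(z, 3))
```

In a GWAS this test would be repeated for every SNP, which is why its results can be confounded by population structure in a way EMMA is designed to correct.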
Examples where GWAS results are simple to interpret
Sodium concentration (Na)
Lesioning (LES)
AvrRpm1
Single, sharp peak of association centred on causal polymorphism
LD decays within 10 kb on average in Arabidopsis
Examples where GWAS results are complex to interpret
FLC gene expression (FLC)
Leaf Number (LN22)
Days to flowering (FT Field)
Peaks are diffuse covering several hundred kb without a clear centre
Causal polymorphisms do not always have the strongest association
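The effect of LD on interpretation can be made concrete by listing the genes within a fixed window of the significant SNPs. This sketch assumes a single chromosome, made-up gene coordinates and the 10 kb average LD decay as a fixed window; real analyses would derive intervals from the local LD structure, and candidate_genes is a hypothetical helper.

```python
def candidate_genes(snp_positions, genes, window=10_000):
    """Return genes overlapping +/- `window` bp around any significant SNP.

    snp_positions: base-pair positions of significant SNPs on one chromosome.
    genes: dict mapping gene name -> (start, end) on the same chromosome.
    """
    hits = set()
    for pos in snp_positions:
        lo, hi = pos - window, pos + window
        for name, (start, end) in genes.items():
            if start <= hi and end >= lo:  # gene interval overlaps the window
                hits.add(name)
    return sorted(hits)


# Made-up gene coordinates on a single chromosome.
genes = {"g1": (0, 2_000), "g2": (15_000, 17_000), "g3": (50_000, 52_000), "g4": (300_000, 302_000)}
print(candidate_genes([1_000], genes))                              # sharp peak
print(candidate_genes([1_000, 45_000, 120_000, 295_000], genes))  # diffuse peak
```

A sharp peak leaves one candidate, while significant SNPs spread over a few hundred kb pull in several genes with no obvious way to choose between them.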
Using KnetMiner to interpret GWAS results
Wilcoxon results
EMMA results
Atwell et al., Nature 2010
Flowering Locus C (FLC) gene expression
Demo: Exploring genes and networks controlling FLC expression
• Petal size QTL in Arabidopsis (in collaboration with John Doonan)
Using KnetMiner to prioritise genes in QTL
Use Case 2 – Mining differentially expressed genes
White grained wheat is more prone to pre-harvest sprouting (PHS)
• PHS is premature germination of grain in the ear and results in loss of bread-making quality
• Red grain colour is associated with increased dormancy and resistance to PHS
• Grain colour is due to proanthocyanidins (condensed tannins) in the testa
[Figure: sprouting plotted against grain colour; + = white, o = red]
Groos et al. (2002) TAG 104, 39-47
Red grain 20dpa
Andy Phillips
67 down-regulated genes
37 up-regulated genes
Over a hundred statistically significant genes. How are these linked to grain colour and PHS?
Differential Gene Expression Analysis
Google-like search interface
• Search knowledge graph using trait-based keywords
• Real-time user feedback and query suggestions
Trait related keywords
Query term suggestions
Genes linked to grain colour and/or PHS
Genes with direct or indirect links to grain colour and PHS
KnetMiner methodology
Ondex Text-Mining Plugin
Input data
• 27,416 Arabidopsis gene names from Phytozome
• 52,561 Abstracts from PubMed that contain Arabidopsis
• 22,201 curated citations from TAIR
• 1,349 Trait Ontology terms from Planteome
Hassani-Pak et al., 2010
[Figure: concepts (e.g. a gene and a TO term) that occur in publications (occurs in, published_in) are linked into a weighted association network (example scores: IP=1.7; M=1.2; N=2); the text-mining output is a direct gene to TO term relation]
These steps connect 5,553 Arabidopsis genes to 409 TO terms based on 18,341 co-citations
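The core idea of co-citation, counting how often a gene name and a trait term are mentioned in the same abstract, can be sketched as follows. This is plain co-occurrence counting under assumed case-insensitive matching; it does not reproduce the weighted scores (IP, M, N) computed by the Ondex text-mining plugin, and the function name co_citations and the abstracts are invented for illustration.

```python
import re
from collections import Counter
from itertools import product


def co_citations(abstracts, gene_names, trait_terms):
    """Count how many abstracts mention each (gene, trait term) pair together."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        # Whole-word, case-insensitive match for gene names such as "TT8".
        genes = {g for g in gene_names
                 if re.search(r"\b" + re.escape(g.lower()) + r"\b", lowered)}
        # Simple substring match for multi-word trait terms.
        traits = {t for t in trait_terms if t.lower() in lowered}
        for pair in product(sorted(genes), sorted(traits)):
            counts[pair] += 1
    return counts


abstracts = [
    "Mutations in TTG2 cause defects in seed coat colour.",
    "TTG2 and TT8 regulate seed coat colour and dormancy.",
    "FLC delays flowering time.",
]
counts = co_citations(abstracts, ["TTG2", "TT8", "FLC"],
                      ["seed coat colour", "dormancy", "flowering time"])
print(counts.most_common(2))
```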
• Uses TF-IDF to rank documents by their relevance to a search term
• Additionally, considers the properties of gene-evidence networks, such as the specificity of documents to a gene and the frequency of evidence concepts
• Smart pre-indexing of the knowledge network makes the computation of the score very fast
Gene Ranking
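Plain TF-IDF ranking of genes against query keywords can be sketched as below. It treats the text gathered from each gene's evidence network as one document; KnetMiner's score additionally weighs document specificity and evidence-concept frequency, which this hypothetical rank_genes function leaves out, and the evidence tokens are made up.

```python
import math
from collections import Counter


def rank_genes(evidence, query):
    """Rank genes by the summed TF-IDF weight of the query terms in their evidence text.

    evidence: dict mapping gene -> list of tokens collected from its linked
    publications and concepts (one "document" per gene).
    """
    n_docs = len(evidence)
    doc_freq = Counter()
    for tokens in evidence.values():
        doc_freq.update(set(tokens))  # count each term once per gene

    scores = {}
    for gene, tokens in evidence.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query:
            if tf[term]:
                idf = math.log(n_docs / doc_freq[term])
                score += (tf[term] / len(tokens)) * idf
        scores[gene] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


evidence = {
    "geneA": ["seed", "colour", "dormancy", "seed"],
    "geneB": ["flowering", "time", "seed"],
    "geneC": ["root", "growth"],
}
for gene, score in rank_genes(evidence, ["seed", "dormancy"]):
    print(gene, round(score, 3))
```

Because the inverse document frequency is zero for a term shared by every gene, generic keywords contribute nothing and rarer trait terms such as "dormancy" dominate the ranking.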
• Web application for very fast search of large genome-scale knowledge graphs
• Ranking of candidate genes based on knowledge mining
• Interactive visualisation of genome and knowledge maps
• Facilitates hypothesis validation and generation
KnetMiner – Making Gene Discovery Efficient & Fun
http://knetminer.rothamsted.ac.uk/
Acknowledgements
John Doonan
Sergio Feingold, Martin Castellote
Uwe Scholz, Matthias Lange
Andy Law
Keywan Hassani-Pak, Ajit Singh, Marco Brandizi, Monika Mistry, Lisa Lill, Chris Rawlings
Dave Edwards, Philipp Bayer
Misha Kapushesky, Kevin Dialdestoro