KnetMiner - Knowledge Network Miner

Mining biological knowledge networks for gene-phenotype discovery Keywan Hassani-Pak http://knetminer.rothamsted.ac.uk/ Plant and Animal Genomes Conference 2017 @KnetMiner

Transcript of KnetMiner - Knowledge Network Miner

Page 1: KnetMiner - Knowledge Network Miner

Mining biological knowledge networks for gene-phenotype discovery

Keywan Hassani-Pak

http://knetminer.rothamsted.ac.uk/

Plant and Animal Genomes Conference 2017

@KnetMiner

Page 2: KnetMiner - Knowledge Network Miner

The Genotype to Phenotype Challenge

Genotype: SNPs and Indels

Omics: Includes any ‘omics

Phenotype: Flowering, Defence, Development, Stress tolerance

Biological Knowledge Network

1. Methods to assemble and visualise an integrated knowledge network of the cell

2. Methods to use the knowledge network to translate genotype to phenotype

Page 3: KnetMiner - Knowledge Network Miner

• Free and open source

• Data warehousing using a graph-database

• Platform to integrate public and private datasets in various formats

• Provides a GUI, CLI and APIs for reproducible data integration workflows

Ondex – Data Integration Platform

Ondex: www.ondex.org

Page 4: KnetMiner - Knowledge Network Miner

The approach is generic and works similarly for other species

Page 5: KnetMiner - Knowledge Network Miner

Let’s get a GWAS dataset…

http://plants.ensembl.org/biomart

#SNP=66,816 | #Gene=27,502 | #Phenotype=107

Page 6: KnetMiner - Knowledge Network Miner

… transform into a network

[Network diagram. Node labels: (SNP), (Phenotype). Edge labels: close to, associated.]
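The transformation can be sketched in a few lines of Python. This is only an illustration: the row layout (SNP, nearby gene, phenotype, p-value), the identifiers, the significance threshold and the assumption that "close to" links a SNP to a gene are not taken from the slides.

```python
# Hypothetical GWAS rows: (SNP id, nearby gene, phenotype, p-value).
gwas_rows = [
    ("snp_1", "GENE_A", "flowering time", 1e-9),
    ("snp_2", "GENE_A", "flowering time", 3e-4),
    ("snp_3", "GENE_B", "sodium concentration", 2e-8),
]


def build_network(rows, p_threshold=1e-6):
    """Return a set of (source, relation, target) edges."""
    edges = set()
    for snp, gene, phenotype, p_value in rows:
        # Every SNP is linked to the gene it lies close to (assumed relation).
        edges.add((snp, "close_to", gene))
        # Only SNPs passing the (illustrative) threshold are linked to the phenotype.
        if p_value <= p_threshold:
            edges.add((snp, "associated", phenotype))
    return edges


edges = build_network(gwas_rows)
for edge in sorted(edges):
    print(edge)
```

Storing edges as typed triples keeps the sketch close to the diagram: each edge carries its relation label, so further relation types can be added without changing the structure.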

Page 7: KnetMiner - Knowledge Network Miner

Biological interaction datasets

http://thebiogrid.org

Page 8: KnetMiner - Knowledge Network Miner

… add biological interactions

[Network diagram. Node labels: (SNP), (Phenotype). Edge labels: interacts, close to, associated.]
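Once interaction edges are added, a gene with no SNP of its own can still be connected to a phenotype through its interaction partners. A minimal Python sketch, assuming hypothetical gene and SNP identifiers and treating all edges as undirected:

```python
from collections import defaultdict, deque

# Hypothetical edges: GWAS-derived links plus interaction links.
edges = [
    ("snp_1", "close_to", "GENE_A"),
    ("snp_1", "associated", "flowering time"),
    ("GENE_A", "interacts", "GENE_C"),
    ("GENE_C", "interacts", "GENE_D"),
]


def neighbours(edge_list):
    """Build an undirected adjacency map from (source, relation, target) edges."""
    adjacency = defaultdict(set)
    for source, _relation, target in edge_list:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for node in sorted(adjacency[path[-1]]):
            if node not in seen:
                seen.add(node)
                queue.append(path + [node])
    return None


adjacency = neighbours(edges)
path = shortest_path(adjacency, "GENE_C", "flowering time")
print(path)
```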

Page 9: KnetMiner - Knowledge Network Miner

• Gene-GO

• Gene-Phenotype

o Gene knock-out or overexpression

o Text mining publications

• Gene-Publication

• Gene-Pathway

• Homology to yeast

• Homology to crops

Wheat

… finally add other open linked data

>500,000 nodes

>1,500,000 links

Genome-scale knowledge network

Page 10: KnetMiner - Knowledge Network Miner

Relationships in Crop Knowledge Networks

[Figure: example relationships in a crop knowledge network.

Node groups: Genes, Homology, Annotations, Genetics (QTL, GWAS, Marker), Interactions, Phenotype. Ontology nodes: GO, TO.

Edge labels: encodes, ortholog, domain, text-mining, involved_in, published, phenotype.

Example evidence: GWAS P-Value 10^-8; 41% identity (EnsemblCompara); Inferred from Mutant Phenotype, PMID: 15598800; "Mutations in TTG2 cause phenotypic defects seed color pigmentation", PMID: 17766401.]

Page 11: KnetMiner - Knowledge Network Miner

• Methods are needed to evaluate millions of relationships in the knowledge network, prioritise genes and extract relevant subnetworks

• Interactive and exploratory tools are needed to enable knowledge discovery and decision making

• Interpretation should be the task of domain experts, i.e. biologists!

How to search and interpret too much information?
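As a rough illustration of what "prioritise genes and extract relevant subnetworks" could mean in practice, the Python sketch below collects the paths that connect a gene to any node matching a keyword and ranks genes by the number of such paths. The toy network, the node names, the hop limit and the path-count ranking are all assumptions made for this example; they are not KnetMiner's method.

```python
from collections import defaultdict

# Hypothetical knowledge network: node -> set of neighbouring nodes.
network = defaultdict(set)
for a, b in [
    ("GENE_A", "PMID:1"), ("PMID:1", "TO:seed colour"),
    ("GENE_A", "GO:flavonoid biosynthesis"),
    ("GENE_B", "GENE_A"), ("GENE_B", "GO:cell cycle"),
    ("GENE_C", "GO:cell cycle"),
]:
    network[a].add(b)
    network[b].add(a)


def evidence_subnetwork(network, gene, keyword, max_hops=2):
    """Return paths of at most max_hops edges from gene to nodes matching keyword."""
    paths = []
    frontier = [[gene]]
    for _ in range(max_hops):
        next_frontier = []
        for path in frontier:
            for node in sorted(network[path[-1]]):
                if node in path:
                    continue  # do not revisit nodes already on this path
                new_path = path + [node]
                if keyword.lower() in node.lower():
                    paths.append(new_path)
                next_frontier.append(new_path)
        frontier = next_frontier
    return paths


def prioritise(network, genes, keyword, max_hops=2):
    """Rank genes by the number of evidence paths linking them to the keyword."""
    counts = {g: len(evidence_subnetwork(network, g, keyword, max_hops)) for g in genes}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = prioritise(network, ["GENE_A", "GENE_B", "GENE_C"], "seed colour")
print(ranking)
```

The returned paths double as the "relevant subnetwork": they are exactly the nodes and links a biologist would need to inspect to judge whether the evidence is convincing.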

Page 12: KnetMiner - Knowledge Network Miner

KnetMiner – Systematic and evidence-based gene discovery

http://knetminer.rothamsted.ac.uk

Page 13: KnetMiner - Knowledge Network Miner

KnetMiner Software Architecture

[Architecture diagram. Components: Web Browser (DHTML, JavaScript; Views: Java Applet, Flash); KnetMiner Client (Apache Tomcat; Servlets and JSP Page); KnetMiner Server (Multithreaded Java Server; Ondex API; KnowledgeGraph DB). Connections: HTML, JSON, XML and images over HTTP via Ajax; Java Socket.]

Major improvements to the user-interface.

Re-implemented Java Applet and Flash components in JavaScript.

Now compatible with most OS and touch devices.

Page 14: KnetMiner - Knowledge Network Miner

Which associations (genes) are worth following up?

Often a highly subjective decision

How is genotype translated to phenotype?

Often involves multi-omics interactions

Page 15: KnetMiner - Knowledge Network Miner

KnetMiner search interface

Page 16: KnetMiner - Knowledge Network Miner

KnetMiner Outputs

Page 17: KnetMiner - Knowledge Network Miner

Use Case 1 – Mining GWAS and QTL data

Page 18: KnetMiner - Knowledge Network Miner

• 96 or 192 Arabidopsis inbred lines

• Genotyped: 250,000 SNPs

• 107 phenotypes were measured

o Flowering

o Defence

o Ionomics

o Developmental

• Wilcoxon and EMMA (control population structure) statistical tests

https://arapheno.1001genomes.org/study/1/

GWAS of 107 Phenotypes in Arabidopsis

Atwell et al., Nature 2010
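To make the Wilcoxon step concrete, the sketch below tests one SNP by comparing phenotype values between lines carrying each allele. It is a textbook Wilcoxon rank-sum test with a normal approximation and no tie correction of the variance, written with the Python standard library only. The phenotype values are invented, the approximation is rough for samples this small, and EMMA (a mixed model that accounts for population structure) is not covered.

```python
import math


def rank_sum_test(group_a, group_b):
    """Wilcoxon rank-sum test; returns (z, two-sided p-value)."""
    combined = sorted(
        (value, label)
        for label, group in (("a", group_a), ("b", group_b))
        for value in group
    )
    # Assign 1-based ranks, giving tied values their average rank.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_a = sum(r for r, (_, label) in zip(ranks, combined) if label == "a")
    n_a, n_b = len(group_a), len(group_b)
    mean = n_a * (n_a + n_b + 1) / 2
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (rank_sum_a - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical phenotype values (e.g. days to flowering) split by allele at one SNP.
reference_allele = [10, 11, 12, 13, 14]
alternate_allele = [20, 21, 22, 23, 24]
z, p_value = rank_sum_test(reference_allele, alternate_allele)
print(z, p_value)
```

In a genome-wide scan the same test would be repeated for every SNP and phenotype, which is why the resulting p-values need a stringent threshold.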

Page 19: KnetMiner - Knowledge Network Miner

Examples where GWAS results are simple to interpret

Sodium concentration (Na)

Lesioning (LES)

AvrRpm1

Single, sharp peak of association centred on causal polymorphism

LD decays within 10 kb on average in Arabidopsis

Page 20: KnetMiner - Knowledge Network Miner

Examples where GWAS results are complex to interpret

FLC gene expression (FLC)

Leaf Number (LN22)

Days to flowering (FT Field)

Peaks are diffuse covering several hundred kb without a clear centre

Causal polymorphisms do not always have the strongest association

Page 21: KnetMiner - Knowledge Network Miner

Using KnetMiner to interpret GWAS results

Wilcoxon results

EMMA results

Atwell et al., Nature 2010

Flowering Locus C (FLC) gene expression

Page 22: KnetMiner - Knowledge Network Miner

Demo: Exploring genes and networks controlling FLC expression

Page 23: KnetMiner - Knowledge Network Miner

• Petal size QTL in Arabidopsis (in collaboration with John Doonan)

Using KnetMiner to prioritise genes in QTL

Page 24: KnetMiner - Knowledge Network Miner

Use Case 2 – Mining differentially expressed genes

Page 25: KnetMiner - Knowledge Network Miner

White grained wheat is more prone to pre-harvest sprouting (PHS)

• PHS is the result of premature germination of grain in the ear and results in loss of bread-making quality

• Red grain colour is associated with increased dormancy and resistance to PHS

• Grain colour is due to proanthocyanidins (condensed tannins) in the testa

[Plot: sprouting against grain colour; + = white, o = red. Groos et al. (2002) TAG 104, 39-47]

Red grain 20dpa

Andy Phillips

Page 26: KnetMiner - Knowledge Network Miner

67 down-regulated genes

37 up-regulated genes

Over a hundred statistically significant genes. How are these linked to grain colour and PHS?

Differential Gene Expression Analysis

Page 27: KnetMiner - Knowledge Network Miner

Google-like search interface

• Search knowledge graph using trait-based keywords

• Real-time user feedback and query suggestions

Trait related keywords

Query term suggestions

Page 28: KnetMiner - Knowledge Network Miner

Genes linked to grain colour and/or PHS

Page 29: KnetMiner - Knowledge Network Miner

Genes with direct or indirect links to grain colour and PHS

Page 30: KnetMiner - Knowledge Network Miner

KnetMiner methodology

Page 31: KnetMiner - Knowledge Network Miner

Ondex Text-Mining Plugin

Input data

• 27,416 Arabidopsis gene names from Phytozome

• 52,561 Abstracts from PubMed that contain Arabidopsis

• 22,201 curated citations from TAIR

• 1,349 Trait Ontology terms from Planteome

Hassani-Pak et al., 2010

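[Figure: Gene (A) and TO (B) concepts are linked to Publications (x, y) by occurrs_in and published_in edges with weights 0.7, 0.6, 2.0, 0.5 and 1.0. These are combined into a weighted association network in which Gene and TO are joined by a text-mining edge annotated IP=1.7; M=1.2; N=2.]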

Page 32: KnetMiner - Knowledge Network Miner

Text-mining output

These steps connect 5,553 Arabidopsis genes to 409 TO terms based on 18,341 co-citations
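The idea of a co-citation can be shown with a small Python sketch: a gene and a trait term are linked whenever both are mentioned in the same abstract. The abstracts and PMIDs below are invented, and the plain substring matching is a simplification; the Ondex text-mining plugin and its edge weights are not reproduced here.

```python
from collections import Counter
from itertools import product

# Invented abstracts keyed by placeholder identifiers.
abstracts = {
    "PMID:1": "Mutations in TTG2 affect seed colour and trichome development.",
    "PMID:2": "TTG2 regulates seed colour through proanthocyanidin accumulation.",
    "PMID:3": "FLC represses flowering time in winter-annual accessions.",
}
gene_names = ["TTG2", "FLC"]
trait_terms = ["seed colour", "flowering time"]


def co_citations(abstracts, gene_names, trait_terms):
    """Count abstracts in which a gene name and a trait term co-occur."""
    counts = Counter()
    for text in abstracts.values():
        lowered = text.lower()
        genes = [g for g in gene_names if g.lower() in lowered]
        traits = [t for t in trait_terms if t.lower() in lowered]
        # Every gene/trait pair found in the same abstract is one co-citation.
        for pair in product(genes, traits):
            counts[pair] += 1
    return counts


counts = co_citations(abstracts, gene_names, trait_terms)
for (gene, trait), n in sorted(counts.items()):
    print(gene, trait, n)
```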

Page 33: KnetMiner - Knowledge Network Miner

• Uses TF*IDF to rank documents by their relevance to a search term

• Additionally, considers the properties of gene-evidence networks, such as the specificity of documents to a gene and the frequency of evidence concepts

• Smart pre-indexing of the knowledge network makes the computation of the score very fast

Gene Ranking
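For readers unfamiliar with TF*IDF, the Python sketch below scores evidence documents against a query and sums the scores per gene. It uses the textbook TF*IDF formula only. The documents, gene links and the simple summation are assumptions for illustration, and the additional network properties and pre-indexing mentioned above are not modelled.

```python
import math
from collections import Counter

# Hypothetical evidence documents and the genes they are linked to.
documents = {
    "doc1": "seed colour pigmentation in the testa",
    "doc2": "seed dormancy and germination",
    "doc3": "root hair development",
}
gene_evidence = {"GENE_A": ["doc1", "doc2"], "GENE_B": ["doc3"], "GENE_C": ["doc2"]}


def tf_idf(term, doc_id, documents):
    """Term frequency in one document times inverse document frequency."""
    words = documents[doc_id].lower().split()
    tf = Counter(words)[term] / len(words)
    containing = sum(1 for text in documents.values() if term in text.lower().split())
    if containing == 0:
        return 0.0
    idf = math.log(len(documents) / containing)
    return tf * idf


def rank_genes(query, gene_evidence, documents):
    """Score each gene by the summed TF*IDF of its evidence documents."""
    terms = query.lower().split()
    scores = {
        gene: sum(tf_idf(t, d, documents) for t in terms for d in doc_ids)
        for gene, doc_ids in gene_evidence.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_genes("seed colour", gene_evidence, documents)
for gene, score in ranking:
    print(gene, round(score, 3))
```

Rare query terms dominate the score because of the IDF factor, so a gene supported by a document about a specific trait outranks one supported only by documents sharing a common word.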

Page 34: KnetMiner - Knowledge Network Miner

• Web application for very fast search of large genome-scale knowledge graphs

• Ranking of candidate genes based on knowledge mining

• Interactive visualisation of genome and knowledge maps

• Facilitates hypothesis validation and generation

KnetMiner – Making Gene Discovery Efficient & Fun

http://knetminer.rothamsted.ac.uk/

Page 35: KnetMiner - Knowledge Network Miner

Acknowledgements

John Doonan

Sergio Feingold, Martin Castellote

Uwe Scholz, Matthias Lange

Andy Law

Keywan Hassani-Pak, Ajit Singh, Marco Brandizi, Monika Mistry, Lisa Lill, Chris Rawlings

Dave Edwards, Philipp Bayer

Misha Kapushesky, Kevin Dialdestoro

@KnetMiner