The power of graphs to analyze biological data
Transcript of The power of graphs to analyze biological data
![Page 1: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/1.jpg)
the power of graphs for analyzing biological datasets
Davy Suvee
Janssen Pharmaceutica
![Page 2: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/2.jpg)
about me
who am i ...
Davy Suvee (@DSUVEE)
➡ working as an IT lead / software architect @ Janssen Pharmaceutica
• dealing with big scientific data sets
• hands-on expertise in big data and NoSQL technologies
➡ founder of datablend
• provide big data and NoSQL consultancy
• share practical knowledge and big data use cases via blog
![Page 3: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/3.jpg)
outline
➡ getting visual insights into big data sets
★ gene expression clustering (MongoDB, Neo4j, Gephi)
★ mutation prevalence (Cassandra, Neo4j, Gephi)
➡ fluxgraph, a time machine for your graphs ...
![Page 4: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/4.jpg)
insights in big data
➡ typical approach through warehousing
★ star schema with fact tables and dimension tables
![Page 5: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/5.jpg)
insights in big data
➡ typical approach through warehousing
★ star schema with fact tables and dimension tables
![Page 6: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/6.jpg)
insights in big data
★ real-time visualization
★ filtering
★ metrics
★ layouting
★ modular [1, 2]
1. http://gephi.org/plugins/neo4j-graph-database-support/
2. http://github.com/datablend/gephi-blueprints-plugin
![Page 7: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/7.jpg)
gene expression clustering
➡ oncology data set:
★ 4,800 samples
★ 27,000 genes
➡ question:
★ for a particular subset of samples, which genes are co-expressed?
![Page 8: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/8.jpg)
mongodb for storing gene expressions
{
  "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3" },
  "sample_name" : "122551hp133a21.cel",
  "genomics_id" : 122551,
  "sample_id" : 343981,
  "donor_id" : 143981,
  "sample_type" : "Tissue",
  "sample_site" : "Ascending colon",
  "pathology_category" : "MALIGNANT",
  "pathology_morphology" : "Adenocarcinoma",
  "pathology_type" : "Primary malignant neoplasm of colon",
  "primary_site" : "Colon",
  "expressions" : [
    { "gene" : "X1_at", "expression" : 5.54217719084415 },
    { "gene" : "X10_at", "expression" : 3.92335121981739 },
    { "gene" : "X100_at", "expression" : 7.81638155662255 },
    { "gene" : "X1000_at", "expression" : 5.44318512260619 },
    …
  ]
}
![Page 9: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/9.jpg)
pearson correlation through map-reduce
x: 43, 21, 25, 42, 57, 59
y: 99, 65, 79, 75, 87, 81
pearson correlation: 0.52
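The slide does not show the map-reduce job itself, so the following is only a minimal sketch of the idea in plain Java, not the MongoDB job used in the talk. It assumes the correlation is computed from five running sums that can be accumulated per chunk of samples and merged afterwards, which is what makes the calculation fit a map-reduce style. The class and method names (`PearsonSums`, `add`, `merge`, `correlation`) are hypothetical.

```java
/** Partial sums for a Pearson correlation; partial results can be merged like reducer outputs. */
final class PearsonSums {
    long n;
    double sumX, sumY, sumXY, sumXX, sumYY;

    /** "Map" side: fold one (x, y) observation into the running sums. */
    PearsonSums add(double x, double y) {
        n++;
        sumX += x;
        sumY += y;
        sumXY += x * y;
        sumXX += x * x;
        sumYY += y * y;
        return this;
    }

    /** "Reduce" side: combine two partial results that were computed independently. */
    PearsonSums merge(PearsonSums other) {
        n += other.n;
        sumX += other.sumX;
        sumY += other.sumY;
        sumXY += other.sumXY;
        sumXX += other.sumXX;
        sumYY += other.sumYY;
        return this;
    }

    /** r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)); NaN if undefined. */
    double correlation() {
        double numerator = n * sumXY - sumX * sumY;
        double denominator = Math.sqrt((n * sumXX - sumX * sumX) * (n * sumYY - sumY * sumY));
        return denominator == 0.0 ? Double.NaN : numerator / denominator;
    }
}

final class PearsonDemo {
    public static void main(String[] args) {
        // The x/y pairs from the slide, split into two chunks to mimic two mappers.
        double[] x = {43, 21, 25, 42, 57, 59};
        double[] y = {99, 65, 79, 75, 87, 81};
        PearsonSums first = new PearsonSums();
        PearsonSums second = new PearsonSums();
        for (int i = 0; i < 3; i++) first.add(x[i], y[i]);
        for (int i = 3; i < 6; i++) second.add(x[i], y[i]);
        System.out.println(first.merge(second).correlation());
    }
}
```

For the six pairs on the slide this evaluates to roughly 0.53, in line with the value shown there.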
![Page 10: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/10.jpg)
co-expression graph
➡ create a node for each gene
➡ if correlation between two genes >= 0.8, draw an edge between both nodes
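The two rules above can be sketched in a few lines of Java. This is an illustration only: it keeps the graph in an in-memory adjacency map rather than in Neo4j, and the gene names and expression values are made up. `CoExpressionGraph`, `pearson` and `build` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CoExpressionGraph {

    /** Pearson correlation of two equally long expression profiles (0.0 if undefined). */
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double sa = 0, sb = 0, sab = 0, saa = 0, sbb = 0;
        for (int i = 0; i < n; i++) {
            sa += a[i];
            sb += b[i];
            sab += a[i] * b[i];
            saa += a[i] * a[i];
            sbb += b[i] * b[i];
        }
        double denominator = Math.sqrt((n * saa - sa * sa) * (n * sbb - sb * sb));
        return denominator == 0.0 ? 0.0 : (n * sab - sa * sb) / denominator;
    }

    /** One node per gene; an undirected edge whenever the correlation is >= threshold. */
    static Map<String, Set<String>> build(Map<String, double[]> profiles, double threshold) {
        Map<String, Set<String>> adjacency = new LinkedHashMap<>();
        List<String> genes = new ArrayList<>(profiles.keySet());
        for (String gene : genes) {
            adjacency.put(gene, new TreeSet<>());
        }
        for (int i = 0; i < genes.size(); i++) {
            for (int j = i + 1; j < genes.size(); j++) {
                String g1 = genes.get(i);
                String g2 = genes.get(j);
                if (pearson(profiles.get(g1), profiles.get(g2)) >= threshold) {
                    adjacency.get(g1).add(g2);
                    adjacency.get(g2).add(g1);
                }
            }
        }
        return adjacency;
    }

    public static void main(String[] args) {
        // Made-up profiles over four samples: geneB tracks geneA closely, geneC does not.
        Map<String, double[]> profiles = new LinkedHashMap<>();
        profiles.put("geneA", new double[] {1.0, 2.0, 3.0, 4.0});
        profiles.put("geneB", new double[] {2.1, 3.9, 6.2, 8.0});
        profiles.put("geneC", new double[] {4.0, 1.0, 3.5, 2.0});
        System.out.println(build(profiles, 0.8));
    }
}
```

The pairwise loop is quadratic in the number of genes, which is presumably why the talk pushes the correlation step into map-reduce for 27,000 genes.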
![Page 11: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/11.jpg)
co-expression graph
![Page 12: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/12.jpg)
graphs and time ...
➡ fluxgraph: a blueprints-compatible graph on top of Datomic
➡ make FluxGraph fully time-aware
★ travel your graph through time
★ time-scoped iteration of vertices and edges
★ temporal graph comparison
➡ towards a time-aware graph ...
➡ reproducible graph state
![Page 13: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/13.jpg)
travel through time
FluxGraph fg = new FluxGraph();
![Page 14: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/14.jpg)
travel through time
FluxGraph fg = new FluxGraph();
Vertex davy = fg.addVertex();
davy.setProperty("name", "Davy");
[diagram: a single vertex, Davy]
![Page 15: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/15.jpg)
travel through time
FluxGraph fg = new FluxGraph();
Vertex davy = fg.addVertex();
davy.setProperty("name", "Davy");
Vertex peter = ...
[diagram: vertices Davy and Peter]
![Page 16: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/16.jpg)
travel through time
FluxGraph fg = new FluxGraph();
Vertex davy = fg.addVertex();
davy.setProperty("name", "Davy");
Vertex peter = ...
Vertex michael = ...
[diagram: vertices Davy, Peter and Michael]
![Page 17: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/17.jpg)
travel through time
FluxGraph fg = new FluxGraph();
Vertex davy = fg.addVertex();
davy.setProperty("name", "Davy");
Vertex peter = ...
Vertex michael = ...
Edge e1 = fg.addEdge(davy, peter, "knows");
[diagram: Davy knows Peter; Michael unconnected]
![Page 18: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/18.jpg)
travel through time
Date checkpoint = new Date();
[diagram: Davy knows Peter; Michael unconnected]
![Page 19: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/19.jpg)
travel through time
Date checkpoint = new Date();
davy.setProperty("name", "David");
[diagram: Davy knows Peter; Michael unconnected]
![Page 20: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/20.jpg)
travel through time
Date checkpoint = new Date();
davy.setProperty("name", "David");
[diagram: David knows Peter; Michael unconnected]
![Page 21: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/21.jpg)
travel through time
Date checkpoint = new Date();
davy.setProperty("name", "David");
Edge e2 = fg.addEdge(davy, michael, "knows");
[diagram: David knows Peter and Michael]
![Page 22: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/22.jpg)
travel through time
[diagram: the graph at the checkpoint (Davy knows Peter; Michael unconnected) next to the graph at the current time (David knows Peter and Michael)]
checkpoint | current time (by default)
![Page 23: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/23.jpg)
travel through time
[diagram: the graph at the checkpoint (Davy knows Peter; Michael unconnected) next to the graph at the current time (David knows Peter and Michael)]
checkpoint | current time
fg.setCheckpointTime(checkpoint);
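To make the checkpoint idea concrete, here is a toy model of a single time-aware property in plain Java. It is not how FluxGraph or Datomic store data; it only assumes that every write is kept with its timestamp, so that a read "as of" a checkpoint returns the latest value written at or before that time. `TimeAwareProperty`, `set`, `asOf` and `current` are hypothetical names, and the timestamps are arbitrary numbers.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy time-aware property: keeps every value together with the time it was written. */
final class TimeAwareProperty {
    private final TreeMap<Long, String> history = new TreeMap<>();

    /** Records a new value without discarding earlier ones. */
    void set(long time, String value) {
        history.put(time, value);
    }

    /** Value as of the given time: the latest write at or before it, or null if none. */
    String asOf(long time) {
        Map.Entry<Long, String> entry = history.floorEntry(time);
        return entry == null ? null : entry.getValue();
    }

    /** Value at the current time, i.e. the most recent write. */
    String current() {
        return history.isEmpty() ? null : history.lastEntry().getValue();
    }
}

final class TimeTravelDemo {
    public static void main(String[] args) {
        TimeAwareProperty name = new TimeAwareProperty();
        name.set(100L, "Davy");
        long checkpoint = 150L;      // taken after the first write
        name.set(200L, "David");     // rename happens after the checkpoint

        System.out.println(name.current());        // prints David
        System.out.println(name.asOf(checkpoint)); // prints Davy
    }
}
```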
![Page 24: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/24.jpg)
time-scoped iteration
[timeline: Davy (t1) → change → Davy’ (t2) → change → Davy’’ (t3) → change → Davy’’’ (tcurrent)]
➡ how to find the version of the vertex you are interested in?
![Page 25: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/25.jpg)
time-scoped iteration
[timeline: Davy (t1), Davy’ (t2), Davy’’ (t3), Davy’’’ (tcurrent), linked by next and previous]
![Page 26: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/26.jpg)
time-scoped iteration
[timeline: Davy (t1), Davy’ (t2), Davy’’ (t3), Davy’’’ (tcurrent), linked by next and previous]
Vertex previousDavy = davy.getPreviousVersion();
![Page 27: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/27.jpg)
time-scoped iteration
[timeline: Davy (t1), Davy’ (t2), Davy’’ (t3), Davy’’’ (tcurrent), linked by next and previous]
Vertex previousDavy = davy.getPreviousVersion();
Iterable<Vertex> allDavy = davy.getNextVersions();
![Page 28: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/28.jpg)
time-scoped iteration
[timeline: Davy (t1), Davy’ (t2), Davy’’ (t3), Davy’’’ (tcurrent), linked by next and previous]
Vertex previousDavy = davy.getPreviousVersion();
Iterable<Vertex> allDavy = davy.getNextVersions();
Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
![Page 29: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/29.jpg)
time-scoped iteration
[timeline: Davy (t1), Davy’ (t2), Davy’’ (t3), Davy’’’ (tcurrent), linked by next and previous]
Vertex previousDavy = davy.getPreviousVersion();
Iterable<Vertex> allDavy = davy.getNextVersions();
Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
Interval valid = davy.getTimerInterval();
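The calls above suggest a chain of versions, each with a validity interval and links to its neighbours. A minimal stand-alone sketch of such a chain follows. It is a toy data structure, not FluxGraph's implementation, and all names in it (`VersionedValue`, `change`, `previousVersions`, `nextVersions`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy version chain: each version knows its validity interval and its neighbours. */
final class VersionedValue {
    final String value;
    final long validFrom;              // inclusive
    long validTo = Long.MAX_VALUE;     // exclusive; open-ended for the newest version
    VersionedValue previous;
    VersionedValue next;

    VersionedValue(String value, long validFrom) {
        this.value = value;
        this.validFrom = validFrom;
    }

    /** Appends a newer version at the given time and closes this version's interval. */
    VersionedValue change(String newValue, long time) {
        if (next != null) {
            throw new IllegalStateException("only the newest version can change");
        }
        VersionedValue newer = new VersionedValue(newValue, time);
        newer.previous = this;
        next = newer;
        validTo = time;
        return newer;
    }

    /** All earlier versions accepted by the filter, newest first. */
    List<VersionedValue> previousVersions(Predicate<VersionedValue> filter) {
        List<VersionedValue> result = new ArrayList<>();
        for (VersionedValue v = previous; v != null; v = v.previous) {
            if (filter.test(v)) {
                result.add(v);
            }
        }
        return result;
    }

    /** All later versions, oldest first. */
    List<VersionedValue> nextVersions() {
        List<VersionedValue> result = new ArrayList<>();
        for (VersionedValue v = next; v != null; v = v.next) {
            result.add(v);
        }
        return result;
    }
}

final class VersionChainDemo {
    public static void main(String[] args) {
        VersionedValue t1 = new VersionedValue("Davy", 1);
        VersionedValue latest = t1.change("Davy'", 2).change("Davy''", 3).change("Davy'''", 4);

        System.out.println(latest.previous.value);                                  // prints Davy''
        System.out.println(t1.nextVersions().size());                               // prints 3
        System.out.println(latest.previousVersions(v -> v.validFrom >= 2).size()); // prints 2
        System.out.println("[" + t1.validFrom + ", " + t1.validTo + ")");           // prints [1, 2)
    }
}
```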
![Page 30: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/30.jpg)
time-scoped iteration
➡ When does an element change?
➡ vertex:
★ setting or removing a property
★ being added to or removed from an edge
★ being removed
![Page 31: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/31.jpg)
time-scoped iteration
➡ When does an element change?
➡ vertex:
★ setting or removing a property
★ being added to or removed from an edge
★ being removed
➡ edge:
★ setting or removing a property
★ being removed
![Page 32: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/32.jpg)
time-scoped iteration
➡ When does an element change?
➡ vertex:
★ setting or removing a property
★ being added to or removed from an edge
★ being removed
➡ edge:
★ setting or removing a property
★ being removed
➡ ... and each element is time-scoped!
![Page 33: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/33.jpg)
temporal graph comparison
[diagram: the current graph (David knows Peter and Michael) next to the checkpoint graph (Davy knows Peter; Michael unconnected)]
➡ what changed?
![Page 34: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/34.jpg)
temporal graph comparison
➡ difference (A , B) = union (A , B) - B
➡ ... as an (immutable) graph!
![Page 35: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/35.jpg)
temporal graph comparison
➡ difference (A , B) = union (A , B) - B
➡ ... as an (immutable) graph!
difference (current, checkpoint) = [diagram: the David vertex and the new knows edge]
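The difference formula can be tried out on plain sets. The sketch below assumes each graph state is flattened into a set of facts (property values and edges) written as strings. That encoding is an illustration only and not FluxGraph's data model, and `GraphDiff.difference` is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

final class GraphDiff {

    /** difference(A, B) = union(A, B) - B, returned as an unmodifiable set. */
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.addAll(b);      // union(A, B)
        result.removeAll(b);   // minus B
        return Collections.unmodifiableSet(result);
    }

    public static void main(String[] args) {
        // Facts at the current time: v1 was renamed and got a second "knows" edge.
        Set<String> current = new LinkedHashSet<>(Arrays.asList(
                "v1.name=David", "v2.name=Peter", "v3.name=Michael",
                "v1-knows->v2", "v1-knows->v3"));
        // Facts at the checkpoint.
        Set<String> checkpoint = new LinkedHashSet<>(Arrays.asList(
                "v1.name=Davy", "v2.name=Peter", "v3.name=Michael",
                "v1-knows->v2"));

        System.out.println(difference(current, checkpoint)); // prints [v1.name=David, v1-knows->v3]
    }
}
```

The result contains exactly what is new in the current state relative to the checkpoint: the renamed vertex and the added edge, matching the picture on the slide.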
![Page 36: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/36.jpg)
use case: longitudinal patient data
[timeline t1 to t5: a patient vertex whose edges change over time: patient alone (t1), patient with smoking (t2, t3), patient with cancer (t4), patient with cancer and death (t5)]
![Page 37: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/37.jpg)
use case: longitudinal patient data
➡ historical data for 15,000 patients over a period of 10 years (2001-2010)
![Page 38: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/38.jpg)
use case: longitudinal patient data
➡ historical data for 15,000 patients over a period of 10 years (2001-2010)
➡ example analysis:
★ if a male patient is no longer smoking in 2005
★ what are the chances of getting lung cancer in 2010, comparing
• patients that smoked before 2005
• patients that never smoked
![Page 39: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/39.jpg)
use case: longitudinal patient data
➡ get all male non-smokers in 2005
fg.setCheckpointTime(new DateTime(2005,12,31).toDate());
![Page 40: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/40.jpg)
use case: longitudinal patient data
➡ get all male non-smokers in 2005
fg.setCheckpointTime(new DateTime(2005,12,31).toDate());
Iterator<Vertex> males = fg.getVertices("gender", "male").iterator();
![Page 41: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/41.jpg)
use case: longitudinal patient data
➡ get all male non-smokers in 2005
fg.setCheckpointTime(new DateTime(2005,12,31).toDate());
Iterator<Vertex> males = fg.getVertices("gender", "male").iterator();
while (males.hasNext()) {
  Vertex p2005 = males.next();
  // a patient counts as smoking in 2005 if a smokingStatus edge exists at the checkpoint
  boolean smoking2005 = p2005.getEdges(OUT, "smokingStatus").iterator().hasNext();
}
![Page 42: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/42.jpg)
use case: longitudinal patient data
➡ which patients were smoking before 2005?
boolean smokingBefore2005 = ((FluxVertex) p2005).getPreviousVersions(new TimeAwareFilter() {
  public TimeAwareElement filter(TimeAwareVertex element) {
    return element.getEdges(OUT, "smokingStatus").iterator().hasNext() ? element : null;
  }
}).iterator().hasNext();
![Page 43: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/43.jpg)
use case: longitudinal patient data
➡ which patients have cancer in 2010
// smokerws: working set of smokers
Graph g = fg.difference(smokerws, time2010.toDate(), time2005.toDate());
![Page 44: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/44.jpg)
use case: longitudinal patient data
➡ which patients have cancer in 2010
// smokerws: working set of smokers
Graph g = fg.difference(smokerws, time2010.toDate(), time2005.toDate());
➡ extract the patients that have an edge to the cancer node
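To show the shape of the whole analysis end to end, here is a stand-alone sketch that answers the same question on a handful of made-up patient records. It does not use FluxGraph. The record layout, the helper names and every value in `sample()` are hypothetical and exist only to make the cohort logic concrete.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SmokingCohorts {

    /** Hypothetical patient record; a null year means the event never happened (or has not ended). */
    static final class Patient {
        final String gender;
        final Integer smokedFrom;   // first year of smoking, inclusive
        final Integer smokedUntil;  // last year of smoking, inclusive; null = still smoking
        final Integer cancerYear;   // year of diagnosis

        Patient(String gender, Integer smokedFrom, Integer smokedUntil, Integer cancerYear) {
            this.gender = gender;
            this.smokedFrom = smokedFrom;
            this.smokedUntil = smokedUntil;
            this.cancerYear = cancerYear;
        }

        boolean smokingIn(int year) {
            return smokedFrom != null && smokedFrom <= year
                    && (smokedUntil == null || year <= smokedUntil);
        }

        boolean smokedBefore(int year) { return smokedFrom != null && smokedFrom < year; }

        boolean neverSmoked() { return smokedFrom == null; }

        boolean cancerBy(int year) { return cancerYear != null && cancerYear <= year; }
    }

    /** Step 1: all male patients that are not smoking in the given year. */
    static List<Patient> maleNonSmokers(List<Patient> all, int year) {
        List<Patient> result = new ArrayList<>();
        for (Patient p : all) {
            if ("male".equals(p.gender) && !p.smokingIn(year)) result.add(p);
        }
        return result;
    }

    /** Fraction of the group diagnosed with cancer by the given year (NaN for an empty group). */
    static double cancerRate(List<Patient> group, int year) {
        if (group.isEmpty()) return Double.NaN;
        int hits = 0;
        for (Patient p : group) {
            if (p.cancerBy(year)) hits++;
        }
        return (double) hits / group.size();
    }

    /** Made-up records, for illustration only. */
    static List<Patient> sample() {
        return Arrays.asList(
                new Patient("male", 1995, 2003, 2009),   // ex-smoker, cancer
                new Patient("male", 1990, 2004, null),   // ex-smoker, no cancer
                new Patient("male", null, null, null),   // never smoked
                new Patient("male", null, null, null),   // never smoked
                new Patient("male", 2000, null, 2010),   // still smoking in 2005: excluded
                new Patient("female", null, null, null)); // not male: excluded
    }

    public static void main(String[] args) {
        List<Patient> exSmokers = new ArrayList<>();
        List<Patient> neverSmokers = new ArrayList<>();
        // Step 2: split the 2005 non-smokers by their history before 2005.
        for (Patient p : maleNonSmokers(sample(), 2005)) {
            if (p.smokedBefore(2005)) exSmokers.add(p);
            else if (p.neverSmoked()) neverSmokers.add(p);
        }
        // Step 3: compare the cancer rate in 2010 between both groups.
        System.out.println("smoked before 2005: " + cancerRate(exSmokers, 2010));
        System.out.println("never smoked:       " + cancerRate(neverSmokers, 2010));
    }
}
```

In the talk the same three steps map onto the checkpoint, `getPreviousVersions` and `difference` calls shown on the slides.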
![Page 45: The power of graphs to analyze biological data](https://reader034.fdocuments.us/reader034/viewer/2022051412/54825b38b47959000d8b4792/html5/thumbnails/45.jpg)
Questions?