Data Visualization in Molecular Biology

Data Visualization in Molecular Biology Alexander Lex July 29, 2013


Transcript of Data Visualization in Molecular Biology

Page 1: Data Visualization in Molecular Biology

Data Visualization in Molecular Biology

Alexander Lex

July 29, 2013

Page 2: Data Visualization in Molecular Biology

2

research requires understanding data

but there is so much of it…

Page 3: Data Visualization in Molecular Biology

3

What is important? Where are the connections?

Page 4: Data Visualization in Molecular Biology

4

methylation levels

mRNA expression

copy number status

mutation status

microRNA expression

clinical parameters

pathways

protein expression

sequencing data

publications

Page 5: Data Visualization in Molecular Biology

5

Data Visualization…

… makes the data accessible

… combines strengths of humans and computers

… enables insight

… communicates

Page 6: Data Visualization in Molecular Biology

6

THREE TOPICS

Teaser Picture

Page 7: Data Visualization in Molecular Biology

7

Cancer Subtype Analysis

Page 8: Data Visualization in Molecular Biology

8

Pathways & Experimental Data

Page 9: Data Visualization in Molecular Biology

9

Managing Pathways & Cross-Pathway Analysis

Page 10: Data Visualization in Molecular Biology

10

Who am I?

PostDoc @ Harvard, Hanspeter Pfister’s Group

PhD from TU Graz, Austria

Co-leader of Caleydo Project

Page 11: Data Visualization in Molecular Biology

11

What is Caleydo?

Software analyzing molecular biology data

Software for doing research in visualization

developed in academic setting

platform for trying out radically new visualization ideas

Quest for compromise between academic prototyping and ready-to-use software

Page 12: Data Visualization in Molecular Biology

12

What is Caleydo?

Open source platform for developing visualization and data analysis techniques

easily extendible due to plug-in architecture

you can create your own views

you can plug in your own algorithms

Page 13: Data Visualization in Molecular Biology

13

The Team

Marc Streit, Johannes Kepler University Linz, AT

Christian Partl, Graz University of Technology, AT

Samuel Gratzl, Johannes Kepler University Linz, AT

Nils Gehlenborg, Harvard Medical School, Boston, USA

Dieter Schmalstieg, Graz University of Technology, AT

Hanspeter Pfister, Harvard University, Cambridge, USA

Page 14: Data Visualization in Molecular Biology

14

CANCER SUBTYPE VISUALIZATION

Caleydo StratomeX

Page 15: Data Visualization in Molecular Biology

15

Cancer Subtypes

Cancer types are not homogeneous

They are divided into subtypes

different histology

different molecular alterations

Subtypes have serious implications

different treatment for subtypes

prognosis varies between subtypes

Page 16: Data Visualization in Molecular Biology

16

Cancer Subtype Analysis

Done using many different types of data, for large numbers of patients.

Page 17: Data Visualization in Molecular Biology

18

Goal: Support tumor subtype characterization through integrative visual analysis of cancer genomics data sets.

Page 18: Data Visualization in Molecular Biology

19

Stratification

Patients

Tabular Data

Candidate Subtypes

Genes, Proteins, etc.

Page 19: Data Visualization in Molecular Biology

20

Stratification of a Single Dataset

Cluster A1

Cluster A2

Cluster A3

Subtypes are identified by stratifying datasets, e.g.,

based on an expression pattern

a mutation status

a copy number alteration

a combination of these
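As a rough illustration of the idea on this slide, the sketch below stratifies patients by a categorical attribute and then combines two stratifications by intersecting their groups. The patient IDs, attribute values, and the function names `stratify` and `combine` are hypothetical and chosen only for this example; this is not Caleydo's implementation.

```python
from collections import defaultdict


def stratify(values):
    """Group patient IDs by a categorical value (e.g., mutation status)."""
    groups = defaultdict(set)
    for patient, value in values.items():
        groups[value].add(patient)
    return dict(groups)


def combine(strat_a, strat_b):
    """Combine two stratifications by intersecting every pair of groups."""
    combined = {}
    for name_a, group_a in strat_a.items():
        for name_b, group_b in strat_b.items():
            shared = group_a & group_b
            if shared:  # keep only non-empty combinations
                combined[(name_a, name_b)] = shared
    return combined


# Hypothetical toy data: four patients, two categorical datasets.
mutation = {"P1": "mutated", "P2": "wild type", "P3": "mutated", "P4": "wild type"}
copy_number = {"P1": "amplified", "P2": "normal", "P3": "normal", "P4": "normal"}

by_mutation = stratify(mutation)
by_both = combine(by_mutation, stratify(copy_number))
```

Stratifying by an expression pattern would replace `stratify` with a clustering step, but the result has the same shape: named groups of patients.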

Page 20: Data Visualization in Molecular Biology

21

Patients

Genes

Candidate Subtype / Heat Map

Header / Summary of whole Stratification

Page 21: Data Visualization in Molecular Biology

22

Stratification of Multiple Datasets

Cluster A1

Cluster A2

Cluster A3

B1

B2

Tabular, e.g., mRNA

Categorical, e.g., mutation status

Page 22: Data Visualization in Molecular Biology

23

Stratification of Multiple Datasets

Cluster A1

Cluster A2

Cluster A3

B1

B2

Tabular, e.g., mRNA

Categorical, e.g., mutation status

Page 23: Data Visualization in Molecular Biology

24

Clustering of mRNA Data

Stratification on Copy Number Status

Many shared Patients

Page 24: Data Visualization in Molecular Biology

25

Stratification of Multiple Datasets

Cluster A1

Cluster A2

Cluster A3

B1

B2

Tabular, e.g., mRNA

Categorical, e.g., mutation status

Page 25: Data Visualization in Molecular Biology

26

Stratification of Multiple Datasets

Cluster A1

Cluster A2

Cluster A3

B1

B2

Dependent Data, e.g., clinical data

Dep. C1

Dep. C2

Tabular, e.g., mRNA

Categorical, e.g., mutation status

Page 26: Data Visualization in Molecular Biology

27

Survival data in Kaplan-Meier plots
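For readers unfamiliar with these plots, here is a minimal sketch of the standard Kaplan-Meier estimator that such a plot draws, computed once per patient group. The function name `kaplan_meier` and the toy follow-up times are assumptions made for illustration; the talk does not show how Caleydo computes the curves.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] for each time with an observed event.

    times:  follow-up durations, one per patient
    events: True if the event (e.g., death) was observed, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored patients both leave the risk set
    return curve


# Hypothetical group of five patients (times in arbitrary units).
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
```

Computing one curve per candidate subtype and plotting them together shows whether the stratification separates patients by survival.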

Page 27: Data Visualization in Molecular Biology

28

Detail View

Page 28: Data Visualization in Molecular Biology

29

Dependent Pathway

Page 29: Data Visualization in Molecular Biology

30

Page 30: Data Visualization in Molecular Biology

31

Stratification based on clinical variable (gender)

Page 31: Data Visualization in Molecular Biology

32

How to Choose Stratifications?

~ 15 clusterings per matrix

~ 15,000 stratifications for copy number & mutations

~ 500 pathways

~ 20 clinical variables

Calculating scores for matches

Ranking the results
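The score-then-rank step can be sketched as follows. The example assumes the Jaccard index as the match score between a query group and the groups of each candidate stratification; the slide does not name a specific score, and the dataset names, patient IDs, and function names are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two patient sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_stratifications(query_group, candidates):
    """Rank candidate stratifications by their best-matching group.

    candidates maps a stratification name to a list of patient sets.
    Returns (score, name) pairs, best match first.
    """
    scored = [
        (max((jaccard(query_group, g) for g in groups), default=0.0), name)
        for name, groups in candidates.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical query: one candidate subtype containing three patients.
query = {"P1", "P2", "P3"}
candidates = {
    "mRNA clustering": [{"P1", "P2", "P3", "P4"}, {"P5", "P6"}],
    "mutation status": [{"P1", "P5"}, {"P2", "P3", "P4", "P6"}],
    "gender": [{"P1", "P2", "P3"}, {"P4", "P5", "P6"}],
}
ranking = rank_stratifications(query, candidates)
```

With thousands of candidate stratifications, a ranked list like this lets the analyst inspect only the top matches instead of browsing all of them.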

Page 32: Data Visualization in Molecular Biology

33

Query column Result column

Ranked Stratifications

Considered Datasets

Page 33: Data Visualization in Molecular Biology

34

Algorithms for finding…

… matching stratification

… matching subtype

… mutual exclusivity

… relevant pathway

… stratification with significant effect in survival

… high/low structural variation

Page 34: Data Visualization in Molecular Biology

35

Live-Demo!

http://stratomex.caleydo.org

Page 35: Data Visualization in Molecular Biology

36

PATHWAYS & EXPERIMENTAL DATA

Caleydo enRoute

Teaser Picture

Page 36: Data Visualization in Molecular Biology

37

Experimental Data and Pathways

Pathways cannot account for variation found in real-world data

Branches can be (in)activated due to

mutation,

changed gene expression,

modulation due to drug treatment,

etc.

Page 37: Data Visualization in Molecular Biology

38

Why use Visualization?

Efficient communication of information

A -3.4
B 2.8
C 3.1
D -3
E 0.5
F 0.3

[Figure: graph with nodes A, B, C, D, E, F]

Page 38: Data Visualization in Molecular Biology

39

Experimental Data and Pathways

[Lindroos2002]

[KEGG]

Page 39: Data Visualization in Molecular Biology

40

Five Requirements

Ideal visualization technique addresses all

Talking about 3 today

Page 40: Data Visualization in Molecular Biology

41

R I: Data Scale

Large number of experiments

Large datasets have more than 500 experiments

Multiple groups/conditions

Page 41: Data Visualization in Molecular Biology

42

R II: Data Heterogeneity

Different types of data, e.g.,

mRNA expression: numerical

mutation status: categorical

copy number variation: ordered categorical

metabolite concentration: numerical

Require different visualization techniques

Page 42: Data Visualization in Molecular Biology

43

R V: Supporting Multiple Tasks

Two central tasks:

Explore topology of pathway

Explore the attributes of the nodes (experimental data)

Need to support both!

[Figure: pathway graph with nodes A, B, C, D, E, F]

Page 43: Data Visualization in Molecular Biology

44

Concept

Pathway View

enRoute View

[Figure: Pathway View with nodes A, B, C, D, E, F; enRoute View listing nodes B, C, F, A, D, E against the columns Group 1 (Dataset 1), Group 2 (Dataset 1), and Group 1 (Dataset 2)]

Page 44: Data Visualization in Molecular Biology

45

Pathway View

On-Node Mapping

Path highlighting with Bubble Sets [Collins2009]

IGF-1

low high
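On-node mapping encodes a value per node as a color between "low" and "high". A minimal sketch of such a mapping is shown below, assuming a blue-white-red diverging scale clamped to a symmetric range; the color scheme, the `limit` parameter, and the function name are assumptions, not taken from the talk. The sample values are the A to F values from the "Why use Visualization?" slide.

```python
def value_to_rgb(value, limit=3.0):
    """Map a value to an (r, g, b) color: blue = low, white = 0, red = high.

    Values outside [-limit, limit] are clamped to the end colors.
    """
    t = max(-1.0, min(1.0, value / limit))
    fade = int(255 * (1 - abs(t)))  # 255 at zero, 0 at either extreme
    if t >= 0:
        return (255, fade, fade)
    return (fade, fade, 255)


# Example values per node, as listed earlier in the talk.
node_values = {"A": -3.4, "B": 2.8, "C": 3.1, "D": -3, "E": 0.5, "F": 0.3}
node_colors = {node: value_to_rgb(v) for node, v in node_values.items()}
```

A single color per node works for one value per gene; it does not scale to many experiments, which is the motivation for the separate enRoute View.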

Page 45: Data Visualization in Molecular Biology

46

enRoute View

Path Representation

[Figure: enRoute View listing nodes B, C, F, A, D, E against the columns Group 1 (Dataset 1), Group 2 (Dataset 1), and Group 1 (Dataset 2)]

Page 46: Data Visualization in Molecular Biology

47

enRoute View

Experimental Data Representation

[Figure: enRoute View listing nodes B, C, F, A, D, E against the columns Group 1 (Dataset 1), Group 2 (Dataset 1), and Group 1 (Dataset 2)]

Page 47: Data Visualization in Molecular Biology

48

Experimental Data Representation

Gene Expression Data (Numerical)

Copy Number Data (Ordered Categorical)

Mutation Data

Page 48: Data Visualization in Molecular Biology

49

enRoute View – Putting All Together

Page 49: Data Visualization in Molecular Biology

50

CCLE Cell lines & Cancer Drugs

Analysis by Anne Mai Wassermann

Page 50: Data Visualization in Molecular Biology

51

MANAGING PATHWAYS & CROSS-PATHWAY ANALYSIS

Collaboration with AM Wassermann, M Borowsky, M Glick @NIBR

Teaser Picture

Page 51: Data Visualization in Molecular Biology

52

Pathways

Partitioning in pathways is artificial

Purpose: reduce complexity

“Relevant” subset of nodes and edges

Makes it hard to

understand cross-talk

identify role of nodes in other pathways

Page 52: Data Visualization in Molecular Biology

53

[Klukas & Schreiber 2006]

Page 53: Data Visualization in Molecular Biology

54

Solution: Contextual Subsets

Page 54: Data Visualization in Molecular Biology

55

Solution: Contextual Subsets

Page 55: Data Visualization in Molecular Biology

56

Page 56: Data Visualization in Molecular Biology

57

Levels of Detail

Experimental data highlights

Thumbnail showing paths

Page 57: Data Visualization in Molecular Biology

58

How to Select Pathways?

Search pathway

Find pathways that contain focus node

Find pathway that is similar to another one
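The two lookup strategies can be sketched on pathways represented as plain node sets. The pathway names and contents below are hypothetical, and measuring similarity as the Jaccard overlap of node sets is an assumption for illustration; the talk does not specify the similarity measure.

```python
def pathways_containing(node, pathways):
    """Names of all pathways whose node set contains the focus node."""
    return sorted(name for name, nodes in pathways.items() if node in nodes)


def rank_by_similarity(reference, pathways):
    """Rank all other pathways by node-set overlap with the reference pathway."""
    ref_nodes = pathways[reference]
    scored = []
    for name, nodes in pathways.items():
        if name == reference:
            continue
        union = ref_nodes | nodes
        score = len(ref_nodes & nodes) / len(union) if union else 0.0
        scored.append((score, name))
    return sorted(scored, reverse=True)


# Hypothetical pathways over the nodes A to F.
pathways = {
    "Pathway 1": {"A", "B", "C", "D"},
    "Pathway 2": {"C", "D", "E"},
    "Pathway 3": {"E", "F"},
}
with_c = pathways_containing("C", pathways)
similar = rank_by_similarity("Pathway 2", pathways)
```

Either result can feed a ranked pathway list from which the analyst picks the pathways to open.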

Page 58: Data Visualization in Molecular Biology

59

Ranked pathway list

Datasets

Selected Path

Page 59: Data Visualization in Molecular Biology

60

Integrating enRoute

Page 60: Data Visualization in Molecular Biology

61

Video!

http://enroute.caleydo.org

Page 61: Data Visualization in Molecular Biology

62

More Information

http://caleydo.org

Software, Help, Project Information, Publications, Videos

Page 62: Data Visualization in Molecular Biology

63

Data Visualization In Molecular Biology

Alexander Lex, Harvard

[email protected]

alexander-lex.com

?

Credits:

Marc Streit, Nils Gehlenborg, Hanspeter Pfister, Anne Mai Wassermann, Mark Borowsky, Christian Partl, Denis Kalkofen, Samuel Gratzl, Dieter Schmalstieg