Deep Motif Dashboard: Visualizing and Understanding ... · DeMo Dashboard - Lanchantin, Singh,...

Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks. Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi. University of Virginia, Department of Computer Science. deepmotif.org

Page 1:

Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi

University of Virginia, Department of Computer Science

deepmotif.org

Page 2:

“Dog”

DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia

Page 3:

“Dog”

Page 4:

“Dog”

ATGCGATCAAGTCTG “Protein Binding Site”

Page 5:

“Dog”

ATGCGATCAAGTCTG “Protein Binding Site”

Page 6:

ATGCGATCAAGTCTG “Protein Binding Site”

???

Page 7:

ATGCGATCAAGTCTG “Protein Binding Site”

Deep Motif Dashboard: Opening the black box for deep-learning-based genomic sequence classifications

Page 8:

Deep Motif Dashboard: Opening the black box for deep-learning-based genomic sequence classifications

ATGCGATCAAGTCTG “Protein Binding Site”

Page 9:

● Introduction
● TFBS Classification Task
● Neural Models
● Visualization Methods
● Evaluation and Results

Page 10:

Transcription Factor Binding Sites (TFBSs)

GCGACGAATCG AACGATATGCT CATATCATTTC TGTCAAG CTCGAGTC TATCAAGCTG

[Figure labels: TF1, TF2, TF3]

Page 11:

TFBS Classification Datasets

[Figure labels: TF1, TF2, TF3, each shown with a group of example sequences]

AACGATATGCT, TGTCAAGCAAG, ATATCGATATA, AGCATATGCGA

GCGACGAATCG, CTCGAGTCTCA, CGATATGCTTC, AAGAAGCATTA

CATATCATTTC, TATCAAGCTGG, CGAATGCATAC, ACGACGATTAT

Page 12:

Deep Motif (DeMo) Dashboard Approach

GAAGCTTGTACGCTATGGACTCGATCGAATCGCATGTCATGAGATCATGCTTCATCTCTCGATCGAATCGCATATGTGTCAACTATGCTCTCGAA

1.

[Figure labels: TFBS, NO TFBS, TFBS, TFBS, NO TFBS]

Page 13:

Deep Motif (DeMo) Dashboard Approach

GAAGCTTGTACGCTATGGACTCGATCGAATCGCATGTCATGAGATCATGCTTCATCTCTCGATCGAATCGCATATGTGTCAACTATGCTCTCGAA

1.

2.

[Figure labels: TFBS, NO TFBS, TFBS, TFBS, NO TFBS]

Page 14:

● Introduction
● TFBS Classification Task
● Neural Models
● Visualization Methods
● Evaluation and Results

Page 15:

Neural Network Models

1. Convolutional (CNN)
2. Recurrent (RNN)
3. Convolutional-Recurrent (CNN-RNN)

Page 16:

3 Neural Models

Input Sequence → Probability of Binding Site

1. Convolutional (CNN): short local patterns, or motifs
2. Recurrent (RNN): long-term dependencies
3. Convolutional-Recurrent (CNN-RNN): long-term dependencies among motifs
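To make "short local patterns, or motifs" concrete, here is a minimal sketch of how a single convolutional filter could scan a one-hot encoded sequence. It is not the DeMo Dashboard code: the helper names (`one_hot`, `conv1d_scan`) and the hand-written "GATC" filter are assumptions chosen for illustration, and a trained CNN would learn its filter weights and pass the scores on to further layers.

```python
# Hypothetical helper names; none of this comes from the DeMo Dashboard code.
NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows, one row per nucleotide."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]

def conv1d_scan(encoded, kernel):
    """Slide a (width x 4) filter along the sequence; return one score per window."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            for channel in range(4):
                score += kernel[offset][channel] * encoded[start + offset][channel]
        scores.append(score)
    return scores

# Illustrative filter: it responds most strongly to the pattern "GATC".
gatc_filter = one_hot("GATC")

scores = conv1d_scan(one_hot("ATGCGATCAAGTCTG"), gatc_filter)
best_position = max(range(len(scores)), key=scores.__getitem__)
print(best_position, scores[best_position])
```

The filter score peaks at the window where the sequence matches the pattern exactly, which is the sense in which a CNN filter acts as a motif detector.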

Page 17:

● Introduction
● TFBS Classification Task
● Neural Models
● Visualization Methods
● Evaluation and Results

Page 18:

Visualization Methods

1. Saliency Maps
2. Temporal Output Values
3. Class Optimization

Page 19:

1. Saliency Map

Which nucleotides are most important for classification?

X → S+ (positive binding site)

Page 20:

1. Saliency Map

X → S+ (positive binding site)

[Equation not recovered] = “saliency map”

Page 21:

1. Saliency Map

X → S+ (positive sentiment)

This movie has one of the best plots I have seen

[Legend: highlighted words = important for classification]

Page 22:

1. Saliency Map

X → S+ (positive binding site)

[Figure: Positive Test Sequence with its Saliency Map; legend: highlighted = important nucleotide for prediction]
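The saliency equation on these slides did not survive extraction, so the following is only a sketch of one common formulation, assumed here: take the derivative of the score S+ with respect to the input X, keep it only at the observed nucleotides, and take the absolute value. The scorer `toy_score` is a hypothetical stand-in for a trained model, and central finite differences are swapped in for backpropagation so the example stays self-contained.

```python
import math

NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows, one row per nucleotide."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]

def toy_score(encoded):
    """Hypothetical stand-in for a trained model: returns S+ in (0, 1).

    It rewards a G immediately followed by an A anywhere in the sequence.
    """
    total = 0.0
    for i in range(len(encoded) - 1):
        total += encoded[i][2] * encoded[i + 1][0]  # column 2 = G, column 0 = A
    return 1.0 / (1.0 + math.exp(-(total - 1.0)))

def saliency_map(score_fn, encoded, eps=1e-4):
    """Approximate |dS+/dX * X| per position using central finite differences."""
    saliency = []
    for row in encoded:
        position_total = 0.0
        for c in range(4):
            if row[c] == 0.0:
                continue  # masking by the one-hot input keeps only the observed base
            original = row[c]
            row[c] = original + eps
            up = score_fn(encoded)
            row[c] = original - eps
            down = score_fn(encoded)
            row[c] = original  # restore the input before moving on
            position_total += abs((up - down) / (2 * eps) * original)
        saliency.append(position_total)
    return saliency

sequence = "TTGATT"
saliency = saliency_map(toy_score, one_hot(sequence))
top_two = sorted(range(len(sequence)), key=lambda i: saliency[i], reverse=True)[:2]
print(sorted(top_two))
```

For this toy scorer only the G and the A that form the rewarded pair receive nonzero saliency, which mirrors the slide's legend of "important nucleotide for prediction".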

Page 23:

2. Temporal Output Values

What are the model’s predictions at each timestep of the DNA sequence?

X → S+ (positive binding site)

Page 24:

2. Temporal Output Values

Check the RNN’s prediction scores as its input is extended one position at a time, from the beginning to the end of a sequence.

X → S+ (positive binding site)

Page 25:

2. Temporal Output Values

X → S+ (positive sentiment)

I don’t like the actors, but I really enjoyed this movie

[Legend: color-coded as negative sentiment or positive sentiment]

Page 26:

2. Temporal Output Values

X → S+ (positive binding site)

[Figure: Positive Test Sequence with RNN Forward Output and RNN Backward Output; legend: color-coded as negative binding site prediction or positive binding site prediction]
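The temporal output idea can be sketched as scoring every prefix of the sequence with a recurrent model. The one-unit cell below uses hand-picked, untrained weights (an assumption for illustration, not values from the talk), and for simplicity the backward direction reuses the same cell on the reversed sequence, whereas a bidirectional RNN would typically have separate backward weights.

```python
import math

# Illustrative, hand-picked weights for a one-unit recurrent cell (not trained values).
INPUT_WEIGHTS = {"A": 0.8, "C": -0.5, "G": 0.6, "T": -0.7}
RECURRENT_WEIGHT = 0.9
OUTPUT_WEIGHT = 2.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rnn_score(seq):
    """Read the sequence left to right and classify from the last hidden state."""
    hidden = 0.0
    for base in seq:
        hidden = math.tanh(RECURRENT_WEIGHT * hidden + INPUT_WEIGHTS[base])
    return sigmoid(OUTPUT_WEIGHT * hidden)

def temporal_outputs(seq):
    """Return the score after each prefix seq[:1], seq[:2], ..., seq."""
    return [rnn_score(seq[:end]) for end in range(1, len(seq) + 1)]

sequence = "TTCGAGA"
forward = temporal_outputs(sequence)
# Backward direction: read right to left, then flip so index t lines up with position t.
backward = temporal_outputs(sequence[::-1])[::-1]

for base, f, b in zip(sequence, forward, backward):
    print(base, round(f, 3), round(b, 3))
```

Scores below 0.5 would be drawn as negative predictions and scores above 0.5 as positive ones, so the printout shows where along the sequence the model's decision changes.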

Page 27:

Visualization Methods

Sequence Specific:
1. Saliency Maps
2. Temporal Output Values

TF Specific:
3. Class Optimization

Page 28:

3. Class Optimization

For a particular TF, what does the optimal binding site sequence look like?

? → positive binding site for TF “CBX3”

Page 29:

3. Class Optimization

positive binding site for TF “CBX3”

[Equation not recovered]

where X is the input sequence and the score S+ is the probability of sequence X being a positive binding site

Page 30:

3. Class Optimization

Optimal binding site for TF “CBX3”

positive binding site for TF “CBX3”
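The objective on the previous slide was not recovered, so the sketch below assumes a common formulation of class optimization: gradient ascent on a continuous input X to maximize S+(X) minus an L2 penalty on X. The scorer, its weights, the step count, and the regularization strength are all illustrative assumptions, and the gradient is written out by hand for this toy scorer rather than obtained from a trained network.

```python
import math

NUCLEOTIDES = "ACGT"

# Illustrative position-specific weights (rows = positions, columns = A, C, G, T).
WEIGHTS = [
    [-1.0, -0.5, 2.0, -0.5],  # favors G
    [2.0, -0.5, -1.0, -0.5],  # favors A
    [-0.5, -1.0, -0.5, 2.0],  # favors T
    [-0.5, 2.0, -0.5, -1.0],  # favors C
]
BIAS = -3.0

def score(x):
    """Toy S+(X): logistic function of a weighted sum over a continuous 4-column input."""
    z = BIAS + sum(w * v for w_row, x_row in zip(WEIGHTS, x) for w, v in zip(w_row, x_row))
    return 1.0 / (1.0 + math.exp(-z))

def class_optimize(steps=200, learning_rate=0.5, l2=0.01):
    """Gradient ascent on X for the objective S+(X) - l2 * ||X||^2, starting from zeros."""
    x = [[0.0] * 4 for _ in WEIGHTS]
    for _ in range(steps):
        s = score(x)
        for i, w_row in enumerate(WEIGHTS):
            for c, w in enumerate(w_row):
                gradient = s * (1.0 - s) * w - 2.0 * l2 * x[i][c]
                x[i][c] += learning_rate * gradient
    return x

optimized = class_optimize()
# Read off the strongest nucleotide at each position of the optimized input.
consensus = "".join(NUCLEOTIDES[max(range(4), key=row.__getitem__)] for row in optimized)
print(consensus, round(score(optimized), 3))
```

The optimized input is TF-specific rather than sequence-specific: it depends only on the model for one TF, not on any test sequence.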

Page 31:

● Introduction
● TFBS Classification Task
● Neural Models
● Visualization Methods
● Evaluation and Results

Page 32:

Experimental Setup

Dataset
● Alipanahi et al. “Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning”. Nature Biotechnology 2015.
● 108 cancer cell TFs (train a separate model for each TF)
● Each sequence has length 101 and is centered on a ChIP-seq peak

Models
● Test several variations of 3 different models (CNN, RNN, CNN-RNN)

Evaluation
● Compare models using AUC scores on the test set
● Evaluate visualization methods manually and by motif matching
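For readers unfamiliar with the AUC metric used to compare models, here is a small sketch of how it can be computed for one TF's test set. The labels and scores are made up for illustration, and the pairwise definition below is chosen for clarity, not speed.

```python
def auc_score(labels, scores):
    """AUC = chance that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up test labels (1 = binding site, 0 = not) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = auc_score(labels, scores)
print(round(auc, 3))
```

An AUC of 1.0 means every positive sequence is ranked above every negative one, while 0.5 corresponds to random ranking.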

Page 33:

Model Accuracy (AUC Scores)

[Figure: AUC scores; legend label: Our Models]

Page 34:

1. Saliency Maps

[Figure: GATA1; legend: highlighted = important nucleotide for prediction]

Page 35:

2. Temporal Output Values

[Figure: NFYB; legend: color-coded as positive binding site prediction or negative binding site prediction]

Page 36:

Saliency Map AND Temporal Output Values

[Figure: NFYB]

Page 37:

3. Class Optimization

[Figure: GATA1]

Page 38:

DeMo Dashboard

Page 39:

Deep Motif (DeMo) Dashboard Contributions and Results

1. Comparative analysis of 3 different neural models on the TFBS task

● CNN-RNNs perform the best

2. Presented 3 different visualization techniques to understand the predictions of neural models

● Although TFBSs are influenced by motifs, the interactions among motifs are also important

Page 40:

Thank You!

UVA Machine Learning and Biomedicine Group

Ritambhara Singh, Beilun Wang, Dr. Yanjun Qi

code available at: deepmotif.org