Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

This presentation is available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Please refer to http://www.bits.vib.be/ if you use this presentation or parts hereof. RNA-seq for DE analysis training The biology behind expression differences Joachim Jacob 22 and 24 April 2014

description

Sixth part of the training session 'RNA-seq for Differential expression analysis'. We explain how we extract biologically meaningful results from differential expression analysis results, based on RNA-seq. Interested in following this session? Please contact http://www.jakonix.be/contact.html

Transcript of Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Page 1: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

This presentation is available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Please refer to http://www.bits.vib.be/ if you use this presentation or parts hereof.

RNA-seq for DE analysis training

The biology behind expression differences

Joachim Jacob

22 and 24 April 2014

Page 2: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Overview

http://www.nature.com/nprot/journal/v8/n9/full/nprot.2013.099.html

Page 3: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Analyzing the DE analysis results

The 'detect differential expression' tool gives you four results:

● The report, including graphs.

● Only the genes below the cut-off, with independent filtering applied.

● All genes, with independent filtering applied.

● The complete DESeq results, without independent filtering applied.

Page 5: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Setting a cut-off

You choose a cut-off! You can go over the genes one by one, look for 'interesting' genes, and try to link them to the experimental conditions.

Alternative: we can take all genes, ranked by their p-value (which stands for a 'level of surprise'). Pro: we don't need our arbitrary cut-off.

Page 6: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Analysis of the list of DE genes

All genes (6666 yeast genes)

Genes sensible to test (filtered out 10% of the lowest genes) (5830 yeast genes)

DE genes with p-value cut-off of 0.01 (637 genes)

Page 7: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Gene set enrichment

We use the knowledge already available on biology. We construct lists of genes for:

● Pathways

● Biological processes

● Cellular components

● Molecular functions

● Transcription factor binding sites

● ...

http://wiki.bits.vib.be/index.php/Gene_set_enrichment_analysis

Page 8: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Getting lists of genes

● Gene Ontology consortium

● Reactome

Page 9: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

A many-to-many relation

Linking gene IDs to molecular function.

... to binding partners.

... to transcription factor binding sites.

Page 10: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Biomart can help you fetch sets

Page 11: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Biomart can help you

Page 12: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Contingency approach

DE results: 637/5830 significantly DE genes

Gene set 1: 15/56 significantly DE genes

Is the proportion of DE genes equal? (hypergeometric test)

Page 13: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Contingency approach

DE results: 637/5830

Gene set 2: 5/30

Page 14: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Contingency approach

DE results: 637/5830

Gene set 3: 34/78

Not equal! Gene set enriched
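The comparison on the three 'Contingency approach' slides can be sketched in a few lines of Python. This is a minimal illustration, not the code behind any particular tool: it takes the counts from the slides (637 DE genes among 5830 tested, and 15/56, 5/30 and 34/78 for the three gene sets) and computes a one-sided hypergeometric p-value for over-representation. The function name `hypergeom_upper_tail` is made up for this example, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_upper_tail(n_background, n_de_background, n_set, n_de_set):
    """Probability of seeing at least n_de_set DE genes in a gene set of
    size n_set, if the set were drawn at random from the background."""
    total = comb(n_background, n_set)
    max_de = min(n_set, n_de_background)
    favourable = sum(
        comb(n_de_background, k)
        * comb(n_background - n_de_background, n_set - k)
        for k in range(n_de_set, max_de + 1)
    )
    return favourable / total


# Counts taken from the slides: 637 DE genes among 5830 tested genes.
BACKGROUND, DE = 5830, 637

# (DE genes in the set, size of the set), as shown on the slides.
gene_sets = {
    "Gene set 1": (15, 56),
    "Gene set 2": (5, 30),
    "Gene set 3": (34, 78),
}

p_values = {
    name: hypergeom_upper_tail(BACKGROUND, DE, size, de)
    for name, (de, size) in gene_sets.items()
}

for name, p in p_values.items():
    de, size = gene_sets[name]
    print(f"{name}: {de}/{size} DE, p = {p:.3g}")
```

A small p-value means the gene set contains more DE genes than expected from the background proportion of 637/5830. Changing `BACKGROUND` and `DE` shows how strongly the result depends on the chosen background and cut-off.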

Page 15: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Artificial?

DE results

But our cut-off remains artificial, arbitrarily chosen. Rerun with a different cut-off: you will detect other significant sets!

The background needs to be carefully chosen. This approach favors gene sets with genes whose expression differs a lot ('high level of surprise', p-value).

Pick me!

Page 16: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Contingency table approach tools

http://wiki.bits.vib.be/index.php/Gene_set_enrichment_analysis

Page 17: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

DAVID uses the contingency approach

Need to define the complete gene set tested!

Your list of DE genes

Page 18: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Cut-off free approach: GSEA

No cut-off needs to be chosen using GSEA and derived methods!

We take into account all genes for which we get a reliable p-value (see the p-value histogram chart).

The genes are sorted/ranked according to 'level of surprise', i.e. by their p-value. Other options are test statistics (T, ...).

Page 19: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Intuition of GSEA

Gene set 1 (genes ranked along the p-value axis, from 0 to 1)

Running sum: every occurrence increases the sum, every absence decreases the sum. The maximum is the MES, the final score.

Mootha et al. http://www.nature.com/ng/journal/v34/n3/full/ng1180.html

Page 20: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Intuition of GSEA

Genes ranked along the p-value axis, from 0 to 1:

Gene set 2: higher running sum MES

Gene set 3: median running sum MES

Gene set 4: low running sum MES

The scores are compared to those of permuted/shuffled gene sets (sample label versus gene label permutation).
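The running-sum idea can be sketched as follows. This is a simplified, unweighted illustration and not the implementation of GSEA or any derived tool: genes are assumed to be already ranked by p-value, a hit adds to the sum, a miss subtracts from it, and the step sizes are chosen (an assumption of this sketch) so that the sum returns to zero at the end of the list. Significance is estimated with gene label permutation only. The function names and the toy gene names are hypothetical.

```python
import math
import random


def max_running_sum(ranked_genes, gene_set):
    """Walk down the ranked list; members of the set increase the running
    sum, non-members decrease it. Return the maximum that is reached."""
    n_total = len(ranked_genes)
    n_hits = sum(1 for gene in ranked_genes if gene in gene_set)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must cover part of the ranked list")
    # Step sizes chosen so that the running sum ends at zero.
    up = math.sqrt((n_total - n_hits) / n_hits)
    down = math.sqrt(n_hits / (n_total - n_hits))
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += up if gene in gene_set else -down
        best = max(best, running)
    return best


def gene_label_permutation_p(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Compare the observed score with scores of random gene sets of the
    same size (gene label permutation)."""
    rng = random.Random(seed)
    observed = max_running_sum(ranked_genes, gene_set)
    size = sum(1 for gene in ranked_genes if gene in gene_set)
    at_least_as_high = 0
    for _ in range(n_perm):
        shuffled_set = set(rng.sample(ranked_genes, size))
        if max_running_sum(ranked_genes, shuffled_set) >= observed:
            at_least_as_high += 1
    return (at_least_as_high + 1) / (n_perm + 1)


# Toy data: 100 hypothetical genes, already sorted by increasing p-value.
ranked = [f"g{i}" for i in range(100)]
top_set = {f"g{i}" for i in range(10)}             # all near the top
spread_set = {f"g{i}" for i in range(5, 100, 10)}  # evenly spread out

score_top = max_running_sum(ranked, top_set)
score_spread = max_running_sum(ranked, spread_set)
p_top = gene_label_permutation_p(ranked, top_set)
p_spread = gene_label_permutation_p(ranked, spread_set)
print(f"top set:    score = {score_top:.2f}, p = {p_top:.4f}")
print(f"spread set: score = {score_spread:.2f}, p = {p_spread:.4f}")
```

A set whose genes cluster at the top of the ranking reaches a high maximum, while a set spread evenly over the list stays close to zero. Sample label permutation, the other option named on the slide, would require the raw count data and is not shown here.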

Page 21: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Cut-off free approach: GSEA

The advantages:

● Robustness against mapping errors influencing counts.

● The set can be detected even if some genes are not present.

● Tolerance if the gene set contains incorrect genes.

● Strong signal even if all genes are only slightly overexpressed.

Page 22: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

With cut-off applied

Mootha et al. http://www.nature.com/ng/journal/v34/n3/full/ng1180.html

Significant DE genes (p-value < 0.05)

Genes involved in oxidative phosphorylation

Page 23: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Cut-off free approach

Genes involved in oxidative phosphorylation are nearly all slightly overexpressed. This can be detected by gene set analysis.

Mootha et al. http://www.nature.com/ng/journal/v34/n3/full/ng1180.html

Page 24: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

GSEA has inspired others.

Väremo et al. http://nar.oxfordjournals.org/content/early/2013/02/26/nar.gkt111

Different methods exist to rank the genes, to calculate the running sum, and to check significance of the running sum. In addition, directionality of the changes can be incorporated.
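One common way to bring directionality into the ranking is to give the 'level of surprise' the sign of the fold change. The sketch below is only an illustration of that idea, not the ranking used by any specific tool: the function name `signed_rank_score`, the gene names and the values are all hypothetical.

```python
import math


def signed_rank_score(p_value, log2_fold_change):
    """Level of surprise (-log10 of the p-value) carrying the sign of the
    fold change: positive for up, negative for down."""
    p_value = max(p_value, 1e-300)  # guard against log10(0)
    surprise = -math.log10(p_value)
    return math.copysign(surprise, log2_fold_change)


# Hypothetical DE results: gene -> (p-value, log2 fold change).
results = {
    "geneA": (1e-8, 2.1),
    "geneB": (0.5, -0.1),
    "geneC": (1e-6, -1.7),
}

# Strongly upregulated genes come first, strongly downregulated genes last.
ranked = sorted(
    results,
    key=lambda gene: signed_rank_score(*results[gene]),
    reverse=True,
)
print(ranked)
```

With such a ranking, a gene set that piles up at either end of the list is changed in one direction, while a set that piles up at both ends is changed without a common direction.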

Page 25: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

GSEA has inspired others

Piano

SPIA

Page 26: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Piano provides a consensus output

Piano has combined different GSEA methods and calculates a consensus score. It does this for 5 different types of 'directionality classes'.

The main output is a heatmap with the gene sets that are significantly enriched, depleted or just changed.

Ranks! Lower is 'more important'

The sets

Page 27: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Piano provides a consensus output

1) distinct-directional down: the gene set as a whole is downregulated.

2) mixed-directional down: a subset of the set is significantly downregulated.

3) non-directional: the set is enriched in significant DE genes without taking into account directionality.

4) mixed-directional up: a subset of the set is significantly upregulated.

5) distinct-directional up: the gene set as a whole is upregulated.

Page 28: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Keywords

Gene set

Contingency approach

T-statistic

P-value histogram

GSEA

heatmap

Directionality of expression changes

The meaning of the p-value cut-off

Write in your own words what the terms mean

Page 30: Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results

Finish!

● Congratulate yourself for your new skills! Enjoy!

Figure: http://kristiholl.net/writers-blog/2013/10/press-on-to-finish-strong/