Part 6 of RNA-seq for DE analysis: Detecting biology from differential expression analysis results
This presentation is available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Please refer to http://www.bits.vib.be/ if you use this presentation or parts hereof.
RNA-seq for DE analysis training
The biology behind expression differences
Joachim Jacob
22 and 24 April 2014
2 of 30
Overview
http://www.nature.com/nprot/journal/v8/n9/full/nprot.2013.099.html
3 of 30
Analyzing the DE analysis results
The 'detect differential expression' tool gives you four results. The first is the report, including graphs. The other three are:
● Only the genes with a p-value below the cut-off, with independent filtering applied.
● All genes, with independent filtering applied.
● Complete DESeq results, without independent filtering applied.
4 of 30
Analyzing the DE analysis results
● Only the genes with a p-value below the cut-off, with independent filtering applied.
● All genes, with independent filtering applied.
● Complete DESeq results, without independent filtering applied.
5 of 30
Setting a cut-off
You choose a cut-off! You can go over the genes one by one, look for 'interesting' genes, and try to link them to the experimental conditions.
Alternative: we can take all genes, ranked by their p-value (which stands for a 'level of surprise'). Pro: we don't need an arbitrary cut-off.
6 of 30
Analysis of the list of DE genes
All genes (6666 yeast genes)
Genes sensible to test, after filtering out the 10% of genes with the lowest counts (5830 yeast genes)
DE genes with a p-value cut-off of 0.01 (637 genes)
7 of 30
Gene set enrichment
We use the knowledge already available on biology. We construct lists of genes for:
● Pathways
● Biological processes
● Cellular components
● Molecular functions
● Transcription factor binding sites
● ...
http://wiki.bits.vib.be/index.php/Gene_set_enrichment_analysis
8 of 30
Getting lists of genes
● Gene Ontology consortium
● Reactome:
9 of 30
A many-to-many relation
Linking gene IDs to molecular function.
… to binding partners
… to transcription factor binding sites.
10 of 30
Biomart can help you fetch sets
11 of 30
Biomart can help you
12 of 30
Contingency approach
DE results: 637/5830 significantly DE genes
Gene set 1: 15/56 significantly DE genes
Is the proportion of DE genes equal? (hypergeometric test)
13 of 30
Contingency approach
DE results: 637/5830 significantly DE genes
Gene set 2: 5/30 significantly DE genes
14 of 30
Contingency approach
DE results: 637/5830 significantly DE genes
Gene set 3: 34/78 significantly DE genes
Not equal! Gene set enriched
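The comparison on these slides can be worked through numerically. The sketch below is a minimal, standard-library-only illustration, not the course tool's implementation. It assumes that every gene of a set belongs to the 5830 tested genes (the background) and that a one-sided test for enrichment is wanted; the function name `hypergeom_enrichment_p` is made up for this example.

```python
from fractions import Fraction
from math import comb


def hypergeom_enrichment_p(n_background, n_de, set_size, de_in_set):
    """One-sided hypergeometric p-value: P(X >= de_in_set).

    X is the number of DE genes seen when drawing `set_size` genes at
    random, without replacement, from a background of `n_background`
    genes of which `n_de` are DE.
    """
    upper = min(set_size, n_de)
    total = comb(n_background, set_size)
    tail = sum(
        comb(n_de, k) * comb(n_background - n_de, set_size - k)
        for k in range(de_in_set, upper + 1)
    )
    # Exact integer arithmetic first, converted to float only at the end.
    return float(Fraction(tail, total))


# Counts taken from the slides: 637 DE genes among 5830 tested genes.
BACKGROUND, DE = 5830, 637
# Each entry is (DE genes in the set, size of the set).
gene_sets = {"Gene set 1": (15, 56), "Gene set 2": (5, 30), "Gene set 3": (34, 78)}

for name, (de_in_set, set_size) in gene_sets.items():
    p = hypergeom_enrichment_p(BACKGROUND, DE, set_size, de_in_set)
    print(f"{name}: {de_in_set}/{set_size} DE, p = {p:.3g}")
```

The overall DE fraction is 637/5830, about 11%. The test asks how often a random draw of the same size from the background would contain at least as many DE genes as the gene set does.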
15 of 30
Artificial?
DE results
But our cut-off remains artificial, arbitrarily chosen. Rerun with a different cut-off and you will detect other significant sets!
The background needs to be carefully chosen. This approach favors gene sets containing genes whose expression differs a lot (a 'high level of surprise', i.e. a low p-value).
Pick me!
16 of 30
Contingency table approach tools
http://wiki.bits.vib.be/index.php/Gene_set_enrichment_analysis
17 of 30
DAVID uses the contingency approach
Need to define the complete gene set tested!
Your list of DE genes
18 of 30
Cut-off free approach: GSEA
No cut-off needs to be chosen using GSEA and derived methods!
We take into account all genes for which we get a reliable p-value (see the p-value histogram chart).
The genes are sorted/ranked according to their 'level of surprise', i.e. by their p-value. Other options are test statistics (T, ...).
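The ranking step can be sketched in a few lines. This is only an illustration: the gene names and p-values are toy values, and representing a gene without a reliable p-value as `None` or NaN is an assumption about how the result table was loaded.

```python
import math


def rank_by_p_value(p_values):
    """Return gene IDs ordered from smallest to largest p-value.

    Genes without a usable p-value (None or NaN) are dropped; ties are
    broken by gene ID so the ranking is reproducible.
    """
    usable = {
        gene: p
        for gene, p in p_values.items()
        if p is not None and not math.isnan(p)
    }
    return sorted(usable, key=lambda gene: (usable[gene], gene))


# Toy values for illustration only, not results from the course data.
toy_p_values = {"geneA": 0.20, "geneB": 0.001, "geneC": float("nan"), "geneD": 0.04}
print(rank_by_p_value(toy_p_values))
```

Sorting on a signed test statistic instead of the p-value would keep up- and downregulated genes at opposite ends of the list, which matters once directionality is taken into account.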
19 of 30
Intuition of GSEA
Axis: p-value, from 0 to 1
Gene set 1
Mootha et al. http://www.nature.com/ng/journal/v34/n3/full/ng1180.html
Running sum: every occurrence of a gene from the set increases the sum, every absence decreases the sum. The maximum is the MES (maximum enrichment score), the final score.
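The running sum can be written down directly. The sketch below is a minimal illustration with made-up gene names. The step sizes, sqrt((N-G)/G) for a hit and sqrt(G/(N-G)) for a miss with N ranked genes and G set members, are believed to follow the weighting described by Mootha et al., but treat that as an assumption; they are chosen so that the sum returns to zero at the end of the list.

```python
import math


def running_sum(ranked_genes, gene_set):
    """Running sum over a ranked gene list for one gene set."""
    members = set(gene_set) & set(ranked_genes)
    n, g = len(ranked_genes), len(members)
    if g == 0 or g == n:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    hit = math.sqrt((n - g) / g)   # added when the gene is in the set
    miss = math.sqrt(g / (n - g))  # subtracted when it is not
    total, sums = 0.0, []
    for gene in ranked_genes:
        total += hit if gene in members else -miss
        sums.append(total)
    return sums


def max_enrichment_score(ranked_genes, gene_set):
    """The MES: the maximum reached by the running sum."""
    return max(running_sum(ranked_genes, gene_set))


# Hypothetical ranking of 20 genes, most surprising first.
ranked = [f"g{i}" for i in range(1, 21)]
top_set = ["g1", "g2", "g3", "g4"]          # all members near the top
scattered_set = ["g1", "g7", "g13", "g19"]  # members spread over the list

print(max_enrichment_score(ranked, top_set))
print(max_enrichment_score(ranked, scattered_set))
```

A set whose members cluster at the top of the ranking reaches a high maximum early on, while a scattered set keeps being pulled back towards zero.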
20 of 30
Intuition of GSEA
Axis: p-value, from 0 to 1
Gene set 2: higher running sum MES
Gene set 3: median running sum MES
Gene set 4: low running sum MES
The scores are compared to those of permuted (shuffled) gene sets (sample label versus gene label permutation).
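A gene label permutation can be sketched as follows. This is a simplified illustration under stated assumptions: the gene names are made up, the MES uses the same assumed step sizes as a Mootha-style running sum, and every gene of the set is assumed to be present in the ranking. A sample label permutation is not shown, because it requires recomputing the DE statistics for every shuffle of the samples.

```python
import math
import random


def mes(ranked_genes, gene_set):
    """Maximum of the running sum over the ranked list."""
    members = set(gene_set)
    n, g = len(ranked_genes), len(members)
    hit, miss = math.sqrt((n - g) / g), math.sqrt(g / (n - g))
    total, best = 0.0, 0.0
    for gene in ranked_genes:
        total += hit if gene in members else -miss
        best = max(best, total)
    return best


def gene_label_permutation_p(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Empirical p-value: how often a random set of equal size scores as high."""
    rng = random.Random(seed)
    observed = mes(ranked_genes, gene_set)
    size = len(set(gene_set))
    exceed = sum(
        mes(ranked_genes, rng.sample(ranked_genes, size)) >= observed
        for _ in range(n_perm)
    )
    # Add one to both counts so the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical ranking of 100 genes, most surprising first.
ranked = [f"gene{i}" for i in range(100)]
top_set = ranked[:10]        # concentrated at the top of the ranking
spread_set = ranked[5::10]   # one member in every block of ten genes

print(gene_label_permutation_p(ranked, top_set))
print(gene_label_permutation_p(ranked, spread_set))
```

The smallest p-value this estimate can return is 1/(n_perm + 1), so the number of permutations limits how small a reported p-value can be.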
21 of 30
Cut-off free approach: GSEA
The advantages:
● Robustness against mapping errors that influence counts.
● The set can be detected even if some genes are not present.
● Tolerance if the gene set contains incorrect genes.
● Strong signal even if all genes in the set seem only slightly overexpressed.
22 of 30
With cut-off applied
Mootha et al. http://www.nature.com/ng/journal/v34/n3/full/ng1180.html
Significant DE genes (p-value < 0.05)
Genes involved in oxidative phosphorylation
23 of 30
Cut-off free approach
Genes involved in oxidative phosphorylation are nearly all slightly overexpressed. This can be detected by gene set analysis.
Mootha et al. http://www.nature.com/ng/journal/v34/n3/full/ng1180.html
24 of 30
GSEA has inspired others.
Varemo et al. http://nar.oxfordjournals.org/content/early/2013/02/26/nar.gkt111
Different methods exist to rank the genes, to calculate the running sum, and to check significance of the running sum. In addition, directionality of the changes can be incorporated.
25 of 30
GSEA has inspired others
Piano
SPIA
26 of 30
Piano provides a consensus output
Piano has combined different GSEA methods and calculates a consensus score. It does this for 5 different types of 'directionality classes'.
The main output is a heatmap showing which gene sets are significantly enriched, depleted or just changed.
Ranks! Lower is 'more important'.
The sets
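The idea of a rank-based consensus can be illustrated with a short sketch. This is not Piano's implementation (Piano is an R package); it only shows one possible aggregation, the median rank across methods, as an assumption for illustration. The method names, gene set names and p-values are all hypothetical.

```python
from statistics import median


def ranks(scores):
    """Rank gene sets by p-value (1 = smallest); ties share the average rank."""
    ordered = sorted(scores, key=scores.get)
    result = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over all gene sets tied with position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for name in ordered[i : j + 1]:
            result[name] = average
        i = j + 1
    return result


def consensus_rank(p_values_by_method):
    """Median rank of each gene set across all methods (lower = more important)."""
    per_method = [ranks(p) for p in p_values_by_method.values()]
    gene_sets = per_method[0].keys()
    return {name: median(r[name] for r in per_method) for name in gene_sets}


# Hypothetical p-values from three hypothetical gene set methods.
toy = {
    "method_1": {"set_A": 0.001, "set_B": 0.20, "set_C": 0.03},
    "method_2": {"set_A": 0.010, "set_B": 0.50, "set_C": 0.002},
    "method_3": {"set_A": 0.004, "set_B": 0.04, "set_C": 0.30},
}
print(consensus_rank(toy))
```

Working on ranks rather than raw p-values keeps methods with very different p-value scales comparable, which is one reason a consensus is usually built this way.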
27 of 30
Piano provides a consensus output
1) distinct-directional down: gene set as a whole is downregulated.
2) mixed-directional down: a subset of the set is significantly downregulated.
3) non-directional: the set is enriched in significant DE genes without taking into account directionality.
4) mixed-directional up: a subset of the set is significantly upregulated.
5) distinct-directional up: gene set as a whole is upregulated.
28 of 30
Keywords
Gene set
Contingency approach
T-statistic
P-value histogram
GSEA
heatmap
Directionality of expression changes
The meaning of the p-value cut-off
Write in your own words what the terms mean
29 of 30
Exercise
● → Exploring the biology behind observed changes
30 of 30
Finish!
● Congratulate yourself for your new skills! Enjoy!
Figure: http://kristiholl.net/writers-blog/2013/10/press-on-to-finish-strong/