Mar Gonzales Porta, One gene One transcript, fged_seattle_2013
![Page 1: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/1.jpg)
One gene – one transcript… mostly
Mar Gonzàlez-Porta, Adam Frankish, Johan Rung, Jennifer Harrow, Alvis Brazma
European Bioinformatics Institute, European Molecular Biology Laboratory
Wellcome Trust Sanger Institute
![Page 2: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/2.jpg)
Analysis of RNA-seq data across different tissues and cell lines reveals that the majority of genes have a single dominant transcript
Mar Gonzàlez-Porta, Adam Frankish (Sanger), Johan Rung (EBI), Jennifer Harrow (Sanger), Alvis Brazma (EBI)
To appear in Genome Biology on July 1, 2013 http://genomebiology.com/
![Page 3: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/3.jpg)
Figure 1: Central dogma of molecular biology. Gene → (transcription) → transcript → (splicing) → spliced transcript (CAP, ATG, ter, poly-A tail) → (translation) → polypeptide
![Page 4: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/4.jpg)
The number of genes in the human genome
• Estimates before the Human Genome Project: ~100,000 genes
• After: ~21,000 genes
• By comparison: yeast ~6,000 genes, C. elegans ~17,000 genes
![Page 5: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/5.jpg)
Central dogma revised – one gene, many transcripts and proteins via alternative splicing
In human, 21,405 protein-coding genes and 141,031 different isoforms (92,581 of which are protein coding) are annotated; 17,413 genes have more than one isoform
![Page 6: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/6.jpg)
From RNA-seq and other recent experiments
• Most human genes have more than one splice form expressed [Pan et al, Nat Genetics 2008], [Wang et al, Nat Genetics 2008], [Mortazavi, Nat Methods 2008]
• Several isoforms per gene are often expressed at significant levels, either in the same cell type or across different cell types [Wang et al, Nat Genetics 2008], [Tang et al, Nat Methods 2009], [Trapnell et al, Nat Biotech 2010]
• Isoform expression is regulated [Waks et al, Mol Syst Biol 2011], but splicing can be noisy [Melamud, NAR 2009]
![Page 7: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/7.jpg)
ENCODE – Nature, September 2012
• ‘Isoform expression by a gene does not follow a minimalistic expression strategy, resulting in a tendency for genes to express many isoforms simultaneously’
• ‘[…] alternative isoforms within a gene are not expressed at similar levels, and one isoform dominates in a given condition’
• Which is it then?
![Page 8: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/8.jpg)
A fundamental question still remains – are different isoforms of the same gene expressed at similar levels?
[Figure: expression level across isoforms 1 to 6 of one gene, drawn for two alternative scenarios separated by a question mark]
We think this is a fundamental question related to the complexity of the transcriptome
![Page 9: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/9.jpg)
Data and methods
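| Dataset | Samples | Reads | Replicates | Platform |
|---|---|---|---|---|
| Illumina Body Map | 16 tissues | PE (80M) | no replicates | HiSeq 2000 |
| ENCODE | 5 cell lines | PE (40M) | technical replicates | GAII |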
• 2 different datasets: 46 samples
• Three different state-of-the-art tools for transcript quantification (MISO, Cufflinks and mmseq)
• Direct evidence from splice junctions
• Simulated data – can the methods distinguish between the two scenarios?
• All approaches produced very consistent results
![Page 10: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/10.jpg)
Most genes express one predominant transcript
• 2-fold dominance: 79% of cases
• 5-fold dominance: 56% of cases
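The dominance percentages above can be illustrated with a minimal sketch. It assumes that "N-fold dominance" means the most abundant transcript of a gene is expressed at least N times higher than the second most abundant one; the slides do not spell out the definition, so this reading, the function names and the toy expression values are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def fold_dominance(isoform_expr: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (major transcript id, top/second expression ratio).

    Returns None when the gene has no transcript with expression above zero.
    A gene with a single expressed transcript gets an infinite ratio.
    """
    ranked = sorted(isoform_expr.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] <= 0:
        return None
    top_id, top = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    ratio = float("inf") if second == 0 else top / second
    return top_id, ratio


def share_dominant(genes: Dict[str, Dict[str, float]], fold: float) -> float:
    """Fraction of expressed genes whose major transcript is >= `fold` dominant."""
    results = [fold_dominance(expr) for expr in genes.values()]
    expressed = [r for r in results if r is not None]
    if not expressed:
        return 0.0
    return sum(1 for _, ratio in expressed if ratio >= fold) / len(expressed)


# Toy expression values (e.g. FPKM) for one sample; not real data.
genes = {
    "geneA": {"A-001": 40.0, "A-002": 4.0, "A-003": 1.0},
    "geneB": {"B-001": 12.0, "B-002": 5.0},
    "geneC": {"C-001": 6.0, "C-002": 5.0},
    "geneD": {"D-001": 9.0},
}

for fold in (2.0, 5.0):
    print(f"{fold:g}-fold dominance: {share_dominant(genes, fold):.0%} of genes")
```

Running the same calculation per sample, on the output of a quantification tool, would give one percentage per tissue or cell line.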
![Page 11: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/11.jpg)
Consistent across all studied tissues
![Page 12: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/12.jpg)
What are the RPKM distributions of first, second, … most abundant isoforms for all genes?
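One way to build such rank-wise distributions is sketched below: within each gene the isoforms are sorted by expression, and the values are then pooled by rank across genes. The input layout, the function name and the toy values are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List


def expression_by_rank(genes: Dict[str, Dict[str, float]]) -> Dict[int, List[float]]:
    """Pool expression values by within-gene abundance rank (1 = most abundant)."""
    by_rank: Dict[int, List[float]] = defaultdict(list)
    for expr in genes.values():
        ordered = sorted(expr.values(), reverse=True)
        for rank, value in enumerate(ordered, start=1):
            by_rank[rank].append(value)
    return dict(by_rank)


# Toy RPKM values; not real data.
genes = {
    "geneA": {"A-001": 40.0, "A-002": 4.0, "A-003": 1.0},
    "geneB": {"B-001": 12.0, "B-002": 5.0},
    "geneC": {"C-001": 6.0},
}

by_rank = expression_by_rank(genes)
for rank in sorted(by_rank):
    values = by_rank[rank]
    print(f"rank {rank}: n={len(values)}, median={median(values)}")
```

Each list in `by_rank` can then be plotted as a box plot or density curve, one per rank.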
![Page 13: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/13.jpg)
Most genes express one predominant transcript
![Page 14: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/14.jpg)
Most genes express one predominant transcript
We detect a total of 31,902 transcripts expressed above 1 FPKM in at least one tissue and 26,641 of these are major transcripts (ratio 1.12)
![Page 15: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/15.jpg)
![Page 16: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/16.jpg)
There is just over one highly expressed transcript per gene!
85% of the transcriptome comes from the dominant transcripts.
Is it the same dominant transcript in all tissues, or do different tissues tend to have different dominant transcripts?
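A figure like the 85% share can be computed, in a minimal sketch, as the summed expression of each gene's most abundant transcript divided by the summed expression of all transcripts. This assumes that summed expression values (e.g. FPKM) are an acceptable proxy for the mRNA pool; the function name and the toy values are hypothetical.

```python
from typing import Dict


def major_transcript_share(genes: Dict[str, Dict[str, float]]) -> float:
    """Fraction of total expression contributed by each gene's top transcript."""
    total = sum(sum(expr.values()) for expr in genes.values())
    if total == 0:
        raise ValueError("no expression detected in this sample")
    major = sum(max(expr.values()) for expr in genes.values() if expr)
    return major / total


# Toy expression values for one sample; not real data.
genes = {
    "geneA": {"A-001": 80.0, "A-002": 15.0, "A-003": 5.0},
    "geneB": {"B-001": 60.0, "B-002": 40.0},
}

print(f"{major_transcript_share(genes):.0%} of the pool comes from major transcripts")
```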
![Page 17: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/17.jpg)
Major transcripts tend to be recurrent across tissues
![Page 18: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/18.jpg)
Switch events exist: ~100 strong switches that change the protein sequence
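Candidate switch events can be listed with a sketch like the following, which flags genes whose major transcript differs between samples. The 1.0 expression cutoff and the data layout are assumptions. The sketch does not test how strong a switch is or whether it changes the protein sequence; both would need additional filtering against the annotation.

```python
from collections import defaultdict
from typing import Dict, Optional

Expression = Dict[str, float]            # transcript id -> expression
Sample = Dict[str, Expression]           # gene id -> transcript expression


def major_transcript(expr: Expression, min_expr: float = 1.0) -> Optional[str]:
    """Return the most abundant transcript, or None if it is below `min_expr`."""
    if not expr:
        return None
    transcript, value = max(expr.items(), key=lambda kv: kv[1])
    return transcript if value >= min_expr else None


def switch_events(samples: Dict[str, Sample]) -> Dict[str, Dict[str, str]]:
    """Genes whose major transcript is not the same in every sample expressing them."""
    majors: Dict[str, Dict[str, str]] = defaultdict(dict)
    for sample_name, genes in samples.items():
        for gene, expr in genes.items():
            major = major_transcript(expr)
            if major is not None:
                majors[gene][sample_name] = major
    return {
        gene: per_sample
        for gene, per_sample in majors.items()
        if len(set(per_sample.values())) > 1
    }


# Toy data for two samples; not real measurements.
samples = {
    "brain": {
        "geneA": {"A-001": 30.0, "A-002": 3.0},
        "geneB": {"B-001": 10.0, "B-002": 2.0},
    },
    "liver": {
        "geneA": {"A-001": 2.0, "A-002": 25.0},
        "geneB": {"B-001": 8.0, "B-002": 1.0},
        "geneC": {"C-001": 0.2},
    },
}

for gene, per_sample in switch_events(samples).items():
    print(gene, per_sample)
```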
![Page 19: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/19.jpg)
![Page 20: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/20.jpg)
Some of the major transcripts are non-canonical
Example: AES
![Page 21: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/21.jpg)
Do the dominant transcripts of protein coding genes always code proteins?
• Only in about 80% of cases
• However, much more often in the cytosol than in the nucleolus
• In the nucleolus, the retained intron is predominantly located towards the 3’ end of the transcript
• Is this because we extract mRNA before splicing is completed, or can a retained intron be a form of expression regulation?
![Page 22: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/22.jpg)
Conclusions
• Most genes express one predominant transcript over the rest
• ~85% of the mRNA pool comes from major transcripts
• Major transcripts tend to be recurrent across samples; switch events exist, but only a small number of these are likely to express different proteins
• Despite the transcriptome complexity, the central dogma of molecular biology may be closer to the truth than recently believed!
![Page 23: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/23.jpg)
Mar Gonzàlez-Porta
![Page 24: Mar Gonzales Porta, One gene One transcript, fged_seattle_2013](https://reader033.fdocuments.us/reader033/viewer/2022051611/54ba21444a795974278b46a1/html5/thumbnails/24.jpg)
Funding
• EMBL member countries
• European Commission FP7 grants