On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza...
![Page 1: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/1.jpg)
On the biological significance of alternative splicing: a bioinformatics approach
Sandro J. de Souza
TDR, 07/05/2004
RNA 10:757-765, 2004
![Page 2: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/2.jpg)
Genomics
Bioinformatics
Large-scale Biology
![Page 3: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/3.jpg)
The Real Revolution
Early 20th century: rediscovery of Mendel’s laws of inheritance
Mid 20th century: DNA as the genetic element (Avery)
Mid 20th century: Watson and Crick and the structure of DNA.
70’s and 80’s: Molecular biology/biotechnology
90’s and 21st century: Genomics and Bioinformatics
Paradigm in Biology: Evolution by means of natural selection (Darwin and Wallace, mid 19th century)
![Page 4: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/4.jpg)
Bioinformatics
Development of tools
Gateway to explore new datasets
Processing of data derived from large-scale projects
A new way to do hypothesis-driven science
![Page 5: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/5.jpg)
![Page 6: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/6.jpg)
Splicing (1977)
Roberts and Sharp (Nobel 1993)
![Page 7: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/7.jpg)
Exons Introns
mRNA
Coding Non-coding
![Page 8: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/8.jpg)
![Page 9: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/9.jpg)
![Page 10: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/10.jpg)
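[Figure: consensus sequences at the exon-intron boundaries (Exon | Intron | Exon), with the percent conservation of each base in parentheses; | marks the boundary]
5’ site: A(64) G(73) | G(100) U(100) A(62) A(68) G(84) U(63)
3’ site: Py12 N C(65) A(100) G(100) | N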
Splicing
Splicing depends on recognition of exon-intron boundaries
Splice sites are generic and consist solely of:
5’ boundary
3’ boundary
Acceptor site
Polypyrimidine tract
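The boundary signals listed above can be checked on a candidate intron with a few lines of Python. This is only a minimal sketch: the function names and the 12-base window (taken from the "Py12" label in the consensus figure) are illustrative assumptions, not the method used in the talk.

```python
def has_canonical_boundaries(intron: str) -> bool:
    """True if the intron starts with GU (GT in DNA) and ends with AG."""
    s = intron.upper().replace("T", "U")
    return len(s) >= 4 and s.startswith("GU") and s.endswith("AG")


def polypyrimidine_fraction(intron: str, window: int = 12) -> float:
    """Fraction of C/U bases in the `window` bases just upstream of the
    final four bases (the 'N C A G' of the 3' consensus).
    Returns 0.0 if the intron is too short to hold the window."""
    s = intron.upper().replace("T", "U")
    tract = s[-(window + 4):-4]
    if len(tract) < window:
        return 0.0
    return sum(base in "CU" for base in tract) / window


# Hypothetical toy intron: GUAAGU ... 12 pyrimidines ... ACAG
toy_intron = "GUAAGU" + "A" * 10 + "CUCUCUCUCUCU" + "ACAG"
print(has_canonical_boundaries(toy_intron), polypyrimidine_fraction(toy_intron))
```

Because these signals are short and degenerate, a scan like this matches many positions that are never used as splice sites, which is one reason spurious splicing events are plausible.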
![Page 11: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/11.jpg)
.....if they occur at the boundaries of the regions to be spliced
out, can change the splicing pattern, resulting in the deletion
or addition of whole sequences of amino acids.
Walter Gilbert. Why genes in pieces. Nature 271:501, 1978.
![Page 12: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/12.jpg)
At least half of all human genes undergo alternative splicing
Biological significance or spurious events?
![Page 13: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/13.jpg)
Alternative splicing
1. Chromosomal ratio activates transcription of Sxl in females only.
2. SXL controls splicing of tra mRNA.
3. Females: exon 2 (which has a stop codon) is removed via SXL. Males: exon 2 is not removed.
4. Males: no active TRA. Females: TRA is made.
5. TRA directs splicing of dsx mRNA in a specific manner; in males, default splicing occurs.
![Page 14: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/14.jpg)
Alternative Splicing: Auditory Hair Cells
Cytosol
PM
AVSGRKAVSGRKAMFARYVPEIAALILNRKKYGGTFNSTRGRK
Ca2+ concentration at which K+ channel opens depends on alternative splicing of K+ channel – 576 possible alternative splicing combinations
K+ channel
Dotted lines show regions of the protein dependent on splicing
Picture of human cochlear hair cells from http://www.sickkids.on.ca/otolaryngology/Hearloss.asp
Sound frequency
Cytosolic Ca2+ concentration
K+ channel opens
Therefore Ca2+ concentration ‘decodes’ frequency
![Page 15: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/15.jpg)
Types of alternative splicing:
Exon skipping
Intron retention
Alternative 5’ splice site
Alternative 3’ splice site
[Figure: mRNA drawn 5´ to 3´ for each type]
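The four event types can be told apart by comparing the exon coordinates of two transcripts from the same gene. The sketch below is illustrative only: it assumes plus-strand transcripts given as sorted lists of half-open `(start, end)` exon intervals, and the function names are hypothetical rather than taken from the talk.

```python
def introns(exons):
    """Gaps between consecutive exons, as half-open (start, end) intervals."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def contains(outer, inner):
    """True if interval `inner` lies entirely within interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def classify(reference, variant):
    """Return the set of event types seen in `variant` relative to `reference`."""
    events = set()
    ref_introns, var_introns = introns(reference), introns(variant)
    # Intron retention: a reference intron sits inside a single variant exon.
    for intron in ref_introns:
        if any(contains(exon, intron) for exon in variant):
            events.add("intron retention")
    # Exon skipping: an internal reference exon sits inside a variant intron.
    for exon in reference[1:-1]:
        if any(contains(intron, exon) for intron in var_introns):
            events.add("exon skipping")
    # Alternative splice sites: two introns share exactly one boundary and
    # neither of them swallows a whole exon of the other transcript.
    for r in ref_introns:
        for v in var_introns:
            spans_exon = (any(contains(v, e) for e in reference)
                          or any(contains(r, e) for e in variant))
            if spans_exon:
                continue
            if r[1] == v[1] and r[0] != v[0]:
                events.add("alternative 5' splice site")
            if r[0] == v[0] and r[1] != v[1]:
                events.add("alternative 3' splice site")
    return events


reference = [(0, 100), (200, 300), (400, 500)]
print(classify(reference, [(0, 300), (400, 500)]))
```

On the minus strand the 5’ and 3’ labels would swap, so a real pipeline would need the strand of each alignment as well.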
![Page 16: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/16.jpg)
Large-scale analysis of intron retention in the human transcriptome
Pedro F.A. Galante, Noboru Jo Sakabe, Natanja Slager, Sandro J. de Souza
![Page 17: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/17.jpg)
Examples of intron retention events with biological significance
Msl2 in Drosophila
P element in Drosophila
retroviruses
![Page 18: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/18.jpg)
Transmembrane domain
In immature B cells, an intron containing an early translational stop signal is removed, yielding a long transcript. The additional sequence encodes a transmembrane region.
Hydrophilic stretch
This intron is not removed in activated B cells, giving rise to a truncated (secreted) product.
Ig gene Immature B Cell
Stop codons
Hydrophilic tail
Transmembrane domain
Activation
Immature B cells express membrane-bound Ig. Activation leads to production of secreted form
![Page 19: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/19.jpg)
Intron retention and cancer
CD44: several tumors
Gastrin receptor: pancreas
Ret tyrosine kinase: pheochromocytomas
Fas receptor: T-cell lymphoma
![Page 20: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/20.jpg)
Transcriptome Database
EST data
Known mRNAs
SAGE data
Genome Data
![Page 21: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/21.jpg)
Genome-based cDNA clustering
Exon 1 Exon 2 Exon 3
DNA
mRNA
cluster
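As a rough illustration of genome-based clustering, cDNAs that map to overlapping positions on the same chromosome and strand can be grouped with a single sorted sweep. The input tuple layout and the transcript names below are assumptions made for this sketch; the actual pipeline behind the talk is not described on the slide and probably used stricter criteria, such as shared exon-intron boundaries.

```python
from collections import defaultdict


def cluster_transcripts(alignments):
    """Group transcripts whose genomic spans overlap on the same
    chromosome and strand.

    alignments: iterable of (name, chrom, strand, start, end), half-open.
    Returns a list of clusters, each a list of transcript names.
    """
    by_locus = defaultdict(list)
    for name, chrom, strand, start, end in alignments:
        by_locus[(chrom, strand)].append((start, end, name))

    clusters = []
    for key in sorted(by_locus):
        current, current_end = [], None
        for start, end, name in sorted(by_locus[key]):
            if current and start < current_end:
                # Overlaps the running cluster: extend it.
                current.append(name)
                current_end = max(current_end, end)
            else:
                # Gap found: close the running cluster and start a new one.
                if current:
                    clusters.append(current)
                current, current_end = [name], end
        clusters.append(current)
    return clusters


# Hypothetical alignments (names and coordinates are made up).
example = [
    ("est1", "chr17", "-", 100, 500),
    ("mrna1", "chr17", "-", 450, 900),
    ("est2", "chr17", "-", 2000, 2500),
    ("est3", "chr1", "+", 100, 200),
]
print(cluster_transcripts(example))
```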
![Page 22: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/22.jpg)
Transcript Mapping
P53
![Page 23: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/23.jpg)
Types of Data
![Page 24: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/24.jpg)
Dataset

| Retention / Prototype | Full length | EST | Total |
|---|---|---|---|
| Full length | 640 | 691 | 1120 |
| EST | 2594 | n.d. | 2594 |
| Total | 2793 | 691 | 3127 |
![Page 25: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/25.jpg)
Experimental validation
![Page 26: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/26.jpg)
14% of all human genes show evidence of intron retention
Kan, States & Gish (2002)
36% of RefSeq database!
After sample statistics: 5%
![Page 27: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/27.jpg)
Distribution of events along transcripts.

| Events in | Elite group: observed | Elite group: expected | MGC: observed | MGC: expected |
|---|---|---|---|---|
| CDS | 287 (53%) | 502 (93%) | 87 (52%) | 155 (93%) |
| 5’ UTR | 84 (15%) | 27 (5%) | 15 (9%) | 8 (5%) |
| 3’ UTR | 170 (32%) | 12 (2%) | 65 (39%) | 4 (2%) |

This bias can be a product of:
Underreporting of sequences
Nonsense-mediated decay (NMD)

p << 0.005 (elite group and MGC)
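The slide does not say which test produced these p-values. As one plausible reading, a chi-square goodness-of-fit test on the observed and expected counts (CDS, 5’ UTR, 3’ UTR) gives values far below 0.005 for both datasets. The sketch below is an assumption about the method, not the authors' procedure; it uses the fact that for 2 degrees of freedom the chi-square survival function is exactly exp(-x/2).

```python
import math


def chi_square(observed, expected):
    """Pearson chi-square statistic for matching lists of counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df2(statistic):
    """Upper-tail p-value for a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# Counts in the order CDS, 5' UTR, 3' UTR, copied from the slide.
elite_observed, elite_expected = [287, 84, 170], [502, 27, 12]
mgc_observed, mgc_expected = [87, 15, 65], [155, 8, 4]

elite_stat = chi_square(elite_observed, elite_expected)
mgc_stat = chi_square(mgc_observed, mgc_expected)
print(elite_stat, p_value_df2(elite_stat))
print(mgc_stat, p_value_df2(mgc_stat))
```

Most of each statistic comes from the 3’ UTR term, where the observed counts (170 and 65) far exceed the expected ones (12 and 4).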
![Page 28: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/28.jpg)
2563 out of 3195 (80%) sequences with a retained intron had an exon/exon boundary downstream of the retention event.
![Page 29: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/29.jpg)
Retained introns are shorter
p << 0.001
![Page 30: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/30.jpg)
Domains encoded by retained introns
![Page 31: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/31.jpg)
Number of domains entirely encoded by:
Retained introns only: 2
Exon-intron-exon: 31
Number of domains partially encoded by:
Retained introns only: 25
Exon-intron-exon: 10
![Page 32: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/32.jpg)
Retained introns have a higher GC content
p << 0.001
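GC content is simply the fraction of G and C bases in a sequence. A minimal sketch follows; the function name is illustrative, and skipping ambiguous bases such as N is an assumption of this example.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among the unambiguous bases (A, C, G, T) of `seq`.
    Returns 0.0 for a sequence with no unambiguous bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


print(gc_content("ATGC"))
```

Comparing the distribution of this value between retained and non-retained introns would then call for a two-sample statistical test, which the slide does not name.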
![Page 33: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/33.jpg)
Did retained introns encode protein domains?
Only retained introns in the CDS were used.
Only retained introns defined by full-length mRNAs were used.
Protein sequences were searched against the PFAM database.
![Page 34: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/34.jpg)
Codon Usage
![Page 35: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/35.jpg)
Conservation of intron retention in mouse cDNA sequences
40%-57% of all retained introns present a mouse hit
Identity of orthologous retained introns is 84%
Identity of non-retained introns is 60%; of exons, 87%
Mouse cDNA also corresponds to a retention variant
26% - 10 out of 46
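Identity figures like these come from comparing aligned human and mouse sequences position by position. The slide does not state how identity was computed, so the sketch below makes its own assumptions: the two sequences are already aligned to equal length, `-` marks a gap, and columns with a gap are ignored.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of matching positions between two pre-aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


# Hypothetical toy alignment: 3 of the 4 ungapped columns match.
print(percent_identity("AC-GT", "ACTGA"))
```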
![Page 36: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/36.jpg)
Frequency of stop codon
Stop codons: TAG, TGA, TAA
Expected: 1064
Found 651 stop codons
p-value << 0.005
88 cases where the retention generates a putative truncated protein
[Figure: mRNA drawn as exon / retained intron / exon, with the CDS and the stop codon marked, above these example sequences]
TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACACTGTGA
TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACAC
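Counting stop codons in a retained intron amounts to walking the sequence codon by codon in a chosen reading frame. The sketch below runs on the longer example sequence from this slide; the function name and the 0-based frame convention are illustrative assumptions, and the slide does not say how the expected count of 1064 was derived.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def stop_codons_in_frame(seq: str, frame: int = 0):
    """List (position, codon) for every stop codon read in `frame`
    (0, 1 or 2), with 0-based positions into `seq`."""
    s = seq.upper()
    hits = []
    for pos in range(frame, len(s) - 2, 3):
        codon = s[pos:pos + 3]
        if codon in STOP_CODONS:
            hits.append((pos, codon))
    return hits


example = "TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACACTGTGA"
for frame in range(3):
    print(frame, stop_codons_in_frame(example, frame))
```

For a retained intron inside a CDS, only the frame inherited from the upstream exon matters; the first in-frame stop marks where a truncated protein would end.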
![Page 37: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/37.jpg)
GC content for sequences upstream and downstream of the premature stop codon (88 cases)
[Figure: exon / retained intron / exon, with the premature stop inside the retained intron]
GC 58% | stop | GC 49%
Are under selective pressure for coding potential
5’ 3’
![Page 38: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/38.jpg)
Why is the argument of ‘selection’ important?
• As noted originally by Gilbert (1978), mutations that affect splicing can allow the production of new proteins without the loss of the original one.
• If, however, the new variant has some biological significance, selection will act to maintain the function of this variant.
• Therefore, there should not be any “negative selection” on this variant.
![Page 39: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/39.jpg)
Intron Retention in Tumors

| Tissue | T/N | IR |
|---|---|---|
| Breast | T | 1.52* |
| Breast | N | 0.62 |
| Prostate | T | 1.45* |
| Prostate | N | 0.44 |
| Brain | T | 2.52* |
| Brain | N | 3.16 |
| Colon | T | 0.85 |
| Colon | N | 0.60 |
![Page 40: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/40.jpg)
Towards a reliable set of intron retention events

| Criterion | Count | Fraction |
|---|---|---|
| w/ downstream spliced intron | 2563/3195 | 80 % |
| w/ hit w/ mouse cDNAs* | 74/152 | 49 % |
| encoding protein domains* | 47/151 | 31 % |
| experimentally validated (both forms) | 2/2 | |

* full-length vs full-length set and retained intron entirely in the CDS
![Page 41: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/41.jpg)
Second International Conference on Bioinformatics and Computational Biology
www.icobicobi.com.br
25-28/10/2004
Angra dos Reis
![Page 42: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/42.jpg)
Group of Computational Biology
Sandro J. de Souza, tennis player
Helena Samaia, Research Assistant
Ana C. Pereira, Admin. Assistant
Maarten Leerkes, Ph.D student
Noboru Sakabe, Ph.D student
Maria Vibranovski, Ph.D student
Elza Helena, Ph.D student
Natanja Slager, Ph.D student
Pedro Galante, Ph.D student
Elisson C. Osorio, programmer
Jorge E. de Souza, Ph.D student
Rodrigo Soares, programmer
Andre Zaiats, system admin.
![Page 43: On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.](https://reader036.fdocuments.us/reader036/viewer/2022070415/5697c02b1a28abf838cd8ac0/html5/thumbnails/43.jpg)