CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data
![Page 1: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/1.jpg)
Oct. 7, 2011 live call
UCONN: Ion Mandoiu, Sahar Elsisi
GSU: Alex Zelikovsky, Serghei Mangul, Adrian Caciula
Lifetech PI: Dumitru Brinza
![Page 2: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/2.jpg)
Outline
1. IsoEM plugin for accurate prediction of transcription level
2. Results on ION data
3. Comparison to other tools and platforms
4. Progress on transcript reconstruction
5. Feedback on plugin development and deployment on VM
![Page 3: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/3.jpg)
![Page 4: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/4.jpg)
IsoEM: Isoform Expression Level Estimation
• Expectation-Maximization algorithm
• Unified probabilistic model incorporating
  – Single and/or paired reads
  – Fragment length distribution
  – Strand information
  – Base quality scores
  – Repeat and hexamer bias correction
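The EM core of IsoEM can be illustrated with a generic sketch: each read is split across its compatible isoforms in proportion to the current isoform frequency times the read-isoform weight w_{r,i}, and the frequencies are then re-estimated from those fractional counts. This is a minimal sketch assuming the weights are precomputed; it omits the bias corrections and other model terms listed above, and the function name `em_isoform_frequencies` is hypothetical, not the plugin's API.

```python
def em_isoform_frequencies(weights, isoforms, max_iter=1000, tol=1e-10):
    """Estimate isoform frequencies f_i by Expectation-Maximization.

    weights:  one dict per read, mapping isoform -> w_{r,i}
              (only compatible isoforms need to appear).
    isoforms: list of isoform identifiers.
    """
    freq = {i: 1.0 / len(isoforms) for i in isoforms}  # uniform start
    for _ in range(max_iter):
        counts = {i: 0.0 for i in isoforms}
        # E-step: split each read over its compatible isoforms
        # in proportion to f_i * w_{r,i}.
        for w in weights:
            total = sum(freq[i] * wi for i, wi in w.items())
            if total == 0:
                continue  # read has no compatible isoform; ignore it
            for i, wi in w.items():
                counts[i] += freq[i] * wi / total
        n = sum(counts.values())
        if n == 0:
            break  # no read could be assigned
        # M-step: new frequencies are the normalized expected read counts.
        new_freq = {i: counts[i] / n for i in isoforms}
        delta = max(abs(new_freq[i] - freq[i]) for i in isoforms)
        freq = new_freq
        if delta < tol:
            break
    return freq


if __name__ == "__main__":
    # Three reads unique to X, one unique to Y, four shared by both.
    reads = [{"X": 1.0}] * 3 + [{"Y": 1.0}] + [{"X": 1.0, "Y": 1.0}] * 4
    print(em_isoform_frequencies(reads, ["X", "Y"]))
```

In the usage example the likelihood is proportional to f_X^3 (1 - f_X), so the estimates converge to about f_X = 0.75 and f_Y = 0.25, and the four shared reads end up split in the same 3:1 ratio.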
![Page 5: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/5.jpg)
Read-isoform compatibility graph
$$w_{r,i} = \sum_{a} O_a Q_a F_a$$
where the sum runs over the alignments a of read r to isoform i.
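Reading the slide's weight formula as w_{r,i} = Σ_a O_a Q_a F_a, with a ranging over the alignments of read r to isoform i, the graph can be built by accumulating the per-alignment product on each read-isoform edge. In this sketch we assume O_a, Q_a and F_a are the strand (orientation), base-quality and fragment-length factors from the model components listed earlier, already computed per alignment; the `Alignment` record and the `compatibility_graph` function are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    read: str
    isoform: str
    o: float  # O_a: strand/orientation compatibility factor (assumed)
    q: float  # Q_a: probability of the read bases given the alignment (assumed)
    f: float  # F_a: fragment length probability for this alignment (assumed)


def compatibility_graph(alignments):
    """Return {read: {isoform: w_{r,i}}} with w_{r,i} = sum_a O_a * Q_a * F_a."""
    graph = defaultdict(lambda: defaultdict(float))
    for a in alignments:
        graph[a.read][a.isoform] += a.o * a.q * a.f
    # Convert to plain dicts and drop zero-weight edges.
    return {r: {i: w for i, w in edges.items() if w > 0} for r, edges in graph.items()}


if __name__ == "__main__":
    alns = [
        Alignment("r1", "ABC", 1, 0.9, 0.5),
        Alignment("r1", "ABC", 1, 0.1, 0.5),  # second alignment to the same isoform
        Alignment("r1", "AC", 1, 0.9, 0.2),
        Alignment("r2", "AC", 0, 0.9, 0.3),   # wrong strand: O_a = 0
    ]
    print(compatibility_graph(alns))
```

`list(compatibility_graph(alns).values())` yields one weight dict per read, which is the shape a generic EM routine over isoforms would consume.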
![Page 6: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/6.jpg)
Fragment length distribution
• Paired reads
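[Figure: a read pair aligned to isoforms A-B-C and A-C implies different fragment lengths (i and j) on the two isoforms; plots of the fragment length distribution mark F_a(i) and F_a(j).]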
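The figure's point is that one read pair implies a different fragment length on each isoform, because the distance between the mates is measured in spliced (transcript) coordinates. A minimal sketch, assuming hypothetical exon coordinates for A, B and C and a hypothetical normal fragment length distribution (mean 200, sd 20) standing in for the empirical F:

```python
from statistics import NormalDist


def to_transcript_coord(exons, gpos):
    """Map a genomic position to a 0-based offset in the spliced isoform,
    or None if it falls outside every exon.
    exons: sorted list of (start, end) genomic intervals, end inclusive."""
    offset = 0
    for start, end in exons:
        if start <= gpos <= end:
            return offset + (gpos - start)
        offset += end - start + 1
    return None


def implied_fragment_length(exons, frag_start, frag_end):
    """Fragment length on this isoform for a pair whose outer ends map to
    genomic positions frag_start and frag_end; None if incompatible."""
    s = to_transcript_coord(exons, frag_start)
    e = to_transcript_coord(exons, frag_end)
    if s is None or e is None or e < s:
        return None
    return e - s + 1


# Hypothetical exon coordinates (inclusive) for the slide's exons A, B, C.
A, B, C = (100, 199), (300, 399), (500, 599)
ABC, AC = [A, B, C], [A, C]
# Hypothetical fragment length distribution; the density approximates P(length).
F = NormalDist(mu=200, sigma=20)

if __name__ == "__main__":
    for name, iso in (("ABC", ABC), ("AC", AC)):
        length = implied_fragment_length(iso, 150, 549)
        print(name, length, F.pdf(length))
```

Here the pair implies 200 bp on A-B-C (50 + 100 + 50) but only 100 bp on A-C (50 + 50), so under this assumed distribution the A-B-C alignment receives the larger F_a.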
![Page 7: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/7.jpg)
Fragment length distribution
• Single reads
[Figure: single reads aligned to isoforms A-B-C and A-C, with lengths i and j marked; plots of the fragment length distribution mark F_a(i) and F_a(j).]
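For single reads only one end of the fragment is observed. One plausible treatment, an assumption here rather than something the slide states, is to weight an alignment by the probability that a fragment starting at the read's position fits inside the isoform, i.e. the cumulative fragment length probability up to the remaining isoform length. A sketch with a hypothetical three-point length distribution, forward strand only, ignoring the constraint that the fragment must be at least as long as the read:

```python
def single_read_fragment_prob(isoform_len, read_offset, length_dist):
    """P(fragment length <= isoform_len - read_offset).

    Assumes a forward-strand read whose 5' end sits at 0-based
    `read_offset` on the isoform and that the fragment starts there.
    length_dist: dict fragment_length -> probability.
    """
    room = isoform_len - read_offset  # bases available from the read start onward
    return sum(p for length, p in length_dist.items() if length <= room)


# Hypothetical fragment length distribution.
DIST = {100: 0.25, 200: 0.5, 300: 0.25}

if __name__ == "__main__":
    print(single_read_fragment_prob(300, 50, DIST))  # isoform A-B-C, 300 bp
    print(single_read_fragment_prob(200, 50, DIST))  # isoform A-C, 200 bp
```

With the read at offset 50, the 300 bp isoform can host fragments up to 250 bp (probability 0.75), while the 200 bp isoform can host only fragments up to 150 bp (probability 0.25).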
![Page 8: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/8.jpg)
IsoEM Plugin
![Page 9: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/9.jpg)
IsoEM Plugin Outputs
![Page 10: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/10.jpg)
IsoEM Plugin Outputs: FPKM estimates
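FPKM (fragments per kilobase of transcript per million mapped fragments) rescales each isoform's expected fragment count by its length and by the total fragment count. A minimal sketch of the standard formula FPKM_i = 10^9 · c_i / (L_i · N); the slides do not say whether the plugin uses effective rather than full isoform lengths, so plain lengths are assumed here, and the input values are made up.

```python
def fpkm(fragment_counts, isoform_lengths):
    """FPKM_i = 1e9 * c_i / (L_i * N), with N the total fragment count.

    fragment_counts: isoform -> expected fragments c_i (e.g. f_i * N from EM)
    isoform_lengths: isoform -> length L_i in bases
    """
    total = sum(fragment_counts.values())
    return {
        iso: 1e9 * c / (isoform_lengths[iso] * total)
        for iso, c in fragment_counts.items()
    }


if __name__ == "__main__":
    print(fpkm({"ABC": 600, "AC": 400}, {"ABC": 3000, "AC": 2000}))
```

In the example both isoforms get the same FPKM (200,000): A-B-C has more fragments, but proportionally more length.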
![Page 11: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/11.jpg)
IsoEM Plugin Outputs: UCSC tracks
![Page 12: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/12.jpg)
![Page 13: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/13.jpg)
MAQC data
• 2 RNA samples: UHRR, HBRR
  – 5 ION runs each
• Gold standard
  – Gene expression levels measured in quadruplicate by qPCR for 832 Ensembl genes [MAQC Consortium 06]
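The qPCR gold standard is measured per gene, whereas IsoEM estimates isoform-level expression, so isoform values have to be rolled up before comparison. A sketch assuming gene-level expression is the sum of the gene's isoform FPKMs and that an isoform-to-gene map (e.g. derived from the Ensembl annotation) is available; `gene_level` and the identifiers used below are hypothetical.

```python
from collections import defaultdict


def gene_level(isoform_values, isoform_to_gene, genes=None):
    """Sum isoform estimates per gene.

    If `genes` is given (e.g. the qPCR gene panel), return exactly those
    genes, with 0.0 for genes that received no isoform estimate.
    """
    totals = defaultdict(float)
    for iso, value in isoform_values.items():
        gene = isoform_to_gene.get(iso)
        if gene is not None:  # skip isoforms with no gene assignment
            totals[gene] += value
    if genes is not None:
        return {g: totals.get(g, 0.0) for g in genes}
    return dict(totals)


if __name__ == "__main__":
    iso = {"t1": 1.0, "t2": 2.5, "t3": 4.0}
    iso2gene = {"t1": "G1", "t2": "G1", "t3": "G2"}
    print(gene_level(iso, iso2gene, genes=["G1", "G3"]))
```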
![Page 14: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/14.jpg)
IsoEM results on ION HBR runs
![Page 15: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/15.jpg)
IsoEM results on ION UHR runs
![Page 16: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/16.jpg)
![Page 17: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/17.jpg)
IsoEM vs. Cufflinks 1.0.3 on ION reads
[Bar chart: R² for IsoEM/Cufflinks estimates vs. qPCR (y-axis, 0 to 0.8) for IsoEM HBR, Cufflinks HBR, IsoEM UHR and Cufflinks UHR]
![Page 18: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/18.jpg)
HBR GOG-139_281
[Scatter plots, log-scale axes: Cufflinks estimates vs. qPCR estimates, R² = 0.209; IsoEM estimates vs. qPCR estimates, R² = 0.736]
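The slides report R² between estimates and qPCR values but do not say how it was computed. The scatter plots use log-scale axes, so this sketch assumes R² is the squared Pearson correlation of log10-transformed values, with a hypothetical pseudocount of 1 to tolerate zero estimates; other choices (raw values, regression through the origin) give different numbers.

```python
import math


def r_squared(estimates, qpcr, pseudocount=1.0):
    """Squared Pearson correlation of log10(value + pseudocount) over the
    genes present in both dicts."""
    genes = sorted(set(estimates) & set(qpcr))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    xs = [math.log10(qpcr[g] + pseudocount) for g in genes]
    ys = [math.log10(estimates[g] + pseudocount) for g in genes]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance in one of the series")
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    qpcr = {"a": 1.0, "b": 10.0, "c": 100.0}
    print(r_squared({"a": 2.0, "b": 20.0, "c": 200.0}, qpcr, pseudocount=0.0))
```

Because the correlation is computed on log values, estimates that are off by a constant factor (as in the example) still get R² of 1; only the relative ordering and spacing across genes matter.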
![Page 19: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/19.jpg)
MAQC Illumina datasets
![Page 20: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/20.jpg)
R² of IsoEM estimates from ION & Illumina HBR reads
[Plot: R² (y-axis, 0 to 1) vs. number of reads (x-axis: 250k, 500k, 1M, 2M, 4M, 7M, all). Series: average R² for 5 ION Torrent MAQC HBR runs (avg. 1,559,842 reads); R² for combined reads from 5 ION Torrent MAQC HBR runs (7,799,210 reads).]
![Page 21: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/21.jpg)
R² of IsoEM estimates from ION & Illumina UHR reads
[Plot: R² (y-axis, 0 to 1) vs. number of reads (x-axis: 250k, 500k, 1M, 2M, 4M, 7M, all). Series: average R² for 5 ION Torrent MAQC UHR runs (avg. 1,941,663 reads); R² for combined reads from 5 ION Torrent MAQC UHR runs (9,708,315 reads).]
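The x-axis of these plots varies the number of reads from 250k up to all reads. The slides do not say how the subsets were drawn; a minimal sketch assuming uniform random sampling without replacement, with a fixed (hypothetical) seed for reproducibility:

```python
import random


def subsample(reads, sizes, seed=0):
    """Return {size: list of reads}, each drawn uniformly without
    replacement; sizes at or above len(reads) return all reads."""
    rng = random.Random(seed)
    out = {}
    for size in sizes:
        if size >= len(reads):
            out[size] = list(reads)  # the "all" point on the x-axis
        else:
            out[size] = rng.sample(reads, size)
    return out


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(1000)]
    subsets = subsample(reads, [250, 500, 2000])
    print({k: len(v) for k, v in subsets.items()})
```

For the plots above the sizes would be 250_000, 500_000, 1_000_000, 2_000_000, 4_000_000 and 7_000_000, plus the full read set; repeating the draw with several seeds would show how much R² varies at each size.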
![Page 22: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/22.jpg)
![Page 23: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/23.jpg)
Virtual Transcript Expectation Maximization (VTEM)
1. Input: (partially) annotated genome plus a virtual transcript, with 0 weights for reads in the virtual transcript
2. EM: ML estimates of transcript frequencies
3. Compute expected exon frequencies
4. Update weights of reads in the virtual transcript
5. EM; if the virtual transcript frequency change is > ε, go back to step 3, otherwise output overexpressed exons
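The VTEM loop can be illustrated with a toy, exon-level sketch. Several pieces are assumptions not given on the slide and are labeled in the code: reads are summarized as counts per exon; a read in exon e is compatible with each annotated transcript containing e (weight 1/transcript length) and with the virtual transcript (weight v_e); expected exon read counts come from the annotated transcripts only, assuming uniform coverage; and the weight update sets v_e to the fraction of exon e's reads that the annotated transcripts leave unexplained. All names are hypothetical.

```python
VIRTUAL = "virtual"  # hypothetical id of the virtual transcript


def em(read_groups, names, max_iter=500, tol=1e-9):
    """EM over transcripts; read_groups is a list of (read_count, {name: weight}).
    Returns (frequencies, expected read counts per transcript)."""
    freq = {t: 1.0 / len(names) for t in names}
    counts = {t: 0.0 for t in names}
    for _ in range(max_iter):
        counts = {t: 0.0 for t in names}
        for n, w in read_groups:
            total = sum(freq[t] * x for t, x in w.items())
            if total == 0:
                continue  # reads compatible with nothing (yet)
            for t, x in w.items():
                counts[t] += n * freq[t] * x / total
        assigned = sum(counts.values())
        if assigned == 0:
            break
        new = {t: c / assigned for t, c in counts.items()}
        delta = max(abs(new[t] - freq[t]) for t in names)
        freq = new
        if delta < tol:
            break
    return freq, counts


def vtem(exon_len, transcripts, exon_reads, eps=1e-6, max_rounds=50, min_excess=1e-6):
    tlen = {t: sum(exon_len[e] for e in ex) for t, ex in transcripts.items()}
    names = list(transcripts) + [VIRTUAL]
    v = {e: 0.0 for e in exon_len}  # 0 weights in the virtual transcript at start
    prev_vt = None
    for _ in range(max_rounds):
        groups = []
        for e, n in exon_reads.items():
            # Assumed weights: 1/length for annotated transcripts, v[e] for the virtual one.
            w = {t: 1.0 / tlen[t] for t, ex in transcripts.items() if e in ex}
            w[VIRTUAL] = v[e]
            groups.append((n, w))
        freq, counts = em(groups, names)
        # Stop once the virtual transcript frequency no longer changes.
        if prev_vt is not None and abs(freq[VIRTUAL] - prev_vt) <= eps:
            break
        prev_vt = freq[VIRTUAL]
        # Expected reads per exon from annotated transcripts only (assumption).
        expected = {e: 0.0 for e in exon_len}
        for t, ex in transcripts.items():
            for e in ex:
                expected[e] += counts[t] * exon_len[e] / tlen[t]
        # Hypothetical update rule: fraction of the exon's reads left unexplained.
        for e in exon_len:
            obs = exon_reads.get(e, 0)
            excess = obs - expected[e]
            v[e] = excess / obs if obs > 0 and excess > min_excess * obs else 0.0
    # Report exons that still carry weight in the virtual transcript.
    return sorted(e for e in exon_len if v[e] > 0)


if __name__ == "__main__":
    lengths = {"A": 100, "B": 100, "C": 100, "D": 100}
    annotated = {"T1": ["A", "B", "C"]}
    print(vtem(lengths, annotated, {"A": 100, "B": 100, "C": 100, "D": 50}))
```

In the example, exon D is covered by reads but absent from the annotated transcript T1, so it is the only exon reported as overexpressed.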
![Page 24: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/24.jpg)
Discovery and Reconstruction of Unannotated Transcripts (DRUT)
![Page 25: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/25.jpg)
Experimental results
![Page 26: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/26.jpg)
![Page 27: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/27.jpg)
What we would have liked…
• Available RNA-Seq runs on the demo VM
• Documentation for XML tag sets used in instance.html
• Installed plugins that run without errors on demo data
• Remote access to production Torrent server dedicated to plugin developers
  – Running tests on VM is very slow
![Page 28: CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814573550346895db2438a/html5/thumbnails/28.jpg)
Work in progress for IsoEM plugin v.2
– More genomes/transcript libraries
– Expose more options to the user
  • Read mapping algorithm
  • Filtering of local alignments
  • Hexamer bias correction
  • Quality scores
  • Inference of fragment length distribution (for PE data)
– Bias correction using ERCC
– Inference of allele-specific isoform expression (requires RNA-Seq SNP calling plugin)
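Inference of the fragment length distribution for paired-end data is listed as planned work. One simple approach, assumed here, is an empirical histogram of the fragment lengths implied by confidently mapped pairs (for instance, pairs mapping to a single isoform), discarding implausibly long outliers; the `max_len` cutoff and the function name are hypothetical.

```python
from collections import Counter


def empirical_length_distribution(lengths, max_len=None):
    """Normalized histogram {fragment_length: probability} of observed
    fragment lengths, ignoring non-positive values and values above max_len."""
    kept = [l for l in lengths if l > 0 and (max_len is None or l <= max_len)]
    if not kept:
        raise ValueError("no usable fragment lengths")
    counts = Counter(kept)
    total = len(kept)
    return {l: c / total for l, c in sorted(counts.items())}


if __name__ == "__main__":
    observed = [200, 200, 250, 300, 5000]  # 5000 bp treated as an outlier
    print(empirical_length_distribution(observed, max_len=1000))
```

In practice the histogram would likely be smoothed or binned before being used as F, since raw counts leave gaps at lengths that happen not to be observed.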