Alex Zelikovsky
Department of Computer Science
Georgia State University
Joint work with Adrian Caciula (GSU), Serghei Mangul (UCLA), James Lindsay, Ion Mandoiu (UCONN)
Monte-Carlo Regression Algorithm for Isoform Frequency Estimation from RNA-Seq Data
IEEE ICCABS 2013, New Orleans, LA
Outline
• RNA-Seq: Introduction
• MCReg: Monte Carlo Regression based Algorithm
• Experimental Results
• Conclusions and Future Work
Genome-Guided RNA-Seq Protocol
RNA-Seq enables transcript-level resolution of gene expression
From RNA – through the process of hybridization-
• Make cDNA & shatter into fragments
• Sequence fragment ends
• Map reads to genome
• Gene Expression (GE), Isoform Expression (IE), Isoform Discovery (ID)
[Figure labels: A B C D E; A B C; A C; D E]
[Nicolae et al., 11]
Outline
• RNA-Seq: Introduction
• MCReg: Monte Carlo Regression based Algorithm
  - Observed Read Distribution
  - MC-Based Estimation of Expected Read Distribution
  - Regression-Based Estimation of Isoform Frequencies
• Experimental Results
• Conclusions and Future Work
MCReg: Monte-Carlo Regression
MCReg Motivation:
• Reducing the error rate is critical for detecting similar transcripts, especially when one is a subset of another.
[Figure: screenshot from a genome browser]
General Method Overview
• Map paired-end reads onto the library of known isoforms using an ungapped aligner (e.g., Bowtie)
  - B. Langmead, C. Trapnell, et al., “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biology, vol. 10, no. 3, p. R25, 2009.
• Group reads that have been mapped to the same transcripts into classes
• Monte-Carlo-based estimation of the expected read distribution using, e.g., the Grinder simulator
  - F. E. Angly et al., “Grinder: a versatile amplicon and shotgun sequence simulator,” Nucleic Acids Research, 2012.
• Solve the regression: the least-squares formulation can be solved with a constrained quadratic programming solver
  - M. S. Andersen et al., CVXOPT: A Python package for convex optimization, available at cvxopt.org, 2013.
Observed Read Distribution
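The grouping step from the method overview (reads mapped to the same transcripts form one class) can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: it assumes the aligner output has already been parsed into a mapping from read id to the transcripts it aligned to, and the names `observed_read_distribution` and `alignments` are hypothetical.

```python
from collections import Counter


def observed_read_distribution(read_to_transcripts):
    """Group reads by the set of transcripts they map to.

    Returns {read_class (frozenset of transcript names): observed frequency}.
    Reads with no alignment are ignored.
    """
    counts = Counter(
        frozenset(transcripts)
        for transcripts in read_to_transcripts.values()
        if transcripts
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {read_class: n / total for read_class, n in counts.items()}


# Hypothetical toy alignments: read id -> transcripts the read aligned to.
alignments = {
    "r1": ["T1", "T2"],
    "r2": ["T2", "T1"],  # same class as r1: order does not matter
    "r3": ["T1"],
    "r4": ["T2"],
}
observed = observed_read_distribution(alignments)
```

Using a frozenset as the class key makes two reads fall into the same class exactly when they map to the same set of transcripts.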
Monte-Carlo-Based Estimation of Expected Read Distribution
MC-Based Estimation of Expected Read Distribution
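The idea of estimating the expected read distribution by simulation can be illustrated with a toy Python sketch. It does not use Grinder or Bowtie: as a stated simplification, reads are sampled uniformly from each isoform sequence and an exact substring match stands in for ungapped alignment. The sequences, the single-end reads, and the function name `simulate_expected_distribution` are all assumptions made for illustration.

```python
import random


def simulate_expected_distribution(isoforms, read_len, n_reads, seed=0):
    """Monte Carlo estimate of how reads from each isoform spread over classes.

    isoforms: {name: sequence}. Returns
    {isoform name: {read_class (frozenset of names): estimated probability}}.
    """
    rng = random.Random(seed)
    expected = {}
    for name, seq in isoforms.items():
        counts = {}
        for _ in range(n_reads):
            # Sample a read start uniformly along the isoform.
            start = rng.randrange(len(seq) - read_len + 1)
            read = seq[start:start + read_len]
            # Toy "alignment": the read maps to every isoform containing it.
            read_class = frozenset(t for t, s in isoforms.items() if read in s)
            counts[read_class] = counts.get(read_class, 0) + 1
        expected[name] = {c: k / n_reads for c, k in counts.items()}
    return expected


# Hypothetical toy library in which T2 is a sub-transcript (prefix) of T1.
isoforms = {"T1": "ACGTTGCAAGGCTTAC", "T2": "ACGTTGCA"}
expected = simulate_expected_distribution(isoforms, read_len=4, n_reads=2000)
```

In this toy library every read simulated from T2 also matches T1, so T2 contributes only to the shared class, while T1 also produces reads in a class of its own. This is the sub-transcript situation named in the motivation.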
Regression-Based Estimation of Isoform Frequencies
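A hedged sketch of the regression step: find frequencies f minimizing the squared difference between the expected class distribution D·f and the observed one o. The talk solves the least-squares formulation with a constrained quadratic programming solver (CVXOPT); to stay dependency-free, this sketch swaps in projected gradient descent instead. The constraints used here (f ≥ 0 and sum(f) = 1), the interpretation of f as the share of reads coming from each isoform, and all function names are assumptions of the sketch.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {f : f_i >= 0, sum(f) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold of the last index that passes
    return [max(x - theta, 0.0) for x in v]


def estimate_frequencies(D, o, iterations=2000):
    """Minimize ||D f - o||^2 over the simplex by projected gradient descent.

    D[c][i]: probability that a read from isoform i falls into class c.
    o[c]: observed frequency of class c.
    """
    n_classes, n_iso = len(D), len(D[0])
    # The squared Frobenius norm bounds the largest eigenvalue of D^T D,
    # so 1 / (2 * ||D||_F^2) is a safe step size.
    step = 1.0 / (2.0 * sum(x * x for row in D for x in row))
    f = [1.0 / n_iso] * n_iso
    for _ in range(iterations):
        residual = [
            sum(D[c][i] * f[i] for i in range(n_iso)) - o[c]
            for c in range(n_classes)
        ]
        grad = [
            2.0 * sum(D[c][i] * residual[c] for c in range(n_classes))
            for i in range(n_iso)
        ]
        f = project_to_simplex([f[i] - step * grad[i] for i in range(n_iso)])
    return f


# Toy example: rows are classes {T1,T2} and {T1}; columns are T1 and T2.
D = [[5 / 13, 1.0], [8 / 13, 0.0]]
true_f = [0.7, 0.3]
o = [sum(D[c][i] * true_f[i] for i in range(2)) for c in range(2)]
f = estimate_frequencies(D, o)
```

On this noise-free toy input the solver should recover frequencies close to `true_f`. With real data the observed distribution is noisy and the fit is only approximate.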
Outline
• RNA-Seq: Introduction
• MCReg: Monte Carlo Regression based Algorithm
• Experimental Results
• Conclusions and Future Work
Simulation Setup
Experimental Results
• Frequency estimation accuracy was assessed using the coefficient of determination r².
• For IsoEM r² = 0.92, while for MCReg r² = 0.97.
• MCReg shows better correlation than IsoEM, especially in sub-transcript cases, where IsoEM skewed the estimated frequency toward super-transcripts.
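For reference, a small Python sketch of the accuracy measure. The slides do not say which definition of r² was used; this sketch assumes the squared Pearson correlation between true and estimated frequencies, and the function name and the toy vectors are hypothetical.

```python
def r_squared(true_freqs, estimated_freqs):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(true_freqs)
    mean_t = sum(true_freqs) / n
    mean_e = sum(estimated_freqs) / n
    cov = sum(
        (t - mean_t) * (e - mean_e)
        for t, e in zip(true_freqs, estimated_freqs)
    )
    var_t = sum((t - mean_t) ** 2 for t in true_freqs)
    var_e = sum((e - mean_e) ** 2 for e in estimated_freqs)
    return cov * cov / (var_t * var_e)


# Toy check: swapping two of three values lowers r² from 1 to 0.25.
score = r_squared([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
```

The function is undefined when either input is constant, since the variance in the denominator is then zero.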
Thanks!