Analyzing digital gene expression data in Galaxy
Supervisors:
Peter-Bram A.C. ’t Hoen
Kostas Karasavvas
Students:
Ilya Kurochkin
Ivan Rusinov
Galaxy
Galaxy is an open, web-based platform for accessible, reproducible, and transparent computational biomedical research.
Adding a new tool to Galaxy
To add a new tool to Galaxy you need:
• A tool definition file in XML format
• The tool script
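The two pieces can be illustrated with a minimal sketch. It is not the tool from this project: the tool id `drop_empty_lines`, the script name, and the parameter names `input` and `output` are made up for illustration, and the exact form of the `<command>` line may differ between Galaxy versions. The definition file is held in a string here only so the example stays self-contained.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical tool definition file (normally saved as its own .xml file).
# Galaxy replaces $input and $output with real dataset paths at run time.
TOOL_XML = """\
<tool id="drop_empty_lines" name="Drop empty lines" version="0.1.0">
  <command>python drop_empty_lines.py $input $output</command>
  <inputs>
    <param name="input" type="data" format="tabular" label="Input file"/>
  </inputs>
  <outputs>
    <data name="output" format="tabular"/>
  </outputs>
  <help>Removes empty lines from a tabular file.</help>
</tool>
"""


def declared_names(xml_text):
    """Return the input and output names declared in a tool definition."""
    root = ET.fromstring(xml_text)
    inputs = [p.get("name") for p in root.findall("./inputs/param")]
    outputs = [d.get("name") for d in root.findall("./outputs/data")]
    return inputs, outputs


def run_tool(input_path, output_path):
    """The tool script: copy non-empty lines and return how many were written."""
    written = 0
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(line)
                written += 1
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    # Called as: python drop_empty_lines.py <input file> <output file>
    run_tool(sys.argv[1], sys.argv[2])
```

The definition file tells Galaxy which form fields to show and how to build the command line; the script itself only sees ordinary file paths.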
SAGE
• Sequence and count short tags representative of a transcript
• Absolute abundance of transcripts
Existing pipeline for analyzing DeepSAGE data
GAPSS: General analysis pipeline for second generation sequencers
Implemented in Galaxy
Some final steps were missing:
- Gene annotation (ENSEMBL/BioMart) and summarization
- Statistical analysis of differential gene expression
Existing workflow
Gene annotation and summarization
Tool for counting DeepSAGE tags in ENSEMBL-annotated exons.
Tool for automatically obtaining a file in BioMart format.
Obtain BioMart format file
Count DeepSAGE tags in annotated exons
Input files:
1) BioMart format file:
2) SAM format file:
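For the SAM input, only three values per alignment are needed for counting: chromosome, strand, and position. The sketch below shows one way to extract them, relying on the standard SAM columns (FLAG in column 2, RNAME in column 3, 1-based POS in column 4). The function name `parse_sam_line` is made up here and is not taken from the original tool.

```python
def parse_sam_line(line):
    """Return (chromosome, strand, position) for a mapped read, else None."""
    if line.startswith("@"):
        return None  # header line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None  # SAM alignment lines have 11 mandatory fields
    flag = int(fields[1])
    if flag & 0x4:
        return None  # read is unmapped
    strand = "-" if flag & 0x10 else "+"  # 0x10: mapped to the reverse strand
    return fields[2], strand, int(fields[3])
```

Note that POS is the leftmost mapped base, so for reverse-strand tags it is not the 5' end of the tag; whether that matters depends on how exons are matched.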
Count DeepSAGE tags in annotated exons
Count DeepSAGE tags in annotated exons
Output file:
Count DeepSAGE tags in annotated exons
1. For each line in the SAM file, the whole BioMart file is read. (~1 second/line)
2. The BioMart file is loaded into a dictionary, with the data split by chromosome name and strand. (50 seconds for 10,000 lines)
3. The SAM file is loaded into a dictionary, with the data split by chromosome name, strand, and genomic position. (16 seconds for 10,000 lines)
4. Support for several SAM files is added.
5. Both files are loaded into dictionaries. (16 seconds for 10,000 lines; ~16 minutes for 7,768,787 lines)
6. The BioMart dictionary is sorted by exon coordinates; overlapping and repeated exons cause problems.
7. A binary search for each SAM position in the sorted list of exon coordinates is implemented. (77 seconds for 7,768,787 lines)
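The final design (dictionaries keyed by chromosome and strand, plus a binary search over sorted exon coordinates) can be sketched as follows. This is an illustration under simplifying assumptions, not the original script: the names `build_index` and `find_gene` and the tuple layout are invented, coordinates are taken as 1-based and inclusive, and exons within one chromosome/strand group are assumed not to overlap, which sidesteps the problem noted in step 6.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(exons):
    """Group exons by (chromosome, strand) and sort each group by start.

    exons: iterable of (chromosome, strand, start, end, gene_id) tuples.
    """
    grouped = defaultdict(list)
    for chrom, strand, start, end, gene in exons:
        grouped[(chrom, strand)].append((start, end, gene))
    index = {}
    for key, items in grouped.items():
        items.sort()
        starts = [start for start, _, _ in items]
        index[key] = (starts, items)
    return index


def find_gene(index, chrom, strand, pos):
    """Binary-search the exon containing pos; return its gene or None."""
    entry = index.get((chrom, strand))
    if entry is None:
        return None
    starts, items = entry
    # Rightmost exon whose start is <= pos
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    start, end, gene = items[i]
    return gene if start <= pos <= end else None
```

Each lookup costs O(log n) in the number of exons of one chromosome and strand, instead of a scan over the whole annotation, which is consistent with the drop in running time reported in the list above.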
About R/Bioconductor
• R is a language and environment for statistical computing and graphics.
• Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. Bioconductor uses the R statistical programming language, and is open source and open development.
Statistical analysis of differential gene expression
Tool for examining differential expression of replicated count data using the edgeR package from Bioconductor
Tool for estimating the variance in count data and testing for differential expression using the DESeq package from Bioconductor
Analysis of differentially expressed genes (edgeR)
Input files:
1. DeepSAGE tags in annotated exons counter output file
2. Metadata file
Design matrix
Contrast vector: (1, -1, 0)
Generalized linear model
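To make the role of the design matrix and the contrast vector concrete, here is a small sketch. It is not edgeR code (edgeR is an R package), and the group labels and coefficient values are invented. It assumes a design with one indicator column per group and no intercept, so the contrast (1, -1, 0) compares the first group with the second and ignores the third.

```python
def design_matrix(groups):
    """Build a one-column-per-group indicator matrix from sample labels."""
    levels = sorted(set(groups))
    rows = [[1 if group == level else 0 for level in levels] for group in groups]
    return levels, rows


def apply_contrast(contrast, coefficients):
    """Combine per-group model coefficients into the tested quantity."""
    if len(contrast) != len(coefficients):
        raise ValueError("contrast and coefficients must have the same length")
    return sum(c * b for c, b in zip(contrast, coefficients))


# Hypothetical sample labels as they might come from a metadata file
levels, matrix = design_matrix(["A", "A", "B", "C"])

# Hypothetical fitted log-scale coefficients for groups A, B, C
difference = apply_contrast([1, -1, 0], [5.0, 3.0, 9.0])
```

In a generalized linear model on the log scale, this difference corresponds to the log fold change between the two compared groups.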
Analysis of differentially expressed genes (edgeR)
Analysis of differentially expressed genes (edgeR)
Output file:
Analysis of differentially expressed genes (DESeq)
Test for differences between the base means of two levels
Input files:
1. DeepSAGE tags in annotated exons counter output file
2. Metadata file
Create a CountDataSet object
Estimate the effective library size for a CountDataSet
Estimate the variance functions for a CountDataSet
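The effective library size step can be illustrated with the median-of-ratios idea that DESeq is known for: each sample's size factor is the median, over genes, of the ratio between the sample's count and the gene's geometric mean across samples. The sketch below is a plain Python illustration with an invented function name and made-up counts, not a reimplementation of the DESeq package; genes with a zero count in any sample are skipped.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows, one row per gene, one column per sample.
    """
    if not counts:
        raise ValueError("counts must contain at least one gene")
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue  # geometric mean is zero; gene carries no information
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    return [math.exp(median(ratios)) for ratios in log_ratios]


# Made-up counts: the second sample was sequenced twice as deeply as the first
factors = size_factors([[10, 20], [100, 200], [5, 10], [0, 7]])
```

Dividing each sample's counts by its size factor puts the samples on a common scale before variances are estimated and tests are run.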
Analysis of differentially expressed genes (DESeq)
Analysis of differentially expressed genes (DESeq)
Output file:
Comparison of results obtained by edgeR and DESeq
Full workflow
Thank you for your attention
Any questions?