![Page 1: Analysis of next generation sequencing experiments with Galaxyjura.wi.mit.edu/bio/education/hot_topics/GalaxyNGS/Galaxy_NGS.pdf · 3/24/2011 · Analysis of next generation sequencing](https://reader035.fdocuments.us/reader035/viewer/2022081517/5aaf6e6e7f8b9a07498d5277/html5/thumbnails/1.jpg)
Analysis of next generation sequencing experiments with Galaxy
March 24, 2011
1 Hot Topics: Galaxy Next Gen.Seq.
Previous Hot Topics on Next Generation Sequencing Analysis
• Mapping next generation sequence reads
http://iona.wi.mit.edu/bio/education/hot_topics/shortRead_mapping/Mapping_HTseq.pdf
• Analysis of ChIP-seq experiments
http://iona.wi.mit.edu/bio/education/hot_topics/ChIPseq/ChIPSeq_HotTopics.pdf
• RNA-seq: Methods and Applications
http://iona.wi.mit.edu/bio/education/hot_topics/RNAseq/RNA_Seq.pdf
Talk Outline
• Introduction to Galaxy
• Data upload
• Format conversion and quality control tools
• Analysis of ChIP-seq experiments with MACS
• Analysis of RNA-seq experiments with Tuxedo tools
• Demo
What is Galaxy
• A web-based platform for analysis of large genomic datasets
• No programming experience needed
• Integrates many tools within one interface:
– Easy retrieval of data from UCSC, BioMart and other databases
– Powerful text manipulation tools (data preparation)
– Filter on columns, join, sort, compute, etc.
– Format conversion tools (text, tab, BED, GFF …)
– Integrates tools from other sources, e.g. EMBOSS
– MSA tools
– Visualize data in the UCSC browser (see Hot Topics Dec 09, http://iona.wi.mit.edu/bio/education/hot_topics/galaxy/Galaxy.pdf)
– Next Generation Sequencing Toolbox
Documentation and Tutorials
• OpenHelix tutorials and exercises: http://www.openhelix.com/cgi/tutorialInfo.cgi?id=82
• Galaxy tutorials: http://galaxy.psu.edu/screencasts.html
• References
Galaxy developers: The Center for Comparative Genomics & Bioinformatics, Pennsylvania State University
Giardine, B., et al. Galaxy: a platform for interactive large-scale analysis. Genome Research (2005) 15:1451-1455
Taylor, J., et al. Using Galaxy to perform large-scale interactive data analyses. Current Protocols in Bioinformatics (2007) Chapter 10, unit 10.
Blankenberg, D., et al. Manipulation of FASTQ data with Galaxy. Bioinformatics (2010) 26(14):1783-1785
Galaxy Interface
Tools window
Data display and tools dialog window
History window: datasets for each analysis are kept here
Processed data:
– Green: job is finished
– Yellow: job is running
– Gray: job is in queue
– Red: there is a problem
Data analysis
Log in/out
Create analysis pipelines
Security issues
• Register and log in (log in button) to keep your data and history.
• Your data must be public to be visualized at UCSC. By default, data is public.
• Alternatively, make your data private, download it, and visualize it in UCSC or another browser.
Security issues II
Data is private Data is public
Data upload I
• For files larger than 2 GB, transfer to the Galaxy server via the File Transfer Protocol (FTP).
• Log in to tak (ssh -l userName tak.wi.mit.edu) and cd to the folder that has your files. (See Hot Topic “Introduction to Unix”, http://iona.wi.mit.edu/bio/education/hot_topics/unix_2010/slides.pdf)
• FTP to Galaxy:
ftp main.g2.bx.psu.edu
Name (main.g2.bx.psu.edu:ibarrasa): Type your email
Password: Type your Galaxy password
230 User [email protected] logged in
Remote system type is UNIX.
Using binary mode to transfer files.
ftp>
• Upload file:
ftp> put FileName
ftp> exit
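The interactive FTP session above can also be scripted. The following is a minimal sketch using Python's standard ftplib module. The function name upload_to_galaxy and the ftp_factory parameter are hypothetical names introduced here for illustration, and the default host is simply the one shown on the slide, so it may need to be adjusted for your Galaxy server.

```python
from ftplib import FTP
from pathlib import Path


def upload_to_galaxy(local_path, email, password,
                     host="main.g2.bx.psu.edu", ftp_factory=FTP):
    """Upload one file in binary mode, like `put FileName` in an FTP session.

    `ftp_factory` is an illustrative hook (not part of Galaxy) that lets a
    stand-in object be used instead of a real network connection.
    """
    path = Path(local_path)
    ftp = ftp_factory(host)  # ftplib.FTP(host) connects on construction
    try:
        # Galaxy accounts log in with the registered email and password
        ftp.login(user=email, passwd=password)
        with path.open("rb") as handle:
            # STOR stores the file under its own name on the server
            ftp.storbinary(f"STOR {path.name}", handle)
    finally:
        ftp.quit()
    return path.name
```

Called as upload_to_galaxy("FileName", "your email", "your Galaxy password"), this should leave the file ready to be picked up from the Galaxy upload form, assuming the server accepts FTP uploads as described on the slide.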
Data upload II
Upload File
Format conversion and quality control tools
NGS: QC and manipulation
• Convert FASTQ Illumina to FASTQ Sanger: FASTQ Groomer (convert between various FASTQ quality formats)
• Summarize QC statistics: Compute quality statistics
• Visualize QC statistics: Draw quality score boxplot, Draw nucleotides distribution chart
Note: FastQC is not incorporated in Galaxy, but it is installed on tak.
Illumina data format
• FASTQ format (four lines per read):
@ILLUMINA-F6C19_0048_FC:5:1:12440:1460#0/1
GTAGAACTGGTACGGACAAGGGGAATCTGACTGTAG
+ILLUMINA-F6C19_0048_FC:5:1:12440:1460#0/1
hhhhhhhhhhhghhhhhhhehhhedhhhhfhhhhhh
• Line 1: @seq identifier (ends in /1 or /2 for paired-end reads)
• Line 2: seq
• Line 3: +any description
• Line 4: seq quality values
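To make the four-line record layout concrete, here is a minimal sketch of a FASTQ reader in Python. It assumes each record occupies exactly four lines (no wrapped sequences), and the names FastqRecord, read_fastq and mate_number are hypothetical helpers introduced for illustration, not part of Galaxy.

```python
import io
from typing import Iterator, NamedTuple, Optional, TextIO


class FastqRecord(NamedTuple):
    identifier: str   # text after '@' on line 1
    sequence: str     # line 2
    description: str  # text after '+' on line 3 (may be empty)
    quality: str      # line 4, one character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ file with exactly four lines per read."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        header = header.rstrip("\n")
        if not header:          # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, plus[1:], quality)


def mate_number(identifier: str) -> Optional[int]:
    """Return 1 or 2 for identifiers ending in /1 or /2, otherwise None."""
    if identifier.endswith("/1"):
        return 1
    if identifier.endswith("/2"):
        return 2
    return None


if __name__ == "__main__":
    demo = io.StringIO("@read1#0/1\nACGT\n+read1#0/1\nhhhh\n")
    for record in read_fastq(demo):
        print(record.identifier, mate_number(record.identifier))
```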
Sequence quality values on different FASTQ formats
http://en.wikipedia.org/wiki/FASTQ_format
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS...............................
..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
.................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh
|                         |    |        |                              |
33                        59   64       73                            104
S - Sanger         Phred+33,  raw reads typically (0, 40)
X - Solexa         Solexa+64, raw reads typically (-5, 40)
I - Illumina 1.3+  Phred+64,  raw reads typically (0, 40)
J - Illumina 1.5+  Phred+64,  raw reads typically (3, 40)
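To discriminate between Solexa and Illumina 1.3+, check whether your quality strings contain any of the characters ;<=>? (ASCII 59 to 63). These fall below the Illumina 1.3+ range, so they indicate Solexa scaling.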
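The character ranges in the chart can be turned into a simple check. The sketch below is a heuristic only, not the FASTQ Groomer implementation: it looks at the lowest quality character seen and uses the boundaries 59 and 64 from the chart. The function names are hypothetical. A Sanger file whose reads are all of high quality could be misclassified, so inspect as many reads as practical.

```python
from typing import Iterable


def guess_quality_encoding(quality_strings: Iterable[str]) -> str:
    """Guess the FASTQ quality encoding from the lowest character observed."""
    lowest = min(ord(ch) for quality in quality_strings for ch in quality)
    if lowest < 59:
        # characters below ';' only occur with the Phred+33 offset
        return "Sanger (Phred+33)"
    if lowest < 64:
        # ';' to '?' are Solexa scores -5 to -1
        return "Solexa (Solexa+64)"
    return "Illumina 1.3+ (Phred+64)"


def illumina_to_sanger(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33.

    The Phred scores are unchanged; only the ASCII offset moves.
    Solexa-scaled scores need an additional score conversion that is
    not covered here.
    """
    return "".join(chr(ord(ch) - 64 + 33) for ch in quality)


if __name__ == "__main__":
    example = "hhhhghhe"
    print(guess_quality_encoding([example]))
    print(illumina_to_sanger(example))
```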
FASTQ Groomer
Quality control visualization tools
Draw nucleotides distribution chart
Draw quality score boxplot
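As a rough illustration of what these two plots summarize, the sketch below computes per-position quality statistics and per-position nucleotide counts in plain Python. It is not the code Galaxy runs: the function names are hypothetical, and the default offset of 33 assumes the reads have already been converted to Sanger encoding.

```python
from collections import Counter
from statistics import median
from typing import Dict, Iterable, List


def per_position_quality(quality_strings: Iterable[str], offset: int = 33) -> List[Dict]:
    """Summarize quality scores at each read position (1-based)."""
    columns: Dict[int, List[int]] = {}
    for quality in quality_strings:
        for position, ch in enumerate(quality):
            columns.setdefault(position, []).append(ord(ch) - offset)
    return [
        {
            "position": position + 1,
            "min": min(scores),
            "median": median(scores),
            "max": max(scores),
            "mean": sum(scores) / len(scores),
        }
        for position, scores in sorted(columns.items())
    ]


def per_position_nucleotides(sequences: Iterable[str]) -> List[Counter]:
    """Count A/C/G/T/N occurrences at each read position."""
    counts: List[Counter] = []
    for sequence in sequences:
        for position, base in enumerate(sequence.upper()):
            if position == len(counts):
                counts.append(Counter())
            counts[position][base] += 1
    return counts


if __name__ == "__main__":
    for row in per_position_quality(["II5", "I5"]):
        print(row)
    print(per_position_nucleotides(["ACG", "AAG"]))
```

A drop in the median toward the end of the reads is the usual signal that trimming may be worthwhile.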
How to make a workflow from the history
History Options
Workflow for Quality Control
Remove sequencing artifacts
Clip adapter sequences
Trim sequences
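The slides show these cleanup steps only as Galaxy tool forms. As a rough idea of what adapter clipping and trimming do to a read, here is a minimal sketch using exact string matching. It does not reproduce the behavior of the Galaxy tools, which may allow mismatches and other options; clip_adapter, trim and min_overlap are hypothetical names chosen for illustration.

```python
from typing import Optional, Tuple


def clip_adapter(sequence: str, quality: str, adapter: str,
                 min_overlap: int = 5) -> Tuple[str, str]:
    """Remove a 3' adapter (exact match) and everything after it."""
    index = sequence.find(adapter)  # full adapter anywhere in the read
    if index == -1:
        # otherwise look for the start of the adapter at the end of the read
        longest = min(len(adapter), len(sequence)) - 1
        for overlap in range(longest, min_overlap - 1, -1):
            if sequence.endswith(adapter[:overlap]):
                index = len(sequence) - overlap
                break
    if index == -1:
        return sequence, quality  # no adapter found, read unchanged
    return sequence[:index], quality[:index]


def trim(sequence: str, quality: str, first: int = 1,
         last: Optional[int] = None) -> Tuple[str, str]:
    """Keep bases first..last (1-based, inclusive), with matching qualities."""
    end = len(sequence) if last is None else last
    return sequence[first - 1:end], quality[first - 1:end]


if __name__ == "__main__":
    print(clip_adapter("ACGTACGTTTAGGCATCG", "I" * 18, "AGGCATCG"))
    print(trim("ACGTACGT", "ABCDEFGH", first=2, last=5))
```

Sequence and quality strings are always cut together so that the record stays a valid FASTQ entry.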
Analysis of ChIP-seq experiments
1. Map reads: Bowtie
2. Filter unmapped reads: Filter SAM
3. Call peaks (bound regions): MACS
Mapping Reads with Bowtie
MACS
Filtering unmapped reads
Filter SAM
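Outside Galaxy, the same filtering step can be sketched in a few lines of Python. In the SAM format the second column is a bitwise FLAG, and bit 0x4 marks a read as unmapped. The function name filter_mapped is a hypothetical helper, and this sketch covers only that one bit, not the other options of the Filter SAM tool.

```python
from typing import Iterable, Iterator

UNMAPPED_FLAG = 0x4  # SAM FLAG bit set when the read itself is unmapped


def filter_mapped(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and alignment lines for mapped reads only."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines are passed through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            raise ValueError(f"expected at least 11 SAM columns: {line!r}")
        flag = int(fields[1])
        if flag & UNMAPPED_FLAG:
            continue  # drop unmapped reads
        yield line


if __name__ == "__main__":
    demo = [
        "@SQ\tSN:chr1\tLN:1000\n",
        "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    ]
    print("".join(filter_mapped(demo)), end="")
```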
Analysis of ChIP-seq experiments: MACS
Workflow for ChIP-seq analysis
MACS output
Excel file with peaks
Bed file with peaks
Wig files
Analysis of ChIP-seq experiments: Intersect peaks with promoter regions
1. Download 1 kb regions upstream of genes from UCSC in BED format.
2. Get your BED file with peaks from MACS or another peak-finding algorithm.
3. Intersect the promoter BED file with the peaks BED file. (See Hot Topics Dec 09, http://iona.wi.mit.edu/bio/education/hot_topics/galaxy/Galaxy.pdf)
Tools: UCSC Main table browser; Intersect the intervals of two datasets
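The intersection in step 3 can be sketched in plain Python as follows. This is an illustration under simple assumptions, not the Galaxy tool: it reports whole promoter intervals that overlap at least one peak, treats BED coordinates as 0-based and half-open, and uses a straightforward nested scan rather than an indexed search. The function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Interval = Tuple[str, int, int, str]  # chrom, start, end, name


def read_bed(lines: Iterable[str]) -> Iterator[Interval]:
    """Parse BED lines, skipping blank, comment, track and browser lines."""
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, *rest = line.rstrip("\n").split("\t")
        yield chrom, int(start), int(end), rest[0] if rest else "."


def promoters_with_peaks(promoters: Iterable[Interval],
                         peaks: Iterable[Interval]) -> List[Interval]:
    """Return the promoter intervals that overlap at least one peak."""
    peaks_by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end, _name in peaks:
        peaks_by_chrom.setdefault(chrom, []).append((start, end))
    hits = []
    for chrom, start, end, name in promoters:
        # half-open intervals overlap when each starts before the other ends
        if any(p_start < end and start < p_end
               for p_start, p_end in peaks_by_chrom.get(chrom, [])):
            hits.append((chrom, start, end, name))
    return hits


if __name__ == "__main__":
    promoters = read_bed(["chr1\t1000\t2000\tgeneA\n", "chr1\t5000\t6000\tgeneB\n"])
    peaks = read_bed(["chr1\t1900\t2100\tpeak1\n"])
    print(promoters_with_peaks(promoters, peaks))
```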
Talk Outline
• Introduction to Galaxy
• Data upload
• Format conversion and quality control tools
• Mapping
• Analysis of ChIP-seq experiments with MACS
• Analysis of RNA-seq experiments with Tuxedo tools
• Demo
Expression Profiling Workflow
1. Align: TopHat
2. Assemble: Cufflinks
3. Visualize Data: UCSC Genome Browser
Other tools for expression profiling
• Cuffcompare: compare assembled transcripts to a reference annotation and track Cufflinks transcripts across multiple experiments
• Cuffdiff: find significant changes in transcript expression, splicing, and promoter use
Workflow for RNA-seq analysis
Workflow/Demo for ChIP-seq analysis
1. Workflow for quality control
2. Workflow for mapping and running MACS
3. Workflow for RNA-seq