Registration Page · 2012. 12. 6. · Next Generation Sequencing Data Analysis Lynn Young, Ph.D....

Next Generation Sequencing Data Analysis An ORS Service Registration Page


Page 1: Registration Page · 2012. 12. 6. · Next Generation Sequencing Data Analysis Lynn Young, Ph.D. lynny@mail.nih.gov NIH Library Bioinformatics Support Program An ORS Service 20 September


Registration Page

Next Generation Sequencing Data Analysis

Lynn Young, Ph.D.

lynny@mail.nih.gov

NIH Library Bioinformatics Support Program

An ORS Service

20 September 2012

Acknowledgement

This training uses cloud services provided by an “AWS in Education” grant to the Galaxy Project.

Introduction

http://en.wikipedia.org/wiki/File:DNA_Sequencing_gDNA_libraries.jpg

Objectives

■  Sequence quality
■  Mapping
■  Mapping quality
■  Variant analysis
■  Biological context

http://en.wikipedia.org/wiki/File:DNA_Sequencing_gDNA_libraries.jpg

Data Analysis Workflow

Reads → QC → Trim → Map (to Ref) → Variant detection

Reference – FASTA format

>gi|206583719|gb|CM000511.1| Homo sapiens chromosome 21, whole genome shotgun sequence
ATTCATTCCATTCCACTGCACTCCAATCTTCACATAAAATGTAGACAGAAGCTTTCTGAGAAACTTTTCT
CTGATGTGTGCATTCATCTCACAGATGTGAACCATTCTTTTGTTTGAGCAGTTTGGTAACATTCTTTTTG
TAGAATCTGCAAAAGGATATTTGTGAGCACTTTGAAGCCTATGGTGAAAAAGGAAATATCTTCAGAGAAA
AACTAGAAAGAAGGTTTCTGAGAAACTGCTTTGTCATGTGTGAATTAGTCTCACAGATTTGAACCTTTCT
GTTGATTGAACATATTGGAAACCTTCTTTTTGTAGAATCTGCAAAGGGATATTTGTGAACACTTGGAGGC
CAATGGTGAAAAAGGAAATATATTCACATGAAAACTAGACAGAATCTTTCTGAGACACTTCTGTGTTTGG
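As a minimal sketch of how a FASTA record like the one above could be read programmatically, the following Python function splits FASTA text into (header, sequence) pairs. The function name `parse_fasta` and the shortened sample (first two sequence lines only) are illustrative assumptions, not part of the original training material.

```python
from typing import Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the leading ">"
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        yield header, "".join(chunks)


# Shortened sample taken from the record shown on the slide.
sample = """>gi|206583719|gb|CM000511.1| Homo sapiens chromosome 21, whole genome shotgun sequence
ATTCATTCCATTCCACTGCACTCCAATCTTCACATAAAATGTAGACAGAAGCTTTCTGAGAAACTTTTCT
CTGATGTGTGCATTCATCTCACAGATGTGAACCATTCTTTTGTTTGAGCAGTTTGGTAACATTCTTTTTG
"""

records = list(parse_fasta(sample))
for name, seq in records:
    print(name.split()[0], len(seq))
```

The header line is kept whole so that the identifier and the free-text description can be split later as needed.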

Reads – FASTQ Format

@SRR016862.16884
ATTTTGAGTGGTACATCTAGGTAGCCGTTTTTGGAAACGGG
+
IIIIII,IIIII?III?I&II9$H+/I>IA%1.$,$%$#$F
@SRR016862.58801
ATTTTGAGTGGTACATCTAGGTAGCCGTTTTTGAAACCAGG
+
IIIIIIIIIIIIIIIIII9III0II4.II@&?6&$&#%'@.
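Each FASTQ record spans four lines: an `@` identifier, the bases, a `+` separator, and one quality character per base. The sketch below reads such records and decodes the quality string into Phred scores. It assumes Sanger (Phred+33) encoding, which the slides do not state explicitly; `parse_fastq` and `phred_scores` are hypothetical helper names.

```python
from typing import Iterator, List, Tuple


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch for {header}")
        yield header[1:], seq, qual


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string; offset 33 (Sanger) is an assumption."""
    return [ord(ch) - offset for ch in quality]


# First read from the slide.
sample = """@SRR016862.16884
ATTTTGAGTGGTACATCTAGGTAGCCGTTTTTGGAAACGGG
+
IIIIII,IIIII?III?I&II9$H+/I>IA%1.$,$%$#$F
"""

reads = list(parse_fastq(sample))
read_id, seq, qual = reads[0]
scores = phred_scores(qual)
print(read_id, len(seq), min(scores), max(scores))
```

Under this assumption the character `I` decodes to a score of 40, and the run of `$`, `%` and `#` characters near the end of the read corresponds to very low scores, which is the kind of pattern the quality-control step is meant to reveal.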

Alignments – SAM Format

http://samtools.sourceforge.net/SAM1.pdf

http://bio-bwa.sourceforge.net/bwa.shtml#4
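The SAM specification linked above defines eleven mandatory tab-separated fields per alignment line (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL). As a rough sketch, the following Python code splits one alignment line into those fields and tests two FLAG bits. The sample line is made up for illustration and does not come from the class data.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line: str) -> SamRecord:
    """Parse the 11 mandatory fields; optional tags after them are ignored."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


def is_unmapped(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x4)   # FLAG bit 0x4: segment unmapped


def is_reverse(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x10)  # FLAG bit 0x10: reverse-complemented


# Hypothetical alignment line (not taken from the class data).
sample_line = "read1\t16\tchr21\t1000\t37\t4M\t*\t0\t0\tACGT\tIIII"
rec = parse_sam_line(sample_line)
print(rec.rname, rec.pos, rec.mapq, is_unmapped(rec), is_reverse(rec))
```

Header lines, which begin with `@` in a SAM file, would need to be skipped before calling `parse_sam_line`.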

Variant Calls – VCF Format

http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41

VCF Format - Data
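A VCF data line begins with eight fixed tab-separated columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. The sketch below parses one such line into a dictionary. The sample line and its values are hypothetical, and `parse_vcf_line` is an illustrative helper rather than part of any tool used in the class.

```python
from typing import Any, Dict


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Parse the eight fixed VCF columns of one data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    info_dict: Dict[str, Any] = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue  # "." means no INFO entries
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True  # bare keys are flags
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),  # several ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }


# Hypothetical data line (values invented for illustration).
sample_line = "chr21\t1000\t.\tA\tG\t50.0\t.\tDP=12;AF=0.5"
variant = parse_vcf_line(sample_line)
print(variant["chrom"], variant["pos"], variant["ref"], variant["alt"], variant["qual"])
```

Meta-information lines (starting with `##`) and the column header line (starting with `#CHROM`) are not data lines and would be skipped before parsing.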

Data – Sequence Read Archive

http://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP000535

Galaxy

■  Public
    ➤  usegalaxy.org

■  20 September 2012 class
    ➤  cloud1.galaxyproject.org
    ➤  cloud2.galaxyproject.org

Galaxy Account Registration

Galaxy Account Registration

Galaxy Login If You Already Have an Account

Galaxy Login

Galaxy – Shared Data

Galaxy – Obtain Shared Data

Galaxy – Obtain Shared Data for Input Datasets

Step 1

Step 2

Galaxy Data Analysis Workflow - Details

Galaxy – Analyze Data

Galaxy Input Dataset – View Sequence Reads

Galaxy Input Dataset – View Reference Sequence

For next slide

Reference Sequence

Galaxy - FASTQ Groomer

Step 1

Step 2

Step 3

Repeat steps for the other two FASTQ files
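Galaxy's FASTQ Groomer standardizes quality-score encoding so that downstream tools receive Sanger-style (Phred+33) qualities. The slides do not say which encoding the class input files use, so the sketch below simply assumes a Phred+64 input (as used by some older Illumina pipelines) to show what such a conversion involves. `to_sanger` is a hypothetical helper, not the Groomer's own code.

```python
def to_sanger(quality: str, input_offset: int = 64) -> str:
    """Re-encode a quality string as Phred+33 (Sanger).

    input_offset=64 is an assumption for illustration; pass 33 if the
    input is already Sanger-encoded (the string is then unchanged).
    """
    scores = [ord(ch) - input_offset for ch in quality]
    if scores and min(scores) < 0:
        raise ValueError("negative score: the input offset is probably wrong")
    return "".join(chr(score + 33) for score in scores)


# 'h' is score 40 and 'B' is score 2 under Phred+64.
converted = to_sanger("hhB")
print(converted)
```

The negative-score check is a cheap guard against applying the wrong offset, which would otherwise silently corrupt every quality value.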

Galaxy - FastQC

Step 1

Step 2

Step 3

Step 4
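One of the summaries FastQC produces is per-base sequence quality. As a simplified sketch of the idea, the code below averages Phred scores at each read position using the two reads shown on the FASTQ format slide. It assumes Phred+33 encoding and computes only a mean, whereas FastQC itself reports a fuller distribution per position; `mean_quality_per_position` is a hypothetical helper.

```python
from typing import List


def mean_quality_per_position(qualities: List[str], offset: int = 33) -> List[float]:
    """Average Phred score at each read position across all reads."""
    if not qualities:
        return []
    longest = max(len(q) for q in qualities)
    means = []
    for pos in range(longest):
        # Only reads long enough to cover this position contribute.
        scores = [ord(q[pos]) - offset for q in qualities if len(q) > pos]
        means.append(sum(scores) / len(scores))
    return means


# Quality strings of the two reads from the FASTQ format slide.
quals = [
    "IIIIII,IIIII?III?I&II9$H+/I>IA%1.$,$%$#$F",
    "IIIIIIIIIIIIIIIIII9III0II4.II@&?6&$&#%'@.",
]
means = mean_quality_per_position(quals)
print([round(m, 1) for m in means[:5]], round(means[-1], 1))
```

For these two reads the average starts at 40 and drops toward the 3' end, the typical shape that motivates trimming low-quality tails before mapping.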

Galaxy – FastQC Results

For next slide

Step 1

Step 2

Galaxy Mapping – Burrows-Wheeler Aligner (BWA)

Step 1

Step 2

Step 3

Repeat steps for the other two Groomed files

Step 4

Galaxy – View BWA Results

Step 1

Step 2

For next slide

Galaxy – SAM to BAM

Step 1

Step 2

Step 3

Repeat steps for the other two SAM files

Step 4

Galaxy – Navigation to Picard Alignment Summary Metrics

Step 1

Step 2

Galaxy – Picard Alignment Summary Metrics

Step 1

Step 2

Step 3

Step 4

Step 5 – uncheck the box

Step 6

Galaxy – Results of Picard Alignment Summary Metrics

Key - http://picard.sourceforge.net/picard-metric-definitions.shtml

Step 1

Step 2

For next slide

Galaxy Variant Detection – Preparation: Merging BAM Files

Step 1

Step 2

Step 3

Step 4

Step 5

Step 6

Galaxy Variant Detection – Preparation: Merging BAM Files

Step 1

Step 2

Galaxy Variant Detection - FreeBayes

Step 1

Step 2

Step 3

Step 4

Step 5

Step 6

Galaxy Variant Detection – FreeBayes Results

Step 1

Step 2

For next slide

Galaxy – Filter and Sort

Step 1

Step 2

Step 3

Step 4

Step 5
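The slides do not state which filter condition or sort key is used in this step. Purely as an illustration of what filtering and sorting variant calls can look like, the sketch below keeps VCF data lines whose QUAL column meets a threshold and orders them by chromosome and position. The threshold of 30, the sample lines, and the helper name `filter_and_sort` are all assumptions.

```python
from typing import Iterable, List


def filter_and_sort(vcf_lines: Iterable[str], min_qual: float = 30.0) -> List[str]:
    """Keep data lines with QUAL >= min_qual, sorted by (CHROM, POS)."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8 or fields[5] == ".":
            continue  # malformed line or missing QUAL
        if float(fields[5]) >= min_qual:
            kept.append(fields)
    kept.sort(key=lambda f: (f[0], int(f[1])))
    return ["\t".join(f) for f in kept]


# Hypothetical variant calls (values invented for illustration).
calls = [
    "##fileformat=VCFv4.1",
    "chr21\t2000\t.\tC\tT\t45.0\t.\tDP=10",
    "chr21\t1000\t.\tA\tG\t12.5\t.\tDP=3",
    "chr21\t1500\t.\tG\tA\t99.0\t.\tDP=20",
]
result = filter_and_sort(calls)
for row in result:
    print(row)
```

Sorting on the integer position rather than the raw string avoids placing position 10000 before position 2000.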

Galaxy – Filter and Sort Results

For next slide, open new tab

Biological Context – UCSC Genome Browser

http://genome.ucsc.edu

Step 1

Step 2

Step 3

Step 4

Step 5

UCSC Genome Browser – OMIM Genes, OMIM AV SNPs

Step 1

Step 2

Step 3

Step 4

UCSC Genome Browser - Results

Galaxy – Exporting Data: Download VCF File

Step 1

Step 2

Step 3

Step 4

Galaxy – Exporting Data: Download BAM Files

Step 1

Step 2

Step 3

Step 4

Step 5

Repeat steps for the other two BAM files

Galaxy – Exporting Data: Download BAI Files

Step 1

Step 2

Step 3

Step 4

Step 5

Repeat steps for the other two BAM files

Thank you for attending.