NGS Data Analysis Workshop Complete · • Variant calling • Variant interpretation •...

NGS Data Analysis Workshop Barts and the London Genome Centre & Ingenuity Systems Wednesday 27 th February 2013 www.smd.qmul.ac.uk/gc www.ingenuity.com


Page 1: NGS Data Analysis Workshop Complete · • Variant calling • Variant interpretation • Statistical inference • Quality assessment at each step (bioinformatics as well as lab

NGS Data Analysis Workshop Barts and the London Genome Centre & Ingenuity Systems

Wednesday 27th February 2013

www.smd.qmul.ac.uk/gc

www.ingenuity.com

Page 2:

Welcome and Introduction

The Ingenuity Systems - Barts Genome Centre Partnership

Charles Mein, D.Phil., Centre Manager, [email protected]

Niels Nielsen, Senior Sales Executive, [email protected]

Page 3:

Proprietary and Confidential 3

Agenda

12.00pm Lunch

12.30pm Niels Nielsen, Ingenuity Systems and Charles Mein, Barts and the London Medical School

Welcome and Introduction; The Ingenuity Systems - Barts Genome Centre Partnership

12.35pm Tim Bonnert, Ingenuity Systems

Running and Analysing NGS Resequencing Experiments: An Overview of Data Generation, Analysis, and Biological Interpretation

1.00pm Charles Mein and Michael Barnes, Barts and the London Medical School

Sequencing & Bioinformatics Services of the Genome Centre

1.15pm Sasha Howard & Sayka Barry, Centre for Endocrinology, William Harvey Research Institute

Use of Ingenuity Variant Analysis in a Next-Generation Sequencing Project: Investigating the Genetic Factors Controlling the Timing of Puberty

1.35pm Tea Break

1.55pm Tim Bonnert, Ingenuity Systems

Intro and Case Study: NGS Biological Interpretation using Ingenuity Variant Analysis

2.25pm Michael Barnes, Barts and the London Medical School

Predicting Variant Function

2.45pm Closing Remarks

3.00pm Q&A

Attendees may optionally bring VCF files for analysis and interpretation in Ingenuity Variant Analysis

Page 4:

An Overview of Data Generation, Analysis, and Biological Interpretation

Tim Bonnert, Ph.D.

Field Application [email protected]

Page 5:

NGS: A Powerful but Time-Consuming Process

Sample Preparation
Sequencing
Annotation
Analysis
Validation

Results from 267 scientists currently using Next Generation Sequencing: What takes the most amount of time?

Source: The Global Outlook for Next Generation Sequencing: Usage, Platform Drivers & Workflow (2011)

Biological Hypothesis
Experimentation
Biological Interpretation

Page 6:

NGS: Multiple Options & Complex Considerations

Sequence
• In-House / Service Provider
• Sequence Quality
• Sequencing Coverage
• Sequencing Technology

Align
• Reference Genome
• Computational Resources
• Alignment Methods

Call Variants
• Calling Applications
• Computational Resources
• Quality Filtering

Annotation
• Allele Frequencies
• Variant Location
• Function Prediction
• Known Pathogenicity

Analysis
• Study Design & Type
• Association Statistics
• Genotype & Frequency

Biological Interpretation
• Function Prediction
• Phenotype & Disease Relationship
• Gene & Pathway Membership

Workflow: Sample Preparation, Sequencing, Annotation, Analysis, Validation

Biological Hypothesis
Experimentation
Biological Interpretation

Page 7:

Ingenuity Systems: Understanding Biology

Content Acquisition Ingenuity Ontology

Page 8:

Ingenuity Variant Analysis

Identify causal variants from human sequencing data in just hours

Upstream Pipeline
• Sequence & Align
• Call Variants

Upload to Variant Analysis™

• Loaded Samples
• Combine in Analyses
• Annotate & Interactively Filter
• Link Variants to Biology
• Share & Collaborate

of human whole genome, whole exome, and targeted exome samples

Analysis
• Statistics
• Pathways
• Biology
• Annotation

• Stratification Studies
• Personal Genome
• Tumour-Normal Pair
• Trio/Quad Study
• Genetic Disease Cohort
• Large Cancer Studies
• Pedigree Support
• Disease Identification
• Statistical Burden Testing

Page 9:

Rapid Prioritization and Annotation of Variants

Page 10:

Content Critical for Rich Biological Interpretation of Variants from DNA Re-Sequencing Studies

Unified Ontology

Disease models

Pathways

Biomarkers

Causal Networks

Regulatory

Hereditary

Experimental

Somatic

Mouse Ortholog Models

Associations

Copy number

What variants are associated with any type of [skeletal abnormality]?

What variants are associated with [breast cancer]?

What pathways have most deleterious variants in [ALL] tumor vs. normal samples?

What variants are associated with modified [warfarin] dosing?

What variants are expected to [activate] genes involved in [bone morphogenesis]?

What variants would be expected to impact expression of [predicted] [NFkB] targets?

Is this variant associated with [early onset breast cancer]? What is the literature evidence?

Which variants [in BRAF] lack kinase activity [in HELA cells]?

Which variants are observed in [>10%] of [melanomas]?

Which variants are deleterious in genes with [tumorigenic] mouse ortholog KO phenotypes?

Which variants are associated with [elevated CVD] risk at p < 10^-5 in [Framingham SHARe]?

Which variants are in regions [deleted] in [>20%] of [glioblastomas]?

What variant(s) are associated with [cystic fibrosis] patient response to [VX-770] treatment?

PGx & Clinically Validated

Biological Models

Mutation Content

Page 11:

Ingenuity Variant Analysis Content

• 130,000+ Ph.D./M.D. expert-curated human phenotype-associated mutations

• 145,000+ somatic variants modeled from COSMIC database

• 680,000+ somatic variants modeled from Cancer Genome Atlas (TCGA)

• 13,400+ Pharmacogenetic findings curated

• 16,000+ OMIM disease findings modeled

• 81,000+ Jackson Laboratory MGD mouse knockout database curated

• 5,900+ findings supporting Haploinsufficient genes

• 78,000+ miRNA predicted/observed binding sites integrated

• 4.9M+ Transcription factor binding sites (observed + JASPAR predicted)

• 1,300+ enhancers integrated (observed + VISTA predicted)

• SIFT, BSIFT & PolyPhen-2 functional prediction calls pre-loaded and maintained up-to-date

• Reference Genomes: dbSNP, 1000 genomes project, 54 Complete Genomics unrelated healthy reference genomes, NHLBI Exome Sequencing Project

• ~4.6 M findings from the Ingenuity Knowledge Base providing foundation layer of literature and integrated content mapped onto ~1.8M classes of the Ingenuity Ontology

Page 12:

Filters Utilize Ingenuity Knowledge Base, Ingenuity Analytics, and Integrated Content to Refine Variants

Ingenuity Knowledge Base + Analytics

Ingenuity Knowledge Base

Ingenuity Analytics

External Algorithm

Integrated Content or Model

Page 13:

Multiple Methods for Variant Selection & Prioritisation

• Genetic Analysis
– Refined using genotype selection, variant call quality, read depth, VCF filter status, and observed frequency at Variant or Gene level
– Annotate samples using a .PED file to restrict to de novo, transmitted, or Mendelian inheritance of variants across related individuals

• Statistical Association
– Over-representation analysis of genes associated with Diseases, Biological Processes, and Signalling & Metabolic Pathways
– Variant-level statistical association in Case vs. Control using basic allelic, dominant, or recessive models
– Gene-level Case (or Control) Burden or C-alpha Test with significance evaluated by permutation
– Pathway-level Case (or Control) Burden or C-alpha Test with significance evaluated by permutation
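To make "significance evaluated by permutation" concrete, here is a minimal sketch of a gene-level case burden test. It is an illustration only, not Ingenuity's implementation: the function name, the 0/1 carrier encoding, and the one-sided statistic (number of case carriers) are all assumptions chosen for brevity.

```python
import random


def carrier_burden_p_value(case_flags, control_flags, n_perm=2000, seed=0):
    """One-sided permutation p-value for an excess of carriers among cases.

    case_flags / control_flags: lists of 0/1, one entry per individual;
    1 means the individual carries at least one qualifying variant in the gene.
    (Hypothetical encoding, chosen for this sketch.)
    """
    rng = random.Random(seed)
    pooled = list(case_flags) + list(control_flags)
    n_cases = len(case_flags)
    observed = sum(case_flags)  # burden statistic: number of case carriers

    hits = 0
    for _ in range(n_perm):
        # Shuffle case/control labels and recompute the statistic.
        rng.shuffle(pooled)
        if sum(pooled[:n_cases]) >= observed:
            hits += 1

    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

For example, `carrier_burden_p_value([1] * 8, [0] * 8)` gives a small p-value, while identical carrier status in every individual gives exactly 1.0. A pathway-level version would pool carriers across all genes in the pathway before permuting.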

Page 14:

Refine Variants Based on Biological Associations and Molecular Interactions

• Diseases
• Genes
• Biological Processes
• Phenotypes
• Signalling Pathways
• Protein Domains
• Protein Families

Variants

Disease

Pathways

Phenotype

Your Gene List

Protein Domain

Regulators of…

Page 15:

Biological Context via Gene and Pathway Relationships

Page 16:

Streamlined Upload of Data via DropZone

Upstream Pipeline
• Sequence & Align
• Call Variants

Upload to Variant Analysis™: DataDrop via Ingenuity DropZone
• VCF 4.x
• Complete Genomics masterVar
• Complete Genomics Var
• GVF

• Loaded Samples
• Combine in Analyses
• Annotate & Interactively Filter
• Link Variants to Biology
• Share & Collaborate

of human whole genome, whole exome, and targeted exome samples

Access to Ingenuity Variant Analysis and the use of DropZones for data loading are available to all users

Page 17:

Summary of Features and Benefits

• Fast
• Knowledge-driven
• Scalable and Secure
• User friendly
• Easy to purchase
• Easy to implement

Try Ingenuity® Variant Analysis™ with your Called Variant Files

Free Previews of Unlimited Analyses with Unlimited Samples

www.ingenuity.com/variants

• Stratification Studies
• Personal Genome
• Tumour-Normal Pair
• Trio/Quad Study
• Genetic Disease Cohort
• Large Cancer Studies
• Pedigree Support
• Disease Identification
• Statistical Burden Testing

Analysis
• Statistics
• Pathways
• Biology
• Annotation

Page 18:

Sequencing & Bioinformatics Services of the Genome Centre

Charles Mein, D.Phil., Centre Manager, [email protected]

Michael Barnes, Ph.D., Director of Bioinformatics, [email protected]

Page 19:

Sequencing and Bioinformatics at the Barts and the London Genome Centre

Number of Bases Interrogated    Application
3,000,000,000                   Whole Genome Sequencing
30,000,000                      Exome Sequencing
4,000,000                       Targeted Sequence Capture
2,500,000                       Whole Genome SNP genotyping
200,000                         Targeted SNP typing
6,000                           Linkage arrays
1,000                           Sanger Sequencing
300                             Microsatellite Genotyping
1                               Taqman SNP typing

Next Generation Sequencing - Illumina MiSeq, GAIIx, HiSeq. Species independent, most comprehensive tool currently available.

Micro Array - Illumina iScan. Range of scales to facilitate many different approaches including GWAS and linkage. Species include mouse and human.

Capillary Sequencing - Applied Biosystems 3730xl. Sequencing of plasmids and PCR products. Genotyping length variants, e.g. microsatellites.

Real time PCR - Applied Biosystems 7900HT 384 well. SNP genotyping at individual loci.

Page 20:

Sequencing and Bioinformatics at the Barts and the London Genome Centre

Next Generation Sequencing - Illumina MiSeq, GAIIx, HiSeq. Species independent, most comprehensive tool currently available.

Number of Bases Interrogated    Application
3,000,000,000                   Whole Genome Sequencing
30,000,000                      Exome Sequencing
4,000,000                       Targeted Sequence Capture

PCR based target prep

Target Enrichment – hybridisation methods

Page 21:

Sequencing and Bioinformatics at the Barts and the London

Genome Centre

• Bioinformatics challenges

• Storage

• Data QC

• Alignment

• Variant calling

• Variant interpretation

• Statistical inference

Genome Centre pipeline

Genome Centre/ WHRI Bioinfo group

Research groups with Bioinformatics capabilities

Third party suppliers of software solutions

Page 22:

Sequencing and Bioinformatics at the Barts and the London

Genome Centre

• Bioinformatics challenges

• Storage

• Data QC

• Alignment

• Variant calling

• Variant interpretation

• Statistical inference

• Genome Centre maintains own compute resource – access at cost
• Dedicated staff member
• Additional funding application in for beefed up system
• Tie up with other resource in college: MidPlus and Physics cluster
• Cloud in the future?

Page 23:

Sequencing and Bioinformatics at the Barts and the London

Genome Centre

• Bioinformatics challenges

• Storage

• Data QC

• Alignment

• Variant calling

• Variant interpretation

• Statistical inference

• Quality assessment at each step (bioinformatics as well as lab stuff)

• Communication between wet and dry team

• In house pipeline using open source tools
– SAMtools
– Picard
– GATK

Page 24:

Sequencing and Bioinformatics at the Barts and the London

Genome Centre

• Bioinformatics challenges

• Storage

• Data QC

• Alignment

• Variant calling

• Variant interpretation

• Statistical inference

• Alignments of individual sequence back to reference genome
• Alignment part of in house QC tool
• Important quality metric – coverage statistics
• Bowtie (open source)
• Novoalign (commercial)
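As a rough illustration of the coverage statistics mentioned above, the sketch below computes mean coverage and the fraction of target bases reaching a minimum depth. It is not the Genome Centre's in house QC tool; the function names and inputs are hypothetical, and the mean-coverage formula ignores off-target and duplicate reads.

```python
def mean_coverage(n_mapped_reads, read_length, target_size_bp):
    """Approximate mean depth: total sequenced bases divided by target size."""
    return n_mapped_reads * read_length / target_size_bp


def fraction_covered(per_base_depths, min_depth):
    """Fraction of target positions with depth >= min_depth."""
    if not per_base_depths:
        return 0.0
    covered = sum(1 for depth in per_base_depths if depth >= min_depth)
    return covered / len(per_base_depths)
```

For a 4,000,000-base targeted capture, one million mapped 100-base reads give `mean_coverage(1_000_000, 100, 4_000_000)`, i.e. 25x on average. The second function is usually more informative for variant calling, because a high mean can hide poorly covered regions.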

Page 25:

Sequencing and Bioinformatics at the Barts and the London

Genome Centre

• Bioinformatics challenges

• Storage

• Data QC

• Alignment

• Variant calling

• Variant interpretation

• Statistical inference

• Variant Calls
– Identify changes from reference sequence – key outcome from resequencing
– GATK

Page 26:

Sequencing and Bioinformatics at the Barts and the London

Genome Centre

• Bioinformatics challenges

• Storage

• Data QC

• Alignment

• Variant calling

• Variant interpretation

• Statistical inference

Fu et al. 2013, Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants

• Difficulty in assigning function to variant – even for coding SNPs
– conservation-based methods (GERP+ and PhyloP)
– functional prediction methods (SIFT, PolyPhen2, MutationTaster)

• Little overlap in predictions
– Fu et al. 2013 only consider a variant deleterious if predicted by 4/6 methods

• New and innovative approaches required
– In-house
– Third parties
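The "4 out of 6 methods" rule can be sketched as a simple vote count. This is a minimal illustration, assuming each tool's output has already been reduced to a True/False call (or None where the tool gives no prediction); the function name and the dictionary layout are hypothetical, and the slide names only five of the six methods.

```python
def is_consensus_deleterious(calls, min_votes=4):
    """Return True if at least min_votes methods call the variant deleterious.

    calls: dict mapping method name -> True (deleterious), False (tolerated),
    or None (no prediction available). Missing predictions never count as votes.
    """
    votes = sum(1 for call in calls.values() if call is True)
    return votes >= min_votes


# Hypothetical calls for one variant; "method_6" stands in for the
# sixth method, which the slide does not name.
example_calls = {
    "GERP": True,
    "PhyloP": True,
    "SIFT": True,
    "PolyPhen2": False,
    "MutationTaster": True,
    "method_6": None,
}
```

With these example calls, four methods vote deleterious, so the variant passes the threshold. Raising `min_votes` trades sensitivity for specificity, which is the balance the slide is pointing at.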

Page 27:

Sequencing and Bioinformatics at the Barts and the London

Genome Centre

• Bioinformatics challenges

• Storage

• Data QC

• Alignment

• Variant calling

• Variant interpretation

• Statistical inference

Genome Centre pipeline

Genome Centre/ WHRI Bioinfo group

Research groups with Bioinformatics capabilities

Third party suppliers of software solutions

Page 28:

Use of Ingenuity Variant Analysis in a Next-Generation Sequencing Project

Investigating the Genetic Factors Controlling the Timing of Puberty

Sasha Howard, Ph.D.

Clinical Research [email protected]

Page 29:

Use of Ingenuity variant analysis in a next-generation sequencing project - Investigating the Genetic Factors Controlling the Timing of Puberty

Dr. Sasha Howard

Clinical Research Fellow

Centre for Endocrinology

William Harvey Research Institute

Page 30:

Our project

• Aim: To identify causal genetic factors in a cohort of patients with CDGP

– CDGP: Constitutional Delay in Growth and Puberty

– Phenotype of delayed puberty and short stature

– Segregates in Autosomal Dominant pattern

– Studies estimate 60-80% of pubertal timing is genetically determined [1]

[1] Gajdos ZK 2009

Page 31:

Our methods

• Use whole-exome sequencing as an initial tool to identify novel candidate genes which segregate with the disease trait

• WES of 52 individuals from 7 families (NimbleGen V2 platform)

– 36 affected (4-7 in each family)

– 14 unaffected

– 2 unknown status (grandparent pair)

Page 32:

Next-Generation Sequencing

[2] Bamshad 2011

Page 33:

NGS data

• Data returned via cloud-based storage and analysis tool - DNAnexus

• Average coverage 44x

• Mapped reads mean 93% (range 88-99%)

• Coding variants – mean 22,300 per sample

– Mean 11,174 missense

– Mean 165 nonsense

Page 34:

Pilot analysis

• 3 families; 16 individuals

• Analysis in DNAnexus

– Individual nucleotide-level variation analyses

– Family population-based frequency analysis

• Further filtering in Excel/Galaxy

• Manual minor allele frequency look-up in Ensembl/GO ESP/UCSC

• Pathway analysis using GeneGo MetaCore (Thomson Reuters)

Page 35:

Ingenuity Variant analysis

– Nucleotide-level variation analysis in DNAnexus
– Results exported and converted to VCF files using Perl
– Uploaded to Ingenuity variant analysis for filtering and annotation
– Additional pathway analysis using GeneGo MetaCore (Thomson Reuters)
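The export-to-VCF step above was done in Perl; the sketch below shows the same idea in Python for readers who want a starting point. It is an assumption-laden illustration: the input record layout (`chrom`, `pos`, `ref`, `alt` keys) is hypothetical because the DNAnexus export format is not shown in the slides, and the output is a sites-only VCF with no sample genotype columns.

```python
# Minimal VCF header: the file format line plus the eight fixed columns.
VCF_HEADER = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def to_vcf_lines(variants):
    """Convert variant dicts (hypothetical layout) into VCF text lines.

    Fields with no value are written as ".", the VCF missing-value marker.
    Records are written in input order; sort by chromosome and position first
    if the downstream tool expects a sorted file.
    """
    lines = list(VCF_HEADER)
    for v in variants:
        fields = [
            v["chrom"],
            str(v["pos"]),
            v.get("id", "."),
            v["ref"],
            v["alt"],
            ".",  # QUAL
            ".",  # FILTER
            ".",  # INFO
        ]
        lines.append("\t".join(fields))
    return lines


example_variants = [{"chrom": "1", "pos": 12345, "ref": "A", "alt": "G"}]
```

Writing `"\n".join(to_vcf_lines(example_variants))` to a `.vcf` file gives a file that follows the VCF column layout. Segregation and de novo filters need per-sample genotypes, so a real conversion would also add FORMAT and sample columns.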

Page 36:

Ingenuity variant analysis

Page 37:

Filtering

• Common variants filter – MAF < 5%
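A common variants filter of this kind keeps only variants that are rare in reference populations. The following is a minimal sketch, not the Ingenuity filter itself: the function name and the frequency dictionary are hypothetical, and treating a variant with no recorded frequency as rare is an assumption that should be checked against the study design.

```python
def passes_common_variant_filter(population_afs, max_maf=0.05):
    """Keep a variant only if its frequency is below max_maf in every source.

    population_afs: dict mapping a reference source name to the variant's
    allele frequency there, or None if the variant is not observed.
    Unobserved variants are treated as rare (an assumption of this sketch).
    """
    observed = [af for af in population_afs.values() if af is not None]
    return all(af < max_maf for af in observed)
```

For example, a variant at 1% in one source and absent from another passes, while a variant at 20% in any source is excluded.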

Page 38:

Filtering

• Predicted Deleterious – deleterious/damaging mutations

Page 39:

Filtering

• Genetic Analysis filter – segregation with trait & QC

Page 40:

Filtering

• Biological context filter

Page 41:

Pathway analysis

Page 42:

Filtering of candidates

• Further filtering based on:

– Gene expression profiles

– Impact on protein

– Conservation

– Sanger sequencing of potential candidates to verify mutation & sequence in further individuals

– Examine data from genome-wide association studies and related conditions with known genetic basis

Page 43:

Advantages of Variant Analysis

• Speed!

• User-friendly/ intuitive

• Minor allele frequency data

• Good support

• Multiple analyses of same data with small variations

• Access to Ingenuity database

Page 44:

Disadvantages

• Cost

• Use limited to one year

• Biological context filters can be limited (e.g. delayed puberty = FGF1)

• Keep aware of limitations of input files

Page 45:

Identification of candidate genes using trio family whole exome DNA data

Dr. Sayka [email protected]

Page 46:

Identification of candidate genes using trio family whole exome DNA data

Dr Sayka Barry

Centre for Endocrinology

William Harvey Research Institute

Barts and the London School of Medicine

Queen Mary University of London

John Vane Science Centre

Charterhouse Square

London EC1M 6BQ

Page 47:

Trio study:

Sp95M1 affected child with pituitary hyperplasia
Sp95M2 unaffected mother
Sp95M3 unaffected father

PED FILE

Family ID  Individual ID  Paternal ID  Maternal ID  Sex  Phenotype
Sp95       Sp95M1         Sp95M3       Sp95M2       1    2
Sp95       Sp95M2                                   2    1
Sp95       Sp95M3                                   1    1

Exome sequencing: 'Otogenetics' use 'NimbleGen V2 (44.1 Mbp)' for whole human exome enrichment and PE100 Illumina HiSeq2000 sequencing

Data analysis: Ingenuity variant analysis tool
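To show how a PED file like the one above encodes a trio, here is a minimal sketch that parses the six standard columns and flags de novo candidates. It follows the usual PED conventions (sex 1 = male, 2 = female; phenotype 1 = unaffected, 2 = affected). The `0` used for the parents' missing parent IDs is the standard convention and an assumption here, since the slide leaves those cells blank. The function names and the alt-allele-count genotype encoding are hypothetical.

```python
from collections import namedtuple

Person = namedtuple("Person", "family individual father mother sex phenotype")

# The trio from the slide, with "0" marking unknown parents (assumed).
PED_TEXT = """\
Sp95 Sp95M1 Sp95M3 Sp95M2 1 2
Sp95 Sp95M2 0 0 2 1
Sp95 Sp95M3 0 0 1 1
"""


def parse_ped(text):
    """Parse whitespace-separated PED lines into a dict keyed by individual ID."""
    people = {}
    for line in text.strip().splitlines():
        fam, ind, father, mother, sex, pheno = line.split()[:6]
        people[ind] = Person(fam, ind, father, mother, int(sex), int(pheno))
    return people


def affected_trios(people):
    """Return (child, father, mother) for affected children with both parents present."""
    trios = []
    for p in people.values():
        if p.phenotype == 2 and p.father in people and p.mother in people:
            trios.append((p.individual, p.father, p.mother))
    return trios


def is_de_novo_candidate(child_gt, father_gt, mother_gt):
    """Genotypes are alt-allele counts (0, 1 or 2).

    A de novo candidate is present in the child and absent from both parents.
    """
    return child_gt > 0 and father_gt == 0 and mother_gt == 0
```

Running `affected_trios(parse_ped(PED_TEXT))` returns the single trio with Sp95M1 as the child. In practice a de novo call also needs adequate read depth and call quality in both parents, otherwise a missed parental heterozygote looks like a new mutation.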

Page 48:

Filters used

Common variant filters:
• 1000 genomes project
• NHLBI ESP exomes
• 54 publicly available whole genomes sequenced by Complete Genomics

Deleterious prediction filters:

Genetic analysis filters:

Call quality filter:

De novo:

Mendelian:

Page 49:

De novo analysis

Page 50:

frameshift

stop gain

Mendelian

Page 51:

De novo analysis

Total 158 variants (137 genes):
• 2 stop gain
• 5 frameshift
• 59 in-frame
• 71 missense
• 11 splice site
• 9 microRNA
• 1 unknown

Page 52:

De novo analysis

Notch signaling

Page 53:

De novo analysis

Page 54:

Mendelian:

Total 33 variants (27 genes):
• 1 stop gain
• 5 frameshift
• 10 in-frame
• 15 missense
• 2 splice site

Summary:

• De novo: six candidate genes and
• Mendelian: four genes selected for validation by Sanger sequencing

Page 55:

Scientific Case Study

NGS Biological Interpretation using Ingenuity® Variant Analysis™

Tim Bonnert, Ph.D.

Field Application [email protected]

Page 56:

Case Study: Hereditary Pheochromocytoma

• Hereditary pheochromocytoma (PCC) is a neuroendocrine tumor of the medulla of the adrenal glands
• Whole exome sequencing (Agilent SureSelect) on Illumina Genome Analyzer II
• Sequence data obtained from European Nucleotide Archive (ENA)
– http://www.ebi.ac.uk/ena/data/view/ERR031607-ERR031626
• Published in Nature Genetics (2011); PMID: 2168591
– Exome sequencing identifies MAX mutations as a cause of hereditary pheochromocytoma

Independent hereditary pheochromocytoma

Independent HapMap samples

Page 57:

Analysis of Pheochromocytoma Data

Upstream Pipeline
• Sequence & Align
• Call Variants

Upload to Variant Analysis™: DataDrop via Ingenuity DropZone
• Public ENA Data
• Galaxy (GATK)
• VCF 4.0
• Manual or Automatic Upload

• Loaded Samples
• Combine in Analyses
• Annotate & Interactively Filter
• Link Variants to Biology
• Share & Collaborate

of human whole genome, whole exome, and targeted exome samples

3 Case vs. 7 Controls

Genetic Disease Workflow: Annotation – Analysis – Biology

Page 58:

Create Analysis: Different Workflows Available

Page 59:

Add Biological Terms

• Diseases
• Genes
• Biological Processes
• Phenotypes
• Signalling Pathways
• Protein Domains
• Protein Families

Page 60:

Drag-and-Drop Samples and Create Analysis

• Analysis will be created with typical filters with appropriate settings for type of analysis and biological context

Page 61:

Causal variant identification in human DNA resequencing data

Live Demonstration

Analysis of Hereditary Pheochromocytoma

Page 62:

Share and Re-Use Samples: The Power of Controls

Pheochromocytoma data as 3x Case only samples

Pheochromocytoma data as 3x Case vs. 7x Controls

+73% / 13%

+175% / +67%

+67% / +13%

+168% / +144%

-86% / -87%

-89% / -89%

Page 63:

Summary of Features and Benefits

• Fast
• Knowledge-driven
• Scalable and Secure
• User friendly
• Easy to purchase
• Easy to implement

Try Ingenuity® Variant Analysis™ with your Called Variant Files

Free Previews of Unlimited Analyses with Unlimited Samples

www.ingenuity.com/variants

• Stratification Studies
• Personal Genome
• Tumour-Normal Pair
• Trio/Quad Study
• Genetic Disease Cohort
• Large Cancer Studies
• Pedigree Support
• Disease Identification
• Statistical Burden Testing

Analysis
• Statistics
• Pathways
• Biology
• Annotation

Page 64

Predicting Variant Function

Michael Barnes, Ph.D.

Director of Bioinformatics

Page 65

NGS Variant Analysis

Pitfalls and Progress

Dr. Michael R. Barnes

Director of Bioinformatics, William Harvey Research Institute, QMUL

Page 66

Overview

• What X coverage is right for your experiment?
• From FASTQ to VCF file
• Alignment and variant calling
• Balancing sensitivity and specificity of variant calls
• The functional analysis challenge
• Defining functional impact
• SNVs are not the only variants...
• Causality and “clinically actionable” variants
• How to define a causal variant
• Sources of failure and success

Page 67

How much sequence coverage is “enough”?
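As a rough starting point for this question, mean coverage can be estimated from read count, read length and target size (coverage = reads × read length / target size). The sketch below is illustrative only: the function names and the 3 Gb / 100 bp example figures are assumptions, not values from the workshop, and the estimate ignores duplicates, unmapped reads and off-target capture.

```python
import math


def mean_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Expected mean depth if every sequenced base lands uniformly on the target."""
    return n_reads * read_length / target_size


def reads_needed(target_coverage: float, read_length: int, target_size: int) -> int:
    """Minimum number of reads required to reach a given mean depth."""
    return math.ceil(target_coverage * target_size / read_length)


if __name__ == "__main__":
    genome = 3_000_000_000  # assumed approximate human genome size in bases
    print(reads_needed(40, 100, genome))          # reads for 40x with 100 bp reads
    print(mean_coverage(300_000_000, 100, genome))  # depth from 300 million reads
```

In practice the usable depth is lower than this estimate, so it is best treated as an upper bound when planning a run.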

Page 68

High pass genome re-sequencing design

Page 69

Low pass genome re-sequencing design

Page 70

Exome sequencing: Exome capture

Page 71

Factors to define sequencing coverage

[Figure: sequence coverage in relation to study aim and inheritance model.]

Study aims: Genotype / Known Variant Detection; Population Variant Discovery; Disease Variant Identification

Inheritance model: Recessive*; Unknown**; Dominant**

Sequence coverage levels shown: 100x; 10x; 40x

* Causal variant is expected to be homozygous

** Causal variant could be homozygous or heterozygous

Page 72

Defining variants from NGS data

Page 73

An overview of NGS Data processing

GATK
• Realign indels
• Flag PCR duplicates
• Mark unreliable calls

Annovar

Page 74

The Variant Call Format (VCF)
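For readers who have not opened a VCF file, the sketch below shows how the eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and the optional per-sample columns can be read with plain Python. It is a simplified illustration, not a full parser: the function names are hypothetical, the example record is invented, and a maintained VCF library is the safer choice for real data.

```python
def parse_vcf_line(line: str) -> dict:
    """Split one VCF data line into its fixed columns and sample fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_dict[key] = value if value else True

    record = {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),  # several alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,
    }
    # Column 9 (FORMAT) names the colon-separated fields of each sample column.
    if len(fields) > 9:
        keys = fields[8].split(":")
        record["samples"] = [dict(zip(keys, s.split(":"))) for s in fields[9:]]
    return record


def read_vcf(lines):
    """Yield parsed records, skipping header lines (which start with '#')."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


if __name__ == "__main__":
    example = "chr1\t12345\t.\tA\tG\t50.0\tPASS\tDP=30;DB\tGT:DP\t0/1:30"  # invented record
    print(parse_vcf_line(example))
```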

Page 75

ANNOVAR: Using Annotation to Narrow the Search Space

• ANNOVAR SNV annotation
  • Known frequencies
    • dbSNP
    • 1000 Genomes
    • NIH exome project (6500 exomes)
  • Gene mapping
  • Coding impact
    • SIFT
    • PhyloP
    • PolyPhen
    • MutationTaster

openbioinformatics.org/annovar
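The idea of narrowing the search space with annotation can be sketched as a filter over already-annotated variants. Everything in the example is an assumption made for illustration: the dictionary keys (`freq_1000g`, `freq_esp6500`, `effect`, `sift`) are hypothetical and are not ANNOVAR's output column names, and the 1% frequency and 0.05 SIFT cut-offs are common illustrative choices rather than settings from the workshop.

```python
# Effect classes kept as potentially protein-altering (illustrative labels).
PROTEIN_ALTERING = {"nonsynonymous", "stopgain", "frameshift", "splicing"}


def is_candidate(variant: dict, max_freq: float = 0.01, damaging_sift: float = 0.05) -> bool:
    """Keep rare, protein-altering variants that are not predicted tolerated."""
    # 1. Known frequencies: drop variants that are common in any reference panel.
    freqs = [variant.get(key) for key in ("freq_1000g", "freq_esp6500")]
    if any(f is not None and f > max_freq for f in freqs):
        return False

    # 2. Gene mapping / coding impact: keep only protein-altering classes.
    if variant.get("effect") not in PROTEIN_ALTERING:
        return False

    # 3. Functional prediction: for missense changes, drop calls scored as
    #    tolerated (SIFT scores above the cut-off); keep unscored variants.
    sift = variant.get("sift")
    if variant["effect"] == "nonsynonymous" and sift is not None and sift > damaging_sift:
        return False
    return True


def narrow_search_space(variants):
    """Return only the variants that pass all annotation filters."""
    return [v for v in variants if is_candidate(v)]
```

Variants with no frequency or no prediction score are kept on purpose, so missing annotation never silently removes a possible causal variant.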

Page 76

ROC and PR curve comparison of protein function prediction methods (ExoVar dataset, using 10-fold cross-validation).

Li M-X, Kwan JSH, Bao S-Y, Yang W, et al. (2013) Predicting Mendelian Disease-Causing Non-Synonymous Single Nucleotide Variants in Exome Sequencing Studies. PLoS Genet 9(1): e1003143. doi:10.1371/journal.pgen.1003143 http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003143
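For readers unfamiliar with the two curve types, the sketch below shows how the underlying quantities are computed from predictor scores and known labels. It is a generic illustration, not the evaluation code used in the cited paper: the function names are hypothetical, higher scores are assumed to mean "more likely pathogenic", and the cross-validation step is not shown.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve: the chance that a randomly chosen pathogenic
    variant scores higher than a randomly chosen benign one (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold):
    """Precision and recall when scores >= threshold are called pathogenic."""
    tp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 1)
    fp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 0)
    fn = sum(1 for l, s in zip(labels, scores) if s < threshold and l == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Sweeping the threshold over all observed scores and plotting the resulting precision and recall pairs gives a PR curve. PR curves are often more informative than ROC curves when pathogenic variants are a small minority of the data.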

Page 77

Proportion of pathogenic rare nsSNPs & total load of pathogenic derived alleles in 8 HapMap subjects with high coverage sequencing

Li M-X, Kwan JSH, Bao S-Y, Yang W, et al. (2013) Predicting Mendelian Disease-Causing Non-Synonymous Single Nucleotide Variants in Exome Sequencing Studies. PLoS Genet 9(1): e1003143. doi:10.1371/journal.pgen.1003143 http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003143

Page 78

Cancer is a disease of genome alterations.

SNVs are only a partial view of variation:

M. Meyerson, S. Gabriel, G. Getz (2010), Nature Reviews Genetics 11, 685-696

Page 79

Clinical Utility: Challenges

NGS data density = frequently encountered variants of unknown significance

Which variants are “clinically actionable”?

Development of evidence-based scientific standards to evaluate utility in different patient populations for accurate risk estimation

Risk of over-interpretation: unnecessary medical action, unwarranted psychological stress

Careful selection of patients for genome sequencing and genetic counseling is crucial

Page 80

NGS analysis: The fine line between success and failure

Page 81

NGS analysis: Sources of failure

Page 82

NGS Analysis: Sources of failure

• Not a SNV!
• Lack of sequence coverage
• Gene not in the exome capture
• Bioinformatic variant calling
• Bioinformatic annotation
• Mutation is not exonic
• You get scooped
• Clinical heterogeneity or wrong diagnosis

Page 83

Some tips for success

• Variant analysis is only as good as the variant calls
  • Better to be too stringent than too lenient
  • But annotate poor calls rather than exclude them

• If at first you don’t succeed, re-analyse!
  • Consider other variant types
  • Consider other types of impact (e.g. ENCODE, RegulomeDB)
  • Consider other variant calling methods

• Focus your efforts where it matters most
  • Variants that can be tested in the lab
  • Genes with biological/pathway support

• Make your life easier and reduce error: automate!
  • Try to avoid over-using Excel
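Two of these tips, annotating poor calls instead of excluding them and automating the step, can be combined in a short script. The sketch below is an illustration under assumptions: the record layout (`qual`, `depth`, `filter` keys), the flag names `LowQual` and `LowDepth`, and the thresholds of 30 and 10 are all hypothetical choices, not values from the workshop.

```python
def flag_call(record: dict, min_qual: float = 30.0, min_depth: int = 10) -> dict:
    """Return a copy of the call with a FILTER-style flag instead of dropping it."""
    reasons = []
    qual = record.get("qual")
    if qual is None or qual < min_qual:
        reasons.append("LowQual")
    depth = record.get("depth")
    if depth is None or depth < min_depth:
        reasons.append("LowDepth")

    flagged = dict(record)  # leave the caller's record untouched
    # Failed checks are joined with ';' in the style of the VCF FILTER column.
    flagged["filter"] = ";".join(reasons) if reasons else "PASS"
    return flagged


def flag_calls(records):
    """Flag every call; nothing is removed, so poor calls can be revisited later."""
    return [flag_call(r) for r in records]


if __name__ == "__main__":
    calls = [
        {"chrom": "chr1", "pos": 100, "qual": 50.0, "depth": 30},  # invented records
        {"chrom": "chr1", "pos": 200, "qual": 12.0, "depth": 4},
    ]
    for call in flag_calls(calls):
        print(call["pos"], call["filter"])
```

Because every call is kept, a stringent first pass can use only the `PASS` records, and a later re-analysis can relax the filter without re-running variant calling.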

Page 84

Closing Remarks

The Ingenuity Systems - Barts Genome Centre Partnership

Charles Mein, D.Phil., Centre Manager

Niels Nielsen, Senior Sales Executive

Page 85

Questions?

www.smd.qmul.ac.uk/gc
www.ingenuity.com

Page 86

Stay and Play!

• Try your own VCF files in Ingenuity Variant Analysis

• Test the system with some example data

www.smd.qmul.ac.uk/gc
www.ingenuity.com

Page 87