Validation of 100,000 Genomes Project results; from WGS to the Genomics England result

Emma Baple, Consultant Clinical Geneticist, South West GMC

Clinical Lead for Rare Disease Validation and Feedback


Illumina Laboratory Services

• Illumina Laboratory Services (ILS) was set up to deliver the 100,000 Genomes Project
• Operates in the Chesterford and Hinxton facilities
• Illumina – NHS Genomics Medicine Sequencing Centre: purpose-built sequencing facility
• The Ogilvie Building, Hinxton Campus
• Illumina Laboratory Services at Illumina Cambridge Limited has been accredited to ISO 15189:2012 (Acc No. 8523) on 6th April

Simplified Workflow

Diagram stages (figure not reproduced): Patient/family; Phenotypes & Pedigree; DNA; Genome sequence; Annotated VCFs; Tiered variants; Gene Panel Variant filtering; Annotation by partner company; Review; Gene Panels; Clinical assessment; GeCIP(s); Validation Outcomes; Result Email Alert; Report QC

PanelApp

“For diagnostic purpose, only genes with a known (i.e. published and confirmed) relationship between the aberrant genotype and the pathology, should be included in the analysis.” (EuroGentest and ESHG guidelines)

PanelApp

Legend: Gene used for variant tiering; Gene on reserve list

• A gene panel is set up on PanelApp for all approved rare diseases to allow review and genes to be added
• Version 0 gene panel on PanelApp
• Version 1 gene panel on PanelApp
• Gene found in 3 or 4 sources
• Gene found in 2 sources
• Gene found in 1 source/expert list
• Expert review & resolution
• We are currently undergoing evaluation of reviews & further curation
• We now have 50% Version 1 panels
• A big thank you to all our expert reviewers, without whom these panels would not exist

https://bioinfo.extge.co.uk/crowdsourcing/PanelApp/
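The grouping of genes by the number of sources that list them can be sketched as follows. This is only an illustration, not PanelApp's actual implementation: the function name, the source names and the gene symbols are hypothetical, and the group labels are copied from the slide.

```python
from collections import defaultdict


def group_genes_by_source_count(sources):
    """Group genes by how many sources list them.

    `sources` maps a (hypothetical) source name to a set of gene symbols.
    Group labels follow the wording on the slide.
    """
    counts = defaultdict(int)
    for genes in sources.values():
        for gene in genes:
            counts[gene] += 1

    groups = {
        "3 or 4 sources": [],
        "2 sources": [],
        "1 source/expert list": [],
    }
    for gene, n in sorted(counts.items()):
        if n >= 3:
            groups["3 or 4 sources"].append(gene)
        elif n == 2:
            groups["2 sources"].append(gene)
        else:
            groups["1 source/expert list"].append(gene)
    return groups


# Hypothetical input: four sources and placeholder gene symbols.
example_sources = {
    "source_a": {"GENE1", "GENE2", "GENE3"},
    "source_b": {"GENE1", "GENE2"},
    "source_c": {"GENE1"},
    "source_d": {"GENE1", "GENE4"},
}
groups = group_genes_by_source_count(example_sources)
```

Any such grouping would only be a starting point; per the slide, expert review and resolution follow.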

Primary Findings

• All variants will be available to GMCs and GeCIPs

• Rare functional variants of potential relevance to the patient’s condition will be automatically categorised into three tiers
  Tier 1 and 2 in the gene panel(s)
  Tier 3 outside gene panel(s)
• A small number of plausible candidate variants will be flagged to aid clinical evaluation

NB - Secondary looked-for findings (if consented) will be returned separately as part of the main programme, not in the pilot

Criteria: Details

MAF: Datasets: ExAC, 1000G, NHLBI, UK10K, GEL internal. Thresholds: <0.1% for dominant models, <1% per variant for recessive models

LOCATION: Within gene on gene panels, selected automatically based on level 4 disease code and supplemented by GEL review team based on phenotype review

GENOTYPE: Het (or hemi) under dominant models, hom or compound het in recessive models

INHERITANCE: Where known: Mendelian rules observed (plus imprinting)

PREDICTED CONSEQUENCE: Most severe consequence predicted across all transcripts using VEP, categorised into high impact and moderate impact:
• High impact (mostly protein truncating) = transcript ablation, splice donor variant, splice acceptor variant, stop gained, frameshift variant, stop lost, start lost
• Moderate impact (mostly protein altering) = inframe insertion, inframe deletion, missense variant, transcript amplification, splice region variant or incomplete terminal codon variant
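As a rough illustration of the MAF and GENOTYPE criteria above, the checks could look like the sketch below. The thresholds and genotype rules come from the table; the dictionary layout, the function names, and the choice to compare the highest frequency across datasets against the threshold are assumptions made for the example, not the Genomics England pipeline.

```python
# Thresholds from the table: <0.1% (dominant) and <1% (recessive).
MAF_THRESHOLDS = {"dominant": 0.001, "recessive": 0.01}

# Genotypes allowed under each model, per the GENOTYPE row.
ALLOWED_GENOTYPES = {
    "dominant": {"het", "hemi"},
    "recessive": {"hom", "compound_het"},
}


def passes_frequency(mafs, model):
    """Assumption: the highest frequency seen in any dataset must be under the threshold."""
    max_maf = max(mafs.values(), default=0.0)
    return max_maf < MAF_THRESHOLDS[model]


def passes_genotype(genotype, model):
    return genotype in ALLOWED_GENOTYPES[model]


def passes_filters(variant, model):
    """`variant` is a hypothetical dict with 'mafs' and 'genotype' keys."""
    return passes_frequency(variant["mafs"], model) and passes_genotype(
        variant["genotype"], model
    )


# Hypothetical variants with made-up frequencies.
het_variant = {"mafs": {"ExAC": 0.0005, "1000G": 0.002}, "genotype": "het"}
hom_variant = {"mafs": {"ExAC": 0.0005, "1000G": 0.002}, "genotype": "hom"}

dominant_ok = passes_filters(het_variant, "dominant")    # 0.2% is above the 0.1% threshold
recessive_ok = passes_filters(hom_variant, "recessive")  # 0.2% is below the 1% threshold
```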


Variant Effect Prediction

VEP Impact: Sequence Variant Class

HIGH: Frameshift, splice donor/acceptor, stop codon gained/lost, initiator codon (truncating variants)

MODERATE: Inframe indel, missense variant, splice region, incomplete terminal codon

LOW: Synonymous variant, stop codon retained, mature miRNA variant, 5'/3' UTR, intron variant, nonsense mediated decay (NMD) transcript variant

LOWEST: Intergenic, transcription factor binding site variant
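Picking the most severe category across all transcripts can be sketched as below. The two sets follow the high impact and moderate impact groupings stated on the criteria slide, written as Sequence Ontology style terms; they are not necessarily identical to the IMPACT ratings that VEP itself assigns. The function name and the "other" label are hypothetical.

```python
# Groupings as stated on the slide (not necessarily VEP's own IMPACT field).
HIGH_IMPACT = {
    "transcript_ablation",
    "splice_donor_variant",
    "splice_acceptor_variant",
    "stop_gained",
    "frameshift_variant",
    "stop_lost",
    "start_lost",
}
MODERATE_IMPACT = {
    "inframe_insertion",
    "inframe_deletion",
    "missense_variant",
    "transcript_amplification",
    "splice_region_variant",
    "incomplete_terminal_codon_variant",
}


def most_severe_category(consequences):
    """Return 'high', 'moderate' or 'other' for consequences seen across transcripts."""
    seen = set(consequences)
    if seen & HIGH_IMPACT:
        return "high"
    if seen & MODERATE_IMPACT:
        return "moderate"
    return "other"
```

For example, a variant annotated as `intron_variant` on one transcript and `missense_variant` on another would be categorised as moderate under this sketch.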

Process

• GEL will tier variants automatically for all patients
• Gene panels automatically and manually selected
• Tiered variants sent to annotation providers to be highlighted in decision-support tool
• Each annotation provider may highlight a small number of candidate diagnostic variants based on:
  • In-house algorithms
  • Review by in-house clinical scientists
• GMCs decide which variants to validate and communicate to patients
• If there are no plausible candidates, you can issue an ‘empty’ result

19 August 2015

Primary Findings

AIM: to maximise diagnostic efficiency

Genomic variants will be automatically annotated, filtered and prioritised based on:
- Frequency, i.e. rare
- Location, i.e. inside gene panel(s)
- Genotype, i.e. allelic state consistent with MOI of gene
- Inheritance, i.e. consistent with family history
- Predicted consequence, i.e. coding change

Conservative approach that balances sensitivity and specificity.

Note that:
- Some automatically prioritised variants will not be clinically relevant
- Some diagnostic variants will not be automatically prioritised (but hopefully will be highlighted by company annotations…)

Tiering flowchart (diagram not reproduced). Recoverable labels:

Variant filters: The variant allele is not commonly found in the general healthy population; Allelic state matches known mode of inheritance for the gene and disorder; Familial segregation (where applicable)

Decision points: Known Pathogenic (Yes/No); Most severe predicted consequence of variant? (Protein truncating / Protein altering / Other consequence); Is the variant in a gene in the Virtual Gene Panel (green list) for that disorder? (Yes/No)

Outcomes: Tier 1, Tier 2, Tier 3

Tier 1 & 2

TIER 1: Known pathogenic variants, protein truncating variants and de novo protein altering variants WITHIN the phenotype assigned gene panel(s)

TIER 2: Protein altering variants WITHIN the phenotype assigned gene panel(s)

Expect ~0-2 per trio, ~0-20 per singleton

Should be clinically evaluated by GMCs

(NB - May be supplemented by additional diagnostic candidates from annotation partners)

Tier 3

TIER 3: Protein truncating and protein altering variants OUTSIDE known disease gene panel(s)

All modes of inheritance considered (dominant, recessive, X-linked, imprinted)

Expect ~10-20 per trio, ~100-400 per singleton

NOT intended for GMCs to evaluate routinely, aimed more at research! Most will be irrelevant.

BUT may contain diagnosis if:
• Gene panel is incomplete (need strong evidence of association)
• Appropriate gene panel not applied
• Etc.
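The Tier 1, 2 and 3 definitions above can be read as a small decision function. The sketch below is one possible reading, assuming the variant has already passed the frequency, genotype and inheritance filters; the function name, parameter names and the `None` return value for untiered variants are hypothetical.

```python
def assign_tier(in_panel, impact, known_pathogenic=False, de_novo=False):
    """Assign a tier following the slide definitions.

    in_panel: variant lies in a gene on the phenotype-assigned gene panel(s)
    impact: 'high' (protein truncating), 'moderate' (protein altering) or 'other'
    Returns 1, 2, 3, or None for an untiered variant.
    """
    if in_panel:
        # Tier 1: known pathogenic, protein truncating, or de novo protein altering.
        if known_pathogenic or impact == "high":
            return 1
        if impact == "moderate":
            return 1 if de_novo else 2
        return None
    # Tier 3: protein truncating or protein altering outside the panel(s).
    if impact in ("high", "moderate"):
        return 3
    return None
```

The slides do not say how a known pathogenic variant outside the panel would be handled, so this sketch simply applies the Tier 3 rule to it.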

Untiered variants

Obviously millions per person!

REALLY NOT intended for GMCs to evaluate routinely! Almost 100% will be irrelevant.

BUT annotation providers may find candidates here, as may contain diagnosis if:
• Inappropriately excluded due to MAF cut-off (e.g. founder effect)
• Segregation or MOI for gene in panel incorrect
• Incomplete penetrance
• Etc.

Modelling with DDD data (charts not reproduced)

NB. Very large gene panel!

Advantages of Tools

• Interactive
• Visualisation of read-level support for variants
• Additional annotation, including allele frequencies, previous pathogenicity assignments and in silico prediction tools
• Additional variant prioritisation tools, e.g. semantic similarity
• Audit record at variant and case levels
• Database and software version record
• Facilitate MDT work

Virtual Desktop

https://emb.extge.co.uk/ovd/

Reporting Portal

Primary Findings

Default view will be: Tier 1 and 2 variants identified in the gene panel(s) applied and any candidate(s) highlighted by the annotation provider

Results Output

Data Quality

Genetic checks to ensure clinical and genomic data consistency:
• Sex
• Relatedness (proportion of the genome where 0, 1 or 2 alleles are shared between 2 individuals)
• Ethnicity
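A relatedness check of this kind can be illustrated by comparing observed sharing proportions with textbook expectations for common relationships. The sketch below is an assumption-laden illustration, not the Genomics England QC code: the nearest-match rule, the function names and the observed values are hypothetical, and real data would need tolerance for noise and more relationship types.

```python
# Expected proportions of the genome sharing (0, 1, 2) alleles.
EXPECTED_SHARING = {
    "parent-child": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "unrelated": (1.0, 0.0, 0.0),
    "identical twins or duplicate sample": (0.0, 0.0, 1.0),
}


def closest_relationship(observed):
    """Return the relationship whose expected proportions are nearest (squared distance)."""

    def distance(expected):
        return sum((o - e) ** 2 for o, e in zip(observed, expected))

    return min(EXPECTED_SHARING, key=lambda name: distance(EXPECTED_SHARING[name]))


def consistent_with_pedigree(observed, reported_relationship):
    """Flag whether genomic sharing agrees with the relationship in the clinical record."""
    return closest_relationship(observed) == reported_relationship


# Hypothetical observed proportions for a pair recorded as parent and child.
pair_ok = consistent_with_pedigree((0.02, 0.97, 0.01), "parent-child")
pair_mismatch = consistent_with_pedigree((0.98, 0.02, 0.0), "parent-child")
```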

Result Quality

PATIENT-LEVEL QC:
• ID is a correct match for age/sex/family structure
• Recruiting centre correct
• Phenotypes correct
• Correct disease assigned
• Correct panel(s) applied

VARIANT-LEVEL QC (prioritised variants):
• Read-level support for the variant
• Do allelic state, expected mode of inheritance and segregation match the gene's mode of inheritance and the family history?
• Are any detected complex events (e.g. UPD) included in the report?

Validation and Confirmation

Heterozygous de novo GATA6 missense mutation: c.1354A>G, p.Thr452Ala

Exportable csv file to aid variant confirmation

Interpretation Outcomes

For all variants evaluated by GMCs, we need to know:

• Was the variant confirmed using an orthogonal technique?

(validated, false positive, not applicable)

• Is the variant (or variant combination) pathogenic?

(assessment using standard 1-5 scale from benign to pathogenic)

• Do variant(s) explain all or part of the patient’s phenotype?
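The three questions above map naturally onto a small record per evaluated variant. The sketch below is a hypothetical data structure, not the actual Genomics England feedback schema: the class and field names are invented for illustration, the confirmation values and the 1-5 scale come from the slide, and the "none" option for phenotype is an added assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Confirmation(Enum):
    VALIDATED = "validated"
    FALSE_POSITIVE = "false positive"
    NOT_APPLICABLE = "not applicable"


class PhenotypeExplained(Enum):
    ALL = "all"
    PART = "part"
    NONE = "none"  # assumption: the slide only mentions "all or part"


@dataclass
class InterpretationOutcome:
    variant: str                      # free-text variant description
    confirmation: Confirmation        # orthogonal confirmation result
    pathogenicity: int                # 1 (benign) to 5 (pathogenic)
    phenotype_explained: PhenotypeExplained

    def __post_init__(self):
        if not 1 <= self.pathogenicity <= 5:
            raise ValueError("pathogenicity must be between 1 and 5")


# Hypothetical record with a placeholder variant name.
outcome = InterpretationOutcome(
    variant="GENE1 c.100A>G",
    confirmation=Confirmation.VALIDATED,
    pathogenicity=5,
    phenotype_explained=PhenotypeExplained.PART,
)
```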

V & F GeCIP

Validation & Feedback GeCIP Working Groups: interested in clinical results

Validation:
• Analytical validity: is the variant correctly identified?
  Are variants correctly filtered in/out by the bioinformatics pipeline? Are reported variants actually present in the DNA sequence?
• Clinical validity: are variants relevant to the patient’s phenotype?
  Are the variant(s) pathogenic? Do the variant(s) explain all or part of the phenotype?

Feedback:
• Clinical utility: does the result alter clinical management?
  Which variant(s) are reported to the patient/family? Has the result changed management of the patient/family?

First GEL data

• First 39 pilot cases, including 15 singletons:
  • 11/15 had 0 tier 1&2 variants
  • 3/15 had 2 tier 1&2 variants
  • 1/15 had 3 tier 1&2 variants
• 12/39 had a probable diagnosis

NB. Variety of gene panels
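The pilot figures above can be turned into simple summary numbers. The sketch below only restates the counts given on the slide as arithmetic; the variable names are illustrative.

```python
# Singletons grouped by number of tier 1&2 variants, from the slide.
singletons_by_variant_count = {0: 11, 2: 3, 3: 1}

n_singletons = sum(singletons_by_variant_count.values())
total_tier12_variants = sum(
    count * cases for count, cases in singletons_by_variant_count.items()
)
mean_tier12_per_singleton = total_tier12_variants / n_singletons

# Probable diagnoses among the first 39 pilot cases.
probable_diagnosis_rate = 12 / 39

print(f"{n_singletons} singletons, {total_tier12_variants} tier 1&2 variants in total")
print(f"mean per singleton: {mean_tier12_per_singleton:.1f}")
print(f"probable diagnosis: {probable_diagnosis_rate:.1%}")
```

On these counts the 15 singletons carry 9 tier 1&2 variants in total (0.6 on average), and 12 of 39 cases is roughly 31%.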

Genomics England Simplified Genomic Data Flow (July 2016)

Diagram labels (figure not reproduced): GMC; Phenotypes & pedigree; Personal data [2]; Illumina BAMs; Illumina SNVs/indels; Illumina CNVs; GEL Bioinformatics; QC, variant calling, annotation; GEL SNVs/indels [1]; Gene panels; GEL tiered variants; Annotation provider; Prioritization algorithms; Candidate variants; GEL QC check; MDT review; Access via virtual desktop to annotation partners’ tool; Validation & Feedback

[1] & GEL CNVs in future
[2] Personal identifiable data is stored in a separate database. From Autumn 2016, this will be hosted by NHS Digital (formerly HSCIC)

Summary

• Tiering and quality checks will be done centrally at Genomics England to ensure consistency
• Tier 1&2: small number of plausibly pathogenic variants within gene panel(s) relevant to the patient’s phenotype
• Tier 3: potentially interesting variants outside of gene panel(s) relevant to the patient’s phenotype
• Third party decision support systems where sites can review, validate and report results
• GMCs should aim to clinically evaluate tier 1&2 variants and any candidates highlighted by the annotation partner(s) for a likely diagnosis
• Genomics England export the data to produce a single consistent knowledge base
• GMCs control diagnostic interpretation and reporting
• GeCIPs will help with interpretation; their key role is improving diagnostic yield through discovery

Sequencing Production at Illumina

Diagram labels (figure not reproduced): Sample Accession; Library Prep; Library Quality Control qPCR; X10 Sequence + Run QC; Genome build tumour / normal subtraction; gVCF annotation HiSeq Analysis Software; Analysis Quality Control, Identity Check, Contamination screen; Network Delivery; Delivery check; Automation + 96 well format; Flowcell prep; cBOT2; Genotype set up; Clarity X LIMs & AIMS Project configured; Track lab and analysis processes; Project management, Pipeline automation; Library Amp (if needed)/ Genotyping; Sample Quant Quality Control, FFPE check; Pre-PCR lab

Yield per run (Tb): 2.11
Yield per lane (Gb): 132
% Cluster PF: 72
% ≥Q30 R1: 93
% ≥Q30 R2: 83
% Aligned: 95
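As a quick consistency check on the two yield figures, the run yield can be divided by the lane yield to see how many lanes per run they imply. This is only arithmetic on the slide values, assuming 1 Tb = 1000 Gb; the variable names are illustrative.

```python
yield_per_run_tb = 2.11
yield_per_lane_gb = 132

# Assumption: decimal units, 1 Tb = 1000 Gb.
yield_per_run_gb = yield_per_run_tb * 1000
implied_lanes_per_run = yield_per_run_gb / yield_per_lane_gb

print(f"implied lanes per run: {implied_lanes_per_run:.2f}")
```

The two figures are consistent with about 16 lanes contributing to each run.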