Duzkale_2013_Variant Interpretation_

Clin Genet 2013: 84: 453–463Printed in Singapore. All rights reserved

© 2013 John Wiley & Sons A/S.Published by John Wiley & Sons Ltd

CLINICAL GENETICSdoi: 10.1111/cge.12257

Original Article

A systematic approach to assessing theclinical significance of genetic variants

Duzkale H, Shen J, McLaughlin H, Alfares A, Kelly MA, Pugh TJ, FunkeBH, Rehm HL, Lebo MS. A systematic approach to assessing the clinicalsignificance of genetic variants.Clin Genet 2013: 84: 453–463. © John Wiley & Sons A/S. Published byJohn Wiley & Sons Ltd, 2013

Molecular genetic testing informs diagnosis, prognosis, and riskassessment for patients and their family members. Recent advances inlow-cost, high-throughput DNA sequencing and computing technologieshave enabled the rapid expansion of genetic test content, resulting indramatically increased numbers of DNA variants identified per test. Toaddress this challenge, our laboratory has developed a systematic approachto thorough and efficient assessments of variants for pathogenicitydetermination. We first search for existing data in publications anddatabases including internal, collaborative and public resources. We thenperform full evidence-based assessments through statistical analyses ofobservations in the general population and disease cohorts, evaluation ofexperimental data from in vivo or in vitro studies, and computationalpredictions of potential impacts of each variant. Finally, we weigh allevidence to reach an overall conclusion on the potential for each variant tobe disease causing. In this report, we highlight the principles of variantassessment, address the caveats and pitfalls, and provide examples toillustrate the process. By sharing our experience and providing aframework for variant assessment, including access to a freely availablecustomizable tool, we hope to help move towards standardized andconsistent approaches to variant assessment.

Conflict of interest

H. D., J. S., H. M., M. A. K., T. J. P., B. H. F., H. L. R. and M. S. L. areemployed by fee-for-service laboratories performing clinical sequencingservices. Several individuals serve on advisory boards or in othercapacities for companies providing sequencing or other genetic services(H. L. R. – BioBase, Clinical Future, Complete Genomics, GenomeQuest,Illumina, Ingenuity, Knome, Omicia; B. F. – InVitae; J. S. – LabCorp).

H Duzkalea,b,†, J Shena,b,c,†,H McLaughlina,b, A Alfaresa,b,d,MA Kellyb, TJ Pughb,c,BH Funkeb,c, HL Rehmb,c

and MS Lebob,c

aHarvard Medical School GeneticsTraining Program, Boston, MA, USA,bLaboratory for Molecular Medicine,Partners HealthCare Center forPersonalized Genetic Medicine,Cambridge, MA, USA, cDepartment ofPathology, Brigham and Woman’sHospital, Massachusetts GeneralHospital, and Harvard Medical School,Boston, MA, USA, and dDepartment ofPediatrics, Qassim University, Buraydah,Saudi Arabia

†These authors contributed equally to thiswork.

Key words: (4−9) clinical interpretation– gain-of-function (GOF) – geneticvariant – loss of function (LOF) –next-generation sequencing (NGS) –sequence analysis – variantassessment – variant of uncertainsignificance (VUS)

Corresponding author: Matthew S.Lebo, 65 Landsdowne Street,Cambridge, MA 02139, USA.Tel.: 617 768 8292;fax: 617 768 8513;e-mail: [email protected]

Received 7 July 2013, revised andaccepted for publication 19 August2013

Molecular genetic testing informs medical decisionmaking in the diagnosis of symptomatic individuals, inthe prediction of disease risk, in reproductive geneticcounseling, and in determining pharmacogenetic pro-files for treatment guidance. Until recently, the major-ity of clinically available molecular genetic tests haveeither analyzed known DNA variants, such as cysticfibrosis carrier screening panels (1), or sequenced thecoding regions and splicing boundaries of a limited set

of well-known disease-associated genes. Recent techno-logical advances in low-cost, high-throughput sequenc-ing and computing have enabled testing for targetedpanels of >100 disease-area genes, as well as exomesand genomes. While these next-generation sequenc-ing (NGS) technologies have increased diagnostic sen-sitivity (2, 3), the number of genetic variants withuncertain clinical significance (VUS) per test has alsoincreased. For example, expanding testing for dilated

453

H. Duzkale et al.

cardiomyopathy from 5 to 46 genes in our laboratoryresulted in a threefold increase in clinical sensitivitybut an even more dramatic increase in inconclusivecases, many with multiple VUSs. In addition, exomeand genome sequencing tests add a new layer of com-plexity, as the genes interrogated may not have beencarefully assessed for their role in disease until variantsare identified.

Although molecular genetic testing has a uniqueplace in the diagnosis, management, and preventionof genetic disorders, the field is compromised by theabsence of a standard, comprehensive, and efficientvariant assessment protocol approved and shared bythe community. However, guidelines for variant inter-pretation are available and being updated as variant-level knowledge expands, including those from theAmerican College of Medical Genetics and Genomics(ACMG) (4–7). To supplement these guidelines andcapture the evolving state of the field, we developed avariant assessment tool (VAT) that systematically evalu-ates multiple parameters for each variant and facilitatesthe capture of new knowledge in the literature anddatabases (Appendix S1).

The clinical significance of a variant in relation to adisease or phenotype can be determined by answeringthree core questions. (i) Does the variant alter thefunction of the gene [i.e. loss-of-function (LOF) orgain-of-function (GOF)]? (ii) Can the functional changeresult in disease or another phenotype? (iii) Is theassociated disease or phenotype relevant to the specificclinical condition present in the tested individual? Insome cases variant assessment in a clinical laboratorymay only be focused on the first two questions;however, for maximal benefit to the patient, a carefulassessment of the third dimension can be highlyinformative, particularly for VUSs. Here, we shareour decade-long experience with variant assessment,highlighting key points and challenges of clinical inter-pretation. We have evaluated 245 genes associated with53 diseases while testing greater than 22,000 cases. We

Fig. 1. Variant assessment workflow. Genetic variants identified by laboratory testing are annotated with information from various sources includingpublications, computational prediction algorithms, and public, collaborative and internal databases. After evaluation of all pertinent information inconjunction with patient specific clinical and family information, a professionally trained individual will classify the variant into one of the fiveclinical categories and combine all variants for a clinical report.

have iteratively developed a framework through clinicalassessments of over 17,000 variants, including >8000that have been validated and reported in patients. Usingour semi-automated tool, it takes on average 40 minto perform a thorough evidence-based clinical variantassessment for variants being returned after disease-targeted testing. When assessments with literature areexcluded, the average time decreases to 22 min. Anoverview of the process is presented in Fig. 1 and usefulonline resources are presented in Table 1. In addition,the approaches described below refer to the providedVAT with corresponding SOP available for downloadthrough Appendices S1–S3.

Linking genes to disease

As a first step in variant assessment, it is necessaryto determine which disease phenotypes are associatedwith a gene and which types of variation may resultin clinically relevant consequences. This includes thetypes of variants that are known to cause diseasein the gene (truncating/LOF, non-truncating, etc.), theinheritance patterns observed for variants in the gene,the protein domains that are implicated in disease, andany genotype–phenotype correlations described.

To characterize disease phenotypes, it is importantto review the literature for common clinical features aswell as phenotypic variation among affected individ-uals. Large cohort studies may provide expressivity,age-of-onset, penetrance, and prevalence information,while detailed reports of families with multiple affectedindividuals help determine the mode of inheritanceand strength of association. Comparison of the variantspectrum in affected individuals against that in thegeneral population may be useful in identifying thetypes of mutations that are disease-causing. Forinstance, heterozygous LOF variants in MYBPC3have been reported in 14% (311/2302) of patientswith hypertrophic cardiomyopathy (HCM) testedin our laboratory, but in <0.1% (6/6500) of the

454

A systematic approach to assessing variant clinical significance

Table 1. Useful online resources for variant assessment

Usage Online tools Url

Computationalprediction formissense variants

Align GVGD (42) http://agvgd.iarc.fr/agvgd_input.php/

CONDEL http://bg.upf.edu/condel/analysis/MutationAssessor http://mutationassessor.org/MutationTaster http://www.mutationtaster.org/PolyPhen2 (43) http://genetics.bwh.harvard.edu/pph2/SIFT (44) http://sift.jcvi.org/

Computationalprediction forsplicing variants

GeneSplicer http://ccb.jhu.edu/software/genesplicer/

Human Splicing Finder http://www.umd.be/HSF/MaxEntScan http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.htmlNNSplice http://www.fruitfly.org/seq_tools/splice.html

Disease curation GeneReviews http://www.ncbi.nlm.nih.gov/books/NBK1116/OMIM http://omim.org/

Domain database NCBI conserved domaindatabase

http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi

Genome Browser Ensembl http://www.ensembl.org/index.htmlUCSC Genome Browser http://genome.ucsc.edu/

Literature database PubMed http://www.ncbi.nlm.nih.gov/pubmedVariant database 1000 Genomes Project http://browser.1000genomes.org

ClinVar http://www.ncbi.nlm.nih.gov/clinvar/dbSNP http://www.ncbi.nlm.nih.gov/projects/SNP/Exome Variant Server (9) http://evs.gs.washington.edu/EVS/HGMD http://www.hgmd.cf.ac.uk/ac/index.php

Variant validation HGVS nomenclature http://www.hgvs.org/mutnomen/Mutalyzer https://mutalyzer.nl/

general population per the NHLBI Exome Sequenc-ing Project (ESP), supporting that LOF MYBPC3variants are a common mechanism in HCM (8).However, external information must be carefullyvetted. An apparent frameshift variant in MYBPC3NM_000256:c.2854_2858del reported to occur in 7%of the general population in ESP is likely a technicalartifact, as we have never observed it sequencing theregion by NGS and/or Sanger in over 2000 cases.

Different types of variants in the same gene may beassociated with distinct phenotypes or inheritance pat-terns. For example, missense GOF variants in PTPN11cause RASopathies, such as Noonan syndrome, whereasLOF variants lead to an entirely different phenotype, acartilage tumor syndrome (metachondromatosis) char-acterized by enchondromas and exostoses (9). Certainmissense variants in TECTA lead to autosomal dominanthearing loss (10), whereas LOF variants result in auto-somal recessive hearing loss (11). Similarly, variants indifferent regions or domains of a gene may cause differ-ent phenotypes (10, 12). Important questions to considerwhen analyzing gene–disease associations and specificvariants within a gene can be found in Table 2.

Validating variants to ensure accuracy

As test complexity has increased, so has the need toensure variants identified and included on a clinicalreport are technically accurate. This is especiallyimportant for sequencing tests where the variants are

not part of a pre-defined list. It is essential to reviewraw assay results (e.g. chromatographs of Sangersequencing traces or NGS reads) to verify the variantsand their nomenclature. Prior to variant assessment,the laboratory should predefine the genome build,gene name, and reference transcript that will be usedin interpretation and reporting, along with a methodlinking genomic coordinates to cDNA and amino acidlevel annotations. Laboratories should also be awareof homologous and repetitive regions particularlyfrom pseudogenes and segmental duplications, whichmay result in lack of coverage, alignment difficulties,and incorrect variant calls. These steps will enablevalidation of the correct variant call, zygosity andnomenclature according to the Human Genome Vari-ation Society (HGVS) guidelines (13). Validationinformation is captured in the ‘Variant’ tab of the VAT.

Because standards for variant nomenclature haveonly recently been widely adopted and still do notaddress all modifications, variants may have differingnames in publications and databases. The amino acidposition may not be numbered according to the startcodon to be consistent with current recommendations.For example, TTR variants were originally numberedaccording to the position within the mature proteinlacking the 20 amino acid signal peptide (14). Partialcloning of a gene may have led to inconsistentnomenclature in early publications (15, 16). Nucleotidegene numbering may have been determined using thetranscription start site instead of the translation start site,

455

H. Duzkale et al.

Table 2. Variant assessment checklist

Gene-level informationConfirm gene is implicated in disease with sufficient evidence, including human genetic data and functional dataDetermine inheritance pattern, age-of-onset, penetrance and prevalence for each gene-disease association, ifpossibleDetermine types of disease-associated variants in gene (gain-of-function, loss-of-function, etc.)

Variant validationReview raw sequence data to confirm the variant callDetermine zygosity of the variantAssociate genome build, genomic coordinate, and reference transcript to the variantConfirm variant nomenclature

Genetic dataDetermine frequency of variant in large population studies, parsed by raceDetermine if population frequency is consistent with disease inheritance, age-of-onset, penetrance and prevalenceEvaluate whether variant segregates with disease in affected family membersIf disease is inherited in a recessive manner, determine if the variant is found in trans with a pathogenic variantIf applicable, determine if there is a statistically significant difference in variant frequency between cases andcontrols

Functional dataEvaluate available in vivo functional data

Confirm type of animal model is relevant for human diseaseEvaluate available in vitro functional data

Confirm assays used reflect disease-associated cellular mechanismsComputational data

Evaluate nucleotide alignment data and assess evolutionary conservation (for all variants)Evaluate amino acid alignment data and assess evolutionary conservation (for missense variants)

Eliminate any poor species alignmentsDetermine if computational tools predict an effect on protein structure or splicing (for all variants)

which was particularly challenging given transcriptionalstart site variability. Furthermore, for many smallinsertions and deletions, it is not possible to determinethe exact location of the inserted or deleted base(s).This can lead to multiple potential names for thesame variant, highlighting the importance for followingstandard HGVS nomenclature rules such as attributingalternations within a repetitive stretch to the most 3′possible position. Legacy terms and alternative aliasesare useful to maintain association with the correctlynamed variant both to facilitate searching the literatureand databases, as well as communicating with orderingphysicians and other laboratories.

Genes may have multiple transcripts, some ofwhich are tissue-specific and associated with distinctphenotypes. For example, the shorter USH1C transcript(NM_005709) is expressed in both the retina and innerear, whereas the longer transcript (NM_153676) isexpressed exclusively in the inner ear (17). Accord-ingly, variants in exons of shorter transcript lead toUsher syndrome type 1C, characterized by profounddeafness, retinitis pigmentosa, and vestibular dys-function, whereas variants in additional exons uniqueto NM_153676 lead to non-syndromic hearing loss(18). Variants should be reported according to a singleprimary transcript. The reported reference is typicallythe major transcript unless a more severe impact ispredicted on an alternative transcript, in which case thevariant should be defined according to the alternative

transcript, noting an alias to the primary transcript(Fig. 2a).

When multiple variants in the same gene areidentified, the phase of the variants (i.e. on thesame chromosome – in cis – or on homologouschromosomes – in trans) may influence the interpre-tation, especially for autosomal recessive traits. Ifvariants are within the same NGS fragment, the phasemay be determined without parental samples (Fig. 2b).

Collecting evidence to determine the likelihood ofpathogenicity

Once the variant call is validated, literature, variantdatabases, and population control studies should beevaluated. This information is used to determinewhether and under what context the variant has beenpreviously observed. The population, literature andinternal case data are captured in the ‘Control_Freq’,‘DB’ and ‘Publ+Internal_data’ tabs of the VAT.

Recent large-scale population studies such as theNHLBI Exome Sequencing Project (9), the 1000Genomes Project (19), the ClinSeq Project (20) andothers found in dbSNP (21) have catalogued largeamounts of sequence variation (Table 1). Because thesepopulations may include presymptomatic individualswith late onset diseases, asymptomatic individualswith low penetrance diseases or younger than typicalage-of-onset, and heterozygous carriers of recessive

456


Fig. 2. (a) Reference transcript selection. Two transcripts for OTOF are shown: NM_194248, the longest transcript selected as the primarytranscript, and NM_194322, a shorter alternate transcript. Position g.26700697 (grey box) is non-coding in NM_194248 (c.2215-80) but codingin NM_194322 (c.65); therefore NM_194322 should be selected while evaluating this variant. Adapted from Alamut® software (InteractiveBiosoftware, http://www.interactive-biosoftware.com) (b) Phasing multiple variants. Two variants are present at positions c.2401 and c.2402 inOTOF (NM_194248). The top traces show the chromatographs from Sanger sequencing with the consensus reference sequence shown underneath.Representative aligned NGS reads are shown below. Grey bars represent reference sequence with variants highlighted in red. The bottom schematicshows associated OTOF coding exons (rectangles) and the reference amino acid sequence. The arrow indicates the 5′ to 3′ direction. The c.2401G>T(p.Glu801*) is listed in dbSNP (rs75624587) and ESP as a nonsense variant but with an allele frequency of 10% in African Americans. However,NGS reads reveal that the variants are in cis and should therefore be named c.2401_2402delinsTT (p.Glu801Leu). (c) Segregation analysis withincomplete penetrance. A family with hypertrophic cardiomyopathy is shown. Affected individuals are indicated by filled squares (males) or circles(females). Mutation-positive individuals are indicated by a ‘+’, while mutation-negative individuals are indicated by a ‘−’. All mutation-positiveindividuals are affected, with the exception of individual II-1. Because HCM can display reduced penetrance, individual II-1 would not be considereda non-segregation. (d) Conservation based on multiple species alignment. An example of alignment of TNNC1 in UCSC Genome Browser is shown.Tree shrew sequence shows poor alignment (red box). Arrows point to non-conserved residues (45). Adapted from http://genome.ucsc.edu usinghg19 reference sequence. (e) Conflicting computational predictions of a missense variant. The results of multiple computational tools are capturedin the VAT. They provide conflicting predictions for the NM_000366:c.688G>A (p.Asp230Asn) variant in TPM1 , suggesting that at least sometools are not reliable. (f) Predicted splicing effect of a coding variant. Splicing prediction tools indicate that the NM_022124:c.5712G>A variant inCDH23 , which affects the last base in exon 43, may impact splicing. However, not all programs agree in the potential effect on splicing, and theycannot predict whether it would lead to exon skipping, intron retention or use of cryptic splice sites. Adapted from Alamut® software (InteractiveBiosoftware, http://www.interactive-biosoftware.com)

457

H. Duzkale et al.

Fig. 2. Continued.

traits, variants should not be assumed benign simplybecause of their presence in large population studies.Information on affected individuals with the variant canbe obtained from internal and public variant databases(e.g. ClinVar, HGMD (22) or locus-specific databases),as well as from the literature. Public variant databasesare of varying quality and may be outdated or containcontradictory data. Recent studies have demonstrateda large number of false-positive variants incorrectlyidentified as clinically relevant in these databases(23–27). Therefore, databases available today shouldbe used to identify relevant primary literature ratherthan directly reference a variant classification.

For Mendelian disorders, the pathogenicity ofa variant can be ruled out if its frequency in thegeneral population exceeds what can be accountedfor by inheritance pattern, age-of-onset, prevalence,penetrance, and heterogeneity. Large sample sizeswithout selection bias towards individuals with diseasephenotypes are required to achieve confidence inestimating the population allele frequency. Moreover,

disease prevalence is not always known, accurate, orapplicable across all populations. Because one affectedallele is sufficient to cause an autosomal dominant trait,a pathogenic allele must present at a frequency lowerthan the disease prevalence in the general population.HCM is primarily an autosomal dominant conditionoccurring in 1 in 500 individuals (1/1000 chromosomesor 0.1% allele frequency) (28). We consider a variantlikely benign if the allele frequency is >0.3% which isa conservative 1.5 times above the highest frequencyexpected even if penetrance was only 50% and thedisease was due to one pathogenic variant. In contrast,both paternal and maternal alleles need to be affectedto cause an autosomal recessive disorder. The heterozy-gous carrier frequency of any pathogenic allele must beless than twice the square root of the disease prevalence(which is the hypothetical allele frequency if only onedisease allele accounts for all cases). For example, theprevalence of congenital hearing loss with a geneticetiology is roughly 1 in 1000 and half of these casesare due to GJB2 variants. Therefore, the estimated

458


prevalence of GJB2 -related hearing loss is 1 in 2000.Accordingly, pathogenic variants in GJB2 are expectedto occur no more than 4% in the general population.It is not surprising that the carrier frequency for thec.35delG variant in GJB2 could be as high as 2% (29).Population data pertaining to a specific ethnic composi-tion are particularly useful. The 1000 Genomes Projecthas revealed many variants common in certain ethnicgroups, but rare in general (30). If a subpopulation doesnot have an increased occurrence of the associated dis-ease and affected individuals are not under-diagnosed,variant classification based on the allele frequency inthe subpopulation can be applied more broadly.

While a high allele frequency in the general popula-tion may rule out pathogenicity of a variant for a raredisorder, absence or a very low frequency of a vari-ant in the broad population cannot be used to assumepathogenicity. While coding variants below 1% allelefrequency in the seven populations examined by the1000 Genomes Project are enriched for functional vari-ants (31), lack of a variant from population datasetscannot be used to assume absence from the popula-tion unless it is determined that the study technicallyinterrogated the position sufficiently to rule out a poten-tial false-negative result. A variant is statistically morelikely pathogenic if it occurs in affected individualsmore than expected by chance. The likelihood of ran-dom occurrence can be calculated as the probability ofco-incidence of rare events, as the logarithm of odds(LOD) score through linkage analysis or as p-valuesthrough case–control studies using a Fisher’s exact orchi-square test. Low probabilities of co-incidence sta-tistically demonstrate non-random occurrences of thevariant in affected individuals.

The presence of de novo variants may support dis-ease association due to their rarity. The de novo pointmutation rate is ∼1 per exome (32), consistent withan average rate of 1.2 × 10−8 per nucleotide per gen-eration in human genome (33). Therefore, confirmedde novo status of a variant in a disease-associatedgene strongly increases the likelihood of pathogenic-ity in rare conditions when the patient’s disease isde novo and matches the associated phenotypes. Testingof biological parents and excluding the possibilities ofnon-paternity and sample swap (e.g. genotyping withmicrosatellite markers) are necessary for confirmationof de novo variants. Similarly, for rare recessivedisorders, if a rare variant is confirmed in trans withanother pathogenic variant in the same disease gene, itis more likely pathogenic.

Significant co-segregation of a variant with diseaseprovides strong genetic linkage evidence to supportpathogenicity. Linkage analysis programs can be usedto calculate the LOD scores, but a simple count ofinformative segregations can provide an estimate. Asa rule of thumb, 10 informative segregations wouldachieve a LOD score > 3.0, necessary to establishlinkage between a genetic locus and a disease. Forestablished disease genes, given the a priori proba-bility of disease association, fewer informative segre-gations may be acceptable in combination with other

supporting evidence. Because genotype-phenotype cor-relation may be masked by incomplete penetrance, vari-able expressivity, and late age-of-onset in genotype-positive individuals, unaffected family members shouldnot contribute segregation information under these cir-cumstances (34) (Fig. 2c). Additional evidence may stillbe required to establish pathogenicity, as any variant inlinkage disequilibrium with the causative variant willsegregate with the disease. For example, the Ile148Thrvariant in CFTR was removed from the original cysticfibrosis carrier-testing panel because it was later deter-mined its association with the disease was due to tightlinkage with another pathogenic variant (35, 36).

Functional evidence that links the variant to diseasephenotypes is important to establish causality. How-ever, this information is typically unavailable for indi-vidual variants in routine diagnostic testing. When stud-ies regarding a specific variant have been published, itis important to determine the type of assay used andwhether the results and conclusions drawn are applica-ble to the mechanism and presentation of the disease.In general, direct assays on patient tissues provide thestrongest functional evidence because they reveal truebiological consequences of a variant within a humanindividual. In vivo studies in mammals may add moreevidence at the system level. In vitro studies can be use-ful, especially in cases where the in vitro assay directlytests an established molecular mechanism of disease[e.g. structural proteins or ion channels (37)], but maynot accurately represent the biological environment ordirectly prove causation of disease.

In summary, population, statistical and functionalevidence need to be carefully evaluated to determinethe clinical significance of a variant. Table 2 lists someimportant considerations when collecting this data.

Predicting disease association using bioinformaticstools

If the evidence for disease association from existingdata is not strong or the mechanism of gene function isunclear, a number of bioinformatics tools may be usedto predict the possible impact of the variant on the geneor protein. Computational predictions are generallybased on the type of change, the domain structure,sequence conservation, and biochemical propertiesof the affected amino acid residues. Computationalinformation is captured in the ‘Conserv_Biochem’ and‘Splicing’ tabs of the VAT.

At both the nucleotide and amino acid level, sequenceconservation may indicate regions and positions offunctional importance, as negative selection removeschanges that are deleterious to proper biological func-tion, leading to high evolutionary conservation (38).Computationally derived alignments can indicate whena specific sequence is important to the underlying geneor protein function (Fig. 2d). Conversely, presence ofthe variant amino acid in other species, particularlyprimates and other mammals, may indicate a toleranceto that change.

459

H. Duzkale et al.

Tabl

e3.

Exa

mpl

esof

com

bini

ngev

iden

cefro

mm

ultip

leso

urce

sto

dete

rmin

epa

thog

enic

itya

Exa

mpl

eru

les

for

varia

ntcl

assi

ficat

ion

Gen

efir

mly

asso

ciat

edw

ithpa

tient

dise

ase

Var

iant

type

asso

ciat

edw

ithpa

tient

dise

ase

Seg

rega

tion

(#in

form

ativ

em

eios

es)

Co-

occu

rren

cein

tran

sw

ithpa

thog

enic

varia

nts

(rece

ssiv

edi

seas

e)A

min

oac

idco

nser

vatio

n

Freq

uenc

yin

cont

rolc

hrom

osom

esor

heal

thy

popu

latio

nFu

nctio

nal

data

Insi

lico

(com

puta

tiona

l)da

ta

Pat

hoge

nic

Sig

nific

ants

egre

gatio

nY

Y≥1

0S

tron

gse

greg

atio

n+

addi

tiona

lda

taY

Y5

–9C

onse

rved

thro

ugh

bird

s≤1

/500

0(0

.02%

)Y

Mul

tiple

co-o

ccur

renc

esw

ithpa

thog

enic

varia

ntin

tran

s(re

cess

ive

dise

ase)

YY

5+C

onse

rved

thro

ugh

bird

s≤1

/500

0(0

.02%

)Y

LOF

varia

nt(5

′ ofl

ast5

0ba

ses

ofpe

nulti

mat

eex

on)

YY

Like

lypa

thog

enic

Var

iant

atlo

wfre

quen

cyw

ithst

rong

invi

vofu

nctio

nald

ata

YY

Con

serv

edth

roug

hbi

rds

≤1/5

000

(0.0

2%)

Y

De

novo

varia

ntin

patie

ntw

ithde

novo

dise

ase

YY

Str

ong

segr

egat

ion

WIT

HO

UT

cont

rold

ata

YY

5–9

Con

serv

edth

roug

hbi

rds

N/A

Mod

erat

ese

greg

atio

nY

Y3

–4C

onse

rved

thro

ugh

bird

s≤1

/500

0(0

.02%

)

Pat

ient

carr

ies

one

path

ogen

icva

riant

AN

Dse

cond

varia

ntin

tran

s(re

cess

ive

dise

ase)

YY

YC

onse

rved

thro

ugh

bird

sS

uppo

rtiv

e

VU

Sb

–fa

vor

path

ogen

icM

inim

umse

greg

atio

nY

Y≤2

Con

serv

edth

roug

hbi

rds

Sup

port

ive

Oth

erva

riant

sat

posi

tion

know

nto

bepa

thog

enic

YY

Con

serv

edth

roug

hbi

rds

≤1/5

000

(0.0

2%)

Sup

port

ive

LOF

varia

ntw

hen

LOF

noty

etan

esta

blis

hed

dise

ase

mec

hani

sm

YN

VU

Sb

Con

flict

ing

info

rmat

ion

Gen

eor

varia

ntty

peno

tpr

evio

usly

asso

ciat

edw

ithte

sted

dise

ase

NN

Sile

ntva

riant

,abs

entf

rom

larg

era

ce-m

atch

edco

ntro

lAN

Dpr

edic

ted

effe

cton

splic

ing

AN

Dsp

lice

varia

nts

know

nty

peof

path

ogen

icva

riant

YY

≤1/5

000

(0.0

2%)

Sup

port

ive

Var

iant

dete

cted

inco

ntro

lor

heal

thy

indi

vidu

als

atlo

wfre

quen

cy

Low

frequ

ency

c

460


Tabl

e3.

Con

tinue

d

Exa

mpl

eru

les

for

varia

ntcl

assi

ficat

ion

Gen

efir

mly

asso

ciat

edw

ithpa

tient

dise

ase

Var

iant

type

asso

ciat

edw

ithpa

tient

dise

ase

Seg

rega

tion

(#in

form

ativ

em

eios

es)

Co-

occu

rren

cein

tran

sw

ithpa

thog

enic

varia

nts

(rece

ssiv

edi

seas

e)A

min

oac

idco

nser

vatio

n

Freq

uenc

yin

cont

rolc

hrom

osom

esor

heal

thy

popu

latio

nFu

nctio

nal

data

Insi

lico

(com

puta

tiona

l)da

ta

VU

Sb

–fa

vor

beni

gnV

aria

ntam

ino

acid

pres

ent

inm

ultip

lem

amm

als

Mul

tiple

mam

mal

sw

ithva

riant

Var

iant

notc

onse

rved

and

pres

enti

nhe

alth

yin

divi

dual

s

Not

cons

erve

dLo

wfre

quen

cyc

Var

iant

pred

omin

antly

dete

cted

inm

inor

ityra

ce,

noco

ntro

ldat

a,an

dco

mpu

tatio

nald

ata

pred

icts

beni

gn

N/A

Pre

dict

beni

gn

Like

lybe

nign

Mis

sens

eO

Rsp

lice

cons

ensu

sou

tsid

e±1

,2(−

3,−5

to−1

5,O

R+3

to+6

)

MA

Fd≥

0.3%

AN

D<

1%A

ND

≥10

alle

les

Sile

ntva

riant

orin

tron

icva

riant

outs

ide

splic

eco

nsen

sus

(−4

OR

+7to

+15)

MA

Fd<

1%

Ben

ign

Mis

sens

eO

Rsp

lice

cons

ensu

sou

tsid

e±1

,2(−

3,−5

to−1

5,O

R+3

to+6

)

MA

Fd≥

1%A

ND

≥30

alle

les

Sile

ntva

riant

orin

tron

icva

riant

outs

ide

splic

eco

nsen

sus

(−4

OR

+7to

+15)

MA

Fd≥

1%A

ND

≥5

alle

les

aE

xam

ples

ofpa

thog

enic

itycl

assi

ficat

ion

com

bine

mul

tiple

sour

ces

ofev

iden

ceof

vario

usst

reng

th.

Str

engt

hof

evid

ence

isba

sed

onth

eco

rrel

atio

nof

the

type

ofev

iden

cew

ithth

eac

cura

cyof

varia

ntcl

assi

ficat

ion.

Dire

ctev

iden

cew

ithsu

ffici

ent

stat

istic

alpo

wer

orfro

mpr

oper

biol

ogic

alex

perim

ents

isde

emed

‘str

ong’

,w

hile

supp

ortiv

est

udie

sor

insi

lico

pred

ictio

nsth

atca

nnot

prov

etr

uebi

olog

ical

cons

eque

nces

are

cons

ider

edm

oder

ate

orw

eak.

bV

US

,Var

iant

ofU

ncer

tain

Sig

nific

ance

.cLo

wfre

quen

cy,F

requ

ency

thre

shol

dsde

pend

upon

dise

ase

mod

eof

inhe

ritan

ce,p

enet

ranc

e,an

dpr

eval

ence

.See

text

for

mor

ede

tails

.dM

AF,

Min

oral

lele

frequ

ency

.

461

H. Duzkale et al.

Many algorithms are available to classify missensesubstitutions and potential splicing alterations (Fig. 2e).Use of multiple prediction algorithms is recommended.Because most of the programs use similar underlyingdatasets and assumptions, they should not be regardedas independent evidence, though some may includeadditional features. The datasets used for training thealgorithms are mostly from non-clinical grade databasesthat may not be accurate or comprehensive. Diseasespecific algorithms can be applied to a specific set ofgenes with significantly enhanced performance (39, 40),though these are limited in availability. Predicting theeffect of variants occurring near the splice region canbe particularly challenging as it is often unclear whatkind of abnormal transcript may be produced (Fig. 2e).

In summary, although computational predictions areuseful in guiding classification, they are not able todetermine or rule out pathogenicity. Table 2 addressesspecific questions for consideration when examiningcomputational data.

Combining multiple lines of evidence to reach anoverall interpretation

Final interpretation of the clinical significance ofa variant requires examination of all the availableevidence. While some data can be strong enough todetermine or rule out pathogenicity, most informationonly moderately influences final conclusions and isvaluable in combination. Table 3 lists examples of howa laboratory may combine different types of availableevidence into a clinical classification scheme.

For instance, a synonymous variant in exon 16of TECTA, NM_005422.2: c.5331G>A p.Leu1777Leu,may not be expected to be pathogenic because it doesnot alter the amino acid. However, it is predictedto lead to loss of an exonic splice enhancer bindingsite, has not been reported in large population studies,and has been reported to segregate with disease in10 affected family members with autosomal dominanthearing loss (41). In addition, examination of mRNAfrom patient lymphocytes revealed skipping of exon16, leading to an in-frame deletion in the aminoacid sequence. Protein impairment, but not total LOF,is associated with TECTA-related autosomal dominanthearing loss, consistent with this prediction. Thisexample demonstrates the importance of evaluatingclinical data as well as functional evidence to makea definitive classification.

Conclusions and future perspectives

Variant assessment has become the bottleneck of largescale sequencing tests. Using the VAT described herehas served to decrease the average time of variantassessment in our laboratory to 22 min by utilizinghyperlinks to perform database and literature searchesand providing a platform to compile, analyze, andinterpret variant data. However, it may take longerthan 2 h if a large collection of literature needs to bereviewed. Large gene panels may produce >10 variants

that need review, and even after filtration strategies,exome and genome sequencing may produce 100s ofvariants. Further automation to retrieve relevant variantinformation directly from the literature and databaseswill speed the process. Clinically validated predictionalgorithms trained on variants with well-establishedpathogenic or benign classifications (39) will improvethe accuracy of computational prediction. Routine andstandardized functional assays will provide necessaryevidence to classify VUSs, but it is challengingto establish and support these assays in clinicaldiagnostics laboratories.

Information sharing and collaboration amongst lab-oratories will reduce the number of unique assess-ments performed. ClinVar, a recent NCBI initiativeaiming to share clinical-grade variant information, isexpected to support the molecular diagnostics commu-nity through genotype–phenotype associations aided byactual patient data. This may in turn inspire and acceler-ate the development of automated diagnostic predictionalgorithms. Software is currently being developed tosupport the aggregation of internal and external variantinformation to enable sharing of clinical grade variantdata between different laboratories without jeopardizingpatient identity (Table 1). Collectively, these approacheswill greatly facilitate variant classification.

Conventionally, each clinical laboratory has had theliberty to develop, validate and perform diagnostic testsfollowing recommendations by national or internationalagencies such as ACMG, CAP, CLIA, CLSI, EMNQand WHO. Although proficiency testing has addressedthe consistency in raw test output between differentlaboratories, there is still lack of agreement in variantassessment procedures and parameters, as well as finalclassification criteria. ACMG has provided guidelinesfor variant assessment (4, 7), but a consensus structuredframework ensuring evidence-based classifications thatcan be easily adopted by individual laboratories is cur-rently missing. Working groups have been formed toaddress this issue and are modifying the current variantclassification guidelines into a consensus variant grad-ing system based on the feedback from the community(ACMG 2013 Interpreting Sequencing Variants OpenForum). We hope that the framework provided aboveand the attached variant assessment tool, in combina-tion with the consensus guidelines, serves as a usefulmechanism in the clinical variant interpretation process.

Supporting Information

The following Supporting information is available for this article:

Appendix S1. Variant Assessment ToolAppendix S2. Variant Assessment SOPAppendix S3. Variant Assessment Static Data

Additional Supporting information may be found in the onlineversion of this article.

Acknowledgements

We thank Jordan Lerner-Ellis, Sami Amr, and Mark Bowser fortheir help in developing and maintaining the VAT. We also thank

462


all of our colleagues past and present at the LMM for theircontributions to our variant assessment process over the pastdecade. This work was supported in part by National Institutesof Health grants HG006834 and HG006500.

References1. Richards CS, Bradley LA, Amos J et al. Standards and guidelines for

CFTR mutation testing. Genet Med 2002: 4: 379–391.2. Huang T. Next generation sequencing to characterize mitochondrial

genomic DNA heteroplasmy. Curr Protoc Hum Genet 2011: Chapter19: Unit 19.8.

3. Valencia CA, Ankala A, Rhodenizer D et al. Comprehensive mutationanalysis for congenital muscular dystrophy: a clinical PCR-basedenrichment and next-generation sequencing panel. PLoS One 2013:8: e53083.

4. Richards CS, Bale S, Bellissimo DB et al. ACMG recommendationsfor standards for interpretation and reporting of sequence variations:revisions 2007. Genet Med 2008: 10: 294–300.

5. Kearney HM, Thorland EC, Brown KK, Quintero-Rivera F, SouthST, Working Group of the American College of Medical GeneticsLaboratory Quality Assurance. American College of Medical Geneticsstandards and guidelines for interpretation and reporting of postnatalconstitutional copy number variants. Genet Med 2011: 13: 680–685.

6. Plon SE, Eccles DM, Easton D et al. Sequence variant classification andreporting: recommendations for improving the interpretation of cancersusceptibility genetic test results. Hum Mutat 2008: 29: 1282–1291.

7. Rehm HL, Bale SJ, Bayrak-Toydemir P et al. ACMG clinical laboratorystandards for next-generation sequencing. Genet Med, 2013: 15:733–747.

8. Niimura H, Bachinski LL, Sangwatanaroj S et al. Mutations in thegene for cardiac myosin-binding protein C and late-onset familialhypertrophic cardiomyopathy. N Engl J Med 1998: 338: 1248–1257.

9. Sobreira NL, Cirulli ET, Avramopoulos D et al. Whole-genomesequencing of a single proband together with linkage analysis identifiesa Mendelian disease gene. PLoS Genet 2010: 6: e1000991.

10. Verhoeven K, Van Laer L, Kirschhofer K et al. Mutations in the humanalpha-tectorin gene cause autosomal dominant non-syndromic hearingimpairment. Nat Genet 1998: 19: 60–62.

11. Mustapha M, Weil D, Chardenoux S et al. An alpha-tectorin gene defectcauses a newly identified autosomal recessive form of sensorineuralpre-lingual non-syndromic deafness, DFNB21. Hum Mol Genet 1999:8: 409–412.

12. Balciuniene J, Dahl N, Jalonen P et al. Alpha-tectorin involvement inhearing disabilities: one gene--two phenotypes. Hum Genet 1999: 105:211–216.

13. Taschner PE, den Dunnen JT. Describing structural changes byextending HGVS sequence variation nomenclature. Hum Mutat 2011:32: 507–511.

14. Mita S, Maeda S, Shimada K, Araki S. Cloning and sequence analysisof cDNA for human prealbumin. Biochem Biophys Res Commun 1984:124: 558–564.

15. Joensuu T, Hamalainen R, Yuan B et al. Mutations in a novel genewith transmembrane domains underlie Usher syndrome type 3. Am JHum Genet 2001: 69: 673–684.

16. Fields RR, Zhou G, Huang D et al. Usher syndrome type III: revisedgenomic structure of the USH3 gene and identification of novelmutations. Am J Hum Genet 2002: 71: 607–617.

17. Verpy E, Leibovici M, Zwaenepoel I et al. A defect in harmonin, aPDZ domain-containing protein expressed in the inner ear sensory haircells, underlies Usher syndrome type 1C. Nat Genet 2000: 26: 51–55.

18. Ouyang XM, Xia XJ, Verpy E et al. Mutations in the alternativelyspliced exons of USH1C cause non-syndromic recessive deafness. HumGenet 2002: 111: 26–30.

19. Abecasis GR, Altshuler D, Auton A et al. A map of humangenome variation from population-scale sequencing. Nature 2010: 467:1061–1073.

20. Biesecker LG, Mullikin JC, Facio FM et al. The ClinSeq Project:piloting large-scale genome sequencing for research in genomicmedicine. Genome Res 2009: 19: 1665–1674.

21. Sayers EW, Barrett T, Benson DA et al. Database resources of theNational Center for Biotechnology Information. Nucleic Acids Res2012: 40: D13–D25.

22. Stenson PD, Ball EV, Mort M, Phillips AD, Shaw K, Cooper DN. TheHuman Gene Mutation Database (HGMD) and its exploitation in thefields of personalized genomics and molecular evolution. Curr ProtocBioinformatics 2012: Chapter 1: Unit 1.13.

23. Andreasen C, Nielsen JB, Refsgaard L et al. New population-based exome data are questioning the pathogenicity of previouslycardiomyopathy-associated genetic variants. Eur J Hum Genet, 2013:21: 918–928.

24. Bell CJ, Dinwiddie DL, Miller NA et al. Carrier testing for severechildhood recessive diseases by next-generation sequencing. Sci TranslMed 2011: 3: 65ra4.

25. Xue Y, Chen Y, Ayub Q et al. Deleterious- and disease-alleleprevalence in healthy individuals: insights from current predictions,mutation databases, and population-scale resequencing. Am J HumGenet 2012: 91: 1022–1032.

26. Hunt KA, Smyth DJ, Balschun T et al. Rare and functional SIAEvariants are not associated with autoimmune disease risk in up to 66,924individuals of European ancestry. Nat Genet 2012: 44: 3–5.

27. Kenna KP, McLaughlin RL, Hardiman O, Bradley DG. Using referencedatabases of genetic variation to evaluate the potential pathogenicity ofcandidate disease variants. Hum Mutat 2013: 34: 836–841.

28. Maron BJ. Hypertrophic cardiomyopathy: a systematic review. JAMA2002: 287: 1308–1320.

29. Gasparini P, Rabionet R, Barbujani G et al. High carrier frequencyof the 35delG deafness mutation in European populations. GeneticAnalysis Consortium of GJB2 35delG. Eur J Hum Genet 2000: 8:19–23.

30. 1000 Genomes Project Consortium, Abecasis GR, Altshuler D et al.A map of human genome variation from population-scale sequencing.Nature 2010: 467: 1061–1073.

31. Marth GT, Yu F, Indap AR et al. The functional spectrum of low-frequency coding variation. Genome Biol 2011: 12: R84.

32. O’Roak BJ, Deriziotis P, Lee C et al. Exome sequencing in sporadicautism spectrum disorders identifies severe de novo mutations. NatGenet 2011: 43: 585–589.

33. Kong A, Frigge ML, Masson G et al. Rate of de novo mutations and theimportance of father’s age to disease risk. Nature 2012: 488: 471–475.

34. Caleshu C, Day S, Rehm HL, Baxter S. Use and interpreta-tion of genetic tests in cardiovascular genetics. Heart 2010: 96:1669–1675.

35. Rohlfs EM, Zhou Z, Sugarman EA et al. The I148T CFTR alleleoccurs on multiple haplotypes: a complex allele is associated with cysticfibrosis. Genet Med 2002: 4: 319–323.

36. Buyse IM, McCarthy SE, Lurix P et al. Use of MALDI-TOF massspectrometry in a 51-mutation test for cystic fibrosis: evidence that3199del6 is a disease-causing mutation. Genet Med 2004: 6: 426–430.

37. Mann SA, Castro ML, Ohanian M et al. R222Q SCN5A mutation isassociated with reversible ventricular ectopy and dilated cardiomyopa-thy. J Am Coll Cardiol 2012: 60: 1566–1573.

38. Reed FA, Akey JM, Aquadro CF. Fitting background-selectionpredictions to levels of nucleotide variation and divergence along thehuman autosomes. Genome Res 2005: 15: 1211–1221.

39. Jordan DM, Kiezun A, Baxter SM et al. Development and validationof a computational method for assessment of missense variantsin hypertrophic cardiomyopathy. Am J Hum Genet 2011: 88:183–192.

40. Crockett DK, Lyon E, Williams MS, Narus SP, Facelli JC, MitchellJA. Utility of gene-specific algorithms for predicting pathogenicityof uncertain gene variants. J Am Med Inform Assoc 2012: 19:207–211.

41. Collin RW, de Heer AM, Oostrik J et al. Mid-frequency DFNA8/12hearing loss caused by a synonymous TECTA mutation that affects anexonic splice enhancer. Eur J Hum Genet 2008: 16: 1430–1436.

42. Tavtigian SV, Deffenbaugh AM, Yin L et al. Comprehensive statisticalstudy of 452 BRCA1 missense substitutions with classification of eightrecurrent substitutions as neutral. J Med Genet 2006: 43: 295–305.

43. Sunyaev S, Lathe W 3rd, Bork P. Integration of genome data andprotein structures: prediction of protein folds, protein interactions and”molecular phenotypes” of single nucleotide polymorphisms. Curr OpinStruct Biol 2001: 11: 125–130.

44. Ng PC, Henikoff S. SIFT: predicting amino acid changes that affectprotein function. Nucleic Acids Res 2003: 31: 3812–3814.

45. Kent WJ, Sugnet CW, Furey TS et al. The human genome browser atUCSC. Genome Res. 2002: 12: 996–1006.

463

Duzkale_2013_Variant Interpretation_

Documents

Transcript of Duzkale_2013_Variant Interpretation_