Duzkale_2013_Variant Interpretation_
-
Upload
hatice-duzkale-md-mph-phd-facmg -
Category
Documents
-
view
75 -
download
0
Transcript of Duzkale_2013_Variant Interpretation_
Clin Genet 2013: 84: 453–463Printed in Singapore. All rights reserved
© 2013 John Wiley & Sons A/S.Published by John Wiley & Sons Ltd
CLINICAL GENETICSdoi: 10.1111/cge.12257
Original Article
A systematic approach to assessing theclinical significance of genetic variants
Duzkale H, Shen J, McLaughlin H, Alfares A, Kelly MA, Pugh TJ, FunkeBH, Rehm HL, Lebo MS. A systematic approach to assessing the clinicalsignificance of genetic variants.Clin Genet 2013: 84: 453–463. © John Wiley & Sons A/S. Published byJohn Wiley & Sons Ltd, 2013
Molecular genetic testing informs diagnosis, prognosis, and riskassessment for patients and their family members. Recent advances inlow-cost, high-throughput DNA sequencing and computing technologieshave enabled the rapid expansion of genetic test content, resulting indramatically increased numbers of DNA variants identified per test. Toaddress this challenge, our laboratory has developed a systematic approachto thorough and efficient assessments of variants for pathogenicitydetermination. We first search for existing data in publications anddatabases including internal, collaborative and public resources. We thenperform full evidence-based assessments through statistical analyses ofobservations in the general population and disease cohorts, evaluation ofexperimental data from in vivo or in vitro studies, and computationalpredictions of potential impacts of each variant. Finally, we weigh allevidence to reach an overall conclusion on the potential for each variant tobe disease causing. In this report, we highlight the principles of variantassessment, address the caveats and pitfalls, and provide examples toillustrate the process. By sharing our experience and providing aframework for variant assessment, including access to a freely availablecustomizable tool, we hope to help move towards standardized andconsistent approaches to variant assessment.
Conflict of interest
H. D., J. S., H. M., M. A. K., T. J. P., B. H. F., H. L. R. and M. S. L. areemployed by fee-for-service laboratories performing clinical sequencingservices. Several individuals serve on advisory boards or in othercapacities for companies providing sequencing or other genetic services(H. L. R. – BioBase, Clinical Future, Complete Genomics, GenomeQuest,Illumina, Ingenuity, Knome, Omicia; B. F. – InVitae; J. S. – LabCorp).
H Duzkalea,b,†, J Shena,b,c,†,H McLaughlina,b, A Alfaresa,b,d,MA Kellyb, TJ Pughb,c,BH Funkeb,c, HL Rehmb,c
and MS Lebob,c
aHarvard Medical School GeneticsTraining Program, Boston, MA, USA,bLaboratory for Molecular Medicine,Partners HealthCare Center forPersonalized Genetic Medicine,Cambridge, MA, USA, cDepartment ofPathology, Brigham and Woman’sHospital, Massachusetts GeneralHospital, and Harvard Medical School,Boston, MA, USA, and dDepartment ofPediatrics, Qassim University, Buraydah,Saudi Arabia
†These authors contributed equally to thiswork.
Key words: (4−9) clinical interpretation– gain-of-function (GOF) – geneticvariant – loss of function (LOF) –next-generation sequencing (NGS) –sequence analysis – variantassessment – variant of uncertainsignificance (VUS)
Corresponding author: Matthew S.Lebo, 65 Landsdowne Street,Cambridge, MA 02139, USA.Tel.: 617 768 8292;fax: 617 768 8513;e-mail: [email protected]
Received 7 July 2013, revised andaccepted for publication 19 August2013
Molecular genetic testing informs medical decisionmaking in the diagnosis of symptomatic individuals, inthe prediction of disease risk, in reproductive geneticcounseling, and in determining pharmacogenetic pro-files for treatment guidance. Until recently, the major-ity of clinically available molecular genetic tests haveeither analyzed known DNA variants, such as cysticfibrosis carrier screening panels (1), or sequenced thecoding regions and splicing boundaries of a limited set
of well-known disease-associated genes. Recent techno-logical advances in low-cost, high-throughput sequenc-ing and computing have enabled testing for targetedpanels of >100 disease-area genes, as well as exomesand genomes. While these next-generation sequenc-ing (NGS) technologies have increased diagnostic sen-sitivity (2, 3), the number of genetic variants withuncertain clinical significance (VUS) per test has alsoincreased. For example, expanding testing for dilated
453
H. Duzkale et al.
cardiomyopathy from 5 to 46 genes in our laboratoryresulted in a threefold increase in clinical sensitivitybut an even more dramatic increase in inconclusivecases, many with multiple VUSs. In addition, exomeand genome sequencing tests add a new layer of com-plexity, as the genes interrogated may not have beencarefully assessed for their role in disease until variantsare identified.
Although molecular genetic testing has a uniqueplace in the diagnosis, management, and preventionof genetic disorders, the field is compromised by theabsence of a standard, comprehensive, and efficientvariant assessment protocol approved and shared bythe community. However, guidelines for variant inter-pretation are available and being updated as variant-level knowledge expands, including those from theAmerican College of Medical Genetics and Genomics(ACMG) (4–7). To supplement these guidelines andcapture the evolving state of the field, we developed avariant assessment tool (VAT) that systematically evalu-ates multiple parameters for each variant and facilitatesthe capture of new knowledge in the literature anddatabases (Appendix S1).
The clinical significance of a variant in relation to adisease or phenotype can be determined by answeringthree core questions. (i) Does the variant alter thefunction of the gene [i.e. loss-of-function (LOF) orgain-of-function (GOF)]? (ii) Can the functional changeresult in disease or another phenotype? (iii) Is theassociated disease or phenotype relevant to the specificclinical condition present in the tested individual? Insome cases variant assessment in a clinical laboratorymay only be focused on the first two questions;however, for maximal benefit to the patient, a carefulassessment of the third dimension can be highlyinformative, particularly for VUSs. Here, we shareour decade-long experience with variant assessment,highlighting key points and challenges of clinical inter-pretation. We have evaluated 245 genes associated with53 diseases while testing greater than 22,000 cases. We
Fig. 1. Variant assessment workflow. Genetic variants identified by laboratory testing are annotated with information from various sources includingpublications, computational prediction algorithms, and public, collaborative and internal databases. After evaluation of all pertinent information inconjunction with patient specific clinical and family information, a professionally trained individual will classify the variant into one of the fiveclinical categories and combine all variants for a clinical report.
have iteratively developed a framework through clinicalassessments of over 17,000 variants, including >8000that have been validated and reported in patients. Usingour semi-automated tool, it takes on average 40 minto perform a thorough evidence-based clinical variantassessment for variants being returned after disease-targeted testing. When assessments with literature areexcluded, the average time decreases to 22 min. Anoverview of the process is presented in Fig. 1 and usefulonline resources are presented in Table 1. In addition,the approaches described below refer to the providedVAT with corresponding SOP available for downloadthrough Appendices S1–S3.
Linking genes to disease
As a first step in variant assessment, it is necessaryto determine which disease phenotypes are associatedwith a gene and which types of variation may resultin clinically relevant consequences. This includes thetypes of variants that are known to cause diseasein the gene (truncating/LOF, non-truncating, etc.), theinheritance patterns observed for variants in the gene,the protein domains that are implicated in disease, andany genotype–phenotype correlations described.
To characterize disease phenotypes, it is importantto review the literature for common clinical features aswell as phenotypic variation among affected individ-uals. Large cohort studies may provide expressivity,age-of-onset, penetrance, and prevalence information,while detailed reports of families with multiple affectedindividuals help determine the mode of inheritanceand strength of association. Comparison of the variantspectrum in affected individuals against that in thegeneral population may be useful in identifying thetypes of mutations that are disease-causing. Forinstance, heterozygous LOF variants in MYBPC3have been reported in 14% (311/2302) of patientswith hypertrophic cardiomyopathy (HCM) testedin our laboratory, but in <0.1% (6/6500) of the
454
A systematic approach to assessing variant clinical significance
Table 1. Useful online resources for variant assessment
Usage Online tools Url
Computationalprediction formissense variants
Align GVGD (42) http://agvgd.iarc.fr/agvgd_input.php/
CONDEL http://bg.upf.edu/condel/analysis/MutationAssessor http://mutationassessor.org/MutationTaster http://www.mutationtaster.org/PolyPhen2 (43) http://genetics.bwh.harvard.edu/pph2/SIFT (44) http://sift.jcvi.org/
Computationalprediction forsplicing variants
GeneSplicer http://ccb.jhu.edu/software/genesplicer/
Human Splicing Finder http://www.umd.be/HSF/MaxEntScan http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.htmlNNSplice http://www.fruitfly.org/seq_tools/splice.html
Disease curation GeneReviews http://www.ncbi.nlm.nih.gov/books/NBK1116/OMIM http://omim.org/
Domain database NCBI conserved domaindatabase
http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
Genome Browser Ensembl http://www.ensembl.org/index.htmlUCSC Genome Browser http://genome.ucsc.edu/
Literature database PubMed http://www.ncbi.nlm.nih.gov/pubmedVariant database 1000 Genomes Project http://browser.1000genomes.org
ClinVar http://www.ncbi.nlm.nih.gov/clinvar/dbSNP http://www.ncbi.nlm.nih.gov/projects/SNP/Exome Variant Server (9) http://evs.gs.washington.edu/EVS/HGMD http://www.hgmd.cf.ac.uk/ac/index.php
Variant validation HGVS nomenclature http://www.hgvs.org/mutnomen/Mutalyzer https://mutalyzer.nl/
general population per the NHLBI Exome Sequenc-ing Project (ESP), supporting that LOF MYBPC3variants are a common mechanism in HCM (8).However, external information must be carefullyvetted. An apparent frameshift variant in MYBPC3NM_000256:c.2854_2858del reported to occur in 7%of the general population in ESP is likely a technicalartifact, as we have never observed it sequencing theregion by NGS and/or Sanger in over 2000 cases.
Different types of variants in the same gene may beassociated with distinct phenotypes or inheritance pat-terns. For example, missense GOF variants in PTPN11cause RASopathies, such as Noonan syndrome, whereasLOF variants lead to an entirely different phenotype, acartilage tumor syndrome (metachondromatosis) char-acterized by enchondromas and exostoses (9). Certainmissense variants in TECTA lead to autosomal dominanthearing loss (10), whereas LOF variants result in auto-somal recessive hearing loss (11). Similarly, variants indifferent regions or domains of a gene may cause differ-ent phenotypes (10, 12). Important questions to considerwhen analyzing gene–disease associations and specificvariants within a gene can be found in Table 2.
Validating variants to ensure accuracy
As test complexity has increased, so has the need toensure variants identified and included on a clinicalreport are technically accurate. This is especiallyimportant for sequencing tests where the variants are
not part of a pre-defined list. It is essential to reviewraw assay results (e.g. chromatographs of Sangersequencing traces or NGS reads) to verify the variantsand their nomenclature. Prior to variant assessment,the laboratory should predefine the genome build,gene name, and reference transcript that will be usedin interpretation and reporting, along with a methodlinking genomic coordinates to cDNA and amino acidlevel annotations. Laboratories should also be awareof homologous and repetitive regions particularlyfrom pseudogenes and segmental duplications, whichmay result in lack of coverage, alignment difficulties,and incorrect variant calls. These steps will enablevalidation of the correct variant call, zygosity andnomenclature according to the Human Genome Vari-ation Society (HGVS) guidelines (13). Validationinformation is captured in the ‘Variant’ tab of the VAT.
Because standards for variant nomenclature haveonly recently been widely adopted and still do notaddress all modifications, variants may have differingnames in publications and databases. The amino acidposition may not be numbered according to the startcodon to be consistent with current recommendations.For example, TTR variants were originally numberedaccording to the position within the mature proteinlacking the 20 amino acid signal peptide (14). Partialcloning of a gene may have led to inconsistentnomenclature in early publications (15, 16). Nucleotidegene numbering may have been determined using thetranscription start site instead of the translation start site,
455
H. Duzkale et al.
Table 2. Variant assessment checklist
Gene-level informationConfirm gene is implicated in disease with sufficient evidence, including human genetic data and functional dataDetermine inheritance pattern, age-of-onset, penetrance and prevalence for each gene-disease association, ifpossibleDetermine types of disease-associated variants in gene (gain-of-function, loss-of-function, etc.)
Variant validationReview raw sequence data to confirm the variant callDetermine zygosity of the variantAssociate genome build, genomic coordinate, and reference transcript to the variantConfirm variant nomenclature
Genetic dataDetermine frequency of variant in large population studies, parsed by raceDetermine if population frequency is consistent with disease inheritance, age-of-onset, penetrance and prevalenceEvaluate whether variant segregates with disease in affected family membersIf disease is inherited in a recessive manner, determine if the variant is found in trans with a pathogenic variantIf applicable, determine if there is a statistically significant difference in variant frequency between cases andcontrols
Functional dataEvaluate available in vivo functional data
Confirm type of animal model is relevant for human diseaseEvaluate available in vitro functional data
Confirm assays used reflect disease-associated cellular mechanismsComputational data
Evaluate nucleotide alignment data and assess evolutionary conservation (for all variants)Evaluate amino acid alignment data and assess evolutionary conservation (for missense variants)
Eliminate any poor species alignmentsDetermine if computational tools predict an effect on protein structure or splicing (for all variants)
which was particularly challenging given transcriptionalstart site variability. Furthermore, for many smallinsertions and deletions, it is not possible to determinethe exact location of the inserted or deleted base(s).This can lead to multiple potential names for thesame variant, highlighting the importance for followingstandard HGVS nomenclature rules such as attributingalternations within a repetitive stretch to the most 3′possible position. Legacy terms and alternative aliasesare useful to maintain association with the correctlynamed variant both to facilitate searching the literatureand databases, as well as communicating with orderingphysicians and other laboratories.
Genes may have multiple transcripts, some ofwhich are tissue-specific and associated with distinctphenotypes. For example, the shorter USH1C transcript(NM_005709) is expressed in both the retina and innerear, whereas the longer transcript (NM_153676) isexpressed exclusively in the inner ear (17). Accord-ingly, variants in exons of shorter transcript lead toUsher syndrome type 1C, characterized by profounddeafness, retinitis pigmentosa, and vestibular dys-function, whereas variants in additional exons uniqueto NM_153676 lead to non-syndromic hearing loss(18). Variants should be reported according to a singleprimary transcript. The reported reference is typicallythe major transcript unless a more severe impact ispredicted on an alternative transcript, in which case thevariant should be defined according to the alternative
transcript, noting an alias to the primary transcript(Fig. 2a).
When multiple variants in the same gene areidentified, the phase of the variants (i.e. on thesame chromosome – in cis – or on homologouschromosomes – in trans) may influence the interpre-tation, especially for autosomal recessive traits. Ifvariants are within the same NGS fragment, the phasemay be determined without parental samples (Fig. 2b).
Collecting evidence to determine the likelihood ofpathogenicity
Once the variant call is validated, literature, variantdatabases, and population control studies should beevaluated. This information is used to determinewhether and under what context the variant has beenpreviously observed. The population, literature andinternal case data are captured in the ‘Control_Freq’,‘DB’ and ‘Publ+Internal_data’ tabs of the VAT.
Recent large-scale population studies such as theNHLBI Exome Sequencing Project (9), the 1000Genomes Project (19), the ClinSeq Project (20) andothers found in dbSNP (21) have catalogued largeamounts of sequence variation (Table 1). Because thesepopulations may include presymptomatic individualswith late onset diseases, asymptomatic individualswith low penetrance diseases or younger than typicalage-of-onset, and heterozygous carriers of recessive
456
A systematic approach to assessing variant clinical significance
Fig. 2. (a) Reference transcript selection. Two transcripts for OTOF are shown: NM_194248, the longest transcript selected as the primarytranscript, and NM_194322, a shorter alternate transcript. Position g.26700697 (grey box) is non-coding in NM_194248 (c.2215-80) but codingin NM_194322 (c.65); therefore NM_194322 should be selected while evaluating this variant. Adapted from Alamut® software (InteractiveBiosoftware, http://www.interactive-biosoftware.com) (b) Phasing multiple variants. Two variants are present at positions c.2401 and c.2402 inOTOF (NM_194248). The top traces show the chromatographs from Sanger sequencing with the consensus reference sequence shown underneath.Representative aligned NGS reads are shown below. Grey bars represent reference sequence with variants highlighted in red. The bottom schematicshows associated OTOF coding exons (rectangles) and the reference amino acid sequence. The arrow indicates the 5′ to 3′ direction. The c.2401G>T(p.Glu801*) is listed in dbSNP (rs75624587) and ESP as a nonsense variant but with an allele frequency of 10% in African Americans. However,NGS reads reveal that the variants are in cis and should therefore be named c.2401_2402delinsTT (p.Glu801Leu). (c) Segregation analysis withincomplete penetrance. A family with hypertrophic cardiomyopathy is shown. Affected individuals are indicated by filled squares (males) or circles(females). Mutation-positive individuals are indicated by a ‘+’, while mutation-negative individuals are indicated by a ‘−’. All mutation-positiveindividuals are affected, with the exception of individual II-1. Because HCM can display reduced penetrance, individual II-1 would not be considereda non-segregation. (d) Conservation based on multiple species alignment. An example of alignment of TNNC1 in UCSC Genome Browser is shown.Tree shrew sequence shows poor alignment (red box). Arrows point to non-conserved residues (45). Adapted from http://genome.ucsc.edu usinghg19 reference sequence. (e) Conflicting computational predictions of a missense variant. The results of multiple computational tools are capturedin the VAT. They provide conflicting predictions for the NM_000366:c.688G>A (p.Asp230Asn) variant in TPM1 , suggesting that at least sometools are not reliable. (f) Predicted splicing effect of a coding variant. Splicing prediction tools indicate that the NM_022124:c.5712G>A variant inCDH23 , which affects the last base in exon 43, may impact splicing. However, not all programs agree in the potential effect on splicing, and theycannot predict whether it would lead to exon skipping, intron retention or use of cryptic splice sites. Adapted from Alamut® software (InteractiveBiosoftware, http://www.interactive-biosoftware.com)
457
H. Duzkale et al.
Fig. 2. Continued.
traits, variants should not be assumed benign simplybecause of their presence in large population studies.Information on affected individuals with the variant canbe obtained from internal and public variant databases(e.g. ClinVar, HGMD (22) or locus-specific databases),as well as from the literature. Public variant databasesare of varying quality and may be outdated or containcontradictory data. Recent studies have demonstrateda large number of false-positive variants incorrectlyidentified as clinically relevant in these databases(23–27). Therefore, databases available today shouldbe used to identify relevant primary literature ratherthan directly reference a variant classification.
For Mendelian disorders, the pathogenicity ofa variant can be ruled out if its frequency in thegeneral population exceeds what can be accountedfor by inheritance pattern, age-of-onset, prevalence,penetrance, and heterogeneity. Large sample sizeswithout selection bias towards individuals with diseasephenotypes are required to achieve confidence inestimating the population allele frequency. Moreover,
disease prevalence is not always known, accurate, orapplicable across all populations. Because one affectedallele is sufficient to cause an autosomal dominant trait,a pathogenic allele must present at a frequency lowerthan the disease prevalence in the general population.HCM is primarily an autosomal dominant conditionoccurring in 1 in 500 individuals (1/1000 chromosomesor 0.1% allele frequency) (28). We consider a variantlikely benign if the allele frequency is >0.3% which isa conservative 1.5 times above the highest frequencyexpected even if penetrance was only 50% and thedisease was due to one pathogenic variant. In contrast,both paternal and maternal alleles need to be affectedto cause an autosomal recessive disorder. The heterozy-gous carrier frequency of any pathogenic allele must beless than twice the square root of the disease prevalence(which is the hypothetical allele frequency if only onedisease allele accounts for all cases). For example, theprevalence of congenital hearing loss with a geneticetiology is roughly 1 in 1000 and half of these casesare due to GJB2 variants. Therefore, the estimated
458
A systematic approach to assessing variant clinical significance
prevalence of GJB2 -related hearing loss is 1 in 2000.Accordingly, pathogenic variants in GJB2 are expectedto occur no more than 4% in the general population.It is not surprising that the carrier frequency for thec.35delG variant in GJB2 could be as high as 2% (29).Population data pertaining to a specific ethnic composi-tion are particularly useful. The 1000 Genomes Projecthas revealed many variants common in certain ethnicgroups, but rare in general (30). If a subpopulation doesnot have an increased occurrence of the associated dis-ease and affected individuals are not under-diagnosed,variant classification based on the allele frequency inthe subpopulation can be applied more broadly.
While a high allele frequency in the general popula-tion may rule out pathogenicity of a variant for a raredisorder, absence or a very low frequency of a vari-ant in the broad population cannot be used to assumepathogenicity. While coding variants below 1% allelefrequency in the seven populations examined by the1000 Genomes Project are enriched for functional vari-ants (31), lack of a variant from population datasetscannot be used to assume absence from the popula-tion unless it is determined that the study technicallyinterrogated the position sufficiently to rule out a poten-tial false-negative result. A variant is statistically morelikely pathogenic if it occurs in affected individualsmore than expected by chance. The likelihood of ran-dom occurrence can be calculated as the probability ofco-incidence of rare events, as the logarithm of odds(LOD) score through linkage analysis or as p-valuesthrough case–control studies using a Fisher’s exact orchi-square test. Low probabilities of co-incidence sta-tistically demonstrate non-random occurrences of thevariant in affected individuals.
The presence of de novo variants may support dis-ease association due to their rarity. The de novo pointmutation rate is ∼1 per exome (32), consistent withan average rate of 1.2 × 10−8 per nucleotide per gen-eration in human genome (33). Therefore, confirmedde novo status of a variant in a disease-associatedgene strongly increases the likelihood of pathogenic-ity in rare conditions when the patient’s disease isde novo and matches the associated phenotypes. Testingof biological parents and excluding the possibilities ofnon-paternity and sample swap (e.g. genotyping withmicrosatellite markers) are necessary for confirmationof de novo variants. Similarly, for rare recessivedisorders, if a rare variant is confirmed in trans withanother pathogenic variant in the same disease gene, itis more likely pathogenic.
Significant co-segregation of a variant with diseaseprovides strong genetic linkage evidence to supportpathogenicity. Linkage analysis programs can be usedto calculate the LOD scores, but a simple count ofinformative segregations can provide an estimate. Asa rule of thumb, 10 informative segregations wouldachieve a LOD score > 3.0, necessary to establishlinkage between a genetic locus and a disease. Forestablished disease genes, given the a priori proba-bility of disease association, fewer informative segre-gations may be acceptable in combination with other
supporting evidence. Because genotype-phenotype cor-relation may be masked by incomplete penetrance, vari-able expressivity, and late age-of-onset in genotype-positive individuals, unaffected family members shouldnot contribute segregation information under these cir-cumstances (34) (Fig. 2c). Additional evidence may stillbe required to establish pathogenicity, as any variant inlinkage disequilibrium with the causative variant willsegregate with the disease. For example, the Ile148Thrvariant in CFTR was removed from the original cysticfibrosis carrier-testing panel because it was later deter-mined its association with the disease was due to tightlinkage with another pathogenic variant (35, 36).
Functional evidence that links the variant to diseasephenotypes is important to establish causality. How-ever, this information is typically unavailable for indi-vidual variants in routine diagnostic testing. When stud-ies regarding a specific variant have been published, itis important to determine the type of assay used andwhether the results and conclusions drawn are applica-ble to the mechanism and presentation of the disease.In general, direct assays on patient tissues provide thestrongest functional evidence because they reveal truebiological consequences of a variant within a humanindividual. In vivo studies in mammals may add moreevidence at the system level. In vitro studies can be use-ful, especially in cases where the in vitro assay directlytests an established molecular mechanism of disease[e.g. structural proteins or ion channels (37)], but maynot accurately represent the biological environment ordirectly prove causation of disease.
In summary, population, statistical and functionalevidence need to be carefully evaluated to determinethe clinical significance of a variant. Table 2 lists someimportant considerations when collecting this data.
Predicting disease association using bioinformaticstools
If the evidence for disease association from existingdata is not strong or the mechanism of gene function isunclear, a number of bioinformatics tools may be usedto predict the possible impact of the variant on the geneor protein. Computational predictions are generallybased on the type of change, the domain structure,sequence conservation, and biochemical propertiesof the affected amino acid residues. Computationalinformation is captured in the ‘Conserv_Biochem’ and‘Splicing’ tabs of the VAT.
At both the nucleotide and amino acid level, sequenceconservation may indicate regions and positions offunctional importance, as negative selection removeschanges that are deleterious to proper biological func-tion, leading to high evolutionary conservation (38).Computationally derived alignments can indicate whena specific sequence is important to the underlying geneor protein function (Fig. 2d). Conversely, presence ofthe variant amino acid in other species, particularlyprimates and other mammals, may indicate a toleranceto that change.
459
H. Duzkale et al.
Tabl
e3.
Exa
mpl
esof
com
bini
ngev
iden
cefro
mm
ultip
leso
urce
sto
dete
rmin
epa
thog
enic
itya
Exa
mpl
eru
les
for
varia
ntcl
assi
ficat
ion
Gen
efir
mly
asso
ciat
edw
ithpa
tient
dise
ase
Var
iant
type
asso
ciat
edw
ithpa
tient
dise
ase
Seg
rega
tion
(#in
form
ativ
em
eios
es)
Co-
occu
rren
cein
tran
sw
ithpa
thog
enic
varia
nts
(rece
ssiv
edi
seas
e)A
min
oac
idco
nser
vatio
n
Freq
uenc
yin
cont
rolc
hrom
osom
esor
heal
thy
popu
latio
nFu
nctio
nal
data
Insi
lico
(com
puta
tiona
l)da
ta
Pat
hoge
nic
Sig
nific
ants
egre
gatio
nY
Y≥1
0S
tron
gse
greg
atio
n+
addi
tiona
lda
taY
Y5
–9C
onse
rved
thro
ugh
bird
s≤1
/500
0(0
.02%
)Y
Mul
tiple
co-o
ccur
renc
esw
ithpa
thog
enic
varia
ntin
tran
s(re
cess
ive
dise
ase)
YY
5+C
onse
rved
thro
ugh
bird
s≤1
/500
0(0
.02%
)Y
LOF
varia
nt(5
′ ofl
ast5
0ba
ses
ofpe
nulti
mat
eex
on)
YY
Like
lypa
thog
enic
Var
iant
atlo
wfre
quen
cyw
ithst
rong
invi
vofu
nctio
nald
ata
YY
Con
serv
edth
roug
hbi
rds
≤1/5
000
(0.0
2%)
Y
De
novo
varia
ntin
patie
ntw
ithde
novo
dise
ase
YY
Str
ong
segr
egat
ion
WIT
HO
UT
cont
rold
ata
YY
5–9
Con
serv
edth
roug
hbi
rds
N/A
Mod
erat
ese
greg
atio
nY
Y3
–4C
onse
rved
thro
ugh
bird
s≤1
/500
0(0
.02%
)
Pat
ient
carr
ies
one
path
ogen
icva
riant
AN
Dse
cond
varia
ntin
tran
s(re
cess
ive
dise
ase)
YY
YC
onse
rved
thro
ugh
bird
sS
uppo
rtiv
e
VU
Sb
–fa
vor
path
ogen
icM
inim
umse
greg
atio
nY
Y≤2
Con
serv
edth
roug
hbi
rds
Sup
port
ive
Oth
erva
riant
sat
posi
tion
know
nto
bepa
thog
enic
YY
Con
serv
edth
roug
hbi
rds
≤1/5
000
(0.0
2%)
Sup
port
ive
LOF
varia
ntw
hen
LOF
noty
etan
esta
blis
hed
dise
ase
mec
hani
sm
YN
VU
Sb
Con
flict
ing
info
rmat
ion
Gen
eor
varia
ntty
peno
tpr
evio
usly
asso
ciat
edw
ithte
sted
dise
ase
NN
Sile
ntva
riant
,abs
entf
rom
larg
era
ce-m
atch
edco
ntro
lAN
Dpr
edic
ted
effe
cton
splic
ing
AN
Dsp
lice
varia
nts
know
nty
peof
path
ogen
icva
riant
YY
≤1/5
000
(0.0
2%)
Sup
port
ive
Var
iant
dete
cted
inco
ntro
lor
heal
thy
indi
vidu
als
atlo
wfre
quen
cy
Low
frequ
ency
c
460
A systematic approach to assessing variant clinical significance
Tabl
e3.
Con
tinue
d
Exa
mpl
eru
les
for
varia
ntcl
assi
ficat
ion
Gen
efir
mly
asso
ciat
edw
ithpa
tient
dise
ase
Var
iant
type
asso
ciat
edw
ithpa
tient
dise
ase
Seg
rega
tion
(#in
form
ativ
em
eios
es)
Co-
occu
rren
cein
tran
sw
ithpa
thog
enic
varia
nts
(rece
ssiv
edi
seas
e)A
min
oac
idco
nser
vatio
n
Freq
uenc
yin
cont
rolc
hrom
osom
esor
heal
thy
popu
latio
nFu
nctio
nal
data
Insi
lico
(com
puta
tiona
l)da
ta
VU
Sb
–fa
vor
beni
gnV
aria
ntam
ino
acid
pres
ent
inm
ultip
lem
amm
als
Mul
tiple
mam
mal
sw
ithva
riant
Var
iant
notc
onse
rved
and
pres
enti
nhe
alth
yin
divi
dual
s
Not
cons
erve
dLo
wfre
quen
cyc
Var
iant
pred
omin
antly
dete
cted
inm
inor
ityra
ce,
noco
ntro
ldat
a,an
dco
mpu
tatio
nald
ata
pred
icts
beni
gn
N/A
Pre
dict
beni
gn
Like
lybe
nign
Mis
sens
eO
Rsp
lice
cons
ensu
sou
tsid
e±1
,2(−
3,−5
to−1
5,O
R+3
to+6
)
MA
Fd≥
0.3%
AN
D<
1%A
ND
≥10
alle
les
Sile
ntva
riant
orin
tron
icva
riant
outs
ide
splic
eco
nsen
sus
(−4
OR
+7to
+15)
MA
Fd<
1%
Ben
ign
Mis
sens
eO
Rsp
lice
cons
ensu
sou
tsid
e±1
,2(−
3,−5
to−1
5,O
R+3
to+6
)
MA
Fd≥
1%A
ND
≥30
alle
les
Sile
ntva
riant
orin
tron
icva
riant
outs
ide
splic
eco
nsen
sus
(−4
OR
+7to
+15)
MA
Fd≥
1%A
ND
≥5
alle
les
aE
xam
ples
ofpa
thog
enic
itycl
assi
ficat
ion
com
bine
mul
tiple
sour
ces
ofev
iden
ceof
vario
usst
reng
th.
Str
engt
hof
evid
ence
isba
sed
onth
eco
rrel
atio
nof
the
type
ofev
iden
cew
ithth
eac
cura
cyof
varia
ntcl
assi
ficat
ion.
Dire
ctev
iden
cew
ithsu
ffici
ent
stat
istic
alpo
wer
orfro
mpr
oper
biol
ogic
alex
perim
ents
isde
emed
‘str
ong’
,w
hile
supp
ortiv
est
udie
sor
insi
lico
pred
ictio
nsth
atca
nnot
prov
etr
uebi
olog
ical
cons
eque
nces
are
cons
ider
edm
oder
ate
orw
eak.
bV
US
,Var
iant
ofU
ncer
tain
Sig
nific
ance
.cLo
wfre
quen
cy,F
requ
ency
thre
shol
dsde
pend
upon
dise
ase
mod
eof
inhe
ritan
ce,p
enet
ranc
e,an
dpr
eval
ence
.See
text
for
mor
ede
tails
.dM
AF,
Min
oral
lele
frequ
ency
.
461
H. Duzkale et al.
Many algorithms are available to classify missensesubstitutions and potential splicing alterations (Fig. 2e).Use of multiple prediction algorithms is recommended.Because most of the programs use similar underlyingdatasets and assumptions, they should not be regardedas independent evidence, though some may includeadditional features. The datasets used for training thealgorithms are mostly from non-clinical grade databasesthat may not be accurate or comprehensive. Diseasespecific algorithms can be applied to a specific set ofgenes with significantly enhanced performance (39, 40),though these are limited in availability. Predicting theeffect of variants occurring near the splice region canbe particularly challenging as it is often unclear whatkind of abnormal transcript may be produced (Fig. 2e).
In summary, although computational predictions areuseful in guiding classification, they are not able todetermine or rule out pathogenicity. Table 2 addressesspecific questions for consideration when examiningcomputational data.
Combining multiple lines of evidence to reach anoverall interpretation
Final interpretation of the clinical significance ofa variant requires examination of all the availableevidence. While some data can be strong enough todetermine or rule out pathogenicity, most informationonly moderately influences final conclusions and isvaluable in combination. Table 3 lists examples of howa laboratory may combine different types of availableevidence into a clinical classification scheme.
For instance, a synonymous variant in exon 16of TECTA, NM_005422.2: c.5331G>A p.Leu1777Leu,may not be expected to be pathogenic because it doesnot alter the amino acid. However, it is predictedto lead to loss of an exonic splice enhancer bindingsite, has not been reported in large population studies,and has been reported to segregate with disease in10 affected family members with autosomal dominanthearing loss (41). In addition, examination of mRNAfrom patient lymphocytes revealed skipping of exon16, leading to an in-frame deletion in the aminoacid sequence. Protein impairment, but not total LOF,is associated with TECTA-related autosomal dominanthearing loss, consistent with this prediction. Thisexample demonstrates the importance of evaluatingclinical data as well as functional evidence to makea definitive classification.
Conclusions and future perspectives
Variant assessment has become the bottleneck of largescale sequencing tests. Using the VAT described herehas served to decrease the average time of variantassessment in our laboratory to 22 min by utilizinghyperlinks to perform database and literature searchesand providing a platform to compile, analyze, andinterpret variant data. However, it may take longerthan 2 h if a large collection of literature needs to bereviewed. Large gene panels may produce >10 variants
that need review, and even after filtration strategies,exome and genome sequencing may produce 100s ofvariants. Further automation to retrieve relevant variantinformation directly from the literature and databaseswill speed the process. Clinically validated predictionalgorithms trained on variants with well-establishedpathogenic or benign classifications (39) will improvethe accuracy of computational prediction. Routine andstandardized functional assays will provide necessaryevidence to classify VUSs, but it is challengingto establish and support these assays in clinicaldiagnostics laboratories.
Information sharing and collaboration amongst lab-oratories will reduce the number of unique assess-ments performed. ClinVar, a recent NCBI initiativeaiming to share clinical-grade variant information, isexpected to support the molecular diagnostics commu-nity through genotype–phenotype associations aided byactual patient data. This may in turn inspire and acceler-ate the development of automated diagnostic predictionalgorithms. Software is currently being developed tosupport the aggregation of internal and external variantinformation to enable sharing of clinical grade variantdata between different laboratories without jeopardizingpatient identity (Table 1). Collectively, these approacheswill greatly facilitate variant classification.
Conventionally, each clinical laboratory has had theliberty to develop, validate and perform diagnostic testsfollowing recommendations by national or internationalagencies such as ACMG, CAP, CLIA, CLSI, EMNQand WHO. Although proficiency testing has addressedthe consistency in raw test output between differentlaboratories, there is still lack of agreement in variantassessment procedures and parameters, as well as finalclassification criteria. ACMG has provided guidelinesfor variant assessment (4, 7), but a consensus structuredframework ensuring evidence-based classifications thatcan be easily adopted by individual laboratories is cur-rently missing. Working groups have been formed toaddress this issue and are modifying the current variantclassification guidelines into a consensus variant grad-ing system based on the feedback from the community(ACMG 2013 Interpreting Sequencing Variants OpenForum). We hope that the framework provided aboveand the attached variant assessment tool, in combina-tion with the consensus guidelines, serves as a usefulmechanism in the clinical variant interpretation process.
Supporting Information
The following Supporting information is available for this article:
Appendix S1. Variant Assessment ToolAppendix S2. Variant Assessment SOPAppendix S3. Variant Assessment Static Data
Additional Supporting information may be found in the onlineversion of this article.
Acknowledgements
We thank Jordan Lerner-Ellis, Sami Amr, and Mark Bowser fortheir help in developing and maintaining the VAT. We also thank
462
A systematic approach to assessing variant clinical significance
all of our colleagues past and present at the LMM for theircontributions to our variant assessment process over the pastdecade. This work was supported in part by National Institutesof Health grants HG006834 and HG006500.
References1. Richards CS, Bradley LA, Amos J et al. Standards and guidelines for
CFTR mutation testing. Genet Med 2002: 4: 379–391.2. Huang T. Next generation sequencing to characterize mitochondrial
genomic DNA heteroplasmy. Curr Protoc Hum Genet 2011: Chapter19: Unit 19.8.
3. Valencia CA, Ankala A, Rhodenizer D et al. Comprehensive mutationanalysis for congenital muscular dystrophy: a clinical PCR-basedenrichment and next-generation sequencing panel. PLoS One 2013:8: e53083.
4. Richards CS, Bale S, Bellissimo DB et al. ACMG recommendationsfor standards for interpretation and reporting of sequence variations:revisions 2007. Genet Med 2008: 10: 294–300.
5. Kearney HM, Thorland EC, Brown KK, Quintero-Rivera F, SouthST, Working Group of the American College of Medical GeneticsLaboratory Quality Assurance. American College of Medical Geneticsstandards and guidelines for interpretation and reporting of postnatalconstitutional copy number variants. Genet Med 2011: 13: 680–685.
6. Plon SE, Eccles DM, Easton D et al. Sequence variant classification andreporting: recommendations for improving the interpretation of cancersusceptibility genetic test results. Hum Mutat 2008: 29: 1282–1291.
7. Rehm HL, Bale SJ, Bayrak-Toydemir P et al. ACMG clinical laboratorystandards for next-generation sequencing. Genet Med, 2013: 15:733–747.
8. Niimura H, Bachinski LL, Sangwatanaroj S et al. Mutations in thegene for cardiac myosin-binding protein C and late-onset familialhypertrophic cardiomyopathy. N Engl J Med 1998: 338: 1248–1257.
9. Sobreira NL, Cirulli ET, Avramopoulos D et al. Whole-genomesequencing of a single proband together with linkage analysis identifiesa Mendelian disease gene. PLoS Genet 2010: 6: e1000991.
10. Verhoeven K, Van Laer L, Kirschhofer K et al. Mutations in the humanalpha-tectorin gene cause autosomal dominant non-syndromic hearingimpairment. Nat Genet 1998: 19: 60–62.
11. Mustapha M, Weil D, Chardenoux S et al. An alpha-tectorin gene defectcauses a newly identified autosomal recessive form of sensorineuralpre-lingual non-syndromic deafness, DFNB21. Hum Mol Genet 1999:8: 409–412.
12. Balciuniene J, Dahl N, Jalonen P et al. Alpha-tectorin involvement inhearing disabilities: one gene--two phenotypes. Hum Genet 1999: 105:211–216.
13. Taschner PE, den Dunnen JT. Describing structural changes byextending HGVS sequence variation nomenclature. Hum Mutat 2011:32: 507–511.
14. Mita S, Maeda S, Shimada K, Araki S. Cloning and sequence analysisof cDNA for human prealbumin. Biochem Biophys Res Commun 1984:124: 558–564.
15. Joensuu T, Hamalainen R, Yuan B et al. Mutations in a novel genewith transmembrane domains underlie Usher syndrome type 3. Am JHum Genet 2001: 69: 673–684.
16. Fields RR, Zhou G, Huang D et al. Usher syndrome type III: revisedgenomic structure of the USH3 gene and identification of novelmutations. Am J Hum Genet 2002: 71: 607–617.
17. Verpy E, Leibovici M, Zwaenepoel I et al. A defect in harmonin, aPDZ domain-containing protein expressed in the inner ear sensory haircells, underlies Usher syndrome type 1C. Nat Genet 2000: 26: 51–55.
18. Ouyang XM, Xia XJ, Verpy E et al. Mutations in the alternativelyspliced exons of USH1C cause non-syndromic recessive deafness. HumGenet 2002: 111: 26–30.
19. Abecasis GR, Altshuler D, Auton A et al. A map of humangenome variation from population-scale sequencing. Nature 2010: 467:1061–1073.
20. Biesecker LG, Mullikin JC, Facio FM et al. The ClinSeq Project:piloting large-scale genome sequencing for research in genomicmedicine. Genome Res 2009: 19: 1665–1674.
21. Sayers EW, Barrett T, Benson DA et al. Database resources of theNational Center for Biotechnology Information. Nucleic Acids Res2012: 40: D13–D25.
22. Stenson PD, Ball EV, Mort M, Phillips AD, Shaw K, Cooper DN. TheHuman Gene Mutation Database (HGMD) and its exploitation in thefields of personalized genomics and molecular evolution. Curr ProtocBioinformatics 2012: Chapter 1: Unit 1.13.
23. Andreasen C, Nielsen JB, Refsgaard L et al. New population-based exome data are questioning the pathogenicity of previouslycardiomyopathy-associated genetic variants. Eur J Hum Genet, 2013:21: 918–928.
24. Bell CJ, Dinwiddie DL, Miller NA et al. Carrier testing for severechildhood recessive diseases by next-generation sequencing. Sci TranslMed 2011: 3: 65ra4.
25. Xue Y, Chen Y, Ayub Q et al. Deleterious- and disease-alleleprevalence in healthy individuals: insights from current predictions,mutation databases, and population-scale resequencing. Am J HumGenet 2012: 91: 1022–1032.
26. Hunt KA, Smyth DJ, Balschun T et al. Rare and functional SIAEvariants are not associated with autoimmune disease risk in up to 66,924individuals of European ancestry. Nat Genet 2012: 44: 3–5.
27. Kenna KP, McLaughlin RL, Hardiman O, Bradley DG. Using referencedatabases of genetic variation to evaluate the potential pathogenicity ofcandidate disease variants. Hum Mutat 2013: 34: 836–841.
28. Maron BJ. Hypertrophic cardiomyopathy: a systematic review. JAMA2002: 287: 1308–1320.
29. Gasparini P, Rabionet R, Barbujani G et al. High carrier frequencyof the 35delG deafness mutation in European populations. GeneticAnalysis Consortium of GJB2 35delG. Eur J Hum Genet 2000: 8:19–23.
30. 1000 Genomes Project Consortium, Abecasis GR, Altshuler D et al.A map of human genome variation from population-scale sequencing.Nature 2010: 467: 1061–1073.
31. Marth GT, Yu F, Indap AR et al. The functional spectrum of low-frequency coding variation. Genome Biol 2011: 12: R84.
32. O’Roak BJ, Deriziotis P, Lee C et al. Exome sequencing in sporadicautism spectrum disorders identifies severe de novo mutations. NatGenet 2011: 43: 585–589.
33. Kong A, Frigge ML, Masson G et al. Rate of de novo mutations and theimportance of father’s age to disease risk. Nature 2012: 488: 471–475.
34. Caleshu C, Day S, Rehm HL, Baxter S. Use and interpreta-tion of genetic tests in cardiovascular genetics. Heart 2010: 96:1669–1675.
35. Rohlfs EM, Zhou Z, Sugarman EA et al. The I148T CFTR alleleoccurs on multiple haplotypes: a complex allele is associated with cysticfibrosis. Genet Med 2002: 4: 319–323.
36. Buyse IM, McCarthy SE, Lurix P et al. Use of MALDI-TOF massspectrometry in a 51-mutation test for cystic fibrosis: evidence that3199del6 is a disease-causing mutation. Genet Med 2004: 6: 426–430.
37. Mann SA, Castro ML, Ohanian M et al. R222Q SCN5A mutation isassociated with reversible ventricular ectopy and dilated cardiomyopa-thy. J Am Coll Cardiol 2012: 60: 1566–1573.
38. Reed FA, Akey JM, Aquadro CF. Fitting background-selectionpredictions to levels of nucleotide variation and divergence along thehuman autosomes. Genome Res 2005: 15: 1211–1221.
39. Jordan DM, Kiezun A, Baxter SM et al. Development and validationof a computational method for assessment of missense variantsin hypertrophic cardiomyopathy. Am J Hum Genet 2011: 88:183–192.
40. Crockett DK, Lyon E, Williams MS, Narus SP, Facelli JC, MitchellJA. Utility of gene-specific algorithms for predicting pathogenicityof uncertain gene variants. J Am Med Inform Assoc 2012: 19:207–211.
41. Collin RW, de Heer AM, Oostrik J et al. Mid-frequency DFNA8/12hearing loss caused by a synonymous TECTA mutation that affects anexonic splice enhancer. Eur J Hum Genet 2008: 16: 1430–1436.
42. Tavtigian SV, Deffenbaugh AM, Yin L et al. Comprehensive statisticalstudy of 452 BRCA1 missense substitutions with classification of eightrecurrent substitutions as neutral. J Med Genet 2006: 43: 295–305.
43. Sunyaev S, Lathe W 3rd, Bork P. Integration of genome data andprotein structures: prediction of protein folds, protein interactions and”molecular phenotypes” of single nucleotide polymorphisms. Curr OpinStruct Biol 2001: 11: 125–130.
44. Ng PC, Henikoff S. SIFT: predicting amino acid changes that affectprotein function. Nucleic Acids Res 2003: 31: 3812–3814.
45. Kent WJ, Sugnet CW, Furey TS et al. The human genome browser atUCSC. Genome Res. 2002: 12: 996–1006.
463