
J Lab Med 2018; aop

Molecular-Genetic and Cytogenetic Diagnosis Edited by: H.G. Klein

Yasemin Dincer*, Julian Schulz, Sandra Wilson, Christoph Marschall, Monika Y. Cohen, Volker Mall, Hanns-Georg Klein and Sebastian H. Eck

Multiple Integration and Data Annotation Study (MIDAS): improving next-generation sequencing data analysis by genotype-phenotype correlations

https://doi.org/10.1515/labmed-2017-0072
Received June 12, 2017; accepted November 14, 2017

Abstract: Next-generation sequencing (NGS) technologies in clinical diagnostics open vast opportunities through the ability to sequence all genes simultaneously at a cost and speed that is superior to traditional sequencing approaches. On the other hand, the practical implementation of NGS in routine diagnostics involves a variety of challenges, which need to be overcome. Among these are the generation, analysis and storage of large amounts of data, strict control of sequencing performance, validation of results, interpretation of detected variants and reporting. Here, we outline the Multiple Integration and Data Annotation Study (MIDAS), an approach for data integration in clinical diagnostics based on genotype-phenotype correlations. MIDAS aims to accelerate NGS data analysis and to enhance the validity of the results by computer-based variant prioritization using the clinical data of the patient. In this context, we present the MIDAS case reports of one patient with intellectual disability caused by a novel de novo loss-of-function variant in the GATAD2B gene [NM_020699.3: c.1426G>T (p.Glu476*)] identified by trio whole-exome sequencing, as well as two cardiac disease patients with severe phenotype and multiple variants in genes linked to cardiac arrhythmogenic disorders analyzed with multi-gene panel sequencing. Based on the data collected in the MIDAS cohort, the MIDAS software will be tested and optimized. Moreover, the MIDAS software concept can be extended modularly to include further data resources for improved data handling and interpretation in the broad field of diagnostics.

Keywords: cardiac arrhythmogenic disorders; clinical diagnostics; genotype-phenotype correlations; GATAD2B; intellectual disability; next-generation sequencing; rare diseases; whole-exome sequencing.

Introduction

Motivation for MIDAS

The diagnosis of rare genetic disorders is still a challenge for physicians today (see also the NAMSE-Initiative of the German Government, www.namse.de). Rare disorders often manifest during infancy or early childhood, yet are difficult to discern due to phenotypic variability and masking by auxiliary symptoms. Many affected individuals and support groups, who are organized in Germany in the umbrella organization ACHSE (www.achse-online.de), state that their disorder is under-diagnosed and recognized far too late. Many patients and parents of affected children have endured a diagnostic odyssey before the correct diagnosis could be obtained. In recent years, new highly parallel DNA sequencing approaches have enabled a broader, faster and more precise way to identify genetic variants [1]. These new technologies, subsumed under the term next-generation sequencing (NGS), have since achieved a level of quality sufficient to justify their application in a diagnostic setting. Today’s challenges of NGS approaches are no longer the technical feasibility or costs but the evaluation of the results and their medical interpretation. Because NGS approaches generate a rapidly increasing amount of genetic data in diagnostics, the integration of phenotype and genotype data becomes more and more important for data interpretation and the generation of conclusive genetic reports. Therefore, the development of more powerful computational algorithms is necessary to ensure reliable and accurate bioinformatics filtering and prioritization of results, which is the aim of the Multiple Integration and Data Annotation Study. MIDAS focuses on the improvement of NGS data analysis for diagnostic purposes through integration of various patient data. In this context, the familial medical history of the patient and the well-documented medical examination (standardized recording of phenotype traits), the diagnostic focus (suspected disorder, associated genes) and the completeness of issued tests (coverage, diagnostic gaps) as well as the classification of identified variants (class 1–5, according to Richards et al. [2]) are of prime importance. The MIDAS software aims to accelerate the evaluation of NGS results and to enhance the validity of the results by computer-based variant prioritization based on the patient data. Phenotype and genotype data of the patients collected within MIDAS will be used to test and optimize the software.

*Correspondence: Yasemin Dincer, Zentrum für Humangenetik und Laboratoriumsdiagnostik (MVZ) Dr. Klein, Dr. Rost und Kollegen, Martinsried, Germany, Phone: +49898955780, E-Mail: [email protected]; and Lehrstuhl für Sozialpädiatrie, Technische Universität München, Munich, Germany
Julian Schulz, Sandra Wilson, Christoph Marschall, Monika Y. Cohen, Hanns-Georg Klein and Sebastian H. Eck: Zentrum für Humangenetik und Laboratoriumsdiagnostik (MVZ) Dr. Klein, Dr. Rost und Kollegen, Martinsried, Germany
Volker Mall: Lehrstuhl für Sozialpädiatrie, Technische Universität München, Munich, Germany

MIDAS patient cohort and sequencing methods

All patients were evaluated by an expert committee of clinical geneticists and informed extensively about potential results and incidental findings. Written informed consent was provided by each participant or legal guardian. The study was approved by the Ethics Committee (Ethik-Kommission der Bayerischen Landesärztekammer). Before enrollment, the patients had undergone routine diagnostic testing. We enrolled patients with cardiac arrhythmogenic disorders, who were analyzed by multi-gene panel sequencing, as well as children with unexplained syndromic intellectual disability and their unaffected parents (trios), who were analyzed by whole-exome sequencing (WES). NGS enrichment and library preparation, sequencing, data analysis and evaluation of results were performed in an accredited lab following state-of-the-art guidelines and standards. Identified variants were confirmed by Sanger sequencing. Additionally, the phenotypic features of all patients were collected in a standardized way, using the Human Phenotype Ontology (HPO, [3]). The phenotype and genotype data recorded and generated during MIDAS will be used for the development and optimization of the MIDAS software.

MIDAS software implementation and data model

The MIDAS software is implemented in Java (Oracle Corporation, CA, USA); the graphical user interface is realized through JavaFX. It relies on a central MySQL database to store and integrate all relevant information. The implementation follows a modular approach, with each core module performing specific tasks. The NGS module is used to incorporate the results of NGS data analysis into the database using the standard variant call format (.vcf). Additionally, the NGS module allows configuration of gene panels to restrict analysis to genes relevant for the patient’s phenotype. The phenotypic features of each patient are entered via the HPO module. This module will allow the identification of patients with similar phenotypes and the correlation of these data with the identified genotypes. Results from additional diagnostic tests like Sanger sequencing of candidate genes, identification of insertions and deletions by MLPA or structural variations by genome-wide array comparative genomic hybridization (array CGH) analysis are integrated into the database by pipeline-specific parsers. Lastly, core patient data like name, date of birth and diagnostic history are made available by the LIMS module. This allows a comprehensive overview of all relevant data of a patient through a single software interface (Figure 1).
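The gene-panel restriction described for the NGS module can be illustrated with a minimal sketch. It is written in Python for brevity and is not the MIDAS implementation (which the text states is in Java). It assumes that each VCF record carries a hypothetical `GENE=<symbol>` entry in its INFO column; the class and function names, gene symbols and coordinates are placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str


def parse_vcf_line(line: str) -> Variant:
    """Parse one tab-separated VCF data line into a Variant.

    Assumption: the INFO column contains a hypothetical GENE=<symbol> entry.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    # INFO is a semicolon-separated list; keep only key=value items.
    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    return Variant(chrom, int(pos), ref, alt, info_map.get("GENE", ""))


def restrict_to_panel(vcf_lines, panel):
    """Yield only variants whose gene symbol is part of the configured panel."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        variant = parse_vcf_line(line)
        if variant.gene in panel:
            yield variant


# Placeholder records, not real patient data.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tG\tT\t50\tPASS\tGENE=GENE_A;DP=40",
    "2\t2000\t.\tC\tA\t60\tPASS\tGENE=GENE_B;DP=35",
]

panel_hits = list(restrict_to_panel(EXAMPLE_VCF, panel={"GENE_A"}))
```

In practice, gene annotation usually comes from a dedicated annotation step rather than a plain `GENE` key, so the parsing would need to be adapted to the annotation format actually in use.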

Case report: novel GATAD2B loss-of-function variant causes MRD18

As one of the MIDAS patients, we present a 10-year-old boy with a combination of clinical features including delayed motor and speech development, behavioral disorders, macrocephaly and minor morphological anomalies. He is the third child of healthy, non-consanguineous parents with an unremarkable family history. His 14-year-old sister and 12-year-old brother are unaffected.

Figure 1: Modular design of the MIDAS software.


Background: intellectual disability

Intellectual disability (ID) is defined as IQ < 70 and has a prevalence ranging from 1.5 to 2% in Caucasian populations. Severe ID (IQ < 50) has a prevalence of 0.3–0.5% [4, 5]. In many cases, ID manifests not as an isolated disorder but as part of a superordinate complex syndrome with additional co-morbidities like congenital malformations, epilepsy and maladjustments such as autism spectrum disorder. It is assumed that these clinically very heterogeneous disorders have a genetic origin in at least 50% of the cases. Routine diagnostics (karyotyping, targeted sequencing in the case of a suspected disorder) is used to identify relatively common genetic causes like Down syndrome (1 in 600 newborns) or fragile-X syndrome (approx. 1 in 2500 males). It has been shown that after employing routine diagnostics, a large diagnostic gap and many undiscovered causes remain. New genetic tests like microarray technologies, which are able to uncover submicroscopic copy number variants (CNVs), and their application in routine diagnostics during the last decade revealed that approximately 5–20% of isolated or syndromic ID and autism spectrum disorders may be attributed to submicroscopic CNVs [6, 7]. However, even after applying these tests, approximately 60% of cases with ID remain unsolved [8].

Whole-exome sequencing enables simultaneous sequencing of all protein-coding regions of a patient. By WES, a genetic cause can be identified in approximately 27% of the patients with severe ID [4, 9–11]. Gilissen et al. [4] re-analyzed 50 patients with severe ID and negative WES by whole-genome sequencing (WGS). For 42% of these patients, a conclusive diagnosis could be obtained by WGS. However, the current disadvantages of WGS are the very elaborate and time-consuming analysis and the high costs that come with the very large amounts of raw data. Moreover, the lack of guidelines for the analysis and handling of such data, as well as the difficult interpretation of intronic and intergenic regions, creates an obstacle for the implementation of WGS in routine diagnostics. Therefore, WES is currently recommended as a first-tier sequencing test for children with a suspected monogenic disorder (Stark et al. [12]).

Medical history

The boy was born at 42 weeks after a normal conception and pregnancy. Due to a pathological cardiotocograph, forceps were used during birth. With a birthweight of 4270 g, a length of 59 cm and a head circumference of 39 cm, all his growth parameters were above the 90th centile. Ultrasonography on his third day of life and at the age of 6 weeks showed normal results; macrocephaly was recorded. During the first year of life, the boy was a calm child. At the age of 1 + 8/12 years, a slow stato-motoric development was diagnosed and mild cerebellar ataxia was suspected. At the age of 2 + 5/12 years, the diagnosis of cerebellar ataxia, delayed speech development, macrocephaly and dysphagia was confirmed. He walked at 3 years (pes planovalgus was noted). At the age of 3 + 9/12 years, a developmental age of 1 + 5/12 (fine motor skills) to 2 + 8/12 (speech) years was diagnosed. Oppositional behavior was a subject of discussion, and at the age of 4 years he was treated as an inpatient due to a behavioral disorder. Oral and tactile hypersensitivity, open bite and increased salivary flow were reported at the age of 4 + 10/12 years. At the age of 5, the boy had normal height (115 cm) and weight (20.5 kg), but his head circumference was above the 90th centile (56 cm) and progressed to 57.1 cm (>97th centile) by the age of 6. Intellectual disability (IQ = 56) was diagnosed at 6 + 7/12 years. A neuropediatric examination at the age of 7 + 6/12 years showed no indication of cerebellar ataxia.

The craniofacial characteristics of the patient include a prominent forehead, high frontal hairline, triangular face, mild midface retrusion, periorbital fullness, slightly prominent upper jaw and receding chin and slightly low-set ears.

Sequencing results

Karyotyping and array CGH analysis did not reveal any pathogenic variants. WES and trio analysis of the patient and his parents identified a de novo variant NM_020699.3: c.1426G>T (p.Glu476*) in exon 9 of the GATAD2B gene. The pathogenic variant was confirmed by Sanger sequencing. It was absent in the unaffected parents and not listed in population databases of the NCBI, Exome Aggregation Consortium and Genome Aggregation Database (dbSNP, ExAC, gnomAD) or in disease databases such as the Human Gene Mutation Database (HGMD® professional release, Cardiff, UK) or Online Mendelian Inheritance in Man (OMIM®, [13]).
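The core idea behind the trio analysis, keeping variants present in the child but absent in both parents, can be sketched as follows. This is a simplified illustration in Python under stated assumptions, not the pipeline used in the study: genotypes are given as tuples of allele indices (0 = reference), the variant identifiers are placeholders, and quality criteria such as read depth or genotype quality, which a diagnostic pipeline would also apply, are omitted.

```python
def is_de_novo(child_gt, mother_gt, father_gt):
    """Return True if the child carries a non-reference allele seen in neither parent.

    Genotypes are tuples of allele indices, e.g. (0, 1) for a heterozygous call.
    """
    parental_alleles = set(mother_gt) | set(father_gt)
    return any(allele != 0 and allele not in parental_alleles for allele in child_gt)


# Hypothetical trio calls for three placeholder variants.
TRIO_CALLS = [
    {"id": "var1", "child": (0, 1), "mother": (0, 0), "father": (0, 0)},  # de novo
    {"id": "var2", "child": (0, 1), "mother": (0, 1), "father": (0, 0)},  # inherited
    {"id": "var3", "child": (1, 1), "mother": (0, 1), "father": (0, 1)},  # recessive pattern
]

de_novo_ids = [
    call["id"]
    for call in TRIO_CALLS
    if is_de_novo(call["child"], call["mother"], call["father"])
]
```

Only `var1` passes the filter here; candidates found this way still require confirmation, which the study performed by Sanger sequencing.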

Diagnosis

GATA zinc finger domain-containing 2B (GATAD2B, location: 1q21.3, OMIM ID: #614998) encodes p66beta, a subunit of the methyl-CpG-binding protein-1 complex (MeCP1), which represses gene transcription by remodeling and deacetylation of methylated nucleosomes [14, 15]. Heterozygous loss-of-function variants of GATAD2B have recently been shown to cause a recognizable syndrome with ID called Mental Retardation, Autosomal Dominant 18 (MRD18; OMIM ID: #615074). To date, eight patients have been shown to have a severely disruptive de novo variant in the coding sequence of GATAD2B (Figure 2; [9, 17–19]) and one patient was reported with an acceptor splice-site variant in intron seven of GATAD2B [20]. Moreover, nine patients were reported with 1q21.3 microdeletions including the entire GATAD2B gene [21, 22].

The rare autosomal dominant disorder has its onset in infancy. The phenotype of patients with MRD18 is characterized by neurological features (delayed psychomotor and speech development, ID), hypotonia, long fingers/toes and craniofacial features (Table 1). Behavioral psychiatric manifestations (such as hyperactivity, tics or easy frustration) have been described in some patients.

Taken together, by considering the patient’s phenotype, we were able to identify a novel pathogenic de novo variant in the GATAD2B gene by WES and to diagnose the rare disease MRD18 in our patient. However, the manual inspection and evaluation of variants remaining after the filtering process (in this trio case: 210 variants) is still a time-consuming process. The MIDAS software aims to accelerate the evaluation of future WES (or WGS) data by automated variant prioritization based on the patient’s phenotype. Thereby, the validity of genetic analysis results will be enhanced and faster diagnosis in patient care will be enabled. Moreover, the identification of candidate genes in MIDAS patients also provides further clinical information about patients with rare ID syndromes and thereby contributes to the specification of the associated phenotypes. Standardized recording of phenotypic traits can thereby increase the knowledge about rare diseases.
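One simple way to picture phenotype-based prioritization is to rank candidate genes by the overlap between the patient's recorded phenotype terms and the terms annotated to each gene. The Python sketch below uses the Jaccard index for this; it is an illustrative assumption and not the scoring method of the MIDAS software, which the text does not specify. Gene names and annotations are placeholders, and plain-text labels stand in for HPO term identifiers.

```python
def jaccard(terms_a, terms_b):
    """Jaccard index of two term sets: |intersection| / |union| (0.0 if both are empty)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_genes(patient_terms, gene_terms):
    """Return (gene, score) pairs sorted by descending score, then by gene name."""
    scored = [(gene, jaccard(patient_terms, terms)) for gene, terms in gene_terms.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Placeholder data: labels stand in for HPO terms, genes are hypothetical.
PATIENT_TERMS = {"intellectual disability", "delayed speech development", "macrocephaly"}
GENE_TERMS = {
    "GENE_A": {"intellectual disability", "delayed speech development", "hypotonia"},
    "GENE_B": {"cardiomyopathy"},
    "GENE_C": {"macrocephaly"},
}

ranking = rank_genes(PATIENT_TERMS, GENE_TERMS)
```

With these toy annotations, `GENE_A` ranks first, so variants in that gene would be reviewed before the others. Ontology-aware similarity measures that exploit the HPO hierarchy would be a natural refinement of this flat set comparison.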

Figure 2: GATAD2B gene structure and functional domains (CR, conserved region; CR1, MBD2- and MBD3-binding; CR2, histone tail binding; [16]). All single nucleotide variants (SNVs) and small insertions/deletions listed in the HGMD® are marked, as well as the variant found in our patient.

Table 1: Craniofacial features of patients with MRD18 (OMIM ID: #615074).

Face: Broad forehead; Short philtrum
Eyes: Strabismus; Hypermetropia; Narrow palpebral fissures; Hypertelorism; Deep-set eyes
Nose: Tubular nose; Broad nasal bridge
Mouth: Broad mouth; Thin upper lip

Case reports: two patients with severe phenotype and multiple variants in genes linked to cardiac arrhythmogenic diseases

Here we present two patients with diagnosed or suspected cardiac arrhythmogenic disorders who were analyzed within MIDAS.

First cardiac case: medical history

The first patient was a 10-year-old boy, who suffered sudden cardiac death (SCD). The boy’s uncle had previously suffered SCD at the age of 43. The autopsy confirmed hypertrophic cardiomyopathy.

Background: hypertrophic cardiomyopathy

Hypertrophic cardiomyopathy (HCM) is the most common familial heart disease with vast genetic heterogeneity. HCM is an autosomal dominant, structural disease of the heart muscle with a prevalence of approximately 1 in 500 in Caucasian populations [23]. The disease is usually associated with an asymmetrical increase in muscle mass of the left ventricle with the interventricular septum being involved. Characteristic changes in the ECG (Q wave, ST segment, P wave) are the consequences. The phenotype ranges from benign forms to forms with reduced penetrance and malignant forms with a high risk of SCD occurring as early as childhood [24]. The average life expectancy is 66 years; however, the prognosis depends on the underlying molecular cause. Penetrance of left ventricular hypertrophy is 95% after 55 years. Five to seven percent of the patients carry multiple variants and may have more severe or early disease expression. More than 1000 pathogenic variants have been identified in 15 HCM-associated genes so far. Most of these genes encode cardiac sarcomeric proteins. Current routine diagnostics enable detection of pathogenic variants in approximately 60% of all HCM cases. About 90% of all pathogenic variants are located in the genes for the β-myosin heavy chain (MYH7), the myosin binding protein C (MYBPC3), troponin T (TNNT2) and troponin I (TNNI3) [25].

Sequencing results

Sequencing of HCM-associated genes in MIDAS resulted in the identification of a known pathogenic variant in the MYBPC3 gene and the identification of a variant of unknown significance (VUS) in the ACTN2 gene, both in heterozygous state (Table 2). The identified variant in the MYBPC3 gene is extensively described in patients with HCM and classified as pathogenic [26–28]. The second variant in the HCM-associated ACTN2 gene may have contributed to the severe phenotype in our patient. The rare variant leads to a non-synonymous substitution of glutamine to arginine at amino acid position 460 in the spectrin domain of α-actinin 2. Glutamine at position 460 is highly conserved and this variant is very rare (absent in 138,000 individuals, [29]), thus potentially pathogenic. Currently, up to 30 different variants in the ACTN2 gene are described to be associated with either HCM or dilated cardiomyopathy (DCM). The variant present in our patient has not been reported previously. Functional prediction, using the in silico prediction programs PolyPhen2 [30], MutationTaster [31] and SIFT [32], rated the variant as deleterious. Following the standard classification guidelines of Richards et al. [2], this variant was classified as VUS.

Second cardiac case: medical history

The second patient is a 7-year-old boy with an extreme QTc prolongation, bizarre T waves and a biphasic U wave. The boy’s father suffered from a sudden unexpected death (SUD) at the age of 20.

Background: long QT syndrome

Long QT syndrome (LQTS), a clinically and genetically heterogeneous cardiac disease, is characterized by prolonged ventricular repolarization. Long-term ECG shows prolonged frequency-adjusted QT intervals (QTc) of 440 to >500 ms. Arrhythmia, which may lead to unresponsiveness and SCD, occurs depending on the QTc [33]. If the disease remains untreated, the 10-year survival rate is 50%. There are two types of LQTS: the common autosomal dominant Romano-Ward syndrome (RW) and the very rare autosomal recessive Jervell and Lange-Nielsen syndrome (JLN). The prevalence of LQTS among the Caucasian population is at least 1 in 2500. In approximately 75% of all clinically confirmed cases, causative variants can be detected in one of the five myocardial ion channel genes. They encode one sodium channel (SCN5A, LQT3) as well as four potassium channels responsible for repolarization: KCNQ1 (LQT1), KCNH2 (LQT2), KCNE1 (LQT5), KCNE2 (LQT6). Furthermore, in rare special forms of LQTS, pathogenic variants were found in other genes: ANK2 (LQT4), KCNJ2 (LQT7), CAV3 (LQT9), SCN4B (LQT10), KCNE3, SNTA1 (LQT12). These forms are also described with highly complex phenotypes [34]. The identification of carriers of causative variants allows early, possibly pre-symptomatic treatment. In this way, the risk of cardiac events can be reduced by 62–95% in LQTS type 1 and by 74% in LQTS type 2.

Table 2: Variants identified in our patients with arrhythmogenic diseases.

Patient   Gene   Variant   Zygosity   Classification
1   MYBPC3   NM_000256.3:c.821+1G>A (IVS7+1G>A, Intron 7)   Heterozygous   Pathogenic variant
1   ACTN2   NM_001103.3:c.1379A>G (p.Gln460Arg)   Heterozygous   VUS
2   KCNJ2   NM_000891.2:c.436G>A (p.Gly146Ser)   Heterozygous   Pathogenic variant
2   KCNH2   NM_000238.3:c.2768C>T (p.Pro923Leu)   Heterozygous   VUS
2   PRKAG2   NM_016203.3:c.298G>A (p.Gly100Ser)   Heterozygous   VUS
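The frequency-adjusted QT interval (QTc) referred to in the long QT background is obtained by correcting the measured QT interval for heart rate. The text does not state which correction was used; the Python sketch below assumes Bazett's formula, QTc = QT / sqrt(RR), with QT in milliseconds and the RR interval in seconds, which is one commonly used option. The function name and the example values are illustrative.

```python
import math


def qtc_bazett(qt_ms: float, rr_s: float) -> float:
    """Heart-rate-corrected QT interval (ms) using Bazett's formula.

    qt_ms: measured QT interval in milliseconds
    rr_s:  RR interval in seconds (60 / heart rate in beats per minute)
    """
    if rr_s <= 0:
        raise ValueError("RR interval must be positive")
    return qt_ms / math.sqrt(rr_s)


# At 60 beats per minute (RR = 1.0 s) the correction leaves QT unchanged.
qtc_at_rest = qtc_bazett(400.0, 1.0)

# Illustrative values: the same QT at a faster rate (RR = 0.64 s) gives a longer QTc.
qtc_fast = qtc_bazett(400.0, 0.64)
```

For the second example the corrected value is approximately 500 ms, that is, at the upper end of the range quoted above, although the uncorrected QT is only 400 ms.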

Sequencing results

Sequencing of LQTS-associated genes resulted in the detection of three rare variants in our patient (Table 2). The first variant in the KCNJ2 gene is a known pathogenic variant associated with the autosomal dominant Andersen-Tawil syndrome (ATS, also known as LQT type 7). It leads to a non-synonymous substitution of glycine to serine at amino acid position 146 and was previously described as causative for ATS [35]. In addition, a rare VUS was detected in the KCNH2 gene, leading to the substitution of proline to leucine in the HERG protein. This variant was not previously described, yet comparable variants in the KCNH2 gene are a frequent cause of long QT syndrome. Proline at this position is conserved in homologous mammalian proteins. Functional prediction, using the in silico tools PolyPhen2, MutationTaster and SIFT, yielded contradictory results. This variant was thus classified as VUS. A second rare VUS was detected in the PRKAG2 gene, leading to a substitution of glycine by serine at amino acid position 100. For this variant, co-segregation with ventricular pre-excitation, conduction defects and HCM was shown in a Chinese family [36]. This variant seems to lead to a decreased expression of PRKAG2. The variant has an occurrence of approximately 1:15 in Asian populations [29], so it might constitute a low-penetrance risk factor.
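The distinction drawn above between concordant and contradictory in silico predictions can be made explicit with a small helper. The Python sketch below is illustrative only: it assumes that each tool's output has already been mapped to one of two simplified labels, "deleterious" or "benign" (the tools themselves use their own vocabularies and scores), and the individual calls in the second example are hypothetical, since the text does not report them per tool.

```python
def summarize_predictions(predictions):
    """Summarize per-tool calls as concordant or conflicting.

    predictions: mapping of tool name -> "deleterious" or "benign"
    (simplified labels assumed for this sketch).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    calls = set(predictions.values())
    if calls == {"deleterious"}:
        return "concordant deleterious"
    if calls == {"benign"}:
        return "concordant benign"
    return "conflicting"


# All three tools agree, as reported for the ACTN2 variant.
concordant = summarize_predictions(
    {"PolyPhen2": "deleterious", "MutationTaster": "deleterious", "SIFT": "deleterious"}
)

# Hypothetical per-tool calls illustrating a contradictory result.
conflicting = summarize_predictions(
    {"PolyPhen2": "benign", "MutationTaster": "deleterious", "SIFT": "deleterious"}
)
```

Such a summary is only one piece of supporting evidence within the classification framework of Richards et al. [2]; on its own it does not determine the class of a variant, as both VUS classifications in this section show.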

These cases exemplify the difficulties in variant classification. It could be hypothesized that the severe phenotypes of the patients are caused by a combination of the identified variants. According to guidelines and frameworks for variant classification [2], additional variants are currently only classified as VUS. For further elucidation of potential modifier effects, it may become necessary to compare sets of phenotypic features with sets of variants in a standardized way to identify possible correlations. Therefore, these data will be stored in the course of MIDAS in a comprehensive knowledge base that can be accessed for analyzing future cases with a similar phenotype.

Discussion – MIDAS concept and outlook

NGS approaches such as multi-gene panel sequencing or WES have massively accelerated the diagnosis and characterization of rare genetic diseases. However, due to the large amount of genetic data, NGS approaches are still limited by the time-consuming human interpretation of the genetic variants. Therefore, powerful computational algorithms are necessary to improve data handling in diagnostics and to accelerate the NGS data interpretation.

The main concept of the MIDAS software is the incorporation of all available clinical data of the patient in a standardized way, so that it is applicable for the NGS data analysis. Often, patients suffering from complex genetic syndromes, such as intellectual disability combined with morphological features and/or behavioral abnormalities, have to endure a diagnostic odyssey. Starting with karyotyping and array CGH analysis, up to the sequencing of single candidate genes for suspected syndromes, it can take several years until a definite diagnosis is obtained. On the other hand, human geneticists and physicians are tasked with the evaluation of all these tests, whose results are often only available in specialized analysis software. For the interpretation, it is necessary to combine different diagnostic results, which can be very time-consuming and costly. The MIDAS software aims to mitigate this problem by integrating different diagnostic results and linking this information directly to the phenotype of the individual patient. These genotype-phenotype correlations will enable a computer-based variant prioritization and thereby accelerate the identification of disease-causing variants in NGS data analysis. For the development and optimization of the MIDAS software, phenotype and genotype data of the MIDAS cohort patients will be used.

Figure 3: MIDAS data integration concept.

The necessity of data integration goes well beyond genetic diagnostics. For the diagnosis of many disorders, it would be hugely beneficial to integrate and correlate genetic data with further resources like imaging (nuclear magnetic resonance scans) or biochemical tests to support or exclude certain diagnoses. The MIDAS concept can be modularly extended to include further data resources (Figure 3) and provides the opportunity to improve data handling and interpretation in diagnostics.
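The idea of phenotype-driven variant prioritization can be illustrated with a minimal sketch. It is not the MIDAS algorithm, which is not specified here; it only assumes that the patient's phenotype and each gene's known phenotype are encoded as sets of Human Phenotype Ontology (HPO) term identifiers and ranks candidate variants by the Jaccard overlap of those sets. The names `Variant`, `phenotype_score` and `prioritize`, the gene label `GENE_X`, and the gene-to-term annotations are hypothetical and chosen for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A candidate variant from NGS analysis (hypothetical minimal record)."""
    gene: str
    hgvs: str


def phenotype_score(patient_terms: set[str], gene_terms: set[str]) -> float:
    """Jaccard overlap between the patient's and the gene's HPO term sets."""
    if not patient_terms or not gene_terms:
        return 0.0
    shared = patient_terms & gene_terms
    combined = patient_terms | gene_terms
    return len(shared) / len(combined)


def prioritize(
    variants: list[Variant],
    patient_terms: set[str],
    gene_annotations: dict[str, set[str]],
) -> list[tuple[float, Variant]]:
    """Return (score, variant) pairs, best phenotype match first."""
    scored = [
        (phenotype_score(patient_terms, gene_annotations.get(v.gene, set())), v)
        for v in variants
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Patient phenotype as HPO terms: intellectual disability, delayed speech
# and language development, hypotonia.
patient_terms = {"HP:0001249", "HP:0000750", "HP:0001252"}

# Illustrative annotations only (an assumption, not curated HPO data).
gene_annotations = {
    "GATAD2B": {"HP:0001249", "HP:0000750", "HP:0001252", "HP:0000486"},
    "GENE_X": {"HP:0001639"},  # hypothetical gene with a cardiac phenotype
}

variants = [
    Variant("GENE_X", "placeholder variant"),
    Variant("GATAD2B", "NM_020699.3:c.1426G>T"),
]

ranked = prioritize(variants, patient_terms, gene_annotations)
for score, variant in ranked:
    print(f"{variant.gene}\t{variant.hgvs}\t{score:.2f}")
```

In this toy input the GATAD2B variant ranks first because three of the four terms in the combined set are shared with the patient, whereas the hypothetical cardiac gene shares none. Further data resources, such as imaging or biochemical findings, could in principle be added as additional scoring functions and combined with the phenotype score.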

Acknowledgments: We thank the patients and their families for their participation in the study. We thank S. Pfalzer, V. Hasselbacher, S. Eilitz, and S. Lippert for excellent technical support and M. Ziegler for critical reading of the manuscript. A consortium consisting of the Center for Human Genetics and Laboratory Diagnostics – Dr. Klein, Dr. Rost and Colleagues, Genomatix GmbH, kbo – Kliniken des Bezirks Oberbayern and IMGM Laboratories was founded to develop the MIDAS software system for data integration in diagnostics. The MIDAS project is supported by a grant of the Bavarian Ministry of Economic Affairs and Media, Energy and Technology (Grant Number: “MED-1603-0011”).

References

1. Klein HG, Rost I. Moderne genetische Analysemethoden. Bundesgesundheitsblatt 2015;58:113–20.

2. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015;17:405–24.

3. Köhler S, Doelken SC, Mungall CJ, Bauer S, Firth HV, Bailleul-Forestier I, et al. The human phenotype ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res 2014;42(Database issue):D966–74.

4. Gilissen C, Hehir-Kwa JY, Thung DT, van de Vorst M, van Bon BW, Willemsen MH, et al. Genome sequencing identifies major causes of severe intellectual disability. Nature 2014;511:344–7.

5. Leonard H, Wen X. The epidemiology of mental retardation: challenges and opportunities in the new millennium. Ment Retard Dev Disabil Res Rev 2002;8:117–34.

6. Girirajan S, Rosenfeld JA, Coe BP, Parikh S, Friedman N, Goldstein A, et al. Phenotypic heterogeneity of genomic disorders and rare copy-number variants. N Engl J Med 2012;367:1321–31.

7. Vissers LE, de Ligt J, Gilissen C, Janssen I, Steehouwer M, de Vries P, et al. A de novo paradigm for mental retardation. Nat Genet 2010;42:1109–12.

8. Rauch A, Hoyer J, Guth S, Zweier C, Kraus C, Becker C, et al. Diagnostic yield of various genetic approaches in patients with unexplained developmental delay or mental retardation. Am J Med Genet A 2006;140:2063–74.

9. de Ligt J, Willemsen MH, van Bon BW, Kleefstra T, Yntema HG, Kroes T, et al. Diagnostic exome sequencing in persons with severe intellectual disability. N Engl J Med 2012;367:1921–9.

10. Rauch A, Wieczorek D, Graf E, Wieland T, Endele S, Schwarzmayr T, et al. Range of genetic mutations associated with severe non-syndromic sporadic intellectual disability: an exome sequencing study. Lancet 2012;380:1674–82.

11. Yang Y, Muzny DM, Reid JG, Bainbridge MN, Willis A, Ward PA, et al. Clinical whole-exome sequencing for the diagnosis of mendelian disorders. N Engl J Med 2013;369:1502–11.

12. Stark Z, Tan TY, Chong B, Brett GR, Yap P, Walsh M, et al. A prospective evaluation of whole-exome sequencing as a first-tier molecular test in infants with suspected monogenic disorders. Genet Med 2016;18:1090–6.

13. Online Mendelian Inheritance in Man® (OMIM). https://www.omim.org/.

14. Feng Q, Zhang Y. The MeCP1 complex represses transcription through preferential binding, remodeling, and deacetylating methylated nucleosomes. Genes Dev 2001;15:827–32.

15. Brackertz M, Gong Z, Leers J, Renkawitz R. p66alpha and p66beta of the Mi-2/NuRD complex mediate MBD2 and histone interaction. Nucleic Acids Res 2006;34:397–406.

16. UniProt Knowledgebase. http://www.uniprot.org/uniprot/Q8WXI9.

17. Lazaridis KN, Schahl KA, Cousin MA, Babovic-Vuksanovic D, Riegert-Johnson DL, Gavrilova RH, et al. Outcome of whole exome sequencing for diagnostic odyssey cases of an individualized medicine clinic: the Mayo Clinic experience. Mayo Clin Proc 2016;91:297–307.

18. Vanderver A, Simons C, Helman G, Crawford J, Wolf NI, Bernard G, et al. Whole exome sequencing in patients with white matter abnormalities. Ann Neurol 2016;79:1031–7.

19. Posey JE, Harel T, Liu P, Rosenfeld JA, James RA, Coban Akdemir ZH, et al. Resolution of disease phenotypes resulting from multilocus genomic variation. N Engl J Med 2017;376:21–31.

20. Hamdan FF, Srour M, Capo-Chichi JM, Daoud H, Nassif C, Patry L, et al. De novo mutations in moderate or severe intellectual disability. PLoS Genet 2014;10:e1004772.

21. Tim-Aroon T, Jinawath N, Thammachote W, Sinpitak P, Limrungsikul A, Khongkhatithum C, et al. 1q21.3 deletion involving GATAD2B: an emerging recurrent microdeletion syndrome. N Engl J Med 2012;367:1321–31.

22. Willemsen MH, Nijhof B, Fenckova M, Nillesen WM, Bongers EM, Castells-Nobau A, et al. GATAD2B loss-of-function mutations cause a recognisable syndrome with intellectual disability and are associated with learning deficits and synaptic undergrowth in Drosophila. J Med Genet 2013;50:507–14.

23. Maron BJ, Maron MS, Semsarian C. Genetics of hypertrophic cardiomyopathy after 20 years: clinical perspectives. J Am Coll Cardiol 2012;60:705–15.

24. Ho CY, Charron P, Richard P, Girolami F, Van Spaendonck-Zwarts KY, Pinto Y. Genetic advances in sarcomeric cardiomyopathies: state of the art. Cardiovasc Res 2015;105:397–408.


25. Lopes LR, Zekavati A, Syrris P, Hubank M, Giambartolomei C, Dalageorgou C, et al. Genetic complexity in hypertrophic cardiomyopathy revealed by high-throughput sequencing. J Med Genet 2013;50:228–39.

26. Walsh R, Thomson KL, Ware JS, Funke BH, Woodley J, McGuire KJ, et al. Reassessment of Mendelian gene pathogenicity using 7855 cardiomyopathy cases and 60,706 reference samples. Genet Med 2017;19:192–203.

27. Niimura H, Bachinski LL, Sangwatanaroj S, Watkins H, Chudley AE, McKenna W, et al. Mutations in the gene for cardiac myosin-binding protein C and late-onset familial hypertrophic cardiomyopathy. N Engl J Med 1998;338:1248–57.

28. Erdmann J, Raible J, Maki-Abadi J, Hummel M, Hammann J, Wollnik B, et al. Spectrum of clinical phenotypes and gene variants in cardiac myosin-binding protein C mutation carriers with hypertrophic cardiomyopathy. J Am Coll Cardiol 2001;38:322–30.

29. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 2016;536:285–91.

30. PolyPhen2. http://genetics.bwh.harvard.edu/pph2/.

31. Mutation Taster. http://www.mutationtaster.org/.

32. SIFT. http://sift.jcvi.org/.

33. Schulze-Bahr E, Klaassen S, Abdul-Khaliq H, Schunkert H. Gendiagnostik bei kardiovaskulären Erkrankungen – Positionspapier der Deutschen Gesellschaft für Kardiologie (DGK) und der Deutschen Gesellschaft für Pädiatrische Kardiologie (DGPK). Kardiologe 2015;9:213–43.

34. Lieve KV, Williams L, Daly A, Richard G, Bale S, Macaya D, et al. Results of genetic testing in 855 consecutive unrelated patients referred for long QT syndrome in a clinical laboratory. Genet Test Mol Biomarkers 2013;17:553–61.

35. Haruna Y, Kobori A, Makiyama T, Yoshida H, Akao M, Doi T, et al. Genotype-phenotype correlations of KCNJ2 mutations in Japanese patients with Andersen-Tawil syndrome. Hum Mutat 2007;28:208.

36. Zhang BL, Xu RL, Zhang J, Zhao XX, Wu H, Ma LP, et al. Identification and functional analysis of a novel PRKAG2 mutation responsible for Chinese PRKAG2 cardiac syndrome reveal an important role of non-CBS domains in regulating the AMPK pathway. J Cardiol 2013;62:241–8.
