Short Read (Next-gen) Sequencing: A Tutorial with...

28
10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen) Sequencing: A Tutorial with Cardiomyopathy Diagnostics as an Exemplar Running title: Punetha et al.; Next-gen sequencing in cardiomyopathy diagnostics Jaya Punetha, MS 1,2 and Eric P. Hoffman, PhD 1,2 1 Dept of Integrative Systems Biology, The George Washington University School of Medicine; 2 Center for Genetic Medicine Research, Children's National Medical Center, Washington DC Correspondence: Eric P Hoffman, PhD Research Center for Genetic Medicine Children's National Medical Center 111 Michigan Ave, NW Washington DC 20010 Phone: 202-476-6011 Fax: 202-476-6014 E-mail: [email protected] Journal Subject Codes: [33] Other diagnostic testing, [11] Other heart failure Key words: next-gen sequencing, titin, dilated cardiomyopathy, exome, cardiomyopathy, genetics, human D 1 1 1 1,2 2 2 2 n d or Genetic Medicine Research, Children's National Medical Center, Washington ndence: nt te t g g grative Sy S S Syst st t stem e ems s s B B B B io io iolo lo logy gy gy, Th Th Th The e e e Ge Ge Geor or o o ge W Was shi hi i hing ng n ngto t n n n n Un Un Univ iv ive e ersity y y ty S S S Sch ch ch choo oo ool l l l of of of M M M Med or r r r G G G Genetic Med ed dicin ne e e Re e e es se s arch, Ch C Child d dre en's s N Nat t tio io o ion n n nal l Me M Medica al C Ce C n n nt n er, Wa Wa Wash h hingt g gton nd d dence: by guest on May 20, 2018 http://circgenetics.ahajournals.org/ Downloaded from

Transcript of Short Read (Next-gen) Sequencing: A Tutorial with...

Page 1: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

1

Short Read (Next-gen) Sequencing: A Tutorial with Cardiomyopathy

Diagnostics as an Exemplar

Running title: Punetha et al.; Next-gen sequencing in cardiomyopathy diagnostics

Jaya Punetha, MS1,2 and Eric P. Hoffman, PhD1,2

1Dept of Integrative Systems Biology, The George Washington University School of Medicine; 2Center for Genetic Medicine Research, Children's National Medical Center, Washington DC

Correspondence:

Eric P Hoffman, PhD

Research Center for Genetic Medicine

Children's National Medical Center

111 Michigan Ave, NW

Washington DC 20010

Phone: 202-476-6011

Fax: 202-476-6014

E-mail: [email protected]

Journal Subject Codes: [33] Other diagnostic testing, [11] Other heart failure

Key words: next-gen sequencing, titin, dilated cardiomyopathy, exome, cardiomyopathy, genetics, human

D1111,2222

n dor Genetic Medicine Research, Children's National Medical Center, Washington

ndence:

nttet gggrative SySSSyststtstemeemsss BBBBioioiololologygygy, ThThThTheeee GeGeGeororoo ge WWasshihiihingngnngtot nnnn UnUnUniviviveeersityyyty SSSSchchchchooooool l ll ofofof MMMMedorrrr GGGGenetic Mededdicinneee Reeeesses arch, ChCChildddreen'ss NNatttioiooionnnnall MeMMedicaal CCeC nnntn er, WaWaWashhhingtggton

ndddence:

by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 2: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

2

Rapid advances in DNA sequencing technologies have made it increasingly cost effective to

obtain accurate and timely large-scale genomic sequence data on individuals (short read

massively parallel, or ‘next generation’ [next-gen]). A next-gen molecular diagnostic approach

that has seen rapid deployment in the clinic over the last year is exome sequencing. Whole

exome sequencing covers all protein coding genes in the genome (~1.1% of genome) and an

exome test for a single patient generates about 6 gigabases (109 bp) of DNA sequence data. A

key challenge facing routine use of next-gen data in patient diagnosis and management is data

interpretation. What sequence variant findings are relevant to diagnosis (pathogenic mutations)?

What sequence variant findings are relevant to clinical care, but not necessarily to patient

diagnosis (clinically actionable incidental data)? What sequence information should be stored,

and where can it be stored? This review provides a tutorial on current approaches to answering

these questions. A recent landmark study showed that application of next-gen sequencing to a

large cohort of idiopathic dilated cardiomyopathy patients found ~27% of patients to show

mutations of the titin gene, the most complex gene in the genome (363 exons). We use titin in

cardiomyopathy as an exemplar for explaining next-gen sequencing approaches and data

interpretation.

Comparing sequencing strategies

Decreasing sequencing costs and broad dissemination of next-gen equipment and expertise are

increasing availability of massively parallel sequencing of patient DNA samples (short read

massively parallel, or next generation [next-gen] sequencing).1,2 Most rapidly expanding is

exome sequencing, where all protein coding sequences (exons) are selected from total genomic

DNA, and selectively sequenced.3 Alternative approaches to next-gen sequencing include

targeted sequencing, and whole genome (complete genome) sequencing. Currently marketed

is (pathogenic muttttatatatatio

ecessarilililily tttto patttientntntnt

clinically actionable incidental data)? What sequence information should be sto

can it be stored? This review provides a tutorial on current approaches to answer

ions. A recent landmark study showed that application of next-gen sequencing to

rt of idiopathic dilated cardiomyopathy patients found ~27% of patients to show

of the titin gene the most comple gene in the genome (363 e ons) We se titin

clininininicacacalllllll yyy acacactionononable incidental data)? WhWhWhWhaat sequence infooormmrmation should be sto

caaaann n n it be storeddd??? Thhhiiisi reveveve iew ww pprovoovideees a tututoriaiaial ononoo curuurrrentt aaapppprorooachehehehesss ttto anssswwer

ions. AAAA rrrreeecennttt llalandnddmamark stututuuddddy shohohoh weweweweddd ththhh tatat aapppppplilililicaaatitititionnn ooooff ff nenexxt-genenenen seqqueuencncininii gg to

rt of idiopapp thic ddddiilii at dded cardididiiomyoyy papp thhhhy yy papp tiiients foundddd ~22227%7%7%% offf f papp tients to show

off thhe tiitiin thhe t lle ii hth ((363633 )s) WWe tiitiin by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 3: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

3

targeted Sanger sequencing panels using traditional individual exon-by-exon sequencing remain

expensive and time consuming, and massively parallel next-gen approaches are beginning to

supplant Sanger sequencing in the clinic (Figure 1).4

Here we compare and contrast the three next-gen approaches to molecular diagnostics:

targeted sequencing panels (TS), whole exome sequencing (WES), and whole genome

sequencing (WGS) (Table 1). Briefly, targeted sequencing (TS) selects ‘candidate genes’ that are

already known to cause the disorder in question, and targets only these for extensive sequence

analyses (~ 200 kb -1 million bp; ~200-2,000 exons). Exome sequencing (WES) pulls out most

exons (coding sequence) of all genes in the genome for sequencing (~30 million bp; ~180,000

exons). Whole genome sequencing (WGS) non-discriminately sequences all 6 billion base pairs

of DNA in a patient, including the large majority of DNA that does not code for proteins and

remains problematic to interpret.

Both targeted next-gen sequencing and whole exome next-gen sequencing have recently

entered the molecular diagnostics workspace, with multiple private and academic labs offering

next-gen sequencing on a clinical, fee-for-service basis.9,10 The costs of generating targeted and

whole exome sequencing data are not very different. If targeted panels covering ~50 genes cost

about the same as whole exome sequencing covering 20,000 genes, then it would seem that

‘more is better’, and the exome will become the standard diagnostic test. However, there are

some technical differences in how the data is generated and interpreted that make the targeted vs.

exome choice less clear cut. The key issue is that targeted sequencing is able to cover 99% of

the ‘candidate genes’, whereas exomes cover only about 90% of the same candidate genes (as

well as 90% of all other genes; in-house sequencing data). This 10% difference in sensitivity is

significant. For example, as described in more detail below, if a patient with dilated

ing (WES)) pulls ooooututuu

30 milililillilililion bbbbp;; ~181818180000,0

hole genome sequencing (WGS) non-discriminately sequences all 6 billion base

a patient, including the large m ority of DNA that does not code for proteins an

o

h

molec lar diagnostics orkspace ith m ltiple pri ate and academic labs offer

holelelele ggenenenomomome sesesequqq encing (WGS) non-disssscrccriminately sequenenenccces all 6 billion base

a a a paaaatient, incluudddingg ttthe llllaara geee mmaaajoorittty of DDNAAA tttthhah tt doooees notott codododde foooor prprprottteinnns an

oblemamamaattticiccic to inini tteterprprprerettt.

h targegg ted next-gggen seqqquencing gg andd d whhhholllle exome next-geg n sequqq encinggg have rec

lle lla didi stiic krk iithh lltiiplle iri te dnd dde imi llabbs ffff by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 4: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

4

cardiomyopathy comes into your clinic and you suspect a titin gene mutation, you may wish to

confirm this suspicion by next-gen sequencing. Ordering a targeted sequencing panel that

includes the titin gene will cover 99% of the 363 exons, and likely rule in or rule out titin as a

cause of the patient’s disorder. On the other hand, sending the same patient out for a whole

exome analysis will miss about 1/10 exons – so for the 363 exon titin gene, about 36 exons will

be missing from the data generated (often called ‘exon drop outs’). Thus, a negative result could

be due to missing data due to drop out, and thus not rule out a titin mutation. The higher

sensitivity of targeted sequencing panels has been successfully shown in the diagnosis of

cardiomyopathies by Meder, Haas et al. 11

In addition to the issue of differential drop out, targeted sequencing and exome

sequencing can differ in terms of accuracy of detection of mutations and copy number variations

(CNVs). Accuracy for both small mutations (single base changes, small deletions/duplications)

and CNVs depend on read depth. ‘Depth’ refers to the number of independent sequence reads

generated for each specific region of the genome tested. A typical exome has a median read-

depth of 50 – each exon shows approximately 50 independent reads. In contrast, targeted re-

sequencing panels may achieve a median read-depth of 500 or more reads per exon (10 times

greater than exomes). In general, the greater the depth, the more accurate the detection of

mutations, and the easier it is to detect CNVs. It is possible to increase the machine time on an

exome sequence to match the depth of targeted sequencing, but this increases machine time

allocated per sample, and thus cost.

Whole complete genome sequencing has not yet entered the molecular diagnostics

workspace. There has been a single company marketing whole genome sequencing, Complete

Genomics, but uptake has been limited. The Complete Genomics method that has been used to

in the diagng osis ooooff f f

a

can differ in terms of accuracy of detection of mutations and copy number varia

A t

depend on read depth. ‘Depth’ refers to the number of independent sequence re

for each specific region of the genome tested A t pical e ome has a median rea

addititititioioioi nnn tototot tttthehhh iiissssue of differential drop ouououtt, targeted sequenenenciccc ng and exome

cccannnn differ in tterrrmss ooof acacaca ccuraaaccy off deeeteectioion ofofof mmmmuttatioiions aannnd cccooopy nununun mbmbmbber vvvaaria

Accurararaacycycyy fffor bbb ttotothhhh smsmala l mumumuutatatatationsss (((s(sininini lglgllee bbabab seee cccchahhahannngeseses, smsm llalall dedededelelleletitititionns/s//ddduduplplliicicat

depppend on read ddedd ptptpthh.h ‘D‘D‘Depppthhh’ reffef rs to hthhe numbbbber off f iiini dded pepp nddddent sequqq ence re

fo hch icififi igi ff hth t tedd AA t ipi ll hha didi by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 5: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

5

generate most whole genomes produces large numbers (120 billion) of short (33 bp) reads –

shorter than other next-gen sequencing platforms.12 Interpretation of pathogenic variants in the

large majority of non-coding sequence (99% of genome) remains difficult, and this results in

relatively little interest in generating this data in the clinical setting. However, as sequencing

technologies continue to rapidly advance, and costs decline, this may change in the not so distant

future.

Variant calling: benign, pathogenic, VOUS, or incidental.

Once next-gen sequencing data is obtained with adequate depth, the bioinformatics process of

interpretation of the data begins in order to define potential mutations, dubbed ‘variant calling’

(Figure 2). Variants are typically grouped into four categories: benign, pathogenic, VOUS

(variant of unknown significance), or incidental findings. Pathogenic variants are mutations –

variants in a gene that are likely or known to cause the patient’s symptoms (Table 2). Benign

polymorphisms are variants that are common in healthy populations, and are not considered

candidates for disease causing variants. Variants of Unknown Significance (VOUS) are

sequence changes that are not seen at high frequency in general populations, but where there is

little or no supportive evidence of the variant being pathogenic. An incidental finding (IF) is

defined as a sequence variant that has potential health or reproductive significance to the patient

being tested but is secondary to the patient’s primary clinical complaint (e.g. a variant known to

be associated with drug sensitivity, but unrelated to the complaint of cardiac symptoms)13 (Table

2).

There are a growing number of software tools and database resources that greatly aid in

variant calling and variant filtering, and these are increasingly adept at predicting which variant

has the highest probability of causing disease. Predictive software like Mutation Taster14,

ioinformatics proccccesesesess

dubbbbbbbedddd ‘‘‘‘variiiiant cacacacallllli

S

unknown significance or incidental findings. Pathogenic variants are mutation

a gene that are likely or known to cause the patient’s symptoms (Table 2). Beni

isms are variants that are common in healthy populations, and are not considered

for disease ca sing ariants Variants of Unkno n Significance (VOUS) are

VaVaVariririananantststs aaare tttyyypically grouped into fouuurrr ccategories: benigngngn, pathogenic, VOUS

ununnnkkknown signniffficaanccce),), ooor innnncicicideenntalll ffinddinngs.s PaPaPattht ogeenicc vvvarrriaiaiaants ararara e mumm taaatiiiion

a genenene tttthahahah t arree lilililikekek llyly or knknknowowowown tooo cccauauuusee ttthhehe patatatieieieientntnt’’’’s sssymymymymptpttomoms (T(T(T(Tabababblee 222)))). BeBeBB nin

isms are variants thhhah t are common in hheh llalthhhhy yy popp pupp llal itions,,, a ddndd are not considered

fff ddiis isi iiant VVa iri ts ff UUnkkn SiSi ififiic ((VOVOUSUS)) by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 6: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

6

Polyphen-215, LRT 16 (likelihood ratio test of codon constraint) and SIFT17 take into account

evolutionary conservation and amino acid substitution to predict the probability of the mutation

being disease causing or benign. Different functional prediction tools employ different methods

to assign likelihood of functional significance with each tool having its own strengths and

weaknesses. The database of non-synonymous functional prediction (dbNSFP) 18 includes

functional prediction scores from four predictive software (MutationTaster, Polyphen-2, SIFT,

LRT) and a conservation score (PhyloP). Therefore, it gives likelihood scores for functional

prediction of non-synonymous SNPs from multiple databases with a single query.

Different mutations in a single gene can cause varying phenotypes (phenotypic and allelic

heterogeneity), and the bioinformatics pipeline to classify variants as likely pathogenic, possibly

pathogenic or VOUS, benign polymorphism, or clinically actionable (or non-actionable)

incidental is not black and white. Typically, the larger a gene becomes, the more variants it

contains, and categorizing variants and defining what is pathogenic is more difficult for larger

genes like titin. We define and discuss the classification of variants into these categories, using

titin as an example in Table 2. The clinical phenotype of a patient can also be considered a

highly relevant ‘filter’ for classifying variants. There can be a ‘top candidate’ gene in the list of

those containing potential pathogenic variants (e.g. disease causative variant) given concordance

of the patient’s clinical symptoms to previous reports.

Confirming variants as pathogenic is easier if the variants are previously reported and

known to be causative of the disease being studied. With novel variants in genes related to the

disease where the variant is predicted to be pathogenic (evolutionarily conserved region or

frame-shifting variant) further analysis may be required. Testing can include segregation analysis

in families, functional testing for reduced protein levels in tissue/biopsy, checking databases for

ingle quq ery.y

ypes (p((p(phhheh notyttt pipipipic anananandddd

ity), and the bioinformatics pipeline to classify variants as likely pathogenic, pos

s not black and white. Typically, the larger a gene becomes, the more variants it

nd categorizing variants and defining what is pathogenic is more difficult for lar

titin We define and disc ss the classification of ariants into these categories s

ity)))), ananand ddd thththheee biiioiooo nformatics pipeline to ccclalalalasssify variants as lililiikely pathogenic, pos

ooor VVVOUS, benininign pooolo ymmmmorphphphphiisi m,mm orrr cclinniccallylyly aaaacctcc iioi nannabble (o(o(or nononon-acccctitititionnnaaableee)

s nottt blblblblacacaca kkk annddd whwhhititite.e Typypypicicicically, thhththe llalargrgrgerer aa gggeneneneneee bbbbecococomemmemess, tttthe mmmmooore vava iririananttsts it

nd categogg rizinggg variiiai nts anddd d dedd fifififi iniinggg whahhh t iiisi pppathohhh gegg niiic iisii more ddidd fficult for lar

titi iin WWe dd fefiin dd didi hth lcl isififi iti ff iri ts iint hth te iri by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 7: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

7

known polymorphisms, or in vitro protein re-modeling studies.19

Copy number variations from next-gen data

Detection of deletions or duplications of exons (copy number variations; CNVs) is necessary for

high sensitivity of detection for all types of mutations. With the aid of newer bioinformatics

tools, it is possible to detect copy number variations (CNVs) in exome sequencing datasets using

CNV calling algorithms; however these tools are still being evaluated before moving to the

clinical realm.20,21,22 Next-gen sequencing does not detect CNVs directly via sequence data, but

the number of sequence reads per exon can be used to derive the presence of CNVs. A log2

ratio of RPKM (Reads Per Kilobase of exon model per Million mapped reads) for one patient’s

next-gen reads can be compared to the average of the other patients for each exon, with

significant deviations from baseline indicating the presence of a deletion or duplication.23 The

accuracy and sensitivity of all CNV detection methods depend on the depth of the sequence data,

where targeted next-gen data typically gives greater depth, and hence greater sensitivity, for

CNVs compared to exome or whole genome data.

Interpretation of CNVs and small mutation pathogenic data is often simplified by the

sequencing of parents and siblings, as well as the proband. Having the sequence data of the

parents permits the testing of different inheritance models. For example, de novo mutations are

increasingly seen as a cause of genetic disease. 24,25 However, detection of de novo mutational

events is all but impossible without the sequence of both parents in hand.

The current ambiguities concerning sensitivity and specificity of detecting and reporting

pathogenic mutations and CNVs, as well as incidental data, requires consent forms that speak to

these ambiguities.26,27 Before offering WGS tests, medical diagnostic labs have to be equipped

with detailed consent forms and resources for data storage, analysis and interpretation.28

ence of CNVs. A lolololog

ed readsddd )))) fffof r one papapapattititie

d T

n e

e o

pared to e ome or hole genome data

adsss cacacannn bebebebe cccommmpared to the average of ththththe other patients fffororor each exon, with

ddded vvviations from mm baaseeeliinenenene inddddicicici atttinng ttthe prreesennceceee oooff f a ddelettiooon orrr duplplplplicii aaatiiioi nnn.2232 T

nd senennsisisisititititivvityy ooffff alalll ll CNCCNCNV V dededeetttetectiooonnn memmemethththhododdss dededepppependndndd on nn ththththee dddedepth hhh ofofofof the e seseququenencec

eted next-gegg n data typypypicii llallylyl gggivii es gggreater dddepppth, ,, and dd hhheh nce grgg eater sensitivity,y,y, fo

dd to hh lol ddat by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 8: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

8

Incidental findings – what is considered clinically actionable?

Incidental findings are variants that have defined functions associated with them, such as the

sequence change in the Factor V clotting protein that is associated with changes in thrombosis in

general populations (Factor V Leiden; ~5% of individuals of European descent). 29,30,31 Another

example of an incidental finding is identification of the carrier state for the common deltaF508

cystic fibrosis mutation32 in a patient where cystic fibrosis is not in the differential diagnosis. A

key issue with incidental data is whether or not it is actionable, and how the definition of

actionable changes as a function of the age of the patient. For example, to a child, neither Factor

V neither Leiden nor deltaF508 CF carrier status are clinically actionable (nothing would be

done differently in the clinical management of the child given knowledge or lack of knowledge

of these genotypes). Yet, both become clinically actionable in the context of a young adult. A

pregnant woman who is a carrier for Factor V Leiden has a 5–10 fold increase in the risk of

venous thrombosis events, whereas homozygotes have nearly a 100 fold increased risk.33

Similarly, the carrier status for deltaF508 becomes clinically actionable when considering

pregnancy and having children, and genetic counseling of the carrier is warranted. The

definition of clinically actionable vs. non-actionable incidental data from next-gen sequencing

studies is under active discussion by national regulatory and academic groups, and is hotly

debated. 34,35,36. Recently, the ACMG has issued a set of guidelines for reporting of incidental

findings after exome and whole genome sequencing testing a clinical setting.37 The authors

provide a list of 57 genes where known pathogenic (KP) or expected pathogenic (EP) mutations

should be reported back to the physician, even though these genes are not related to the initial

clinical complaint or differential diagnosis of the patient (e.g. incidental data). Some examples

include BRCA1 (risk for breast cancer), MYH7 (cardiomyopathy risk), and RYR1 (malignant

e, to a child, neitheheheher rrr F

ble ((((noththththiiini g woululululddd d bbbb

ently in the clinical management of the child given knowledge or lack of knowle

notypes Yet, both become clinically actionable in the context of a young adult

o f

o

he carrier stat s for deltaF508 becomes clinicall actionable hen considering

entlylylyly iiinnn thththt eee clinininnical management of the ccchihihh ld given knowledededdge or lack of knowle

nooto yyypy es). Yet,, bbbothh h bbebeb cocococ me cccllil nnniccalllly acttioonababablelee inn thhhee contnttexxxtt oof a yyououounnng aaduduud lt

omannn wwwwhohohoh is s aa cacarrrriieier r for FaFaFaFactor VVV LLLLeieieidededd nn hahahh sss aaaa 555 111–10 00 foofofoldldldl iiincncreasasasaseee iiiin tttheheh rrisisi kkk ofo

ombosis events, ,, hwhhhereas hhoh mozyyygogg tes hhhah ve nearlllly yy a 1010100000 ffof ldldldld iincreased risk.33

hehh iri tat ffo ddellt FaF505088 bbe lcliiniic lalll ctiio blbl hh isidde iri by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 9: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

9

hyperthermia susceptibility). There are concerns that this will increase costs of these tests in the

future due to increased burden on data analyses and interpretation. For example, the RYR1 gene

has 106 exons38, (38=42)and variants are frequently found in this gene by exome sequencing.

Assigning pathogenic significance to a variant in RYR1 in terms of risk for malignant

hyperthermia is extraordinarily challenging. One must also question the cost/benefit of reporting

back RYR1 expected pathogenic incidental data, as the risk for malignant hyperthermia is

generally only after halothane anesthetics, and only a small subset of the population is ever

exposed to this. 39,40,41

A recent publication outlines a proposed bioinformatics tool with minimal researcher

effort that looks for incidental findings (defined as variants known to cause Mendelian diseases)

from whole genome sequencing data as a possible solution to reduce the burden on lab

personnel.42 The interpretation, reporting, and retention in medical records is currently quite

variable, and most remains in the context of research studies rather than routine molecular

diagnostics. It will take a while for labs and institutional review boards to incorporate the new

ACMG guidelines.

Data storage

WES on an individual generates around 6 -12 gigabases (109 ) base pairs of data with more than

30,000 variants. WES data analysis strategies focus on filtering out variants at different levels to

only look at novel mutations in coding regions for prioritizing possible disease causing variants.

43 With increasing amounts of data from NGS efforts shared across research sites, the database

grows and data interpretation of new samples becomes more robust. One can easily imagine that

integration of next-gen data from hundreds of thousands of individuals worldwide creates a

powerful world genetic knowledge base, where defining the nature of any particular variant

ith miiiiniiiimallll researararrchchchche

ooks for incidental findings (defined as variants known to cause Mendelian dise

e

2 t

n

It ill take a hile for labs and instit tional re ie boards to incorporate the n

ookkkks s s fofofoforr r inininnciccc dededeennntal findings (defined as vvvarir ants known to ccacause Mendelian dise

e gggeeene ome sequuennncingngg ddatatata aa a as aa pppossibbble soollutiiiononnn ttttoo o redudduce tthhhe bbbuuurdennn n ononon llllab a

2 Thhhheee iiiintntntnteeerprretett ttatatioioi nn, rreppporrrtiititinnnng, annnddd reeeetttentnttiioionn ininin mmeddededicalallal rrececorords iiiissss cccurrenentltltly ququitii

nd most remains iiiin hhthhe context offff research hhh studies ra hhthher thhah n rroutini e molecular

IIt iillll t kak hhilil ffo llabbs dnd ii itit iti lal iie bbo dds t iin o te thhe by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 10: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

10

becomes increasingly accurate.

A database specific for WES data – Exome Variant Server by NHLBI GO Exome

Sequencing Project (ESP) has become an important resource for filtering variants as it contains

data from WES (non- pathogenic variants) of over 6000 individuals from African-American and

European-American populations. 44 While database resources are quickly improving, there

remain technical differences between different next-gen sequencing equipment and

bioinformatics methods that make the bioinformatics data resources a work in progress. For

example, there was a recent report of a study where the same DNA samples were sent to a

number of the top sequencing labs in the world, with comparison of the ‘variant classification’

between sites. There was surprisingly little agreement in the variant analysis of the same

individual (www.rd-neuromics.eu; 24 January 2013 workshop).

An exemplar in cardiogenomics: Genetics of dilated cardiomyopathy

Dilated Cardiomyopathy (DCM; also CMD) is a condition in which the heart’s ability to pump

blood is lessened due to its weakened and enlarged state eventually leading to heart failure.

DCM is characterized by enlargement of the left ventricle and systolic dysfunction. It is the third

most common cause of heart failure with an estimated frequency of 1:2500 in the general

population. 45 Around 20-48% of idiopathic DCM cases were found to be due to an underlying

genetic cause (familial DCM/FDC). 46,47,48,49 FDC was estimated to be found in 20-35% of first-

degree relatives of patients diagnosed with idiopathic DCM50 and may be inherited in autosomal

dominant, autosomal recessive, X-linked, or mitochondrial inheritance patterns. 51 To date,

mutations in 51 genes have been reported to cause DCM.52 Diverse inheritance patterns and a

large number of causative genes make molecular diagnostics of DCM a challenge.

Until recently, patients and physicians only had the option of iteratively sequencing genes

mples were sent toooo a aaa

he ‘variiiianttt t llcllassifififificacacacattti

t

(

l

rdiomyopathy (DCM; also CMD) is a condition in which the heart’s ability to pu

ssened d e to its eakened and enlarged state e ent all leading to heart fail re

tes. TTTheheheererere wwwass sssuru prisingly little agreememementnnt in the variant anananalaaa ysis of the same

(wwwwwww.rd-neuromommiccs.eu;;;; 2242 JJanaanuuuarry 2220013 wworkrkrkshshshopopoo ).

lar ininn ccccararara ddid ogenomiicici s: GGGeneneneneticss ofofoff ddddiili atatt dededd carararrdididid ooomyoyoyopathththhyyy

rdiomyyyopppathyy ((DCDCDCDCM;M;M; alllso CMCMCMC D)D)D)D iiis a condddiiiti ioiii n ini whihihih chhhh thehhh hhhheart’s abilityyy to pupp

dd dd to iits kke dd dd lla ded tate nt llll lle dadiin to hh rt ff iaill by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 11: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

11

one at a time with high costs and long turn-around times. This involved send-outs to multiple

laboratories and companies, often internationally, with each negative result leaving a long list of

genes yet to be tested. This was especially challenging for cardiomyopathies due the inclusion of

some of the largest and most complex genes in the genome (DMD - dystrophin,TTN - titin).

Over the last few years, a number of DCM gene testing panels have been introduced

commercially in the United States - Harvard Medical School and Partners Healthcare: DCM

panel – 28 genes; GeneDx: DCM panel -38 genes; AmbryGenetics DCM panel -37 genes;

Transgenomic DCM panel – 13 genes (GeneTests;DCM). These panels offer targeted

sequencing (TS) for coding regions (exons) of genes related to DCM. The results are available in

a timely manner and are relatively cost-effective (~$4000) as multiple genes are tested at once.

An example of where next-gen sequencing is making a major impact on the diagnosis of

dilated cardiomyopathy is with regards to mutations in the titin gene and protein. The titin

protein (TTN), as its name belies, is the largest known protein in humans, with ~34,000 amino

acids, and a molecular weight of ~3,810 kDa (100-times larger than ‘average’). For comparison,

the average protein in the human body is about 300 amino acids and 30 kD in molecular weight.

53 The gene encoding titin is also the most complex in the genome, with 363 exons used in many

different alternatively spliced transcripts (Figure 3). 54 The titin protein is a key component of

muscle tissue (both skeletal muscle and cardiac muscle), where it serves as a structural and

functional linker between adjacent sarcomeres (Figure 3), and giving structural integrity to the

characteristic ‘striated’ appearance of the actin/myosin network. 55,56,57 Mutations in titin can be a

cause of both skeletal muscle and cardiac muscle disease. Titin mutations can be autosomal

dominant or recessive with phenotypes ranging from dilated/hypertrophic cardiomyopathy to

skeletal myopathy/dystrophy, or both (OMIM # 188840). Titin’s enormous size has made it a

els offer targeted

The resultltltlts are avvvvaiaiaiailllla

a n

e s

d

TN), as its name belies, is the largest known protein in humans, with ~34,000 am

a molec lar eight of 3 810 kDa (100 times larger than ‘a erage’) For compa

annenener rr ananand ddd ararare rererelatively cost-effective (~$~$~$~$4044 00) as multiple e ggeg nes are tested at on

exaxaxample of whhheeere neeext-g-g-gen ssseequeuuenciniing iss mmakakakininingg g aa mammajor immmpapapacct onnn tttheee dddiagggnnos

diomyyyopopopatatathhhy iiiss iwiwiththth rregegegarrdsdsds to mumumutatatatititiononss iinin thehehe tttitittininin ggenenene e anand dd protototeieiein. ThThThhe e titititititin n

TN),), as its name bbbbelllliiiei s,,, iiis thhheh llargegg st kkknown ppproteiini iiiin huhhh mans,,, with ~34,0, 00 am

lol ll iei hght fof 33 818100 kDkD (1(10000 tiim ll hth ‘‘ ’e’)) FFo by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 12: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

12

molecular diagnostic nightmare using more classic Sanger sequencing approaches where each of

the 363 exons was analyzed individually. As such, it was a prime target for next-gen sequencing

approaches where the role of titin in cardiomyopathies and skeletal myopathies could be better

defined.

A recent landmark study by Herman et al. used both next-gen and Sanger sequencing

methods to sequence all coding exons and splice sites of the titin gene in a large cohort of

patients with DCM, hypertrophic cardiomyopathy (HCM), and normal controls.64 They found

that a large proportion of DCM cases (~ 27%) showed likely autosomal dominant mutations in

titin that result in shortening of full length protein (truncations). Titin truncating variants

(nonsense, frameshift, splicing variants) were seen in ~25% of familial cases (inherited

mutations) and ~18% of sporadic cases (de novo mutations). In the familial DCM cases, co-

segregation analysis was highly supportive of a causative role of the identified titin mutations,

with strong statistical support (combined LOD score of 11.1), and high penetrance of the

mutations (>95% after 40 years of age). Most of the titin truncating variants were located in the

A-band region of the titin.

The large size of the titin gene and protein, and high frequency of mutations in DCM

patients suggests that the gene is a frequent target for mutational events. The clustering of DCM

mutations in a sub-domain of titin also suggests that there may be additional non-DCM

phenotypes associated with mutations in other regions of the gene. Clearly, patients presenting

with DCM should now have titin mutation studies as a front-line genetic test, and this is likely

best done with targeted sequencing.

Titin mutations have also been found through whole exome sequencing in skeletal muscle

disease patients. The first examples of using WES for titin mutations was in families with

al dominant mutatitititionooo

truncatitititing variiaii ntntttssss

o

n analysis was highly supportive of a causative role of the identified titin mutatio

>95% after 40 ears of age) Most of the titin tr ncating ariants ere located in

framememeshshshshifififi t,t, splplplplicicicicing variants) were seen inininin ~25% of familiaaallll cases (inherited

aaandndnd ~18% of ssppporaadddid c cacacac ses (((dedede nnovooo muutaationons)s)s)... In thhe faammmilililiaaala DCMCMCMCM caases, co

n analallysysysy isisisis wass hhhigiigighlhlhly sus ppppppororororttttive ooofff a cccausus tatattivivii e ee rorororolelele of ff thhththee idididenentififififiedededed titinini mmututt tatatioii

statistical suppppopp rt ((((combbbinii dded LLLODODODO score offf f 1111 .11),),) a ddnd hhhhiigigi h hhh pepp netrance of the

(>9595%% fafte 4040 ff )e) MMost ff hth iti iti tr iti iiant ll at ded ii by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 13: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

13

autosomal dominant hereditary myopathy with early respiratory failure (HMERF; OMIM

#603689) by Ohlsson et al. 65 and Pfeffer et al. 66 Both groups identified the same disease causing

mutation in titin (g.274375T>C; p.Cys30071Arg) in A-band titin, and were able to provide

statistical support for causality of this point mutation using family studies and linkage analyses.

Interestingly, no cardiac abnormalities were reported in the affected patients.

All these studies show the power of next-gen sequencing to elucidate the role of the giant

protein titin that can cause autosomal dominant disorders of heterogeneous phenotypes with

mutations situated in the same A-band region (DCM and HMERF). As pointed out above, the

10% drop out rate of exons of titin when using whole exome sequence, compared to the 1% drop

out in targeted sequencing, is an important technical distinction between the two approaches.

Earlier this year, Andreasen et al. compared previously associated cardiomyopathy

variants (missense and nonsense) with variants in the ESP (exome variant server) 44 database, and

found that a significant proportion of sequence variants published as ‘pathogenic’ may in fact

be ‘VOUS’ or polymorphisms.67 Since phenotypic data from ESP is not available, they defined

cut-off values based on the estimated prevalence of DCM at 1:2500 and only picked variants

above this cut off. Approximately 17% (58 out of 337) of variants known to be associated to

DCM were present in ESP. Polyphen-2 analysis on these 58 variants predicted 33% to be benign.

The authors hypothesize that if these variants were to be disease-causing, then a much higher

prevalence of cardiomyopathy would be expected in the general population (1000-fold higher

than expected). The conclusions from this study were that many cardiomyopathy associated

variants are wrongly classified as damaging and are probably false positives (benign

polymorphisms). With more next-gen data being generated; it will be easier to identify benign

polymorphisms among different populations resulting in fewer false positives.

As pointed out abovvvve,eee

e, compapp red ddd ttott ttttheheee 1111%%%%

eted sequencing, is an important technical distinction between the two approache

l

s

or pol morphisms 67 Since phenot pic data from ESP is not a ailable the defi

ted d d d sesesequququenenenciccc ngngng,, is an important technicaaal l ll did stinction betweeeeeen the two approache

lieer this year, AAAnnndrreaaasennnn et aalll. y comoompaaarred prrevvioioioususususlyy aaasssociiai tttedd d cccardddioioioiomymymyooopaaathhhyh

issennnsesesee aaaannd nnononsesensns )e)e) wititithhh h vvariananntststs iiiinn ththhthee ESESESE PPP (e(((exoxoxomememe varariiaiantn sereerervvver) 444444 ddadad tatat bbabas

a signgg ificant prpp oppportiiion of ff seqqquence variiiants pppubbbbllil shhheh d ddd as ‘‘‘‘papp thhhhogggenic’ may yy in

ll hihi 6677 SiSi hph ot iic ddat ffr EESPSP ii t iaillablbl thhe ddeffi by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 14: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

14

Summary

As NGS technologies become more routine, the field of human genetics will receive an influx of

information that may be in contrast with what was reported in the past. This represents a

retrograde- storm before the calm situation where there might be initial unsettling of established

norms only to result in a greater understanding of human genetics. The future holds much

excitement for uncovering new genotype-phenotype co-relations, development of bioinformatics

tools to accelerate data analyses and discovery of new genes related to disease.

Acknowledgements: We would like to thank Dr. Simon C Watkins for kindly providing us the EM scan of human sarcomere.

Funding Sources: Supported in part by the National Institutes of Health (3R01 NS29525), and the Center for Genetic Medicine Research at Children’s National Medical Center.

Conflict of Interest Disclosures: None

References:

1. Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010;11:31-46.

2. Manolio TA, Chisholm RL, Ozenberger B, Roden DM, Williams MS, Wilson R, et al. Implementing genomic medicine in the clinic: The future is here. Genet Med. 2013;15:258-267.

3. Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, et al. Targeted Capture and Massively Parallel Sequencing of Twelve Human Exomes. Nature. 2009;461:272–276

4. Sikkema-Raddatz B, Johansson LF, de Boer EN, Almomani R, Boven LG, van den Berg MP, et al. Targeted Next-Generation Sequencing can Replace Sanger Sequencing in Clinical Diagnostics. Hum Mutat. 2013;34:1035-1042.

5. Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol. 2008;26:1135-1145.

6. Mamanova L, Coffey AJ, Scott CE, Kozarewa I, Turner EH, Kumar A, et al. Target-enrichment strategies for next-generation sequencing. Nat Methods. 2010;7:111-118.

7. Clark MJ, Chen R, Lam HY, Karczewski KJ, Chen R, Euskirchen G, et al.. Performance comparison of exome DNA sequencing technologies. Nat Biotechnol. 2011;29:908-914.

or kindndndndlylylyly pppprorororovivivivididididingngngng uuuus

o )f

f

s

ourururrces: Suuuupppppororororted d dd ininn pppparrrt t t bybybyby theheheh NNNNatatata ioii naal Instststititititutututu esss oooof ff HeHeHealalalalth ((((3R3R3R3R0101010 NNNNS2S2SS2959595525222 )fooro Genetic Medddicinininne ReReReessearrrchchch aaatt Chhhilldrenen’s NNNatatatatioioionaaal Meddidicaaalll CCCeC ntttterrr.

f Interest Disclololosusus res:s NNNNonone

s::

by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 15: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

15

8. LG Biesecker, KV Shianna, JC Mullikin. Exome sequencing: the expert view. Genome Biol.2011,12:128.

9. Jamal SM, Yu J-H, Chong JX, Dent KM, Conta JH, Tabor HK, et al. Practices and policies of clinical exome sequencing providers: Analysis and implications. Am J Med Genet Part A.2013;161A:935–950.

10. Rehm HL. Disease-targeted sequencing: a cornerstone in the clinic. Nat Rev Genet.2013;14:295-300.

11. Meder B, Haas J, Keller A, Heid C, Just S, Borries A, et al. Targeted next-generation sequencing for the molecular genetic diagnostics of cardiomyopathies. Circ Cardiovasc Genet.2011;4:110-122.

12. Drmanac R, Sparks AB, Callow MJ, Halpern AL, Burns NL, Kermani BG, et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science.2010;327:78-81.

13. Wolf SM, Lawrenz FP, Nelson CA, Kahn JP, Cho MK, Clayton EW, et al. Managing incidental findings in human subjects research: analysis and recommendations. J Law Med Ethics. 2008;36:219-248.

14. Schwarz JM, Rödelsperger C, Schuelke M, Seelow D. MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods. 2010;7:575-576.

15. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7:248-249.

16. Chun S, Fay JC. Identification of deleterious mutations within three human genomes. Genome Res. 2009;19:1553–1561.

17. Kumar P, Henikoff S, Ng PC Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009;4:1073-1081.

18. Liu X, Jian X, Boerwinkle E. dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat. 2011;32:894-899.

19. Domchek S, Weber BL. Genetic variants of uncertain significance: flies in the ointment. JClin Oncol. 2008;26:16-17.

20. Plagnol V, Curtis J, Epstein M, Mok KY, Stebbings E, Grigoriadou S, et al.. A robust model for read count data in exome sequencing experiments and implications for copy number variant calling. Bioinformatics. 2012;28:2747-2754.

mamamamanininini BBBBGGGG,,, etetetet aaaalll.l HHHHumummmaaaaNA nanoarrays. ScScScScieieieien

Mf

z JM, Rödelsperger C, Schuelke M, Seelow D. MutationTaster evaluates diseaset

bei IA Schmidt S Peshkin L Ramensk VE Gerasimo a A Bork P et al A me

MMMM, LLLLawrenzzzz FFFP,PPP NNNelelelelsososon nnn CACAACA, , KaKaKaKahnhnhn JJJJP,P,P,P Choho MK,K,K,K, CCCClaaaaytytyty ononon EEEW,WWW etetetet aaaal.l.l. Manannanagagaaginininng ggfinndddings in hummaaan ssuuubu jeeectctctc s reeesseaaarcch: annalyyssis anannd d d reecooommmenenndaaatititiions. JJ J J LLLaww MeMeMeM d 8;3;3;33666:6 219-242424248..

z JM, Rödelspeeeergrgrgerererer CCC, ScSSchuhuhuhuelelele kekekeke MMMM,,,, SeSSeS elelelowowoww DDDD. MuMuMuM tatatatatititt onononnTaTaTaT stststs ererer eeeevvavv luates diseasetential of seqqquence llalteratiiions. NaNN t MeMM hhthhoddds. 222010;00;0 7:57557555-57577666.6

b iei IIAA SS hch imiddt SS PPe hshkiki LL RR ksk VEVE GGe isi AA BBo krk PP et ll AA e by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 16: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

16

21. Fromer M, Moran JL, Chambert K, Banks E, Bergen SE, Ruderfer DM, et al. Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. Am J Hum Genet. 2012;91:597-607.

22. Abel HJ, Duncavage EJ, Becker N, Armstrong JR, Magrini VJ, Pfeifer JD. SLOPE: a quick and accurate method for locating non-SNP structural variation from targeted next-generation sequence data. Bioinformatics. 2010;26:2684-2688.

23. Scoto M, Cullup T, Cirak S, Yau S, Manzur AY, Feng L, et al. Nebulin (NEB) mutations in a childhood onset distal myopathy with rods and cores uncovered by next generation sequencing. Eur J Hum Genet. 2013 Feb 27. doi: 10.1038/ejhg.2013.31. [Epub ahead of print]

24. Peng G, Fan Y, Palculict TB, Shen P, Ruteshouser EC, Chi AK, et al. Rare variant detection using family-based sequencing analysis. Proc Natl Acad Sci U S A. 2013;110:3985-3990.

25. Ku CS, Tan EK, Cooper DN. From the periphery to centre stage: de novo single nucleotide variants play a key role in human genetic disease. J Med Genet. 2013;50:203-211.

26. Groisman IJ, Mathieu G, Godard B. Use of next generation sequencing technologies in research and beyond: are participants with mental health disorders fully protected? BMC Med Ethics. 2012;13:36.

27. Soden SE, Farrow EG, Saunders CJ, Lantos JD. Genomic medicine: evolving science, evolving ethics. Per Med. 2012;9:523-528.

28. Berg JS, Khoury MJ, Evans JP. Deploying whole genome sequencing in clinical practice and public health: meeting the challenge one bin at a time. Genet. Med. 2011;13:499–504.

29. Bertina RM, Koeleman BP, Koster T, Rosendaal FR, Dirven RJ, de Ronde H, et al. Mutation in blood coagulation factor V associated with resistance to activated protein C. Nature.1994;369:64-67.

30. De Stefano V, Leone G. Resistance to activated protein C due to mutated factor V as a novel cause of inherited thrombophilia. Haematologica. 1995;80:344-356.

31. Ridker PM, Miletich JP, Hennekens CH, Buring JE. Ethnic distribution of factor V Leiden in 4047 men and women. Implications for venous thromboembolism screening. JAMA.1997;277:1305-1307.

32. Riordan JR, Rommens JM, Kerem B, Alon N, Rozmahel R, Grzelczak Z, et al. Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science.1989;245:1066-1073.

33. Bloomenthal D, von Dadelszen P, Liston R, Magee L, Tsang P. The effect of factor V Leiden carriage on maternal and fetal health. CMAJ. 2002;167:48-54.

dededede nnnnovovovovoooo sisisisingngngnglelelele nnnnucucucucleleleleoooo50:20200033-33 21212121111.1

M

Sh

S, Khour MJ, Evans JP. De in whole nome s uencin in clinical acticth: meeting the challenge one bin at a time G t M d 2011;13:499 504

an IIIJ,JJ, MMMMatathihihh euu GGGG, Godard B. Use of nexxxt tt ggeneration sequeencncncing technologies in ddd bbbeeeye ond: aaaareee ppppartitiiticicccipapapaantts s s wiwiwiwithththh mmmennnntatatat l heeaalth ddddisisisisorooo dedededersss fffululullylyyly prooootetetetectctctted??? BMBMBMBMC CCC M2222;13:36.

SE, FaFaFFarrrrrrrrowowowo EEEG,GG SSSSauau dndnders CJCJCJCJ, Lantntntosososos JJJDDD.D GGGenenomomommicicic medededdiiicicinini e:e: evoooolvlvlvl iiiing scscieieii ncnce,ehics. Per Med. 2020202 12121212;9;9;9:5:5:5:5232323-5-5-5-52828288...

S, Khoury y MJMJMJMJ, , EvEvEvEvanannans sss JPJPPJP. DeDeeDeplpplployoyyoyininiing g gg whwhwhwhololololeee gegegeg nononomemmeme sssseeqeqequeueueuencncncncininining gg inininin cccliliilinininnicacc l practiccthh: iti thhe hhallll bbiin t iti GG t MM dd 20201111 1;133:494999 505044 by guest on M

ay 20, 2018http://circgenetics.ahajournals.org/

Dow

nloaded from

Page 17: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

17

34. Abdul-Karim R, Berkman BE, Wendler D, Rid A, Khan J, Badgett T, et al. Disclosure of Incidental Findings From Next-Generation Sequencing in Pediatric Genomic Research. Pediatrics. 2013;131:564-571.

35. Gliwa C, Berkman BE. Do researchers have an obligation to actively look for genetic incidental findings? Am J Bioeth. 2013;13:32-42.

36. Zawati MH, Knoppers BM. International normative perspectives on the return of individual research results and incidental findings in genomic biobanks. Genet Med. 2012;14:484-489.

37. Green RC, Berg JS, Grody WW, Kalia SS, Korf BR, Martin CL, et al. ACMG Recommendations for Reporting of Incidental Findings in Clinical Exome and Genome Sequencing. Genet Med. 2013;15:565-574.

38. Phillips MS, Fujii J, Khanna VK, DeLeon S, Yokobata K, de Jong, et al. The structural organization of the human skeletal muscle ryanodine receptor (RYR1) gene. Genomics.1996;34:24-41.

39. MacLennan DH, Duff C, Zorzato F, Fujii J, Phillips M, Korneluk RG, et. al Ryanodine receptor gene is a candidate for predisposition to malignant hyperthermia. Nature. 1990;343:559-561.

40. Litman RS, Rosenberg H. Malignant hyperthermia: update on susceptibility testing. JAMA.2005;293:2918-2924.

41. Bandschapp O, Girard T. Malignant hyperthermia. Swiss Med Wkly. 2012;142:w13652.

42. Berg JS, Adams M, Nassar N, Bizon C, Lee K, Schmitt CP, et al. An informatics approach to analyzing the incidentalome. Genet Med. 2013;15:36-44.

43. Gilissen C, Hoischen A, Brunner HG, Veltman JA. Disease gene identification strategies for exome sequencing. Eur J Hum Genet. 2012;20:490-497.

44. Exome Variant Server, NHLBI GO Exome Sequencing Project (ESP), Seattle, WA (URL: http://evs.gs.washington.edu/EVS/.

45. Maron BJ, Towbin JA, Thiene G, Antzelevitch C, Corrado D, Arnett D, et al. Contemporary definitions and classification of the cardiomyopathies. Circulation. 2006;113:1807–1816.

46. Michels VV, Moll PP, Miller FA, Tajik AJ, Chu JS, Driscoll DJ, et al. The frequency of familial dilated cardiomyopathy in a series of patients with idiopathic dilated cardiomyopathy. NEngl J Med. 1992;326:77-82.

47. Grunig E, Tasman JA, Kucherer H, Franz W, Kubler W, Katus HA. Frequency and phenotypes of familial dilated cardiomyopathy. J Am Coll Cardiol. 1998;31:186–194.

, et al. The structuuuurarararal) )) gegegegenenenene. GeGeGeGenonononomimimimicccsc .

nn

M2

h

nnananan DDDDH,HHH DDDDuffff f f f C,C Zorzato F, Fujii J, Phihiiillllll ipips M, Korneluk kk RGRRR , et. al Ryanodinenee e isisisis a candiddd dadaadatettt fforororor ppprererer dididispspspsposooo ititititioioion tototot maaliignaaantntntnt hhhhypppypererere thththerrermimmm a. NaNaNaNatututuure. 11119999990;0;0;0;34

RS, RRRRososososeeenbeergrg HHHH. MMMMaligngngnanananant hypepepertrtrtrtheheheh rmrmiiaia:: upupuppdaddadatetete on nn sussususcscepepeptibibibibilililil tytyty tesestiititingng. JAJAJAJAM2918-2924.

hhhapp O, GiGiiGirarararardrdrd TTTT. . MaMaMMalililiigngngngnanananant ttt hyyhyhypepeppertrtrtrthehhehermrmrmrmiaiaia... SwSwSwS isisisisssss MeMeMeMed dd WkWkWWklylyyly.. 20200201212112;1;11;14242442:w136522

by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 18: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

18

48. Baig MK, Goldman JH, Caforio AL, Coonar AS, Keeling PJ, McKenna WJ. Familial dilated cardiomyopathy: cardiac abnormalities are common in asymptomatic relatives and may represent early disease. J Am Coll Cardiol. 1998;31:195-201.

49. Mestroni L, Rocco C, Gregori D, Sinagra G, Di Lenarda A, Miocic S, et al. Familial dilated cardiomyopathy: evidence for genetic and phenotypic heterogeneity. J Am Coll Cardiol.1999;34:181-190.

50. Hershberger RE, Siegfried JD. Update 2011: clinical and genetic issues in familial dilated cardiomyopathy. J Am Coll Cardiol. 2011;57:1641-1649.

51. Schönberger J, Seidman CE. Many roads lead to a broken heart: the genetics of dilated cardiomyopathy. Am J Hum Genet. 2001;69:249-260.

52. Posafalvi A, Herkert JC, Sinke RJ, van den Berg MP, Mogensen J, Jongbloed JD, et al. Clinical utility gene card for: dilated cardiomyopathy (CMD). Eur J Hum Genet. 2012 Dec 19. doi: 10.1038/ejhg.2012.276. [Epub ahead of print]

53. Brocchieri L, Karlin S. Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Res. 2005;33:3390-400.

54. Bang ML, Centner T, Fornoff F, Geach AJ, Gotthardt M, McNabb M, et al. The complete gene sequence of titin, expression of an unusual approximately 700-kDa titin isoform, and its interaction with obscurin identify a novel Z-line to I-band linking system. Circ Res. 2001;89:1065–1072.

55. Wang K, Ramirez-Mitchell R, Palter D. Titin is an extraordinarily long, flexible, and slender myofibrillar protein. Proc Natl Acad Sci U S A. 1984;81:3685-3689.

56. Maruyama K, Yoshioka T, Higuchi H, Ohashi K, Kimura S, Natori R. Connectin filaments link thick filaments and Z lines in frog skeletal muscle as revealed by immunoelectron microscopy. J Cell Biol. 1985;101:2167-2172.

57. Fürst DO, Osborn M, Nave R, Weber K. The organization of titin filaments in the half-sarcomere revealed by monoclonal antibodies in immunoelectron microscopy: a map of ten nonrepetitive epitopes starting at the Z line extends close to the M line. J Cell Biol.1988;106:1563-1572.

58. Granzier HL, Labeit S. The giant protein titin: a major player in myocardial mechanics, signaling, and disease. Circ Res. 2004;94:284-295.

59. Linke WA. Sense and stretchability: the role of titin and titin-associated proteins in myocardial stress-sensing and mechanical dysfunction. Cardiovasc Res. 2008;77:637-648.

60. Labeit S, Kolmerer B. Titins, giant proteins in charge of muscle ultrastructure and elasticity. Science. 1995;270:293–296.

, Jonggbloed JD, etttt aaaal.lHuHuHuHum mmm GeGeGeGenenenenet.t.t.t. 2020202012121212 DDDDecccc

i c2

M en

0

K Ramire Mitchell R Palter D Titin is an e traordinaril long fle ible and sl

ieriiii LLL, KaKaKKarlrlrlliniii SSSS... Protein length in eukaryyyotototici and prokaryottticicicic proteomes. Nucleic2000005050505;33:333339090909 -4-44-40000....

ML,L,L, CCCCentnerrr T,, FFFornnoooff F,F,F, GGeaachhch AAJ,,, GGotththardtddt MM, McMccNNabbb MM,, eeet al... TTThhhe commmplence ofofof tttititititiinin, exexprprpreses isisionon of ff anananan unusususualalalal apppppproroxixiimamamaateteelylyly 7770000000-kDkDkkDaa tititititinnnn isisisofffororm,m aa dndnd with obscurin ididididenenenntitititifyfyfy aaaa nnnovovovoveleleel ZZZZ-l-l-l- inininne e e e totoo III-b-bbbanananand dd lililil nknknknkininining g g sysysysystststemememe . CiCiCiCirrcr Res.

065–1072.

KKK RRa imi MiMitchhellll RR PPallt DD TiTi iti iis tr drdiin ilil llo flfl ibiblle dd lsl by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 19: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

19

61. Freiburg A, Trombitas K, Hell W, Cazorla O, Fougerousse F, Centner T, et al. Series of exon-skipping events in the elastic spring region of titin as the structural basis for myofibrillar elastic diversity. Circ Res. 2000;86:1114–1121.

62. Greaser M. Identification of new repeating motifs in titin. Proteins. 2001;43:145-149.

63. Obermann WM, Gautel M, Steiner F, van der Ven PF, Weber K, Furst DO. The structure of the sarcomeric M band: localization of defined domains of myomesin, M-protein, and the 250-kD carboxyterminal region of titin by immunoelectron microscopy. J Cell Biol. 1996;134:1441–1453.

64. Herman DS, Lam L, Taylor MR, Wang L, Teekakirikul P, Christodoulou D, et al. Truncations of titin causing dilated cardiomyopathy. N Engl J Med. 2012;366:619-628.

65. Ohlsson M, Hedberg C, Bradvik B, Lindberg C, Tajsharghi H, Danielsson O, et al. Hereditary myopathy with early respiratory failure associated with a mutation in A-band titin. Brain. 2012;135:1682–1694.

66. Pfeffer G, Elliot H, Griffin H, Barresi R, Miller J, Marsh J, et al. Titin mutation segregates with hereditary myopathy with early respiratory failure. Brain. 2012;135:1695–1713.

67. Andreasen C, Nielsen JB, Refsgaard L, Holst AG, Christensen AH, Andreasen L, et al. New population-based exome data are questioning the pathogenicity of previously cardiomyopathy-associated genetic variants. Eur J Hum Genet. 2013 Jan 9. doi: 10.1038/ejhg.2012.283. [Epub ahead of print]

nnnnieieieielslslslssosososon nnn O,O,O,O, eeeett tt alalalal. mutattttiiiion iiiin AAAA bb-bbanddd d ttititit

g

s-based exome data are questioning the pathogenicity of previously cardiomyopatg p

G,G,G, EEEElliot HHHH, GrGrrGrifiii fin n nn H,H,H, Barararrerereresiss RRRR, MiMiMiM llllller JJ, Maaarsrsrsrsh h h h J, eeeettt alalal. TTTTitin mmmmutututtataaa iooonn nn sesesegrgrgrgregtaara yyy y myopathyy wwwitth earlrlrllyy y respsspirataatoryyy ffailurure. BrBrrraiaiaia nn. 2022012;113335:1:1:1695––––1117131313.

sen CCCC, NNNNieieielsenen JJJJB,BB,B RRRRefe sgggaaaaaaardrdrdrd L, HoHHoHolslslslsttt AGAGAGAG, ChChChC riririristststenenenseeennn AHAAHAH, AnAnAA drdrdrdreaeaeasen n LLLL, etett aal.ll -based exome ddddaaata a a a arararareee ququququesesesstitititiononononininininggg ththththe e e papapaththhogogogogenee icicicicititity y y y ofofofo pppprererer viviviv ouououslslslslyy yy cardiomyopatgggenetic variants. EuEE r J J HHHum GeGG net. 2020202 13131313 JJan 9. doddd i:i 11110.00 10101010 838388/e/// jhjhjhhg.gg 2012.283. [E[ pppiiint]

by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 20: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

20

Table 1. Comparison of existing next-generation sequencing strategies for molecular diagnosis of a disease.

Targeted Sequencing(TS) Whole exome sequencing(WES) Whole genome sequencing(WGS)

Approach Create a custom panel to target a select number of genes to study a genetic

disease.

Use exome capture kits to target the protein-coding genes of the genome (1.1% of total genome)

Whole genome re-sequencing by fragmenting genomic DNA

and sequencing.

Approx size 500kb-3Mb 44 -62Mb (most kits include splice junctions)

3.1Gb

Capture technique

molecular inversion probesPCR based (RainDance, Fluidigm), hybrid capture (Agilent, NimbleGen)6

hybridization with biotinylated oligonucleotide baits different capture techniques based on DNA/RNA baits (NimbleGen, Agilent, Illumina)7

services for whole genome re-sequencing provided by CompleteGenomics, BGIAmericas, Illumina, Knome etc.Most include mitochondrial DNA sequencing too

Advantages greater sensitivity as higher coverage lower drop-out rate of exons (increases specificity for regions of interest)easier assembly and analysisquick, cost-effective

high chance of finding a mutation in all the protein coding regionscost-effective, data interpretation easier than WGScan discover a new gene associated with disease

covers everything - can identify pathogenic variants including structural variation for all genetic diseases sequence enrichment not required

Dis-advantages

pathogenic mutations could be in genes not coveredCan miss large indels/duplications and CNVs.

high drop-out rate of exonsanalysis and interpretation of data can get complicatedCNVs and large indels/duplications not detected.

assembly and analysis very challengingneed huge compute power for this analysis, data size is 15x of WES data8

not cost-effective; more expensive for higher coverage data

Success Rapid, cost-effective, successfully used for

relatively common genetic diseases as a first-pass screening technique.

Higher success rate when exomes of parents of proband also

sequenced, and when used in conjunction with TS panels for

dropped out regions and mtDNA.

Still in nascent phase, will revolutionize the field of human

genetics in the future

provided by yCoCoCoCompmpmpmpleleleleteteteeGeGeGeGenonononommmmBGBGBGBGIAIAIAIAmememem riririricaaaas,s,,, Illuminananana, , , , KnKnKnomMMMM tttt iiii llll dddd

ne

tah

q

( gD

can discover a new geneg )

( gNiNiNN mbmbmbm leleleleGen)6

Illumiiinnana)))7 mitochondrial D

sequencingggg too

greateer sensitttivityyy asasasas hhhiggigheeeer coocovvvev ragegegee lower drdrdrdropo -out rate of f f f eeexe onononns sss(increases spsspspececececiffficiccicitititity y y fofofofor reregigigiononss ofofof iiintntereresest)t)t)

hiiighh chhannce ofoffof ffffiniini ddinggg aa mmmum tatatatioonoo in alallll ththththe prprprproooteieiein n nncodinggg regionscocococoststs --efefeffefeectctctctivivvee,e ddddatatata a aainterpreta iitiion easiiei r hththhan WGWWGWGSSSScacann didiscscovoverer aa nnewew ggenenee

cooovvev rrrs everyrrytthinidididideeentiiiifyyyy patatatathooogevariants includistructural variatall genetic diseasesesesequququq ence enrichhnonott rereququiririrededed by guest on M

ay 20, 2018http://circgenetics.ahajournals.org/

Dow

nloaded from

Page 21: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

21

Table 2. Classification of variants obtained from NGS data using TTN as an example.

Variant Definition Example Significance Report Pathogenic Disease-causing

mutationFrame-shifting variant

in TTN A-band Diagnosis Yes

Benign Polymorphism

Natural variation in DNA with no known adverse

effect

Variant in TTN present in dbSNP with

high frequency

Could be a genetic modifier in future studies

No

VOUS/VUS Variant of uncertain or

unknown significance

Missense variant in a less evolutionarily

conserved region of TTN

May change status to

Pathogenic with additional data

Yes

Incidental findings

Variants that have been

associated with some risk of a

disease state, but are not relevant to the patient’s

diagnostic questions.

Carrier state for mutations in CFTR

gene.

Not ‘diagnostic’. Assessment of

“clinically actionable” defined by

ACMG guidelines

Yes – if clinically actionable

Genetic modifier

Variant that modifies a

disease state due to a second gene

Hypothetical variant in a gene that increases

or decreases TTN expression

Genetic modifiers of monogenic

disorders remain few.

No

Figure Legends:

Figure 1. Schematic comparison of Sanger sequencing and next-generation sequencing

technologies (adapted from Shendure5). For the purpose of molecular diagnostics, genomic

DNA is generally the starting material for both techniques. Sanger sequencing (left column): a)

The first step for sequencing by this method is to design target primers for a specific region in a

gene of interest; preferably between 300-800bp. PCR amplification with target primers and

genomic DNA is then carried out and the PCR product is visualized on an agarose gel to confirm

nnnnt t t t ofofofof lly le” b

clclclclininininicicicicalalalallylylyly acacacactitititiononononabababableleele

, y

r

g p

,arararee nnnot rererelevant to thehehh pppatatatatientntntnt’sss

diagnonoosticc quququesttiooons..

yACAA MGMGMG

guguguguididideleelininnineseee

rVariannnnt t t t thththatatattmodifififiieseeses aaaa

dididiseeeasaaa e ttst ttate dddue to aa ssececonond dd gegenene

HyHyHyypopopop thththheteteteticicicalalalal vvvararariaiiantntntt iiiin n na aa gegegeg nenenee tttthahaahattt t iiincncnccrerereasasassesesese

or dddeccccreases TTTTNTNTN exexprpresessisionon

GeGeGenenenenetititicc c momomom dididid fiff erererrs ss sofofoff mmmmononnnogogogo enenene icicici

dididiso ddrderererers remaiiin fef w.

No

by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 22: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

22

predicted size. b) The sequencing reaction is carried out using fluorescently labeled ddNTPs as

chain terminators, DNA polymerase, dNTPs, PCR product generated in the earlier step, and

primers (can be either forward primer or reverse primer per reaction). c) As DNA polymerase

adds nucleotides to the denatured strand, it randomly picks up labeled ddNTPs which cause

termination of the reaction; resulting in a large number of fragments of various sizes. These

fragments are then subject to capillary electrophoresis to separate them by size. Shorter

fragments travel faster towards the negatively charged electrode while longer fragments move

slower. d) The last fluorescently labeled ddNTP for each fragment is recorded as a colored peak

which is used to generate a sequencing trace; with each base assigned a specific color based on

the fluorescent dye. Next-generation sequencing (right column): The methods depicted in this

figure are based on second-generation sequencing with the Illumina platform. a) The first step

for this method is to fragment genomic DNA to a uniform size. Generally, fragmentation is

carried out using adaptive focused acoustics technology (AFA) by the Covaris instrument. The

required size distribution is then confirmed by an agarose gel; image on the right shows sheared

DNA. b) Sequence enrichment is carried out for targeted or exome sequencing; whole genome

sequencing does not require enrichment. To make the library sequencing-ready, adapters are

ligated to both ends of the DNA. The different colored adapters (pink and blue) reflect a

sequencing adapter and a barcoding adapter. After barcoding individual samples, they can be

pooled for sequencing to optimize sequencing costs. c) The library is then immobilized to an

array (flow cell) where bridge amplification occurs to generate clusters (clonal libraries). d)

Sequencing by synthesis technology is used to detect each base by fluorescently labeled dNTPs.

The image shows individual clusters shown on the SAV software by Illumina during sequencing.

A small section of the image represented in the yellow box is zoomed into and each of the

recorded as a colorerereredd d

a specififififiicii collllor bbbbasasasased

c n

b s

thod is to fragment genomic DNA to a uniform size. Generally, fragmentation is

using adaptive focused acoustics technology (AFA) by the Covaris instrument.

e distrib tion is then confirmed b an agarose gel; image on the right sho s she

cent tt dydydydyeee. NeNeNeNexttt-gggeneration sequencing ((((ririrr ghg t column): Thhheee methods depicted in

baasa eeed on secondndnd-genennerattttioioioi n seeeququuenncingnng wwitth thththe IlIlI luummminna pplaaatffforororm. a)a)a)a Thheee firsrrst s

thod ddd isisis ttttooo o fffraggmementntt ggene ommmiccicic DDDDNAAAA tttto a ununifififfoorm mm iisizezeze. GeGeGeeneneraralllllllly, ffffrararragggmenenttatatititionon iiis

using gg adappptive ffffocus deddd acoustics techhnhh ollllogggy yy (A( FAFAFAF ) )) ) bybbyb thehhh CCCCovaris instrument.

didist iribb iti ii hth fifi ded bb lel iim thhe iighht hho hhe by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 23: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

23

fluorescent dots on the image represents a library cluster. Base call files (images) are then

converted to .fastq files (DNA letters) that can be aligned to the genome. The average number of

reads from an individual’s exome sequence (minimum coverage of 50x) is ~ 40 million (in-house

sequencing data).

Figure 2. Analysis pipeline for whole exome sequencing (WES) data used for molecular

diagnostics of dilated cardiomyopathy (DCM). This pipeline is for data from the Illumina

platform using a commercially available software NextGENe (SoftGenetics,PA). Raw data from

Illumina is in .bcl (basecall) format which is converted to .fastq format using Illumina’s software

CASAVA. If samples were pooled per lane, they can be de-multiplexed based on the barcode

adapters attached to specific samples. Reads are filtered for a minimum quality score of 30 which

reflects a single error in 1000 bases. The .fastq files can be then imported to NextGENe where

the first step is conversion to .fasta format. The .fasta files for each paired read are then merged

together and aligned to the reference human genome using custom alignment and variant calling

algorithms. The variants are annotated using the database dbNSFP (database for nonsynonymous

SNPs' functional predictions). 18 The dbNSFP combines prediction scores from different

prediction algorithms (SIFT, Polyphen-2, Mutation Taster, LRT) and includes conservation

scores for all non-synonymous SNPs in the human genome. Thus, enabling comparison of

different functional prediction algorithms at once and giving a probability score ranging from 0

to 1 (0 being benign and 1 being disease causing). For detection of pathogenic variants, further

variant filtration is required. This is achieved by looking at only the coding sequences; splice

junctions (+/- 5 bases from exons), removing synonymous SNPs, and removing polymorphisms

already reported in dbSNP and 1000Genomes. We then create a filter to first look at the 51 genes

netics,PA).) Raw ddddatatatata

t usini g gg IlIlIlIllllul miiiina’ssss ssssooof

o

tached to specific samples. Reads are filtered for a minimum quality score of 30

i h

p is conversion to .fasta format. The .fasta files for each paired read are then me

d aligned to the reference h man genome sing c stom alignment and ariant ca

If sasasampmpmplelelel s ss wewewererere pooled per lane, they cananan be de-multiplpp exxedededed based on the barco

taaca hhhed to speciiififific ssammmpleleleles. RRRReeaddds areee ffilteereed fofofor r r r aa aa miiinnimuummm quququaality y y y ssscoooreee offff 33330

ingle erererrrorororor in 111000000000 00 bbabases s. TTThehehehe .fastststqqq fififif llles s cacann bebebee tthehehen imimimmpopo trtrt deded to ooo NeNeNeNextGEGEGEENeNeN wh

ppp is conversion to ff.ffasta ffformat. TThheh .faff sta fifififilelll s for eachhhh ppp iaiiired dd read are then me

ddd lili ded t hth ffe hh isi to laliig t dd iri t by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 24: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

10.1161/CIRCGENETICS.113.000085

24

already reported to cause DCM as there is a greater probability of finding a mutation in one of

these genes given the patients phenotype. If variants are detected in the known DCM causing

genes, the variants are confirmed by Sanger sequencing and cross referenced with the patient’s

physician. Since exome sequencing has a high rate of false positives, confirming that the variant

is a not a polymorphism on the exome variant server 44 is the next step. If no variants in known

genes are detected in the patient, we then proceed to analyzing variants in the whole exome.

Figure 3. An overview of titin (also known as connectin) gene and protein. a). The gene for

titin (TTN) is located on the long arm of chromosome 2 at position 2q31 with a huge size of 283

kb encoding 363 exons. 54 b) The coding region of titin has been divided into different regions

based on the protein’s location in the sarcomere (exon numbers based on Granzier and Labeit, 58

and Linke59). The N-terminal region of TTN contains the Z-band/disk region which is anchored

to the Z line of the sarcomere and is comprised of exons1-28. I-band of titin roughly covers

exons 28 – 251; this region undergoes extensive alternative splicing. 60,61 The I-band consists of

tandemly repeated Ig domains and the PEVK domain that contains a repetitive motif of the

amino acids: Proline, Glutamate, Valine, and Lysine.62 A-band of titin is coded for by exons 251-

358; exon 358 codes for the titin-kinase domain and is at the A/M junction. M-band titin is

encoded by the last few exons 358-363 and is near the carboxy terminal of titin attached to the

M-line of the sarcomere. c) A sarcomere is the contractile unit of the myofibril made up of

myofilaments: actin and myosin; along with titin. The image depicts the structural arrangement

of a single sarcomere adapted from Herman et al. along with the electron micrograph image of a

human sarcomere. TTN filaments overlap at the M line and adjacent TTN filaments overlap at

the Z-disk. 63,64

rotein. a). The gegeeenenenene

31 wiiiithththth a hhhhuge siiiizezezeze o

g 54 o

he protein’s location in the sarcomere exon numbers based on Granzier and Lab

9 h

e s

251; this region ndergoes e tensi e alternati e splicing 60,61 The I band consis

g 3666333 exexexonononsss. 545454 b)bbb The coding region of titititititin has been dividddedeee into different regio

heee prrror tein’s loccaaationn iiini ttttheheheh sarararccomemm reee ((exonon numummmbebeberrs bbbaaseddd ooon GGGrG anzizizizieeer annnd LLLLab

9). TTTThehehehe NNNN t-termimiinanal ll reregig on oooffff TTTTTN NNN ccocontntntn iaiaiinsns tttthehehe ZZZZ bb-baaand/d/d/didididiskskk rreeegioonnnn whww icchhh isisii aancnch

e of the sarcomere anddd iiis compppriiiised d foff exons1-28888. I bb-bband ddd offff titiiini roughg lyy covers

225151 hthiis iio dde te ii lalte atii lili ici 6060 6,611 ThTh II bb dd iis by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 25: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 26: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 27: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from

Page 28: Short Read (Next-gen) Sequencing: A Tutorial with ...circgenetics.ahajournals.org/.../14/CIRCGENETICS.113.000085.full.pdf · 10.1161/CIRCGENETICS.113.000085 1 Short Read (Next-gen)

Jaya Punetha and Eric P. HoffmanExemplar

Short Read (Next-gen) Sequencing: A Tutorial with Cardiomyopathy Diagnostics as an

Print ISSN: 1942-325X. Online ISSN: 1942-3268 Copyright © 2013 American Heart Association, Inc. All rights reserved.

TX 75231is published by the American Heart Association, 7272 Greenville Avenue, Dallas,Circulation: Cardiovascular Genetics

published online July 14, 2013;Circ Cardiovasc Genet. 

http://circgenetics.ahajournals.org/content/early/2013/07/14/CIRCGENETICS.113.000085World Wide Web at:

The online version of this article, along with updated information and services, is located on the

  http://circgenetics.ahajournals.org//subscriptions/

is online at: Circulation: Cardiovascular Genetics Information about subscribing to Subscriptions: 

http://www.lww.com/reprints Information about reprints can be found online at: Reprints:

  document. Permissions and Rights Question and Answer this process is available in the

located, click Request Permissions in the middle column of the Web page under Services. Further information aboutnot the Editorial Office. Once the online version of the published article for which permission is being requested is

can be obtained via RightsLink, a service of the Copyright Clearance Center,Circulation: Cardiovascular Genetics Requests for permissions to reproduce figures, tables, or portions of articles originally published inPermissions:

by guest on May 20, 2018

http://circgenetics.ahajournals.org/D

ownloaded from