Applying genomics to complex diseases 2007 juran



Applying Genomics to the Study of Complex Disease

Brian D. Juran, B.S.,1 and Konstantinos N. Lazaridis, M.D.1

ABSTRACT

The interest in dissecting the genetic and environmental components of complex human disease is growing, fueled by the emerging advances in the field of genomics and related disciplines. Improved understanding of the pathogenesis of complex liver diseases such as gallbladder stones, nonalcoholic fatty liver disease, viral hepatitis, and hepatocellular carcinoma remains a goal of the clinical and experimental hepatologist alike. Despite the scientific progress and technological advancement, elucidating the underlying mechanisms of complex hepatic diseases from the genomic standpoint will be demanding. Complexity of genomic structure and function, disease heterogeneity, influence of the environment on disease development and progression, and epigenetics all contribute to the challenge. To overcome these obstacles, novel conceptual frameworks regarding biological systems and human diseases are necessary in addition to a coordinated endeavor among different scientific disciplines. Deciphering in an integrated fashion the genomic, transcriptional, and translational aspects of the pathogenesis of complex liver diseases will lead to their better prediction, diagnostics, and treatment.

KEYWORDS: Systems biology, disease susceptibility, genetics

Complex diseases are heterogeneous, the cumulative result of a wide array of gene variants (both common and rare), somatic mutations, epigenetic modifications, and environmental exposures, the combinations of which are apt to be significantly varied among the spectrum of affected individuals.1 Thus, inherited genetic variation is not directly the cause of complex disease but instead acts to mediate the risk of disease development in response to environmental exposures. The clinical and genetic heterogeneity inherent in these disorders greatly complicates our ability to dissect the underpinnings of their etiology and pathogenesis.

In the following pages we provide a brief overview of the concepts involved with disease complexity and the field of genomics. We then discuss the current strategies and future challenges of applying genomics-based studies toward achieving a better understanding of complex disease.

DISEASE COMPLEXITY

Systems Biology: Robustness, Modularity, and Redundancy

We humans are in essence complex biological machines, shaped over millennia by evolutionary forces and defined by our genome. All of the information necessary for life is encoded in our DNA. However, its utilization is dependent on the cellular context in which it resides, allowing the development, organization, and sustainment of the diverse set of cells that comprise us. The ability of the genome to generate and coordinate this tremendous level of diversity has evolved over the billions of years that DNA-based life has existed.2,3 Robustness is the driving force behind this process and is a fundamental feature of complex biological systems.4 Simply put, robust properties provide phenotypic stability in the presence of unpredictable environmental or genetic challenges. Complex systems become robust through modularity and redundancy, both of which act to mitigate the potential for system-wide damage.5 Modular structure is pervasive in life and often exists in a hierarchy (i.e., organs, tissues, cells, and organelles). In addition to physical structure, functional and regulatory mechanisms such as metabolism, cell cycle, and signal transduction are widely modularized, taking the form of numerous networks and pathways operating at the postgenome level.5,6 Interconnection of these networks and widespread gene duplication provide the means to use alternative genes or pathways, generating redundancy,7 a compensatory process that achieves the desired phenotype when failure in another gene or module occurs (Fig. 1). Thus, redundancy can and often does generate a disconnect between genotype and phenotype, a process that is dependent on higher order pathway and network interactions and the context in which the genome is utilized.8,9 These fundamental features of complex biological systems significantly affect the genetic mechanisms underlying the etiology of complex human disease and pose limits on our current ability to study them.

1Division of Gastroenterology and Hepatology, Center for Basic Research in Digestive Diseases, Mayo Clinic College of Medicine, Rochester, Minnesota.

Address for correspondence and reprint requests: Konstantinos N. Lazaridis, M.D., Mayo Clinic College of Medicine, 200 First Street SW, Rochester, MN 55905.

Genetics and Genomics of Complex Diseases in Hepatology; Guest Editor, Konstantinos N. Lazaridis, M.D.

Semin Liver Dis 2007;27:3–12. Copyright © 2007 by Thieme Medical Publishers, Inc., 333 Seventh Avenue, New York, NY 10001, USA. Tel: +1(212) 584-4662. DOI 10.1055/s-2006-960167. ISSN 0272-8087.

Disease Genetics

Genetic predisposition is thought to play a role in most human diseases.10 Currently, three classifications based on genetic involvement with disease are recognized: chromosomal, Mendelian, and complex.11 Chromosomal disorders are characterized by gross abnormalities in chromosome number or structure and often result in preterm death related to developmental abnormalities.

Figure 1 Interconnected biological pathways generate redundancy. The interrelation of numerous pathways generates a redundant network, buffering the effect of input variability and genetic polymorphism on phenotype. Shown is a simplified hypothetical signaling network through which phenotypic effects are stimulated by a primary input that is detected by members of two distinct modular pathways (delineated by boxes). In this example, these pathways communicate through a primary node (dark gray square) that acts to mediate a large portion of the signal in the network and provides a feedback loop to diminish the effect of input variation on the phenotype (illustrated by dotted lines). However, some of the signal from each pathway, as well as a secondary input (dark gray circle), is able to bypass this node and directly stimulate an effect on phenotype. Genetic polymorphism in individual components of the network (i.e., genes) is unlikely to have a great effect on phenotype because of the many means through which the input stimulus can be passed. However, a slight phenotypic effect could be demonstrated by these putative variants, potentially contributing to risk of disease. Furthermore, genetic variants of the primary node and secondary nodes (light gray circles) are more likely to display a detectable effect.

4 SEMINARS IN LIVER DISEASE/VOLUME 27, NUMBER 1 2007


Mendelian diseases run in families and display classic patterns of inheritance, such as autosomal dominant, autosomal recessive, or X-linked. In general, these disorders are rare, arise early in life, and can be attributed to mutation in a single gene that, when present, directly causes the disease phenotype. Often these causative mutations are family specific.

The vast majority of human diseases are genetically complex, wherein the direct correspondence between causative genotype and disease phenotype characteristic of Mendelian disorders is not present.12 Instead, complex diseases develop as the cumulative result of environmental exposures, exerting their effect over time, in genetically susceptible individuals. Therefore, the genotypic components of complex diseases are not causative but rather mediate disease risk. Many such susceptibility genotypes are expected for each complex disorder, some of which are common to similar diseases (e.g., autoimmune disorders) and some of which may be disease specific. Regardless, the individual contributing variants are likely to have only a slight contribution to the overall risk of each specific disease in the affected population (Fig. 2).

Complex diseases are diverse in clinical presentation, progression, and response to treatment. Because of this heterogeneity, it is useful to break down complex disorders to a series of disease traits or phenotypes for consideration in genomic studies. These traits can be qualitative, such as the presence of a comorbid disease, an associated diagnostic marker, or a previously determined risk factor. In addition, these traits could be quantitative measures such as age of disease onset, results of serum liver tests, or gene expression profiles. Comprehensive characterization, assessment, and utilization of these traits will be essential in the dissection of the genetic and environmental contributors to complex disease.

Disease concordance in monozygotic (MZ) twins is presently the best means for establishing the strength of the inherited genetic determinants of complex disease.13 As MZ twins have identical DNA sequences, disease concordance is suggestive of genetic influence, whereas discordance indicates a greater role for environmental or stochastic effects. Furthermore, the difference in disease concordance between MZ and dizygotic (DZ) twin pairs may provide additional insight into the mechanisms at play in complex disease development. For example, large differences in disease concordance between MZ and DZ twin pairs could signify the involvement of numerous risk-modifying gene variants and, conversely, slight differences in concordance might imply a stronger shared-environment effect. However, it should be noted that de novo genetic and/or epigenetic effects prior to or after MZ twinning could have an effect on MZ disease concordance and MZ/DZ concordance ratios, obscuring the reality of the inherited genetic contribution to disease.14 Familial aggregation also provides a way to estimate the genetic impact on complex diseases,11 as family members share more genetic material between themselves than with the general population. Relative risk ratios (λ) are often used to illustrate the risk of disease development in the families of affected individuals. The λ is calculated by dividing the prevalence of a complex disease among family members (often specifically siblings, λs) by the prevalence of the disease in the population at large.11 In general, higher λ values suggest a greater role of the genetic component in disease.
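The two quantities described above, twin concordance and the relative risk ratio λ, reduce to simple ratios. The following is a minimal sketch only: the function names are hypothetical, the pairwise definition of concordance (concordant pairs divided by all pairs with at least one affected twin) is one common convention assumed here rather than taken from the article, and all numbers are invented for illustration.

```python
def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Fraction of twin pairs (with at least one affected twin) in which
    both twins are affected. Pairwise convention; an assumption here."""
    total = concordant_pairs + discordant_pairs
    if total <= 0:
        raise ValueError("need at least one twin pair")
    return concordant_pairs / total


def relative_risk_ratio(prevalence_in_relatives, prevalence_in_population):
    """lambda = disease prevalence among relatives of affected individuals
    divided by prevalence in the general population."""
    if not 0 < prevalence_in_population <= 1:
        raise ValueError("population prevalence must be in (0, 1]")
    if not 0 <= prevalence_in_relatives <= 1:
        raise ValueError("prevalence in relatives must be in [0, 1]")
    return prevalence_in_relatives / prevalence_in_population


# Invented illustrative numbers, not data from any study.
mz = pairwise_concordance(concordant_pairs=12, discordant_pairs=8)
dz = pairwise_concordance(concordant_pairs=3, discordant_pairs=37)
lambda_s = relative_risk_ratio(prevalence_in_relatives=0.04,
                               prevalence_in_population=0.0005)
print(f"MZ concordance {mz:.2f}, DZ concordance {dz:.3f}, lambda_s {lambda_s:.0f}")
```

A large gap between the MZ and DZ values, or a λs well above 1, would be read as pointing toward a heritable component, subject to the caveats about de novo and epigenetic effects noted in the text.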

Figure 2 Comparison between Mendelian and complex diseases. In general, Mendelian diseases have lower frequency in the population, display higher penetrance, and are caused by a single or a few genes, each of which has a high effect on the disease phenotype. In contrast, complex diseases are usually more common, have reduced penetrance, and are mediated by several to numerous genes, each of which has a small contribution to the phenotype.

GENOMICS

From the earliest stages of life the human genome is selectively activated in response to both internal and external cues to guide the development and continued function of numerous cell types organized into higher order structures such as our tissues and organ systems. Genomics is the field of study that seeks to understand and characterize the genome’s role in this process. The achievement of sequencing the human genome is indeed momentous and has provided new impetus to the study of its involvement in disease pathogenesis. However, the intricacies of genomic function cannot be understood by solely focusing on our species. To this end, the genomes of a wide variety of organisms ranging from bacteria to mammals have been elucidated, and the effort continues. Complete lists of these genome sequencing projects are freely available through the National Center for Biotechnology Information (NCBI) website (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genomeprj).

Sequence Variation

The sequence of the human genome is varied, as illustrated by the nearly 12 million polymorphisms currently cataloged on the NCBI’s dbSNP website (http://www.ncbi.nlm.nih.gov/SNP/). Single nucleotide polymorphisms (SNPs) are the most prevalent of these variants, accounting for over 90% of human polymorphic loci.15 Distribution of SNPs across the genome is rather uniform and depends primarily on the minor allele frequency (MAF). For instance, a SNP with a MAF of 1% is expected to be present in every 300 bases of genomic DNA, whereas a SNP with a MAF of 40% is expected in only every 3300 bases.16 Resequencing portions of the genome in 137 individuals has confirmed that SNPs are quite common, occurring approximately every 180 base pairs.17 In addition, a majority of these SNPs (64%) are rare, with MAF < 5%.17

The location of a SNP, to some extent, suggests its potential for functional significance and impact on disease risk.18 SNPs in gene coding sequence resulting in premature termination (i.e., nonsense SNPs) or amino acid substitution (i.e., nonsynonymous SNPs) alter protein sequence and possibly function and, therefore, are more likely to demonstrate a phenotypic effect. Interestingly, resequencing of two genes in a large reference sample of 450 individuals identified a large number of rare nonsynonymous SNPs that were not detected in a smaller screening set,19 suggesting that many SNPs with a high likelihood to affect phenotype (and, by correlate, disease) are present in the population, albeit at low prevalence. SNPs located in splice site recognition, transcription factor binding, or enhancer sequences also have the potential to alter phenotype by affecting gene expression, splicing, or stability. However, we are not yet able to predict these functional consequences a priori. SNPs located between genes (i.e., intergenic SNPs) are thought to be far less likely to affect function. Nevertheless, it has been suggested that a large portion of this intergenic sequence is under positive selection and, thus, potentially functional.20 To facilitate a better understanding of genome operation, the National Human Genome Research Institute (NHGRI) has launched the ENCODE (ENCyclopedia of DNA Elements) program, which seeks to perform an exhaustive determination of all functional elements in the human genome. Information regarding ENCODE can be found on the web at http://www.genome.gov/10005107.

Genomic Diversity

Diversity of the genome is primarily driven by random mating and meiotic recombination, the phenomenon during which regions between pairs of equivalent chromosomes are exchanged in the course of gametogenesis, generating discrete genetic differences between parents and offspring.21 As we have transmitted the genetic material through relatively few generations since our ancestral origins, contemporary chromosomes have regions of variation in common, or display linkage disequilibrium (LD) (Fig. 3). Thus, the relatively young age of our species limits the diversity of the human genome.

The pattern of LD across the genome is inconsistent because of the presence of recombination hot spots, regions of the genome in which recombination occurs more readily.22,23 The result is that regions of low LD flank regions of high LD, creating haplotype blocks.23,24 Limited allelic diversity within these blocks can potentially simplify genomic studies aimed at deciphering the mechanisms of complex disease.25,26 To facilitate such efforts, the International Human Haplotype Map (HapMap) project (http://www.hapmap.org/) was initiated to determine the structure of human haplotype blocks and identify SNPs that are predictive for the variation in a larger set (i.e., tag SNPs). To date, the HapMap project has assessed some 3 million SNPs in 269 individuals from four racial/ethnic groups. The HapMap effort has confirmed that significant redundancy exists among common SNPs such that local genomic variation can be reliably determined using a subset of tagging SNPs.22,27
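To make the notion of LD between two SNPs concrete, the sketch below computes three pairwise statistics from phased haplotype counts. The statistics D, D′, and r² are standard population-genetics measures but are not named in this article, so their use here is an assumption; the function name, the allele labels (A/a at the first SNP, B/b at the second), and the counts are hypothetical.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) for two biallelic SNPs from haplotype counts.

    n_AB is the number of chromosomes carrying allele A at SNP 1 and
    allele B at SNP 2, and so on for the other three haplotypes.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total <= 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at SNP 2

    # D: deviation of the observed haplotype frequency from the value
    # expected if the two loci were independent.
    d = p_AB - p_A * p_B

    # D': D scaled by its maximum possible magnitude given allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles at the two loci.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = (d * d) / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical counts: two SNPs that always travel together versus two
# SNPs whose alleles are combined at random.
print(pairwise_ld(50, 0, 0, 50))    # complete LD
print(pairwise_ld(25, 25, 25, 25))  # linkage equilibrium
```

Under this reading, a tag SNP is useful when its r² with the untyped SNPs in the same block is high, which is the redundancy that the HapMap effort set out to characterize.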

The Horizon

Genome sequencing projects, variant cataloging, and haplotype block mapping efforts all provide a solid basis for the study of genomics but together amount to only the "tip of the iceberg" in the level of understanding that will be required if genomic studies are to affect globally the way in which we approach the diagnosis, prognosis, and treatment of complex disease. Increasing our knowledge regarding sequence functionality, currently the aim of the ENCODE program, will take us one step further in this endeavor by providing a basis for the prediction of variant-induced consequences outside the nonsense and nonsynonymous polymorphisms for which this approach is currently feasible. Elucidation of the rules governing contextual genome utilization, including the generation of cell type–specific transcriptomes and proteomes (i.e., all gene transcripts and functional proteins) as well as the principles determining postgenome protein network, environmental, and physiological interactions, will be required before genomic effects on high-order structure and redundant processes will be fully appreciated.

APPLYING GENOMICS TO THE STUDY OF COMPLEX DISEASE

The knowledge gained from our current efforts at sequencing and cataloging the variation of the human genome provides a sound basis for exploring its role in modulating traits of complex disease. Admittedly, we are ill equipped to study the effects of inherited genetic variation across numerous physically separated loci and determine how these combinations of alleles interact with the environment and lead to the development of complex disease.28 However, utilizing the genomic approaches described in the following pages, we are beginning to identify simple associations between genetic variants and complex disease traits.

Common single-variant and haplotypic associations are expected to be weak (e.g., odds ratio [OR] < 2.0) and, thus, poor predictors of disease development,29 but they may account for a large portion of the disease risk experienced by the affected population. Conversely, rare genetic variants significantly increasing susceptibility to disease development (e.g., OR > 5.0) may be widespread but individually are likely to account for disease predisposition in only a small subset of the affected population. Moreover, the role of emergent genetic phenomena such as somatic mutation and epigenetic modification in complex disease development is becoming more appreciated. At present, it is unclear which mechanisms will prove to have the greater overall impact on disease in general, although it is becoming apparent that all are likely to play a role in the spectrum of most complex disorders.

Figure 3 Linkage disequilibrium (LD) limits the diversity of the human genome. LD is the nonrandom association of alleles at two or more loci. In humans, this is largely driven by the limited number of generations, and thus recombination events, through which the genome has passed since our common ancestors, generating allele combinations that are widely shared among contemporary humans (i.e., common haplotypes). Patterns of LD are influenced by population structure and dynamics, such as founder effects and population admixture, and as a result display differences between racial and ethnic groups.

Investigating Inherited Disease Susceptibility

LINKAGE

Genetic linkage analysis seeks to identify the cosegregation of polymorphic genetic markers in affected family members to map disease-related genomic loci by exploiting the nature of meiotic recombination. That is, genes near each other will be inherited together more often than those located farther apart. Accordingly, the genetic markers segregating with the disease are assumed to be located near (linked to) the causative gene. The loci identified by traditional genome-wide linkage approaches have often been large (5 to 10 Mb) because of historically applied low marker densities and to some extent the limited number of observable meioses in family groups,30 generally requiring an extensive fine-mapping effort to pinpoint the offending alleles. However, significantly better resolution could be achieved in many studies by utilizing higher densities of genetic markers (e.g., SNPs), a proposition that is becoming more affordable as the cost of genotyping continues to drop. Both model-based (i.e., parametric) and model-free (i.e., nonparametric) linkage approaches can be employed in the search for disease-related alleles.31 Parametric linkage analysis calls for the researcher to specify a model of inheritance and estimate the frequency and penetrance of the disease genes, and it is a powerful approach when applied to extended pedigrees with many affected individuals. Nonparametric linkage analysis is less powerful but does not require the specification of a genetic model and instead looks simply for excessive sharing of alleles identical by descent among affected family members.

The tenets underlying linkage analysis make this a powerful approach for the identification of genes involved with Mendelian diseases but limit its application to complex disorders.31 Clinical, locus, and allelic heterogeneities, all of which are common features of complex diseases, effectively dilute the linkage signal, significantly reducing the chance of identifying plausible candidate regions. Parametric approaches are impractical, as models of inheritance and estimations of the frequency and penetrance of the disease genes are not readily assumable for complex disorders. Moreover, the late age of onset of most complex diseases often precludes the assessment of multiple generations, significantly reducing the power of both parametric and nonparametric approaches. However, the use of linkage analysis to identify genes involved in the development of complex disease could prove useful, especially when applied to families in whom the genetic component is likely to be enriched, such as those with an unusually high rate of disease occurrence or exceptionally early age of disease onset. Although these families might not be representative of the disease in the majority of the population, findings could implicate major networks or pathways involved in the disease and would certainly provide a basis for further investigations.

ASSOCIATION

In general, association studies look for a statistical difference in the frequency of alleles between affected and unaffected individuals. Often this involves population-based comparisons between cases and unrelated controls; however, family-based tests of association can also be quite useful.32 Association studies take advantage of the LD generated by meiotic recombination throughout our ancestral history to identify genomic loci contributing to disease phenotype, in contrast to linkage approaches, which observe only recent meioses in family groups. The former approach has traditionally been applied to the study of genetic variants in candidate genes preselected for their potential involvement in specific disease processes such as genes or loci identified by prior whole-genome linkage scans or known to be involved biochemically with disease. Lately, genome-wide association studies utilizing hundreds of thousands to millions of SNPs spread across the genome have become reality.33,34

Overall, association studies are capable of identifying substantial genetic effects (i.e., OR > 2.0) on disease phenotype with relatively small sample sizes (n ≈ 200)35 and have high power to detect small effects of genetic variation (i.e., OR < 2.0) but require the sample sizes to be quite large (n ≈ 1000). SNPs are the genotyping markers most often employed in association studies as they are quite abundant and easy to type. Depending on the individual marker being tested, a detected association could reflect a variant that directly affects the phenotype (i.e., a susceptibility variant) or one that is in LD with the true phenotypic effector.36 In general, follow-up functional studies aimed at explaining the disease mechanism underlying any detected associations would be beneficial. However, such studies are often not performed because of lack of a suitable model system and/or adequate specimens from which to derive RNA or protein. Moreover, when they are performed, the results can be unsatisfying and difficult to interpret, as the contextual milieu in which function is affected and contributes to disease is likely to be altered or lost.
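As an illustration of what a single-SNP case-control comparison involves, the sketch below computes an allelic odds ratio with an approximate 95% confidence interval and a Pearson chi-square statistic from a 2 x 2 table of allele counts. The article does not specify these methods; the log-based (Woolf) interval and the uncorrected chi-square are standard textbook choices assumed here, and the function names and counts are hypothetical.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio and approximate 95% CI (Woolf's log method).

    Arguments are allele counts: risk allele and other allele in cases,
    then risk allele and other allele in controls.
    """
    counts = (case_risk, case_other, control_risk, control_other)
    if any(c <= 0 for c in counts):
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# Invented allele counts from 200 cases and 200 controls (400 alleles each).
or_, lo, hi = allelic_odds_ratio(120, 280, 90, 310)
chi2 = chi_square_2x2(120, 280, 90, 310)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f}), chi-square = {chi2:.2f}")
```

A chi-square value above 3.84 corresponds to P < .05 for a single test with one degree of freedom; the width of the interval around a modest OR gives a feel for why small effects call for large samples.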

Population-based association studies are susceptible to false positive findings related to population stratification, a phenomenon stemming from unequal genetic backgrounds (i.e., allele frequencies) between the case and control populations. In fact, loci completely unlinked to the tested disease will exhibit association when the allele frequencies in the populations are considerably different.37 Moreover, less extreme manifestations of selection bias can work to cloud the results of association studies. For example, control populations derived from blood-bank donors may not be representative of the case population as blood donors tend to be rigorously screened and possibly in better general health than the cases, especially when collected for elderly populations.38 Thus, positive associations may reflect genetic variations involved with overall health and not specifically the disease of interest. Another source of false positive findings in association studies is multiple testing. For instance, when P values of .05 are considered significant, roughly 1 out of every 20 tests of a truly unassociated marker is expected to appear positive by chance alone. When extrapolated to thousands of tests, numerous false associations are likely to be detected. Several methods to correct for multiple testing have been employed, but many of these are quite draconian and probably drive the exclusion of true positive findings.39 Replication in an independent data set is the best method to verify the research findings, and great care should be taken to ensure that the follow-up study is adequately powered and the controls are well matched.40 On the other hand, family-based association tests are not prone to stratification biases and, thus, offer an alternative to study and confirm the observations of case-control studies.41
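The multiple-testing arithmetic above can be sketched in a few lines. The article does not name specific correction methods; the Bonferroni threshold is shown as an example of a stringent correction, and the Benjamini-Hochberg step-up procedure as a commonly used, less conservative alternative. Both choices, the function names, and the P values are assumptions made for illustration.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of chance 'hits' if every tested marker is truly null."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that holds the family-wise error at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of tests declared significant by the Benjamini-Hochberg
    step-up procedure at the given false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose P value falls under its own threshold.
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical P values from ten SNP tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
threshold = bonferroni_threshold(len(p))
print("Bonferroni hits:", [i for i, v in enumerate(p) if v <= threshold])
print("Benjamini-Hochberg hits:", benjamini_hochberg(p))
print("Expected chance hits in 1000 null tests:", expected_false_positives(1000))
```

In this invented example the Bonferroni rule retains one test while the false discovery rate procedure retains two, which illustrates the trade-off between stringency and the loss of true positive findings described above.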

Genotyping and statistical methods keep improving, and LD-based association studies will continue to demonstrate usefulness in the future. The collection of large sets of cases, their family members, and well-matched controls remains the primary challenge to the investigator wishing to study complex disorders. To this end, the case for developing a large United States prospective cohort of ≈200,000 individuals to study the role of genes and environment in disease development has been raised42 and is currently debated among scientists, policy makers, and the federal government.

Gene Resequencing

In general, association-based approaches to the identification of alleles involved with complex disease are not well suited to identify rare, recently arising variants that may be involved with disease risk or protection, as these variants are not likely to be directly assessed or indirectly detected by haplotypic association. An approach to identifying such rare variants is the complete resequencing of a particularly interesting candidate gene/gene-region in affected patients, most likely focused on the protein coding exons and splice junctions, although potentially across whole genes or multigene regions when relatively small. This approach has been used to confirm the role of genetic polymorphism in leptin (the gene responsible for extreme obesity in the ob/ob mouse model) in human obesity43 and also used in the identification of a chemokine receptor (CCR5-delta 32) mutation that is protective against human immunodeficiency virus (HIV) infection and progression.44 As genotyping costs continue to decline, gene resequencing will become a more attractive approach to the identification of the genetic determinants of complex diseases and their phenotypes as it allows the simultaneous assessment of both common and rare variants. Toward this end, the NHGRI is aggressively funding the advancement of revolutionary genome sequencing technologies, with the goal of resequencing the entire human genome for $1000 (some four orders of magnitude less than currently feasible). When this lofty goal becomes reality, the way in which we view and approach genome science will be forever changed.

Investigating Emergent Disease Susceptibility: Epigenetics, Somatic Mutation, and Aging

Although inherited genetic variation is likely to play a significant role in determining susceptibility to complex diseases, it is becoming apparent that epigenetic changes and mutation in somatic cells may play a greater role in the etiology of these disorders than previously thought.14,45 This has become widely apparent in cancer, where some level of inherited risk is seemingly present, but the mechanisms leading to malignant transformation primarily involve misregulated epigenetics and accumulation of somatic mutation.46 The extent of these epigenetic and mutagenic phenomena is possibly determined in part by inherited variation, invoking a vicious cycle, but they are thought to be largely driven by stochastic phenomena and aging.

Epigenetics involves stable changes in gene expression that do not entail alteration of DNA sequence and are decoupled from labile, reactionary transcriptional control processes.14 This form of transcriptional regulation is known to involve the methylation of cytosine bases at cytosine-guanine dinucleotides but may involve some of the classes of noncoding RNAs as well.14 Epigenetic events play a major role in mammalian development and in the maintenance of tissue-specific cellular function, and aberrant methylation is becoming increasingly evident in the expression of disease phenotypes.14,46 However, the methylation profile of the human genome remains largely unknown. To this end, the Human Epigenome Project has been established, which aims to analyze DNA methylation patterns in the regulatory regions of all known human genes in most of the major cell types in a healthy state and their diseased counterparts.47 Information regarding the current status of this effort can be found at http://www.epigenome.org.
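To make the notion of methylation at cytosine-guanine dinucleotides concrete, the following minimal Python sketch locates such sites in a DNA sequence and computes the fraction that are methylated. The function names, the example sequence, and the set of methylated positions are hypothetical illustrations rather than data or methods from the Human Epigenome Project, and the sketch considers one strand only.

```python
def cpg_positions(seq):
    """Return 0-based positions of the C in every CG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cpg_methylation_fraction(seq, methylated_positions):
    """Fraction of CG sites in seq whose cytosine is marked as methylated.

    methylated_positions is an assumed input: an iterable of 0-based
    cytosine positions reported as methylated by some assay.
    """
    sites = cpg_positions(seq)
    if not sites:
        return 0.0
    methylated = set(methylated_positions)
    return sum(1 for site in sites if site in methylated) / len(sites)


# Hypothetical 12-base sequence with CG sites at positions 1, 5, and 8.
example_seq = "ACGTTCGACGGA"
print(cpg_positions(example_seq))
print(cpg_methylation_fraction(example_seq, {1, 8}))
```

A methylation profile of the kind described above would amount to such fractions measured per site or per regulatory region, across many cell types.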


Somatic mutations ranging from individual point mutations to large rearrangements, duplications, or deletions are increasingly noted as playing a role in the development of complex diseases, primarily in cancers, for which diseased tissue (i.e., tumor) is often observable. DNA rearrangements or deletions resulting in loss of heterozygosity (LOH) at a locus can unmask the deleterious effect of a recessive-acting mutation in a tumor suppressor gene and lead to the development of malignancy.48 LOH is identified when heterozygosity for a genetic marker is noted in the germline DNA but not in the DNA of the tumor. DNA rearrangements can also generate fusion genes, sometimes resulting in a gain-of-function abnormal hybrid protein. Previously identified fusion genes have often been classified as oncogenes because of their involvement with malignancy and poor outcome; one example is BCR-ABL, which is strongly associated with the development of chronic myeloid leukemia.49 Moreover, segmental gene duplication, occurring by a process involving low-copy repeat sequences, can affect gene copy number and, therefore, dosage of gene product, which can ultimately have an effect on phenotype. Such copy number changes have been associated with susceptibility to HIV infection.50

At present, the extent to which somatic mutation may play a role in complex disease phenotype is unclear. Moreover, this putative effect is hard to assess.
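The criterion for identifying LOH given above (a marker heterozygous in germline DNA but not in tumor DNA) can be sketched in a few lines of Python. This is an illustrative simplification that assumes clean genotype calls; the marker identifiers and genotypes are hypothetical, and real tumor samples are often mixed with normal cells, so practical analyses typically work from allelic ratios instead of discrete calls.

```python
def find_loh_markers(germline, tumor):
    """Return markers heterozygous in germline DNA but homozygous in tumor DNA.

    Both arguments map a marker name to a tuple of two allele calls.
    """
    loh = []
    for marker, germline_alleles in germline.items():
        tumor_alleles = tumor.get(marker)
        if tumor_alleles is None:
            continue  # marker was not typed in the tumor sample
        germline_het = len(set(germline_alleles)) == 2
        tumor_hom = len(set(tumor_alleles)) == 1
        # The retained tumor allele must be one of the germline alleles;
        # otherwise the difference is not a simple loss of one allele.
        retained = set(tumor_alleles) <= set(germline_alleles)
        if germline_het and tumor_hom and retained:
            loh.append(marker)
    return sorted(loh)


# Hypothetical marker names and genotype calls, for illustration only.
germline = {"rs0001": ("A", "G"), "rs0002": ("C", "C"), "rs0003": ("T", "C")}
tumor = {"rs0001": ("A", "A"), "rs0002": ("C", "C"), "rs0003": ("T", "C")}
print(find_loh_markers(germline, tumor))  # ['rs0001']
```

Only the first marker is reported: the second is homozygous in the germline and therefore uninformative, and the third remains heterozygous in the tumor.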

Finally, some portion of most common complex disease cases may be attributed solely to stochastic processes related to aging, primarily the result of accumulated somatic mutation and cellular damage due to long-term exposure to harmful endogenous and environmental agents.51,52

CHALLENGES

Presently, we stand amid both the scientific promise and the challenges of applying genomics to the study of complex disease. Ideally, these obstacles can turn into opportunities leading to the discovery of better methods for disease prognosis and novel therapies. Because of the Human Genome Project and subsequent initiatives (e.g., HapMap, ENCODE), we are now able to read the instructions of our genome, and we possess the first tools to begin elucidating its functionality. However, we still lack methods to systematically examine the interactions among the numerous genes of the genome as well as to comprehend how variation can leave some of us more vulnerable to developing disease than others.

The barriers we have to overcome in genomic research are considerable. First, it is important to realize the intricate nature of human biological systems. Clearer understanding of the redundancy of networked arrangements is needed in the context of the genome itself and the high-order processes it encodes. Second, we lack the large patient, family member, and matched control specimen banks needed to gather comprehensive data sets on genomic variation and environmental exposures. Third, the definition and classification of several complex diseases and associated traits, whether clinical, biochemical, or other, have to be expanded and improved to minimize disease heterogeneity.

Pursuing clinical genomic research requires investigators with diverse expertise (i.e., clinicians, laboratory-based researchers, genetic epidemiologists, statistical geneticists, bioinformatics specialists). This need stands in contrast to the conventional model of research programs, in which an individual principal investigator is responsible for the direction of the study. The cost of current genomic-based technologies is falling but remains quite high, limiting the application of these exciting approaches to large, well-funded research programs. Improvements in high-throughput solutions are necessary to reduce the price tag of these technologies and make them affordable to more researchers and applicable across the spectrum of complex diseases. Finally, but most important, the protection of human participants, whether patients, unaffected family members, or unrelated healthy controls, has to be ensured. These individuals are the key component of genomic research, and their legal rights need to be protected if we wish to continue with genomic science and eventually to apply genomic-based medicine for the good of humankind.

CONCLUSIONS

Genomics offers the potential to change the way we diagnose and treat complex disease. The foremost goals of medical research are to advance the prognostication of disease and to develop safe, effective novel drugs. A clearer understanding of the roles the genome as a whole and environmental interaction play in complex disease pathogenesis will provide the keystone for future medical breakthroughs.

ACKNOWLEDGMENTS

Supported by NIH grant DK68290, the Palumbo Foundation, and the Morgan Foundation. The authors thank Stacy Roberson for secretarial assistance.

ABBREVIATIONS

DZ dizygotic
HIV human immunodeficiency virus
LD linkage disequilibrium
LOH loss of heterozygosity
MAF minor allele frequency
MZ monozygotic
NHGRI National Human Genome Research Institute
OR odds ratio
SNP single nucleotide polymorphism

REFERENCES

1. Chakravarti A, Little P. Nature, nurture and human disease. Nature 2003;421:412–414

2. Koonin EV. Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet 2005;39:309–338

3. Makalowski W. The human genome structure and organization. Acta Biochim Pol 2001;48:587–598

4. Kitano H. Biological robustness. Nat Rev Genet 2004;5:826–837

5. Dover G. How genomic and developmental dynamics affect evolutionary processes. Bioessays 2000;22:1153–1159

6. Bortoluzzi S, Romualdi C, Bisognin A, Danieli GA. Disease genes and intracellular protein networks. Physiol Genomics 2003;15:223–227

7. Gu Z, Steinmetz LM, Gu X, et al. Role of duplicate genes in genetic robustness against null mutations. Nature 2003;421:63–66

8. Flatt T. The evolutionary genetics of canalization. Q Rev Biol 2005;80:287–316

9. Nowak MA, Boerlijst MC, Cooke J, Smith JM. Evolution of genetic redundancy. Nature 1997;388:167–171

10. Collins FS. The human genome project and the future of medicine. Ann NY Acad Sci 1999;882:42–55; discussion 56–65

11. Lazaridis KN, Juran BD. American Gastroenterological Association future trends committee report: the application of genomic and proteomic technologies to digestive disease diagnosis and treatment and their likely impact on gastroenterology clinical practice. Gastroenterology 2005;129:1720–1752

12. Strohman R. Maneuvering in the complex path from genotype to phenotype. Science 2002;296:701–703

13. MacGregor AJ, Snieder H, Schork NJ, Spector TD. Twins: novel uses to study complex traits and genetic diseases. Trends Genet 2000;16:131–134

14. Jiang YH, Bressler J, Beaudet AL. Epigenetics and human disease. Annu Rev Genomics Hum Genet 2004;5:479–510

15. Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature 2001;409:860–921

16. Kruglyak L, Nickerson DA. Variation is the spice of life. Nat Genet 2001;27:234–236

17. Crawford DC, Akey DT, Nickerson DA. The patterns of natural variation in human genes. Annu Rev Genomics Hum Genet 2005;6:287–312

18. Tabor HK, Risch NJ, Myers RM. Opinion: candidate-gene approaches for studying complex genetic traits: practical considerations. Nat Rev Genet 2002;3:391–397

19. Glatt CE, DeYoung JA, Delgado S, et al. Screening a large reference sample to identify very low frequency sequence variants: comparisons between two genes. Nat Genet 2001;27:435–438

20. Waterston RH, Lindblad-Toh K, Birney E, et al. Initial sequencing and comparative analysis of the mouse genome. Nature 2002;420:520–562

21. Moens PB. The double-stranded DNA helix in recombination at meiosis. Genome 2003;46:936–937

22. International HapMap Consortium. A haplotype map of the human genome. Nature 2005;437:1299–1320

23. Stumpf MP. Haplotype diversity and the block structure of linkage disequilibrium. Trends Genet 2002;18:226–228

24. Gabriel SB, Schaffner SF, Nguyen H, et al. The structure of haplotype blocks in the human genome. Science 2002;296:2225–2229

25. Cardon LR, Abecasis GR. Using haplotype blocks to map human complex trait loci. Trends Genet 2003;19:135–140

26. Johnson GC, Esposito L, Barratt BJ, et al. Haplotype tagging for the identification of common disease genes. Nat Genet 2001;29:233–237

27. Zeggini E, Rayner W, Morris AP, et al. An evaluation of HapMap sample size and tagging SNP performance in large-scale empirical and simulated data sets. Nat Genet 2005;37:1320–1322

28. Carlson CS, Eberle MA, Kruglyak L, Nickerson DA. Mapping complex disease loci in whole-genome association studies. Nature 2004;429:446–452

29. Holtzman NA. Putting the search for genes in perspective. Int J Health Serv 2001;31:445–461

30. Botstein D, Risch N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat Genet 2003;33(suppl):228–237

31. Mayeux R. Mapping the new frontier: complex genetic disorders. J Clin Invest 2005;115:1404–1407

32. Majumder PP, Ghosh S. Mapping quantitative trait loci in humans: achievements and limitations. J Clin Invest 2005;115:1419–1424

33. Maraganore DM, de Andrade M, Lesnick TG, et al. High-resolution whole-genome association study of Parkinson disease. Am J Hum Genet 2005;77:685–693

34. Puppala S, Dodd GD, Fowler S, et al. A genomewide search finds major susceptibility loci for gallbladder disease on chromosome 1 in Mexican Americans. Am J Hum Genet 2006;78:377–392

35. Whitcomb DC, Aoun E, Vodovotz Y, et al. Evaluating disorders with a complex genetics basis: the future roles of meta-analysis and systems biology. Dig Dis Sci 2005;50:2195–2202

36. Palmer LJ, Cardon LR. Shaking the tree: mapping complex disease genes with linkage disequilibrium. Lancet 2005;366:1223–1234

37. Gordon D, Finch SJ. Factors affecting statistical power in the detection of genetic association. J Clin Invest 2005;115:1408–1418

38. Vineis P, McMichael AJ. Bias and confounding in molecular epidemiological studies: special considerations. Carcinogenesis 1998;19:2063–2067

39. Shephard N, John S, Cardon L, et al. Will the real disease gene please stand up? BMC Genet 2005;6(suppl 1):S66

40. Colhoun HM, McKeigue PM, Davey Smith G. Problems of reporting genetic associations with complex outcomes. Lancet 2003;361:865–872

41. Cardon LR, Bell JI. Association study designs for complex diseases. Nat Rev Genet 2001;2:91–99

42. Collins FS. The case for a US prospective cohort study of genes and environment. Nature 2004;429:475–477

43. Montague CT, Farooqi IS, Whitehead JP, et al. Congenital leptin deficiency is associated with severe early-onset obesity in humans. Nature 1997;387:903–908

44. Dean M, Carrington M, Winkler C, et al. Genetic restriction of HIV-1 infection and progression to AIDS by a deletion allele of the CKR5 structural gene. Hemophilia Growth and Development Study, Multicenter AIDS Cohort Study, Multicenter Hemophilia Cohort Study, San Francisco City Cohort, ALIVE Study. Science 1996;273:1856–1862

45. Dean M. Approaches to identify genes for complex human diseases: lessons from Mendelian disorders. Hum Mutat 2003;22:261–274

46. Laird PW. The power and the promise of DNA methylation markers. Nat Rev Cancer 2003;3:253–266

47. Rakyan VK, Hildmann T, Novik KL, et al. DNA methylation profiling of the human major histocompatibility complex: a pilot study for the human epigenome project. PLoS Biol 2004;2:e405

48. Knudson AG. Antioncogenes and human cancer. Proc Natl Acad Sci USA 1993;90:10914–10921

49. Randolph TR. Chronic myelocytic leukemia. Part II: approaches to and molecular monitoring of therapy. Clin Lab Sci 2005;18:49–56

50. Gonzalez E, Kulkarni H, Bolivar H, et al. The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science 2005;307:1434–1440

51. Kirkwood TB. Understanding the odd science of aging. Cell 2005;120:437–447

52. Suh Y, Vijg J. Maintaining genetic integrity in aging: a zero sum game. Antioxid Redox Signal 2006;8:559–571
