Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)

Yu Lin, Norihiro Sakamoto

Department of Sociomedical Informatics, Graduate School of Medicine, Kobe University

2008/02 InterOntology08

Agenda

• What are Genetic Susceptibility Factors (GSF)?
• How do we confirm genetic susceptibility?
• Why do we need an ontology?
• The Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)
  • Methodology
  • Testing
• Discussion

Search “Genetic Susceptibility” in UMLS

Scope of “GSF to Diabetes Mellitus”

Those genetic characteristics, and interactions between genetic and environmental factors, which increase the probability of developing diabetes mellitus (DM):

• polymorphism
• linked loci
• SNP
• haplotype
• genotype

If they “decrease” the probability instead, then the factor is one of “resistance”.

Mendelian Disease vs. Complex Disease

Ref: [Rioux JD, Abbas AK.] Paths to understanding the genetic basis of autoimmune disease. Nature. 2005 Jun 2;435(7042):584-9. Review.

How to confirm the GSF

• Through a combined family-based linkage study and population-based association study
• Through a genetic (gene-by-gene, function-candidate) association approach combined with a genome-wide association approach
• Through a statistical study combined with a biological function study

Factors Affecting Statistical Power of Confirming GSF

• Number of disease variants
• Allele frequencies among population
• Effect size on disease phenotype
  • Odds Ratio (OR)
• Population structure and geography
• Selection bias
• Genotype and phenotype misclassification errors

Ref: [ Wang WYS, et al.] Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 2005, 6:109-118.
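
The effect size named above, the odds ratio, can be made concrete with a short calculation. The following is a minimal sketch, assuming a simple 2x2 case-control table and a Wald (Woolf) confidence interval; the function name and the counts are hypothetical and are not taken from any study cited in this talk.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: cases carrying the variant      b: cases without it
    c: controls carrying the variant   d: controls without it
    z: normal quantile (1.96 gives a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR), Woolf's method
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only for illustration
or_, lo, hi = odds_ratio_with_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

With these small hypothetical groups the interval still includes 1 even though the point estimate is above 2, which illustrates why sample size and allele frequency matter for statistical power.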

No Criteria Established

There are no established criteria for confirming GSF (Genetic Susceptibility Factors):

• OR 1.5-2.0?
• sample size
• population

Can we settle this?

A Knowledge Base is Needed

• The primary idea is to catalog all GSF to Diabetes Mellitus (DM)
• The reality of research on GSF to DM:
  • Different levels of genetic object
  • Different types of study design
  • Inconsistent results
  • Complex phenotypes of DM
• Versatile datasets demand a knowledge base on this topic

Ontology in General

• Originally from philosophy
• An ontology is “specification of a shared conceptualization” [Gruber T.]
• Ontology as an approach to “annotation of multiple bodies of data” [Smith B. et al]
• Widely used in computer science and information science
  • artificial intelligence
  • the Semantic Web
  • software engineering
  • biomedical informatics: “Gene Ontology as a successful example”
  • library science
  • information architecture as a form of knowledge representation

Ref: http://en.wikipedia.org/wiki/Ontology_%28computer_science%29

Ontology is a Good Tool

In our case, ontology can help with:

• Knowledge representation
• Database design
• Content-oriented analysis
• Information retrieval and extraction
• Information integration

By setting rules, can we establish criteria to demonstrate either genetic susceptibility or causality to complex disease?

Agenda

• What are the Genetic Susceptibility Factors (GSF)
• How do we confirm genetic susceptibility
• Why do we need an ontology
• The Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)
  • Methodology
  • Testing
• Discussion

The Methodology of OGSF-DM

• Specification: specify the domain and scope
• Conceptualization: build the conceptual model
• Integration: reuse and import other ontologies
• Implementation, evaluation: Protégé 3.3.1, OWL, SWRL rules

Step 1. Specification

• Domain: Represent the knowledge of GSF to DM and related phenotypes
• Explore relevant literature resources:
  • PubMed: a corpus of 5873 abstracts (as of 31 Oct. 2007)
  • Books:
    • Joslin’s Diabetes Mellitus
    • Human Molecular Genetics 3
• The most fundamental terms:
  i) Human disease: diabetes mellitus and related disorders;
  ii) Phenotypes and observed quantity parameters;
  iii) Genetic concepts;
  iv) Geographical regions;
  v) Disease gene study of the original paper.

Step 2. Conceptualization

• The core conception was generated by analyzing the titles of the corpus
• The conception shows an N-ary relationship

The top-level of OGSF-DM

Adopted terms from BFO (Basic Formal Ontology): Continuant, Occurrent, Independent_Continuant, Dependent_Continuant, Quality

The position of core concepts

CLASS: Observed_Relationship

• Class hierarchy
• Constraints of class

The term ‘Allele’ is polysemous

Genetics definition: an allele is either one of a pair (or series) of alternative forms of a gene that can occupy the same locus on a particular chromosome, and that control the same character of the phenotype. (http://www.thefreedictionary.com/allele)

“Allele” appeared in different resources:

Meaning of Allele | Appeared Form | Resource
the variant of gene in an individual | disease “allele” | original paper
representation of SNP | “allele/allele” in DNA, RNA and amino acid level | HGVBase
allele sharing in sibs | IBS, IBD “allele” | linkage study

Allele CLASS in OGSF-DM

• An abstraction
• Currently, it satisfies the data model
• Needs to be refined in the future

Gene concept has evolved

• 1860s-1900s: Gene as a discrete unit of heredity
• 1910s: Gene as a distinct locus
• 1940s: Gene as a blueprint for a protein
• 1950s: Gene as a physical molecule
• 1960s: Gene as transcribed code
• 1970s-1980s: Gene as ORF sequence pattern
• 1990s-2000s: Gene as annotated genomic entity
• 2007-: Gene as …

Ref: [Gerstein MB, et al.] What is a gene, post-ENCODE? History and updated definition. Genome Research. 2007 Jun;17(6):669-81.

Some definitions of ‘gene’

Human Genome Nomenclature Organization: “a DNA segment that contributes to phenotype/function. In the absence of demonstrated function a gene may be characterized by sequence, transcription or homology” (Wain et al. 2002)

Rat Genome Database: “the DNA sequence necessary and sufficient to express the complete complement of functional products derived from a unit of transcription” (2003)

Sequence Ontology Consortium: “locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions” (Pearson 2006).

ENCODE project Consortium: “The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.” (Gerstein et al. 2007)

MeSH: genes are “Specific sequences of nucleotides along a molecule of DNA (or, in the case of some viruses, RNA) which represent functional units of HEREDITY. Most eukaryotic genes contain a set of coding regions (EXONS) that are spliced together in the transcript, after removal of intervening sequence (INTRONS) and are therefore labeled split genes.”

Gene CLASS in OGSF-DM

• A placeholder
• The instance of Gene is the name of the gene which appears in the research paper

Step 3. Integration

Importing two ontologies:

• ontology of glucose metabolism disorders
  • A slim OBO file was extracted from the Human Disease Ontology
  • The OBO file was converted to an OWL file
  • The class hierarchy was restructured and new terms from “Joslin’s Diabetes Mellitus” were added
• ontology of geographical regions
  • Generated by hand, adopting the terms from MeSH 2008 “Geographic Locations [Z01]”
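
The OBO-to-OWL step above can be pictured as reading `[Term]` stanzas and turning each `is_a` line into a subclass axiom. The following is only a minimal sketch of that idea; the talk does not say which conversion tool was used, the function names are illustrative, and the `EX:` identifiers are hypothetical placeholders rather than real Human Disease Ontology IDs.

```python
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0001
name: glucose metabolism disorder

[Term]
id: EX:0002
name: diabetes mellitus
is_a: EX:0001 ! glucose metabolism disorder
"""


def parse_obo_terms(text):
    """Parse the [Term] stanzas of an OBO document into a dict keyed by term id."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected
            current = {"is_a": []} if line == "[Term]" else None
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            value = value.split(" ! ", 1)[0]  # drop the trailing OBO comment
            if tag == "is_a":
                current["is_a"].append(value)
            else:
                current[tag] = value
            if tag == "id":
                terms[value] = current
    return terms


def subclass_axioms(terms):
    """Return (child name, parent name) pairs, i.e. candidate rdfs:subClassOf axioms."""
    return [
        (term["name"], terms[parent]["name"])
        for term in terms.values()
        for parent in term["is_a"]
        if parent in terms
    ]


print(subclass_axioms(parse_obo_terms(SAMPLE_OBO)))
```

A real conversion would also carry over synonyms, definitions and cross-references; the sketch keeps only what is needed to rebuild the class hierarchy.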

Step 4. Implementation and Evaluation

Protégé 3.3.1 + OWL

SWRL rule example: hasPopulation-1 Rule

isObservedIn(?x, ?y) ∧ hasStudyPopulation(?y, ?z) → hasPopulation(?x, ?z)

This rule infers the population (z) of the Observed_Relationship (x); y is a Disease_Gene_Study.
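
To show what the hasPopulation-1 rule derives, here is a minimal sketch that applies it to plain subject-predicate-object triples in Python. This is not how Protégé or a SWRL engine evaluates rules; the function name is hypothetical, and the individual names are borrowed from the example article used in this talk.

```python
def apply_has_population_rule(triples):
    """isObservedIn(x, y) AND hasStudyPopulation(y, z) -> hasPopulation(x, z).

    Returns only the newly inferred triples.
    """
    observed_in = [(s, o) for s, p, o in triples if p == "isObservedIn"]
    study_population = [(s, o) for s, p, o in triples if p == "hasStudyPopulation"]
    inferred = {
        (x, "hasPopulation", z)
        for x, y in observed_in
        for study, z in study_population
        if y == study  # the two atoms must share the same study ?y
    }
    return inferred - set(triples)


facts = {
    ("associated_with_1", "isObservedIn", "Disease_Gene_Study_15047632"),
    ("Disease_Gene_Study_15047632", "hasStudyPopulation", "an_ashkenazi_jewish_population"),
}
new_facts = apply_has_population_rule(facts)
print(new_facts)
```

The inferred triple links the observed relationship directly to its population, so a query on hasPopulation does not need to traverse the study individual.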

The example article

Full text URL: http://diabetes.diabetesjournals.org/cgi/content/full/53/4/1134

Asserting individual 1)

1) associated_with_1 ⊆ Not_Stated_Resistance_or_Susceptibility_Association
   ⋂ ∀ hasSupportingEvidence ∋ ( {odds_ratio_OR_1.49} )
   ⋂ ∃ isObservedIn ∋ ( {Disease_Gene_Study_15047632} )
   ⋂ ∃ isObservedRelationshipOf ∋ ( {a_3_intronic_SNP_rs3818247} )
   ⋂ ∃ isRelationshipWith ∋ ( {Type_2_Diabetes} )

This means that a 3’ intronic SNP, rs3818247, is associated with Type 2 Diabetes, with supporting evidence of OR 1.49. The relationship is an association, but the study does not state whether it is a susceptibility or a resistance relationship.

Asserting individual 2), 3), 4)

2) odds_ratio_OR_1.49 ⊆ Odds_Ratio
   ⋂ ∀ hasOR ∋ ( {1.49} )
   ⋂ ∀ hasCI95 ∋ ( {1.15-1.90} )
   ⋂ ∃ hasP ∋ ( {Corrected_P_0.0252} ⋂ {Uncorrected_P_0.0028} )
   ⋂ ∃ hasClassifiedGroup ∋ ( {Control_Group_1} ⋂ {Case_Group_1} )
3) Control_Group_1 ⊆ Classified_Group
   ⋂ ∃ hasPopulationSize ∋ ( {342 int} )
   ⋂ ∀ isPartOf ∋ ( {an_ashkenazi_jewish_population} )
4) Case_Group_1 ⊆ Classified_Group
   ⋂ ∃ hasPopulationSize ∋ ( {275 int} )
   ⋂ ∀ isPartOf ∋ ( {an_ashkenazi_jewish_population} )

2), 3) and 4) together mean that the study was a case-control study (case size = 275, control size = 342) in an Ashkenazi Jewish population.

Result: Odds Ratio 1.49 (95% CI: 1.15-1.90, corrected P = 0.0252, uncorrected P = 0.0028).
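
When values such as hasOR and hasCI95 are entered by hand, a quick plausibility check can catch transcription errors. The following is a minimal sketch, assuming the reported interval is a Wald interval that is symmetric around ln(OR); the example article may have used a different method, so a small mismatch would not by itself indicate an error. Variable names are illustrative.

```python
import math

# Values as recorded in individual 2) above
reported_or = 1.49
ci_lower, ci_upper = 1.15, 1.90

# For a Wald interval, the geometric mean of the bounds equals the point estimate
geometric_midpoint = math.sqrt(ci_lower * ci_upper)

# The interval width on the log scale gives the implied standard error of ln(OR)
se_log_or = (math.log(ci_upper) - math.log(ci_lower)) / (2 * 1.96)

print(f"geometric midpoint of CI = {geometric_midpoint:.3f}")
print(f"implied SE of ln(OR)     = {se_log_or:.3f}")
```

Here the midpoint falls within rounding distance of the reported odds ratio, so the three recorded numbers are mutually consistent under that assumption.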

Asserting individual 5), 6)

5) Disease_Gene_Study_15047632 ⊆ Disease_Gene_Study
   ⋂ ∀ hasPubMedID ∋ ( {PMID_15047632} )
   ⋂ ∃ hasStudyPopulation ∋ ( {an_ashkenazi_jewish_population} )
   ⋂ ∀ hasURI ∋ ( {http://diabetes.diabetesjournals.org/cgi/content/full/53/4/1134} )
6) an_ashkenazi_jewish_population ⊆ Population_Group
   ⋂ ∃ hasPopulationCharacteristic ∋ ( {Jews} )
   ⋂ ∃ hasGeographicalSite ∋ ( {Israel} ⋂ {U.S.} )

5) and 6) mean:

① An Ashkenazi Jewish population was investigated in this study;
② The population belongs to the Jewish ethnic group and is located in Israel and the U.S.;
③ The PubMed ID and URL of this paper were collected.

The core conception

Putting 1)-5) together, the core conception of this one relationship is built:

The relationship {associated} between {a_3_intronic_SNP_rs3818247} and {Type_2_Diabetes} was observed in {an_ashkenazi_jewish_population} in the study {PMID_15047632}.
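
Because OWL properties are binary, an N-ary relationship like this one is usually reified: the relationship itself becomes an individual that points to each participant. The following is a minimal sketch of that pattern as a Python data class; the class and field names are illustrative and do not reproduce the OGSF-DM property names, while the values are the individuals asserted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedRelationship:
    """One reified N-ary relationship; field names are illustrative only."""

    relationship_type: str
    genetic_factor: str
    disease: str
    population: str
    study: str
    supporting_evidence: str

    def describe(self):
        return (
            f"{self.relationship_type} relationship between {self.genetic_factor} "
            f"and {self.disease}, observed in {self.population} "
            f"(study {self.study}, evidence {self.supporting_evidence})"
        )


associated_with_1 = ObservedRelationship(
    relationship_type="associated",
    genetic_factor="a_3_intronic_SNP_rs3818247",
    disease="Type_2_Diabetes",
    population="an_ashkenazi_jewish_population",
    study="PMID_15047632",
    supporting_evidence="odds_ratio_OR_1.49",
)
print(associated_with_1.describe())
```

Keeping the study and the evidence on the relationship, rather than on the SNP or the disease, lets conflicting results from different studies coexist as separate individuals.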

Representation of a SNP

a_3_intronic_SNP_rs3818247 ⊆ htSNP
   ⋂ ∃ hasAlleleComponent ∋ ( {DNA_Level_Allele_T} ⋂ {DNA_Level_Allele_G} )
   ⋂ ∃ hasGenomeSite ∋ ( {flanking_3_intronic} )
   ⋂ ∃ isGeneticVariantOf ∋ ( {hepatocyte_nuclear_factor-4_alpha} )
   ⋂ ∃ hasVariantDatabase ∋ ( {HGVBase_SNP002310533} ⋂ {dbSNP_rs3818247} )

This means that the 3’ intronic SNP rs3818247 is an htSNP of hepatocyte nuclear factor 4 alpha, located in the flanking 3’ intronic sequence of the gene. The alleles of this SNP are T/G at the DNA level.

Reference database entries:
1) HGVBase: “SNP002310533”
2) dbSNP: “rs3818247”

Discussion

• A hybrid of middle-out and top-down approaches was used to build our ontology.
• BFO is important for harmonizing the domain ontologies in our case.
• The ontology can be applied to other complex diseases too.
• We anticipate further applications of this ontology:
  • Information retrieval
  • Knowledge base development
  • Establishing logic rules
  • Mapping or linking to other ontologies, such as GO, Mammalian Phenotype, and so on.