A discovery platform to support translational research on human
diseases
ECCB T7 tutorial September 4 2016
Janet Piñero and Laura I. Furlong
• How can DisGeNET help in your research?
• Overview of the DisGeNET Platform
• Hands-on Tutorial
• Web interface
• DisGeNET Cytoscape app
• DisGeNET RDF and SPARQL endpoint
• disgenet2r R package
DisGeNET ECCB 2016 Tutorial
Research questions
• What are the diseases associated with the gene SIRT1?
• What are the genes associated with Alzheimer's disease?
• What are the genes shared by comorbid diseases?
• What are the genetic variants associated with obesity?
• What are the druggable proteins associated with schizophrenia?
• Which pathways are perturbed in Lafora disease?
High-throughput genomic technologies are helping to find disease genes and pathogenic variants
• A typical whole exome sequencing experiment produces 30,000–100,000 variants relative to the reference genome
• Approximately 10,000 of these variants will have a consequence on protein function
• Only one or a few may be causative
Identification of the true pathogenic variants among all the variation is still a major challenge
The availability of comprehensive, traceable, high-quality data on genotype-phenotype relations is key
[Figure labels: phenotype, genotype]
DATA SILOS
What is the genetic basis of Wilson disease?
ATP7B
ATP7B
CP, ATP7B, PRNP, IL6, LOX, ANXA5, TNF, APOE
ATP7B, CP, ATP7A, COMMD1, ARSA, HFE, SLC31A1
• Data silos
• Different standards
• Large volume
Need for resources that gather, integrate and standardize information on the genetic basis of diseases
Information on the genetic basis of diseases
✓ Knowledge platform on human diseases and their genes
✓ Covers all disease therapeutic areas
✓ Integrates information from expert-curated resources and from the literature
✓ Focus on the gene-disease association (GDA) and its supporting evidence
✓ Standardization of the information and provenance
Bio-Entity Finder and Relation Extraction
Gene-disease associations
Biomedical databases Text mining
http://ibi.imim.es/befree/
DisGeNET: the implementation
Piñero et al., 2015, doi:10.1093/database/bav028
DisGeNET: data sources (DisGeNET v4.0)
Curated: UniProt, CTD, ClinVar, Orphanet, GWAS Catalog
Predicted: RGD, MGD, CTD
Literature: GAD, LHGDN, BeFree
DisGeNET: statistics (version 4.0)
Source       Genes    Diseases   Associations
Curated       7,362      7,607         32,834
Predicted     2,743      2,064         10,264
Literature   16,141     11,447        403,925
All          17,381     15,093        429,036
Last update: June 2016
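The share of associations contributed by each source type can be checked directly from the table. The sketch below uses only the counts shown above; reading the slide's ">94%" label as the literature share is an assumption, although the arithmetic is consistent with it.

```python
# Association counts copied from the DisGeNET v4.0 statistics table.
associations = {"Curated": 32_834, "Predicted": 10_264, "Literature": 403_925}

# The "All" row. It is smaller than the sum of the three rows,
# which suggests that some associations are reported by several source types.
total_associations = 429_036


def share(source: str) -> float:
    """Percentage of all associations reported by one source type."""
    return 100 * associations[source] / total_associations


literature_share = share("Literature")
print(f"Literature: {literature_share:.1f}% of all associations")
```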
>94%
What is Text Mining?
Text mining unlocks information by automatically extracting data from free-text resources
BioNER module
• Entity mention and normalization
• Fuzzy and pattern matching methods + dictionaries
• Diseases and genes
• Handles ambiguities between entities
Relation Extraction module
• Based on SVM
• Combines the Shallow Linguistic Kernel (KSL) with the Dependency Kernel (KDEP)
• Exploits shallow and deep syntactic information
http://ibi.imim.es/tools/befree/
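To make the idea of dictionary lookup with fuzzy matching and normalization concrete, here is a toy sketch. It is not BeFree's implementation: the mini-dictionary, the `find_entities` helper and the similarity cutoff are all hypothetical, and the matching relies on Python's standard `difflib` rather than on the methods BeFree actually uses.

```python
import difflib
import re

# Hypothetical mini-dictionary: lowercase surface form -> (entity type, normalized name).
DICTIONARY = {
    "atp7b": ("gene", "ATP7B"),
    "sirt1": ("gene", "SIRT1"),
    "wilson disease": ("disease", "Wilson disease"),
    "obesity": ("disease", "Obesity"),
}


def find_entities(text, cutoff=0.85):
    """Return (mention, type, normalized name) for dictionary terms found in text.

    Candidate mentions are windows of consecutive tokens, longest first, and a
    window matches when its similarity to a dictionary key is >= cutoff.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    keys = list(DICTIONARY)
    longest = max(len(key.split()) for key in keys)
    found, i = [], 0
    while i < len(tokens):
        step = 1
        for n in range(longest, 0, -1):
            if i + n > len(tokens):
                continue  # window would run past the end of the text
            mention = " ".join(tokens[i:i + n])
            close = difflib.get_close_matches(mention, keys, n=1, cutoff=cutoff)
            if close:
                kind, name = DICTIONARY[close[0]]
                found.append((mention, kind, name))
                step = n  # skip the tokens consumed by this mention
                break
        i += step
    return found


entities = find_entities("Mutations in ATP7B cause Wilsons disease.")
print(entities)
```

The misspelled mention "Wilsons disease" is still normalized to the dictionary entry, which is the point of combining dictionaries with fuzzy matching. Resolving ambiguities between entity types, as the BioNER module does, would need additional logic that this sketch omits.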
Gene-disease association identification with BeFree
Gene-disease association types according to the DisGeNET ontology
STANDARDS
Genotype
• Large in scale and growing rapidly (NGS)
• Large studies on the genetics of disease available
• HGVS standard for sequence variation nomenclature
• Standards for data exchange
• UniProt, NCBI, Ensembl
• VarioML, VariO
Phenotype
• Phenotype data spans a wide spectrum of possible observations about an individual
• More difficult to capture and to standardize
• Human Phenotype Ontology, Disease Ontology
• Broad phenotype categories used in many studies
Standards in DisGeNET
• Genes, proteins, SNPs
  • Official gene symbol
  • NCBI Gene ID
  • UniProt accession
  • dbSNP identifier for variants
• Diseases and phenotypes
  • UMLS CUIs
  • UMLS semantic types
  • Disease Ontology
  • Mappings to a variety of phenotype vocabularies and ontologies
DisGeNET association type ontology
http://sio.semanticscience.org
Coverage of disease vocabularies and ontologies in DisGeNET
UMLS   MeSH   OMIM   NCIt   DO   ORDO   ICD9CM   EFO   HPO   DECIPH
100    57     40     34     20   14     11       11    8     0.4
Signs, symptoms and diseases in DisGeNET
• Abnormal phenotypes, signs and symptoms: Inflammation, Seizures, Pain, Overweight
• Diseases: Breast carcinoma, Diabetes Mellitus
• Disease class: Cardiovascular Diseases, Autoimmune Diseases, Neurodegenerative Diseases
                Number of concepts   Number of associated genes   Number of associated SNPs
Disease                     13,674                       17,005                      44,467
Disease class                   55                        5,739                         992
Phenotype                    1,364                        9,332                       2,894
DATA PRIORITIZATION
DisGeNET gene-disease association score
Indicates the popularity of a gene-disease association across all data sources
DisGeNET score = S_CURATED + S_PREDICTED + S_LITERATURE
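As a minimal sketch of how the three components combine, the snippet below adds them up. The component values are hypothetical, and treating a source type without evidence as contributing 0 is an assumption; how each component is itself computed is not covered here.

```python
from dataclasses import dataclass


@dataclass
class ScoreComponents:
    """Per-source-type contributions to the score of one gene-disease association."""
    curated: float = 0.0     # S_CURATED (assumed 0 when no curated source reports it)
    predicted: float = 0.0   # S_PREDICTED
    literature: float = 0.0  # S_LITERATURE

    def total(self) -> float:
        # DisGeNET score = S_CURATED + S_PREDICTED + S_LITERATURE
        return self.curated + self.predicted + self.literature


# Hypothetical association supported by curated and literature sources only.
example = ScoreComponents(curated=0.3, literature=0.05)
print(round(example.total(), 3))
```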
Disease Specificity Index (DSI)
✓ Indicates how specific a gene is with respect to diseases
✓ Is inversely proportional to the number of diseases associated with a particular gene (ranges from 0 to 1)
✓ A gene associated with a large number of diseases, such as TNF (associated with >1,500 diseases), is less "specific" for any disease and has a small DSI value (0.247)
✓ A gene associated with only one disease is more "specific" for that disease and has a DSI of 1
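The properties listed above (a value of 1 for a single-disease gene, decreasing values as the number of associated diseases grows, range 0 to 1) can be illustrated with a simple index. The log-ratio form below is an assumption chosen because it satisfies those properties; it is not claimed to be the formula DisGeNET uses, and it is not expected to reproduce the DSI values quoted on the slides. The total of 15,093 diseases is taken from the "All" row of the version 4.0 statistics.

```python
import math


def specificity_index(n_gene_diseases: int, n_total_diseases: int) -> float:
    """Illustrative specificity index in [0, 1].

    Equals 1 for a gene linked to a single disease and 0 for a gene
    linked to every disease in the database.
    """
    if not 1 <= n_gene_diseases <= n_total_diseases:
        raise ValueError("expected 1 <= n_gene_diseases <= n_total_diseases")
    if n_total_diseases == 1:
        return 1.0
    return 1.0 - math.log(n_gene_diseases) / math.log(n_total_diseases)


TOTAL_DISEASES = 15_093  # "All" row of the statistics table

for gene, n_diseases in [("NDUFB7", 1), ("ATP7B", 57), ("TNF", 1524)]:
    print(gene, round(specificity_index(n_diseases, TOTAL_DISEASES), 3))
```

The ordering matches the slide's examples: a gene with one disease scores 1, and TNF, with more than 1,500 diseases, scores lowest.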
Top scored genes for Wilson disease
Gene     Number of diseases   DisGeNET score   DSI     Number of PMIDs   Number of SNPs
ATP7B                    57            0.819   0.596               234               99
ANXA5                   129            0.2     0.505                 1                0
PRNP                    205            0.128   0.468                 4                1
CP                      114            0.126   0.532                26                0
LOX                     141            0.123   0.498                 2                0
LOXL2                    48            0.123   0.610                 1                0
APOE                    729            0.122   0.333                 2                0
TNF                    1524            0.120   0.247                 2                0
IL6                    1260            0.120   0.268                 2                0
NDUFB7                    1            0.120   1                     1                0
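The score, the DSI and the number of supporting publications can be combined to shortlist candidates. The sketch below uses the Wilson disease rows above; the thresholds (DSI of at least 0.5, at least two PMIDs) are arbitrary choices for illustration, not DisGeNET recommendations.

```python
# (gene, number of diseases, DisGeNET score, DSI, number of PMIDs, number of SNPs)
# Values copied from the "Top scored genes for Wilson disease" table.
wilson_genes = [
    ("ATP7B", 57, 0.819, 0.596, 234, 99),
    ("ANXA5", 129, 0.2, 0.505, 1, 0),
    ("PRNP", 205, 0.128, 0.468, 4, 1),
    ("CP", 114, 0.126, 0.532, 26, 0),
    ("LOX", 141, 0.123, 0.498, 2, 0),
    ("LOXL2", 48, 0.123, 0.610, 1, 0),
    ("APOE", 729, 0.122, 0.333, 2, 0),
    ("TNF", 1524, 0.120, 0.247, 2, 0),
    ("IL6", 1260, 0.120, 0.268, 2, 0),
    ("NDUFB7", 1, 0.120, 1.0, 1, 0),
]


def shortlist(rows, min_dsi=0.5, min_pmids=2):
    """Keep reasonably specific, multiply-reported genes, best score first."""
    kept = [row for row in rows if row[3] >= min_dsi and row[4] >= min_pmids]
    return [row[0] for row in sorted(kept, key=lambda row: row[2], reverse=True)]


candidates = shortlist(wilson_genes)
print(candidates)
```

With these thresholds, promiscuous genes such as TNF, IL6 and APOE drop out on DSI, and genes backed by a single publication drop out on the PMID count.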
Top scored genes for Major Depressive Disorder
Gene     Number of diseases   DisGeNET score   DSI      Number of PMIDs   Number of SNPs
SLC6A4                  374            0.236   0.411                157                5
TPH2                     89            0.211   0.548                 26                1
HTR2A                   222            0.155   0.463                 45               17
PCLO                     20            0.130   0.696                 12                5
CRHR1                   118            0.127   0.531                 11               11
CYP2D6                  316            0.127   0.4281                11                2
FKBP5                    78            0.126   0.563                 16                1
SP4                      16            0.125   0.739                  3                1
GRM7                     32            0.123   0.666                  5                1
GNAI3                     7            0.122   0.812                  2                1
FLEXIBLE DATA ACCESS
DisGeNET Cytoscape app
• Network representation of gene-disease associations and projections
• Downstream analysis with a variety of network analysis and annotation tools available in Cytoscape
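A projection collapses the bipartite gene-disease network onto one node type: two diseases become linked when they share at least one gene. The following sketch shows the idea in plain Python, outside Cytoscape. The Wilson disease pairs are taken from the tables in this tutorial, while "Disease X" and its genes are placeholders, and `disease_projection` is a hypothetical helper, not part of the app.

```python
from collections import defaultdict
from itertools import combinations

# Gene-disease pairs. "Disease X" is a placeholder used only for illustration.
gene_disease_pairs = [
    ("ATP7B", "Wilson disease"),
    ("CP", "Wilson disease"),
    ("TNF", "Wilson disease"),
    ("IL6", "Wilson disease"),
    ("TNF", "Disease X"),
    ("IL6", "Disease X"),
]


def disease_projection(pairs):
    """Map each pair of diseases to the set of genes they share."""
    genes_by_disease = defaultdict(set)
    for gene, disease in pairs:
        genes_by_disease[disease].add(gene)

    edges = {}
    for first, second in combinations(sorted(genes_by_disease), 2):
        shared = genes_by_disease[first] & genes_by_disease[second]
        if shared:  # diseases with no gene in common get no edge
            edges[(first, second)] = shared
    return edges


projection = disease_projection(gene_disease_pairs)
print(projection)
```

The gene projection works the same way with the roles swapped: two genes are linked when they are associated with a common disease.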
DisGeNET as Linked Open Data
✓ What are the perturbed pathways in Lafora disease?
✓ What proteins associated with Aarskog syndrome are potential drug targets?
✓ Which genes differentially expressed in beta cells are associated with pancreatic cancer?
DisGeNET as Linked Open Data
• RDF and nanopublications
• URIs: RDF providers or
• SIO
• Use of standards (11 ontologies in NCBO)
• Metadata description (W3C HCLS)
• Interlinking
  • Bio2RDF
  • Linked Life Data
• Access
  • Download data dump
  • SPARQL endpoint
  • Faceted browser
  • Open PHACTS
  • Nanopublication network
  • disgenet2r
• Open license
• FAIR (ELIXIR and NIH)
• Datahub
• Software
http://lod-cloud.net/; Aug 2014
Semantic Web – Linked Data
Based on W3C standards
RDF: Resource Description Framework
• Captures the logical structure of the data
• Graph representation
SPARQL: RDF query language
Usual Web vs Semantic Web
Usual Web             Semantic Web
Web site              Dataset
Page/URL              Resource/URI
Document, textual     Formal description
HTML: presentation    RDF: semantic
Human readable        Machine readable
SPARQL Query Structure
# prefix declarations
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# result clause
SELECT/CONSTRUCT/ASK/DESCRIBE ..OUTPUT..
# dataset definition
FROM <DATASETGRAPH>
# query pattern
WHERE { graph pattern }
# query modifiers
ORDER BY …
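The parts of a query can be assembled programmatically. The sketch below builds a SELECT query as a string in the order the SPARQL grammar expects (prefix declarations, result clause, optional dataset, query pattern, modifiers). `build_select_query` is a hypothetical helper, and the predicate `sio:SIO_000628` used to link an association to a gene and a disease is an assumption about the DisGeNET RDF schema that should be checked against its documentation. No endpoint is contacted.

```python
def build_select_query(prefixes, variables, patterns, graph=None, order_by=None):
    """Assemble a SPARQL SELECT query string from its structural parts."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]  # prefix declarations
    lines.append("SELECT " + " ".join(f"?{name}" for name in variables))   # result clause
    if graph:
        lines.append(f"FROM <{graph}>")                                    # dataset definition
    lines.append("WHERE {")                                                # query pattern
    lines.extend(f"  {pattern} ." for pattern in patterns)
    lines.append("}")
    if order_by:
        lines.append(f"ORDER BY ?{order_by}")                              # query modifier
    return "\n".join(lines)


query = build_select_query(
    prefixes={"sio": "http://semanticscience.org/resource/"},
    variables=["gda", "disease"],
    patterns=[
        # Assumed linking predicate; the gene URI is the RB1 example from the slides.
        "?gda sio:SIO_000628 <http://identifiers.org/hgnc.symbol/RB1>",
        "?gda sio:SIO_000628 ?disease",
    ],
    order_by="disease",
)
print(query)
```

The resulting string can be pasted into a SPARQL endpoint's query form or sent with any HTTP client.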
In RDF, a statement is a triple
Subject (S)   Predicate (P)   Object (O)
Gene          associated      Disease
A statement in a publication: "RB is overexpressed in bladder cancer samples as measured by…."
The same statement as a triple:
RB1           AlteredExpression   Carcinoma of bladder
Association: http://rdf.disgenet.org/resource/gda/DGN1234
Gene: http://identifiers.org/hgnc.symbol/RB1
Disease: http://linkedlifedata.com/resource/umls/id/C0699885
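A triple is small enough to model as a three-field record. The sketch below writes the RB1 statement as one line of N-Triples, using the gene and disease URIs from the slide. The predicate URI is a hypothetical placeholder under `example.org`, since the slides give the association type only by name.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement whose three terms are all URIs."""
    subject: str
    predicate: str
    obj: str

    def to_ntriples(self) -> str:
        # N-Triples: each URI in angle brackets, statement terminated by " ."
        return f"<{self.subject}> <{self.predicate}> <{self.obj}> ."


rb1_statement = Triple(
    subject="http://identifiers.org/hgnc.symbol/RB1",
    predicate="http://example.org/alteredExpression",  # hypothetical predicate URI
    obj="http://linkedlifedata.com/resource/umls/id/C0699885",
)
print(rb1_statement.to_ntriples())
```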
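Data Model
• How to describe an association?
a) As a property
Gene   associated   Disease
S      P            O
b) As a class
Gene   Association   Disease
O  P   S             P  O
(the Association is the subject of two triples: one pointing to the Gene and one to the Disease)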
Provenance and Evidence RDF triples
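Modelling the association as a class turns it into a node of its own, so provenance and evidence can be attached to it with further triples. The sketch below generates such a set of triples for the RB1 example. Only the `rdf:type` URI and the gene, disease and association URIs come from standards or from the slides; the `EX` namespace, the predicate names and the source label are hypothetical placeholders, not the DisGeNET schema.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for illustrative terms


def association_triples(association, gene, disease, association_type, source):
    """Describe a gene-disease association as a node with its own statements."""
    return [
        (association, RDF_TYPE, association_type),  # what kind of association it is
        (association, EX + "refersTo", gene),       # link from the association to the gene
        (association, EX + "refersTo", disease),    # link from the association to the disease
        (association, EX + "hasSource", source),    # provenance: where the evidence comes from
    ]


triples = association_triples(
    association="http://rdf.disgenet.org/resource/gda/DGN1234",
    gene="http://identifiers.org/hgnc.symbol/RB1",
    disease="http://linkedlifedata.com/resource/umls/id/C0699885",
    association_type=EX + "AlteredExpression",
    source=EX + "someSource",
)
for subject, predicate, obj in triples:
    print(f"<{subject}> <{predicate}> <{obj}> .")
```

With the property-based model, the provenance statement would have nothing to attach to, because a single gene-predicate-disease triple has no identifier of its own.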
Data Model
• Ontology-based integration
• DisGeNET standards
• Shared IDs
• Standard ontologies
Gene   Association   Disease
O  P   S             P  O
http://semanticscience.org/ontology/sio.owl
DisGeNET Association Type Ontology
rdf:type
http://rdf.disgenet.org/download/4.0.0/DisGeNET-RDF-Example.ttl (Turtle)
RDF data model
DisGeNET: the data model
• R package
• To interrogate DisGeNET data
• To cross DisGeNET data with other resources
• To visualize the results within the powerful R framework
• To engage with the R/Bioconductor community
• Launched with the release of DisGeNET v4.0 (April 2016)
http://www.disgenet.org/   [email protected]   @DisGeNET
IBI Group http://ibi.imim.es/
Alba Gutiérrez-Sacristán, Àlex Bravo, Janet Piñero, Alexia Giannoula, Miguel A. Mayer, Angela Leis, Santiago de la Peña, Emilio Centeno, Laura I. Furlong, Ferran Sanz
Past Members
Núria Queralt-Rosinach, Montserrat Cases, Solène Grosdidier, Pablo Carbonell, Anna Bauer-Mehren, Michael Rautschka