Advancing Discovery Science
Predictive, Evidential and Meta-Analytical Methods
Michel Dumontier, Associate Professor of Medicine
Stanford Center for Biomedical Informatics Research, Stanford University
AAAI Fall Symposium, Accelerating Science: A Grand Challenge for AI
November 18, 2016
New discoveries are being made by “research parasites” using other people’s data
A common rejection module (CRM) for acute rejection across multiple organs identifies novel therapeutics for organ transplantation. Khatri et al. JEM. 210 (11): 2205. DOI: 10.1084/jem.20122709
Development of an intelligent system for scientific inquiry using the totality of web-accessible data and services.
Challenge
Efficient and uniform access to distributed, versioned and self-describing data and services for reproducible analyses
Vast numbers of schemas and terminologies make for a confusing set of choices
metadatacenter.org
NIH Commons
Making it Easier, Possibly Even Pleasant, to Author Interoperable Experimental Metadata
Challenge
Data will always be described using different schemas and vocabularies.
Does that still matter? Can we automate the integration of data?
New Methods for Data Integration
• Many elegant solutions for entity or concept mappings, but these only offer an incomplete solution when combined with schemas
• Need to learn robust transformation patterns
  – Subsumption, Similarity, Analogy, ML, Probability
• Evaluate these in the context of use cases
  – Query answering
  – Data mining
  – Prediction
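As a rough illustration of the "Similarity" pattern listed above, the sketch below proposes field-to-field mappings between two schemas by string similarity alone. The field names, the 0.5 threshold, and the helper functions are all hypothetical and chosen only for the example; this is not a method taken from the talk, and a realistic system would combine it with the other patterns (subsumption, analogy, learned models).

```python
from difflib import SequenceMatcher

# Hypothetical field names from two schemas describing the same kind of record.
SCHEMA_A = ["patient_id", "gene_symbol", "expression_level", "tissue"]
SCHEMA_B = ["PatientIdentifier", "GeneSymbol", "ExprLevel", "SampleTissue"]


def normalize(name):
    """Lowercase and drop non-alphanumerics so naming style does not dominate."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def similarity(a, b):
    """Character-level similarity in [0, 1] between two normalized field names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def propose_mappings(source, target, threshold=0.5):
    """Map each source field to its most similar target field.

    Fields whose best score falls below the (assumed) threshold are left
    unmapped, since a weak string match is not evidence of equivalence.
    """
    mappings = {}
    for s in source:
        best = max(target, key=lambda t: similarity(s, t))
        score = similarity(s, best)
        if score >= threshold:
            mappings[s] = (best, round(score, 2))
    return mappings


if __name__ == "__main__":
    for field, (match, score) in propose_mappings(SCHEMA_A, SCHEMA_B).items():
        print(f"{field} -> {match} (similarity {score})")
```

String similarity only captures naming conventions. It says nothing about whether two fields share units, value ranges, or meaning, which is why the slide calls entity and concept mappings an incomplete solution.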
"Most published research findings are false" - John Ioannidis, Stanford University
Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124.
The Problem of Reproducibility in Scientific Research
• Non-reproducibility rates of 65–89% in pharmacological studies and 64% in psychological studies.
• Problem of multiple testing in high-dimensional experiments. For gene expression analyses, 26 of 36 (72%) genomic associations initially reported as significant were found to be over-estimates of the true effect when tested in other datasets.
• Analytic focus has been on significance (P) values rather than effect size or independent verification.
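To make the multiple-testing point concrete, the sketch below compares an uncorrected P < 0.05 cutoff with two standard corrections, Bonferroni and Benjamini-Hochberg. The ten P values are made up for illustration and do not come from any study cited in these slides; the function names are likewise just for this example.

```python
def uncorrected(p_values, alpha=0.05):
    """Call each test significant on its own, ignoring how many tests were run."""
    return [p <= alpha for p in p_values]


def bonferroni(p_values, alpha=0.05):
    """Control the family-wise error rate by testing each P value at alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Control the false discovery rate with the Benjamini-Hochberg step-up rule.

    Sort the P values, find the largest rank k with p_(k) <= (k / m) * alpha,
    and reject the k smallest P values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical P values from ten tests run in the same experiment.
P_VALUES = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

if __name__ == "__main__":
    print("uncorrected:       ", sum(uncorrected(P_VALUES)), "significant")
    print("Bonferroni:        ", sum(bonferroni(P_VALUES)), "significant")
    print("Benjamini-Hochberg:", sum(benjamini_hochberg(P_VALUES)), "significant")
```

On these made-up values the uncorrected cutoff reports five findings, Benjamini-Hochberg two, and Bonferroni one. Neither correction addresses the slide's other concern: a small P value does not by itself indicate a large effect or an independently verified one.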
Support and Gap Analysis using Open Data and Open Services
• HyQue is a platform for knowledge discovery that uses data retrieval coupled with contradiction-based automated reasoning to validate scientific hypotheses
• Leverages semantic technologies to provide access to linked data, ontologies, and semantic web services
• Uses positive and negative findings, captures provenance
• Weighs evidence according to context
• Used to find aging genes in worm, assess cardiotoxicity of tyrosine kinase inhibitors
HyQue: evaluating hypotheses using Semantic Web technologies. J Biomed Semantics. 2011 May 17;2 Suppl 2:S3.
Evaluating scientific hypotheses using the SPARQL Inferencing Notation. Extended Semantic Web Conference (ESWC 2012). Heraklion, Crete. May 27-31, 2012.
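The idea of weighing positive and negative findings while keeping their provenance can be sketched in a few lines. This is a toy illustration only, not HyQue's actual scoring rules (those are described in the papers cited above). The `Evidence` class, the weights, the scoring formula, and the source labels are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    statement: str   # what was observed
    supports: bool   # True for a positive finding, False for a negative one
    weight: float    # assumed context-dependent weight in [0, 1]
    source: str      # provenance: where the finding came from


def score_hypothesis(evidence):
    """Return (score, provenance) for a list of Evidence items.

    The score is the weighted balance of supporting versus contradicting
    findings, scaled to [-1, 1]. With no weighted evidence the score is 0.0.
    """
    total = sum(e.weight for e in evidence)
    provenance = [(e.source, "+" if e.supports else "-", e.weight) for e in evidence]
    if total == 0:
        return 0.0, provenance
    signed = sum(e.weight if e.supports else -e.weight for e in evidence)
    return signed / total, provenance


# Hypothetical evidence for a hypothesis of the form "gene G is involved in aging".
EVIDENCE = [
    Evidence("differential expression observed", True, 1.0, "expression-dataset-A"),
    Evidence("annotated with a related ontology term", True, 0.5, "annotation-source-B"),
    Evidence("knockout shows no lifespan change", False, 0.5, "study-C"),
]

if __name__ == "__main__":
    score, provenance = score_hypothesis(EVIDENCE)
    print("score:", score)
    for source, sign, weight in provenance:
        print(f"  {sign} {weight} from {source}")
```

Returning the provenance alongside the score lets a reader trace any conclusion back to the individual findings, including the negative ones, instead of trusting a single number.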
• Text co-mentions
• Gene ontology annotations
• Differential gene expression
• …

• Access to open data
• Linked to ontologies
• Represented with a universal language
• Queried using a portable language
• Results stored with their provenance
Scaling Validation
• Automated experimentation (Adam & Eve)
• Crowdsourcing
  – As a simple task
  – As an open problem
• Automated discovery of viable methods
• Automated implementation of viable methods
Key Research Challenges
• Scalable, shared, fault-tolerant, and readily re-deployable frameworks for archiving and providing versioned and maximally FAIR biomedical (meta)data
• Scalable methods for the prospective and retrospective authoring, assessment, and repair of metadata
• Scalable methods to learn equivalent representational patterns
• Scalable frameworks for open, transparent, reproducible and recurrent analysis and meta-analysis of FAIR research data
• Methods to identify investigative biases and knowledge gaps
• Scalable and reliable methods for the prioritization of scientific hypotheses using evidence gathered across scales and sources
• Scalable methods for validation of research findings