Nanoinformatics Workshop | Nanoinformatics - …nanoinformatics.org/nidocuments/download/KENI...
KENI Pilot
Status Report
Contents
1. Intro and Overview – 3 min. – Joe
2. KENI Architecture – 5 min. – Joe
3. Ontology – 5 min. – Nathan
4. Quant Modeling – 5 min. – Krishna
5. Pilot Model – 7 min. – Joe and Krishna
6. Q&A – 5 min.
KENI Pilot Overview
• Purpose - leverage successful representation and computation
methods from an array of disciplines, integrating them into an
interdisciplinary informatics intelligence solution
• Organization - three core components:
– KENI Architecture
– Ontology
– Quant Modeling
• Pilot team - Jessica Adamick (project manager),
Nathan Baker, Brian Davis, Joe Glick (pilot lead), Liz
Hahn-Dantona, Neil Jacobson, Fred Klaessig, Phil
Lippel, Krishna Rajan and Dennis Thomas
Differentiation - Comparison of KENI Initiative with Wolfram Alpha, Large Knowledge Collider (LarKC) and IBM Watson
• IBM Watson
– Computes: Answer
– Approach: Statistical match; computing power (85,000 watts)
• KENI - CCR
– Computes: Relevance
– Approach: Contextual parsing; concept quantification; relationship discovery
• LarKC
– Computes: Selected process outputs
– Approach: Massive, distributed and incomplete reasoning lab
• Wolfram Alpha
– Computes: Answer
– Approach: Linguistic parsing; curating computable knowledge
[Diagram: KENI Engine, connected to User Systems, Custodian Systems, Data Sources, Minimum Information Requirements and Quant Formalisms]
KENI Components Overview
• KENI Architecture (lead: Joe Glick)
– Inputs: existing models; pilot content sources; pilot data & rules
– Deliverables: prototype model; Computable Context Representation (CCR)
• Ontology (lead: Nathan Baker)
– Inputs: literature abstracts; ontologies; taxonomies
– Deliverables: sample content; subsumption architecture
• Quant Modeling (lead: Krishna Rajan)
– Inputs: candidate data; computation methods & theories
– Deliverables: sample data, rules; computational use cases
KENI Architecture - Status
• Initial Content
– NPO
– InterNano Taxonomy
– ISO Nano Terminology
– USPTO Class 977 Taxonomy & Abstracts
– caNanoLab Publications Abstracts
– (Krishna’s Data)
• Initial Prototype Functionality
– Dynamic multi-factor analysis
– Discovery of relationships and common factors
– Multidimensional quantitative exploration
KENI Architecture - Future
• Next Stage Content
– NIH Thesaurus
– Biomedical Knowledge Bases
– Materials Science Knowledge Bases
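• Computational Approaches:
[Matrix: Mitigation Strategies (rows) against Uncertainty, Complexity, Silos, Opaqueness and Chaos (columns). Column positions of the X marks are not recoverable; the number of marks per row is listed.]
– Scenario Automation: 2
– Conceptual Rationalization: 3
– Cognitive Architectures: 3
– Interaction Simulation: 5
– Relevance Inference: 4
– Rule Inference: 2
– Neural Modeling: 3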
KENI project
Meta-ontology subgroup
Jessica Adamick, Nathan Baker, Brian Davis, Joe Glick, Liz Hahn-Dantona,
Fred Klaessig, Phil Lippel, Dennis Thomas
Long-term objectives
• Develop a “consistent” vocabulary for
nanotechnology across its numerous
domains
• Provide structure for integrating existing
terminologies
• Support machine learning, semantic
search, and related informatics
applications
Aim 1: Inventory vocabularies
• Collect, identify, and describe the different taxonomies/terminologies/etc. that are currently available for nanotechnology
• Possible terminologies
– NCImt (NPO, NCIt, HUGO, ChEBI)
– InterNano NanoManufacturing taxonomy
– USPTO Class 977 hierarchical terminology (taxonomy-ish)
– Standards (ISO TC229, ASTM E56, OECD, IEEE)
    • What is our relationship to these?
    • Who are our advocates?
– Other vocabularies?
• How do we deal with proprietary information in this domain (standards copyright, materials genome, etc.)?
– Identification of proprietary info through KENI
– Discriminate between things that are business-sensitive and things that are not free
Aim 2: Describe use cases
• Describe the use cases for a nanotechnology ontology in the KENI project (and beyond)
• Identify communities
– What are the primary applications and objectives for each community?
• Develop prioritized requirements and use cases
– Variables
    • Roles (researcher, program manager, policy maker, infrastructure provider, clinician, student, worker, …)
    • Environments (regulatory, manufacturing, safety, clinical, research, …)
– Your feedback is needed
• Identify use cases and requirements
– Search (semantic capability, resolving synonymy problems, searching across resources)
– Machine learning (descriptors for QSAR-like studies)
– Annotation (metadata for deposited information, nano-TAB, journal articles, etc.)
– Reasoning (logical and probabilistic inference)
– Others?
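The search use case (resolving synonymy and searching across resources) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synonym groups, the resource names and the sample records are invented, and this is not the KENI search design.

```python
import re

# Hypothetical synonym rings: each set groups labels treated as one concept.
SYNONYM_RINGS = [
    {"carbon nanotube", "cnt", "nanotube"},
    {"quantum dot", "qd", "semiconductor nanocrystal"},
]

def expand_query(term):
    """Return the term plus every synonym that shares a ring with it."""
    term = term.lower().strip()
    expanded = {term}
    for ring in SYNONYM_RINGS:
        if term in ring:
            expanded |= ring
    return expanded

def search(resources, term):
    """Search several resources; return (resource, record) pairs that match."""
    labels = expand_query(term)
    # Whole-word matching so short labels such as "qd" do not hit inside words.
    patterns = [re.compile(r"\b" + re.escape(label) + r"\b") for label in labels]
    hits = []
    for resource_name, records in resources.items():
        for record in records:
            text = record.lower()
            if any(p.search(text) for p in patterns):
                hits.append((resource_name, record))
    return hits

# Invented sample records standing in for two separate resources.
resources = {
    "abstracts": ["CNT toxicity in lung cells", "Gold nanoparticle synthesis"],
    "patents": ["Method for growing a carbon nanotube array"],
}

hits = search(resources, "nanotube")
for resource_name, record in hits:
    print(resource_name, "|", record)
```

A query for "nanotube" finds both the abstract that says "CNT" and the patent that says "carbon nanotube", which is the synonymy problem a shared vocabulary is meant to resolve.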
Aim 3: Develop plan of action
• Develop a plan of action for the “meta-ontology” that “combines” the most relevant terminologies
• Terminology alignment/comparison
– Semi-automatic
    • NLP-based probabilistic approaches
    • Ontology-based logical alignment approaches
– Human curation
– How should the output be presented to end-users?
• Deployment
– Versioning
– Description language
– Integration with KENI infrastructure/architecture
• Application
– What are the first use-case-based demonstrations?
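The semi-automatic alignment step, where a program proposes matches and a human curator confirms them, might look like the sketch below. It uses plain string similarity from Python's standard `difflib` as a stand-in; it is not a probabilistic NLP model or an ontology-based aligner, and the two vocabularies and the 0.85 threshold are invented for illustration.

```python
from difflib import SequenceMatcher

def normalize(term):
    """Lowercase and collapse hyphens, underscores and extra spaces."""
    return " ".join(term.lower().replace("-", " ").replace("_", " ").split())

def align_terms(vocab_a, vocab_b, threshold=0.85):
    """Propose (term_a, term_b, score) pairs for a human curator to review."""
    candidates = []
    for a in vocab_a:
        for b in vocab_b:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score >= threshold:
                candidates.append((a, b, round(score, 3)))
    # Highest-scoring proposals first, so curation starts with the likeliest.
    return sorted(candidates, key=lambda c: c[2], reverse=True)

# Invented sample terms from two hypothetical vocabularies.
vocab_a = ["Carbon Nanotube", "quantum dot", "nano-particle"]
vocab_b = ["carbon nanotubes", "Quantum-Dot", "fullerene"]

matches = align_terms(vocab_a, vocab_b)
for term_a, term_b, score in matches:
    print(f"{term_a!r} ~ {term_b!r} (similarity {score})")
```

Pairs below the threshold are simply not proposed, so the threshold controls how much work is passed to human curation.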
KENI project
Quant Modeling subgroup
Jessica Adamick, Joe Glick,
Phil Lippel, Krishna Rajan
Materials Genome Mapping (Krishna Rajan)
Challenge:
To construct robust correlations between materials properties and features/characteristics
Methods:
• Data Manifold Representations
• Dimensionality Reduction
• Machine/Statistical Learning
• Uncertainty Quantification
Mapping Materials Discovery:
Targeted Property(s) = F(x1, x2, x3, x4, x5, x6, x7, x8, …), where the inputs x1, x2, … are the “materials genes”
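As a rough illustration of the challenge stated above, correlating a targeted property with candidate descriptors x1, x2, …, the sketch below ranks descriptors by the strength of their Pearson correlation with the property. This is a deliberately simple stand-in, not the subgroup's method: the descriptor names and all numbers are made-up toy data, and real work would use the dimensionality-reduction and statistical-learning methods listed on the slide.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def rank_descriptors(descriptors, target):
    """Rank descriptor names by |Pearson r| against the target property."""
    scores = {name: pearson(values, target) for name, values in descriptors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Made-up toy data: three hypothetical descriptors for five hypothetical compounds.
descriptors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 5.0],
    "x3": [5.0, 5.0, 5.0, 5.0, 5.0],
}
target_property = [2.1, 3.9, 6.2, 8.0, 9.8]

ranking = rank_descriptors(descriptors, target_property)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

A ranking like this only screens single descriptors; it says nothing about interactions between descriptors or about uncertainty in the data.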
Pilot Model
Materials Science “Cartography”
Initial KENI Prototype
• Reminder: we are prototyping the design
and functionality of the KENI engine, not
interfaces for end users, which is the
responsibility of implementation owners
• The KENI platform is a repository for the
integration and rationalization of multi-
disciplinary knowledge and methods for
nanoinformatics research and discovery
Sample data
Prototype Materials Genome Map
[Diagram labels: Targeted Property(s); “materials genes”; Structure; Assumptions; Context; Uncertainty, etc.; Predictors; Provenance; Quant Method; Compound; Property; Targeted Structure(s); Site]
Prototype Demo
Q & A