Scalable Ontological Sense Matching
Dr. Geoffrey P Malafsky
TECHi2 LLC, Fairfax, VA
Dr. Geoffrey P Malafsky, TECHi2, CapSci 2006, ASTI
Need for Smarter Systems
Enormous and ever-increasing amounts of data and information are available
Potential exists for significant increases in efficiency, effectiveness, and success in all fields IF the data and information can be harnessed
The most common use case is information overload: too much to sift through with too little time and too few resources, too little support, or too little authority
Major Challenges
Value is subjective and context-based, i.e. not deterministic
Metrics and decision criteria are heavily dependent on conditions, information uncertainty, decision/activity timelines, vulnerability to error, risk capacity
Rules and interaction mechanisms are usually nebulous, poorly defined, non-existent, or incorrect
Information Technology approaches to this situation are immature, with poor scalability (e.g. too computationally intensive, excessive storage requirements, security) and/or untrustworthy results
Advanced Techniques
Improve machine understanding of information by annotating meaning (e.g. Ontologies)
Compute the best scenario using domain models in which Subject Matter Experts define the core knowledge, coupled to probabilistic fit calculations
Extract patterns from very large scale data sets
Hire lots of people
Example: Knowledge Discovery & Dissemination (KDD)
Seeks to find knowledge for practical purposes within large scale data/information stores
Cross organizational and functional boundaries
High relevance to searcher
Uncertainty is assessed and used
Secure
Rules and domain model based
Automated
Knowledge is Not Just Information or Data
Knowledge has:
Context: what is it about?
Confidence: is it right?
Relationships: what does it have to do with that?
Priorities: what is most important?
Types
Explicit knowledge is codified and can be manipulated
Tacit knowledge is unspoken “know-how”
Looks just like data when in an electronic system
It is data
Annotations on “about” and “how” tied to intelligent application logic make it knowledge for user
KDD Knowledge Map
Analysis of scientific and technological areas of emphasis in Knowledge Discovery and Dissemination (KDD)
Gaps reveal technical vulnerabilities
[Figure: Total and TRL Rated Citations. Bar chart of Number of Citations (axis 0 to 350) by TRL group (Total, 1-3, 4-7, 8-9) for the Dissemination, Discovery, and Knowledge categories]
Most research and development is concentrated in the discovery portion of KDD
Information & Data Mining Dominates KDD Focus
[Figure: Citations. Bar chart of citation counts (axis 0 to 250) by KDD topic: Representation; Rules formation + analysis; Collection + Tacit capture; Lifecycle Maintenance; Fusion; Info/data mining; Models; Uncertainty mgmt + mitigation; Feature extraction + analysis; Visualization; Real-time operations; Distributed architecture; Agents; Personalization; On-demand (TPPU); Storage; Security]
What is Missing to Make KDD Work
Knowledge-based metadata architecture
Predictive personalization algorithms
Models of knowledge lifecycle
Computational knowledge techniques
Real-time analysis with large data and high uncertainty
Conduit to feedback from end-users to capture and evolve domain knowledge
Bringing Structure to Unbounded Knowledge
Knowledge Mgmt systems have failed to meet operational requirements
Knowledge is inherently expansive and evolving
IT tends to collect and organize assets without relevance, context, ..
Level of Effort to manually collect, map, cleanse source data/information is too high
AI is not around the corner
Structured Knowledge
Applying a structured framework creates repeatable, interoperable, consistent solutions
Knowledge fidelity is maintained with combination of human and machine processable representations
Unknowns are discovered using known analogies via triangulation
Universe of possible combinations of knowledge and user needs is reduced to engineering scale solutions
Applying Quantum Mechanics to Knowledge Processing
Even the small area of this room has an infinite number of possible combinations of things (macro, micro, atomic, subatomic)
Representing this “knowledge” and these “states” with a domain model reduces the infinite possibilities to a tractable few: Schrödinger Equation ($H\psi = E\psi$)
Wavefunctions describe specific states (4 quantum numbers)
Connection between objects defined by overlap integral of wavefunctions: $\int \psi_0^{*}\,\psi_1\,d\tau$ = Degree of match
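As a rough illustration of the "degree of match" idea, the overlap integral can be approximated for discrete descriptions by a normalized inner product. This is only a sketch: the function name `overlap` and the three-feature vectors are assumptions made for the example, not part of the original slides.

```python
import math

def overlap(psi0, psi1):
    """Normalized inner product of two real-valued state vectors.

    1.0 means the vectors point the same way, 0.0 means no overlap.
    """
    if len(psi0) != len(psi1):
        raise ValueError("state vectors must have the same length")
    dot = sum(a * b for a, b in zip(psi0, psi1))
    norm0 = math.sqrt(sum(a * a for a in psi0))
    norm1 = math.sqrt(sum(b * b for b in psi1))
    if norm0 == 0.0 or norm1 == 0.0:
        return 0.0  # an empty description matches nothing
    return dot / (norm0 * norm1)

# Two hypothetical objects described over the same three features.
same = overlap([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])      # parallel descriptions
disjoint = overlap([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])  # no shared features
```

Normalizing both vectors keeps the result independent of how many features each description carries, so matches can be compared across objects.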
The KORS™ Framework
Knowledge: collected using templates from SMEs
Ontologies: Conceptual models of domain knowledge
Rules: Business and technical rules are extracted and defined from domain knowledge and ontologies
Semantic metadata: knowledge, ontology relationships, and rules are connected to and represented with data and information
KORS Structured Knowledge
Knowledge is inherently expansive and evolving
KM systems have failed to meet operational requirements
IT collects & organizes assets without relevance, context, ..
Level of Effort is too high to manually collect, map, cleanse source data/information
KORS™-pending framework creates repeatable, interoperable, consistent solutions
Knowledge fidelity is maintained with combination of human and machine processable representations
Ontologies (concepts) are expressed with domain specific and standard terms
Reduction of universe of possible combinations of knowledge and concepts to engineering scale solutions
KORS Ontologies
Conventional wisdom
Multi-tiered, fully explicit, broad coverage
= Too large; Too difficult to maintain; Too hard to implement
KORS™ Ontologies
Cross-domain framework, domain specific instances of classes, leverage existing ontologies, concepts defined with domain-based uncontrolled vocabulary AND common controlled vocabulary
= Smaller; easier to maintain; supports engineering processes
Answers the broad question: “Do the concepts in this other ontology have semantic similarity to those in my ontology?”
Semantic metadata used to characterize domain ontologies
Conventional Wisdom
Upper, middle and lower ontologies.
Every concept made fully explicit
The KORS Engineering Framework
Structured knowledge capture
Identify rules and metadata structures
Incorporate within the engineered solution
Cross-Concept Overlap Calculation
Overlap integrals calculated from semantic metadata – updated when ontologies change
Overlap (S) is computed at:
Ontology-ontology level using primary task-description pairs
Term level using allowed and disallowed senses (not just synonyms)
Real-time determination using coarse-medium-fine concept match
Coarse = ontology-ontology
Medium = ontology-term
Fine = term-term
With metadata architecture implementation, calculation uses what is available
Inherently scalable, distributed with evolving improvements
$S_{AB} = \langle On_A[domain_A(term)] \mid On_B[domain_B(term)] \rangle$
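The coarse-medium-fine cascade can be sketched as follows. This is a minimal illustration under stated assumptions, not the KORS implementation: the `Ontology` class, the use of Jaccard set overlap as a stand-in for the overlap integral, the threshold values, and the sample ontologies are all hypothetical. Each level runs only when its metadata is present, and a low coarse or medium score stops the finer work early.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    name: str
    tasks: set = field(default_factory=set)    # ontology-level task descriptors
    terms: dict = field(default_factory=dict)  # term name -> set of sense labels

def set_overlap(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match(onto_a, term_a, onto_b, term_b, coarse_min=0.2, medium_min=0.2):
    """Return (level, score) for the finest level that could be computed."""
    level, score = "none", 0.0
    # Coarse: ontology-ontology, only if both sides carry task metadata.
    if onto_a.tasks and onto_b.tasks:
        level, score = "coarse", set_overlap(onto_a.tasks, onto_b.tasks)
        if score < coarse_min:
            return level, score  # domains too far apart; skip finer levels
    senses_a = onto_a.terms.get(term_a)
    if not senses_a:
        return level, score  # no term metadata available; keep coarse result
    # Medium: ontology-term, term A against every sense known to ontology B.
    all_b = set().union(*onto_b.terms.values())
    if all_b:
        level, score = "medium", len(senses_a & all_b) / len(senses_a)
        if score < medium_min:
            return level, score
    # Fine: term-term, only if term B is described in ontology B.
    senses_b = onto_b.terms.get(term_b)
    if senses_b:
        level, score = "fine", set_overlap(senses_a, senses_b)
    return level, score

geo = Ontology("geo", {"analyze imagery", "map terrain"}, {"track": {"path", "route"}})
plan = Ontology("plan", {"map terrain", "plan route"}, {"course": {"route", "heading"}})
level, score = match(geo, "track", plan, "course")
```

Returning the level alongside the score lets a caller see how much metadata backed a given match.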
Semantic Variance Across Domains
For example, the term “Insurgent” means:
Mission Planner: person who takes part in an armed rebellion against the constituted authority
Geospatial Analyst: someone who participates in a peaceful public display of group feeling
Diplomatic Corps: someone who participates in a public display against an established government
Semantic Challenges
Semantic Consistency
Semantic Variance Across Domains
Search and Discovery Requirements
Controlled Uncontrolled Vocabularies
Have “local” variants: Navy fliers and Air Force pilots
Change to match changing reality: Yesterday’s friend could be tomorrow’s foe
Change to match changes in policy (spin): Today, “freedom fighter;” tomorrow, “insurgent”
KORS is Extensible, Scalable, Adaptive
Ontology level metadata describes the basic functional concepts and processes of the domain
Can be linked to Enterprise Architecture products
Direct conceptual match between two functional domains
Ontology term descriptions use:
Domain specific (uncontrolled) expressions
Allowed senses from controlled vocabularies
Disallowed senses from controlled vocabularies
True meaning is found from combination of domain, allowed, disallowed as is done in real language
Metadata architecture: values used if present but not required, which gives scalability and extensibility
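One way to make the allowed/disallowed term descriptions above computable is to encode each term as a signed sense vector and take a normalized overlap. The sketch below is an assumption-laden illustration: the +1/-1/0 encoding, the function names, and the sense labels assigned to each domain's use of "insurgent" are hypothetical, not taken from KORS.

```python
import math

def sense_vector(allowed, disallowed, senses):
    """+1 for an allowed sense, -1 for a disallowed sense, 0 if the term is silent."""
    return [1.0 if s in allowed else -1.0 if s in disallowed else 0.0 for s in senses]

def term_overlap(term_a, term_b):
    """Cosine overlap of two terms, each given as an (allowed, disallowed) pair of sets."""
    senses = sorted(term_a[0] | term_a[1] | term_b[0] | term_b[1])
    va = sense_vector(term_a[0], term_a[1], senses)
    vb = sense_vector(term_b[0], term_b[1], senses)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0

# Hypothetical sense assignments for "insurgent" in three domains.
planner = ({"armed rebel"}, {"peaceful demonstrator"})
analyst = ({"peaceful demonstrator"}, {"armed rebel"})
diplomat = ({"anti-government protester"}, set())

opposed = term_overlap(planner, analyst)     # close to -1: senses conflict
unrelated = term_overlap(planner, diplomat)  # 0: no shared senses
```

Because unlisted senses contribute zero, a term with sparse metadata still yields a score, in line with the "values used if present but not required" principle.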
Example Domain: GEOINT
Exploitation and analysis of imagery and geospatial information to describe, assess, and visually depict physical features and geographically referenced activities on the Earth.
GEOINT encompasses all the activities involved in the collection, analysis, and exploitation of spatial information in order to gain knowledge about the national security environment, and the visual depiction of that knowledge.
MyGEOINT Ontological Architecture
Functional Application
Semantic Expansion
Cross domain commonality
Qualified synonym identification
>> discovery of potentially relevant knowledge
Semantic Resolution
Allowed and disallowed alternate semantics
Binding of dynamic domain-specific semantics to controlled vocabularies.
>> computable semantic comparisons and knowledge relevance ranking
Cross domain commonality
Answers the broad question: “Do the concepts in this other ontology have semantic similarity to those in my ontology?”
Semantic metadata used to characterize domain ontologies
Overlap integrals calculated from semantic metadata – updated when ontologies change
Run-time ontology-to-ontology matching is greatly simplified.
Qualified synonym identification
Knowledge discovery via synonym lists is well supported by standardized lexical tools (WordNet, etc.)
Broad-domain perspective limits ability to isolate domain-specific usage
KORS domain-specific ontologies and semantic metadata allow the use of broad-spectrum vocabularies to make fine-grain distinctions.
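Qualified synonym identification can be pictured as filtering a broad synonym list by the senses a domain allows. In this sketch the `BROAD_SYNONYMS` table is a hand-written, hypothetical stand-in for the output of a lexical tool, and the sense labels and function name are assumptions made for illustration.

```python
# Hypothetical broad-domain synonym list: each candidate synonym is tagged
# with the sense of the headword it belongs to.
BROAD_SYNONYMS = {
    "insurgent": [
        ("rebel", "armed rebellion"),
        ("guerrilla", "armed rebellion"),
        ("demonstrator", "public display"),
        ("protester", "public display"),
    ],
}

def qualified_synonyms(term, allowed, disallowed=frozenset()):
    """Keep only synonyms whose sense the domain allows and does not disallow."""
    return [
        word
        for word, sense in BROAD_SYNONYMS.get(term, [])
        if sense in allowed and sense not in disallowed
    ]

# A mission-planning domain that allows only the armed-rebellion sense.
planner_synonyms = qualified_synonyms(
    "insurgent", allowed={"armed rebellion"}, disallowed={"public display"}
)
```

The broad vocabulary stays unchanged; only the domain's allowed and disallowed senses decide which synonyms are used for discovery.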
Normalizing Uncontrolled Vocabularies
Semantic Homing
The binding of dynamic domain-specific semantics to controlled vocabularies
Allows domain-specific ontologies to evolve
Provides stable semantic anchors for knowledge computability.
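Semantic homing, as described above, amounts to keeping a mapping from changeable domain expressions to fixed controlled-vocabulary entries. The class below is a minimal sketch under that reading; `SemanticHome`, its methods, and the anchor "aviator" are hypothetical names chosen for the example.

```python
class SemanticHome:
    """Maps evolving domain expressions onto stable controlled-vocabulary anchors."""

    def __init__(self, controlled_vocabulary):
        self._anchors = set(controlled_vocabulary)
        self._bindings = {}  # lower-cased domain expression -> anchor

    def bind(self, expression, anchor):
        """Bind (or rebind) a domain expression to an existing anchor."""
        if anchor not in self._anchors:
            raise KeyError(f"{anchor!r} is not in the controlled vocabulary")
        self._bindings[expression.lower()] = anchor

    def resolve(self, expression):
        """Return the anchor for a domain expression, or None if it is unbound."""
        return self._bindings.get(expression.lower())

    def comparable(self, expr_a, expr_b):
        """Two expressions are comparable when they home to the same anchor."""
        anchor = self.resolve(expr_a)
        return anchor is not None and anchor == self.resolve(expr_b)

home = SemanticHome({"aviator"})
home.bind("Navy flier", "aviator")
home.bind("Air Force pilot", "aviator")
same_concept = home.comparable("navy flier", "air force pilot")
```

New local variants can be bound at any time without touching the anchors, so comparisons made against the controlled vocabulary stay valid as the domain vocabulary changes.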
MyGEOINT: Ontology Knowledge Discovery
Ontology applies concept matching to make discoveries more relevant
Functional Impact
Discovery of potentially relevant knowledge sources.
Simpler solutions, lower real-time computational requirements, practical multi-domain solutions.
Computable semantic comparisons and knowledge relevance ranking
Directly computed rather than inferred
Higher domain-level precision without the overhead of extensive upper and mid-level ontologies