Post on 23-Feb-2016
Advantage Through Technology
Information Ontologies for the Intelligence Communities
A Survey of DCGS-A Ontology Work
Ron Rudnicki
November 12, 2013
Topics
• The DCGS-A ontology suite
• Standard operating procedures and ontology quality assurance
• Annotation vs. Explication
• How the DCGS-A ontologies are being used for the explication of data models
Motives for Ontology Development
• Multiple formats including free text, semi-structured and structured
• Some “surprise” data sets are made available a short time prior to system testing
• Data sets will change along with domain of interest
• Data cannot be collected into a single store
• Provide cross-source searching and analytics
• Need to maintain the provenance of data
Part of a Big Data solution
Contribution of the Ontologies
• Common Upper Level Ontology – The ontologies extend from a common upper level ontology
• Delineated Content - Each ontology has a clearly specified and delineated content that does not overlap with any other ontology
• Composable Content – Classes in the ontologies represent entities at a level of granularity that can be composed in various ways to map to terms in sources
Design choices affect the outcome
Integration Through a Common Upper Level Ontology
• Provides common patterns within the target ontology for mappings from the sources
  – Easier to include new sources of data
• Enables more uniformity between queries
  – Easier to transition to domains of interest
Encourages uniform representations of domains
[Figure: class diagram with the nodes Entity, Organization, Object, Physical Artifact, Quality, Quality of Organization and Quality of Physical Artifact, linked by has_quality and bearer_of.]
CUBRC - Proprietary
Integration Through Delineated Content
• Facilitates locating a class within the target ontologies
• Provides better recall in queries
  – Less likely to overlook relevant data
Each class in the target ontologies is defined in one place
[Figure: class diagram with the nodes Entity, Organization, Object, Physical Artifact and Spatial Location, with two located_at links.]
Integration Through Composition of Classes
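[Figure: Data Source 1 describes a Car by Make and Model; Data Source 2 describes a Car as Full Size, Mid Size or Compact. The target ontology relates Car, Manufacturer, Model, Length of Wheelbase and the size classes Compact, Mid Size and Full Size through the relations prescribes, manufactures, has quality and is nominally measured by.]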
• Granular classes better accommodate mappings from various perspectives on the same domain without loss of information
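A minimal sketch of the composition idea in Python. The class, the field names and the wheelbase thresholds are hypothetical and chosen only for illustration; they are not taken from the DCGS-A ontologies. The point is that one granular record can be projected onto both source views without losing information.

```python
from dataclasses import dataclass


@dataclass
class Car:
    manufacturer: str    # corresponds to "Make" in Data Source 1
    model: str           # corresponds to "Model" in Data Source 1
    wheelbase_in: float  # granular quality behind the size classes of Data Source 2


def size_class(car: Car) -> str:
    """Derive a nominal size class; the thresholds (inches) are hypothetical."""
    if car.wheelbase_in < 105:
        return "Compact"
    if car.wheelbase_in < 115:
        return "Mid Size"
    return "Full Size"


def to_source_1(car: Car) -> dict:
    """Project the granular record onto the Make/Model view."""
    return {"Make": car.manufacturer, "Model": car.model}


def to_source_2(car: Car) -> dict:
    """Project the granular record onto the size-class view."""
    return {"Car": size_class(car)}


example = Car(manufacturer="Acme", model="Roadster", wheelbase_in=110.0)
```

Both views are computed from the same record, so neither perspective has to be privileged in the target representation.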
High Level Depiction of Domain
Provides Coverage of Domain of Human Activity
[Figure: diagram relating People & Organizations, Attributes, Artifacts, Actions, Time and Natural & Artificial Environments, with the link labels "are distinguished by", "use", "to perform" and "that take place in".]
Developed Using a Top-Down Bottom-Up Strategy
• Treasury Office of Foreign Assets Control – Specially Designated Nationals and Blocked Persons
• NCTC – Worldwide Incidents Tracking System
• UMD – Global Terrorism Database
• RAND – Database of Worldwide Terrorism Incidents
• LDM version .60 (TED)
• VMF PLI
• DCGS-A Global Graph
• DCGS-A Event Reporting
• BFT Report (CCRi test data)
• Cidne Sigact (CCRi test data)
• Long War Journal
• Harmony Documents from CTC at West Point
• Threats Open Source Intelligence Gateway
Partial List of Data Sources Used
Based Upon Standards
• DOD Dictionary of Military and Associated Terms (JP 1-02)
• JC3IEDM
• Counterinsurgency (FM 3-24)
• Operations (FM 3-0)
• Multinational Operations (JP 3-16)
• International Standard Industrial Classification of all Economic Activities Rev.4 (ISIC4)
• Universal Joint Task List (CJSCM 3500.04C)
• Weapon Technical Intelligence (WTI) Improvised Explosive Device IED Lexicon
• Information Artifact Ontology (IAO)
• Phenotype and Trait Ontology (PATO)
• Foundational Model of Anatomy (FMA)
• Regional Connection Calculus (RCC-8)
• Allen Time Calculus
• Wikipedia
Partial List of Doctrine and Standards Used
Current DCGS-A Ontology Architecture
Basic Formal Ontology 1.1
Extended Relation Ontology
Agent Ontology
Artifact Ontology
Event Ontology
Emotion Ontology
Geospatial Ontology
Mid-Level Ontology
Information Entity Ontology
Quality Ontology
Time Ontology
Ontology Metrics
Ontology Name                 Number of Classes  Number of Relations  Equivalent Class Axioms  Subclass Of Axioms
Agent Ontology                              986                   71                      378                1004
AIRS Emotion Ontology                        73                    -                        -                  88
AIRS Mid-Level Ontology                     516                    8                      221                 641
Artifact Ontology                           298                    -                        3                 310
Event Ontology                              409                    -                        2                 423
Extended Relation Ontology                    -                   45                        -                   -
Geospatial Ontology                         297                   14                       13                 316
Information Entity Ontology                  83                   29                       21                  83
Quality Ontology                            681                    -                        2                 681
Relation Ontology                             -                   20                        -                   -
Time Ontology                                16                   22                        -                  30
Totals                                     3359                  209               640 (~19%)        3576 (~106%)
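The totals row can be checked with a few lines of Python. This is a sketch under two assumptions: blank cells on the slide are treated as 0, and the column assignment of the sparse rows is the one under which every column sums to the stated total. The percentages appear to be taken relative to the number of classes, which is also an assumption.

```python
# (classes, relations, equivalent-class axioms, subclass-of axioms)
# Blank cells on the slide are entered as 0 (assumption).
METRICS = {
    "Agent Ontology": (986, 71, 378, 1004),
    "AIRS Emotion Ontology": (73, 0, 0, 88),
    "AIRS Mid-Level Ontology": (516, 8, 221, 641),
    "Artifact Ontology": (298, 0, 3, 310),
    "Event Ontology": (409, 0, 2, 423),
    "Extended Relation Ontology": (0, 45, 0, 0),
    "Geospatial Ontology": (297, 14, 13, 316),
    "Information Entity Ontology": (83, 29, 21, 83),
    "Quality Ontology": (681, 0, 2, 681),
    "Relation Ontology": (0, 20, 0, 0),
    "Time Ontology": (16, 22, 0, 30),
}

# Sum each column across all ontologies.
totals = [sum(column) for column in zip(*METRICS.values())]
classes, relations, equivalent, subclass = totals

# Axiom counts expressed as a percentage of the class count.
equivalent_pct = round(100 * equivalent / classes)
subclass_pct = round(100 * subclass / classes)
```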
Semantic Conformance Testing
• An importing ontology reuses a term from another and adds to its content in some way
  – adds an axiom to some upper-level term
  – the imported class inherits content from parent classes of the importing ontology
• Corrective action
  – request that the curators of the ontology that is the source of the class add the content
    • If not possible, then plan for revision of import architecture
  – the importing ontology should introduce a subtype of the term to which the content could then be added
Semantic Smuggling
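One way such a check could be automated is sketched below in Python. Axioms are modelled as plain (subject, predicate, object) tuples, and the function and term names are hypothetical; this is not the DCGS-A test harness.

```python
def find_smuggled_axioms(importing_axioms, imported_terms, own_terms):
    """Return axioms the importing ontology asserts about terms it does not own."""
    return [
        (subject, predicate, obj)
        for subject, predicate, obj in importing_axioms
        if subject in imported_terms and subject not in own_terms
    ]


# Hypothetical example: "Organization" is imported, "Bank" is defined locally.
axioms = [
    ("Bank", "subClassOf", "Organization"),         # fine: adds a local subtype
    ("Organization", "subClassOf", "LegalEntity"),  # smuggling: alters an imported term
]
smuggled = find_smuggled_axioms(axioms, {"Organization"}, {"Bank"})
```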
Semantic Conformance Testing
• Defining a class to be a subtype of more than one superclass
• Corrective action
  – remove any subclass assertions that are false (e.g. Bank subClassOf Organization, Bank subClassOf Facility)
  – refactor superclasses into disjoint classes
  – write axiom so that the multiple inheritance exists in the inferred hierarchy rather than the asserted hierarchy
Multiple Inheritance
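A minimal sketch of how asserted multiple inheritance could be detected, assuming the asserted hierarchy is available as (child, parent) pairs. The function name is hypothetical; the Bank example is the one from the slide.

```python
from collections import defaultdict


def multiple_inheritance(subclass_assertions):
    """Map each class with more than one asserted superclass to its superclasses."""
    parents = defaultdict(set)
    for child, parent in subclass_assertions:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


asserted = [
    ("Bank", "Organization"),
    ("Bank", "Facility"),
    ("Airport", "Facility"),
]
violations = multiple_inheritance(asserted)
```

Each reported class then needs human review to decide which of the corrective actions applies.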
Semantic Conformance Testing
• Extending an ontology by introducing terms as child terms of a higher-level ontology using another relation (e.g. part of, is narrower in meaning than)
• Corrective action
  – Place the terms into their appropriate place in the taxonomy
Taxonomy Overloading
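A rough sketch of a check for this violation: flag terms that are attached to a higher-level term only through a relation other than subClassOf. The tuple representation, relation names and example terms are assumptions made for illustration.

```python
def overloaded_terms(assertions, higher_level_terms):
    """Return terms linked to higher-level terms only by non-subclass relations."""
    has_subclass_link = {s for s, p, o in assertions if p == "subClassOf"}
    return sorted({
        s
        for s, p, o in assertions
        if p != "subClassOf" and o in higher_level_terms and s not in has_subclass_link
    })


# Hypothetical example: "Engine" hangs off "Vehicle" by part_of only.
links = [
    ("Truck", "subClassOf", "Vehicle"),
    ("Engine", "part_of", "Vehicle"),
]
flagged = overloaded_terms(links, {"Vehicle"})
```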
Semantic Conformance Testing
• A term from a lower level is not a subclass of any class of the ontologies it imports
• Containment requires that the domain covered by a lower-level ontology be circumscribed by the domain covered by the higher-level ontology from which it extends
• Corrective action
  – Add the class (or an appropriate superclass) to the appropriate higher-level ontology
  – Import a higher-level ontology that does provide a superclass
Containment
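The containment test can be sketched as a walk up the asserted hierarchy: a local class passes if some ancestor belongs to an imported ontology. The data structures and class names below are hypothetical.

```python
def uncontained_classes(local_classes, subclass_of, imported_classes):
    """Return local classes with no ancestor among the imported classes.

    subclass_of maps a class to the set of its asserted superclasses.
    """
    def reaches_import(cls, seen=()):
        for parent in subclass_of.get(cls, ()):
            if parent in imported_classes:
                return True
            # Skip classes already on the current path so cycles terminate.
            if parent not in seen and reaches_import(parent, seen + (cls,)):
                return True
        return False

    return sorted(c for c in local_classes if not reaches_import(c))


local = {"Sniper", "Marksman", "Drone"}
hierarchy = {"Sniper": {"Marksman"}, "Marksman": {"Person"}}
orphans = uncontained_classes(local, hierarchy, imported_classes={"Person"})
```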
Semantic Conformance Testing
• An ontology includes information model assertions that are not true of the domain
  – e.g. carrying over a not-null constraint, as in "every person must have an email address"
• Corrective action
  – Make needed modifications to the axiom (generally the source of such violations) so that it conforms to the domain
    • e.g. every person that has purchased from amazon.com must have an email address
Conflation
Semantic Conformance Testing
• A class is a set-theoretic combination of other classes
• Corrective action
  – Add the class as a new type (College or University => Higher Education Organization)
Logic of Terms
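A simple heuristic for finding candidates is to scan class labels for the words "or", "and" and "not". This sketch is an assumption about how such a scan might look, and its hits still need human review, since a label such as "Research and Development" is not necessarily a set-theoretic combination.

```python
import re

# Whole-word match, so "Organization" is not flagged for containing "or".
SET_OPERATORS = re.compile(r"\b(or|and|not)\b", re.IGNORECASE)


def set_theoretic_labels(labels):
    """Return labels that look like combinations of other classes."""
    return [label for label in labels if SET_OPERATORS.search(label)]


candidates = set_theoretic_labels(
    ["College or University", "Higher Education Organization"]
)
```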
Calculating Value of Ontology Terms
• The content of ontologies used in an enterprise will be the subject of debate and, possibly, disagreement
• Having one or more metrics that are proven measures of value would help resolve such disagreements
• Current methods are often applied to ontologies in their entirety (e.g. Swoogle); fewer are designed to evaluate the value of individual ontology classes and properties
Provide some basis for class inclusion/exclusion
Calculating Value of Ontology Terms
• A purely statistical method applied to an ontology as a graph will undervalue isolated terms that are of importance in a domain
• Importance is at least a function of amount of use and criticality
• Usage is tractable to definition, criticality less so
Statistical Methods Supplemented by Weightings
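One possible way to supplement a usage statistic with a weighting is a linear blend, sketched below. The formula, the `alpha` parameter and the criticality weights are all hypothetical; the slides do not specify how the two factors are combined.

```python
def importance(usage_counts, criticality, alpha=0.5):
    """Blend normalized usage with an analyst-assigned criticality in [0, 1].

    alpha is a hypothetical tuning parameter: 1.0 means usage only,
    0.0 means criticality only.
    """
    top = max(usage_counts.values()) or 1  # avoid dividing by zero
    return {
        term: alpha * (count / top) + (1 - alpha) * criticality.get(term, 0.0)
        for term, count in usage_counts.items()
    }


# A rarely used but critical term is no longer scored near zero.
usage = {"Person": 100, "NuclearWeapon": 2}
weights = {"Person": 0.2, "NuclearWeapon": 1.0}
scores = importance(usage, weights)
```

On usage alone "NuclearWeapon" would score 0.02; with the weighting it scores close to "Person".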
Mappings
• Many of the purposes for which ontologies are built will be realized only to the degree to which they are linked to data
• One component of mapping is an act of translation and should be assessed on the degree of equivalence between source and target
• Another component of mapping is an implementation and should be assessed on performance criteria such as costs and scalability
• Techniques and technologies vary*
Value and Assessment
*An introductory overview can be found at: http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport_01082009.pdf
Mappings
• Hashtags – the subjective assignment of uncurated keywords to a source
• Annotations – rule-based assignment of curated terms to a source
• Machine maps – automated, structure-based translation of source into target vocabulary
• Definitions – rule-based expansion of source terms into types and differentiating attributes
• Explications – rule-based translation of all semantic content (including that which is implicit) of a source using terms and relations of the ontology
Subtypes
[Figure labels: the subtypes are grouped into Term Mappings and Assertion Mappings.]
Mappings
• Term mappings
  – Can be automated
  – Enable faceted queries (Select “JFK” as type Airport)
  – Can result in significant loss of information
  – Not reusable
• Assertion mappings
  – Manual process that does not scale
  – Requires extensive knowledge of the target ontology
  – Enables navigational queries
  – Improves integration of data sources
  – Can result in significant carry over of source information
  – Not reusable
Pros and Cons
Assessing Current Mapping Methods
[Figure: chart placing Hashtags, Annotations, Machine Maps, Definitions and Explications on two axes, Time/Money (Low to High) and Translation (Lossy to Lossless), with the annotation "No Ideal Instances…".]
Examples of Mappings
CityId  Name         State          IncorporationDate  Area           Coordinates
1       Tampa        Florida        July 15, 1887      170.6 sq. mi.  27 56’50” N 82 27’31” W
2       Boston       Massachusetts  March 4, 1822      89.63 sq. mi.  42 21’29” N 71 03’49” W
3       Dallas       Texas          February 2, 1856   385.8 sq. mi.  32 46’58” N 96 48’14” W
4       Los Angeles  California     April 4, 1850      503 sq. mi.    34 03’ N 118 15’ W
A Source of Data About Cities
Explication of the Source as an End Point
[Figure: graph of the explicated source with the nodes City, City Name, State, State Name, Coordinates, Area, Incorporation Date, City Government and Act Of Incorporation, linked by designated_by (three times), has_quality, participates_in, part_of, delimits and occurs_on.]
Explication Implementation Example
map:PersonBirth rdf:type d2rq:ClassMap ;
    rdfs:label "Person Birth" ;
    d2rq:class event:Birth ;
    d2rq:classDefinitionLabel "Treasury OFAC Person Birth" ;
    d2rq:dataStorage map:KDD-02-B-Treasury-SDN ;
    d2rq:uriPattern "treasurydata_PersonBirth/@@TreasuryPerson.id|urlify@@" .

map:PersonBirthTemporalInterval rdf:type d2rq:ClassMap ;
    rdfs:label "Person Birth Temporal Interval" ;
    d2rq:class span:TemporalRegion ;
    d2rq:classDefinitionLabel "Treasury OFAC Person Birth Temporal Interval" ;
    d2rq:dataStorage map:KDD-02-B-Treasury-SDN ;
    d2rq:uriPattern "treasurydata_PersonBirthTemporalIdentifier/@@TreasuryPerson.id|urlify@@_@@TreasuryPerson.dateofbirthlist_uid|urlify@@" .

map:PersonBirthTemporalIntervalIdentifier rdf:type d2rq:ClassMap ;
    rdfs:label "Person Birth Temporal Interval Identifier" ;
    d2rq:class airs:TemporalRegionIdentifier ;
    d2rq:classDefinitionLabel "Treasury OFAC Person Birth Temporal Interval Identifier" ;
    d2rq:dataStorage map:KDD-02-B-Treasury-SDN ;
    d2rq:uriPattern "treasurydata_PersonBirthTemporalIdentifier/@@TreasuryPerson.id|urlify@@_@@TreasuryPerson.dateofbirthlist_uid|urlify@@" .

map:PersonBirthTemporalIntervalIdentifierBearer rdf:type d2rq:ClassMap ;
    rdfs:label "Person Birth Temporal Interval Identifier Bearer" ;
    d2rq:class airs:TemporalRegionIdentifierBearer ;
    d2rq:classDefinitionLabel "Treasury OFAC Person Birth Temporal Interval Identifier Bearer" ;
    d2rq:dataStorage map:KDD-02-B-Treasury-SDN ;
    d2rq:uriPattern "treasurydata_PersonBirthTemporalIdentifierBearer/@@TreasuryPerson.id|urlify@@_@@TreasuryPerson.dateofbirthlist_uid|urlify@@" .

map:PersonBirthGeospatialLocation rdf:type d2rq:ClassMap ;
    rdfs:label "Person Birth Geospatial Location" ;
    d2rq:class geo:GeospatialLocation ;
    d2rq:classDefinitionLabel "Treasury OFAC Person Birth Geospatial Location" ;
    d2rq:dataStorage map:KDD-02-B-Treasury-SDN ;
    d2rq:uriPattern "treasurydata_PersonBirthGeospatialLocation/@@TreasuryPerson.id|urlify@@_@@TreasuryPerson.placeofbirthlist_uid|urlify@@" .
A Portion of a D2RQ File Mapping Birth Place and Date
Explication Current Method
• The full mapping of birth place and date consists of 16 such blocks
• The full mapping of the entire table consists of 150 such blocks
• If the ontologies change, so must the mappings
• Common patterns in the ontologies make some re-use possible by adding placeholders to portions of maps and replacing them with specific values for the source at hand
• Applications exist or are under development to auto-generate initial mappings that a human can then edit
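The placeholder approach can be illustrated with Python's standard `string.Template`. The template below is a simplified, hypothetical reduction of a D2RQ class map block; the placeholder names are invented for this sketch, and the values come from the Treasury example shown earlier.

```python
from string import Template

# Hypothetical template for one D2RQ ClassMap block; $names are placeholders.
CLASS_MAP = Template(
    'map:$map_name rdf:type d2rq:ClassMap ;\n'
    '    rdfs:label "$label" ;\n'
    '    d2rq:class $target_class ;\n'
    '    d2rq:dataStorage map:$storage ;\n'
    '    d2rq:uriPattern "$uri_prefix/@@$id_column|urlify@@" .\n'
)

# Fill the placeholders with the specific values for the source at hand.
block = CLASS_MAP.substitute(
    map_name="PersonBirth",
    label="Person Birth",
    target_class="event:Birth",
    storage="KDD-02-B-Treasury-SDN",
    uri_prefix="treasurydata_PersonBirth",
    id_column="TreasuryPerson.id",
)
```

The same template can be filled again for another source, but it remains tied to D2RQ syntax.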
Explication Current Method
• The improvements are source and implementation specific
  – What works for structured sources mapped in D2RQ can’t be reused in structured sources mapped in other languages (R2RML, EDOAL)
  – Separate mappings would be needed for sources expressed in XML, HTML or free text
• Another solution is needed
Start with Machine Made Assertion Mappings
• Type to Type mapping (e.g. table column to class)
• Relationships between types expressed using a default generic object property
• Meta-data about the source entity (e.g. table name, column name, element name) is mapped to annotation properties (rdfs:label)
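A sketch of what a machine-made assertion mapping of one table row might produce, written in Python with triples as plain tuples. The naming scheme (`has_<column>`, `<column>_<key>`) and the `has_value` property are assumptions made for illustration, not the output of a specific tool.

```python
def machine_map(table, key_column, rows):
    """Translate table rows into triples using generic has_<column> properties."""
    triples = []
    for row in rows:
        key = row[key_column]
        subject = f"{table.lower()}_{key}"
        triples.append((subject, "rdf:type", table))  # table -> class
        for column, value in row.items():
            if column == key_column:
                continue
            node = f"{column.lower()}_{key}"
            triples.append((subject, f"has_{column.lower()}", node))  # generic property
            triples.append((node, "rdf:type", column))                # column -> class
            triples.append((node, "rdfs:label", column))              # source metadata
            triples.append((node, "has_value", value))
    return triples


rows = [{"CityId": 1, "Name": "Tampa", "State": "Florida"}]
triples = machine_map("City", "CityId", rows)
```

The result is deliberately naive: every column becomes a class and every relationship a generic property, leaving refinement to a later step.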
Machine Made Assertion Mapping as a Starting Point
[Figure: City linked to Name, State, Incorporation Date, Area and Coordinates by has_name, has_state, has_incorporation_date, has_area and has_coordinates. Caption: class mappings created by associating the container with the components with a generic property.]
Current Content of Ontologies is not Well Used
• Ontologists are trained to associate subclass and equivalence axioms with classes
• OWL reasoners don’t expand the graph by creating instances based upon these axioms
• OWL reasoners are resource-intensive and often produce unimpressive output
• Not much control can be exerted upon which inferences an OWL reasoner performs
Create a Library of Rules
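PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/>   # placeholder namespace (assumption)

CONSTRUCT {
  ?city ex:designated_by ?cityname .
  ?cityname rdf:type ex:CityName .
}
WHERE {
  ?city rdf:type ex:City .
  ?cityname rdf:type ex:Name .
  ?city ?related_to ?cityname .
  FILTER NOT EXISTS { ?city ex:designated_by ?cityname . }
}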
Change the relationship and type of the name of a city
Create a Library of Rules
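PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/>   # placeholder namespace (assumption)

DELETE {
  ?city ?related_to ?name .
  ?name rdf:type ex:Name .
}
WHERE {
  ?city ?related_to ?name .
  ?name rdf:type ex:Name .
  ?city ex:designated_by ?name .
  ?name rdf:type ex:CityName .
  FILTER (?related_to != ex:designated_by)   # keep the new relationship
}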
Delete the original relationship and type
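The combined effect of the two rules can be mimicked on an in-memory set of triples. This Python sketch is not a SPARQL engine; it only mirrors the logic of the rules, and `ex:has_name` stands in for whatever generic property the machine mapping produced.

```python
def rename_city_names(triples):
    """Retype names of cities as ex:CityName and relink them with ex:designated_by."""
    graph = set(triples)
    cities = {s for s, p, o in graph if p == "rdf:type" and o == "ex:City"}
    names = {s for s, p, o in graph if p == "rdf:type" and o == "ex:Name"}
    for s, p, o in list(graph):
        if s in cities and o in names and p != "ex:designated_by":
            # First rule: add the specific relationship and type.
            graph.add((s, "ex:designated_by", o))
            graph.add((o, "rdf:type", "ex:CityName"))
            # Second rule: delete the original relationship and type.
            graph.discard((s, p, o))
            graph.discard((o, "rdf:type", "ex:Name"))
    return graph


before = {
    ("city_1", "rdf:type", "ex:City"),
    ("name_1", "rdf:type", "ex:Name"),
    ("city_1", "ex:has_name", "name_1"),
}
after = rename_city_names(before)
```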
The Effect of Such Rules on Translated Data
[Figure: instance graph for the Tampa row after the rules have run. Nodes include city_1, city_name_1, state_1, state_name_1, area_1, coordinates_1, city_government_1, act_of_incorporation_1 and act_of_incorporation_2, with the values Tampa, Florida, July 15, 1887, 170.6 sq. mi. and 27 56’50” N 82 27’31” W. Link labels include designated_by, part_of, has_quality, has_incorporation_date, has_value, has_text_value, delimits, is_output_of and occurs_on.]
Benefits of Rule Library
• No need to write different rules for different source formats
• Changes to the ontology affect a single rule rather than some (possibly large) number of mappings
• Allows mappings from source to target to be simple and possibly fully automated
• Writing of rules can be performed by SMEs
• Fine grained control of which rules are executed
  – by user group
  – above a stated level of priority (weighting)
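Selecting rules by user group and priority could look like the following Python sketch. The `Rule` structure, the rule names and the "at or above" reading of the priority threshold are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    priority: int                              # weighting; higher runs first
    groups: set = field(default_factory=set)   # user groups allowed to run the rule


def select_rules(rules, user_group, min_priority):
    """Return the rules enabled for a group at or above a priority, highest first."""
    chosen = [
        rule for rule in rules
        if user_group in rule.groups and rule.priority >= min_priority
    ]
    return sorted(chosen, key=lambda rule: rule.priority, reverse=True)


library = [
    Rule("rename_city_names", 5, {"analysts", "admins"}),
    Rule("expand_incorporation", 8, {"admins"}),
    Rule("normalize_area_units", 2, {"analysts"}),
]
selected = [rule.name for rule in select_rules(library, "analysts", min_priority=3)]
```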