Semantics 101
kurt-cagle
Transcript of Semantics 101
Semantics 101: Business Use Cases
Intro to Semantics
Business Use Cases
MarkLogic: Semantics + Search
Semantics 101 Overview
Intro to Semantics: From conversations to query
A Conversation
◦ Semantics links concepts together via triples
◦ Concepts are identified by global identifiers (IRIs)
◦ Concepts can also have descriptive metadata
◦ Ordinary names are labels – descriptive, not unique
Triples
◦ A semantic “triple” consists of a
  subject (what the assertion describes)
  predicate (the relationship)
  object (the thing or descriptive metadata that is related to the subject)
  context (identifies domain of interest, optional)
Intro to Semantics
Assertions are Subject | Predicate | Object
◦ Michael | is | an individual.
◦ Michael | has | policy X.
◦ Policy X | is sold by | InsureCo.
◦ Michael | is married to | Jane.
◦ Jane | is a dependent of | Michael.
◦ If A is a dependent of B, then A is an Individual.
◦ => Jane | is an | Individual.
◦ “is married to” | is | a symmetric property.
◦ If A is related to B by a symmetric property, then B is related to A by the same property.
◦ => Jane | is married to | Michael.
Intro to Semantics II
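The two inference rules on this slide can be sketched in a few lines of Python. This is only an illustration, not how any particular triple store works: triples are plain tuples, the "is married to" rule is treated as a symmetric property (A relates to B implies B relates to A), and the names `SYMMETRIC` and `infer` are made up for the example.

```python
# Triples are modeled as plain (subject, predicate, object) tuples.
facts = {
    ("Michael", "is a", "Individual"),
    ("Michael", "has policy", "Policy X"),
    ("Policy X", "is sold by", "InsureCo"),
    ("Michael", "is married to", "Jane"),
    ("Jane", "is a dependent of", "Michael"),
}

# Assumption: properties listed here hold in both directions.
SYMMETRIC = {"is married to"}


def infer(triples):
    """Apply the two slide rules repeatedly until no new triples appear."""
    known = set(triples)
    while True:
        derived = set()
        for s, p, o in known:
            if p in SYMMETRIC:
                derived.add((o, p, s))  # B | p | A follows from A | p | B
            if p == "is a dependent of":
                derived.add((s, "is a", "Individual"))
        if derived <= known:  # nothing new: fixed point reached
            return known
        known |= derived


inferred = infer(facts)
new_triples = sorted(inferred - facts)
print(new_triples)
```

The loop runs to a fixed point so that rules can feed each other; with only these two rules a single pass would already be enough.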
Declares how assertions are made.
Analogous to XML.
Directed assertions create labeled graphs.
More generalized than hierarchies (XML or JSON)
Much of the power of RDF comes from traversal of the graph
Can be expressed in multiple ways
Resource Description Framework (RDF)
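To make "traversal of the graph" concrete, here is a minimal sketch that walks a labeled, directed graph along one predicate. The short names stand in for full IRIs, `reachable` is a hypothetical helper, and the `Tom_Jones` triple is invented purely to give the walk a second hop.

```python
from collections import deque

# Illustrative triples; Tom_Jones is invented for this example.
triples = [
    ("Jane_Doe", "hasDependent", "Sarah_Doe"),
    ("Jane_Doe", "hasDependent", "Wendy_Jones"),
    ("Wendy_Jones", "hasDependent", "Tom_Jones"),
    ("Jane_Doe", "label", "Jane Elizabeth Doe"),
]


def reachable(triples, start, predicate):
    """Breadth-first walk along one predicate; returns nodes in visit order."""
    edges = {}
    for s, p, o in triples:
        if p == predicate:
            edges.setdefault(s, []).append(o)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:  # guard against cycles in the graph
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


descendants = reachable(triples, "Jane_Doe", "hasDependent")
print(descendants)
```

Unlike a hierarchy, nothing stops a node from having several incoming edges or a cycle, which is why the walk tracks the nodes it has already seen.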
RDF as Graph
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix individual: <http://optum.com/ns/individual#> .
@prefix property: <http://optum.com/ns/property#> .
@prefix class: <http://optum.com/ns/class#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix aetnaPerson: <http://aetna.com/ns/persons/42164323C> .
@prefix ssn: <http://ssa.gov.us/ns/ssn#> .

individual:Jane_Doe
    owl:sameAs <http://aetna.com/ns/persons/42164323C> ,
               individual:Jane_Doe ,
               ssn:351644715 ;
    rdf:type class:Individual ;
    property:location "/apps/semantics1/data/Jane_Doe.xml" ;
    rdfs:label "Jane Elizabeth Doe" ;
    property:hasDependent individual:Sarah_Doe ,
                          individual:Wendy_Jones .
RDF as Turtle
<triples>
  <triple>
    <subject>http://optum.com/ns/individual#Jane_Doe</subject>
    <predicate>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</predicate>
    <object>http://optum.com/ns/class#Individual</object>
  </triple>
  <triple>
    <subject>http://optum.com/ns/individual#Jane_Doe</subject>
    <predicate>http://www.w3.org/2000/01/rdf-schema#label</predicate>
    <object datatype="http://www.w3.org/2001/XMLSchema#string">Jane Elizabeth Doe</object>
  </triple>
  <triple>
    <subject>http://optum.com/ns/individual#Jane_Doe</subject>
    <predicate>http://optum.com/ns/property#hasDependent</predicate>
    <object>http://optum.com/ns/individual#Wendy_Jones</object>
  </triple>
  <triple>
    <subject>http://optum.com/ns/individual#Jane_Doe</subject>
    <predicate>http://optum.com/ns/property#location</predicate>
    <object datatype="http://www.w3.org/2001/XMLSchema#string">/apps/semantics1/data/Jane_Doe.xml</object>
  </triple>
</triples>
RDF as TriplesML
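Because this serialization is ordinary XML, it can be read with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of the markup above. It assumes the elements carry no XML namespace, exactly as shown on the slide; a real MarkLogic export may differ. `parse_triples` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's markup (no XML namespace, as on the slide).
TRIPLES_XML = """
<triples>
  <triple>
    <subject>http://optum.com/ns/individual#Jane_Doe</subject>
    <predicate>http://www.w3.org/2000/01/rdf-schema#label</predicate>
    <object datatype="http://www.w3.org/2001/XMLSchema#string">Jane Elizabeth Doe</object>
  </triple>
  <triple>
    <subject>http://optum.com/ns/individual#Jane_Doe</subject>
    <predicate>http://optum.com/ns/property#hasDependent</predicate>
    <object>http://optum.com/ns/individual#Wendy_Jones</object>
  </triple>
</triples>
"""


def parse_triples(xml_text):
    """Return (subject, predicate, object, kind) tuples from the XML text."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for t in root.findall("triple"):
        obj = t.find("object")
        # An object with a datatype attribute is a literal; otherwise an IRI.
        kind = "literal" if obj.get("datatype") else "iri"
        result.append((t.findtext("subject"), t.findtext("predicate"), obj.text, kind))
    return result


parsed = parse_triples(TRIPLES_XML)
for row in parsed:
    print(row)
```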
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:class="http://optum.com/ns/class#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:property="http://optum.com/ns/property#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <class:Individual rdf:about="http://optum.com/ns/individual#Jane_Doe">
    <owl:sameAs rdf:resource="http://aetna.com/ns/persons/42164323C"/>
    <rdf:type rdf:resource="http://optum.com/ns/class#Individual"/>
    <owl:sameAs rdf:resource="http://optum.com/ns/individual#Jane_Doe"/>
    <property:hasDependent rdf:resource="http://optum.com/ns/individual#Sarah_Doe"/>
    <property:hasDependent rdf:resource="http://optum.com/ns/individual#Wendy_Jones"/>
    <owl:sameAs rdf:resource="http://ssa.gov.us/ns/ssn#351644715"/>
    <property:location rdf:datatype="http://www.w3.org/2001/XMLSchema#string">/apps/semantics1/data/Jane_Doe.xml</property:location>
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Jane Elizabeth Doe</rdfs:label>
  </class:Individual>
</rdf:RDF>
RDF as RDF-XML
Establishes rules, relationships & schemas
Builds the “logic” of RDF
Analogous to XSD
Schemas use Open World Assumption
  ◦ You don’t know what you don’t know
Schema model is accessible to RDF
OWL – Web Ontology Language
  ◦ Many flavors
SPIN – Extension language, MarkLogic has a similar facility
RDF Schema + OWL, SPIN
SQL-like language for the web
Matches parts or all of triples
Provides four modes
  ◦ Select – gets tabular results
  ◦ Describe – gets triples back
  ◦ Ask – gets a true/false answer
  ◦ Construct – creates new triples
Results can be serialized to various formats:
  ◦ RDF/XML, TriplesML (XML), JSON, Turtle, CSV, others
SPARQL Query
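The idea of matching "parts or all of triples" and the four result modes can be imitated over an in-memory list. This is a toy sketch, not a SPARQL engine: `match` is a hypothetical helper, strings starting with `?` act as variables, the data uses short names instead of IRIs, and the describe step is simplified to "all triples about one subject" (real engines decide for themselves what a description contains).

```python
triples = [
    ("Jane", "hasDependent", "Sarah"),
    ("Jane", "hasDependent", "Wendy"),
    ("Jane", "label", "Jane Elizabeth Doe"),
]


def match(pattern, triples):
    """Return one {variable: value} dict per triple that fits the pattern."""
    rows = []
    for triple in triples:
        binding = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                # Bind the variable, or reject if already bound differently.
                if binding.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            rows.append(binding)
    return rows


# Select: tabular results (a list of variable bindings)
select_rows = match(("Jane", "hasDependent", "?d"), triples)

# Ask: a true/false answer
ask_result = bool(match(("?p", "hasDependent", "Wendy"), triples))

# Construct: new triples built from each binding
constructed = [
    (b["?d"], "dependentOf", b["?p"])
    for b in match(("?p", "hasDependent", "?d"), triples)
]

# Describe (simplified): every triple whose subject is the resource
described = [t for t in triples if t[0] == "Jane"]

print(select_rows, ask_result, constructed, described, sep="\n")
```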
Completes RDF CRUD capabilities
Used for inserting content and inferencing
Supported in MarkLogic 8
Can support transactions in ML8
Good for serializing between semantic dbs
SPARQL Update
prefix individual: <http://optum.com/ns/individual#>
prefix class: <http://optum.com/ns/class#>
prefix property: <http://optum.com/ns/property#>
prefix xs: <http://www.w3.org/2001/XMLSchema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix cts: <http://marklogic.com/cts#>
prefix xdmp: <http://marklogic.com/xdmp#>
prefix fn: <http://www.w3.org/2005/xpath-functions#>

Select ?name ?depName ?primary ?primaryPath where {
    ?dependent rdfs:label ?depName.
    # ?query is not bound inside this query; presumably the caller supplies it as a parameter
    filter (cts:contains(?depName, ?query))
    ?primary property:hasDependent ?dependent.
    ?primary rdfs:label ?name.
    ?primary property:location ?primaryPath.
}
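Read as a join, the query above finds dependents whose label matches a search term, then follows hasDependent back to the primary and picks up the primary's label and document path. The Python sketch below mimics that logic on a handful of tuples. It rests on several assumptions: a case-insensitive substring test is only a rough stand-in for `cts:contains`, each subject has at most one label, the shortened `ind:` and `prop:` prefixes are for readability, and the dependents' labels are invented (the slides do not give them).

```python
triples = [
    ("ind:Jane_Doe", "rdfs:label", "Jane Elizabeth Doe"),
    ("ind:Jane_Doe", "prop:location", "/apps/semantics1/data/Jane_Doe.xml"),
    ("ind:Jane_Doe", "prop:hasDependent", "ind:Sarah_Doe"),
    ("ind:Jane_Doe", "prop:hasDependent", "ind:Wendy_Jones"),
    ("ind:Sarah_Doe", "rdfs:label", "Sarah Doe"),      # invented label
    ("ind:Wendy_Jones", "rdfs:label", "Wendy Jones"),  # invented label
]


def find_primaries(triples, query):
    """Return (name, depName, primary, primaryPath) rows, like the select clause."""
    labels = {s: o for s, p, o in triples if p == "rdfs:label"}
    paths = {s: o for s, p, o in triples if p == "prop:location"}
    rows = []
    for primary, p, dependent in triples:
        if p != "prop:hasDependent":
            continue
        dep_name = labels.get(dependent)
        # Substring match: a rough stand-in for cts:contains
        if dep_name is None or query.lower() not in dep_name.lower():
            continue
        # All patterns must match, so primaries lacking a label or path drop out.
        if primary in labels and primary in paths:
            rows.append((labels[primary], dep_name, primary, paths[primary]))
    return rows


rows = find_primaries(triples, "wendy")
print(rows)
```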
Business Use Cases: Explorations
1. Document Integration
2. Taxonomy Management
3. Natural Language Processing
4. Master Data Hub
5. Fraud Analysis
6. Recommendation Engine
7. Rights & Contract Management
8. Diagnostics Systems
9. Decision Support
10. Metadata Management System
Business Use Cases
In the relational world, objects are tables, and relationships join tables
In the semantic world, objects are documents, and relationships join documents
Related documents can be searched and composed dynamically
Eliminates duplication – one document per entity
Low cost, minimal effort, medium value
#1 Document Integration
Terms, synonyms, antonyms can be standardized across systems
Category inheritance (cat -> pet -> animal)
Controlled vocabularies can include meaning
Can feed better entity extraction
Centralization of controlled vocabularies across data silos
Moderate cost, effort, value
#2 Taxonomy Management
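Category inheritance such as cat -> pet -> animal amounts to following "broader term" links transitively. A minimal sketch, assuming each term has at most one broader term (real taxonomies often allow several); `broader`, `ancestors` and `is_a` are hypothetical names.

```python
# Assumption: each term has at most one broader term.
broader = {"cat": "pet", "pet": "animal"}


def ancestors(term, broader):
    """Follow broader-term links upward and return the chain of categories."""
    chain, seen = [], {term}
    while term in broader:
        term = broader[term]
        if term in seen:  # stop if the vocabulary accidentally contains a loop
            break
        seen.add(term)
        chain.append(term)
    return chain


def is_a(term, category, broader):
    """True if term is the category itself or inherits from it."""
    return term == category or category in ancestors(term, broader)


print(ancestors("cat", broader))
print(is_a("cat", "animal", broader))
```

A search for "animal" can then be expanded to include documents tagged "pet" or "cat" without those documents mentioning "animal" at all.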
Search becomes more relevant and accurate
Descriptive content (doctor’s reports, etc.) can provide additional metadata
Better facilitates multilingual searches and searches with inaccurate spelling
Documents can be made more granular for searching
Context-sensitive searches become possible
Moderate cost, effort, value
#3 Natural Language Processing
From ETL to ELT
Turns schemas into logical relationships
Initial system converts databases to RDF representations of models
Inferencing & new information create canonical representations of entities
Can query canon or source simultaneously
Services architecture becomes much simpler
Medium cost, complexity, high value
#4 Master Data Hub
Can identify potential abuses
One project detected more than $1B in fraud in insurance industry
Requires large data and considerable processing
Uses inferencing to detect patterns of usage
Identifies individuals under multiple aliases
High cost, effort, very high value
#5 Fraud Analysis
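Identifying one individual behind several aliases is, at its simplest, a matter of grouping identifiers connected by same-as assertions. The sketch below does this with a small union-find structure. It only illustrates the grouping step, not a fraud-detection method: `merge_aliases` is a hypothetical helper, and the `claims:` identifiers are invented for the example.

```python
def merge_aliases(same_as_pairs):
    """Group identifiers linked, directly or indirectly, by same-as pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


pairs = [
    ("individual:Jane_Doe", "aetna:42164323C"),
    ("individual:Jane_Doe", "ssn:351644715"),
    ("claims:J_Smith", "claims:John_Smith"),  # invented identifiers
]
clusters = merge_aliases(pairs)
print(clusters)
```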
Based upon various criteria, recommends specific products to customers
Mix of semantics, search, data analytics
Sensitive to changes in rates, offerings, provisions
Useful for insurance exchanges, works well with Optum market
Medium cost, medium complexity, high value
#6 Recommendation Engine
Determines contract provisioning, enforcement and domains
Useful for regulatory tracking and fraud prevention
Identifies equivalent legal language across different policies
Medium cost, high complexity, medium value
#7 Rights & Contract Mgmt
More applicable to health care, can identify symptoms and provide likely diagnoses
Can be used in conjunction with EHRs
Uses NLP and Recommendation Systems
Revenue generating potential
High cost, high complexity, high to very high value
#8 Diagnostics Systems
Tracks and weighs decision trees and timeline management
Uses semantics both to provide metrics to links and to manage timelines
This could have value in managing participant, provider and treatment timelines, as well as to establish both auditing and action recommendations.
Medium cost, medium complexity, medium value
#9 Decision Support
This would be a metadata management system for ingesting, classifying, indexing, searching and managing the rights of media assets.
It moves beyond simple taxonomy systems, both by allowing for multiple concurrent taxonomies on resources and
Moderate cost, medium complexity, medium to high value
#10 Metadata Management System
MarkLogic: Merging Semantics and Search
Triple index is where queries happen
Triples “bound” to sem:triple XML structure
sem:triples can either be in documents or bundles
SPARQL can perform cts:queries
SPARQL is parameterized from XQuery or JavaScript
Output is either triples or sequences of maps
SPARQL can output to XML, JSON, CSV, Turtle, or other formats
MarkLogic: Semantics+Search
Why Sem+Search?
◦ Facilitates joining complex, multipart documents
◦ Sem makes creating ad-hoc indexes easy
◦ Semantics critical for data hubs
◦ Sem+search combines searching by type & concept as well as words.
◦ Makes natural language processing (much) easier
◦ Works better for document-centric apps than traditional RDF databases
◦ Chaining of queries through XQuery or JavaScript possible
MarkLogic: Semantics+Search 2
Questions?