Semantics 101

Post on 19-Feb-2017


Transcript of Semantics 101

Semantics 101: Business Use Cases

Intro to Semantics
Business Use Cases
MarkLogic: Semantics + Search

Semantics 101 Overview

Intro to Semantics: From conversations to query

A Conversation
◦ Semantics links concepts together via triples
◦ Concepts are identified by global identifiers (IRIs)
◦ Concepts can also have descriptive metadata
◦ Ordinary names are labels – descriptive, not unique

Triples
◦ A semantic “triple” consists of a
  subject (what the assertion describes)
  predicate (the relationship)
  object (the thing or descriptive metadata that is related to the subject)
  context (identifies domain of interest, optional)

Intro to Semantics
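The structure described on this slide can be sketched in a few lines of Python. This is a minimal illustration only: the `Triple` class and its field names are assumptions made for this example, not part of any RDF library, and the IRIs reuse the example namespaces that appear later in the deck.

```python
from typing import NamedTuple, Optional

class Triple(NamedTuple):
    """One assertion: subject, predicate, object, plus an optional context."""
    subject: str                   # IRI of the thing being described
    predicate: str                 # IRI of the relationship
    object: str                    # IRI of a related thing, or a literal value
    context: Optional[str] = None  # optional identifier for the domain of interest

# Namespace strings taken from the deck's examples.
IND = "http://optum.com/ns/individual#"
PROP = "http://optum.com/ns/property#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

triples = [
    Triple(IND + "Jane_Doe", PROP + "hasDependent", IND + "Sarah_Doe"),
    # A label is descriptive metadata: readable, but not a unique identifier.
    Triple(IND + "Jane_Doe", RDFS_LABEL, "Jane Elizabeth Doe"),
]

for t in triples:
    print(t.subject, t.predicate, t.object)
```

The IRI identifies the concept globally, while the label is just one more triple attached to it.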

Assertions are Subject | Predicate | Object
◦ Michael | is | an individual.
◦ Michael | has | policy X.
◦ Policy X | is sold by | InsureCo.
◦ Michael | is married to | Jane.
◦ Jane | is a dependent of | Michael.
◦ If A is a dependent of B, then A is an Individual.
◦ => Jane | is an | Individual.
◦ “is married to” | is | a symmetric property.
◦ If A is related to B by a symmetric property, then B is related to A by the same property.
◦ => Jane | is married to | Michael.

Intro to Semantics II
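The two "=>" lines on this slide are inferences, and a small forward-chaining loop shows how they could be derived. This is a sketch under stated assumptions: the `infer` function is hypothetical, predicates are plain strings rather than IRIs, and "is an" is normalized to "is a". The second rule is the one the slide applies to "is married to" (if A is related to B, then B is related to A), which is usually called a symmetric property.

```python
def infer(facts, symmetric_predicates):
    """Apply the two rules repeatedly until no new triples appear."""
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            # Rule 1: if A is a dependent of B, then A is an Individual.
            if p == "is a dependent of":
                new.add((s, "is a", "Individual"))
            # Rule 2: for a symmetric predicate, A p B implies B p A.
            if p in symmetric_predicates:
                new.add((o, p, s))
        if new <= known:  # nothing new was derived, so we are done
            return known
        known |= new

facts = {
    ("Michael", "is a", "Individual"),
    ("Michael", "has", "Policy X"),
    ("Policy X", "is sold by", "InsureCo"),
    ("Michael", "is married to", "Jane"),
    ("Jane", "is a dependent of", "Michael"),
}

closure = infer(facts, symmetric_predicates={"is married to"})
for triple in sorted(closure - facts):
    print(triple)
```

Only the derived triples are printed; the asserted facts are left untouched.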

Declares how assertions are made.
Analogous to XML.
Directed assertions create labeled graphs.
More generalized than hierarchies (XML or JSON)
Much of the power of RDF comes from traversal of the graph
Can be expressed in multiple ways

Resource Description Framework (RDF)
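To make "traversal of the graph" concrete, here is a minimal sketch that treats each triple as a directed, labeled edge and walks outward from one node. The `reachable` function is a hypothetical helper written for this illustration, and the names are plain strings rather than IRIs.

```python
from collections import defaultdict, deque

def reachable(triples, start):
    """Breadth-first walk along directed edges.

    Returns {node: list of predicates followed from start to reach it}.
    """
    edges = defaultdict(list)
    for s, p, o in triples:
        edges[s].append((p, o))
    paths = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for predicate, target in edges[node]:
            if target not in paths:
                paths[target] = paths[node] + [predicate]
                queue.append(target)
    return paths

triples = [
    ("Michael", "has", "Policy X"),
    ("Policy X", "is sold by", "InsureCo"),
    ("Michael", "is married to", "Jane"),
]

paths = reachable(triples, "Michael")
print(paths["InsureCo"])
```

Unlike a hierarchy, nothing here requires a single parent per node, which is why the graph form is more general than XML or JSON trees.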

RDF as Graph

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix individual: <http://optum.com/ns/individual#> .
@prefix property: <http://optum.com/ns/property#> .
@prefix class: <http://optum.com/ns/class#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix aetnaPerson: <http://aetna.com/ns/persons/42164323C> .
@prefix ssn: <http://ssa.gov.us/ns/ssn#> .

individual:Jane_Doe
    owl:sameAs <http://aetna.com/ns/persons/42164323C> ,
               individual:Jane_Doe ,
               ssn:351644715 ;
    rdf:type class:Individual ;
    property:location "/apps/semantics1/data/Jane_Doe.xml" ;
    rdfs:label "Jane Elizabeth Doe" ;
    property:hasDependent individual:Sarah_Doe ,
                          individual:Wendy_Jones .

RDF as Turtle

<triples>
  <triple>
    <subject>http://optum.com/ns/individual#Jane_Doe</subject>
    <predicate>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</predicate>
    <object>http://optum.com/ns/class#Individual</object>
  </triple>
  <triple>
    <subject>http://optum.com/ns/individual#Jane_Doe</subject>
    <predicate>http://www.w3.org/2000/01/rdf-schema#label</predicate>
    <object datatype="http://www.w3.org/2001/XMLSchema#string">Jane Elizabeth Doe</object>
  </triple>
  <triple>
    <subject>http://optum.com/ns/individual#Jane_Doe</subject>
    <predicate>http://optum.com/ns/property#hasDependent</predicate>
    <object>http://optum.com/ns/individual#Wendy_Jones</object>
  </triple>
  <triple>
    <subject>http://optum.com/ns/individual#Jane_Doe</subject>
    <predicate>http://optum.com/ns/property#location</predicate>
    <object datatype="http://www.w3.org/2001/XMLSchema#string">/apps/semantics1/data/Jane_Doe.xml</object>
  </triple>
</triples>

RDF as TriplesML
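Because this serialization is ordinary XML, it can be read back into tuples with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` and assumes the element names exactly as transcribed on the slide, which carry no XML namespace; a real export may namespace these elements, so treat the tag names as assumptions. `read_triples` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DOC = """
<triples>
  <triple>
    <subject>http://optum.com/ns/individual#Jane_Doe</subject>
    <predicate>http://www.w3.org/2000/01/rdf-schema#label</predicate>
    <object datatype="http://www.w3.org/2001/XMLSchema#string">Jane Elizabeth Doe</object>
  </triple>
  <triple>
    <subject>http://optum.com/ns/individual#Jane_Doe</subject>
    <predicate>http://optum.com/ns/property#hasDependent</predicate>
    <object>http://optum.com/ns/individual#Wendy_Jones</object>
  </triple>
</triples>
"""

def read_triples(xml_text):
    """Return (subject, predicate, object, datatype) tuples.

    datatype is None when the object element has no datatype attribute,
    which in this sample means the object is an IRI rather than a literal.
    """
    root = ET.fromstring(xml_text.strip())
    rows = []
    for triple in root.findall("triple"):
        obj = triple.find("object")
        rows.append((
            triple.findtext("subject"),
            triple.findtext("predicate"),
            obj.text,
            obj.get("datatype"),
        ))
    return rows

rows = read_triples(DOC)
for row in rows:
    print(row)
```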

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:class="http://optum.com/ns/class#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:property="http://optum.com/ns/property#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <class:Individual rdf:about="http://optum.com/ns/individual#Jane_Doe">
    <owl:sameAs rdf:resource="http://aetna.com/ns/persons/42164323C"/>
    <rdf:type rdf:resource="http://optum.com/ns/class#Individual"/>
    <owl:sameAs rdf:resource="http://optum.com/ns/individual#Jane_Doe"/>
    <property:hasDependent rdf:resource="http://optum.com/ns/individual#Sarah_Doe"/>
    <property:hasDependent rdf:resource="http://optum.com/ns/individual#Wendy_Jones"/>
    <owl:sameAs rdf:resource="http://ssa.gov.us/ns/ssn#351644715"/>
    <property:location rdf:datatype="http://www.w3.org/2001/XMLSchema#string">/apps/semantics1/data/Jane_Doe.xml</property:location>
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Jane Elizabeth Doe</rdfs:label>
  </class:Individual>
</rdf:RDF>

RDF as RDF-XML

Establishes rules, relationships & schemas
Builds the “logic” of RDF
Analogous to XSD
Schemas use Open World Assumption
  ◦ You don’t know what you don’t know
Schema model is accessible to RDF
OWL – Web Ontology Language
  ◦ Many flavors
SPIN – Extension language, ML has similar

RDF Schema + OWL, SPIN

SQL-like language for the web
Matches parts or all of triples
Provides four modes
  ◦ Select – gets tabular results
  ◦ Describe – gets triples back
  ◦ Ask – gets a true/false answer
  ◦ Construct – creates new triples
Results can be serialized to various formats:
  ◦ RDF/XML, TriplesML (XML), JSON, Turtle, CSV, others

SPARQL Query
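"Matches parts or all of triples" can be illustrated with a tiny pattern matcher. The sketch below is not a SPARQL engine; it assumes triples are plain string tuples, treats any term starting with `?` as a variable, and covers only the tabular and true/false forms (SELECT and ASK in SPARQL). The function names `match`, `select` and `ask` are hypothetical.

```python
def match(triples, pattern):
    """Yield variable bindings for one triple pattern.

    Pattern terms starting with '?' are variables; anything else must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable already bound to a different value
            elif term != value:
                break  # fixed term does not match this triple
        else:
            yield binding

def select(triples, pattern):
    """Tabular result: one row of bindings per matching triple."""
    return list(match(triples, pattern))

def ask(triples, pattern):
    """True/false result: does at least one triple match?"""
    return any(True for _ in match(triples, pattern))

triples = [
    ("Jane_Doe", "hasDependent", "Sarah_Doe"),
    ("Jane_Doe", "hasDependent", "Wendy_Jones"),
    ("Jane_Doe", "label", "Jane Elizabeth Doe"),
]

print(select(triples, ("Jane_Doe", "hasDependent", "?who")))
print(ask(triples, ("?x", "hasDependent", "Jane_Doe")))
```

A real query joins several such patterns on shared variables; this sketch handles a single pattern only.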

Completes RDF CRUD capabilities
Used for inserting content and inferencing
Supported in MarkLogic 8
Can support transactions in ML8
Good for serializing between semantic dbs

SPARQL Update

prefix individual: <http://optum.com/ns/individual#>
prefix class: <http://optum.com/ns/class#>
prefix property: <http://optum.com/ns/property#>
prefix xs: <http://www.w3.org/2001/XMLSchema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix cts: <http://marklogic.com/cts#>
prefix xdmp: <http://marklogic.com/xdmp#>
prefix fn: <http://www.w3.org/2005/xpath-functions#>

Select ?name ?depName ?primary ?primaryPath
where {
    ?dependent rdfs:label ?depName.
    filter (cts:contains(?depName, ?query))
    ?primary property:hasDependent ?dependent.
    ?primary rdfs:label ?name.
    ?primary property:location ?primaryPath.
}
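The joins in this query can be traced by hand over a few in-memory triples. The Python sketch below is only an approximation: `find_primaries` is a hypothetical function, prefixed names are kept as plain strings, a case-insensitive substring test stands in for `cts:contains` (which is a full-text match), and the labels for the two dependents are invented for illustration.

```python
def find_primaries(triples, query):
    """Mirror the query's joins: dependent label -> primary -> primary's label and path."""
    # Assumes at most one label and one location per subject.
    labels = {s: o for s, p, o in triples if p == "rdfs:label"}
    locations = {s: o for s, p, o in triples if p == "property:location"}
    rows = []
    for primary, p, dependent in triples:
        if p != "property:hasDependent":
            continue
        dep_name = labels.get(dependent)
        # Substring test used here in place of cts:contains.
        if dep_name is None or query.lower() not in dep_name.lower():
            continue
        if primary in labels and primary in locations:
            rows.append({
                "name": labels[primary],
                "depName": dep_name,
                "primary": primary,
                "primaryPath": locations[primary],
            })
    return rows

triples = [
    ("individual:Jane_Doe", "rdfs:label", "Jane Elizabeth Doe"),
    ("individual:Jane_Doe", "property:location", "/apps/semantics1/data/Jane_Doe.xml"),
    ("individual:Jane_Doe", "property:hasDependent", "individual:Sarah_Doe"),
    ("individual:Jane_Doe", "property:hasDependent", "individual:Wendy_Jones"),
    ("individual:Sarah_Doe", "rdfs:label", "Sarah Doe"),      # invented label
    ("individual:Wendy_Jones", "rdfs:label", "Wendy Jones"),  # invented label
]

rows = find_primaries(triples, "wendy")
for row in rows:
    print(row)
```

The search term plays the role of the `?query` variable, which the SPARQL text leaves unbound so that the caller can supply it.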

Business Use Cases: Explorations

1. Document Integration
2. Taxonomy Management
3. Natural Language Processing
4. Master Data Hub
5. Fraud Analysis
6. Recommendation Engine
7. Rights & Contract Management
8. Diagnostics Systems
9. Decision Support
10. Metadata Management System

Business Use Cases

In relational world, objects are tables, and relationships join tables

In semantic world, objects are documents, and relationships join documents

Related documents can be searched and composed dynamically

Eliminates duplication – one document per entity

Low cost, minimal effort, medium value

#1 Document Integration

Terms, synonyms, antonyms can be standardized across systems
Category inheritance (cat -> pet -> animal)
Controlled vocabularies can include meaning
Can feed better entity extraction
Centralization of controlled vocabularies across data silos.
Moderate cost, effort, value

#2 Taxonomy Management
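Category inheritance and synonym standardization can be sketched with two small lookup tables. This is an illustration under assumptions: the `broader` and `synonyms` tables and both function names are hypothetical, and apart from the slide's cat -> pet -> animal chain the entries are invented.

```python
def ancestors(broader, term):
    """Follow 'broader' links upward, collecting every inherited category."""
    seen = []
    stack = list(broader.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(broader.get(parent, []))
    return seen

# Each term points at its broader categories (cat -> pet -> animal).
broader = {"cat": ["pet"], "dog": ["pet"], "pet": ["animal"]}
# Synonyms map onto one preferred term from the controlled vocabulary.
synonyms = {"kitty": "cat", "feline": "cat"}

def categories(word):
    """Resolve a word to its preferred term, then add everything it inherits."""
    term = synonyms.get(word, word)
    return [term] + ancestors(broader, term)

print(categories("kitty"))
```

Tagging a document with "kitty" then makes it findable under "pet" and "animal" as well, without storing those tags on the document itself.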

Search becomes more relevant and accurate

Descriptive content (doctor’s reports, etc.) can provide additional metadata

Better supports multiple languages and searches with inaccurate spelling

Documents can be made more granular for searching

Context-sensitive searches become possible
Moderate cost, effort, value

#3 Natural Language Processing

From ETL to ELT
Turns schemas into logical relationships
Initial system converts databases to RDF representations of models
Inferencing & new information create canonical representations of entities
Can query canon or source simultaneously
Services architecture becomes much simpler
Medium cost, complexity, high value

#4 Master Data Hub
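The step "converts databases to RDF representations" can be pictured as turning each relational row into a handful of triples. The sketch below is one possible mapping, not a description of any particular product: `row_to_triples`, the `http://example.org/ns/` base namespace, and the `member` table are all hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def row_to_triples(table, key_column, row, base="http://example.org/ns/"):
    """Map one relational row to triples.

    The primary key becomes the subject IRI, the table becomes the class,
    and every other non-null column becomes a predicate with a literal value.
    """
    subject = f"{base}{table}#{row[key_column]}"
    triples = [(subject, RDF_TYPE, f"{base}class#{table}")]
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is already in the IRI; nulls assert nothing
        triples.append((subject, f"{base}property#{column}", value))
    return triples

row = {"id": 42, "name": "Jane Elizabeth Doe", "plan": None}
for triple in row_to_triples("member", "id", row):
    print(triple)
```

Skipping null columns fits the open world view: a missing value is simply not asserted, rather than recorded as empty.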

Can identify potential abuses
One project detected more than $1B in fraud in insurance industry
Requires large data and considerable processing
Uses inferencing to detect patterns of usage
Identifies individuals under multiple aliases
High cost, effort, very high value

#5 Fraud Analysis
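One way to picture "identifies individuals under multiple aliases" is to group identifiers that are linked by equivalence assertions such as owl:sameAs. The sketch below uses a small union-find structure; it is an assumption about how such grouping could be done, not the method used in the project the slide mentions. `alias_groups` is hypothetical, and the second group of identifiers is invented.

```python
def alias_groups(same_as_pairs):
    """Group identifiers connected, directly or indirectly, by sameAs links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())

pairs = [
    ("individual:Jane_Doe", "aetna:42164323C"),
    ("individual:Jane_Doe", "ssn:351644715"),
    ("individual:J_Smith", "acme:9001"),  # invented identifiers
]

groups = alias_groups(pairs)
for group in groups:
    print(sorted(group))
```

Because the grouping is transitive, two records that never reference each other directly still land in the same group when a third identifier links them.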

Based upon various criteria, recommends specific products to customers
Mix of semantics, search, data analytics
Sensitive to changes in rates, offerings, provisions
Useful for insurance exchanges, works well with Optum market
Medium cost, medium complexity, high value

#6 Recommendation Engine

Determines contract provisioning, enforcement and domains

Useful for regulatory tracking and fraud prevention

Identifies equivalent legal language across different policies

Medium cost, high complexity, medium value

#7 Rights & Contract Mgmt

More applicable to health care, can identify symptoms and provide likely diagnoses

Can be used in conjunction with EHRs
Uses NLP and Recommendation Systems
Revenue generating potential
High cost, high complexity, high to very high value

#8 Diagnostics Systems

Tracks and weighs decision trees and timeline management

Uses semantics both to provide metrics to links and to manage timelines

This could have value in managing participant, provider and treatment timelines, as well as to establish both auditing and action recommendations.

Medium cost, medium complexity, medium value

#9 Decision Support

This would be a metadata management system for ingesting, classifying, indexing, searching and managing the rights of media assets.

It moves beyond simple taxonomy systems, both by allowing for multiple concurrent taxonomies on resources and

Moderate cost, medium complexity, medium to high value

#10 Metadata Management System

MarkLogic: Merging Semantics and Search

Triple index is where queries happen
Triples “bound” to sem:triple XML structure.
sem:triples can either be in documents or bundles
SPARQL can perform cts:queries
SPARQL is parameterized from XQuery or JavaScript
Output is either triples or sequences of maps
SPARQL can output to XML, JSON, CSV, Turtle, or other formats

MarkLogic: Semantics+Search

Why Sem+Search?
◦ Facilitates joining complex, multipart documents
◦ Sem makes creating ad-hoc indexes easy
◦ Semantics critical for data hubs
◦ Sem+search combines searching by type & concept as well as words.
◦ Makes natural language processing (much) easier
◦ Works better for document-centric apps than traditional RDF databases
◦ Chaining of queries through XQuery or JavaScript possible

MarkLogic: Semantics+Search 2

Questions?