© COPYRIGHT 2014 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
MarkLogic Semantics
Presented by: Stephen Buxton, Amir Halfon
MarkLogic World Tour – 2014
SLIDE: 2
MarkLogic Semantics
  MarkLogic Semantics in MarkLogic 7 (and beyond)
    Deeper dive? See "MarkLogic Semantics - Under the Hood"
  What our customers are doing with Semantics
    Deeper dive? See "A Field Guide to MarkLogic Semantics"
  Questions?
SLIDE: 3
MARKLOGIC SEMANTICS
Powerful, Smarter Applications Faster & Easier
SLIDE: 4
Semantics: A New Way to Organize Data
Data is stored in triples, expressed as Subject : Predicate : Object
  John Smith : livesIn : London
  London : isIn : England
Querying with SPARQL gives us simple lookup .. and more!
  Find people who live in (a place that's in) England
[Diagram: "John Smith" livesIn "London"; "London" isIn "England"]
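The lookup on this slide can be sketched in a few lines of Python. This is only an in-memory illustration of the triple pattern idea, not MarkLogic's triple index or its SPARQL engine; the helper names `match` and `people_living_in` and the two extra facts about Jane Doe are assumptions added for the example.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

TRIPLES: List[Triple] = [
    ("John Smith", "livesIn", "London"),   # from the slide
    ("London", "isIn", "England"),         # from the slide
    ("Jane Doe", "livesIn", "Paris"),      # hypothetical extra fact
    ("Paris", "isIn", "France"),           # hypothetical extra fact
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every position that is not None."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def people_living_in(triples: List[Triple], region: str) -> List[str]:
    """Two-hop lookup, roughly:
    SELECT ?person WHERE { ?person :livesIn ?place . ?place :isIn <region> }
    """
    places = {s for s, _, _ in match(triples, p="isIn", o=region)}
    return sorted(s for s, _, o in match(triples, p="livesIn") if o in places)


print(people_living_in(TRIPLES, "England"))
```

The point of the sketch is the join on `?place`: neither triple alone says that John Smith lives in England, but following the link from one to the other does.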
SLIDE: 5
MARKLOGIC SEMANTICS
Triple Store: Enterprise ready. Store RDF triples alongside documents and values.
Triple Index: In addition to the value, structure, text, scalar, metadata, security, and geospatial indexes.
SPARQL: Industry-standard language for querying triples.
SLIDE: 6
[Diagram labels: DOMAIN, WORLD AT LARGE, DOCUMENTS]
SLIDE: 7
Context from the World at Large
[Image credit: "Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/"]
Linked Open Data
  Facts that are freely available
  In a form that’s easily consumed
DBpedia (Wikipedia as structured information)
  Einstein was born in Germany
  Ireland’s currency is the Euro
GeoNames
  Doha is the capital of Qatar
  Doha has these lat/long coordinates
SLIDE: 8
Context from Domain
Like Open Data, but domain specific
  Might be proprietary within a company
  Or shared across an industry
  Includes data and ontologies
Some Examples
  A bank's proprietary reference data
  A pharmaceutical company's drug ontology
  An industry-wide ontology such as FIBO
Proprietary Semantic Facts (Facts and Taxonomies in your organization or industry)
SLIDE: 9
Context from Documents
Document metadata
  Ex: Categories, author, publish date, source
Facts in free-flowing text
  Entities: this document mentions the person Richard Nixon, the product Advil, the company IBM
  Events: this document says that Nixon went to China, John Smith met Jane Doe, Barclays acquired Lehman Brothers
Found automatically or provided at authoring time
SLIDE: 10
The World of Triples
Linked Open Data (Free semantic facts available to anyone)
Facts from Free-Flowing Text (Derived from semantic enrichment)
Proprietary Semantic Facts (Facts and Taxonomies in your organization)
Facts in Documents (Part of metadata or added with authoring tools)
[Diagram labels: Semantic World, Document World]
SLIDE: 11
WHY SEMANTICS?
SLIDE: 12
Why Semantic Technologies?
Because …
  Triples are atomic – easy to create, manage, combine
  Semantic Web shares data as triples
  A natural choice for metadata and real-world facts
    .. and facts embedded in a document
  Adds relationships between facts, between documents
  Standards encourage tools and sharing
  Graph model – easy to follow links
  Ontologies – share information, infer new facts
SLIDE: 13
Why Semantics and Search?
Because …
  Many use cases need documents, triples, and data
  One database means a simple, efficient, powerful architecture
  Combination queries – query documents, triples, data in a single query – open up new possibilities
SLIDE: 14
WHAT'S A COMBINATION QUERY?
SLIDE: 15
Two Hemispheres, One Brain
SLIDE: 16
Two Hemispheres, One Brain
Triples:
  Highly structured
  Atomic
  Do one thing well
XML and JSON:
  Flexible structure
  Rich documents
  Rich applications
SLIDE: 17
Combination query - scenario
You work in an Incident Call Center
A call comes in:
  "some maniac in a blue van just tried to run me down"
  "I got the first three letters of his license plate: ABC"
You could look up "ABC*" in the license plate database, or …
.. Look for similar incident reports
  Reports that mention a "blue van"
  … around the same time
  … around the same place
  … with a license plate that starts with "ABC"
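The four conditions in this scenario can be sketched as one filter over incident reports. The Python below is an illustration only, assuming a hypothetical `Report` record, a 24-hour window, and a 10 km radius; in MarkLogic these tests would be answered from the text, range, geospatial, and triple indexes rather than by scanning a list. The first report uses the values shown in the deck's sample document (its time of day is assumed); the second is made up.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt
from typing import List, Tuple


@dataclass
class Report:
    title: str
    description: str
    when: datetime
    lat: float
    lon: float
    triples: List[Tuple[str, str, str]] = field(default_factory=list)


def km_between(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance in kilometres."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def plates(report: Report) -> List[str]:
    """Values of every subject that is typed as a license plate."""
    ids = {s for s, p, o in report.triples if p == "isa" and o == "license-plate"}
    return [o for s, p, o in report.triples if p == "value" and s in ids]


def similar_reports(reports, phrase, when, lat, lon, plate_prefix,
                    window=timedelta(hours=24), radius_km=10.0):
    """Keep reports that pass the text, time, place, and triple tests together."""
    return [r for r in reports
            if phrase.lower() in r.description.lower()
            and abs(r.when - when) <= window
            and km_between(r.lat, r.lon, lat, lon) <= radius_km
            and any(v.startswith(plate_prefix) for v in plates(r))]


REPORTS = [
    Report("Suspicious vehicle near airport",
           "A blue van with license plate ABC 123 was observed parked behind the airport sign",
           datetime(2012, 11, 12, 12, 0), 37.497075, -122.363319,   # time of day is assumed
           [("plate-1", "isa", "license-plate"), ("plate-1", "value", "ABC 123")]),
    Report("Abandoned red car",                                      # hypothetical report
           "A red car was left in the parking lot",
           datetime(2012, 11, 12, 13, 0), 37.60, -122.40,
           [("plate-2", "isa", "license-plate"), ("plate-2", "value", "XYZ 789")]),
]

hits = similar_reports(REPORTS, "blue van", datetime(2012, 11, 12, 15, 0),
                       37.50, -122.36, "ABC")
print([r.title for r in hits])
```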
SLIDE: 18
An XML or JSON document can represent many information types:
<SAR>
  <title>Suspicious vehicle near airport</title>
  <date>2012-11-12Z</date>
  <type>observation/surveillance</type>
  <threat>
    <type>suspicious activity</type>
    <category>suspicious vehicle</category>
  </threat>
  <location>
    <lat>37.497075</lat>
    <long>-122.363319</long>
  </location>
  <description>A blue van with license plate ABC 123 was observed parked behind the airport sign…</description>
  <triple>
    <subject>IRIID</subject>
    <predicate>isa</predicate>
    <object>license-plate</object>
  </triple>
  <triple>
    <subject>IRIID</subject>
    <predicate>value</predicate>
    <object>ABC 123</object>
  </triple>
</SAR>
SLIDE: 19
Combination Query: Example
[Figure: the <SAR> document from slide 18, showing its title, date, type, threat, location, description, and triple elements]
SLIDE: 20
WHAT'S IN MARKLOGIC 7?
SLIDE: 21
Semantics Architecture
[Diagram labels: XQY, XSLT, SQL, SPARQL; GRAPH SPARQL; TRIPLE]
SLIDE: 22
What did we build?
Database – Enterprise Triple Store
  Store, manage RDF triples
Query – Native SPARQL
  SPARQL queries over triples
  combination queries across documents, values, triples
Scalability – Indexing
  special-purpose Triple Index and Cache
  horizontal scaling in a shared-nothing cluster
Application Development
  Updated REST APIs, SPARQL end point
  SPARQL Query Console
Enterprise Ready
  all integrated with MarkLogic's Enterprise NoSQL Database
SLIDE: 23
Technical Drivers - Why MarkLogic?
In the Triple Store world:
MarkLogic is an Enterprise Triple Store
  Robust
  Horizontally scalable – billions of triples per box
  HA/DR features such as backup/restore, replication, automatic failover
  Government-grade security
Triples can be embedded in documents
  Address problems of provenance and reification
  Annotate/add metadata to a triple (or set of triples), then do a combination query
  SPARQL queries across facts: search and manage the source documents too
    show me all the people John met with
    … in the last 6 months, with 70% confidence, where the source is an FBI report that mentions explosives and a place within 100 miles of Paris
SLIDE: 24
Technical Drivers - Why MarkLogic?
In the documents/search world:
Add a triple store to your document store
  It's easy – MarkLogic combines the features of a document store and a triple store
  It's simple – a single architecture for documents, values, and triples
  It's powerful – combination queries let you query across documents, triples, and values in the same query
Triples add value to your documents
  Better search – leverage facts to expand your search
  Better User Experience – show facts as well as documents and facets to help users understand, discover, and make decisions
  Combination queries – new kinds of queries you can't do with separate document and triple stores
SLIDE: 25
MarkLogic Semantics: Bringing it all Together
Document Store + Data Store + Triple Store
SLIDE: 26
HOW DO I MAKE IT WORK?
SLIDE: 27
Semantic Implementation Details
SPARQL-centric:
  SPARQL with XQuery built-in functions (including cts:contains)
  SPARQL with a search argument
  SPARQL with variable bindings
  SPARQL with forest-ids
XQuery-centric: Inside an XQuery program
  sem:sparql( sparql query, search criteria )
  cts:triples( subject, predicate, object, search criteria )
  cts:triple-range-query( subject, predicate, object, [=,<,>] )
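To make the shape of a triple pattern with a comparison operator concrete, here is a small Python analogue. It is a sketch under stated assumptions, not the behaviour of cts:triple-range-query: the function name `triple_range` is hypothetical, the operator is applied to the object position only, and the second article is invented for the example.

```python
import operator
from datetime import date
from typing import Any, List, Optional, Tuple

Triple = Tuple[str, str, Any]

# The three comparison operators listed on the slide.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

PUBLISHED = "http://example.org/published"

TRIPLES: List[Triple] = [
    ("http://example.org/news/42", PUBLISHED, date(2013, 9, 10)),  # values from the deck's example
    ("http://example.org/news/43", PUBLISHED, date(2014, 1, 5)),   # hypothetical article
]


def triple_range(triples: List[Triple],
                 subject: Optional[str] = None,
                 predicate: Optional[str] = None,
                 obj: Any = None,
                 op: str = "=") -> List[Triple]:
    """Match subject and predicate exactly; compare the object using `op`.

    A position left as None matches anything.
    """
    compare = OPS[op]
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or compare(t[2], obj))]


# Articles published before 2014-01-01.
print(triple_range(TRIPLES, predicate=PUBLISHED, obj=date(2014, 1, 1), op="<"))
```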
SLIDE: 29
Triples and Documents
Documents can contain triples
<article>
  <meta>
    <title>Man bites dog</title>
    <sem:triple>
      <sem:subject>http://example.org/news/42</sem:subject>
      <sem:predicate>http://example.org/published</sem:predicate>
      <sem:object>2013-09-10</sem:object>
    </sem:triple>
    …
SLIDE: 30
Triples and Documents
Triples are persisted in documents
<sem:triple>
  <sem:subject>http://example.org/news/Nixon</sem:subject>
  <sem:predicate>http://example.org/wentTo</sem:predicate>
  <sem:object>China</sem:object>
</sem:triple>
…
SLIDE: 32
Triples and Documents
Triples can be annotated in documents
<source>AP Newswire</source>
<sem:triple date="1972-02-21" confidence="100">
  <sem:subject>http://example.org/news/Nixon</sem:subject>
  <sem:predicate>http://example.org/wentTo</sem:predicate>
  <sem:object>China</sem:object>
</sem:triple>
…
Which countries did Nixon visit?
.. before 1974?
.. only show me answers where I have at least 80% confidence
.. and the source is AP Newswire OR BBC

import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

sem:sparql('
  SELECT ?country
  WHERE {
    <http://example.org/news/Nixon> <http://example.org/wentTo> ?country
  }
  ',
  (),  (: no variable bindings :)
  (),  (: no options :)
  (: restrict the triples to those in documents matching this search :)
  cts:and-query((
    (: at least 80% confidence :)
    cts:path-range-query("//sem:triple/@confidence", ">=", 80),
    (: before 1974 :)
    cts:path-range-query("//sem:triple/@date", "<", xs:date("1974-01-01")),
    (: source is AP Newswire OR BBC :)
    cts:or-query((
      cts:element-value-query(xs:QName("source"), "AP Newswire"),
      cts:element-value-query(xs:QName("source"), "BBC")
    ))
  ))
)
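The logic of that query, a triple pattern plus constraints on the annotations around each triple, can be restated in plain Python. This is a hedged sketch and not how MarkLogic evaluates the query: the `AnnotatedTriple` record and `countries_visited` helper are assumptions, and only the first row comes from the slide; the other rows and their confidence values are invented to show what gets filtered out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Set

NIXON = "http://example.org/news/Nixon"
WENT_TO = "http://example.org/wentTo"


@dataclass(frozen=True)
class AnnotatedTriple:
    subject: str
    predicate: str
    obj: str
    when: date        # the triple's date attribute
    confidence: int   # the triple's confidence attribute
    source: str       # the <source> element of the containing document


FACTS = [
    AnnotatedTriple(NIXON, WENT_TO, "China", date(1972, 2, 21), 100, "AP Newswire"),  # from the slide
    AnnotatedTriple(NIXON, WENT_TO, "USSR", date(1972, 5, 22), 60, "BBC"),            # invented: low confidence
    AnnotatedTriple(NIXON, WENT_TO, "Egypt", date(1974, 6, 12), 95, "BBC"),           # invented: after the cutoff
    AnnotatedTriple(NIXON, WENT_TO, "Romania", date(1969, 8, 2), 90, "Other Wire"),   # invented: other source
]


def countries_visited(facts: Iterable[AnnotatedTriple], subject: str, before: date,
                      min_confidence: int, sources: Set[str]) -> List[str]:
    """Objects of wentTo triples whose annotations pass all three constraints."""
    return sorted({f.obj for f in facts
                   if f.subject == subject and f.predicate == WENT_TO
                   and f.when < before
                   and f.confidence >= min_confidence
                   and f.source in sources})


print(countries_visited(FACTS, NIXON, date(1974, 1, 1), 80, {"AP Newswire", "BBC"}))
```

Because the annotations live in the same document as the triple, the constraints and the triple pattern can be answered together, which is what the deck calls a combination query.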
SLIDE: 34
WHAT'S NEXT?
SLIDE: 35
MarkLogic Roadmap – Search and Semantics
MarkLogic 7
  Semantics
    RDF storage and management
    RDF bulk load
    Specialized triple index
    Native SPARQL
    SPARQL over REST
    Combination queries
  Search
    Custom tokenization
    SQL MATCH (driving Tableau)
    Dynamic boosting (cf XRANK)
    Range index scoring
    More-better plans, traces, controls
  Foundation: The essential building blocks of Semantics. Search features at least on par with FAST.
MarkLogic 8
  Semantics
    Graph traversal and discovery
    Automatic Inference
    SPARQL 1.1 aggregates
    SPARQL 1.1 Update
    Basic Visualizations
    SPARQL from JavaScript, Node.js
  Search
    Search from JavaScript, Node.js
    Entity Enrichment best practices
  Completeness: World-class triple store with inference, SPARQL 1.1. Search and Semantics work together everywhere.
MarkLogic 9
  Search and Semantics
    Content Analytics
    Advanced read/write visualizations
    Graph analytics
    Ontology management tools
    Ontology-driven concept extraction, classification
    More / faster combinations
    Extreme Performance and Scale
  Do More With All Information: Content analytics, advanced visualizations, and management tools give you power over all information.
https://ea.marklogic.com/
SLIDE: 36
WHERE ARE WE GOING WITH THIS?
SLIDE: 37
Search: Understand, Discover, Make decisions
Old
  Search
  Fetch documents
  Extract relevant facts
  Analyze
New
  Search
  Fetch facts, data, and documents in context
  Analyze and annotate
  Fetch supporting facts, data, and documents as needed
SLIDE: 38
SLIDE: 39
Semantic Technology Use Cases
Amir Halfon, CTO, Financial Services, MarkLogic
SLIDE: 40
Use Case: Master Data Management
Environment:
  Hundreds of Business Units
  Hundreds of Products
  Thousands of Applications
  Multiple Data Formats
    Structured
    Unstructured
  Multiple Identifiers
Challenge:
  Aggregate all data across business units and geographies.
2003-2014 Brook Path Partners, Inc. All Rights Reserved. www.brookpath.com
SLIDE: 41
Semantic Reference Data Management
[Diagram labels: UltimateParent, JointVenture, WhollyOwnedSubsidiary, MajorityOwnedSubsidiary, SignificantlyOwnedSubsidiary; Customer, Customer APAC Subsidiary, Customer Japanese Subsidiary; ultimateParentOf, whollyOwnsAndControls, majorityOwnsAndControls]
SLIDE: 42
Semantic Meta Data Management
SLIDE: 43
Use Case: Semantic-Driven Analytics
Challenge:
  Progress from transaction flow analysis to person-centric analytics, combining data from many diverse sources
Environment:
  Dozens of transaction systems, each with their own analytics
  Interaction records
  External data sources
  Connections among customers and other entities
SLIDE: 44
Semantic-Driven Customer Insight
[Diagram labels: Marketing; Profile Configuration Tools; Profile data extracted from multiple sources; Profiles include social graphs; Fraud and Financial Crime]
SLIDE: 45
Linked Open Data Insights
SLIDE: 46
Use Case: Regulatory Compliance
Environment:
  Thousands of rules, millions of accounts and onboarding documents
  Impossible to pre-define dimensions, relationships
Challenge:
  Provide a scalable way to map regulations to internal policies, and automate regulated workflows.
SLIDE: 47
Semantic Regulatory Validation
[Diagram labels: Documents; MarkLogic Workflow; Policies Ontology; Regulations]
SLIDE: 48
Use Case: Data Provenance
Challenge:
  Provide a consistent way to identify the source, timeliness and accuracy of the data
Environment:
  Regulations requiring data lineage
  Complex data lifecycle, which makes it hard to keep track of data elements and their changes
SLIDE: 49
Data Provenance Using RDF
[Figure: a <Trade> XML document containing <Cashflows>, <PartyIdentifier>, and <TradeID> 123456 </TradeID> elements, plus a <provenance> element holding two <triple> elements (subject: TradeID; predicates: wasDerivedFrom, wasAttributedTo; objects: CDS_xyz, System_123)]
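One way to read the provenance triples on this slide is as links that can be followed to answer "where did this trade come from, and which system produced it?". The Python below is a rough sketch of that traversal. The identifier `Trade_123456`, the pairing of each predicate with each object, and the two upstream triples marked hypothetical are all assumptions, since the slide's layout does not survive in this transcript.

```python
from collections import deque
from typing import List, Tuple

Triple = Tuple[str, str, str]

PROVENANCE: List[Triple] = [
    ("Trade_123456", "wasDerivedFrom", "CDS_xyz"),        # pairing assumed from the slide's values
    ("Trade_123456", "wasAttributedTo", "System_123"),    # pairing assumed from the slide's values
    ("CDS_xyz", "wasDerivedFrom", "TermSheet_abc"),       # hypothetical upstream link
    ("CDS_xyz", "wasAttributedTo", "System_456"),         # hypothetical upstream link
]


def lineage(triples: List[Triple], start: str) -> List[str]:
    """Follow wasDerivedFrom links breadth-first, starting from `start`."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for s, p, o in triples:
            if s == node and p == "wasDerivedFrom" and o not in seen:
                seen.add(o)
                queue.append(o)
    return order


def attributed_to(triples: List[Triple], item: str) -> List[str]:
    """Systems (or agents) that a given item is attributed to."""
    return sorted(o for s, p, o in triples if s == item and p == "wasAttributedTo")


for item in lineage(PROVENANCE, "Trade_123456"):
    print(item, attributed_to(PROVENANCE, item))
```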
SLIDE: 50
Use Case: Information Dissemination
Environment:
  SEC Filings
  Analyst Briefing Transcripts
  News Feeds
  Press Releases
Challenge:
  Provide a simple search solution for investment analysts to quickly identify opportunities
SLIDE: 51
Semantic Investment Research Authoring
[Diagram labels: SEC Filings; News Feeds; Analyst Briefings; Press Releases; Research Ontology]
SLIDE: 52
Semantic Search
SLIDE: 53
Suggested / Related Content
SLIDE: 54
Semantic Publishing
SLIDE: 55
Summary
Compelling use cases are driving industry adoption
  MDM
  Semantic-driven Insights
  Regulatory Compliance
  Data Provenance
  Information Delivery
Several ongoing initiatives attest to maturity:
  Financial Industry Business Ontology
  ISO/TC 68/WG 5 - ISO 20022 Semantic Models
  ACORD Framework
SLIDE: 56
QUESTIONS?