Post on 14-Aug-2015
© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
How Semantics Solves Big Data Challenges
Matt Allen, MarkLogic
Why do we need semantics?
Without context, organizing information is really hard
Disconnected Data, Unable to Handle Complexity
#1 impediment to big data success is having too many silos
Example: Categorizing media assets
Disconnected Data, Unable to Handle Complexity
[Figure: "Dog Image" described in separate, disconnected records]
Image ABC: File Name, Format, Create Date, Rights, Caption
Story: Title, Run Date, Credit, Position
Image 123: Costs, Rights, Usage, Revenue, Photographer
Roles: Photographer, Accountant, Editor
Disconnected Data, Unable to Handle Complexity
Example: Searching people, places, and things with context
[Figure: image pairs compared with "vs", including "sub" vs "hoagie"]
Disconnected Data, Unable to Handle Complexity
Example: Product research and development pipeline
Pipeline stages: Initial Identification, Proof of Concept, Early Product Development, Advanced Product Development, Pre-Launch
Phases: Discovery, Phase 1, Phase 2, Phase 3, Phase 4
Can I know more about this particular area of research?
Can I find out more about whether this new product is viable?
Which locations with product X showed characteristic Y during May-June of 2007 and 2008?
What testing of product X was undertaken across the world in 2012?
Does this product already exist in the pipeline?
The problem… different words describe the same things, product names change over time, domain knowledge is not captured and made searchable, and there are too many data silos to search in a limited time
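One way to picture how facts about names can help here: record every alternate or historical name as a fact pointing at a single product identifier, then expand a search term to all known names before searching each silo. The sketch below is only an illustration in Python; the product names, the `alsoKnownAs` predicate, and both silos are invented for the example and do not come from the talk.

```python
# Hypothetical facts as (subject, predicate, object). All names are invented.
TRIPLES = [
    ("product:X", "alsoKnownAs", "Compound 17"),
    ("product:X", "alsoKnownAs", "ProjectBlue"),
    ("product:X", "alsoKnownAs", "Xylonex"),
]

# Two hypothetical silos that describe the same product with different words.
SILOS = {
    "lab_notes": ["Compound 17 showed drought tolerance in trials."],
    "marketing": ["Xylonex launch plan drafted.", "Unrelated memo."],
}


def known_names(term):
    """Return every name that shares a product with `term`, plus `term` itself."""
    products = {s for s, p, o in TRIPLES if p == "alsoKnownAs" and o == term}
    names = {o for s, p, o in TRIPLES if p == "alsoKnownAs" and s in products}
    return names | {term}


def search(term):
    """Search every silo for any known name of the product behind `term`."""
    names = known_names(term)
    return sorted(
        (silo, text)
        for silo, texts in SILOS.items()
        for text in texts
        if any(name in text for name in names)
    )


results = search("ProjectBlue")
for silo, text in results:
    print(silo, "->", text)
```

Searching for the internal code name finds the lab note and the marketing document even though neither contains that string, because the name expansion happens in the facts rather than in each silo.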
Disconnected Data, Unable to Handle Complexity
Example: Managing overlapping domains of knowledge in healthcare
Is “Psychoses” a “mental disorder” or “psychotic illness”?
We’ve created elaborate systems to categorize information
But it ends up looking more like this
Problems With the Relational Approach
Inflexible Data Model
– Everything modeled up front
– Schema complexity
– Difficult to make changes later
– Fixed to a specific business purpose
– Lots of expensive ETL
– Inability to store unstructured data
– Mismatch for modern app development
Inability to Model Relationships
– No standard for modeling people, places, things
– Lack of context within taxonomies/ontologies
Inability to Query Heterogeneous Data
– Inability to handle complex queries across varied data
Limited Scalability
– Scale up, not out
360 View
Healthcare
How do we achieve this?
Enter Semantics…
Triples: Subject : Predicate : Object
John livesIn London
London isIn England
Semantics is a simple and elegant way to model data as facts and relationships. The facts are expressed in RDF, a W3C standard data model, and queried with SPARQL.
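To make the subject/predicate/object idea concrete, here is a minimal sketch in plain Python that stores the two facts about John and London as tuples and answers a two-pattern question by joining them. It is not SPARQL and not a MarkLogic API; the `match` and `query` helpers are hypothetical names written only for this illustration, with terms starting with `?` treated as variables.

```python
# The two facts from the slide, stored as (subject, predicate, object) tuples.
triples = {
    ("John", "livesIn", "London"),
    ("London", "isIn", "England"),
}


def match(pattern, facts, binding):
    """Yield extended bindings for one pattern; terms starting with '?' are variables."""
    for fact in facts:
        new = dict(binding)
        for term, value in zip(pattern, fact):
            if term.startswith("?"):
                # Bind the variable, or reject the fact if it is bound differently.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def query(patterns, facts):
    """Join several patterns, in the spirit of a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, facts, b)]
    return bindings


# Roughly what this SPARQL asks (prefix declaration omitted):
#   SELECT ?person WHERE { ?person :livesIn ?city . ?city :isIn :England }
people_in_england = query(
    [("?person", "livesIn", "?city"), ("?city", "isIn", "England")],
    triples,
)
print(people_in_england)
```

Neither fact says that John lives in England; the answer comes from joining the two facts on the shared `?city` variable.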
Triples Come in Different Formats
John livesIn London
XML (2 IRIs, 1 string):
<sem:triple>
  <sem:subject>http://xmlns.com/foaf/0.1/name/"John"</sem:subject>
  <sem:predicate>http://example.org/livesIn</sem:predicate>
  <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">"London"</sem:object>
</sem:triple>
JSON (2 IRIs, 1 string):
{
  "triple": {
    "subject": "http://xmlns.com/foaf/0.1/name/John",
    "predicate": "http://example.org/livesIn",
    "object": { "value": "London", "datatype": "xs:string" }
  }
}
Turtle (3 IRIs):
<http://dbpedia.org/resource/John> <http://dbpedia.org/ontology/LivesIn> <http://dbpedia.org/resource/London> .
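The point that one fact can be written in several serializations can be sketched with a few lines of standard-library Python. The helper names `to_ntriples` and `to_json` are hypothetical; the line format produced by the first is N-Triples (full IRIs in angle brackets, which is also valid Turtle), and the JSON shape simply mirrors the one on the slide rather than any confirmed API.

```python
import json


def to_ntriples(subject, predicate, obj, obj_is_iri=True):
    """Serialize one triple as an N-Triples line (simple cases only)."""

    def iri(value):
        return f"<{value}>"

    def literal(value):
        # Escape backslashes and double quotes inside the string literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    object_term = iri(obj) if obj_is_iri else literal(obj)
    return f"{iri(subject)} {iri(predicate)} {object_term} ."


def to_json(subject, predicate, value, datatype="xs:string"):
    """Serialize one triple in the JSON shape shown on the slide."""
    return json.dumps(
        {
            "triple": {
                "subject": subject,
                "predicate": predicate,
                "object": {"value": value, "datatype": datatype},
            }
        }
    )


# Three IRIs: the object is another resource.
three_iris = to_ntriples(
    "http://dbpedia.org/resource/John",
    "http://dbpedia.org/ontology/LivesIn",
    "http://dbpedia.org/resource/London",
)

# Two IRIs and one string: the object is a plain literal.
with_string = to_ntriples(
    "http://example.org/John", "http://example.org/livesIn", "London", obj_is_iri=False
)

as_json = to_json("http://example.org/John", "http://example.org/livesIn", "London")
print(three_iris)
print(with_string)
print(as_json)
```

The distinction the slide draws between "3 IRIs" and "2 IRIs, 1 string" shows up here as the `obj_is_iri` switch: an IRI object can itself be the subject of further triples, while a string literal cannot.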
Relationships and Context Are Obvious with Triples
Customer123 : tweeted : TweetXYZ
TweetXYZ : sentiment : Positive (= High Value)
This customer is saying good things about us. They’ve just walked into our store. Should we reward them?
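The reward question can be read as a short walk over the graph: from the customer to their tweets, from each tweet to its sentiment, combined with a fact about where the customer is right now. A minimal sketch in Python follows; the `locatedIn` fact, the `Store42` identifier, and the decision rule are assumptions added for the illustration, since the slide only shows the tweet and its sentiment.

```python
triples = [
    ("Customer123", "tweeted", "TweetXYZ"),
    ("TweetXYZ", "sentiment", "Positive"),
    ("Customer123", "locatedIn", "Store42"),  # hypothetical in-store signal
]


def objects(subject, predicate):
    """Return all objects of facts with the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def should_reward(customer):
    """Assumed rule: reward customers who are in a store and have a positive tweet."""
    in_store = bool(objects(customer, "locatedIn"))
    has_positive_tweet = any(
        "Positive" in objects(tweet, "sentiment")
        for tweet in objects(customer, "tweeted")
    )
    return in_store and has_positive_tweet


print(should_reward("Customer123"))
```

Because each relationship is its own fact, adding a new signal (a purchase, a complaint) means adding triples, not changing a schema.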
Documents + Triples Provide a Better Model
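[Figure: a document alongside a graph of semantic triples]
Document: Asset, with Title, HD Master, and Dates (Production Date, Editing Date, Release Date, International Date)
Semantic Triples:
– Entities: <work>, <collection>, <category>, <character>, <place>, <performer>
– Relationships: is, is a, is part of, appears in, played, lives in
– Example values: Title, Character, Film Series, Animated, Actress, City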
More Capable Than Relational
Document + Graph > Relational
Document
Hospital Name: Johns Hopkins
Operation Type: Cataract removal
Operation ID: 13
Surgeon Name: Robert Allen
Drug Name    Drug Manufacturer    Dose Size    Dose UOM
Minicillan   Drugs R Us           200          mg
Maxicillan   Canada4Less          400          mg
Minicillan   Drugs USA            150          mg
Graph
Nodes: Operation, Person, Hospital, Surgeon
Relationships: excels at, operated on, works at, performed, patient at
Relational
Operation: Operation ID, Hospital, Surgeon, Procedure
Hospital: Hospital ID, Hospital Name
Surgeon: Surgeon ID, Surgeon Name
Procedure: Procedure ID, CPT Code
300% growth in popularity of graph databases
Document databases are the most popular type of NoSQL database
20% of enterprise data
95% of database spend
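The combination of a document and a graph can be sketched in Python by keeping the operation record as one nested structure and the relationships as separate triples that point at it. This is an illustration only: the document key `operation-13`, the field names, the `worksAt` and `performed` predicates, and the assumption that the drug table belongs to this operation are all choices made for the example.

```python
# The operation as a single self-contained document (field names are assumed).
documents = {
    "operation-13": {
        "hospitalName": "Johns Hopkins",
        "operationType": "Cataract removal",
        "operationId": 13,
        "surgeonName": "Robert Allen",
        "drugs": [
            {"name": "Minicillan", "manufacturer": "Drugs R Us", "doseSize": 200, "doseUom": "mg"},
            {"name": "Maxicillan", "manufacturer": "Canada4Less", "doseSize": 400, "doseUom": "mg"},
            {"name": "Minicillan", "manufacturer": "Drugs USA", "doseSize": 150, "doseUom": "mg"},
        ],
    }
}

# Relationships kept as triples that reference the document by its key.
triples = [
    ("Robert Allen", "performed", "operation-13"),
    ("Robert Allen", "worksAt", "Johns Hopkins"),
]


def operations_by_staff_of(hospital):
    """Follow worksAt and performed edges, then fetch the matching documents."""
    surgeons = {s for s, p, o in triples if p == "worksAt" and o == hospital}
    op_keys = sorted(o for s, p, o in triples if p == "performed" and s in surgeons)
    return [documents[key] for key in op_keys if key in documents]


def total_dose_mg(document):
    """Sum the dose sizes recorded in milligrams inside one document."""
    return sum(d["doseSize"] for d in document["drugs"] if d["doseUom"] == "mg")


ops = operations_by_staff_of("Johns Hopkins")
print([op["operationId"] for op in ops], total_dose_mg(ops[0]))
```

The relational layout on the slide would spread the same information across four tables joined by IDs; here the record stays whole and the graph carries only the relationships.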
Flexible Data Model: Data, Documents (JSON, XML), Triples (RDF)
Search & Query: built-in search & query, powerful indexing capability
Enterprise Features: HA/DR, security, ACID transactions, scalability & elasticity
Seeking Clarity in the World of Data
NoSQL: KEY-VALUE, COLUMN, DOCUMENT, GRAPH
Related terms: A.I., COGNITIVE COMPUTING, PROPERTY GRAPHS, TRIPLE STORES, PREDICTIVE ANALYTICS, NATURAL LANGUAGE PROCESSING, DATA MINING, MACHINE LEARNING, ENTITY EXTRACTION, KNOWLEDGE GRAPHS
From the Classroom to the Boardroom
Benefits of MarkLogic Semantics
Model facts about people, places, and things
Model complex relationships
Share your data using a common standard
Discover “hidden” facts in your data
Visualize your data as a graph
Use triples as metadata
Work with open linked data
Reconcile and integrate disparate data
Provide context for a specific domain of knowledge
Automate publishing of facts
Work with other semantic technologies
– Extract meaning from unstructured data
– Classify large amounts of data
Remember: Facts, Relationships, Metadata
Leading Organizations Using Semantics
Intelligent Search
Complex Data Integration
Dynamic Semantic Publishing
Object-based Intelligence
Compliance
Entertainment Company
Agriculture Company
The World of Dooneese Maharelle
[Figure: graph of entities and relationships]
Talent: Kristen Wiig
Character: Maharelle Sister
Segment: The Lawrence Welk Show
Episode 4: Anne Hathaway and Killers
Season 34
Date: 10/4/08
Era
Characteristic: Tiny hands
Relationships: Acted in, Played, Part of, Aired on, Includes, Has
What if you only know a characteristic?
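Starting from a characteristic alone, a graph lets you walk backwards to the character and then outwards to the performer and the segment. The Python sketch below illustrates that walk; the slide residue lists the nodes and relationship labels but not which nodes each edge connects, so the specific edges here are assumptions made for the example.

```python
# Assumed edges, built from the node and relationship labels on the slide.
triples = [
    ("Kristen Wiig", "played", "Maharelle Sister"),
    ("Maharelle Sister", "has", "Tiny hands"),
    ("Maharelle Sister", "partOf", "The Lawrence Welk Show"),
    ("The Lawrence Welk Show", "partOf", "Episode 4"),
    ("Episode 4", "airedOn", "10/4/08"),
]


def subjects(predicate, obj):
    """Return, sorted, all subjects of facts with the given predicate and object."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


def objects(subject, predicate):
    """Return, sorted, all objects of facts with the given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def describe(characteristic):
    """Find every character with the characteristic, then who played them and where."""
    return [
        {
            "character": character,
            "played_by": subjects("played", character),
            "part_of": objects(character, "partOf"),
        }
        for character in subjects("has", characteristic)
    ]


found = describe("Tiny hands")
print(found)
```

A keyword search for "tiny hands" would only find text that happens to contain those words; the graph walk finds the character, the performer, and the segment because the relationships are stored as facts.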
The World of Barack Obama
Real vs. Impersonation
– Barack Obama cameo vs. Barack Obama impersonation
Different Impersonations
– Fred Armisen as Barack Obama
– Jay Pharoah as Barack Obama
Characters
– The Rock Obama
When Data Takes Center Stage…
More information: http://info.marklogic.com/semantics-summer
Thank you!
Matt Allen <matt.allen@marklogic.com>