4 June 2018 © MARKLOGIC CORPORATION
Semantics for Data Integration
Stephen Buxton, Product Manager, MarkLogic
@stephenbuxton
John Snelson, Principal Engineer, MarkLogic
@jpcs
SLIDE: 2
The World's Best Database for Integrating Data from Silos
RELATIONAL DATABASES
MAINFRAMES
FILE SYSTEMS
ANY OTHER SOURCE
FLEXIBLE DATA MODEL
BUILT-IN SEARCH
ACID TRANSACTIONS
SLIDE: 3
Agenda
Using Documents and Graphs for Data Integration:
Data Models – Documents + Graphs
7 Steps to Nirvana – From load as is to a 360° view of governed data
Demo
SLIDE: 4
The Document Model
Natural way to model an entity
Self-describing / Human-readable
Handles:
- hierarchical data
- repeating elements
- sparse data
Schema is flexible within/across documents
- Add an element/property/column easily
- Insert/update/delete documents in a single transaction, even if it changes the schema
{
  "Customer_ID": 1001,
  "Fname": "Paul",
  "Lname": "Jackson",
  "Phone": "415-555-1212",
  "SSN": "123-45-6789",
  "Addr": "123 Avenue ",
  "City": "Someville",
  "State": "CA",
  "Zip": 94111
}
{
  "Cust_ID": 2001,
  "Given_Name": "Karen",
  "Family_Name": "Bender",
  "Shipping_Address": {
    "Street": "324 Some Road",
    "City": "San Francisco",
    "State": "CA",
    "Postal": "94111",
    "Country": "USA"
  },
  "Billing_Address": {
    "Street": "847 Another Ave",
    "City": "San Carlos",
    "State": "CA",
    "Postal": "94070",
    "Country": "USA"
  }
}
JSON/XML DOCUMENTS
MODELING ENTITIES
SLIDE: 5
The Graph Model
MODELING RELATIONSHIPS
Record and query relationships
- Entity to Entity
- Entity to Concept
- Concept to Concept
Most flexible data model:
- Just add/delete a triple
Enable Graph-style queries
- Show me a suspect in X which is Y which is Z …
[Figure: graph linking <suspect>A</suspect> and <suspect>B</suspect>, with nodes labeled Locations, Field Report, Finances, Investigation, and Criminal Operations]
SLIDE: 6
Semantics: The Graph Model
Data and Query: A Semantic Model
Data is stored in Triples, expressed as: Subject, Predicate, Object
Query with SPARQL gives us a simple lookup... and more!
- Find people who live in (a place that’s in) England
Based on W3C standards: triples are written using RDF (Resource Description Framework), queried using SPARQL, and partitioned into Named Graphs
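To make the "live in (a place that's in) England" query concrete, here is a minimal in-memory sketch in Python. It is not SPARQL and not MarkLogic code; the people, places, and the `livesIn` / `isIn` predicate names are assumptions made up for illustration. In SPARQL 1.1 the same traversal would typically be written with a property path.

```python
# Hypothetical triples (subject, predicate, object); names are illustrative only.
TRIPLES = {
    ("John", "livesIn", "London"),
    ("Jane", "livesIn", "Paris"),
    ("Mary", "livesIn", "England"),
    ("London", "isIn", "England"),
    ("Paris", "isIn", "France"),
}


def places_within(region, triples):
    """Return the region plus every place transitively linked to it by isIn."""
    found = {region}
    frontier = [region]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "isIn" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def people_living_in(region, triples):
    """Find people who live in the region or in any place inside it."""
    places = places_within(region, triples)
    return sorted(s for s, p, o in triples if p == "livesIn" and o in places)


print(people_living_in("England", TRIPLES))
```

The simple lookup is the `livesIn` match; the "and more" is the hop through `isIn`, which needs no schema change, only another triple.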
SLIDE: 7
Semantics Use Cases
Semantic Search
Metadata Hub: Managing (Digital) Assets
Content Delivery
Intelligence, Fraud Detection
Data Integration
SLIDE: 9
Why Data Integration?
• Query: Provide a 360° view of a customer/patient/drug trial and related entities
  - Find what you're looking for
  - Produce accurate, insightful reports
  - Provide clean, curated data to operational processes
• Governance: Trust that the data you are querying is reliable
  - From a trusted source
  - Up-to-date
  - Secure
  - Fit-for-purpose
0: Out-Of-The-Box Search
SLIDE: 11
Search
{
  "ID": 1001,
  "Fname": "Paul",
  "Lname": "Jackson",
  "Phone": "415-555-1212",
  "SSN": "123-45-6789",
  "Addr": "123 Avenue Road",
  "City": "San Francisco",
  "State": "CA",
  "Zip": 94111
}
{
  "Customer_ID": 2001,
  "Given_Name": "Karen",
  "Family_Name": "Bender",
  "Shipping_Address": {
    "Street": "324 Some Road",
    "City": "San Francisco",
    "State": "CA",
    "Postal": "94111",
    "Country": "USA"
  }
}
Data: Load as is
Index: Universal Index
Search: Show me all documents that contain "Paul"
Load data as-is and search anywhere in any entity
Zero Modeling, No Schema (yet)
cf: SELECT * FROM every table WHERE some column [contains|equals] "Paul"
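As a rough illustration of "search anywhere in any entity", the Python sketch below walks every value of every document, whatever its shape. It is a linear scan written for clarity, not an index, and the function names are assumptions; a real universal index would be built ahead of time rather than computed per query.

```python
def iter_values(node):
    """Yield every scalar value in a nested JSON-like structure."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_values(item)
    else:
        yield node


def search(documents, term):
    """Return documents in which any value contains the term as a word."""
    term = term.lower()
    hits = []
    for doc in documents:
        words = set()
        for value in iter_values(doc):
            words.update(str(value).lower().split())
        if term in words:
            hits.append(doc)
    return hits


# Abbreviated versions of the two documents on this slide, loaded as is.
DOCS = [
    {"ID": 1001, "Fname": "Paul", "Lname": "Jackson",
     "City": "San Francisco", "Zip": 94111},
    {"Customer_ID": 2001, "Given_Name": "Karen", "Family_Name": "Bender",
     "Shipping_Address": {"Street": "324 Some Road",
                          "City": "San Francisco", "Postal": "94111"}},
]

print(len(search(DOCS, "Paul")))   # documents mentioning "Paul" anywhere
print(len(search(DOCS, "94111")))  # matches both Zip and nested Postal
```

No field names appear in the query, which is the point of zero modeling: the two documents disagree on structure and the search still works.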
SLIDE: 12
What if "Paul" has variants?
1: Variations in Vocabulary
SLIDE: 14
Variations in Vocabulary
"Paul" may also be "Pablo" (or "Paolo", or an alias, or …)
Variants are recorded as Triples in a Graph (ontology)
Show me all documents that contain "Paul" or a variant
- Query the Triples to find all variants of "Paul"
- Use the result to expand the query
cf: SELECT * FROM every table WHERE some column [contains|equals] some form of "Paul"
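The two steps (query the triples for variants, then expand the search) can be sketched in a few lines of Python. The `hasVariant` predicate and the extra documents 1002 and 1003 are assumptions for illustration; the source data itself is never modified.

```python
# Hypothetical ontology triples recording name variants.
VARIANT_TRIPLES = [
    ("Paul", "hasVariant", "Pablo"),
    ("Paul", "hasVariant", "Paolo"),
]


def expand(term, triples):
    """Step 1: query the triples for all variants of the term."""
    return {term} | {o for s, p, o in triples if s == term and p == "hasVariant"}


def contains_any(doc, terms):
    """Step 2: true if any top-level value contains one of the terms as a word."""
    wanted = {t.lower() for t in terms}
    for value in doc.values():
        if wanted & set(str(value).lower().split()):
            return True
    return False


# Illustrative flat documents; 1002 and 1003 are made up for this sketch.
DOCS = [
    {"ID": 1001, "Fname": "Paul", "Lname": "Jackson"},
    {"ID": 1002, "Fname": "Pablo", "Lname": "Diaz"},
    {"ID": 1003, "Fname": "Karen", "Lname": "Bender"},
]

terms = expand("Paul", VARIANT_TRIPLES)
matches = [d["ID"] for d in DOCS if contains_any(d, terms)]
print(matches)
```

Adding a new alias means adding one triple; the next query picks it up without touching or re-indexing the documents.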
SLIDE: 15
Semantic Search
Context from a graph, for better search over WHAT
Synonyms: Another word for "trust deed" is "trust agreement"
SubClasses: Every Henley is a Shirt
Other:
- Joe Smith alias Fred Bloggs
- John Matthews member of TEHC
- John Cleese similar to Peter Cook
- Catheter requires a sterile environment
Domain-specific:
- Use one or more graphs to provide a domain-specific lens
- E.g. “LA Basin” is the same as “LA Area” for some purposes
SLIDE: 16
Semantic Search
Why do this with graphs?
Flexible: Easy to add/remove a single triple (no modeling required!)
Relationships between values are a graph (an artifact)
- Centrally manage, query, visualize
- Graph-style queries: A catheter is an embeddable device which requires a sterile environment
Data is sacrosanct: Iteration means changing the graph, not the data
- Original data is always available; no updates/re-indexing required
Iterative: Define triples just for the integration you need, and iterate
Graphs can be layered to present domain- or task-based lenses
Separation of concerns: Ontology builder is separate from query builder
SLIDE: 17
How do I search just in zip code?
2: Variations in Structure
SLIDE: 19
Semantic Query Layer: Example
{
  "ID": 1001,
  "Fname": "Paul",
  "Lname": "Jackson",
  "Phone": "415-555-1212",
  "SSN": "123-45-6789",
  "Addr": "123 Avenue Road",
  "City": "San Francisco",
  "State": "CA",
  "Zip": 94111
}
{
  "Customer_ID": 2001,
  "Given_Name": "Karen",
  "Family_Name": "Bender",
  "Shipping_Address": {
    "Street": "324 Some Road",
    "City": "San Francisco",
    "State": "CA",
    "Postal": "94111",
    "Country": "USA"
  }
}
CANONICAL ZIP: LOOK IN Zip
CANONICAL ZIP: LOOK IN Postal
SLIDE: 20
Semantic Query Layer: Example
[Figure: the same two customer documents as on the previous slide, with mapping Triples stored in two Named Graphs, :sourceA and :sourceB. Each graph maps canonical field names to source field names with IS THE SAME AS: three "CANONICAL name IS THE SAME AS name" mappings per graph, plus CANONICAL ZIP IS THE SAME AS ZIP in one graph and CANONICAL ZIP IS THE SAME AS POSTAL in the other.]
SLIDE: 21
Semantic Query Layer: Description
Data mapping information is in a graph; better search over WHERE
Recall Semantic Search: Better search over WHAT (values)
Semantic Query Layer: Better search over WHERE (locations, columns)
Same Field: Look in "Zip" and "Postal" and …
- Optionally tie field names to sources via collections
More complex mappings are possible
- e.g. include a function call in the graph: "fullname maps to (firstname||lastname)"
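A minimal Python sketch of the "search over WHERE" idea follows: mapping triples, held in one named graph per source, say which source fields stand for a canonical field, and the query is expanded to look in all of them. The `canonical:zip` identifier, the `isTheSameAs` predicate, and the function names are assumptions for illustration.

```python
# Hypothetical mapping triples, grouped by named graph (one per source).
MAPPING = {
    ":sourceA": [("canonical:zip", "isTheSameAs", "Zip")],
    ":sourceB": [("canonical:zip", "isTheSameAs", "Postal")],
}


def source_fields(canonical, graphs):
    """Query the chosen graphs for the source field names of a canonical field."""
    return {
        o
        for graph in graphs
        for s, p, o in MAPPING[graph]
        if s == canonical and p == "isTheSameAs"
    }


def find_field(node, names):
    """Yield the value of any key in names, at any depth of the document."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in names:
                yield value
            yield from find_field(value, names)
    elif isinstance(node, list):
        for item in node:
            yield from find_field(item, names)


def search_canonical(docs, canonical, wanted, graphs):
    """Look for the wanted value only in fields mapped to the canonical field."""
    names = source_fields(canonical, graphs)
    return [d for d in docs
            if any(str(v) == str(wanted) for v in find_field(d, names))]


DOCS = [
    {"ID": 1001, "Fname": "Paul", "Zip": 94111},
    {"Customer_ID": 2001, "Given_Name": "Karen",
     "Shipping_Address": {"City": "San Francisco", "Postal": "94111"}},
]

both = search_canonical(DOCS, "canonical:zip", "94111", [":sourceA", ":sourceB"])
only_a = search_canonical(DOCS, "canonical:zip", "94111", [":sourceA"])
print(len(both), len(only_a))
```

Choosing which graphs to pass at query time is one way to read "optionally tie field names to sources": the same canonical query can be scoped to one source or to all of them.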
SLIDE: 22
Semantic Query Layer: Benefits
Flexible: To add a new source, add to the graph and the next query will reflect the change
- You can have layered graphs in place; choose one or more at query-time
The Semantic Query Layer is a graph (an artifact)
- Centrally manage, query, visualize
Data is sacrosanct: Data Integration means only changing the graph, not the data
- Original data is always available; no updates/re-indexing required
Iterative: Define triples just for the integration you need, and iterate
SLIDE: 23
How do I record the canonical model of an entity? (such as a Customer)
3: Canonical Model of an Entity
SLIDE: 25
Entity Type Model
Entities: A Customer is something that exists as part of my business/mission
Properties: Customer entities have a Name that is of type string and is required
Relationships: Customers place Orders
SLIDE: 26
Extending the Entity Type Model
[Figure: the Entity Type Model extended with Governance, Provenance, …anything else]
SLIDE: 27
The Entity Type Model: Description
Double-click on Entity Services: The Entity Model
Create an Entity Model Descriptor (a document)
- Manual: create a JSON document and insert with a magic collection
- GUI: create an Entity in the Data Hub Framework
Model Descriptor is automagically indexed as triples
- Query the model as a graph; link to other graphs
Once you have a model, you can (auto-)generate artifacts
- Indexes
- Transformation code
- Security
SLIDE: 28
The Semantic Model: Example
Entity Model Descriptor (JSON) + Derived Graph
SLIDE: 30
The Semantic Model: Querying the Model
Because the Model is available as a Graph, you can:
Apply Graph-style queries
- In the Customer model, show me the name and description of each property* that is Range Indexed
Link to other graphs and apply graph-style queries
- For all entities that are a sub-type of Person, show me the name and description of each property* that is Range Indexed, and is subject to Business Rule X or Security Policy Y
Easily extend the Model
- Add Triples
- Link to another Graph
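To illustrate querying the model as a graph, the Python sketch below derives triples from a small descriptor and then asks for the name and description of each range-indexed Customer property. The descriptor layout, the `rangeIndexed` flag, and the predicate names are assumptions for illustration; they are not the actual Entity Services descriptor format.

```python
# Hypothetical model descriptor; the real descriptor format may differ.
DESCRIPTOR = {
    "Customer": {
        "properties": {
            "name": {"datatype": "string", "description": "Full name",
                     "rangeIndexed": True},
            "zip": {"datatype": "string", "description": "Postal code",
                    "rangeIndexed": True},
            "phone": {"datatype": "string", "description": "Contact number",
                      "rangeIndexed": False},
        }
    }
}


def descriptor_to_triples(descriptor):
    """Derive (subject, predicate, object) triples from the descriptor."""
    triples = []
    for entity, spec in descriptor.items():
        for prop, facets in spec["properties"].items():
            prop_id = f"{entity}/{prop}"
            triples.append((entity, "hasProperty", prop_id))
            triples.append((prop_id, "name", prop))
            for facet, value in facets.items():
                triples.append((prop_id, facet, value))
    return triples


def range_indexed_properties(entity, triples):
    """Name and description of each range-indexed property of the entity."""
    props = [o for s, p, o in triples if s == entity and p == "hasProperty"]
    facts = {(s, p): o for s, p, o in triples}
    return [
        (facts[(pid, "name")], facts[(pid, "description")])
        for pid in props
        if facts.get((pid, "rangeIndexed")) is True
    ]


MODEL_TRIPLES = descriptor_to_triples(DESCRIPTOR)
print(range_indexed_properties("Customer", MODEL_TRIPLES))
```

Extending the model then amounts to appending triples, for example linking a property to a business rule held in another graph, without editing the descriptor.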
SLIDE: 31
The Semantic Model: Benefits
Flexible: the Model is easy to change and extend
The Model for each Entity Type is "written down" in one place (an artifact)
- Model is managed, versioned, secured, queried, visualized
Model can be used to manage the creation of entity instances
- … and other artifacts
Model can be used to robustly create Services
SLIDE: 32
I want to query over transformed data, but I don’t want to change the data
4: Querying Transformed Data
SLIDE: 34
Transformed Data
Transforming data in the index
Data transformation/manipulation in the index
- Pull out a phone number from a complex field
- Source has firstname and lastname; I want to query for fullname
Data manufacturing in the index
- Fill in default values for a missing field
- Add pre-calculated fields such as sum, average
SLIDE: 35
Templates: Index-level Integration
Faster Semantic Data Layer without changing the data
Query-level integration: Query the Model, expand the query
Index-level integration: project out canonical, transformed data into the (triple) index
Data-level integration: Harmonize the data
SLIDE: 37
Documents
- Rich hierarchy
- Flexible schema
- Natural fit for messages, forms, objects
Template Driven Extraction
- Project out of trees into Triples, Tuples
- Transactionally at index time
- Declarative mapping, functional transform
Triple Index
- Index optimized for joins, aggregates
- Shared by SPARQL, SQL, and Optic
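The idea of projecting rows and triples out of a document through a declarative mapping can be sketched in Python as below. This is not the Template Driven Extraction template syntax; the `TEMPLATE` structure, the `customer` view name, and the column functions are assumptions for illustration. The document is read, never changed.

```python
# Hypothetical declarative template: column name -> function over the document.
TEMPLATE = {
    "view": "customer",
    "columns": {
        "id": lambda d: d["ID"],
        # A functional transform: the source has Fname and Lname only.
        "fullname": lambda d: f'{d["Fname"]} {d["Lname"]}',
        "zip": lambda d: str(d["Zip"]),
    },
}


def extract(doc, template):
    """Project one row (tuple) and a set of triples out of a document."""
    row = {col: fn(doc) for col, fn in template["columns"].items()}
    subject = f'{template["view"]}/{row["id"]}'
    triples = [(subject, col, val) for col, val in row.items() if col != "id"]
    return row, triples


doc = {"ID": 1001, "Fname": "Paul", "Lname": "Jackson", "Zip": 94111}
original = dict(doc)  # copy, to show the source document is left untouched

row, triples = extract(doc, TEMPLATE)
print(row)
print(triples)
print(doc == original)
```

Because the canonical, transformed values live only in the projection, changing the template changes what queries see without rewriting the stored documents.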
SLIDE: 38
Parts of my model are stable. I want to push down to the document
5: Harmonize Documents
SLIDE: 40
Harmonize Documents
Most flexible: transformations in code
Faster indexing
- Transformations are pre-calculated
Simpler queries
- Query directly (only) against the document
Visibility: All the data is together
SLIDE: 41
Data Harmonization
MARKLOGIC APPROACH
Harmonize data with different schemas in order to create a common view (e.g., standardizing on “male” and “female” versus “m” and “f” or “M” and “F”)
Only harmonize what you need to
Preserve and query data and metadata as is
Update the model later without re-ingesting
New data and schemas are okay
See also: Data Hub Framework
[Figure: SCHEMA 1, SCHEMA 2, and SCHEMA 3 brought together using the ENVELOPE PATTERN]
SLIDE: 42
Entity Type Info
Harmonized canonical data
Raw source data
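The three parts of the envelope listed above can be sketched in Python as follows. The key names (`entity_type`, `canonical`, `raw`), the version string, and the per-source mapping code are assumptions for illustration, not the Data Hub Framework's actual envelope layout.

```python
def harmonize(raw, source):
    """Wrap a raw record in an envelope; the raw data is kept as is."""
    if source == "A":
        canonical = {
            "id": raw["ID"],
            "fullname": f'{raw["Fname"]} {raw["Lname"]}',
            "zip": str(raw["Zip"]),
        }
    elif source == "B":
        canonical = {
            "id": raw["Customer_ID"],
            "fullname": f'{raw["Given_Name"]} {raw["Family_Name"]}',
            "zip": raw["Shipping_Address"]["Postal"],
        }
    else:
        raise ValueError(f"unknown source: {source}")
    return {
        "envelope": {
            "entity_type": {"title": "Customer", "version": "0.0.1"},  # entity type info
            "canonical": canonical,                                    # harmonized data
            "raw": raw,                                                # raw source data
        }
    }


RAW_A = {"ID": 1001, "Fname": "Paul", "Lname": "Jackson", "Zip": 94111}
RAW_B = {"Customer_ID": 2001, "Given_Name": "Karen", "Family_Name": "Bender",
         "Shipping_Address": {"City": "San Francisco", "Postal": "94111"}}

env_a = harmonize(RAW_A, "A")
env_b = harmonize(RAW_B, "B")
print(env_a["envelope"]["canonical"])
print(env_b["envelope"]["canonical"])
```

Queries can now go directly against `canonical` in either document, while `raw` stays available for anything that was not harmonized.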
SLIDE: 43
How do I know I can trust the data I'm querying? Where did it come from?
6: Provenance
SLIDE: 45
Provenance
Add a Provenance Graph to an Entity
Provenance tells me where this entity came from
- … and where that came from, and where that came from...
Provenance data is naturally Graph-shaped
Queries over Provenance are generally Graph-style queries
SLIDE: 46
Provenance Graph
[Figure: provenance graph. Node labels: Harmonize-1, 2018-01-10 11:45, Sr. Data Scientist, Eng-1, Source-1, Source-2. Edge labels: GENERATED BY, APPROVED BY, RAN AT, DEPT, ROLE, FROM (twice), CONSUMED (twice).]
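A graph-style provenance query ("where did this come from, and where did that come from...") can be sketched in Python as a traversal over lineage edges. The edge labels echo the figure, but the exact wiring, the entity name Customer-1, and the intermediate nodes Raw-1 and Raw-2 are assumptions for illustration.

```python
# Hypothetical provenance triples; the arrangement of edges is assumed.
PROV = [
    ("Customer-1", "generatedBy", "Harmonize-1"),
    ("Harmonize-1", "consumed", "Raw-1"),
    ("Harmonize-1", "consumed", "Raw-2"),
    ("Raw-1", "from", "Source-1"),
    ("Raw-2", "from", "Source-2"),
    ("Harmonize-1", "ranAt", "2018-01-10 11:45"),
]

# Only these predicates count as "came from" edges.
LINEAGE = {"generatedBy", "consumed", "from"}


def origins(entity, triples):
    """Follow lineage edges and return the nodes that have no further lineage."""
    seen, stack, roots = set(), [entity], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = [o for s, p, o in triples if s == node and p in LINEAGE]
        if parents:
            stack.extend(parents)
        elif node != entity:
            roots.add(node)
    return sorted(roots)


print(origins("Customer-1", PROV))
```

The depth of the chain is not fixed in advance, which is why provenance queries tend to be graph-style rather than a fixed number of joins.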
SLIDE: 47
So much for a single Entity. How can I query across Entities?
7: Linking Entities
SLIDE: 49
[Figure: linked entities. Labels as they appear on the slide: Customer-1, Customer-1 Materialized View-1, Log (twice), Model Cust-0.0.1 (twice), Provenance (twice), Credit Summary, Laptop-123, CASUAL. Edge labels: USAGE (twice), FROM (twice), HAS MODEL (twice), HAS CREDIT, PURCHASED.]
SLIDE: 50
The (Extended) Semantic Model
Use the Entity Type Model to generate:
- Code to transform from raw source to an instance of the model
- TDE Templates to expose data as triples, views
- Config settings (such as indexes)
- REST endpoints for APIs
Use the extended Entity Type Model to create:
- Provenance
- Links to other entity instances
- Entity usage
- Security settings
- Special-purpose materializations
SLIDE: 51
LOAD DATA AS IS: OUT-OF-THE-BOX SEARCH
ONTOLOGY: SEMANTIC SEARCH
SEMANTIC QUERY LAYER: QUERY-BASED INTEGRATION
TEMPLATE: INDEX-BASED INTEGRATION
ENTITY TYPE MODEL EXTENSIONS: GOVERNANCE (SECURITY, PROVENANCE)
ENTITY TYPE MODEL EXTENSIONS: SPECIAL-PURPOSE MATERIALIZATION
HARMONIZE DOCUMENTS: DOCUMENT-BASED INTEGRATION
ENTITY TYPE MODEL: CENTRALLY-MANAGED, QUERIABLE MODEL