Domain Semantics

Domain Semantics or: How I Learned to Stop Worrying and Love the Ontology
Michael Lang Jr., Director of Ontology Services, Revelytix, Inc.

This is a talk I gave at Semtech East 2011. Its main topic is how the Domain Ontology is used to enable Distributed Information Management.

Transcript of Domain Semantics

Software Today

Siloed Information Management - Good
● Drives nearly all day-to-day operations of any business
● Optimized for transactions - ACID, CRUD

Distributed Information Management - Bad
● Massive amounts of data generated
● Operations mindset assumes a single application context
  ○ Stovepipes and silos
● Managing distributed information is extremely difficult
  ○ Analysis!

Use Cases

● Pharma
  ○ Drug Pipeline Management
● Department of Defense
  ○ Enterprise Information Web
● Financial Services
  ○ Back-office Trade Data Analysis

Pharma - Drug Pipeline Management

● Long, expensive development time
● Low success rate

● Better ability to analyze data improves the success rate
● Any increase in the success rate of drugs in the pipeline can represent a huge ROI
  ○ Kill one drug in Phase 1 instead of Phase 3, save $1 billion

DOD - Business Mission Area

● The Services own and operate their own systems
● Numerous OSD-level (Office of the Secretary of Defense) reporting requirements (LAWS!)
  ○ DOD is not auditable (ILLEGAL!)

Financial Services - Back-office Trade Data Analysis

● Many compliance regulations, written by national and global bodies, change often
● Relatively small IT ecosystem, yet it still can't meet compliance reporting requirements

● Inability to meet compliance reporting requirements means regulations are impossible to enforce

● Unenforceable regulations mean trouble...
  ○ 2008 financial crisis

Problem Summary

● Most major business operations involve many different groups of people
  ○ Different Roles
  ○ Different Organizations
  ○ Different Companies
● Different groups use different systems
  ○ Systems are built in silos
● Many different sets of semantics
● Many different schemas

Analyzing and sharing data is difficult

Distributed Information Management

Enterprises require a new paradigm of information technology in which distributed information is assumed: Semantics

Capabilities include (some good buzzwords):
● Data Integration
  ○ Virtualization
  ○ Federation
● Data Quality
  ○ Provenance
  ○ Validation
● Data Discovery

Semantic Technology: The Ground Floor

Anyone can say Anything about Anything

Standards
● URI - the universal identification scheme
  ○ URL - the universal location scheme
● RDF - the data model
● SPARQL - the query language
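To make the standards above concrete: a SPARQL query is plain text, and under the SPARQL 1.1 Protocol it can be sent to an endpoint identified by a URL, URL-encoded into the `query` parameter of an HTTP GET. The minimal sketch below (Python, standard library only) builds such a URL. The endpoint URLs and the `ex:` vocabulary are hypothetical; the `SERVICE` clause is the SPARQL 1.1 federated-query syntax that hands part of the query to a second endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoints; substitute real SPARQL endpoint URLs.
LOCAL_ENDPOINT = "http://example.org/sparql"
REMOTE_ENDPOINT = "http://example.com/trials/sparql"

# SPARQL 1.1 query; the SERVICE block is evaluated by the remote endpoint
# (federated query). The ex: vocabulary is hypothetical.
QUERY = f"""PREFIX ex: <http://example.org/pharma#>
SELECT ?drug ?phase
WHERE {{
  ?drug a ex:Drug .
  SERVICE <{REMOTE_ENDPOINT}> {{
    ?drug ex:trialPhase ?phase .
  }}
}}"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol query-via-GET URL.

    The query text is URL-encoded into the `query` parameter.
    """
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(LOCAL_ENDPOINT, QUERY)

# Round-trip check: the endpoint would decode exactly the original query text.
assert parse_qs(urlparse(url).query)["query"] == [QUERY]
```

Running the query is then an ordinary HTTP GET on `url` (for example with `urllib.request.urlopen`), typically with an `Accept: application/sparql-results+json` header to ask for JSON results.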

Benefits
● URIs give universal (non-local) identifiers to things
● RDF is schema-less; extensibility is not an issue
● RDF merge defines a standard way to combine disparate datasets
● The SPARQL specification defines federation capabilities
● SPARQL operates over HTTP using URLs
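The RDF merge benefit can be illustrated in a few lines. In the RDF 1.1 Semantics specification, merging graphs means taking their union after renaming blank nodes so that no two graphs share one ("standardizing apart"). IRIs are global, so nodes named by the same IRI join automatically. The sketch below is a toy in plain Python, not an RDF library: terms are strings, with IRIs in angle brackets, blank nodes as `_:label`, and literals in double quotes. The `pharma#` namespace and its terms are hypothetical.

```python
from itertools import count

# A triple is (subject, predicate, object); terms are plain strings.
Triple = tuple[str, str, str]

_fresh_ids = count()  # global counter, so new blank-node labels never collide


def standardize_apart(graph: set[Triple]) -> set[Triple]:
    """Give every blank node in `graph` a fresh label from the global counter."""
    mapping: dict[str, str] = {}

    def rename(term: str) -> str:
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = f"_:b{next(_fresh_ids)}"
            return mapping[term]
        return term  # IRIs and literals are left untouched

    return {(rename(s), rename(p), rename(o)) for s, p, o in graph}


def rdf_merge(*graphs: set[Triple]) -> set[Triple]:
    """Union of the graphs after standardizing their blank nodes apart."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= standardize_apart(graph)
    return merged


def iri(local: str) -> str:
    return f"<http://example.org/pharma#{local}>"  # hypothetical namespace


clinical = {
    (iri("drug42"), iri("phase"), '"Phase 1"'),
    (iri("drug42"), iri("sponsor"), "_:x"),
}
finance = {
    (iri("drug42"), iri("cost"), '"1000000"'),
    (iri("drug42"), iri("investor"), "_:x"),  # same label, but a different node
}

merged = rdf_merge(clinical, finance)
```

All four merged triples share the subject IRI for `drug42`, so the clinical and financial descriptions connect with no schema mapping, while the two `_:x` blank nodes stay distinct.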

Why Ontology?

Remaining Challenges
● URIs/RDF/SPARQL take you a long way, but you are not home yet
● Distributed data is easy to combine and access, but difficult to interpret
  ○ How do you know how to combine data?
  ○ How do you find the data you need?
  ○ How do you know what anything means?

Domain Ontology

A machine- and human-readable description of a domain
● Expressed as RDF
  ○ Part of your data
  ○ Meta layers depend on your point of view, not your toolset
● Formal semantics
  ○ Define your vocabulary with precision
  ○ Infer new information
  ○ Detect data quality issues
● Layered Descriptions
  ○ Easily combine one type of description with another
    ■ Data model, Provenance, Architecture, Standards, Policies, Processes...anything
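The "infer new information" point can be sketched with four of the RDFS entailment rules. Instances of a class are instances of its superclasses (rule rdfs9, with subclass transitivity rdfs11). A statement made with a subproperty also holds for its superproperty (rdfs7, with subproperty transitivity rdfs5). The forward-chaining loop below applies those four rules until nothing new appears. It is a minimal plain-Python illustration, not a complete RDFS reasoner, and the `ex:` classes, properties, and individuals are hypothetical.

```python
# Terms are prefixed names as plain strings; the ex: names are hypothetical.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"

Triple = tuple[str, str, str]


def rdfs_closure(triples: set[Triple]) -> set[Triple]:
    """Apply RDFS rules rdfs5, rdfs7, rdfs9 and rdfs11 until a fixpoint."""
    graph = set(triples)
    while True:
        sub_class = [(a, b) for a, p, b in graph if p == SUBCLASS]
        sub_prop = [(a, b) for a, p, b in graph if p == SUBPROP]
        new: set[Triple] = set()
        for a, b in sub_class:  # rdfs11: subClassOf is transitive
            new |= {(a, SUBCLASS, d) for c, d in sub_class if c == b}
        for a, b in sub_prop:  # rdfs5: subPropertyOf is transitive
            new |= {(a, SUBPROP, d) for c, d in sub_prop if c == b}
        for s, p, o in graph:
            if p == RDF_TYPE:  # rdfs9: instances inherit superclass types
                new |= {(s, RDF_TYPE, sup) for sub, sup in sub_class if sub == o}
            # rdfs7: a subproperty statement implies the superproperty statement
            new |= {(s, sup, o) for sub, sup in sub_prop if sub == p}
        if new <= graph:  # nothing new was derived: fixpoint reached
            return graph
        graph |= new


ontology = {
    ("ex:SmallMoleculeDrug", SUBCLASS, "ex:Drug"),
    ("ex:Drug", SUBCLASS, "ex:PipelineAsset"),
    ("ex:leadScientist", SUBPROP, "ex:responsibleParty"),
}
data = {
    ("ex:drug42", RDF_TYPE, "ex:SmallMoleculeDrug"),
    ("ex:drug42", "ex:leadScientist", "ex:alice"),
}

inferred = rdfs_closure(ontology | data) - (ontology | data)
```

Here `inferred` holds four triples that were never stated, including `ex:drug42` typed as `ex:PipelineAsset` and `ex:alice` as its `ex:responsibleParty`.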

Data Integration, Federation, and Virtualization
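Virtualization here means exposing existing relational data as RDF on demand instead of copying it into a triple store. The W3C language for describing such mappings is R2RML, one of the executable semantic languages named in this talk. The sketch below only mimics the shape of an R2RML triples map (a subject IRI template, a class, and column-to-predicate mappings) and is not an R2RML processor. The `TriplesMap` class, the `drug` table, and the `ex:` names are hypothetical. It uses an in-memory SQLite database and skips the IRI-safe percent-encoding that R2RML applies to template values. Like R2RML, it emits no triple for a NULL column value.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterator

Triple = tuple[str, str, str]


@dataclass
class TriplesMap:
    """Hypothetical, simplified stand-in for an R2RML triples map."""
    table: str                         # logical table (trusted, not user input)
    subject_template: str              # e.g. "<http://example.org/drug/{id}>"
    rdf_class: str                     # class asserted for every row
    predicate_columns: dict[str, str]  # predicate -> column name


def generate_triples(conn: sqlite3.Connection, tm: TriplesMap) -> Iterator[Triple]:
    """Lazily turn each row of tm.table into RDF-style triples."""
    cursor = conn.execute(f"SELECT * FROM {tm.table}")
    columns = [d[0] for d in cursor.description]
    for values in cursor:
        row = dict(zip(columns, values))
        subject = tm.subject_template.format(**row)
        yield (subject, "rdf:type", tm.rdf_class)
        for predicate, column in tm.predicate_columns.items():
            if row[column] is not None:  # a NULL value produces no triple
                yield (subject, predicate, f'"{row[column]}"')


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drug (id INTEGER PRIMARY KEY, name TEXT, phase TEXT)")
conn.executemany(
    "INSERT INTO drug VALUES (?, ?, ?)",
    [(42, "Compound A", "Phase 1"), (43, "Compound B", None)],
)

drug_map = TriplesMap(
    table="drug",
    subject_template="<http://example.org/drug/{id}>",
    rdf_class="ex:Drug",
    predicate_columns={"ex:name": "name", "ex:phase": "phase"},
)
triples = set(generate_triples(conn, drug_map))
```

Because `generate_triples` is a generator, a query layer could pull triples only when asked, which is the essence of virtualization as opposed to bulk loading.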

Data Quality
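One way a domain ontology detects data quality issues: under OWL semantics, an individual that belongs to two classes declared `owl:disjointWith` each other makes the data inconsistent, and the conflict can hide behind a subclass. The check below finds such individuals, following `rdfs:subClassOf` so that a type implied by a subclass is also caught. It is a plain-Python illustration of that single rule, not a substitute for an OWL reasoner; the `ex:` classes and individuals are hypothetical.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def superclasses(cls: str, sub_of: dict[str, set[str]]) -> set[str]:
    """`cls` plus every class reachable from it via rdfs:subClassOf."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in sub_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def disjointness_violations(triples: set[Triple]) -> list[tuple[str, tuple[str, ...]]]:
    """Individuals whose (inferred) types include two disjoint classes."""
    sub_of: dict[str, set[str]] = defaultdict(set)
    types: dict[str, set[str]] = defaultdict(set)
    disjoint: set[frozenset[str]] = set()
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            sub_of[s].add(o)
        elif p == "owl:disjointWith":
            disjoint.add(frozenset((s, o)))
        elif p == "rdf:type":
            types[s].add(o)
    problems = []
    for individual, direct in types.items():
        all_types = set().union(*(superclasses(c, sub_of) for c in direct))
        for pair in disjoint:
            if pair <= all_types:
                problems.append((individual, tuple(sorted(pair))))
    return sorted(problems)


triples = {
    ("ex:MarketedDrug", "owl:disjointWith", "ex:DiscontinuedDrug"),
    ("ex:WithdrawnDrug", "rdfs:subClassOf", "ex:DiscontinuedDrug"),
    ("ex:drug42", "rdf:type", "ex:MarketedDrug"),
    ("ex:drug42", "rdf:type", "ex:DiscontinuedDrug"),  # direct conflict
    ("ex:drug43", "rdf:type", "ex:MarketedDrug"),
    ("ex:drug44", "rdf:type", "ex:MarketedDrug"),
    ("ex:drug44", "rdf:type", "ex:WithdrawnDrug"),     # conflict via subclass
}
report = disjointness_violations(triples)
```

`report` flags `ex:drug42` and `ex:drug44` but not `ex:drug43`. In a full pipeline this kind of check would usually run after inference, so that every implied type is considered.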

Ontology Architecture

● A collection of descriptions that are used to enable a specific set of analytic use cases
  ○ Enumerates the set of ontologies to be used
  ○ Defines the high-level structure and logical profile of individual ontologies
  ○ Defines relationships between ontologies
● Not defined in a vacuum
  ○ Domain Ontology
  ○ Metadata Ontology
  ○ Executable Semantic Languages
    ■ R2RML, SPARQL, RIF
  ○ Tools!
    ■ Triple stores, query engines, RDB2RDF translators, rule engines, existing applications, etc.
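Enumerating the ontologies in an architecture often comes down to `owl:imports`: the set of ontologies a use case actually loads is the imports closure of its top-level ontology. The minimal sketch below computes that closure with a breadth-first walk that tolerates import cycles. The ontology names and their import lists are hypothetical placeholders for real ontology IRIs.

```python
from collections import deque

# Hypothetical architecture: ontology -> ontologies it owl:imports.
IMPORTS: dict[str, list[str]] = {
    "ex:pipeline-analytics": ["ex:drug-domain", "ex:provenance"],
    "ex:drug-domain": ["ex:metadata"],
    "ex:provenance": ["ex:metadata"],
    "ex:metadata": [],
}


def imports_closure(root: str, imports: dict[str, list[str]]) -> list[str]:
    """Every ontology reachable from `root` via owl:imports, including `root`,
    in breadth-first order. Each ontology is listed once, even with cycles."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        ontology = queue.popleft()
        order.append(ontology)
        for dependency in imports.get(ontology, []):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return order


load_list = imports_closure("ex:pipeline-analytics", IMPORTS)
```

`ex:metadata` appears once in `load_list` even though two ontologies import it, and a cycle such as two ontologies importing each other terminates because visited ontologies are skipped.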

Data Provenance

W3C Provenance Ontology (PROV-O) in development by the Provenance Working Group
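The W3C Provenance Ontology (PROV-O) was later published as a W3C Recommendation (April 2013). Its core terms are enough to answer lineage questions such as "which sources was this compliance report derived from?". The sketch below follows `prov:wasDerivedFrom` transitively over a list of triples. `prov:wasDerivedFrom` and `prov:wasGeneratedBy` are real PROV-O properties, written here with the `prov:` prefix as plain strings; the `ex:` entities and activity are hypothetical.

```python
Triple = tuple[str, str, str]

provenance: list[Triple] = [
    ("ex:q3-compliance-report", "prov:wasDerivedFrom", "ex:trade-summary"),
    ("ex:trade-summary", "prov:wasDerivedFrom", "ex:trades-new-york"),
    ("ex:trade-summary", "prov:wasDerivedFrom", "ex:trades-london"),
    ("ex:trade-summary", "prov:wasGeneratedBy", "ex:nightly-aggregation"),
]


def lineage(entity: str, triples: list[Triple]) -> list[str]:
    """All entities `entity` was transitively derived from, nearest first."""
    derived_from: dict[str, list[str]] = {}
    for s, p, o in triples:
        if p == "prov:wasDerivedFrom":
            derived_from.setdefault(s, []).append(o)
    result: list[str] = []
    seen, frontier = {entity}, [entity]
    while frontier:  # breadth-first: one derivation hop per pass
        next_frontier = []
        for current in frontier:
            for source in derived_from.get(current, []):
                if source not in seen:
                    seen.add(source)
                    result.append(source)
                    next_frontier.append(source)
        frontier = next_frontier
    return result


sources = lineage("ex:q3-compliance-report", provenance)
```

The `prov:wasGeneratedBy` triple records which activity produced the summary; a fuller lineage query would usually report derivations and generating activities together.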

Data Discovery

Distributed Information Management

● Integration, Virtualization, Federation, Quality, Provenance, Validation, Discovery

● Semantic Technologies lay the foundation for a new paradigm

○ RDF, RDFS, OWL, SPARQL, R2RML, RIF, Provenance Ontology...

● Tools are catching up
● Domain Ontology makes sense of it all

Questions? See Revelytix.com for more information.

Thank You!

Michael Lang