Domain Semantics
Domain Semantics, or:
How I Learned to Stop Worrying and Love the Ontology
Michael Lang Jr.
Director of Ontology Services
Revelytix, Inc.
Software Today
Siloed Information Management - Good
● Drives nearly all day-to-day operations of any business
● Optimized for transactions - ACID, CRUD
Distributed Information Management - Bad
● Massive amounts of data generated
● Operations mindset assumes single application context
  ○ Stovepipes and silos
● Managing distributed information is extremely difficult
  ○ Analysis!
Use Cases
● Pharma
  ○ Drug Pipeline Management
● Department of Defense
  ○ Enterprise Information Web
● Financial Services
  ○ Back-office Trade Data Analysis
Pharma - Drug Pipeline Management
● Better ability to analyze data improves success rate
● Any increase in success rate of drugs in pipeline can represent huge ROI
  ○ Kill one drug in Phase 1 instead of Phase 3, save $1 Billion
DOD - Business Mission Area
● Services own and operate their own systems
● Numerous OSD-level reporting requirements (LAWS!)
  ○ DOD is not audit-able (ILLEGAL!)
Financial Services - Back-office Trade Data Analysis
● Many compliance regulations written by national and global bodies which change often
● Relatively small IT ecosystem...still can't meet compliance reporting requirements
● Inability to meet compliance reporting requirements means regulations are impossible to enforce
● No regulations means trouble...
  ○ 2008 Financial crisis
Problem Summary
● Most major business operations involve many different groups of people
  ○ Different Roles
  ○ Different Organizations
  ○ Different Companies
● Different groups use different systems
  ○ Systems are built in silos
● Many different sets of semantics
● Many different schemas
Analyzing and sharing data is difficult
Distributed Information Management
Enterprises require a new paradigm of information technology where distributed information is assumed... Semantics
Capabilities include (some good buzzwords):
● Data Integration
  ○ Virtualization
  ○ Federation
● Data Quality
  ○ Provenance
  ○ Validation
● Data Discovery
Semantic Technology: The Ground Floor
Anyone can say Anything about Anything
Standards
● URI - the universal identification scheme
  ○ URL - the universal location scheme
● RDF - the data model
● SPARQL - the query language
Benefits
● URIs give universal (non-local) identifiers to things
● RDF is schema-less; extensibility is not an issue
● RDF-merge defines a standard way to combine disparate datasets
● SPARQL specification defines federation capabilities
● SPARQL operates over HTTP using URLs
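
To make the RDF-merge point concrete, here is a minimal sketch in plain Python rather than a real RDF library. It assumes triples are modeled as (subject, predicate, object) tuples, that neither dataset uses blank nodes (in which case a merge reduces to a set union), and that all `example.org` URIs and the two datasets are made up for illustration.

```python
# Triples are (subject, predicate, object) tuples; a graph is a set of them.
# Every example.org URI below is a hypothetical identifier, not real data.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Dataset from one hypothetical system (trial management).
trials = {
    (EX + "drug/42", RDF_TYPE, EX + "Drug"),
    (EX + "drug/42", EX + "phase", "Phase 1"),
}

# Dataset from another hypothetical system (finance).
finance = {
    (EX + "drug/42", RDF_TYPE, EX + "Drug"),
    (EX + "drug/42", EX + "spendToDateUSD", 12000000),
}


def merge(*graphs):
    """Merge graphs that share no blank nodes: a plain set union of triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return every (predicate, object) pair known about one URI."""
    return {(p, o) for (s, p, o) in graph if s == subject}


combined = merge(trials, finance)
about_drug = describe(combined, EX + "drug/42")
```

Because both systems use the same URI for the drug, the duplicate type statement collapses and the phase and spend facts end up attached to one subject without any schema negotiation.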
Why Ontology?
Remaining Challenges
● URIs/RDF/SPARQL take you a long way, but you are not home yet
● Distributed data is easy to combine and access, but difficult to interpret
  ○ How do you know how to combine data?
  ○ How do you find the data you need?
  ○ How do you know what anything means?
Domain Ontology
Machine and human readable description of a domain
● Expressed as RDF
  ○ Part of your data
  ○ Meta layers depend on your point of view; not your toolset
● Formal semantics
  ○ Define your vocabulary with precision
  ○ Infer new information
  ○ Detect data quality issues
● Layered Descriptions
  ○ Easily combine one type of description with another
    ■ Data model, Provenance, Architecture, Standards, Policies, Processes...anything
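
The "infer new information" and "detect data quality issues" bullets can be sketched with a toy forward-chaining reasoner. This is an illustration under assumptions, not a production reasoner: it implements only three RDFS entailment rules (rdfs2, rdfs9, rdfs11) plus a check for `owl:disjointWith` conflicts, uses prefixed names as plain strings, and the `ex:` vocabulary and data are hypothetical.

```python
# Prefixed names are used as plain strings for brevity.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
DISJOINT_WITH = "owl:disjointWith"


def infer(triples):
    """Apply three RDFS rules repeatedly until no new triple appears."""
    closed = set(triples)
    while True:
        new = set()
        for (s, p, o) in closed:
            if p == SUBCLASS_OF:
                for (s2, p2, o2) in closed:
                    # rdfs11: subClassOf is transitive.
                    if p2 == SUBCLASS_OF and s2 == o:
                        new.add((s, SUBCLASS_OF, o2))
                    # rdfs9: an instance of a subclass is an instance of the superclass.
                    if p2 == RDF_TYPE and o2 == s:
                        new.add((s2, RDF_TYPE, o))
            elif p == DOMAIN:
                # rdfs2: the subject of a property is an instance of its domain.
                for (s2, p2, o2) in closed:
                    if p2 == s:
                        new.add((s2, RDF_TYPE, o))
        if new <= closed:
            return closed
        closed |= new


def disjointness_violations(triples):
    """Find individuals that end up typed with two classes declared disjoint."""
    closed = infer(triples)
    types = {}
    for (s, p, o) in closed:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    violations = set()
    for (a, p, b) in closed:
        if p == DISJOINT_WITH:
            for individual, classes in types.items():
                if a in classes and b in classes:
                    violations.add((individual, a, b))
    return violations


# Hypothetical ontology and data.
ontology = {
    ("ex:Biologic", SUBCLASS_OF, "ex:Drug"),
    ("ex:Drug", SUBCLASS_OF, "ex:Product"),
    ("ex:trialPhase", DOMAIN, "ex:Drug"),
    ("ex:Drug", DISJOINT_WITH, "ex:Person"),
}
data = {
    ("ex:compound7", RDF_TYPE, "ex:Biologic"),
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:trialPhase", "Phase 1"),  # a trial phase recorded on a person
}

closed = infer(ontology | data)
problems = disjointness_violations(ontology | data)
```

Here the ontology and the data are merged as ordinary triples, which is the sense in which the ontology is "part of your data". The reasoner concludes that `ex:compound7` is an `ex:Product`, and it flags `ex:alice` because the domain of `ex:trialPhase` makes her an `ex:Drug`, which is declared disjoint with `ex:Person`.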
Ontology Architecture
● A collection of descriptions that are used to enable a specific set of analytic use cases
  ○ Enumerates the set of ontologies to be used
  ○ Defines the high-level structure and logical profile of individual ontologies
  ○ Defines relationships between ontologies
● Not defined in a vacuum
  ○ Domain Ontology
  ○ Metadata Ontology
  ○ Executable Semantic Languages
  ○ R2RML, SPARQL, RIF
  ○ Tools!
    ■ Triple-stores, query engines, RDB2RDF translators, rule engines, existing applications, etc.
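
As a rough illustration of what an RDB2RDF translator does, the sketch below turns relational rows into triples using a small mapping dictionary. It is written in the spirit of R2RML but is not R2RML syntax (real R2RML mappings are themselves RDF documents); the `trade` table, its columns, the mapping layout, and the `example.org` URIs are all assumptions made for this example.

```python
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping for a 'trade' table: a URI template for the subject,
# a class for every row, and one predicate per mapped column.
TRADE_MAPPING = {
    "subject_template": EX + "trade/{trade_id}",
    "class": EX + "Trade",
    "columns": {
        "counterparty": EX + "counterparty",
        "notional_usd": EX + "notionalUSD",
    },
}


def row_to_triples(row, mapping):
    """Translate one relational row (a dict) into a set of triples."""
    subject = mapping["subject_template"].format(**row)
    triples = {(subject, RDF_TYPE, mapping["class"])}
    for column, predicate in mapping["columns"].items():
        value = row.get(column)
        if value is not None:  # a NULL column yields no triple
            triples.add((subject, predicate, value))
    return triples


# Hypothetical rows, as a database driver might return them.
rows = [
    {"trade_id": 1001, "counterparty": "ACME", "notional_usd": 5000000},
    {"trade_id": 1002, "counterparty": None, "notional_usd": 250000},
]

graph = set()
for row in rows:
    graph |= row_to_triples(row, TRADE_MAPPING)
```

Keeping the mapping as data rather than code mirrors the architectural point above: the mapping is one more description that sits alongside the domain ontology, and the tool simply executes it.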
Distributed Information Management
● Integration, Virtualization, Federation, Quality, Provenance, Validation, Discovery
● Semantic Technologies lay the foundation for a new paradigm
  ○ RDF, RDFS, OWL, SPARQL, R2RML, RIF, Provenance Ontology...
● Tools are catching up
● Domain Ontology makes sense of it all