Reinventing Laboratory Data To Be Bigger, Smarter & Faster

Transcript

Heiner Oberkampf, PhD

Consultant Semantic Technologies

Smartlab Exchange Feb. 2016, Berlin

Reinventing Laboratory Data To Be Bigger, Smarter & Faster

Slide 2

Data! …but how?

Slide 3

Two Ends of a Spectrum of Possible Solutions

Data Warehouse Data Lake

Slide 4

Big Data has continued to evolve rapidly

Data Warehouses exist and are still widely used

Requires too much effort for limited gains

Data Lakes are a rising trend

Can hold all types of data

Little to no data transformation required

Schema-on-read: the schema is applied at retrieval and analytics time

Graph Technology gains traction

Taxonomies, ontologies, controlled vocabularies, etc.

Very flexible schema

Focus on linking information

Data Continues to Rapidly Change and Grow

“Big data predictive analytics architectures are changing beyond just data lakes. Expect a lot of progress over the next few years.”

“Graph Databases are rapidly gaining traction in the market as an effective method for deciphering meaning”

Forrester: Brian Hopkins' Blog, July 27, 2015

Forbes: Tony Agresta, Apr 6, 2015

Slide 5

Understanding the 4V’s of Big Data

Normally the focus of Big Data Solutions

Performance is Critical to Success

Data Complexity is Increasing

Handling Uncertainty Requires Statistics

Majority of Big Data analytics approaches treat these two V’s

Semantic technologies provide clear advantages

Mathematical Clustering Techniques provide clear advantages

Focus of OSTHUS

Slide 6

Laboratory Data Covers all V’s of Big Data

Slide 7

Many challenges exist for data to be captured, integrated and shared:

Data Silos

Incompatible instruments and software systems, proprietary data formats

Legacy architectures are brittle and rigid

SME knowledge resides in people’s heads, little common vocabulary

Lack of common vision between business units and scientists

Laboratory Data Has Not Been Able to Keep Pace

The Average Scientist’s Desktop

Slide 8

Data Lakes are centered around Big Data

Utilize cloud technology for scalability

Extensive user access across an organization

Data Lakes can contain numerous types of data

Structured & unstructured data can be captured in the same way

Raw data can be maintained over time

Because data is not “transformed” via standard ETL, it can be “sliced and diced” in many different ways

What Are Data Lakes & Why Are They So Popular?
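The "sliced and diced" point on this slide can be illustrated with a minimal sketch of schema-on-read. It is not taken from the talk: the record contents, the `RAW_LAKE` list and the `read_with_schema` helper are hypothetical, and a real data lake would sit on distributed storage rather than a Python list. The idea shown is only that raw records stay untouched and each reader chooses its own fields at retrieval time.

```python
import json

# Raw records are kept exactly as they arrived (hypothetical instrument output).
RAW_LAKE = [
    '{"instrument": "HPLC-01", "sample": "S1", "peak_area": 1520.4, "operator": "amk"}',
    '{"instrument": "pH-07", "sample": "S1", "ph": 6.9}',
    '{"instrument": "HPLC-01", "sample": "S2", "peak_area": 1498.1}',
]


def read_with_schema(raw_lines, fields):
    """Apply a schema at read time: yield only records that carry every requested field."""
    for line in raw_lines:
        record = json.loads(line)
        if all(name in record for name in fields):
            yield {name: record[name] for name in fields}


# Two different slices of the same, unmodified raw data.
hplc_view = list(read_with_schema(RAW_LAKE, ["sample", "peak_area"]))
ph_view = list(read_with_schema(RAW_LAKE, ["sample", "ph"]))
```

Because nothing is transformed on the way in, adding a third view later only means calling `read_with_schema` with another field list.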

Slide 9

Using Data Lakes “the proposition of enterprise wide data management has yet to be realized” (Gartner, July 28, 2014)

Governance is a big issue

Data Lakes are best used by specific groups of trained individuals (Data Scientists)

Not meant to be used by an entire enterprise

Customers we are engaged with have had varied results with Data Lakes

The ones who tend to have the most success put some kind of light-weight schema in place

Somewhere between heavy ETL (Data Warehouse) and nothing

What is Problematic About Data Lakes?

“Not if you have to clean up a data swamp!”
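One possible reading of a "light-weight schema" between heavy ETL and nothing is a small set of required fields checked at ingest, with everything else passed through untouched. The sketch below assumes that reading; the field names in `REQUIRED_FIELDS` and the `check_record` and `ingest` helpers are hypothetical and do not come from the talk.

```python
# Hypothetical minimal schema: only these fields are enforced, all others pass through as-is.
REQUIRED_FIELDS = {"sample_id": str, "instrument": str, "recorded_at": str}


def check_record(record):
    """Return a list of problems; an empty list means the record may enter the lake."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for {name}")
    return problems


def ingest(records):
    """Split incoming records into accepted ones and rejected ones with their problems."""
    accepted, rejected = [], []
    for record in records:
        problems = check_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected


accepted, rejected = ingest([
    {"sample_id": "S1", "instrument": "HPLC-01", "recorded_at": "2016-02-01", "peak_area": 1520.4},
    {"instrument": "pH-07", "ph": 6.9},
])
```

The accepted record keeps its extra `peak_area` field unchanged, which is the difference from a full ETL step; the rejected record is the kind of entry that would otherwise contribute to a data swamp.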

Slide 10

AT OSTHUS LAB DATA SCIENCE IS BIG ANALYSIS

STATISTICAL

SEMANTICS

MACHINE LEARNING

REASONING

Slide 11

At OSTHUS Data Science has a special meaning

Data Science is more than just statistical analysis

We combine math-based approaches (statistics) with logic-based approaches (semantics)

Conceptual + Computational

Semantics

Provides the vocabularies, definitions, class structures, logical relationships and conceptual models

Statistics

Provides computations, trending, analysis, and learning over time from the data itself

What is Data Science?
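The "Conceptual + Computational" combination can be sketched in a few lines: a controlled vocabulary (the semantic part) first maps local labels to one preferred term, and only then is a statistic computed per term. This is an illustration under assumptions, not the OSTHUS method; the `VOCABULARY` entries, the measurement values and the `summarize` helper are all made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Semantics: a tiny controlled vocabulary mapping local labels to one preferred term.
VOCABULARY = {
    "ph": "pH",
    "pH value": "pH",
    "acidity": "pH",
    "temp": "temperature",
    "temperature (C)": "temperature",
}

# Hypothetical measurements as (local label, value) pairs from different systems.
measurements = [
    ("pH value", 6.9),
    ("acidity", 7.1),
    ("temp", 21.0),
    ("temperature (C)", 23.0),
    ("ph", 7.0),
]


def summarize(rows):
    """Group values by preferred vocabulary term, then compute the mean per term."""
    grouped = defaultdict(list)
    for label, value in rows:
        term = VOCABULARY.get(label, label)  # unknown labels are kept unchanged
        grouped[term].append(value)
    return {term: mean(values) for term, values in grouped.items()}


summary = summarize(measurements)
```

Without the vocabulary step the five rows would fall into five separate groups, and the means would be computed over single values.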

Slide 12

Semantic Spectrum of Knowledge Organization Systems

• Deborah L. McGuinness. "Ontologies Come of Age". In Dieter Fensel, Jim Hendler, Henry Lieberman, and Wolfgang Wahlster, editors. Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2003.

• Michael Uschold and Michael Gruninger. "Ontologies and semantics for seamless connectivity". SIGMOD Rec. 33, 4 (December 2004), 58-64. DOI=http://dx.doi.org/10.1145/1041410.1041420

• Leo Obrst. "The Ontology Spectrum". Book section in Roberto Poli, Michael Healy, Achilles Kameas. Theory and Applications of Ontology: Computer Applications. Springer Netherlands, 17 Sep 2010.

• Leo Obrst and Mills Davis. "Semantic Wave 2008 Report: Industry Roadmap to Web 3.0 & Multibillion Dollar Market Opportunities". 2008.

Sources

Slide 13

Allotrope Example: Semantics Provides Common Meaning

Allotrope Data Format (ADF): Instance Data

Allotrope Data Models (ADM): Constraints

Allotrope Foundation Ontologies (AFO): Classes and Properties

Diagram arrows: "is structured by", "is classified by", "provide standardized vocabulary for"
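The three layers on this slide (instance data, constraints, classes and properties) can be mimicked with plain Python structures to show how they depend on each other. This is a toy sketch only: it does not use any Allotrope API or file format, the `ex:` identifiers are invented rather than real AFO terms, and the `validate` helper is hypothetical.

```python
# Ontology layer: classes (hypothetical names, not real AFO terms).
ONTOLOGY_CLASSES = {"ex:Measurement", "ex:Sample"}

# Model layer: constraints, here just the properties each class must carry.
MODEL_CONSTRAINTS = {
    "ex:Measurement": {"ex:hasSample", "ex:hasValue"},
    "ex:Sample": {"ex:hasLabel"},
}

# Instance layer: data as (subject, predicate, object) triples.
INSTANCE_DATA = [
    ("ex:m1", "rdf:type", "ex:Measurement"),
    ("ex:m1", "ex:hasSample", "ex:s1"),
    ("ex:m1", "ex:hasValue", 6.9),
    ("ex:s1", "rdf:type", "ex:Sample"),
]


def validate(triples):
    """Return {subject: problems} for typed subjects that violate the model."""
    types = {s: o for s, p, o in triples if p == "rdf:type"}
    violations = {}
    for subject, cls in types.items():
        if cls not in ONTOLOGY_CLASSES:
            violations[subject] = [f"unknown class {cls}"]
            continue
        present = {p for s, p, o in triples if s == subject}
        missing = sorted(MODEL_CONSTRAINTS.get(cls, set()) - present)
        if missing:
            violations[subject] = missing
    return violations


violations = validate(INSTANCE_DATA)
```

In this toy data the measurement `ex:m1` satisfies its constraints, while the sample `ex:s1` is reported because it lacks the `ex:hasLabel` property required by the model layer.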

Slide 14

Enterprise Applications Often Require Hybrid Architectures

Cloud DBs (NoSQL)

Analytics

Dashboards & Reports

Structured Data

Semantic DBs

Unstructured Documents

Public Data

Instrument Data

Light-weight Semantic Integration Layer

Slide 15

Smart labs in the future will provide the enterprise with:

Integrated Data: common reference data structures (vocabularies)

Sharable Data: easier interaction across teams and business units

Scalability: Big data applications that can be highly elastic

Conceptual Representations: context and perspective are captured

Advanced Analytics: complex & automated problem-solving capabilities

21st Century Labs Can Gain From This Approach

Thank You! Questions?