Reinventing Laboratory Data To Be Bigger, Smarter & Faster

Transcript of Reinventing Laboratory Data To Be Bigger, Smarter & Faster

  • Heiner Oberkampf, PhD

    Consultant Semantic Technologies

    Smartlab Exchange Feb. 2016, Berlin

    Reinventing Laboratory Data To Be Bigger, Smarter & Faster

  • Slide 2

    Data! but how?

  • Slide 3

    Two Ends of a Spectrum of Possible Solutions

    Data Warehouse Data Lake

  • Slide 4

    Big Data has continued to evolve rapidly

    Data Warehouses exist and are still widely used

    Requires too much effort for limited gains

    Data Lakes are a rising trend

    Can hold all types of data

    Little to no data transformation required

    Schema-on-read: structure is applied at retrieval and analytics time

    Graph Technology gains traction

    Taxonomies, ontologies, controlled vocabularies, etc.

    Very flexible schema

    Focus on linking information

    Data Continues to Rapidly Change and Grow

    Big data predictive analytics architectures are changing beyond just data lakes. Expect a lot of progress over the next few years.

    Graph Databases are rapidly gaining traction in the market as an effective method for deciphering meaning

    Forrester: Brian Hopkins' Blog, July 27, 2015

    Forbes: Tony Agresta, Apr 6, 2015

  • Slide 5

    Understanding the 4Vs of Big Data

    Normally the focus of Big Data Solutions

    Performance is Critical to Success

    Data Complexity is Increasing

    Handling Uncertainty Requires Statistics

    Majority of Big Data analytics approaches treat these two Vs

    Semantic technologies provide clear advantages

    Mathematical Clustering Techniques provide clear advantages

    Focus of OSTHUS

  • Slide 6

    Laboratory Data Covers all Vs of Big Data

  • Slide 7

    Many challenges exist for data to be captured, integrated and shared:

    Data Silos

    Incompatible instruments and software systems, proprietary data formats

    Legacy architectures are brittle and rigid

    SME knowledge resides in people's heads, little common vocabulary

    Lack of common vision between business units and scientists

    Laboratory Data Has Not Been Able to Keep Pace

    The Average Scientist's Desktop

  • Slide 8

    Data Lakes are centered around Big Data

    Utilize cloud technology for scalability

    Extensive user access across an organization

    Data Lakes can contain numerous types of data

    Structured & unstructured data can be captured in the same way

    Raw data can be maintained over time

    Because data is not transformed via standard ETL it can be sliced and diced in a lot of different ways

    What Are Data Lakes & Why Are They So Popular?
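
    The idea of keeping raw data untransformed and applying structure only when it is read can be sketched as follows. This is a minimal illustration, not a real data lake API: the records, field names and the read_with_schema helper are all hypothetical.

```python
import json

# Hypothetical raw records, stored exactly as they arrived (no ETL step).
RAW_LAKE = [
    '{"instrument": "HPLC-01", "sample": "S1", "peak_area": "1532.4"}',
    '{"instrument": "balance-07", "sample": "S1", "mass_mg": 12.5}',
    'free-text note: sample S1 looked cloudy',
]


def read_with_schema(raw_records, schema):
    """Apply a schema at read time; skip records that do not fit it."""
    for raw in raw_records:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unstructured entry: stays in the lake, ignored by this view
        if not isinstance(record, dict):
            continue
        if all(field in record for field in schema):
            # Cast each requested field to the type this view expects.
            yield {field: cast(record[field]) for field, cast in schema.items()}


# Two different "slices" of the same untouched raw data.
hplc_view = list(read_with_schema(RAW_LAKE, {"sample": str, "peak_area": float}))
mass_view = list(read_with_schema(RAW_LAKE, {"sample": str, "mass_mg": float}))
print(hplc_view)
print(mass_view)
```

    Each view picks out only the records that fit its schema, while the raw list itself is never modified.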

  • Slide 9

    With Data Lakes, the proposition of enterprise-wide data management has yet to be realized (Gartner, July 28, 2014)

    Governance is a big issue

    Data Lakes are best used by specific groups of trained individuals (Data Scientists)

    Not meant to be used by an entire enterprise

    Customers we are engaged with have varied results with Data Lakes

    The ones who tend to have the most success put some kind of light-weight schema in place

    Somewhere between heavy ETL (Data Warehouse) and nothing

    What is Problematic About Data Lakes?

    Not if you have to clean up a data swamp!
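
    One possible reading of a "light-weight schema" is a small set of mandatory metadata fields checked at ingest, while the payload itself stays raw. The sketch below assumes that reading; the field names and the ingest function are hypothetical and not taken from any specific product.

```python
from datetime import datetime, timezone

# Assumed minimal metadata every record must carry; the payload stays raw.
REQUIRED_METADATA = ("source", "sample_id", "recorded_at")


def ingest(lake, metadata, payload):
    """Accept a record only if the light-weight metadata is complete."""
    missing = [key for key in REQUIRED_METADATA if not metadata.get(key)]
    if missing:
        raise ValueError(f"missing metadata: {', '.join(missing)}")
    lake.append({"metadata": dict(metadata), "payload": payload})


lake = []

# A fully tagged record is accepted without transforming its payload.
ingest(
    lake,
    {
        "source": "HPLC-01",
        "sample_id": "S1",
        "recorded_at": datetime(2016, 2, 1, tzinfo=timezone.utc).isoformat(),
    },
    b"\x00\x01 raw instrument bytes",
)

# An untagged record is rejected before it can turn the lake into a swamp.
try:
    ingest(lake, {"source": "notebook"}, "untagged note")
    rejected = False
except ValueError as error:
    rejected = True
    print(error)

print(len(lake))
```

    The check costs far less than a full ETL pipeline, yet every stored record can later be found by source, sample and time.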

  • Slide 10

    AT OSTHUS LAB DATA SCIENCE IS BIG ANALYSIS

    STATISTICAL

    SEMANTICS

    MACHINE LEARNING

    REASONING

  • Slide 11

    At OSTHUS Data Science has a special meaning

    Data Science is more than just statistical analysis

    We combine math-based approaches (statistics) with logic-based approaches (semantics)

    Conceptual + Computational

    Semantics

    Provides the vocabularies, definitions, class structures, logical relationships and conceptual models

    Statistics

    Provides computations, trending, analysis, learning over time from the data itself

    What is Data Science?
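
    A toy illustration of combining the two approaches: a controlled vocabulary (semantics) first maps differently named readings to a common term, and only then is a statistic computed over them. The vocabulary entries, labels and values below are invented for the example.

```python
from statistics import mean

# Hypothetical controlled vocabulary: local labels mapped to a common term.
VOCABULARY = {
    "temp": "temperature",
    "Temperature (C)": "temperature",
    "T_celsius": "temperature",
    "pH value": "pH",
    "ph": "pH",
}

# Hypothetical readings from systems that name the same quantity differently.
readings = [
    ("temp", 21.0),
    ("Temperature (C)", 23.0),
    ("T_celsius", 22.0),
    ("pH value", 7.2),
    ("ph", 6.8),
]


def summarize(readings, vocabulary):
    """Group readings by their common term, then average each group."""
    grouped = {}
    for label, value in readings:
        term = vocabulary.get(label)
        if term is None:
            continue  # label unknown to the vocabulary: left out of the summary
        grouped.setdefault(term, []).append(value)
    return {term: mean(values) for term, values in grouped.items()}


summary = summarize(readings, VOCABULARY)
print(summary)
```

    Without the vocabulary step, the five labels would be treated as five unrelated quantities and no meaningful average could be formed.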

  • Slide 12

    Semantic Spectrum of Knowledge Organization Systems

    Deborah L. McGuinness. "Ontologies Come of Age". In Dieter Fensel, Jim Hendler, Henry Lieberman, and Wolfgang Wahlster, editors. Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2003.

    Michael Uschold and Michael Gruninger. "Ontologies and semantics for seamless connectivity". SIGMOD Rec. 33, 4 (December 2004), 58-64. DOI=http://dx.doi.org/10.1145/1041410.1041420

    Leo Obrst. "The Ontology Spectrum". In Roberto Poli, Michael Healy, and Achilles Kameas, editors. Theory and Applications of Ontology: Computer Applications. Springer Netherlands, 17 Sep 2010.

    Leo Obrst and Mills Davis. "Semantic Wave 2008 Report: Industry Roadmap to Web 3.0 & Multibillion Dollar Market Opportunities". 2008.

    Sources

  • Slide 13

    Allotrope Example: Semantics Provides Common Meaning

    Allotrope Data Format (ADF)

    Instance Data

    Allotrope Data Models (ADM)

    Constraints

    Allotrope Foundation Ontologies (AFO)

    Classes and Properties

    is structured by

    is classified by

    provide standardized vocabulary for
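
    The three layers on this slide (instance data, data model constraints, ontology classes and properties) can be mimicked with plain Python structures. This is only a sketch of the relationships: the ex: terms are hypothetical placeholders, not real Allotrope Foundation Ontology terms, and the actual Allotrope Data Format is not used here.

```python
# Ontology layer: hypothetical classes and properties (not real AFO terms).
ONTOLOGY_CLASSES = {"ex:Measurement", "ex:Sample"}
ONTOLOGY_PROPERTIES = {"ex:hasSample", "ex:hasValue"}

# Data model layer: constraints phrased with the ontology's vocabulary.
DATA_MODEL = {"ex:Measurement": {"ex:hasSample", "ex:hasValue"}}

# Instance data layer: subject-predicate-object triples.
INSTANCE_DATA = [
    ("ex:m1", "rdf:type", "ex:Measurement"),
    ("ex:m1", "ex:hasSample", "ex:s1"),
    ("ex:m1", "ex:hasValue", "12.5"),
    ("ex:m2", "rdf:type", "ex:Measurement"),
    ("ex:m2", "ex:hasValue", "13.1"),
]


def violations(triples, model):
    """Return (subject, missing property) pairs for classified subjects."""
    found = []
    for subject, predicate, cls in triples:
        if predicate != "rdf:type" or cls not in model:
            continue
        present = {p for s, p, _ in triples if s == subject}
        for required in sorted(model[cls] - present):
            found.append((subject, required))
    return found


# The ontology provides the vocabulary the data model is written in.
model_uses_ontology = all(
    cls in ONTOLOGY_CLASSES and props <= ONTOLOGY_PROPERTIES
    for cls, props in DATA_MODEL.items()
)

# The instance data is classified by the ontology and structured by the model.
report = violations(INSTANCE_DATA, DATA_MODEL)
print(model_uses_ontology)
print(report)
```

    Because both measurements are classified with the same class, the same constraint applies to each, which is what lets the second one be flagged as incomplete.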

  • Slide 14

    Enterprise Applications Often Require Hybrid Architectures

    Cloud DBs (NoSQL)

    Analytics

    Dashboards & Reports

    Structured Data

    Semantic DBs

    Unstructured Documents

    Public Data

    Instrument Data

    Light-weight Semantic Integration Layer

  • Slide 15

    Smart labs in the future will provide the enterprise with:

    Integrated Data: common reference data structures (vocabularies)

    Sharable Data: easier interaction across teams and business units

    Scalability: Big data applications that can be highly elastic

    Conceptual Representations: context and perspective are captured

    Advanced Analytics: complex & automated problem-solving capabilities

    21st Century Labs Can Gain From This Approach

  • Thank You! Questions?