Reinventing Laboratory Data To Be Bigger, Smarter & Faster
Heiner Oberkampf, PhD
Consultant Semantic Technologies
Smartlab Exchange Feb. 2016, Berlin
Slide 4
Big Data has continued to evolve rapidly
Data Warehouses exist and are still widely used
Requires too much effort for limited gains
Data Lakes are a rising trend
Can hold all types of data
Little to no data transformation required
Schema applied on retrieval and analytics (schema-on-read)
Graph Technology gains traction
Taxonomies, ontologies, controlled vocabularies, etc.
Very flexible schema
Focus on linking information
Data Continues to Rapidly Change and Grow
“Big data predictive analytics architectures are changing beyond just data lakes. Expect a lot of progress over the next few years.”
“Graph Databases are rapidly gaining traction in the market as an effective method for deciphering meaning”
Forrester: Brian Hopkins' Blog, July 27, 2015
Forbes: Tony Agresta, Apr 6, 2015
Slide 5
Understanding the 4 Vs of Big Data
Normally the focus of Big Data Solutions
Performance is Critical to Success
Data Complexity is Increasing
Handling Uncertainty Requires Statistics
Majority of Big Data analytics approaches treat these two Vs
Semantic technologies provide clear advantages
Mathematical Clustering Techniques provide clear advantages
Focus of OSTHUS
Slide 7
Many challenges exist for data to be captured, integrated and shared:
Data Silos
Incompatible instruments and software systems, proprietary data formats
Legacy architectures are brittle and rigid
SME knowledge resides in people’s heads, little common vocabulary
Lack of common vision between business units and scientists
Laboratory Data Has Not Been Able to Keep Pace
The Average Scientist’s Desktop
Slide 8
Data Lakes are centered around Big Data
Utilize cloud technology for scalability
Extensive user access across an organization
Data Lakes can contain numerous types of data
Structured & unstructured data can be captured in the same way
Raw data can be maintained over time
Because data is not “transformed” via standard ETL, it can be “sliced and diced” in a lot of different ways
What Are Data Lakes & Why Are They So Popular?
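The point that untransformed data can be "sliced and diced" in different ways can be illustrated with a minimal sketch. It assumes raw records are kept as JSON lines and that a schema is only applied when the data is read; the record contents, `RAW_LAKE`, and `read_with_schema` are hypothetical names made up for this illustration.

```python
import json
from statistics import mean

# Hypothetical raw records, stored exactly as produced (no upfront ETL).
RAW_LAKE = [
    '{"instrument": "HPLC-01", "sample": "S1", "purity_pct": 98.2}',
    '{"instrument": "HPLC-01", "sample": "S2", "purity_pct": 97.5}',
    '{"instrument": "MS-07", "sample": "S1", "mass_da": 180.16}',
]


def read_with_schema(raw_lines, fields):
    """Apply a schema at read time: keep records that carry every requested field."""
    for line in raw_lines:
        record = json.loads(line)
        if all(field in record for field in fields):
            yield {field: record[field] for field in fields}


# Slice 1: purity per sample (only the HPLC records qualify).
purity = list(read_with_schema(RAW_LAKE, ["sample", "purity_pct"]))

# Slice 2: which instrument touched which sample (all records qualify).
usage = list(read_with_schema(RAW_LAKE, ["instrument", "sample"]))

mean_purity = mean(row["purity_pct"] for row in purity)
```

The same raw lines serve both questions because nothing was discarded or reshaped when the data was stored.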
Slide 9
With Data Lakes, “the proposition of enterprise wide data management has yet to be realized” (Gartner, July 28, 2014)
Governance is a big issue
Data Lakes are best used by specific groups of trained individuals (Data Scientists)
Not meant to be used by an entire enterprise
Customers we are engaged with have had varied results with Data Lakes
The ones who tend to have the most success put some kind of light-weight schema in place
Somewhere between heavy ETL (Data Warehouse) and nothing
What is Problematic About Data Lakes?
“Not if you have to clean up a data swamp!”
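One way to picture a "light-weight schema" that sits between heavy ETL and nothing is a small check at ingestion time: a few required fields are verified, everything else passes through untouched, and failing records are set aside instead of polluting the lake. This is only a sketch under that assumption; the field names, `REQUIRED`, `check_record`, and `ingest` are hypothetical.

```python
# Hypothetical minimal schema: a few required fields and their expected types.
REQUIRED = {"sample_id": str, "instrument": str, "measured_at": str}


def check_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, expected in REQUIRED.items():
        if field not in record:
            problems.append(f"missing {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems


def ingest(records):
    """Split records into accepted ones and quarantined ones with their problems."""
    accepted, quarantined = [], []
    for record in records:
        problems = check_record(record)
        if problems:
            quarantined.append((record, problems))
        else:
            accepted.append(record)
    return accepted, quarantined


records = [
    # Extra fields such as peak_area are allowed; only REQUIRED is enforced.
    {"sample_id": "S1", "instrument": "HPLC-01",
     "measured_at": "2016-02-01", "peak_area": 1532.4},
    {"sample_id": 42, "instrument": "HPLC-01", "measured_at": "2016-02-01"},
    {"instrument": "MS-07", "payload": "free text"},
]
accepted, quarantined = ingest(records)
```

Unlike a warehouse ETL step, nothing is reshaped here; the check only guarantees that each stored record can later be found and linked.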
Slide 10
At OSTHUS, Lab Data Science is Big Analysis:
Statistical
Semantics
Machine Learning
Reasoning
Slide 11
At OSTHUS Data Science has a special meaning
Data Science is more than just statistical analysis
We combine math-based approaches (statistics) with logic-based approaches (semantics)
Conceptual + Computational
Semantics
Provides the vocabularies, definitions, class structures, logical relationships and conceptual models
Statistics
Provides computations, trending, analysis, learning over time from the data itself
What is Data Science?
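The combination of a logic-based part (semantics) and a math-based part (statistics) can be sketched in a few lines. In this illustrative example a small class hierarchy decides which measurements belong to a concept, and a plain mean is then computed over them. The hierarchy, the run data, and the function names are hypothetical and are not taken from any OSTHUS system.

```python
from statistics import mean

# Semantics: a hypothetical class hierarchy (child -> parent).
SUBCLASS_OF = {
    "HPLC": "Chromatography",
    "GC": "Chromatography",
    "Chromatography": "AnalyticalTechnique",
    "NMR": "Spectroscopy",
    "Spectroscopy": "AnalyticalTechnique",
}

# Data: hypothetical instrument runs, each tagged with its most specific class.
RUNS = [
    {"technique": "HPLC", "duration_min": 30.0},
    {"technique": "GC", "duration_min": 20.0},
    {"technique": "NMR", "duration_min": 10.0},
]


def ancestors(cls):
    """Yield the class itself followed by all of its superclasses."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def is_a(cls, target):
    """Logic-based step: does cls equal target or inherit from it?"""
    return target in ancestors(cls)


def mean_duration(target_class):
    """Math-based step: average over every run classified under target_class."""
    values = [run["duration_min"] for run in RUNS
              if is_a(run["technique"], target_class)]
    return mean(values)


chromatography_mean = mean_duration("Chromatography")
overall_mean = mean_duration("AnalyticalTechnique")
```

No run is labelled "Chromatography" directly; the class structure supplies that grouping, and the statistics operate on whatever the grouping selects.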
Slide 12
Semantic Spectrum of Knowledge Organization Systems
• Deborah L. McGuinness. "Ontologies Come of Age". In Dieter Fensel, Jim Hendler, Henry Lieberman, and Wolfgang Wahlster, editors. Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2003.
• Michael Uschold and Michael Gruninger “Ontologies and semantics for seamless connectivity” SIGMOD Rec. 33, 4 (December 2004), 58-64. DOI=http://dx.doi.org/10.1145/1041410.1041420
• Leo Obrst “The Ontology Spectrum”. In Roberto Poli, Michael Healy, Achilles Kameas, editors. “Theory and Applications of Ontology: Computer Applications”. Springer Netherlands, 17 Sep 2010.
• Leo Obrst and Mills Davis "Semantic Wave 2008 Report: Industry Roadmap to Web 3.0 & Multibillion Dollar Market Opportunities”. 2008.
Sources
Slide 13
Allotrope Example: Semantics Provides Common Meaning
Allotrope Data Format (ADF)
Instance Data
Allotrope Data Models (ADM)
Constraints
Allotrope Foundation Ontologies (AFO)
Classes and Properties
is structured by
is classified by
provide standardized vocabulary for
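The three relationships on this slide (instance data is structured by data models, instance data is classified by ontologies, and ontologies provide the vocabulary for data models) can be mimicked with a toy example. It does not use the real Allotrope formats, terms, or tooling; every class, property, and identifier below is a hypothetical stand-in chosen only to show the layering.

```python
# Stand-in for the ontology layer: classes and properties (hypothetical terms).
ONTOLOGY = {
    "classes": {"Measurement", "Sample"},
    "properties": {"hasSample", "hasValue", "hasUnit"},
}

# Stand-in for the data model layer: constraints written with ontology terms.
# Here: every Measurement must carry these three properties.
DATA_MODEL = {"Measurement": {"hasSample", "hasValue", "hasUnit"}}

# Stand-in for instance data: (subject, predicate, object) triples.
INSTANCE_DATA = [
    ("m1", "type", "Measurement"),
    ("m1", "hasSample", "s1"),
    ("m1", "hasValue", "7.4"),
    ("m2", "type", "Measurement"),
    ("m2", "hasSample", "s1"),
    ("m2", "hasValue", "7.1"),
    ("m2", "hasUnit", "pH"),
]


def model_uses_known_terms(model, ontology):
    """The ontology provides the vocabulary: the model may only use its terms."""
    return all(
        cls in ontology["classes"] and props <= ontology["properties"]
        for cls, props in model.items()
    )


def violations(triples, model):
    """Map each subject to the required properties it is missing, if any."""
    types = {s: o for s, p, o in triples if p == "type"}
    found = {}
    for s, p, _ in triples:
        if p != "type":
            found.setdefault(s, set()).add(p)
    result = {}
    for subject, cls in types.items():
        missing = model.get(cls, set()) - found.get(subject, set())
        if missing:
            result[subject] = sorted(missing)
    return result


vocabulary_ok = model_uses_known_terms(DATA_MODEL, ONTOLOGY)
problems = violations(INSTANCE_DATA, DATA_MODEL)
```

In this toy data, `m1` is classified as a Measurement but lacks a unit, so the constraint layer flags it while `m2` passes.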
Slide 14
Enterprise Applications Often Require Hybrid Architectures
Cloud DBs (NoSQL)
Analytics
Dashboards & Reports
Structured Data
Semantic DBs
Unstructured
Documents
Public Data
Instrument Data
Light-weight Semantic Integration Layer
Slide 15
Smart labs in the future will provide the enterprise with:
Integrated Data: common reference data structures (vocabularies)
Sharable Data: easier interaction across teams and business units
Scalability: Big data applications that can be highly elastic
Conceptual Representations: context and perspective are captured
Advanced Analytics: complex & automated problem-solving capabilities
21st Century Labs Can Gain From This Approach