Tamr | cdo-summit

Enterprise Data Unification in Practice

IHAB ILYAS

Professor, University of Waterloo

Co-founder, Tamr, Inc.

@ihabilyas

Top-Down Data Integration Limits Data Quality and Connectedness

<10%

Enterprise data is siloed . . . and expensive to connect & curate

Chart axes: # of sources, $

The Consequences:

• Limited data available

• Missed opportunity

• Ballooning costs

Hiring More Data Experts Is Not the Answer

Reality

Enterprise Reality

Goal

• Manual data collection and preparation

• Long lead-time to analyses

• Limited individual view on variety of data

• Extensive rework

• No cohesive view of data efforts

• Expertise across organization is underutilized

Data Curation: Many Definitions and One Goal

Extract Value from Data

“For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights” (The New York Times, August 2014)

Exploding Big Data Variety Will Make the Problem Worse

Chart: Radical Increase in Data Variety, 2000 to 2011

Source: IDC 2011 Digital Universe Study

1.0

2.0

Corporate databases

Semi-structured data JSON Sources

Increasingly valuable

Missing Capability: Connecting and curating in an automated way

Structured and Semi-structured Data Sources

Collaborative Curation

Data Experts (Source owners)

Data Stewards and Curators

Data Inventory

APIs

Systems

Tools

Data Scientists

The Core of Tamr: Machine Learning with Human Insight

Advanced Algorithms & Machine Learning

Expert Input

Integrated Data & Metadata

Identify sources, understand relationships and curate the massive variety of siloed data

Expert Directory
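The pairing of automated scoring with expert input can be sketched roughly as follows. This is an illustration only, not Tamr's implementation: a plain string similarity from Python's standard library stands in for a learned model, and the function names (`similarity`, `triage`) and the `accept` / `reject` thresholds are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in for a learned score)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(pairs, accept=0.9, reject=0.5):
    """Split candidate record pairs into three queues.

    Pairs scoring at or above `accept` are matched automatically, pairs
    scoring below `reject` are discarded automatically, and everything in
    between is queued for an expert to decide. Thresholds are assumptions.
    """
    matched, rejected, review = [], [], []
    for a, b in pairs:
        score = similarity(a, b)
        if score >= accept:
            matched.append((a, b))
        elif score < reject:
            rejected.append((a, b))
        else:
            review.append((a, b))
    return matched, rejected, review


candidates = [
    ("IBM", "ibm"),
    ("Acme Corp", "Acme Corporation"),
    ("abc", "xyz"),
]
matched, rejected, review = triage(candidates)
```

In this sketch only the ambiguous middle band reaches a person, which is one way to keep expert effort focused on the cases an automated score cannot settle.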

Demo

Example Use Cases

Solution Overview: Sourcing & Supply Chain Spend Optimization

The Problem

• Part/supplier data in ERPs, life cycle management systems, and catalogs across departments

• Inaccurate data / incongruent naming conventions

The Solution

• Create a unified schema that leverages all relevant data sources, including parts, procurement, logistical, and vendor data

Benefit

• Discover opportunities to optimize purchases across different suppliers and lines of business

Tamr Unified View

Hundreds of Potential Sources
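To make the naming-convention problem on this slide concrete, here is a minimal sketch of rolling up spend once supplier names from different systems are normalized to a shared key. The record layout, the source names, the suffix list, and the helper names (`normalize`, `spend_by_supplier`) are all assumptions made for illustration; a real deployment would need far more robust matching than suffix stripping.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes to ignore when comparing supplier names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and strip legal suffixes from a supplier name."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def spend_by_supplier(records):
    """Sum spend per normalized supplier across all source systems."""
    totals = defaultdict(float)
    for _source, supplier, amount in records:
        totals[normalize(supplier)] += amount
    return dict(totals)


# Illustrative records: (source system, supplier name as entered, spend).
records = [
    ("erp_a", "Acme Corp.", 1200.0),
    ("erp_b", "ACME Corporation", 800.0),
    ("catalog", "Globex, Inc.", 500.0),
]
totals = spend_by_supplier(records)
```

With the two Acme spellings collapsed into one key, total spend per supplier becomes visible across departments, which is the precondition for the purchasing optimization described above.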

Solution Overview: Customer Data Integration

The Problem

• Customer data stored in CRMs, data warehouses, back-office applications, and other enriching sources

• Complexity of unifying personal data / incongruent naming conventions / data sparseness / manual entry

The Solution

• Create a holistic and adaptive customer view by unifying disparate data sources across the enterprise

Benefits

• Apply a unified and enriched customer view across multiple channels / lines of business

• Discover hidden opportunities to improve upsell / cross-sell, reduce churn, and identify key opinion leaders (KOL) via enhanced segmentation/targeting

Solution Overview: Clinical Trials

The Problem

• Clinical trial data is reported in a wide variety of formats, ontologies and standards

• Underspecified attribute names, varying quality of annotation, duplicate data, etc.

The Solution

• Unify attribute names to build a common clinical trial data model

Benefit

• Ability to cluster clinical trials based on drug, target or investigator

• Easier way to aggregate and report ongoing trial data

• Simplified reporting for various agency ontologies

Solution Overview: Medical Instruments

The Problem

• Instruments perform experiments at thousands of labs and hospitals across the world

• Data stored in inconsistent formats and standards across various labs and hospitals

The Solution

• Build a unified view of instruments leveraging all available internal/external data sources

Benefit

• Ability to cluster analysis based on instrument, location and other attributes

Tamr Architecture: a Data Curation Stack

Demo

Questions?

@Tamr_Inc

www.tamr.com