Mainstreaming BI and Analytics with Enterprise Data Unification
Shobhit Chugh | Tamr
Data Heterogeneity is Inherent in Large Companies
Data sources are bound to applications with idiosyncratic bias
[Diagram: Sales, Marketing, Manufacturing, HR, Support, and Finance, each with its own Apps and Store]
Aggregation of Data Creates Ambiguity/Complexity
Broad analytics create need to bring data together from many sources
Outside Forces = More Confusion + Complexity
Leadership Changes
Mergers & Acquisitions
Reorganizations
Result: Just 10% of Data is Consumable by Any One Person
And 80% of data scientist time is spent preparing it
90% Dark Data
Expectations for Global Corporate IT as Data Broker
Increasing quickly -- along with the hype about Big Data/Analytics 3.0
[Diagram: Divisions: HR, Sales, Finance, Marketing, MFG, ENG]
Some Options
Option #1 - Deny Variety - use information that is easiest/closest
Option #2 - Manage Variety incrementally - using traditional approaches:
● Standardization
● Aggregation
● Master Data Management
● Rationalize Systems
● Throw Bodies at it
● Improve Individual Productivity
Option #3 - Embrace Variety using a probabilistic/model-based approach - Tamr
Logical Evolution to Probabilistic/Model-Based Approach
[Diagram: mix of Probabilistic and Deterministic approaches, Today vs. Future]
Probabilistic (Tamr) complements, NOT Replaces, Deterministic (MDM)
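The way the two approaches complement each other can be pictured with a minimal Python sketch. This is an illustration only, not Tamr's implementation: the function names, the 0.7 threshold, and the use of `difflib` string similarity as a stand-in for a trained model are all assumptions.

```python
from difflib import SequenceMatcher


def deterministic_match(a: str, b: str) -> bool:
    """Rule-based (MDM-style) check: match only if normalized names are identical."""
    return a.strip().lower() == b.strip().lower()


def match_probability(a: str, b: str) -> float:
    """Hypothetical stand-in for a learned model: a similarity score in [0, 1]."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def combined_match(a: str, b: str, threshold: float = 0.7) -> bool:
    """Apply the deterministic rule first, then fall back to the score."""
    if deterministic_match(a, b):
        return True
    return match_probability(a, b) >= threshold


if __name__ == "__main__":
    pair = ("Acme Corp", "Acme Corporation")
    print("deterministic:", deterministic_match(*pair))
    print("probabilistic score:", round(match_probability(*pair), 2))
    print("combined:", combined_match(*pair))
```

The exact rule misses the variant spelling, while the scored fallback can still link the two records. In a real system the score would come from a model trained on labeled matches rather than raw string similarity.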
INTRODUCING TAMR
▪ Founded in 2013 by enterprise database software veterans
▪ World-class engineering team
▪ Top tier venture backing (Google Ventures, NEA)
Jerry Held, PhD
Andy Palmer
Mike Stonebraker, PhD
Ihab Ilyas, PhD
Kevin Burke
Nidhi Aggarwal, PhD
Min Xiao
Nik Bates-Haus
Kevin Willis
Managing enterprise information as an asset requires a new, bottom-up design pattern
Catalog: ALL your metadata and map it to logical entities
Connect: Entities and attributes to remove information silos
Consume: Unified data in the application of your choice via APIs
“Embrace” Variety -- Tamr’s NextGen Approach
Tamr’s Design Pattern: “Back to the Future”
1990’s Web: Yahoo’s top-down organization
2020’s Enterprise: Probabilistic data source cataloging, connection and consumption
ARCHITECTURE
DATA & METADATA SOURCES: DB, ERP, CRM, CSV +
Data Connection Activities: Data Profiling, Schema Matching, Record Deduplication
Machine Learning
Expert Sourcing
Data Security
Data Governance
DATA USES: Analytics, visualization, Data Warehouse
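As a rough illustration of how schema matching and expert sourcing might fit together, the sketch below scores each source attribute against a set of unified attributes, maps confident matches automatically, and routes uncertain ones to a human expert. Everything here is an assumption made for illustration: the token-overlap score, the two thresholds, and the names `route_attribute`, `auto`, and `ask` are hypothetical and do not describe Tamr's actual algorithm.

```python
def tokens(name: str) -> set:
    """Split an attribute name such as 'Supplier_Name' into lowercase tokens."""
    return set(name.lower().replace("-", "_").split("_"))


def jaccard(a: str, b: str) -> float:
    """Token overlap between two attribute names, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


def route_attribute(source_attr, unified_attrs, auto=0.8, ask=0.3):
    """Return (decision, best_candidate, score) for one source attribute.

    decision is 'auto' (map without review), 'expert' (ask a person to
    confirm), or 'unmapped' (no plausible candidate).
    """
    best = max(unified_attrs, key=lambda u: jaccard(source_attr, u))
    score = jaccard(source_attr, best)
    if score >= auto:
        return ("auto", best, score)
    if score >= ask:
        return ("expert", best, score)
    return ("unmapped", None, score)


if __name__ == "__main__":
    unified = ["supplier_name", "payment_terms"]
    for attr in ["Supplier_Name", "vendor_name", "zip"]:
        print(attr, "->", route_attribute(attr, unified))
```

Only the middle band of scores reaches a person, which is the point of pairing a model with expert review: the expert's answers could then be fed back as training data.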
Fortune 50 company -- Optimized Sourcing Analysis
Benefits
● Massive reductions in supplier list size & number of distinct suppliers
● Automated data maintenance; lower cost of ownership
● Powering strategic sourcing analytics and governance
● Empowering individual procurement team with global view of payment terms
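To make the reduction in distinct suppliers concrete, here is a small sketch of record deduplication by clustering. It is a toy under stated assumptions, not the method used in this engagement: the legal-suffix list, the 0.8 threshold, the `difflib` similarity, and the function names are all hypothetical, and the supplier names are invented sample data.

```python
import re
from difflib import SequenceMatcher

# Assumed list of legal suffixes to ignore when comparing supplier names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "co"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def cluster_suppliers(names, threshold=0.8):
    """Group supplier names whose normalized forms are similar (union-find)."""
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    keys = [normalize(n) for n in names]
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if SequenceMatcher(None, keys[i], keys[j]).ratio() >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


if __name__ == "__main__":
    suppliers = ["Acme Corp", "ACME Corporation", "Acme Corp.",
                 "Globex Inc", "Globex, Inc.", "Globax Inc"]
    groups = cluster_suppliers(suppliers)
    print(len(suppliers), "records ->", len(groups), "distinct suppliers")
```

The pairwise comparison is quadratic in the number of records, so a list of realistic size would need blocking or indexing before comparison; the sketch omits that for brevity.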
Catalog
Tamr helps you catalog metadata across the entire enterprise, providing a logical map of all of your information
Find us at Booth #613
Connect
Tamr helps match entities and attributes across the full variety of your sources, leveraging entity relationships for high accuracy
Consume
Tamr provides a consolidated view of entities and records for downstream applications via a set of RESTful APIs
learn more at tamr.com
ABSTRACT (FOR REFERENCE)
Organizations want to use all the data available to them for analytics. But they’ve been thwarted by
data silos and top-down, mostly manual approaches to unifying data for analytics. A new approach,
based on machine learning combined with human expert sourcing, dramatically speeds analytics’
time-to-value. It automates data unification end-to-end: from finding and connecting diverse data to
interactive consumption by virtually anyone using any analytic tool.