Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Hadoop User Group London: Data Wrangling on Hadoop, September 8, 2016
Olivier de Garrigues, EMEA Solutions Lead
Creating radical productivity for people who analyze data.
JEFFREY HEER Co-Founder & CXO
VISUALIZATION
JOE HELLERSTEIN Co-Founder & CSO
BIG DATA
SEAN KANDEL Co-Founder & CTO
HUMAN-COMPUTER INTERACTION
What is Data Wrangling?
QUESTION ANALYZE INSIGHT DISCOVER STRUCTURE CLEANSE ENRICH VALIDATE PUBLISH
The Bridge Between Raw Data & Analysis
Ingestion Storage Processing
ANALYSIS & VISUALIZATION
LOB CLEANING ENRICHMENT DISTILLATION STRUCTURING DISCOVERY
End-User Capabilities
IT GOVERNANCE INTEGRATION AVAILABILITY SCALABILITY SECURITY
Technical Capabilities
TRIFACTA
DATA WRANGLING WORKFLOW
Trifacta. Confidential & Proprietary.
Sample Scale Up
Refine Sample
Results
Identify/Register Data
1. Predictive Interaction
2. Consume
Schedulers
Monitor and Adjust
3. Schedule
Visualization & Analysis
Secure Access
Ingestion Processing Storage
ANALYSIS & CONSUMPTION
Discover Structure Clean Enrich Distill
LOB
IT
News Topics Time
Trades Tickers Date
eMails Recipients
Topics
Phone Logs Call Details Recipients
Corporations Company Relations
Individuals
Financial Services use case: Trader Fraud