2013 SNIA Analytics and Big Data Summit. © Pentaho Corporation. All Rights Reserved.
Making EDW More Flexible with Hadoop
Rob Rosen, Big Data GTM Lead, Pentaho Corporation
The State of Data Warehouses
Source: Gartner Research, “Predicts 2011: Data Management Disciplines Elevate Business Criticality,” ID Number G00208101, published 1 December 2010
TDWI Hadoop Survey: Business Intelligence and Data Warehouse
Competitive Advantage vs. Operational Efficiency
Operational Efficiency Competitive Advantage
Hadoop = Infrastructure Software
Costs
Time
Flexibility
Barriers to Implementing Hadoop Technologies
“It’s complex & difficult, plus our executives don’t understand it. Where should I start?”
Use Case Scenario – Call Volume Analysis
A VOIP service provider with a B2B customer base wants to sub-lease excess capacity on the weekends.
COO: What are the top 10 states for outbound calls on Fridays, Saturdays and Sundays?
Detailed information is available, but not in the EDW:
• Call records: date/timestamp & source phone #
• Reference data: area code by country, state & time zone (North American Numbering Plan)
Traditional EDW Architecture
[Diagram labels: Structured Data; Extract, Transform, Load (“SQL or ETL Tool”); Data Mart(s) / Warehouse; Metadata; Dashboard; Report; Analysis]
Data Integration:
• Data acquisition & ingestion
• Parsing
• Cleansing
• Enrichment
Data Integration:
• Dimension management
• Bulk loading
• DB management
Challenges with the Traditional EDW
EDW can’t handle increasing data and workloads, so companies must:
• Reduce the volume of data
• Restrict end-user access (# of users or access windows) to accommodate longer batch processing windows
• Purchase additional capacity (hardware / licenses), which can cost as much as $100K per TB
Then, companies are faced with the following challenges:
• The trade-off: more data versus user experience
• The incremental outlay of capital required to expand the EDW or purchase more proprietary ETL tool capacity
• The inability of the incumbent ETL vendor to work with Hadoop
EDW Architecture – Hadoop Front-End
[Diagram labels: Structured Data; Unstructured Data; ETL (shown twice); Data Mart(s) / Warehouse; Metadata; Dashboard; Report; Analysis]
Data Integration:
• Data acquisition & ingestion
• Parsing
• Cleansing
• Enrichment
Data Integration:
• Dimension management
• Bulk loading
• DB management
Data Pipeline
Input (call records):
2012/03/06 00:00:00.000,12054290060
2012/02/21 00:00:00.000,18774230140
2012/03/08 00:00:00.000,12152900580
2012/02/18 00:00:00.000,17732350700
2012/03/08 00:00:00.000,17242490750
Output (enriched records):
3,2012,6,201,NJ,UNITED STATES,E,Friday,1
3,2012,6,513,OH,UNITED STATES,E,Friday,1
3,2012,6,850,FL,UNITED STATES,EC,Friday,1
3,2012,7,631,NY,UNITED STATES,E,Saturday,1
3,2012,6,650,CA,UNITED STATES,P,Friday,1
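The step from raw call records to enriched records on this slide can be sketched in Python. This is an illustration only, not the Pentaho transformation used in the talk. It assumes the output fields are month, year, day-of-week number (Sunday=1 through Saturday=7), area code, state, country, time zone, day name and a call count of 1, and that each phone number is a country code 1 followed by a three-digit area code. The names `AREA_CODES`, `DAY_NAMES` and `enrich` are hypothetical, and the lookup table holds only the five area codes visible in the slide's output.

```python
from datetime import datetime

# Sample reference rows (area code -> state, country, time zone) copied from
# the enriched records shown on the slide. A real job would load the full
# North American Numbering Plan reference file instead.
AREA_CODES = {
    "201": ("NJ", "UNITED STATES", "E"),
    "513": ("OH", "UNITED STATES", "E"),
    "850": ("FL", "UNITED STATES", "EC"),
    "631": ("NY", "UNITED STATES", "E"),
    "650": ("CA", "UNITED STATES", "P"),
}

DAY_NAMES = ("Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday")


def enrich(raw_line, area_codes=AREA_CODES):
    """Turn 'timestamp,phone' into an enriched CSV record, or None if unmatched."""
    # Parsing: split the record and read the timestamp.
    timestamp_text, phone = raw_line.strip().split(",")
    called_at = datetime.strptime(timestamp_text, "%Y/%m/%d %H:%M:%S.%f")

    # Assumption: 11-digit numbers start with country code 1, then the area code.
    area_code = phone[1:4]

    # Cleansing: drop records whose area code has no reference match.
    reference = area_codes.get(area_code)
    if reference is None:
        return None

    # Enrichment: add state, country, time zone and day-of-week fields.
    state, country, time_zone = reference
    # Assumption: day numbering runs Sunday=1 .. Saturday=7, so Friday is 6.
    day_number = called_at.isoweekday() % 7 + 1
    fields = (called_at.month, called_at.year, day_number, area_code,
              state, country, time_zone, DAY_NAMES[day_number - 1], 1)
    return ",".join(str(field) for field in fields)
```

With a made-up number in area code 201, `enrich("2012/03/09 00:00:00.000,12015550100")` returns `3,2012,6,201,NJ,UNITED STATES,E,Friday,1`, the same shape as the slide's first output record. Because each record is handled independently, the same per-record logic could in principle run in parallel across a Hadoop cluster.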
Analysis & Visualization
3,2012,6,201,NJ,UNITED STATES,E,Friday,1
3,2012,6,513,OH,UNITED STATES,E,Friday,1
3,2012,6,850,FL,UNITED STATES,EC,Friday,1
3,2012,7,631,NY,UNITED STATES,E,Saturday,1
3,2012,6,650,CA,UNITED STATES,P,Friday,1
. . .
SQL over Hadoop tool
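The COO's question (top 10 states for outbound calls on Fridays, Saturdays and Sundays) maps onto a simple aggregate query over the enriched records. The slide names only a "SQL over Hadoop tool", so the sketch below uses Python's built-in `sqlite3` module as a stand-in to show the shape of such a query. The table name `call_facts`, the column names, and the function `top_states` are hypothetical, and the field meanings are inferred from the sample records.

```python
import sqlite3

# Enriched sample records as shown on the slide.
ROWS = [
    "3,2012,6,201,NJ,UNITED STATES,E,Friday,1",
    "3,2012,6,513,OH,UNITED STATES,E,Friday,1",
    "3,2012,6,850,FL,UNITED STATES,EC,Friday,1",
    "3,2012,7,631,NY,UNITED STATES,E,Saturday,1",
    "3,2012,6,650,CA,UNITED STATES,P,Friday,1",
]


def top_states(rows, limit=10):
    """Return (state, total_calls) for weekend-period days, busiest first."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; column meanings are assumed from the sample data.
    conn.execute(
        "CREATE TABLE call_facts ("
        "month INTEGER, year INTEGER, day_number INTEGER, area_code TEXT, "
        "state TEXT, country TEXT, time_zone TEXT, day_name TEXT, "
        "call_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO call_facts VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        [row.split(",") for row in rows],
    )
    query = """
        SELECT state, SUM(call_count) AS total_calls
        FROM call_facts
        WHERE day_name IN ('Friday', 'Saturday', 'Sunday')
        GROUP BY state
        ORDER BY total_calls DESC, state ASC
        LIMIT ?
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result
```

Ties are broken alphabetically by state so the ranking is deterministic. On the five sample rows every state has one call, so `top_states(ROWS)` simply lists them in alphabetical order.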
Summary
EDW Optimization is a logical first use case for Hadoop:
• Tremendous cost savings
• Revenue enhancement potential
• Deliver value and gain experience with Hadoop
Benefits:
• Increased revenue…AND lowered costs
• Archive onto lower-cost storage platform…recover EDW operational headroom…lower storage costs
• Understand transactional context to gain deeper insight into customer behavior
Leverage the Ecosystem for Assistance
Costs
Time
Flexibility
Questions and Answers
Thank You!
@robrosen3
415-525-5555 (office), 925-998-4422 (mobile)