The Power of 11g Automatic SQL Tuning Julian Dontcheff, Nokia ...
Big Data Analytics Platform @ Nokia · Nokia Internal Use Only 18 . Hadoop VS SQL/Analytical DB ....
Transcript of Big Data Analytics Platform @ Nokia · Nokia Internal Use Only 18 . Hadoop VS SQL/Analytical DB ....
Selecting the Right Tool for the Right Workload
Yekesa Kosuru Nokia
Location & Commerce
Strata + Hadoop World NY - Oct 25, 2012
Big Data Analytics Platform @ Nokia
1 1
•Big Data Analytics Platform @Nokia −Who we are −Use case data flows −Big data platform −Big data challenges
•Selecting the Right Tool for the Right Workload −Hadoop VS SQL −Which analytical database −Why InfiniDB
Agenda
2 2
Nokia Internal Use Only
Great Mobile Products That Sense the World
WIN IN SMART DEVICES
CONNECT THE NEXT BILLION
INVEST IN FUTURE DISRUPTIONS
CREATE A LEADING “WHERE” PLATFORM
3 3
Nokia Internal Use Only
Apps
Smart Data Platform
Content
Positions Maps Traffic Places Directions Guidance
One Platform, Enabling Contextually Rich Mobile Experiences
4 4
Click to edit Master title style Big DATA ANALYTICS Platform @Nokia
5 5
6
Business Challenges • Data silos, missing semantics
• Multiple sources - overlapping, conflicting
• Timely processing of large volumes of data
• Partial, insufficient, inaccurate, inconsistent.. data
• Security, privacy and other policies unknown
Central Analytics Platform created!
7
Statistics • 10’s PB of data all across Nokia
• Multi-tenant, multi-petabyte analytics cluster
• 10-20K+ jobs per day
• 600+ internal users
• 250M+ KV queries
• Over a terabyte flowing every day
• Multiple data centers around the world
Nokia Internal Use Only
Places Data Store (POI)- Use Case
Search
Platform
Data flow
Cloud Infrastructure
Account Management
Places Manager
Suppliers Places API Transaction
al Data HDFS Analytical DB
BI
Place CRUD
2
Supplier Uploads Data
3 Updated Blend
Record
6
Places Data Analytics
5
ETL and Blend places
4
Places Extract Portal
Delivered to OnlineSystems
7
Access Control
Authentication
User Logs In
1
Data Intake
Data Processing
FTP Oozie Sched
MR Blend
Hive Pig
MR SQL
Places Content
Analytics
K-V Store
8
Nokia Internal Use Only
Reports
Analytical DB
Analytics Cluster
Big Data Analytics Platform Data Flows
Data Asset Catalog
Oracle
Dashboards
Data Discovery
InfiniDB
Interactive Queries
Batch Queries
Web Applications
Activity Logs
VShards (NoSQL)
Reference Data
Device Applications
Probes
3rd Party
Device
User Profile
POI, Map
Activity Sensor
Dat
a In
take
ETL,
Alg
orith
ms
Agg
rega
tion
HDFS
9
Nokia Internal Use Only 10
Big Data Analytics Platform Data Flows
Analytical DB
Analytics Cluster
Data Asset Catalog
Oracle
Data Discovery
InfiniDB
Interactive Queries
Batch Queries
Dat
a In
take
ETL,
Alg
orith
ms
Agg
rega
tion
HDFS
• Logical Tiers −Technology Platform −Data Platform −End User Layer (not shown)
Big Data Analytics Platform
ETL,
Alg
orith
ms
Agg
rega
tion
Data Asset Catalog
Data
Dat
a In
take
HDFS
Technology
Analytical DB
Technology Platform
12
Hadoop R VShards (KV) Scribe, FTP Hive, Pig InfiniDB,
Oracle
Export/ Import
Workflow Engine
Config./ Deploy Monitor Alerts Archiver Scheduler
Security/Kerberos & ACL
Cloud Infrastructure
Data Platform
13
Self Serve Tools
ETL, Agg Algorithms Data Quality Data Asset
Catalog
Data, Metadata, Operational Data
Workflow Orchestration
Technology Platform
14
Data Platform – Analytics Lifecycle
Self Serve Tools
ETL, Agg Algorithms Data Quality Data Asset
Catalog
Data, Metadata, Operational Data
Collect Ingest Organize Analyze Deliver
Technology Platform
Nokia Internal Use Only
Data Platform: Managing the Data Asset • Data Quality - garbage in , garbage out − Rules for validating, cleaning data, other heuristics − Trusting your insights − Process Quality − Light weight governance (semantics, integrity, privacy and
quality)
• Data Asset Catalog – describe your data − Capture essential metadata and logical domain models for
assets −physical model, logical model, policies, classifications −dependencies with other assets
− Serves as a entry-point to data browsing and asset discovery − Insulates subject matter experts from physical details of data
asset
Nokia Internal Use Only
Big Data Challenges
• At every level - capture, curate, storage, process, visualize..
• Hadoop or SQL ? − Performance of analytical database ? − Batch or Interactive analysis − Neither SQL nor MR fits all problems
• Data & Metadata Fragmentation
Click to edit Master title style Selecting the Right Tool for the Right Workload
17 17
Nokia Internal Use Only 18
Hadoop VS SQL/Analytical DB
SQL/DW • Discover the question • Interactive/Fast • No coding • Standard industry tools • Mutable (Type 1 SCD) • Schema on Write • Analyst • Time to Wisdom
SQL/Analytical DB • Standard industry
tools • Interactive/Fast
(secs) • No coding, e.g. built-
in functions • Reasonable complex • Discover the
question
Hadoop/Hive/MR • ETL on steroids,
Scale • Batch/slow • Bunch of coding,
arbitrary complex • Harvest & load
into DW • Discover the
answer
19
Why InfiniDB ?
• Works with BI tools (standard JDBC driver)
• Column oriented, MPP, clean architecture
• Horizontal and vertical partitioning, clever pruning
• Stream based MR like processing
• Efficient joins
• No indexes
• Impressive benchmarks
• Cloud deployment model
Nokia Internal Use Only
InfiniDB vs Hive Performance
0
500
1000
1500
2000
2500
A B C D E F G
infiniDB (sec)
Hive (sec)
Query InfiniDB (sec) Hive (sec) A B 76.32 2155.92 C 25.59 1181.48 D 59.72 1497.22 E 1.8 446.5 F 12.38 1307.38 G 24.32 1886.81
Analytic Queries
Exe
cutio
n Ti
me(
secs
)
Click to edit Master title style InfiniDB Under the Hood
21 21
What is InfiniDB?
22
®
Scalable
Fast
Simple
Analytics Data Platform Foundation
23
Analytics Data Platform
Columnar Performance Efficiency
MapReduce style Query Execution
Widely used MySQL Interface
®
InfiniDB Building Blocks
24
Purpose built for big data analytics. •User Module (UM)
•Performance Module (PM)
or …
Single Server
InfiniDB Building Blocks
25
Purpose built for big data analytics. •User Module (UM)
Understands SQL •Performance Module (PM)
Operates on data blocks
or …
Single Server
Nokia Internal Use Only
InfiniDB M/R Style Distribution of Work “Map-Reduce Inside”
InfiniDB DoW Hadoop M/R Scalability Linear Linear
N-squared Problem Avoided Avoided
Latency Low Medium-High
Intermediate Results Handling
Stream-based File-based
Report Language SQL Erlang M/R, Hive, Pig
Tuning Automatic Manual
Real-Time Analytics Real-time access to granular data
Access to pre-defined aggregates
Ad-Hoc Full Ad-Hoc performance None
Data Storage Structured Unstructured
26
Independent InfiniDB Benchmark
Q1 Series 2 table Joins
Q2 Series 3 table Joins
Q3 Series 4 table Joins
Q4 Series 5 table Joins
27
28
Takeaways
• Hadoop is good but….
• Pay attention to data quality
• Hadoop or SQL
• Describe your data
THANK YOU Yekesa Kosuru Distinguished Architect, Nokia [email protected] www.nokia.com @Nokia Jim Tommaney CTO, Calpont [email protected] www.calpont.com @Calpont, @InfiniDB