Analytic Platforms in the Real World with 451Research and Calpont_July 2012
Calpont InfiniDB® Accelerating Data Insights
Where the Rubber Meets the Road – Analytic Platforms in the Real World
Featuring Matt Aslett, 451Research July 18, 2012
InfiniDB® Scalable. Fast. Simple. © 2012 Calpont. All Rights Reserved.
Today’s Presenters
Matt Aslett
• Research Manager, Data Management and Analytics
• With 451 Research since 2007
• www.twitter.com/maslett
• Information Management: operational databases, data warehousing, data caching, event processing
• Commercial Adoption of Open Source (CAOS): open source projects, adoption of open source software, vendor strategies
Bob Wilkinson
• Calpont Vice President of Engineering
• Formerly CTO for Tektronix Communications
• 16 years of product development
• Responsible for design, development, and support of InfiniDB
Today’s Discussion
• Matt Aslett
  o Total Data and the Rise of the Analytic Platform
  o Analytic Platforms in the Big Data ecosystem
  o Defining the Analytic Platform
• Bob Wilkinson
  o InfiniDB Analytic Platform
  o InfiniDB in Action
    • Telecommunications
    • Online Advertising
• Summary and Q&A
© 2012 by The 451 Group. All rights reserved
Overview
• What and why: the rise of the analytic platform
• Where and when: the analytic platform’s place in the ‘big data’ ecosystem
• How and which: the key characteristics of an analytic platform
The 451 Group
Big Data – Implications for Data Management
• Velocity: the data is being produced at a rate that is beyond the performance limits of traditional systems.
• Volume: the volume of data is too large for traditional database software tools to cope with.
• Variety: the data lacks the structure to make it suitable for storage and analysis in traditional databases and data warehouses.
“Big data” - realization of greater business intelligence by storing, processing and analyzing data that was previously ignored due to the limitations of traditional data management technologies to handle its volume, velocity and/or variety.
Total Data - Beyond ‘Big Data’
• Exploration: the interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query.
• Totality: the desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
• Dependency: the reliance on existing technologies and skills, and the need to balance investment in those existing technologies and skills with the adoption of new techniques.
• Frequency: the desire to increase the rate of analysis in order to generate more accurate and timely business intelligence.
The adoption of non-traditional data processing technologies is driven not just by the nature of the data, but also by the user’s particular data processing requirements.
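The Totality point can be illustrated with a minimal Python sketch. It assumes a hypothetical event log in which a rare event recurs at a fixed interval. The sketch contrasts an exact full-data count with a naive systematic sample (every `step`-th record) scaled back up. The function names and data are illustrative only. When the sampling interval happens to line up with the structure of the data, the extrapolated estimate can miss the rare segment entirely.

```python
def exact_count(records, predicate):
    """Count matching records by scanning the full data set."""
    return sum(1 for r in records if predicate(r))


def sampled_estimate(records, predicate, step):
    """Count matches in every `step`-th record, then scale up by `step`."""
    sample = records[::step]
    return exact_count(sample, predicate) * step


def is_rare(record_id):
    """Hypothetical rare event: record IDs ending in 007 (one in a thousand)."""
    return record_id % 1000 == 7


records = list(range(10_000))  # hypothetical event log of 10,000 record IDs
print("exact:", exact_count(records, is_rare))              # exact: 10
print("sampled:", sampled_estimate(records, is_rare, 100))  # sampled: 0
```

Random sampling would avoid this particular alignment problem. For small niche segments, however, any sample-based estimate still carries variance that a full scan does not.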
Beyond the limitations of traditional data warehousing
The EDW is supposed to be a single source of the ‘truth’ and avoid data silos.
One of the most significant inefficiencies of data warehousing is that users have traditionally had to design their data-warehouse models to match their planned queries.
This approach is too rigid in a world of rapidly changing business requirements and real-time decision-making.
And its inflexibility serves to encourage the growth of data silos and the exact redundancy and duplication issues the EDW was apparently designed to avoid.
A business analyst or executive unable to get the answers to queries they require from the EDW is likely to find their own ways to answer these queries.
The Rise of Specialist Platforms
The alternative is to embrace dispersed data, adopting not silos but specialist data platforms that complement the EDW.
‘Total Data’ describes an approach that treats the various data management components as an integrated whole.
eBay is a prime example of this approach in action, with its Singularity analytic platform, as well as an EDW and Hadoop.
[Figure labels: Structured SQL analysis • Semi-structured SQL • Unstructured analysis]
Defining “Analytic Platform”
Enterprises have used specialist data marts/warehouses for many years for departmental/application-specific use-cases.
Analytic platforms are designed to enable different analytic approaches that complement traditional EDW workloads.
• Large data volumes
• Raw/close-to-raw data
• Multiple dimensions
• Complex variables
• Near real-time requirements
• Columnar storage
• SQL, user-defined functions
• MapReduce
• In-database analytics
• Flexible schema
Flexible schema
Apply structural patterns as the data is analyzed, rather than when it is loaded into the database.
[Figure: schema on write (schema applied before data reaches storage) vs. schema on read (data stored first, schema applied at query time); components shown: Application, Schema, Data storage, Query, Results]
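The schema-on-write vs. schema-on-read contrast can be sketched in a few lines of Python. This is a minimal illustration that assumes a hypothetical JSON event feed, with hypothetical field names. Under schema on write, structure is enforced at load time, so a field that was not in the original design is lost. Under schema on read, the raw events are kept and a field is extracted only when a query asks for it.

```python
import json

# Hypothetical raw event feed; the second event carries a field ("device")
# that did not exist when the table was designed.
RAW_EVENTS = [
    '{"user": "a", "latency_ms": 120}',
    '{"user": "b", "latency_ms": 340, "device": "ios"}',
]

# Schema on write: the column list is fixed before any data is loaded.
WRITE_SCHEMA = ("user", "latency_ms")


def load_schema_on_write(raw_lines, schema=WRITE_SCHEMA):
    """Parse at load time and keep only the declared columns."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        # Undeclared fields are silently dropped at this point.
        table.append(tuple(record[col] for col in schema))
    return table


def query_schema_on_read(raw_lines, field):
    """Keep raw data; apply structure only when a query asks for a field."""
    return [json.loads(line).get(field) for line in raw_lines]


print(load_schema_on_write(RAW_EVENTS))            # [('a', 120), ('b', 340)]
print(query_schema_on_read(RAW_EVENTS, "device"))  # [None, 'ios']
```

The trade-off is that schema on read re-parses the raw data on every query, while schema on write pays the parsing cost once at load time.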
“Exploratory Analytic Platform”
The need for EAPs is not necessarily driven by the choice of storage platform (e.g., Hadoop or analytic database) or query language (e.g., SQL or MapReduce).
Instead, it is driven by the nature of the query or workload, or by the skills and tools employed by the person interacting with the data.
While data analysts analyze data to find answers to existing questions, data scientists explore patterns in data to prompt new questions.
Examples: customer analysis, interactive marketing, targeted advertising, churn analysis, sentiment analysis, fraud analysis.
An EAP should be flexible enough to enable the use of multiple techniques to support exploratory analysis.
EAP in the larger Total Data landscape
• EDW: retains its core role for stable-schema, structured SQL analytics on ERP, CRM, and similar applications.
• Hadoop: storage and processing of raw data; analysis of unstructured, schemaless data.
• EAP: flexible, exploratory analytics on rapidly updated data with an evolving schema.
The Spectrum of Analytic Approaches
Integration enables a ‘total data’ approach that treats the various platforms as points on a spectrum, depending on the rigidity and importance of schema, rather than as individual silos.
Calpont InfiniDB
• Columnar MPP
• Vertical and horizontal range partitioning
• Integrated MapReduce
• Distributed user-defined functions
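The first two features above can be illustrated with a toy Python sketch. It rests on two stated assumptions:
• Each column is stored separately (vertical partitioning).
• Each column is cut into fixed-size chunks that carry min/max metadata (horizontal range partitioning), so a range predicate can skip whole chunks without reading them.

This is a generic illustration of the technique, not InfiniDB's actual storage format. The chunk size, names, and data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A horizontal slice of one column plus its min/max metadata."""
    values: list
    lo: int
    hi: int


def to_columns(rows, names):
    """Vertical partitioning: store each column as its own list."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def chunk_column(values, chunk_size):
    """Horizontal range partitioning: split a column into chunks and record min/max."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        part = values[start:start + chunk_size]
        chunks.append(Chunk(part, min(part), max(part)))
    return chunks


def count_between(chunks, lo, hi):
    """Count values in [lo, hi]; return (count, number of chunks actually scanned)."""
    total, scanned = 0, 0
    for chunk in chunks:
        if chunk.hi < lo or chunk.lo > hi:
            continue  # eliminated using metadata only; no values are read
        scanned += 1
        total += sum(1 for v in chunk.values if lo <= v <= hi)
    return total, scanned


# Hypothetical detail records: (timestamp, user, bytes), timestamps ascending.
rows = [(t, f"user{t % 7}", t * 10) for t in range(1000)]
columns = to_columns(rows, ("ts", "user", "bytes"))
ts_chunks = chunk_column(columns["ts"], chunk_size=100)

# Only the "ts" column is touched, and only 2 of its 10 chunks are scanned.
print(count_between(ts_chunks, 250, 349))  # (100, 2)
```

Elimination works best when the data arrive roughly sorted on the filtered column, as timestamps usually do. On randomly ordered values, most chunks' min/max ranges would overlap the predicate.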
Considerations for Deploying an Analytic Platform
• Scalability: the ability to handle large volumes of data and expand as data volumes grow
• Performance: high-performance processing to deliver rapid results
• Efficiency: in-database analytics approaches that take the query to the data
• Flexibility: no reliance on a restrictive schema to deliver the desired performance
• Variability: support for multiple query approaches and advanced functions to enable exploratory analysis
Calpont Corporation
• Software Company
• High-performance, high-availability (HA) analytic data platform
• Dallas HQ, Silicon Valley
• Partners in North America, Europe, Japan
• Online Media, Digital Networks, Telco
Calpont Mission: To provide a highly scalable data platform that enables analytic business decisions as timely as customers and markets dictate.
What is InfiniDB?
• Columnar Performance Efficiency
• Widely used MySQL Interface
• MPP, MapReduce-style Query Execution
Simple, Powerful Platform for Big Data Analytics
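MPP, MapReduce-style execution can be sketched as scatter-gather aggregation. Each worker aggregates the partition it holds, and only the small partial results travel to a coordinator to be merged. This is the "take the query to the data" idea. In this minimal Python sketch, threads stand in for cluster nodes and in-memory lists stand in for data partitions. It illustrates the general pattern, not InfiniDB's engine. The record layout and names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_aggregate(partition):
    """'Map' step: sum values per key on the node that holds the partition."""
    partial = Counter()
    for key, value in partition:
        partial[key] += value
    return partial


def merge_partials(partials):
    """'Reduce' step: combine the small per-node results."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


def distributed_sum_by_key(partitions):
    """Run local aggregation in parallel, then merge on the coordinator."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_aggregate, partitions))
    return merge_partials(partials)


# Hypothetical usage records (service, bytes) spread over three "nodes".
partitions = [
    [("voice", 3), ("sms", 1)],
    [("voice", 2), ("data", 5)],
    [("data", 1)],
]
print(dict(distributed_sum_by_key(partitions)))  # {'voice': 5, 'sms': 1, 'data': 6}
```

The coordinator receives one small `Counter` per node. Data movement therefore grows with the number of distinct keys, not with the number of raw rows.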
Benefits of InfiniDB
• Real-time, Consistent Query Performance
• Linear Scale for Massive Data
• Removes Limits to Dimensions and Granularity
• Easy to Deploy and Maintain
InfiniDB Analytic Platform – DW and Exploration
[Figure: column headings Big Data Sources, Data Integration, Analytic Platform, Analytic Needs; labels: Data Warehouse, Hadoop, Operational, Transactional, Legacy RDBMS, ETL, MDM, Direct Load, Model, Analytic Data Store, Dimensional Analytics, Data Discovery, Predictive Analytics]
InfiniDB - Telecommunications
Telecommunications Market Challenges
[Figure: Global Mobile Voice and Data Revenues/ARPU, 2007-2013; series: Voice Revenue, Data Revenue, Total ARPU; y-axis: US $ millions per year. Source: Informa Telecoms & Media]
Macro drivers:
• Subscriber growth declining
• ARPU declining
• Revenue growth vs. cost to carry
Do carriers:
• attempt to control costs via throttling, etc., or
• increase revenue through monetization strategies?
The Telco Gold Mine
• Quality: Meets CSP expectations? Meets subscriber expectations?
• Location: Where are they? Movement patterns, etc.
• Usage: What applications/services? How much, how long, etc.
• Data sources: element feeds, probe feeds, device agents, log files, care data
Telco data is rich – Can it be fully leveraged?
Multi-Dimensional Analysis: Challenge or Opportunity?
[Figure: dimensions to be linked for analysis (service, application, network, KPI, customer), with the question "Linkage?"]
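One reading of the "Linkage?" question: if non-aggregated detail records are kept, any combination of dimensions can be grouped at query time instead of being fixed in advance in a cube. The minimal Python sketch below shows this. Its field names, values, and the `average_kpi` helper are hypothetical and are not taken from any Calpont product.

```python
from collections import defaultdict

# Hypothetical non-aggregated detail records; field names are illustrative only.
RECORDS = [
    {"customer": "c1", "service": "video", "network": "cell-17", "latency_ms": 180},
    {"customer": "c1", "service": "web", "network": "cell-17", "latency_ms": 60},
    {"customer": "c2", "service": "video", "network": "cell-17", "latency_ms": 220},
    {"customer": "c3", "service": "video", "network": "cell-42", "latency_ms": 90},
]


def average_kpi(records, dims, kpi):
    """Group raw records by any tuple of dimensions and average one KPI."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        key = tuple(rec[d] for d in dims)
        sums[key] += rec[kpi]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# The same records answer differently shaped questions without a pre-built cube.
print(average_kpi(RECORDS, ("service",), "latency_ms"))
print(average_kpi(RECORDS, ("network", "service"), "latency_ms"))
```

A cube built only on (service) could not answer the second question, which needs the network dimension. The raw records can answer both.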
Telco Success
Representative data from Customer Experience (CEM) analytics:
Metric | Legacy | InfiniDB | Improvement
# of DRs | 15 billion | 15 billion | n/a
Database size | 4 TB | < 1 TB | (75%)
Load rates | 30K/sec | > 120K/sec | 400%
Typical analytics query | 300 sec. | 5 sec. | (98%)
Benefits:
• Game-changer for storage of and access to non-aggregated data
• Near-linear scale-out performance
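The Improvement column in the Telco Success table mixes two conventions. Parenthesized values read as reductions relative to the legacy system. The load-rate value reads as the new rate expressed as a percentage of the old one. A quick check of the table's own numbers, under that reading:

```latex
\begin{aligned}
\text{Database size:}\quad & 1 - \frac{1\ \text{TB}}{4\ \text{TB}} = 0.75 &&\Rightarrow\ \text{at least } 75\%\ \text{smaller, since InfiniDB is } {<}\,1\ \text{TB}\\
\text{Load rate:}\quad & \frac{120\text{K/sec}}{30\text{K/sec}} = 4 &&\Rightarrow\ 400\%\ \text{of the legacy rate, i.e. } 4\times\\
\text{Query time:}\quad & 1 - \frac{5\ \text{sec}}{300\ \text{sec}} \approx 0.983 &&\Rightarrow\ \approx 98\%\ \text{less time}
\end{aligned}
```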
InfiniDB - Online Advertising
Online Advertising – Market Challenges
• Advertising analytics (≠ web analytics)
  o Interactions and performance of ads on other sites
  o Attribution analysis: ad optimization, efficient targeting, and return on ad spend
• Challenges
  o Massive daily data consumption (“Billions Served”)
  o Ad targeting is not real-time with traditional data technology
  o Attribution analytics effectiveness
Wide dimensionality • Granularity
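Attribution analysis assigns credit for a conversion to the ad interactions that preceded it. The slide does not say which attribution model is used. As one simple, commonly used example, here is a last-touch model in Python, in which the most recent prior touch gets all the credit. The data and names are hypothetical. The nested scan is written for clarity rather than for scale.

```python
from collections import Counter


def last_touch_attribution(touches, conversions):
    """Credit each conversion to the user's most recent ad touch at or before it.

    touches:     list of (user, timestamp, campaign)
    conversions: list of (user, timestamp)
    """
    credit = Counter()
    for user, conv_ts in conversions:
        prior = [(ts, campaign) for u, ts, campaign in touches
                 if u == user and ts <= conv_ts]
        if prior:
            # Latest timestamp wins; exact ties fall back to campaign name order.
            _, campaign = max(prior)
            credit[campaign] += 1
    return credit


touches = [
    ("u1", 1, "search"), ("u1", 5, "display"),
    ("u2", 2, "display"), ("u2", 3, "social"),
    ("u3", 9, "search"),
]
conversions = [("u1", 6), ("u2", 4), ("u3", 1)]
print(dict(last_touch_attribution(touches, conversions)))  # {'display': 1, 'social': 1}
```

User u3 converts before any ad touch, so no campaign receives credit for that conversion. Multi-touch models would split credit across all of the prior touches instead.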
Mobile Advertising – Analytic Data Environment
[Figure: data flow from info sources (Location Ads, WiFi Captive Display, Free WiFi Ad Share, App Embedded Ads) through source data and ETL into the analytic platform and a BI/analytic front end]
Special needs:
• Latitudinal/longitudinal geospatial functions
• Military Grid Reference System (MGRS) functions
Non-Calpont product names are trademarks of their respective owners
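The latitudinal/longitudinal geospatial requirement is the kind of per-row logic an analytic platform might expose as a user-defined function. As a hedged illustration, here is a great-circle (haversine) distance in Python, plus a radius filter for location-targeted ads. Note the following limits:
• The function names and points are hypothetical.
• The spherical-Earth model is an approximation.
• MGRS conversion is not covered.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(points, center, radius_km):
    """Keep (id, lat, lon) points within radius_km of center=(lat, lon)."""
    lat0, lon0 = center
    return [p for p in points if haversine_km(lat0, lon0, p[1], p[2]) <= radius_km]


# Hypothetical ad impressions near the equator: 0.5° and 2° of longitude from the origin.
points = [("ad-1", 0.0, 0.5), ("ad-2", 0.0, 2.0)]
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))               # 111.2
print([p[0] for p in within_radius(points, (0.0, 0.0), 100.0)])  # ['ad-1']
```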
Online Advertising Success
Metric | Legacy | InfiniDB | Improvement
# of DRs | 300 million | 300 million | n/a
Database size | > 6 TB | 3 TB | (50%)
Load rates | 100K/sec | 1M+/sec | 1000%
Typical analytics query | 20-30 min (with cubes) | 15 sec. | (99.2%)
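A check of the Online Advertising Success figures follows, reading parenthesized values as reductions and the load-rate value as a percentage of the legacy rate. The 99.2% query-time figure matches the 30-minute end of the legacy range. Against 20 minutes, the reduction would be about 98.8%.

```latex
\begin{aligned}
\text{Database size:}\quad & 1 - \frac{3\ \text{TB}}{6\ \text{TB}} = 0.50 &&\Rightarrow\ \text{at least } 50\%\ \text{smaller, since legacy is } {>}\,6\ \text{TB}\\
\text{Load rate:}\quad & \frac{1\text{M/sec}}{100\text{K/sec}} = 10 &&\Rightarrow\ 1000\%\ \text{of the legacy rate, i.e. } 10\times\\
\text{Query time (30 min):}\quad & 1 - \frac{15\ \text{sec}}{1800\ \text{sec}} \approx 0.9917 &&\Rightarrow\ \approx 99.2\%\ \text{less time}\\
\text{Query time (20 min):}\quad & 1 - \frac{15\ \text{sec}}{1200\ \text{sec}} = 0.9875 &&\Rightarrow\ \approx 98.8\%\ \text{less time}
\end{aligned}
```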
Location-based Mobile Advertiser Funnels Big Data Insights
Benefits:
• Real-time analytics about niche segments
• Simple MySQL interface for easy use of Hadoop ETL extracts
• “Mobile Audience Insights” for segment affinity and engagement strategies
[Figure: Mobile Audience Insights Report]
Key Takeaways
• A spectrum of analytic platforms addresses structured and unstructured needs, complementing the traditional EDW.
• The choice of an analytic platform should depend on the rigidity and importance of schema, as well as on the skills and tools of its users.
• InfiniDB is a scalable MPP columnar platform supporting exploratory analytics for structured data.
• Calpont is helping partners create transformational solutions in Telco Customer Experience and Online Advertising.
More Info on 451 Research and Calpont
Matt Aslett, 451 Research: www.451research.com, @maslett, @451research
451 examines trends behind Big Data and the Total Data management approach
Bob Wilkinson, Calpont Corporation: www.calpont.com, @Calpont, @InfiniDB
Calpont discusses why Big Data in online marketing needs modern data technology