David Freriks Principal Solution Architect · Big Data Ecosystem –Much More Than Just Hadoop Big...
Big Data & QlikView
Democratizing Big Data Analytics
David Freriks – Principal Solution Architect
TDWI – Vancouver Agenda
– What really is Big Data?
– How do we separate hype from reality?
– How does that relate to actually finding useful
business information?
– Why is Qlik unique in leading the industry in solving
Big Data solutions?
– Demo
– What really is Big Data?
• Most people think of Hadoop…
A Brief History of Hadoop
Timeline markers: 2005, 2008, 2011, 2013
• Google releases a paper on GFS; a distributed search platform called Nutch builds on it
• Cutting joins Yahoo, estimates a billion-page index will cost $500k and $30k/month to support
• Hadoop promoted to top-level Apache project, predictive search index creation time reduced from 12 days to 8 hours
• A 1,400-node Yahoo cluster sorts 500GB in 59s. Cloudera launches
• Yahoo spins remaining Hadoop folks out into Hortonworks
• 3rd Hadoop World conf attracts 2,300 developers, up from 275 in 2010
• Cloudera adds real-time search, based on Lucene, also created by Cutting
Example Apache Hadoop or Next-Gen Components
• HDFS: Hadoop Distributed File System
• MapReduce: Processing framework for writing scalable data applications
• Pig: Procedural language that abstracts lower-level MapReduce
• ZooKeeper: Highly reliable distributed coordination
• Hive: System for querying data on top of HDFS (SQL-like query)
• HBase: Database for random, real-time read/write access
• Mahout: Scalable machine learning libraries
• Spark: In-memory large-scale data processing, up to 100x faster than Hadoop MapReduce
• Shark: SQL engine on top of Spark
• Cassandra: Scalable multi-master database with no single points of failure
And on, and on…
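To make the MapReduce entry in the component list more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases for a word count. It is plain Python, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and a real cluster would distribute each phase across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on a small in-memory input."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "fast data is fast"]))
```

The point of the split is that `map_phase` and `reduce_phase` are independent per line and per key, which is what lets a framework run them in parallel.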
Big Data: Expanding on 3 fronts
• Data Velocity: Batch, Periodic, Near Real Time, Real Time
• Data Volume: MB, GB, TB, PB
• Data Variety: Table, Database, Web, XML, Audio, Social, Video
What is “Big Data”?
• Big Data is: Nebulous
• Big Data is: Really Big or Not
• Big Data is: Mostly Useless Noise
• Big Data is: Slow
• Big Data is: Difficult
Big Data Ecosystem – Much More Than Just Hadoop
• Open source distributed processing frameworks
• Big Data analytic appliances
• Massively parallel processing platforms
• Big Data integration
• Packaged MapReduce platforms
• Data visualization, statistical & in-memory analytics
Products shown: Big Insights & Streams, Big Data Appliance, HANA, Splunk
Some uses of Big Data today
• Who: Telecom
– What: Usage and Location Analysis, Call Detail Records (CDRs), Next Product to Buy (NPTB), Real-time Bandwidth Allocation
– Why: Operational Excellence, Customer Retention, Profitability
• Who: Financial Services
– What: New Account Risk Screens, Fraud Detection, Trading Risk, Real-Time P&L, Portfolio Analysis
– Why: Improve Profit, Minimize Risk
• Who: Utilities
– What: Smart Metering Analysis
– Why: Operational Excellence
• Who: Retail
– What: 360° Customer View, Brand Sentiment Analysis, Up Sell/Cross Sell, Clickstream Analysis
– Why: Increase Revenues, Customer Loyalty, Brand Awareness
• Who: Manufacturing
– What: Supply Chain & Logistics, Assembly Line QA, Proactive Maintenance
– Why: Operational Excellence, Profitability
Source: Gartner “50 Real World Examples of Big Data and Analytics”, 2013
Popular “Big Data” Myths
• You need to have Ga-zinga-bytes of data to deploy a Big Data solution
– A typical Cloudera cluster is 15-20 nodes, with < 10TB of data
– Hadoop storage is 3-4x cheaper than an EDW
• Hadoop is all you need
– Hadoop is an enabling technology that provides the foundation for Big Data solutions
– The focus today is on data management
• The RDBMS is dead
– The RDBMS is still critical, but not for high-volume, low-quality analytics
• QlikView can’t handle Big Data
– The reality is that a human can’t handle Big Data
– It’s all about the use case
Big Data vs. Fast Data vs. Right Data
• Big Data is rapidly shifting from how much data you can handle to how quickly you can deliver value
– Volume of data is just one factor, and a less and less critical one
– Context is key and difficult to pinpoint
• Big Data:
– Hadoop is designed to support petabytes and beyond
• Fast Data:
– Teradata, SAP HANA, Netezza, HBase, MongoDB, ParStream, etc.
• Big Data is slow & cheap, Fast Data is neither
• A Big Data solution requires components that address both
– Hadoop is the data system that combines the Fast and Big platforms
– QlikView is the platform that supports both scenarios simultaneously
Where Big Data fits today: The new BI architecture
• Data types: Unstructured/Semi-structured data, Structured data
• Data sources: Web data, Docs & text data, Audio/Video data, Machine data, Operational systems
• Big Data Repository
• Data Warehouse???
• Data Accelerator???
Big Data comes with big challenges
• “Many organizations lack the skills required to exploit big data”
• “Most of these skills are in short supply and rare in the market at large”
• “Data science encompasses hard skills”
Source: Gartner Big Data Hype Cycle Report 2013
The Big Data bottleneck (diagram labels): Big Data, Data Scientists, Reports, Business Users
Big Data comes with big challenges
Organizations have trouble finding qualified professionals to manage big data and providing training to those already on board
Obstacles to Big Data Analytics: organizations are challenged in staffing and training
• Staffing: 79%
• Training: 77%
• Real-Time: 67%
• License Cost: 64%
• Integration: 64%
Source: Ventana Research, The Challenge of Big Data Benchmark Research, November 2013
Insight Comes from Data, in Context
• Operational systems
• Machine data, web data, cloud data
• Hadoop cluster
• Data warehouse
• BigQuery
Big Data Business Needs
DATA: Clinical, Claims, Monitoring, others
• Descriptive Analytics: How are we doing?
– How many claims did we pay today?
• Predictive Analytics: What might happen in the future?
– Which of tomorrow’s claims might be requesting an Emergency Room (ER) admission?
• Prescriptive Analytics: Best course of action given objectives, requirements & constraints
– What would be effective steps to reduce probability of ER admission?
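The three kinds of analytics on this slide can be contrasted in a small sketch built on the claims example. Everything below is hypothetical: the `Claim` fields, the hand-written risk rule (a stand-in for a trained model), the 0.5 threshold, and the suggested action are assumptions chosen only to show how the three questions differ.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    paid_today: bool
    patient_age: int
    chronic_conditions: int


def descriptive(claims):
    """How are we doing? Count the claims paid today."""
    return sum(1 for c in claims if c.paid_today)


def predictive(claim):
    """What might happen? Toy ER-admission risk score in [0, 1].

    A hand-written rule standing in for a trained model (assumption).
    """
    score = 0.1 + 0.2 * claim.chronic_conditions
    if claim.patient_age >= 65:
        score += 0.3
    return min(score, 1.0)


def prescriptive(claim, threshold=0.5):
    """Best course of action: pick a hypothetical step based on the risk."""
    if predictive(claim) >= threshold:
        return "schedule preventive outreach"
    return "no action"


if __name__ == "__main__":
    claims = [Claim("A", True, 70, 2), Claim("B", False, 30, 0)]
    print(descriptive(claims))
    for c in claims:
        print(c.claim_id, round(predictive(c), 2), prescriptive(c))
```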
Who are we - QlikView
• What Is QlikView?
– QlikView is a Business Discovery platform – User-driven
BI supporting the creation and consumption of dynamic
apps for analyzing information
– QlikView apps allow non-technical users to explore visual
views of information and ask streams of questions,
through simple interactions such as clicks and taps
– QlikView’s patented software engine dynamically
calculates new views of information, instantly, based on
user selections
QlikView - A New Kind of Software Company
• Leader in Business
Discovery – user-driven BI
• 28,000+ customers in
100 countries
• 1,500 global partners
• 1,500 employees across
28 offices in 23 countries
• No. 1 fastest-growing
enterprise technology
company (ZDNet)
• Gartner Magic Quadrant
Leader for 3 consecutive
years
Broad Base of 28,000 Customers
These are Tools… And this is How BI has been done…
This is a Platform
The Evolution of Business Intelligence
Chart axes: Analytical Quotient, Usefulness
Stages: Managed Reporting, Ad-Hoc Reporting, Dashboards / Visualization, OLAP / Analysis, Exploration, Associative / Statistical, Predictive
QlikView’s Sweet Spot
What Makes QlikView Unique?
1) Associative Query Language + Full Search (*not another query tool)
2) Core Technology: True in-memory, columnar database with built-in visualization, analytics, and ELT in a single product
3) Designed for Heterogeneous & Complex Data (*again, not just another query tool)
4) Application / Mobile Design First (Mobile, Desktop, Tablet: design once, consume anywhere)
How traditional BI and visualization tools work
• Limited view and access to data
• Forced down linear drill paths
• Need to involve IT to modify
• What-if and on-the-fly analysis is limited
QlikView Natural Analytics™
• Freedom to explore data from any point in analysis in a dynamic, interactive interface
• Answer any question on the fly, in real time
• Easily see connections and disconnects in data
QlikView’s Natural Analytics™ makes data analysis a natural part of every business process – for everyone
The Green, The White and The Gray
The Visualization Bottleneck
Chart axes: Response Time, Query Size
Chart labels: Big Data, Tableau, Spotfire, MSTR, Analytics Desktop, Datameer
Connectivity to every Big Data Source
• NoSQL Databases
• Real-time
• Batch
• Hadoop
• MPP Warehouse
• SAP HANA
• BigQuery
• Advanced Analytics
The Big Data Value Chain
Storage medium               Speed (t/TB)   Price ($/TB)
Hard Disk Drives (HDD)       3300s          $50
Solid State Storage (SSD)    1000-300s      $500
Random Access Memory (RAM)   1s             $4500
• Keep data in memory when the value obtained from processing it is high
• Leave data on disk when it is inactive or the value from processing it is low
Chart axes: Value, Size
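The speed and price figures on this slide translate into a simple back-of-the-envelope estimate. The sketch below uses only the slide's numbers; the `TIERS` table layout, the `estimate` helper, and the assumption that time and cost scale linearly with dataset size are illustrative simplifications, not measurements.

```python
# Figures copied from the slide: seconds to process 1 TB and price per TB.
# The SSD entry keeps the slide's 300-1000 s range as (best, worst).
TIERS = {
    "HDD": {"seconds_per_tb": (3300, 3300), "usd_per_tb": 50},
    "SSD": {"seconds_per_tb": (300, 1000), "usd_per_tb": 500},
    "RAM": {"seconds_per_tb": (1, 1), "usd_per_tb": 4500},
}


def estimate(tier, size_tb):
    """Return (best_seconds, worst_seconds, cost_usd) for a dataset.

    Assumes time and cost scale linearly with size (a simplification).
    """
    best, worst = TIERS[tier]["seconds_per_tb"]
    cost = TIERS[tier]["usd_per_tb"] * size_tb
    return best * size_tb, worst * size_tb, cost


if __name__ == "__main__":
    for tier in TIERS:
        best, worst, cost = estimate(tier, size_tb=10)
        print(f"{tier}: {best}-{worst} s per full pass, ${cost}")
```

For a 10 TB dataset this makes the trade-off explicit: RAM is thousands of times faster per pass but costs 90 times as much as disk, which is why the slide ties the choice to the value of processing the data.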
Flexible Big Data deployment models
• Direct Discovery: billions of rows via Direct Discovery, hundreds of millions of rows in memory (aggregates / detail)
• Combine Big Data and traditional data sources: combine data sources using pure in-memory (aggregates / detail, EDW data, data warehouse)
Today’s challenge:
What to do with Big Data? Who should do it?
IT
What to do with this?
Business
How to define requirements?
QlikView as a catalyst for implementing Big Data
QlikView gives business users, not just data scientists, the ability to discover with Big Data
More Access > More Questions > More Use > Higher ROI of Big Data
IT & Business
QlikView In-Memory approach
• Loads compressed data into memory
• Enables associative search and analysis
• Supports hundreds of millions to billions of rows of data
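The idea of associative search over an in-memory dataset can be illustrated with a toy example: once a value is selected in one field, every other field's values split into those connected to the selection and those that are not. This is not QlikView's engine or API; the `associate` function and the sample rows are hypothetical and only show the concept of seeing connections and disconnects in data.

```python
def associate(rows, field, value):
    """For a selection field == value, split every other field's values into
    associated (co-occurring with the selection) and excluded (not)."""
    selected = [r for r in rows if r[field] == value]
    result = {}
    for other in rows[0].keys() - {field}:
        all_values = {r[other] for r in rows}
        associated = {r[other] for r in selected}
        result[other] = {
            "associated": associated,
            "excluded": all_values - associated,
        }
    return result


if __name__ == "__main__":
    rows = [
        {"region": "West", "product": "A"},
        {"region": "West", "product": "B"},
        {"region": "East", "product": "C"},
    ]
    print(associate(rows, "region", "West"))
```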
QlikView Direct Discovery Approach
• Combines the associative capabilities of the QlikView in-memory dataset with a query model where:
– The aggregated query result is passed back to a QlikView object without being loaded into the QlikView data model
– The result set is still part of the associative experience
– Capability to drill to detail records
Diagram labels: QlikView Application, QlikView In-Memory Data Model, Direct Discovery, Batch Load
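The query model described here can be sketched in general terms: small dimension values are batch-loaded into memory, while aggregates and detail rows are computed by the source on demand for the current selection. The example below is an analogy, not QlikView code. It uses Python's standard `sqlite3` module as a stand-in for the external source, and the `sales` schema and function names are assumptions.

```python
import sqlite3


def build_source():
    """Stand-in for a large external fact table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, order_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("West", 1, 100.0), ("West", 2, 50.0), ("East", 3, 70.0)],
    )
    return conn


def load_dimensions(conn):
    """Batch load: only the small set of dimension values goes into memory."""
    return {row[0] for row in conn.execute("SELECT DISTINCT region FROM sales")}


def aggregate_for_selection(conn, region):
    """Direct query: the source computes the aggregate for the selection."""
    row = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
    ).fetchone()
    return row[0]


def drill_to_detail(conn, region):
    """Fetch record-level detail on demand for the current selection."""
    return conn.execute(
        "SELECT order_id, amount FROM sales WHERE region = ? ORDER BY order_id",
        (region,),
    ).fetchall()


if __name__ == "__main__":
    source = build_source()
    print(load_dimensions(source))
    print(aggregate_for_selection(source, "West"))
    print(drill_to_detail(source, "East"))
```

Only the aggregate and the requested detail rows cross the wire; the fact table itself never has to fit in the client's memory.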
A Hybrid Approach for Tackling Big Data
100% in-memory when:
• All the necessary (i.e. relevant and contextual) data can fit in memory
• Users require only aggregated or summary data (e.g. hourly or daily averages), or record-level detail over a limited time period
• Query performance of the external source is not satisfactory
Direct Discovery when:
• Data cannot fit in memory and document chaining is not sufficient
• Users require access to record-level detail stored in a large fact table that will not fit in memory
• Network bandwidth limits the ability to copy data to the QlikView server
The design of Direct Discovery lets you alternate between these approaches with absolutely no change to the application itself
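The selection criteria for the hybrid approach can be written down as a small decision helper. This is only a sketch of the slide's rules of thumb; the `choose_approach` function, its parameters, and the returned labels are hypothetical and not part of any QlikView tooling.

```python
def choose_approach(relevant_data_gb, memory_gb, network_limited=False):
    """Return (approach, reason) following the slide's rules of thumb.

    relevant_data_gb: size of the data users actually need (assumed known).
    memory_gb: memory available for the in-memory data model.
    network_limited: True if copying the data to the server is impractical.
    """
    if relevant_data_gb > memory_gb:
        return "direct-discovery", "data cannot fit in memory"
    if network_limited:
        return "direct-discovery", "network bandwidth limits copying data"
    return "in-memory", "relevant data fits in memory"


if __name__ == "__main__":
    print(choose_approach(relevant_data_gb=40, memory_gb=256))
    print(choose_approach(relevant_data_gb=5000, memory_gb=256))
```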
DEMO