Modernising Your Analytics Insights - CDP · Advanced Analytics • Machine Learning • Predictive...
Transcript of Modernising Your Analytics Insights - CDP · Advanced Analytics • Machine Learning • Predictive...
Modernising Your Analytics Insights
Creating a culture of agile data innovation:• speeding up data ingestion, preparation & validation• driving self-service insights into all the metrics that matter• streaming & processing data in real-time as events occur• empowering next-generation predictive & machine learning models
Together, we’ll take a look at modern data engineering frameworks, and how our clients are effectively using the capabilities to transform and optimise their decision-making.
With Stream Processing & Learning Models
Modernising Your Analytic Insights
CHALLENGES FOR BUSINESSES
THE PROBLEM: • Struggle to utilise the value of data
• Business never stand still
• Limits to traditional Data Warehousing
THE OPPORTUNITY: • Use data more effectively
• Leverage data for maximum advantage
ARCHITECTURE LANDSCAPE
Why we should embrace the new architecture
TRADITIONAL vs NEW ARCHITECTURE
New Architecture
• Augmentation of existing environment
• Streamed data or Historical archiving
• Fast Capture of a variety information sources –
• Model data on use
• Supports a wide variety use cases not traditionally seen in DWH
• Driving the business forwards
Traditional
• Optimised for batch processed data processes
• Large development effort
• Model data on load
• Mostly structured data
• Monitoring and measuring the business
BIG DATA LANDSCAPE
ARCHITECTURE LANDSCAPE
Batch
Processing
Real-timeApplications
Data Visualization Data WranglingStatistical / Predictive
Batch Data Ingestion (ETL)
Da
taL
og
Da
taF
low
Cluster Management
Data Lake
Streaming
Operational Database
Security
MachineLearning
SQL Search
Da
ta G
ove
rna
nc
e
Foundation
• Data Lake
• Distributed processing
• Security
• Batch Ingestion
• SQL-like access Interface
Data Discovery
• Data wrangling
• Data visualisation
Specialised Components
• Real-time application
• Search capability
• Operational database
• Time Series databaseStreaming
• Real-time data ingestion
• Simple & Complex event process
Advanced Analytics
• Machine Learning
• Predictive
DELIVERY FOCUS
SESSION FOCUS
Batch
Processing
Real-timeApplications
Data Visualization Data WranglingData WranglingStatistical / Predictive
Batch Data Ingestion (ETL)
Da
taL
og
Da
taF
low
Da
taF
low
Cluster Management
Data LakeData Lake
Streaming
Operational
Database
Security
Machine Learning
SQL Search
Da
ta G
ove
rna
nc
e
1. Foundation
2. Data Discovery
3. Advanced Analytics
4. Streaming
5. Specialist Components
1. Foundation
FOUNDATION
Batch
Processing
Real-timeApplications
Data Visualization Data WranglingStatistical / Predictive
Batch Data Ingestion (ETL)
Da
taL
og
Da
taF
low
Cluster Management
Data (Lake) StorageData (Lake) Storage
Streaming
Operational
Database
Security
Machine Learning
SQL Search
Da
ta G
ove
rna
nc
e
Data Lake
https://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems
FOUNDATION
Many use cases amongst others
�An environment that ensures optimal use of structured and unstructured data to deliver business value
�Central repository of critical and sensitive data
�Data maintained over long duration
�Users can access and analyze data in new and different ways
�Conceptual not technical environment
CUSTOMER FOUNDATION USE CASES
� Analytical Data Lake
� Data ingestion
Layer
1. Foundation
2. Data Discovery
3. Advanced Analytics
4. Streaming
5. Specialist Components
2. Data Discovery
DATA DISCOVERY
Batch
Processing
Real-timeApplications
Data Visualization Data WranglingStatistical / Predictive
Batch Data Ingestion (ETL)
Da
taL
og
Da
taF
low
Cluster Management
Data Lake
Streaming
Operational
Database
Security
Machine Learning
SQL Search
Da
ta G
ove
rna
nc
e
DATA DISCOVERY
Data Wrangling/ Data Exploration/ Data Understanding
“Data wrangling is an iterative agile process for exploring, combining, cleaning and transforming
raw data into curated datasets for data science, data discovery, business intelligence and analytics.”
Data Exploration
DATA DISCOVERY USE CASES
� Data preparation and visualisation
� Data wrangling process, preparation for analytics
1. Foundation
2. Data Discovery
3. Advanced Analytics
4. Streaming
5. Specialist Components
3. Advanced Analytics
ADVANCED ANALYTICS
Batch
Processing
Batch
Processing
Real-timeApplications
Real-timeApplications
Data VisualizationData Visualization Data WranglingData WranglingStatistical / Predictive
Batch Data Ingestion (ETL)
Batch Data Ingestion (ETL)
Da
taL
og
Da
taL
og
Da
taF
low
Da
taF
low
Cluster ManagementCluster Management
Data LakeData Lake
StreamingStreaming
Operational
Database
Operational
Database
SecuritySecurity
Machine Learning
SQLSQL SearchSearch
Da
ta G
ove
rna
nc
eD
ata
Go
ve
rna
nc
e
ADVANCED ANALYTICS
Data in
Motion(Cloud)
Data in
Motion(on-
premises)
Data at
Rest(on-
premises)
Edge
Data
Data in
Motion
Edge Analytics
Data at
Rest(Cloud)
Edge
Data
Data at
Rest(on-
premises)
Closed Loop
Analytics
MachineLearning
Deep HistoricalAnalysis
Cloud
ADVANCED ANALYTICS USE CASES
� Work pattern time
series analysis
� Risk prediction
1. Foundation
2. Data Discovery
3. Advanced Analytics
4. Streaming
5. Specialist Components
4. Streaming
STREAMING
Batch
Processing
Batch
Processing
Real-timeApplications
Real-timeApplications
Data VisualizationData Visualization Data WranglingData WranglingStatistical / PredictiveStatistical / Predictive
Batch Data Ingestion (ETL)
Batch Data Ingestion (ETL)
Da
taL
og
Da
taF
low
Da
taF
low
Cluster ManagementCluster Management
Data LakeData Lake
Streaming
Operational
Database
Operational
Database
SecuritySecurity
Machine LearningMachine Learning
SQLSQL SearchSearch
Da
ta G
ove
rna
nc
eD
ata
Go
ve
rna
nc
e
STREAMING
STREAMING USE CASES
� Retail real time cash slip
processing to detect
anomalies and patterns
� IOT device data
collection or transport
patterns (late busses)
1. Foundation
2. Data Discovery
3. Advanced Analytics
4. Streaming
5. Specialist Components5. Specialist Components
Batch
Processing
Batch
Processing
Real-timeApplications
Data VisualizationData Visualization Data WranglingData WranglingStatistical / PredictiveStatistical / Predictive
Batch Data Ingestion (ETL)
Batch Data Ingestion (ETL)
Da
taL
og
Da
taL
og
Da
taF
low
Da
taF
low
Cluster ManagementCluster Management
Data LakeData Lake
StreamingStreaming
Operational
Database
SecuritySecurity
Machine LearningMachine Learning
SQLSQL Search
Da
ta G
ove
rna
nc
eD
ata
Go
ve
rna
nc
e
SPECIALIST COMPONENTS
SPECIALIST COMPONENTS
Batch
Processing
Real-time
Applications
Data Visualization Data Wrangling Statistical / Predictive
Batch Data Ingestion (ETL)
Da
taLo
g
Da
taFl
ow
Cluster Management
Data Lake
Streaming
Operational
Database
Security
Machine
Learning
SQL Search
Da
ta G
ove
rna
nce
HDFS YARN
OPEN TECHNOLOGY LANDSCAPE
Transition and Augmentation
HYBRID TECHNOLOGY LANDSCAPE
Batch
Processi
ng
Real-time
ApplicationsData Visualization Data Wrangling Statistical /
Predictive
Batch Data Ingestion (ETL)
Da
taLo
g
Da
taFl
ow
Cluster Management
Data Lake
Streami
ng
Operation
al
Database
Security
Machine
Learning
SQL Search
Da
ta G
ove
rna
nce
HDFS YARN
Existing
DWH
Existing
operational Existing
ETL
Coffee break
CDP GROUP – WHO ARE WE
GOAL: Driving insight into your business
OUR FOCUS: • Design & develop analytics, planning &
data engineering solutions
• Enhance your ability to make high-impact & factually informed decisions
• Reputation based on our deep level of expertise
IBM VALUE-ADD
� Augmented Analytics with built-in AI• Contextual visualisation recommendations• Automated blend & join data-preparation• Smart natural language
� Data Science Platform for collaboration• Incorporate all your R, Python, Spark & others• Model & library management• Version & deployment control
� SQL Workload Enhancement over HIVE• Rich & complex SQL with outstanding performance• Re-use existing DW queries against Hadoop• Federate DW in place augmented with Hadoop
HORTONWORKS OPEN-SOURCE
� Modern data architecture• Scalable, secure, hybrid data lakes• Enables deep / machine learning
� Real-time streaming platform• Data pipelines, ingestion & routing• High-volumes from sensors & devices
Hortonworks
HOW DO YOU BEGIN THE MODERN ANALYTICS
JOURNEY?
PERFORMANCE PLAN
Our Performance Plan approach:
• identifies core business information requirements
• measures against current capability
• creates a plan to deliver.
Information, once captured:
• presented back as a templated plan
• road map of prioritised quick-win deliveries.
CDP is offering a first steps Performance Plan engagement from your attendance today:
• 5 Days - $7.5k
Typical initial delivery
20 days or $30k
Thank you