Large Scale Data Analytics
Shankar Radhakrishnan [email protected] linkedin.com/in/connect2shankar
Scenario
• Insurer uses meteorological data for its pricing model
• At present, data from 2,000 weather stations is collected for analysis
• Plan is to use data from 10,000 (or more) weather stations
• Stochastic simulation needs to run to identify patterns in weather data, to determine pricing
• Volume: petabytes of information (for 1 region)
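To make the stochastic simulation step concrete, here is a minimal Monte Carlo sketch. It is not the insurer's model: the function name, the gamma-distributed rainfall, the 120 mm threshold, and the payout per station are all assumptions chosen only for illustration.

```python
import random
import statistics

def simulate_expected_loss(n_stations, n_trials, threshold_mm=120.0,
                           payout_per_station=1000.0, seed=42):
    """Estimate mean and spread of the simulated payout per trial."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    losses = []
    for _ in range(n_trials):
        loss = 0.0
        for _ in range(n_stations):
            # Assumed rainfall model: gamma-distributed monthly rainfall in mm
            # (shape 2.0, scale 30.0); a real model would be fitted to station data.
            rainfall = rng.gammavariate(2.0, 30.0)
            if rainfall > threshold_mm:
                loss += payout_per_station
        losses.append(loss)
    return statistics.mean(losses), statistics.pstdev(losses)

mean_loss, spread = simulate_expected_loss(n_stations=200, n_trials=500)
print(f"expected loss per trial: {mean_loss:.0f} (std dev {spread:.0f})")
```

Each trial is independent of the others, which is why this kind of workload spreads naturally across a distributed platform as the number of stations grows.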
2
Trends
3
Data Analytics Is Mostly About $$, Customers, Markets
4
How Widespread Is Data Analytics?
5
Expectations On Payback Period ( Aggressive )
6
Large Scale Data Analytics
7
“Involves using different algorithms, distributed platforms, tools and techniques to analyze big data and provide actionable insights”
Big Data
“Data sets that are very large in volume and complex”
8
New platforms, tools and techniques have emerged to manage Big Data
We broke away from traditional ways to process and analyze them
Data Structures
Vector, Matrix, or Complex Structure
Free Text
Image or Binary Data
Data “bags”
Iterative Logic or Complex Branching
Advanced Analytic Routines
Rapidly Repeated Measurements
Extreme Low Latency
Access to All Data Required
Search Ranking X X X X X X
Ad Tracking X X X X X X X X
Location or Proximity Tracking X X X X X
Social CRM X X X X X X X
Document Similarity Testing X X X X X X X X
Genomic Analysis X X X X X
Customer Cohort groups X X X X X X
Fraud Detection X X X X X X X X X
Smart Utility Metering X X X X X X
Churn Analysis X X X X X X X
Satellite Image Analysis X X X X
Game Gesture Analysis X X X X X X X X
Data Bag Exploration X X X X X X
9
Business Interests : Well Informed Customer Executive
10
Speech to Text Conversion
Voice Data
Unstructured data Analytical System
Customer Persona
• Customer Persona - Demographics, Top Interactions, Channel Preferences, Dissatisfiers
• Customer Lifetime Value
• Recent Contact History
• Customer Sentiment & Trend during the call
Customer’s state of mind
Sentiment Analysis
Social media
Depositions
Complaints
Other Channel information (ATM, Branch)
Big Data Warehouse
Traditional Warehouse
Decision Engine
• Customer Executive Dashboard presents all intelligence required to make a decision
• The decision engine also presents important decisions to be taken for the particular customer issue
Well Informed Customer Executive…
Customer calls Banking Call Center
Executive understands the customer problem
Executive authenticates customer and pulls up Customer Persona
Executive reviews risk of attrition against Customer Lifetime Value
Executive reviews last 5 call center and banking transactions
Executive views customer’s state of mind (risk of attrition) through a barometer chart
Analytical Solution - Converts speech to text
Analytical engine listens to customer voice
Suggested top 5 actions required
Decision Engine
Executive performs the actions below based on his analysis and recommendations from the Decision Engine:
1. Reversal of overdraft fee
2. One-time fee waiver on cheque book (predicting customer need based on historic usage cycles)
3. Cash back reward card for a minimum spend of $X through debit card
4. Offer interest revision for investment products or mortgage
5. Promote new mutual funds or credit cards based on customer willingness
Analytical engine monitors sentiment
Executive analyzes Customer Persona (demographics / preferences / satisfiers / dissatisfiers etc.)
11
Business Interests : Fraud Prevention
12
Envisaged Benefits
▪ New fraud patterns can be identified by building ‘analytical models’ to run against historical data
▪ ‘Web crawling’, ‘contextual text analysis’ and ‘Natural Language Processing’ allow fraud behavior to be identified from social media, which may increase the fraud detection success rate
▪ ‘Real time’ models capture behavioral patterns and run pattern analysis against history data to evaluate fraud case validity. The model is self-learning and updates the ‘fraud pattern master sets’.
▪ Brings ‘artificially intelligent’ fraud pattern detection and analysis
▪ ‘Real time’ alerts (refresh rate on the order of 0.5-1 minute) to fraud analysts about ‘self-learned’ fraud patterns based on new customer behavior patterns
Big Data Usage
▪ Formation of key value groups on the order of XcY (where X is the no. of attributes that are relevant to fraud and Y is the no. of attributes that should be combined to identify patterns)
▪ High speed history data loading from source systems
▪ Efficient real time fraud detection by identifying patterns through customer behavioral events and processing them over X yrs. of history data – e.g. using HBase
Scenario
Formation of fraud pattern reference tables using
▪ Real time data coming from different departments like IVR, Web, Customer profile, Transactions etc.
▪ Real time mining and analysis of history data to form prior patterns (a number of years of data, in the range of 50-100 TB)
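The XcY key-group count can be illustrated with a short sketch. It assumes "XcY" means the binomial coefficient C(X, Y), i.e. choosing Y of the X fraud-relevant attributes; the attribute names, the sample event, and the `pattern_key` helper are hypothetical and not taken from the slides.

```python
from itertools import combinations
from math import comb  # requires Python 3.8+

# Hypothetical fraud-relevant attributes (X = 5 here).
attributes = ["channel", "merchant_category", "country", "device_id", "hour_of_day"]
group_size = 3  # Y: number of attributes combined into one pattern key

# Every way of choosing Y attributes out of X is one key-value group.
key_groups = list(combinations(attributes, group_size))
assert len(key_groups) == comb(len(attributes), group_size)

def pattern_key(event, group):
    """Build a composite lookup key from the chosen attributes of one event."""
    return "|".join(f"{name}={event[name]}" for name in group)

event = {"channel": "web", "merchant_category": "electronics",
         "country": "US", "device_id": "d-17", "hour_of_day": 3}
print(len(key_groups))                    # 10
print(pattern_key(event, key_groups[0]))  # channel=web|merchant_category=electronics|country=US
```

The count grows quickly: 50 attributes combined 5 at a time already gives 2,118,760 groups, which is one reason a distributed key-value store is attractive for this workload.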
Fraud Pattern Detection…
13
Legacy Fraud Data
Customer Profile Data
IVR Audio Data
Web / Online
Card Transaction Data
Fraud Pattern Master Table
Fraud Analyst
History Data Processing to determine Fraud Patterns over X years
Real-time Customer Behavior Analysis for Fraud Detection
Customer Behavior Change Events
Real time analysis of behavior patterns over historical data
Real time update to Master Table on new Fraud Patterns
Real time alert to Fraud Analyst
RDBMS, RDBMS (JSON Files), RDBMS
Fraud Prevention…
14
Benefits
15
Industry | Benefits
Financial services
▪ Customer Insights – Integrating transactional data (CRM/Payments) and unstructured social feeds
▪ Regulatory Compliance – Risk exposures across asset classes, LOBs and firms
▪ Fraud Detection in Credit Cards & Financial Crimes (AML) in Banks
Travel, Hospitality & Retail
▪ Customer centricity – Customer behavior analysis from omni-channel retailing & social feeds
▪ Markdown Optimization – Improve markdown based on actual customer buying patterns
▪ Market basket analysis – Narrow down market basket analysis by demographics
Life Science
▪ Improve targeting & predictions – Automatic Detection of Adverse Drug Effects (ADEs)
▪ Patient data analysis – Longitudinal Patient Data (LPD) analysis
▪ Predictive Sciences – Analyze Preclinical Side Effect Profiles of Marketed Drugs
Healthcare (Payers & Providers)
▪ Cost of Care – Drug effectiveness & Cost of Care Analysis based on electronic Health Records (EMR)
▪ Self Service Healthcare – Increase in mHealth & eHealth to allow consumer access to health information
▪ Claims Analytics – Analyze insurance claims data for fraud detection & preferred treatment plans
Communication, Media & Entertainment
▪ Discover churn patterns based on Call Data Records (CDRs) and activity in subscribers’ networks
▪ Digital Asset Management (DAM) – Analyze & capitalize on digital data assets
Manufacturing
▪ Proactive Maintenance & Recommendation – Sensor monitoring for automobiles, buildings & machinery
▪ Energy Efficiency – Leveraging smart meters for utility energy consumption
▪ Location or Proximity Tracking – Location based analytics using GPS data
Hi-Tech
▪ Extend and complement conventional information supply chain with big data path
▪ Predictive analysis and real time decision support
Hadoop
16
Hadoop - HDFS
17
Hadoop - MapReduce
18
Hadoop - MapReduce
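As a rough illustration of the MapReduce model, here is a single-process word-count sketch of the map, shuffle and reduce phases. It does not use any Hadoop API; the function names and sample input are illustrative assumptions, and a real job would distribute each phase across the cluster.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for each word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single count."""
    return key, sum(values)

lines = ["big data analytics", "big data platforms"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'analytics': 1, 'platforms': 1}
```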
19
Apache Spark
20
Spark
Iterative Processing
Batch Processing
Machine Learning
SQL
Stream Processing
Graph Processing
Hadoop
21
NoSQL Databases
22
NoSQL Databases
23
Modern Data Architecture
24
Lambda Architecture
25
Lambda Architecture
26
Data Analytics Lifecycle
27
Analytics - Trends
• Big Data Analytics In The Cloud
• AWS, AWS-Redshift
• Hadoop
• Enterprise Data Operating System
• Data Analytics Platform
• SQL on Hadoop
• NoSQL
• IoT (Internet of Things)
28
• Multi-polar Analytics
• Predictive Analytics (Spark)
• In-memory Analytics
• Data Lake
• Deep Learning
• Machine Learning
• Neural Networks
• Data Monetization
Q & A
Thank You !
“Any Sufficiently Advanced Technology Is Indistinguishable From Magic”
- Arthur C. Clarke