Large Scale Data Analytics

Post on 18-Jul-2015


Large Scale Data Analytics

Shankar Radhakrishnan shankar.r3@cognizant.com linkedin.com/in/connect2shankar

Scenario

• Insurer uses meteorological data for its pricing model

• At present, data from 2,000 weather stations is collected for analysis

• Plan is to use data from 10,000 weather stations (or more)

• Stochastic simulation needs to run to identify patterns in weather data, to determine pricing

• Volume: petabytes of information (for 1 region)
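
As a rough illustration of what "stochastic simulation to determine pricing" can mean, the sketch below runs a small Monte Carlo simulation of yearly weather losses and derives a premium from the average simulated loss. It is not the insurer's model: the daily event probability, the exponential loss distribution, the loading factor and all function names are assumptions chosen only to keep the example self-contained.

```python
import random
import statistics


def simulate_annual_loss(rng, event_probability, mean_loss, days=365):
    """Simulate one year: each day a damaging weather event may occur."""
    total = 0.0
    for _ in range(days):
        if rng.random() < event_probability:
            # Loss size drawn from an exponential distribution (an assumption).
            total += rng.expovariate(1.0 / mean_loss)
    return total


def estimate_premium(n_trials=2000, event_probability=0.01,
                     mean_loss=50_000.0, loading=0.2, seed=42):
    """Average many simulated years and add a loading (hypothetical margin)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    losses = [simulate_annual_loss(rng, event_probability, mean_loss)
              for _ in range(n_trials)]
    expected_loss = statistics.fmean(losses)
    return expected_loss * (1.0 + loading), expected_loss


premium, expected_loss = estimate_premium()
print(f"expected annual loss: {expected_loss:,.0f}  premium: {premium:,.0f}")
```

With 10,000 stations and petabytes of observations, the same loop would have to run per station and per scenario, which is why the work is typically distributed across a cluster rather than run on one machine.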

2

Trends

3

Data Analytics Is Mostly About $$, Customers, Markets

4

How Widespread Is Data Analytics?

5

Expectations On Payback Period ( Aggressive )

6

Large Scale Data Analytics

7

“Involves using different algorithms, distributed platforms, tools and techniques to analyze big data and provide actionable insights”

Big Data

“Data sets that are very large in volume and complex”

8

New platforms, tools and techniques have emerged to manage Big Data

We broke away from traditional ways to process and analyze them

Data Structures

Use case | Vector, Matrix, or Complex Structure | Free Text | Image or Binary Data | Data “bags” | Iterative Logic or Complex Branching | Advanced Analytic Routines | Rapidly Repeated Measurements | Extreme Low Latency | Access to All Data Required

Search Ranking X X X X X X

Ad Tracking X X X X X X X X  

Location or Proximity Tracking X   X X     X X  

Social CRM X X X X X X      X

Document Similarity Testing X X X X X X   X X

Genomic Analysis X X X X X

Customer Cohort groups X X   X X X     X

Fraud Detection X X X X X X X X X

Smart Utility Metering X X X X X X

Churn Analysis X X X X X X   X  

Satellite Image Analysis X X X X

Game Gesture Analysis X X X X X X X X

Data Bag Exploration X X X X X X

9

Business Interests : Well Informed Customer Executive

10

Speech to Text Conversion

Voice Data

Unstructured data Analytical System

Customer Persona

• Customer Persona - Demographics, Top Interactions, Channel Preferences, Dissatisfiers

• Customer Lifetime Value

• Recent Contact History

• Customer Sentiment & Trend during the call

Customer’s state of mind

Sentiment Analysis

Social media

Depositions

Complaints

Other Channel information (ATM, Branch)

Big Data Warehouse

Traditional Warehouse

Decision Engine

• Customer Executive Dashboard presents all intelligence required to make a decision

• The Decision Engine also presents important decisions to be taken for the particular customer issue

Well Informed Customer Executive…

Customer calls Banking Call Center

Executive understands the customer problem

Executive authenticates customer and pulls up Customer Persona

Executive reviews risk of attrition against Customer Lifetime Value

Executive reviews last 5 call center and banking transactions

Executive views customer’s state of mind (risk of attrition) through a barometer chart

Analytical Solution - Converts Speech to Text

Analytical engine listens to customer voice

Suggested top 5 Actions required

Decision Engine

Executive performs the actions below based on his analysis and recommendations from the Decision Engine:

1. Reversal of overdraft fee

2. One-time fee waiver on cheque book (predicting customer need based on historic usage cycles)

3. Cash-back reward card for a minimum spend of $X through debit card

4. Offer interest revision for investment products or mortgage

5. Promote new mutual funds or credit cards based on customer willingness

Analytical engine monitors sentiment

Executive analyzes Customer Persona (demographics / preferences / satisfiers / dissatisfiers etc.)

11

Business Interests : Fraud Prevention

12

Envisaged Benefits

▪ New fraud patterns can be identified by building ‘analytical models’ to run against historical data

▪ ‘Web crawling’, ‘contextual text analysis’ and ‘Natural Language Processing’ allow fraud behavior to be identified from social media, which may increase the fraud detection success rate

▪ ‘Real time’ models capture behavioral patterns and analyze them against history data to evaluate fraud case validity. The model is self-learning and updates the ‘Fraud pattern master sets’

▪ Brings ‘artificially intelligent’ fraud pattern detection and analysis

▪ ‘Real time’ alerts (refresh rate in the order of 0.5-1 minute) to fraud analysts about ‘self learned’ fraud patterns based on new customer behavior patterns

Big Data Usage

▪ Formation of key value groups to the order of XcY (where X is the number of attributes that are relevant to fraud and Y is the number of attributes that should be combined to identify patterns)

▪ High speed loading of history data from source systems

▪ Efficient real-time fraud detection by identifying patterns in customer behavioral events and processing them against X years of history data, e.g. using HBase
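
To give a feel for how fast the number of key value groups grows, the sketch below reads "XcY" as the binomial coefficient "X choose Y", which is an interpretation and not something the slide states explicitly. The attribute names and the `group_key` helper are hypothetical, used only for illustration.

```python
from itertools import combinations
from math import comb

# Hypothetical fraud-relevant attributes (X = 5).
attributes = ["ip_address", "device_id", "merchant", "card_country", "hour_of_day"]
group_size = 2  # Y: how many attributes are combined into one key

# Number of key groups: "X choose Y" = 5! / (2! * 3!) = 10
n_groups = comb(len(attributes), group_size)

# The groups themselves, e.g. ("ip_address", "device_id"), ...
key_groups = list(combinations(attributes, group_size))


def group_key(event, group):
    """Build the lookup key for one event under one attribute group."""
    return tuple((name, event[name]) for name in group)


event = {"ip_address": "10.0.0.1", "device_id": "d-17", "merchant": "m-3",
         "card_country": "XX", "hour_of_day": 23}
print(n_groups, group_key(event, key_groups[0]))
```

With 50 attributes combined 3 at a time the count is already `comb(50, 3) == 19600` groups per event, which is one reason a wide key-value store is suggested for this workload.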

Scenario

Formation of fraud pattern reference tables using:

▪ Real-time data coming from different departments such as IVR, Web, Customer Profile, Transactions, etc.

▪ Real-time mining and analysis of history data to form prior patterns (several years of history, in the range of 50-100 TB)

Fraud Pattern Detection…

13

Legacy Fraud Data

Customer Profile Data

IVR Audio Data

Web / Online

Card Transaction Data

Fraud Pattern Master Table

Fraud Analyst

History Data Processing to determine Fraud Patterns over X years

Real-time Customer Behavior Analysis for Fraud Detection

Customer Behavior Change Events

Real-time analysis of behavior patterns over historical data

Real-time update to Master Table on new Fraud Patterns

Real-time alert to Fraud Analyst

RDBMS, RDBMS (JSON Files), RDBMS
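
The flow in this diagram (history processing builds a master table of fraud patterns, incoming behavior events are checked against it, matches raise an alert) can be sketched in a few lines. This is a toy in-memory version under stated assumptions: the attribute names in `KEYS`, the `is_fraud` flag, the `min_count` threshold and all function names are hypothetical, and a real deployment would use the data stores named in the slides rather than Python sets.

```python
from collections import Counter

# Hypothetical attributes that together form one "pattern".
KEYS = ("channel", "merchant_category", "country")


def pattern_of(record, keys=KEYS):
    """Project a record onto the pattern attributes."""
    return tuple(record[k] for k in keys)


def mine_patterns(history, min_count=2):
    """History processing: keep patterns seen in at least min_count fraud cases."""
    counts = Counter(pattern_of(r) for r in history if r["is_fraud"])
    return {p for p, n in counts.items() if n >= min_count}


def process_event(event, master_table, alerts):
    """Real-time check: alert the analyst when an event matches a known pattern."""
    pattern = pattern_of(event)
    if pattern in master_table:
        alerts.append((event["customer_id"], pattern))
        return True
    return False


history = [
    {"channel": "WEB", "merchant_category": "electronics", "country": "XX", "is_fraud": True},
    {"channel": "WEB", "merchant_category": "electronics", "country": "XX", "is_fraud": True},
    {"channel": "IVR", "merchant_category": "grocery", "country": "YY", "is_fraud": False},
]

master_table = set()
master_table |= mine_patterns(history)  # update master table with mined patterns

alerts = []
event = {"customer_id": "C-1", "channel": "WEB",
         "merchant_category": "electronics", "country": "XX"}
process_event(event, master_table, alerts)
print(alerts)
```

Re-running `mine_patterns` on fresh data and merging the result into `master_table` stands in for the "real-time update to Master Table" step.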

Fraud Prevention…

14

Benefits

15

Benefits by Industry

Financial Services

▪ Customer Insights – Integrating transactional data (CRM/Payments) and unstructured social feeds

▪ Regulatory Compliance – Risk exposures across asset classes, LOBs and firms

▪ Fraud Detection in Credit Cards & Financial Crimes (AML) in Banks

Travel, Hospitality & Retail

▪ Customer centricity – Customer behavior analysis from omni-channel retailing & social feeds

▪ Markdown Optimization – Improve markdown based on actual customer buying patterns

▪ Market basket analysis – Narrow down market basket analysis by demographics

Life Science

▪ Improve targeting & predictions – Automatic detection of Adverse Drug Effects (ADEs)

▪ Patient data analysis – Longitudinal Patient Data (LPD) analysis

▪ Predictive Sciences – Analyze preclinical side effect profiles of marketed drugs

Healthcare (Payers & Providers)

▪ Cost of Care – Drug effectiveness & cost of care analysis based on Electronic Medical Records (EMR)

▪ Self Service Healthcare – Increase in mHealth & eHealth to allow consumer access to health information

▪ Claims Analytics – Analyze insurance claims data for fraud detection & preferred treatment plans

Communication, Media & Entertainment

▪ Discover churn patterns based on Call Data Records (CDRs) and activity in subscribers’ networks

▪ Digital Asset Management (DAM) – Analyze & capitalize on digital data assets

Manufacturing

▪ Proactive Maintenance & Recommendation – Sensor monitoring for automobiles, buildings & machinery

▪ Energy Efficiency – Leveraging smart meters for utility energy consumption

▪ Location or Proximity Tracking – Location-based analytics using GPS data

Hi-Tech

▪ Extend and complement the conventional information supply chain with a big data path

▪ Predictive analysis and real-time decision support

Hadoop

16

Hadoop - HDFS

17

Hadoop - MapReduce

18

Hadoop - MapReduce

19

Apache Spark

20

Spark

Iterative Processing

Batch Processing

Machine Learning

SQL

Stream Processing

Graph Processing

Hadoop

21

NoSQL Databases

22

NoSQL Databases

23

Modern Data Architecture

24

Lambda Architecture

25

Lambda Architecture

26

Data Analytics Lifecycle

27

Analytics - Trends

• Big Data Analytics In The Cloud

• AWS, AWS-Redshift

• Hadoop

• Enterprise Data Operating System

• Data Analytics Platform

• SQL on Hadoop

• NoSQL

• IoT (Internet of Things)

• Multi-polar Analytics

• Predictive Analytics (Spark)

• In-memory Analytics

• Data Lake

• Deep Learning

• Machine Learning

• Neural Networks

• Data Monetization

28

Q & A

Thank You !

“Any Sufficiently Advanced Technology Is Indistinguishable From Magic”

- Arthur C. Clarke