Transcript
Page 1: Big data solutions for advanced marketing analytics

Big Data Solutions for Marketing Analytics

Natalino Busa @natalinobusa

Page 2: Big data solutions for advanced marketing analytics

Parallelism Hadoop Cassandra Akka

Machine Learning Statistics Big Data

Algorithms Cloud Computing Scala Spray

Natalino Busa @natalinobusa

www.natalinobusa.com

Page 3: Big data solutions for advanced marketing analytics

Humanize Data

Page 4: Big data solutions for advanced marketing analytics

The bank statements

Page 5: Big data solutions for advanced marketing analytics

Back to routine. Grocery, broken washing machine.

After-vacation fun. Pancake house.

Traveling back.

Just back home. Pizza.

Shopping in Sicily. Vacation!

The bank statements / How I read the bank bills

Page 6: Big data solutions for advanced marketing analytics

Back to routine. Grocery, broken washing machine.

After-vacation fun. Pancake house.

Traveling back.

Just back home. Pizza.

Shopping in Sicily. Vacation!

The bank statements / How I read the bank bills / What happened those days

Page 7: Big data solutions for advanced marketing analytics

Data is the fabric of our lives.

Let’s give more meaning and context to data.

Page 8: Big data solutions for advanced marketing analytics

Abraham Harold Maslow (April 1, 1908 – June 8, 1970) was an American psychologist who was best known for creating Maslow's hierarchy of needs

Page 9: Big data solutions for advanced marketing analytics

Physiology: breathing, food, water, sleep

Contractual: security of body, resources, health, employment, property

Love & Caring: friends, family, partner; security of love and belonging

Esteem: self-esteem, confidence, achievements, respect

Self-actualization: spontaneity, creativity, acceptance, freedom, ethics

Very human needs

Page 10: Big data solutions for advanced marketing analytics

How much caring can technology be?

Page 11: Big data solutions for advanced marketing analytics

Physiology: connectivity, electricity, hardware / infra

Contractual: security of basic operations; REST APIs, encryption, authentication

Love & Caring: notifications, alerts, social bonding, predictions

Esteem: set goals, planning, achievements, advisory role

Self-actualization: freedom, trusted companion

Technology is reaching out

Page 12: Big data solutions for advanced marketing analytics

Data science top 3

Dimensionality Reduction

Predictive Analytics

Clustering, Segmentation

Page 13: Big data solutions for advanced marketing analytics

Data science: what’s working?

- Random Forests

- Artificial Neural Networks

- Clustering Algorithms

- Pattern Recognition

- Time-series analysis

- Regression

Most actual models are a combination of these.

Page 14: Big data solutions for advanced marketing analytics

Data science ^.^/

keep it scientific

cross-validate your models

keep it measurable

play with it

create new features

explore the available data
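
The "cross-validate your models" advice can be sketched in a few lines. This is a minimal k-fold example using only the Python standard library, not the speaker's code: the synthetic data, the least-squares line model, and the helper names (`k_fold_indices`, `cross_validate`, `fit_line`, `predict_line`) are all assumptions made for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, predict, k=5):
    """Return the mean absolute error measured on each held-out fold."""
    errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mean(abs(predict(model, xs[i]) - ys[i]) for i in held_out))
    return errors


def fit_line(xs, ys):
    """Hypothetical model: ordinary least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Synthetic data (illustrative only): y = 2x + 1 plus small uniform noise.
rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + rng.uniform(-0.5, 0.5) for x in xs]

fold_errors = cross_validate(xs, ys, fit_line, predict_line, k=5)
print(fold_errors)  # one error figure per fold
```

Reporting one number per fold keeps the result measurable: a model whose error varies widely between folds is probably fitting noise.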

Page 15: Big data solutions for advanced marketing analytics

How to code data science?

Page 16: Big data solutions for advanced marketing analytics

# Multiple Linear Regression Example

fit <- lm(y ~ x1 + x2 + x3, data=mydata)

summary(fit) # show results

● Language for statistics
● Easy to analyze and shape data
● Advanced statistical package
● Fueled by academia and professionals
● Very clean visualization packages

Packages for machine learning: time-series forecasting, clustering, classification, decision trees, neural networks

Remote procedure calls (RPC): from Scala/Java via RProcess and Rserve

Data Science: R

Page 17: Big data solutions for advanced marketing analytics

>>> from sklearn.datasets import load_iris
>>> from sklearn import tree
>>> iris = load_iris()
>>> clf = tree.DecisionTreeClassifier()
>>> clf = clf.fit(iris.data, iris.target)

● Flexible, concise language
● Quick to code and prototype
● Portable, visualization libraries

Machine learning libraries: scipy, statsmodels, sklearn, matplotlib, ipython

Web libraries: flask, tornado, (no)SQL clients

Data Science: Python

Page 18: Big data solutions for advanced marketing analytics

Earn the trust

Page 19: Big data solutions for advanced marketing analytics

The customer’s context

Personal history: number of transactions ever done

Long-term interaction: how the user’s actions correlate with those of others

Real-time events: trends and recent events

Page 20: Big data solutions for advanced marketing analytics

The customer’s context

context is related to time:

slow changing: the defining characteristic of a person

fast changing: events which influence our lives, trends

These require very different technology solutions!
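
One way to picture the split between slow-changing and fast-changing context is a small record that keeps both side by side. This is a sketch under stated assumptions: the `CustomerContext` class, its field names, and the window of three events are hypothetical and do not come from the talk.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CustomerContext:
    """Slow-changing profile plus a short window of fast-changing events."""
    customer_id: str
    segment: str = "unknown"        # slow: recomputed by a periodic batch job
    lifetime_transactions: int = 0  # slow: personal history
    # fast: only the latest few events are kept (window size is an assumption)
    recent_events: deque = field(default_factory=lambda: deque(maxlen=3))

    def record_event(self, event: str) -> None:
        """Fast path: called for every incoming event."""
        self.recent_events.append(event)
        self.lifetime_transactions += 1

    def apply_batch_update(self, segment: str) -> None:
        """Slow path: called when a long-running analysis finishes."""
        self.segment = segment


ctx = CustomerContext("customer-1")
for event in ["shopping", "pizza", "travel", "pancake house"]:
    ctx.record_event(event)
ctx.apply_batch_update("frequent traveller")

print(ctx.segment, ctx.lifetime_transactions, list(ctx.recent_events))
```

The two update paths have very different costs and rates, which is why they tend to end up on different technologies.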

Page 21: Big data solutions for advanced marketing analytics

Challenges

Not much time to react:
● Events must be delivered fast to the new machine APIs
● It’s Web and mobile apps: the latency budget is limited

Loads of information to process:
● Understand the user history well
● Access a larger context

Page 22: Big data solutions for advanced marketing analytics

Big Data and Fast data

ranking and preference

segmentation and clustering

short term trending topics

rule-based recommendations

Tens of terabytes of data. This can take hours…

Hundreds of events per second. This must be fast…
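
As an illustration of the fast-data side, short-term trending topics can be kept with a sliding time window. The sketch below is an assumption-laden example, not the speaker's implementation: the `TrendingWindow` class, the 60-second window, and the sample events are made up, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import Counter, deque


class TrendingWindow:
    """Counts the topics seen during the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, topic) pairs, oldest first
        self.counts = Counter()

    def add(self, timestamp: float, topic: str) -> None:
        self.events.append((timestamp, topic))
        self.counts[topic] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Drop events that fell out of the window and decrement their counts.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_topic = self.events.popleft()
            self.counts[old_topic] -= 1
            if self.counts[old_topic] == 0:
                del self.counts[old_topic]

    def top(self, n: int):
        return self.counts.most_common(n)


window = TrendingWindow(window_seconds=60)
sample = [(0, "pizza"), (10, "travel"), (20, "travel"), (70, "grocery"), (75, "travel")]
for timestamp, topic in sample:
    window.add(timestamp, topic)

print(window.top(2))  # [('travel', 2), ('grocery', 1)]
```

Each update touches only the events that expire, so the cost per event stays small, in contrast to the batch jobs that scan terabytes.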

Page 23: Big data solutions for advanced marketing analytics

Back to the drawing board

Page 24: Big data solutions for advanced marketing analytics

core banking systems

SOAP services and DBs

System BUS

customer-facing apps

channels

A high-level bank schematic

Page 25: Big data solutions for advanced marketing analytics

Higher separation!

Fewer silos

Interactions with core systems

Bigger and Faster

Page 26: Big data solutions for advanced marketing analytics

Human-centric applications

Page 27: Big data solutions for advanced marketing analytics

Some techs

Page 28: Big data solutions for advanced marketing analytics

Hadoop: Distributed Data OS

Reliable: distributed, replicated file system

Low cost: ↓ cost vs ↑ performance/storage

Computing powerhouse: all of the cluster’s CPUs work in parallel to run queries
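
The idea of many workers processing partitions in parallel and then combining partial results can be sketched as a map step followed by a reduce step. This is plain Python, not the Hadoop API: the transaction records are invented, and the threads merely stand in for cluster nodes (Python threads do not give real CPU parallelism here).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical transaction records, already split into partitions
# (much as a distributed file system splits a file into blocks).
partitions = [
    [("grocery", 42.0), ("pizza", 18.5)],
    [("travel", 310.0), ("grocery", 27.5)],
    [("pizza", 12.0), ("travel", 95.0)],
]


def map_partition(records):
    """Map step: total spend per category within one partition."""
    totals = Counter()
    for category, amount in records:
        totals[category] += amount
    return totals


def merge(left, right):
    """Reduce step: combine two partial results."""
    return left + right


# Each partition is processed independently; threads stand in for nodes.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_partition, partitions))

spend_by_category = reduce(merge, partials, Counter())
print(dict(spend_by_category))
```

Because the map step never looks outside its own partition, adding more partitions and workers scales the job without changing the code.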

Page 29: Big data solutions for advanced marketing analytics

Cassandra: A low-latency 2D store

Reliable: distributed, replicated data store

Low latency: sub-millisecond read/write operations

Tunable CAP: define your level of consistency

Data model: hashed rows, sorted wide columns

Architecture model: no SPOF, ring of nodes, homogeneous system
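
The data and architecture model above can be made concrete with a toy model: row keys are hashed onto a ring of nodes, columns inside a row stay sorted, and consistency is tuned by choosing how many replicas must answer. This is a simplified sketch, not Cassandra's implementation or client API: `ToyRing`, `WideRow`, and `overlapping_quorums` are hypothetical names, MD5 is used only as a convenient stable hash, and real clusters use their own partitioner and virtual nodes.

```python
import bisect
import hashlib


class ToyRing:
    """Each node owns one token; a key belongs to the next node clockwise."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self.ring = sorted((self._token(node), node) for node in nodes)
        self.tokens = [token for token, _ in self.ring]

    @staticmethod
    def _token(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def replicas(self, partition_key: str):
        """Walk clockwise from the key's token and pick distinct nodes."""
        start = bisect.bisect_left(self.tokens, self._token(partition_key))
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


class WideRow:
    """Columns of one row, kept sorted by column name."""

    def __init__(self):
        self.names = []
        self.values = {}

    def put(self, name, value):
        if name not in self.values:
            bisect.insort(self.names, name)
        self.values[name] = value

    def columns_between(self, start, end):
        """Return columns with start <= name < end, in sorted order."""
        lo = bisect.bisect_left(self.names, start)
        hi = bisect.bisect_left(self.names, end)
        return [(name, self.values[name]) for name in self.names[lo:hi]]


def overlapping_quorums(n, r, w):
    """Reads see the latest write when read and write replica sets overlap."""
    return r + w > n


ring = ToyRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("customer-42")

row = WideRow()
row.put("2014-08-03", "pizza")
row.put("2014-07-28", "shopping")
row.put("2014-08-01", "travel")
august = row.columns_between("2014-08-01", "2014-09-01")

print(owners, august, overlapping_quorums(3, 2, 2), overlapping_quorums(3, 1, 1))
```

Sorted columns make range reads such as "all August transactions of one customer" a single contiguous slice, which is one reason the wide-row layout suits time-ordered data.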

Page 30: Big data solutions for advanced marketing analytics

Scala / Akka / Spray: a WEB API reactive framework

[Diagram: Actor A, Actor B, and Actor C exchanging messages msg 1 to msg 4]

● it scales horizontally (can run in cluster mode)

● maximum use of the available cores/memory

● processing is non-blocking, threads are re-used

● can parallelize computing power across many actors

Very fast: thousands of messages/sec

Very reliable: auto recovery

Lazy: compute only when required
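
The actor idea (private state, a mailbox, one message handled at a time, non-blocking sends) can be sketched without Akka. The example below uses Python's `asyncio` on a single event loop, so it only illustrates the message-passing style, not Akka's API, clustering, or supervision. The `Actor` and `Forwarder` classes and the `None` stop sentinel are assumptions made for this sketch.

```python
import asyncio


class Actor:
    """Minimal actor: private state, a mailbox, one message at a time."""

    def __init__(self, name):
        self.name = name
        self.mailbox = asyncio.Queue()
        self.handled = []

    def tell(self, message):
        """Fire-and-forget send: enqueue and return immediately."""
        self.mailbox.put_nowait(message)

    async def run(self):
        while True:
            message = await self.mailbox.get()
            if message is None:  # sentinel used in this sketch to stop the actor
                break
            await self.receive(message)

    async def receive(self, message):
        self.handled.append(message)


class Forwarder(Actor):
    """Handles a message, then sends a derived message to another actor."""

    def __init__(self, name, target):
        super().__init__(name)
        self.target = target

    async def receive(self, message):
        await super().receive(message)
        self.target.tell(f"{self.name} saw {message}")


async def main():
    actor_b = Actor("B")
    actor_a = Forwarder("A", target=actor_b)
    task_a = asyncio.create_task(actor_a.run())
    task_b = asyncio.create_task(actor_b.run())
    for message in ("msg 1", "msg 2"):
        actor_a.tell(message)
    actor_a.tell(None)
    await task_a          # A has drained its mailbox and forwarded everything
    actor_b.tell(None)
    await task_b
    return actor_b.handled


result = asyncio.run(main())
print(result)  # ['A saw msg 1', 'A saw msg 2']
```

Because an actor never shares its state and only reacts to messages, many of them can run side by side without locks.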

Page 31: Big data solutions for advanced marketing analytics

Putting it all together

Hadoop

application (actor based)

millions of millions of

λ= conversions

(lambda)

Data queues
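
If the λ on this slide refers to the lambda architecture (an assumption; the transcript does not say so), the combination of a batch layer and a speed layer can be sketched as follows. The `LambdaView` class and its method names are hypothetical, and the sketch ignores events that arrive while a batch run is in progress.

```python
from collections import Counter


class LambdaView:
    """Answers queries by merging a batch view with a real-time view."""

    def __init__(self):
        self.master_log = []         # append-only record of every event
        self.batch_view = Counter()  # recomputed from the full log, slowly
        self.speed_view = Counter()  # increments since the last batch run

    def on_event(self, customer_id: str) -> None:
        """Fast path: store the event and update the real-time view."""
        self.master_log.append(customer_id)
        self.speed_view[customer_id] += 1

    def run_batch(self) -> None:
        """Slow path: rebuild the batch view, then reset the speed view."""
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, customer_id: str) -> int:
        return self.batch_view[customer_id] + self.speed_view[customer_id]


view = LambdaView()
for customer in ["alice", "bob", "alice"]:
    view.on_event(customer)
before_batch = view.query("alice")

view.run_batch()
view.on_event("alice")
after_batch = view.query("alice")

print(before_batch, after_batch)  # 2 3
```

Queries stay correct whether or not the batch job has caught up, which lets the slow and fast layers run on different technologies.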

Page 32: Big data solutions for advanced marketing analytics

Science & Engineering

Statistics, Data Science

Python, R, Visualization

IT Infra, Big Data

Java, Scala, SQL

Hadoop: Big Data Infrastructure, Data Science on large datasets

Big Data and Fast Data require different profiles to achieve the best results

Page 33: Big data solutions for advanced marketing analytics

Some lessons learned

● Mixing and matching technologies is a good thing
● Fast Data must complement Big Data
● Ease integration among teams
● Hadoop, Cassandra, and Akka
● Data Science takes time to figure out

Page 34: Big data solutions for advanced marketing analytics

Parallelism Mathematics Programming

Languages Machine Learning Statistics

Big Data Algorithms Cloud Computing

Natalino Busa @natalinobusa

www.natalinobusa.com

Thanks! Any questions?