Source: cedric.cnam.fr/~saporta/camillo_lezione1_cnam.pdf · 2011. 6. 25.

Treatment of survey data by a multidimensional CRM approach

Furio Camillo

Alma Mater Studiorum

Università di Bologna

What’s multidimensional CRM?

• A very large set of information

• Old segmentation: based on behaviours

• New segmentation: based on ideas

• French approach, tandem approach

• Factorial reduction (PCA, BCA, MCA)

• Clustering based on the «cleaned» variance

• Cluster description

Data mining and Micro Data Mining

From an applied point of view, there is no longer much difference between operational marketing and strategic marketing.

In the 1970s, many companies carried out behavioural segmentations of their customer base.

They are now aware that observing customer behaviour (and segmenting on it) only captures EFFECTS: behaviour is an EFFECT, not a CAUSE of the effect!

Behaviour = Effect = f(Latent Causes)

A new strategy

1. Formulate hypotheses about the latent causes: which causes?

2. Observe these potential causes through a survey (qualitative research, such as focus groups, is also very important)

3. Search for the RULE (the link function) between the effect and the potential causes

4. Adopt these rules as TOOLS for one-to-one marketing campaigns

Behaviour = Effect = f(Latent Causes)

How many groups? Which algorithm?

Colouring a customer database

[Diagram: the internal DB holds the target variable; the survey data supply the external information (opinions, preferences, needs, sentiments, ideas).]

4 clusters = 4 policies

Strategy

• Study the internal database
• Sampling
• Survey plan: questionnaire, CATI/CAWI, etc.
• Golden questions!
• Factorial reduction
• Clustering (hierarchical)
• Cluster interpretation
• Prediction of the «target variable» for the rest of the database

Strategy and methods

• Study the internal database (exploratory multidimensional statistics)
• Sampling: stratified, by quotas??
• Survey plan: questionnaire, CATI/CAWI, etc. (causal inference)
• Golden questions! (semiotics of the questions, values)
• Factorial reduction (PCA, BCA, MCA, ...)
• Clustering (hierarchical, Ward, CCC, pseudo-F)
• Cluster interpretation (t-test, chi-square test)
• Prediction of the target variable for the rest of the database (discriminant analysis)
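The clustering step of the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the factor scores are invented, the function names (`ward_cost`, `ward_clustering`, `pseudo_f`) are hypothetical, and the greedy loop is a plain-Python stand-in for what a statistical package would do. It uses Ward's merge cost (the increase in within-cluster sum of squares) and the Calinski-Harabasz pseudo-F to compare candidate numbers of clusters.

```python
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * sq_dist(centroid(a), centroid(b))

def ward_clustering(points, n_clusters):
    """Greedy agglomeration: repeatedly merge the pair with the smallest Ward cost."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

def pseudo_f(clusters):
    """Pseudo-F: (between SS / (k - 1)) / (within SS / (n - k))."""
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    grand = centroid(all_points)
    between = sum(len(c) * sq_dist(centroid(c), grand) for c in clusters)
    within = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical factor scores (two retained axes) for eight respondents.
scores = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1),
          (5.2, 4.9), (4.9, 5.0), (0.1, 5.0), (0.0, 5.2)]

for k in (2, 3, 4):
    print(k, round(pseudo_f(ward_clustering(scores, k)), 1))
```

A peak in the pseudo-F across values of k is one common hint for the number of groups; on real survey data it would be read together with the other criteria named in the list, such as the CCC.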

Wind case

Wind questionnaire

What is a semiometric approach, and what is Semiometrie? The formal definition is "a long list of words and thousand of people, in all Europe, are asked to give a mark (a score) more or less high depending on the agreeable or disagreeable characteristic of the single word" (Lebart, Piron, Steiner, 2003).

This definition clearly states a strict and elaborate experimental protocol. It has been the subject of many field studies, repeated across space and time, in which information about the citizens of old Europe has been collected using a list of 210 words.

The composition of this list is the real starting value of the method. The words were chosen through a long process of selection and assessment so as to represent, directly or indirectly, the main values of Western society.

As described in detail in the work of Lebart, Piron and Steiner, the reference lexicon for the selection was drawn from a very wide literature spanning the whole history of Western thought and its expression, including even the first five books of the Old Testament.

Semiometrie

Attitudes, ideas, emotions, imagery: needs and attitudes

[Map of questionnaire items and clusters 1 to 4. Labels: ethical distance, emotional attraction, reason / control, emotional dependence.]

Items shown on the map (other items omitted):

33 It is mainly for communicating
36 I use my mobile to send and receive e-mail
43 I would like a real keyboard so I can send SMS faster
50 The less noticeable the mobile, the better
61 I can't wait to have all PC services on my mobile
70 I keep my calls short so as not to spend money
21 I like having several numbers
81 VAS only make you spend more
82 It must be cheap; what matters is being able to make calls

Semiometrie: needs and attitudes

[Map of questionnaire items and clusters 1 to 4. Labels: ethical distance, emotional attraction, reason / control, emotional dependence. Semiometric poles: pleasure, duty, sublimation, materialism, idealism, pragmatism, peace, money.]

Items shown on the map (other items omitted):

11 The mobile only for emergencies
14 I like knowing that I could receive a call at any moment
36 I use my mobile to send and receive e-mail
41 It is important to me to have an earpiece and hands-free, so I can do other things
43 I would like a real keyboard so I can send SMS faster

The brand and the other brands, competing or cooperating: the axes of the MoV

[Three charts, each listing the 16 brand attributes in the order shown along one axis.]

Aesthetic emotion / Reason and relationship: assistance, reliability, practicality, dialogue, ability, information, quality of work, maturity, honesty, labour standards, social utility, prestige, form, likeability, beauty, elegance

Solid emotion / Ethics: prestige, ability, maturity, reliability, form, likeability, practicality, elegance, beauty, dialogue, information, assistance, quality of work, honesty, labour standards, social utility

Assisted pragmatism / Friendly emotion: information, assistance, practicality, reliability, dialogue, prestige, maturity, quality of work, labour standards, ability, social utility, honesty, form, beauty, elegance, likeability

[Two charts, each listing the 16 brand attributes in the order shown along one axis.]

Relationships and charm / Solid ethics: assistance, information, form, elegance, dialogue, likeability, quality of work, reliability, practicality, ability, labour standards, beauty, honesty, social utility, maturity, prestige

Efficient seriousness / Affable humanity: form, social utility, practicality, elegance, honesty, reliability, ability, labour standards, maturity, assistance, prestige, information, likeability, quality of work, dialogue, beauty

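[Map of brands on the MoV axes. Axis labels: aesthetic emotion, reason and relationship, ethics, emotion from solidity. Semiometric poles: duty, pleasure, humility, sovereignty. Brands plotted: Wind, H3G, Tim, Vodafone, Intesa, Unicredit, MD, Samsonite.]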






• Factorial reduction (PCA, BCA, MCA)



The «French Approach» (SPAD)

What is the «aperitif» session for you?

The scale: 1 to 10

The Ward geometry

[Figure: clusters C′ and C]

data job.aperitivi_1; set job.aperitivi;

if _n_

A non-linear re-coding method (MG-Strategy), endogenous for each respondent

[Figure: each respondent's original scale is mapped onto the recoded scale, with min → -1, mean → 0, max → +1.]

Ref: F. Camillo, MicroMacro Marketing, 1999/1, Il Mulino
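A minimal sketch of a respondent-level re-coding of this kind follows. The slide only shows the three anchor points (each respondent's own min, mean and max going to -1, 0 and +1); the piecewise-linear interpolation between them is an assumption made here for illustration, not necessarily the MG-Strategy formula of the cited paper, and the function name `recode` and the sample marks are hypothetical.

```python
def recode(marks):
    """Map one respondent's marks so that min -> -1, own mean -> 0, max -> +1.

    Assumption: values between the anchors are interpolated linearly on each
    side of the respondent's mean, which makes the overall map non-linear.
    """
    lo, hi = min(marks), max(marks)
    mean = sum(marks) / len(marks)
    if lo == hi:
        # The respondent gave the same mark to every item: no information.
        return [0.0] * len(marks)
    recoded = []
    for m in marks:
        if m >= mean:
            recoded.append((m - mean) / (hi - mean))
        else:
            recoded.append((m - mean) / (mean - lo))
    return recoded

# Hypothetical marks of one respondent on the 1-10 scale.
respondent = [2, 3, 3, 4, 10]
print(recode(respondent))
```

Because the anchors are computed per respondent, two people who use different parts of the 1 to 10 scale end up on a comparable -1 to +1 range.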

Linear Discriminant Analysis

The aim of Linear Discriminant Analysis (LDA) [proc discrim in SAS] is to find a rule that correctly assigns subjects, on whom some variables have been observed, to the group or population they belong to, using a combination of input variables with good discriminating power (the discriminant function).

This is achieved by an algorithm that maximizes the ratio

$$\varphi = \frac{v' B v}{v' W v}$$

B: between-group covariance matrix

W: within-group covariance matrix

This approach has non-negligible limitations.

Linear discriminant function:

$$f(x) = \sum_{i=1}^{n} v_i x_i + b$$
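To make the ratio v'Bv / v'Wv concrete, here is a small sketch for two groups in two dimensions. It is an illustration, not the proc discrim algorithm: the data are invented, all function names are hypothetical, and B and W are computed as scatter matrices (sums of squares), which changes the ratio only by a constant factor. For two groups the maximizing direction is known to be proportional to W⁻¹(m1 − m2), which the code uses directly.

```python
def mean_vec(rows):
    """Mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[d] for r in rows) / n for d in range(2)]

def scatter(rows, center):
    """2x2 sum of outer products of deviations from center."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - center[0], r[1] - center[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def between_within(g1, g2):
    """Between-group matrix B and pooled within-group matrix W."""
    m1, m2, grand = mean_vec(g1), mean_vec(g2), mean_vec(g1 + g2)
    b = [[len(g1) * (m1[i] - grand[i]) * (m1[j] - grand[j])
          + len(g2) * (m2[i] - grand[i]) * (m2[j] - grand[j])
          for j in range(2)] for i in range(2)]
    w1, w2 = scatter(g1, m1), scatter(g2, m2)
    w = [[w1[i][j] + w2[i][j] for j in range(2)] for i in range(2)]
    return b, w

def quad(v, m):
    """Quadratic form v' M v."""
    return sum(v[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

def fisher_ratio(v, g1, g2):
    """The criterion v'Bv / v'Wv for a candidate direction v."""
    b, w = between_within(g1, g2)
    return quad(v, b) / quad(v, w)

def fisher_direction(g1, g2):
    """Two-group maximizer, proportional to W^{-1} (m1 - m2)."""
    _, w = between_within(g1, g2)
    m1, m2 = mean_vec(g1), mean_vec(g2)
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    return [(w[1][1] * d[0] - w[0][1] * d[1]) / det,
            (-w[1][0] * d[0] + w[0][0] * d[1]) / det]

# Hypothetical data: two elongated groups separated along the second variable.
g1 = [(1.0, 2.0), (2.0, 3.1), (3.0, 3.9), (4.0, 5.2)]
g2 = [(1.0, 0.1), (2.0, 0.9), (3.0, 2.2), (4.0, 2.8)]

v_star = fisher_direction(g1, g2)
print("ratio along x2 only:", fisher_ratio([0.0, 1.0], g1, g2))
print("ratio along Fisher direction:", fisher_ratio(v_star, g1, g2))
```

The second printed ratio is much larger than the first: combining the two correlated variables discriminates better than either variable alone.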

Limitations of LDA

Linear Discriminant Analysis (LDA) is a parametric method that is inadequate for capturing non-linear data structures.

LDA fails when the discriminating information lies not in the mean but in the variance.

[Figure: group 1 and group 2 plotted on axes x1 and x2]
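The failure case can be reproduced with a one-dimensional toy example. The two invented groups below share the same mean and differ only in spread, so the between-group sum of squares, and with it the Fisher ratio, is essentially zero. Squaring the values is used here as a hand-picked non-linear transformation to show that the information becomes visible in the mean after a suitable mapping; it is an assumption for illustration, not the kernel method of these slides.

```python
def fisher_ratio_1d(g1, g2):
    """One-dimensional Fisher ratio: between-group SS over within-group SS."""
    m1 = sum(g1) / len(g1)
    m2 = sum(g2) / len(g2)
    grand = (sum(g1) + sum(g2)) / (len(g1) + len(g2))
    between = len(g1) * (m1 - grand) ** 2 + len(g2) * (m2 - grand) ** 2
    within = sum((x - m1) ** 2 for x in g1) + sum((x - m2) ** 2 for x in g2)
    return between / within

# Hypothetical groups: same mean (zero), very different variance.
group1 = [-0.2, -0.1, 0.0, 0.1, 0.2]
group2 = [-3.0, -2.0, 0.0, 2.0, 3.0]

raw = fisher_ratio_1d(group1, group2)
squared = fisher_ratio_1d([x * x for x in group1], [x * x for x in group2])
print("raw values:    ", raw)
print("squared values:", squared)
```

On the raw values no linear rule can separate the groups, while on the squared values the group means differ clearly.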

Feature Space

Using kernel machines, the data can be projected into an infinite-dimensional space F (the Feature Space) in which the distances between subjects are the same as in the original space.

Kernel machines allow the construction of separating functions that are non-linear in the input space and equivalent to linear functions in the Feature Space.

$$\phi: \mathbb{R}^n \to F \qquad \text{(Input Space} \to \text{Feature Space)}$$

$$k(x, x') = \langle \phi(x), \phi(x') \rangle$$
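The identity k(x, x') = ⟨φ(x), φ(x')⟩ can be checked numerically on a kernel whose feature map is small enough to write out. The sketch below uses the homogeneous polynomial kernel of degree 2 in two dimensions, chosen here only because its feature space is finite (three dimensions); the kernels discussed in these slides map to an infinite-dimensional space, where φ cannot be written explicitly and only k is ever evaluated. The function names are hypothetical.

```python
import math

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel in R^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    """Ordinary inner product."""
    return sum(p * q for p, q in zip(a, b))

def k_poly2(x, z):
    """The same quantity computed in the input space: (x . z)^2."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
print(k_poly2(x, z), dot(phi(x), phi(z)))
```

Both numbers agree up to rounding, although the first never leaves the two-dimensional input space.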

Kernel Discriminant Analysis

Kernel Discriminant Analysis (KDA) is the projection of DA into the Feature Space; it is obtained by maximizing the Fisher criterion

$$J_F(\alpha) = \frac{\alpha' S_B^{\phi} \alpha}{\alpha' S_W^{\phi} \alpha}$$

$S_B^{\phi}$: between covariance matrix in Feature Space

$S_W^{\phi}$: within covariance matrix in Feature Space

Kernel discriminant function:

$$f(x) = \sum_{i=1}^{n} \alpha_i \, k(x_i, x) + b$$

b: bias

α: eigenvector
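Once the coefficients are known, scoring a new subject with f(x) = Σ αᵢ k(xᵢ, x) + b is a single weighted sum over the training subjects. The sketch below shows only this evaluation step. The training points, the coefficients `alpha` and the bias `b` are invented placeholders, not fitted values (fitting requires solving the eigenproblem of the Fisher criterion), and the Cauchy kernel is written with a single radius `r` for brevity, which is a simplifying assumption.

```python
def cauchy_kernel(x, z, r=1.0):
    """Cauchy kernel with one common radius: 1 / (1 + ||x - z||^2 / r^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return 1.0 / (1.0 + d2 / (r * r))

def kernel_discriminant(x, train, alpha, b, kernel=cauchy_kernel):
    """Evaluate f(x) = sum_i alpha_i * k(x_i, x) + b."""
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, train)) + b

# Hypothetical training subjects and coefficients (NOT estimated from data).
train = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0), (5.0, 4.0)]
alpha = [1.0, 1.0, -1.0, -1.0]
b = 0.0

score_a = kernel_discriminant((0.5, 0.0), train, alpha, b)
score_b = kernel_discriminant((4.5, 4.0), train, alpha, b)
print(score_a, score_b)
```

With these placeholder coefficients, a subject close to the first two training points gets a positive score and one close to the last two gets a negative score; the sign of f(x) plays the role of the assignment rule.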

Results

The analysis yields 4 discriminant functions.

The data reveal a strong non-linear component, which is captured by the Cauchy kernel.

[Figures: traditional factorial space; kernel factorial space]

Colouring the internal list: results

Percentage of correct classification: 51.2%

Confusion matrix (row percentages; rows = original cluster, columns = reclassified cluster)

Original      1       2       3       4       5
1         60.04    6.04    8.17   19.08    6.67
2         15.67   48.33   11.91   16.33    7.77
3         21.27   16.88   37.35   17.51    7.00
4         18.48    9.82   10.56   56.55    4.59
5         12.74   11.75   11.56   18.70   45.25

Performance index: 217 = 100 × (51.2 / 23.5)
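The arithmetic behind the performance index can be sketched as follows. The slide gives 51.2 and 23.5 without saying how the reference value 23.5 was obtained, so it is treated here as a given baseline rate; the function name is hypothetical, and the factor 100 is inferred from the reported value of 217.

```python
def performance_index(correct_rate, baseline_rate):
    """Correct-classification rate as a percentage of a baseline rate."""
    return 100.0 * correct_rate / baseline_rate

# Confusion matrix from the slide (row percentages).
confusion = [
    [60.04, 6.04, 8.17, 19.08, 6.67],
    [15.67, 48.33, 11.91, 16.33, 7.77],
    [21.27, 16.88, 37.35, 17.51, 7.00],
    [18.48, 9.82, 10.56, 56.55, 4.59],
    [12.74, 11.75, 11.56, 18.70, 45.25],
]

# Per-cluster correct-classification rates are the diagonal entries.
per_cluster = [confusion[i][i] for i in range(len(confusion))]
print(per_cluster)

# Both rates are taken as given from the slide.
index = performance_index(51.2, 23.5)
print(int(index))
```

An index above 100 means the rule classifies better than the baseline; here the rule does a little more than twice as well.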

Confusion matrix: LDA

Confusion matrix: K-NN (K = 20)

Confusion matrix: KDA with hybridization on K-NN (K = 20)

Correct classification rate = 73%

Problems and Future Developments

1. KDA is still an experimental technique with technological problems: it is not an automatic process for obtaining an easy and fast rule.

2. Use of the SAS IML matrix language

3. Using a kernel machine involves some subjective choices: the KERNEL function and some parameters

4. The choice of the number of comparisons to make in K-NN

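Kernels can take different functional forms:

Gaussian:

$$\varphi_j(x) = \exp\left(-\sum_{k=1}^{n} \frac{(x_k - c_{jk})^2}{r_{jk}^2}\right)$$

Cauchy:

$$\varphi_j(x) = \frac{1}{1 + \sum_{k=1}^{n} \frac{(x_k - c_{jk})^2}{r_{jk}^2}}$$

Multiquadric:

$$\varphi_j(x) = \left(1 + \sum_{k=1}^{n} \frac{(x_k - c_{jk})^2}{r_{jk}^2}\right)^{1/2}$$

Inverse Multiquadric:

$$\varphi_j(x) = \frac{1}{\left(1 + \sum_{k=1}^{n} \frac{(x_k - c_{jk})^2}{r_{jk}^2}\right)^{1/2}}$$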
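The four functional forms named above (Gaussian, Cauchy, multiquadric, inverse multiquadric) can be written as small functions of the same scaled squared distance, the sum over k of (x_k − c_k)² / r_k². This is a sketch under that reading of the formulas; the function names, the centre `c` and the radii `r` are illustrative choices, not values from the study.

```python
import math

def scaled_sq_dist(x, c, r):
    """Sum over k of (x_k - c_k)^2 / r_k^2."""
    return sum((xk - ck) ** 2 / rk ** 2 for xk, ck, rk in zip(x, c, r))

def gaussian(x, c, r):
    return math.exp(-scaled_sq_dist(x, c, r))

def cauchy(x, c, r):
    return 1.0 / (1.0 + scaled_sq_dist(x, c, r))

def multiquadric(x, c, r):
    return math.sqrt(1.0 + scaled_sq_dist(x, c, r))

def inverse_multiquadric(x, c, r):
    return 1.0 / math.sqrt(1.0 + scaled_sq_dist(x, c, r))

# Hypothetical centre and per-variable radii.
c = (0.0, 0.0)
r = (1.0, 2.0)

for x in [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]:
    print(x,
          round(gaussian(x, c, r), 4),
          round(cauchy(x, c, r), 4),
          round(multiquadric(x, c, r), 4),
          round(inverse_multiquadric(x, c, r), 4))
```

All four equal 1 at the centre. The Gaussian, Cauchy and inverse multiquadric decay with distance, the Cauchy more slowly than the Gaussian, while the multiquadric grows; this difference in tail behaviour is one of the subjective choices a kernel machine requires.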

• New approach: complexity theory
• Prof. Bozdogan, University of Tennessee
• From 1 December on www.furiocamillo.it
• Cladag 2005, Parma conference (last June)

$$\mathrm{ICOMP}(b) = -2 \log L(b) + 2\, C_1(\hat{F}^{-1})$$

Problems and Future Developments

VERY IMPORTANT: THE COLLABORATION OF COMPANIES AND INSTITUTIONS

... and what do the companies think?