Big Data Analytics 1: Driving Personalized Experiences Using Customer Profiles
Driving Personalized Experiences Using Customer Profiles
Matt Kalan
Sr. Solution Architect
MongoDB, Inc.
@matthewkalan
2
Big Data Analytics Track
1. Driving Personalized Experiences Using Customer Profiles
2. Leveraging Customer Behavior to Enhance Relevancy in Personalization
3. Machine Learning to Engage the Customer, with Apache Spark, IBM Watson, and MongoDB
3
Agenda For This Session
1. Benefits of Personalization
2. High level process
3. Data capture steps
4. Data analysis steps
5. Real-time personalization
6. Summary
7. Q&A
4
You Notice When Content is Personalized
When it looks like this outside
Left: from www.johnbyronkuhner.com via Google Images
Right: from www.steinmart.com via Google Images
Is this the best ad to show you?
5
Or, Better, This
When it looks like this outside
Left: from www.johnbyronkuhner.com via Google Images
Right: www.linkedin.com/pulse/20140729161519-34678510-take-note-time-to-move-beyond-personalization-to-contextualization
More relevant
8
High Level Personalization Process
1. Profile created
2. Enrich with public data
3. Capture activity
4. Clustering analysis
5. Define Personas
6. Tag with personas
7. Personalize interactions
Batch analytics
Public data
Common technologies
• R
• Hadoop
• Spark
• Python
• Java
• Many other options
4 & 5 performed much less often than tagging
9
Why MongoDB for Personalization?
• Document model => customer profiles are rich structures perfect for documents
• High throughput => profiles are read/written on every page view, so high performance is critical
• High scalability => high performance must scale easily for any data size & request volume
• Rich querying & indexes => often only portions of the profile are queried, and ad hoc marketing especially requires rich querying capabilities. Geospatial indexes are critical for mobile
• Real-time analytics => can analyze directly on MongoDB or prepare aggregated results for external analysis with the aggregation framework
• Strong consistency => profile changes & tracking should take effect immediately
• Hadoop/Spark integration => can run distributed analytics on data in MongoDB, or copy it to HDFS and run them there, both with the MongoDB Hadoop Connector
• Low TCO => low-cost enterprise software license, commodity hardware, & management
10
Customer Example: Scratchpad
• Records all activity in researched trips
• Needed
  – Document model
  – Dynamic schema
  – Rich querying
  – Easy scaling
11
And Many Other Customers Personalizing with MongoDB
• Sailthru
• Sitecore
• Adobe (AEM)
• Expedia
• ADP
• Foursquare
• Otto
• Chico’s
and 100s more…
13
Anonymous user
Might just start with this if no cookie
{
  "ipAddress" : "216.58.219.238",
  "referrer" : "google.com"
}
Pretty useless, right?
14
More Than Just What You Collect
IP Address
Referrer
Information Broker
Location
Company
Weather
Avg Income
Interests
Possible Interests
e.g. Kay Jewelers, Dick’s Sporting Goods
Budget Indication
e.g. Barney’s
Search term
15
Often User Creates a Profile
{
  "_id" : ObjectId("553ea57b588ac9ef066428e1"),
  "ipAddress" : "216.58.219.238",
  "referrer" : "kay.com",
  "firstName" : "John",
  "lastName" : "Doe",
  "email" : "[email protected]"
}
17
Available Early in Relationship
{
  "_id" : ObjectId("553e7dca588ac9ef066428e0"),
  "firstName" : "John",
  "lastName" : "Doe",
  "address" : "229 W. 43rd St.",
  "city" : "New York",
  "state" : "NY",
  "zipCode" : "10036",
  "age" : 30,
  "email" : "johndoe@gmail",
  "gender" : "male"
}
19
Easy to Store in Profile
{
  "_id" : ObjectId("553e7dca588ac9ef066428e0"),
  "firstName" : "John",
  "lastName" : "Doe",
  "address" : "229 W. 43rd St.",
  "city" : "New York",
  "state" : "NY",
  "zipCode" : "10036",
  "age" : 30,
  "email" : "[email protected]",
  "gender" : "male",
  "interests" : [
    "dumplings",
    "board games",
    "rooftop",
    "ginger beer",
    "ahi tuna",
    "healthy food"
  ]
}
21
Customer Activity Valuable to Track
{
  "_id" : ObjectId("553e7dca588ac9ef066428e0"),
  "firstName" : "John",
  "lastName" : "Doe",
  "address" : "229 W. 43rd St.",
  "city" : "New York",
  "state" : "NY",
  "zipCode" : "10036",
  "age" : 30,
  "email" : "[email protected]",
  "gender" : "male",
  ...
  "visitedCounts" : {
    "watches" : 3,
    "shirts" : 1,
    "sunglasses" : 1,
    "bags" : 2
  }
}
From gilt.com
22
Purchases Are Usually Even More Valuable
{
  "_id" : ObjectId("553e7dca588ac9ef066428e0"),
  "firstName" : "John",
  "lastName" : "Doe",
  "address" : "229 W. 43rd St.",
  "city" : "New York",
  "state" : "NY",
  "zipCode" : "10036",
  "age" : 30,
  "email" : "[email protected]",
  "gender" : "male",
  ...
  "purchases" : [
    {"id" : 1, "desc" : "Power Oxford Dress Shoe", "category" : "Mens shoes"},
    {"id" : 2, "desc" : "Striped Sportshirt", "category" : "Mens shirts"}
  ]
}
From gilt.com
23
Data Capture – Simple to Sophisticated
{
  "_id" : ObjectId("553e7dca588ac9ef066428e0"),
  "firstName" : "John",
  "lastName" : "Doe",
  "address" : "229 W. 43rd St.",
  "city" : "New York",
  "state" : "NY",
  "zipCode" : "10036",
  "age" : 30,
  "email" : "[email protected]",
  "twitterHandle" : "johndoe",
  "gender" : "male",
  "interests" : [
    "electronics", "basketball", "weightlifting",
    "ultimate frisbee", "traveling", "technology"
  ],
  "visitedCounts" : {
    "watches" : 3,
    "shirts" : 1,
    "sunglasses" : 1,
    "bags" : 2
  },
  "purchases" : [
    {"id" : 1, "desc" : "Power Oxford Dress Shoe", "category" : "Mens shoes"},
    {"id" : 2, "desc" : "Striped Sportshirt", "category" : "Mens shirts"}
  ]
}
Additional behavior tracking
• How long on each page (e.g. publishing)?
• What is reaction to pop-up promotions?
• Looks at cross-sold items on page?
• What categories are clicked on?
• Does a certain price point drive buying?
• Purchases at certain times of year?
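The capture steps above can be sketched in plain JavaScript on an in-memory profile document. This is only an illustration: recordVisit and recordPurchase are hypothetical helper names, not something from the slides. Against a live MongoDB collection, the same effect would more likely come from update operators such as $inc and $push rather than read-modify-write code like this.

```javascript
// Sketch: record browsing and purchase activity on an in-memory profile document.
// recordVisit and recordPurchase are hypothetical helpers, not part of any library.

function recordVisit(profile, category) {
  // Create the counter map on first use, then increment the category count.
  profile.visitedCounts = profile.visitedCounts || {};
  profile.visitedCounts[category] = (profile.visitedCounts[category] || 0) + 1;
  return profile;
}

function recordPurchase(profile, purchase) {
  // Append the purchase to the embedded array, creating it if needed.
  profile.purchases = profile.purchases || [];
  profile.purchases.push(purchase);
  return profile;
}

const profile = { firstName: "John", lastName: "Doe" };
recordVisit(profile, "watches");
recordVisit(profile, "watches");
recordVisit(profile, "shirts");
recordPurchase(profile, { id: 1, desc: "Power Oxford Dress Shoe", category: "Mens shoes" });

console.log(JSON.stringify(profile, null, 2));
```

Because the profile has no fixed schema, new kinds of tracking (time on page, reaction to promotions) can be added as further fields without changing existing documents.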
25
Clustering Overview
• Think of each of your customers or users of your site as a data point
• How can we group similar users into sets for marketing, cross-sell, etc.?
• K-means is a common algorithm for clustering
Image from: http://pypr.sourceforge.net/kmeans.html
Figure: Original Unclustered Data / Clustered Data
26
Clustering Process for Personalization
Customer Profile Documents → Map to Vectors, e.g. [1, 3, 0, …]
Vectors → Clustering Algo (iterate on inputs)
Clusters of customers → Define Personas
Clusters of customers → Tag Profiles with Personas (update profiles with persona)
27
Mapping Profile to Vector Input
{
  "_id" : ObjectId("553e7dca588ac9ef066428e0"),
  "firstName" : "John",
  ...
  "visitedCounts" : {
    "Mens watches" : 3,
    "Mens shirts" : 1,
    "Mens sunglasses" : 1,
    "Mens bags" : 2
  },
  "purchases" : [
    {"id" : 1, "desc" : "Power Oxford Dress Shoe", "category" : "Mens shoes"},
    {"id" : 2, "desc" : "Striped Sportshirt", "category" : "Mens shirts"}
  ]
}
Mens shirts   Mens pants   Mens shoes   Mens ties   Mens Sunglass   Mens Watch   …
11            0            10           0           1               3
[ 11, 0, 10, 0, 1, 3, ...]
(example vector)
e.g. 1 purchase = 10 visited counts
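The mapping above can be sketched as a small function. It assumes a fixed category order (CATEGORIES), the slide's weighting of one purchase as ten visits (PURCHASE_WEIGHT), and a hypothetical function name, profileToVector. Only the six categories shown in the table are included; the remaining ones, such as Mens bags, are elided on the slide and therefore left out here.

```javascript
// Sketch: turn a profile document into a fixed-order numeric vector.
// CATEGORIES, PURCHASE_WEIGHT and profileToVector are illustrative names.

const CATEGORIES = [
  "Mens shirts", "Mens pants", "Mens shoes",
  "Mens ties", "Mens sunglasses", "Mens watches",
];
const PURCHASE_WEIGHT = 10; // from the slide: 1 purchase = 10 visited counts

function profileToVector(profile, categories = CATEGORIES) {
  const visited = profile.visitedCounts || {};
  const purchases = profile.purchases || [];
  return categories.map((category) => {
    const visits = visited[category] || 0;
    const bought = purchases.filter((p) => p.category === category).length;
    return visits + PURCHASE_WEIGHT * bought;
  });
}

const john = {
  firstName: "John",
  visitedCounts: { "Mens watches": 3, "Mens shirts": 1, "Mens sunglasses": 1, "Mens bags": 2 },
  purchases: [
    { id: 1, desc: "Power Oxford Dress Shoe", category: "Mens shoes" },
    { id: 2, desc: "Striped Sportshirt", category: "Mens shirts" },
  ],
};

const vector = profileToVector(john);
console.log(vector); // [ 11, 0, 10, 0, 1, 3 ]
```

The shirts entry is 11 because one visit and one purchase are combined (1 + 10), which matches the example vector on the slide.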
28
Aggregation Framework for Filtering Profiles
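// Adds up the visited counts (vc) and purchases, then filters out profiles
// whose total is below 20. Field names are abbreviated (vc.mShirts, ...).
db.profiles.aggregate([
  {$project: {
    vc: "$vc",
    purchases: "$purchases",
    total: {$add: [
      // $ifNull treats a missing category counter as 0
      {$ifNull: ["$vc.mShirts", 0]},
      {$ifNull: ["$vc.mPants", 0]},
      {$ifNull: ["$vc.mShoes", 0]},
      {$ifNull: ["$vc.mTies", 0]},
      {$ifNull: ["$vc.mSunglass", 0]},
      {$ifNull: ["$vc.mWatch", 0]},
      {$ifNull: ["$vc.mBags", 0]},
      // Each purchase counts as 10 visits; $size errors on a missing field,
      // so default to an empty array for profiles without purchases
      {$multiply: [{$size: {$ifNull: ["$purchases", []]}}, 10]}
    ]}
  }},
  {$match: {total: {$gte: 20}}}
])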
29
Input/Output for K-Means Algo
Input
Vectors: [
  [11, 0, 10, 0, 1, 3, ...],
  [ 0, 5, 10, 3, 0, 0, ...],
  ...
]
K = # of clusters
Driven by marketing effort or data analysis
N = # of iterations

Clustering Algo
Iterate on inputs

Output: Clusters of customers
{
  Centers: [
    {name: C1, vector: [..]},
    {name: C2, vector: [..]},
    ...
  ],
  Clusters: [
    {C1: [[11, 0, 10, 0, 1, 3, ...], ...]},
    {C2: [[ 0, 5, 0, 0, 10, 0, ...], ...]},
    ...
  ]
}
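For readers who want to see the input and output shapes in action, here is a minimal k-means sketch (Lloyd's algorithm) in plain JavaScript. It is not the implementation used in the talk; a production setup would more likely rely on one of the common technologies listed earlier. As a simplifying assumption, the first K vectors seed the centers so the run is deterministic, whereas real implementations usually choose random or k-means++ seeds. The sample vectors are made up for illustration.

```javascript
// Minimal k-means sketch (Lloyd's algorithm) over plain arrays.
// Assumption: the first k vectors seed the centers so the run is deterministic.

function squaredDistance(a, b) {
  return a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0);
}

function nearestCenter(vector, centers) {
  let best = 0;
  for (let c = 1; c < centers.length; c++) {
    if (squaredDistance(vector, centers[c]) < squaredDistance(vector, centers[best])) {
      best = c;
    }
  }
  return best;
}

function kMeans(vectors, k, iterations) {
  let centers = vectors.slice(0, k).map((v) => v.slice());
  let assignments = [];
  for (let n = 0; n < iterations; n++) {
    // Assignment step: attach every vector to its closest center.
    assignments = vectors.map((v) => nearestCenter(v, centers));
    // Update step: move each center to the mean of its members.
    centers = centers.map((center, c) => {
      const members = vectors.filter((_, i) => assignments[i] === c);
      if (members.length === 0) return center; // keep an empty cluster's center
      return center.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  // Shape the result like the Centers/Clusters output on the slide.
  // Clusters reflect the assignments made in the final iteration.
  return {
    centers: centers.map((vector, c) => ({ name: `C${c + 1}`, vector })),
    clusters: centers.map((_, c) => vectors.filter((_, i) => assignments[i] === c)),
  };
}

const vectors = [
  [11, 0, 10, 0, 1, 3],
  [0, 5, 10, 3, 0, 0],
  [12, 1, 9, 0, 0, 2],
  [1, 6, 9, 4, 0, 0],
];
const result = kMeans(vectors, 2, 5);
console.log(JSON.stringify(result.centers));
```

K and N appear here as the k and iterations arguments; in practice K is the number of personas marketing wants to target or a value suggested by data analysis.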
30
Choosing Personas
• Each cluster would usually map to one persona you can identify, name, and target
• It is common to give personas memorable names, e.g. shoe fanatic, bargain hunter, researcher, etc.
Figure: Original Unclustered Data / Clustered Data, with clusters labeled C1, C2, C3 and the annotation "Shoe Fanatic?"
31
Mapping Customer Profile to Persona
{
  Centers: [
    {name: C1, vector: [..]},
    {name: C2, vector: [..]},
    ...
  ],
  Clusters: [
    {C1: [[11, 0, 10, 0, 1, 3, ...], ...]},
    {C2: [[ 0, 5, 0, 0, 10, 0, ...], ...]},
    ...
  ]
}
{
  "_id" : ObjectId("553e7dca588ac9ef066428e0"),
  "firstName" : "John",
  ...
  "visitedCounts" : {
    "Mens watches" : 3,
    "Mens shirts" : 1,
    "Mens sunglasses" : 1,
    "Mens bags" : 2
  },
  "purchases" : [
    {"id" : 1, "desc" : "Power Oxford Dress Shoe", "category" : "Mens shoes"},
    {"id" : 2, "desc" : "Striped Sportshirt", "category" : "Mens shirts"}
  ],
  "persona" : "shoe-fanatic"
}
Loop through each vector in cluster, map to customer, and tag customer with persona
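The tagging loop described above might look like the following sketch. Two things are assumed that the slides do not show: each clustered vector is kept together with the _id of the profile it came from (customerId here, as a plain string rather than an ObjectId), and personaByCluster is a hand-made mapping chosen by marketing. The function only builds filter/update pairs in the shape of a MongoDB $set update; applying them to a collection is left out.

```javascript
// Sketch: tag every customer in a cluster with that cluster's persona.
// customerId, personaByCluster and buildPersonaUpdates are illustrative names.

const personaByCluster = { C1: "shoe-fanatic", C2: "bargain-hunter" };

const clusters = {
  C1: [{ customerId: "553e7dca588ac9ef066428e0", vector: [11, 0, 10, 0, 1, 3] }],
  C2: [{ customerId: "hypothetical-id-2", vector: [0, 5, 0, 0, 10, 0] }],
};

function buildPersonaUpdates(clusters, personaByCluster) {
  const updates = [];
  for (const [clusterName, members] of Object.entries(clusters)) {
    const persona = personaByCluster[clusterName];
    for (const member of members) {
      // One filter/update pair per customer, shaped like a MongoDB $set update.
      updates.push({
        filter: { _id: member.customerId },
        update: { $set: { persona } },
      });
    }
  }
  return updates;
}

const updates = buildPersonaUpdates(clusters, personaByCluster);
console.log(JSON.stringify(updates, null, 2));
```

Each pair could then be handed to an update call on the profiles collection, so that the persona field ends up on the profile as in the document above.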
33
Easier with a Rich Customer Profile to Personalize
{
  "_id" : ObjectId("553e7dca588ac9ef066428e0"),
  "firstName" : "John",
  "lastName" : "Doe",
  "address" : "229 W. 43rd St.",
  "city" : "New York",
  "state" : "NY",
  "zipCode" : "10036",
  "age" : 30,
  "email" : "[email protected]",
  "twitterHandle" : "johndoe",
  "gender" : "male",
  "interests" : [
    "electronics", "basketball", "weightlifting",
    "ultimate frisbee", "traveling", "technology"
  ],
  "visitedCounts" : {
    "watches" : 3,
    "shirts" : 1,
    "sunglasses" : 1,
    "bags" : 2
  },
  "purchases" : [
    {"id" : 1, "desc" : "Power Oxford Dress Shoe", "category" : "Mens shoes"},
    {"id" : 2, "desc" : "Striped Sportshirt", "category" : "Mens shirts"}
  ],
  "persona" : "shoe-fanatic"
}
35
Many Personalization Techniques to Mix & Match
• Related content
• Content history
• Next best offer
• Trigger-based
• Threshold
• Last behavior
• Time & event
• Offer matching
• Filter-based
• Crowd-sourcing
• Voice of customer
• User-directed
• Persona matching
Source: http://semphonic.blogs.com/semangel/2014/03/strategies-for-personalization-delivering-an-extra-unexpected-treat-.html
36
Alternatives Give Fewer Capabilities
Option - separate weblogs
Application
Customer Profiles (no activity)
Activity Logs

Better option
Application
Customer Profiles with Activity Tracking
Clustering & Analytics
Tag with Persona
Marketing
Can market:
• On activity today
• With rich & specific queries
37
Better Option Enables Real-time Persona Matching
1. Profile created
2. Enrich with public data
3. Capture activity
4. Clustering analysis
5. Define Personas
6. Tag with personas
7. Personalize interactions
Batch analytics
Public data
Can even match customer to a persona while customer is engaged
Logic is to calculate the distance to each cluster center and tag with the closest one’s persona
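The nearest-center logic can be sketched in a few lines. The centers and persona names below are made-up illustration data (assumed to come from an earlier batch clustering run), and matchPersona is a hypothetical function name; Euclidean distance is used as one reasonable choice of distance.

```javascript
// Sketch: match one customer vector to the persona of the closest cluster center.
// The centers and persona names below are made-up illustration data.

function euclideanDistance(a, b) {
  return Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0));
}

function matchPersona(vector, centers) {
  let closest = centers[0];
  for (const center of centers) {
    if (euclideanDistance(vector, center.vector) < euclideanDistance(vector, closest.vector)) {
      closest = center;
    }
  }
  return closest.persona;
}

const centers = [
  { name: "C1", persona: "shoe-fanatic", vector: [11.5, 0.5, 9.5, 0, 0.5, 2.5] },
  { name: "C2", persona: "bargain-hunter", vector: [0.5, 5.5, 9.5, 3.5, 0, 0] },
];

const persona = matchPersona([11, 0, 10, 0, 1, 3], centers);
console.log(persona); // shoe-fanatic
```

Since this needs only the current profile vector and the stored centers, it is cheap enough to run while the customer is still on the site, without repeating the batch clustering.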
40
High Level Personalization Process
1. Profile created
2. Enrich with public data
3. Capture activity
4. Clustering analysis
5. Define Personas
6. Tag with personas
7. Personalize interactions
Batch analytics
Public data
Common technologies
• R
• Hadoop
• Spark
• Python
• Java
• Many other options
4 & 5 performed much less often than tagging
41
Big Data Analytics Track
1. Driving Personalized Experiences Using Customer Profiles
2. Leveraging Customer Behavior to Enhance Relevancy in Personalization
3. Machine Learning to Engage the Customer, with Apache Spark, IBM Watson, and MongoDB