Optimizing Online Yield via Predictive Modeling
of Individual Site Visitors
Magnify360 Liaisons:
Olivier Chaine, Jim Healy, Nate Pool,
Gilles ?????
David Lapayowker, Marissa Quitt,
Elaine Shaver (PM), Devin Smith
HMC Advisor:
Zachary Dodds
Magnify360
Designs multiple websites for clients with each site customized to meet the needs of different types of users.
Analyzes clickstream data from site visitors in order to provide the website that will best suit each one.
The result converts a larger share of users than any single page could.
[Example screenshots: old Facebook vs. new Facebook]
System Overview
[Dataflow diagram]
User actions: the user navigates to a site; our system classifies the user from clickstream data, chooses a page, and serves it; tailored interactions lead to a "Conversion."
Results (user data, pages served, conversion data) feed offline analysis, which clusters users into groups (Musician, Pachyphile, Bioengineer, Pasadena resident, Insomniac) used by the online classifier.
Problem Statement
[Same dataflow diagram as the System Overview]
Detailed problem statement here
Clickstream Data
example columns…
Database: 80 tables, 110,000,000 rows, 13 GB
ethics ~ anonymous ~ no purchased data!
User profiles
A profile is a binary attribute that captures a specific combination of data values.
Currently 42 of them, hand-specified
insomniac something something
Tradeoffs:
+ captures experienced intuition about what is important
+ takes advantage of Magnify360's site-design expertise
- binary attributes
- may miss patterns not captured by the user profiles
[Screenshot from Mag360's site]
Conversion data
The site yield, or conversion, is client-specified:
• Amount of transaction(s)
• Time spent on (a part of) the site
• Contact information (presence and/or time of an email address)
[Example table: 3% conversion]
Goal: to determine those clusters of visitors who will be best served by (i.e., will convert via) a particular version of a client site
Offline analysis ~ user clustering
Visitors ~ vectors of profile attributes
hand-tuned clusters
decision-tree clustering
fuzzy k-means clustering
support vector machines
one big cluster ~ "best page"
growing neural gas
hierarchical clustering
Support vector machine example
Can we get one of the real data pages?
This cluster of six people responds better to site B:
Page: A, Yield: 7 · Page: A, Yield: 1 · Page: A, Yield: 1
Page: B, Yield: 3 · Page: B, Yield: 8 · Page: B, Yield: 7
page A score = (7 + 1 + 1) / 3 visits ~ 3.0
page B score = (3 + 8 + 7) / 3 visits ~ 6.0
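In code, the slide's per-page scores are just mean yields within the cluster; a minimal sketch in Python (the data is taken from the slide, the function name `best_page` is ours):

```python
from collections import defaultdict

def best_page(visits):
    """Pick the page with the highest mean yield among one cluster's visits.

    visits: list of (page, yield) pairs observed for the cluster.
    """
    totals = defaultdict(lambda: [0.0, 0])   # page -> [yield sum, visit count]
    for page, y in visits:
        totals[page][0] += y
        totals[page][1] += 1
    scores = {page: s / n for page, (s, n) in totals.items()}
    return max(scores, key=scores.get), scores

# The six-visitor cluster from the slide:
cluster = [("A", 7), ("A", 1), ("A", 1), ("B", 3), ("B", 8), ("B", 7)]
page, scores = best_page(cluster)   # page "B": scores A ~ 3.0, B ~ 6.0
```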
From clusters to sites
Training data from each cluster determines the best site (yield).
Magnify360 wants to adapt quickly to new preferences: site B has the better overall yield, but site A has had better recent performance.
Time-based site choice
Time-weighted average yields (t ~ age of data; weight 2^-t):
Page: A, Yield: 7, t: 0 · Page: A, Yield: 1, t: 3 · Page: A, Yield: 1, t: 4
Page: B, Yield: 3, t: 1 · Page: B, Yield: 8, t: 5 · Page: B, Yield: 7, t: 4
page A score = (2^0 · 7 + 2^-3 · 1 + 2^-4 · 1) / (2^0 + 2^-3 + 2^-4) ~ 6.05
page B score = (2^-1 · 3 + 2^-4 · 7 + 2^-5 · 8) / (2^-1 + 2^-4 + 2^-5) ~ 3.68
procedure
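The time-weighted scores can be reproduced in a few lines; a sketch in Python (the 2^-t weighting and the yields/ages come from the slide, everything else is ours):

```python
def weighted_score(observations):
    """Time-weighted average yield: an observation of age t gets weight 2^-t."""
    num = sum(y * 2 ** -t for y, t in observations)
    den = sum(2 ** -t for _, t in observations)
    return num / den

# (yield, age) pairs from the slide; age 0 is the newest observation.
page_a = [(7, 0), (1, 3), (1, 4)]
page_b = [(3, 1), (8, 5), (7, 4)]
round(weighted_score(page_a), 2)   # ~ 6.05
round(weighted_score(page_b), 2)   # ~ 3.68
```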
Online classification
Possible results…
all on one graph
Results ~ Packet 8
comments
what about hand-tuned system results?
talk about SVM parameters here?
A closer look…
comments
Sensitivity to scoring parameters?
comments
David's charts
Software structure
comments
Diagram
What's done and not done…
Perspective
Concluding comments
Questions?
Clickstream Data
The Good: We have DATA!
The Bad: Too much?
The Ugly: What is this data!?
~ 80 tables
~ 13 GB
One of our tables…
ID, anyone?
Fun Statistics
Data: To do
Understand the purpose of each table / column
Understand relationships between tables
Create a single table (or file) of relevant information in order to test and evaluate our clustering algorithms.
(table demodularization, against all design principles)
Clustering Algorithms
k-Means: Choose centroids at random, and assign points to clusters such that distances within clusters are minimized. Recalculate centroids and repeat until a steady state is reached.
Fuzzy k-Means: Similar, but every data point belongs to each cluster to some degree, not just in or out.
Hierarchical Clustering: Uses a bottom-up approach to merge points and clusters that are close together.
Bottom line: These clustering algorithms are simple and effective techniques for categorizing data, but they cannot exist in a vacuum; we are investigating other techniques that may be used in parallel or instead.
FuzME's best 10-cluster results ~ synthetic data
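The k-means loop described above can be sketched in pure Python (synthetic 1-D data; a real run would use a package such as FuzME or SciPy):

```python
import random

def kmeans(points, k, iters=100):
    """Plain k-means on 1-D points: random centroids, assign, recompute, repeat."""
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                      # assign each point to nearest centroid
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        new = [sum(c) / len(c) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:                  # steady state reached
            break
        centroids = new
    return centroids, clusters

pts = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = kmeans(pts, 2)
# one centroid settles near 1.0, the other near 9.5
```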
Growing Neural Gas
A clustering algorithm masquerading as a neural network. Given a data distribution, it dynamically determines nodes, or “centroids”, to represent the data.
“Dynamic” because it adds or deletes nodes as necessary, as well as adapting nodes toward changes in the data.
[Figure: User Profiles → Representative Nodes]
How it works…
Given some input x:
1. Find the closest node, s, and the next closest, t.
2. Update the error of s by εw|s – x|.
3. Shift s and its neighbors toward x, and increment the age of all of s's edges.
4. If s and t are adjacent, set the age of that edge to 0. Otherwise, create that edge.
5. Remove edges that are too old; decrease the error of all nodes by a small amount.
6. Add a node every λ generations, putting it between the node with the largest error and its largest-error neighbor.
7. Repeat!
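A rough sketch of one update step in Python, using the εw, εn, and max-edge-age parameters; node insertion (every λ inputs) and the α/β error decay are omitted, and the error update follows the slide's εw|s – x| formula:

```python
import math

def gng_step(nodes, edges, errors, x, eps_w=0.2, eps_n=0.006, max_age=50):
    """One Growing Neural Gas update for input point x.

    nodes:  list of [x, y] node positions
    edges:  dict {(i, j): age} with i < j
    errors: accumulated error per node
    """
    dist = lambda n: math.dist(n, x)
    order = sorted(range(len(nodes)), key=lambda i: dist(nodes[i]))
    s, t = order[0], order[1]                # winner and runner-up

    errors[s] += eps_w * dist(nodes[s])      # update the winner's error
    # shift the winner (and, below, its neighbors) toward x
    nodes[s] = [n + eps_w * (xi - n) for n, xi in zip(nodes[s], x)]
    for (i, j) in list(edges):
        if s in (i, j):
            edges[(i, j)] += 1               # age edges incident to s
            other = j if i == s else i
            nodes[other] = [n + eps_n * (xi - n)
                            for n, xi in zip(nodes[other], x)]
    edges[(min(s, t), max(s, t))] = 0        # refresh or create the s-t edge
    for e in [e for e, age in edges.items() if age > max_age]:
        del edges[e]                         # drop edges that are too old

nodes = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
edges = {(0, 2): 60}
errors = [0.0, 0.0, 0.0]
gng_step(nodes, edges, errors, (0.4, 0.1))
# the stale (0, 2) edge dies; a fresh (0, 1) edge links winner and runner-up
```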
A Few Parameters…
λ: Controls how frequently new nodes are inserted
Max Edge Age: Dictates how often old edges are deleted
εw: Factor to scale the value of the “winning” node
εn: Factor to scale the value of the next nearest node
α: Scale factor for decreasing the error of parent nodes
β: Scale factor for decreasing error of all nodes
(Making sense of the GUI)
… and the difference they make.
λ = 100 vs. λ = 1000:
• Larger λ: nodes inserted less often; takes longer, but yields more accurate placement of nodes
• Smaller λ: nodes inserted more often; leaves straggler nodes that don’t accurately match data
Support Vector Machines
Clearly planar
Planar in feature space
Support Vector Regression (Machine?)
Goal: Minimize error between hyper-plane and data points.
SVM: Maximize cluster separation
SVR: Minimize plane-to-data distance
Getting the correct page…
What do we want from a technique?
CLASSIFICATION: Input: User data. Output: Page to serve.
REGRESSION: Input: User data and possible page. Output: Predicted success.
Both require multiple SVMs.
Using Classification via SVMs
[Diagram: DATA → SVMs predict C, B, C → Predicted Page: C]
Using Regression via SVRs
[Diagram: DATA → Page A Predictor: 0.42 · Page B Predictor: 0.24 · Page C Predictor: 0.78 → Predicted Page: C]
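The regression route (one predictor per page, serve the argmax) could be sketched with scikit-learn's SVR; the feature vectors and yields below are hypothetical stand-ins for real clickstream-derived data:

```python
from sklearn.svm import SVR

# Hypothetical training data: 2-D visitor feature vectors and the yields
# observed when similar visitors were served each page.
X_a, y_a = [[0, 1], [1, 0], [1, 1]], [7, 1, 1]
X_b, y_b = [[0, 1], [1, 0], [1, 1]], [3, 8, 7]

# One regressor per page: each predicts the expected yield for a visitor.
predictors = {"A": SVR(kernel="rbf").fit(X_a, y_a),
              "B": SVR(kernel="rbf").fit(X_b, y_b)}

def choose_page(visitor):
    """Serve the page whose predictor forecasts the highest yield."""
    scores = {page: model.predict([visitor])[0]
              for page, model in predictors.items()}
    return max(scores, key=scores.get)
```

With more pages, the same dictionary simply grows by one fitted predictor per page.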
Goal Breakdown
Short-term Plan
Plan for Algorithm Comparison
Schedule and Conclusion
Friday, November 14: Prototype algorithm comparison method
Friday, November 21: Initial testing on real data; meeting with Magnify360
Friday, December 5: Initial composition of classification algorithms
Friday, December 12: Midyear Report
Questions?
SVM vs SVR
SVM: Maximize distance
SVR: Minimize distance
Data
The Bad, or, The Challenges:
Lots of SQL data
Some Data Tables
80 tables total…
Data Size
Problem Statement
Officially: Develop an innovative predictive modeling system to predict shopping cart abandonment based on profiles, clusters, and shopping cart contents.
Most importantly (GRAB from email!): Research and implement various AI techniques to optimize the process of matching users with websites.
Individualized Online Experiences
Classifying Users
Unsupervised clustering: points are clustered without knowledge of the results
Supervised clustering: clusters are built using prior knowledge of the results
Ethical concerns?
Recap: What Magnify360 Does
Individualize a website for different types of users
Collect data on users from their clickstream, and give them the site that will appeal to them best
Appeal to a larger base of users by making the site more interesting to a larger group
serving both!
[Image: old Facebook]