Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph
-
Upload
paul-lam -
Category
Technology
-
view
106 -
download
1
description
Transcript of Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph
![Page 1: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/1.jpg)
Customer Behaviour Analytics: Billions of Events to
one Customer-Product GraphStrata London, 12th November 2013
Presented by Paul Lam
![Page 2: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/2.jpg)
About Paul LamJoined uSwitch.com as first Data Scientist in 2010
• developed internal data products• built distributed data architecture• team of 3 with a developer and a statistician
Code contributor to various open source tools • Cascalog, a big data processing library built on top of
Cascading (comparable to Apache Pig)• Incanter, a statistical computing platform in Clojure
Author of Web Usage Mining: Data Mining Visitor Patterns From Web Server Logs* to be published in late 2014
* tentative title
![Page 3: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/3.jpg)
What is it
![Page 4: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/4.jpg)
Customer-Product Graph
{:name “Bob”}
{:name “Tom”}
{:name “Emily”}
{:name “Lily”}
{:product “iPhone”}
{:product “American Express”}
[:BUY {:timestamp 2013-11-01 13:00:00}]
boughtviewed
![Page 5: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/5.jpg)
Question: Who bought an iPhone?
{:name “Bob”}
{:name “Tom”}
{:name “Emily”}
{:name “Lily”}
{:product “iPhone”}
{:product “American Express”}
boughtviewed
![Page 6: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/6.jpg)
Query: Who bought an iPhone?
{:name “Bob”}
{:name “Tom”}
{:name “Emily”}
{:name “Lily”}
{:product “iPhone”}
{:product “American Express”}
X
START x=node:node_auto_index(product='iPhone')
MATCH (person)-[:BUY]->(x)
RETURN person
![Page 7: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/7.jpg)
Question: What else did they buy?
{:name “Bob”}
{:name “Tom”}
{:name “Emily”}
{:name “Lily”}
{:product “iPhone”}
{:product “American Express”}
X
Y
boughtviewed
![Page 8: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/8.jpg)
Query: What else did they buy?
{:name “Bob”}
{:name “Tom”}
{:name “Emily”}
{:name “Lily”}
{:product “iPhone”}
{:product “American Express”}
X
START x=node:node_auto_index(product='iPhone')
MATCH (x)<-[:BUY]-(person)-[:BUY]->(y)
RETURN y
![Page 9: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/9.jpg)
Hypothesis: People that buy X has interest in Y
{:name “Bob”}
{:name “Tom”}
{:name “Emily”}
{:name “Lily”}
{:product “iPhone”}
{:product “American Express”}
X
Y
boughtviewed
![Page 10: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/10.jpg)
Query: Who to recommend Y
{:name “Bob”}
{:name “Tom”}
{:name “Emily”}
{:name “Lily”}
{:product “iPhone”}
{:product “American Express”}
X
Y
START x=node:node_auto_index(product='iPhone'), y=node:node_auto_index(product='American Express')
MATCH (p)-[:BUY]->(x), (p)-[:VIEW]->(y)WHERE NOT (p)-[:BUY]->(y)
RETURN p
Haven’t bought AE
Lookedat AE
![Page 11: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/11.jpg)
Product Recommendation by Reasoning Example
1. Start with an idea2. Trace to connected nodes3. Identify patterns from viewpoint of those nodes4. Repeat from #1 until discovering actionable item5. Apply pattern
Interactive demo at http://bit.ly/customer_graph
![Page 12: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/12.jpg)
Challenge: Event Data to Graph Data
User ID Product ID Action
Bob iPhone Bought
Tom iPhone Bought
Emily iPhone Bought
Bob AE Bought
Emily AE Viewed
Lily AE Bought
![Page 13: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/13.jpg)
Why should you care
![Page 14: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/14.jpg)
14
Customer Journey
![Page 15: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/15.jpg)
Interest Desire
Action
Purchase Funnel
15
![Page 16: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/16.jpg)
16
Understanding Customer Intents
InterestDesire
Action
![Page 17: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/17.jpg)
17
Customer Experience as a Graph
![Page 18: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/18.jpg)
18
http://insideintercom.io/criticism-and-two-way-streets/
![Page 19: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/19.jpg)
19
A Feedback System to Drive Profit
![Page 20: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/20.jpg)
Minimise effort between Q & A
Question
Answer Data Interrogation
![Page 21: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/21.jpg)
One Approach: Make data querying easier
Query = Function(Data) [1]
~ Function(Data Structure)
[1] Figure 1.3 from Big Data (preview v11) by Nathan Marz and James Warren
![Page 22: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/22.jpg)
Data Structure: Relations versus Relations
User ID Name
1 Bob
2 Emily
Product ID Name
1 iPhone
2 A.E.
Sale ID User ID Product ID Profit
1 1 1 £100
2 1 2 £50 {:profit £100}
{:profit £50}
aka Edges
![Page 23: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/23.jpg)
Using the right database for the right task
RDBMS Graph DB
Data
Model
Relation
Example use
Attributes Entities and relations
Record-based Associative
By-product of normalisation First class citizen
Reporting Reasoning
![Page 24: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/24.jpg)
How does it work
![Page 25: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/25.jpg)
User actions as time-stamped records
HDFS
time
Paul Ingles, “User as Data”, Euroclojure 2012
![Page 26: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/26.jpg)
Our User Event to Graph Data Pipeline
Graph DatabaseHDFS Reshape
time
![Page 27: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/27.jpg)
From Input to Output with Hadoop, aka ETL step
Graph DatabaseHDFS Reshape
4... The 80%
2. input data 3. output data
1. interface
![Page 28: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/28.jpg)
Hadoop interface to Neo4J
Graph DatabaseHDFS Reshape
• Cascading-Neo4j tap [1]
• Faunus Hadoop binaries [2]
• CSV files*• etc.
[1] http://github.com/pingles/cascading.neo4j[2] http://thinkaurelius.github.io/faunus/
![Page 29: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/29.jpg)
Input data stored on HDFS
time
1 2 3 4User Timestamp Viewed Page Referrer
Paul 2013-11-01 13:00 /homepage/ google.comPaul 2013-11-01 13:01 /blog/ /homepage/User Timestamp Viewed Product Price Referrer
Paul 2013-11-01 13:04 iPhone £500 /blog/User Timestamp Purchased Paid Attrib.
Paul 2013-11-01 13:05 iPhone £500 google.comUser Landed Referral Email
Paul 2013-11-01 13:00 google.com [email protected]
12
3
4
![Page 30: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/30.jpg)
Nodes and Edges CSVs to go into a property graphNode ID Properties
1 {:name “Paul”, email: “[email protected]”}
2 {:domain “google.com”}
3 {:page “/homepage/”}
... ...
5 {:product “iPhone”}
From To Type Properties
1 2 :SOURCE {:timestamp “2013-11-01 13:00”}
1 3 :VIEWED {:timestamp “2013-11-01 13:00”}
... ... ... ...
1 5 :BOUGHT {:timestamp “2013-11-01 13:05”}
![Page 31: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/31.jpg)
31
Records to Graph in 3 Steps
1. Design graph
2. Extract Nodes
3. Build Relations
^ importable CSV
![Page 32: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/32.jpg)
32
Step 1: Designing your graph
{:name “Paul” :email “[email protected]}
{:product “iPhone”}
{:domain “google.com”}
{:page “homepage”}
{:page “blog”} :source {:timestamp ... }
:viewed { ... }
:bought {:timestamp ... }:viewed {:timestamp ... }
:viewed { ... }
![Page 33: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/33.jpg)
Step 2: Extract list of entity nodes
User Timestamp Purchased Paid Attrib.
Paul 2013-11-01 13:05 iPhone £500 google.com
User Landed Referral Email
Paul 2013-11-01 13:00 google.com [email protected]
User Timestamp Viewed Page Referrer
Paul 2013-11-01 13:00 /homepage/ google.comPaul 2013-11-01 13:01 /blog/ /homepage/
User Timestamp Viewed Product Price Referrer
Paul 2013-11-01 13:04 iPhone £500 /blog/
![Page 34: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/34.jpg)
User Landed Referral Email
Paul 2013-11-01 13:00 google.com [email protected]
Step 3: Building node-to-node relations
User Timestamp Purchased Paid Attrib.
Paul 2013-11-01 13:05 iPhone £500 google.com
User Timestamp Viewed Page Referrer
Paul 2013-11-01 13:00 /homepage/ google.comPaul 2013-11-01 13:01 /blog/ /homepage/
User Timestamp Viewed Product Price Referrer
Paul 2013-11-01 13:04 iPhone £500 /blog/
![Page 35: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/35.jpg)
Do this across all customers and products
Use your data processing tool of choice: • Apache Hive• Apache Pig• Cascading
• Scalding• Cascalog
• Spark• your favourite programming language
Paco Nathan, “The Workflow Abstraction”, Strata SC, 2013.
and more ...
![Page 36: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/36.jpg)
36
Cascalog code to build user nodes• 145 lines of Cascalog code in production• a couple hundred lines more of utility functions• build entity nodes and meta nodes• sink data into database with Cascading-Neo4j Tap
![Page 37: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/37.jpg)
37
Cascalog code to build user nodes• 145 lines of Cascalog code in production• a couple hundred lines more of utility functions• build entity nodes and meta nodes• sink data into database with Cascading-Neo4j Tap
![Page 38: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/38.jpg)
38
Code to build user to product click relations
• 160 lines of Cascalog code in production• + utility functions• build direct and categoric relations• sink data with Cascading-Neo4j Tap
![Page 39: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/39.jpg)
39
Code to build user to product click relations
• 160 lines of Cascalog code in production• + utility functions• build direct and categoric relations• sink data with Cascading-Neo4j Tap
![Page 40: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/40.jpg)
Our User Event to Graph Data Pipeline
Graph DatabaseHDFS Reshape
time
![Page 41: Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph](https://reader034.fdocuments.us/reader034/viewer/2022051819/54c69d714a7959d9668b457a/html5/thumbnails/41.jpg)
Summary
Graph HDFS Reshape