1
Electronic Commerce
2
High-Level Overview
The course presents basic algorithmic techniques considered fundamental to state-of-the-art e-commerce, as seen by executives at the CEO/CTO/VP Business Development level in B2B and B2C companies:
An introduction to the science behind Google, Amazon, and eBay.
3
High-Level Overview
Background required:
• Algorithms and basic principles of computer science
• Basic mathematical background in algebra and probability
• Exposure to the Internet
4
High-Level Overview
Discovering buyers and sellers
• Buyers finding sellers: search engines
• Sellers finding buyers: data mining, recommender systems
Making a deal: auctions
Executing the deal: payments, security
5
Searching for sellers
6
Finding Sellers
A major use of search engines is finding pages that offer an item for sale.
How do search engines find the right pages?
We'll study:
• Google's PageRank technique and other "tricks"
• "Hubs and authorities"
7
Page Rank
Intuition: solve the recursive equation: “a page is important if important pages link to it.”
In technical terms: compute the principal eigenvector of the stochastic matrix of the Web. A few fixups needed.
8
Stochastic Matrix of the Web
Enumerate pages; page i corresponds to row and column i.
M[i,j] = 1/n if page j links to n pages, including page i; 0 if j does not link to i.
Seems backwards, but allows multiplication by M on the left to represent "follow a link."
9
Example
Suppose page j links to 3 pages, including i. Then M[i,j] = 1/3.
10
Random Walks on the Web
Suppose v is a vector whose i-th component is the probability that we are at page i at a certain time.
If we follow a link from i at random, the probability distribution of the page we are then at is given by the vector Mv.
11
The multiplication Mv:

  [ p11 p12 p13 ]   [ p1 ]
  [ p21 p22 p23 ] x [ p2 ]
  [ p31 p32 p33 ]   [ p3 ]

If the probability that we are at page i is pi, then in the next iteration the probability that we are at page 1 is: the probability we are at page 1 and stay there, plus the probability we are at page 2 times the probability of moving from 2 to 1, plus the probability we are at page 3 times the probability of moving from 3 to 1:

  p11 × p1 + p12 × p2 + p13 × p3
12
Random Walks 2
Starting from any vector v, the limit M(M(…M(Mv)…)) is the distribution of page visits during a random walk.
Intuition: pages are important in proportion to how often a random walker would visit them.
The math: limiting distribution = principal eigenvector of M = PageRank.
13
Example: The Web in 1839
Three pages: Yahoo (y), Amazon (a), and M'soft (m). Yahoo links to itself and Amazon; Amazon links to Yahoo and M'soft; M'soft links to Amazon.

        y    a    m
  y [ 1/2  1/2   0  ]
  a [ 1/2   0    1  ]
  m [  0   1/2   0  ]
14
Simulating a Random Walk
Start with the vector v = [1,1,…,1] representing the idea that each Web page is given one unit of “importance.”
Repeatedly apply the matrix M to v, allowing the importance to flow like a random walk.
The limit exists, but about 50 iterations are sufficient to estimate the final distribution.
15
Example
Equations v = Mv:
  y = y/2 + a/2
  a = y/2 + m
  m = a/2

Iterating from (y, a, m) = (1, 1, 1):

  y:  1    1    5/4   9/8   ...  6/5
  a:  1   3/2    1   11/8   ...  6/5
  m:  1   1/2   3/4   1/2   ...  3/5
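The simulation described above can be sketched in a few lines; the matrix and starting vector are from the slides, the function name is ours.

```python
# A minimal power-iteration sketch for the 1839-Web example.

def power_iterate(M, v, steps=50):
    """Repeatedly replace v by Mv (M is column-stochastic)."""
    n = len(v)
    for _ in range(steps):
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v

M = [[1/2, 1/2, 0],   # y = y/2 + a/2
     [1/2, 0,   1],   # a = y/2 + m
     [0,   1/2, 0]]   # m = a/2

v = power_iterate(M, [1, 1, 1])
print(v)   # approaches (6/5, 6/5, 3/5)
```

Note that the total importance (sum of the components) stays at 3, because each column of M sums to 1.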
16
Solving The Equations
These 3 equations in 3 unknowns do not have a unique solution.
Add in the fact that y+a+m=3 to solve.
In Web-sized examples, we cannot solve by Gaussian elimination; we need another method (relaxation = iterative solution).
17
Real-World Problems
Some pages are "dead ends" (have no links out).
• Such a page causes importance to leak out.
Other (groups of) pages are spider traps (all out-links are within the group).
• Eventually spider traps absorb all importance.
18
Microsoft Becomes Dead End

Now M'soft has no out-links, so its column is all 0:

        y    a    m
  y [ 1/2  1/2   0  ]
  a [ 1/2   0    0  ]
  m [  0   1/2   0  ]
19
Example
Equations v = Mv:
  y = y/2 + a/2
  a = y/2
  m = a/2

Iterating from (1, 1, 1):

  y:  1    1    3/4   5/8   ...  0
  a:  1   1/2   1/2   3/8   ...  0
  m:  1   1/2   1/4   1/4   ...  0
20
M’soft Becomes Spider Trap
Now M'soft links only to itself:

        y    a    m
  y [ 1/2  1/2   0  ]
  a [ 1/2   0    0  ]
  m [  0   1/2   1  ]
21
Example
Equations v = Mv:
  y = y/2 + a/2
  a = y/2
  m = a/2 + m

Iterating from (1, 1, 1):

  y:  1    1    3/4   5/8   ...  0
  a:  1   1/2   1/2   3/8   ...  0
  m:  1   3/2   7/4    2    ...  3
22
Google Solution to Traps, Etc.
"Tax" each page a fixed percentage at each iteration; this percentage is also called the "damping factor."
Add the same constant to all pages.
• Models a random walk in which the surfer has a fixed probability of abandoning the search and going to a random page next.
23
Ex: Previous with 20% Tax
Equations v = 0.8(Mv) + 0.2:
  y = 0.8(y/2 + a/2) + 0.2
  a = 0.8(y/2) + 0.2
  m = 0.8(a/2 + m) + 0.2

Iterating from (1, 1, 1):

  y:  1   1.00   0.84   0.776  ...   7/11
  a:  1   0.60   0.60   0.536  ...   5/11
  m:  1   1.40   1.56   1.688  ...  21/11
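The taxed iteration is the same sketch with the 0.8/0.2 split from the slide added to every component:

```python
# Taxed iteration v' = 0.8(Mv) + 0.2 on the spider-trap matrix.

def taxed_iterate(M, v, beta=0.8, steps=100):
    n = len(v)
    for _ in range(steps):
        v = [beta * sum(M[i][j] * v[j] for j in range(n)) + (1 - beta)
             for i in range(n)]
    return v

M = [[1/2, 1/2, 0],   # y = 0.8(y/2 + a/2) + 0.2
     [1/2, 0,   0],   # a = 0.8(y/2) + 0.2
     [0,   1/2, 1]]   # m = 0.8(a/2 + m) + 0.2

y, a, m = taxed_iterate(M, [1, 1, 1])
print(y, a, m)   # approaches 7/11, 5/11, 21/11
```

Because each step is a contraction by the factor 0.8, this version converges even though m is a spider trap.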
24
Solving the Equations
We can expect to solve small examples by Gaussian elimination.
Web-sized examples still need to be solved by more complex (relaxation) methods.
25
Search-Engine Architecture
All search engines, including Google, select pages that have the words of your query.
Give more weight to words appearing in the title, header, etc.
Inverted indexes speed the discovery of pages with given words.
26
Google Anti-Spam Devices
Early search engines relied on the words on a page to tell what it is about.
• Led to "tricks" in which pages attracted attention by placing false words in the background color of the page.
Google trusts the words in anchor text.
• Relies on others telling the truth about your page, rather than relying on you.
27
Use of Page Rank
Pages are ordered by many criteria, including PageRank and the appearance of query words.
• "Important" pages are more likely to be what you want.
PageRank is also an anti-spam device.
• Creating bogus links to yourself doesn't help if you are not an important page.
28
Discussion
• Dealing with incentives
• Several types of links
• Page ranking as voting
29
Hubs and Authorities
Distinguishing Two Roles for Pages
30
Hubs and Authorities
Mutually recursive definition:
• A hub links to many authorities.
• An authority is linked to by many hubs.
Authorities turn out to be places where information can be found.
• Example: information about how to use a programming language.
Hubs tell who the authorities are.
• Example: a catalogue of sources about programming languages.
31
Transition Matrix A
H&A uses a matrix A[i,j] = 1 if page i links to page j, 0 if not.
A’, the transpose of A, is similar to the PageRank matrix M, but A’ has 1’s where M has fractions.
32
Example
Yahoo links to all three pages; Amazon links to Yahoo and M'soft; M'soft links to Amazon.

          y  a  m
    y [ 1  1  1 ]
A = a [ 1  0  1 ]
    m [ 0  1  0 ]
33
Using Matrix A for H&A
Let h and a be vectors measuring the “hubbiness” and authority of each page.
Equations: h = Aa; a = A'h.
• Hubbiness = scaled sum of authorities of linked pages.
• Authority = scaled sum of hubbiness of linked predecessors.
34
Consequences of Basic Equations
From h = Aa and a = A'h we can derive:
  h = AA'h
  a = A'Aa
Compute h and a by iteration, assuming initially each page has one unit of hubbiness and one unit of authority.
There are different normalization techniques: normalize after each iteration, or normalize once at the end.
35
The multiplication
  [ 1  1  1 ]   [ a1 ]   [ h1 ]
  [ 1  0  1 ] x [ a2 ] = [ h2 ]
  [ 0  1  0 ]   [ a3 ]   [ h3 ]
In order to know the hubbiness of page 2, h2, we need to add up the level of authority of the pages it points to (1 and 3).
36
The multiplication
  [ 1  1  0 ]   [ h1 ]   [ a1 ]
  [ 1  0  1 ] x [ h2 ] = [ a2 ]
  [ 1  1  0 ]   [ h3 ]   [ a3 ]

In order to know the level of authority of page 3, a3, we need to add up the amount of hubbiness of the pages that point to it (1 and 2).
37
Example
      [ 1 1 1 ]        [ 1 1 0 ]
  A = [ 1 0 1 ]   A' = [ 1 0 1 ]
      [ 0 1 0 ]        [ 1 1 0 ]

        [ 3 2 1 ]         [ 2 1 2 ]
  AA' = [ 2 2 0 ]   A'A = [ 1 2 1 ]
        [ 1 0 1 ]         [ 2 1 2 ]

Iterating a = A'Aa from (1, 1, 1):

  a(yahoo):   1    5   24   114   ...  1+sqrt(3)
  a(amazon):  1    4   18    84   ...  2
  a(m'soft):  1    5   24   114   ...  1+sqrt(3)

Iterating h = AA'h from (1, 1, 1), normalized at the end:

  h(yahoo):   1    6   28   132   ...  1.000
  h(amazon):  1    4   20    96   ...  0.732
  h(m'soft):  1    2    8    36   ...  0.268

(The limits are proportional to (1+sqrt(3), 2, 1+sqrt(3)) for a, and (1, sqrt(3)-1, 2-sqrt(3)) for h.)
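The H&A iteration on this example can be sketched directly; the helper names are ours, and we normalize by the largest component each round (equivalent, up to scale, to normalizing once at the end).

```python
# Sketch of the h = (AA')h iteration on the three-page example.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 1, 1],   # Yahoo links to y, a, m
     [1, 0, 1],   # Amazon links to y, m
     [0, 1, 0]]   # M'soft links to a
At = [list(row) for row in zip(*A)]    # the transpose A'
AAt = matmul(A, At)                    # [[3,2,1],[2,2,0],[1,0,1]]

h = [1.0, 1.0, 1.0]
for _ in range(50):
    h = matvec(AAt, h)
    mx = max(h)
    h = [x / mx for x in h]            # rescale to avoid overflow

print(h)   # approaches (1, sqrt(3)-1, 2-sqrt(3)) ≈ (1.000, 0.732, 0.268)
```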
38
Solving the Equations
Solution of even small examples is tricky.
As for PageRank, we need to solve big examples by relaxation.
39
Approaching potential buyers and algorithmic data mining
40
Data Mining: Associations
• Frequent itemsets, market baskets
• A-priori algorithm
• Hash-based improvements
• One- or two-pass approximations
• High-correlation mining
41
Purpose
If people tend to buy A and B together, then a buyer of A is a good target for an advertisement for B.
The same technology has other uses, such as detecting plagiarism and organizing the Web.
42
The Market-Basket Model
A large set of items, e.g., things sold in a supermarket.
A large set of baskets, each of which is a small set of the items, e.g., the things one customer buys on one day.
43
Support
Simplest question: find sets of items that appear “frequently” in the baskets.
Support for itemset I = the number of baskets containing all items in I.
Given a support threshold s, sets of items that appear in >= s baskets are called frequent itemsets.
44
Example
Items = {milk, coke, pepsi, beer, juice}. Support threshold = 3 baskets.

B1 = {m, c, b}    B2 = {m, p, j}
B3 = {m, b}       B4 = {c, j}
B5 = {m, p, b}    B6 = {m, c, b, j}
B7 = {c, b, j}    B8 = {b, c}
Frequent itemsets: {m}, {c}, {b}, {j}, {m, b}, {c, b}, {j, c}.
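The supports in this example can be checked by brute force (fine at this toy scale):

```python
# Brute-force support counting over the eight example baskets.
from itertools import combinations

baskets = [{'m','c','b'}, {'m','p','j'}, {'m','b'}, {'c','j'},
           {'m','p','b'}, {'m','c','b','j'}, {'c','b','j'}, {'b','c'}]

def support(itemset):
    """Number of baskets containing every item of the itemset."""
    return sum(1 for b in baskets if set(itemset) <= b)

items = sorted(set().union(*baskets))          # ['b', 'c', 'j', 'm', 'p']
s = 3
frequent_singletons = [i for i in items if support({i}) >= s]
frequent_pairs = [p for p in combinations(items, 2) if support(p) >= s]
print(frequent_singletons)   # ['b', 'c', 'j', 'm']
print(frequent_pairs)        # [('b', 'c'), ('b', 'm'), ('c', 'j')]
```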
45
Applications 1
Real market baskets: chain stores keep terabytes of information about what customers buy together.
• Tells how typical customers navigate stores, lets them position tempting items.
• Suggests tie-in "tricks," e.g., run a sale on hamburger and raise the price of ketchup.
High support needed, or no $$'s.
46
Applications 2
"Baskets" = documents; "items" = words in those documents.
• Lets us find words that appear together unusually frequently, i.e., linked concepts.
"Baskets" = sentences; "items" = documents containing those sentences.
• Items that appear together too often could represent plagiarism.
47
Applications 3
"Baskets" = Web pages; "items" = linked pages.
• Pairs of pages with many common references may be about the same topic.
"Baskets" = Web pages p; "items" = pages that link to p.
• Pages with many of the same links may be mirrors or about the same topic.
48
Scale of Problem
WalMart sells 100,000 items and can store hundreds of millions of baskets.
The Web has 100,000,000 words and several billion pages.
49
Association Rules
If-then rules about the contents of baskets.
{i1, i2, …, ik} -> j
• Means: "if a basket contains all of i1, …, ik, then it is likely to contain j."
Confidence of this association rule is the probability of j given i1, …, ik.
50
Example
B1 = {m, c, b}    B2 = {m, p, j}
B3 = {m, b}       B4 = {c, j}
B5 = {m, p, b}    B6 = {m, c, b, j}
B7 = {c, b, j}    B8 = {b, c}

An association rule: {m, b} -> c. Confidence = 2/4 = 50%.
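The confidence computation is a direct ratio of support counts, checked here on the slide's rule:

```python
# Confidence of an association rule over the same eight baskets.

baskets = [{'m','c','b'}, {'m','p','j'}, {'m','b'}, {'c','j'},
           {'m','p','b'}, {'m','c','b','j'}, {'c','b','j'}, {'b','c'}]

def confidence(lhs, rhs):
    """P(rhs in basket | lhs subset of basket)."""
    lhs = set(lhs)
    n_lhs  = sum(1 for b in baskets if lhs <= b)
    n_both = sum(1 for b in baskets if lhs | {rhs} <= b)
    return n_both / n_lhs

print(confidence({'m', 'b'}, 'c'))   # 0.5
```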
51
Finding Association Rules
A typical question is “find all association rules with support >= s and confidence >= c.”
The hard part is finding the high-support itemsets.
• Once you have those, checking the confidence of association rules involving those sets is relatively easy.
52
Computation Model
Typically, data is kept in a "flat file" rather than a database system.
• Stored on disk, basket-by-basket.
• Expand baskets into pairs, triples, etc. as you read them.
True cost = # of disk I/O's; count # of passes through the data.
53
Main-Memory Bottleneck
In many algorithms to find frequent itemsets we need to worry about how main memory is used.
• As we read baskets, we need to count something, e.g., occurrences of pairs.
• The number of different things we can count is limited by main memory.
• Swapping counts in/out is a disaster.
54
Finding Frequent Pairs
The hardest problem often turns out to be finding the frequent pairs.
We’ll concentrate on how to do that, then discuss extensions to finding frequent triples, etc.
55
Naïve Algorithm
A simple way to find frequent pairs is:
• Read the file once, counting in main memory the occurrences of each pair.
• Expand each basket of n items into its n(n-1)/2 pairs.
Fails if #items squared exceeds main memory.
56
A-Priori Algorithm 1
A two-pass approach called a-priori limits the need for main memory.
Key idea: monotonicity: if a set of items appears at least s times, so does every subset.
• Contrapositive for pairs: if item i does not appear in s baskets, then no pair including i can appear in s baskets.
57
A-Priori Algorithm 2
Pass 1: Read baskets and count in main memory the occurrences of each item.
• Requires only memory proportional to #items.
Pass 2: Read baskets again and count in main memory only those pairs both of whose items were found in Pass 1 to have occurred at least s times.
• Requires memory proportional to the square of the number of frequent items only.
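The two passes can be sketched on the earlier example baskets; in a real system each pass would stream the flat file from disk.

```python
# A two-pass a-priori sketch: Pass 1 counts items; Pass 2 counts only
# pairs of frequent items.
from itertools import combinations
from collections import Counter

baskets = [{'m','c','b'}, {'m','p','j'}, {'m','b'}, {'c','j'},
           {'m','p','b'}, {'m','c','b','j'}, {'c','b','j'}, {'b','c'}]
s = 3

# Pass 1: count occurrences of each item.
item_counts = Counter(i for b in baskets for i in b)
frequent = {i for i, c in item_counts.items() if c >= s}

# Pass 2: count only pairs whose items are both frequent.
pair_counts = Counter()
for b in baskets:
    for pair in combinations(sorted(b & frequent), 2):
        pair_counts[pair] += 1

frequent_pairs = {p for p, c in pair_counts.items() if c >= s}
print(frequent_pairs)   # == {('b', 'c'), ('b', 'm'), ('c', 'j')}
```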
58
Picture of A-Priori
[Figure: Pass 1 memory holds the item counts; Pass 2 memory holds the frequent-items table and the counts of candidate pairs.]
59
PCY Algorithm 1
Hash-based improvement to A-Priori.
During Pass 1 of A-Priori, most memory is idle.
• Use that memory to keep counts of buckets into which pairs of items are hashed.
• Just the count, not the pairs themselves.
Gives an extra condition that candidate pairs must satisfy on Pass 2.
60
Picture of PCY
[Figure: Pass 1 memory holds the item counts and a hash table of bucket counts; Pass 2 memory holds the frequent items, a bitmap summarizing the buckets, and the counts of candidate pairs.]
61
PCY Algorithm 2
PCY Pass 1:
• Count items.
• Hash each pair to a bucket and increment its count by 1.
PCY Pass 2:
• Summarize buckets by a bitmap: 1 = frequent (count >= s); 0 = not.
• Count only those pairs that (a) consist of two frequent items and (b) hash to a frequent bucket.
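A PCY sketch on the same toy baskets; the bucket count (11) is an arbitrary toy-sized choice, and at this scale the hash filter prunes little, but the structure is the same as at scale.

```python
# PCY: Pass 1 counts items AND hashes pairs into bucket counts;
# Pass 2 counts a pair only if both items and its bucket are frequent.
from itertools import combinations
from collections import Counter

baskets = [{'m','c','b'}, {'m','p','j'}, {'m','b'}, {'c','j'},
           {'m','p','b'}, {'m','c','b','j'}, {'c','b','j'}, {'b','c'}]
s, n_buckets = 3, 11

def bucket(pair):
    return hash(pair) % n_buckets

# Pass 1
item_counts, bucket_counts = Counter(), Counter()
for b in baskets:
    item_counts.update(b)
    for pair in combinations(sorted(b), 2):
        bucket_counts[bucket(pair)] += 1

frequent = {i for i, c in item_counts.items() if c >= s}
bitmap = {h for h, c in bucket_counts.items() if c >= s}   # frequent buckets

# Pass 2: count only the surviving candidate pairs.
pair_counts = Counter()
for b in baskets:
    for pair in combinations(sorted(b & frequent), 2):
        if bucket(pair) in bitmap:
            pair_counts[pair] += 1

frequent_pairs = {p for p, c in pair_counts.items() if c >= s}
```

A truly frequent pair can never be filtered out: its bucket's count is at least the pair's own count, hence at least s.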
62
Multistage Algorithm
Key idea: after Pass 1 of PCY, rehash only those pairs that qualify for Pass 2 of PCY.
On the middle pass, fewer pairs contribute to buckets, so there are fewer false drops --- buckets that have count >= s, yet no single pair that hashes to that bucket has count >= s.
63
Multistage Picture
[Figure: Pass 1 holds the item counts and the first hash table; the middle pass holds Bitmap 1, the frequent items, and the second hash table; the final pass holds both bitmaps, the frequent items, and the counts of candidate pairs.]
64
Finding Larger Itemsets
We may proceed beyond frequent pairs to find frequent triples, quadruples, …
• Key a-priori idea: a set of items S can only be frequent if S - {a} is frequent for all a in S.
• The k-th pass through the file counts the candidate sets of size k: those whose every immediate subset (subset of size k-1) is frequent.
• Cost is proportional to the maximum size of a frequent itemset.
65
Low-Support, High-Correlation
Finding rare, but very similar items
66
Assumptions
1. Number of items allows a small amount of main-memory/item.
2. Too many items to store anything in main-memory for each pair of items.
3. Too many baskets to store anything in main memory for each basket.
4. Data is very sparse: it is rare for an item to be in a basket.
67
Applications
While marketing may require high support, or there's no money to be made, mining customer behavior is often based on correlation rather than support.
• Example: few customers buy Handel's Water Music, but of those who do, 20% buy Bach's Brandenburg Concertos.
68
Matrix Representation
Columns = items; rows = baskets.
Entry (r, c) = 1 if item c is in basket r; = 0 if not.
Assume the matrix is almost all 0's.
69
In Matrix Form

            m  c  p  b  j
{m,c,b}     1  1  0  1  0
{m,p,b}     1  0  1  1  0
{m,b}       1  0  0  1  0
{c,j}       0  1  0  0  1
{m,p,j}     1  0  1  0  1
{m,c,b,j}   1  1  0  1  1
{c,b,j}     0  1  0  1  1
{c,b}       0  1  0  1  0
70
Similarity of Columns
Think of a column as the set of rows in which it has 1.
The similarity of columns C1 and C2, sim (C1,C2), is the ratio of the sizes of the intersection and union of C1 and C2. (Jaccard measure)
Goal of finding correlated columns becomes finding similar columns.
71
Finding similar columns
Non-trivial algorithms (e.g., minhash) are needed because of the storage problems mentioned above.
72
Summary
Finding frequent pairs: A-priori --> PCY (hashing) --> multistage.
Finding all frequent itemsets.
Finding similar pairs: minhash.
73
Clustering
74
The Problem of Clustering
Given a set of points, with a notion of distance between points, group the points into some number of clusters, so that members of a cluster are in some sense as nearby as possible.
75
Example
[Figure: a scatter of points forming three visually apparent clusters.]
76
Applications
E-business-related applications of clustering tend to involve very high-dimensional spaces.
• The problem looks deceptively easy in a 2-dimensional, Euclidean space.
77
Example: Clustering CD’s
Intuitively, music divides into categories, and customers prefer one or a few categories.
• But who's to say what the categories really are?
Represent a CD by the customers who bought it.
Similar CD’s have similar sets of customers, and vice-versa.
78
The Space of CD’s
Think of a space with one dimension for each customer; values are 0 or 1 only in each dimension.
A CD's point in this space is (x1, x2, …, xk), where xi = 1 iff the i-th customer bought the CD.
• Compare with the "correlated items" matrix: rows = customers; columns = CD's.
79
Distance Measures
Two kinds of spaces:
Euclidean: points have a location in space, and dist(x,y) = sqrt(sum of squared differences in each dimension).
• Some alternatives, e.g., Manhattan distance = sum of magnitudes of differences.
Non-Euclidean: there is a distance measure giving dist(x,y), but no "point location."
• Obeys the triangle inequality: d(x,y) <= d(x,z) + d(z,y).
• Also, d(x,x) = 0; d(x,y) > 0 for x ≠ y; d(x,y) = d(y,x).
80
Examples of Euclidean Distances
x = (5,5); y = (9,8).
L2-norm: dist(x,y) = sqrt(4^2 + 3^2) = 5.
L1-norm: dist(x,y) = 4 + 3 = 7.
81
Non-Euclidean Distances
Jaccard measure for binary vectors = ratio of intersection (of components with 1) to union.
Cosine measure = angle between vectors from the origin to the points in question.
82
Jaccard Measure
Example: p1 = 00111; p2 = 10011.
• Size of intersection = 2; size of union = 4; J.M. = 1/2.
We need a distance function satisfying the triangle inequality and other laws.
• dist(p1,p2) = 1 - J.M. works: dist(x,x) = 0, etc.
83
Cosine Measure
Think of a point as a vector from the origin (0,0,…,0) to its location.
Two points' vectors make an angle, whose cosine is the normalized dot product of the vectors.
• Example: p1 = 00111; p2 = 10011.
• p1·p2 = 2; |p1| = |p2| = sqrt(3).
• cos(p1,p2) = 2/3.
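Both measures are one-liners on bit-vectors; here they are checked on the example p1 = 00111, p2 = 10011.

```python
# Jaccard and cosine measures on the example bit-vectors.
from math import sqrt

p1 = [0, 0, 1, 1, 1]
p2 = [1, 0, 0, 1, 1]

def jaccard(x, y):
    inter = sum(1 for a, b in zip(x, y) if a and b)
    union = sum(1 for a, b in zip(x, y) if a or b)
    return inter / union

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

print(jaccard(p1, p2))   # 0.5, so Jaccard distance = 1 - 0.5 = 0.5
print(cosine(p1, p2))    # 0.666..., i.e. 2/3
```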
84
Example
[Figure: the binary vectors 001, 010, 100, 011, 101, 110 drawn as corners of a cube.]
85
Methods of Clustering
Hierarchical:
• Initially, each point is in a cluster by itself.
• Repeatedly combine the two "closest" clusters into one.
Centroid-based:
• Estimate the number of clusters and their centroids.
• Place points into the closest cluster.
86
Hierarchical Clustering
Key problem: as you build clusters, how do you represent the location of each cluster, to tell which pair of clusters is closest?
Euclidean case: each cluster has a centroid = average of its points.
• Measure intercluster distances by distances of centroids.
87
Example
[Figure: points o at (0,0), (1,2), (2,1), (4,1), (5,0), (5,3); successive cluster centroids x at (1,1), (1.5,1.5), (4.5,0.5), and (4.7,1.3).]
88
Comments
In a typical implementation the number of clusters to be reached is determined in advance (other implementations exist).
89
And in the Non-Euclidean Case?
The only "locations" we can talk about are the points themselves.
Approach 1: pick a point from the cluster to be the clustroid = the point with minimum maximum distance to the other points.
• Treat the clustroid as if it were a centroid when computing intercluster distances.
90
Example
[Figure: two clusters of points, each with a marked clustroid; the intercluster distance is measured between the two clustroids.]
91
Other Approaches
Approach 2: let the intercluster distance be the minimum of the distances between any two pairs of points, one from each cluster.
92
k-Means
Assumes Euclidean space.
Starts by picking k, the number of clusters.
Initialize the clusters by picking one point per cluster.
• For instance, pick one point at random, then k-1 other points, each as far away as possible from the previous points.
93
Populating Clusters
For each point, place it in the cluster whose centroid is nearest.
After all points are assigned, fix the centroids of the k clusters.
Reassign all points to their closest centroid.
• Sometimes this moves points between clusters.
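The assign/re-center loop can be sketched on the six points from the earlier hierarchical-clustering example; the round count and starting points are arbitrary illustrative choices (the sketch ignores the empty-cluster corner case).

```python
# A minimal k-means sketch with k = 2.
from math import dist   # Python 3.8+

points = [(0, 0), (1, 2), (2, 1), (4, 1), (5, 0), (5, 3)]
k = 2
centroids = [points[0], points[3]]           # one starting point per cluster

for _ in range(10):                          # assign / re-center rounds
    clusters = [[] for _ in range(k)]
    for p in points:                         # nearest-centroid assignment
        idx = min(range(k), key=lambda c: dist(p, centroids[c]))
        clusters[idx].append(p)
    centroids = [tuple(sum(c) / len(pts) for c in zip(*pts))
                 for pts in clusters]        # recompute each centroid

print(centroids)   # converges to (1, 1) and roughly (4.7, 1.3)
```

The converged centroids match the final x's in the earlier example figure.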
94
Example
[Figure: eight points assigned to two clusters around centroids marked x.]
95
Comments
In a typical implementation the centroid of each cluster is updated dynamically as points are added, and the reassignment of points to other clusters, based on the final centroid locations, is applied only once (other implementations exist).
96
Decision Trees
97
Example Decision Tree
[Figure: a decision tree. The root asks Married?; one branch then asks Own dog? and Own home?, the other asks Own home? and Own dog?; the leaves are labeled Good, Bad, or ??.]
98
Constructing Decision Trees
Typically, we are given data consisting of a number of records, perhaps representing individuals.
Each record has a value for each of several attributes.
• Often binary attributes, e.g., "has dog."
• Sometimes numeric, e.g., "age," or discrete, multiway, like "school attended."
99
Making a Decision
Records are classified into "good" or "bad."
• More generally: some number of outcomes.
The goal is to make a small number of tests involving attributes to decide as best we can whether a record is good or bad.
100
Using the Decision Tree
Given a record to classify, start at the root, and answer the question at the root for that record.
• E.g., is the record for a married person?
Move next to the indicated child.
Recursively apply the DT rooted at that child, until we reach a decision.
101
Training Sets
Decision-tree construction is today considered a type of “machine learning.”
We are given a training set of example records, properly classified, with which to construct our decision tree.
102
Applications
Credit-card companies and banks develop DT’s to decide whether to grant a card or loan.
Medical apps, e.g., given information about patients, decide which will benefit from a new drug.
Many others.
103
Example
Here is the data on which our example DT was based:

Married  Home  Dog  Rating
   0      1     0     G
   0      0     1     G
   0      1     1     G
   1      0     0     G
   1      0     0     B
   0      0     0     B
   1      0     1     B
   1      1     0     B
104
Selecting Attributes
We can pick an attribute to place at the root by considering how nonrandom are the sets of records that go to each side.
Branches correspond to the value of the chosen attribute.
105
Entropy: A Measure of Goodness
Consider the pools of records on the "yes" and "no" sides.
If a fraction p on a side are "good," the entropy of that branch is
  -(p log2 p + (1-p) log2(1-p)) = p log2(1/p) + (1-p) log2(1/(1-p)).
Pick the attribute that minimizes the maximum entropy of the branches.
Another (more common) alternative: pick the attribute that minimizes the weighted average of the entropies over all branches.
106
Shape of Entropy Function
[Figure: the entropy function on [0,1]; it is 0 at p = 0, rises to a maximum of 1 at p = 1/2, and falls back to 0 at p = 1.]
107
Intuition
Entropy 1 = random behavior, no useful information.
Low entropy = significant information; at entropy 0, we know exactly.
Ideally, we find an attribute such that most of the "good"s are on one side, and most of the "bad"s are on the other.
108
Example
Our Married, Home, Dog, Rating data:
010G, 001G, 011G, 100G, 100B, 000B, 101B, 110B.
Married: 1/4 of Y is G; 1/4 of N is B.
• Entropy = (1/4) log 4 + (3/4) log(4/3) = .81 on both sides.
• The average is 4/8 × .81 + 4/8 × .81 = .81.
109
Example, Continued
010G, 001G, 011G, 100G, 100B, 000B, 101B, 110B.
Dog: 1/3 of Y is B; 2/5 of N is G.
• Entropy is (1/3) log 3 + (2/3) log(3/2) = .92 on the Y side.
• Entropy is (2/5) log(5/2) + (3/5) log(5/3) = .97 on the N side.
The average, 3/8 × .92 + 5/8 × .97, is greater than for Married.
Home is similar, so Married "wins."
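These entropy numbers can be re-derived mechanically from the eight training records:

```python
# Weighted branch entropies for the (married, home, dog, rating) data.
from math import log2

data = [(0,1,0,'G'), (0,0,1,'G'), (0,1,1,'G'), (1,0,0,'G'),
        (1,0,0,'B'), (0,0,0,'B'), (1,0,1,'B'), (1,1,0,'B')]

def entropy(records):
    """Entropy of the good/bad split within one branch."""
    if not records:
        return 0.0
    p = sum(1 for r in records if r[3] == 'G') / len(records)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def weighted_entropy(attr):
    """Average entropy over the two branches of a binary attribute."""
    yes = [r for r in data if r[attr] == 1]
    no  = [r for r in data if r[attr] == 0]
    return (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(data)

MARRIED, HOME, DOG = 0, 1, 2
print(round(weighted_entropy(MARRIED), 2))   # 0.81
print(round(weighted_entropy(DOG), 2))       # 0.95, so Married wins
```

Home comes out at the same 0.95 as Dog, confirming "Home is similar."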
110
Example (Cont.)
Married  Home  Dog  Rating
   0      1     0     G
   0      0     1     G
   0      1     1     G
   0      0     0     B
111
Example (Cont.)
Entropy for Home (in the not-married branch) is 0 for Y and 1 for N, so the average entropy is 2/4 × 0 + 2/4 × 1 = 0.5.
Entropy for Dog (in the not-married branch) is also 0.5. Both attain the minimum, so we can pick either one arbitrarily.
112
Example (Cont.)
Married  Home  Dog  Rating
   1      0     0     G
   1      0     0     B
   1      0     1     B
   1      1     0     B
113
Example (Cont.)
The computation is now applied to the branch of married.
Notice that in principle a different attribute may be selected there!
We continue until all input examples (training set) are classified.
114
The "Training" Process

[Figure: the tree built from the training set. Root: Married? The "yes" branch {100G, 100B, 101B, 110B} asks Dog?: Y {101B} is Bad; N asks Home?: Y {110B} is Bad, N {100G, 100B} is ??. The "no" branch {010G, 001G, 011G, 000B} asks Home?: Y {010G, 011G} is Good; N asks Dog?: Y {001G} is Good, N {000B} is Bad.]
115
Handling Numeric Data
While complicated tests at a node are permissible, e.g., "age <= 30, or age >= 42 and age <= 50," the simplest thing is to pick one breakpoint and divide records by value <= breakpoint and value > breakpoint.
Rate an attribute and breakpoint by the min-max or average entropy of the sides.
116
Overfitting
A major problem in designing decision trees is that one tends to create too many levels.
• The number of records reaching a node is small, so significance is lost.
117
Possible Solutions
1. Limit the depth of the tree so that each decision is based on a sufficiently large pool of training data.
2. Create several trees independently (needs randomness in the choice of attribute); base the decision on a vote of the D.T.'s.
118
Selecting Products
119
Problem Statement
Select a multi-set (a set with repetition counts) of products, subject to certain constraints, that maximizes profit.
120
Essence of Selling
What products do I stock in my stores?
• Constraint: capital tied up in keeping products in stores (inventory)
What products do I keep in my end-caps (checkout counters)?
• Constraint: shelf space
What paid listings do I show first in a search?
• Constraint: online real estate
For a given customer, what's the best product to advertise?
• Constraint: online real estate
121
Two Scenarios
Focus on aggregate customer behavior
• Problem definition, e.g., what products do I stock in my stores?
• No information available about individual customers
Focus on individual customer personalization
122
General Framework
[Figure: a bipartite graph with person nodes X1, …, Xn on one side and product nodes P1 (M1), …, Pm (Mm) on the other; the edge from Xi to Pj is labeled E(Xi,Pj).]

Xi: person i; Pj: product j.
E(Xi,Pj): expected number of Pj that Xi buys (clicks through, etc.).
Mj: profit margin on Pj.
123
Aggregate User Case
Collapse all the Xi's into one node X:

[Figure: the single node X connected to product node Pj (Mj) by an edge labeled Dj.]

Demand: Dj = Σi E(Xi,Pj).
124
Problem Statement
Maximize Σj kj·Mj, where kj = 0, 1, 2, … is the number of units of Pj selected.
Subject to:
• Σj kj·cj <= C, where cj is the cost associated with Pj, and
• kj <= Dj (do not exceed demand).
Profit from Pj: $j = kj·Mj.
125
Example
      Margin  Demand  Cost  Margin/Cost
P1      3       12     25      12%
P2      9        3     40     22.5%
P3     10        1     55     18.2%

Constraint: total cost <= 100 (C)
Greedy (pick maximal margin/cost at each step): {P2, P2}
LP: {P3, P2}
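At this toy size the optimum can simply be checked by brute force over all feasible unit counts (a real retailer needs LP/IP solvers):

```python
# Brute-force check of the product-selection example: choose unit
# counts kj to maximize margin, subject to cost <= 100 and kj <= demand.
from itertools import product

margin = [3, 9, 10]
demand = [12, 3, 1]
cost   = [25, 40, 55]
C = 100

best_profit, best_k = 0, None
for k in product(*(range(d + 1) for d in demand)):
    if sum(ki * ci for ki, ci in zip(k, cost)) <= C:
        profit = sum(ki * mi for ki, mi in zip(k, margin))
        if profit > best_profit:
            best_profit, best_k = profit, k

print(best_profit, best_k)   # 19 with (0, 1, 1): one P2 and one P3
```

Greedy by margin/cost stops at {P2, P2} for a profit of only 18, which is why the greedy heuristic is suboptimal here.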
126
Retailers and LP
In general product selection can be set up as a linear/integer program (LP)
Retailers are giant multi-stage LP execution engines!
127
In real life…
The space of products may be too large
• E.g., Wal-Mart has millions of products to consider
All information may not be available
Implementation complexity and performance impact
• Problems too large to run in real time
Intractability
Buyers do the job of product selection
• More in line with the greedy algorithm
128
Product Selection in Retailers
If all retailers solve the same equations, why don't they all have the same products?
Product selection defines the retailer (brand).
Brand constraint: maximize profits in the future.
• E.g., Wal-Mart brand constraint: select only products that will be bought by 80% of the population
• E.g., Gucci brand constraint: select only high-value (high-margin) products
129
Example

      Margin  Demand  Cost  Margin/Cost
P1      3       12     25      12%
P2      9        3     40     22.5%
P3     10        1     55     18.2%

Constraint: total cost <= 100 (C)
Wal-Mart brand constraint, maximize turns: {P1, P1, P1, P1}
Gucci brand constraint, no low-margin products: {P3, P2}
130
Classifying Retailers
[Figure: retailers plotted on a margin-vs-turns chart: Wal-Mart and Costco at high turns and low margin, JC Penney's and Gucci at high margin and low turns, an "efficient frontier" curve through them, and a hypothetical Newco off the frontier.]
131
Online Search
• Overture
• Amazon
• Google
132
133
134
135
136
Personalization
Given customer Xi, what products do I recommend to her?
Xi is a loyal customer: purchase history available.
• Collaborative-filtering-based recommender systems
Xi is a new customer who has done certain operations on the site, like search, view products, etc.
• Assortment of techniques
Xi is a new customer: we know nothing about her.
• Mass merchandizing as in offline retailers, bestsellers, …
In practice, a combination of all of the above.
137
Personalization
Offline retail: merchandizers pick products to advertise.
• One size fits all; no personalization.
Online: millions of customers; cannot apply human merchandizing to each customer.
Algorithms that look at only one customer's data do not work well.
Heuristic: customers help each other.
• Algorithms enable this to happen!
138
Recommender Systems
[Figure: customer Xi connected to products P1, …, Pm; the edge to Pj is labeled E(Xi,Pj).]

The purchase history of Xi is available. What new products should we advertise to Xi?
Given the set of products that Xi has bought, B = {Pi1, Pi2, …, Pin},
find Pj such that E(Xi,Pj) is maximum.
139
Recommender Systems
Intuition: ask your friends what products they like.
Friends = people who have similar behavior to you.
140
141
Collaborative Filtering
• Representation of customer and product data
• Neighborhood formation (find my friends)
• Recommendation generation from the neighborhood
142
Representation
M×N customer-product matrix R: rij = 1 if Xi has bought Pj, 0 otherwise.
Issues:
• Sparsity: mostly 0's. E.g., Amazon.com has 2 million books, and less than 0.1% of the entries are 1.
• Scalability: very large data sets.
• Authority: take into account similarity between products, e.g., the paperback "Cold Mountain" is the same as the hardcover "Cold Mountain."
143
Finding Neighbors
Similar to clustering: cluster around a given customer.
First compute the similarity between customers Xa and Xb.
• Xa^ is the corresponding product vector.
Cosine measure:
• The cosine of the angle between the vectors gives the similarity.
• Sim(Xa, Xb) = Xa^ · Xb^ / (|Xa^| |Xb^|)
• See the class on clustering for examples and more info.
144
Neighborhood
Now compute the neighborhood of Xa.
Center-based:
• Select the k closest neighbors to Xa.
Centroid-based:
• Assume the j closest neighbors have been selected.
• Select the (j+1)-st neighbor by picking the customer closest to the centroid of the first j neighbors.
• Repeat for j = 1..k.
145
Generating Recommendations
From the neighborhood, among products Xa has not bought yet, pick:
• the most frequently occurring;
• a weighted average based on similarity;
• based on association rules.
See Sarwar et al., sections 1-3 (http://www-users.cs.umn.edu/~karypis/publications/Papers/PDF/ec00.pdf).
146
Example
        Shrek  Star Wars  MIB  Harry Potter  X-files
John             1         1        1
Jane      1      1         1        1
Pete      1      1
Jeff                       1                    1
Ellen     1      ?         1        ?           ?

What new movie should we recommend to Ellen?
147
Similarity Function

Use the cosine measure for similarity:

        Shrek  Star Wars  MIB  Harry Potter  X-files  Similarity to Ellen
John             1         1        1                  1/sqrt(6) = 0.41
Jane      1      1         1        1                  1/sqrt(2) = 0.71
Pete      1      1                                     1/2
Jeff                       1                    1      1/2
Ellen     1                1
148
Neighbors

Use the center-based approach and pick the 3 closest neighbors: Jane (0.71), Pete (0.5), and Jeff (0.5); John (0.41) is excluded.
149
Recommendation

Among the movies Ellen has not seen, count how many of her 3 neighbors bought each:
• Star Wars: 2 (Jane, Pete)
• Harry Potter: 1 (Jane)
• X-files: 1 (Jeff)

Recommend Star Wars.
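The whole pipeline fits in a short sketch, using one consistent reading of the example table (who bought what):

```python
# Cosine similarity -> 3 closest neighbors -> most-frequent unseen movie.
from math import sqrt

movies = ['Shrek', 'Star Wars', 'MIB', 'Harry Potter', 'X-files']
bought = {
    'John': {'Star Wars', 'MIB', 'Harry Potter'},
    'Jane': {'Shrek', 'Star Wars', 'MIB', 'Harry Potter'},
    'Pete': {'Shrek', 'Star Wars'},
    'Jeff': {'MIB', 'X-files'},
}
ellen = {'Shrek', 'MIB'}

def sim(a, b):                       # cosine measure on 0/1 vectors
    return len(a & b) / (sqrt(len(a)) * sqrt(len(b)))

# Center-based neighborhood: the 3 customers most similar to Ellen.
neighbors = sorted(bought, key=lambda c: sim(bought[c], ellen),
                   reverse=True)[:3]            # Jane, Pete, Jeff

# Recommend the unseen movie bought by the most neighbors.
counts = {m: sum(m in bought[c] for c in neighbors)
          for m in movies if m not in ellen}
print(max(counts, key=counts.get))              # Star Wars
```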
150
Implementation Issues
This is a serious application:
• Large data sizes: millions of users × millions of products
• CPU cycles
Scalability is key
• Partition the data set and the processing
Real-time vs. batch
• Real-time can lead to poor response times
• Real-time is preferable: recommend immediately after a customer purchase!
• An incremental solution is key for real time
151
Summary
Product selection is the essence of retailing.
Personalization is unique to online retailing.
• Every customer can have their own store.
The most successful personalization techniques get customers to help one another.
• Algorithms, like CF, enable this interaction.
In real life, the algorithms are complex monsters due to scaling issues, repeated tweaking, etc.
152
Public-Key Cryptosystems
153
Public-Key Cryptosystems
M – message (treated as a number)
E – encryption procedure
D – decryption procedure
Required properties:
1. D(E(M)) = M
2. E and D are easy to compute
3. Revealing E does not reveal an easy way to compute D
4. E(D(M)) = M
154
Public-Key Cryptosystems
Two users: A(lice) and B(ob).
A and B publicly announce EA and EB, respectively.
B sends a private message M to A as EA(M).
A deciphers the message by computing DA(EA(M)).
Signature by B on a message M to be sent to A:
• B computes S = DB(M) (B can, for example, add its name to M)
• B sends EA(S) to A
155
Public-Key Cryptosystems
Given the signed message S, A can find the original message M by computing EB(S).
B cannot deny sending the message M to A, because no one else could generate S.
A cannot change M to M' and claim it was sent, since A would have to generate a corresponding signature S' = DB(M').
156
RSA
The public key is a pair (e, n) of positive integers.
A message M is treated as an integer between 0 and n-1.
C = E(M) = M^e (mod n)
D(C) = C^d (mod n)
We need an appropriate decryption key:
1. Choose n = pq, where p and q are very large random primes.
2. Pick an integer d that is relatively prime to (p-1)(q-1), i.e., gcd(d, (p-1)(q-1)) = 1.
3. Pick e such that ed = 1 (mod (p-1)(q-1)).
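A toy instance makes the arithmetic concrete. We use the small textbook primes p = 61, q = 53 (real keys use very large random primes), and pick e first and derive d, which is equivalent to the slide's order.

```python
# Toy RSA: key setup, encryption/decryption, and signing.

p, q = 61, 53
n = p * q                     # 3233
phi = (p - 1) * (q - 1)       # 3120
e = 17                        # relatively prime to phi
d = pow(e, -1, phi)           # modular inverse (Python 3.8+): 2753

M = 65                        # message, 0 <= M < n
C = pow(M, e, n)              # encryption: C = M^e mod n
assert pow(C, d, n) == M      # decryption: C^d mod n recovers M

# Signing uses the keys the other way around: S = D(M), verified by E.
S = pow(M, d, n)
assert pow(S, e, n) == M
```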
157
Digital Cash
158
Digital Cash
Players: Bank (B), Vendor (V), User (A, Alice)
Four protocols:
 withdrawal (user from bank)
 spend (user at vendor)
 deposit (vendor at bank)
 transfer (user to user; will skip)
Goal of the basic schemes: avoid obvious attacks, such as “double spending”.
159
A basic scheme
Withdrawal:
1. A→B: give me $1
2. B→A: Coin = SigB[$1, “Alice”, seq#]
 (seq# is unique for every coin)
3. B deducts $1 from Alice’s account
160
A basic scheme
Spend:
1. A→V: buy something for $1
2. V→A: choose random r ∈ {0,1}^128
3. A→V: SigA[r, coin] = Vcoin
4. V verifies Alice’s signature and releases the good
161
A basic scheme
Deposit: V→B: deposit $1, Vcoin = [r, coin]
The bank stores all seq# that have been previously spent, and checks whether this sequence number has already been spent.
If not – everything is fine.
If it has been spent: who is to blame?
If V’coin = [r’, coin] is already in B’s database, then if r = r’ the vendor V is to blame with overwhelming probability, and otherwise Alice is guilty with overwhelming probability.
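The bank-side check above can be sketched as follows. The data model (coins identified by seq#, deposits carrying the vendor's random challenge r) is an assumption for illustration:

```python
# Sketch of the bank's deposit-side double-spending check.
spent = {}  # seq# -> r from the first deposit of that coin

def deposit(seq, r):
    """Decide who is credited or blamed for a deposited Vcoin = [r, coin]."""
    if seq not in spent:
        spent[seq] = r
        return "ok: credit $1 to the vendor"
    # Coin already seen: compare challenges to assign blame.
    if spent[seq] == r:
        # Same r twice: with overwhelming probability the vendor
        # replayed its own deposit.
        return "double deposit: vendor is to blame"
    # Different r: Alice answered two distinct challenges with the
    # same coin, so she double-spent it.
    return "double spend: Alice is to blame"

print(deposit(42, 0xA1))   # first deposit of coin 42
print(deposit(42, 0xA1))   # vendor replays the same deposit
print(deposit(42, 0xB2))   # Alice spent coin 42 at another vendor
```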
162
Escrow services
When one user has a good and another has a different good (or money), they may wish to make an exchange.
An escrow service takes the goods and exchanges them.
One can show that without escrow services, simple fair exchange is impossible.
The more general problem: contract signing by two parties.
163
Off-line Escrow services
We do not wish the escrow service to handle each instance of contract signing.
An off-line escrow service is used only if there is a problem.
We now describe such a service.
164
Fair exchange
E – escrow service with public key pe and private key se
A and B need to sign a contract M. The basic idea – verifiable escrow:
User A signs M – SA(M), creates
 CA = Epe[SA(M) + condition],
and PROVES (without revealing information) that CA has been built correctly.
165
Fair exchange
1. A→B: verifiable escrow CA, where the condition is e.g. that B reveals SB = SigB(M)
2. B→A: B verifies the validity of CA and sends SB = SigB(M)
3. A→B: A verifies that SB is a valid signature, and if so sends SA(M)
4. B verifies SA(M)
What happens if A aborts the protocol before sending the signature to B?
166
Fair exchange
5. If in step 4 B claims it has been cheated, then it sends CA and SB to E, who verifies that SB = SigB(M), recovers SA(M) from CA, and sends it to B.
E also sends SB to A.
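The optimistic flow and its dispute path can be simulated end to end. This sketch uses textbook RSA (toy key sizes) both for signatures and for the escrow's encryption; the zero-knowledge proof that CA is well formed – the hard part of a real verifiable escrow – is omitted, and B simply trusts it:

```python
# Simulation of the optimistic fair-exchange flow (ZK proof omitted).
from math import gcd
import hashlib

def rsa_keys(p, q):
    n, phi = p * q, (p - 1) * (q - 1)
    e = next(x for x in range(3, phi) if gcd(x, phi) == 1)
    return (e, n), (pow(e, -1, phi), n)   # (public, private)

def H(msg, n):
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def sign(msg, priv):          # S = D(h(M)) -- hash-then-sign
    d, n = priv
    return pow(H(msg, n), d, n)

def verify(msg, sig, pub):    # check E(S) == h(M)
    e, n = pub
    return pow(sig, e, n) == H(msg, n)

pe, se = rsa_keys(2003, 2011)   # escrow E (modulus larger than A's)
pa, sa = rsa_keys(1019, 1021)   # Alice
pb, sb = rsa_keys(1031, 1033)   # Bob

M = b"contract text"

# Step 1: A sends B the verifiable escrow CA = Epe[SA(M)].
SA = sign(M, sa)
CA = pow(SA, pe[0], pe[1])      # escrow-encrypted signature

# Step 2: B (trusting the omitted proof) replies with SB = SigB(M).
SB = sign(M, sb)

# Dispute path (step 5): A aborts, so B takes (CA, SB) to the escrow.
assert verify(M, SB, pb)            # escrow checks B's signature
recovered = pow(CA, se[0], se[1])   # escrow decrypts CA to get SA(M)
assert recovered == SA
assert verify(M, recovered, pa)     # B now holds A's valid signature
```

The escrow's modulus is chosen larger than Alice's so that SA fits as a plaintext; in a real protocol the two key spaces are handled by proper padding rather than by this trick.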
167
Micropayments
168
Micropayment schemes
Payment schemes that emphasize the ability to make payments of small amounts are called micropayment schemes.
Applications of micropayments include paying for each web page visited, and for each minute of music or video as it is streamed to the user.
The problem: the cost of a transaction is much higher than the worth of each transaction.
Micropayment schemes try to aggregate many small payments into fewer, larger payments, whose processing costs are relatively small.
169
Observations
Hash functions are about 100 times faster than RSA signature verification, and about 10,000 times faster than RSA signature generation.
On a typical workstation, one can sign two messages per second, verify 200 signatures per second, and compute 20,000 hash functions per second.
170
Notation
B – broker/bank
U – user
V – vendor
PK – public key
SK – secret key
h – cryptographically strong hash function (such as MD5): a very large search is required to produce an input yielding a given output, or to find two inputs producing the same output.
171
PayWord
U computes an h-chain x0, x1, …, xn, where xi = h(xi+1)
U commits to the entire chain by sending her signature on x0 to V.
Each successive payment is made by releasing the next consecutive value in the chain, which can be verified by checking that it hashes to the previous element.
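The chain construction and the vendor's per-payment check can be sketched as follows. SHA-256 stands in for the hash function, and the user's signed commitment on x0 is left out; the seed value is an arbitrary placeholder:

```python
# Sketch of a PayWord h-chain: x_i = h(x_{i+1}), root x_0 is committed.
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

n = 10
secret = b"user seed"           # placeholder for the user's secret x_n
chain = [secret]
for _ in range(n):
    chain.append(h(chain[-1]))  # hash backwards toward the root
chain.reverse()                 # now chain[i] == x_i, chain[0] == x_0

# i-th payment: user reveals x_i; vendor checks it hashes to x_{i-1}.
def accept(prev: bytes, payment: bytes) -> bool:
    return h(payment) == prev

assert accept(chain[0], chain[1])       # first micropayment
assert accept(chain[3], chain[4])       # any consecutive pair works
assert not accept(chain[0], chain[2])   # skipping a link fails the check
```

Each payment thus costs the vendor one hash evaluation – which, per the observations above, is orders of magnitude cheaper than verifying a signature.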
172
PayWord
If after i micropayments V wishes to make a deposit, it can deposit i cents by giving B the value xi and the user’s signature on x0.
B can verify the signature and iterate h i times to verify the operation.
173
User-Bank relationship
U requests an account, and gives B over a secure channel her credit card number, PKU, delivery address AU, etc.
U’s certificate will have an expiration date E, and may include further information IU.
The user’s certificate has the form CU = {B, U, AU, PKU, E, IU}SKB
174
User-Vendor Relationship
U–V relationships occur when, e.g., a user visits a web site, uses/purchases 10 pages, and then moves elsewhere.
Commitments: when U contacts a new vendor V, U computes a fresh payword chain w1, …, wn with root w0, where n is chosen to be “convenient”.
U then computes her commitment to the chain:
 M = {V, CU, w0, D, IM}SKU
where D is the current date and IM is additional information (such as the value of n).
175
User-Vendor Relationship
The commitment authorizes B to pay V for any paywords w1, …, wn that V redeems with B before date D (plus perhaps an additional day, assuming micropayments are settled by the end of the day).
Payments: assuming an agreed value for each payment (e.g. 1 cent), a payment P from U to V consists of a payword and its index.
Notice that a payment need not be signed, and it is short.
The user spends her paywords in order, starting from w1.
176
User-Vendor Relationship
Payment policy: for each commitment, vendor V is paid l cents, where (wl, l) is the corresponding payment received with the largest index.
V needs to store only the payment with the highest index. Once a user spends wi, she cannot spend wj for j < i.
177
Vendor-Bank relationships
V needs to obtain PKB.
V needs to establish a way for B to pay V.
By the end of each period (e.g. a day), V sends B a redemption message for each of B’s users.
B needs to verify the user signatures, and verify each (wl, l) payment (by l applications of h).
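The bank's per-commitment check – l applications of h from the claimed payword back to the committed root – can be sketched as follows. SHA-256 stands in for h, and the signature check on the commitment is omitted:

```python
# Sketch of the bank's redemption check for a PayWord deposit (w_l, l).
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def redeem(w0: bytes, w_l: bytes, l: int) -> int:
    """Return cents owed to the vendor: l if h^l(w_l) == w0, else 0."""
    x = w_l
    for _ in range(l):      # l applications of h should reach the root
        x = h(x)
    return l if x == w0 else 0

# Toy chain: w0 = h^5(seed), so (w_5, 5) redeems for 5 cents.
seed = b"seed"              # placeholder for the user's secret w_n
w = [seed]
for _ in range(5):
    w.append(h(w[-1]))
w0, w5 = w[-1], seed
assert redeem(w0, w5, 5) == 5
assert redeem(w0, w5, 4) == 0   # a mismatched index is rejected
```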