1
Electronic Commerce
2
High-Level Overview
The course presents basic algorithmic techniques considered fundamental to state-of-the-art e-commerce, as seen by executives at the CEO/CTO/VP Business Development level in B2B and B2C companies:
An introduction to the science behind Google, Amazon, and eBay.
3
High-Level Overview
Background required:
• Algorithms and basic principles of computer science
• Basic mathematical background in algebra and probability
• Exposure to the Internet
4
High-Level Overview
Discovering buyers and sellers
• Buyers finding sellers: search engines
• Sellers finding buyers: data mining, recommender systems
Making a deal: auctions
Executing the deal: payments, security
5
Searching for sellers
6
Finding Sellers
A major use of search engines is finding pages that offer an item for sale.
How do search engines find the right pages?
We'll study:
• Google's PageRank technique and other "tricks"
• "Hubs and authorities"
7
Page Rank
Intuition: solve the recursive equation: “a page is important if important pages link to it.”
In technical terms: compute the principal eigenvector of the stochastic matrix of the Web. A few fixups needed.
8
Stochastic Matrix of the Web
Enumerate pages; page i corresponds to row and column i.
M[i,j] = 1/n if page j links to n pages, including page i; 0 if j does not link to i.
Seems backwards, but allows multiplication by M on the left to represent "follow a link."
9
Example
Suppose page j links to 3 pages, including i. Then M[i,j] = 1/3.
10
Random Walks on the Web
Suppose v is a vector whose i-th component is the probability that we are at page i at a certain time.
If we follow a link from i at random, the probability distribution of the page we are then at is given by the vector Mv.
11
The multiplication Mv:

  [ p11 p12 p13 ]   [ p1 ]
  [ p21 p22 p23 ] x [ p2 ]
  [ p31 p32 p33 ]   [ p3 ]

If the probability that we are at page i is pi, then in the next iteration the probability that we are at page 1 is: the probability we are at page 1 and stay there, plus the probability we are at page 2 times the probability of moving from 2 to 1, plus the probability we are at page 3 times the probability of moving from 3 to 1:

  p11 × p1 + p12 × p2 + p13 × p3
12
Random Walks 2
Starting from any vector v, the limit M(M(…M(Mv)…)) is the distribution of page visits during a random walk.
Intuition: pages are important in proportion to how often a random walker would visit them.
The math: limiting distribution = principal eigenvector of M = PageRank.
13
Example: The Web in 1839
Three pages: Yahoo (y), Amazon (a), and M'soft (m). Yahoo links to itself and Amazon; Amazon links to Yahoo and M'soft; M'soft links to Amazon.

        y    a    m
  y [ 1/2  1/2   0  ]
  a [ 1/2   0    1  ]
  m [  0   1/2   0  ]
14
Simulating a Random Walk
Start with the vector v = [1,1,…,1] representing the idea that each Web page is given one unit of “importance.”
Repeatedly apply the matrix M to v, allowing the importance to flow like a random walk.
The limit exists, but about 50 iterations are sufficient to estimate the final distribution.
15
Example
Equations v = Mv:
  y = y/2 + a/2
  a = y/2 + m
  m = a/2

Iterating from (y, a, m) = (1, 1, 1):

  y:  1    1    5/4   9/8   ...  6/5
  a:  1   3/2    1   11/8   ...  6/5
  m:  1   1/2   3/4   1/2   ...  3/5
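The simulation described above can be sketched in a few lines; the matrix and starting vector are from the slides, the function name is ours.

```python
# A minimal power-iteration sketch for the 1839-Web example.

def power_iterate(M, v, steps=50):
    """Repeatedly replace v by Mv (M is column-stochastic)."""
    n = len(v)
    for _ in range(steps):
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v

M = [[1/2, 1/2, 0],   # y = y/2 + a/2
     [1/2, 0,   1],   # a = y/2 + m
     [0,   1/2, 0]]   # m = a/2

v = power_iterate(M, [1, 1, 1])
print(v)   # approaches (6/5, 6/5, 3/5)
```

Note that the total importance (sum of the components) stays at 3, because each column of M sums to 1.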
16
Solving The Equations
These 3 equations in 3 unknowns do not have a unique solution.
Add in the fact that y+a+m=3 to solve.
In Web-sized examples, we cannot solve by Gaussian elimination; we need another method (relaxation = iterative solution).
17
Real-World Problems
Some pages are "dead ends" (have no links out).
• Such a page causes importance to leak out.
Other (groups of) pages are spider traps (all out-links are within the group).
• Eventually spider traps absorb all importance.
18
Microsoft Becomes Dead End

Now M'soft has no out-links, so its column is all 0:

        y    a    m
  y [ 1/2  1/2   0  ]
  a [ 1/2   0    0  ]
  m [  0   1/2   0  ]
19
Example
Equations v = Mv:
  y = y/2 + a/2
  a = y/2
  m = a/2

Iterating from (1, 1, 1):

  y:  1    1    3/4   5/8   ...  0
  a:  1   1/2   1/2   3/8   ...  0
  m:  1   1/2   1/4   1/4   ...  0
20
M’soft Becomes Spider Trap
Now M'soft links only to itself:

        y    a    m
  y [ 1/2  1/2   0  ]
  a [ 1/2   0    0  ]
  m [  0   1/2   1  ]
21
Example
Equations v = Mv:
  y = y/2 + a/2
  a = y/2
  m = a/2 + m

Iterating from (1, 1, 1):

  y:  1    1    3/4   5/8   ...  0
  a:  1   1/2   1/2   3/8   ...  0
  m:  1   3/2   7/4    2    ...  3
22
Google Solution to Traps, Etc.
"Tax" each page a fixed percentage at each iteration; this percentage is also called the "damping factor."
Add the same constant to all pages.
• Models a random walk in which the surfer has a fixed probability of abandoning the search and going to a random page next.
23
Ex: Previous with 20% Tax
Equations v = 0.8(Mv) + 0.2:
  y = 0.8(y/2 + a/2) + 0.2
  a = 0.8(y/2) + 0.2
  m = 0.8(a/2 + m) + 0.2

Iterating from (1, 1, 1):

  y:  1   1.00   0.84   0.776  ...   7/11
  a:  1   0.60   0.60   0.536  ...   5/11
  m:  1   1.40   1.56   1.688  ...  21/11
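The taxed iteration is the same sketch with the 0.8/0.2 split from the slide added to every component:

```python
# Taxed iteration v' = 0.8(Mv) + 0.2 on the spider-trap matrix.

def taxed_iterate(M, v, beta=0.8, steps=100):
    n = len(v)
    for _ in range(steps):
        v = [beta * sum(M[i][j] * v[j] for j in range(n)) + (1 - beta)
             for i in range(n)]
    return v

M = [[1/2, 1/2, 0],   # y = 0.8(y/2 + a/2) + 0.2
     [1/2, 0,   0],   # a = 0.8(y/2) + 0.2
     [0,   1/2, 1]]   # m = 0.8(a/2 + m) + 0.2

y, a, m = taxed_iterate(M, [1, 1, 1])
print(y, a, m)   # approaches 7/11, 5/11, 21/11
```

Because each step is a contraction by the factor 0.8, this version converges even though m is a spider trap.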
24
Solving the Equations
We can expect to solve small examples by Gaussian elimination.
Web-sized examples still need to be solved by more complex (relaxation) methods.
25
Search-Engine Architecture
All search engines, including Google, select pages that have the words of your query.
Give more weight to words appearing in the title, header, etc.
Inverted indexes speed the discovery of pages with given words.
26
Google Anti-Spam Devices
Early search engines relied on the words on a page to tell what it is about.
• Led to "tricks" in which pages attracted attention by placing false words in the background color of the page.
Google trusts the words in anchor text.
• Relies on others telling the truth about your page, rather than relying on you.
27
Use of Page Rank
Pages are ordered by many criteria, including PageRank and the appearance of query words.
• "Important" pages are more likely to be what you want.
PageRank is also an anti-spam device.
• Creating bogus links to yourself doesn't help if you are not an important page.
28
Discussion
• Dealing with incentives
• Several types of links
• Page ranking as voting
29
Hubs and Authorities
Distinguishing Two Roles for Pages
30
Hubs and Authorities
Mutually recursive definition:
• A hub links to many authorities.
• An authority is linked to by many hubs.
Authorities turn out to be places where information can be found.
• Example: information about how to use a programming language.
Hubs tell who the authorities are.
• Example: a catalogue of sources about programming languages.
31
Transition Matrix A
H&A uses a matrix A[i,j] = 1 if page i links to page j, 0 if not.
A’, the transpose of A, is similar to the PageRank matrix M, but A’ has 1’s where M has fractions.
32
Example
Yahoo links to all three pages; Amazon links to Yahoo and M'soft; M'soft links to Amazon.

          y  a  m
    y [ 1  1  1 ]
A = a [ 1  0  1 ]
    m [ 0  1  0 ]
33
Using Matrix A for H&A
Let h and a be vectors measuring the “hubbiness” and authority of each page.
Equations: h = Aa; a = A'h.
• Hubbiness = scaled sum of authorities of linked pages.
• Authority = scaled sum of hubbiness of linked predecessors.
34
Consequences of Basic Equations
From h = Aa and a = A'h we can derive:
  h = AA'h
  a = A'Aa
Compute h and a by iteration, assuming initially each page has one unit of hubbiness and one unit of authority.
There are different normalization techniques: normalize after each iteration, or normalize once at the end.
35
The multiplication
  [ 1  1  1 ]   [ a1 ]   [ h1 ]
  [ 1  0  1 ] x [ a2 ] = [ h2 ]
  [ 0  1  0 ]   [ a3 ]   [ h3 ]
In order to know the hubbiness of page 2, h2, we need to add up the level of authority of the pages it points to (1 and 3).
36
The multiplication
  [ 1  1  0 ]   [ h1 ]   [ a1 ]
  [ 1  0  1 ] x [ h2 ] = [ a2 ]
  [ 1  1  0 ]   [ h3 ]   [ a3 ]

In order to know the level of authority of page 3, a3, we need to add up the amount of hubbiness of the pages that point to it (1 and 2).
37
Example
      [ 1 1 1 ]        [ 1 1 0 ]
  A = [ 1 0 1 ]   A' = [ 1 0 1 ]
      [ 0 1 0 ]        [ 1 1 0 ]

        [ 3 2 1 ]         [ 2 1 2 ]
  AA' = [ 2 2 0 ]   A'A = [ 1 2 1 ]
        [ 1 0 1 ]         [ 2 1 2 ]

Iterating a = A'Aa from (1, 1, 1):

  a(yahoo):   1    5   24   114   ...  1+sqrt(3)
  a(amazon):  1    4   18    84   ...  2
  a(m'soft):  1    5   24   114   ...  1+sqrt(3)

Iterating h = AA'h from (1, 1, 1), normalized at the end:

  h(yahoo):   1    6   28   132   ...  1.000
  h(amazon):  1    4   20    96   ...  0.732
  h(m'soft):  1    2    8    36   ...  0.268

(The limits are proportional to (1+sqrt(3), 2, 1+sqrt(3)) for a, and (1, sqrt(3)-1, 2-sqrt(3)) for h.)
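The H&A iteration on this example can be sketched directly; the helper names are ours, and we normalize by the largest component each round (equivalent, up to scale, to normalizing once at the end).

```python
# Sketch of the h = (AA')h iteration on the three-page example.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 1, 1],   # Yahoo links to y, a, m
     [1, 0, 1],   # Amazon links to y, m
     [0, 1, 0]]   # M'soft links to a
At = [list(row) for row in zip(*A)]    # the transpose A'
AAt = matmul(A, At)                    # [[3,2,1],[2,2,0],[1,0,1]]

h = [1.0, 1.0, 1.0]
for _ in range(50):
    h = matvec(AAt, h)
    mx = max(h)
    h = [x / mx for x in h]            # rescale to avoid overflow

print(h)   # approaches (1, sqrt(3)-1, 2-sqrt(3)) ≈ (1.000, 0.732, 0.268)
```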
38
Solving the Equations
Solution of even small examples is tricky.
As for PageRank, we need to solve big examples by relaxation.
39
Approaching potential buyers and algorithmic data mining
40
Data Mining: Associations
• Frequent itemsets, market baskets
• A-priori algorithm
• Hash-based improvements
• One- or two-pass approximations
• High-correlation mining
41
Purpose
If people tend to buy A and B together, then a buyer of A is a good target for an advertisement for B.
The same technology has other uses, such as detecting plagiarism and organizing the Web.
42
The Market-Basket Model
A large set of items, e.g., things sold in a supermarket.
A large set of baskets, each of which is a small set of the items, e.g., the things one customer buys on one day.
43
Support
Simplest question: find sets of items that appear “frequently” in the baskets.
Support for itemset I = the number of baskets containing all items in I.
Given a support threshold s, sets of items that appear in >= s baskets are called frequent itemsets.
44
Example
Items = {milk, coke, pepsi, beer, juice}. Support threshold = 3 baskets.

B1 = {m, c, b}    B2 = {m, p, j}
B3 = {m, b}       B4 = {c, j}
B5 = {m, p, b}    B6 = {m, c, b, j}
B7 = {c, b, j}    B8 = {b, c}
Frequent itemsets: {m}, {c}, {b}, {j}, {m, b}, {c, b}, {j, c}.
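The supports in this example can be checked by brute force (fine at this toy scale):

```python
# Brute-force support counting over the eight example baskets.
from itertools import combinations

baskets = [{'m','c','b'}, {'m','p','j'}, {'m','b'}, {'c','j'},
           {'m','p','b'}, {'m','c','b','j'}, {'c','b','j'}, {'b','c'}]

def support(itemset):
    """Number of baskets containing every item of the itemset."""
    return sum(1 for b in baskets if set(itemset) <= b)

items = sorted(set().union(*baskets))          # ['b', 'c', 'j', 'm', 'p']
s = 3
frequent_singletons = [i for i in items if support({i}) >= s]
frequent_pairs = [p for p in combinations(items, 2) if support(p) >= s]
print(frequent_singletons)   # ['b', 'c', 'j', 'm']
print(frequent_pairs)        # [('b', 'c'), ('b', 'm'), ('c', 'j')]
```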
45
Applications 1
Real market baskets: chain stores keep terabytes of information about what customers buy together.
• Tells how typical customers navigate stores, lets them position tempting items.
• Suggests tie-in "tricks," e.g., run a sale on hamburger and raise the price of ketchup.
High support needed, or no $$'s.
46
Applications 2
"Baskets" = documents; "items" = words in those documents.
• Lets us find words that appear together unusually frequently, i.e., linked concepts.
"Baskets" = sentences; "items" = documents containing those sentences.
• Items that appear together too often could represent plagiarism.
47
Applications 3
"Baskets" = Web pages; "items" = linked pages.
• Pairs of pages with many common references may be about the same topic.
"Baskets" = Web pages p; "items" = pages that link to p.
• Pages with many of the same links may be mirrors or about the same topic.
48
Scale of Problem
WalMart sells 100,000 items and can store hundreds of millions of baskets.
The Web has 100,000,000 words and several billion pages.
49
Association Rules
If-then rules about the contents of baskets.
{i1, i2, …, ik} -> j
• Means: "if a basket contains all of i1, …, ik, then it is likely to contain j."
Confidence of this association rule is the probability of j given i1, …, ik.
50
Example
B1 = {m, c, b}    B2 = {m, p, j}
B3 = {m, b}       B4 = {c, j}
B5 = {m, p, b}    B6 = {m, c, b, j}
B7 = {c, b, j}    B8 = {b, c}

An association rule: {m, b} -> c. Confidence = 2/4 = 50%.
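The confidence computation is a direct ratio of support counts, checked here on the slide's rule:

```python
# Confidence of an association rule over the same eight baskets.

baskets = [{'m','c','b'}, {'m','p','j'}, {'m','b'}, {'c','j'},
           {'m','p','b'}, {'m','c','b','j'}, {'c','b','j'}, {'b','c'}]

def confidence(lhs, rhs):
    """P(rhs in basket | lhs subset of basket)."""
    lhs = set(lhs)
    n_lhs  = sum(1 for b in baskets if lhs <= b)
    n_both = sum(1 for b in baskets if lhs | {rhs} <= b)
    return n_both / n_lhs

print(confidence({'m', 'b'}, 'c'))   # 0.5
```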
51
Finding Association Rules
A typical question is “find all association rules with support >= s and confidence >= c.”
The hard part is finding the high-support itemsets.
• Once you have those, checking the confidence of association rules involving those sets is relatively easy.
52
Computation Model
Typically, data is kept in a "flat file" rather than a database system.
• Stored on disk, basket-by-basket.
• Expand baskets into pairs, triples, etc. as you read them.
True cost = # of disk I/O's; count # of passes through the data.
53
Main-Memory Bottleneck
In many algorithms to find frequent itemsets we need to worry about how main memory is used.
• As we read baskets, we need to count something, e.g., occurrences of pairs.
• The number of different things we can count is limited by main memory.
• Swapping counts in/out is a disaster.
54
Finding Frequent Pairs
The hardest problem often turns out to be finding the frequent pairs.
We’ll concentrate on how to do that, then discuss extensions to finding frequent triples, etc.
55
Naïve Algorithm
A simple way to find frequent pairs is:
• Read the file once, counting in main memory the occurrences of each pair.
• Expand each basket of n items into its n(n-1)/2 pairs.
Fails if #items squared exceeds main memory.
56
A-Priori Algorithm 1
A two-pass approach called a-priori limits the need for main memory.
Key idea: monotonicity: if a set of items appears at least s times, so does every subset.
• Contrapositive for pairs: if item i does not appear in s baskets, then no pair including i can appear in s baskets.
57
A-Priori Algorithm 2
Pass 1: Read baskets and count in main memory the occurrences of each item.
• Requires only memory proportional to #items.
Pass 2: Read baskets again and count in main memory only those pairs both of whose items were found in Pass 1 to have occurred at least s times.
• Requires memory proportional to the square of the number of frequent items only.
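The two passes can be sketched on the earlier example baskets; in a real system each pass would stream the flat file from disk.

```python
# A two-pass a-priori sketch: Pass 1 counts items; Pass 2 counts only
# pairs of frequent items.
from itertools import combinations
from collections import Counter

baskets = [{'m','c','b'}, {'m','p','j'}, {'m','b'}, {'c','j'},
           {'m','p','b'}, {'m','c','b','j'}, {'c','b','j'}, {'b','c'}]
s = 3

# Pass 1: count occurrences of each item.
item_counts = Counter(i for b in baskets for i in b)
frequent = {i for i, c in item_counts.items() if c >= s}

# Pass 2: count only pairs whose items are both frequent.
pair_counts = Counter()
for b in baskets:
    for pair in combinations(sorted(b & frequent), 2):
        pair_counts[pair] += 1

frequent_pairs = {p for p, c in pair_counts.items() if c >= s}
print(frequent_pairs)   # == {('b', 'c'), ('b', 'm'), ('c', 'j')}
```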
58
Picture of A-Priori
[Figure: Pass 1 memory holds the item counts; Pass 2 memory holds the frequent-items table and the counts of candidate pairs.]
59
PCY Algorithm 1
Hash-based improvement to A-Priori.
During Pass 1 of A-Priori, most memory is idle.
• Use that memory to keep counts of buckets into which pairs of items are hashed.
• Just the count, not the pairs themselves.
Gives an extra condition that candidate pairs must satisfy on Pass 2.
60
Picture of PCY
[Figure: Pass 1 memory holds the item counts and a hash table of bucket counts; Pass 2 memory holds the frequent items, a bitmap summarizing the buckets, and the counts of candidate pairs.]
61
PCY Algorithm 2
PCY Pass 1:
• Count items.
• Hash each pair to a bucket and increment its count by 1.
PCY Pass 2:
• Summarize buckets by a bitmap: 1 = frequent (count >= s); 0 = not.
• Count only those pairs that (a) consist of two frequent items and (b) hash to a frequent bucket.
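A PCY sketch on the same toy baskets; the bucket count (11) is an arbitrary toy-sized choice, and at this scale the hash filter prunes little, but the structure is the same as at scale.

```python
# PCY: Pass 1 counts items AND hashes pairs into bucket counts;
# Pass 2 counts a pair only if both items and its bucket are frequent.
from itertools import combinations
from collections import Counter

baskets = [{'m','c','b'}, {'m','p','j'}, {'m','b'}, {'c','j'},
           {'m','p','b'}, {'m','c','b','j'}, {'c','b','j'}, {'b','c'}]
s, n_buckets = 3, 11

def bucket(pair):
    return hash(pair) % n_buckets

# Pass 1
item_counts, bucket_counts = Counter(), Counter()
for b in baskets:
    item_counts.update(b)
    for pair in combinations(sorted(b), 2):
        bucket_counts[bucket(pair)] += 1

frequent = {i for i, c in item_counts.items() if c >= s}
bitmap = {h for h, c in bucket_counts.items() if c >= s}   # frequent buckets

# Pass 2: count only the surviving candidate pairs.
pair_counts = Counter()
for b in baskets:
    for pair in combinations(sorted(b & frequent), 2):
        if bucket(pair) in bitmap:
            pair_counts[pair] += 1

frequent_pairs = {p for p, c in pair_counts.items() if c >= s}
```

A truly frequent pair can never be filtered out: its bucket's count is at least the pair's own count, hence at least s.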
62
Multistage Algorithm
Key idea: after Pass 1 of PCY, rehash only those pairs that qualify for Pass 2 of PCY.
On the middle pass, fewer pairs contribute to buckets, so there are fewer false drops --- buckets that have count >= s, yet no single pair that hashes to that bucket has count >= s.
63
Multistage Picture
[Figure: Pass 1 holds the item counts and the first hash table; the middle pass holds Bitmap 1, the frequent items, and the second hash table; the final pass holds both bitmaps, the frequent items, and the counts of candidate pairs.]
64
Finding Larger Itemsets
We may proceed beyond frequent pairs to find frequent triples, quadruples, …
• Key a-priori idea: a set of items S can only be frequent if S - {a} is frequent for all a in S.
• The k-th pass through the file counts the candidate sets of size k: those whose every immediate subset (subset of size k-1) is frequent.
• Cost is proportional to the maximum size of a frequent itemset.
65
Low-Support, High-Correlation
Finding rare, but very similar items
66
Assumptions
1. Number of items allows a small amount of main-memory/item.
2. Too many items to store anything in main-memory for each pair of items.
3. Too many baskets to store anything in main memory for each basket.
4. Data is very sparse: it is rare for an item to be in a basket.
67
Applications
While marketing may require high support, or there's no money to be made, mining customer behavior is often based on correlation rather than support.
• Example: few customers buy Handel's Water Music, but of those who do, 20% buy Bach's Brandenburg Concertos.
68
Matrix Representation
Columns = items; rows = baskets.
Entry (r, c) = 1 if item c is in basket r; = 0 if not.
Assume the matrix is almost all 0's.
69
In Matrix Form

            m  c  p  b  j
{m,c,b}     1  1  0  1  0
{m,p,b}     1  0  1  1  0
{m,b}       1  0  0  1  0
{c,j}       0  1  0  0  1
{m,p,j}     1  0  1  0  1
{m,c,b,j}   1  1  0  1  1
{c,b,j}     0  1  0  1  1
{c,b}       0  1  0  1  0
70
Similarity of Columns
Think of a column as the set of rows in which it has 1.
The similarity of columns C1 and C2, sim (C1,C2), is the ratio of the sizes of the intersection and union of C1 and C2. (Jaccard measure)
Goal of finding correlated columns becomes finding similar columns.
71
Finding similar columns
Non-trivial algorithms (e.g., minhash) are needed because of the storage problems mentioned above.
72
Summary
Finding frequent pairs: A-priori --> PCY (hashing) --> multistage.
Finding all frequent itemsets.
Finding similar pairs: minhash.
73
Clustering
74
The Problem of Clustering
Given a set of points, with a notion of distance between points, group the points into some number of clusters, so that members of a cluster are in some sense as nearby as possible.
75
Example
[Figure: a scatter of points forming three visually apparent clusters.]
76
Applications
E-business-related applications of clustering tend to involve very high-dimensional spaces.
• The problem looks deceptively easy in a 2-dimensional, Euclidean space.
77
Example: Clustering CD’s
Intuitively, music divides into categories, and customers prefer one or a few categories.
• But who's to say what the categories really are?
Represent a CD by the customers who bought it.
Similar CD’s have similar sets of customers, and vice-versa.
78
The Space of CD’s
Think of a space with one dimension for each customer; values are 0 or 1 only in each dimension.
A CD's point in this space is (x1, x2, …, xk), where xi = 1 iff the i-th customer bought the CD.
• Compare with the "correlated items" matrix: rows = customers; columns = CD's.
79
Distance Measures
Two kinds of spaces:
Euclidean: points have a location in space, and dist(x,y) = sqrt(sum of squared differences in each dimension).
• Some alternatives, e.g., Manhattan distance = sum of magnitudes of differences.
Non-Euclidean: there is a distance measure giving dist(x,y), but no "point location."
• Obeys the triangle inequality: d(x,y) <= d(x,z) + d(z,y).
• Also, d(x,x) = 0; d(x,y) > 0 for x ≠ y; d(x,y) = d(y,x).
80
Examples of Euclidean Distances
x = (5,5); y = (9,8).
L2-norm: dist(x,y) = sqrt(4^2 + 3^2) = 5.
L1-norm: dist(x,y) = 4 + 3 = 7.
81
Non-Euclidean Distances
Jaccard measure for binary vectors = ratio of intersection (of components with 1) to union.
Cosine measure = angle between vectors from the origin to the points in question.
82
Jaccard Measure
Example: p1 = 00111; p2 = 10011.
• Size of intersection = 2; size of union = 4; J.M. = 1/2.
We need a distance function satisfying the triangle inequality and other laws.
• dist(p1,p2) = 1 - J.M. works: dist(x,x) = 0, etc.
83
Cosine Measure
Think of a point as a vector from the origin (0,0,…,0) to its location.
Two points' vectors make an angle, whose cosine is the normalized dot product of the vectors.
• Example: p1 = 00111; p2 = 10011.
• p1·p2 = 2; |p1| = |p2| = sqrt(3).
• cos(p1,p2) = 2/3.
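Both measures are one-liners on bit-vectors; here they are checked on the example p1 = 00111, p2 = 10011.

```python
# Jaccard and cosine measures on the example bit-vectors.
from math import sqrt

p1 = [0, 0, 1, 1, 1]
p2 = [1, 0, 0, 1, 1]

def jaccard(x, y):
    inter = sum(1 for a, b in zip(x, y) if a and b)
    union = sum(1 for a, b in zip(x, y) if a or b)
    return inter / union

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

print(jaccard(p1, p2))   # 0.5, so Jaccard distance = 1 - 0.5 = 0.5
print(cosine(p1, p2))    # 0.666..., i.e. 2/3
```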
84
Example
[Figure: the binary vectors 001, 010, 100, 011, 101, 110 drawn as corners of a cube.]
85
Methods of Clustering
Hierarchical:
• Initially, each point is in a cluster by itself.
• Repeatedly combine the two "closest" clusters into one.
Centroid-based:
• Estimate the number of clusters and their centroids.
• Place points into the closest cluster.
86
Hierarchical Clustering
Key problem: as you build clusters, how do you represent the location of each cluster, to tell which pair of clusters is closest?
Euclidean case: each cluster has a centroid = average of its points.
• Measure intercluster distances by distances of centroids.
87
Example
[Figure: points o at (0,0), (1,2), (2,1), (4,1), (5,0), (5,3); successive cluster centroids x at (1,1), (1.5,1.5), (4.5,0.5), and (4.7,1.3).]
88
Comments
In a typical implementation the number of clusters to be reached is determined in advance (other implementations exist).
89
And in the Non-Euclidean Case?
The only "locations" we can talk about are the points themselves.
Approach 1: pick a point from the cluster to be the clustroid = the point with minimum maximum distance to the other points.
• Treat the clustroid as if it were a centroid when computing intercluster distances.
90
Example
[Figure: two clusters of points, each with a marked clustroid; the intercluster distance is measured between the two clustroids.]
91
Other Approaches
Approach 2: let the intercluster distance be the minimum of the distances between any two pairs of points, one from each cluster.
92
k-Means
Assumes Euclidean space.
Starts by picking k, the number of clusters.
Initialize the clusters by picking one point per cluster.
• For instance, pick one point at random, then k-1 other points, each as far away as possible from the previous points.
93
Populating Clusters
For each point, place it in the cluster whose centroid is nearest.
After all points are assigned, fix the centroids of the k clusters.
Reassign all points to their closest centroid.
• Sometimes this moves points between clusters.
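The assign/re-center loop can be sketched on the six points from the earlier hierarchical-clustering example; the round count and starting points are arbitrary illustrative choices (the sketch ignores the empty-cluster corner case).

```python
# A minimal k-means sketch with k = 2.
from math import dist   # Python 3.8+

points = [(0, 0), (1, 2), (2, 1), (4, 1), (5, 0), (5, 3)]
k = 2
centroids = [points[0], points[3]]           # one starting point per cluster

for _ in range(10):                          # assign / re-center rounds
    clusters = [[] for _ in range(k)]
    for p in points:                         # nearest-centroid assignment
        idx = min(range(k), key=lambda c: dist(p, centroids[c]))
        clusters[idx].append(p)
    centroids = [tuple(sum(c) / len(pts) for c in zip(*pts))
                 for pts in clusters]        # recompute each centroid

print(centroids)   # converges to (1, 1) and roughly (4.7, 1.3)
```

The converged centroids match the final x's in the earlier example figure.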
94
Example
[Figure: eight points assigned to two clusters around centroids marked x.]
95
Comments
In a typical implementation the centroid of each cluster is updated dynamically as points are added, and the reassignment of points to other clusters, based on the final centroid locations, is applied only once (other implementations exist).
96
Decision Trees
97
Example Decision Tree
[Figure: a decision tree. The root asks Married?; one branch then asks Own dog? and Own home?, the other asks Own home? and Own dog?; the leaves are labeled Good, Bad, or ??.]
98
Constructing Decision Trees
Typically, we are given data consisting of a number of records, perhaps representing individuals.
Each record has a value for each of several attributes.
• Often binary attributes, e.g., "has dog."
• Sometimes numeric, e.g., "age," or discrete, multiway, like "school attended."
99
Making a Decision
Records are classified into "good" or "bad."
• More generally: some number of outcomes.
The goal is to make a small number of tests involving attributes to decide as best we can whether a record is good or bad.
100
Using the Decision Tree
Given a record to classify, start at the root, and answer the question at the root for that record.
• E.g., is the record for a married person?
Move next to the indicated child.
Recursively apply the DT rooted at that child, until we reach a decision.
101
Training Sets
Decision-tree construction is today considered a type of “machine learning.”
We are given a training set of example records, properly classified, with which to construct our decision tree.
102
Applications
Credit-card companies and banks develop DT’s to decide whether to grant a card or loan.
Medical apps, e.g., given information about patients, decide which will benefit from a new drug.
Many others.
103
Example
Here is the data on which our example DT was based:

Married  Home  Dog  Rating
   0      1     0     G
   0      0     1     G
   0      1     1     G
   1      0     0     G
   1      0     0     B
   0      0     0     B
   1      0     1     B
   1      1     0     B
104
Selecting Attributes
We can pick an attribute to place at the root by considering how nonrandom are the sets of records that go to each side.
Branches correspond to the value of the chosen attribute.
105
Entropy: A Measure of Goodness
Consider the pools of records on the "yes" and "no" sides.
If a fraction p on a side are "good," the entropy of that branch is
  -(p log2 p + (1-p) log2(1-p)) = p log2(1/p) + (1-p) log2(1/(1-p)).
Pick the attribute that minimizes the maximum entropy of the branches.
Another (more common) alternative: pick the attribute that minimizes the weighted average of the entropies over all branches.
106
Shape of Entropy Function
[Figure: the entropy function on [0,1]; it is 0 at p = 0, rises to a maximum of 1 at p = 1/2, and falls back to 0 at p = 1.]
107
Intuition
Entropy 1 = random behavior, no useful information.
Low entropy = significant information; at entropy 0, we know exactly.
Ideally, we find an attribute such that most of the "good"s are on one side, and most of the "bad"s are on the other.
108
Example
Our Married, Home, Dog, Rating data:
010G, 001G, 011G, 100G, 100B, 000B, 101B, 110B.
Married: 1/4 of Y is G; 1/4 of N is B.
• Entropy = (1/4) log 4 + (3/4) log(4/3) = .81 on both sides.
• The average is 4/8 × .81 + 4/8 × .81 = .81.
109
Example, Continued
010G, 001G, 011G, 100G, 100B, 000B, 101B, 110B.
Dog: 1/3 of Y is B; 2/5 of N is G.
• Entropy is (1/3) log 3 + (2/3) log(3/2) = .92 on the Y side.
• Entropy is (2/5) log(5/2) + (3/5) log(5/3) = .97 on the N side.
The average, 3/8 × .92 + 5/8 × .97, is greater than for Married.
Home is similar, so Married "wins."
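These entropy numbers can be re-derived mechanically from the eight training records:

```python
# Weighted branch entropies for the (married, home, dog, rating) data.
from math import log2

data = [(0,1,0,'G'), (0,0,1,'G'), (0,1,1,'G'), (1,0,0,'G'),
        (1,0,0,'B'), (0,0,0,'B'), (1,0,1,'B'), (1,1,0,'B')]

def entropy(records):
    """Entropy of the good/bad split within one branch."""
    if not records:
        return 0.0
    p = sum(1 for r in records if r[3] == 'G') / len(records)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def weighted_entropy(attr):
    """Average entropy over the two branches of a binary attribute."""
    yes = [r for r in data if r[attr] == 1]
    no  = [r for r in data if r[attr] == 0]
    return (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(data)

MARRIED, HOME, DOG = 0, 1, 2
print(round(weighted_entropy(MARRIED), 2))   # 0.81
print(round(weighted_entropy(DOG), 2))       # 0.95, so Married wins
```

Home comes out at the same 0.95 as Dog, confirming "Home is similar."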
110
Example (Cont.)
Married  Home  Dog  Rating
   0      1     0     G
   0      0     1     G
   0      1     1     G
   0      0     0     B
111
Example (Cont.)
Entropy for Home (in the not-married branch) is 0 for Y and 1 for N, so the average entropy is 2/4 × 0 + 2/4 × 1 = 0.5.
Entropy for Dog (in the not-married branch) is also 0.5. Both attain the minimum, so we can pick either one arbitrarily.
112
Example (Cont.)
Married  Home  Dog  Rating
   1      0     0     G
   1      0     0     B
   1      0     1     B
   1      1     0     B
113
Example (Cont.)
The computation is now applied to the branch of married.
Notice that in principle a different attribute may be selected there!
We continue until all input examples (training set) are classified.
114
The "Training" Process

[Figure: the tree built from the training set. Root: Married? The "yes" branch {100G, 100B, 101B, 110B} asks Dog?: Y {101B} is Bad; N asks Home?: Y {110B} is Bad, N {100G, 100B} is ??. The "no" branch {010G, 001G, 011G, 000B} asks Home?: Y {010G, 011G} is Good; N asks Dog?: Y {001G} is Good, N {000B} is Bad.]
115
Handling Numeric Data
While complicated tests at a node are permissible, e.g., "age <= 30, or age >= 42 and age <= 50," the simplest thing is to pick one breakpoint and divide records by value <= breakpoint and value > breakpoint.
Rate an attribute and breakpoint by the min-max or average entropy of the sides.
116
Overfitting
A major problem in designing decision trees is that one tends to create too many levels.
• The number of records reaching a node is small, so significance is lost.
117
Possible Solutions
1. Limit the depth of the tree so that each decision is based on a sufficiently large pool of training data.
2. Create several trees independently (needs randomness in the choice of attribute); base the decision on a vote of the D.T.'s.
118
Selecting Products
119
Problem Statement
Select a multi-set (a set with repetition counts) of products, subject to certain constraints, that maximizes profit.
120
Essence of Selling
What products do I stock in my stores?
• Constraint: capital tied up in keeping products in stores (inventory)
What products do I keep in my end-caps (checkout counters)?
• Constraint: shelf space
What paid listings do I show first in a search?
• Constraint: online real estate
For a given customer, what's the best product to advertise?
• Constraint: online real estate
121
Two Scenarios
Focus on aggregate customer behavior
• Problem definition, e.g., what products do I stock in my stores?
• No information available about individual customers
Focus on individual customer personalization
122
General Framework
[Figure: a bipartite graph with person nodes X1, …, Xn on one side and product nodes P1 (M1), …, Pm (Mm) on the other; the edge from Xi to Pj is labeled E(Xi,Pj).]

Xi: person i; Pj: product j.
E(Xi,Pj): expected number of Pj that Xi buys (clicks through, etc.).
Mj: profit margin on Pj.
123
Aggregate User Case
Collapse all the Xi's into one node X:

[Figure: the single node X connected to product node Pj (Mj) by an edge labeled Dj.]

Demand: Dj = Σi E(Xi,Pj).
124
Problem Statement
Maximize Σj kj·Mj, where kj = 0, 1, 2, … is the number of units of Pj selected.
Subject to:
• Σj kj·cj <= C, where cj is the cost associated with Pj, and
• kj <= Dj (do not exceed demand).
Profit from Pj: $j = kj·Mj.
125
Example
      Margin  Demand  Cost  Margin/Cost
P1      3       12     25      12%
P2      9        3     40     22.5%
P3     10        1     55     18.2%

Constraint: total cost <= 100 (C)
Greedy (pick maximal margin/cost at each step): {P2, P2}
LP: {P3, P2}
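At this toy size the optimum can simply be checked by brute force over all feasible unit counts (a real retailer needs LP/IP solvers):

```python
# Brute-force check of the product-selection example: choose unit
# counts kj to maximize margin, subject to cost <= 100 and kj <= demand.
from itertools import product

margin = [3, 9, 10]
demand = [12, 3, 1]
cost   = [25, 40, 55]
C = 100

best_profit, best_k = 0, None
for k in product(*(range(d + 1) for d in demand)):
    if sum(ki * ci for ki, ci in zip(k, cost)) <= C:
        profit = sum(ki * mi for ki, mi in zip(k, margin))
        if profit > best_profit:
            best_profit, best_k = profit, k

print(best_profit, best_k)   # 19 with (0, 1, 1): one P2 and one P3
```

Greedy by margin/cost stops at {P2, P2} for a profit of only 18, which is why the greedy heuristic is suboptimal here.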
126
Retailers and LP
In general product selection can be set up as a linear/integer program (LP)
Retailers are giant multi-stage LP execution engines!
127
In real life…
The space of products may be too large
• E.g., Wal-Mart has millions of products to consider
All information may not be available
Implementation complexity and performance impact
• Problems too large to run in real time
Intractability
Buyers do the job of product selection
• More in line with the greedy algorithm
128
Product Selection in Retailers
If all retailers solve the same equations, why don't they all have the same products?
Product selection defines the retailer (brand).
Brand constraint: maximize profits in the future.
• E.g., Wal-Mart brand constraint: select only products that will be bought by 80% of the population
• E.g., Gucci brand constraint: select only high-value (high-margin) products
129
Example

      Margin  Demand  Cost  Margin/Cost
P1      3       12     25      12%
P2      9        3     40     22.5%
P3     10        1     55     18.2%

Constraint: total cost <= 100 (C)
Wal-Mart brand constraint, maximize turns: {P1, P1, P1, P1}
Gucci brand constraint, no low-margin products: {P3, P2}
130
Classifying Retailers
[Figure: retailers plotted on a margin-vs-turns chart: Wal-Mart and Costco at high turns and low margin, JC Penney's and Gucci at high margin and low turns, an "efficient frontier" curve through them, and a hypothetical Newco off the frontier.]
131
Online Search
• Overture
• Amazon
• Google
132
133
134
135
136
Personalization
Given customer Xi, what products do I recommend to her?
Xi is a loyal customer: purchase history available.
• Collaborative-filtering-based recommender systems
Xi is a new customer who has done certain operations on the site, like search, view products, etc.
• Assortment of techniques
Xi is a new customer: we know nothing about her.
• Mass merchandizing as in offline retailers, bestsellers, …
In practice, a combination of all of the above.
137
Personalization
Offline retail: merchandizers pick products to advertise.
• One size fits all; no personalization.
Online: millions of customers; cannot apply human merchandizing to each customer.
Algorithms that look at only one customer's data do not work well.
Heuristic: customers help each other.
• Algorithms enable this to happen!
138
Recommender Systems
[Figure: customer Xi connected to products P1, …, Pm; the edge to Pj is labeled E(Xi,Pj).]

The purchase history of Xi is available. What new products should we advertise to Xi?
Given the set of products that Xi has bought, B = {Pi1, Pi2, …, Pin},
find Pj such that E(Xi,Pj) is maximum.
139
Recommender Systems
Intuition: ask your friends what products they like.
Friends = people who have similar behavior to you.
140
141
Collaborative Filtering
• Representation of customer and product data
• Neighborhood formation (find my friends)
• Recommendation generation from the neighborhood
142
Representation
M×N customer-product matrix R: rij = 1 if Xi has bought Pj, 0 otherwise.
Issues:
• Sparsity: mostly 0's. E.g., Amazon.com has 2 million books, and less than 0.1% of the entries are 1.
• Scalability: very large data sets.
• Authority: take into account similarity between products, e.g., the paperback "Cold Mountain" is the same as the hardcover "Cold Mountain."
143
Finding Neighbors
Similar to clustering: cluster around a given customer.
First compute the similarity between customers Xa and Xb.
• Xa^ is the corresponding product vector.
Cosine measure:
• The cosine of the angle between the vectors gives the similarity.
• Sim(Xa, Xb) = Xa^ · Xb^ / (|Xa^| |Xb^|)
• See the class on clustering for examples and more info.
144
Neighborhood
Now compute the neighborhood of Xa.
Center-based:
• Select the k closest neighbors to Xa.
Centroid-based:
• Assume the j closest neighbors have been selected.
• Select the (j+1)-st neighbor by picking the customer closest to the centroid of the first j neighbors.
• Repeat for j = 1..k.
145
Generating Recommendations
From the neighborhood, among products Xa has not bought yet, pick:
• the most frequently occurring;
• a weighted average based on similarity;
• based on association rules.
See Sarwar et al., sections 1-3 (http://www-users.cs.umn.edu/~karypis/publications/Papers/PDF/ec00.pdf).
146
Example
        Shrek  Star Wars  MIB  Harry Potter  X-files
John             1         1        1
Jane      1      1         1        1
Pete      1      1
Jeff                       1                    1
Ellen     1      ?         1        ?           ?

What new movie should we recommend to Ellen?
147
Similarity Function

Use the cosine measure for similarity:

        Shrek  Star Wars  MIB  Harry Potter  X-files  Similarity to Ellen
John             1         1        1                  1/sqrt(6) = 0.41
Jane      1      1         1        1                  1/sqrt(2) = 0.71
Pete      1      1                                     1/2
Jeff                       1                    1      1/2
Ellen     1                1
148
Neighbors

Use the center-based approach and pick the 3 closest neighbors: Jane (0.71), Pete (0.5), and Jeff (0.5); John (0.41) is excluded.
149
Recommendation

Among the movies Ellen has not seen, count how many of her 3 neighbors bought each:
• Star Wars: 2 (Jane, Pete)
• Harry Potter: 1 (Jane)
• X-files: 1 (Jeff)

Recommend Star Wars.
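The whole pipeline fits in a short sketch, using one consistent reading of the example table (who bought what):

```python
# Cosine similarity -> 3 closest neighbors -> most-frequent unseen movie.
from math import sqrt

movies = ['Shrek', 'Star Wars', 'MIB', 'Harry Potter', 'X-files']
bought = {
    'John': {'Star Wars', 'MIB', 'Harry Potter'},
    'Jane': {'Shrek', 'Star Wars', 'MIB', 'Harry Potter'},
    'Pete': {'Shrek', 'Star Wars'},
    'Jeff': {'MIB', 'X-files'},
}
ellen = {'Shrek', 'MIB'}

def sim(a, b):                       # cosine measure on 0/1 vectors
    return len(a & b) / (sqrt(len(a)) * sqrt(len(b)))

# Center-based neighborhood: the 3 customers most similar to Ellen.
neighbors = sorted(bought, key=lambda c: sim(bought[c], ellen),
                   reverse=True)[:3]            # Jane, Pete, Jeff

# Recommend the unseen movie bought by the most neighbors.
counts = {m: sum(m in bought[c] for c in neighbors)
          for m in movies if m not in ellen}
print(max(counts, key=counts.get))              # Star Wars
```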
150
Implementation Issues
This is a serious application:
• Large data sizes: millions of users × millions of products
• CPU cycles
Scalability is key
• Partition the data set and the processing
Real-time vs. batch
• Real-time can lead to poor response times
• Real-time is preferable: recommend immediately after a customer purchase!
• An incremental solution is key for real time
151
Summary
Product selection is the essence of retailing.
Personalization is unique to online retailing.
• Every customer can have their own store.
The most successful personalization techniques get customers to help one another.
• Algorithms, like CF, enable this interaction.
In real life, the algorithms are complex monsters due to scaling issues, repeated tweaking, etc.
152
Public-Key Cryptosystems
153
Public-Key Cryptosystems
M – message (treated as a number)
E – encryption procedure
D – decryption procedure
Required properties:
1. D(E(M)) = M
2. E and D are easy to compute
3. Revealing E does not reveal an easy way to compute D
4. E(D(M)) = M
154
Public-Key Cryptosystems
Two users: A(lice) and B(ob).
A and B publicly announce EA and EB, respectively.
B sends a private message M to A as EA(M).
A deciphers the message by computing DA(EA(M)).
Signature by B on a message M to be sent to A:
• B computes S = DB(M) (B can, for example, add its name to M)
• B sends EA(S) to A
155
Public-Key Cryptosystems
Given the signed message S, A can find the original message M by computing EB(S).
B cannot deny sending the message M to A, because no one else could generate S.
A cannot change M to M' and claim it was sent, since A would have to generate a corresponding signature S' = DB(M').
156
RSA
The public key is a pair (e, n) of positive integers.
A message M is treated as an integer between 0 and n-1.
C = E(M) = M^e (mod n)
D(C) = C^d (mod n)
We need an appropriate decryption key:
1. Choose n = pq, where p and q are very large random primes.
2. Pick an integer d that is relatively prime to (p-1)(q-1), i.e., gcd(d, (p-1)(q-1)) = 1.
3. Pick e such that ed = 1 (mod (p-1)(q-1)).
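A toy instance makes the arithmetic concrete. We use the small textbook primes p = 61, q = 53 (real keys use very large random primes), and pick e first and derive d, which is equivalent to the slide's order.

```python
# Toy RSA: key setup, encryption/decryption, and signing.

p, q = 61, 53
n = p * q                     # 3233
phi = (p - 1) * (q - 1)       # 3120
e = 17                        # relatively prime to phi
d = pow(e, -1, phi)           # modular inverse (Python 3.8+): 2753

M = 65                        # message, 0 <= M < n
C = pow(M, e, n)              # encryption: C = M^e mod n
assert pow(C, d, n) == M      # decryption: C^d mod n recovers M

# Signing uses the keys the other way around: S = D(M), verified by E.
S = pow(M, d, n)
assert pow(S, e, n) == M
```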
157
Digital Cash
158
Digital Cash
Players: Bank (B), Vendor (V), User (A, Alice)
Four protocols:
 withdrawal (user from bank)
 spend (user at vendor)
 deposit (vendor at bank)
 transfer (user to user; will skip)
Goal of the basic schemes: avoid obvious attacks, such as “double spending”.
159
A basic scheme
Withdrawal:
1. A→B: give me $1
2. B→A: Coin = SigB[$1, “Alice”, seq#]
 (seq# is unique for every coin)
3. B deducts $1 from Alice’s account
160
A basic scheme
Spend:
1. A→V: buy something for $1
2. V→A: choose random r ∈ {0,1}^128
3. A→V: SigA[r, coin] = Vcoin
4. V verifies Alice’s signature and releases the good
161
A basic scheme
Deposit: V→B: deposit $1, Vcoin = [r, coin]
The bank stores all seq# that have been previously spent, and checks whether this sequence number has already been spent.
If not – everything is fine.
If it has been spent: who is to blame?
If V’coin = [r’, coin] is already in B’s database, then if r = r’ the vendor V is to blame with overwhelming probability, and otherwise Alice is guilty with overwhelming probability.
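The bank-side check above can be sketched as follows. The data model (coins identified by seq#, deposits carrying the vendor's random challenge r) is an assumption for illustration:

```python
# Sketch of the bank's deposit-side double-spending check.
spent = {}  # seq# -> r from the first deposit of that coin

def deposit(seq, r):
    """Decide who is credited or blamed for a deposited Vcoin = [r, coin]."""
    if seq not in spent:
        spent[seq] = r
        return "ok: credit $1 to the vendor"
    # Coin already seen: compare challenges to assign blame.
    if spent[seq] == r:
        # Same r twice: with overwhelming probability the vendor
        # replayed its own deposit.
        return "double deposit: vendor is to blame"
    # Different r: Alice answered two distinct challenges with the
    # same coin, so she double-spent it.
    return "double spend: Alice is to blame"

print(deposit(42, 0xA1))   # first deposit of coin 42
print(deposit(42, 0xA1))   # vendor replays the same deposit
print(deposit(42, 0xB2))   # Alice spent coin 42 at another vendor
```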
162
Escrow services
When one user has a good and another has a different good (or money), they may wish to make an exchange.
An escrow service takes the goods and exchanges them.
One can show that without escrow services, simple fair exchange is impossible.
The more general problem: contract signing by two parties.
163
Off-line Escrow services
We do not wish the escrow service to handle each instance of contract signing.
An off-line escrow service is used only if there is a problem.
We now describe such a service.
164
Fair exchange
E – escrow service with public key pe and private key se
A and B need to sign a contract M. The basic idea – verifiable escrow:
User A signs M – SA(M), creates
 CA = Epe[SA(M) + condition],
and PROVES (without revealing information) that CA has been built correctly.
165
Fair exchange
1. A→B: verifiable escrow CA, where the condition is e.g. that B reveals SB = SigB(M)
2. B→A: B verifies the validity of CA and sends SB = SigB(M)
3. A→B: A verifies that SB is a valid signature, and if so sends SA(M)
4. B verifies SA(M)
What happens if A aborts the protocol before sending the signature to B?
166
Fair exchange
5. If in step 4 B claims it has been cheated, then it sends CA and SB to E, who verifies that SB = SigB(M), recovers SA(M) from CA, and sends it to B.
E also sends SB to A.
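The optimistic flow and its dispute path can be simulated end to end. This sketch uses textbook RSA (toy key sizes) both for signatures and for the escrow's encryption; the zero-knowledge proof that CA is well formed – the hard part of a real verifiable escrow – is omitted, and B simply trusts it:

```python
# Simulation of the optimistic fair-exchange flow (ZK proof omitted).
from math import gcd
import hashlib

def rsa_keys(p, q):
    n, phi = p * q, (p - 1) * (q - 1)
    e = next(x for x in range(3, phi) if gcd(x, phi) == 1)
    return (e, n), (pow(e, -1, phi), n)   # (public, private)

def H(msg, n):
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def sign(msg, priv):          # S = D(h(M)) -- hash-then-sign
    d, n = priv
    return pow(H(msg, n), d, n)

def verify(msg, sig, pub):    # check E(S) == h(M)
    e, n = pub
    return pow(sig, e, n) == H(msg, n)

pe, se = rsa_keys(2003, 2011)   # escrow E (modulus larger than A's)
pa, sa = rsa_keys(1019, 1021)   # Alice
pb, sb = rsa_keys(1031, 1033)   # Bob

M = b"contract text"

# Step 1: A sends B the verifiable escrow CA = Epe[SA(M)].
SA = sign(M, sa)
CA = pow(SA, pe[0], pe[1])      # escrow-encrypted signature

# Step 2: B (trusting the omitted proof) replies with SB = SigB(M).
SB = sign(M, sb)

# Dispute path (step 5): A aborts, so B takes (CA, SB) to the escrow.
assert verify(M, SB, pb)            # escrow checks B's signature
recovered = pow(CA, se[0], se[1])   # escrow decrypts CA to get SA(M)
assert recovered == SA
assert verify(M, recovered, pa)     # B now holds A's valid signature
```

The escrow's modulus is chosen larger than Alice's so that SA fits as a plaintext; in a real protocol the two key spaces are handled by proper padding rather than by this trick.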
167
Micropayments
168
Micropayment schemes
Payment schemes that emphasize the ability to make payments of small amounts are called micropayment schemes.
Applications of micropayments include paying for each web page visited, and for each minute of music or video as it is streamed to the user.
The problem: the cost of a transaction is much higher than the worth of each transaction.
Micropayment schemes try to aggregate many small payments into fewer, larger payments, whose processing costs are relatively small.
169
Observations
Hash functions are about 100 times faster than RSA signature verification, and about 10,000 times faster than RSA signature generation.
On a typical workstation, one can sign two messages per second, verify 200 signatures per second, and compute 20,000 hash functions per second.
170
Notation
B – broker/bank
U – user
V – vendor
PK – public key
SK – secret key
h – cryptographically strong hash function (such as MD5): a very large search is required to produce an input yielding a given output, or to find two inputs producing the same output.
171
PayWord
U computes an h-chain x0, x1, …, xn, where xi = h(xi+1)
U commits to the entire chain by sending her signature on x0 to V.
Each successive payment is made by releasing the next consecutive value in the chain, which can be verified by checking that it hashes to the previous element.
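The chain construction and the vendor's per-payment check can be sketched as follows. SHA-256 stands in for the hash function, and the user's signed commitment on x0 is left out; the seed value is an arbitrary placeholder:

```python
# Sketch of a PayWord h-chain: x_i = h(x_{i+1}), root x_0 is committed.
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

n = 10
secret = b"user seed"           # placeholder for the user's secret x_n
chain = [secret]
for _ in range(n):
    chain.append(h(chain[-1]))  # hash backwards toward the root
chain.reverse()                 # now chain[i] == x_i, chain[0] == x_0

# i-th payment: user reveals x_i; vendor checks it hashes to x_{i-1}.
def accept(prev: bytes, payment: bytes) -> bool:
    return h(payment) == prev

assert accept(chain[0], chain[1])       # first micropayment
assert accept(chain[3], chain[4])       # any consecutive pair works
assert not accept(chain[0], chain[2])   # skipping a link fails the check
```

Each payment thus costs the vendor one hash evaluation – which, per the observations above, is orders of magnitude cheaper than verifying a signature.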
172
PayWord
If after i micropayments V wishes to make a deposit, it can deposit i cents by giving B the value xi and the user’s signature on x0.
B can verify the signature and iterate h i times to verify the operation.
173
User-Bank relationship
U requests an account, and gives B over a secure channel her credit card number, PKU, delivery address AU, etc.
U’s certificate will have an expiration date E, and may include further information IU.
The user’s certificate has the form CU = {B, U, AU, PKU, E, IU}SKB
174
User-Vendor Relationship
U–V relationships occur when, e.g., a user visits a web site, uses/purchases 10 pages, and then moves elsewhere.
Commitments: when U contacts a new vendor V, U computes a fresh payword chain w1, …, wn with root w0, where n is chosen to be “convenient”.
U then computes her commitment to the chain:
 M = {V, CU, w0, D, IM}SKU
where D is the current date and IM is additional information (such as the value of n).
175
User-Vendor Relationship
The commitment authorizes B to pay V for any paywords w1, …, wn that V redeems with B before date D (plus perhaps an additional day, assuming micropayments are settled by the end of the day).
Payments: assuming an agreed value for each payment (e.g. 1 cent), a payment P from U to V consists of a payword and its index.
Notice that a payment need not be signed, and it is short.
The user spends her paywords in order, starting from w1.
176
User-Vendor Relationship
Payment policy: for each commitment, vendor V is paid l cents, where (wl, l) is the corresponding payment received with the largest index.
V needs to store only the payment with the highest index. Once a user spends wi, she cannot spend wj for j < i.
177
Vendor-Bank relationships
V needs to obtain PKB.
V needs to establish a way for B to pay V.
By the end of each period (e.g. a day), V sends B a redemption message for each of B’s users.
B needs to verify the user signatures, and verify each (wl, l) payment (by l applications of h).
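The bank's per-commitment check – l applications of h from the claimed payword back to the committed root – can be sketched as follows. SHA-256 stands in for h, and the signature check on the commitment is omitted:

```python
# Sketch of the bank's redemption check for a PayWord deposit (w_l, l).
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def redeem(w0: bytes, w_l: bytes, l: int) -> int:
    """Return cents owed to the vendor: l if h^l(w_l) == w0, else 0."""
    x = w_l
    for _ in range(l):      # l applications of h should reach the root
        x = h(x)
    return l if x == w0 else 0

# Toy chain: w0 = h^5(seed), so (w_5, 5) redeems for 5 cents.
seed = b"seed"              # placeholder for the user's secret w_n
w = [seed]
for _ in range(5):
    w.append(h(w[-1]))
w0, w5 = w[-1], seed
assert redeem(w0, w5, 5) == 5
assert redeem(w0, w5, 4) == 0   # a mismatched index is rejected
```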