
1

• By Gil Kalai

• Institute of Mathematics and Center for Rationality, Hebrew University, Jerusalem, Israel

• presented by: Yair Cymbalista

2

• In this lecture we would like to study the extent to which concepts of choice used in economic theory imply “learnable” behavior.

• Learning theory, introduced and developed by Valiant, Angluin and others in the 1980's, deals with the question of how well a family of objects (functions, in this lecture) is learnable from examples. Valiant's learning concept is statistical, i.e. the examples are chosen randomly.

3

• In order to analyze learnability we will use a basic model of statistical learning theory, introduced by Valiant, called the model of PAC-learnability (PAC stands for “probably approximately correct”). It is based on choosing examples randomly according to some probability distribution.

4

• Let F = {f | f : U → Y}. Let 0 < ε, δ < 1 and let μ be a probability distribution on U. We say that F is learnable from t examples with probability of at least 1 − δ with respect to the probability distribution μ if the following assertion holds:

5

• For every f ∈ F, if u_1, u_2, ..., u_t are chosen at random and independently according to the probability distribution μ, and if f' ∈ F satisfies f'(u_i) = f(u_i), i = 1, 2, ..., t, then f'(u) = f(u) with probability of at least 1 − ε,

with probability of at least 1 − δ over the choice of the examples. We say that F is learnable from t examples with probability of at least 1 − δ if this is the case with respect to every probability distribution μ on U.

6

• Given a set X of N alternatives, a choice function c is a mapping which assigns to a nonempty subset S ⊆ X an element c(S) ∈ S.

• A “rationalizable” choice function is consistent with maximizing behavior, i.e. there is a linear ordering on the alternatives in X and c(S) is the maximum among the elements of S with respect to this ordering.
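As a concrete illustration (not from the lecture), a rationalizable choice function can be sketched in a few lines of Python; the alternative names and the ordering are made-up example data:

```python
# A rationalizable choice function: fix a linear order on X and let c(S)
# be the order-maximal element of S. The order below is hypothetical data.
def make_rationalizable(order):
    rank = {x: i for i, x in enumerate(order)}  # position 0 = most preferred
    def c(S):
        return min(S, key=lambda x: rank[x])
    return c

c = make_rationalizable(["a", "b", "c", "d"])  # order: a > b > c > d
print(c({"b", "d"}))        # -> b
print(c({"a", "c", "d"}))   # -> a
```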

7

• Rationalizable choice functions are characterized by the Independence of Irrelevant Alternatives condition (IIA): the element chosen from a set is also chosen from every subset which contains it.

• A class of choice functions is symmetric if it is invariant under permutations of the alternatives.
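The IIA condition above can be checked mechanically on a small ground set; a brute-force sketch (the function names are ours, not the lecture's):

```python
from itertools import combinations

# Enumerate all nonempty subsets of a small ground set.
def nonempty_subsets(X):
    for r in range(1, len(X) + 1):
        for S in combinations(sorted(X), r):
            yield frozenset(S)

# IIA: the element chosen from a set is also chosen from every subset
# of that set which contains it.
def satisfies_iia(c, X):
    for S in nonempty_subsets(X):
        chosen = c(S)
        for T in nonempty_subsets(S):
            if chosen in T and c(T) != chosen:
                return False
    return True

# A rationalizable choice function (maximize w.r.t. the order 1 > 2 > 3)
# satisfies IIA:
print(satisfies_iia(lambda S: min(S), {1, 2, 3}))  # -> True
```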

8

• In our lecture we will concentrate on the proofs of the next two theorems.

• Theorem 1.1. A rationalizable choice function can be statistically learned with high probability from a number of examples which is linear in the number of alternatives.

• Theorem 1.2. Every symmetric class of choice functions requires at least a number of examples which is linear in the number of alternatives for learnability in the PAC-model.

9

• The P-dimension of F = {f | f : U → Y}, denoted by dim_P F, is the maximal number s of elements u_1, u_2, ..., u_s of U and values y_1, y_2, ..., y_s of Y such that for every subset B ⊆ {1, 2, ..., s} there is a function f_B ∈ F so that f_B(u_i) = y_i if i ∈ B and f_B(u_i) ≠ y_i if i ∉ B.
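The definition above can be turned into a brute-force computation for tiny classes; a sketch (our own toy encoding of F as a list of dicts, exponential time, illustration only):

```python
from itertools import combinations, product

# Brute-force P-dimension straight from the definition: the largest s for
# which some u_1..u_s and y_1..y_s realize all 2^s membership patterns.
def p_dimension(F, U, Y):
    best = 0
    for s in range(1, len(U) + 1):
        shattered = any(
            len({tuple(f[u] == y for u, y in zip(us, ys)) for f in F}) == 2 ** s
            for us in combinations(U, s)
            for ys in product(Y, repeat=s)
        )
        if shattered:
            best = s
    return best

# All four functions from a 2-point domain to {0, 1}:
U, Y = ["u1", "u2"], [0, 1]
F = [dict(zip(U, vals)) for vals in product(Y, repeat=len(U))]
print(p_dimension(F, U, Y))  # -> 2 (and indeed 2 <= log2 |F| = 2)
```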

10

• Theorem 2.1. For a fixed value of 0 < ε, δ < 1, the number of examples t needed to learn a class of functions with probability of at least 1 − δ is bounded above and below by a linear function of the P-dimension.

11

• Proposition 2.2. Let F = {f | f : U → Y}. Then dim_P(F) ≤ log_2 |F|.

• Proof: The proof is obvious, since in order for the P-dimension of F to be s we need at least 2^s distinct functions.

12

• Let F = {f | f : U → Y} and let U' ⊆ U. Let F' = {f|_U' : f ∈ F}.

• If F can be learned from t examples with probability of at least 1 − δ, then so can F' (since we can choose μ to be supported only on U'). Therefore dim_P(F') ≤ dim_P(F).

13

• Let C be the class of rationalizable choice functions defined on nonempty subsets of a set X of alternatives, where |X| = N.

• |C| = N!

• Proposition 2.2 ⇒ dim_P C ≤ log_2 N! ≤ N log_2 N.

• Theorem 2.1 ⇒ the number of examples required to learn a rationalizable choice function in the PAC-model is O(N log N).
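A quick numeric sanity check (our own illustration) of the inequality log_2 N! ≤ N log_2 N:

```python
import math

# dim_P(C) <= log2(N!) <= N*log2(N): check the outer inequality numerically.
for N in [5, 10, 20]:
    log_fact = math.log2(math.factorial(N))
    print(N, round(log_fact, 1), round(N * math.log2(N), 1))
    assert log_fact <= N * math.log2(N)   # since N! <= N^N
```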

14

• Theorem 3.1. dim_P C = N − 1.

• Proof: We will first show that dim_P C ≥ N − 1. Consider the elements u_i = {a_1, a_{i+1}} with values y_i = a_1, 1 ≤ i ≤ N − 1. For each B ⊆ {u_1, u_2, ..., u_{N−1}}, an order relation which satisfies a_1 > a_{i+1} if u_i ∈ B, and a_{i+1} > a_1 if u_i ∉ B, gives an appropriate function f_B.
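This lower-bound construction can be checked exhaustively for a small N; a sketch in which we label the alternatives a_1, ..., a_N by the integers 1, ..., N (our own encoding):

```python
from itertools import combinations

# For every B ⊆ {1,...,N-1} there is a linear order whose induced choice
# picks a_1 from the pair {a_1, a_{i+1}} exactly when i is in B.
N = 5

def choice_from_order(order):
    rank = {x: i for i, x in enumerate(order)}  # earlier = more preferred
    return lambda S: min(S, key=lambda x: rank[x])

for r in range(N):
    for B in combinations(range(1, N), r):
        above = [i + 1 for i in range(1, N) if i not in B]  # beat a_1
        below = [i + 1 for i in range(1, N) if i in B]      # lose to a_1
        c = choice_from_order(above + [1] + below)
        assert all((c({1, i + 1}) == 1) == (i in B) for i in range(1, N))
print("all", 2 ** (N - 1), "subsets B are realized")
```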

15

Proof of Theorem 3.1(cont.)

• We will now show that dim_P C ≤ N − 1. We need to show that for every N sets A_1, A_2, ..., A_N and N elements a_i ∈ A_i, i = 1, 2, ..., N, there is a subset S ⊆ {1, 2, ..., N} such that there is no linear order on the ground set X for which a_i is the maximal element of A_i if and only if i ∈ S. Let s_k = |A_k|; clearly we can assume s_k > 1 for every k. Let X = (x_1, x_2, ..., x_N).

16

Proof of Theorem 3.1(cont.)

For every j = 1, 2, ..., N consider the following vector: v_j = (v_{j1}, v_{j2}, ..., v_{jN}) ∈ R^N, where:

v_{jk} = 0 if x_k ∉ A_j

v_{jk} = s_j − 1 if x_k = a_j

v_{jk} = −1 if x_k ∈ A_j and x_k ≠ a_j
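The vectors v_j can be written down for a small hypothetical instance (the sets A_j and the elements a_j below are made-up data); each v_j sums to 0, which is the fact used in the next step of the proof:

```python
# Build the vectors v_j for a toy instance and verify their coordinate sums.
X = [1, 2, 3, 4]
A = {1: {1, 2}, 2: {2, 3}, 3: {1, 3, 4}, 4: {2, 4}}   # sets A_j (hypothetical)
a = {1: 1, 2: 3, 3: 4, 4: 2}                           # a_j in A_j

def v(j):
    s_j = len(A[j])
    return [0 if x not in A[j] else (s_j - 1 if x == a[j] else -1) for x in X]

for j in X:
    print("v_%d =" % j, v(j))
    # one coordinate equal to s_j - 1, and s_j - 1 coordinates equal to -1
    assert sum(v(j)) == 0
```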

17

Proof of Theorem 3.1(cont.)

• Note that all vectors v_j belong to an (N−1)-dimensional subspace V ⊆ R^N of vectors whose sum of coordinates is 0. Therefore the vectors v_1, v_2, ..., v_N are linearly dependent. Suppose now that Σ_{j=1}^N r_j v_j = 0 and that not all of the r_j's equal zero.

• Let S = {j | r_j > 0}. We will now show that there is no c ∈ C such that c(A_k) = a_k when k ∈ S and c(A_k) ≠ a_k when k ∉ S.

18

Proof of Theorem 3.1(cont.)

• Assume to the contrary that there is such a function c. Let B = ∪{A_j | r_j ≠ 0}, and let x_m = c(B). Denote by y the m-th coordinate of the linear combination Σ_{j=1}^N r_j v_j; we will show that y is positive.

If r_j = 0 or if x_m ∉ A_j then r_j v_{jm} = 0.

19

Proof of Theorem 3.1(cont.)

• Assume now that r_j ≠ 0 and x_m ∈ A_j, and therefore c(A_j) = x_m (by IIA, since x_m = c(B) and A_j ⊆ B). There are two cases to consider:

If r_j > 0 then j ∈ S, so a_j = c(A_j) = x_m and r_j v_{jm} = r_j (s_j − 1) > 0.

If r_j < 0 then j ∉ S, so a_j ≠ c(A_j) = x_m and r_j v_{jm} = (−1) · r_j > 0.

• Therefore y = Σ_{j=1}^N r_j v_{jm} > 0. Contradiction!

20

• Theorem 1.1. A rationalizable choice function can be statistically learned with high probability from a number of examples which is linear in the number of alternatives.

• Proof: Theorem 1.1 follows from Theorem 3.1 and Theorem 2.1.

21

• Let F = {f | f : U → Y}. Let 0 < ε, δ < 1 and let μ be a probability distribution on U. Let g be an arbitrary function g : U → Y. Define the distance from g to F, dist(g, F), to be the minimum over f ∈ F of the probability that f(x) ≠ g(x), with respect to μ.

• Given t random elements u_1, u_2, ..., u_t (drawn independently according to μ), define the empirical distance of g to F as: dist_emp(g, F) = min_{f ∈ F} |{i : f(u_i) ≠ g(u_i)}| / t.
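The empirical distance can be written out directly; a toy sketch with a made-up function class (the class F and the function g below are our own illustration):

```python
import random

# dist_emp(g, F): the smallest empirical disagreement rate between g and
# any member of F on t sampled points.
def dist_emp(g, F, samples):
    t = len(samples)
    return min(sum(f(u) != g(u) for u in samples) / t for f in F)

U = range(100)
F = [lambda u, k=k: u % k == 0 for k in (2, 3, 5)]   # toy function class
g = lambda u: u % 2 == 0                              # g happens to lie in F

random.seed(0)
samples = [random.choice(U) for _ in range(50)]
print(dist_emp(g, F, samples))  # -> 0.0, since g agrees with one f in F
```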

22

• Theorem 4.1. There exists K(ε, δ) such that for every probability distribution μ on U and every function g : U → Y, the number of independent random examples t needed such that

|dist(g, F) − dist_emp(g, F)| < ε with probability of at least 1 − δ

is at most K(ε, δ) · dim_P(F).

23

• Corollary 4.2. For every probability distribution μ on U and every function g : U → Y, if g agrees with a function in F on t independent random examples and t ≥ K(ε, δ) · dim_P(F), then dist(g, F) < ε with probability of at least 1 − δ.

24

• The class of rationalizable choice functions is symmetric under relabeling of the alternatives. Mathematically speaking, every permutation σ on X induces a symmetry among all choice functions given by (σ(c))(S) = σ(c(σ^{−1}(S))).

• A class of choice functions will be called symmetric if it is closed under all permutations of the ground set of alternatives X.

25

• A choice function defined on pairs of elements is an asymmetric preference relation. Every choice function describes an asymmetric preference relation by restricting it to pairs of elements.

• Every choice function defined on pairs of elements of X describes a tournament whose vertices are the elements of X, such that for two elements x, y ∈ X, c({x, y}) = x if and only if in the graph induced by the tournament there is an edge oriented from x to y.
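The tournament induced by a pairwise choice function is easy to build explicitly; a small sketch with made-up data (an edge x → y means c({x, y}) = x):

```python
# Build the edge set of the tournament described by a choice function on pairs.
def tournament_edges(c, X):
    edges = set()
    for i, x in enumerate(X):
        for y in X[i + 1:]:
            winner = c(frozenset({x, y}))
            loser = y if winner == x else x
            edges.add((winner, loser))
    return edges

X = [1, 2, 3]
c = lambda S: min(S)   # rationalizable: the order 1 > 2 > 3
print(sorted(tournament_edges(c, X)))  # -> [(1, 2), (1, 3), (2, 3)], transitive
```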

26

• Theorem 5.1. (1) The P-dimension of every symmetric class C of preference relations (considered as choice functions on pairs) on N alternatives is at least N/2. (2) When N ≥ 8, the P-dimension is at least N − 1. (3) When N ≥ 68, if the P-dimension is precisely N − 1, then the class is the class of order relations.

27

Proof of Theorem 5.1

(1) Let X = {x_1, x_2, ..., x_N} and m = ⌊N/2⌋. Let A_i = {x_{2i−1}, x_{2i}}, 1 ≤ i ≤ m. Let c ∈ C and assume without loss of generality that c(A_i) = x_{2i−1}, 1 ≤ i ≤ m. Let R ⊆ {1, 2, ..., m}. We will define a permutation π_R as follows (if N is odd define π_R(x_N) = x_N):

If k ∈ R then: π_R(x_{2k−1}) = x_{2k−1} and π_R(x_{2k}) = x_{2k}.

If k ∉ R then: π_R(x_{2k−1}) = x_{2k} and π_R(x_{2k}) = x_{2k−1}.

28

Proof of Theorem 5.1(cont.)

Therefore:

If k ∈ R then: (π_R(c))(A_k) = π_R(c(π_R^{−1}(A_k))) = π_R(c(A_k)) = π_R(x_{2k−1}) = x_{2k−1} = c(A_k).

If k ∉ R then: (π_R(c))(A_k) = π_R(c(π_R^{−1}(A_k))) = π_R(c(A_k)) = π_R(x_{2k−1}) = x_{2k} ≠ c(A_k).

Hence (π_R(c))(A_k) = c(A_k) if and only if k ∈ R, and so π_R(c) is satisfactory.

29

Proof of Theorem 5.1(cont.)

(2) To prove part (2) we will use the following conjecture, made by Rosenfeld and proved by Havet and Thomassé: when N ≥ 8, for every path P on N vertices with an arbitrary orientation of the edges, every tournament on N vertices contains a copy of P.

30

Proof of Theorem 5.1(cont.)

Let c be a choice function in the class and consider the tournament T described by c. Let A_k = {x_k, x_{k+1}}, 1 ≤ k ≤ N − 1. Every choice function c' on A_1, A_2, ..., A_{N−1} describes a directed path P. Suppose that a copy of P can be found in our tournament and that the vertices of this copy (in the order they appear on the path) are x_{i_1}, x_{i_2}, ..., x_{i_N}.

Define a permutation π by π(x_j) = x_{i_j}. The choice function π(c) will agree with c' on A_1, A_2, ..., A_{N−1}.

31

• Theorem 5.1 implies the following Corollary:

Corollary 5.2. The P-dimension of every symmetric class of choice functions on N alternatives, N ≥ 8, is at least N − 1.

32

• Consider a tournament with N players such that for two players i and j there is a probability p_{ij} ∈ [0, 1] that i beats j in a match between the two. Among a set A ⊆ [N] of players, let c(A) be the player most likely to win a tournament involving the players in A.

• Consider the class W of choice functions that arise in this model.

33

• Theorem 6.1. The class of choice functions W requires O(N³) examples for learning in the PAC-model.

34

• Consider m polynomials in r variables For a point the sign pattern where

mixxQ ri 1 ),...,( 1.,...,, 21 rxxx rc R

mmsss }1,0,1{),...,,( 21

.0)( if 1

0)( if 0

0)( if )1(

:namely ),(

cQs

cQs

cQs

csignQs

jj

jj

jj

jj

35

• Theorem 6.2 (Warren). If the degree of every Q_j is at most D and if 2m > r, then the number of sign patterns given by the polynomials Q_1, Q_2, ..., Q_m is at most (8eDm/r)^r.

36

• Given a set A of players with |A| = s, the probability that the k-th player will be the winner in a tournament between the players of A is described by a polynomial Q(A, k) in the variables p_{ij}, as follows: let M = (m_{ij}) be an s by s matrix representing the outcome of all matches between the players of A in a possible tournament, such that m_{ij} = 1 if player i won the match against player j and m_{ij} = 0 otherwise (m_{ii} = 0 for 1 ≤ i ≤ s).

37

Proof of Theorem 6.1(cont.)

• The probability that such a matrix M will represent the results of the matches in a tournament is: p_M = ∏_{i, j ∈ A, m_{ij} = 1} p_{ij}.

• Define Q(A, k) = Σ{p_M : k won the tournament represented by M}.

• c(A) is the player k ∈ A for which Q(A, k) is maximal.
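The polynomial Q(A, k) can be evaluated numerically by brute-force enumeration of outcome matrices. In the sketch below we take "k won the tournament represented by M" to mean that k won all of its matches; this winner rule is our own assumption for illustration, since the lecture leaves the rule to the tournament format:

```python
from itertools import product

# Q(A, k): enumerate every outcome matrix M for the players in A, weight it
# by p_M (the product of p_ij over won matches), and sum the weights of the
# matrices in which k wins (assumed rule: k lost no match).
def Q(A, k, p):
    players = sorted(A)
    pairs = [(i, j) for i in players for j in players if i < j]
    total = 0.0
    for outcome in product([0, 1], repeat=len(pairs)):  # 1 means i beats j
        p_M, k_lost = 1.0, False
        for (i, j), o in zip(pairs, outcome):
            winner, loser = (i, j) if o else (j, i)
            p_M *= p[(winner, loser)]
            k_lost = k_lost or loser == k
        if not k_lost:
            total += p_M
    return total

# Two players: Q({1, 2}, 1) reduces to p_12 under any reasonable winner rule.
p = {(1, 2): 0.7, (2, 1): 0.3}
print(Q({1, 2}, 1, p))  # -> 0.7
```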

38

Proof of Theorem 6.1(cont.)

• Q(A, k) is a polynomial of degree at most N(N−1)/2 in the N(N−1)/2 variables p_{ij}, i, j ∈ A.

• Now consider Q(A, k, j) = Q(A, k) − Q(A, j) for all subsets A ⊆ [N] and k, j ∈ A, k ≠ j. We have altogether less than 2^N · N² polynomials in N(N−1)/2 variables p_{ij}. The degree of these polynomials is at most N(N−1)/2.

39

Proof of Theorem 6.1(cont.)

• Now c(A) = k ⟺ Q(A, k, j) > 0 for every j ∈ A, j ≠ k. Therefore the choice function given by a vector of probabilities p_{ij} is determined by the sign pattern of all the polynomials Q(A, k, j).

• We can now invoke Warren's theorem with r = D = N(N−1)/2 and m = 2^N · N². According to Warren's theorem the number of different sign patterns of the polynomials is at most (8e · 2^N · N²)^{N(N−1)/2}.

40

Proof of Theorem 6.1(cont.)

• Therefore it follows from Theorem 2.1 and Proposition 2.2 that the number of examples needed to learn W in the PAC-model is at most

log_2 (8e · 2^N · N²)^{N(N−1)/2} ≤ N³,

that is, O(N³).
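A numeric look (our own check) at how the logarithm of Warren's bound compares with N³:

```python
import math

# log2 of (8e * 2^N * N^2)^(N(N-1)/2) versus N^3 for a few values of N.
for N in [10, 20, 40]:
    r = N * (N - 1) // 2
    log2_patterns = r * math.log2(8 * math.e * 2 ** N * N * N)
    print(N, round(log2_patterns), N ** 3)
    assert log2_patterns <= N ** 3   # the O(N^3) bound, for these N
```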

41

• Our main result determined the P-dimension of the class of rationalizable choice functions and showed that the number of examples needed to learn a rationalizable choice function is linear in the number of alternatives.

• We also described a mathematical method for analyzing the statistical learnability of complicated choice models.