

A Bayesian Ensemble Regression Framework on the Angry Birds Game

Nikolaos Tziortziotis, Georgios Papagiannis and Konstantinos Blekas
Department of Computer Science & Engineering, University of Ioannina, Greece

Email: {ntziorzi, gpapagia, kblekas}@cs.uoi.gr

Proposed Strategy

1. Tree structure construction

2. Feasibility examination

3. Prediction: expected reward calculation

4. Target and tap timing selection

5. Regression model parameters adjustment

1. Tree Structure

Tree construction:
• Constructed in a hierarchical, bottom-up fashion.
• Each node represents a union of adjacent objects of the same material.
• The root can be regarded as a virtual node.

Three features are considered per tree node $s$:
• $x_1(s)$: personal weight, $x_1(s) = \mathrm{Area}(s) \times c_s$
• $x_2(s)$: parents' cumulative weight, i.e. $x_2(s) = \sum_{s' \in P(s)} x_1(s')$
• $x_3(s)$: distance from the nearest pig (normalized to $[0, 1]$)
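The three node features can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, its field names, and the choice of normalizing the pig distance by a scene-wide maximum are hypothetical, since the poster does not say how $c_s$, $P(s)$ or the normalization are obtained.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; the field names are assumptions for illustration."""
    area: float            # Area(s)
    material_coeff: float  # c_s, the coefficient multiplying the area
    pig_distance: float    # raw distance to the nearest pig
    parents: list = field(default_factory=list)  # P(s)


def personal_weight(node):
    """x1(s) = Area(s) * c_s."""
    return node.area * node.material_coeff


def parents_weight(node):
    """x2(s) = sum of x1(s') over the parents s' in P(s)."""
    return sum(personal_weight(p) for p in node.parents)


def features(node, max_distance):
    """Return (x1, x2, x3), with x3 scaled to [0, 1] by an assumed maximum distance."""
    x3 = node.pig_distance / max_distance if max_distance > 0 else 0.0
    x3 = min(max(x3, 0.0), 1.0)
    return personal_weight(node), parents_weight(node), x3


top = Node(area=2.0, material_coeff=1.5, pig_distance=30.0)
block = Node(area=4.0, material_coeff=0.5, pig_distance=10.0, parents=[top])
print(features(block, max_distance=40.0))
```

Here `block` has a single parent `top`, so its cumulative parent weight is just the personal weight of `top`.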

[Figure: example tree structure with a virtual Root node, Levels 1 to 9, and nodes s11 to s15, s21, s22, s31, s41, s51 to s53, s61, s71, s81 and s91.]

2. Feasibility notion

Two different trajectories are considered:
• Direct shot (angle ≤ 45°)
• High-arching shot (angle > 45°)

A node is considered:
• feasible, if it can be reached directly by at least one shot (a)
• infeasible, otherwise

[Figure: a feasible node (the pig is reachable by at least one trajectory) and an infeasible node (the wood is not directly reachable because of the surrounding structure).]

(a) In the case of the white bird, a node is considered feasible if it can be reached by the bird's egg.
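To make the two trajectory types concrete, the sketch below computes the low and high launch angles that reach a target point under ideal projectile motion. Everything here is an assumption for illustration: the poster does not describe how trajectories are computed, and the drag-free physics, the `speed` and `gravity` values, and the function names are hypothetical. Checking whether an obstacle blocks a trajectory, which the feasibility test also needs, is not covered.

```python
import math


def launch_angles(dx, dy, speed, gravity=9.81):
    """Return (low, high) launch angles in degrees that hit the point (dx, dy).

    Uses the textbook drag-free projectile equation. Returns None when the
    target is not in front of the slingshot or is out of range.
    """
    v2 = speed ** 2
    disc = v2 ** 2 - gravity * (gravity * dx ** 2 + 2.0 * dy * v2)
    if dx <= 0 or disc < 0:
        return None
    root = math.sqrt(disc)
    low = math.degrees(math.atan((v2 - root) / (gravity * dx)))
    high = math.degrees(math.atan((v2 + root) / (gravity * dx)))
    return low, high


def shot_type(angle_deg):
    """Classify an angle using the 45 degree threshold from the text."""
    return "direct" if angle_deg <= 45.0 else "high-arching"


speed = math.sqrt(2.0 * 9.81 * 10.0)  # chosen so a target 10 units away is in range
angles = launch_angles(10.0, 0.0, speed)
print(angles, [shot_type(a) for a in angles])
```

For a target at the same height as the slingshot, the two solutions are complementary angles, so one falls on each side of the 45° threshold.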

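3. Ensemble linear regression models

A separate linear model is used for each (bird, object material) pair.

Linear regression model

The rewards are considered as the target values:

$$t_n = \sum_{i=1}^{M} w_i \phi_i(x_n) + \epsilon_n = w^\top \phi(x_n) + \epsilon_n,$$

where $\phi(\cdot)$ is a vector of Gaussian kernels and $\epsilon_n$ is the noise term.

Supposing that $\epsilon_n \sim \mathcal{N}(0, \beta^{-1})$,

$$t_n \mid x = x_n \sim \mathcal{N}(w^\top \phi(x_n), \beta^{-1}).$$

Conditional probability:

$$p(t_{1:n} \mid w, \beta) = \mathcal{N}(t_{1:n} \mid \Phi_n w, \beta^{-1} I_n), \qquad t_{1:n} \triangleq \{t_k\}_{k=1}^{n},$$

where $\Phi_n$ is the design matrix.

Bayesian linear regression

Conjugate prior, with prior parameter $\alpha$:

$$w \mid \alpha \sim \mathcal{N}(w \mid 0, \alpha^{-1} I).$$

Posterior distribution of the weights:

$$p(w \mid t_{1:n}, \alpha, \beta) = \mathcal{N}(w \mid \mu_n, \Sigma_n),$$

where

$$\mu_n = \beta \Sigma_n \Phi_n^\top t_{1:n} \quad \text{and} \quad \Sigma_n = (\beta \Phi_n^\top \Phi_n + \alpha I)^{-1}.$$

Predictive distribution:

$$p(t_* \mid t_{1:n}, \alpha, \beta) = \mathcal{N}\left(t_* \mid \mu_n^\top \phi(x_*), \; \frac{1}{\beta} + \phi(x_*)^\top \Sigma_n \phi(x_*)\right).$$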
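The posterior and predictive formulas above can be sketched in plain Python for small feature dimensions. This is a minimal illustration, not the authors' implementation: the kernel `centers` and `width`, the helper names, and the use of Gauss-Jordan elimination for the matrix inverse are all assumptions made here.

```python
import math


def gaussian_features(x, centers, width):
    """phi_i(x) = exp(-||x - c_i||^2 / (2 * width^2)), one value per center."""
    return [math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2.0 * width ** 2))
            for c in centers]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with row pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def posterior(Phi, t, alpha, beta):
    """Return (mu_n, Sigma_n), where Sigma_n = (beta * Phi^T Phi + alpha * I)^-1."""
    m = len(Phi[0])
    precision = [[beta * sum(row[i] * row[j] for row in Phi) + (alpha if i == j else 0.0)
                  for j in range(m)] for i in range(m)]
    sigma = invert(precision)
    phi_t = [sum(row[i] * tk for row, tk in zip(Phi, t)) for i in range(m)]
    mu = [beta * sum(sigma[i][j] * phi_t[j] for j in range(m)) for i in range(m)]
    return mu, sigma


def predict(phi_star, mu, sigma, beta):
    """Predictive mean mu^T phi and variance 1/beta + phi^T Sigma phi."""
    m = len(mu)
    mean = sum(mu[i] * phi_star[i] for i in range(m))
    var = 1.0 / beta + sum(phi_star[i] * sigma[i][j] * phi_star[j]
                           for i in range(m) for j in range(m))
    return mean, var


# Two observations of a single constant feature, with alpha = beta = 1.
mu, sigma = posterior([[1.0], [1.0]], [2.0, 4.0], alpha=1.0, beta=1.0)
print(mu, sigma, predict([1.0], mu, sigma, beta=1.0))
```

In the toy call the prior shrinks the estimate toward zero: the two rewards average 3, but the posterior mean is 2 because the prior acts like one extra observation of 0.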

4(a). Target selection mechanism

✓ Only the feasible nodes are examined.
✓ The best arm is selected greedily according to:

$$j^* = \arg\max_q \left\{ \left(\mu^{f(q)}_{n_{f(q)}}\right)^\top \phi(x_q) + C \sqrt{\frac{2 \ln N}{n_{f(q)}}} \right\}$$

• $f(q)$: the regression model for node $q$
• $n_{f(q)}$: number of times regressor $f(q)$ has been selected
• $N$: total number of plays
• $C$: constant, set to 3000
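The selection rule above has the form of an upper-confidence-bound score: predicted reward plus an exploration bonus. The sketch below shows one way it could be computed. The tuple layout of `candidates` and the function name are hypothetical, and giving never-used regressors an infinite score is an assumption made here, because the poster does not say how $n_{f(q)} = 0$ is handled.

```python
import math


def select_target(candidates, total_plays, c=3000.0):
    """Return the id of the feasible node with the highest score.

    candidates: list of (node_id, predicted_reward, times_regressor_selected),
    where predicted_reward stands for mu^T phi(x_q) of the node's regressor.
    """
    def score(item):
        _, mean, n = item
        if n == 0:
            return math.inf  # assumption: try unused regressors first
        return mean + c * math.sqrt(2.0 * math.log(total_plays) / n)

    return max(candidates, key=score)[0]


candidates = [("pig_block", 5000.0, 10), ("wood_beam", 4000.0, 1)]
print(select_target(candidates, total_plays=11))
```

In this toy call the rarely tried `wood_beam` regressor wins despite its lower predicted reward, because its exploration bonus is much larger.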

4(b). Tapping selection

Tap timing depends on the bird type (the bird icons that labelled each rule are missing from this transcript):
• No tapping.
• Tap at 75% of the trajectory from the slingshot to the first collision point.
• Tap at 85% of the trajectory for a direct shot, or at 90% for a high-arching shot.
• Tap when the bird lies above the target.
• No tapping.

[Figure: white bird tapping. A node is considered feasible if it can be reached by the bird's egg.]

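5. Online learning of the model parameters

✓ Regressor $k \triangleq f(j^*)$ has been selected.

The received observation (reward) $t_{n_k+1}$ follows

$$p(t_{n_k+1} \mid w_k) = \mathcal{N}(t_{n_k+1} \mid w_k^\top \phi(x_{n_k+1}), \beta^{-1}).$$

The posterior distribution of the weights is given by

$$p(w_k \mid t_{1:n_k+1}) \propto p(t_{n_k+1} \mid w_k) \, p(w_k \mid t_{1:n_k}),$$

so that

$$p(w_k \mid t_{1:n_k+1}) = \mathcal{N}(w_k \mid \mu^k_{n_k+1}, \Sigma^k_{n_k+1}),$$

where the Gaussian parameters are given as:

$$\Sigma^k_{n_k+1} = \left[ (\Sigma^k_{n_k})^{-1} + \beta \, \phi(x_{n_k+1}) \phi(x_{n_k+1})^\top \right]^{-1}$$

$$\mu^k_{n_k+1} = \Sigma^k_{n_k+1} \left[ \beta \, \phi(x_{n_k+1}) \, t_{n_k+1} + (\Sigma^k_{n_k})^{-1} \mu^k_{n_k} \right].$$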
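The recursive update above can be sketched without any explicit matrix inversion. The code below is an illustration rather than the authors' implementation: it swaps in the Sherman-Morrison identity for the covariance and the algebraically equivalent form $\mu_{new} = \mu + \beta \, \Sigma_{new} \, \phi \, (t - \phi^\top \mu)$ for the mean. The function name and argument layout are hypothetical.

```python
def online_update(mu, sigma, phi, t, beta):
    """One Bayesian update of (mu, Sigma) after observing reward t with features phi.

    Covariance: Sigma_new = Sigma - beta * (Sigma phi)(Sigma phi)^T / (1 + beta * phi^T Sigma phi),
    which equals [Sigma^-1 + beta * phi phi^T]^-1 by the Sherman-Morrison identity.
    Mean: mu_new = mu + beta * Sigma_new phi * (t - phi^T mu).
    """
    m = len(mu)
    s_phi = [sum(sigma[i][j] * phi[j] for j in range(m)) for i in range(m)]
    denom = 1.0 + beta * sum(phi[i] * s_phi[i] for i in range(m))
    new_sigma = [[sigma[i][j] - beta * s_phi[i] * s_phi[j] / denom
                  for j in range(m)] for i in range(m)]
    residual = t - sum(mu[i] * phi[i] for i in range(m))
    ns_phi = [sum(new_sigma[i][j] * phi[j] for j in range(m)) for i in range(m)]
    new_mu = [mu[i] + beta * ns_phi[i] * residual for i in range(m)]
    return new_mu, new_sigma


# Start from the prior N(0, alpha^-1 I) with alpha = 1, then observe two rewards.
mu, sigma = [0.0], [[1.0]]
mu, sigma = online_update(mu, sigma, [1.0], 2.0, beta=1.0)
mu, sigma = online_update(mu, sigma, [1.0], 4.0, beta=1.0)
print(mu, sigma)
```

Processing the two rewards one at a time gives the same mean and covariance as the batch formulas $\mu_n = \beta \Sigma_n \Phi_n^\top t_{1:n}$ and $\Sigma_n = (\beta \Phi_n^\top \Phi_n + \alpha I)^{-1}$ applied to both observations at once.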

Acknowledgment

This work was partially supported by a travel grant from the ECCAI society.