Evolving Board Game Players Without Using Expert Knowledge
A presentation of research by Amit Benbassat. Advisor: Moshe Sipper.
Includes results from:
A. Benbassat and M. Sipper, “Evolving Lose-Checkers Players using Genetic Programming,” IEEE Conference on Computational Intelligence and Games (CIG'10), 2010.
New, as yet unpublished results.
Synopsis
Tree-based GP in a nutshell.
Applying tree-based GP to Lose Checkers.
Expanding work to other games.
Available projects.
A Bit About Tree-Based GP
A method of solving problems by evolving solver programs.
The programs are represented in memory in tree form (i.e. the genomes are trees).
Initially promoted mostly through the efforts of John Koza.
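Tree-Based GP
Turning expressions into a tree-shaped data structure:
(X + 1) − (√X)
IF (X ≤ 3) THEN ((X + Y) + 3) ELSE ((X * Y) * X)
[Figure: the two expressions drawn as trees. The first has root − with children + (over X and 1) and SQRT (over X); the second has root IFT with children ≤ (over X and 3), + (over X + Y and 3), and * (over X * Y and X).]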
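The two example expressions can be written down as trees and evaluated recursively. The following is a minimal sketch, not the system's actual representation: the nested-tuple encoding, the `FUNCTIONS` table, and the `evaluate` helper are all assumptions made for illustration.

```python
import math
import operator

# Assumed encoding: an inner node is a tuple (symbol, child, child, ...);
# a leaf is either a variable name (str) or a numeric constant.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "<=": operator.le,
    "SQRT": math.sqrt,
}


def evaluate(node, env):
    """Recursively evaluate a tree against the variable bindings in env."""
    if isinstance(node, str):            # variable leaf
        return env[node]
    if isinstance(node, (int, float)):   # constant leaf
        return node
    symbol, *children = node
    if symbol == "IFT":                  # only the chosen branch is evaluated
        cond, then_branch, else_branch = children
        chosen = then_branch if evaluate(cond, env) else else_branch
        return evaluate(chosen, env)
    return FUNCTIONS[symbol](*(evaluate(c, env) for c in children))


# (X + 1) - sqrt(X)
tree_a = ("-", ("+", "X", 1), ("SQRT", "X"))
# IF (X <= 3) THEN ((X + Y) + 3) ELSE ((X * Y) * X)
tree_b = ("IFT", ("<=", "X", 3),
          ("+", ("+", "X", "Y"), 3),
          ("*", ("*", "X", "Y"), "X"))

print(evaluate(tree_a, {"X": 4}))
print(evaluate(tree_b, {"X": 2, "Y": 5}))
```

Tuples are used here only because they are easy to read and compare; a strongly typed implementation would more likely use node classes that carry return types.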
Generic Genetic Operators: Self-Replication
[Figure: the IFT tree and an identical copy of it.]
Generic Genetic Operators: Rebuild Mutation
[Figure: the IFT tree and a newly built subtree (Y − 4) that replaces one of its subtrees.]
Generic Genetic Operators: Two-Way Crossover
[Figure: two parent trees, an IFT tree and the tree for (X + 1) − (√X), exchanging subtrees.]
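The three generic operators can be sketched on nested-tuple trees. This is an illustrative, untyped sketch only: the path-based helpers, the primitive set in `random_tree`, and the uniform choice of nodes are assumptions, and a strongly typed system would additionally restrict swaps and rebuilds to subtrees of matching return type.

```python
import random


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], subtree),) + tree[i + 1:]


def size(tree):
    """Count the nodes in a tree."""
    return sum(1 for _ in paths(tree))


def random_tree(rng, depth):
    """Grow a small random arithmetic tree (hypothetical primitive set)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["X", "Y", 1, 3, 4])
    return (rng.choice(["+", "-", "*"]),
            random_tree(rng, depth - 1),
            random_tree(rng, depth - 1))


def self_replicate(tree):
    """Pass the tree on unchanged; immutable tuples can be shared safely."""
    return tree


def rebuild_mutation(tree, rng, depth=2):
    """Replace a randomly chosen subtree with a freshly grown one."""
    path = rng.choice(list(paths(tree)))
    return put(tree, path, random_tree(rng, depth))


def two_way_crossover(a, b, rng):
    """Swap one randomly chosen subtree of a with one of b."""
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    return put(a, pa, get(b, pb)), put(b, pb, get(a, pa))
```

Because a two-way swap only moves subtrees between the parents, the total node count of the two children equals that of the two parents, which is a handy sanity check when testing.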
Synopsis
Previous results in games using GP and GAs.
Applying tree-based GP to Lose Checkers.
    Design.
    Algorithm and operators.
    Results.
Expanding work to other games.
Conclusions and future work.
Applying GP to Lose Checkers: From Genotype to Phenotype
Used strongly typed tree-based GP.
Trees are seen as board-state evaluators.
The individual players are built around the evaluator, using it (integrated with alpha-beta search) to decide which move to take.
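One way to picture a player built around an evaluator is a depth-limited alpha-beta search that calls the evaluator at its leaves. The sketch below assumes a hypothetical game interface (`legal_moves`, `apply`) and uses a toy game tree of nested lists in place of a real board; it is not the presented system's code.

```python
import math


def alphabeta(state, depth, alpha, beta, maximizing, game, evaluate):
    """Depth-limited alpha-beta; evaluate scores states for the root player."""
    moves = game.legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)
    if maximizing:
        best = -math.inf
        for m in moves:
            child = game.apply(state, m)
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, game, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:   # the opponent will never allow this line
                break
        return best
    best = math.inf
    for m in moves:
        child = game.apply(state, m)
        best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                   True, game, evaluate))
        beta = min(beta, best)
        if alpha >= beta:
            break
    return best


def choose_move(state, depth, game, evaluate):
    """Pick the move whose subtree has the highest alpha-beta value."""
    return max(
        game.legal_moves(state),
        key=lambda m: alphabeta(game.apply(state, m), depth - 1,
                                -math.inf, math.inf, False, game, evaluate),
    )


class ToyGame:
    """Toy stand-in for a board game: a state is a nested list, a leaf a number."""

    def legal_moves(self, state):
        return list(range(len(state))) if isinstance(state, list) else []

    def apply(self, state, move):
        return state[move]


def leaf_value(state):
    """Stand-in for an evolved evaluator: leaves score themselves."""
    return state if isinstance(state, (int, float)) else 0


toy_tree = [[3, 5], [2, 9], [0, 7]]
print(choose_move(toy_tree, 2, ToyGame(), leaf_value))
```

In an evolved player, `leaf_value` would be replaced by the individual's GP tree evaluated on the board state.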
Terminal Nodes
Terminal Nodes (cont’d)
Function Nodes
Applying GP to Lose Checkers
Algorithm:
Generate a random population of individuals of tree height 5 for generation 0.
Repeat for each generation i:
    Evaluate fitness.
    Selection().
    Procreation(XOprob, mutProb).
Fitness Calculations
The system supports a sequence of guides.
Each guide has a number of rounds assigned to it.
Each guide has a number of games per round assigned to it.
The system also supports play between individuals in the population (referred to in the EA literature as coevolution), with a parameter coPlayNum for the number of games.
Players get 1 fitness point for winning a game and 0.5 points for a draw.
Fitness Calculations (cont’d)
for each guide i do
    for j ← 1 to guide i’s number of rounds do
        Have every individual in the population deemed fit enough play as many games against guide i as guide i’s round size.
Have every individual in the population play coPlayNum games as black against coPlayNum random opponents in the population.
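The fitness procedure above can be sketched as follows. Several details are assumptions made for illustration: `play_game` is a hypothetical callback returning the first player's score, `is_fit_enough` stands in for the unspecified "deemed fit enough" test, and only the individual playing black is credited in co-play games.

```python
import random


def compute_fitness(population, guides, co_play_num, play_game, rng,
                    is_fit_enough=lambda individual: True):
    """Return one fitness score per individual.

    guides: list of (guide, rounds, games_per_round) tuples.
    play_game(black, white): 1.0 if black wins, 0.5 for a draw, else 0.0.
    """
    fitness = [0.0] * len(population)

    # Guide play: each guide is faced for its assigned rounds and round size.
    for guide, rounds, games_per_round in guides:
        for _ in range(rounds):
            for idx, individual in enumerate(population):
                if is_fit_enough(individual):
                    for _ in range(games_per_round):
                        fitness[idx] += play_game(individual, guide)

    # Co-play: every individual plays co_play_num games as black against
    # randomly drawn opponents from the rest of the population.
    for idx, individual in enumerate(population):
        others = [o for j, o in enumerate(population) if j != idx]
        for _ in range(co_play_num):
            fitness[idx] += play_game(individual, rng.choice(others))

    return fitness
```

With `guides` left empty the function reduces to pure coevolution, which matches the "no guide play" configurations reported later in the talk.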
Selection
Repeat until the number of parents selected equals the original population size:
    Randomly choose two different individuals from the population: I1 and I2.
    if I1.Fitness > I2.Fitness then
        Select a copy of I1 for the parent population.
    else
        Select a copy of I2 for the parent population.
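The selection step is a binary tournament, which can be sketched in a few lines. The function name and the parallel `fitness` list are assumptions for illustration.

```python
import copy
import random


def select_parents(population, fitness, rng):
    """Binary tournament selection, repeated until the parent pool is full."""
    parents = []
    while len(parents) < len(population):
        # Draw two different individuals by index.
        i1, i2 = rng.sample(range(len(population)), 2)
        # As in the pseudocode, ties go to the second individual.
        winner = i1 if fitness[i1] > fitness[i2] else i2
        parents.append(copy.deepcopy(population[winner]))
    return parents
```

Since the winner is copied, the same individual can appear in the parent pool several times, and an individual with the strictly lowest fitness can never be selected.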
Genetic Operators: Local Mutation
Every tree node N returning a floating point value was assigned a number.
This number was initialized to 1.0 and acted as a factor for the return value.
Local mutation is a slight change in the node’s factor.
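The factor mechanism can be illustrated with a single node. This is a hedged sketch: the `FactorNode` class is hypothetical, and the size and distribution of the "slight change" (Gaussian noise with `sigma=0.1`) are assumptions, since the slides do not specify them.

```python
import random


class FactorNode:
    """A float-returning '+' node whose result is scaled by a factor."""

    def __init__(self, left, right, factor=1.0):
        self.left = left
        self.right = right
        self.factor = factor   # initialized to 1.0, as described above

    def value(self):
        return self.factor * (self.left + self.right)


def local_mutation(node, rng, sigma=0.1):
    """Nudge the node's factor slightly (noise model is an assumption)."""
    node.factor += rng.gauss(0.0, sigma)
    return node


node = FactorNode(2.0, 3.0)
print(node.value())          # factor 1.0, so this is simply A + B
local_mutation(node, random.Random(0))
print(node.value())          # now factor * (A + B) with the nudged factor
```

Unlike rebuild mutation, this leaves the tree's shape untouched and only tunes how strongly a subtree contributes to its parent.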
[Figure: a + node over A and B with factor f1 returns f1*(A+B); after local mutation its factor is f2 and it returns f2*(A+B).]
Genetic Operators: One-Way Crossover
[Figure: two trees, an IFT tree and the tree for (X + 1) − (√X), with a subtree transferred from one to the other.]
Procreation(XOprob, mutProb)
While there remain at least 2 unselected individuals:
    Find two unselected individuals I1, I2 at random.
    With probability XOprob:
        If I1.Fitness > I2.Fitness, use one-way XO to transfer genes from I1 to I2.
        Else use two-way XO between I1 and I2.
For each individual I1 in the population:
    With probability mutProb, choose a node in I1’s tree at random and mutate it by either rebuild or local mutation.
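The procreation step can be sketched with the operators passed in as callbacks. Everything about the interface is an assumption for illustration: `one_way_xo(donor, receiver)` returns the modified receiver, `two_way_xo(a, b)` returns both offspring, and `mutate(tree)` stands for the choice between rebuild and local mutation.

```python
import random


def procreate(population, fitness, xo_prob, mut_prob,
              one_way_xo, two_way_xo, mutate, rng):
    """Pair individuals off for crossover, then mutate; edits the list in place."""
    # Shuffling and popping pairs draws two unselected individuals at random.
    unselected = list(range(len(population)))
    rng.shuffle(unselected)
    while len(unselected) >= 2:
        i1, i2 = unselected.pop(), unselected.pop()
        if rng.random() < xo_prob:
            if fitness[i1] > fitness[i2]:
                # The fitter individual donates genes and stays unchanged.
                population[i2] = one_way_xo(population[i1], population[i2])
            else:
                population[i1], population[i2] = two_way_xo(
                    population[i1], population[i2])

    for i in range(len(population)):
        if rng.random() < mut_prob:
            population[i] = mutate(population[i])
    return population
```

Passing the operators as arguments keeps the loop independent of the tree representation, so the same sketch works with string stand-ins in a test or with real GP trees.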
Opponents
There is no known simple evaluation function for Lose Checkers.
All hand-crafted players used the random function to evaluate non-trivial board-states.
Two types of opponents were written in code:
    The random player.
    An α-β player of depth d with a random evaluation function.
Quality of α-β Players
To ensure that α-β players using a random evaluation function are indeed proficient players, their performance was tested.
Each test tournament consists of 10000 games.

1st player   2nd player   1st player win ratio
αβ2          Random       0.9665
αβ3          αβ2          0.8502
αβ8          αβ3          0.5873
αβ5          αβ3          0.82535
αβ5          αβ8          0.5562
Results with Search Against α-β Players
Using lookahead 3, playing 1000 games against αβ3.

Run ID   Fitness Eval   vs. αβ3
r00044   50Co           744.0
r00046   50Co           698.5
r00047   50Co           765.5
r00048   50Co           696.5
r00049   50Co           781.5
r00056   50Co           721.0
r00057   50Co           786.5
r00058   50Co           697.0
r00060   50Co           737.0
r00061   50Co           737.0
Results with Search Against α-β Players (cont’d)
Using lookahead 3, playing against various opponents.

Run ID   vs. αβ3   vs. αβ4   vs. αβ6   vs. αβ8
r00044   744.0     944.5     816.0     758.0
r00047   765.5     899.0     722.5     476.0
r00049   781.5     915.0     809.0     735.5
r00057   786.5     909.0     745.5     399.5
r00060   737.0     897.0     627.0     408.5
r00061   737.0     947.0     781.5     715.5
Results with Search Against α-β Players: Parameters
Run parameters:
Population 150, 120 generations.
No guide play, 50 co-play games as black, search depth 3.
Maximum tree depth: 12 in runs 44A-49A, 14 in runs 56A-61A.
XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.
Evolving Players using Deeper Search
Results with players using lookahead 4.

Run ID   vs. αβ5   vs. αβ6   vs. αβ8
r00064   582.0     603.5     395.0
r00065   537.0     782.5     561.5
r00066   567.0     757.5     483.5
r00067   598.5     723.0     385.5
r00068   548.0     787.0     524.0
r00069   573.5     715.5     523.0
r00070   577.0     691.5     476.0
r00071   551.5     582.5     401.5
Results with Search Against α-β Players: Parameters
Run parameters:
Population 50, 70 generations.
Guide play: 20 games (in 2 rounds of 10) against αβ5.
20 co-play games as black.
Search depth 4.
Maximum tree depth of 10.
XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.
The Role of Mobility
Initial runs with search produced tepid results.
The introduction of the mobility terminal greatly improved those results.
Mobility is a general principle that applies to many board games and is often associated with a high level of play.
Synopsis
Tree-based GP in a nutshell.
Applying tree-based GP to Lose Checkers.
Expanding work to other games.
    New results in Lose Checkers.
    10x10 Checkers.
    Reversi.
    Dodgem.
Conclusions and future work.
New Results in Lose Checkers
Results with players using lookahead 4.

Run ID   Fitness Eval   vs. αβ5
r00090   10αβ2_20Co     632.0
r00091   10αβ2_20Co     645.0
r00096   25Co           608.0
r00097   25Co           575.0
r00098   40Co           575.5
r00099   40Co           633.5
New Results in Lose Checkers (cont’d)
Run parameters:
Population: 120-150.
Generations: 90-100.
Guide play: 10 games against αβ2 in two of the runs.
20-40 co-play games as black.
Search depth 4.
Maximum tree depth of 14.
XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.
10x10 Checkers
10x10 board.
Objective: To eliminate all opponent pieces or render all opponent pieces immobile.
Rules: As in 8x8 version.
Quality of α-β Players
Evolved players were tested against α-β players that chose a material evaluation function at random for each turn.
To ensure that α-β players are indeed proficient players, their performance was tested.
Each test tournament consists of 10000 games.

1st player   2nd player   1st player win ratio
αβ2          Random       0.99885
αβ3          αβ2          0.5229
αβ5          αβ3          0.876
10x10 Checkers Results

Run ID   Fitness Eval   Search Depth   vs. αβ3
r00084   50Co           3              889.0
r00085   50Co           3              927.0
r00092   25Co           2              732.0
r00093   25Co           2              615.5
r00094   25Co           2              554.0
r00095   25Co           2              631.0
10x10 Checkers Results (cont’d)
Run parameters:
Population: 100-150.
Generations: 100.
No guide play.
25-50 co-play games as black.
Search depth 4.
Maximum tree depth 13-14.
XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.
8x8 Reversi
Popular board game, AKA Othello.
8x8 board.
Each piece has a black side and a white side.
Each player places a piece on her turn, flipping trapped opponent pieces.
Objective: Maximize the number of friendly pieces on the board.
Reversi Specific Terminals

Node Name             Return Type   Return Value
EnemyCornerCount      F             Number of corners occupied by opponent
FriendlyCornerCount   F             Number of corners occupied by player
CornerCount           F             FriendlyCornerCount − EnemyCornerCount
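The three corner terminals are simple board scans. The sketch below assumes a board stored as an 8x8 list of lists with 1 for black, -1 for white, and 0 for empty, and assumes that return type F means a floating point value; neither detail is stated in the slides.

```python
# Corner squares of an 8x8 board, as (row, column) pairs.
CORNERS = [(0, 0), (0, 7), (7, 0), (7, 7)]


def friendly_corner_count(board, player):
    """Number of corners occupied by the player to move."""
    return float(sum(1 for r, c in CORNERS if board[r][c] == player))


def enemy_corner_count(board, player):
    """Number of corners occupied by the opponent (encoded as -player)."""
    return float(sum(1 for r, c in CORNERS if board[r][c] == -player))


def corner_count(board, player):
    """FriendlyCornerCount minus EnemyCornerCount."""
    return friendly_corner_count(board, player) - enemy_corner_count(board, player)


# Example position: black holds two corners, white holds one.
board = [[0] * 8 for _ in range(8)]
board[0][0] = 1
board[7][7] = 1
board[0][7] = -1
print(corner_count(board, 1))
```

Because the encoding is symmetric, the same functions serve both sides: calling them with `player=-1` gives the values from white's point of view.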
Quality of α-β Players
Evolved players were tested against α-β players that chose a material evaluation function at random for each turn.
To ensure that α-β players are indeed proficient players, their performance was tested.
Each test tournament consists of 10000 games.

1st player   2nd player   1st player win ratio
αβ2          Random       0.8471
αβ3          αβ2          0.6004
αβ5          αβ3          0.7509
αβ7          αβ5          0.7662
Reversi Results

Run ID   Fitness Eval   Search Depth   vs. αβ5   vs. αβ7
r00100   25Co           4              875.0     758.5
r00101   25Co           4              957.5     803.0
r00102   40Co           4              942.5     640.5
r00103   40Co           4              905.5     711.5
r00108   40Co           4              956.0     760.0
r00109   40Co           4              912.5     826.0
r00110   40Co           4              953.5     730.5
r00111   40Co           4              961.0     815.5
Reversi Results (cont’d)
Run parameters:
Population: 120.
Generations: 100.
No guide play.
25-40 co-play games as black.
Search depth 4.
Maximum tree depth of 14.
XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.
Dodgem
Synopsis
Tree-based GP in a nutshell.
Applying tree-based GP to Lose Checkers.
Expanding work to other games.
Available projects.
Your mission (should you decide to accept it)
1. Choose a game.
2. Write game program in C and interface with Java system.
3. Write game-specific terminal nodes and adjustments if necessary.
4. Run it, document results, produce report.
Games
My Current Areas of Interest
Games with high branching factor.
Games with random element.
Multiplayer games.
Games with partial information.
Another project
I want to check my selective crossover operator.
Adapt system to a toy problem.
Execute runs with selective XO and with typical XO using several parameter sets.
Compare and analyze results.
Write report.