COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago)...

39
COSC343: Artificial Intelligence Lech Szymanski Lech Szymanski (Otago) Lech Szymanski Dept. of Computer Science, University of Otago COSC343 Lecture13 Lecture 13 : Deep learning

Transcript of COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago)...

Page 1: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

COSC343:ArtificialIntelligence

LechSzymanskiLech Szymanski (Otago)

Lech Szymanski

Dept. of Computer Science, University of Otago

COSC343Lecture13

Lecture13:Deeplearning

Page 2: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

• Hierarchicalrepresentation• Deeplearning• ConvolutionalNeuralNetworks• AlphaGo

Intoday’slecture

COSC343Lecture13

Page 3: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

Whileit’spossibletoencodeanyfunctioninanetworkwithonehiddenlayer,it’softeneasiertoencodecomplexfunctionsindeepnetworkswithmanyhiddenlayers.

Theintuition:• Thefirstlayerlearnssimplepatterns...• Thenextlayerlearnspatternsofthesepatterns...andsoon.

Visualobjectclassification isatypeofapplicationwherethishierarchicalfeaturedetectionisveryeffective.

Networkwithmanyhiddenlayers

COSC343Lecture13

Page 4: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

Thehierarchicalstructureofobjectclassification

COSC343Lecture13

• A motorbike has wheels, a seat, an engine...• A wheel has a tyre, an axle, spokes...• A tyre has treads, an axle has shiny bits...• ...• Lines, dots...• Pixels

Page 5: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

Inthepastcoupleofyears,deeplearninghasbecomeadominantmachinelearningdevicefor‘big’,‘hard’learningtasks.

Deeplearningconsistsoftwodifferentapproachestotrainingartificialneuralnetworks:• DeepBeliefNets

(inventedbyGeoffreyHinton)• ConvolutionalNeuralNetworks

(inventedbyYann LeCun)

Deeplearning

COSC343Lecture13

we will only cover Convolutional

Neural Networks

Page 6: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

Processingimagesinafeedforwardneuralnetwork

COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

outp

ut o

f the

1st

lay

er a

s a

vect

or

1st layer neurons

2nd layer neurons

input to output mapping is determined by network weights (and biases), which are derived

through training

outputneuron

Page 7: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

Processingimagesinaconvolutionalneuralnetwork

COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

Page 8: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer

Page 9: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer

Page 10: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer

Page 11: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer

Page 12: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

filter sensitive to \ edge

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer

Page 13: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

filter sensitive to \ edge

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer

Page 14: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

filter sensitive to \ edge

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer

Page 15: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

filter sensitive to \ edge

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer

Page 16: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

filter sensitive to \ edge

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer

Page 17: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

filter sensitive to \ edge

outp

ut o

f the

1st

lay

er a

s a

vect

or

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer

Page 18: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

filter sensitive to \ edge

outp

ut o

f the

1st

lay

er a

s a

vect

or

interpretation of 1st layer’s output

as an image

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer

Page 19: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

filter sensitive to \ edge

outp

ut o

f the

1st

lay

er a

s a

vect

or

interpretation of 1st layer’s output

as an image

interpretation of 1st layer’s output

as an image

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer

Page 20: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

filter sensitive to \ edge

outp

ut o

f the

1st

lay

er a

s a

vect

or

interpretation of 1st layer’s output

as an image

interpretation of 1st layer’s output

as an image

filter: y1 - y2 + 1

filter: y1 + y2

feat

ures

neu

ron

s ar

e se

nsi

tive

to

2nd convolutional layer

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer

Page 21: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

filter sensitive to \ edge

outp

ut o

f the

1st

lay

er a

s a

vect

or

interpretation of 1st layer’s output

as an image

interpretation of 1st layer’s output

as an image

filter: y1 - y2 + 1

filter: y1 + y2

feat

ures

neu

ron

s ar

e se

nsi

tive

to

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer 2nd convolutional layer

Page 22: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

filter sensitive to \ edge

outp

ut o

f the

1st

lay

er a

s a

vect

or

interpretation of 1st layer’s output

as an image

interpretation of 1st layer’s output

as an image

filter: y1 - y2 + 1

filter: y1 + y2

feat

ures

neu

ron

s ar

e se

nsi

tive

to

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer 2nd convolutional layer

Page 23: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

filter sensitive to \ edge

outp

ut o

f the

1st

lay

er a

s a

vect

or

1st convolutional layer

interpretation of 1st layer’s output

as an image

interpretation of 1st layer’s output

as an image

filter: y1 - y2 + 1

filter: y1 + y2

feat

ures

neu

ron

s ar

e se

nsi

tive

to

Processingimagesinaconvolutionalneuralnetwork

2nd convolutional layer

Page 24: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

filter sensitive to \ edge

outp

ut o

f the

1st

lay

er a

s a

vect

or

interpretation of 1st layer’s output

as an image

interpretation of 1st layer’s output

as an image

filter: y1 - y2 + 1

filter: y1 + y2

feat

ures

neu

ron

s ar

e se

nsi

tive

to

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer 2nd convolutional layer

Page 25: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago) COSC343Lecture13

input image

inpu

t im

age

as a

vec

tor

filter sensitive to / edge

filter sensitive to \ edge

outp

ut o

f the

1st

lay

er a

s a

vect

or

interpretation of 1st layer’s output

as an image

interpretation of 1st layer’s output

as an image

filter: y1 - y2 + 1

filter: y1 + y2

feat

ures

neu

ron

s ar

e se

nsi

tive

to

?

? ?

?

?

?

?

?

?

?

?

?? - filters/features are

determined by network

weights, which are derived through training

Processingimagesinaconvolutionalneuralnetwork1st convolutional

layer 2nd convolutional layer

Page 26: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

Convolutionalneuralnetwork(convNet)parameters

COSC343Lecture13

(or pooling layer) (or pooling layer)

• The entire model is trained using backpropagation:• Typically with the softmax function at the output (for classification)• The pooling layer has no trainable parameters, but it can pass

error “blame” backward• Convolutional layers just needs some additional averaging of the

updates across given filter’s weights• Rectifier linear unit (ReLU) activation function is typically used in

convolutional layers• A sigmoid function is typically used in the fully connected MLP

Page 27: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

-3 -2 -1 0 1 2 3-0.5

0

0.5

1

1.5

2

2.5

3

ReLU activationfunction

COSC343Lecture13

Rectifier Linear Unity activation function

Page 28: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

convNet networkarchitecture

COSC343Lecture13

• As is typical for neural networks, the training cannot determine the architecture • In a convNet there are many more things to set, because layers are not fully

connected• For each convolutional layer, the user must specify

• Number of filters

• Size of the filters• The step size (how a given filter is shifted over the image map)• The padding (whether filters extend beyond the edge of the image)

• For each pooling layer, the user must specify• The size of the pooling window (the subsampling ratio)

• The type of pooling (max or average)• The user must decide on the number of convolutional+pooling layers and the

architecture of the fully connected MLP that follows

Page 29: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

convNet features

COSC343Lecture13

824 M.D. Zeiler and R. Fergus

Layer 2

Layer 1

Layer 3

Layer 4 Layer 5

Fig. 2. Visualization of features in a fully trained model. For layers 2-5 we show the top9 activations in a random subset of feature maps across the validation data, projecteddown to pixel space using our deconvolutional network approach. Our reconstructionsare not samples from the model: they are reconstructed patterns from the validation setthat cause high activations in a given feature map. For each feature map we also showthe corresponding image patches. Note: (i) the the strong grouping within each featuremap, (ii) greater invariance at higher layers and (iii) exaggeration of discriminativeparts of the image, e.g. eyes and noses of dogs (layer 4, row 1, cols 1). Best viewed inelectronic form. The compression artifacts are a consequence of the 30Mb submissionlimit, not the reconstruction algorithm itself.

824 M.D. Zeiler and R. Fergus

Layer 2

Layer 1

Layer 3

Layer 4 Layer 5

Fig. 2. Visualization of features in a fully trained model. For layers 2-5 we show the top9 activations in a random subset of feature maps across the validation data, projecteddown to pixel space using our deconvolutional network approach. Our reconstructionsare not samples from the model: they are reconstructed patterns from the validation setthat cause high activations in a given feature map. For each feature map we also showthe corresponding image patches. Note: (i) the the strong grouping within each featuremap, (ii) greater invariance at higher layers and (iii) exaggeration of discriminativeparts of the image, e.g. eyes and noses of dogs (layer 4, row 1, cols 1). Best viewed inelectronic form. The compression artifacts are a consequence of the 30Mb submissionlimit, not the reconstruction algorithm itself.

824 M.D. Zeiler and R. Fergus

Layer 2

Layer 1

Layer 3

Layer 4 Layer 5

Fig. 2. Visualization of features in a fully trained model. For layers 2-5 we show the top9 activations in a random subset of feature maps across the validation data, projecteddown to pixel space using our deconvolutional network approach. Our reconstructionsare not samples from the model: they are reconstructed patterns from the validation setthat cause high activations in a given feature map. For each feature map we also showthe corresponding image patches. Note: (i) the the strong grouping within each featuremap, (ii) greater invariance at higher layers and (iii) exaggeration of discriminativeparts of the image, e.g. eyes and noses of dogs (layer 4, row 1, cols 1). Best viewed inelectronic form. The compression artifacts are a consequence of the 30Mb submissionlimit, not the reconstruction algorithm itself.

Here’s a visualisation of the responses of units in different layers of a deep convolutional neural network for object classification (Zeiler and Fergus, 2014)

Page 30: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

convNet problems

COSC343Lecture13

convNets do throw away location information:• As long as a combination of

features is found, the actual order of those features might be ignored

• As a result, it’s quite easy to “fool” convNet into bad classification (Nguyen, Yosinski, Clune, 2015)

• As any other learning model, they perform well when test data is not too different from the training data...

• ...and so state of the art deep learning network are trained on HUGE training sets (utilising parallel nature of neural networks to speed up the computation with lots of GPUs

Page 31: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

Example:howAlphaGo works

COSC343Lecture13

• AlphaGo versus Lee Sedolwas a five-game Go match betweenSouth Korean professional Go player Lee Sedol and AlphaGo, acomputer program developedby Google DeepMind.

• Played in Seoul, South Korea between 9 and 15 March 2016.• AlphaGowon 4 out of 5 games.• AlphaGo's victory was a major milestone in artificial

intelligence research.• Go had previously been regarded as a hard problem in

machine learning that was expected to be out of reach for thetechnologyof the time.

AlphaGo relies heavily on a ConvolutionalNeural Network

Page 32: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

Silveretal.,“MasteringthegameofGowithdeepneuralnetworksandtreesearch”,Nature 529,484–489(28January2016)doi:10.1038/nature16961.

Example:howAlphaGoworks

COSC343Lecture13

Katpatuka (Wikipedia)

Game of Go:• Two players alternately place black

and white stones on the vacant intersections of a board with a 19×19 grid of lines

• Objective is to surround a larger total area of the board with one's stones than the opponent.

• Perfect information game• 2x10170 possible board states• Average game has 150 moves, with

an average of about 250 choices per move

Page 33: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

Example:howAlphaGoworks

COSC343Lecture13

Policy model:• Its job is to predict next action to take (next

move) given the current state• A convolutional neural network

• Input: the board state as an image• Output: an image of the board with

probability distribution of placing a stone at a given position

• Softmax activity at the output• 13 layers (convolutional + pooling)• Trained on 30 million moves done by

experts (input is state, the desired output is a move made by an expert)

• 56% correct (predicting the expert’s move) after training input - state

output - probability distribution of action

given state

Page 34: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

Example:howAlphaGoworks

COSC343Lecture13

Improving the Policy model:• Play against a random

previous version of itself• Policy gradient

reinforcement learning• Save state of the

network at every move

• Reward/punishment given once game won/lost

• Propagate reward/punishment back through states and change the parameters so that expectation of the reward goes up

Version 1 Version 1

vs.

Oponnent move 1

Oponnent move 2

.

.

.

Won:

Move N

Move 2

Move 1

Page 35: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

Example:howAlphaGoworks

COSC343Lecture13

Improving the Policy model:• Play against a random

previous version of itself• Policy gradient

reinforcement learning• Save state of the

network at every move

• Reward/punishment given once game won/lost

• Propagate reward/punishment back through states and change the parameters so that expectation of the reward goes up

Version 2 Version 1

vs.

Oponnent move 1

Oponnent move 2

.

.

.

Won:

Move N

Move 2

Move 1

Page 36: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

Example:howAlphaGoworks

Improving the Policy model:• Play against a random

previous version of itself• Policy gradient

reinforcement learning• Save state of the

network at every move

• Reward/punishment given once game won/lost

• Propagate reward/punishment back through states and change the parameters so that expectation of the reward goes up

Version X Version 1

vs.

Version 2

Version 3

Version X

.

.

.

COSC343Lecture13

Page 37: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

Example:howAlphaGoworks

Value function:• Predict chance of

winning given game played by strongest policy against itself

• A convolutional neural network:

• Similar architecture as policy model

• Input is the game state

• Output is a single value – chance of winning from the current state

• Trained using stochastic gradient descent minimizing MSE

Version X Version X

vs.

Oponnent move 1

Oponnent move 2

.

.

.input - state

output p(win| )

Won

COSC343Lecture13

Move N

Move 2

Move 1

Page 38: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

Example:howAlphaGoworks

When making a move in a real game, use a tree search algorithm to look many moves ahead…• Use the policy network to generate possible own (and opponents optimal)

moves• Use the value network to estimate chances of winning for a given scenario

This is still a massive tree search, but computationally manageable, because…• Considering only several actions, deemed best by the policy• Evaluation of the value (chance of winning) for given action is fast

COSC343Lecture13

Page 39: COSC343: Artificial Intelligence - Otago · COSC343: Artificial Intelligence Lech Szymanski (Otago) Lech ... • Convolutional Neural Networks • AlphaGo In today’s lecture COSC343

LechSzymanskiLech Szymanski (Otago)

Readingforthelecture:NoreadingReadingfornextlecture:AIMASection4.1.4

Summary

COSC343Lecture11

• Deeplearning– justneuralnetworkswithmanyhiddenlayersandcleverconstraintsonthearchitecture

• Convolutionalneuralnetworks• Hierarchicalfeatures• Setstacked,trainablefiltersthatselectively seekhigher

abstractionfeaturesinfurtherlayersofthenetwork• Today’sstateoftheartmachinelearningforobject

recognitioninimagesandspeechprocessing