
Transcript of Neural_Programmer_Interpreter

Page 1: Neural_Programmer_Interpreter

Neural Programmer-Interpreters

ICLR 2016 Best Paper Award

Scott Reed & Nando de Freitas, Google DeepMind, London, UK

Citations: 19

Presented by Katy, 2016/10/14

Page 2: Neural_Programmer_Interpreter

Motivation

• ML is ultimately about automating tasks, in the hope that machines can do everything for humans

• For example, I want the machine to make a cup of coffee for me

Page 3: Neural_Programmer_Interpreter

Motivation

• Ancient way: write full, highly detailed program specifications to carry tasks out

• AI way: come up with a lot of training examples that capture the variability in the real world, and then train some general learning machine on this large data set.

Page 4: Neural_Programmer_Interpreter

Motivation

• But sometimes the dataset is not big enough, and the model doesn’t generalize well.

• NPI is an attempt to use neural methods to train machines to carry out simple tasks based on a small amount of training data.

Page 5: Neural_Programmer_Interpreter

NPI Goals

• 1. Long-term prediction: Model potentially long sequences of actions by exploiting compositional structure.

• 2. Continual learning: Learn new programs by composing previously-learned programs, rather than from scratch.

• 3. Data efficiency: Learn generalizable programs from a small number of example traces.

• 4. Interpretability: By looking at NPI’s generated commands, we can understand what it is doing at multiple levels of temporal abstraction.

Page 6: Neural_Programmer_Interpreter

Related Work

• Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.

• Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).

Page 7: Neural_Programmer_Interpreter

Sequence to sequence learning with neural networks

Page 8: Neural_Programmer_Interpreter

Neural turing machines

http://cpmarkchang.logdown.com/posts/279710-neural-network-neural-turing-machine

Page 9: Neural_Programmer_Interpreter

Outline

• NPI core module: how it works

• Demos

• Experiment

• Conclusion

Page 10: Neural_Programmer_Interpreter

Outline

• NPI core module: how it works

• Demos

• Experiment

• Conclusion

Page 11: Neural_Programmer_Interpreter

NPI core module

• The NPI core is an LSTM network that acts as a router between programs, conditioned on the current state observation and previous hidden unit states

• Input: a learnable program embedding, program arguments passed in by the calling program, and a feature representation of the environment.

• Output: a key indicating which program to call next, arguments for the following program, and a flag indicating whether the current program should terminate.

core: an LSTM-based sequence model
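
As a rough illustration, a minimal PyTorch-style sketch of one core step follows. This is a hypothetical rendering, not the authors' code: module names follow the paper's notation (f_enc, f_end, f_prog, f_arg), but all sizes, the pre-fused state input, and the argument decoder are assumptions.

    import torch
    import torch.nn as nn

    class NPICore(nn.Module):
        """Sketch of the NPI core: an LSTM routing between programs."""
        def __init__(self, state_dim=128, prog_dim=64, hidden_dim=256,
                     n_progs=16, key_dim=32):
            super().__init__()
            self.f_enc = nn.Linear(state_dim, hidden_dim)   # state (env + args) encoder
            self.lstm = nn.LSTMCell(hidden_dim + prog_dim, hidden_dim)
            self.f_end = nn.Linear(hidden_dim, 1)           # termination-probability head
            self.f_prog = nn.Linear(hidden_dim, key_dim)    # next-program key head
            self.f_arg = nn.Linear(hidden_dim, 3)           # next-program argument head
            self.M_key = nn.Parameter(torch.randn(n_progs, key_dim))    # program keys
            self.M_prog = nn.Parameter(torch.randn(n_progs, prog_dim))  # program embeddings

        def step(self, state, prog, hc=None):
            s = torch.tanh(self.f_enc(state))               # fused observation/argument features
            h, c = self.lstm(torch.cat([s, prog], dim=-1), hc)
            r = torch.sigmoid(self.f_end(h))                # P(terminate current program)
            key = self.f_prog(h)                            # key of the program to call next
            next_id = (self.M_key @ key.squeeze(0)).argmax()
            return r, self.M_prog[next_id].unsqueeze(0), self.f_arg(h), (h, c)

    core = NPICore()
    r, next_prog, next_args, hc = core.step(torch.randn(1, 128),
                                            core.M_prog[0].unsqueeze(0))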

Page 12: Neural_Programmer_Interpreter

Adding Numbers Together

Page 13: Neural_Programmer_Interpreter

Bubble Sort

Page 14: Neural_Programmer_Interpreter

Car Rendering

• Whatever the starting position, the program should generate a trajectory of actions that delivers the camera to the target view, e.g. frontal pose at a 15° elevation.

Page 15: Neural_Programmer_Interpreter

NPI Model

Page 16: Neural_Programmer_Interpreter

How it Works

Page 17: Neural_Programmer_Interpreter

How it Works

e: environment observation

a: program arguments

p: embedded program vector

r(t): probability of terminating the current program
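
Putting these together, one step of the core can be written as follows (reconstructed to the best of my reading of the paper, where M^key and M^prog denote the program key memory and program embedding memory):

    s_t = f_{enc}(e_t, a_t)
    h_t = f_{lstm}(s_t, p_t, h_{t-1})
    r_t = f_{end}(h_t), \quad k_t = f_{prog}(h_t), \quad a_{t+1} = f_{arg}(h_t)
    i^* = \arg\max_i \; M^{key}_{i,:} \, k_t, \qquad p_{t+1} = M^{prog}_{i^*,:}

The output key k_t is matched against every row of M^key; the best-matching row selects the embedding p_{t+1} of the program to run next.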

Page 18: Neural_Programmer_Interpreter

How it Works

Page 19: Neural_Programmer_Interpreter

How it Works

Page 20: Neural_Programmer_Interpreter

Outline

• NPI core module: how it works

• Demos

• Experiment

• Conclusion

Page 21: Neural_Programmer_Interpreter

Adding Numbers

• Environment:

• Scratch pad with the two numbers to be added, a carry row, and an output row.

• 4 read/write pointer locations

• Program:

• LEFT and RIGHT programs that move a specified pointer left or right, respectively.

• WRITE program that writes a specified value at the location of a specified pointer (see the sketch below)
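
To make the environment concrete, here is a hypothetical Python sketch of the scratch pad and its primitive interface. The class and its row/pointer layout are my own illustration; the paper's actual environment encoding differs.

    class AdditionPad:
        """Hypothetical scratch pad: two input rows, a carry row, an output
        row, and one read/write pointer per row."""
        def __init__(self, a_digits, b_digits):
            n = len(a_digits)
            self.rows = [list(a_digits), list(b_digits), [0] * n, [0] * n]
            self.ptrs = [n - 1] * 4          # pointers start at the rightmost column

        def act(self, row, op, value=None):
            # Every primitive is routed through the single ACT instruction.
            if op == "LEFT":
                self.ptrs[row] -= 1          # move this row's pointer one column left
            elif op == "RIGHT":
                self.ptrs[row] += 1          # ... or one column right
            elif op == "WRITE":
                self.rows[row][self.ptrs[row]] = value   # write at the pointer

    pad = AdditionPad([0, 9, 6], [1, 2, 5])      # 096 + 125, column by column
    pad.act(3, "WRITE", 1)                       # 6 + 5 = 11: write 1 in the output row
    pad.act(2, "LEFT"); pad.act(2, "WRITE", 1)   # carry the 1 into the next column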

Page 22: Neural_Programmer_Interpreter

Adding Numbers

Figure: actual trace of the addition program generated by the model on the problem shown to the left.

Page 23: Neural_Programmer_Interpreter

Adding Numbers

• All output actions (primitive atomic actions that can be performed on the environment) are performed with a single instruction: ACT.


Page 24: Neural_Programmer_Interpreter

Adding Numbers Together

Page 25: Neural_Programmer_Interpreter

Bubble Sort

• Environment:

• Scratch pad with the array to be sorted.

• Read/write pointers (the sketch below shows how the subprograms compose)
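
For intuition, the learned decomposition can be rendered as ordinary Python. The subprogram names (BUBBLE, RESET, COMPSWAP, RSHIFT) are from the paper as I recall them, but the hand-written control flow below only stands in for decisions the NPI core learns to make.

    class SortPad:
        """Hypothetical scratch pad: the array plus two adjacent pointers."""
        def __init__(self, array):
            self.array = array
            self.p1, self.p2 = 0, 1

    def compswap(pad):                       # primitive: swap if out of order
        a = pad.array
        if a[pad.p1] > a[pad.p2]:
            a[pad.p1], a[pad.p2] = a[pad.p2], a[pad.p1]

    def bubble(pad):                         # one pass: COMPSWAP, then RSHIFT
        while pad.p2 < len(pad.array):
            compswap(pad)
            pad.p1 += 1                      # RSHIFT moves both pointers right
            pad.p2 += 1

    def reset(pad):                          # move both pointers back to the start
        pad.p1, pad.p2 = 0, 1

    def bubblesort(pad):                     # top level: n-1 rounds of BUBBLE + RESET
        for _ in range(len(pad.array) - 1):
            bubble(pad)
            reset(pad)

    pad = SortPad([3, 1, 2])
    bubblesort(pad)                          # pad.array is now [1, 2, 3]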

Page 26: Neural_Programmer_Interpreter

Bubble Sort

Page 27: Neural_Programmer_Interpreter

Bubble Sort

Page 28: Neural_Programmer_Interpreter

Car Rendering

• Environment:

• Rendering of the car (pixels), with a CNN as the feature encoder.

• The current car pose is NOT provided

• Target angle and elevation coordinates.

Page 29: Neural_Programmer_Interpreter

Car Rendering

Page 30: Neural_Programmer_Interpreter

Car Rendering

• Whatever the starting position, the program should generate a trajectory of actions that delivers the camera to the target view, e.g. frontal pose at a 15° elevation.

Page 31: Neural_Programmer_Interpreter

GOTO

Page 32: Neural_Programmer_Interpreter

HGOTO

• Horizontal goto

Page 33: Neural_Programmer_Interpreter

LGOTO

• Left goto

Page 34: Neural_Programmer_Interpreter

ACT

• Rotate by 15 degrees

Page 35: Neural_Programmer_Interpreter

Give control back to LGOTO

Page 36: Neural_Programmer_Interpreter

The core realizes it is not yet done with the horizontal rotation

Page 37: Neural_Programmer_Interpreter

Control back to GOTO
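
Pieced together, the walkthrough above corresponds to a call hierarchy roughly like the following. This is a hypothetical plain-Python rendering: each function stands in for a learned program, and the hand-written conditions stand in for decisions the core makes from image features (the CarEnv helper and its method names are invented for the sketch).

    class CarEnv:
        """Toy stand-in for the rendered-car environment."""
        def __init__(self, azimuth=60):
            self.azimuth = azimuth
        def act(self, name, degrees=15):                 # primitive ACT: one rotation step
            self.azimuth += degrees if name == "ROTATE_RIGHT" else -degrees
        def horizontally_aligned(self, target):
            return self.azimuth == target
        def needs_left_rotation(self, target):
            return self.azimuth > target

    def goto(env, target_azimuth, target_elevation):
        hgoto(env, target_azimuth)                       # fix the horizontal pose first
        vgoto(env, target_elevation)                     # then the vertical pose

    def hgoto(env, target):                              # horizontal goto
        if env.needs_left_rotation(target):
            lgoto(env, target)
        else:
            rgoto(env, target)

    def lgoto(env, target):
        # Emit primitive ACTs until the (learned) check says the horizontal
        # rotation is done, then return control up the stack.
        while not env.horizontally_aligned(target):
            env.act("ROTATE_LEFT")                       # one 15-degree step

    def rgoto(env, target):
        while not env.horizontally_aligned(target):
            env.act("ROTATE_RIGHT")

    def vgoto(env, target):
        ...                                              # UGOTO/DGOTO, symmetric

    goto(CarEnv(azimuth=60), target_azimuth=0, target_elevation=15)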

Page 38: Neural_Programmer_Interpreter
Page 39: Neural_Programmer_Interpreter

Outline

• NPI core module: how it works

• Demos

• Experiment

• Conclusion

Page 40: Neural_Programmer_Interpreter

Experiments

• Data Efficiency

• Generalization

• Learning new programs with a fixed NPI core

Page 41: Neural_Programmer_Interpreter

Data Efficiency - Sorting

• Seq2Seq LSTM and NPI used the same number of layers and hidden units.

• Trained on length-20 arrays of single-digit numbers.

• NPI benefits from mining multiple subprogram examples per sorting instance.

Plot: accuracy vs. number of training examples

Page 42: Neural_Programmer_Interpreter

Generalization - Sorting

• For each length from 2 up to 20, we provided 64 example bubble sort traces, for a total of 1216 examples.

• Then we evaluated whether the network can learn to sort arrays beyond length 20.

Page 43: Neural_Programmer_Interpreter

Generalization - Adding

Trained only on sequences of length up to 20.

Page 44: Neural_Programmer_Interpreter

Learning New Programs with a Fixed NPI Core

• Example task: find the max in an array

• RJMP: move all pointers to the right by repeatedly calling RSHIFT program

• MAX: call BUBBLESORT and then RJMP

• Expand the program memory by adding 2 slots. Randomly initialize them, then learn by backpropagation with the NPI core and all other parameters fixed.

Page 45: Neural_Programmer_Interpreter

• 1. Randomly initialize new program vectors in memory

• 2. Freeze core and other program vectors

• 3. Backpropagate gradients only to the new program vectors (see the sketch below)
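
A minimal PyTorch-style sketch of this procedure follows. The LSTMCell is a stand-in for the trained NPI core, and trace_nll is a hypothetical stand-in for the trace log-likelihood loss; the point is that only the two new program embeddings receive gradients.

    import torch

    pretrained_core = torch.nn.LSTMCell(320, 256)       # stand-in for the trained core
    for p in pretrained_core.parameters():
        p.requires_grad_(False)                         # step 2: freeze core and old programs

    new_progs = torch.randn(2, 64, requires_grad=True)  # step 1: embeddings for MAX and RJMP
    opt = torch.optim.SGD([new_progs], lr=0.1)

    def trace_nll(core, progs, trace):
        # Toy loss: in the real model this would be the negative log-likelihood
        # of the demonstrated trace (next program, arguments, termination flag).
        h, c = core(torch.cat([progs.view(1, -1), trace], dim=1))
        return h.pow(2).mean()

    traces = [torch.randn(1, 192) for _ in range(8)]    # placeholder trace encodings
    for trace in traces:
        loss = trace_nll(pretrained_core, new_progs, trace)
        opt.zero_grad()
        loss.backward()                                 # step 3: gradients reach only new_progs
        opt.step()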

Page 46: Neural_Programmer_Interpreter

• "+ Max": performance after the addition of the MAX program to memory.

• "unseen": uses a test set with car models disjoint from the training set.

Page 47: Neural_Programmer_Interpreter

Outline

• NPI core module: how it works

• Demos

• Experiment

• Conclusion

Page 48: Neural_Programmer_Interpreter

Conclusion (1/2)

• NPI is an RNN/LSTM-based sequence-to-sequence translator that keeps track of calling programs while recursing into sub-programs

• NPI generalizes well in comparison to sequence-to-sequence LSTMs.

• A trained NPI with a fixed core can learn new tasks without forgetting old ones

Page 49: Neural_Programmer_Interpreter

Conclusion (2/2)

• Provide far fewer examples, but with labels that contain richer information, allowing the model to learn compositional structure (it’s like sending kids to school)

Page 50: Neural_Programmer_Interpreter

Further Discussion

• Can the tasks help each other during training?

• Can we share the environment encoder across tasks?

• Any comments?

project page: http://www-personal.umich.edu/~reedscot/iclr_project.html