Adaptive Neural Trees12... · 2019-06-07 · Daniel C. Alexander, Antonio Criminisi, Aditya Nori...

Adaptive Neural Trees Ryutaro Tanno, Kai Arulkumaran, Daniel C. Alexander, Antonio Criminisi, Aditya Nori #82

Page 1:

Adaptive Neural Trees

Ryutaro Tanno, Kai Arulkumaran, Daniel C. Alexander, Antonio Criminisi, Aditya Nori

#82

Page 2:

Two Paradigms of Machine Learning

Deep Neural Networks: 『hierarchical representation of data』

ImageNet classifiers with CNNs [Zeiler and Fergus, ECCV 2014]

[Figure: low-level features (oriented edges & colours), mid-level features (textures & patterns), high-level features (object parts), trainable classifier]

Decision Trees: 『hierarchical clustering of data』

Super-resolution of dMR brain images with a DT [Alexander et al. NeuroImage 2017]

[Figure labels: water, grey matter, white matter]

Page 5:

Two Paradigms of Machine Learning

Deep Neural Networks: 『hierarchical representation of data』

+ learn features of data

+ scalable learning with stochastic optimisation

- architectures are hand-designed

- heavy-weight inference, engaging every parameter of the model for each input

Decision Trees: 『hierarchical clustering of data』

- operate on hand-designed features

- limited expressivity with simple splitting functions

+ architectures are learned from data

+ lightweight inference, activating only a fraction of the model per input

Page 6:

Joining the Paradigms

Adaptive Neural Trees: 『hierarchical representation of data』 + 『hierarchical clustering of data』

+ learn features of data

+ scalable learning with stochastic optimisation

+ architectures are learned from data

+ lightweight inference, activating only a fraction of the model per input

ANTs unify the two paradigms and generalise previous work

Page 9:

What are ANTs?

• ANTs consist of two key designs:

(1) DTs that use NNs in every path and routing decision.

[Figure label: input, x]
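A minimal sketch of design (1), assuming the smallest possible tree: one routing decision and two leaves. The names `W_root`, `w_route`, `W_edge` and `W_leaf`, the layer sizes, and the random untrained weights are all illustrative assumptions, not taken from the slides. Every path applies a small NN to the features, a NN makes the (soft) routing decision, and the prediction is the mixture of leaf predictions weighted by the routing probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, C = 4, 8, 3  # input dim, hidden dim, number of classes (illustrative)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Hypothetical parameters: random, untrained weights for illustration only.
W_root = rng.normal(size=(D, H))   # NN on the path into the root
w_route = rng.normal(size=H)       # NN making the routing decision
W_edge = {s: rng.normal(size=(H, H)) for s in ("left", "right")}  # NNs on each branch
W_leaf = {s: rng.normal(size=(H, C)) for s in ("left", "right")}  # classifier at each leaf


def ant_forward(x):
    """Soft-routed forward pass through a two-leaf tree."""
    h = np.tanh(x @ W_root)                      # features at the root
    p_right = sigmoid(h @ w_route)               # probability of going right
    probs = {"left": 1.0 - p_right, "right": p_right}
    y = np.zeros((x.shape[0], C))
    for side in ("left", "right"):
        h_side = np.tanh(h @ W_edge[side])       # NN applied along this path
        y += probs[side][:, None] * softmax(h_side @ W_leaf[side])
    return y, probs


x = rng.normal(size=(5, D))
y, probs = ant_forward(x)
```

Because the routing weights sum to one and each leaf outputs a distribution, each row of `y` is itself a valid class distribution.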

Page 10:

What are ANTs?

• ANTs consist of two key designs:

(1) DTs that use NNs in every path and routing decision.

(2) DT-like architecture growth using SGD

[Figure: growth options at a target node: (a) split OR (b) deepen]
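The growth step in design (2) might be sketched as follows. This is a hedged illustration, not the authors' procedure: the `Node` class, the `grow_step` helper and the extra "keep" option (leaving the target node unchanged) are assumptions, and the SGD training of each candidate is hidden behind a hypothetical callable that returns a validation loss. A toy stand-in loss is used so the example runs.

```python
import copy
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Node:
    layers: int = 1                  # NN layers on the path into this node
    left: Optional["Node"] = None    # children exist only after a split
    right: Optional["Node"] = None


def get(tree: Node, path: Tuple[str, ...]) -> Node:
    """Follow a sequence of 'left'/'right' steps from the root."""
    node = tree
    for step in path:
        node = getattr(node, step)
    return node


def split(tree: Node, path: Tuple[str, ...]) -> Node:
    """Option (a): turn the target leaf into an internal node with two leaves."""
    new = copy.deepcopy(tree)
    target = get(new, path)
    target.left, target.right = Node(), Node()
    return new


def deepen(tree: Node, path: Tuple[str, ...]) -> Node:
    """Option (b): add one more NN layer on the path into the target node."""
    new = copy.deepcopy(tree)
    get(new, path).layers += 1
    return new


def grow_step(tree: Node, path: Tuple[str, ...],
              train_and_validate: Callable[[Node], float]):
    """Try each option at the target node and keep the lowest validation loss."""
    candidates = {"keep": tree, "split": split(tree, path), "deepen": deepen(tree, path)}
    losses = {name: train_and_validate(cand) for name, cand in candidates.items()}
    best = min(losses, key=losses.get)
    return best, candidates[best]


def leaves_and_layers(node: Node) -> Tuple[int, int]:
    if node.left is None:
        return 1, node.layers
    n_l, k_l = leaves_and_layers(node.left)
    n_r, k_r = leaves_and_layers(node.right)
    return n_l + n_r, node.layers + k_l + k_r


def toy_validation_loss(tree: Node) -> float:
    # Stand-in for "train the new modules with SGD, then measure validation loss".
    n_leaves, n_layers = leaves_and_layers(tree)
    return 1.0 / n_leaves + 0.1 * n_layers


root = Node()
choice, grown = grow_step(root, (), toy_validation_loss)
```

In a real setting, `train_and_validate` would train only the newly added modules and evaluate on held-out data; repeating `grow_step` over leaves until "keep" wins everywhere would end growth.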

Page 12:

Conditional Computation

[Figure: errors (MNIST %, CIFAR10 %, SARCOS mse) and number of parameters for ANT models on MNIST, CIFAR10 and SARCOS, under multi-path versus single-path inference. Annotation: "Model size drops!"]

• Single-path inference enables efficient inference without compromising accuracy.
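The difference between the two inference modes can be sketched on a toy depth-2 tree. The tree layout, the linear routers and leaf classifiers, and the random untrained weights are assumptions for illustration; with untrained weights the example shows only how many parameters each mode touches, not the accuracy claim on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
D, C = 4, 3  # input dim and number of classes (illustrative)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def make_tree(depth):
    """Full binary tree: linear routers inside, linear classifiers at leaves."""
    if depth == 0:
        return {"solver": rng.normal(size=(D, C))}
    return {"router": rng.normal(size=D),
            "left": make_tree(depth - 1),
            "right": make_tree(depth - 1)}


def multi_path(node, x):
    """Weight every leaf by its path probability. Returns (probs, params used)."""
    if "solver" in node:
        return softmax(x @ node["solver"]), node["solver"].size
    p_right = sigmoid(x @ node["router"])
    y_l, n_l = multi_path(node["left"], x)
    y_r, n_r = multi_path(node["right"], x)
    return (1.0 - p_right) * y_l + p_right * y_r, node["router"].size + n_l + n_r


def single_path(node, x):
    """Follow only the most probable branch. Returns (probs, params used)."""
    if "solver" in node:
        return softmax(x @ node["solver"]), node["solver"].size
    p_right = sigmoid(x @ node["router"])
    child = node["right"] if p_right >= 0.5 else node["left"]
    y, n = single_path(child, x)
    return y, node["router"].size + n


tree = make_tree(2)
x = rng.normal(size=D)
y_multi, n_multi = multi_path(tree, x)
y_single, n_single = single_path(tree, x)
```

Here multi-path inference evaluates all three routers and four leaves, while single-path inference evaluates two routers and one leaf, so `n_single` is a third of `n_multi` for this tree.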

Page 13:

Adaptive Model Complexity

Models are trained on subsets of size 50, 250, 500, 2.5k, 5k, 25k, 45k examples.

• ANTs can tune the architecture to the availability of training data.
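One way to set up such a study is sketched below. The slide does not say how the subsets were drawn; drawing them uniformly at random and making them nested (each smaller subset contained in the larger ones) are assumptions, as are the helper name `nested_subsets` and the pool size of 45,000, which is simply the largest size listed.

```python
import numpy as np

# Subset sizes listed on the slide.
SUBSET_SIZES = [50, 250, 500, 2_500, 5_000, 25_000, 45_000]


def nested_subsets(n_total, sizes, seed=0):
    """Return {size: indices}, where each subset is a prefix of one random shuffle."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_total)
    return {s: order[:s] for s in sizes if s <= n_total}


subsets = nested_subsets(45_000, SUBSET_SIZES)
```

Growing one tree per subset and comparing the resulting architectures would then show how model size tracks the amount of training data.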

Page 14:

Please come & see me at poster #82 for details!

Unsupervised Hierarchical Clustering