Adaptive Neural Trees12... · 2019-06-07 · Daniel C. Alexander, Antonio Criminisi, Aditya Nori...

Adaptive Neural Trees

Ryutaro Tanno, Kai Arulkumaran, Daniel C. Alexander, Antonio Criminisi, Aditya Nori

Poster #82

Two Paradigms of Machine Learning

Deep Neural Networks: "hierarchical representation of data"
Example: ImageNet classifiers with CNNs [Zeiler and Fergus, ECCV 2014]
[Figure: low-level features (oriented edges & colours), mid-level features (textures & patterns), high-level features (object parts), trainable classifier]

Decision Trees: "hierarchical clustering of data"
Example: super-resolution of dMR brain images with a DT [Alexander et al. NeuroImage 2017]
[Figure labels: water, grey matter, white matter]


Two Paradigms of Machine Learning

Deep Neural Networks

+ learn features of data
+ scalable learning with stochastic optimisation
- architectures are hand-designed
- heavy-weight inference, engaging every parameter of the model for each input

Decision Trees

- operate on hand-designed features
- limited expressivity with simple splitting functions
+ architectures are learned from data
+ lightweight inference, activating only a fraction of the model per input
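The "lightweight inference" point can be made concrete with a small sketch. It assumes a plain decision tree with hard threshold splits on a single feature; the names `Leaf`, `Split`, `build_full_tree` and `predict` are illustrative and do not come from the talk. A full binary tree of depth d has 2^(d+1) - 1 nodes, but one input only visits d + 1 of them.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    prediction: float


@dataclass
class Split:
    feature: int       # index of the feature tested at this node
    threshold: float   # go left if x[feature] < threshold, otherwise right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def build_full_tree(depth: int, lo: float = 0.0, hi: float = 1.0) -> Node:
    """Full binary tree over one feature in [lo, hi); purely illustrative."""
    if depth == 0:
        return Leaf(prediction=(lo + hi) / 2)
    mid = (lo + hi) / 2
    return Split(0, mid,
                 build_full_tree(depth - 1, lo, mid),
                 build_full_tree(depth - 1, mid, hi))


def count_nodes(node: Node) -> int:
    """Total number of nodes (internal and leaf) in the tree."""
    if isinstance(node, Leaf):
        return 1
    return 1 + count_nodes(node.left) + count_nodes(node.right)


def predict(node: Node, x: List[float]) -> Tuple[float, int]:
    """Return (prediction, number of nodes visited) for a single input."""
    visited = 1
    while isinstance(node, Split):
        node = node.left if x[node.feature] < node.threshold else node.right
        visited += 1
    return node.prediction, visited


tree = build_full_tree(depth=4)
total = count_nodes(tree)               # 31 nodes in the whole tree
prediction, visited = predict(tree, [0.3])
print(f"visited {visited} of {total} nodes")   # visited 5 of 31 nodes
```

A dense network of comparable size would, by contrast, evaluate every unit for every input.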


Joining the Paradigms



+ architectures are learned from data

+ lightweight inference, activating only a fraction of the model per input


Adaptive Neural Trees

ANTs unify the two paradigms and generalise previous work

What are ANTs?

• ANTs consist of two key designs:

(1) DTs that use NNs in every path and routing decision.
[Figure label: input, x]

(2) DT-like architecture growth using SGD.
[Figure: the target node is grown by either (a) Split or (b) Deepen]
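A rough sketch of one growth step at a target node follows. The slide only names the two options, Split and Deepen. Everything else here is an assumption made for illustration: a third "keep" option, selection by held-out validation loss, a toy 1-D regression task, training each candidate from scratch, and numerical (finite-difference) gradients in place of backpropagation.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


# Candidate models for the target leaf; each maps (params, x) to a prediction.
def keep(p, x):
    """Leave the node as it is: a single linear solver."""
    return p[0] * x + p[1]


def deepen(p, x):
    """(b) Deepen: insert a tanh transformer before a linear solver."""
    return p[2] * math.tanh(p[0] * x + p[1]) + p[3]


def split(p, x):
    """(a) Split: a sigmoid router softly mixes two child linear solvers."""
    g = sigmoid(p[0] * x + p[1])
    return g * (p[2] * x + p[3]) + (1.0 - g) * (p[4] * x + p[5])


def mse(model, p, data):
    return sum((model(p, x) - y) ** 2 for x, y in data) / len(data)


def sgd(model, n_params, data, steps=2000, lr=0.05, eps=1e-4, seed=0):
    """Minibatch SGD; gradients come from central finite differences."""
    rng = random.Random(seed)
    p = [rng.uniform(-0.5, 0.5) for _ in range(n_params)]
    for _ in range(steps):
        batch = rng.sample(data, 8)
        grad = []
        for i in range(n_params):
            up, down = p[:], p[:]
            up[i] += eps
            down[i] -= eps
            grad.append((mse(model, up, batch) - mse(model, down, batch)) / (2 * eps))
        p = [w - lr * g for w, g in zip(p, grad)]
    return p


def make_data(n, rng):
    """Toy 1-D task, y = |x|, which a single linear solver cannot fit exactly."""
    return [(x, abs(x)) for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]


data_rng = random.Random(1)
train, val = make_data(64, data_rng), make_data(32, data_rng)

candidates = {"keep": (keep, 2), "deepen": (deepen, 4), "split": (split, 6)}
val_loss = {}
for name, (model, n_params) in candidates.items():
    params = sgd(model, n_params, train)
    val_loss[name] = mse(model, params, val)

choice = min(val_loss, key=val_loss.get)  # option with the lowest validation loss
print(val_loss, "->", choice)
```

Repeating such a step node by node would let the data decide where the tree branches and where it gets deeper, which is the sense in which the growth is "DT-like".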


Conditional Computation

[Figure: bar charts comparing multi-path inference and single-path inference for ANT variants (bars labelled ANT 1, ANT 2, ANT 3). Panels: Errors on MNIST (%), CIFAR10 (%) and SARCOS (mse); Number of Parameters on MNIST, CIFAR10 and SARCOS. Annotation: "Model size drops!"]

• Single-path inference enables efficient inference without compromising accuracy.
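The two inference modes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: multi-path inference is taken to mean averaging all leaf predictions weighted by the probability of reaching each leaf, and single-path inference is taken to mean following the more probable branch at every router. The 1-D sigmoid routers, the fixed leaf distributions and the names `RouterNode`, `LeafNode`, `multi_path` and `single_path` are all hypothetical, and the transformer modules on the edges are omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple, Union


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class LeafNode:
    probs: List[float]   # class distribution at this leaf (fixed here for brevity)


@dataclass
class RouterNode:
    weight: float        # hypothetical router: P(left | x) = sigmoid(weight * x + bias)
    bias: float
    left: "Tree"
    right: "Tree"


Tree = Union[LeafNode, RouterNode]


def multi_path(node: Tree, x: float) -> List[float]:
    """Mix all leaf predictions, weighted by the probability of reaching each leaf."""
    if isinstance(node, LeafNode):
        return list(node.probs)
    p_left = sigmoid(node.weight * x + node.bias)
    left = multi_path(node.left, x)
    right = multi_path(node.right, x)
    return [p_left * l + (1.0 - p_left) * r for l, r in zip(left, right)]


def single_path(node: Tree, x: float) -> Tuple[List[float], int]:
    """Follow only the more probable branch; returns (probs, nodes visited)."""
    visited = 1
    while isinstance(node, RouterNode):
        p_left = sigmoid(node.weight * x + node.bias)
        node = node.left if p_left >= 0.5 else node.right
        visited += 1
    return list(node.probs), visited


tree = RouterNode(
    4.0, 0.0,
    left=LeafNode([0.9, 0.1]),
    right=RouterNode(4.0, 2.0, LeafNode([0.2, 0.8]), LeafNode([0.5, 0.5])),
)

x = 1.0
multi = multi_path(tree, x)            # touches every node in the tree
single, visited = single_path(tree, x) # touches one root-to-leaf path
print(multi, single, visited)
```

When the routers are confident, as in this example, both modes pick the same class while the single-path version evaluates only the nodes along one path.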

Adaptive Model Complexity

Models are trained on subsets of 50, 250, 500, 2.5k, 5k, 25k and 45k examples.

• ANTs can tune the architecture to the availability of training data.

Please come & see me at poster #82 for details!

Unsupervised Hierarchical Clustering
