Deep Continuous Networks
Transcript of Deep Continuous Networks
ICML 2021
Nergis Tömen, Silvia L. Pintea, Jan van Gemert
Computer Vision Lab, TU Delft
Kar et al., Nature Neuroscience, 2019 [9] Ecker et al., ICLR, 2019 [6] Lindsey et al., ICLR, 2019 [11]
Motivation
1 Deep Continuous Networks Motivation
Conventional feed-forward CNNs are:
• Spatially discrete: they use discretized kernels, typically 3 × 3
  • cannot learn the receptive field size during training
• Temporally discrete: they use discrete, sequential layers
  • cannot model the continuous evolution of neuronal activations
We present Deep Continuous Networks (DCNs):
• spatially continuous filter descriptions
  • can learn the kernel size and receptive field size during training
• depthwise continuous evolution of feature maps
  • can model the temporal dynamics of neuronal activations in response to images
Deep Continuous Networks
• Spatially continuous filters: weighted sum of basis functions
• Gaussian N-jet basis [7, 10, 8] with trainable scale parameter σ
Spatially Continuous Filters
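As a miniature sketch of the idea above: a 2-D filter is built as a weighted sum of Gaussian derivative (N-jet) basis kernels at scale σ, so σ, and hence the receptive field size, is just another trainable parameter. The function names, the finite-difference derivatives, and the 3σ truncation below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gaussian_derivative_basis(sigma, order=2, half_size=None):
    """1-D Gaussian and its derivatives up to `order`, sampled at scale sigma.

    Separable 2-D N-jet filters are outer products of these 1-D kernels.
    """
    if half_size is None:
        half_size = int(np.ceil(3 * sigma))  # truncate support at ~3 sigma
    x = np.arange(-half_size, half_size + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()  # normalize the 0th-order kernel
    basis = [g]
    for _ in range(order):
        basis.append(np.gradient(basis[-1], x))  # finite-difference derivative
    return np.stack(basis)  # shape: (order + 1, kernel_size)

def njet_filter(alphas, sigma):
    """2-D kernel as a weighted sum of Gaussian derivative products.

    alphas[i, j] weights the product of the i-th x-derivative
    and the j-th y-derivative of the Gaussian.
    """
    b = gaussian_derivative_basis(sigma, order=alphas.shape[0] - 1)
    return sum(alphas[i, j] * np.outer(b[j], b[i])
               for i in range(alphas.shape[0])
               for j in range(alphas.shape[1]))

# Example: a first-order, edge-like kernel at sigma = 1 (7 x 7 support).
kernel = njet_filter(np.array([[0.0, 1.0], [1.0, 0.0]]), sigma=1.0)
```

Because the basis is an analytic function of σ, the same weights `alphas` describe the filter at any scale, which is what makes the receptive field size learnable.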
• Continuous evolution of neural activations via neural ODEs [4, 12] with continuous depth C
Chen et al., NeurIPS, 2018 [4]
Depthwise Continuous Layers
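In miniature, depthwise continuity means the feature maps evolve as dh/dt = f(t, h) and the block's output is the state at some integration time T. The fixed-step Euler loop below is only a sketch, assuming toy decay dynamics for illustration; the paper uses an adaptive ODE solver and learned convolutional dynamics.

```python
import numpy as np

def ode_block(h0, f, t1=1.0, steps=1000):
    """Evolve features continuously in depth: dh/dt = f(t, h), from t=0 to t1.

    Fixed-step forward Euler, the simplest possible solver; an adaptive
    solver (as in neural ODEs) would choose its own step sizes.
    """
    h, dt = h0, t1 / steps
    for k in range(steps):
        h = h + dt * f(k * dt, h)
    return h

# Toy linear decay dh/dt = -h, whose exact solution is h0 * exp(-t).
h_T = ode_block(np.ones(4), lambda t, h: -h, t1=1.0)
```

With `t1 = 1.0` the result closely matches the analytic solution exp(-1) ≈ 0.368, which is a quick sanity check that the integration loop is correct.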
• DCN Architecture: Cascade of continuous ODE Blocks
• ODE Block:
• Convolutional filters: Gaussian N-jet
• Feature maps: Computed by an ODE solver
• Baseline models
• ODE-Net (spatially discrete)
• ResNet-blocks (spatially and depthwise discrete)
[Architecture diagram:
Encoder (Classification): Input image (32×32×3) → k×k conv, 32 → ODE Block 1, 32 → Downsampling block 1, 64 → ODE Block 2, 64 → Downsampling block 2, 128 → ODE Block 3, 128 → Global Average Pooling → fc, 10
Decoder (Reconstruction): ODE Block 3, 128 → Upscaling block 1, 64 → ODE Block 4, 64 → Upscaling block 2, 32 → ODE Block 5, 32 → 1×1 conv, 3]
Deep Continuous Networks
DCNs are parameter efficient due to the structured filter definitions.
Model              Spatial continuity  Temporal continuity  Accuracy (%)  Parameters
ODE-Net                    ✗                   ✓            89.6 ± 0.3    560K
ResNet-blocks              ✗                   ✗            89.0 ± 0.2    555K
ResNet-SRF-blocks          ✓                   ✗            88.3 ± 0.03   426K
ResNet-SRF-full            ✓                   ✗            89.3 ± 0.4    323K
DCN-ODE                    ✓                   ✓            89.5 ± 0.2    429K
DCN-full                   ✓                   ✓            89.2 ± 0.3    326K
DCN-σ98                    ✓                   ✓            89.7 ± 0.3    472K
CIFAR-10 Classification
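To see where the parameter saving comes from: a free k×k kernel costs k² weights per input-output channel pair, while an N-jet filter of order N needs only (N+1)(N+2)/2 basis coefficients plus a scale, independent of the spatial kernel size. The counting below is a hypothetical back-of-the-envelope sketch (no biases, one shared σ per layer assumed), not the exact bookkeeping behind the table above.

```python
def free_kernel_params(k, c_in, c_out):
    """Weights in a standard conv layer with free k x k kernels (no bias)."""
    return c_in * c_out * k * k

def njet_params(order, c_in, c_out):
    """Weights when each filter is a combination of N-jet basis functions.

    Basis functions up to `order`: all Gaussian derivatives
    d^(i+j)/(dx^i dy^j) with i + j <= order, i.e.
    (order + 1)(order + 2) // 2 coefficients per filter.
    One shared scale parameter per layer is an assumption here.
    """
    n_basis = (order + 1) * (order + 2) // 2
    return c_in * c_out * n_basis + 1

free = free_kernel_params(7, 32, 32)   # 50176 weights for free 7x7 kernels
njet = njet_params(3, 32, 32)          # 10241 weights for an order-3 N-jet
```

The structured count does not grow with kernel size at all, so large receptive fields come essentially for free.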
DCNs are data efficient.
Model # images per class
2 4 8 16 32 52 64 103 128 512 1024
ResNet34† 17.5±2.5 19.5±1.4 23.3±1.6 28.3±1.4 33.2±1.2 – 41.7±1.1 – 49.1±1.3 – –
CNTK† 18.8±2.1 21.3±1.9 25.5±1.9 30.5±1.2 36.6±0.9 – 42.6±0.7 – 48.9±0.7 – –
ResNet-blocks 16.7±0.8 19.6±1.0 22.0±1.3 28.1±1.7 35.4±0.9 39.8±0.6 41.6±1.5 49.0±0.2 50.9±0.6 70.4±1.2 76.8±0.7
ODE-Net 16.8±2.8 20.5±0.8 23.1±2.5 29.8±0.8 36.4±1.0 41.7±1.2 42.3±0.2 48.6±0.5 50.7±0.7 71.7±1.5 77.4±0.5
DCN-ODE 16.4±1.6 19.8±0.7 26.5±0.9 31.2±0.6 37.7±0.6 44.5±0.8 48.0±1.3 54.2±0.8 58.2±0.7 75.5±0.8 79.7±0.3
Baseline results † from [2].
CIFAR-10 Classification: Small-data regime
DCNs allow for meta-parametrization of filters as a function of depth C .
Model            Parametrization                   Accuracy (%)
DCN-ODE          σ, W                              89.46 ± 0.16
DCN σ(C)         σ = σ₀C + σ₁, W                   89.97 ± 0.30
DCN σ(C²)        σ = σ₀C² + σ₁C + σ₂, W            89.93 ± 0.28
DCN σ(C), W(C)   σ = σ₀ᵇC + σ₁ᵇ, W = W₀C + W₁      89.88 ± 0.25
CIFAR-10 Classification: Filter meta-parametrization
σ distributions after training are consistent with biological observations [13].
[Figure: learned scale σ for conv1 and conv2 in blocks 1-3, for DCN-ODE, DCN-full, ResNet-SRF-blocks and ResNet-SRF-full; and a histogram of the number of filters over learned scale σ, per layer.]
Learning the receptive field size
Similar to rate-based, continuous-receptive-field population models of biological vision [3, 1, 5],
DCNs display emergent pattern completion.
[Figure: input image and masked image with feature maps h_im(0) and h_im(T); divergence D(t) over integration time t for DCN vs. baseline; and validation accuracy (%) for DCN, ResNet-blocks and ODE-Net under masks from 4×4 to 12×12.]
Pattern Completion
• ODE-based dynamics are sensitive to contrast changes.
• Contrast robustness can be improved by scaling the numerical ODE integration time proportionately to input contrast at test time.
• Contrast-robust networks can cut down the computational cost.
[Figure: validation accuracy (%) and number of function evaluations (NFE, for ODE block 1 and total over all ODE blocks) versus contrast level c from 0.001 to 1, for robust and naive variants of ODE-Net, DCN and DCN:3.]
Efficient ODE computation via contrast robustness
12 Deep Continuous Networks Results
We present spatially and depthwise-continuous DCN models:
• Data efficient
• Learn biologically plausible RF sizes
• Links to biological models from computational neuroscience
• Computational efficiency via contrast-robustness
Conclusion
[1] Shunichi Amari. “Dynamics of pattern formation in lateral-inhibition type neural fields”. In: Biological Cybernetics 27.2 (1977), pp. 77–87.
[2] Sanjeev Arora et al. “Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks”. In: International Conference on Learning Representations (ICLR). 2020.
[3] Paul C. Bressloff et al. “Geometric visual hallucinations, Euclidean symmetry and the functional architecture of striate cortex”. In: Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 356.1407 (2001), pp. 299–330.
[4] Ricky T. Q. Chen et al. “Neural Ordinary Differential Equations”. In: NeurIPS. 2018.
[5] Stephen Coombes. “Waves, bumps, and patterns in neural field theories”. In: Biological Cybernetics 93.2 (2005), pp. 91–108.
[6] Alexander S. Ecker et al. “A rotation-equivariant convolutional neural network model of primary visual cortex”. In: International Conference on Learning Representations (ICLR). 2019.
[7] Luc Florack et al. “The Gaussian scale-space paradigm and the multiscale local jet”. In: IJCV 18.1 (1996), pp. 61–75.
[8] Jorn-Henrik Jacobsen et al. “Structured receptive fields in CNNs”. In: CVPR. 2016, pp. 2610–2619.
[9] Kohitij Kar et al. “Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior”. In: Nature Neuroscience 22.6 (2019), pp. 974–983.
[10] Tony Lindeberg. Scale-space theory in computer vision. Vol. 256. Springer Science & Business Media, 2013.
[11] Jack Lindsey et al. “A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs”. In: International Conference on Learning Representations (ICLR). 2019.
[12] Lars Ruthotto and Eldad Haber. “Deep neural networks motivated by partial differential equations”. In: Journal of Mathematical Imaging and Vision (2019), pp. 1–13.
[13] Hsin-Hao Yu et al. “Spatial and temporal frequency tuning in striate cortex: functional uniformity and specializations related to receptive field eccentricity”. In: European Journal of Neuroscience 31.6 (2010), pp. 1043–1062.
References