MLSS INDO, August 2020. Daniel Worrall.
Advanced Topics in Equivariance: Lecture 3 Machine Learning Summer School, Indonesia 2020
Daniel Worrall
Legong Dance
This lecture: Equivariance
• Grey box models
• Convolutions
• Invariance and equivariance
• Group theory
• Convolution generalizations: group convolutions, steerable convolutions, semigroup correlation, gauge convolution
CNNs are responsible for many successes
ImageNet [figure from Wikipedia]; Dilated Residual Networks, Yu et al. (2017)
Inductive biases and overfitting
Remove degrees of freedom that do not help generalization.
• What assumptions can we make?
• How do we encode these assumptions?
• What framework do we use?

[Figure: space of modellable functions $F$ with nested constrained spaces $F_1, F_2, F_3$ and the true function]
Polynomial example

Polynomial fitting to a quintic: what order do we use?

$y = w_0 + w_1 x + \dots + w_5 x^5 + \epsilon$

• Prior: $p(w \mid \tau) = \mathcal{N}(w \mid 0, 10^2 I)$
• Posterior: $p(w \mid \mathcal{D}, \sigma^2, \tau) = \dfrac{p(\mathcal{D} \mid w, \sigma^2)\, p(w \mid \tau)}{p(\mathcal{D} \mid \tau)}$
• MAP estimate: $w_{\mathrm{MAP}} = \arg\max_w\, p(w \mid \mathcal{D}, \tau)$

Try multiple orders and compare:
• negative log-likelihood at the MAP, $-\log p(\mathcal{D} \mid w_{\mathrm{MAP}})$: keeps improving even when the model is too flexible
• negative log-evidence, $-\log p(\mathcal{D} \mid \tau)$: penalizes models that are too rigid or too flexible

[Figure: bias-variance decomposition, showing variance, bias², and bias² + variance across polynomial orders]
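A minimal numerical sketch of the model-selection point above, assuming NumPy; the helper `map_fit` and the regularization weight `lam` are illustrative names, and MAP estimation with a Gaussian prior on the weights is implemented as ridge regression:

```python
import numpy as np

# Fit polynomials of several orders to noisy quintic data by MAP estimation.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
w_true = np.array([0.5, -1.0, 2.0, 0.3, -0.7, 1.2])   # quintic coefficients w0..w5
y = np.polyval(w_true[::-1], x) + 0.1 * rng.standard_normal(x.size)

def map_fit(x, y, degree, lam=1e-3):
    # MAP with prior N(0, alpha^-1 I) reduces to (X^T X + lam I)^-1 X^T y
    X = np.vander(x, degree + 1, increasing=True)       # columns [1, x, ..., x^degree]
    w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)
    return w, float(np.mean((y - X @ w) ** 2))

# Training error alone keeps improving with order, so it cannot pick the order;
# that is why the slide compares the evidence instead.
train_mse = {d: map_fit(x, y, d)[1] for d in (1, 3, 5, 9)}
```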
Inductive Biases
White box models, grey box models, black box models: most ML models sit in the grey-box middle.
• Regression: polynomial order; likelihood and prior distribution
• Classification: sigmoid function; linear features; ML or MAP?
• Deep learning: number of layers; nonlinearities; optimizer; structure (convolutions); …
Equivariance

Batik making
Convolutions
For images we use convolutions. Convolutions use:
1. Translational weight-tying
2. Small receptive fields

A drop-in replacement for the linear layer. Indeed, it is just a linear layer! e.g. LeCun et al. (1991)
Standard convolution:

$[F * W](x) = \sum_{y \in \mathbb{Z}^2} F(y)\, W(y - x)$

with filter $W$ and feature map $F$; a layer computes $z_1 = \sigma(W_1 * x + b_1)$.

Convolutions can be "reshaped" into matrix-vector products, revealing the weight-tying and locality.
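The "reshaping" claim can be checked directly in one dimension, assuming NumPy and periodic boundaries; the circulant matrix `M` makes the shared, shifted weights (weight-tying) and the zeros outside a small window (locality) explicit:

```python
import numpy as np

N, K = 8, 3
rng = np.random.default_rng(0)
F = rng.standard_normal(N)
W = rng.standard_normal(K)

# Build the convolution matrix: row x holds the same K weights, shifted by x.
M = np.zeros((N, N))
for x in range(N):
    for k in range(K):
        M[x, (x + k) % N] = W[k]

out_matvec = M @ F
out_direct = np.array(
    [sum(F[(x + k) % N] * W[k] for k in range(K)) for x in range(N)]
)
assert np.allclose(out_matvec, out_direct)  # same linear map, two views
```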
Convolutions are special
Why learn translational weight-tying when we can build it in?
• Parameter efficient: same function, fewer learnable parameters
• Statistically efficient: need less data
• Computationally efficient: need less GPU time

Locality follows from the locality of pixel statistics, a property of the data. Weight-tying follows from the translational invariance of classification, a property of the task.

What about rotation, scaling, warps, shears, color jitter, blur, …?
Convolutional variants
Group convolution
• Regular-representation conv.: translational convolution, Group CNN, HexaConv, CubeNet, scale convolution; learning to convolve; LieConv
• Steerable convolution: Harmonic Networks, spherical convolution, steerable SE(3) networks, Tensor Field Networks
• Semigroup correlation: Deep Scale-Spaces
• Gauge convolution / tangent space convolutions: Icosahedral CNN, Gauge Equivariant Mesh Convolution
• Unitary convolution
Symmetry: Invariance
Examples of invariant tasks: classification, signal discovery/detection, disentangling (cocktail party), dynamical systems.

Symmetry is a property of tasks, not of data!

Notation: $I$ is the image, $f$ the function/feature mapping, $T_\theta$ the transformation. Invariance means $f(I) = f(T_\theta[I])$ for the set of input transformations leaving $f$ invariant.

Example transformations:
• Translation: $T_\theta[I](x) = I(x - \theta)$
• Rotation: $T_\theta[I](x) = I(R_\theta^{-1} x)$
• Data whitening: $T[I] = \Sigma^{-1/2}(I - \mu)$
Symmetry: Equivariance
Convolution (and correlation) are equivariant:

$\underbrace{[I * W](x - \theta)}_{S_\theta[I * W](x)} = [T_\theta[I] * W](x)$

Equivariance: $S_\theta[f](I) = f(T_\theta[I])$, where $S_\theta$ is the transformation in feature space, a different representation of the same transformation. Invariance is the special case $S_\theta = \mathrm{Id}$.

Commutative diagram: transforming in input space and then mapping to feature space equals mapping first and transforming in feature space, $f(T_g[I]) = S_g[f(I)]$.

Theorem [Kondor & Trivedi (2018), Cohen et al. (2020a), Bekkers (2020)]: Group convolution is the only linear map* equivariant to group-structured** transformations.
*Linear, bounded map between measurable homogeneous spaces. **Countable groups and Lie groups.
Symmetry: Equivariance

A sophisticated form of invariance: equivalence classes are preserved. In the commutative diagram between input space and feature space, $f(T_g[I]) = S_g[f(I)]$: the feature-space transformation $S_g$ is a different representation of the same transformation.
Translational Equivariance Proof
Hint: $T_t[I](x) = I(x - t)$

Write out the convolution in full:
$[T_t[I] * W](x) = \sum_{y \in \mathbb{Z}^2} T_t[I](y)\, W(y - x)$

Expand the translation operator:
$= \sum_{y \in \mathbb{Z}^2} I(y - t)\, W(y - x)$

Change of variables $y' = y - t$:
$= \sum_{y' \in \mathbb{Z}^2} I(y')\, W(y' - (x - t))$

Oh look, it's a convolution!
$= [I * W](x - t) = T_t[I * W](x)$
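The proof can be verified numerically, assuming NumPy and periodic (mod-N) indexing so the change of variables is exact:

```python
import numpy as np

def corr(I, W):
    # [I * W](x) = sum_y I(y) W(y - x), with periodic boundary (indices mod N)
    N = I.shape[0]
    out = np.zeros_like(I, dtype=float)
    for x0 in range(N):
        for x1 in range(N):
            for y0 in range(N):
                for y1 in range(N):
                    out[x0, x1] += I[y0, y1] * W[(y0 - x0) % N, (y1 - x1) % N]
    return out

def translate(F, t):
    # T_t[F](x) = F(x - t): cyclically shift the signal by t
    return np.roll(F, shift=t, axis=(0, 1))

rng = np.random.default_rng(0)
I = rng.standard_normal((6, 6))
W = rng.standard_normal((6, 6))
t = (2, 3)

lhs = corr(translate(I, t), W)   # convolve the shifted image
rhs = translate(corr(I, W), t)   # shift the convolved image
assert np.allclose(lhs, rhs)     # translation equivariance holds exactly
```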
A Little Group Theory
Group theory is a mathematical abstraction modeling compositional structure; it can be used to model transformations.

A group is a set of transformations $G$ with a composition $\circ$ satisfying:
1. Closure: compositions are well-behaved: $g \circ h \in G$
2. Associativity: brackets are unnecessary: $(g \circ h) \circ k = g \circ (h \circ k) = g \circ h \circ k$
3. Identity: there is a "do nothing" transformation: $e \circ g = g \circ e = g$
4. Invertibility: transformations are reversible: $g \circ g^{-1} = g^{-1} \circ g = e$

Action/transformation on images: $T_g[T_h[I]] = T_{g \circ h}[I]$, e.g. rotating an image by 90° and then by a further 135° is the same as rotating once by 225°.
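A minimal sketch of the four axioms, assuming NumPy, using the cyclic rotation group C4 acting on images via `np.rot90` (elements are rotation counts mod 4, composition is addition mod 4):

```python
import numpy as np

def compose(g, h):
    return (g + h) % 4      # closure and associativity are inherited from mod-4 addition

def inverse(g):
    return (-g) % 4         # every 90-degree rotation count has an inverse

def act(g, image):
    return np.rot90(image, k=g)   # the action T_g on images

I = np.arange(9).reshape(3, 3)
g, h = 1, 3                 # 90-degree and 270-degree rotations

# Action is compatible with composition: T_g[T_h[I]] = T_{g∘h}[I]
assert np.array_equal(act(g, act(h, I)), act(compose(g, h), I))
# Identity ("do nothing") and invertibility
assert np.array_equal(act(0, I), I)
assert compose(g, inverse(g)) == 0
```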
Convolutional variants: Regular-representation convolutions
The Regular Group Convolution
Standard convolution (an inner product with translated filter copies):

$[F * W](x) = \sum_{y \in \mathbb{Z}^2} F(y)\, W(y - x)$

Group convolution (an inner product with transformed filter copies):

$[F * W](x) = \sum_{y \in Y} F(y)\, W(T_x^{-1}[y])$

Here $x$ is the transformation parameter and the domain $Y$ is adjusted accordingly. At each location there are multiple transformed filter copies.

Hints:
• Translation: $T_x[y] = y + x$
• Rotation: $T_x[y] = R_x y$
• Roto-translation: $T_x[y] = R_x y + t_x$
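A sketch of a regular group correlation for discrete roto-translations, assuming NumPy and periodic boundaries; `p4_lifting_correlation` is an illustrative name for the lifting layer that compares the input against all four 90°-rotated filter copies at every translation:

```python
import numpy as np

def corr2d(F, W):
    # Periodic cross-correlation: out(x) = sum_y F(y) W(y - x)
    N = F.shape[0]
    out = np.zeros((N, N))
    for x0 in range(N):
        for x1 in range(N):
            W_shift = np.roll(W, shift=(x0, x1), axis=(0, 1))
            out[x0, x1] = np.sum(F * W_shift)
    return out

def p4_lifting_correlation(F, W):
    # At each translation, take the inner product with each of the four
    # rotated filter copies: output is indexed by (rotation, translation).
    return np.stack([corr2d(F, np.rot90(W, k)) for k in range(4)])

rng = np.random.default_rng(1)
F = rng.standard_normal((8, 8))
W = rng.standard_normal((8, 8))
out = p4_lifting_correlation(F, W)   # shape (4, 8, 8): rotation x translation
```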
Group Transformation Examples
Examples: translation, rotation, roto-translation, scalings, reflections.
Non-example: occlusions.
SO(2) Rotation
Group convolution: $[F * W](x) = \sum_{y \in Y} F(y)\, W(T_x^{-1}[y])$

With a rotating kernel, the convolution is indexed by a rotation. Notice the index of the convolution:

$[F * W](R_\theta) = \sum_{y \in Y} F(y)\, W(T_{R_\theta}^{-1}[y])$

The activation space is indexed by rotation angle, and rotating the input shifts activations along that index:

$[T_{R_\psi}[F] * W](R_\theta) = [F * W](R_\psi^{-1} R_\theta)$
SE(2) Roto-translation
With a rotating + translating kernel:

$[F * W](R_\theta, x) = \sum_{y \in \mathbb{Z}^2} F(y)\, W(R_\theta^{-1}(y - x))$

Rotating and translating the input transforms the activations:

$[T_{R_\psi, z}[F] * W](R_\theta, x) = [F * W](R_{\theta - \psi},\ R_\psi^{-1}(x - z))$
Sample efficiency
Bekkers et al. (2018)
Sample efficiency
RL is sample hungry: exploit symmetries. Transformation-equivariant policies via MDP homomorphism theory. Environments: Cartpole, GridWorld, Pong. Van der Pol et al. (2020)
Convolutional variants: Steerable convolutions
Steerable Convolutions

Another equivariant mapping is the Fourier transform:

$\mathcal{F}\{F\}(m) = \int_{-\pi}^{\pi} F(\phi)\, e^{-\iota m\phi}\, d\phi$, with $m \in \mathbb{Z}$ and $\iota = \sqrt{-1}$.

Fourier shift theorem:

$\int_{-\pi}^{\pi} F(\phi - \theta)\, e^{-\iota m\phi}\, d\phi = \int_{-\pi-\theta}^{\pi-\theta} F(\phi')\, e^{-\iota m(\phi' + \theta)}\, d\phi' = e^{-\iota m\theta} \int_{-\pi}^{\pi} F(\phi')\, e^{-\iota m\phi'}\, d\phi'$

So $\mathcal{F}\{T_\theta[F]\}(m) = e^{-\iota m\theta}\, \mathcal{F}\{F\}(m) = S_\theta[\mathcal{F}\{F\}](m)$: the magnitude is invariant, the phase equivariant.

The Fourier transform is an inner product! Steerable filters take the form

$W_m(r, \phi) = R(r)\, e^{\iota(m\phi + \beta)}$

with learnable radial profile $R(r)$ and phase offset $\beta$, and fixed angular frequency $m$. This can be generalized to other transformation groups.
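The Fourier shift theorem has an exact discrete analogue that can be checked with NumPy's FFT (a sketch; the phase factor follows NumPy's sign convention):

```python
import numpy as np

# Cyclically shifting a signal multiplies its Fourier coefficients by a phase:
# the magnitude is invariant, the phase transforms equivariantly.
N = 16
rng = np.random.default_rng(0)
F = rng.standard_normal(N)
shift = 5

Fhat = np.fft.fft(F)
Fhat_shifted = np.fft.fft(np.roll(F, shift))

m = np.arange(N)
phase = np.exp(-2j * np.pi * m * shift / N)
assert np.allclose(Fhat_shifted, phase * Fhat)          # phase is equivariant
assert np.allclose(np.abs(Fhat_shifted), np.abs(Fhat))  # magnitude is invariant
```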
Lack of Rotation Equivariance
Steerable Equivariance
Comparison
Convolutional variants: Semigroup Correlations
Semigroups: How not to rescale

Original vs. naive subsampling vs. bandlimiting + subsampling, shown at ×2 and ×4: subsampling without bandlimiting first produces aliasing artifacts.
Semigroups: How to rescale
Blur appropriately, then subsample:

$T_s[F](x) = \mathrm{Blur}_s[F](s^{-1}x)$

But for, say, $s = \frac{1}{8}$, the inverse $T_s^{-1}[F](x)$ does not exist! Blurring destroys information, so rescaling satisfies closure, associativity, and identity, but not invertibility: scalings form a semigroup rather than a group.
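A sketch of why blurring composes like a semigroup but cannot be undone, assuming NumPy and a periodic Gaussian blur built in the Fourier domain (so composition of variances is exact):

```python
import numpy as np

def blur(F, sigma):
    # Periodic Gaussian blur: multiply each frequency f by exp(-2 pi^2 sigma^2 f^2)
    f = np.fft.fftfreq(F.shape[0])
    kernel = np.exp(-2 * (np.pi ** 2) * (sigma ** 2) * (f ** 2))
    return np.real(np.fft.ifft(np.fft.fft(F) * kernel))

rng = np.random.default_rng(0)
F = rng.standard_normal(64)

# Semigroup (closure) property of Gaussian scale-space: blurring by sigma1
# and then sigma2 equals one blur with variance sigma1^2 + sigma2^2.
a = blur(blur(F, 1.0), 2.0)
b = blur(F, np.sqrt(1.0 ** 2 + 2.0 ** 2))
assert np.allclose(a, b)

# Non-invertibility: high frequencies are scaled by factors near zero, so
# information is destroyed and the blur cannot be undone stably.
```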
The Semigroup Correlation
Group convolution, e.g. Cohen & Welling (2016):
$[I * W](x) = \sum_{y \in Y} I(y)\, W(T_{x^{-1}}[y])$

Unitary group convolution, e.g. Diaconu & Worrall (2019):
$[I * W](x) = \sum_{y \in Y} I(y)\, T_{x^{-1}}[W](y)$

Semigroup correlation, e.g. Worrall & Welling (2019):
$[I \star W](x) = \sum_{y \in Y} T_x[I](y)\, W(y)$

Two changes: 1) rescaling is not a shuffling of pixel locations, so transform the signal rather than the filter; 2) use non-invertible transformations.
The Semigroup Correlation
$[I \star W](k, t) = \sum_{y \in \mathbb{Z}^2} [G_{2^k} *_{\mathbb{Z}^2} I](2^k y + t)\, W(y)$

Blur (with a Gaussian $G_{2^k}$) followed by a dilated convolution. This lifts the image to a scale-space over levels $k = 0, 1, 2, 3, \dots$: the deep scale-space.
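A hedged one-dimensional sketch of the blur-then-dilated-correlation structure above, assuming NumPy; `scale_corr` and the periodic indexing are illustrative simplifications of the 2D operator:

```python
import numpy as np

def gauss_blur(I, sigma):
    # Periodic Gaussian blur in the Fourier domain
    f = np.fft.fftfreq(I.shape[0])
    return np.real(np.fft.ifft(np.fft.fft(I) * np.exp(-2 * np.pi**2 * sigma**2 * f**2)))

def scale_corr(I, W, K):
    # [I ⋆ W](k, t) = sum_y [G_{2^k} * I](2^k y + t) W(y): blur to scale 2^k,
    # then correlate with the filter sampled at dilation 2^k (periodic indexing).
    N, M = I.shape[0], W.shape[0]
    out = np.zeros((K, N))
    for k in range(K):
        blurred = gauss_blur(I, 2 ** k)
        for t in range(N):
            idx = (2 ** k * np.arange(M) + t) % N
            out[k, t] = np.dot(blurred[idx], W)
    return out

rng = np.random.default_rng(0)
I = rng.standard_normal(32)
W = rng.standard_normal(5)
out = scale_corr(I, W, K=3)   # shape (3, 32): scale level x position
```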
Convolutional variants: Gauge Equivariance
Gauge Convolutions
An alternative route to equivariance is to consider local gauge transformations.

A frame is another name for coordinates. The same vector can be written in two frames:

$v = v^{(1)}_1 e^{(1)}_1 + v^{(1)}_2 e^{(1)}_2 = v^{(2)}_1 e^{(2)}_1 + v^{(2)}_2 e^{(2)}_2$

How to make the result frame independent? Take the inner product between signal and filter represented in all frame orientations.

Like a group convolution, but:
• defined locally
• can work on manifolds (Cohen et al. (2019))
• need to map the signal from the manifold to the tangent space

A local group convolution, with a curvature correction due to the manifold.
Convolutional variants

Group convolution, regular-representation conv., steerable convolution, semigroup correlation, gauge convolution.
Convolutions
• Powerful inductive bias
• Linear layers + symmetry
• Fewer learnable parameters
• Many variants

This lecture: Equivariance. Grey box models, convolutions, invariance and equivariance, group theory, convolution generalizations: group convolutions, steerable convolutions, semigroup correlation, gauge convolution.
References
Theory
Group CNNs - Cohen & Welling (2016)
Steerable CNNs - Cohen & Welling (2017)
On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups - Kondor & Trivedi (2018)
A General Theory of Equivariant CNNs on Homogeneous Spaces - Cohen et al. (2019)
Gauge Equivariant Convolutional Networks - Cohen*, Weiler*, Kicanaoglu* et al. (2019)
Deep Scale-spaces: Equivariance Over Scale - Worrall & Welling (2019)

Z2: Translation
Handwritten digit recognition with a Back-Propagation network - LeCun et al. (1990)

E(2): Discrete subgroups
Group CNNs - Cohen & Welling (2016)
Exploiting cyclic symmetry in convolutional neural networks - Dieleman et al. (2016)
Steerable CNNs - Cohen & Welling (2017)
HexaConv - Hoogeboom*, Peters* et al. (2018)

SE(2): Discrete subgroups
Learning steerable filters for rotation equivariant CNNs - Weiler et al. (2018)
Oriented response networks - Zhou et al. (2017)
Roto-translation covariant convolutional networks for medical image analysis - Bekkers et al. (2018)
Rotation equivariant vector field networks - Marcos et al. (2017)

SE(2): Full group
Harmonic Networks: Deep translation and rotation equivariance - Worrall et al. (2017)

SE(2): All subgroups
General E(2)-Equivariant Steerable CNNs - Weiler & Cesa (2019)
References
E(3): Discrete subgroups
3D G-CNNs for pulmonary nodule detection - Winkels & Cohen (2018)

SE(3): Discrete subgroups
CubeNet: Equivariance to 3D rotation and translation - Worrall & Brostow (2018)

SE(3): Full group
N-body networks: a covariant hierarchical neural network architecture for learning atomic potentials - Kondor (2018)
Tensor Field Networks: Rotation- and translation-equivariant neural networks for 3D point clouds - Thomas*, Smidt* et al. (2018)
3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data - Weiler*, Geiger* et al. (2018)
Clebsch-Gordan Nets: a Fully Fourier Space Spherical Convolutional Neural Network - Kondor et al. (2018)
Cormorant: Covariant molecular neural networks - Anderson et al. (2019)

SO(3) on the sphere: discrete subgroup
Spherical CNNs - Cohen*, Geiger*, Köhler et al. (2018)

SO(3) on the sphere: isotropic subgroup
Learning SO(3) Equivariant Representations with Spherical CNNs - Esteves et al. (2018)
DeepSphere: Efficient spherical convolutional neural network with HEALPix sampling for cosmological applications - Perraudin et al. (2019)

SO(3) on the sphere: full group
Spherical CNNs on unstructured grids - Jiang et al. (2019)
References
41
Scale semigroupDeep Scale-spaces: Equivariance Over Scale - Worrall & Welling (2019)
Scale groupScale equivariance in CNNs with vector fields - Marcos et al (2018)Scale steerable filters for locally scale-invariant neural networks - Ghosh & Gupta (2019)Scale-equivariant steerable networks - Sosnovik & Smeulders (2020)
Lie groups: B-Spline CNNs on Lie Groups - Bekkers (2019)Generalizing Convolutional Neural Networks for Equivariance to Lie groups on Arbitrary Continuous Data - Finzi et al. (2020)
PDEsPDE-based Group Equivariant Convolutional Neural Networks - Smets et al. (2020)Incorporating Symmetry into Deep Dynamics Models for Improved Generalization - Wang*, Walters* et al. (2020)
AttentionAffine Self Convolution - Diaconu & Worrall, (2019)Attentive Group Equivariant Convolution Networks - Romero et al., (2020)SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks - Fuchs et al. (2020)
Reverse-complement symmetryAn equivariant Bayesian convolutional network predicts recombination hotspots and accurately resolves binding motifs - Brown & Lunter (2019)
Reinforcement LearningMDP Homomorphic Networks: Group Symmetries in Reinforcement Learning - van der Pol et al. (2020)
MLSS INDO August 2020Daniel Worrall 42
Appendix

Wayang Kulit Show
Group Equivariance Proof
Hints:
• Homomorphism property: $T_x T_y = T_{xy}$
• Inversion: $T_{z^{-1}}[y] = T_z^{-1}[y]$
• Composition: $(xy)^{-1} = y^{-1}x^{-1}$

Write out the convolution in full:
$[T_z[F] * W](x) = \sum_{y \in Y} T_z[F](y)\, W(T_x^{-1}[y])$

Expand the transformation operator:
$= \sum_{y \in Y} F(T_{z^{-1}}[y])\, W(T_{x^{-1}}[y])$

Change of variables $y' = T_{z^{-1}}[y] \iff y = T_z[y']$:
$= \sum_{y' \in Y} F(y')\, W(T_x^{-1}[T_z[y']])$

Rewriting, using the hints:
$= \sum_{y' \in Y} F(y')\, W(T_{x^{-1}z}[y']) = \sum_{y' \in Y} F(y')\, W(T_{z^{-1}x}^{-1}[y'])$

It's a convolution!
$= [F * W](z^{-1}x) = T_z[F * W](x)$
Semigroup Equivariance Proof
Write out the correlation in full:
$[T_z[F] \star W](x) = \sum_{y \in Y} T_x[T_z[F]](y)\, W(y)$

Semigroup composition:
$= \sum_{y \in Y} T_{xz}[F](y)\, W(y)$

It's a correlation!
$= [F \star W](xz) = T_z[F \star W](x)$, where the last equality defines the left action on activations.
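The semigroup equivariance can be checked numerically, assuming NumPy, with an exactly-composing Gaussian blur as the semigroup action; transforming the signal (not the filter) and shifting along the scale axis agree:

```python
import numpy as np

def blur(F, s):
    # Semigroup action T_s: periodic Gaussian blur with variance s, built in
    # the Fourier domain so that T_s ∘ T_t = T_{s+t} holds exactly.
    f = np.fft.fftfreq(F.shape[0])
    return np.real(np.fft.ifft(np.fft.fft(F) * np.exp(-2 * np.pi**2 * s * f**2)))

def semigroup_corr(F, W, scales):
    # [F ⋆ W](s) = sum_y T_s[F](y) W(y): transform the signal, not the filter
    return np.array([np.dot(blur(F, s), W) for s in scales])

rng = np.random.default_rng(0)
F = rng.standard_normal(32)
W = rng.standard_normal(32)
t = 1.5
scales = np.arange(0.0, 4.0, 0.5)

lhs = semigroup_corr(blur(F, t), W, scales)   # correlate the transformed signal
rhs = semigroup_corr(F, W, scales + t)        # shift along the scale axis
assert np.allclose(lhs, rhs)                  # [T_t[F] ⋆ W](s) = [F ⋆ W](s + t)
```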