MLSS INDO, August 2020. Daniel Worrall.
Advanced Topics in Equivariance: Lecture 3 Machine Learning Summer School, Indonesia 2020
Daniel Worrall
Legong Dance
This lecture: Equivariance
• Grey box models
• Convolutions
• Invariance and equivariance
• Group theory
• Convolution generalizations: group convolutions, steerable convolutions, semigroup correlation, gauge convolution
CNNs are responsible for many successes
ImageNet [figure from Wikipedia]; Dilated Residual Networks, Yu et al. (2017)
Inductive biases and overfitting
Remove degrees of freedom that do not help generalization.
• What assumptions can we make?
• How do we encode these assumptions?
• What framework do we use?

[Figure: space of modellable functions $F$ with nested constrained spaces $F_1, F_2, F_3$ and the true function]
Polynomial example

Polynomial fitting to a quintic: what order do we use?

$y = w_0 + w_1 x + \dots + w_5 x^5 + \epsilon$

• Prior: $p(w \mid \tau) = \mathcal{N}(w \mid 0, 10^2 I)$
• Posterior: $p(w \mid \mathcal{D}, \sigma^2, \tau) = \dfrac{p(\mathcal{D} \mid w, \sigma^2)\, p(w \mid \tau)}{p(\mathcal{D} \mid \tau)}$
• MAP estimate: $w_{\mathrm{MAP}} = \arg\max_w\, p(w \mid \mathcal{D}, \tau)$

Try multiple orders and compare:
• negative log-likelihood at the MAP, $-\log p(\mathcal{D} \mid w_{\mathrm{MAP}})$: keeps improving even when the model is too flexible
• negative log-evidence, $-\log p(\mathcal{D} \mid \tau)$: penalizes models that are too rigid or too flexible

[Figure: bias-variance decomposition, showing variance, bias², and bias² + variance across polynomial orders]
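A minimal numerical sketch of the model-selection point above, assuming NumPy; the helper `map_fit` and the regularization weight `lam` are illustrative names, and MAP estimation with a Gaussian prior on the weights is implemented as ridge regression:

```python
import numpy as np

# Fit polynomials of several orders to noisy quintic data by MAP estimation.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
w_true = np.array([0.5, -1.0, 2.0, 0.3, -0.7, 1.2])   # quintic coefficients w0..w5
y = np.polyval(w_true[::-1], x) + 0.1 * rng.standard_normal(x.size)

def map_fit(x, y, degree, lam=1e-3):
    # MAP with prior N(0, alpha^-1 I) reduces to (X^T X + lam I)^-1 X^T y
    X = np.vander(x, degree + 1, increasing=True)       # columns [1, x, ..., x^degree]
    w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)
    return w, float(np.mean((y - X @ w) ** 2))

# Training error alone keeps improving with order, so it cannot pick the order;
# that is why the slide compares the evidence instead.
train_mse = {d: map_fit(x, y, d)[1] for d in (1, 3, 5, 9)}
```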
Inductive Biases
White box models, grey box models, black box models: most ML models sit in the grey-box middle.
• Regression: polynomial order; likelihood and prior distribution
• Classification: sigmoid function; linear features; ML or MAP?
• Deep learning: number of layers; nonlinearities; optimizer; structure (convolutions); …
Equivariance

Batik making
Convolutions
For images we use convolutions. Convolutions use:
1. Translational weight-tying
2. Small receptive fields

A drop-in replacement for the linear layer. Indeed, it is just a linear layer! e.g. LeCun et al. (1991)
Standard convolution:

$[F * W](x) = \sum_{y \in \mathbb{Z}^2} F(y)\, W(y - x)$

with filter $W$ and feature map $F$; a layer computes $z_1 = \sigma(W_1 * x + b_1)$.

Convolutions can be "reshaped" into matrix-vector products, revealing the weight-tying and locality.
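The "reshaping" claim can be checked directly in one dimension, assuming NumPy and periodic boundaries; the circulant matrix `M` makes the shared, shifted weights (weight-tying) and the zeros outside a small window (locality) explicit:

```python
import numpy as np

N, K = 8, 3
rng = np.random.default_rng(0)
F = rng.standard_normal(N)
W = rng.standard_normal(K)

# Build the convolution matrix: row x holds the same K weights, shifted by x.
M = np.zeros((N, N))
for x in range(N):
    for k in range(K):
        M[x, (x + k) % N] = W[k]

out_matvec = M @ F
out_direct = np.array(
    [sum(F[(x + k) % N] * W[k] for k in range(K)) for x in range(N)]
)
assert np.allclose(out_matvec, out_direct)  # same linear map, two views
```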
Convolutions are special
Why learn translational weight-tying when we can build it in?
• Parameter efficient: same function, fewer learnable parameters
• Statistically efficient: need less data
• Computationally efficient: need less GPU time

Locality follows from the locality of pixel statistics, a property of the data. Weight-tying follows from the translational invariance of classification, a property of the task.

What about rotation, scaling, warps, shears, color jitter, blur, …?
Convolutional variants
Group convolution
• Regular-representation conv.: translational convolution, Group CNN, HexaConv, CubeNet, scale convolution; learning to convolve; LieConv
• Steerable convolution: Harmonic Networks, spherical convolution, steerable SE(3) networks, Tensor Field Networks
• Semigroup correlation: Deep Scale-Spaces
• Gauge convolution / tangent space convolutions: Icosahedral CNN, Gauge Equivariant Mesh Convolution
• Unitary convolution
Symmetry: Invariance
Examples of invariant tasks: classification, signal discovery/detection, disentangling (cocktail party), dynamical systems.

Symmetry is a property of tasks, not of data!

Notation: $I$ is the image, $f$ the function/feature mapping, $T_\theta$ the transformation. Invariance means $f(I) = f(T_\theta[I])$ for the set of input transformations leaving $f$ invariant.

Example transformations:
• Translation: $T_\theta[I](x) = I(x - \theta)$
• Rotation: $T_\theta[I](x) = I(R_\theta^{-1} x)$
• Data whitening: $T[I] = \Sigma^{-1/2}(I - \mu)$
Symmetry: Equivariance
Convolution (and correlation) are equivariant:

$\underbrace{[I * W](x - \theta)}_{S_\theta[I * W](x)} = [T_\theta[I] * W](x)$

Equivariance: $S_\theta[f](I) = f(T_\theta[I])$, where $S_\theta$ is the transformation in feature space, a different representation of the same transformation. Invariance is the special case $S_\theta = \mathrm{Id}$.

Commutative diagram: transforming in input space and then mapping to feature space equals mapping first and transforming in feature space, $f(T_g[I]) = S_g[f(I)]$.

Theorem [Kondor & Trivedi (2018), Cohen et al. (2020a), Bekkers (2020)]: Group convolution is the only linear map* equivariant to group-structured** transformations.
*Linear, bounded map between measurable homogeneous spaces. **Countable groups and Lie groups.
Symmetry: Equivariance

A sophisticated form of invariance: equivalence classes are preserved. In the commutative diagram between input space and feature space, $f(T_g[I]) = S_g[f(I)]$: the feature-space transformation $S_g$ is a different representation of the same transformation.
Translational Equivariance Proof
Hint: $T_t[I](x) = I(x - t)$

Write out the convolution in full:
$[T_t[I] * W](x) = \sum_{y \in \mathbb{Z}^2} T_t[I](y)\, W(y - x)$

Expand the translation operator:
$= \sum_{y \in \mathbb{Z}^2} I(y - t)\, W(y - x)$

Change of variables $y' = y - t$:
$= \sum_{y' \in \mathbb{Z}^2} I(y')\, W(y' - (x - t))$

Oh look, it's a convolution!
$= [I * W](x - t) = T_t[I * W](x)$
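The proof can be verified numerically, assuming NumPy and periodic (mod-N) indexing so the change of variables is exact:

```python
import numpy as np

def corr(I, W):
    # [I * W](x) = sum_y I(y) W(y - x), with periodic boundary (indices mod N)
    N = I.shape[0]
    out = np.zeros_like(I, dtype=float)
    for x0 in range(N):
        for x1 in range(N):
            for y0 in range(N):
                for y1 in range(N):
                    out[x0, x1] += I[y0, y1] * W[(y0 - x0) % N, (y1 - x1) % N]
    return out

def translate(F, t):
    # T_t[F](x) = F(x - t): cyclically shift the signal by t
    return np.roll(F, shift=t, axis=(0, 1))

rng = np.random.default_rng(0)
I = rng.standard_normal((6, 6))
W = rng.standard_normal((6, 6))
t = (2, 3)

lhs = corr(translate(I, t), W)   # convolve the shifted image
rhs = translate(corr(I, W), t)   # shift the convolved image
assert np.allclose(lhs, rhs)     # translation equivariance holds exactly
```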
A Little Group Theory
Group theory is a mathematical abstraction modeling compositional structure; it can be used to model transformations.

A group is a set of transformations $G$ with a composition $\circ$ satisfying:
1. Closure: compositions are well-behaved: $g \circ h \in G$
2. Associativity: brackets are unnecessary: $(g \circ h) \circ k = g \circ (h \circ k) = g \circ h \circ k$
3. Identity: there is a "do nothing" transformation: $e \circ g = g \circ e = g$
4. Invertibility: transformations are reversible: $g \circ g^{-1} = g^{-1} \circ g = e$

Action/transformation on images: $T_g[T_h[I]] = T_{g \circ h}[I]$, e.g. rotating an image by 90° and then by a further 135° is the same as rotating once by 225°.
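A minimal sketch of the four axioms, assuming NumPy, using the cyclic rotation group C4 acting on images via `np.rot90` (elements are rotation counts mod 4, composition is addition mod 4):

```python
import numpy as np

def compose(g, h):
    return (g + h) % 4      # closure and associativity are inherited from mod-4 addition

def inverse(g):
    return (-g) % 4         # every 90-degree rotation count has an inverse

def act(g, image):
    return np.rot90(image, k=g)   # the action T_g on images

I = np.arange(9).reshape(3, 3)
g, h = 1, 3                 # 90-degree and 270-degree rotations

# Action is compatible with composition: T_g[T_h[I]] = T_{g∘h}[I]
assert np.array_equal(act(g, act(h, I)), act(compose(g, h), I))
# Identity ("do nothing") and invertibility
assert np.array_equal(act(0, I), I)
assert compose(g, inverse(g)) == 0
```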
Convolutional variants: Regular-representation convolutions
The Regular Group Convolution
Standard convolution (an inner product with translated filter copies):

$[F * W](x) = \sum_{y \in \mathbb{Z}^2} F(y)\, W(y - x)$

Group convolution (an inner product with transformed filter copies):

$[F * W](x) = \sum_{y \in Y} F(y)\, W(T_x^{-1}[y])$

Here $x$ is the transformation parameter and the domain $Y$ is adjusted accordingly. At each location there are multiple transformed filter copies.

Hints:
• Translation: $T_x[y] = y + x$
• Rotation: $T_x[y] = R_x y$
• Roto-translation: $T_x[y] = R_x y + t_x$
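A sketch of a regular group correlation for discrete roto-translations, assuming NumPy and periodic boundaries; `p4_lifting_correlation` is an illustrative name for the lifting layer that compares the input against all four 90°-rotated filter copies at every translation:

```python
import numpy as np

def corr2d(F, W):
    # Periodic cross-correlation: out(x) = sum_y F(y) W(y - x)
    N = F.shape[0]
    out = np.zeros((N, N))
    for x0 in range(N):
        for x1 in range(N):
            W_shift = np.roll(W, shift=(x0, x1), axis=(0, 1))
            out[x0, x1] = np.sum(F * W_shift)
    return out

def p4_lifting_correlation(F, W):
    # At each translation, take the inner product with each of the four
    # rotated filter copies: output is indexed by (rotation, translation).
    return np.stack([corr2d(F, np.rot90(W, k)) for k in range(4)])

rng = np.random.default_rng(1)
F = rng.standard_normal((8, 8))
W = rng.standard_normal((8, 8))
out = p4_lifting_correlation(F, W)   # shape (4, 8, 8): rotation x translation
```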
Group Transformation Examples
Examples: translation, rotation, roto-translation, scalings, reflections.
Non-example: occlusions.
SO(2) Rotation
Group convolution: $[F * W](x) = \sum_{y \in Y} F(y)\, W(T_x^{-1}[y])$

With a rotating kernel, the convolution is indexed by a rotation. Notice the index of the convolution:

$[F * W](R_\theta) = \sum_{y \in Y} F(y)\, W(T_{R_\theta}^{-1}[y])$

The activation space is indexed by rotation angle, and rotating the input shifts activations along that index:

$[T_{R_\psi}[F] * W](R_\theta) = [F * W](R_\psi^{-1} R_\theta)$
SE(2) Roto-translation
With a rotating + translating kernel:

$[F * W](R_\theta, x) = \sum_{y \in \mathbb{Z}^2} F(y)\, W(R_\theta^{-1}(y - x))$

Rotating and translating the input transforms the activations:

$[T_{R_\psi, z}[F] * W](R_\theta, x) = [F * W](R_{\theta - \psi},\ R_\psi^{-1}(x - z))$
Sample efficiency
Bekkers et al. (2018)
Sample efficiency
RL is sample hungry: exploit symmetries. Transformation-equivariant policies via MDP homomorphism theory. Environments: Cartpole, GridWorld, Pong. Van der Pol et al. (2020)
Convolutional variants: Steerable convolutions
Steerable Convolutions

Another equivariant mapping is the Fourier transform:

$\mathcal{F}\{F\}(m) = \int_{-\pi}^{\pi} F(\phi)\, e^{-\iota m\phi}\, d\phi$, with $m \in \mathbb{Z}$ and $\iota = \sqrt{-1}$.

Fourier shift theorem:

$\int_{-\pi}^{\pi} F(\phi - \theta)\, e^{-\iota m\phi}\, d\phi = \int_{-\pi-\theta}^{\pi-\theta} F(\phi')\, e^{-\iota m(\phi' + \theta)}\, d\phi' = e^{-\iota m\theta} \int_{-\pi}^{\pi} F(\phi')\, e^{-\iota m\phi'}\, d\phi'$

So $\mathcal{F}\{T_\theta[F]\}(m) = e^{-\iota m\theta}\, \mathcal{F}\{F\}(m) = S_\theta[\mathcal{F}\{F\}](m)$: the magnitude is invariant, the phase equivariant.

The Fourier transform is an inner product! Steerable filters take the form

$W_m(r, \phi) = R(r)\, e^{\iota(m\phi + \beta)}$

with learnable radial profile $R(r)$ and phase offset $\beta$, and fixed angular frequency $m$. This can be generalized to other transformation groups.
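The Fourier shift theorem has an exact discrete analogue that can be checked with NumPy's FFT (a sketch; the phase factor follows NumPy's sign convention):

```python
import numpy as np

# Cyclically shifting a signal multiplies its Fourier coefficients by a phase:
# the magnitude is invariant, the phase transforms equivariantly.
N = 16
rng = np.random.default_rng(0)
F = rng.standard_normal(N)
shift = 5

Fhat = np.fft.fft(F)
Fhat_shifted = np.fft.fft(np.roll(F, shift))

m = np.arange(N)
phase = np.exp(-2j * np.pi * m * shift / N)
assert np.allclose(Fhat_shifted, phase * Fhat)          # phase is equivariant
assert np.allclose(np.abs(Fhat_shifted), np.abs(Fhat))  # magnitude is invariant
```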
Lack of Rotation Equivariance
Steerable Equivariance
Comparison
Convolutional variants: Semigroup Correlations
Semigroups: How not to rescale

Original vs. naive subsampling vs. bandlimiting + subsampling, shown at ×2 and ×4: subsampling without bandlimiting first produces aliasing artifacts.
Semigroups: How to rescale
Blur appropriately, then subsample:

$T_s[F](x) = \mathrm{Blur}_s[F](s^{-1}x)$

But for, say, $s = \frac{1}{8}$, the inverse $T_s^{-1}[F](x)$ does not exist! Blurring destroys information, so rescaling satisfies closure, associativity, and identity, but not invertibility: scalings form a semigroup rather than a group.
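A sketch of why blurring composes like a semigroup but cannot be undone, assuming NumPy and a periodic Gaussian blur built in the Fourier domain (so composition of variances is exact):

```python
import numpy as np

def blur(F, sigma):
    # Periodic Gaussian blur: multiply each frequency f by exp(-2 pi^2 sigma^2 f^2)
    f = np.fft.fftfreq(F.shape[0])
    kernel = np.exp(-2 * (np.pi ** 2) * (sigma ** 2) * (f ** 2))
    return np.real(np.fft.ifft(np.fft.fft(F) * kernel))

rng = np.random.default_rng(0)
F = rng.standard_normal(64)

# Semigroup (closure) property of Gaussian scale-space: blurring by sigma1
# and then sigma2 equals one blur with variance sigma1^2 + sigma2^2.
a = blur(blur(F, 1.0), 2.0)
b = blur(F, np.sqrt(1.0 ** 2 + 2.0 ** 2))
assert np.allclose(a, b)

# Non-invertibility: high frequencies are scaled by factors near zero, so
# information is destroyed and the blur cannot be undone stably.
```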
The Semigroup Correlation
Group convolution, e.g. Cohen & Welling (2016):
$[I * W](x) = \sum_{y \in Y} I(y)\, W(T_{x^{-1}}[y])$

Unitary group convolution, e.g. Diaconu & Worrall (2019):
$[I * W](x) = \sum_{y \in Y} I(y)\, T_{x^{-1}}[W](y)$

Semigroup correlation, e.g. Worrall & Welling (2019):
$[I \star W](x) = \sum_{y \in Y} T_x[I](y)\, W(y)$

Two changes: 1) rescaling is not a shuffling of pixel locations, so transform the signal rather than the filter; 2) use non-invertible transformations.
The Semigroup Correlation
$[I \star W](k, t) = \sum_{y \in \mathbb{Z}^2} [G_{2^k} *_{\mathbb{Z}^2} I](2^k y + t)\, W(y)$

Blur (with a Gaussian $G_{2^k}$) followed by a dilated convolution. This lifts the image to a scale-space over levels $k = 0, 1, 2, 3, \dots$: the deep scale-space.
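A hedged one-dimensional sketch of the blur-then-dilated-correlation structure above, assuming NumPy; `scale_corr` and the periodic indexing are illustrative simplifications of the 2D operator:

```python
import numpy as np

def gauss_blur(I, sigma):
    # Periodic Gaussian blur in the Fourier domain
    f = np.fft.fftfreq(I.shape[0])
    return np.real(np.fft.ifft(np.fft.fft(I) * np.exp(-2 * np.pi**2 * sigma**2 * f**2)))

def scale_corr(I, W, K):
    # [I ⋆ W](k, t) = sum_y [G_{2^k} * I](2^k y + t) W(y): blur to scale 2^k,
    # then correlate with the filter sampled at dilation 2^k (periodic indexing).
    N, M = I.shape[0], W.shape[0]
    out = np.zeros((K, N))
    for k in range(K):
        blurred = gauss_blur(I, 2 ** k)
        for t in range(N):
            idx = (2 ** k * np.arange(M) + t) % N
            out[k, t] = np.dot(blurred[idx], W)
    return out

rng = np.random.default_rng(0)
I = rng.standard_normal(32)
W = rng.standard_normal(5)
out = scale_corr(I, W, K=3)   # shape (3, 32): scale level x position
```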
Convolutional variants: Gauge Equivariance
Gauge Convolutions
An alternative route to equivariance is to consider local gauge transformations.

A frame is another name for coordinates. The same vector can be written in two frames:

$v = v^{(1)}_1 e^{(1)}_1 + v^{(1)}_2 e^{(1)}_2 = v^{(2)}_1 e^{(2)}_1 + v^{(2)}_2 e^{(2)}_2$

How to make the result frame independent? Take the inner product between signal and filter represented in all frame orientations.

Like a group convolution, but:
• defined locally
• can work on manifolds (Cohen et al. (2019))
• need to map the signal from the manifold to the tangent space

A local group convolution, with a curvature correction due to the manifold.
Convolutional variants

Group convolution, regular-representation conv., steerable convolution, semigroup correlation, gauge convolution.
Convolutions
• Powerful inductive bias
• Linear layers + symmetry
• Fewer learnable parameters
• Many variants

This lecture: Equivariance. Grey box models, convolutions, invariance and equivariance, group theory, convolution generalizations: group convolutions, steerable convolutions, semigroup correlation, gauge convolution.
References
Theory
Group CNNs - Cohen & Welling (2016)
Steerable CNNs - Cohen & Welling (2017)
On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups - Kondor & Trivedi (2018)
A General Theory of Equivariant CNNs on Homogeneous Spaces - Cohen et al. (2019)
Gauge Equivariant Convolutional Networks - Cohen*, Weiler*, Kicanaoglu* et al. (2019)
Deep Scale-spaces: Equivariance Over Scale - Worrall & Welling (2019)

Z2: Translation
Handwritten digit recognition with a Back-Propagation network - LeCun et al. (1990)

E(2): Discrete subgroups
Group CNNs - Cohen & Welling (2016)
Exploiting cyclic symmetry in convolutional neural networks - Dieleman et al. (2016)
Steerable CNNs - Cohen & Welling (2017)
HexaConv - Hoogeboom*, Peters* et al. (2018)

SE(2): Discrete subgroups
Learning steerable filters for rotation equivariant CNNs - Weiler et al. (2018)
Oriented response networks - Zhou et al. (2017)
Roto-translation covariant convolutional networks for medical image analysis - Bekkers et al. (2018)
Rotation equivariant vector field networks - Marcos et al. (2017)

SE(2): Full group
Harmonic Networks: Deep translation and rotation equivariance - Worrall et al. (2017)

SE(2): All subgroups
General E(2)-Equivariant Steerable CNNs - Weiler & Cesa (2019)
References
E(3): Discrete subgroups
3D G-CNNs for pulmonary nodule detection - Winkels & Cohen (2018)

SE(3): Discrete subgroups
CubeNet: Equivariance to 3D rotation and translation - Worrall & Brostow (2018)

SE(3): Full group
N-body networks: a covariant hierarchical neural network architecture for learning atomic potentials - Kondor (2018)
Tensor Field Networks: Rotation- and translation-equivariant neural networks for 3D point clouds - Thomas*, Smidt* et al. (2018)
3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data - Weiler*, Geiger* et al. (2018)
Clebsch-Gordan Nets: a Fully Fourier Space Spherical Convolutional Neural Network - Kondor et al. (2018)
Cormorant: Covariant molecular neural networks - Anderson et al. (2019)

SO(3) on the sphere: discrete subgroup
Spherical CNNs - Cohen*, Geiger*, Köhler et al. (2018)

SO(3) on the sphere: isotropic subgroup
Learning SO(3) Equivariant Representations with Spherical CNNs - Esteves et al. (2018)
DeepSphere: Efficient spherical convolutional neural network with HEALPix sampling for cosmological applications - Perraudin et al. (2019)

SO(3) on the sphere: full group
Spherical CNNs on unstructured grids - Jiang et al. (2019)
References
41
Scale semigroupDeep Scale-spaces: Equivariance Over Scale - Worrall & Welling (2019)
Scale groupScale equivariance in CNNs with vector fields - Marcos et al (2018)Scale steerable filters for locally scale-invariant neural networks - Ghosh & Gupta (2019)Scale-equivariant steerable networks - Sosnovik & Smeulders (2020)
Lie groups: B-Spline CNNs on Lie Groups - Bekkers (2019)Generalizing Convolutional Neural Networks for Equivariance to Lie groups on Arbitrary Continuous Data - Finzi et al. (2020)
PDEsPDE-based Group Equivariant Convolutional Neural Networks - Smets et al. (2020)Incorporating Symmetry into Deep Dynamics Models for Improved Generalization - Wang*, Walters* et al. (2020)
AttentionAffine Self Convolution - Diaconu & Worrall, (2019)Attentive Group Equivariant Convolution Networks - Romero et al., (2020)SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks - Fuchs et al. (2020)
Reverse-complement symmetryAn equivariant Bayesian convolutional network predicts recombination hotspots and accurately resolves binding motifs - Brown & Lunter (2019)
Reinforcement LearningMDP Homomorphic Networks: Group Symmetries in Reinforcement Learning - van der Pol et al. (2020)
MLSS INDO August 2020Daniel Worrall 42
Appendix

Wayang Kulit Show
Group Equivariance Proof
Hints:
• Homomorphism property: $T_x T_y = T_{xy}$
• Inversion: $T_{z^{-1}}[y] = T_z^{-1}[y]$
• Composition: $(xy)^{-1} = y^{-1}x^{-1}$

Write out the convolution in full:
$[T_z[F] * W](x) = \sum_{y \in Y} T_z[F](y)\, W(T_x^{-1}[y])$

Expand the transformation operator:
$= \sum_{y \in Y} F(T_{z^{-1}}[y])\, W(T_{x^{-1}}[y])$

Change of variables $y' = T_{z^{-1}}[y] \iff y = T_z[y']$:
$= \sum_{y' \in Y} F(y')\, W(T_x^{-1}[T_z[y']])$

Rewriting, using the hints:
$= \sum_{y' \in Y} F(y')\, W(T_{x^{-1}z}[y']) = \sum_{y' \in Y} F(y')\, W(T_{z^{-1}x}^{-1}[y'])$

It's a convolution!
$= [F * W](z^{-1}x) = T_z[F * W](x)$
Semigroup Equivariance Proof
Write out the correlation in full:
$[T_z[F] \star W](x) = \sum_{y \in Y} T_x[T_z[F]](y)\, W(y)$

Semigroup composition:
$= \sum_{y \in Y} T_{xz}[F](y)\, W(y)$

It's a correlation!
$= [F \star W](xz) = T_z[F \star W](x)$, where the last equality defines the left action on activations.
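The semigroup equivariance can be checked numerically, assuming NumPy, with an exactly-composing Gaussian blur as the semigroup action; transforming the signal (not the filter) and shifting along the scale axis agree:

```python
import numpy as np

def blur(F, s):
    # Semigroup action T_s: periodic Gaussian blur with variance s, built in
    # the Fourier domain so that T_s ∘ T_t = T_{s+t} holds exactly.
    f = np.fft.fftfreq(F.shape[0])
    return np.real(np.fft.ifft(np.fft.fft(F) * np.exp(-2 * np.pi**2 * s * f**2)))

def semigroup_corr(F, W, scales):
    # [F ⋆ W](s) = sum_y T_s[F](y) W(y): transform the signal, not the filter
    return np.array([np.dot(blur(F, s), W) for s in scales])

rng = np.random.default_rng(0)
F = rng.standard_normal(32)
W = rng.standard_normal(32)
t = 1.5
scales = np.arange(0.0, 4.0, 0.5)

lhs = semigroup_corr(blur(F, t), W, scales)   # correlate the transformed signal
rhs = semigroup_corr(F, W, scales + t)        # shift along the scale axis
assert np.allclose(lhs, rhs)                  # [T_t[F] ⋆ W](s) = [F ⋆ W](s + t)
```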