Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a...
Transcript of Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a...
![Page 1: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/1.jpg)
Lecture 8 : The Topology of NeuralNets and Deep Learning
May 16, 2019
Gunnar Carlsson
Stanford University
![Page 2: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/2.jpg)
Perceptrons
![Page 3: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/3.jpg)
Perceptrons
I Input variables are initially Boolean, as is Output
I
Output = sign(∑
wixi )
I Given an output function on data, “train” the weights wi
to best approximate a given Boolean function
I Ultimately generalize to include continuous functions aswell as Boolean ones
I Couple together to get networks of perceptrons
![Page 4: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/4.jpg)
Perceptrons
I Input variables are initially Boolean, as is Output
I
Output = sign(∑
wixi )
I Given an output function on data, “train” the weights wi
to best approximate a given Boolean function
I Ultimately generalize to include continuous functions aswell as Boolean ones
I Couple together to get networks of perceptrons
![Page 5: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/5.jpg)
Perceptrons
I Input variables are initially Boolean, as is Output
I
Output = sign(∑
wixi )
I Given an output function on data, “train” the weights wi
to best approximate a given Boolean function
I Ultimately generalize to include continuous functions aswell as Boolean ones
I Couple together to get networks of perceptrons
![Page 6: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/6.jpg)
Perceptrons
I Input variables are initially Boolean, as is Output
I
Output = sign(∑
wixi )
I Given an output function on data, “train” the weights wi
to best approximate a given Boolean function
I Ultimately generalize to include continuous functions aswell as Boolean ones
I Couple together to get networks of perceptrons
![Page 7: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/7.jpg)
Perceptrons
I Input variables are initially Boolean, as is Output
I
Output = sign(∑
wixi )
I Given an output function on data, “train” the weights wi
to best approximate a given Boolean function
I Ultimately generalize to include continuous functions aswell as Boolean ones
I Couple together to get networks of perceptrons
![Page 8: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/8.jpg)
Activation Functions
![Page 9: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/9.jpg)
Neural Networks
I Idea is to hook several perceptrons together to create amore complicated and expressive computational unit
I Pattern can be encoded by a directed graph
I Each node is called a neuron. It has a state, which can beBoolean or continuous
I State is updated based on graph connections
![Page 10: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/10.jpg)
Neural Networks
I Idea is to hook several perceptrons together to create amore complicated and expressive computational unit
I Pattern can be encoded by a directed graph
I Each node is called a neuron. It has a state, which can beBoolean or continuous
I State is updated based on graph connections
![Page 11: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/11.jpg)
Neural Networks
I Idea is to hook several perceptrons together to create amore complicated and expressive computational unit
I Pattern can be encoded by a directed graph
I Each node is called a neuron. It has a state, which can beBoolean or continuous
I State is updated based on graph connections
![Page 12: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/12.jpg)
Neural Networks
I Idea is to hook several perceptrons together to create amore complicated and expressive computational unit
I Pattern can be encoded by a directed graph
I Each node is called a neuron. It has a state, which can beBoolean or continuous
I State is updated based on graph connections
![Page 13: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/13.jpg)
Neural Networks
![Page 14: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/14.jpg)
Neural Networks
I An architecture is a choice of graph structure
I Assignment of weights to edges
I Different architectures for different types of data anddifferent types of problems
I Training is done by performing a stochastic version ofgradient descent on the weights
I Many variable optimization problem
![Page 15: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/15.jpg)
Neural Networks
I An architecture is a choice of graph structure
I Assignment of weights to edges
I Different architectures for different types of data anddifferent types of problems
I Training is done by performing a stochastic version ofgradient descent on the weights
I Many variable optimization problem
![Page 16: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/16.jpg)
Neural Networks
I An architecture is a choice of graph structure
I Assignment of weights to edges
I Different architectures for different types of data anddifferent types of problems
I Training is done by performing a stochastic version ofgradient descent on the weights
I Many variable optimization problem
![Page 17: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/17.jpg)
Neural Networks
I An architecture is a choice of graph structure
I Assignment of weights to edges
I Different architectures for different types of data anddifferent types of problems
I Training is done by performing a stochastic version ofgradient descent on the weights
I Many variable optimization problem
![Page 18: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/18.jpg)
Neural Networks
I An architecture is a choice of graph structure
I Assignment of weights to edges
I Different architectures for different types of data anddifferent types of problems
I Training is done by performing a stochastic version ofgradient descent on the weights
I Many variable optimization problem
![Page 19: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/19.jpg)
Feed Forward Networks
![Page 20: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/20.jpg)
Feed Forward Networks
I Designed to compute composites of simple functions, i.e.of functions computed by individual perceptrons
I Neurons are divided into layers
I Collection of layers are given a total ordering
I There is an edge from neuron A to neuron B if and only ifthe layer containing B immediately follows the layercontaining A.
I One optimizes a fit of a function of this form to givenfunction over the space of all possible weights
![Page 21: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/21.jpg)
Feed Forward Networks
I Designed to compute composites of simple functions, i.e.of functions computed by individual perceptrons
I Neurons are divided into layers
I Collection of layers are given a total ordering
I There is an edge from neuron A to neuron B if and only ifthe layer containing B immediately follows the layercontaining A.
I One optimizes a fit of a function of this form to givenfunction over the space of all possible weights
![Page 22: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/22.jpg)
Feed Forward Networks
I Designed to compute composites of simple functions, i.e.of functions computed by individual perceptrons
I Neurons are divided into layers
I Collection of layers are given a total ordering
I There is an edge from neuron A to neuron B if and only ifthe layer containing B immediately follows the layercontaining A.
I One optimizes a fit of a function of this form to givenfunction over the space of all possible weights
![Page 23: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/23.jpg)
Feed Forward Networks
I Designed to compute composites of simple functions, i.e.of functions computed by individual perceptrons
I Neurons are divided into layers
I Collection of layers are given a total ordering
I There is an edge from neuron A to neuron B if and only ifthe layer containing B immediately follows the layercontaining A.
I One optimizes a fit of a function of this form to givenfunction over the space of all possible weights
![Page 24: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/24.jpg)
Feed Forward Networks
I Designed to compute composites of simple functions, i.e.of functions computed by individual perceptrons
I Neurons are divided into layers
I Collection of layers are given a total ordering
I There is an edge from neuron A to neuron B if and only ifthe layer containing B immediately follows the layercontaining A.
I One optimizes a fit of a function of this form to givenfunction over the space of all possible weights
![Page 25: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/25.jpg)
Hopfield Networks
![Page 26: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/26.jpg)
Hopfield Networks and Boltzmann Machines
I Fully connected, although one may enforce the fact thatsome edges are given zero weight
I Idea is to learn to recognize examples, or a distribution ona set
I The examples are local minima of an energy function
I A dynamical system is created which flows to the minimaof the energy function
I An initial point which is close to one of the exampleswould naturally flow to the corresponding local minimum
![Page 27: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/27.jpg)
Hopfield Networks and Boltzmann Machines
I Fully connected, although one may enforce the fact thatsome edges are given zero weight
I Idea is to learn to recognize examples, or a distribution ona set
I The examples are local minima of an energy function
I A dynamical system is created which flows to the minimaof the energy function
I An initial point which is close to one of the exampleswould naturally flow to the corresponding local minimum
![Page 28: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/28.jpg)
Hopfield Networks and Boltzmann Machines
I Fully connected, although one may enforce the fact thatsome edges are given zero weight
I Idea is to learn to recognize examples, or a distribution ona set
I The examples are local minima of an energy function
I A dynamical system is created which flows to the minimaof the energy function
I An initial point which is close to one of the exampleswould naturally flow to the corresponding local minimum
![Page 29: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/29.jpg)
Hopfield Networks and Boltzmann Machines
I Fully connected, although one may enforce the fact thatsome edges are given zero weight
I Idea is to learn to recognize examples, or a distribution ona set
I The examples are local minima of an energy function
I A dynamical system is created which flows to the minimaof the energy function
I An initial point which is close to one of the exampleswould naturally flow to the corresponding local minimum
![Page 30: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/30.jpg)
Hopfield Networks and Boltzmann Machines
I Fully connected, although one may enforce the fact thatsome edges are given zero weight
I Idea is to learn to recognize examples, or a distribution ona set
I The examples are local minima of an energy function
I A dynamical system is created which flows to the minimaof the energy function
I An initial point which is close to one of the exampleswould naturally flow to the corresponding local minimum
![Page 31: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/31.jpg)
Convolutional Neural Networks
![Page 32: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/32.jpg)
Convolutional Neural Networks
Locality
![Page 33: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/33.jpg)
Convolutional Neural Networks
I Locality is a restriction on the architecture
I Uses geometry of feature space (grid geometry) to restrictthe connections
I Means that the features “learned” will be local, i.e. willinvolve only the neighborhoods in the picture
I Restriction means models are smaller and learning is fasterthan fully connected situation
![Page 34: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/34.jpg)
Convolutional Neural Networks
I Locality is a restriction on the architecture
I Uses geometry of feature space (grid geometry) to restrictthe connections
I Means that the features “learned” will be local, i.e. willinvolve only the neighborhoods in the picture
I Restriction means models are smaller and learning is fasterthan fully connected situation
![Page 35: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/35.jpg)
Convolutional Neural Networks
I Locality is a restriction on the architecture
I Uses geometry of feature space (grid geometry) to restrictthe connections
I Means that the features “learned” will be local, i.e. willinvolve only the neighborhoods in the picture
I Restriction means models are smaller and learning is fasterthan fully connected situation
![Page 36: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/36.jpg)
Convolutional Neural Networks
I Locality is a restriction on the architecture
I Uses geometry of feature space (grid geometry) to restrictthe connections
I Means that the features “learned” will be local, i.e. willinvolve only the neighborhoods in the picture
I Restriction means models are smaller and learning is fasterthan fully connected situation
![Page 37: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/37.jpg)
Convolutional Neural Networks
w
w
Homogeneity or “Convolutionality”
![Page 38: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/38.jpg)
Convolutional Neural Networks
I Restriction on assignment of weights, not the architecture
I Means training will be a constrained optimizaton problem
I Means that recognizing a cat is done the same wayregardless of the position of the cat
I Exploits symmetries in space of features
I Can exploit this idea for other feature geometries
![Page 39: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/39.jpg)
Convolutional Neural Networks
I Restriction on assignment of weights, not the architecture
I Means training will be a constrained optimizaton problem
I Means that recognizing a cat is done the same wayregardless of the position of the cat
I Exploits symmetries in space of features
I Can exploit this idea for other feature geometries
![Page 40: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/40.jpg)
Convolutional Neural Networks
I Restriction on assignment of weights, not the architecture
I Means training will be a constrained optimizaton problem
I Means that recognizing a cat is done the same wayregardless of the position of the cat
I Exploits symmetries in space of features
I Can exploit this idea for other feature geometries
![Page 41: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/41.jpg)
Convolutional Neural Networks
I Restriction on assignment of weights, not the architecture
I Means training will be a constrained optimizaton problem
I Means that recognizing a cat is done the same wayregardless of the position of the cat
I Exploits symmetries in space of features
I Can exploit this idea for other feature geometries
![Page 42: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/42.jpg)
Convolutional Neural Networks
I Restriction on assignment of weights, not the architecture
I Means training will be a constrained optimizaton problem
I Means that recognizing a cat is done the same wayregardless of the position of the cat
I Exploits symmetries in space of features
I Can exploit this idea for other feature geometries
![Page 43: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/43.jpg)
Convolutional Neural Networks
Pooling
![Page 44: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/44.jpg)
Convolutional Neural Networks
I Pooling creates a lower resolution and higher abstractversion
I Such layers are interspersed between convolutional layers
I Mimics our understanding of how human visual pathwayworks
![Page 45: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/45.jpg)
Convolutional Neural Networks
I Pooling creates a lower resolution and higher abstractversion
I Such layers are interspersed between convolutional layers
I Mimics our understanding of how human visual pathwayworks
![Page 46: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/46.jpg)
Convolutional Neural Networks
I Pooling creates a lower resolution and higher abstractversion
I Such layers are interspersed between convolutional layers
I Mimics our understanding of how human visual pathwayworks
![Page 47: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/47.jpg)
Problems with Deep Learning
I Long training time
I Requires a lot of data
I Models are often very large
I Don’t generalize
I Adversarial Examples
![Page 48: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/48.jpg)
Problems with Deep Learning
I Long training time
I Requires a lot of data
I Models are often very large
I Don’t generalize
I Adversarial Examples
![Page 49: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/49.jpg)
Problems with Deep Learning
I Long training time
I Requires a lot of data
I Models are often very large
I Don’t generalize
I Adversarial Examples
![Page 50: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/50.jpg)
Problems with Deep Learning
I Long training time
I Requires a lot of data
I Models are often very large
I Don’t generalize
I Adversarial Examples
![Page 51: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/51.jpg)
Problems with Deep Learning
I Long training time
I Requires a lot of data
I Models are often very large
I Don’t generalize
I Adversarial Examples
![Page 52: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/52.jpg)
Adversarial Examples
![Page 53: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/53.jpg)
Adversarial Examples
![Page 54: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/54.jpg)
Adversarial Examples
![Page 55: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/55.jpg)
Adversarial Examples
![Page 56: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/56.jpg)
Do CNN’s Work Like Mammalian Visual Pathway?
![Page 57: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/57.jpg)
DoCNN’s Reflect Natural Image Statistics?
PRIMARY
SECONDARY SECONDARY
![Page 58: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/58.jpg)
Creating Data Sets from CNN’s
I Consider convolutional neural nets with 9 connections foreach pixel
I Construct 9-vectors for each pixel
I Many grids in first and second layers
I Many experiments
I Apply same methodology as in Lecture 4
![Page 59: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/59.jpg)
Creating Data Sets from CNN’s
I Consider convolutional neural nets with 9 connections foreach pixel
I Construct 9-vectors for each pixel
I Many grids in first and second layers
I Many experiments
I Apply same methodology as in Lecture 4
![Page 60: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/60.jpg)
Creating Data Sets from CNN’s
I Consider convolutional neural nets with 9 connections foreach pixel
I Construct 9-vectors for each pixel
I Many grids in first and second layers
I Many experiments
I Apply same methodology as in Lecture 4
![Page 61: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/61.jpg)
Creating Data Sets from CNN’s
I Consider convolutional neural nets with 9 connections foreach pixel
I Construct 9-vectors for each pixel
I Many grids in first and second layers
I Many experiments
I Apply same methodology as in Lecture 4
![Page 62: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/62.jpg)
Creating Data Sets from CNN’s
I Consider convolutional neural nets with 9 connections foreach pixel
I Construct 9-vectors for each pixel
I Many grids in first and second layers
I Many experiments
I Apply same methodology as in Lecture 4
![Page 63: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/63.jpg)
Findings: MNIST
MNIST for layer 1 of a depth two net, non-localized estimator
![Page 64: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/64.jpg)
Findings: MNIST
Mnist for layer 1 of a depth two net, localized estimator
![Page 65: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/65.jpg)
Findings: Cifar10
Layer 1 of depth two net, reduced to gray scale
![Page 66: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/66.jpg)
Findings: Cifar10
Second layer, reduced to gray scale
![Page 67: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/67.jpg)
Findings: Cifar10
1D barcode for tightly thresholded data set of for 2nd layers ,reduced to gray scale
![Page 68: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/68.jpg)
Findings: Cifar10
Mapper representations over the number of iterations, tightlydensity thresholded, gray scale reduced
![Page 69: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/69.jpg)
Findings: Cifar10
1st layer, Coarse density thresholding, color retained
![Page 70: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/70.jpg)
Findings: Cifar10
1st layer, looser density thresholding, more localized densityestimator, color retained
![Page 71: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/71.jpg)
Findings: Cifar10
2nd layer, fine density thresholding, color retained
![Page 72: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/72.jpg)
Findings: VGG16
Mapper findings from each of 13 layers, same densitythresholding, relatively local estimator
![Page 73: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/73.jpg)
Experiments
I Create new features by forming dot products with patchesmodeled on primary circle
I Add these to existing “raw” features
I Obtain a 3.5x speedup in training for SVHN data set
I Obtain improved generalization for model computed onMNIST applied to SVHN
![Page 74: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/74.jpg)
Experiments
I Create new features by forming dot products with patchesmodeled on primary circle
I Add these to existing “raw” features
I Obtain a 3.5x speedup in training for SVHN data set
I Obtain improved generalization for model computed onMNIST applied to SVHN
![Page 75: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/75.jpg)
Experiments
I Create new features by forming dot products with patchesmodeled on primary circle
I Add these to existing “raw” features
I Obtain a 3.5x speedup in training for SVHN data set
I Obtain improved generalization for model computed onMNIST applied to SVHN
![Page 76: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/76.jpg)
Experiments
I Create new features by forming dot products with patchesmodeled on primary circle
I Add these to existing “raw” features
I Obtain a 3.5x speedup in training for SVHN data set
I Obtain improved generalization for model computed onMNIST applied to SVHN
![Page 77: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/77.jpg)
Building Architectures
I Key ingredient in constructing convolutional neural nets isthe geometry of the feature space, i.e. the grid of pixels
I Are there other geometries that might be useful?
I We can build geometries on feature spaces using theprimary circle or Klein bottle
I Mapper construction gives compressed geometricstructures on the set of features of a general data set
![Page 78: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/78.jpg)
Building Architectures
I Key ingredient in constructing convolutional neural nets isthe geometry of the feature space, i.e. the grid of pixels
I Are there other geometries that might be useful?
I We can build geometries on feature spaces using theprimary circle or Klein bottle
I Mapper construction gives compressed geometricstructures on the set of features of a general data set
![Page 79: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/79.jpg)
Building Architectures
I Key ingredient in constructing convolutional neural nets isthe geometry of the feature space, i.e. the grid of pixels
I Are there other geometries that might be useful?
I We can build geometries on feature spaces using theprimary circle or Klein bottle
I Mapper construction gives compressed geometricstructures on the set of features of a general data set
![Page 80: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/80.jpg)
Building Architectures
I Key ingredient in constructing convolutional neural nets isthe geometry of the feature space, i.e. the grid of pixels
I Are there other geometries that might be useful?
I We can build geometries on feature spaces using theprimary circle or Klein bottle
I Mapper construction gives compressed geometricstructures on the set of features of a general data set
![Page 81: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/81.jpg)
Building Architectures: Feed Forward Systems
I Given two sets X and Y , a relation between X and Y is asubset R ⊆ X × Y
I Can compose relations just like functions
I Graph of a function is an example
I There is a category CR whose objects are sets and wherethe morphisms from X to Y are the relations in X × Y .
I An abstract feed forward system of depth n is a functorfrom the category
n = 1 −→ 2 −→ · · · · · · −→ n
to cR
![Page 82: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/82.jpg)
Building Architectures: Feed Forward Systems
I Given two sets X and Y , a relation between X and Y is asubset R ⊆ X × Y
I Can compose relations just like functions
I Graph of a function is an example
I There is a category CR whose objects are sets and wherethe morphisms from X to Y are the relations in X × Y .
I An abstract feed forward system of depth n is a functorfrom the category
n = 1 −→ 2 −→ · · · · · · −→ n
to cR
![Page 83: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/83.jpg)
Building Architectures: Feed Forward Systems
I Given two sets X and Y , a relation between X and Y is asubset R ⊆ X × Y
I Can compose relations just like functions
I Graph of a function is an example
I There is a category CR whose objects are sets and wherethe morphisms from X to Y are the relations in X × Y .
I An abstract feed forward system of depth n is a functorfrom the category
n = 1 −→ 2 −→ · · · · · · −→ n
to cR
![Page 84: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/84.jpg)
Building Architectures: Feed Forward Systems
I Given two sets X and Y , a relation between X and Y is asubset R ⊆ X × Y
I Can compose relations just like functions
I Graph of a function is an example
I There is a category CR whose objects are sets and wherethe morphisms from X to Y are the relations in X × Y .
I An abstract feed forward system of depth n is a functorfrom the category
n = 1 −→ 2 −→ · · · · · · −→ n
to cR
![Page 85: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/85.jpg)
Building Architectures: Feed Forward Systems
I Given two sets X and Y , a relation between X and Y is asubset R ⊆ X × Y
I Can compose relations just like functions
I Graph of a function is an example
I There is a category CR whose objects are sets and wherethe morphisms from X to Y are the relations in X × Y .
I An abstract feed forward system of depth n is a functorfrom the category
n = 1 −→ 2 −→ · · · · · · −→ n
to cR
![Page 86: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/86.jpg)
Building Architectures: Construct Computation Graphfrom Feed Forward Systems
I For an abstract feed forward system F : n→ CR, we forma directed graph
I The vertex set is∐n
i=1 F (i)
I For vertices v ∈ F (i) and w ∈ F (i + 1), there is an edgefrom v to w if and only if (v ,w) is in the relationF (i → i + 1).
I No edges from v to w unless there is an i so that v ∈ F (i)and w ∈ F (i + 1).
![Page 87: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/87.jpg)
Building Architectures: Construct Computation Graphfrom Feed Forward Systems
I For an abstract feed forward system F : n→ CR, we forma directed graph
I The vertex set is∐n
i=1 F (i)
I For vertices v ∈ F (i) and w ∈ F (i + 1), there is an edgefrom v to w if and only if (v ,w) is in the relationF (i → i + 1).
I No edges from v to w unless there is an i so that v ∈ F (i)and w ∈ F (i + 1).
![Page 88: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/88.jpg)
Building Architectures: Construct Computation Graphfrom Feed Forward Systems
I For an abstract feed forward system F : n→ CR, we forma directed graph
I The vertex set is∐n
i=1 F (i)
I For vertices v ∈ F (i) and w ∈ F (i + 1), there is an edgefrom v to w if and only if (v ,w) is in the relationF (i → i + 1).
I No edges from v to w unless there is an i so that v ∈ F (i)and w ∈ F (i + 1).
![Page 89: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/89.jpg)
Building Architectures: Construct Computation Graphfrom Feed Forward Systems
I For an abstract feed forward system F : n→ CR, we forma directed graph
I The vertex set is∐n
i=1 F (i)
I For vertices v ∈ F (i) and w ∈ F (i + 1), there is an edgefrom v to w if and only if (v ,w) is in the relationF (i → i + 1).
I No edges from v to w unless there is an i so that v ∈ F (i)and w ∈ F (i + 1).
![Page 90: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/90.jpg)
Building Architectures: Important Relations
I If Γ is a graph, then we have a relation RΓ : VΓ → VΓ
defined by (v ,w) ∈ RΓ if and only if (v ,w) is an edge in Γ.
I If (X , d) is a metric space and r > 0 is a threshold, thenRd(r) : X → X is the relation given by (x , x ′) ∈ Rd(r) ifand only if d(x , x ′) ≤ r .
I If f : X → Y is a function, then the graph {(x , f (x)|x ∈ Xis a relation from X to Y
I If Γ1 and Γ2 are two mapper models of the same data set,then there is a naturally defined relation R(Γ1, Γ2) fromVΓ1 to VΓ2 given by (v ,w) ∈ R(Γ1, Γ2) if and only if hecollections corresponding to v and w have a data point incommon
![Page 91: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/91.jpg)
Building Architectures: Important Relations
I If Γ is a graph, then we have a relation RΓ : VΓ → VΓ
defined by (v ,w) ∈ RΓ if and only if (v ,w) is an edge in Γ.
I If (X , d) is a metric space and r > 0 is a threshold, thenRd(r) : X → X is the relation given by (x , x ′) ∈ Rd(r) ifand only if d(x , x ′) ≤ r .
I If f : X → Y is a function, then the graph {(x , f (x)|x ∈ Xis a relation from X to Y
I If Γ1 and Γ2 are two mapper models of the same data set,then there is a naturally defined relation R(Γ1, Γ2) fromVΓ1 to VΓ2 given by (v ,w) ∈ R(Γ1, Γ2) if and only if hecollections corresponding to v and w have a data point incommon
![Page 92: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/92.jpg)
Building Architectures: Important Relations
I If Γ is a graph, then we have a relation RΓ : VΓ → VΓ
defined by (v ,w) ∈ RΓ if and only if (v ,w) is an edge in Γ.
I If (X , d) is a metric space and r > 0 is a threshold, thenRd(r) : X → X is the relation given by (x , x ′) ∈ Rd(r) ifand only if d(x , x ′) ≤ r .
I If f : X → Y is a function, then the graph {(x , f (x)|x ∈ Xis a relation from X to Y
I If Γ1 and Γ2 are two mapper models of the same data set,then there is a naturally defined relation R(Γ1, Γ2) fromVΓ1 to VΓ2 given by (v ,w) ∈ R(Γ1, Γ2) if and only if hecollections corresponding to v and w have a data point incommon
![Page 93: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/93.jpg)
Building Architectures: Important Relations
I If Γ is a graph, then we have a relation RΓ : VΓ → VΓ
defined by (v ,w) ∈ RΓ if and only if (v ,w) is an edge in Γ.
I If (X , d) is a metric space and r > 0 is a threshold, thenRd(r) : X → X is the relation given by (x , x ′) ∈ Rd(r) ifand only if d(x , x ′) ≤ r .
I If f : X → Y is a function, then the graph {(x , f (x)|x ∈ Xis a relation from X to Y
I If Γ1 and Γ2 are two mapper models of the same data set,then there is a naturally defined relation R(Γ1, Γ2) fromVΓ1 to VΓ2 given by (v ,w) ∈ R(Γ1, Γ2) if and only if hecollections corresponding to v and w have a data point incommon
![Page 94: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/94.jpg)
Buillding Architectures
I The idea is to use “geometry” on the feature space toconstruct neural network architectures
I Geometry can arise in numerous ways
I A priori geometry: grid in case of image convolutionalnetworks, linear arrays for time series and text
I Geometries found by research. E.g. primary circle or Kleinbottle in the case of natural images
I Bespoke geometries obtained by taking mapper models forthe column spaces of data matrices
![Page 95: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/95.jpg)
Buillding Architectures
I The idea is to use “geometry” on the feature space toconstruct neural network architectures
I Geometry can arise in numerous ways
I A priori geometry: grid in case of image convolutionalnetworks, linear arrays for time series and text
I Geometries found by research. E.g. primary circle or Kleinbottle in the case of natural images
I Bespoke geometries obtained by taking mapper models forthe column spaces of data matrices
![Page 96: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/96.jpg)
Buillding Architectures
I The idea is to use “geometry” on the feature space toconstruct neural network architectures
I Geometry can arise in numerous ways
I A priori geometry: grid in case of image convolutionalnetworks, linear arrays for time series and text
I Geometries found by research. E.g. primary circle or Kleinbottle in the case of natural images
I Bespoke geometries obtained by taking mapper models forthe column spaces of data matrices
![Page 97: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/97.jpg)
Buillding Architectures
I The idea is to use “geometry” on the feature space toconstruct neural network architectures
I Geometry can arise in numerous ways
I A priori geometry: grid in case of image convolutionalnetworks, linear arrays for time series and text
I Geometries found by research. E.g. primary circle or Kleinbottle in the case of natural images
I Bespoke geometries obtained by taking mapper models forthe column spaces of data matrices
![Page 98: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/98.jpg)
Buillding Architectures
I The idea is to use “geometry” on the feature space toconstruct neural network architectures
I Geometry can arise in numerous ways
I A priori geometry: grid in case of image convolutionalnetworks, linear arrays for time series and text
I Geometries found by research. E.g. primary circle or Kleinbottle in the case of natural images
I Bespoke geometries obtained by taking mapper models forthe column spaces of data matrices
![Page 99: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/99.jpg)
Building Architectures
![Page 100: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/100.jpg)
Building Architectures
![Page 101: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/101.jpg)
Building Architectures
![Page 102: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/102.jpg)
Building Architectures
I Can use symmetries to construct other convolutionalstructures
I Example: rotations by multiplication by roots of unity ondiscretization on circle
I Also exist analogues of pooling constructions for circlesand Klein bottles
I Sphere occurs in 3D voxelized images. Icosahedron is asymmetric discretization
![Page 103: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/103.jpg)
Building Architectures
I Can use symmetries to construct other convolutionalstructures
I Example: rotations by multiplication by roots of unity ondiscretization on circle
I Also exist analogues of pooling constructions for circlesand Klein bottles
I Sphere occurs in 3D voxelized images. Icosahedron is asymmetric discretization
![Page 104: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/104.jpg)
Building Architectures
I Can use symmetries to construct other convolutionalstructures
I Example: rotations by multiplication by roots of unity ondiscretization on circle
I Also exist analogues of pooling constructions for circlesand Klein bottles
I Sphere occurs in 3D voxelized images. Icosahedron is asymmetric discretization
![Page 105: Lecture 8 : The Topology of Neural Nets and Deep Learning May …€¦ · to best approximate a given Boolean function I Ultimately generalize to include continuous functions as well](https://reader036.fdocuments.us/reader036/viewer/2022071002/5fbf4ca0125469753a1a8dc7/html5/thumbnails/105.jpg)
Building Architectures
I Can use symmetries to construct other convolutionalstructures
I Example: rotations by multiplication by roots of unity ondiscretization on circle
I Also exist analogues of pooling constructions for circlesand Klein bottles
I Sphere occurs in 3D voxelized images. Icosahedron is asymmetric discretization