Image recognition
Stig-Arne Kristoffersen
Transcript of Image recognition
HOW DOES IMAGE RECOGNITION WORK?
Before we look at how image recognition can be used on seismic data, we will take some
time to explain:
• the eye – the foundation of image recognition
• how image recognition is performed on a computer, and what it does.
We will go into depth explaining artificial neural networks and how their algorithms try to mimic the
human eye and its ability to recognize patterns and images in general.
Then we will explain the components of the convolutional neural network (CNN) methodology and
how it is implemented in existing technologies today.
References to other presentations by the author on how this can be used in seismic
interpretation are given in the final slide of this presentation.
FEED-FORWARD ARTIFICIAL NEURAL NETWORK
A convolutional neural network (CNN, or ConvNet) is a type of feed-forward artificial neural
network where the individual neurons are tiled in such a way that they respond to overlapping
regions in the visual field.
Convolutional networks were inspired by biological processes and are variations of multilayer
perceptrons which are designed to use minimal amounts of preprocessing. They are widely used
models for image and video recognition.
We think this technology and methodology could be applied to seismic data, letting us perform
pattern recognition and train on the data in order to reveal geometries resembling
seismic facies, sequences, play types, trap types and so forth.
HUMAN EYE – THE NEURONS
The human eye and retina: a five-layer extension of the brain and a portal to the outside world (from the article
“Space-time wiring specificity supports direction selectivity in the retina”, Jinseop S. Kim et al., 2013).
NEURONS (BIOLOGICAL VS ARTIFICIAL)
An artificial neuron is a mathematical function conceived as a model of biological neurons. Artificial neurons
are the constitutive units in an artificial neural network. Depending on the specific model used they may be
called a semi-linear unit, Nv neuron, binary neuron, linear threshold function, or McCulloch–Pitts
(MCP) neuron. The artificial neuron receives one or more inputs (representing dendrites) and sums them to
produce an output (representing the neuron's axon). Usually each input is separately weighted, and the sum
is passed through a non-linear function known as an activation function or transfer function. The transfer
functions usually have a sigmoid shape, but they may also take the form of other non-linear functions,
piecewise linear functions, or step functions. They are also often monotonically increasing, continuous,
differentiable and bounded.
The artificial neuron transfer function should not be confused with a linear system's transfer function.
Figure: a biological neuron compared with an artificial neuron.
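The weighted-sum-plus-activation behaviour described above can be sketched in a few lines of Python; the function name and the example values below are illustrative, not taken from the presentation:

```python
import math

def artificial_neuron(inputs, weights, bias):
    # Weighted sum of the inputs (the "dendrites"), plus a bias term
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid transfer function: non-linear, monotonically increasing,
    # continuous, differentiable, and bounded in (0, 1)
    return 1.0 / (1.0 + math.exp(-s))

# Two inputs: one excitatory (positive weight), one inhibitory (negative)
output = artificial_neuron([1.0, 0.5], [0.8, -0.4], 0.0)
print(output)
```

With zero net input the sigmoid returns exactly 0.5, mirroring a soma balanced between excitation and inhibition.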
ARTIFICIAL NEURONS
Artificial neurons are designed to mimic aspects of their biological counterparts.
Dendrites – In a biological neuron, the dendrites act as the input vector. These dendrites allow the cell to receive signals from a large (>1000)
number of neighboring neurons. As in the above mathematical treatment, each dendrite is able to perform "multiplication" by that dendrite's
"weight value." The multiplication is accomplished by increasing or decreasing the ratio of synaptic neurotransmitters to signal chemicals
introduced into the dendrite in response to the synaptic neurotransmitter. A negative multiplication effect can be achieved by transmitting signal
inhibitors (i.e. oppositely charged ions) along the dendrite in response to the reception of synaptic neurotransmitters.
Soma – In a biological neuron, the soma acts as the summation function seen in the mathematical description on the previous slide. As positive and
negative signals (exciting and inhibiting, respectively) arrive in the soma from the dendrites, the positive and negative ions are effectively added in
summation, by simple virtue of being mixed together in the solution inside the cell's body.
Axon – The axon gets its signal from the summation behavior which occurs inside the soma. The opening to the axon essentially samples the
electrical potential of the solution inside the soma. Once the soma reaches a certain potential, the axon will transmit an all-or-nothing signal pulse down its
length. In this regard, the axon represents our ability to connect one artificial neuron to other artificial neurons.
Unlike most artificial neurons, however, biological neurons fire in discrete pulses. Each time the electrical potential inside the soma reaches a
certain threshold, a pulse is transmitted down the axon. This pulsing can be translated into continuous values. The rate (activations per second,
etc.) at which an axon fires converts directly into the rate at which neighboring cells get signal ions introduced into them. The faster a biological
neuron fires, the faster nearby neurons accumulate electrical potential (or lose electrical potential, depending on the "weighting" of the dendrite
that connects to the neuron that fired). It is this conversion that allows computer scientists and mathematicians to simulate biological neural
networks using artificial neurons which can output distinct values (often from −1 to 1).
CNN & IMAGE RECOGNITION
When used for image recognition, convolutional neural networks (CNNs) consist of multiple layers of small neuron collections
which look at small portions of the input image, called receptive fields. The results of these collections are then tiled so that
they overlap to obtain a better representation of the original image; this is repeated for every such layer. Because of this, they
are able to tolerate translation of the input image. Convolutional networks may include local or global pooling layers, which
combine the outputs of neuron clusters. They also consist of various combinations of convolutional layers and fully connected
layers, with a pointwise nonlinearity applied at the end of or after each layer. To avoid the billions of parameters that
would arise if all layers were fully connected, the idea of applying the convolution operation to
small regions was introduced. One major advantage of convolutional networks is the use of shared weights in
convolutional layers, which means that the same filter (weights bank) is used for each pixel in the layer; this both reduces
required memory size and improves performance.
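A minimal sketch of this shared-weight idea in plain Python; the image and filter below are invented for illustration, and note that CNN layers in practice compute cross-correlation, i.e. convolution without flipping the kernel:

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel (weight bank) over every receptive
    field of the image ('valid' mode: no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = [[0.0] * (w - kw + 1) for _ in range(h - kh + 1)]
    for i in range(len(out)):
        for j in range(len(out[0])):
            # The same weights are reused at every position: the
            # shared-weight property that keeps parameter counts low
            out[i][j] = sum(image[i + a][j + b] * kernel[a][b]
                            for a in range(kh) for b in range(kw))
    return out

# A 4x4 "image" with a vertical edge, and a 1x2 gradient filter
image = [[0.0, 0.0, 1.0, 1.0]] * 4
edge_filter = [[-1.0, 1.0]]
print(conv2d_valid(image, edge_filter))  # high response only at the edge
```

The single 1x2 filter is applied at every receptive field, so the layer needs only two weights regardless of the image size.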
Some time delay neural networks also use a very similar architecture to convolutional neural networks, especially those for
image recognition and/or classification tasks, since the "tiling" of the neuron outputs can easily be carried out in timed stages
in a manner useful for analysis of images.
Compared to other image classification algorithms, convolutional neural networks use relatively little pre-processing. This
means that the network is responsible for learning the filters that in traditional algorithms were hand-engineered. The lack of
dependence on prior knowledge and on difficult-to-design hand-engineered features is a major advantage of
CNNs.
CNN & IMAGE RECOGNITION – RECEPTIVE FIELDS
The receptive field of an individual sensory neuron is the particular region of the sensory space (e.g., the body surface, or
the retina) in which a stimulus will trigger the firing of that neuron. This region can be a hair in the cochlea or a piece of skin,
retina, tongue or other part of an animal's body. Additionally, it can be the space surrounding an animal, such as an area of
auditory space that is fixed in a reference system based on the ears but that moves with the animal as it moves (the space
inside the ears), or in a fixed location in space that is largely independent of the animal's location (place cells). Receptive
fields have been identified for neurons of the auditory system, the somatosensory system, and the visual system.
The term receptive field was first used by Sherrington (1906) to describe the area of skin from which a scratch reflex could be
elicited in a dog. According to Alonso and Chen (2008) it was Hartline (1938) who applied the terms to single neurons, in this
case from the retina of a frog.
The concept of receptive fields can be extended further up to the neural system; if many sensory receptors all form synapses
with a single cell further up, they collectively form the receptive field of that cell. For example, the receptive field of a ganglion
cell in the retina of the eye is composed of input from all of the photoreceptors which synapse with it, and a group of ganglion
cells in turn forms the receptive field for a cell in the brain. This process is called convergence.
Receptive field = center + surround
CNN & IMAGE RECOGNITION – CONVOLUTION
In mathematics and, in particular, functional analysis, convolution is a mathematical operation on two functions f and g,
producing a third function that is typically viewed as a modified version of one of the original functions, giving the area
overlap between the two functions as a function of the amount that one of the original functions is translated. Convolution is
similar to cross-correlation. It has applications that include probability, statistics, computer vision, natural language
processing, image and signal processing, engineering, and differential equations.
The convolution can be defined for functions on groups other than Euclidean space. For example, periodic functions, such as
the discrete-time Fourier transform, can be defined on a circle and convolved by periodic convolution. A discrete convolution
can be defined for functions on the set of integers. Generalizations of convolution have applications in the field of numerical
analysis and numerical linear algebra, and in the design and implementation of finite impulse response filters in signal
processing.
Computing the inverse of the convolution operation is known as deconvolution.
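For the discrete case mentioned above, a direct Python transcription of (f ∗ g)[n] = Σₘ f[m]·g[n − m] could look like this; the sequence values are illustrative:

```python
def discrete_convolution(f, g):
    """(f * g)[n] = sum over m of f[m] * g[n - m],
    for finite sequences indexed from 0."""
    n_out = len(f) + len(g) - 1
    out = [0.0] * n_out
    for n in range(n_out):
        for m in range(len(f)):
            if 0 <= n - m < len(g):  # g is zero outside its support
                out[n] += f[m] * g[n - m]
    return out

print(discrete_convolution([1, 2, 3], [0, 1, 0.5]))
# [0.0, 1.0, 2.5, 4.0, 1.5]
```

One sequence is effectively slid across the other, accumulating the area of overlap at each shift, exactly as in the definition above.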
CNN & IMAGE RECOGNITION – POINTWISE NONLINEARITY
From “Nonlinear Digital Filters: Principles and Applications”, by Ioannis Pitas and Anastasios N. Venetsanopoulos
CNN & IMAGE RECOGNITION – TIME DELAY NEURAL NETWORK
Time delay neural network (TDNN) is an artificial neural network architecture whose primary purpose is to work on
sequential data. The TDNN units recognize features independent of time-shift (i.e. sequence position) and usually form part of
a larger pattern recognition system, for example one converting continuous audio into a stream of classified phoneme labels for
speech recognition.
An input signal is augmented with delayed copies of itself as additional inputs; the network is time-shift invariant since it has no
internal state.
The original paper presented a perceptron network whose connection weights were trained with the back-propagation
algorithm; this may be done in batch or online. The Stuttgart Neural Network Simulator implements that version.
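The delayed-copies idea can be sketched as a simple preprocessing step (the function name and signal values here are illustrative): each time step is presented to the network together with a short history of the input, so no internal state is needed.

```python
def with_delayed_copies(signal, delays=(0, 1, 2)):
    """Augment every time step with delayed copies of the input,
    so a stateless feed-forward network sees a short history."""
    frames = []
    for t in range(max(delays), len(signal)):
        frames.append([signal[t - d] for d in delays])
    return frames

# Each frame holds the current sample plus the two previous ones
print(with_delayed_copies([0.1, 0.4, 0.9, 0.3, 0.2]))
# [[0.9, 0.4, 0.1], [0.3, 0.9, 0.4], [0.2, 0.3, 0.9]]
```

Because every frame contains the same fixed window of delays, a pattern learned at one position in the sequence is recognized at any other position.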
CNN & IMAGE RECOGNITION – CLASSIFICATION
Some examples of typical computer vision tasks are presented here.
Recognition – The classical problem in computer vision, image processing, and machine vision is that of determining
whether or not the image data contain some specific object, feature, or activity. Different varieties of the
recognition problem are described in the literature:
Object classification – One or several pre-specified or learned objects or object classes can be recognized, usually together with
their 2D positions in the image or 3D poses in the scene. Google Goggles or LikeThat provide stand-alone
programs that illustrate this function.
Identification – An individual instance of an object is recognized. Examples include identification of a specific person's face or
fingerprint, identification of handwritten digits, or identification of a specific vehicle.
Detection – The image data are scanned for a specific condition. Examples include detection of possible abnormal cells or
tissues in medical images or detection of a vehicle in an automatic road toll system. Detection based on
relatively simple and fast computations is sometimes used to find smaller regions of interesting image
data, which can then be analyzed by more computationally demanding techniques to produce a correct
interpretation.
FIRST WE DO PATTERN RECOGNITION
• A pattern is an object, process or event
• A class (or category) is a set of patterns that share common attributes (features), usually from the same information source
• During recognition (or classification), classes are assigned to the objects.
• A classifier is a machine that performs such a task
“The assignment of a physical object or event to one of several pre-specified categories” -- Duda & Hart
Armando Vieira & Bernardete Ribeiro (2008)
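As a concrete toy illustration of a classifier in the Duda & Hart sense, here is a nearest-mean classifier in Python; the class names and feature values are invented for the example:

```python
def nearest_mean_classifier(training, x):
    """Assign feature vector x to the pre-specified class whose
    training-sample mean is closest (squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for label, vectors in training.items():
        dim = len(vectors[0])
        # Per-class mean of the training patterns
        mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        dist = sum((a - b) ** 2 for a, b in zip(x, mean))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical two-feature patterns for two pre-specified classes
training = {
    "class_A": [[1.0, 1.2], [0.8, 1.0]],
    "class_B": [[3.0, 2.8], [3.2, 3.0]],
}
print(nearest_mean_classifier(training, [0.9, 1.1]))  # class_A
```

The training set defines the categories; classification then assigns each new object to the category whose examples it most resembles.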
WHAT IS A PATTERN?
“A pattern is the opposite of a chaos; it is an entity vaguely defined, that could be given a name.”
Examples of Patterns
Patterns of Constellations
Patterns of constellations are represented by 2D planar graphs
Human perception has a strong tendency to find patterns in anything. We see patterns even in random noise;
we are more likely to believe in a hidden pattern than to deny it, since the risk of missing a pattern (or the
reward for discovering one) is often high.
EXAMPLES OF PATTERNS
Biological Patterns – morphology
Landmarks are identified from biological forms, and these patterns are then
represented by a list of points. But for other forms, like the roots of plants,
points cannot be registered across instances.
Applications: biometrics, computational anatomy, brain mapping, …
EXAMPLES OF PATTERNS
Discovery and Association of Patterns
A broad range of texture patterns are generated by stochastic processes.
EXAMPLES OF PATTERNS
Maps Recognition
Patterns of environment
APPROACHES TO IMAGE RECOGNITION
• Statistical PR: based on underlying statistical model of patterns and pattern classes.
• Neural networks: classifier is represented as a network of cells modeling neurons of the human brain
(connectionist approach).
• Structural (or syntactic) PR: pattern classes are represented by means of formal structures such as grammars,
automata, strings, etc.
PROBLEM FORMULATION
Pipeline: input object → measurements → preprocessing → feature extraction → classification → class label
Basic ingredients:
• measurement space (e.g., image intensity, pressure)
• features (e.g., corners, spectral energy)
• classifier - soft and hard
• decision boundary
• training sample
• probability of error
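To make the "soft and hard" classifier and "probability of error" ingredients concrete, here is a minimal one-dimensional sketch; the decision boundary at 0 and the sample points are invented for illustration:

```python
import math

def soft_classify(x, boundary=0.0):
    """Soft classifier: returns a class-1 probability from the
    signed distance to the decision boundary (logistic function)."""
    return 1.0 / (1.0 + math.exp(-(x - boundary)))

def hard_classify(x, boundary=0.0):
    """Hard classifier: returns a definite class label."""
    return 1 if x > boundary else 0

# Labelled sample: (feature value, true class label)
sample = [(-1.2, 0), (-0.3, 0), (0.1, 0), (0.4, 1), (1.5, 1)]
errors = sum(hard_classify(x) != y for x, y in sample)
print("empirical probability of error:", errors / len(sample))  # 0.2
```

The soft output keeps the distance-to-boundary information that the hard label throws away, which is useful when downstream decisions carry different risks.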
DESIGN CYCLE
1. feature selection and extraction
• What are good discriminative features?
2. modeling and learning
3. dimension reduction, model complexity
4. decisions and risks
5. error analysis and validation
6. performance bounds and capacity
7. algorithms
DATA COLLECTION
How do we know when we have collected an adequately large and
representative set of examples for training and testing the system?
FEATURE CHOICE
The choice depends on the characteristics of the problem domain.
Good features are simple to extract, invariant to irrelevant
transformations, and insensitive to noise.
MODEL CHOICE
We may be unsatisfied with the performance of our linear object classifier and want to jump to
another class of model.
TRAINING
Use data to determine the classifier.
There are many different procedures for
training classifiers and choosing models.
EVALUATION
Measure the error rate (or performance) and switch
from one set of features & models to another one.
COMPUTATIONAL COMPLEXITY
What is the trade-off between computational ease and performance?
(How does an algorithm scale as a function of the number of features, the number of training examples,
and the number of patterns or categories?)
CNN IN SEISMIC INTERPRETATION
Below you will find some links to previous presentations I have made with a focus on image recognition as
a tool in seismic interpretation, with emphasis on play, trap and seismic stratigraphy.
http://www.slideshare.net/StigArneKristoffersen/future-trends-of-seismic-analysis?ref=https://www.linkedin.com/profile/preview?locale=en_US&trk=prof-0-sb-preview-primary-button
http://www.slideshare.net/StigArneKristoffersen/not-54557734?ref=https://www.linkedin.com/profile/preview?locale=en_US&trk=prof-0-sb-preview-primary-button
http://media.wix.com/ugd/c193bc_d9e608ed875d4e208db1a6e8e6b5bb77.pdf
http://media.wix.com/ugd/c193bc_4088f96c14b848a3a9f3c720f1e3445d.