DESIGN OF DYNAMICAL ASSOCIATIVE MEMORIES

VIA FINITE-STATE RECURRENT NEURAL

NETWORKS

by

Mehmet Kerem MUEZZINOGLU

August, 2003

IZMIR


DESIGN OF DYNAMICAL ASSOCIATIVE MEMORIES

VIA FINITE-STATE RECURRENT NEURAL

NETWORKS

A Thesis Submitted to the

Graduate School of Natural and Applied Sciences of

Dokuz Eylul University

In Partial Fulfillment of the Requirements for

The Degree of Doctor of Philosophy in Electrical and Electronics Engineering,

Electrical and Electronics Program

by

Mehmet Kerem MUEZZINOGLU

August, 2003

IZMIR


ACKNOWLEDGMENTS

I would like to dedicate this work to my aunt Remziye Dalkıran, who initiated my education. The steps leading to this point would have been more troublesome and less meaningful without her. I would like to thank my parents for their continuous support during my education, and my wife Tulay Muezzinoglu for her patience and her help in improving and writing this thesis. I also appreciate Irem Stratmann's prompt efforts in providing the necessary scientific documents.

I am thankful to the Scientific and Technical Research Council of Turkey, Munir Birsel Foundation, for providing the financial support to carry this work further at the Computational Intelligence Laboratory, University of Louisville. Prof. Dr. Jacek M. Zurada took care of me and my work there; his kind supervision is gratefully acknowledged.

Above all, I am indebted to Prof. Dr. Cuneyt Guzelis, whose advice has never failed me at any stage of this Ph.D. work. Not only has he patiently supervised my graduate studies, he has also consistently shaped my academic point of view.

Mehmet Kerem MUEZZINOGLU


ABSTRACT

This thesis examines the information retrieval capability of recurrent neural networks and the performance of their previously proposed design procedures. Five novel design methods are then introduced for the discrete Hopfield recurrent network model, which operates on a finite state space, to restore prototype static vectors from their distorted versions. The qualitative properties provided by these methods are verified analytically, while the quantitative ones are estimated through computer experiments. Each proposed method is compared with the conventional design procedures in terms of these properties. The performance of the resulting networks is finally demonstrated on benchmark static information retrieval applications, namely character recognition and image reconstruction.

Keywords: Associative memory, Hopfield network, information storage, information

retrieval, image reconstruction.


ÖZET (Turkish Abstract)

This thesis work questions the information retrieval performance of dynamical artificial neural networks and the design methods previously proposed for them. Five new design methods are proposed for the discrete Hopfield network, which operates on a finite state space, to restore distorted static memory vectors. The qualitative properties provided by the methods are verified analytically, while the quantitative ones are estimated by computer experiments. Each proposed method is compared with the known design methods in terms of these properties. The performance of the dynamical networks designed by these methods is demonstrated on static information retrieval applications such as character recognition and image restoration.

Keywords: Associative memory, Hopfield network, information storage, information retrieval, image restoration.


CONTENTS

Contents
List of Tables
List of Figures

Chapter One: Introduction
1.1 The Memory Concept
1.1.1 Address Addressable Memory
1.1.2 Content Addressable Memory
1.2 Associative Memory
1.2.1 Association in Engineering
1.2.2 Formulation of Auto-Association
1.2.3 The Nearest Codeword Problem
1.2.4 Auto-Associative Memory Design
1.3 Neural Associative Memories
1.3.1 Feed-Forward Auto-Associators
1.3.1.1 Optimal Linear Associative Memory
1.3.1.2 The Hamming Network
1.3.2 Dynamical Auto-Associators
1.3.2.1 Brain-State-in-a-Box Model
1.3.2.2 The Hopfield Era
1.3.2.3 M-Model
1.4 Organization of Thesis

Chapter Two: Discrete Hopfield Associative Memory
2.1 Discrete Hopfield Network Topology
2.1.1 Operation Modes of Network
2.1.2 Implementation Notes
2.2 Recurrent Associative Memory Design Criteria
2.2.1 Criteria for Memory Representation
2.2.2 Criteria for Error Correction
2.2.3 Ideal Recurrent Associative Memory
2.3 Milestones of Recurrent Associative Memory Design
2.3.1 Outer-Product Method
2.3.2 Projection Learning Rule
2.3.3 Eigen-Structure Method
2.3.4 Linear Inequality Systems to Store Fixed Points

Chapter Three: Two Graph Theoretical Design Methods for Recurrent Associative Memory
3.1 The Boolean Hebb Rule
3.1.1 Motivation
3.1.2 A Graph Representation of a Binary Memory Set
3.1.2.1 The Boolean Hebb Rule
3.1.2.2 Formulation of Maximal Independent Sets
3.1.2.3 Compatibility of a Binary Set
3.1.3 Design Procedure
3.1.3.1 Unipolar Discrete Hopfield Network
3.1.3.2 A DHN Free from Spurious Memories: MIS Network
3.1.3.3 All Fixed Points of MIS-N are Attractive
3.1.3.4 An Update Rule Provides Attractiveness for Each Memory Vector
3.1.4 Quantitative Properties of Boolean Hebb Rule
3.1.4.1 Comparison with Outer-Product Method
3.1.5 Simulation Results
3.1.5.1 A Compatible Example
3.1.5.2 A Compatibilization Procedure and its Character Recognition Application
3.2 Recurrent Associative Memory Design via Path Embedding into a Graph
3.2.1 Proposed Method
3.2.2 Simulation Results

Chapter Four: Construction of Energy Landscape for Discrete Hopfield Associative Memory
4.1 Motivation
4.2 Discrete Quadratic Design
4.2.1 Original Design Method
4.2.2 Applicability of the Original Method
4.2.3 An Extension of the Method
4.3 Computer Experiments
4.3.1 Applicability and Capacity of the Original Design Method
4.3.2 A Design Example
4.3.3 Character Recognition and Reconstruction
4.3.4 A Classification Application
4.3.5 An Application of the Extended Method

Chapter Five: Multi-State Recurrent Associative Memory Design
5.1 Motivation
5.2 Design Procedure
5.2.1 Complex-Valued Multistate Hopfield Network
5.2.2 Design of Quadratic Energy Function with Desired Local Minima
5.2.3 Elimination of Trivial Spurious Memories


5.2.4 Algorithmic Summary of the Method
5.3 Simulation Results
5.3.1 Complete Storage Performance
5.3.2 Application of the Design Procedure

Chapter Six: Multi-Layer Recurrent Associative Memory Design
6.1 Motivation
6.2 Multi-Layer Recurrent Network
6.3 Design Procedure
6.4 Experimental Results

Chapter Seven: Conclusions

References
Appendix
Notes


LIST OF TABLES

Table 3.1 The maximum capacity C_max(n), the probability p_c(n | m ≤ C_max(n)), and the best lossless compression ratio R_b.
Table 3.2 Percentages of complete storage in the DHNs designed by the Outer-Product Method (P_OPM%) and the Boolean Hebb rule (P_BHR%) for uniformly distributed random sets.
Table 3.3 Complete storage percentages P_OPM% and P_BHR% for different bit probabilities.
Table 3.4 Average percentages AvP_OPM% and AvP_BHR% for different bit probabilities.
Table 3.5 Simulation results of MIS-N.
Table 3.6 Average number of spurious memories for some n, m values.
Table 4.1 Percentages of memory sets that yielded feasible inequality systems.
Table 5.1 Percentages of memory sets that yielded feasible inequality systems.
Table 6.1 Performance of the proposed method in providing perfect storage and creating spurious memories and/or limit cycles depending on l.


LIST OF FIGURES

Figure 1.1 Block diagram of a typical address addressable memory operation.
Figure 1.2 A content addressable memory.
Figure 2.1 Conventional discrete Hopfield network model.
Figure 2.2 Implementation of the analog Hopfield neural network model.
Figure 3.1 (a) The graphs G_x and G_y having S_x = {1, 2} and S_y = {1, 2, 3}, respectively, as their unique MIS; x = [1 1 0]^T and y = [1 1 1]^T. (b) The graph G into which both x and y are embedded.
Figure 3.2 (a) G_x, G_y, and G_z having S_x = {2, 3}, S_y = {1, 3}, and S_z = {1, 2}, respectively, as their unique MIS's. (b) The graph G has an extraneous MIS, namely S_e = {1, 2, 3}.
Figure 3.3 (a) The original numerals to be stored as memory vectors. (b) The compatibilized characters. (c) Some distorted numerals. (d) Numerals recalled by MIS-N.
Figure 3.4 The graph indicating the binary vectors [0 1 0 1]^T and [1 0 1 1]^T as its paths between the nodes v_d and v_s.
Figure 3.5 Block diagram of the proposed associative memory.
Figure 4.1 An algorithmic summary of the overall design method.
Figure 4.2 Block diagram of the extended network.
Figure 4.3 Set of characters which are embedded by the original design method as memory vectors to DHN.
Figure 4.4 Reconstructions obtained by the resulting DHN.
Figure 4.5 Three memory patterns used in the classification application.
Figure 4.6 The input map (a) and the classification result (b).
Figure 5.1 An illustration of csign_8(u) for u = −1.2 − 0.5i.
Figure 5.2 Test images used in the image reconstruction example.
Figure 5.3 Images corrupted by 20% salt-and-pepper noise (above) and their reconstructions obtained by the network (below).
Figure 5.4 Images corrupted by 40% salt-and-pepper noise (above) and their reconstructions obtained by the network (below).


Figure 5.5 Images corrupted by 60% salt-and-pepper noise (above) and their reconstructions obtained by the network (below).
Figure 5.6 Filtered images obtained from noisy images with 40% salt-and-pepper noise by the network (above) and by median filtering (below).
Figure 5.7 Lenna images obtained by the networks designed by the generalized Hebb rule and by the proposed method, respectively.
Figure 6.1 A two-layer recurrent network made up of discrete perceptrons.


CHAPTER ONE

INTRODUCTION

Intelligence necessitates the ability to process information and any information first

needs to be stored in order to be processed. That is why a fundamental property

common to all intelligent systems is that they all employ well-organized, easily accessible

information storage devices. Due to these storage units, computers can perform high-speed

computations that increase our living standards, while animals gain experience to maintain

their lives. This work is devoted to the design of information storage systems that can actually do more than just contain the information. We begin by introducing basic

concepts and the terminology that will be used throughout the thesis.

1.1 The Memory Concept

A device that contains information and makes it available upon request is called a memory.

What is definitely expected from such a system is simply to preserve its content until it is

reloaded. A conventional memory contains inherently static information encoded using a

suitable alphabet, which is defined by the constraints imposed by physical implementation

of the system. For example, a RAM device used in digital computers can store only binary

information due to its CMOS structure, hence any information, e.g. a visual or an audio

track, should be encoded as a binary pattern every time before storage. As a consequence,

any information retrieved from this device should be decoded in order to be meaningful for

the user.

[Figure 1.1 sketch: an address decoder converts the request "Information at the l-th location is required!" into the selection of Location #l among the locations of the memory content, and the item stored there is presented as the output.]

Figure 1.1: Block diagram of a typical address addressable memory operation.

A memory performs its typical operation by presenting as output a single element of its content, namely the one implied by the input. Though systems employing a memory unit are

conceptually dynamical, it should be noted that a conventional memory itself is algebraic,

because the information acquired from it is dependent only on the presented input according

to the definition above. This information is then transmitted to other processing units, where

it gains meaning, maintaining the possibly dynamical behavior of the overall system.

According to the way of inquiring their contents, i.e. the type of the excitatory input

applied, memories are grouped into two categories, namely Address Addressable Memory

(AAM) and Content Addressable Memory (CAM).

1.1.1 Address Addressable Memory

AAM comprises information storage units designed in such a way that each item of the

content occupies a specific location, i.e. a unique address, in a specific storage unit. Any

desired item is requested from AAM by providing its address as the input to the system. A

mechanism to convert the presented address to the location information is involved in this

operation. This peripheral, the so-called address decoder, is usually considered a part integrated into the memory container. A block diagram of a conventional AAM is shown in

Figure 1.1.

A conventional AAM neither interprets the input nor processes its content. Thus, its operation is not fault-tolerant. In other words, one has to provide the precise and correct

address of the desired item. Most digital computers use localized AAMs, in which each

item is stored into a single memory unit independently of others. This is the actual reason


why inconsistencies sometimes occur in the execution of algorithms on digital hardware. In most of these cases, either the information stored in the memory is modified outside the control of the executed process, or a wrong address is presented to the memory, causing a different item of information than the desired one to be processed. If no hardware or software action is taken against these events, a software crash is inevitable.

1.1.2 Content Addressable Memory

Beyond information storage, a memory may be capable of much superior tasks, such as

processing its input. What we roughly mean by processing here is converting information into a desired form. Besides encoding and decoding, a typical information processing task widely performed in engineering is filtering out the effect of an error, i.e. noise, that might have corrupted the original information. This procedure is obviously crucial for any system that deals with information, hence a noise-filtering ability is a favorable property for a memory. Beyond being an information container, a CAM is indeed an information processing system that possesses such an error-correction capability.

As opposed to a localized organization, the content of a CAM may be distributed over different locations in its information container, where each location may include information about several items of the content. Such a system's superiority in preserving its content is due to this distributed structure, as undesired modifications to an item of the content can be compensated for by its complementary parts located elsewhere.

Besides offering an alternative, distributed organization scheme, CAMs differ from AAMs also in the way they are excited. Instead of an address, one applies either an item

of the content or a distorted version of it as the input, then the system presents as output the

single item from its content, which is most similar to this input. This operation performed by

CAM is illustrated in Figure 1.2. It will be called auto-association hereinafter and discussed

in the next section.

Note that it is this auto-associative nature that renders a CAM tolerant to erroneous inquiries. That is why CAMs are also referred to as associative memories, though they constitute

only a subclass of associative memories, namely auto-associative memories.


[Figure 1.2 sketch: a distorted or incomplete version x of a memory pattern is presented to the CAM, which holds the patterns p_1, p_2, ..., p_|M| and outputs the memory pattern most similar to x.]

Figure 1.2: A content addressable memory.

1.2 Associative Memory

An associative memory is a system that stores mappings of specific input representations to specific output representations; that is to say, a system that associates two patterns such that, when one is encountered, the other can subsequently be reliably recalled. From this point of view, an associative memory can be interpreted as a selective filter, which removes the noise on the memory patterns only.

Depending on the relation between the domain and the range of the mapping it performs, an associative memory belongs to one of two categories, namely hetero-associative and auto-associative. Memories of the former type associate between two disjoint spaces. AAMs are examples of hetero-associative systems: any input-output pair (a, m) obtained from the system is an element of A × M, where A is the set of all acceptable addresses and M is the set constituting the content, so A ∩ M = ∅. The latter type of system will be the focus of this thesis; thus, dropping the prefix, they will hereinafter be cited simply as Associative Memories (AMs).

1.2.1 Association in Engineering

Kohonen has pointed out an analogy between an associative memory and an adaptive filter

function in (Kohonen, 1988). The filter can be viewed as taking an ordered set of input

signals, and transforming them into another set of signals, i.e. the output of the filter. It is the

notion of adaptation, allowing its internal structure to be altered by the transmitted signals,

which introduces the concept of memory to the system. The above mentioned filtering


behavior is of course valid only for the patterns stored in the associative memory, which excludes it from the traditional filter concept.

The most significant engineering application of such an information processing system is pattern association, which has been recognized as the prominent feature of human memory since Aristotle and is used as the basic operation in all models of cognition (Anderson, 1995).

1.2.2 Formulation of Auto-Association

An AM is an information processing system that auto-associates, i.e. performs associations between two nested pattern sets M̃ and M, where M ⊆ M̃. The function performed by an AM is expressed as follows:

f : M̃ → M,   f[x] = arg min_{m ∈ M} d(x, m),   (1.1)

where M stands for the set of memory patterns, i.e. the memory set, and d(·, ·) is a pre-determined metric on M̃.

Evaluation of (1.1) for a given instance (x, M) is called the nearest neighbor

classification problem, whose solution is a crucial step not only for association but also for

many unsupervised learning procedures, e.g. clustering (Dogan & Guzelis, 2003). Nearest

neighbor classification belongs to the class of NP-complete optimization problems, which are known to be among the hardest of all problems (Vavasis, 1991). That is to say, obtaining an exact solution of this set-constrained program becomes excessively time- and memory-consuming as |M| and/or the dimension of M̃, i.e. the length of the memory patterns, increases.

Systems developed to solve the considered problem, i.e. Nearest Neighbor Classifiers

(NNCs) (Duda & Hart, 1973), can actually be considered as perfect but expensive AMs.

Traditional NNCs follow an algebraic way of calculating (1.1): presented with an instance x ∈ M̃, these systems first calculate the distances d(x, m) for every m ∈ M, and then determine the minimum by mutual comparisons. As implied by the arg operator in (1.1), they finally reconstruct the pattern m∗ that attains this minimum distance and output f[x] = m∗.
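For concreteness, the brute-force NNC evaluation of (1.1) can be written down in a few lines. The sketch below is illustrative only and is not part of the design procedures studied in this thesis; it assumes binary patterns with the Hamming distance (the special case formalized in the next subsection), NumPy, and function names of our own choosing.

import numpy as np

def nearest_codeword(x, memory_set):
    """Brute-force evaluation of f[x] = argmin over m in M of d(x, m),
    with d taken as the Hamming distance."""
    # |M| distance computations ...
    distances = [np.sum(x != m) for m in memory_set]
    # ... followed by comparisons to locate the minimum, and reconstruction
    return memory_set[int(np.argmin(distances))]

# Example: three 5-bit memory patterns and a distorted input
M = [np.array([0, 0, 0, 0, 0]),
     np.array([1, 1, 1, 1, 1]),
     np.array([1, 0, 1, 0, 1])]
x = np.array([1, 1, 0, 1, 1])        # one bit away from [1 1 1 1 1]? no: two bits differ from none, nearest is [1 1 1 1 1]
print(nearest_codeword(x, M))        # -> [1 1 1 1 1]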

Though the algorithm given above is easily implementable on digital hardware, when the cardinality of M is large its execution requires a large amount of computation and memory resources: |M| distance calculations at the first step, the storage of |M| distances at the second step, and |M| · (|M| − 1)/2 comparisons at the third.

Note also that the last step necessitates the explicit reconstruction of the memory pattern,

so each element of M must be represented as is, at least at this final step, occupying a

specific location in the system. That is why conventional NNCs should contain localized

information, hence do not benefit from the efficiency of compact representation, e.g.

memory compression.

1.2.3 The Nearest Codeword Problem

Since any finite static information can be encoded as a sequence of binary words of a specific length, say n, in a distance-preserving fashion (Mano, 1991), an AM operating on the set M̃ = {0, 1}^n would provide a significant computer-aided solution to the nearest neighbor classification problem. That is why work on binary AMs is dominant in the related literature, which is the case for this thesis, too.

This special form of auto-association, where M is given as a binary set in {0, 1}^n and

d(·, ·) in (1.1) is chosen as the Hamming distance (a well-known metric on the binary space),

has been formally defined as the nearest codeword problem (Garey & Johnson, 1979), which

is also NP-complete.

1.2.4 Auto-Associative Memory Design

Design of a system that evaluates (1.1) as its input-output relation is called the AM design,

which is the ultimate goal of this thesis work. Nevertheless, since its evaluation is expensive

as mentioned above, we are mostly interested in approximating (1.1) here instead by using

compatible models, which are relatively cheaper and faster than NNCs.

Page 18: DESIGN OF DYNAMICAL ASSOCIATIVE MEMORIES VIA FINITE …ci.louisville.edu/kerem/evraklar/DR-t0360.pdf · Mehmet Kerem MUEZZ¨ ˙INO GLU ˘ August, 2003 ˙IZM ˙IR ACKNOWLEDGMENTS I

7

Having a system model at hand a priori, the design obviously reduces to the

determination of system parameters. This assumption will be valid for all design procedures

mentioned throughout this material.

Artificial neural networks have been considered as appropriate structures for such

models. This is partly due to their biological interpretation, which is out of the scope of

this work, as well as to their mathematically tractable information processing capabilities,

as discussed next.

1.3 Neural Associative Memories

A frequently-quoted formal definition given in (Hecht-Nielsen, 1990) introduces Artificial

Neural Networks (ANNs) as “parallel and distributed information processing structures

consisting of units, which can possess a local memory and carry out localized information

processing operations...”. According to this description, when designed suitably, ANNs are

supposed to be adequate devices to resemble AMs. Motivated by this reasoning, information

storage and retrieval capabilities of ANNs have constituted one of the major research areas in the literature.

1.3.1 Feed-Forward Auto-Associators

The processing units mentioned in the definition above, whose connectionism1 constitutes an ANN, are called neurons. From a mathematical point of view, each neuron is assumed to perform basically a two-step algebraic operation: it takes the weighted sum of its inputs {x_i ∈ ℝ}_{i=1}^{l} and passes it through a nonlinearity φ(·), called the activation function, to produce its output:

y = φ( ∑_{i=1}^{l} w_i · x_i + t ),   (1.2)

where {w_i ∈ ℝ}_{i=1}^{l} are the (synaptic) weights and t ∈ ℝ is the threshold.

1This term has found wide usage in the ANN literature instead of connectedness.
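As a minimal illustration of (1.2), and only as an illustration (the neuron model itself is standard, but the code and its parameter values are ours), a single neuron with a hard-limiter activation can be sketched in Python with NumPy as:

import numpy as np

def neuron_output(x, w, t, phi=np.sign):
    """One neuron as in (1.2): weighted sum of the inputs plus the
    threshold t, passed through the activation function phi."""
    return phi(np.dot(w, x) + t)

# A 3-input neuron with a signum (hard-limiter) activation
print(neuron_output(np.array([1.0, -1.0, 1.0]),
                    np.array([0.5, 0.2, -0.4]), t=0.2))   # -> 1.0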

Page 19: DESIGN OF DYNAMICAL ASSOCIATIVE MEMORIES VIA FINITE …ci.louisville.edu/kerem/evraklar/DR-t0360.pdf · Mehmet Kerem MUEZZ¨ ˙INO GLU ˘ August, 2003 ˙IZM ˙IR ACKNOWLEDGMENTS I

8

Every neuron in a network is considered to be a member of a layer, i.e. a processing

level, according to its turn of operation in the information flow within the assembly. In

feed-forward networks, each neuron contributes to this flow only in the direction from the input layer towards the output layer. A network topology which allows connections from

the output of each neuron in a layer towards an input of each neuron in the subsequent layer

is called fully-connected.

Note that feed-forward networks do not employ any memory, hence no information is

ever stored in their structures. This is the reason why they are also called algebraic networks. On the other hand, they have been utilized for auto-association because they avoid some critical problems, e.g. instability, that may arise in the design of their dynamical alternatives. The following feed-forward auto-associators are worth mentioning briefly here in order to emphasize their advantages and especially their disadvantages, which

directed the research on AMs towards dynamical systems.

1.3.1.1 Optimal Linear Associative Memory

The basic idea behind the Optimal Linear Associative Memory (OLAM) is that a single

layer of n neurons possessing linear activation functions performs linear filtering on an n-

dimensional real signal space, which is indeed the simplest form of auto-association.

The goal of OLAM design is to make the network map each element of a given finite memory set M ⊂ ℝ^n onto itself instantaneously. Utilizing the outer-product method (Haykin, 1994), the parameters of the considered network topology are determined as

W = ∑_{p ∈ M} p · p^T,   (1.3)

and all thresholds as zero. Here w_ij denotes the real weight coefficient associated with the connection from the j-th input to the i-th neuron.

Note that the basic design criterion given above neither implies error correction nor prevents vectors other than the memory vectors from being stored, i.e. associated to themselves. The first major drawback of OLAM design can also be explained in this way: it is in general

Page 20: DESIGN OF DYNAMICAL ASSOCIATIVE MEMORIES VIA FINITE …ci.louisville.edu/kerem/evraklar/DR-t0360.pdf · Mehmet Kerem MUEZZ¨ ˙INO GLU ˘ August, 2003 ˙IZM ˙IR ACKNOWLEDGMENTS I

9

impossible to filter out the noise on an arbitrary signal by using a linear filter. However,

when M consists of orthogonal vectors, it can easily be shown that the network designed

by (1.3) becomes selective only for the memory vectors. As discussed by Kohonen in

(Kohonen, 1977), the very restrictive orthogonality assumption can be relaxed to a milder

one, namely linear independence of memory vectors, if the weight parameters are chosen as

W = M · (M^T · M)^{-1} · M^T,   (1.4)

where M denotes the n × |M| real matrix constructed by augmenting the memory vectors as columns.
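For reference, the two weight choices (1.3) and (1.4) can be sketched as follows. This is an illustrative sketch, not the original implementation; NumPy is assumed and the function names and the toy vectors are ours.

import numpy as np

def olam_outer_product(patterns):
    """Weight matrix of (1.3): sum of the outer products p p^T over the memory set."""
    return sum(np.outer(p, p) for p in patterns)

def olam_projection(patterns):
    """Weight matrix of (1.4): W = M (M^T M)^(-1) M^T, where the columns of the
    matrix M are the (linearly independent) memory vectors."""
    M = np.column_stack(patterns)
    return M @ np.linalg.inv(M.T @ M) @ M.T

# Two linearly independent memory vectors in R^3
p1, p2 = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])
W = olam_projection([p1, p2])
# Each stored vector is mapped onto itself, as required by the design goal
print(np.allclose(W @ p1, p1), np.allclose(W @ p2, p2))   # True True

Since (1.4) is the orthogonal projection onto the span of the memory vectors, any vector already in that span is reproduced exactly, which is what the design goal asks of the stored patterns.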

OLAM is a simple but definitely primitive AM because of the above mentioned

restrictions on the memory set. This network has been proven in (Zurada, 1992) to exhibit

maximum performance when designed by (1.4), which also constitutes a background for

another well-known design technique described in the subsequent chapter.

1.3.1.2 The Hamming Network

To perform auto-association on a binary pattern space, the Hamming network proposed in

(Watta & Hassoun, 1991) employs a Hamming-distance calculator block incorporated with

a competitive network known as MAXNET (Suter & Kabrisky, 1992) at the first stage of its

operation. This crucial block contains the encodings of binary memory patterns, possibly

as Boolean functions, and points out the nearest one to the input pattern by producing a 1

at one of its |M | outputs and 0 at all others. The memory pattern implied in this way is

then explicitly reconstructed in the second stage. As a result, the network exactly evaluates (1.1) and is thus a perfect AM.
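Functionally, this two-stage operation can be summarized by the short sketch below. It abstracts away the MAXNET circuitry of the cited works and is only an illustration under our own naming, assuming NumPy and binary patterns.

import numpy as np

def hamming_network_recall(x, patterns):
    """Two-stage recall: score the input against every stored pattern,
    keep a single winner (MAXNET-like competition), then reconstruct it."""
    P = np.array(patterns)                 # |M| x n binary matrix
    scores = np.sum(P == x, axis=1)        # n minus the Hamming distance
    winner = np.zeros(len(P), dtype=int)   # one-hot output of the first stage
    winner[np.argmax(scores)] = 1
    return winner @ P                      # second stage: explicit reconstruction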

Although some alternative implementation techniques have been proposed to simplify

the costly operation of the first block, e.g. (Aksın, 2002), the Hamming network and its

derivatives (Ikeda et al., 2002) are simply NNCs realized as ANNs, and so suffer from the disadvantages mentioned in the last paragraph of Section 1.2.2.

Page 21: DESIGN OF DYNAMICAL ASSOCIATIVE MEMORIES VIA FINITE …ci.louisville.edu/kerem/evraklar/DR-t0360.pdf · Mehmet Kerem MUEZZ¨ ˙INO GLU ˘ August, 2003 ˙IZM ˙IR ACKNOWLEDGMENTS I

10

Auto-association is indeed a more complicated task than a simple nonlinear transformation of a matrix-vector product. This is why a single-layer feed-forward network topology is ineffective in coping with this problem. On the other hand, as demonstrated by the Hamming network, it takes sophisticated processors, like MAXNET, to perform this task adequately via an algebraic input-output relation. Though many other attempts to realize (1.1) using feed-forward ANNs have appeared in the literature, which are omitted here, each of them was subject to the above-mentioned trade-off between the simplicity of the network topology and the auto-association performance. As a result, when the goal is to obtain a cheap AM with an auto-association performance comparable to that of an NNC, the designer should give up algebraicity and, of course, its natural benefits.

1.3.2 Dynamical Auto-Associators

What one can achieve by using a feed-forward ANN is a strict subset of the capabilities

of the same network with feedback provided from the output to the input, namely its

dynamical counterpart. Once designed properly, with possibly some more effort than the

design of feed-forward ANNs, dynamical ANNs offer a topological complexity comparable to that of the simple feed-forward ones while substantially improving the auto-association performance. Moreover, their implementation schemes, e.g. (Ghosh et al., 1994), employing analog circuit elements make them a much cheaper and faster alternative to the NNC, which is rather suited to implementation as software on digital hardware.

Dynamical ANN models to be designed as AMs are usually considered as autonomous,

hence their design procedures aim to perform (1.1) not as an input-output relation but as a

mapping from the initial state vector to the steady-state solution. The trajectory of the state

vector produced along the operation is interpreted as an error correction, which does not

require any comparisons between the distorted pattern and the memory vectors. Moreover,

all memory vectors are encoded as system parameters and this allows for a distributed

representation, even data compression.

In the rest of this material, we focus on the design of such dynamical ANNs as AMs.

To situate the original design methods described in the subsequent chapters, the history of dynamical auto-associators is sketched briefly in the sequel and continued in Section 2.3.


1.3.2.1 Brain-State-in-a-Box Model

The auto-association performance of dynamical systems was first exploited by a biologically-inspired work (Ritz et al., 1977) in 1977. The dynamical ANN proposed therein was called the brain-state-in-a-box model, and was made up of n coupled neurons with piecewise-linear activation functions, which collapse all solutions of the dynamical system into the unit hypercube [0, 1]^n. Though it constitutes the first existence proof of AMs realized on dynamical ANNs, this network lacked a design procedure, and this is probably the reason why dynamical ANNs did not attract any significant attention until the 80's (Golden, 1986).

1.3.2.2 The Hopfield Era

The pioneering work by Hopfield (Hopfield, 1982) aimed to show the collective computational capabilities of very simple algebraic processing units when properly connected, allowing for feedback. The problem chosen as an instance of collective operation was exactly auto-association, hence this approach initiated the usage of dynamical ANNs as AMs. This single work has been cited more than 4000 times over two decades by research articles within the Science Citation Index.

The proposed network, the so-called Discrete Hopfield Network (DHN), consists of a single layer of n coupled (bi-state) neurons with hard-limiter type activation functions2. The dynamics of the network is therefore constrained to the finite state space {−1, 1}^n, i.e. the vertices of the unit hypercube. Many design methods for the DHN to perform (1.1), including many modifications of the original model, have since been proposed. A formal

analysis of DHN together with a comprehensive discussion on its capabilities constitute the

subject of the following chapter.

1.3.2.3 M-Model

A Hopfield-like dynamical network with a sigmoidal or a piecewise-linear nonlinearity has been considered an effective binary AM since the qualitative results of Michel et al. (Michel et al., 1991), and is handled as a different model, called the M-Model. Though the analytical design procedure proposed for this model, called the eigen-structure method, is superior in many aspects to some well-known methods for the conventional DHN, the M-Model's infinite state space is its major drawback in resembling (1.1), whose domain and range are both finite. The model and its design procedure are explained in Section 2.3.3, and it is demonstrated by simulation results in subsequent chapters that there indeed exist better conventional DHN methods than the eigen-structure method.

2Such neurons had been introduced as discrete perceptrons in (Rosenblatt, 1962), and they have been utilized to implement dichotomies, which constitutes another major issue in the ANN literature.

1.4 Organization of Thesis

The goal of this work is to derive efficient design techniques, primarily for the conventional discrete Hopfield topology, to obtain an auto-associative memory while ensuring perfect storage. The first two chapters of this material are devoted to the introduction of the memory concept and the formulation of the associative memory design problem, respectively. A critical review of some well-known design procedures is given in Chapter 2.

A binary associative memory design procedure that gives a discrete Hopfield network

with a symmetric binary weight matrix is introduced in the first part of Chapter 3.

The method, which was first proposed in (Muezzinoglu, 2000) and then extended in

(Muezzinoglu & Guzelis, 2001), is based on introducing the memory vectors as maximal

independent sets to an undirected graph, which is constructed by Boolean operations

analogous to the conventional Hebb rule. The parameters of the resulting network are then

determined via the adjacency matrix of this graph in order to find a maximal independent

set whose characteristic vector is close to the given distorted vector. The applicability of

the design method is finally investigated by a quantitative analysis, which was also given in

(Muezzinoglu & Guzelis, 2003a). The theoretical results presented therein are valuable as

they prove that, whenever the given memory vectors are correlated as being compatible, a

discrete Hopfield network with only binary parameters can be designed i) free from spurious

memories, ii) ensuring perfect storage, and iii) with high storage capacity. The graph

theoretical concept of compatibility is introduced and a quantitative analysis is conducted

to reveal how restrictive this property really is.


Another graph theoretical design method for binary recurrent associative memory to

retrieve binary memory vectors from their distorted versions is introduced in the second

part of Chapter 3. The method, which was reported in (Muezzinoglu & Guzelis, 2002),

is based on the observation that an undirected graph of n + 2 nodes represents some n-

dimensional vectors as paths between a specific pair (s, d) of nodes such that any edge of a

path indicates two successive entries of value 1 of the representing vector. We construct a

graph as a union of paths generated for a memory vector set. A memory vector embedded

into the graph in this way is recalled by using a discrete Hopfield network whose trajectory

begins at a distorted vector as the initial condition. Finally the network converges to a

binary vector indicating one of the nearest paths in this graph. As a result, the original

memory vector is reconstructed from the arc-based representation of this path. Quantitative

analysis supported by the simulations shows that the method is superior to many recurrent

associative memory design methods in these aspects: i) An arbitrary memory set can be

embedded as attractive fixed points of the resulting associative memory, ii) the number of

extraneous fixed points is in general greater than the ones caused by the conventional outer-

product method, but these unavoidable extraneous fixed points are all close to the memory

vectors in the resulting network, thus they cause relatively small errors in recall.

An energy function-based auto-associative memory design method to store a given

unipolar binary memory vector set as attractive fixed points of an asynchronous discrete

Hopfield network is presented in Chapter 4. The discrete quadratic energy function, whose

local minima correspond to the attractive fixed points of the network, is constructed by

solving a system of linear inequalities derived from the strict local minimality conditions.

This idea was introduced in (Muezzinoglu et al., 2003a). The parameters (weights and the

thresholds) of the network are then calculated using this energy function. If the inequality

system is infeasible, it can be concluded that no such asynchronous discrete Hopfield

network exists. In this case, we extend the method to design a discrete piecewise quadratic

energy function, which can be minimized by a generalized version of the conventional

discrete Hopfield network, also proposed therein. In spite of its computational complexity,

it is verified by computer simulations that the original method performs better than the conventional design methods in the sense that the memory can store, and provide attractiveness for, almost all memory sets whose cardinality is less than or equal to


the dimension of their elements. A convincing character recognition application presented in (Muezzinoglu et al., 2003b) is also included. The complete method, together with its

extension, guarantees the storage of an arbitrary collection of memory vectors, which are

mutually at a Hamming distance of at least two from each other. The derivation of this method sheds light on the achievable upper bound on the performance of a conventional discrete

Hopfield network in association tasks.

Motivated by the results derived in Chapter 4, a method to store each element of an

integral memory set M ⊂ {1, 2, . . . , K}^n as a fixed point into a complex-valued multi-state

Hopfield network is introduced in Chapter 5. This method, which was originally proposed

in (Muezzinoglu et al., 2003c), employs a set of inequalities to render each memory pattern

as a strict local minimum of a quadratic energy landscape, too. Based on the solution of this

system, it gives a recurrent network of n multi-state neurons with complex and Hermitian

synaptic weights, which operates on the finite state space {1, 2, . . . , K}^n to minimize this

quadratic function. The maximum number of integral vectors that can be embedded into the

energy landscape of the network by this method is investigated by computer experiments.

Chapter 5 also presents an application of the proposed method by reconstructing noisy gray-

scale images, as was done in (Muezzinoglu et al., 2003d).

Finally, to achieve a perfect storage beyond the capability of the conventional discrete

Hopfield network, a novel design procedure to embed binary memory vectors as attractive

fixed points to a recurrent multi-layer neural network is presented in Chapter 6. It

is first shown that an additional layer to the conventional Hopfield model is necessary for providing perfect storage of some memory sets. Then the link between the number of hidden-layer neurons and the association performance is investigated. In the proposed

design procedure, which was originally reported in (Muezzinoglu & Guzelis, 2003b)

and (Muezzinoglu & Guzelis, 2003c), we make use of the well-known back-propagation

learning algorithm to provide attractiveness for each memory vector. A pruning technique

is then employed to minimize the number of hidden-layer neurons, so that the network

topology is simplified. The performance of the proposed method is investigated by extensive

computer analysis.


CHAPTER TWO

DISCRETE HOPFIELD ASSOCIATIVE MEMORY

This chapter introduces the DHN, which is considered the fundamental model in the design strategies that are developed in the subsequent chapters. The recurrent AM design criteria are posed next, irrespective of the network model used in the design. Some major procedures that have been suggested previously to make the DHN perform as an AM are also mentioned herein, since they will later constitute the references for evaluating the proposed methods.

2.1 Discrete Hopfield Network Topology

The conventional fully-connected DHN topology (Hopfield, 1982), as illustrated in Figure 2.1, consists of a single layer of n bi-state discrete perceptrons, each of which takes a weighted sum of the delayed output values and then passes it through the signum nonlinearity sgn(·) to produce the next output. The connection weight from the state (output) of the i-th neuron to an input of the j-th one is a real number denoted by wji, and each neuron possesses a real threshold (or bias) ti.

2.1.1 Operation Modes of Network

There are two operation modes defined for this discrete-time system, namely synchronous

and asynchronous modes. In synchronous mode, the system performs the recursion

x[k + 1] = sgn (W · x[k] + t) , (2.1)

where x[k] stands for the network state (output values of the neurons) at time instant k.

[Figure 2.1 sketch: a single layer of n neurons; each neuron forms a weighted sum of the delayed states x_1[k], ..., x_n[k] through the weights w_i1, ..., w_in, adds its threshold t_i, and passes the result through the hard limiter to produce x_i[k+1].]

Figure 2.1: Conventional discrete Hopfield network model.

Alternatively, allowing the update of only one element of the state vector, say the i-th one, at each iteration according to

x_i[k + 1] = sgn( ∑_{j=1}^{n} w_{ij} · x_j[k] + t_i )   (2.2)

prescribes the asynchronous operation mode of the same network.

Due to the nonlinearity sgn(·), the state space of the recurrent network in either operation mode is the bipolar binary space {−1, 1}^n, i.e. the 2^n vertices of the unit hypercube1. It is also obvious from the two recursions that the fixed points of the network are invariant under the operation mode. Note, however, that since only one neuron is allowed to change its state in asynchronous mode, for any initial condition the state vector of the network necessarily follows a path passing through adjacent vertices.

The update order applied in asynchronous mode may affect the trajectory of the DHN. In other words, when two different update orders are applied to two identical asynchronous DHNs with the same initial conditions, the sequences of binary vectors produced along the two recursions are in general different. Usually, no update order is specified for the asynchronous DHN; the neuron to be updated at time instant k is chosen at random, but complying with the condition that all neurons should be updated within each time interval n · l < k < n · (l + 1), l = 0, 1, 2, . . .. This scheme is called random update. Due to the randomness involved in their operation, such asynchronous DHNs are considered stochastic systems despite their deterministic physical parameters.

1Due to the change of variables x_u = (x_b + e)/2, where e is the vector with all entries equal to 1, for every DHN operating on the bipolar binary space with state vector x_b one can obtain a network with the same properties but operating on the unipolar space {0, 1}^n with state vector x_u. The converse is also true, as this transformation is bijective, i.e. x_b = 2 · x_u − e.

[Figure 2.2 sketch: each neuron is realized by an operational amplifier whose inputs are the state variables x_j coupled through resistors R_ij, with a capacitor setting the initial state x_i(0) and the amplifier output providing the state x_i.]

Figure 2.2: Implementation of the analog Hopfield neural network model.
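Before turning to implementation, the recursions (2.1) and (2.2) and the random update scheme can be illustrated with a minimal Python sketch. It is only an illustration, assuming NumPy; the sign convention at zero and the one-pattern outer-product weight matrix used in the toy example are our own assumptions (the outer-product method itself is reviewed in Section 2.3.1).

import numpy as np

def sgn(v):
    # hard-limiter nonlinearity; its value at exactly zero is taken as +1 here,
    # a convention the text does not fix
    return np.where(v >= 0, 1, -1)

def synchronous_step(x, W, t):
    """One iteration of (2.1): all n neurons update simultaneously."""
    return sgn(W @ x + t)

def asynchronous_step(x, W, t, i):
    """One iteration of (2.2): only the i-th neuron updates its state."""
    x = x.copy()
    x[i] = sgn(W[i] @ x + t[i])
    return x

def run_random_update(x0, W, t, sweeps=10, seed=0):
    """Asynchronous mode with random update: within every sweep of n iterations
    each neuron is updated exactly once, in a randomly chosen order."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(sweeps):
        for i in rng.permutation(len(x)):
            x = asynchronous_step(x, W, t, i)
    return x

# Toy example: 3-neuron DHN with a one-pattern outer-product weight matrix
p = np.array([1, -1, 1])
W = np.outer(p, p)
t = np.zeros(3, dtype=int)
print(run_random_update(np.array([1, 1, 1]), W, t))   # converges to [ 1 -1  1]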

2.1.2 Implementation Notes

What makes the usage of DHN preferable in complicated tasks, such as auto-association

or optimization, is definitely its simplified implementation schemes employing basic circuit

elements, instead of complicated digital processors. Though this issue is out of the scope of

this work, which deals with AM design from the point of view given in section 1.2.4 only,

a straightforward model proposed in (Michel & Liu, 2002) is included here in Figure 2.2 to

support the argument that Hopfield networks are much cheaper alternatives to NNCs, and,

consequently, to other ANNs that resemble NNCs.

The circuit made up of the neurons given by Figure 2.2, with infinite-gain operational amplifiers, performs the synchronous recursion (2.1) by making use of only n operational amplifiers, n linear resistors, and n linear capacitors. Each operational amplifier in the circuit acts as a weighted summer of the state variables x_i[k], and its infinite open-loop gain realizes the signum nonlinearity sgn(·). Choosing R = 1 Ω, the real parameters w_ij of the recursion become equal to R_ij.

To resemble the asynchronous operation mode, it is sufficient to incorporate an external device into the given model to control the delay blocks, i.e. the latches denoted by D in the figure, such that no two of them change their outputs at exactly the same time, provided all elements of the circuit are considered ideal. On the other hand, such unsystematic delays occur in practice anyway, without any need to take action to ensure them. Hence, the asynchronous DHN with random update can indeed be considered a more realistic model, developed to exhibit the effects of the non-ideal switching of these delay elements in synchronous mode. It should be noted, however, that this is not our (and many other researchers') view of the asynchronous DHN: we focus on this system not because a truly synchronous DHN can never be implemented in reality, but because it ensures a vital AM design criterion described below.

2.2 Recurrent Associative Memory Design Criteria

Autonomous recurrent networks, especially DHNs, are utilized in AM design to realize

the association function (1.1) as a map from their initial states to their fixed points. To be

precise, whenever a distorted pattern d is presented to a recurrent AM by injecting it as the

initial value of the state vector x[0], the evaluation of the association function should be

produced by the system as the steady-state solution: x[∞] = f[d]. Since the network is expected to perform this for every d ∈ M̃, it should be designed such that its state space includes the pattern space M̃. Whether this fundamental consideration is satisfied is of course a matter of the choice of the activation functions of the neurons in the output layer. For example, the DHN automatically satisfies it for binary association, as its state space is the entire binary space, which is equal to M̃ in the nearest codeword problem (cf. Section 1.2.3). However,

in the design of DHNs to operate as AMs on other pattern spaces, the criterion mentioned

above should constitute the first concern of the designer. Further design considerations are

grouped into two categories as follows.


2.2.1 Criteria for Memory Representation

Since the steady-state solutions of the recurrent network represent the range M of the association function, which consists of constant values only in the considered case of auto-association, the network should not exhibit any kind of behavior other than converging

towards a fixed point. Such dynamical systems are defined in (Hirsch & Smale, 1974) as

convergent. Another design condition then follows as the convergence:

Condition 1 Each trajectory of the system should tend to a fixed point, i.e. the system

should be convergent.

To represent all static memory vectors properly, the designer should make each fixed point of the network correspond to an element of M, which is a given collection of n-vectors in the auto-association problem. Hence, recurrent AM design can be viewed as the design of a dynamical system with prescribed fixed points.

Condition 2 The set of fixed points of the system should contain the given set of memory

vectors.

Moreover, this correspondence should be provided in a one-to-one fashion, because any fixed point not contained in M would otherwise represent an undesired memory, namely a spurious memory, in the state-space of the system.

Condition 3 The set of fixed points should contain no element other than the given memory

vectors, preventing spurious memories.

2.2.2 Criteria for Error Correction

Conditions 1 and 2 are necessary and sufficient to represent the given memory set as fixed

points of the considered recursion, thus a recurrent network satisfying them recalls each

memory successfully when initialized with the memory pattern itself. However, one cannot yet expect such a network to perform error correction without imposing further conditions on its recurrence.


Error correction, or memory retrieval, has been defined as the evolution of the state vector

from a distorted pattern, which is injected as the initial state vector, towards a memory

pattern, which is a specific fixed point x∗ of the recurrence. Consequently, if this fixed point

can be introduced to the network such that other points in the state-space which are located

around x∗ are mapped to x∗ along the recurrence, then the network gains error-correction

capability, at least for the distorted patterns within this neighborhood. For binary recurrent

AM, this condition is expressed as:

Condition 4 Any fixed point x∗ of the system is attractive in the sense that for any point x

within 1-Hamming distance neighborhood of x∗, there exists a trajectory which starts at x

and tends to this fixed point x∗.

Though the definition of attractiveness given above becomes equivalent to asymptotic stability in the sense of Lyapunov (Vidyasagar, 1993) for deterministic and uniquely-solvable finite-state systems, it is necessary here to extend the stability concept to a stochastic process, such as the asynchronous DHN with random update.

Finally, to satisfy the nearest-neighbor consideration imposed by (1.1), the network should correct errors such that any point in the state-space is mapped to the nearest fixed point. It can easily be shown that this claim is equivalent to the following final

condition.

Condition 5 The radii of attraction basins of attractive fixed points are almost equal.

Hence, these basins share the state space in an equal way.

Herein, attraction basins are defined based on the definition of attractiveness in Condition

4: A point x is in the attraction basin of a fixed point x∗ if there is a trajectory starting at x

and ending at x∗.


2.2.3 Ideal Recurrent Associative Memory

A procedure for a specific network model that yields a recurrent AM satisfying all design conditions for an arbitrary collection of memory vectors is called an ideal AM design method. Such a method has not yet appeared in the literature for the DHN, nor for any other finite-state recurrent network model. In fact, an ideal design method can never be proposed for the DHN, as proven in Chapter 4.

In representing the memory set, most procedures fail to satisfy Condition 3, since any attempt to avoid spurious memories usually necessitates handling the whole state-space of the network (Athithan & Dasgupta, 1997), which is huge for pattern spaces of high dimension.2 Almost all existing methods suffer from the same problem also in satisfying Condition 5. Therefore, methods satisfying the rest of the conditions are considered to be the successful ones, which are still rare in the literature.

Definition 1 A design method that gives a recurrent network satisfying Conditions 1, 2, and 4 for an arbitrary collection of memory vectors is said to provide perfect storage.

The expression “an arbitrary collection” naturally includes memory sets of large cardinality. As will be illustrated, handling such sets is relatively harder for any design method than storing small memory sets as fixed points. In other words, storing a set of uncorrelated memory vectors by any design method turns out to be a more difficult task as the cardinality increases. This is why most design methods are investigated quantitatively to give an upper bound on the number of memory vectors that can be successfully stored. The method in question is then considered to work properly for memory sets under this limit.

Definition 2 The maximum number of memory vectors that can be stored as fixed points of a recurrent AM by a design method is called the method's memory capacity.

2 Even determining the exact locations of spurious memories in a designed network has been reported as an NP-complete problem in (Bruck & Roychowdhury, 1990).


2.3 Milestones of Recurrent Associative Memory Design

As proven in (Bruck & Goodman, 1988), and in the next chapter for the unipolar binary case, symmetry of the weight matrix is one of the two sufficient conditions for the convergence of an asynchronous DHN. Symmetry is also advantageous in the sense that such a weight matrix is characterized by its diagonal and upper-triangular part alone, i.e. by (n^2 + n)/2 real values. Noting that some efficient but computationally costly methods, such as (Sudharsanan & Sundareshan, 1991) and (Sompolinsky & Kanter, 1986), have been proposed for non-symmetric DHNs, we further restrict ourselves at this point to symmetric DHNs.

This section presents four major recurrent AM design methods in chronological order

that have been proposed for symmetric DHNs.

2.3.1 Outer-Product Method

Inspired by the Hebb rule (1.3) used in the design of the OLAM, the first recurrent AM design tool for the DHN was proposed in (Hopfield, 1982). The outer-product method is very easy to apply, but gives a rather primitive AM, as explained below.

Given a memory set M ⊆ {−1, 1}^n, the integer-valued weight matrix of the network is determined by

W = ∑_{p ∈ M} p · p^T − |M| · I,    (2.3)

where I denotes the n × n identity matrix, and the threshold of each neuron is chosen as zero.
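For concreteness, a minimal numerical sketch of this rule is given below. It assumes the standard asynchronous bipolar update x_i ← sgn(w_i^T x) with zero thresholds; the NumPy helper names are illustrative and not part of the thesis.

```python
import numpy as np

def outer_product_weights(memories):
    # Eq. (2.3): W = sum_p p p^T - |M| I, zero thresholds (zero diagonal).
    P = np.asarray(memories, dtype=float)            # |M| x n, entries in {-1, +1}
    return P.T @ P - len(P) * np.eye(P.shape[1])

def bipolar_recall(W, x0, max_sweeps=100, rng=None):
    # Assumed asynchronous bipolar recursion: x_i <- sgn(w_i^T x), random order.
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(x)):
            new = 1.0 if W[i] @ x > 0 else -1.0
            if new != x[i]:
                x[i], changed = new, True
        if not changed:                               # fixed point reached
            break
    return x
```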

It has been shown in (Bruck & Goodman, 1988) that the outer-product method ensures

the convergence of the resulting network. However, there is no guarantee that each given

memory vector will be mapped to a fixed point. Moreover, nothing can be said about the attractiveness of the fixed points of the network. As a result, the outer-product method cannot provide perfect storage. The reason can be explained by an energy-function approach:


As will be shown in Chapter 4, each DHN whose weight matrix W is symmetric with zero diagonal entries locally minimizes the discrete quadratic

E(x) = −x^T · W · x    (2.4)

defined on the state-space {−1, 1}^n. In other words, the state vector necessarily tends to a discrete local minimum of (2.4). However, the considered method in general does not map M to the local minima of (2.4), since the local minima of the quadratic form

E(x) = −x^T · ( ∑_{p ∈ M} p · p^T ) · x + |M| · x^T · x    (2.5)

are in general different from the elements of M. This also explains why the method provides perfect storage in the trivial case where |M| = 1, because the single local minimum of (2.5) is indeed the single memory vector in this case.

Another negative outcome of the method is that spurious memories necessarily occur in the resulting network, while some of these undesired points can be easily identified as negatives of the stored memory patterns3.

Fact 1 If p is a fixed point of a DHN recursion with zero thresholds, then −p is also a fixed

point.

Finally, the method is also ineffective from quantitative aspects. A theoretical result

presented in (Dembo, 1989) states that, given uncorrelated n-dimensional memory vectors,

the outer-product method is able to store up to only 0.138 · n points among these as

fixed points of the resulting DHN, without ensuring attractiveness, as will be verified

experimentally in Chapter 3.

Despite its above-mentioned shortcomings, the outer-product method initiated a major research field for ANNs, and hence is valuable as the pioneering attempt to design AMs on DHNs.

3Such spurious memories will be introduced in Chapter 5 as trivial ones.


2.3.2 Projection Learning Rule

Utilizing pseudo-inverse techniques, Personnaz et al. developed a method that guarantees Conditions 1 and 2 (Personnaz et al., 1986). The method, the so-called projection learning rule, was originally introduced for the synchronous DHN but, of course, works for the asynchronous DHN, too.

The basic idea behind the procedure is that the storage of the memory vectors would be accomplished successfully if the equality

W · M = M    (2.6)

holds when the thresholds are chosen as zero. Here M is the n × |M| matrix whose columns are the elements of the given memory set M ⊆ {−1, 1}^n. The minimum-norm solution to (2.6) is given by

W = M · M^+,    (2.7)

where M^+ denotes the Moore-Penrose pseudo-inverse of the binary matrix M. Note that, when the memory vectors are linearly independent, one obtains the expression

W = M · (M^T · M)^{-1} · M^T,    (2.8)

which is of the projection-matrix form, projecting any vector onto the linear span of the memory vectors in a single iteration of the synchronous DHN recursion.
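A compact NumPy sketch of this rule follows; the function name is illustrative, and np.linalg.pinv supplies the Moore-Penrose pseudo-inverse of (2.7).

```python
import numpy as np

def projection_weights(memories):
    # Eq. (2.7): W = M M^+, memory vectors as the columns of M, zero thresholds.
    M = np.asarray(memories, dtype=float).T   # n x |M|
    return M @ np.linalg.pinv(M)              # minimum-norm solution of W M = M
```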

The projection learning rule does not guarantee the attractiveness condition unless the memory vectors are mutually orthogonal. However, it at least identifies a correlation structure on the given memory set, namely orthogonality, under which perfect storage is provided.

Under the orthogonality assumption, the conditional memory capacity provided by the method is n, which is relatively high compared to that of the outer-product method.


2.3.3 Eigen-Structure Method

One of the well-known works on recurrent AM design was proposed by Michel et al. in

(Michel et al., 1989). The eigen-structure method in this work was originally proposed

for a modified DHN, called M-Model, where the signum nonlinearities are replaced with

piecewise-linear (or saturated linear) functions. The system thus operates within the unit hypercube [−1, 1]^n in discrete time. A detailed analysis of the considered network model and its applications can be found in (Michel & Farrell, 1989). The method was later extended to continuous-time networks in (Li et al., 1989).

Given a memory set M = {p^1, . . . , p^{|M|}} ⊆ {−1, 1}^n, the eigen-structure method consists of the following steps:

1. Choose a memory vector p^r and compute the n × (|M| − 1) matrix

Y = [p^1 − p^r  · · ·  p^{r−1} − p^r   p^{r+1} − p^r  · · ·  p^{|M|} − p^r].    (2.9)

2. Calculate the singular value decomposition of Y, obtaining the matrices U, V, and Σ such that Y = U · Σ · V^T. Let

Y = [y^1 · · · y^{|M|−1}],   U = [u^1 · · · u^n],   l = dim Span{y^1, . . . , y^{|M|−1}}.    (2.10)

3. Compute

W^+ = ∑_{i=1}^{l} u^i (u^i)^T,    W^− = ∑_{i=l+1}^{n} u^i (u^i)^T.    (2.11)

4. Choose a positive number τ and compute the weight matrix and the threshold vector of the network as

W = W^+ − τ · W^− and t = p^r − W · p^r.    (2.12)
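The four steps translate directly into a short NumPy sketch, shown below under the assumption that the memory vectors are supplied as bipolar n-vectors; the helper name and the default value of τ are illustrative only.

```python
import numpy as np

def eigenstructure_weights(memories, tau=0.5, r=0):
    """Sketch of the eigen-structure method, eqs. (2.9)-(2.12)."""
    P = [np.asarray(p, dtype=float) for p in memories]
    pr = P[r]
    # (2.9): difference matrix Y (n x (|M|-1))
    Y = np.column_stack([p - pr for i, p in enumerate(P) if i != r])
    # (2.10): SVD of Y; l = dimension of the span of its columns
    U, s, _ = np.linalg.svd(Y)
    l = np.linalg.matrix_rank(Y)
    # (2.11): projectors onto span(Y) and onto its orthogonal complement
    W_plus = U[:, :l] @ U[:, :l].T
    W_minus = U[:, l:] @ U[:, l:].T
    # (2.12): weight matrix and threshold vector
    W = W_plus - tau * W_minus
    t = pr - W @ pr
    return W, t
```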


Some important properties of this procedure are as follows:

• Matrices W+ and W− depend only on the given memory set and they are independent

of the choice of pr at the first step of the procedure.

• Each memory vector is a fixed point of the resulting network.

• For sufficiently small τ > 0, each memory vector is stored as an asymptotically stable equilibrium of the network.

The most positive property of the method is that, without any restrictions on the given memory set M, the eigen-structure method is capable of storing each memory vector as an asymptotically stable equilibrium of the resulting network. That is to say, the domain of attraction of each fixed point is ensured to be a nonempty bounded set in ℝ^n. However, this does not imply that each fixed point is attractive on the binary pattern space {−1, 1}^n. Consequently, perfect storage is not guaranteed by the eigen-structure method.

The maximum number of memory vectors that can be stored by the method is 2^n, which is the maximum capacity that a design method can achieve.

The results listed above show that the performance of the system is closely related to the selection of the parameter τ. The authors observed that, as τ increases up to a critical value above which Condition 2 fails, the number of spurious states in the network decreases. However, the occurrence of spurious states in the network is not totally prevented in general, so we say that the method does not satisfy Condition 3.

2.3.4 Linear Inequality Systems to Store Fixed Points

Another effective recurrent AM design method was proposed in (Tan et al., 1991) treating

each neuron of a DHN separately and formulating the design considerations as a system

of linear inequalities. By this approach described below, the design reduces to a linear

feasibility problem to be solved by following various strategies.


First, ignoring the overall dynamical behavior of the system, each discrete perceptron in the network can be considered to implement a dichotomy, producing the output

y = 1 if ∑_i w_i · x_i + t > 0, and y = −1 otherwise,    (2.13)

where x is the applied input vector. Then, a fixed point x^* of the DHN recursion is identified as a bipolar binary point satisfying the inequality system

x^*_1 · ( w^T_1 · x^* + t_1 ) > 0
⋮
x^*_n · ( w^T_n · x^* + t_n ) > 0    (2.14)

where w^T_i denotes the i-th row of the weight matrix.

When this system of n inequalities is imposed for all p ∈ M, a solution (W, t) to the overall system constitutes a set of coefficients ensuring Condition 2. However, the above conditions are derived only for fixed points, so the network may exhibit undesired behaviors during its recursion. In addition to the resulting n · |M| linear inequalities, the symmetry condition is therefore imposed to ensure Condition 1:

w_ij = w_ji   ∀ i, j ∈ {1, 2, . . . , n}.    (2.15)

Solving a system of linear inequalities can be formulated as an optimization problem, called the linear feasibility problem (Mangasarian, 1994):

min_{w ∈ ℝ^d} c   s.t.   A · w + b ≤ 0,    (2.16)

where c is an arbitrary constant. Two effective tools to solve this problem are the well-known simplex method (Bertsekas, 1995) and a useful learning algorithm proposed for discrete perceptrons, the so-called discrete perceptron learning algorithm (Rosenblatt, 1962). One


should note, however, that a solution to the formulated problem might not exist, which means that there exists no DHN that accepts all memory vectors as its fixed points.
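As an illustration of this formulation, the sketch below trains each row (w_i, t_i) with the discrete perceptron rule so that the inequalities (2.14) hold for every memory vector; for brevity it omits the symmetry constraint (2.15), which the full method imposes, and the function name is hypothetical.

```python
import numpy as np

def store_fixed_points_perceptron(memories, epochs=200, eta=1.0):
    """Train each neuron separately so that p_i (w_i^T p + t_i) > 0 for all p in M."""
    P = np.asarray(memories, dtype=float)      # |M| x n, entries in {-1, +1}
    m, n = P.shape
    W, t = np.zeros((n, n)), np.zeros(n)
    for i in range(n):
        for _ in range(epochs):
            ok = True
            for p in P:
                if p[i] * (W[i] @ p + t[i]) <= 0:   # inequality (2.14) violated
                    W[i] += eta * p[i] * p           # perceptron correction
                    t[i] += eta * p[i]
                    ok = False
            if ok:
                break
    return W, t
```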

The proposed method is superior to any other method in storing fixed points, but it does not take the attractiveness condition into account and thus does not guarantee perfect storage. On the other hand, it may actually be extended to render each fixed point attractive by adding further linear inequalities to the constraints. Beyond fixed points, even desired trajectories can be embedded into the DHN recurrence in this way. In relation to the discrete-perceptron-based design, a new method that follows a similar but indirect route to recurrent AM design will be described in Chapter 4, further ensuring attractiveness and also accounting for the memory capacity.


CHAPTER THREE

TWO GRAPH THEORETICAL DESIGN METHODS FOR
RECURRENT ASSOCIATIVE MEMORY

This chapter introduces two original design methods for the asynchronous DHN operating on the unipolar binary pattern space. Both methods make use of graphs, which provide an effective representation of binary information.

3.1 The Boolean Hebb Rule

3.1.1 Motivation

Based on the observation of the one-to-one correspondence between the fixed points of a

specific DHN and the Maximal Independent Sets (MISs) of a given graph, a design method

for DHNs to solve the vertex cover problem was suggested in (Shrivastava et al., 1992). This DHN, the so-called nonpositive Hopfield network, has a symmetric, nonpositive weight matrix and a zero threshold vector. Though its aim is not to obtain an AM but to solve a quadratic 0-1 problem, the proposed method constitutes a remarkable contribution to binary recurrent AM design, because it actually provides an AM satisfying both Conditions 1 and 2 under one condition, namely the compatibility of the given set of memory vectors: there exists a graph accepting all these vectors as the characteristic

vectors of its MISs. However, this design method still fails to satisfy the attractiveness

consideration. As a consequence of this fact, it has been applied in (Shrivastava et al., 1995)


to correct unidirectional errors in binary codes, i.e. the errors caused by transitions either

0 → 1 or 1 → 0, but not both.

The maximum number of binary codes that can be introduced as attractive fixed points to the nonpositive Hopfield network has also been investigated and found to be 3^{n/3} for n = 3l, where l is a positive integer (see (3.20) for other n values). This number is fairly high when compared to the achievable capacities of the outer-product method and of the projection learning rule.

This section reports an extended theoretical result, namely that a DHN satisfying Conditions 1-4 and possessing a high memory capacity exists if the given memory vectors are correlated in the sense of being compatible. A Hebbian-like design procedure for this network is described, and a

comprehensive capacity analysis is finally presented.

3.1.2 A Graph Representation of a Binary Memory Set

A graph G = ⟨V, E⟩ consists of a set V of nodes and a set E ⊂ V × V of edges, which represent connections between some pairs of nodes. Two nodes of a graph are said to be adjacent if there exists an edge between them, and nonadjacent otherwise. A graph of n nodes is represented by an n × n binary symmetric matrix A = [a_ij], called the adjacency matrix, such that

a_ij = 1 if (i, j) ∈ E, and a_ij = 0 otherwise.    (3.1)

A set S ⊂ V of nodes is called an independent set if the elements of S are pairwise nonadjacent. An independent set S_M is maximal if none of its strict supersets, i.e. the sets including S_M together with at least one node not in S_M, is an independent set. An independent set S can be represented by an n-dimensional binary vector, the so-called characteristic vector, defined by

x^S_i = 1 if node i ∈ S, and x^S_i = 0 otherwise.    (3.2)


3.1.2.1 The Boolean Hebb Rule

The adjacency matrix A of a graph with known independent sets x^{S_1}, x^{S_2}, · · · , x^{S_p} is the Boolean complement of the following matrix Ā:

Ā = [ x^{S_1} · (x^{S_1})^T ] ∨ [ x^{S_2} · (x^{S_2})^T ] ∨ · · · ∨ [ x^{S_p} · (x^{S_p})^T ].    (3.3)

Herein, “·” stands for real vector multiplication, and “∨” for the bitwise Boolean OR operation. Note that the above construction of the matrix Ā from a given set of vectors is similar to Hebb's rule (or the outer-product rule) used in the design of the DHN; the Boolean OR operation is used here instead of real addition to accumulate the outer products.
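A minimal NumPy sketch of this construction (the helper name is illustrative) accumulates the Boolean outer products with OR and then complements the result:

```python
import numpy as np

def boolean_hebb_adjacency(memories):
    # Eq. (3.3): A_bar = OR of x x^T over the unipolar memories; A = complement of A_bar.
    X = np.asarray(memories, dtype=bool)               # |M| x n, entries in {0, 1}
    A_bar = np.zeros((X.shape[1], X.shape[1]), dtype=bool)
    for x in X:
        A_bar |= np.outer(x, x)                        # Boolean OR of the outer products
    return (~A_bar).astype(int)                        # adjacency matrix A
```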

3.1.2.2 Formulation of Maximal Independent Sets

The following theorem enables MIS's to be identified in an algebraic way.

Theorem 1 ((Pardalos & Rodgers, 1992)) x^* is the characteristic vector of an MIS in the graph G if and only if x^* is a discrete local minimum of the quadratic function defined by

E_MIS(x) = x^T A x − e^T x,   x ∈ {0, 1}^n,    (3.4)

where e is the n-dimensional column vector whose entries are all unity.

The above-mentioned local minimality of x^* means that E_MIS(x^*) ≤ E_MIS(x) for every x satisfying d_H(x^*, x) = 1. Such a minimum point x^* is called strict if the inequality is strict, i.e. E_MIS(x^*) < E_MIS(x) for every x satisfying d_H(x^*, x) = 1, where d_H(·, ·) denotes the Hamming distance.

Fact 2 Any local minimum of (3.4) is necessarily strict.

Proof: Considering the result presented by Theorem 1, it suffices to prove that the characteristic vectors of MIS's in a graph are mutually at least 2 Hamming distance apart. To see this fact, we argue by contradiction. Let x and y denote two characteristic vectors associated to MIS's X and Y in a graph G, and suppose d_H(x, y) = 1, which means that x and y differ in a single entry, say the i-th one. Then, x is either equal to y + u^i or to y − u^i, where u^i is the i-th unit vector: u^i_i = 1 and u^i_j = 0 ∀ j ≠ i. The first case implies that X is the set of nodes obtained by augmenting node i to Y, i.e. Y ⊂ X. Similarly, the second case implies X ⊂ Y. These inclusions contradict the maximality of Y and X, respectively.

3.1.2.3 Compatibility of a Binary Set

Let the rule (3.3), used in the construction of the adjacency matrix A from its constituting characteristic vectors, be applied using the elements of a given M = {x^i}_{i=1}^{m} ⊆ {0, 1}^n. In this case, the correspondence of M to the set of local minima of the energy function E_MIS is ensured to be one-to-one if and only if each vector represents a maximal independent set and no extraneous maximal independent set occurs in the resulting graph. The following definition and theorem are given to describe and to test this property, respectively.

Definition 3 A set M of n-dimensional binary vectors is called compatible if there exists a

graph G with n vertices such that the set of characteristic vectors of MIS’s in G is equal to

M .

There are obviously two cases in which the compatibility of M is violated:

Case 1. There exists a pair of distinct vectors x and y, both in M, such that the independent set S_x represented by x is a superset of the set S_y represented by y.

Case 2. There exists an extraneous MIS in the graph G which does not correspond to any vector in M.

These two cases are illustrated in Figures 3.1 and 3.2 respectively.

By the following theorem, we introduce the necessary and sufficient conditions for

compatibility which are checked directly on the memory vectors.


Figure 3.1: (a) The graphs G_x and G_y having S_x = {1, 2} and S_y = {1, 2, 3}, respectively, as their unique MIS; x = [1 1 0]^T and y = [1 1 1]^T. (b) The graph G into which both x and y are embedded.

Figure 3.2: (a) The graphs G_x, G_y and G_z having S_x = {2, 3}, S_y = {1, 3} and S_z = {1, 2}, respectively, as their unique MIS's. (b) The graph G has an extraneous MIS, namely S_e = {1, 2, 3}.

Theorem 2 A set M of binary vectors is compatible if and only if the following conditions are satisfied:

COMP1 For every x, y ∈ M, there exist indices i, j such that x_i = y_j = 1 and x_j = y_i = 0.

COMP2 Whenever x_j = x_k = y_i = y_k = z_i = z_j = 1 and x_i = y_j = z_k = 0 for some x, y, z ∈ M, there exists w ∈ M such that w_i = w_j = w_k = 1 and, in addition, x_l = y_l = z_l = 1 implies w_l = 1 for any l.

As explained in the proof of Theorem 2, which can be found in the Appendix, COMP1 is the necessary and sufficient condition for representing each element of a binary vector set as an MIS in the resulting graph obtained by (3.3). Similarly, COMP2 characterizes the binary vector sets which do not cause an extraneous MIS in the resulting graph after applying (3.3). In other words, these are the necessary and sufficient conditions on a binary vector set for avoiding the violations mentioned in Case 1 and Case 2, respectively.
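For small memory sets, COMP1 and COMP2 can be checked directly by brute force, as in the following sketch (exhaustive, and therefore only practical for small n and |M|; the function name is illustrative):

```python
from itertools import combinations, permutations

def is_compatible(M):
    """Direct check of Theorem 2 on a list M of unipolar 0/1 tuples."""
    n = len(M[0])
    # COMP1: every pair must have crossing 1/0 entries in both directions.
    for x, y in combinations(M, 2):
        if not (any(a == 1 and b == 0 for a, b in zip(x, y)) and
                any(a == 0 and b == 1 for a, b in zip(x, y))):
            return False
    # COMP2: for every ordered triple exhibiting the stated index pattern,
    # a covering vector w must exist in M.
    for x, y, z in permutations(M, 3):
        for i, j, k in permutations(range(n), 3):
            if (x[j] == x[k] == y[i] == y[k] == z[i] == z[j] == 1 and
                    x[i] == y[j] == z[k] == 0):
                ok = any(w[i] == w[j] == w[k] == 1 and
                         all(w[l] == 1 for l in range(n)
                             if x[l] == y[l] == z[l] == 1)
                         for w in M)
                if not ok:
                    return False
    return True
```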

3.1.3 Design Procedure

Given a graph G, the problem of finding an independent set of maximum cardinality is called

the maximum independent set problem. It has been proven in (Pardalos & Rodgers, 1992)

that the global minimizer of EMIS corresponds to the characteristic vector of the maximum

independent set in G, hence the minimization of EMIS constitutes one of the well-known

solution strategies for the maximum independent set problem.

A gradient-like dynamical neural network can be employed to minimize EMIS, as done in

(Jagota, 1995), (Sengor et al., 1999) and (Pekergin et al., 1999). However, if there exist some

MISs of lower cardinality than the largest one in the given graph, then the solutions of these

networks may be trapped by a non-global local minimum depending on the initial state.

Thus the exact solution of the maximum independent set problem is not guaranteed by these

networks. This disadvantage of gradient-based methods in solving the maximum independent set problem turns out to be an advantage when it is exploited for AM design, as suggested in this thesis.

From this point of view, the suggested design procedure for binary recurrent AM is made

up of the following steps:

Step 1. Construct a graph G such that the set of characteristic vectors of MISs in G is equal

to the set of memory vectors and determine the adjacency matrix A of G.

Step 2. Design a convergent gradient-like dynamical system whose energy function is

equal to EMIS .

Note that, given a memory set M , the first step of the procedure can be easily performed

by applying (3.3), whenever M is compatible. As discussed in the previous section, each

memory vector is distinguishable as an MIS from the resulting graph, if and only if M is

compatible. With this assumption, it now remains to implement the second step, namely the synthesis of a dynamical network that retrieves a memory vector close to its initial state vector. To achieve this, we focus here on the DHN.

3.1.3.1 Unipolar Discrete Hopfield Network

The considered discrete Hopfield recursion is an asynchronous update of the entries of the binary state vector:

x_i[k + 1] = φ( t_i − ∑_{j=1}^{n} w_ij x_j[k] ),   φ(α) = 1 if α > 0, and φ(α) = 0 if α ≤ 0,    (3.5)

where w_ij is the weight between neurons i and j, t_i is the threshold of neuron i, and x_i denotes the state of neuron i.
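A direct simulation of this recursion, assuming a random asynchronous update order, can be sketched as follows (helper name illustrative):

```python
import numpy as np

def unipolar_dhn_recall(W, t, x0, max_sweeps=100, rng=None):
    """Asynchronous unipolar recursion (3.5): x_i <- phi(t_i - sum_j w_ij x_j)."""
    rng = np.random.default_rng(rng)
    x = np.array(x0, dtype=int)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(x)):
            new = 1 if (t[i] - W[i] @ x) > 0 else 0
            if new != x[i]:
                x[i], changed = new, True
        if not changed:            # fixed point reached
            break
    return x
```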

A quadratic energy function associated to this recursion and defined on the state-space of the network is given in the following matrix form:

E(x) = x^T W x − t^T x,    (3.6)

where W := [w_ij] ∈ ℝ^{n×n} is the weight matrix, t := [t_i] ∈ ℝ^n is the threshold vector, and x := [x_i] ∈ {0, 1}^n is the state vector. The following theorem, which is a modified version of the one given in (Bruck & Goodman, 1988), provides a sufficient condition for the convergence of the recursion (3.5).

Theorem 3 For a symmetric weight matrix W ∈ {0, 1}^{n×n} and a threshold vector t with all unity entries, the recursion (3.5) is convergent, namely it converges to one of its fixed points.

Proof: First we show that, for a symmetric binary weight matrix W and all-unity thresholds, the energy function (3.6) is non-increasing along the recursion (3.5). Let the i-th entry of the state vector be updated at an arbitrary time step k. The change in the n-dimensional state vector x can then be represented by the difference vector ∆x, whose i-th entry is

∆x_i = x_i[k + 1] − x_i[k] = 1 if x_i[k+1] = 1 and x_i[k] = 0;  −1 if x_i[k+1] = 0 and x_i[k] = 1;  0 otherwise,    (3.7)

and whose other n − 1 entries are zero. Then, by using the symmetry of W, the difference in the energy function E(x) can be written as

∆E = E(x[k + 1]) − E(x[k])
   = (x[k + 1])^T W x[k + 1] − (x[k])^T W x[k] − t^T (x[k + 1] − x[k])
   = (x[k + 1] − x[k])^T W (x[k + 1] + x[k]) − t^T (x[k + 1] − x[k])
   = ∆x^T W (2x[k] + ∆x) − t^T ∆x
   = ∆x_i ( 2 ∑_{j=1}^{n} w_ij x_j[k] − t_i ) + w_ii (∆x_i)^2.    (3.8)

The first term in (3.8) is nonpositive by the definition of the recursion together with t_i = 1 for every i, while the second term w_ii (∆x_i)^2 is either 1 or 0. On the other hand, the term 2 ∑_{j=1}^{n} w_ij x_j[k] − t_i is a nonzero integer, since W and x are binary and t_i = 1 for every i. Considering also (3.7), we conclude that the sum (3.8) is nonpositive. Since there exist 2^n possible states, E(x) takes values from the finite set {E(x) : x ∈ {0, 1}^n}. Hence, the non-increasing E(x) eventually reaches one of these finitely many values and remains constant.

For the asynchronous update mode, ∆E = 0 implies one of three cases: i) ∆x_i = 0 ∀ i; ii) w_ii = 1, ∆x_i = −1, 2 ∑_{j=1}^{n} w_ij x_j[k] − t_i = 1; or iii) w_ii = 1, ∆x_i = 1, 2 ∑_{j=1}^{n} w_ij x_j[k] − t_i = −1. The first case implies that x[k] is a fixed point, while the second one implies a 1 → 0 transition in the i-th entry of the state vector. Case iii) implies that the sum 2 · ∑_{j=1}^{n} w_ij x_j[k] is equal to zero. However, this contradicts the implications of w_ii = 1 and ∆x_i = 1, which means that the third case is impossible. Since 1 → 0 transitions can occur for at most n successive steps, the trajectory settles down to a fixed point after a finite number of time steps.


3.1.3.2 A DHN Free from Spurious Memories: MIS Network

In the light of Theorems 1 and 3, choosing the weight matrix W equal to the adjacency matrix A and t equal to e gives a convergent DHN whose energy function (3.6) has local minima located exactly at the characteristic vectors of the MIS's, i.e. at the memory vectors. We call this specific DHN the Maximal Independent Set Network (MIS-N). The following theorem, together with Theorem 1, shows that the MIS-N also satisfies Condition 3 for a compatible set of memory vectors.
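Putting the pieces together, the following fragment sketches the MIS-N design end to end; it reuses the illustrative helpers boolean_hebb_adjacency and unipolar_dhn_recall defined in the sketches above (both assumptions, not part of the thesis), and uses the compatible memory set of Section 3.1.5.1.

```python
import numpy as np

memories = [(1, 1, 0, 1, 0),
            (0, 0, 1, 1, 0),
            (1, 0, 0, 0, 1)]           # compatible set from Section 3.1.5.1

A = boolean_hebb_adjacency(memories)   # MIS-N weights: W = A
t = np.ones(len(memories[0]))          # thresholds: t = e

distorted = np.array([1, 1, 0, 0, 0])  # one bit away from the first memory
print(unipolar_dhn_recall(A, t, distorted))   # -> [1 1 0 1 0], first memory restored
```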

Theorem 4 The set of fixed points of MIS-N has a one-to-one correspondence with the set

of discrete local minima of (3.6).

Proof: Let x∗ denote a local minimum of (3.6). Then it is necessarily strict by Fact 2.

Let x be a binary vector which lies 1 Hamming distance away from x^*. Then, for some index i, one of the following is true: i) x^* = x − u^i, or ii) x^* = x + u^i, where u^i stands for the i-th unit vector.

Since x^* is a strict local minimum, one can write

(x^*)^T A x^* − e^T x^* < x^T A x − e^T x.    (3.9)

In case i), the inequality (3.9) becomes

(x^*)^T A x^* − e^T x^* < (x^* + u^i)^T A (x^* + u^i) − e^T (x^* + u^i).

Since A is symmetric, for x^*_i = 0 we get

0 < 2 (u^i)^T A x^* + (u^i)^T A u^i − e^T u^i.    (3.10)


Note that (u^i)^T A u^i is either 0 or 1. Rearranging (3.10), one obtains (u^i)^T A x^* = ∑_{j=1}^{n} a_ij x^*_j > 0, since a_ij, x^*_j ∈ {0, 1}. This implies

φ( 1 − ∑_{j=1}^{n} a_ij x^*_j ) = 0 for all i satisfying x^*_i = 0.    (3.11)

Similarly, in case ii), we obtain the following inequality from (3.9):

(x^*)^T A x^* − e^T x^* < (x^* − u^i)^T A (x^* − u^i) − e^T (x^* − u^i).

Then, for x^*_i = 1 we have

0 < −2 (u^i)^T A x^* + (u^i)^T A u^i + e^T u^i.    (3.12)

Rearranging (3.12) yields (u^i)^T A x^* < 1 for all i with x^*_i = 1, which is equivalent to

φ( 1 − ∑_{j=1}^{n} a_ij x^*_j ) = 1 for all i satisfying x^*_i = 1.    (3.13)

Together, (3.11) and (3.13) imply that x^* is a fixed point.

To show that the converse is also true, assume x^* is a fixed point, so that it satisfies (3.11) and (3.13). By the definition of φ(·), (3.13) implies (3.12) and (3.11) implies (3.10). These two implications prove that x^* is a (strict) local minimum, since (3.9) is satisfied.

3.1.3.3 All Fixed Points of MIS-N are Attractive

The fixed point to which a specific initial state vector converges along the asynchronous

MIS-N recursion might be affected by the update order. In the following theorem, we prove

that for any point x which is in the 1-Hamming distance neighborhood of a fixed point x∗,

there exists at least one trajectory starting at x and ending at x∗.


Theorem 5 A fixed point x∗ in the MIS-N is attractive in the sense that for each vector

x, which is 1-Hamming distance away from x∗, there exists a trajectory starting at x and

ending at x∗.

Proof: Let x^* be a fixed point of the MIS-N and let x ∈ {0, 1}^n be a vector which is 1 Hamming distance away from x^*, i.e. there exists an index j such that x^*_i = x_i ∀ i ≠ j, and either i) x^*_j = 0 and x_j = 1, or ii) x^*_j = 1 and x_j = 0 holds. Suppose that x is injected as the initial state vector of the network, i.e. x[0] = x, and that the j-th entry of the state vector is chosen to be updated first. Since x^* is a fixed point of the network and it also represents an MIS in the graph represented by A, one can write in case i)

∑_i a_ji x_i = ∑_{i≠j} a_ji x^*_i + a_jj x_j ≥ ∑_i a_ji x^*_i ≥ 1.    (3.14)

(3.14) together with x^*_j = φ(1 − ∑_i a_ji x^*_i) = 0 imply that x_j[1] = φ(1 − ∑_i a_ji x_i) = x^*_j = 0. In case ii), we have

∑_i a_ji x_i = ∑_{i≠j} a_ji x^*_i + a_jj x_j ≤ ∑_i a_ji x^*_i < 1.    (3.15)

(3.15) together with x^*_j = φ(1 − ∑_i a_ji x^*_i) = 1 imply that x_j[1] = φ(1 − ∑_i a_ji x_i) = x^*_j = 1. These two facts show that any fixed point x^* is attractive in all directions if the j-th entry, which distinguishes x^* from its neighbor x, is chosen to be updated prior to the other entries.

3.1.3.4 An Update Rule Provides Attractiveness for Each Memory Vector

The results derived above guarantee that, like the nonpositive Hopfield network (Shrivastava et al., 1995), the MIS-N satisfies the first two design considerations irrespective of the update order of the state vector entries. The fixed point to which a specific initial state vector converges might, however, be affected by the update order. In the following theorem, we prove that for any point x which is in the 1-Hamming distance neighborhood of a fixed point x^*, there exists at least one trajectory starting at x and ending at x^*.


Theorem 6 A fixed point x∗ in the MIS-N is attractive in the sense that for each vector

x, which is 1-Hamming distance away from x∗, there exists a trajectory starting at x and

ending at x∗.

Proof: Let x^* be a fixed point of the MIS-N and let x ∈ {0, 1}^n be a vector which is 1 Hamming distance away from x^*, i.e. there exists an index j such that x^*_i = x_i ∀ i ≠ j, and either i) x^*_j = 0 and x_j = 1, or ii) x^*_j = 1 and x_j = 0 holds. Suppose that x is applied to the recurrence (3.5) as the initial state vector x[0] = x and that the j-th entry of the state vector is chosen to be updated first. Since x^* is a fixed point of the network and it also represents an MIS in the graph represented by A, one can write in case i)

∑_i a_ji x_i = ∑_{i≠j} a_ji x^*_i + a_jj x_j ≥ ∑_i a_ji x^*_i ≥ 1.    (3.16)

(3.16) together with x^*_j = φ(1 − ∑_i a_ji x^*_i) = 0 imply that x_j[1] = φ(1 − ∑_i a_ji x_i) = x^*_j = 0. In case ii), we have

∑_i a_ji x_i = ∑_{i≠j} a_ji x^*_i + a_jj x_j ≤ ∑_i a_ji x^*_i < 1.    (3.17)

(3.17) together with x^*_j = φ(1 − ∑_i a_ji x^*_i) = 1 imply that x_j[1] = φ(1 − ∑_i a_ji x_i) = x^*_j = 1. These two facts show that any fixed point x^* is attractive in all directions if the j-th entry, which distinguishes x^* from its neighbor x = x[0], is chosen to be updated prior to the other entries.

If the update order of the states in the discrete Hopfield recurrence (3.5) is chosen to

be random (as usually done), then the network becomes nondeterministic. Hence, the

classical Lyapunov stability of a fixed point does not apply directly to such networks. Alternatively, one can always use a deterministic update order so as to make the network deterministic, which is equivalent to saying that there exists a unique trajectory starting at each point in the state-space of the network. But this trajectory might not provide the desired error correction if an entry of the state vector other than the one considered in the proof of Theorem 6 is chosen to be updated first. In order to ensure the Lyapunov stability


for the fixed points of the MIS-N, we need i) to determine the cases in which some 1-Hamming-distance neighbors of a fixed point have the possibility (depending on the update rule) of converging to other fixed points, and ii) to propose an update order that avoids these cases.

In the following case study, x∗ is treated as a fixed point and x as a point which is 1-

Hamming distance away from x∗.

Case 1: Suppose x^*_j = 0 and x_j = 1. For any k ≠ j, we can write

∑_i a_ki x_i = ∑_{i≠j} a_ki x^*_i + a_kj x_j = ∑_{i≠j} a_ki x^*_i + a_kj = ∑_i a_ki x^*_i + a_kj.    (3.18)

If x is applied to the MIS-N as the initial state vector x[0] and k-th state is chosen to be

updated at the first step, then one of the following cases occurs:

1. If x^*_k = φ(1 − ∑_i a_ki x^*_i) = 0, then ∑_{i≠j} a_ki x^*_i ≥ 1 and hence ∑_i a_ki x^*_i + a_kj = ∑_i a_ki x_i ≥ 1. Thus, x_k[1] = φ(1 − ∑_i a_ki x_i) = 0, which means that no state transition occurs.

2. If x^*_k = φ(1 − ∑_i a_ki x^*_i) = 1, then ∑_i a_ki x^*_i = 0, since a_ki, x^*_i ∈ {0, 1}. So, (3.18) implies either i) x_k[1] = φ(1 − ∑_i a_ki x_i) = 1 (when a_kj = 0), or ii) x_k[1] = φ(1 − ∑_i a_ki x_i) = 0 (when a_kj = 1). Case i) means no transition: x_k[1] = x_k[0] = 1. However, case ii) means a 1 → 0 transition whenever there exists an edge between the nodes j and k in the corresponding graph.

Case 2: Suppose x^*_j = 1 and x_j = 0. For any k ≠ j, we can write

∑_i a_ki x_i = ∑_{i≠j} a_ki x^*_i + a_kj x_j = ∑_{i≠j} a_ki x^*_i = ∑_i a_ki x^*_i − a_kj.    (3.19)

If x is applied to the MIS-N as the initial state vector x[0] and the k-th state is chosen to be updated at the first step, then one of the following cases occurs:


1. For x^*_k = 1, we have a_kj = 0, since x^* represents an MIS in the graph represented by A. Then, x^*_k = φ(1 − ∑_i a_ki x^*_i) = 1 and a_kj = 0 together with (3.19) imply that x_k[1] = x_k[0] = 1. This means that no transition is possible for the k-th entry.

2. For x^*_k = φ(1 − ∑_i a_ki x^*_i) = 0, we have ∑_i a_ki x^*_i ≥ 1. Then, (3.19) implies either i) x_k[1] = φ(1 − ∑_i a_ki x_i) = 0 (when a_kj = 0, or when a_kj = 1 and ∑_i a_ki x^*_i ≥ 2), or ii) x_k[1] = 1 (when a_kj = 1 and ∑_i a_ki x^*_i = 1). In case i) no transition occurs, i.e. x_k[1] = x_k[0] = 0. Case ii) means a 0 → 1 transition. This case is possible but rare to encounter, since it occurs only when the k-th node is connected to the j-th node but not to any other node of the MIS represented by x^*.

In the light of the above discussion, we propose the following search procedure as an update rule which provides a trajectory starting at the neighbor x and ending at x^*, hence making the MIS-N asymptotically stable.

Set j = 1. If no transition is available for the j-th entry, then increment j by 1. If a 0 → 1 (resp. 1 → 0) transition is valid for an entry j satisfying x_j = 0 (resp. x_j = 1) of the current state vector x, then before accepting it, check whether a transition is also valid for the other zero entries k ≠ j with x_k = 0 (resp. the other unity entries k ≠ j with x_k = 1). If it is invalid for all such k, then accept the transition in the j-th entry and increase j by 1. If it is valid for some k, then check whether at least one of the neighbor states obtained by updating these entries is a fixed point.1 If such a neighbor is a fixed point, then accept the transition leading to this fixed point. If not, then accept the valid transition in the j-th entry and increase j by 1. Stop the procedure when all entries remain unchanged.
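One possible reading of this search procedure is sketched in Python below, under the assumption that the entry index j is cycled repeatedly until a full pass leaves the state unchanged; the helper names are illustrative and the fixed-point test follows the footnote criterion.

```python
import numpy as np

def _phi(alpha):
    return 1 if alpha > 0 else 0

def _is_fixed_point(A, x):
    # Footnote test: x is a fixed point of the MIS-N iff Phi(e - A x) = x.
    return all(_phi(1 - A[i] @ x) == x[i] for i in range(len(x)))

def stabilizing_update(A, x0, max_passes=100):
    """Sketch of the search-based update rule of Section 3.1.3.4 (W = A, t = e)."""
    x = np.array(x0, dtype=int)
    n = len(x)
    for _ in range(max_passes):
        changed = False
        for j in range(n):
            new_j = _phi(1 - A[j] @ x)
            if new_j == x[j]:
                continue                         # no transition available at j
            # same-polarity entries whose transition is also valid
            rivals = [k for k in range(n)
                      if k != j and x[k] == x[j] and _phi(1 - A[k] @ x) != x[k]]
            accepted = j
            for k in rivals:
                y = x.copy()
                y[k] = 1 - y[k]
                if _is_fixed_point(A, y):        # prefer a transition reaching a fixed point
                    accepted = k
                    break
            x[accepted] = 1 - x[accepted]
            changed = True
        if not changed:                           # all entries unchanged: stop
            break
    return x
```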

3.1.4 Quantitative Properties of Boolean Hebb Rule

As already shown in Section 3.1.3.2, an AM designed by the Boolean Hebb rule has no

spurious memories if and only if the given set of memory vectors is compatible. Then,

the maximum number of n-dimensional vectors that can be embedded by this method into the

1 A point x ∈ {0, 1}^n is a fixed point of the MIS-N iff Φ(e − W · x) = x, where Φ(·) is the diagonal transformation from ℝ^n to {0, 1}^n defined by Φ(u) = [φ(u_1) · · · φ(u_n)]^T.


DHN recursion is equal to the maximum number of MIS's that a graph with n vertices may contain.

A specific graph which contains disjoint triangles was investigated by Moon and Moser in (Moon & Moser, 1965). Then, Erdos (Erdos & Erne, 1973) showed that this specific graph has the maximum number of MIS's among all graphs with the same number of vertices. This specific graph was independently shown by Furedi (Furedi, 1987), and Moon and Moser (Moon & Moser, 1965), to have exactly the following number of MIS's:

Cmax(n) = 3^{n/3} if n ≡ 0 (mod 3);  4 · 3^{(n−4)/3} if n ≡ 1 (mod 3);  2 · 3^{(n−2)/3} if n ≡ 2 (mod 3);  for n ≥ 2.    (3.20)
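Eq. (3.20) is easy to evaluate; the small sketch below (function name illustrative) reproduces the Cmax(n) column of Table 3.1.

```python
def c_max(n):
    """Maximum number of MIS's over all graphs with n >= 2 vertices, eq. (3.20)."""
    if n % 3 == 0:
        return 3 ** (n // 3)
    if n % 3 == 1:
        return 4 * 3 ** ((n - 4) // 3)
    return 2 * 3 ** ((n - 2) // 3)

# [c_max(n) for n in range(3, 11)] -> [3, 4, 6, 9, 12, 18, 27, 36], as in Table 3.1
```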

Shrivastava et al. have given this number in (Shrivastava et al., 1995) as the capacity of their nonpositive Hopfield network. Based on the results found in Section 3.1.3.2, we also introduce this number as the reachable upper bound on the number of vectors that can be stored in a DHN designed by the Boolean Hebb rule. Although Cmax(n) (see Table 3.1) is much greater than the capacities achieved by many available methods, it is actually not an effective capacity of the considered AM, since not every memory set consisting of Cmax(n) binary vectors is compatible. Compatible sets of cardinality Cmax(n) are indeed rare among all binary sets of cardinality Cmax(n), as a consequence of the very strict constraints on the construction of the above-mentioned specific graphs. Thus, we will call Cmax(n) the maximum capacity.

A given compatible set uniquely determines a graph whose MIS set is identical to the

given set. By the one-to-one correspondence between the set of MIS’s in a graph and

a compatible set of binary vectors, we can say that the number of all compatible sets

containing n-dimensional binary vectors is equal to the number of different graphs with

n vertices that can be obtained by the Boolean Hebb rule, which is actually the number of

different adjacency matrices that can be obtained by (3.3).


Any graph with n vertices and without self-loops can be represented by an n × n (symmetric) adjacency matrix with all-zero diagonal entries. The number of all such graphs is given by

N_0(n) = 2^{n(n−1)/2}.    (3.21)

If some nodes of the considered graph have self-loops, as is possible with the Boolean Hebb rule, then the diagonal of the adjacency matrix may contain 1's.

Fact 3 The number of different adjacency matrices that can be obtained by the Boolean Hebb rule is

N_c(n) = 2^{n(n−1)/2} + ∑_{k=1}^{n} \binom{n}{k} · 2^{(n−k)(n−k−1)/2},    (3.22)

where the subindex c, which will also be used below, refers to compatibility.

Proof: The first term in (3.22) is the number of adjacency matrices with all-zero diagonal entries. Let A be an n × n adjacency matrix obtained by (3.3). It can be observed that the i-th diagonal entry a_ii of A can be unity only if the i-th row (and consequently the i-th column) of A has all-unity entries. In other words, the existence of any zero entry in the i-th column of A implies a_ii = 0. The number of adjacency matrices with exactly k unity diagonal entries is

\binom{n}{k} · 2^{(n−k)(n−k−1)/2}.    (3.23)

The term \binom{n}{k} in (3.23) is the number of all possible diagonals possessing exactly k unity entries. For the (n − k) nodes with zero diagonal entries, there are (n − k)(n − k − 1)/2 off-diagonal binary entries which are arbitrary. So, discarding the case k = 0, which implies all-zero diagonal entries, the number of adjacency matrices with at least one unity diagonal entry is given as

N_1(n) = ∑_{k=1}^{n} \binom{n}{k} · 2^{(n−k)(n−k−1)/2}.    (3.24)


Finally, the number of graphs that can be obtained by the Boolean Hebb rule is the sum of the two terms N_0(n) and N_1(n).

Dividing N_c(n) by the number of all n-dimensional binary vector sets gives the probability that an arbitrarily given set of n-dimensional binary vectors is compatible, when all such sets are assumed equiprobable:

p_c(n) = [ 2^{n(n−1)/2} + ∑_{k=1}^{n} \binom{n}{k} · 2^{(n−k)(n−k−1)/2} ] / 2^{2^n}.    (3.25)

Note that p_c(n) decreases sharply as n goes to infinity, which means that the performance of the Boolean Hebb rule deteriorates drastically for increasing n if the number of memory vectors is allowed to be arbitrary, without regard to exceeding the maximum capacity. However, since we know that any set of cardinality greater than Cmax(n) is necessarily incompatible, another quantity can be introduced, by pre-discarding these incompatible sets, as a measure of the effective performance of the Boolean Hebb rule. By restricting the given memory set to contain not more than Cmax(n) elements, we obtain the probability that a given binary set M with |M| ≤ Cmax(n) is compatible:

p_c(n | m ≤ Cmax(n)) = [ 2^{n(n−1)/2} + ∑_{k=1}^{n} \binom{n}{k} · 2^{(n−k)(n−k−1)/2} ] / ∑_{i=1}^{Cmax(n)} \binom{2^n}{i},    (3.26)

where m = |M|. This quantity is calculated for some n values and listed in Table 3.1 for

comparison.

Although the probability p_c(n | m ≤ Cmax(n)) is still very small, the Boolean Hebb rule provides a very good compression ratio if a large memory set is compatible. A memory set M consisting of n-dimensional binary vectors is embedded with perfect recall into an n × n (symmetric and binary) adjacency matrix, which is represented by (n^2 + n)/2 bits, while M itself requires |M| · n bits. Considering the fact that the cardinality |M| of a compatible memory set may reach Cmax, we define the best lossless compression ratio as the proportion of the number of bits used for representing M in the adjacency matrix A to |M| · n, evaluated at |M| = Cmax:

R_b = (n + 1) / (2 · Cmax).    (3.27)

Table 3.1: The maximum capacity Cmax(n), the probability p_c(n | m ≤ Cmax(n)), and the best lossless compression ratio R_b.

 n    Cmax(n)   p_c(n | m ≤ Cmax(n))   R_b · 100
 3        3        2.62 · 10^−1           66
 4        4        6.36 · 10^−2           62
 5        6        2.01 · 10^−3           50
 6        9        2.20 · 10^−6           39
 7       12        1.52 · 10^−10          33
 8       18        2.60 · 10^−19          25
 9       27        2.11 · 10^−34          19
 10      36        1.91 · 10^−53          15

This ratio is also given in Table 3.1 for some n values. It should be pointed out that, as n goes to infinity, the best lossless compression ratio goes to zero, indicating the remarkable lossless compression performance of the Boolean Hebb rule for high-dimensional vectors.

3.1.4.1 Comparison with Outer-Product Method

As explained in Section 3.1.3.2, compatibility of a memory set is a desired property for the

application of the Boolean Hebb Rule (BHR) as it provides the one-to-one correspondence

of the memory vectors to the fixed points of the resulting AM. That is why, in the

quantitative analysis given in the previous subsection, we have assumed that a given set

of binary vectors is compatible. In other words, the preceding subsection presents the

performance analysis of the design method for perfect recall of the embedded vectors, which

is indeed a very strict restriction.


In order to examine the applicability of the BHR, below we first do not insist on avoiding

spurious memories but do insist on storing all elements of a given M completely as imposed

by Condition 2. This, indeed, corresponds to relaxing the compatibility assumption in the

way stated by the following fact.

Fact 4 All elements of a binary vector set M are stored as fixed points of a DHN, i.e. M is completely stored by the BHR, if i) M satisfies COMP1, and ii) there exists a compatible n-dimensional binary vector set M̃ which is a superset of M.

Proof: If M satisfies COMP1, then the independent sets represented by the memory vectors do not cover each other (see Theorem 2). Then, the only case in which a memory vector x ∈ M is not represented as an MIS after the design procedure (3.3) is that M does not satisfy COMP2 (which results in some extraneous MIS's) and, in addition, an extraneous MIS covers the independent set represented by x. If the set M̃, which is an augmented version of M including the characteristic vectors of all extraneous MIS's, is compatible, then each element x ∈ M, which implies x ∈ M̃, is represented as an MIS.

In order to compare the complete storage performances of BHR and the outer-product

method, we have produced 1000 sets containing exactly m different, randomly-chosen

binary vectors of dimension n drawn according to the uniform distribution. Taking each of these sets as the given memory set, we have applied both methods to obtain DHNs. Finally, we have checked whether the considered set was completely stored in each network.

The percentages of the completely stored sets via the outer-product method (POPM%) and

via the BHR (PBHR%) in all 1000 sample sets are given in Table 3.2 for some m, n values.

Observe from Table 3.2 that the percentages obtained for outer-product are higher than

the corresponding ones obtained for BHR, and for the same m/n ratio the complete storage

performances of both methods decrease as n increases. This shows that the outer-product method is superior to the BHR in the sense of complete storage for an arbitrary memory set chosen according to the uniform distribution. However, this is not the case for sparse memory sets, i.e. sets containing a relatively small number of 1-entries. To see this,

we have repeated the previous procedure for 1000 sparse memory sets drawn such that the


Table 3.2: Percentages of complete storage in the DHNs designed by the Outer-Product Method (POPM%) and the Boolean Hebb rule (PBHR%) for uniformly distributed random sets.

 n     m    POPM%   PBHR%
 50    2     100     100
       4      99      89
       6      83       6
       8      36       0
      10       4       0
 100   4     100     100
       8      95       2
      12      34       0
      16       2       0
      20       0       0

probability of choosing 0 as an entry of a memory vector is 66% while the probability of choosing 1 is 33% (0 : 66%, 1 : 33%). The complete storage percentages obtained for such memory sets are listed in Table 3.3. The percentages obtained for the bit probabilities 1 : 66%, 0 : 33%, which reflect the complete storage performances of the two methods for dense2 memory sets, are also given in Table 3.3.

Observe from Table 3.2 and Table 3.3 that the complete storage performance of the outer-

product method reaches its maximum for equiprobable bits and decreases symmetrically as

the bit probabilities deviate from 1 : 50%, 0 : 50%. On the other hand, the performance of

the BHR continuously increases as the chosen memory sets get sparser.

We further relax the complete storage assumption by no longer insisting on storing all of

the given memory vectors in the resulting DHN and observe the proportion of the number of

stored vectors to the cardinality of the original (given) memory set. (This ratio will be called the storage percentage.) For the above-mentioned bit probabilities, the average storage

2 We call a set of binary vectors dense if the number of its 1-entries is greater than that of its 0-entries.


Table 3.3: Complete storage percentages POPM% and PBHR% for different bit probabilities.

 n     m    0 : 33%, 1 : 66%      0 : 66%, 1 : 33%
            POPM%    PBHR%        POPM%    PBHR%
 50    2     100      100          100      100
       4      89        9           88       98
       6      19        0           18       79
       8       1        0            0       38
      10       0        0            0        6
 100   4      99       84           98      100
       8       0        0            0       85
      12       0        0            0        9
      16       0        0            0        1
      20       0        0            0        0

percentages obtained by the outer-product method (AvPOPM%) and BHR (AvPBHR%) in

1000 random memory sets are presented in Table 3.4.

Table 3.3 and Table 3.4 confirm the well-known result (Dembo, 1989) on the storage

capacity of the outer-product method which states that the method stores 0.138n memory

vectors with probability almost 1. As m exceeds 0.138n, the storage percentages start to

decrease. Moreover, this decrement gets sharper as n increases for a fixed bit probability.

We conclude from these results that the outer-product method has a better performance than the BHR in storing randomly chosen memory vectors as fixed points (either attractive or not) in the resulting DHN when the memory sets are dense or when the bit probabilities are equal. However, the BHR is a better alternative to the outer-product method in storing sparse memory sets as fixed points, which is the case in some applications such as character recognition (see Figure 3.3). We have observed that our method starts to become superior to the outer-product method at the bit probabilities 1 : 35%, 0 : 65%. Moreover, as stated in Theorem 6, all of these fixed points are attractive, which cannot be guaranteed by the outer-product method. It should also be noted that the BHR provides a better compression


Table 3.4: Average percentages AvPOPM% and AvPBHR% for different bit probabilities.

                0 : 33%, 1 : 66%        0 : 50%, 1 : 50%        0 : 66%, 1 : 33%
n     m      AvPOPM%  AvPBHR%        AvPOPM%  AvPBHR%        AvPOPM%  AvPBHR%
50    2      100      100            100      100            100      100
      4      95       60             100      95             97       99
      6      61       7              97       57             54       98
      8      30       3              88       23             24       88
      10     10       0              68       5              11       75
100   4      99       84             100      100            99       100
      8      28       1              99       33             29       99
      12     4        0              90       2              3        77
      16     1        0              65       1              1        39
      20     0        0              38       0              0        10

ratio since the weight matrix of a DHN obtained by the outer-product method is in general

(signed) integer valued, while BHR always results in a binary weight matrix.

3.1.5 Simulation Results

3.1.5.1 A Compatible Example

The design procedure explained above is simulated for the following compatible set of

memory vectors.

x^1 = [1 1 0 1 0]^T,   x^2 = [0 0 1 1 0]^T,   x^3 = [1 0 0 0 1]^T.

For each initial state vector x^0 ∈ {0, 1}^n, the steady-state solution of the MIS-N and the true mapping obtained by the binary association function (1.1) are listed in Table 3.5, where the initial state vectors x^0 are represented by their corresponding decimal numbers.


Table 3.5: Simulation results of MIS-N.

x^0   MIS-N   f[x^0]          x^0   MIS-N   f[x^0]
0     x^1     x^2 or x^3      16    x^1     x^3
1     x^3     x^3             17    x^3     x^3
2     x^1     x^1             18    x^1     x^1
3     x^3     x^2 or x^3      19    x^3     x^3
4     x^2     x^2             20    x^2     x^2 or x^3
5     x^3     x^2 or x^3      21    x^3     x^3
6     x^2     x^2             22    x^2     x^2
7     x^3     x^2             23    x^3     x^2 or x^3
8     x^1     x^1             24    x^1     x^1
9     x^3     x^3             25    x^3     x^3
10    x^1     x^1             26    x^1     x^1
11    x^3     x^1             27    x^3     x^1
12    x^2     x^2             28    x^2     x^1
13    x^3     x^2 or x^3      29    x^3     x^3
14    x^2     x^2             30    x^2     x^1
15    x^3     x^2             31    x^3     x^1

As seen in Table 3.5, for most of the initial state vectors, the results obtained by the MIS-

N and the binary association function agree. However, for the initial state vectors 0, 7, 11,

15, 16, 27, 28 and 30, the MIS-N converges to erroneous memory vectors since the attraction regions of the fixed points cannot be trimmed; hence, Condition 5 is not guaranteed by the design method presented here.

3.1.5.2 A Compatibilization Procedure and its Character Recognition Application

In this example, we introduce a procedure for “compatibilizing” a given incompatible set of

binary vectors, via modifying some elements of the given set such that the resulting set is

compatible. By the modification of a binary vector, we mean complementing an entry of that

vector. To reach a compatible memory set still meaningful for applications, the number of


modifications performed in compatibilization should be as small as possible. It is a fact that

if a given set of vectors has cardinality |M | not greater than Cmax(n), then a compatible set

of cardinality |M | exists and can be obtained from the original incompatible one with a finite

number of bit modifications. At this point we introduce the nearest compatible set problem

as the problem of finding a compatible set of the same cardinality which is obtained by

applying the minimum number of modifications on a given incompatible set. For any given

set M of binary vectors, an approximate solution to this problem can be obtained by the

following algorithm. This algorithm basically determines the entries which cause violations

of COMP1 and COMP2 and modifies the associated vectors.

Algorithm 1

Step 1: Determine the sets
C1 = {(x, y) ∈ M × M | (x, y) violates COMP1},
C2 = {(x, y, z) ∈ M × M × M | (x, y, z) violates COMP2}.
If both C1 and C2 are empty, then go to Step 4.

Step 2: For each triple (x, y, z) ∈ C2, determine a triple of indices (i, j, k) such that x_i = y_j = z_k = 0 and x_j = x_k = y_i = y_k = z_i = z_j = 1.³ Set one of the entries x_j, x_k, y_i, y_k, z_i, z_j to zero.

Step 3: For each couple (x, y) ∈ C1, determine the smallest index i such that x_i = y_i = 0. If ‖x‖ > ‖y‖, then set y_i = 1; else, set x_i = 1. If no index i with x_i = y_i = 0 exists, then determine the smallest index j such that x_j = y_j = 1. If ‖x‖ > ‖y‖, then set x_j = 0; else, set y_j = 0. Return to Step 1.

Step 4: Stop.
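For concreteness, a minimal Python sketch of Algorithm 1 is given below. The predicates violates_comp1 and violates_comp2 are placeholders for the COMP1 and COMP2 tests defined earlier in this thesis and must be supplied separately; the array-based representation and function names are illustrative assumptions, not part of the design itself.

import numpy as np
from itertools import combinations, product

def violates_comp1(x, y):
    """Placeholder for the COMP1 test defined earlier in the thesis."""
    raise NotImplementedError

def violates_comp2(x, y, z):
    """Placeholder for the COMP2 test defined earlier in the thesis."""
    raise NotImplementedError

def compatibilize(M):
    """Sketch of Algorithm 1: modify single bits of the vectors in M
    (a list of 0/1 NumPy arrays) until no COMP1/COMP2 violation remains."""
    n = len(M[0])
    while True:
        C1 = [(x, y) for x, y in combinations(M, 2) if violates_comp1(x, y)]
        C2 = [(x, y, z) for x, y, z in combinations(M, 3) if violates_comp2(x, y, z)]
        if not C1 and not C2:
            return M                                   # Step 4: stop
        for x, y, z in C2:                             # Step 2
            for i, j, k in product(range(n), repeat=3):
                if (x[i] == y[j] == z[k] == 0 and
                        x[j] == x[k] == y[i] == y[k] == z[i] == z[j] == 1):
                    x[j] = 0          # clear one of the six offending 1-entries
                    break
        for x, y in C1:                                # Step 3
            both_zero = np.flatnonzero((x == 0) & (y == 0))
            if both_zero.size:
                i = both_zero[0]
                (y if x.sum() > y.sum() else x)[i] = 1
            else:
                j = np.flatnonzero((x == 1) & (y == 1))[0]
                (x if x.sum() > y.sum() else y)[j] = 0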

Ten 10 × 10 black-and-white decimal numerals, given in the first row of Figure 3.3, are to be stored as memory patterns in the recurrent AM. The binary vectors associated to the digits are constructed by applying a lexicographic ordering to the columns of the image intensity
³ Note that such a triple necessarily exists by Theorem 2.


Figure 3.3: (a) The original numerals to be stored as memory vectors. (b) The compatibilized characters. (c) Some distorted numerals. (d) Numerals recalled by MIS-N.

matrix which is a binary matrix indicating the black pixels by 1’s and the white ones by

zeros. Then, we have 10 binary vectors each consisting of 100 entries. It can easily be

verified that this set is not compatible. For example, the vectors associated to the digits 3

and 8 violate COMP1.

Applying the above given compatibilization algorithm, we obtain the modified vectors

which represent the modified characters given by the second row of Figure 3.3. Using this

memory set, we have designed an MIS-N via the Boolean Hebb rule and applied some

distorted versions of the original numerals given by the third row of Figure 3.3 as initial

states. The memory vectors recalled by the network for these distorted vectors are given in the last row of Figure 3.3.

3.2 Recurrent Associative Memory Design via Path Embedding into a Graph

Following another graph theoretical approach, a new recurrent AM design method is

described in this section which ensures the storage of all memory vectors as attractive

equilibria of a convergent DHN.

3.2.1 Proposed Method

The tool we use for representing a given memory set M = {x^1, x^2, . . . , x^m} ⊆ {0, 1}^n here is a directed graph G = 〈V, E〉, which consists of a set V of n + 2 nodes labelled with


Figure 3.4: The graph indicating the binary vectors [0 1 0 1]^T and [1 0 1 1]^T as its paths between the nodes v_d and v_s.

v_s, v_1, v_2, . . . , v_n, v_d, and a set E ⊂ V × V of ordered pairs (v_i, v_j) satisfying j > i, called increasing arcs. (It is assumed that s < 1 and d > n.)

A path P is a subset of E which provides a route from v_s to v_d via its elements, and so produces an increasing sequence S_P of nodes starting at v_s and ending at v_d. Given a path P, there exists a unique binary vector [1 x^T 1]^T, the so-called node-based characteristic vector of P, whose 1-entries are indicated by S_P as v_i ∈ S_P ⇔ x_i = 1, i = 1, 2, . . . , n. Conversely, any (n + 2)-dimensional binary vector having 1's as its first and last entries indicates a unique increasing node sequence in the same way; hence, it is indeed the node-based characteristic vector of a path in the (n + 2)-node graph. As an example, the graph indicating the binary vectors [0 1 0 1]^T and [1 0 1 1]^T is given in Figure 3.4. Based on this representation, we introduce our embedding procedure as follows:

Step 1. For i = 1, 2, . . . , m, augment 1's as the first and the last entries of the memory vector x^i, and then construct the graph G_i = 〈V, E_i〉 containing a single path indicated by the node-based characteristic vector [1 (x^i)^T 1]^T.

Step 2. Combine G_1, G_2, . . . , G_m into a single graph by

G = 〈V, E〉 = 〈 V, ⋃_{i=1}^{m} E_i 〉.    (3.28)


Note that the resulting graph G represents all memory vectors since it contains their

corresponding paths. Besides, this representation might provide binary data compression

when G is expressed in an upper-triangular, zero-diagonal, node-to-node incidence matrix

form T:

t_ij = 1 if (v_i, v_j) ∈ E, and t_ij = 0 otherwise,    (3.29)

which is of dimension (n + 2) × (n + 2), and so is independent of the number m of memory vectors.
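The embedding procedure and the incidence matrix (3.29) can be sketched in a few lines of Python; the function name and the 0-based indexing of the nodes (node 0 playing the role of v_s and node n + 1 that of v_d) are conventions of this sketch only.

import numpy as np

def node_incidence_matrix(memory_vectors):
    """Embed each memory vector as a path (Steps 1-2) and return the
    (n+2) x (n+2) upper-triangular node-to-node incidence matrix T of (3.29)."""
    n = len(memory_vectors[0])
    T = np.zeros((n + 2, n + 2), dtype=int)
    for x in memory_vectors:
        aug = np.concatenate(([1], x, [1]))        # augment 1's as first/last entries
        nodes = np.flatnonzero(aug)                # increasing node sequence S_P
        for a, b in zip(nodes[:-1], nodes[1:]):    # consecutive nodes define an arc
            T[a, b] = 1                            # union of the arc sets E_i
    return T

# usage: the two vectors represented in Figure 3.4
T = node_incidence_matrix([np.array([0, 1, 0, 1]), np.array([1, 0, 1, 1])])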

Given a graph G of n + 2 nodes, the problem of retrieving an embedded memory vector is equivalent to the problem of extracting its indicating path. To achieve this, we first label the increasing arcs of G with e_1, e_2, . . . , e_l, where l = |E|. This labelling enables us to define the arc-based characteristic vector y ∈ {0, 1}^l associated to a path P in G as e_i ∈ P ⇔ y_i = 1, i = 1, 2, . . . , l. Then, the node-to-arc adjacency matrix form A of G:

a_ij = 1 if arc e_j departs from node v_i, a_ij = −1 if arc e_j arrives at node v_i, and a_ij = 0 otherwise,    (3.30)

which is of dimension (n + 2) × l, makes it possible to distinguish the arc-based characteristic vectors of paths from other l-dimensional binary vectors in an algebraic way, as stated by the following fact proven in (Bazaraa & Jarvis, 1977).

Fact 5 An l-dimensional unipolar binary vector y satisfies

Ay = b    (3.31)

if and only if y is the arc-based characteristic vector of a path in G, where b stands for the (n + 2)-dimensional vector defined as b_1 = −1, b_{n+2} = 1 and b_i = 0 for i = 2, 3, . . . , n + 1.



Figure 3.5: Block diagram of the proposed associative memory.

It is easy to see that the unipolar binary solutions of (3.31) correspond to the discrete local minima of the positive semi-definite quadratic

Φ(y) = (1/2) ‖Ay − b‖²₂ = (1/2) y^T A^T A y − b^T A y + (1/2) b^T b.    (3.32)

Then, initiated by an l-dimensional binary vector y^0, the unipolar DHN recursion (3.5) with W = A^T A and t = A^T b converges to an arc-based characteristic vector associated to a path in G. Together with an algebraic layer performing arc-to-node conversion of the state vector y at the output, this network produces [1 (f[x^0])^T 1]^T as its steady-state response, where f[·] is the association function defined by (1.1), provided that y^0 is chosen as the arc-based characteristic vector corresponding to the node-based characteristic vector [1 (x^0)^T 1]^T. This operation is illustrated in Figure 3.5.
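The algebraic objects of (3.30)-(3.32) and the parameter choice quoted above can be sketched as follows. The helper names are illustrative only, and the retrieval recursion (3.5) itself, whose conventions are fixed earlier in this thesis, is not reproduced here.

import numpy as np

def arc_representation(T):
    """Label the increasing arcs of G (given by incidence matrix T) and build
    the node-to-arc adjacency matrix A of (3.30) and the vector b of Fact 5."""
    n2 = T.shape[0]                       # n + 2 nodes
    arcs = [(i, j) for i in range(n2) for j in range(i + 1, n2) if T[i, j]]
    A = np.zeros((n2, len(arcs)), dtype=int)
    for col, (i, j) in enumerate(arcs):
        A[i, col] = 1                     # arc e_col departs from node v_i
        A[j, col] = -1                    # arc e_col arrives at node v_j
    b = np.zeros(n2, dtype=int)
    b[0], b[-1] = -1, 1                   # b_1 = -1, b_{n+2} = 1 as in Fact 5
    return arcs, A, b

# Quadratic (3.32) and the parameters quoted in the text:
# Phi(y) = 0.5 * ||A y - b||^2,  W = A^T A,  t = A^T b.
def dhn_parameters(A, b):
    return A.T @ A, A.T @ b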

3.2.2 Simulation Results

To demonstrate the performance of the proposed method, we have generated 100 random

memory sets containing m binary vectors of dimension n and observed that, for each instant,

the resulting dynamical network had fixed points located exactly at the augmented memory

vectors, and that there occurred some fixed points other than the desired ones. The average

number of these spurious memories over 100 trials are listed on the third column AvS0 of

Table 3.6 for some n and m values. The fourth column AvS2 of Table 3.6 includes the

average number of spurious memories which are located more than 2 Hamming distance

away from the nearest memory vector.


Table 3.6: Average number of spurious memories for some n, m values.

n     m     AvS0     AvS2
5     3     2.1      0
      5     4.9      0
      10    10.7     0
7     4     7.9      0.2
      7     26.7     0.7
      15    57.8     0.4
10    5     45.8     14.7
      10    226.4    68.4
      20    535.7    121

As verified by these results, the proposed method is superior to the conventional methods

in the sense that it can store an arbitrary collection of memory vectors as fixed points in

the resulting network. However, it cannot avoid spurious memories, because the graph G

obtained by the embedding procedure possibly includes some extraneous paths whose node-based characteristic vectors do not belong to M. Obviously, the arc-based characteristic

vectors of such paths also minimize (3.32), hence constitute spurious memories. It has

also been observed that the average number of spurious memories caused by the method

is greater than that of the outer-product rule for large n values. On the other hand, most

of these fixed points are located in a small neighborhood of the desired ones, hence cause

smaller errors in recall.

This method can be improved by assigning weights to the increasing arcs of G and

adjusting these weights such that the desired paths have a specific length, say 1, in order

to distinguish them from the undesired ones.


CHAPTER FOUR

CONSTRUCTION OF ENERGY LANDSCAPE FOR DISCRETE HOPFIELD ASSOCIATIVE MEMORY

An energy function-based auto-associative memory design method to store a given set

of unipolar binary memory vectors as attractive fixed points of an asynchronous discrete

Hopfield network is presented in this chapter.

4.1 Motivation

A comprehensive stability analysis for the DHN presented in (Bruck & Goodman, 1988) has shown that the asynchronous recursion (2.2) necessarily tends to a fixed point if W is symmetric and has nonnegative diagonal entries. The proof is based on the analysis of a discrete quadratic energy function

E(x) = x^T Q x + c^T x,    (4.1)

defined on the state-space of (2.2) with Q ∈ ℝ^{n×n} and c ∈ ℝ^n, which is non-increasing along the asynchronous recursion.

It can be further shown that (2.2) indeed tends to a (discrete) local minimum of (4.1) if

the diagonal entries of the weight matrix are all zero, i.e. a network consisting of a single

layer of neurons without self-feedback. On the other hand, as proven in the next section,


one can always find a symmetric DHN without self-feedback which has fixed points located

exactly at the discrete local minima of a given quadratic form defined on the binary space.

As the network model to be designed in order to perform (1.1), we consider here again an asynchronous DHN operating on {0, 1}^n according to the recursion

x_i[k + 1] = φ( Σ_{j=1}^{n} w_ij x_j[k] + t_i ),   i ∈ {1, . . . , n},    (4.2)

where W = [w_ij] ∈ ℝ^{n×n} is the weight matrix and t = [t_i] ∈ ℝ^n is the threshold vector.

Assuming that the given memory vectors are at least 2-Hamming distance away from

each other, the design of such a finite-state recurrent network, while ensuring perfect

storage, is in fact equivalent to the design of its energy function, i.e. determining coefficients

(Q, c) in (4.1), under the following condition.

Condition 6 x ∈ M implies that x is a strict local minimum of (4.1), i.e. E(x) < E(y) for all y ∈ {0, 1}^n such that d(x, y) = 1.

Instead of dealing directly with the considered DHN recursion, we follow this indirect

energy function-based approach in this chapter.

4.2 Discrete Quadratic Design

To simplify the notation, we first note that the n × n matrix Q in (4.1) can be considered without loss of generality as symmetric, since for an arbitrary matrix P ∈ ℝ^{n×n} there exists a symmetric counterpart R = (P + P^T)/2 such that x^T R x = x^T P x for all x ∈ ℝ^n.

Due to the unipolarity of the discrete variable x ∈ {0, 1}^n, the linear term c^T x can be further expressed as a quadratic term x^T diag(c_1, c_2, . . . , c_n) x, which allows E(·) to be reformulated as a single quadratic term

E(x) = x^T Q̄ x    (4.3)


on {0, 1}^n, where Q̄ is a symmetric real matrix equal to Q + diag(c_1, c_2, . . . , c_n). Expanding (4.3) as the sum Σ_{i=1}^{n} Σ_{j=1}^{n} q̄_ij x_i x_j and then using the symmetry of Q̄ provides an alternative notation

E(x) = a(x)^T w,    (4.4)

which is linear in the coefficient vector w = [q̄_11 · · · q̄_1n | q̄_22 · · · q̄_2n | · · · | q̄_nn]^T ∈ ℝ^{(n²+n)/2}, obtained by a lexicographic ordering of the coefficients q̄_ij. The column vector a(x) represents the multiplicative nonlinearity of (4.3) in x:

a(x) := 2 [ x_1²/2   x_1 x_2   · · ·   x_1 x_n | x_2²/2   x_2 x_3   · · ·   x_2 x_n | · · · | x_n²/2 ]^T.    (4.5)

Expressing (4.3) as a weighted sum of the parameters in this way enables the computation of the coefficient vector w* in ℝ^{n(n+1)/2} under linear inequality constraints, so as to construct a Q̄ such that (4.3), and consequently (4.1), satisfies Condition 6 for a given set of memory vectors M ⊆ {0, 1}^n.

We assume throughout the design that the condition

d(u, v) > 1   ∀(u, v) ∈ M × M, u ≠ v    (4.6)

holds for a given memory set M. Then, in order to embed each memory vector as a strict local minimum of the desired quadratic (4.1), as suggested by Condition 6, we obtain for each memory vector p ∈ M the set of strict linear inequalities

a(p)^T w < a(y)^T w,   y ∈ B_1(p) − {p},    (4.7)

to be solved for the parameter vector w. Here, "−" stands for the set difference and B_1(u) is defined as the 1-Hamming neighborhood of u, i.e. {x ∈ {0, 1}^n : d(u, x) ≤ 1}. We


denote the polyhedral cone induced by these n linear inequalities by S_p. Since the desired coefficient vector w* lies within the intersection

S = ⋂_{p ∈ M} S_p,    (4.8)

its search is indeed the feasibility problem of the homogeneous linear inequalities which induce the polyhedral cone S. By rearranging (4.7) and incorporating the inequalities associated to all memory vectors p^1, . . . , p^{|M|}, we obtain the homogeneous inequality system

A(M) w < 0,    (4.9)

where

A(M) := [ N(p^1) ; N(p^2) ; . . . ; N(p^{|M|}) ],    (4.10)

i.e. the matrices N(p^i) are stacked row-wise, and N(p) is the n × (n² + n)/2 matrix whose j-th row is determined as a(p)^T − a(y^j)^T, with y^j_k = 1 − p_k for k = j and y^j_k = p_k otherwise.
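A small Python sketch of the construction of a(x), N(p) and A(M) is given below; the helper names are illustrative only, and any linear programming routine may then be used to test A(M)w < 0 for feasibility.

import numpy as np

def a_vec(x):
    """The nonlinearity a(x) of (4.5): 2*[x1^2/2, x1x2, ..., x1xn, x2^2/2, ..., xn^2/2]."""
    n = len(x)
    entries = []
    for i in range(n):
        for j in range(i, n):
            entries.append(x[i] * x[i] / 2.0 if j == i else x[i] * x[j])
    return 2.0 * np.array(entries)          # dimension (n^2 + n)/2

def N_mat(p):
    """N(p): the j-th row is a(p)^T - a(y^j)^T, where y^j flips only the j-th bit of p."""
    rows = []
    ap = a_vec(p)
    for j in range(len(p)):
        y = p.copy()
        y[j] = 1 - y[j]
        rows.append(ap - a_vec(y))
    return np.vstack(rows)

def A_of_M(M):
    """Stack N(p) over all memory vectors p to obtain A(M) of (4.10)."""
    return np.vstack([N_mat(p) for p in M])

# A coefficient vector w with A(M) w < 0 (if one exists) defines the energy (4.4).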

4.2.1 Original Design Method

If the given memory set yields a feasible inequality system, then the coefficient matrix Q̄ of the desired quadratic (4.3), and hence the coefficients (Q, c) of (4.1) which satisfy A1, can be

easily determined from a solution of (4.9). Hence, in our original design method, we look

for a solution of (4.9) by directly applying an available method, such as linear programming

(Luenberger, 1973) or Newton’s method (Bertsekas, 1995). Having determined the discrete

quadratic energy function (4.1) which indicates the memory set as its local minima set,

construction of a dynamical system, whose limit points correspond to these local minima,

completes the recurrent AM design. We describe this procedure in the following corollary.


Corollary 1 The fixed points of asynchronous recursion (4.2) correspond to the local

minima of (4.1) for the weight matrix W = −2Q and the threshold vector t = −c.

Moreover, the recursion designed in this way is convergent; namely, for any initial state it converges to one of its fixed points.

Proof: The state vector x of a finite-state dynamical system converges to a local

minimum of a discrete function E(x) if every state transition provides a decrement in E(x).

The proof is based on imposing this condition on the DHN dynamics whose state-space is the unipolar binary space {0, 1}^n. Taking into account that only one entry of the state vector

is allowed to change at a single time step, we analyze the desired behavior of the network

in two separate cases:

1. Suppose x_i = 0 and the i-th entry is updated at time instant k; then the value of this entry in the next step should be

x_i[k + 1] = 1 if (x[k] + e_i)^T Q (x[k] + e_i) + c^T (x[k] + e_i) < (x[k])^T Q x[k] + c^T x[k], and x_i[k + 1] = 0 otherwise,    (4.11)

where e_i stands for the i-th unit vector. Since the diagonal entries of Q are all zero, we rearrange (4.11) and formulate it as

x_i[k + 1] = φ( −2 Σ_{j=1}^{n} q_ij x_j[k] − c_i ),    (4.12)

where φ(·) is the unit-step nonlinearity.

2. Suppose now that x_i = 1 and the i-th entry is updated at time instant k. Then we write

x_i[k + 1] = 0 if (x[k] − e_i)^T Q (x[k] − e_i) + c^T (x[k] − e_i) < (x[k])^T Q x[k] + c^T x[k], and x_i[k + 1] = 1 otherwise,    (4.13)

which can be expressed exactly as (4.12).


Comparing (4.12) with (4.2) we conclude that the desired network can be obtained by

choosing W = −2Q and t = −c. The convergence follows from the well-known result of Bruck and Goodman (Bruck & Goodman, 1988), as Q here is zero-diagonal.

Observe from the proof of Corollary 1 that the resulting network is an energy-minimizing

network, in the sense that a state transition is accepted if and only if it causes a decrement

in (4.1). Since a point in the 1-Hamming neighborhood of a local minimum x∗ has strictly

greater energy by the construction of (4.1), then we conclude that each fixed point of the

network, which corresponds to a memory vector, is attractive. This implies that for each y

in the 1-Hamming neighborhood of a fixed point x*, there exists an index i ∈ {1, 2, . . . , n} such that the state vector necessarily converges to x* in a single step when the network is initiated by y and the i-th neuron is updated first. By deciding on a random update order, we obviously cannot ensure this convergence, because there is no guarantee that the i-th neuron

will be updated first in the case mentioned above. For this purpose, as we did in deriving

the Boolean Hebb rule in the previous chapter, we propose the following update order to be

followed by the resulting network, which ensures the correction of each 1-bit distortion of

the memory vectors.

Attempt to update the j-th neuron for the current state vector, for j ∈ {1, . . . , n}. If any of these state transitions leads to a fixed point¹, then accept that transition. Otherwise, choose an arbitrary transition.
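As an illustration, a minimal Python sketch of the recursion (4.2) with the parameters of Corollary 1 is given below. It uses a random asynchronous update order, so the guaranteed one-step correction of 1-bit distortions additionally requires the update order proposed above; the tie-breaking at zero net input is an assumption of the sketch rather than part of the design.

import numpy as np

def run_dhn(Q, c, x0, max_sweeps=100, rng=None):
    """Asynchronous DHN of Corollary 1: W = -2*Q, t = -c, unit-step activation.
    Q is symmetric with zero diagonal; x0 is a 0/1 vector."""
    rng = rng or np.random.default_rng()
    W, t = -2.0 * Q, -c
    x = x0.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(x)):              # random asynchronous order
            xi_new = 1 if W[i] @ x + t[i] > 0 else 0    # recursion (4.2)/(4.12)
            if xi_new != x[i]:
                x[i] = xi_new
                changed = True
        if not changed:                                 # fixed point reached
            return x
    return x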

4.2.2 Applicability of the Original Method

It is evident by the above derivation that the feasibility of (4.9), i.e. non-emptiness of the

polyhedral cone S, is necessary and sufficient to embed all memory vectors as attractive

fixed points of DHN, hence the success of the method is totally dependent on the given

memory set M , which is the only information used in the construction of the inequality

system (4.9). Although we know that S might be empty for some M, we can only be sure of this by constructing (4.9) and then checking its feasibility by attempting to
¹ A point x ∈ {0, 1}^n is a fixed point of the DHN if and only if Φ(Wx + t) = x, where Φ(·) is the diagonal transformation from ℝ^n to {0, 1}^n defined by Φ(u) = [φ(u_1) · · · φ(u_n)]^T.


solve it. A simple result on the feasibility of the inequality system (4.9) is provided by the

following fact.

Fact 6 The inequality system (4.9) is feasible for any memory set containing a single, yet arbitrary, binary vector p ∈ {0, 1}^n.

Proof: Let us define u = x − p and observe that the positive definite quadratic Q(u) = ‖u‖²₂ on {0, 1}^n possesses a unique strict local minimum at u = 0. Then, p is the (unique) strict local minimum of the quadratic P(x) = Q(x − p).

Although the original method described in Section 4.2.1 is not applicable to a memory set which yields an infeasible inequality system, we extend this method in the following subsection to carry on the design even if (4.9) is infeasible.

4.2.3 An Extension of the Method

In this subsection, we assume that the system (4.9) of n · |M | homogeneous inequalities has

no solution for a given M which satisfies (4.6). As no discrete quadratic energy function

possessing such a memory set as strict local minima exists in this case, the best one can do is

to construct a discrete piecewise quadratic function instead, to be minimized by a modified

version of DHN. For this purpose we need to partition the inequality system (4.9) into two

feasible systems as A1w < 0 and A2w > 0. Such a partitioning is always possible by the

following constructive proposition, since (4.9) contains no zero row.

Proposition 1 If A ∈ ℝ^{l×k} has no zero row, then the following algorithm provides a w* ∈ ℝ^k such that

Σ_{i=1}^{k} a_ui w*_i < 0   ∀u ∈ I_1,
Σ_{i=1}^{k} a_vi w*_i > 0   ∀v ∈ I_2,    (4.14)

where I_1 and I_2 are two disjoint integer sets with I_1 ∪ I_2 = {1, . . . , l}.


Step 0: Choose an arbitrary w^0 ∈ ℝ^k. Construct three matrices A_1, A_2 and A_3, each of which consists of the rows of A whose inner products with w^0 are negative, positive and zero, respectively. Set k = 0.

Step 1: Let u be the number of rows of A_3. If u = 0, then set w* = w^k and stop. Otherwise, choose an arbitrary i ∈ {1, . . . , u} and a sufficiently small positive ε such that A_1(w^k + ε a_i) < 0 and A_2(w^k + ε a_i) > 0, where a_i is the transpose of the i-th row of A_3. Set w^{k+1} = w^k + ε a_i.

Step 2: Augment the rows of A_3 whose inner products with w^{k+1} are negative (positive) to A_1 (to A_2). Delete these rows from A_3, increment k by 1, and return to Step 1.

Proof: The algorithm finds a point which does not belong to any of the hyperplanes defined by the rows of A, and is indeed an escape procedure from the null-space of the rows that are orthogonal to the initial choice w^0. If at any step k of the algorithm the vector w^k is orthogonal to a row of A, say a_l^T (which is indeed a row of A_3 at step k), then the vector w^{k+1} = w^k + ε a_l calculated at Step 1 obviously does not belong to the hyperplane represented by a_l^T w = 0 for any ε ≠ 0, since a_l ≠ 0 by assumption. On the other hand, choosing ε ≠ 0 sufficiently small in magnitude guarantees that the w^{k+1} obtained in this way does not belong to the hyperplanes defined by the rows of A_1 or A_2, as w^k was indeed in an open half-space bounded by each of these hyperplanes. So the algorithm eventually produces a vector w* which is not orthogonal to any row of A. Note that this result also establishes that A can be partitioned as suggested in (4.14).
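A compact Python sketch of the partitioning algorithm of Proposition 1 is given below; the tolerance, the random initial w^0 and the particular choice of ε are implementation details of the sketch rather than part of the proposition.

import numpy as np

def partition_inequalities(A, w0=None, rng=None, tol=1e-9):
    """Sketch of the algorithm of Proposition 1: perturb w until it is not
    orthogonal to any row of A; the rows then split into A1 (negative inner
    product) and A2 (positive inner product)."""
    rng = rng or np.random.default_rng()
    A = np.asarray(A, dtype=float)
    w = rng.standard_normal(A.shape[1]) if w0 is None else np.array(w0, dtype=float)
    while True:
        p = A @ w
        zero_rows = np.flatnonzero(np.abs(p) <= tol)      # rows of A3
        if zero_rows.size == 0:
            break
        a_i = A[zero_rows[0]]                             # Step 1: pick a row of A3
        keep = np.abs(p) > tol
        shift = A[keep] @ a_i
        # eps small enough to keep the sign of every already-nonzero inner product
        bounds = np.abs(p[keep]) / np.maximum(np.abs(shift), tol)
        eps = 0.5 * bounds.min() if bounds.size else 1.0
        w = w + eps * a_i                                 # Step 1: update w
    I1 = np.flatnonzero(A @ w < 0)                        # indices with a_u^T w* < 0
    I2 = np.flatnonzero(A @ w > 0)                        # indices with a_v^T w* > 0
    return w, I1, I2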

When the algorithm is applied to A(M) in (4.9), which is now considered as infeasible,

the matrices A1 and A2 produced by the algorithm induce two disjoint feasible subsystems

of the inequality system (4.9). Although the coefficient vector w∗ produced by the algorithm

satisfies all inequalities induced by A1, as A1w∗ < 0, the inequalities induced by A2 are all

violated: A_2 w* > 0. Consequently, the former inequality system gives rise to quadratic energy landscape coefficients (Q, c) constructed from w*, while the latter yields (−Q, −c), constructed from −w*.


Recall from the construction of A(M) described just below (4.9) that each inequality in (4.9), thus each row of A_1 and A_2, indeed imposes a restriction on the energy of a specific vector y ∈ B_1(x) to satisfy E(x) < E(y), where x is a memory vector. Let D̄ denote the set of binary vectors restricted in this way by the inequality subsystem A_2 w < 0, and let D := {0, 1}^n − D̄. Then, we conclude by the above discussion that each memory vector is a strict local minimum of the piecewise quadratic function

E_PQ(x) = x^T Q x + c^T x if x ∈ D, and E_PQ(x) = −x^T Q x − c^T x if x ∈ D̄,    (4.15)

where the coefficients (Q, c) are calculated by using w* as described in the beginning of this section.

The best performance a conventional asynchronous symmetric DHN can achieve as a

binary associative memory is indeed provided by the original design method described in Section 4.2.1. However, in the present case, no regular quadratic form (4.1) satisfying A1 exists; thus there exists no asynchronous symmetric DHN having attractive fixed points located at the memory vectors. A modification of (4.2) thus becomes necessary to minimize the discrete piecewise quadratic energy function (4.15), which we have constructed instead

of (4.1) in this case.

To minimize a given continuous piecewise quadratic, the idea of choosing the weights

and thresholds of a continuous recurrent network dependent on the state vector is not new

(Park et al., 1993). But, to our knowledge, no asynchronous recursion has been proposed

for minimizing a discrete piecewise quadratic form yet. To minimize a discrete piecewise

quadratic function of the form (4.15), we propose a generalized version of (4.2) with state-

dependent weights and thresholds, and investigate its qualitative performance below.

Definition 4 The generalized version of the recursion (4.2) given by

x̃_i[k] = φ( h[x[k]] ( Σ_{j=1}^{n} w_ij x_j[k] + t_i ) ),    (4.16)

x_i[k + 1] = ((1 − h[x̃[k]]) / 2) x_i[k] + ((1 + h[x̃[k]]) / 2) x̃_i[k],    (4.17)

where x̃[k] denotes the candidate state vector obtained from x[k] by replacing its i-th entry with x̃_i[k],


and h[·] : {0, 1}^n → {−1, 1} is a discrete function that separates a subset D ⊆ {0, 1}^n from its complement D̄ as

h[u] = 1 if u ∈ D, and h[u] = −1 if u ∈ D̄,    (4.18)

is called the Constrained One-Nested Discrete Hopfield Network (CON-DHN).

We use the terms constrained and one-nested here to point out, respectively, the additional constraint imposed by (4.17) on the original recursion (4.2), and the parameter control mechanism h[·], which can be realized as a discrete multi-layer perceptron (Rosenblatt, 1962) and brings an additional nonlinearity nested in the activation function φ(·).

Corollary 2 The asynchronous recursion (4.16-4.17) has fixed points located at the local

minima of (4.15) for the weight matrix W = −2 · Q and the threshold vector t = −c.

Moreover, these fixed points are all attractive.

Proof: Suppose first that the state vector x[k] of the network, which is designed in this way, is in D at some time instant k. Then h[x[k]] is equal to 1 and the right-hand side of (4.16) is the same as that of (4.2), which implies that any state transition which causes a decrement in the quadratic form (4.1) is accepted and the outcome is assigned to x̃[k] as the new state candidate. If this new vector is also in D, then its value is assigned as the new state vector x[k + 1], so the network operates as in the unconstrained case (4.2), which was proven to be an energy minimizer in Corollary 1. However, if such a transition leads to a point in D̄, i.e. if h[x̃[k]] = −1, then the second equation (4.17) imposes x[k + 1] = x[k]. In this way the CON-DHN restricts the state vector to stay in D. Then, for any initial state vector in D, the state vector of the network evolves in D as guaranteed by (4.16)-(4.17).

On the other hand, if an initial state vector is in D̄, i.e. h[x[0]] = −1, then the right-hand side of (4.16) yields a state transition which causes a decrement in the quadratic form with coefficients (−Q, −c), i.e. an increment in (4.1). If the new state candidate is in D, then x[1] assigned by (4.17) is in D and the network operates as in the previous case in further time steps.


Otherwise, the candidate is not accepted as the new state, and in the next time step (4.16) produces another candidate, until a point in D is obtained. Such a point is necessarily produced by (4.16) at some time step because, by construction of D̄, any point z in D̄ has a 1-Hamming distance neighbor in D which has lower energy than z. This also establishes the attractiveness of the fixed points, as one of these neighbors is indeed a local minimum of (4.15).
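The CON-DHN update (4.16)-(4.17) can be sketched as follows; the predicate in_D stands for the separating function h[·] (realized by an MLP in this chapter) and is left as a placeholder here, and the helper names are illustrative only.

import numpy as np

def run_con_dhn(Q, c, in_D, x0, max_sweeps=100, rng=None):
    """Sketch of the CON-DHN recursion (4.16)-(4.17) with W = -2*Q, t = -c.
    `in_D` returns True for vectors in D (h = +1) and False otherwise (h = -1)."""
    rng = rng or np.random.default_rng()
    W, t = -2.0 * Q, -c
    x = x0.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(x)):
            h = 1 if in_D(x) else -1
            cand = x.copy()
            cand[i] = 1 if h * (W[i] @ x + t[i]) > 0 else 0    # candidate from (4.16)
            if in_D(cand) and cand[i] != x[i]:                 # (4.17): accept only if
                x = cand                                       # the candidate lies in D
                changed = True
        if not changed:
            return x
    return x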

Since the recursion (4.16-4.17) designed in this way is also an energy minimizing

network, which converges to a local minimum of (4.15), we can conclude that the resulting

CON-DHN corrects all possible 1-bit distortions of the original memory vectors, with the

update order of the neurons chosen as proposed at the end of the previous section. The

summary of the overall design method proposed in this section is illustrated as a flowchart

in Figure 4.1.

The necessity of the algebraic computations h[x[k]] and h[x̃[k]], which are used in the update of the state vector by (4.16)-(4.17), can be justified as follows: the CON-DHN performs a more

complicated task than that of the conventional DHN, as it can be designed to recall each

element of an arbitrary M from its distorted versions, even when M does not comply with

the restriction (4.9). This relaxation costs an additional Multi-Layer Perceptron (MLP) to

perform h[·] in addition to the computations required by conventional DHN recursion. The

proposed network is illustrated in Figure 4.2.

From the information storage point of view, the weights and thresholds of this MLP, in addition to the parameters W and t of the conventional recurrence (which have been determined by Corollary 2), need to be known. In order to minimize the amount of this additional data that characterizes the MLP, one should adjust the coefficient vector w in the energy landscape design such that the number of hyperplanes which separate D from D̄ is minimum. We note that an approximate solution to this problem is to minimize the cardinality of D̄ in the design, which can be approximated by finding a least squares

solution to the infeasible inequality system (4.9) by Han’s method (Han, 1980), or by its

variations, e.g. (Pınar & Chen, 1999) or (Bramley & Winnicka, 1996). An exact solution to


Figure 4.1: An algorithmic summary of the overall design method: given M ⊆ {0, 1}^n satisfying (4.6), construct A(M)w < 0; if it is feasible, solve it to obtain E(x) = x^T Q x + c^T x and use Corollary 1 to find a DHN; otherwise, apply Proposition 1 to obtain E_PQ(x), design an MLP to implement h[·], and use Corollary 2 to find a CON-DHN.



Figure 4.2: Block diagram of the extended network.

the following problem would obviously minimize the number of parameters of the MLP, so

would make our design more efficient.

Problem 1 Given D ⊆ {0, 1}^n and D̄ = {0, 1}^n − D, find a set of hyperplanes with minimum cardinality which separates D from D̄.

We leave this problem open and proceed with the simulation results.

4.3 Computer Experiments

4.3.1 Applicability and Capacity of the Original Design Method

The original design method proposed in Section 4.2.1 is applicable only for a memory set

which satisfies (4.6) and only when this set yields a linear homogeneous strict inequality

system (4.9) derived from the local minimality conditions. Hence, the probability that a

given memory set M yields a feasible inequality system is a measure of the performance

of the design method if M satisfies (4.6), which is the case in many applications. To

quantify the applicability of the original method, we have investigated the mildness of these

restrictions.

We have generated 100 binary memory sets containing |M | unipolar binary vectors

of dimension n randomly, all satisfying (4.6), for some |M |, n values and constructed

the homogeneous inequality system associated to each set as described by (4.9). The


Table 4.1: Percentages of memory sets that yielded feasible inequality systems.

n     |M|    P%         n      |M|    P%
10    5      100        50     25     100
      10     90                50     100
      15     9                 75     78
      20     0                 100    5
20    10     100        100    50     100
      20     100               100    100
      30     62                150    86
      40     0                 200    8

percentages (P%) of memory sets which resulted in a feasible inequality system are given

in Table 4.1.

It can be observed from Table 4.1 that almost all memory sets with ratio |M|/n less than 1 give a feasible inequality system, so our original method is applicable to such sets. Moreover, as n increases, this critical ratio also increases. This means that our method has better performance for large n values. This bound on the ratio (which is 1 in the worst case), under which our method almost always ensures the desired recurrent AM, is much greater than that of the conventional outer product rule, which ensures the storage of only 0.14n arbitrarily

chosen memory vectors as fixed points (without ensuring their attractiveness). Assuming

that the memory vectors are mutually orthogonal, the projection learning rule is capable of embedding up to n binary vectors as attractive fixed points of a DHN. However, this bound

is not comparable to ours since orthogonality is a rather strict restriction on the memory

vectors. In other words, the projection learning rule has an acceptable performance when

applied to some specific memory sets among all memory sets with |M |/n < 1. The eigen-

structure method, which is probably the most effective design method yet, can store an

arbitrary binary memory set as attractive fixed points. As introduced at the end of Chapter 2,

this method has been proposed for the design of a continuous Hopfield network whose state space is the n-dimensional hypercube [0, 1]^n, including its interior region. The attractiveness


of a fixed point is defined on this space but not on {0, 1}^n. The method does not guarantee the correction of errors caused by a bit reversal of the memory vectors, despite providing attractiveness in the continuous sense. The following design example demonstrates the

superiority of the proposed procedure to the former methods.

4.3.2 A Design Example

Consider that a memory set consisting of the following four vectors is to be stored as

attractive fixed points of a recurrent neural network.

x^1 = [0 1 0 0 1]^T,   x^2 = [0 1 1 1 1]^T,   x^3 = [1 0 1 0 1]^T,   x^4 = [1 1 0 1 1]^T.

Note that these memory vectors satisfy (4.6). By applying the proposed method and making

use of linear programming for the solution of the homogeneous linear inequalities, we have

obtained the weight matrix and the threshold vector of the asynchronous DHN recursion

(4.2) as

W = [   0     −15.2   −7.5    7.6    5
      −15.2     0    −15.2   15.1    9.2
       −7.5   −15.2     0     7.6    5
        7.6    15.1    7.6     0    −6.1
        5       9.2    5     −6.1    0   ],      t = [ −6.3   −3.3   −6.3   12.8   −1.3 ]^T.

It can be verified that each memory vector is an attractive fixed point of this AM. The

projection learning rule and the eigen-structure method (for design parameter τ = 0.5)

could also store each vector as a fixed point of the recurrent networks of their concern, while

the outer product rule could not store any of these vectors at all. By injecting each of the

32 binary vectors of dimension 5 to the networks obtained by these methods, we have also

checked their performance in terms of creating spurious states. This simulation has shown

that our method caused no spurious memory while the three extraneous binary vectors

[0 0 1 1 0]T , [1 0 0 1 0]T , [1 1 1 1 1]T were stored as fixed points in the network obtained

by the projection learning rule. The outer product rule also stored two spurious memories


Figure 4.3: Set of characters which are embedded by the original design method as memory

vectors to DHN.

[0 1 0 1 1]T and [1 0 1 0 0]T . For the design parameter τ = 0.5, the eigen-structure method

created four spurious memories, namely [0 0 1 0 1]T , [0 1 0 1 1]T , [1 0 0 0 1]T , [1 0 1 1 1]T

but they could be avoided by increasing τ . However, this effect also prevented the desired

memory vectors from being stored. As an example, for τ = 1, no binary vector could be stored as a fixed point of the network by this method.

4.3.3 Character Recognition and Reconstruction

We applied the design procedure to the set of characters given in Figure 4.3. The lexicographic orderings of these 13 × 10 black-and-white characters, where 1 and 0 denote a

black and a white pixel, respectively, have been considered as the given memory vectors. It

has been observed that this memory set satisfies (4.6). These 130-dimensional vectors have

resulted in a consistent inequality system, so we have generated a regular quadratic energy

function by solving it. The fixed points of the DHN obtained as stated by Corollary 1 were identical to the original memory vectors. It can be verified that DHNs designed by the outer product method and the projection learning rule cannot store this information without

any modification on the original characters. The network designed by the eigen-structure

method stores all memory vectors but it is incapable of correcting most of the errors caused

by 1-bit reversals on the original characters. Moreover, the convergence of the state vector of this network to some non-binary fixed points in [0, 1]^n was observed for some initial

conditions. This, of course, cannot be considered as a correct behavior.

Although the original method ensures only the correction of 1-bit errors, many 10-bit

distortions, even some 20-bit distortions, of the memory vectors can be corrected by the

resulting DHN, i.e. the basin of attraction of some memory vectors includes even some


Figure 4.4: Reconstructions obtained by the resulting DHN.

Figure 4.5: Three memory patterns used in the classification application.

of its 20-Hamming neighborhood. Some of these corrections are illustrated in Figure 4.4.

Interestingly, no spurious memory was detected during the simulations of the DHN designed

for this memory set, despite the fact that the method does not devise any procedure to avoid

spurious memories.

4.3.4 A Classification Application

We have also tested the performance of the recurrent AM as a classifier. The classification

network in this experiment consists of a pre-processing network cascaded with a recurrent AM designed by our method for the lexicographic orderings of the three 7 × 7 memory patterns

in Figure 4.5. The pre-processing network is used here to scan the input image with a 7 × 7 window and then to obtain the lexicographic ordering of each window. A distorted version

of a map, shown in Figure 4.6a, which contains three types of patterns, is presented to the

classification network. The network was able to classify the two recognized patterns in the

map (see Figure 4.6b) and the other patterns were associated to the blank, which had also

been introduced to the network as a memory pattern in the design phase. This example has

shown that the classification task can be performed by the proposed recurrent AM, even in a noisy environment, besides its general usage in pattern recognition applications.


Figure 4.6: The input map (a) and the classification result (b).

4.3.5 An Application of the Extended Method

Finally, we present the results of another simple example to demonstrate the extension of the method described in Section 4.2.3.

Consider a memory set consisting of the following vectors:

x^1 = [0 0 0 0 0]^T,   x^2 = [0 0 1 1 1]^T,   x^3 = [0 1 0 1 1]^T,   x^4 = [0 1 1 0 0]^T,
x^5 = [1 0 0 1 1]^T,   x^6 = [1 1 0 0 0]^T,   x^7 = [1 1 1 0 1]^T,   x^8 = [1 1 1 1 0]^T.

The original method cannot be applied for this memory set, because the inequality system

(4.9) is infeasible and, thus, there exists no quadratic form (4.1) that has strict local minima

located at these vectors. By applying Proposition 1, we obtain a coefficient vector w∗ which

partitions the design inequalities as in (4.14). From this coefficient vector, we then construct

the piecewise quadratic form (4.15) with

Q = [   0     −4.7    0.9   −0.9   −0.9
      −4.7     0     −4.7    2.9    2.9
       0.9   −4.7     0     −0.8   −0.8
      −0.9    2.9   −0.8     0     −7.5
      −0.9    2.9   −0.8   −7.5     0   ],      c = [ 7.6   −6.2   7.7   4.6   4.6 ]^T,


and

D̄ = { [0 0 0 1 1]^T, [0 1 0 0 0]^T, [0 1 1 1 1]^T, [1 1 0 1 1]^T, [1 1 1 0 0]^T, [1 1 1 1 1]^T }.

It can be easily verified that each memory vector is a strict local minimum of this discrete

function. The weight matrix and the threshold vector of CON-DHN are determined as

W = −2Q and t = −c, respectively, according to Corollary 2. The separating function h[·] is finally realized by a discrete multi-layer perceptron, which responds with 1 to a vector in D and with −1 otherwise.

It has been observed that each memory vector has been stored as an attractive fixed point

of the resulting CON-DHN by the extended method, providing perfect storage. By initiating the network with each of the 32 possible vectors, we have observed that no spurious memory

occurred in the resulting network.


CHAPTER FIVE

MULTI-STATE RECURRENT ASSOCIATIVE MEMORY DESIGN

In this chapter, a design procedure for an AM operating on a multi-valued pattern space is presented. A generalized DHN, namely the complex-valued multi-state Hopfield network, is introduced, and its design is carried out in a way similar to the one followed in the previous chapter.

5.1 Motivation

Though many methods have been proposed aiming to obtain a DHN as a binary associative

memory, only a few papers have appeared in the literature that generalize the design to

the non-binary case, i.e., for cases where the memory vectors are allowed to take integral

values other than −1 and 1.

To be able to recall n-dimensional integral memory vectors with entries in {1, 2, . . . , K}, the conventional Hopfield model obviously needs to be generalized such that the state space of the network contains I := {1, 2, . . . , K}^n. A straightforward way to achieve this is through generalizing the conventional bi-state activation function to a K-stage quantizer, as proposed and analyzed in (Zurada et al., 1996). By replacing the activation functions of the neurons in the conventional Hopfield network with this nonlinearity, remarkable steps have been made

towards the design of multi-state associative memories (Shankmukh & Venkatesh, 1995),

(Elizade & Gomez, 1992), (Mertens et al., 1991). It has also been shown in (Nadal & Rau,


1991) that the maximum number of integral patterns that can be stored in such a network by

any design procedure is proportional to n · (K − 1) · f(K), where f(K) is of order 1.

An alternative dynamical finite-state system operating on I has been introduced in

(Jankowski et al., 1996) as the complex-valued multi-state Hopfield network. This

model employs the complex neuron model of (Aizenberg & Aizenberg, 1992), which uses the complex-signum nonlinearity. Each neuron in this autonomous, single-layer, connectionist

network simply takes a complex weighted sum of previous state values and passes it through

the complex-signum activation function. This produces its next state, where the complex-

signum is a K-stage phase quantizer for complex numbers and is defined as:

csign_K(u) := e^{i·0}              if 0 ≤ arg(u) < 2π/K,
              e^{i 2π/K}           if 2π/K ≤ arg(u) < 4π/K,
              ...
              e^{i 2π(K−1)/K}      if (K − 1) 2π/K ≤ arg(u) < 2π.    (5.1)

Note that, by virtue of this nonlinearity, each state of the network is allowed to take one of the K equally spaced points on the unit circle of the complex plane (see Figure 5.1). Each neuron encodes an integral value modulated as the phase angle of its unit-magnitude complex-valued state, which constitutes an element of the state vector of the dynamical network. Hence, not the original integral vectors, but their transformed versions can be stored and recalled by this network. This injective transformation, which basically maps each entry of a vector in the integral lattice I to a point on the unit circle of the complex plane, is expressed as:

p_K(·) : {1, 2, . . . , K}^n → { e^{i 2πj/K} : j ∈ {0, . . . , K − 1} }^n,

p_K(u) := [ e^{i 2π u_1 / K}   e^{i 2π u_2 / K}   · · ·   e^{i 2π u_n / K} ]^T.    (5.2)
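A short Python sketch of the quantizer (5.1) and the transformation (5.2) is given below; the function names are illustrative only.

import numpy as np

def csign(u, K):
    """K-stage phase quantizer of (5.1): returns exp(i*2*pi*j/K) for the sector
    j*2*pi/K <= arg(u) < (j+1)*2*pi/K."""
    theta = np.angle(u) % (2 * np.pi)            # arg(u) in [0, 2*pi)
    j = int(theta // (2 * np.pi / K))
    return np.exp(1j * 2 * np.pi * j / K)

def p_K(u, K):
    """Transformation (5.2): integral vector u in {1,...,K}^n -> unit-circle phases."""
    return np.exp(1j * 2 * np.pi * np.asarray(u) / K)

# example: the argument used in Figure 5.1
print(csign(-1.2 - 0.5j, 8))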

The range of p_K(·), whose elements will be called the transformed vectors in the rest of this chapter, can also be considered as the co-domain of the transformation. In this case, the usage of the

complex-valued multi-state Hopfield network, which actually operates on the transformed


Figure 5.1: An illustration of csign8(u) for u = −1.2 − 0.5i.

vector space, is meaningful in processing integral vectors. Each state of the network can be

uniquely transformed to an integral vector in I via p_K^{−1}(·).

A generalized Hebb rule has been proposed in (Jankowski et al., 1996) as a learning procedure for the complex-valued multi-state Hopfield network to recall some specific phase combinations from their distorted versions. However, as expected, this generalized rule, which has been the only learning procedure for the considered network model, suffers from almost the same limitations as it does in the binary case. This is why an efficient application of this network has not been proposed yet. On the other hand, another significant qualitative result addressed in (Jankowski et al., 1996) is that the state vector of the network necessarily converges to a local minimum of a specific real-valued quadratic functional, defined in terms of the network parameters, along the collective operation of the n complex-valued neurons in asynchronous mode, provided that the complex weight matrix of the network is Hermitian and its diagonal entries are all non-negative. Such a network will

be called Hermitian hereafter.

Several design procedures that employ inequalities in the design of recurrent neural

networks have been reported, e.g. (Tan et al., 1991), (Schwarz & Mathis, 1992),

(Xiangwu & Hu, 1997). Such attempts mainly focused on embedding fixed points into

the conventional Hopfield network and constructed the design inequalities directly from

the nonlinear recursion performed by the network. Though a solution of these inequalities

gives the desired parameters of the recursion which has fixed points located at the given


binary points, networks designed in these ways might not be capable of restoring a memory

vector from its distorted versions, since attractiveness is not a design condition in such

methods. By posing this property as a constraint, an indirect method to construct the energy

landscape of the discrete Hopfield network via the solution of homogeneous linear inequalities was proposed in (Muezzinoglu et al., 2003a). Nevertheless, these effective approaches to designing the conventional bi-state network have not yet been extended to multi-state

associative memories.

Based on the energy minimization performed by the complex-valued multi-state Hopfield network, this chapter suggests an indirect design procedure. The procedure gives a Hermitian weight matrix such that each transformed memory vector is an attractive fixed point of the resulting finite-state system. The proposed method basically employs homogeneous linear inequalities to dig a basin for each transformed memory vector in the quadratic energy

landscape to ensure that they are all strict local minima. If the system of inequalities is

feasible, then its solution provides the desired quadratic form, and finally the complex

weights of the network are determined from the Hermitian coefficient matrix of this

quadratic.

Feasibility of the inequality system constructed in the design is actually not only

sufficient but also necessary for the existence of a Hermitian network that possesses

attractive fixed points located exactly at the transformed memory vectors. In other words,

if the constructed inequality system is infeasible, no Hermitian network can possess a limit set that contains the transformed memory vectors. This implies that the proposed method

reveals the best performance of such a network as a multi-state associative memory.

5.2 Design Procedure

5.2.1 Complex-Valued Multistate Hopfield Network

Assume a complex-valued multi-state Hopfield network consists of n fully connected

neurons, whose states at time instant k constitute the state vector x[k] of the network.

Let wij denote the complex-valued weight associated to the coupling from the state of

the j-th neuron to an input of the i-th one. The asynchronous operation of the network


is characterized as updating the state of a single neuron, say the l-th neuron, at time k according to the recurrence

x_l[k + 1] = csign_K( e^{i(π/K)} Σ_j w_lj x_j[k] ),    (5.3)

while keeping all other states unchanged. Here K is the resolution factor of the network, and it determines the cardinality of the finite state-space. Although the term e^{i(π/K)} has no effect on the network dynamics theoretically, it provides a phase margin of π/K for phase noise of the weighted sum of state vector entries.
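A single asynchronous step of (5.3) may be sketched in Python as follows; the helper names are illustrative only.

import numpy as np

def update_neuron(W, x, l, K):
    """One asynchronous step of (5.3): update the l-th state, keep the rest."""
    def csign(u):
        j = int((np.angle(u) % (2 * np.pi)) // (2 * np.pi / K))
        return np.exp(1j * 2 * np.pi * j / K)
    x = x.copy()
    # the factor exp(i*pi/K) provides the phase margin mentioned in the text
    x[l] = csign(np.exp(1j * np.pi / K) * (W[l] @ x))
    return x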

The qualitative properties of the proposed network can be investigated by introducing an energy function defined on the state-space in terms of the weight coefficients,

E(x) := −(1/2) Σ_i Σ_j w_ij x̄_i x_j,    (5.4)

similar to the way followed in the stability analysis of conventional Hopfield network

(Bruck & Goodman, 1988). A sufficient condition on the convergence of the recursion (5.3)

has been reported in (Jankowski et al., 1996) as a Hermitian weight matrix (W = W∗)

with nonnegative diagonal entries (wii ≥ 0). The proof of this statement is simply achieved

by showing that each state transition necessarily causes a decrement in the energy function

under these conditions, which also enables us to rewrite (5.4) in a real-valued quadratic form:

E(x) = −(1/2) x^* W x.    (5.5)

Since the network operates in a finite state-space by definition of csign_K(·), the domain of (5.5) is finite. The state transitions therefore end at a local minimum of (5.5) in a finite number of time steps for any initial condition. In fact, the domain of the energy function (5.5) and the state

space of the asynchronous recursion (5.3) are the same spaces. Hence the energy function,

which is quadratic in the states but linear in the weight coefficients, not only establishes the

convergence analysis, but also defines attractive fixed points of the network as its strict local

minima.


It is assumed throughout the derivation that the update order of the neurons, i.e. the index

l in (5.3), is chosen at random, as is usually done in the conventional discrete Hopfield network.

5.2.2 Design of Quadratic Energy Function with Desired Local Minima

We restrict ourselves to the synthesis of the complex-valued multi-state Hopfield network

with Hermitian weight matrix with zero diagonal entries. Note that this assumption not

only reduces the number of parameters that describe the network, but also simplifies the

design as it already guarantees convergence. Indeed, the design of the network is equivalent

to the design of its energy function in this case, since the parameters (i.e. the Hermitian

weight matrix) of the network can be uniquely determined from the coefficients of its energy

function and vice versa. Thus, rather than the recursion (5.3) directly, our design method

described in the following mainly focuses on the energy function (5.5), which is necessarily

real-valued by the previous assumption.

Given a set of integral memory vectors M ⊂ {1, 2, . . . , K}^n, let Mc denote the set of

complex vectors obtained by transforming elements of M into their complex representation

by (5.2). In order to perform a search for a Hermitian coefficient matrix W such that the

real-valued discrete quadratic form (5.5) attains a local minimum at each element of Mc, we

simply apply the definition of a strict local minimum, and impose a set of strict inequalities:

E(x) < E(y),    ∀y ∈ B_1^K(x) − {x}    (5.6)

to be satisfied for each x ∈ Mc. Here B_1^K(u) is the 1-neighborhood of u and is defined

formally as:

B_1^K(u) := \bigcup_{i=1}^{n} \left\{ v : v_i = u_i e^{i\frac{2\pi}{K}} \;\vee\; v_i = u_i e^{-i\frac{2\pi}{K}},\; v_j = u_j,\; j \neq i \right\} \cup \{u\}.    (5.7)


By substituting (5.4) in (5.6), we express this condition as 2n inequalities to be satisfied by

the coefficient matrix W = [wij]:

\sum_{i}\sum_{j} w_{ij}\,\bar{x}_i x_j \;>\; \sum_{i}\sum_{j} w_{ij}\,\bar{y}_i y_j, \qquad \forall y \in B_1^K(x) - \{x\}.    (5.8)

Incorporating now our initial design considerations w_{ij} = \bar{w}_{ji} and w_{ii} = 0, condition (5.8)
can be further expressed in terms of only the upper-triangle entries of W:

\sum_{1\le i<j\le n} w_{ij}\left[\bar{x}_i x_j - \bar{y}_i y_j\right] + \bar{w}_{ij}\left[x_i \bar{x}_j - y_i \bar{y}_j\right] > 0,    (5.9)

for all y ∈ B_1^K(x) − {x}. We then substitute the identity

w_{ij}\bar{x}_i x_j + \bar{w}_{ij} x_i \bar{x}_j = 2\,\mathrm{Re}\{w_{ij}\}\mathrm{Re}\{\bar{x}_i x_j\} - 2\,\mathrm{Im}\{w_{ij}\}\mathrm{Im}\{\bar{x}_i x_j\},    (5.10)

in (5.9) and obtain

\sum_{1\le i<j\le n} \mathrm{Re}\{w_{ij}\}\left[\mathrm{Re}\{\bar{x}_i x_j\} - \mathrm{Re}\{\bar{y}_i y_j\}\right] + \mathrm{Im}\{w_{ij}\}\left[\mathrm{Im}\{\bar{y}_i y_j\} - \mathrm{Im}\{\bar{x}_i x_j\}\right] > 0    (5.11)

for all y ∈ B_1^K(x) − {x}. Recall from the definition of the transformation p(·) in (5.2) that \mathrm{Re}\{\bar{x}_i x_j\} = \cos\left(\frac{2\pi}{K}(x_j - x_i)\right) and \mathrm{Im}\{\bar{x}_i x_j\} = \sin\left(\frac{2\pi}{K}(x_j - x_i)\right), where the entries on the right-hand sides belong to the original integral vector from which the unit-magnitude complex vector x is obtained. Hence, the design condition (5.11) could be directly expressed in terms of the original memory vectors, i.e. the elements of M, instead of the transformed ones in Mc.

We finally gather the inequalities associated with all memory vectors and formally impose the overall system of inequalities derived above as the design condition, as follows.


Corollary 3 The quadratic form (5.5) possesses a strict local minimum at each element of Mc if and only if the homogeneous inequality

\sum_{1\le i<j\le n} \mathrm{Re}\{w_{ij}\}\left[\cos\left(\tfrac{2\pi}{K}(x_j - x_i)\right) - \cos\left(\tfrac{2\pi}{K}(y_j - y_i)\right)\right] + \mathrm{Im}\{w_{ij}\}\left[\sin\left(\tfrac{2\pi}{K}(y_j - y_i)\right) - \sin\left(\tfrac{2\pi}{K}(x_j - x_i)\right)\right] > 0,    (5.12)

is satisfied by the Hermitian weight matrix W for all x ∈ M and for all y ∈ I_1^K(x) − {x}.

Here I_1^K(x) is the ball that contains the inverse-transformed versions of the vectors in B_1^K(x), namely x and all of its 1-neighbors in the integral lattice {1, 2, . . . , K}^n:

I_1^K(u) := \bigcup_{i=1}^{n} \left\{ v : v_i = u_i + 1 \ (\mathrm{mod}\ K) \;\vee\; v_i = u_i - 1 \ (\mathrm{mod}\ K),\; v_j = u_j,\; j \neq i \right\} \cup \{u\}.
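As an aside, the integral 1-neighborhood I_1^K(u) is straightforward to enumerate; the following sketch (an illustration only) keeps the entries in {1, . . . , K} by wrapping modulo K:

```python
def integral_neighbors(u, K):
    """All elements of I_1^K(u) except u itself: every vector obtained from u
    by shifting a single entry by +1 or -1 modulo K (entries kept in 1..K)."""
    nbrs = []
    for i in range(len(u)):
        for d in (+1, -1):
            v = list(u)
            v[i] = (v[i] - 1 + d) % K + 1    # wrap within {1, ..., K}
            nbrs.append(tuple(v))
    return nbrs

# Example: integral_neighbors((3, 5, 5, 3), 5) returns the 8 single-entry shifts
# of the vector (3, 5, 5, 3).
```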

To find the real and imaginary parts of the desired weight coefficients, a solution to this system of 2 · |M| · n inequalities needs to be calculated by an appropriate method. Note that (5.12) is a linear feasibility problem, because the left-hand side of each inequality is linear in the variables Re{w_ij} and Im{w_ij} for i, j = 1, 2, . . . , n. Due to this property, if (5.12) is a feasible inequality system for a given M, any linear programming procedure, e.g. the primal-dual method (Luenberger, 1973), or the perceptron learning algorithm (Rosenblatt, 1962), would provide a solution, so the complex parameters of the network could be determined by reconstructing W from this solution. On the other hand, infeasibility of (5.12) means that the given memory vectors cannot all be embedded as strict local minima into (5.5), and consequently that there exists no Hermitian network which has attractive fixed points located at each of these vectors.
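Because the system (5.12) is homogeneous, a strict solution of Aq > 0 exists if and only if Aq ≥ 1 is feasible (any strict solution can be rescaled), which turns the search into an ordinary linear program. A minimal sketch using SciPy's linprog is given below; it is only one of many possible solvers, and the zero objective reflects that only feasibility is sought:

```python
import numpy as np
from scipy.optimize import linprog

def solve_strict_inequalities(A):
    """Return some q with A q > 0 if the system is feasible, otherwise None."""
    m, d = A.shape
    res = linprog(c=np.zeros(d),                 # feasibility problem only
                  A_ub=-A, b_ub=-np.ones(m),     # -A q <= -1  <=>  A q >= 1
                  bounds=[(None, None)] * d,     # the variables are unrestricted
                  method="highs")
    return res.x if res.success else None        # None signals infeasibility
```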

5.2.3 Elimination of Trivial Spurious Memories

The goal of the design method described above is only to render each memory vector as an

attractive fixed point of the network. Since no additional condition has been imposed on

eliminating undesired fixed points that might occur in the resulting network, the Hermitian

weight matrix W obtained by solving (5.12) by any suitable procedure could also satisfy

a set of inequalities which implies that a vector other than the elements of Mc is a strict local

minimum of (5.5), although these inequalities are not explicitly imposed in the design.


Most of the associative memory design methods are known to cause spurious memories.

Unfortunately, neither the existence nor the location of many of these points in the state-

space of the dynamical network is predictable. Moreover, discrimination of these vectors

after the design is very difficult for large n since almost every point in the huge state

space of the system should be checked for this purpose. On the other hand, some of the

spurious memories are correlated with the memory vectors and their locations can be exactly

determined in terms of the memory vectors. For example, the conventional Hebb rule

used in the design of binary associative memory introduces many undesired fixed points

to the network beyond the desired ones, and most of these points cannot be determined

without checking each point in the entire state-space (Zurada, 1992). However, one can

easily conclude that if x is a fixed point of the discrete Hopfield network, then so is −x.

This property of a network designed by the Hebb rule enables the designer to address some

spurious memories in advance, which are directly related to the original memory vectors.

A similar relation can be extracted from our design method by observing from (5.12)

that only the differences between the entries of the integral memory vectors, not their actual

values, are used in the construction of the design inequalities. It can be easily verified that

the inequality system constructed for an integral memory vector x ∈ 1, 2, . . . , Kn in

the way proposed in the previous subsection would be exactly the same one constructed

for each vector x + k · e (mod K), where k = 1, 2, . . . , K and e is the n-vector with all 1

entries. Hence, the weight matrix calculated from the solution of (5.12) not only makes each

element of Mc an attractive fixed point, but also introduces at least (K − 1)|M | additional

vectors, namely the transformed versions of the integral vectors obtained by incrementing

each element of M in modulo K by k · e, k = 1, . . . , K − 1, as spurious memories to the

network. Such vectors are called trivial spurious memories and an extension to the design

is proposed in the following to eliminate them.

Let us append an arbitrary integer, say 1, to each memory vector in M as last entry and

apply the proposed procedure to obtain the complex-valued multi-state associative memory

of n + 1 neurons. Since the last entry of any trivial spurious memory is different from 1

by definition, one can simply exclude their transformed versions from the state-space of

the network by restricting the dynamics (5.3) in the subspace that consists of the vectors


whose last entries are equal to e^{i 2π/K}. This is achieved by fixing the state of the (n + 1)-st neuron to e^{i 2π/K} along the recursion. Note that this state is connected to the inputs of the other neurons via the weights {w_{l,n+1}}_{l=1}^{n}; thus this modification of the network model is actually equivalent to introducing a complex threshold t_l = e^{i 2π/K} w_{l,n+1} to the l-th neuron of the original network (5.3) for l = 1, . . . , n, whose dynamical behavior can now be recast as:

x_l[k+1] = \mathrm{csign}_K\left( e^{i(\pi/K)}\left( \sum_{j=1}^{n} w_{lj}\, x_j[k] + t_l \right) \right).    (5.13)

Although the method avoids the trivial spurious memories, there might still occur some

non-trivial spurious memories in the network. It is expected that the number of such attractive fixed points increases with K, since the cardinality of the state-space increases with K. However, the method ensures that none of these spurious memories is located in I_1^K(x) for any x ∈ M; therefore, the resulting network corrects all possible errors caused by incrementing or decrementing a single entry of the memory vectors by 1. In other words, correction of the vectors in the 1-neighborhood of the memory vectors is guaranteed.

5.2.4 Algorithmic Summary of the Method

A summary of the proposed design method described in Section 5.2.2 together with its

improvement in Section 5.2.3 is given below.

Algorithm 2 Input to the algorithm is M ⊂ {1, 2, . . . , L}^n.

Step 0: Set a resolution factor K ≥ L for the network. Append 1 to every x ∈ M as the last

entry. Set A as the empty matrix.

Step 1: For each x ∈ M and for each y ∈ I_1^K(x) − {x}, calculate the row vector

[ c_{12}  s_{12}  c_{13}  s_{13}  · · ·  c_{1,n+1}  s_{1,n+1}  c_{23}  s_{23}  c_{24}  s_{24}  · · ·  c_{2,n+1}  s_{2,n+1}  · · ·  c_{n,n+1}  s_{n,n+1} ],

where c_{ij} = \cos\left(\frac{2\pi}{K}(x_j - x_i)\right) - \cos\left(\frac{2\pi}{K}(y_j - y_i)\right) and s_{ij} = \sin\left(\frac{2\pi}{K}(y_j - y_i)\right) - \sin\left(\frac{2\pi}{K}(x_j - x_i)\right), and append it as an additional row to the matrix A.


Step 2: Find a solution q^* ∈ ℜ^{n(n+1)} for the inequality system Aq > 0 by using any appropriate method.

Step 3: Construct the Hermitian matrix

W = \begin{bmatrix} 0 & q^*_1 + iq^*_2 & q^*_3 + iq^*_4 & \cdots & q^*_{2n-1} + iq^*_{2n} \\ q^*_1 - iq^*_2 & 0 & q^*_{2n+1} + iq^*_{2n+2} & \cdots & q^*_{4n-3} + iq^*_{4n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ q^*_{2n-1} - iq^*_{2n} & q^*_{4n-3} - iq^*_{4n-2} & q^*_{6n-5} - iq^*_{6n-4} & \cdots & 0 \end{bmatrix}.

Step 4: Extract the parameters of recursion (5.13) from W: the weights w_{ij} for i, j = 1, 2, . . . , n are the entries of its upper-left n × n block, and the thresholds are t_j = e^{i 2π/K} w_{j,n+1} for j = 1, 2, . . . , n.
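For illustration, a compact Python sketch of Algorithm 2 is given below. It reuses the hypothetical integral_neighbors and solve_strict_inequalities helpers sketched earlier and is not meant as an optimized implementation:

```python
import numpy as np

def design_complex_memory(M, K):
    """Sketch of Algorithm 2: returns (W, t) for recursion (5.13), or None."""
    M_aug = [tuple(x) + (1,) for x in M]                # Step 0: append 1
    n1 = len(M_aug[0])                                  # n + 1 neurons
    pairs = [(i, j) for i in range(n1) for j in range(i + 1, n1)]
    rows = []
    for x in M_aug:                                     # Step 1: one row per (x, y)
        for y in integral_neighbors(x, K):
            row = []
            for i, j in pairs:
                ax = 2 * np.pi / K * (x[j] - x[i])
                ay = 2 * np.pi / K * (y[j] - y[i])
                row += [np.cos(ax) - np.cos(ay),        # c_ij multiplies Re{w_ij}
                        np.sin(ay) - np.sin(ax)]        # s_ij multiplies Im{w_ij}
            rows.append(row)
    q = solve_strict_inequalities(np.array(rows))       # Step 2
    if q is None:
        return None                                     # memory set not storable
    Wfull = np.zeros((n1, n1), dtype=complex)           # Step 3: Hermitian matrix
    for k, (i, j) in enumerate(pairs):
        Wfull[i, j] = q[2 * k] + 1j * q[2 * k + 1]
        Wfull[j, i] = np.conj(Wfull[i, j])
    n = n1 - 1                                          # Step 4: weights, thresholds
    return Wfull[:n, :n], np.exp(1j * 2 * np.pi / K) * Wfull[:n, n]
```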

As the dimension n of the memory vectors increases, manipulating the energy of each

memory vector in the way suggested by the second and third steps of this algorithm becomes

time and memory consuming when compared to the generalized Hebb rule. In practice,

this procedure is easily realizable for memory sets with resolution factor of order 10 and

dimension of order 10, sufficient to perform reconstruction of gray-scale images. On the

other hand, the performance of the resulting network is much better than that of the one

designed by the generalized Hebb rule, as shown at the end of the next section.

5.3 Simulation Results

Results of computer experiments are presented below to illustrate the quantitative

performance of the method, i.e. the maximum cardinality of an arbitrary memory set that

can be successfully embedded into the network by the proposed design method. The recall

capability of the resulting network and its application on reconstructing gray-scale images

are also demonstrated.

5.3.1 Complete Storage Performance

Any fixed point of an n-th order dynamical system can be considered as a piece of n-dimensional static information encoded in the system parameters. As demonstrated in the previous section,


dynamical associative memories are designed from this point of view by determining the

parameters of an a priori chosen network model such that a given set of static vectors are the

fixed points of this system. Hence, an associative memory realizes a dichotomy defined on

its state space: some specific points in this space are fixed points (constitute the limit set) of

the system, while the rest are not. However, the design of an ideal associative memory in this

way is generally not possible for every possible memory set, i.e. not every dichotomy can

be implemented, because of limitations of the chosen model, e.g. the number of parameters.

In our case, for example, the network model involves (n² + n)/2 complex coefficients (weights and thresholds), whereas the number of all possible dichotomies is equal to 2^{K^n}, which is the number of subsets of the state space {1, 2, . . . , K}^n. If it were possible to

design the complex-valued multi-state Hopfield network as an ideal associative memory

for every possible memory set, then this design would be a very efficient compression tool

that enables the lossless compression of an arbitrary memory set into (n² + n)/2 complex

numbers. However, such a compression seems impossible from the information theory

point of view, since the number of free variables, i.e. parameters, is quadratic in n, while

the number of dichotomies grows exponentially with n. Therefore, if the design is based on

a network model, which is the case for many neural associative memories, then only some

of the possible memory sets can be introduced as fixed points to the network by any design

method.

We say that a memory set M is stored completely by our design method if each element

of M constitutes a fixed point in the resulting network. We measure the quantitative

performance by the percentage of the number of completely stored memory sets among

a collection of memory sets generated randomly. Recall that the complete storage of a

memory set is equivalent to the feasibility of the inequality system (5.12) constructed for

this set.
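Checking complete storage for a collection of random memory sets therefore amounts to a sequence of feasibility tests. A sketch of such a script, reusing the hypothetical design_complex_memory routine above, is:

```python
import numpy as np

def storage_rate(n, m, K, trials=100, seed=0):
    """Percentage of random memory sets (|M| = m, entries in 1..K) for which
    the design system (5.12) turned out to be feasible."""
    rng = np.random.default_rng(seed)
    feasible = 0
    for _ in range(trials):
        M = [tuple(rng.integers(1, K + 1, size=n)) for _ in range(m)]
        feasible += design_complex_memory(M, K) is not None
    return 100.0 * feasible / trials
```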

For several n, |M|, and K values, 100 random memory sets have been generated and

checked whether each of these sets yielded a feasible inequality system or not. The number

of sets that yielded a feasible inequality system for each experiment is listed in Table 5.1,

which shows that almost every set with |M | ≤ n can be completely stored independent of

the value of K.


Table 5.1: Percentages of memory sets that yielded feasible inequality systems.

  n    |M|   P% (K = 5)   P% (K = 10)
  5      3       100          100
  5      5        90           95
  5      7         2            5
 10      4       100          100
 10     10        97          100
 10     15         0           12
 20     10       100          100
 20     20       100          100
 20     30         3           17
 50     25       100          100
 50     50       100          100
 50     75         9           21

The effect of K on complete storage performance is also shown in Table 5.1. The

probability of complete storage P% increases as the resolution factor K increases for fixed

n and |M |. However, this would cause the state space to grow enormously and, hence,

possibly cause more non-trivial spurious memories as illustrated in the next subsection.

5.3.2 Application of the Design Procedure

We first give an illustrative example of the proposed design procedure and investigate the performance of the resulting network.

Example 1 Consider the memory set consisting of the following integral vectors:

x^1 = [3  5  5  3]^T,   x^2 = [4  3  1  5]^T,   x^3 = [4  4  5  4]^T,   x^4 = [5  2  4  3]^T,


which belong to the integral lattice {1, 2, . . . , 5}^4. We have first augmented 1 to each vector as the last entry and transformed them into their phase-modulated versions by (5.2):

x^1 = [e^{i 6π/5}  e^{iπ}  e^{iπ}  e^{i 6π/5}]^T,   x^2 = [e^{i 8π/5}  e^{i 6π/5}  e^{i 2π/5}  e^{iπ}]^T,
x^3 = [e^{i 8π/5}  e^{i 8π/5}  e^{iπ}  e^{i 2π/5}]^T,   x^4 = [e^{iπ}  e^{i 4π/5}  e^{i 8π/5}  e^{i 6π/5}]^T;

assuming that the resolution factor K is equal to 5. The inequality system has been

constructed as in (5.12) and been solved by linear programming to obtain the weight matrix

and the threshold vector as

W = \begin{bmatrix} 0 & 7.4 - i68.4 & -65.6 + i132.8 & 139.7 - i31.7 \\ 7.4 + i68.4 & 0 & 108.4 - i17.8 & -76.5 - i92.6 \\ -65.6 - i132.8 & 108.4 + i17.8 & 0 & 80.9 + i167.1 \\ 139.7 + i31.7 & -76.5 + i92.6 & 80.9 - i167.1 & 0 \end{bmatrix},

t = \begin{bmatrix} -73 - i134.1 \\ -82.6 + i46.1 \\ 126.6 - i131.3 \\ 20.7 + i174.4 \end{bmatrix}.

It can be verified that for these parameters each transformed memory vector xi is a fixed

point of the recursion (5.13). After injecting each 1-neighbor of each memory vector as the

initial state vector it has been observed that the network converged to the nearest memory

vector for each initial condition. Hence, it can be concluded that the design has been

successful. We have also identified the spurious memories by checking the transformed

version of each element of the integral lattice {1, 2, . . . , 5}^4 and observed that the network

has 15 spurious memories, none of which is trivial. Note that the same memory set can be

embedded for a larger resolution factor. When the design is repeated for K = 6, one can

see that the number of spurious memories increases by 2.


Figure 5.2: Test images used in image reconstruction example.

Since gray-scale images can be represented by integral vectors, reconstruction of such

images from their distorted versions constitutes a straightforward application of multi-

state associative memory, as investigated in (Zurada et al., 1994). The following example

illustrates the performance of the proposed method in performing this task.

Example 2 Gray-scale versions of three well-known test images, namely Lenna, peppers,

and cups images, have been used in this experiment. Due to computational limitations, the

original high-resolution 256-level images have been re-scaled to 100 × 100 resolution and

their gray-levels have been quantized down to 20 levels. Thus, each image can be considered

as a 100 × 100 matrix of integers in which 1 and 20 denote a black and a white pixel, respectively, and each integer value in between indicates a gray tone. These three prototype images are shown in Figure 5.2.

Each image has been segmented into 500 20-dimensional vectors x^l_{uv} ∈ {1, 2, . . . , 20}^{20} for u = 1, . . . , 5 and v = 1, . . . , 100, such that the v-th column of the l-th image is represented by concatenating 5 of these integral vectors, namely x^l_{uv}, u = 1, . . . , 5. Here l denotes the image index: 1 for Lenna, 2 for peppers, and 3 for cups. A 20-neuron complex-valued multi-state associative memory has then been designed for each triple of memory vectors {x^1_{uv}, x^2_{uv}, x^3_{uv}}, u = 1, 2, . . . , 5 and v = 1, 2, . . . , 100. Since we have attempted to embed only 3 vectors into a 20-neuron network by our method, which is far below the actual capacity investigated in Section 5.3.1, all 500 designs have been successful.


Figure 5.3: Images corrupted by 20% salt-and-pepper noise (above) and their

reconstructions obtained by the network (below).

After the design phase the distorted versions of the prototype images have been obtained

by adding 20% salt-and-pepper noise, as shown in Figure 5.3a. Each of these distorted

images was segmented in the same way as described above, and the transformed version of each vector obtained in this way was applied as the initial condition to the corresponding network. After all 500 networks reached their steady states, i.e. fixed points, the integral vectors have been obtained by the inverse transformation p_K^{-1}(·) and combined in a 100 ×

100 matrix. The reconstructed images obtained by this procedure for each distorted image

are shown in the corresponding column of Figure 5.3b. It can then be concluded that the

networks are capable of removing 20% salt-and-pepper noise on each image successfully.

In other words, almost none of these 500 networks converges to a spurious memory in this

experiment.

As the experiments were repeated for 40% and 60% noise (see Figures 5.4a and 5.5a, respectively), non-trivial spurious memories became effective in the recall, so the reconstruction performance decreased. This can be observed from the recalled images shown in Figures 5.4b and 5.5b, respectively.


Figure 5.4: Images corrupted by 40% salt-and-pepper noise (above) and their

reconstructions obtained by the network (below).

Figure 5.5: Images corrupted by 60% salt-and-pepper noise (above) and their

reconstructions obtained by the network (below).


Figure 5.6: Filtered images obtained from noisy images with 40% salt-and-pepper noise by

the network (above) and by median filtering (below).

The tasks performed by a filter and by an associative memory are conceptually different:

A filter is usually expected to remove noise on any signal, while an associative memory is

designed to filter out the noise on prototype vectors only. However, despite the negative

effects of spurious memories, the performance of the network in filtering noisy images is

still comparable to that of median filtering, which is known to be one of the most effective

methods for filtering out salt-and-pepper noise. This can be verified by Figures 5.6a

and 5.6b, showing the reconstructed versions of 40% corrupted images obtained by our

method and by median filtering, respectively.

The recall capability of our method was also compared with that of the generalized Hebb rule proposed in (Jankowski et al., 1996). In this experiment, the three images in Figure 5.2 were used as the prototype images for the generalized Hebb rule. The dominant effect of spurious memories can be identified visually when the Lenna image is reconstructed from its 20% distorted version by the network designed with the generalized Hebb rule (see Figure 5.7a). Our method, on the other hand, enables an almost perfect recall, as shown in Figure 5.7b.


Figure 5.7: Lenna images obtained by the networks designed by the generalized Hebb rule

and by the proposed method, respectively.


CHAPTER SIX

MULTI-LAYER RECURRENT ASSOCIATIVE MEMORY DESIGN

To achieve perfect storage of binary memory vectors, a generalization of the discrete

Hopfield model to a multi-layer recurrent network of bi-state discrete perceptrons is

suggested in this chapter. The proposed design procedure employs the back-propagation

learning algorithm. In the training phase, the discrete perceptrons are replaced with

sigmoidal neurons having large gains. The number of neurons in the hidden layer is assumed to be adjustable, and this flexible structure of the network allows the perfect storage

of arbitrary (uncorrelated) binary vectors. The performance of the proposed network is

investigated by intensive computer simulations.

6.1 Motivation

The discussion made in Section 4.2 yields the fact that any discrete Hopfield-based

dynamical associative memory design procedure, which aims at a symmetric weight matrix,

is indeed an attempt to map the given M to the set of discrete local minima of a quadratic.

The design method proposed in Chapter 4 achieves perfect storage with this motivation,

whenever this is achievable. However, there is no way to introduce all memory vectors as

fixed points to the DHN if M is not quadratic-distinguishable, as defined below.

Definition 5 A set M ⊆ {0, 1}^n is quadratic-distinguishable if there exists a functional of

the form (4.1) which has a discrete local minimum at each element of M . It is called strictly


quadratic-distinguishable if, in addition, this quadratic has no local minimum other than

the elements of M .

One can verify that quadratic-distinguishable sets constitute a rather small subset in the

set of all possible binary sets. An investigation performed by computer experiments in

Section 4.2.2 has shown that the validity of this property is closely related to the cardinality

|M | of M , and that almost all binary sets containing less than 1.5 ·n elements are quadratic-

distinguishable while such sets turn out to be rare as |M | increases. This result actually

explains the reason why the conventional model does not work adequately as a binary AM

in most cases, i.e. when the memory vectors are chosen arbitrarily, not correlated in this

way.

Another well-known defect of the model is that, even though all given memory vectors

can be perfectly stored by an appropriate design method, the state vector might converge to

a binary point, which does not correspond to any memory vector, i.e. a spurious memory.

Their occurrence is obviously due to the violation of the property in the strict sense. Avoiding

spurious memories in the design, as well as addressing them in the resulting network, is not

an easy task for large dimensions.

From these aspects, a generalization of the conventional model is evidently necessary

to achieve the perfect storage of an arbitrary memory set. A successful generalization has

been proposed in Section 4.2.3 by incorporating an algebraic multi-layer perceptron into the conventional asynchronous dynamics (4.2). In the following section, we suggest another modification to the DHN, namely introducing an additional layer that comprises an adjustable number of neurons.

6.2 Multi-Layer Recurrent Network

As an alternative model to the conventional discrete Hopfield network, we consider here two

cascaded layers of bipolar discrete perceptrons as illustrated in Figure 6.1. In synchronous


Figure 6.1: A two-layer recurrent network made up of discrete perceptrons.

mode, this network operates on the bipolar binary state-space {−1, 1}^n according to the recurrence

x[k + 1] = s_o\left( W \cdot s_h\left( R \cdot x[k] + b \right) + t \right).    (6.1)

Here W, R, t, and b denote the n × l output-layer weight matrix, the l × n hidden-layer weight matrix, the n-dimensional output-layer bias vector, and the l-dimensional hidden-layer bias vector, respectively. The vector-valued functions s_h(·) : ℜ^l → {−1, 1}^l and s_o(·) : ℜ^n → {−1, 1}^n are diagonal transformations defined as [sgn(·) · · · sgn(·)]^T. An asynchronous operation mode for the model could also be defined in the same way as was done in (2.2) for the single-layer case. Note that the model enables the designer to adjust the number l of hidden-layer neurons. It is reasonable to expect that, with this flexibility, the two-layer network outperforms the classical Hopfield model in the association task.
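The synchronous operation (6.1) is easy to state procedurally; a minimal sketch (an illustration only, not the implementation used in the experiments reported below) reads:

```python
import numpy as np

def sgn(u):
    """Bipolar hard limiter; the tie-break at zero is a convention choice."""
    return np.where(u >= 0, 1, -1)

def sync_step(x, W, R, t, b):
    """One synchronous step of (6.1): x <- s_o(W s_h(R x + b) + t)."""
    h = sgn(R @ x + b)          # hidden layer of l discrete perceptrons
    return sgn(W @ h + t)       # output layer of n discrete perceptrons

def recall(x0, W, R, t, b, max_iter=100):
    """Iterate (6.1) until a fixed point is reached or a cycle is suspected."""
    x = np.asarray(x0)
    for _ in range(max_iter):
        x_next = sync_step(x, W, R, t, b)
        if np.array_equal(x_next, x):
            return x            # fixed point reached
        x = x_next
    return x                    # possibly on a limit cycle
```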

6.3 Design Procedure

We pose the dynamical associative memory design problem for the two-layer recurrent

structure in Figure 6.1 as the determination of the parameters W, R, t, and b which impose on the recursion (6.1) (or on its asynchronous counterpart) an attractive fixed point located exactly at each element of M, where M is in general not quadratic-distinguishable.


Definition 6 i. A fixed point p^* of an asynchronous recursion p[k + 1] = ρ(p[k]) defined

on a binary space is called attractive if there exists an update rule such that the recursion

converges to p∗ for all initial conditions in B1(p∗), where Bd(q) denotes the set of binary

points which are located at most d-Hamming distance away from q. ii. For synchronous

recursions, the definition of attractiveness is equivalent to that of stability in the sense of

Lyapunov (Vidyasagar, 1993).

As described in Condition 4, attractiveness for a fixed point x ensures the correction

of any 1-bit distortion on x along the recursion (6.1), thus it is a key property for binary

dynamical associative memories. This is why it is considered as the crucial design condition

here. The search for network parameters under this constraint can now be shaped as a formal

supervised learning procedure ignoring the dynamicity of the network.

Problem 2 Determine real coefficients W, R, t, and b such that the equality

s_o\left( W \cdot s_h\left( R \cdot q + b \right) + t \right) = p    (6.2)

holds for all p ∈ M and for all q ∈ B_1(p).¹
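The sample set required by Problem 2 consists of every memory vector paired with each vector in its 1-Hamming neighborhood (including the memory vector itself); a small sketch for generating these training pairs might read:

```python
import numpy as np

def hamming_pairs(M):
    """Training samples for Problem 2: map every q in B_1(p) to p, for each p in M."""
    pairs = []
    for p in M:
        p = np.asarray(p)
        pairs.append((p.copy(), p.copy()))      # the memory vector itself
        for i in range(len(p)):
            q = p.copy()
            q[i] = -q[i]                        # flip one bit
            pairs.append((q, p.copy()))
    return pairs
```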

In order to solve the design equations (6.2), which involve nested discontinuous nonlinearities, one can make use of a systematic technique, namely the back-propagation algorithm. However, since the network output, i.e. the left-hand side of (6.2), is a discontinuous functional of the considered parameters, the back-propagation training algorithm is not directly applicable to the network. To overcome this problem, each activation function sgn(·) should first be replaced with a continuous one which has a similar form to that of sgn(·). A well-known candidate is the bipolar sigmoid function

sgm(u) = \frac{1 - \exp(-\lambda u)}{1 + \exp(-\lambda u)},    (6.3)

¹It is assumed in the derivation of the design equalities (6.2) that the elements of M are located at least 2-Hamming distance away from each other. If memory sets violating this assumption are to be taken into account, then the equality should be imposed for all q ∈ B_1(p) − M, where "−" denotes the set difference.


where the parameter λ trims the gain of this sigmoidal nonlinearity². The network then becomes ready to be trained to produce the desired outputs (memory vectors) for the sample input vectors (their 1-Hamming neighbors). Mean-Square-Error (MSE) is used as the performance index in the back-propagation learning algorithm. It may be anticipated that the cumulative MSE for the training set diminishes when a sufficiently large number of hidden-layer neurons is used in the design. After the training phase, the activation functions are finally replaced back with sgn(·).
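A toy version of this training phase is sketched below. The bipolar sigmoid (6.3) is written equivalently as tanh(λu/2), and a plain stochastic-gradient step on the squared error stands in for whatever back-propagation variant is actually employed; the learning rate, gain λ, and epoch count are arbitrary illustrative values:

```python
import numpy as np

def sgm(u, lam):
    """Bipolar sigmoid (6.3); equals tanh(lam*u/2) and tends to sgn as lam grows."""
    return np.tanh(lam * u / 2.0)

def train(pairs, n, l, lam=4.0, lr=0.05, epochs=2000, seed=0):
    """Back-propagation on the (1-neighbor q, memory vector p) samples."""
    rng = np.random.default_rng(seed)
    R = rng.normal(scale=0.5, size=(l, n)); b = np.zeros(l)
    W = rng.normal(scale=0.5, size=(n, l)); t = np.zeros(n)
    for _ in range(epochs):
        for q, p in pairs:
            h = sgm(R @ q + b, lam)                       # hidden activations
            o = sgm(W @ h + t, lam)                       # network output
            do = (o - p) * (lam / 2) * (1 - o ** 2)       # output-layer delta
            dh = (W.T @ do) * (lam / 2) * (1 - h ** 2)    # hidden-layer delta
            W -= lr * np.outer(do, h); t -= lr * do
            R -= lr * np.outer(dh, q); b -= lr * dh
    return W, R, t, b    # afterwards sgm is replaced back with sgn
```

A network obtained this way can then be run with the sync_step/recall sketch from Section 6.2 once the hard limiter is restored.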

The complete stability issue has not been considered throughout the design, since the recursive characteristic of the network was ignored. This may cause the actual network designed in this way to oscillate for some initial conditions other than the 1-Hamming

neighbors of memory vectors. However, as will be illustrated below, these oscillations are

not catastrophic since they occur rarely, and the network interestingly converges to a fixed

point for almost all initial state vectors. The one-to-one correspondence between the fixed

points and the memory vectors is not guaranteed, either, since no additional constraint is

imposed in the design to avoid spurious memories.

6.4 Experimental Results

We present in this section the results of some computer experiments which were conducted

to illustrate the qualitative and quantitative performances of the proposed method.

Experiment 1: A straightforward design was first performed for the following randomly

generated memory set:

M = { [−1 −1 −1 −1]^T, [−1 −1 1 1]^T, [−1 1 −1 1]^T, [−1 1 1 −1]^T, [1 −1 −1 1]^T, [1 1 −1 −1]^T, [1 1 1 1]^T }.

Note that M is not quadratic-distinguishable, so there exists no DHN of 4 neurons which

has attractive fixed points located at the elements of M .

²It can easily be seen that sgm(·) approaches sgn(·) as λ → ∞.


The sample input and desired output sets were then generated as described in the previous

section. When the proposed design procedure was applied for l = 4, the parameters of the

two-layer network with sgm(·) activation functions were obtained as:

W = \begin{bmatrix} 1.95 & -0.59 & -1.76 & 2.21 \\ 9.45 & -5.93 & -5.09 & -2.71 \\ 6.67 & 8.81 & -1.96 & -5.52 \\ -0.77 & 1.18 & -8.44 & -0.30 \end{bmatrix},   R = \begin{bmatrix} 2.11 & 2.13 & 1.78 & -0.32 \\ -1.09 & -1.22 & 2.90 & 0.48 \\ -1.55 & -1.68 & -0.11 & -2.66 \\ -1.50 & -1.49 & 0.07 & -2.04 \end{bmatrix},

t = \begin{bmatrix} -6.45 \\ -0.02 \\ -2.42 \\ -0.91 \end{bmatrix},   b = \begin{bmatrix} -2.84 \\ -0.09 \\ 1.62 \\ -1.81 \end{bmatrix}.

It can be verified that these parameter values globally minimize the MSE after the activation functions are replaced back with sgn(·), so the resulting network satisfies all desired input-output

relations.

Each of the 2^4 = 16 binary vectors was then injected as the initial state vector to the network

and the recurrent behavior was observed in order to verify perfect storage: Each element of

M constituted an attractive fixed point of the network. 8 of these binary vectors converged

to the same points as the ones obtained by the nearest-neighbor classifier, while our network converged to different points for the remaining 8 initial conditions. However, the network at

least did not contain any spurious memories, although the procedure had not imposed any

condition to avoid them.

Experiment 2: In this experiment, we randomly generated memory sets for several n

and |M| values and investigated the effect of the number l of hidden-layer neurons on the

performance of the resulting network upon training.

For each randomly generated memory set consisting of |M | n-dimensional bipolar binary

vectors, the proposed design was performed to obtain three two-layer recurrent networks


which comprise n/2, n, and 3n/2 hidden-layer neurons, respectively. The results are listed in Table 6.1.

Table 6.1: Performance of the proposed method in providing perfect storage and creating spurious memories and/or limit cycles depending on l.

                  l = n/2             l = n               l = 3n/2
  n   |M|    PS  NPS%  NPC%     PS  NPS%  NPC%     PS  NPS%  NPC%
  6     6     √    20     0      √    12     0      √     0     0
  6    12     ×    32     0      √    17     0      √     5     0
  6    18     ×    48     3      √    23     0      √     9     0
  8     8     ×    22     0      √    12     0      √     0     0
  8    16     ×    20    10      √    14    10      √     8     0
  8    24     ×    33    16      ×    25    12      √    10     2
 10    10     ×    32     4      √    24     0      √     2     0
 10    20     ×    40     8      √    26     4      √     6     0
 10    30     ×    60    14      ×    30    12      √    12     8
 12    12     ×    18     6      √    12     0      √     2     0
 12    24     ×    28    10      √    20     4      √     6     2
 12    36     ×    54    34      ×    30    22      √    14    16

Here PS denotes the perfect storage and a check sign in this column indicates that the

perfect storage was achieved, while a cross indicates that some memory vectors could not

be embedded as an attractive fixed point to the corresponding network. The quantity NPS%

stands for the percentage of the initial conditions, for which the corresponding network

converged to a spurious memory, among all possible 2^n initial states. Similarly, NPC% denotes

the percentage of initial state vectors, for which the network entered a limit cycle.

As can be observed from Table 6.1, the perfect storage is more likely to be achieved

for a relatively large number of hidden-layer neurons, because the number of adjustable

parameters, i.e. the dimensions of R and b, increases as l increases. Both the percentages of

spurious memories and limit cycles also decrease in this case, hence the resulting network’s


quantitative performance is improved. However, it should be noted that this effect slows

down the back-propagation algorithm and also increases the cost of identifying the network.

Unfortunately, there exists no procedure to find an optimal l. One strategy to approximate it

could be the pruning technique, which is choosing l large enough to ensure perfect storage

and then repeating the design by decrementing it until the perfect storage fails.


CHAPTER SEVEN

CONCLUSIONS

Five novel design methods to improve the performance of DHNs in evaluating the

association function given by (1.1) as a mapping from an initial state vector to a fixed point

have been proposed in this thesis work.

After introducing the memory concept and the associative memory, first the universal

network model, called DHN, has been introduced in Chapter 2. Five recurrent AM design

criteria have then been derived therein. Three major DHN-based AM design methods have

been explained and their performances have been criticized in terms of their fulfillment of

these criteria.

A Boolean Hebb rule for DHN, which admits binary parameters only, has been

introduced in the first part of Chapter 3. The basic idea in this design method is simply

to embed each binary memory vector as a maximal independent set into a graph. We

have determined the conditions under which the proposed method gives a recurrent AM

satisfying most of these criteria that have not been simultaneously fulfilled by any available

DHN-based AM design method. We have also given a quantitative analysis of the designed

network and compared the storage capacity of the method to the ones provided by some

well-known methods. The simulations have shown that, even if the design conditions on the

memory set are violated, the performance of the method still outperform the outer-product

method for sparse memory sets. In the second part of Chapter 3, another graph theoretical

approach, namely representing the memory vectors as paths between two specific nodes of a

directed graph, has been presented. Generally with the cost of a higher number of neurons,

this second method guarantees perfect storage. Though one cannot avoid the occurrence

of many spurious memories in the network obtained by the method, these undesired fixed


points may occur only in a small neighborhood of the original memory vectors and thus cause

small errors.

Another binary recurrent AM design method which employs homogeneous linear

inequalities derived from the local minimality conditions has been presented in Chapter 4.

A solution to this inequality system yields the coefficients of the discrete quadratic energy

function of the DHN, so the weight matrix and the threshold vector of the network can be

directly determined. Simulations have shown that almost all memory sets with cardinality

less than n, where n is the dimension of the memory vectors, can be completely stored

in the dynamical network and so perfectly recalled. The method eventually establishes

an encoding for an arbitrary set of n-dimensional memory vectors as (n2 + n)/2 weight

and threshold coefficients associated to the recursion. The simulation results have shown

that DHN is suitable to recall each letter of the English alphabet when designed by the

proposed method. It should be noted that this has not been achieved by any formerly-

proposed design method yet. Probably the most valuable observation of this work, which

triggered the subsequently proposed methods is that a DHN can only possess fixed points,

which are correlated as being quadratic-distinguishable (c.f. Definition 6). This condition

enlightens the upper bound on DHN’s performance in association task. To achieve a higher

performance beyond this limit, a generalization of the conventional DHN model has also

been proposed and demonstrated.

The approach presented in Chapter 4 has then been generalized to multi-state AM design

in Chapter 5. Besides some straightforward generalizations of the conventional DHN model,

complex-valued multi-state Hopfield network has been introduced as an efficient tool to

process static integral information. To support this idea, a design method for a subclass of

this model, namely the Hermitian network, has been proposed to make it operate as a multi-state associative memory. The new method was shown to outperform the generalized Hebb rule, which has so far constituted the only known learning rule for this model in

associating phase-modulated integral information. The recall performance of the resulting

network has been illustrated on restoring gray-scale images, and the results have been found

satisfactory.


A design procedure to ensure perfect storage of uncorrelated memory vectors into a two-

layer recurrent network has been finally proposed in Chapter 6. The adjustability of the

number of neurons in the hidden-layer of the proposed model allows the designer to attain

any desired degree of performance with the cost of a longer training. Though the model

has already been shown to be much superior to the conventional discrete Hopfield network

in the association task, it still needs to be analyzed theoretically to reveal the convergence

conditions and the energy function. The proposed method and its analysis should be

extended to be also applicable in the network’s asynchronous operation mode.


REFERENCES

Aizenberg, N., & Aizenberg, I. (1992). CNN based on multivalued neuron as a model of

associative memory for gray-scale images. Proc. 2nd Int. Workshop on Cellular Neural

Networks and their Applications (CNNA-92), Munich, Germany, 36.

Aksın, D. (2002). A high-precision high-resolution wta-max circuit of O(N) complexity.

IEEE Trans. Circuits and Systems Part II, 49, 48–53.

Anderson, J. (1995). Introduction to neural networks. Cambridge, MA: MIT Press.

Athithan, G., & Dasgupta, C. (1997). On the problem of spurious patterns in neural

associative memory models. IEEE Trans. Neural Networks, 8, 1483–1491.

Bazzaraa, M., & Jarvis, J. (1977). Linear programming and network flows. New York: John

Wiley & Sons.

Bertsekas, D. (1995). Nonlinear programming. Belmont, MA: Athena Scientific.

Bramley, R., & Winnicka, N. (1996). Solving linear inequalities in a least squares sense.

SIAM J. Sci. Comp., 17, 275–286.

Bruck, J., & Goodman, J. (1988). A generalized convergence theorem for neural networks.

IEEE Trans. Information Theory, 34, 1089–1092.

Bruck, J., & Roychowdhury, V. (1990). On the number of spurious memories in the Hopfield

model. IEEE Trans. Information Theory, 36, 393–397.

Dembo, A. (1989). On the capacity of associative memories with linear threshold functions.

IEEE Trans. Information Theory, 35, 709–720.

Dogan, H., & Guzelis, C. (2003). A gradient network for vector quantization and its image

compression applications. Lecture Notes in Computer Science, (to appear).

Duda, R., & Hart, P. (1973). Pattern classification and scene analysis. New York: Wiley.


Elizade, E., & Gomez, S. (1992). Multistate perceptrons: Learning rule and perceptron of

maximal stability. J. Phys. A, Math. Gen., 25, 5039–5045.

Erdos, P., & Erne, M. (1973). Clique numbers of graphs. Discrete Mathematics, 59, 235–

242.

Furedi, Z. (1987). The number of maximal independent sets in connected graphs. Journal

of Graph Theory, 4, 463–470.

Garey, M., & Johnson, D. (1979). Computers and intractability: A guide to the theory of

np-completeness. New York: W.H. Freeman.

Ghosh, J., Lacour, P., & Jackson, S. (1994). Ota-based neural network architectures with

on-chip tuning of synapses. IEEE Trans. Circuits and Systems-II, 41, 49–58.

Golden, R. (1986). The ’brain-state-in-a-box’ is a gradient descent algorithm. Journal of

Mathematical Psychology, 30, 73–80.

Han, S.-P. (1980). Least squares solution of linear inequalities.

Haykin, S. (1994). Neural networks: A comprehensive foundation. New York: McMillan

College.

Hecht-Nielsen, R. (1990). Neurocomputing. Reading, MA: Addison-Wesley.

Hirsch, M., & Smale, S. (1974). Differential equations, dynamical systems, and linear

algebra. New York: Academic Press.

Hopfield, J. (1982). Neural networks and physical systems with emergent collective

computational abilities. Proc. Natl. Acad. Sci., USA, 75, 2554–2558.

Ikeda, N., Watta, P., Artıklar, M., & Hassoun, M. (2002). A two-level Hamming network for

high performance associative memory. Neural Networks, 14, 1189–1200.

Jagota, A. (1995). Approximating maximum clique with a Hopfield network. IEEE Trans.

Neural Networks, 6, 724–735.

Jankowski, S., Lozowski, A., & Zurada, J. (1996). Complex-valued multistate neural

associative memory. IEEE Trans. Neural Networks, 7, 1491–1496.

Kohonen, T. (1977). Associative memory: A system-theoretical approach. Heidelberg:

Springer-Verlag.


Kohonen, T. (1988). Self-organization and associative memory. Berlin: Springer-Verlag.

Li, J., Michel, A., & Porod, W. (1989). Analysis and synthesis of a class of neural networks:

Linear systems operating on a closed hypercube. IEEE Trans. Circuits and Systems-I,

36, 1405–1422.

Luenberger, D. (1973). Introduction to linear and nonlinear programming. Reading, MA:

Addison-Wesley.

Mangasarian, O. (1994). Nonlinear programming. Philadelphia: SIAM.

Mano, M. (1991). Digital design. New York: Prentice Hall.

Mertens, S., Koehler, H., & Bos, S. (1991). Learning grey-toned patterns in neural networks.

J. Phys. A, Math. Gen., 24, 4941–4952.

Michel, A., Farrell, J., & Porod, W. (1989). Qualitative analysis of neural networks. IEEE

Trans. Circuits and Systems-I, 36, 229–243.

Michel, A., & Liu, D. (2002). Qualitative analysis and synthesis of recurrent neural

networks. New York: Marcel Dekker.

Michel, A., Si, J., & Yen, G. (1991). Analysis and synthesis of a class of discrete-time

neural networks described on hypercubes. IEEE Trans. Neural Networks, 2, 32–46.

Michel, A. N., & Farrell, J. (1989). Associative memories via artificial neural networks.

IEEE Control Systems Magazine, 10, 1405–1422.

Moon, J., & Moser, L. (1965). On cliques in graphs. Isr. J. Math., 3, 23–28.

Muezzinoglu, M. (2000). A graph theoretical approach to the binary dynamical associative

memory design. M.Sc. Thesis: Istanbul Technical University.

Muezzinoglu, M., & Guzelis, C. (2001). A Boolean Hebb rule for binary associative

memory design. Proc. 44th IEEE Midwest Symposium on Circuits and Systems

(MWSCAS’01), Dayton, OH,, 713–716.

Muezzinoglu, M., & Guzelis, C. (2002). Associative memory design via path embedding

into a graph. Proc. 11th Turkish Symposium on Artificial Intelligence and Neural

Networks (TAINN’2002), Istanbul, Turkey, 65–71.

Muezzinoglu, M., & Guzelis, C. (2003a). A Boolean Hebb rule for binary associative

memory design. IEEE Trans. Neural Networks, -, (to appear).


Muezzinoglu, M., & Guzelis, C. (2003c). Perfect storage of binary patterns in recurrent

multi-layer associative memory. Neural Processing Letters, -, (submitted, in review).

Muezzinoglu, M., & Guzelis, C. (2003b). Perfect storage of binary patterns in recurrent

multilayer associative memory. Proc. 12th Turkish Symposium on Artificial Intelligence

and Neural Networks (TAINN’2003), Canakkale, Turkey, (to appear).

Muezzinoglu, M., Guzelis, C., & Zurada, J. (2003a). Construction of energy landscape

for discrete Hopfield associative memory with guaranteed attractiveness of fixed points.

Proc. 1st International IEEE EMBS Conference on Neural Engineering, Capri, Italy, –.

Muezzinoglu, M., Guzelis, C., & Zurada, J. (2003b). Construction of energy landscape for

discrete Hopfield associative memory. IEEE Trans. Neural Networks, -, (to appear).

Muezzinoglu, M., Guzelis, C., & Zurada, J. (2003c). A new design method for complex-

valued multistate Hopfield associative memory. Proc. International Joint Conference on

Neural Networks (IJCNN’03), Portland, OR, (to appear).

Muezzinoglu, M., Guzelis, C., & Zurada, J. (2003d). A new design method for the complex-

valued multistate Hopfield associative memory. IEEE Trans. Neural Networks, 14, 891–

899.

Nadal, J., & Rau, A. (1991). Storage capacity of potts-perceptron. J. Phys. I France, 1,

1109–1121.

Pardalos, P., & Rodgers, G. (1992). A branch-and-bound algorithm for the maximum clique

problem. Computers Operations Research, 19, 363–375.

Park, J., Kim, Y., Eom, I., & Lee, K. (1993). Economic load dispatch for piecewise quadratic

cost function using Hopfield neural network. IEEE Trans. Power Systems, 8, 1030–1038.

Pekergin, F., Morgul, O., & Guzelis, C. (1999). A saturated linear dynamical network for

approximating maximum clique. IEEE Trans. Circuits and Systems-I, 46, 677–685.

Perzonnas, L., Guyon, I., & Dreyfus, G. (1986). Collective computational properties of

neural networks: New learning mechanism. Phys. Rev. A, 34, 4217–4228.

Pınar, M., & Chen, B. (1999). l(1) solution of linear inequalities. Ima. J. Numer. Anal., 19,

19–37.


Ritz, S., Anderson, J., Silverstein, J., & Jones, R. (1977). Distinctive features, categorical

perception, and probability learning: Some applications of a neural model. Psychological

Review, 84, 413–451.

Rosenblatt, F. (1962). Principles of neurodynamics: Perceptrons and the theory of brain

machines. Washington: Spartan.

Schwarz, S., & Mathis, W. (1992). Cellular neural network design with continuous

signals. Proc. 2nd Int. Workshop on Cellular Neural Networks and their Applications

(CNNA-92), Munich, Germany, 17.

Sengor, N., Cakır, Y., Guzelis, C., Pekergin, F., & Morgul, O. (1999). An analysis of

maximum clique formulations and saturated linear dynamical network. ARI, 268–276.

Shankmukh, K., & Venkatesh, Y. (1995). Generalised scheme for optimal learning in

recurrent neural networks. IEE Proc.-Vis. Image Signal Process., 142, 71–77.

Shrivastava, Y., Dasgupta, S., & Reddy, S. (1992). Guaranteed convergence in a class of

Hopfield networks. IEEE Trans. Neural Networks, 3, 951–961.

Shrivastava, Y., Dasgupta, S., & Reddy, S. (1995). Nonpositive Hopfield networks for

unidirectional error correcting coding. IEEE Trans. Circuits and Systems-I, 42, 293–

306.

Sompolinsky, H., & Kanter, I. (1986). Temporal association in asymmetric neural networks.

Physical Review Letters, 57, 2861–2864.

Sudharsanan, S., & Sundareshan, M. (1991). Equilibrium characterization of dynamical

neural networks and a systematic synthesis procedure for associative memories. IEEE

Trans. Neural Networks, 2, 509–521.

Suter, B., & Kabrisky, M. (1992). On a magnitude preserving iterative maxnet algorithm.

Neural Computation, 4, 224–233.

Tan, S., Hao, J., & Vandewalle, J. (1991). Determination of weights for Hopfield associative

memory by error back propagation. Proc. IEEE Int. Symposium on Circuits and Systems,

5, 2491.

Vavasis, S. (1991). Nonlinear optimization: Complexity issues. New York: Oxford

University Press.


Vidyasagar, M. (1993). Nonlinear systems analysis, second edition. New Jersey: Prentice-

Hall Publications.

Watta, P., & Hassoun, M. (1991). Exact associative neural memory dynamics utilizing

Boolean matrices. IEEE Trans. Neural Networks, 2(4), 437–448.

Xiangwu, M., & Hu, C. (1997). Using evolutionary programming to construct Hopfield

neural networks. Proc. IEEE Int. Conference on Intelligent Processing Systems, 1, 571.

Zurada, J. (1992). Introduction to artificial neural systems. St. Paul: West Pub. Co.

Zurada, J., Cloete, I., & van der Poel, E. (1994). Neural associative memories with multiple

stable states. Proc. 3rd Int. Conf. Fuzzy Logic, Neural Nets, and Soft Computing, Iizuka,

Japan, 45–51.

Zurada, J., Cloete, I., & van der Poel, E. (1996). Generalized Hopfield networks with

multiple stable states. Neurocomputing, 13, 135–149.


APPENDIX

Proof of Theorem 2

"Only if" Part: i) COMP1 is obviously necessary for the compatibility: If COMP1 is

violated, then there exists no i, j such that xi = yj = 1 and xj = yi = 0 for some

x,y ∈ M , so either xi = 1 ⇒ yi = 1 ∀i, or yi = 1 ⇒ xi = 1 ∀i. This means either

that the independent set Sy covers Sx, i.e. Sx ⊆ Sy, or that Sy ⊆ Sx, both violating Case 1.

ii) To prove the necessity of COMP2, let us consider three distinct characteristic vectors

x,y, z ∈ M such that they mutually satisfy COMP1 but violate COMP2. There are two

cases to be analyzed:

I) xj = xk = yi = yk = zi = zj = 1 and xi = yj = zk = 0 without having a w ∈ M such

that wi = wj = wk = 1.

II) xj = xk = yi = yk = zi = zj = 1, xi = yj = zk = 0 and there exists some w ∈ M

such that wi = wj = wk = 1. But, for each of these w vectors wl = 0 for some l with

xl = yl = zl = 1.

Embedding this set of vectors into a graph G = ⟨V, E⟩ will cause an extraneous MIS in both cases. To see that, suppose at the beginning we have a fully connected graph, i.e. E = {(i, j) ∈ V × V}, and embed the vectors one by one. Embedding a binary vector

d ∈ M into G is then equivalent to removing some edges (p, q) ∈ E from the graph when

dp = dq = 1. Consequently, an existing edge (p, q) in G cannot be excluded without

embedding a vector whose p-th and q-th entries are both 1. Now, the vectors x, y, and z considered above remove the edges (j, k), (i, k), and (i, j), respectively. If there exists no w ∈ M such that wi = wj = wk = 1 (as stated in I)), then, after the embedding procedure, the resulting graph will contain an MIS {i, j, k} which is imposed by none of


the embedded vectors, causing the violation mentioned in Case 2. Now assume II), which means xj = xk = xl = yi = yk = yl = zi = zj = zl = wi = wj = wk = 1 and xi = yj = zk = wl = 0 for x, y, z, w ∈ M and no w ∈ M with wi = wj = wk = wl = 1. Such a quadruple results in an extraneous independent set {i, j, k, l} which is necessarily a subset of an extraneous MIS, causing a violation mentioned in Case 2.

"If" Part: The proof proceeds by contradiction.

i) Assume first that compatibility is violated by a pair of distinct characteristic vectors x, y ∈ M in the way stated in Case 1. As a direct consequence of the embedding procedure (3), S_x ⊆ S_y implies x_i = 1 ⇒ y_i = 1 ∀i. This violates COMP1.

ii) We will show that the existence of an extraneous MIS implies that there exists a triple of

characteristic vectors violating COMP2.

Any MIS of cardinality 2 is a pair {i, j} of nodes with no edge between them, satisfying the other conditions needed for being an MIS. In the embedding procedure explained above, the edge (i, j) cannot be removed from the graph without embedding a vector x whose i-th and j-th entries are both unity. The support S_x of such a vector is itself an independent set containing {i, j}, so the maximality of {i, j} forces S_x = {i, j}; this MIS is therefore imposed by a vector from M and cannot be extraneous. Hence, any extraneous MIS must be of cardinality 3 or more.

Consider now the case that the assumed extraneous MIS has cardinality 3, and denote it by S^e_3 = {i, j, k}. Since it is extraneous, S^e_3 is not created by a single vector x in M. By the definition of independent set, a graph containing S^e_3 as an independent set does not contain the edges (j, k), (i, k) and (i, j). If a pair of distinct vectors x, y removes these three edges from the graph, only one possibility needs care: one of the vectors, say x, removes the edge (j, k) and the other, y, removes the edges (i, k) and (i, j). This requires x_j = x_k = 1 and y_i = y_j = y_k = 1. But then S^e_3 is indeed imposed by y, hence not extraneous.

By the above analysis, it becomes clear that any extraneous MIS must be of cardinality 3 or more, and that an extraneous MIS of cardinality 3 cannot be created by a single vector or by a pair of characteristic vectors. In fact, as described in the "only if" part, a triple of vectors violating COMP2 while satisfying COMP1 creates an extraneous MIS of cardinality 3.


Moreover, this is the only way a triple can create an extraneous MIS of cardinality 3, or even just an extraneous independent set of cardinality 3. (The other possibilities can be eliminated by arguments similar to those used above for pairs of vectors.)

What remains to be proven is that an extraneous MIS S^e_{≥3} of cardinality not less than 3 can only be created by some triple of vectors violating COMP2 while mutually satisfying COMP1. Let X be the set of S^e_{≥3}-nonredundant vectors responsible for the existence of S^e_{≥3}, i.e. each vector in X removes at least one edge between two nodes both in S^e_{≥3}, and no two vectors in X remove the same set of S^e_{≥3}-related edges. For any vector x ∈ X, define the index sets I^0_x = {i ∈ S^e_{≥3} : x_i = 0} and I^1_x = {i ∈ S^e_{≥3} : x_i = 1}, and observe that I^0_x ≠ ∅ and |I^1_x| ≥ 2. Also note that x removes no edge (i, j) with i, j ∈ I^0_x, while it contributes to the existence of S^e_{≥3} by removing some edges (j, k) with j, k ∈ I^1_x.

Let L^0_x be any strict subset of I^0_x. Then the set Σ^e_{≥3} = S^e_{≥3} − L^0_x is also extraneous. To see this, suppose Σ^e_{≥3} is not extraneous while S^e_{≥3} is. Then there should exist a vector u ∈ M which directly imposes Σ^e_{≥3} in the resulting graph. By the definition of L^0_x, x then contributes only to the extraneousness of Σ^e_{≥3}, which is assumed to be non-extraneous; this contradicts x ∈ X. Hence the extraneousness of S^e_{≥3} implies the extraneousness of Σ^e_{≥3}.

Now, consider a specific x ∈ X and choose an L^0_x with |L^0_x| = |I^0_x| − 1. Let us extract the indices belonging to L^0_x from each vector in X to obtain a new set X of Σ^e_{≥3}-nonredundant vectors which creates Σ^e_{≥3}. Note that X contains at least three vectors. Let ξ denote the reduced form of x, and observe that I^1_ξ = I^1_x while I^0_ξ consists of a single element, say i. The connections of node i to the other nodes whose indices are in Σ^e_{≥3} − {i} = I^1_ξ must be removed by other vectors in X. For such a vector η ∈ X, we should have η_i = 1. On the other hand, there should exist a j ∈ Σ^e_{≥3} with η_j = 0. Then there is a third vector ζ ∈ X which removes the edge (i, j), so ζ_i = ζ_j = 1. There necessarily exists an index k ∈ Σ^e_{≥3} such that ζ_k = 0 while η_k = 1; otherwise η_k = 0 whenever ζ_k = 0, contradicting the Σ^e_{≥3}-nonredundancy of η and ζ.

This means that there exists a triple x, y, z ∈ M, the originals of ξ, η, ζ, having the pattern mentioned in COMP2. The independent set {i, j, k} created by x, y and z is either extraneous, in which case there exists no w ∈ M such that w_i = w_j = w_k = 1 and COMP2 is violated; or it is not extraneous. Assume now that all such triples {i_t, j_t, k_t}_t are not extraneous.


This means that for each t there exists a w^t ∈ M such that w^t_{i_t} = w^t_{j_t} = w^t_{k_t} = 1. Then the implication x_l = y_l = z_l = 1 ⇒ w^t_l = 1 ∀l, t contradicts the extraneousness of S^e_{≥3}. So this implication must be violated for some l, which is equivalent to saying that there exists a triple of vectors violating COMP2.
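As a small check of the Case I) construction from the "only if" part, the three vectors below realize the COMP2 pattern at (i, j, k) = (0, 1, 2) with no vector in the set covering all three positions. Running the sketch given earlier in this appendix reports the single maximal independent set {0, 1, 2}, which is imposed by none of the vectors and is therefore extraneous. (The numerical values are an illustrative example, not taken from the thesis.)

M = [(0, 1, 1), (1, 0, 1), (1, 1, 0)]   # x, y, z: the Case I) pattern at (i, j, k) = (0, 1, 2)
print(extraneous_mis(M, 3))             # -> [{0, 1, 2}]: an extraneous MIS of cardinality 3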


NOTES