X Simpósio Brasileiro de Arquitetura de Computadores

An Implementation of a Hopfield Network Kernel on EARTH

José N. Amaral, Guang Gao, Xinan Tang

Abstract

EARTH is a multithreaded program execution and architecture model that hides communication and synchronization latencies through fine-grain multithreading. EARTH provides a simple synchronization mechanism: a thread is spawned when a pre-specified number of synchronization signals are received in its synchronization slot, signaling that all dependences required for its execution are satisfied. This simple synchronization mechanism is an essential primitive in Threaded-C, a multithreaded language designed to program applications on EARTH. The EARTH synchronization mechanism has been efficiently implemented on a number of computer platforms and has played an essential role in the support of a large number of parallel applications on EARTH.

An interesting open question has been: is such a simple mechanism sufficient to satisfy the synchronization needs of the large set of applications that EARTH can implement? Or could the EARTH programming model benefit from the implementation of more elaborate synchronization mechanisms? In that case, what are the benefits and tradeoffs of adding these mechanisms to EARTH?

This paper describes the implementation of I-structures under the EARTH execution and architecture model. An I-structure is a data structure that allows for the implementation of a lenient computation model: a read operation can be issued to an element of an I-structure before it is known that the corresponding write operation has produced the value. We also introduce a new parallel kernel based on the Hopfield Network and demonstrate how the I-structure support on EARTH can be utilized. Finally, we present a complete Threaded-C program that solves the Hopfield kernel.

Keywords

Multithreaded Programming, EARTH, I-structures, Hopfield Network.

I. INTRODUCTION

The EARTH multithreaded architecture was designed to effectively hide communication and synchronization latency and thus support scalable parallel applications. One of the advantages of the EARTH model is that it can be efficiently implemented using off-the-shelf commercial processors and components [16], [17], [18]. The EARTH architecture and program execution model was first implemented on the MANNA machine [6]. It has since been successfully ported to parallel machines such as the IBM SP-2 [7] and to Beowulf, a network of affordable computers running the Linux operating system [21], [22].

Threaded-C is the language used to program the EARTH architecture on the different platforms. Threaded-C implements support for EARTH multithreaded programming through a set of extensions to the standard C language. Threaded-C offers explicit support for multithreaded operations, such as thread creation, synchronization, and communication. Programmers use these primitives to access the underlying EARTH multithreaded features. A complete reference to the Threaded-C language can be found in [23].

Computer Architecture and Parallel Systems Laboratory, University of Delaware, Newark, DE, USA. http://www.capsl.udel.edu, emails: amaral@capsl.udel.edu, ggao@capsl.udel.edu, and tang@capsl.udel.edu

The I-structure is a non-strict data structure proposed as an extension to the functional language Id by Arvind and his colleagues [3]. This data structure can be used as a synchronization mechanism to support producer-consumer computation. Because an I-structure is able to queue read operations when they arrive before the corresponding write operation, a read operation will return the expected value even when it is issued before the write has been performed.

Threaded-C does not provide direct support for I-structures. In this paper we describe the implementation of a library of functions that delivers the functionality of I-structures in Threaded-C. We describe a parallel programming kernel based on a Hopfield Network and present our implementation of this kernel in Threaded-C using I-structures. This implementation demonstrates how I-structures make it easier for the programmer to synchronize readers and writers.

We include a brief description of the architecture in Section II; a good description of the Threaded-C language can be found in [23]. We present a brief description of I-structures in Section III and present our implementation of I-structures in Threaded-C in Section IV. Section V describes a parallel kernel based on Hopfield Networks, and Section VI describes our implementation of this kernel in Threaded-C using I-structures. In Section VII we discuss related work.

II. THE EARTH ARCHITECTURE

In the EARTH programming model, threads are sequences of instructions belonging to an enclosing function. Threads always run to completion; they are non-preemptive. Synchronization mechanisms are used to determine when threads become executable (or ready). Although it is possible to spawn a thread explicitly, in most cases a thread starts executing when a specified synchronization slot counter reaches zero. A synchronization slot counter is decremented each time a synchronization signal is received. In a typical program, such a signal is received when some data becomes available. Besides the counter, a synchronization slot holds the identification number, or thread id, of the thread that is to be started when the counter reaches zero. This mechanism permits the implementation of dataflow-like firing rules for threads (a thread is enabled as soon as all the data it will use is available).
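The counter-based firing rule described above can be sketched in plain C. This is a minimal sequential model under stated assumptions, not the EARTH runtime: the names `sync_slot`, `slot_init`, and `slot_signal` are hypothetical, and the reset count (the value restored after the thread fires) follows the "initial count and reset count" annotation mentioned later for Figure 3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a synchronization slot (illustrative names only). */
typedef struct {
    int count;        /* signals still needed before the thread is enabled */
    int reset_count;  /* value restored to count after the thread is enabled */
    int thread_id;    /* thread to start when count reaches zero */
} sync_slot;

void slot_init(sync_slot *s, int init_count, int reset_count, int thread_id)
{
    assert(s != NULL && init_count > 0 && reset_count > 0);
    s->count = init_count;
    s->reset_count = reset_count;
    s->thread_id = thread_id;
}

/* Deliver one synchronization signal.
 * Returns the id of the thread that becomes ready, or -1 if the slot
 * is still waiting for more signals. */
int slot_signal(sync_slot *s)
{
    assert(s != NULL);
    s->count--;
    if (s->count == 0) {
        s->count = s->reset_count;  /* re-arm the slot for the next firing */
        return s->thread_id;
    }
    return -1;
}
```

In this model a thread that consumes two remote values would use an initial count of 2, so it is enabled only after both values have arrived.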

An EARTH computer consists of a set of EARTH nodes connected by a communications network.¹ Each EARTH node has an Execution Unit (EU) and a Synchronization Unit (SU) linked to each other by queues (see Figure 1). The EU executes active threads, and the SU handles the synchronization and scheduling of threads and communication with remote processors.

Fig. 1. EARTH architecture

This division allows the implementation of multithreading architectures with off-the-shelf microprocessors mass-produced for uniprocessor workstations [18]. The EU is expected to be a conventional microprocessor executing threads sequentially.² The SU performs specialized tasks and is relatively simple compared to the EU. Thus, the SU can be implemented in a small ASIC chip. The two queues connecting the EU and SU may be in separate hardware or may be part of the EU and/or SU.

¹ The EARTH model does not specify the network's topology.

² More precisely, the EU executes threads according to their sequential semantics. Naturally, such a processor could take advantage of conventional techniques for speeding up sequential threads, such as out-of-order execution and branch prediction.

The function of the queues shown in Figure 1 is to buffer the communication between the EU and SU. The ready queue, written by the SU and read by the EU, contains a set of threads that are ready to be executed. The EU fetches a thread from the ready queue whenever the EU is ready to begin executing a new thread. The event queue, written by the EU and read by the SU, contains requests for synchronization events and remote memory accesses generated by the EU. The SU reads and processes these requests as fast as it is able. Requests from the EU for remote data can go directly to the network or go through the local SU; implementation constraints will determine the best mechanism, so this is not defined in the model.

To ensure flexibility, the EARTH model does not specify a particular instruction set. Instead, ordinary arithmetic and memory operations use whatever instructions are native to the processor(s) serving as the EU. The EARTH model specifies a set of EARTH operations for synchronization and communication. These operations are mapped to native EU instructions according to the needs of the specific architecture. For instance, on a machine with ASIC SU chips, the EU EARTH instructions would most likely be converted to loads and stores from/to memory-mapped addresses, which would be recognized and intercepted by the SU hardware.

To maximize portability, the EARTH model makes minimal assumptions about memory addressing and sharing. An EARTH multiprocessor is assumed to be a distributed memory machine in which the local memories combine to form a global address space. Any node can specify any address in this global space. However, a node cannot read or write a non-local address directly. Remote addresses are accessed with special EARTH operations for remote access. A remote load is a split-phase transaction with two phases: issuing the operation and using the value returned. The second phase is performed in another thread, after the load has completed.


III. I-STRUCTURES

I-structures are data structures introduced by Arvind and his collaborators in the context of the functional programming language Id [3]. The most salient advantage of an I-structure is that there is no need for synchronization between reads and writes at the time they are issued. An I-structure is considered to be an array of elements³, where each element of the array can be in one of three states: empty, initialized, and suspended. Each element of the array can be written only once, but it can be read many times. Right after allocation all the elements of the array are in the empty state. Conceptually, if a read occurs before the write, the element goes into the suspended state and the read operation is kept in a local queue. Subsequent reads are also queued. When a write occurs, if the element being written is in the empty state, the value is written in the array and the element goes into the initialized state. If the element was in the suspended state, all the reads that were queued for that element are serviced before the write operation is complete, and the element goes into the initialized state. A read to an initialized element returns immediately with the value previously written. A write to an element that is in the initialized state is considered a fatal error and causes the program to terminate.
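The per-element behavior described above can be sketched as a small state machine in plain C. This is a sequential model under stated assumptions, not the library's code: the names `is_element`, `is_read`, and `is_write` are hypothetical, a deferred read is represented only by the address where its value must be delivered, and a second write is reported through a return code instead of terminating the program.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { IS_EMPTY, IS_SUSPENDED, IS_INITIALIZED } is_state;

/* A deferred read: only the destination of the value is remembered. */
typedef struct is_reader {
    int *place;
    struct is_reader *next;
} is_reader;

typedef struct {
    is_state state;
    int value;
    is_reader *pending;   /* reads that arrived before the write */
} is_element;

void is_elem_init(is_element *e)
{
    assert(e != NULL);
    e->state = IS_EMPTY;
    e->value = 0;
    e->pending = NULL;
}

/* Returns 1 if the value was delivered immediately, 0 if the read was
 * deferred, and -1 if the deferred read could not be recorded. */
int is_read(is_element *e, int *place)
{
    assert(e != NULL && place != NULL);
    if (e->state == IS_INITIALIZED) {
        *place = e->value;
        return 1;
    }
    is_reader *r = malloc(sizeof *r);
    if (r == NULL)
        return -1;
    r->place = place;
    r->next = e->pending;   /* service order is not significant in this model */
    e->pending = r;
    e->state = IS_SUSPENDED;
    return 0;
}

/* Returns 0 on success, -1 on a write to an initialized element
 * (the case the text treats as a fatal error). */
int is_write(is_element *e, int value)
{
    assert(e != NULL);
    if (e->state == IS_INITIALIZED)
        return -1;
    e->value = value;
    while (e->pending != NULL) {      /* service every deferred read */
        is_reader *r = e->pending;
        e->pending = r->next;
        *r->place = value;
        free(r);
    }
    e->state = IS_INITIALIZED;
    return 0;
}
```

In a multithreaded setting each deferred read would also carry a synchronization slot to signal once the value is delivered; that part is left out of this sketch.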

In this document we present a set of functions that implement the functionality of I-structures in Portable Threaded-C (PTC). Functional language environments have a mechanism, called a garbage collector, that is responsible for reclaiming the memory previously allocated for I-structures that are no longer needed. In languages such as Threaded-C this mechanism is not available. Therefore we need to introduce two new operations that were not part of the original I-structure proposal: delete and reset.

Observe that for the proper functioning of an I-structure, the read and write operations must be atomic. This implementation of I-structures in PTC running on the existing EARTH platforms derives atomicity from two assumptions: threads are non-preemptive, and only a single thread can run on a node at a time. The first condition is inherent to the EARTH model; the second condition might not be valid in future implementations (in SMP clusters, for example). However, both conditions are true for all the current implementations of EARTH systems.

IV. IMPLEMENTING I-STRUCTURES IN THREADED-C

This document describes the support for I-structures implemented in Portable Threaded-C through a set of library functions. In this implementation an I-structure can be allocated from the heap and, when it is no longer in use, returned to the heap. The library supports five operations on an I-structure: allocate, read, write, reset, and delete.

The implementation of I-structures described in this document implements the state transition diagram displayed in Figure 2. The states in this figure represent the state of individual elements within an I-structure. The operations in the state transition diagram can be separated into two groups: operations that cause all the elements of the array to change state (allocate, reset, and delete), and operations that cause a single element of the array to change state (read and write).

Fig. 2. State Transition Diagram for the I-Structure Implementation

³ I-structures were defined as arrays of elements in the seminal work of Arvind, Nikhil, and Pingali [3]. However, nothing prevents the implementation of a single-element I-structure, or of other data structure organizations.

We consider that prior to its allocation an I-structure is in the memory heap. When an I-structure is allocated, all its elements are placed in the empty state. As read and write operations are performed, individual elements can then be moved to the suspended or initialized state. If any element goes into the FATAL ERROR state, the program emits an error message and terminates. Only three conditions cause a fatal error in this implementation:

• a write to an array element that has already been initialized;

• a delete or a reset of an I-structure that contains at least one element in the suspended state;

• a delete or a reset of an I-structure that is in the heap (was never allocated or has already been deleted).
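The second and third fatal-error conditions above concern the whole-array operations. The following plain C sketch shows one way such checks could look; it is an illustration under stated assumptions, not the library's code. The type `istruct`, the status codes, and the function names are hypothetical, and errors are returned to the caller instead of terminating the program.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { EL_EMPTY, EL_SUSPENDED, EL_INITIALIZED } el_state;

typedef enum {
    IS_OK = 0,
    IS_ERR_NOT_ALLOCATED,   /* structure is still "in the heap" */
    IS_ERR_SUSPENDED,       /* at least one element has pending reads */
    IS_ERR_NO_MEMORY
} is_status;

typedef struct {
    int allocated;          /* 0 before allocation and after delete */
    int length;
    el_state *state;        /* one state per element */
} istruct;

is_status istruct_allocate(istruct *s, int length)
{
    assert(s != NULL && length > 0);
    s->state = malloc((size_t)length * sizeof *s->state);
    if (s->state == NULL) {
        s->allocated = 0;
        s->length = 0;
        return IS_ERR_NO_MEMORY;
    }
    for (int i = 0; i < length; i++)
        s->state[i] = EL_EMPTY;      /* all elements start out empty */
    s->length = length;
    s->allocated = 1;
    return IS_OK;
}

/* Shared precondition of reset and delete. */
static is_status istruct_check(const istruct *s)
{
    if (!s->allocated)
        return IS_ERR_NOT_ALLOCATED;
    for (int i = 0; i < s->length; i++)
        if (s->state[i] == EL_SUSPENDED)
            return IS_ERR_SUSPENDED;
    return IS_OK;
}

is_status istruct_reset(istruct *s)
{
    assert(s != NULL);
    is_status st = istruct_check(s);
    if (st != IS_OK)
        return st;
    for (int i = 0; i < s->length; i++)
        s->state[i] = EL_EMPTY;      /* every element returns to empty */
    return IS_OK;
}

is_status istruct_delete(istruct *s)
{
    assert(s != NULL);
    is_status st = istruct_check(s);
    if (st != IS_OK)
        return st;
    free(s->state);                  /* return the storage to the heap */
    s->state = NULL;
    s->length = 0;
    s->allocated = 0;
    return IS_OK;
}
```

Refusing to reset or delete while reads are pending avoids silently dropping a reader that would otherwise wait forever for its value.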

Observe that there are other situations that will cause errors but that are not checked in this implementation, such as a write or a read to a non-allocated I-structure, or a write or a read outside the bounds of the allocated I-structure. The functions that implement I-structures in Threaded-C are listed in Table I.

V. THE HOPFIELD KERNEL

In this section we introduce a kernel, based on the Hopfield Network, to illustrate the utilization of the I-structures presented in this document. The motivation for introducing this kernel is to show how I-structures provide simpler user-defined synchronization in multithreaded programming.

THREADED I_INIT(SPTR slot_adr)

THREADED I_ALLOCATE(int array_length, void *GLOBAL *GLOBAL place, SPTR slot_adr)

THREADED I_READ_x(void *GLOBAL array, int index, int *GLOBAL place, SPTR slot_adr)

THREADED I_READ_BLOCK(void *GLOBAL g_array, int index, long block_size, void *GLOBAL place, SPTR slot_adr)

THREADED I_WRITE_x(void *GLOBAL array, int index, T value)

THREADED I_WRITE_BLOCK_SYNC(void *GLOBAL array, int index, long block_size, void *GLOBAL origin, SPTR slot_adr)

THREADED I_DELETE(void *GLOBAL array)

THREADED I_DELETE_BLOCK(void *GLOBAL array)

THREADED I_RESET(void *GLOBAL array)

THREADED I_RESET_BLOCK(void *GLOBAL array)

TABLE I

LIBRARY OF FUNCTIONS THAT IMPLEMENT I-STRUCTURES IN THREADED-C.

The Hopfield Network is a recurrent neural network that is often used in combinatorial optimization problems and as an associative memory [11]. In both cases the network is formed by a set of neurons that are connected by synapses⁴. Every neuron is connected to every other neuron in the network. For the kernel that we introduce in this section we consider only the situation in which the network is placed in a given initial unstable state and is allowed to move to a stable state. In this case the values of the synapses are fixed, and the only values that change are the activation levels of the neurons. In this recollection mode, the value of the output of each neuron at time k + 1 is given by the following equation.

$$ S_i(k+1) = \mathrm{sgn}\left( \sum_{j} w_{ij}\, S_j(k) \right) \qquad (1) $$

Here w_ij is the weight of the synapse connecting neuron i to neuron j, S_i(k) is the output of neuron i at time k, and sgn() is the sign function, which evaluates to +1 if its argument is positive and to -1 if its argument is negative. In order to update its activation level, a neuron computes the sum of the products of each one of its synapses and the output level of the corresponding neuron.
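The update of a single neuron described above can be sketched in plain C. This is a minimal illustration, not the Threaded-C kernel: the function names `sgn` and `update_neuron` and the row layout of the weights are assumptions, and because the text does not define sgn() at zero, this sketch maps a zero sum to +1.

```c
#include <assert.h>
#include <stddef.h>

/* Sign function; mapping zero to +1 is an assumption of this sketch. */
int sgn(float x)
{
    return (x < 0.0f) ? -1 : +1;
}

/* New output of one neuron.
 * w_row[j] is the weight of the synapse between this neuron and neuron j,
 * s[j] is the output (+1 or -1) of neuron j at time k, n is the number of
 * neurons. The self-weight is expected to be zero. */
int update_neuron(const float *w_row, const int *s, size_t n)
{
    assert(w_row != NULL && s != NULL);
    float sum = 0.0f;
    for (size_t j = 0; j < n; j++)
        sum += w_row[j] * (float)s[j];   /* weighted sum of neighbor outputs */
    return sgn(sum);
}
```

For example, with weights {0, 1, -2} and neighbor outputs {1, 1, 1} the weighted sum is -1, so the neuron's new output is -1.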

VI. IMPLEMENTING THE HOPFIELD KERNEL IN THREADED-C


Figure 3 presents the structure of our implementation of the Hopfield network in Threaded-C using I-structures.⁶ The program has three functions: MAIN with six threads, activation_update with two threads, and compute_change with a single thread. In this section we introduce the code and explain each piece of the program. Observe in the figure that most of the computation is spent in the loop formed by threads 3 and 4 of the MAIN function and by the functions activation_update and compute_change. Threads 0, 1, and 2 of the MAIN function are necessary for initialization, and thread 5 prints the final results. In Figure 3 each thread is annotated with the initial count and the reset count of the sync slot that causes the thread to be spawned.

Fig. 3. Synchronization Structure of the Hopfield Kernel Implementation

⁴ In this paper we discuss a Hopfield kernel suitable for the implementation of an associative memory. With few modifications a similar kernel for the resolution of combinatorial optimization problems can be implemented.

Because of space limitations, we provide neither the I-structure implementation nor the Hopfield network kernel code in this paper. Interested readers can fetch the files with the code from http://www.capsl.udel.edu/DOCUMENTS/i-struct.tar.gz.

In Threaded-C each node has its own copy of a global scoped variable. Our implementation uses global scoped variables to store the values of the synapse weights as arrays of floats. Each EARTH node is in charge of computing the new activation level of one neuron. An invocation of the function I_INIT() must precede the invocation of any other I-structure function on that node. We use two I-structures in this implementation, one to contain the activation values of the previous iteration, and another to contain the new activation values. At the end of each iteration, pointer swapping is used to make the new values the old ones, avoiding expensive copying. The use of I-structures relieves the programmer of some complex synchronization responsibilities.

⁶ A complete description of the Threaded-C language can be found in [23], and the set of functions that implement the I-structure mechanism is described in [2].

At the end of each iteration the amount of change in the state of the network is computed in node 0. When this change is below an established threshold, we consider that the network has converged to a stable state.
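The iteration scheme described above (two activation buffers, a pointer swap at the end of each iteration, and a convergence test against a threshold) can be sketched sequentially in plain C. This is not the distributed Threaded-C program: the names `compute_change` and `run_to_stable` are reused or invented for illustration, plain arrays stand in for the two I-structures, the change is measured as the number of neurons whose output flipped (the paper does not state its measure), and a `max_iters` bound is added so the loop always terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Number of neurons whose output differs between two states
 * (an assumed measure of change). */
size_t compute_change(const int *old_s, const int *new_s, size_t n)
{
    assert(old_s != NULL && new_s != NULL);
    size_t changed = 0;
    for (size_t i = 0; i < n; i++)
        if (old_s[i] != new_s[i])
            changed++;
    return changed;
}

/* Iterates until the change is below threshold or max_iters is reached.
 * w is an n-by-n weight matrix in row-major order; buf_a holds the initial
 * state and buf_b is scratch space. On return *result points to the buffer
 * holding the final state. Returns the number of iterations performed. */
size_t run_to_stable(const float *w, int *buf_a, int *buf_b, size_t n,
                     size_t threshold, size_t max_iters, int **result)
{
    assert(w != NULL && buf_a != NULL && buf_b != NULL && result != NULL);
    int *old_s = buf_a;
    int *new_s = buf_b;
    size_t iters = 0;
    while (iters < max_iters) {
        for (size_t i = 0; i < n; i++) {
            float sum = 0.0f;
            for (size_t j = 0; j < n; j++)
                sum += w[i * n + j] * (float)old_s[j];
            new_s[i] = (sum < 0.0f) ? -1 : 1;   /* zero maps to +1 (assumption) */
        }
        iters++;
        size_t change = compute_change(old_s, new_s, n);
        int *tmp = old_s;                       /* swap pointers, no copying */
        old_s = new_s;
        new_s = tmp;
        if (change < threshold)
            break;
    }
    *result = old_s;
    return iters;
}
```

Swapping the two pointers keeps the cost of moving to the next iteration constant, regardless of the number of neurons.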

VII. RELATED WORK

The Hopfield network was introduced by John Hopfield in a seminal 1982 paper [12]. Its application as an associative memory, as well as to find solutions for combinatorial optimization problems, has since been widely studied [14], [13].

EARTH is a fine-grain multithreaded architecture originally developed to be implemented on a distributed memory platform. Research is currently being conducted at the University of Delaware to develop an implementation of the EARTH architecture on a cluster of shared memory machines. Another important fine-grain multithreaded architecture is the MIT Cilk architecture [5], [10]. In Cilk the distribution of threads among distinct processors is performed through a mechanism called work stealing: a new thread is always spawned on the local processor. Whenever a processor runs out of work, it tries to steal threads from processors that have more threads than they can process. To the best of our knowledge, non-strict data structures, such as I-structures, have not been implemented in Cilk.

After the original proposal of I-structures, Arvind and his collaborators proposed M-structures [4]. The main difference between an M-structure and an I-structure is that in an M-structure a read operation, which is called take, resets the location to the empty state. When a write operation, called put, is performed on a location that has more than one take waiting, only one of the takes is served and the location remains empty. The advantage of an M-structure is that it allows for multiple writes to the same location. Its disadvantage is that it allows only a single read for each value written. M-structures can also be implemented as a library of functions in Threaded-C, in a similar fashion to the implementation of I-structures described in this paper. I-structures are included as instructions in the pH language [1], a parallel dialect of Haskell [15].
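For contrast with I-structures, the take and put behavior of an M-structure location described above can be sketched in plain C. This is a sequential illustration under stated assumptions, not code from [4] or from the Threaded-C library: the names `m_cell`, `m_take`, and `m_put` are hypothetical, waiting takes are served in arrival order, and a put on a location that is already full is reported as an error, a choice the text above does not specify.

```c
#include <assert.h>
#include <stdlib.h>

/* A waiting take: remembers where the value must be delivered. */
typedef struct m_taker {
    int *place;
    struct m_taker *next;
} m_taker;

typedef struct {
    int full;        /* 1 while the location holds a value */
    int value;
    m_taker *head;   /* oldest waiting take */
    m_taker *tail;   /* newest waiting take */
} m_cell;

void m_init(m_cell *c)
{
    assert(c != NULL);
    c->full = 0;
    c->value = 0;
    c->head = NULL;
    c->tail = NULL;
}

/* Returns 1 if a value was taken immediately (the location becomes empty),
 * 0 if the take was queued, and -1 if it could not be queued. */
int m_take(m_cell *c, int *place)
{
    assert(c != NULL && place != NULL);
    if (c->full) {
        *place = c->value;
        c->full = 0;             /* a take resets the location to empty */
        return 1;
    }
    m_taker *t = malloc(sizeof *t);
    if (t == NULL)
        return -1;
    t->place = place;
    t->next = NULL;
    if (c->tail != NULL)
        c->tail->next = t;
    else
        c->head = t;
    c->tail = t;
    return 0;
}

/* Returns 0 on success, -1 if the location is already full
 * (treating this as an error is an assumption of this sketch). */
int m_put(m_cell *c, int value)
{
    assert(c != NULL);
    if (c->head != NULL) {       /* serve exactly one waiting take */
        m_taker *t = c->head;
        c->head = t->next;
        if (c->head == NULL)
            c->tail = NULL;
        *t->place = value;
        free(t);
        return 0;                /* the location remains empty */
    }
    if (c->full)
        return -1;
    c->value = value;
    c->full = 1;
    return 0;
}
```

Unlike the I-structure write, which services every queued read, each put here satisfies at most one take, so every value written is consumed exactly once.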

An interesting research question is whether and how data structures such as I-structures and M-structures could benefit from caching. Dennis and Gao propose four possible approaches to implement a defer queue where read operations that cannot be immediately serviced can be stored [8], [9]. They choose to store in memory a list of identifiers of the processor nodes that have one or more pending reads for a memory location. Each processor itself holds a list of continuations for the requested read operations.

VIII. CONCLUSION

In this paper we proposed a new parallel kernel based on the Hopfield network and described the implementation of I-structures as a library of functions in Threaded-C. The EARTH architecture has been implemented on different platforms, including SP-2, MANNA, and Beowulf [6], [7], [20]. On each of these platforms the ratio between communication costs and processing costs is different. At the time of the submission of this paper we are working on experiments to investigate how these different ratios affect the performance of our implementation of the Hopfield kernel on EARTH using I-structures. We are also working on an implementation of the Hopfield kernel that does not use I-structures, in order to study the effect of using I-structures for synchronization. We intend to present the numerical results of our studies at the conference in September.

ACKNOWLEDGMENTS

The authors acknowledge the support from DARPA, NSA, NSF (MIPS-9707125), and NASA. Section II is due to Kevin Theobald. The authors would also like to thank all the members of the HTMT team in Delaware (Thomas Geiger, Gerd Heber, Cheng Li, Andres Marquez, Kevin Theobald, Parimala Thulasiram, and Ruppa Thulasiram) for the productive discussions on the implementation of I-structures in our weekly meetings. Special thanks to Elif Albuz for carefully reading the first draft and to Gerd Heber for his patience and time helping the authors figure out some of the initial bugs in the program.

REFERENCES

[1] S. Aditya, Arvind, and J.-W. Maessen. Semantics of pH: A parallel dialect of Haskell. Technical Report CSG-Memo-369, Massachusetts Institute of Technology, Cambridge, MA, June 1995. Published at FPCA'95 Conference. (http://csg-www.lcs.mit.edu:8001/cgi-bin/search.pl?author=arvind).

[2] J. N. Amaral. Implementation of I-structures as a library of functions in Portable Threaded-C. Technical Report CAPSL TN04, University of Delaware, Newark, DE, June 1998.

[3] Arvind, Rishiyur S. Nikhil, and Keshav K. Pingali. I-structures: Data structures for parallel computing. ACM TOPLAS, 11(4):598-632, October 1989.

[4] P. S. Barth, R. S. Nikhil, and Arvind. M-structures: Extending a parallel, non-strict, functional language with state. Technical Report CSG-327, Massachusetts Institute of Technology, Cambridge, MA, March 1991. Also appears in Proceedings of Functional Programming and Computer Architecture, Cambridge, MA, August 1991. (http://csg-www.lcs.mit.edu:8001/cgi-bin/search.pl?author=arvind).

[5] Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. Cilk: An efficient multithreaded runtime system. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming, pages 207-216, Santa Barbara, California, July 19-21, 1995. SIGPLAN Notices, 30(8), August 1995.

[6] U. Bruening, W. K. Giloi, and W. Schroeder-Preikschat. Latency hiding in message-passing architectures. In Proceedings of the 8th International Parallel Processing Symposium [19], pages 704-709.

[7] Haiying Cai. Dynamic load balancing on the EARTH-SP system. Master's thesis, McGill University, Montréal, Québec, May 1997.

[8] J. B. Dennis and G. R. Gao. Memory models and cache management for a multithreaded program execution model. Technical Report CSG-363, Massachusetts Institute of Technology, Cambridge, MA, October 1994.

[9] Jack B. Dennis and Guang R. Gao. On memory models and cache management for shared-memory multiprocessors. ACAPS Technical Memo 90, School of Computer Science, McGill University, Montréal, Québec, December 1994. In ftp://ftp-acaps.cs.mcgill.ca/pub/doc/memos.

[10] M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation, Montreal, Canada, June 1998.

[11] Simon Haykin. Neural Networks: A Comprehensive Foundation. Macmillan College Publishing Company, New York, NY, 1994.

[12] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings National Academy of Science, 79:2554-2558, Apr. 1982.

[13] J. J. Hopfield and D. W. Tank. Neural computation of decisions in optimization problems. Biological Cybernetics, 52:141-152, 1985.

[14] J. J. Hopfield and D. W. Tank. Simple neural optimization networks: An A/D converter, signal decision circuit, and a linear programming circuit. IEEE Transactions on Circuits and Systems, CAS-33(6):533-541, May 1986.


[15] P. Hudak, S. P. Jones, and P. Wadler, editors. Report on the Programming Language Haskell: A Non-strict Purely Functional Language, volume 27 of ACM SIGPLAN Notices, May 1992. Version 1.2.

[16] Herbert H. J. Hum, Olivier Maquelin, Kevin B. Theobald, Xinmin Tian, Guang R. Gao, and Laurie J. Hendren. A study of the EARTH-MANNA multithreaded system. International Journal of Parallel Programming, 24(4):319-347, August 1996.

[17] Herbert H. J. Hum, Olivier Maquelin, Kevin B. Theobald, Xinmin Tian, Xinan Tang, Guang R. Gao, Phil Cupryk, Nasser Elmasri, Laurie J. Hendren, Alberto Jimenez, Shoba Krishnan, Andres Marquez, Shamir Merali, Shashank S. Nemawarkar, Prakash Panangaden, Xun Xue, and Yingchun Zhu. A design study of the EARTH multiprocessor. In Lubomir Bic, Wim Böhm, Paraskevas Evripidou, and Jean-Luc Gaudiot, editors, Proceedings of the IFIP WG 10.3 Working Conference on Parallel Architectures and Compilation Techniques, PACT '95, pages 59-68, Limassol, Cyprus, June 27-29, 1995. ACM Press.

[18] Herbert H. J. Hum, Kevin B. Theobald, and Guang R. Gao. Building multithreaded architectures with off-the-shelf microprocessors. In Proceedings of the 8th International Parallel Processing Symposium [19], pages 288-294.

[19] IEEE Computer Society. Proceedings of the 8th International Parallel Processing Symposium, Cancún, Mexico, April 26-29, 1994.

[20] Olivier Maquelin, Guang R. Gao, Herbert H. J. Hum, Kevin B. Theobald, and Xin-Min Tian. Polling Watchdog: Combining polling and interrupts for efficient message handling. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 178-188, Philadelphia, Pennsylvania, May 22-24, 1996. ACM SIGARCH and IEEE Computer Society. Computer Architecture News, 24(2), May 1996.

[21] D. Ridge, D. Becker, P. Merkey, and T. Sterling. Beowulf: Harnessing the power of parallelism in a pile-of-PCs. In Proceedings of IEEE Aerospace, 1997. http://cesdis.gsfc.nasa.gov/beowulf/papers/papers.html.

[22] T. Sterling, D. J. Becker, D. Savarese, M. Berry, and C. Res. Achieving a balanced low-cost architecture for mass storage management through multiple fast ethernet channels on the Beowulf parallel workstation. In Proceedings of the International Parallel Processing Symposium, 1996. http://cesdis.gsfc.nasa.gov/beowulf/papers/papers.html.

[23] Kevin B. Theobald, José Nelson Amaral, Gerd Heber, Olivier Maquelin, Xinan Tang, and Guang R. Gao. Overview of the Portable Threaded-C language. CAPSL Technical Memo 19, University of Delaware, http://www.capsl.udel.edu, April 1998.
