Parallel Combinatorial Optimisation for Finding Ground
States of Ising Spin Glasses
Peter Alexander Foster
MSc in High Performance Computing
The University of Edinburgh
Year of Presentation: 2008
To my Parents
Abstract
This dissertation deals with the Ising spin glass ground state problem. An exact approach to
this optimisation problem is described, based on combining the Markov chain framework with
dynamic programming. Resulting algorithms allow ground states of the aperiodic $k^2$-spin lattice
to be computed in $O(k\,2^{2k})$ time, which is subsequently improved to $O(k^2\,2^k)$, thus resembling
transfer matrix approaches. Based on parallel matrix-vector multiplication, cost-optimal parallel
algorithms for the message passing architecture are described, using collective or alternatively
cyclic communications. In addition, a parallel realisation of the Harmony Search heuristic is
described. The implementation of both exact and heuristic approaches using MPI is detailed, as
is an application framework, which allows spin glass problems to be generated and solved.
Dynamic programming codes are evaluated on a small-scale AMD Opteron based SMP
system and a large-scale IBM P575 based cluster, HPCx. On both systems, parallel efficiencies
above 90% are obtained on 16 and 256 processors, respectively, when executing the $O(k\,2^{2k})$
algorithm on problem sizes $\geq 14^2$ spins. For the improved algorithm, while computationally
less expensive, scalability is considerably diminished. Results for the parallel heuristic approach
suggest marginal improvements in solution accuracy over serial Harmony Search, under certain
conditions. However, the examined optimisation problem appears to pose a challenge to obtaining
near-optimum solutions using this heuristic.
Acknowledgements
I sincerely thank my project supervisor, Dr. Adam Carter, for guidance throughout the project,
and for commenting on this dissertation prior to its submission.
In addition, I am grateful for funding awarded by the Engineering and Physical Sciences Re-
search Council.
Table of Contents
1 Introduction 1
2 The Spin Glass 3
2.1 Introduction to magnetic systems . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Modelling magnetic systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Spin interaction models . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.2 Spin models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 The Ising spin glass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Computational Background 13
3.1 Ising spin glass ground states and combinatorial optimisation . . . . . . . . . . 13
3.1.1 Approximate approaches for determining ground states . . . . . . . . . 15
3.1.2 Exact methods for determining ground states . . . . . . . . . . . . . . 19
3.2 A dynamic programming approach to spin glass ground states . . . . . . . . . 21
3.2.1 Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.2 Ising state behaviour as a Markov chain . . . . . . . . . . . . . . . . . 22
3.2.3 The ground state sequence . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.4 Boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.5 An order-n Markov approach to determining ground states . . . . . . . 27
4 Parallelisation Strategies 31
4.1 Harmony search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.1 Harmony search performance . . . . . . . . . . . . . . . . . . . . . . 32
4.1.2 Existing approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1.3 Proposed parallelisation scheme . . . . . . . . . . . . . . . . . . . . . 36
4.2 Dynamic programming approaches . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2.1 First-order Markov chain approach . . . . . . . . . . . . . . . . . . . . 39
4.2.2 Higher-order Markov chain approach . . . . . . . . . . . . . . . . . . 43
5 The Project 45
5.1 Project description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.1.1 Available resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.2 Project preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2.1 Initial investigations . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2.2 Design and implementation . . . . . . . . . . . . . . . . . . . . . . . 47
5.2.3 Implementation language and tools . . . . . . . . . . . . . . . . . . . 48
5.2.4 Choice of development model . . . . . . . . . . . . . . . . . . . . . . 49
5.2.5 Project schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2.6 Risk analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2.7 Changes to project schedule . . . . . . . . . . . . . . . . . . . . . . . 51
5.2.8 Overview of project tasks . . . . . . . . . . . . . . . . . . . . . . . . 51
6 Software Implementation 53
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.2 Implementation overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.3 Source code structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.3.1 Library functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.3.2 Client functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
7 Performance Evaluation 69
7.1 Serial performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.1.1 Dynamic programming . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.1.2 Harmony search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.2 Parallel performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.2.1 Dynamic programming . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.2.2 Harmony search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8 Conclusion 99
8.1 Further work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.1.1 Algorithmic approaches . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.1.2 Existing code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.1.3 Performance evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.2 Project summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
A Project Schedule 103
B UML Chart 105
C Markov Properties of Spin Lattice Decompositions 107
C.1 First-order property of row-wise decomposition . . . . . . . . . . . . . . . . . 107
C.2 Higher-order property of unit spin decomposition . . . . . . . . . . . . . . . . 108
D The Viterbi Path 111
D.1 Evaluating the Viterbi path in terms of system energy . . . . . . . . . . . . . . 111
E Software usage 113
F Source Code Listings 115
List of Figures
2.1 Types of spin interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Graphs of spin interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Frustrated systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Subsystems and associated interaction energy . . . . . . . . . . . . . . . . . . 10
2.5 Clamping spins to determine interface energy. . . . . . . . . . . . . . . . . . . 10
3.1 Computing total system energy from subsystem interactions . . . . . . . . . . 14
3.2 Example first-order Markov chain with states a, b, c . . . . . . . . . . . . . . . 22
3.3 Illustrating the principle of optimality. Paths within the dashed circle are known
to be optimal. Using this information, optimal paths for a larger subproblem can
be computed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4 Sliding a unit-spin window across a lattice . . . . . . . . . . . . . . . . . . . . 28
4.1 Using parallelism to improve heuristic performance . . . . . . . . . . . . . . . 32
4.2 Conceptual illustration of harmony search behaviour within search space . . . . 33
4.3 Parallelisation strategies for population based heuristics . . . . . . . . . . . . . 34
4.4 Harmony search parallelisation scheme . . . . . . . . . . . . . . . . . . . . . 37
4.5 Graph of subproblem dependencies for an n = 3, m = 2 spin problem . . . . . . 40
4.6 Parallel matrix operations. Numerals indicate order of vector elements. . . . . . 41
5.1 Spin glass structure design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2 Software framework design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.1 Functions provided by spinglass.c . . . . . . . . . . . . . . . . . . . . . . . . 57
6.2 Schematic of operations performed by get_optimal_prestates() (basic dynamic
programming, collective operations). In contrast, when using cyclic communi-
cations, processes evaluate different configurations of row i−1, shifting elements
in minPath. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6.3 Sliding window for improved dynamic programming . . . . . . . . . . . . . . 65
6.4 Schematic of operations performed by get_optimal_prestates() (improved dynamic
programming), executed on four processors. The problem instance is a 2×2 spin
lattice. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.1 Execution times for serial dynamic programming (basic algorithm) . . . . . . . 70
7.2 Log execution times for serial dynamic programming (basic algorithm) . . . . 71
7.3 Execution times for serial dynamic programming (improved algorithm) . . . . 72
7.4 Log execution times for serial dynamic programming (improved algorithm) . . 72
7.5 Memory consumption for serial dynamic programming (basic algorithm) . . . . 73
7.6 Log memory consumption for serial dynamic programming (basic algorithm) . 74
7.7 Memory consumption for serial dynamic programming (improved algorithm) . 75
7.8 Log memory consumption for serial dynamic programming (improved algorithm) 75
7.9 Parallel execution time for dynamic programming (basic algorithm, Ness) . . . 78
7.10 Parallel efficiency for dynamic programming (basic algorithm, Ness) . . . . . . 78
7.11 Vampir trace summary for dynamic programming (basic algorithm, Ness) . . . 79
7.12 Parallel execution time for dynamic programming (basic algorithm, cyclic com-
munications, Ness) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
7.13 Parallel efficiency for dynamic programming (basic algorithm, cyclic commu-
nications, Ness) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.14 Vampir trace summary for dynamic programming (basic algorithm, cyclic com-
munications, Ness) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.15 Parallel execution time for dynamic programming (improved algorithm, Ness) . 82
7.16 Parallel efficiency for dynamic programming (improved algorithm, Ness) . . . 83
7.17 Vampir trace summary for dynamic programming (improved algorithm, Ness) . 83
7.18 Parallel execution time for dynamic programming (basic algorithm, HPCx) . . 84
7.19 Parallel efficiency for dynamic programming (basic algorithm, HPCx) . . . . . 85
7.20 Parallel execution time for dynamic programming (basic algorithm, cyclic com-
munications, HPCx) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.21 Parallel efficiency for dynamic programming (basic algorithm, cyclic commu-
nications, HPCx) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.22 Parallel execution time for dynamic programming (improved algorithm, HPCx) 87
7.23 Parallel efficiency for dynamic programming (improved algorithm, HPCx) . . . 87
7.24 Summary of parallel efficiencies on HPCx . . . . . . . . . . . . . . . . . . . . 88
7.25 Conceptual representation of properties relevant to parallel performance . . . . 89
7.26 Parallel harmony search convergence durations (ZONEEXBLOCK= 100) . . . 91
7.27 Parallel harmony search energetic aberrance (ZONEEXBLOCK= 100) . . . . . 91
7.28 Parallel harmony search convergence durations (ZONEEXBLOCK= 1000) . . . 92
7.29 Parallel harmony search energetic aberrance (ZONEEXBLOCK= 1000) . . . . 93
7.30 Parallel harmony search convergence durations (ZONEEXBLOCK= 10000) . . 94
7.31 Parallel harmony search energetic aberrance (ZONEEXBLOCK= 10000) . . . 94
7.32 Parallel harmony search solution Hamming distance (ZONEEXBLOCK= 100) . 95
7.33 Parallel harmony search solution Hamming distance (ZONEEXBLOCK= 1000) 96
7.34 Parallel harmony search solution Hamming distance (ZONEEXBLOCK= 10000) 96
A.1 Project schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
B.1 UML class diagram of source code module and header relationships . . . . . . 106
List of Tables
5.1 Identified project risks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
7.1 Mean error µe, standard error σe and error rate e of serial harmony search
ground states for increasing solution memory NVECTORS. Results are based
on the ground truth value −30.7214. Error rate is defined as the number of cor-
rectly obtained ground state configurations over the total number of algorithm
invocations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.2 Serial execution times for basic dynamic programming on Ness, for various
GCC 4.0 optimisation flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.3 Serial execution times for basic dynamic programming on HPCx, for various
xlc optimisation flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.4 Results for parallel basic dynamic programming on HPCx using 32 processors,
for combinations of user space (US) or IP communications in conjunction with
the bulkxfer directive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Chapter 1
Introduction
This dissertation describes aspects concerned with obtaining solutions to an optimisation problem, namely finding ground states of the Ising spin glass. Attention is given to parallel ap-
proaches, their implementation, and their performance.
The first half of this work is devoted to theoretical aspects: The Ising spin glass is a model
relevant to statistical physics and other fields. In Chapter 2, the origins of this model are de-
scribed. The relation is drawn between the project's physical background and the
aforementioned optimisation problem. The Ising spin glass is but one possibility for modelling
materials exhibiting glass-like properties; Chapter 2 also exposes its relation to more involved
models. In Chapter 3, the theoretical background of optimisation is examined. Existing ap-
proaches are reviewed. The two approaches bearing significance to undertaken practical work
are detailed, namely dynamic programming and the harmony search heuristic. Parallelisation
strategies are described in Chapter 4, based on dynamic programming and harmony search.
Having examined theoretical aspects, practical aspects are then considered: Chapter 5 de-
scribes work relevant to project organisation. It includes a description of the project’s objectives
and identified risks. This chapter is relevant to practical work undertaken during the project. The
software implemented as a result of this practical work is described in Chapter 6. Software function-
ality is detailed, in addition to implemented libraries and the source code’s structure. In Chapter
7, the implemented codes are evaluated. Experimental procedures are described, alongside pa-
rameters used for testing. Results are presented and interpreted. Finally, Chapter 8 concludes
the work. The project’s objectives are reviewed in relation to undertaken practical work. Also,
possibilities for further work are explored.
Chapter 2
The Spin Glass
2.1 Introduction to magnetic systems
The phenomenon of magnetism is ubiquitously harnessed in modern technology; it crucially
underpins many applications in areas such as automotive engineering, information processing
and telecommunications. While known since antiquity, the scientific process has enabled an
increasingly accurate understanding of magnetic phenomena. In current research, investigating
the magnetic properties of physical systems remains of great interest in the field of condensed
matter physics. One physical system, the spin glass, is the subject of such investigations. It
forms the background of work undertaken during the course of this project.
Given a physical system, it is possible to characterise its magnetic properties by examining
the relation between interactions occurring between internal subsystems, and the system’s ex-
ternal magnetic moment. The system’s external magnetic moment is a manifestation of these
interactions. More generally, all externally observable magnetic properties are the result of indi-
vidual subsystems’ properties. This concept is applicable both to microscopic and macroscopic
systems, for single or multiple subsystems: As an extreme case, one might consider a single
electron a system, as it possesses an intrinsic magnetic moment. In contrast, the interactions
within a three dimensional crystalline solid, for example, are considerably complex and moti-
vate current investigations. This complexity is chiefly due to magnetic interactions at atomic
scale.
At the atomic level, the electron effects magnetism not only as a result of its intrinsic field,
but also as a consequence of its orbital motion. The former is associated with a binary state,
known as spin, which describes the particle’s internal angular momentum. It is spin which
determines the direction of the electron’s intrinsic magnetic moment. In contrast, orbital motion
contributes towards the particle’s external angular momentum, since it describes the particle’s
movement about the nucleus. Atomic magnetic fields depend both on orbital configuration and
spin alignment, where each electron contributes towards the atom’s net magnetic moment.
In general, an electron’s state is governed by quantum properties, which are subject to the
Pauli exclusion principle [31]. This asserts that no two fermions of the same kind, such as electrons,
may assume identical quantum states simultaneously. This has important consequences for
the spin configuration of interacting electrons and therefore influences the magnetic properties of multiatomic systems.
The first implication of the exclusion principle is that for two electrons possessing iden-
tical orbital movement, spins must antialign to satisfy state uniqueness. Consequently, the
electrons’ intrinsic magnetic moments antialign, causing net cancellation of these fields for the
particle pair.
The second implication relates to minimising a system’s energy: For interacting electrons
with different orbital motion, the Pauli exclusion principle states that parallel spin alignment will
be favoured, since it guarantees that orbital movement remains disjoint. Because of electrostatic
repulsion, decreasing proximity between electrons lowers the system’s energy. It is this relation
which allows certain materials to retain a magnetic field, the result of a surplus of aligned spins,
as opposed to a disordered spin configuration, in a favourable energetic state.
It turns out that the difficulty in determining a system’s magnetic properties stems from the
complexity of spin interactions: The structure of a specified material may be irregular, resulting
in differing ranges between electron orbitals. The type of atomic bonds and electron config-
urations present in the material is also influential, since these influence the orbital energy of
electrons. It was previously mentioned that a system’s energy is sought to be minimised. This
energy depends on the proximity in which interactions occur and hence behaves characteristi-
cally for the examined system.
The energy associated with spin interaction is expressed exactly in the so-called exchange
energy, first formulated by Heisenberg [38] and Dirac [20]. Based on consequences of the Pauli
exclusion principle for the wavefunction of a system consisting of multiple fermions, the system
wavefunction is defined for combinations of aligned and antialigned spins. These wavefunctions
are then used to compute the exchange energy
$$J = 2 \int\!\!\!\int \Psi_1^*(r_1)\, \Psi_2^*(r_2)\, V_I(r_1, r_2)\, \Psi_2(r_1)\, \Psi_1(r_2)\; \mathrm{d}r_1\, \mathrm{d}r_2$$
where $\Psi_1$, $\Psi_2$ are wavefunctions of the interacting particles with locations $r_1$, $r_2$ on the real line, and
$V_I$ is the interaction energy.
Using eigenanalysis∗, it is furthermore possible to express the contribution towards the system's
Hamiltonian arising from spin interaction, which depends on $J$ and the spin operands $\mathbf{s}_1$, $\mathbf{s}_2$
for a pair of spins:
$$-J\,(\mathbf{s}_1 \cdot \mathbf{s}_2) \qquad (2.1)$$
∗An explanation is given by Griffiths [31]
Figure 2.1: Types of spin interaction: (a) ferromagnetic, (b) antiferromagnetic
This object is of fundamental importance for describing the interaction energy of large sys-
tems, since these may be described in terms of their underlying interacting subsystems. It is
employed, in simplified form, in models such as the Ising model [47] used in this project. The interaction
variable $J$ is known as the coupling constant. Although it assumes a positive
real value for spin interactions where parallel alignment is favoured, it is important to note that
antiparallel alignment is also favoured in many materials. Bearing this in mind, positive J are
associated with ferromagnetic coupling, whilst negative J are associated with antiferromagnetic
coupling. Figures 2.1(a), 2.1(b) illustrate these interactions.
2.2 Modelling magnetic systems
As currently described, the simplest type of magnetic interaction is expressed by defining two
fundamental operands and an associated coupling constant. Together with the coupling con-
stant, these fundamental operands are evaluated using an interaction operator. The operands
are commonly spins, whose state may be described using either a unit vector or an integer, for example.
2.2.1 Spin interaction models
Because spin coupling is a symmetric relation, it is possible to describe interactions occurring
amongst multiple spins by considering the set $E \subseteq \bigl\{\{s_i, s_j\} \mid s_i, s_j \in S,\ i \neq j\bigr\}$ of pairwise bonds
amongst spins in a spin set $S$, given the weight function $w\colon \{s_i, s_j\} \to \mathbb{R}$. This corresponds to
an undirected weighted graph. In the graph, the absence of the edge between two spins $s_k$, $s_l$
is equivalent to the zero-coupled edge $w(\{s_k, s_l\}) = 0$. An example of such a graph is shown
in Figure 2.2(a). Given this general case of an undirected graph, there are three specialisations
which have been used extensively to investigate the properties of magnetic systems consisting
of many spins.
In terms of spin interactions, a comparatively involved model is the so-called Axial Next
Nearest Neighbour Interaction (ANNNI) model. Here, spins are arranged conceptually as a
lattice in Euclidean n-space, with bond edges defined between neighbouring spins along each
dimension. In addition to these bonds, interactions for each spin are extended in a ‘next spin
but one’ fashion along each dimensions. That is, interactions are defined by conducting a walk
Figure 2.2: Graphs of spin interactions: (a) general undirected case, (b) ANNNI model, (c) EA model
of length l ≤ 2 along the lattice in each dimension, given an initial node. A spin therefore
interacts with n ≤ 4d partner spins, as displayed in Figure 2.2(b). This model has been employed
extensively in research [57, 17, 56].

If the ANNNI model is modified by extending the length of the walk to infinity in arbitrary
direction, the graph defined by spin interactions $E$ becomes fully connected: $E = \bigl\{\{s_i, s_j\} \mid s_i, s_j \in S,\ i \neq j\bigr\}$.
This realisation of lattice interactions is known as the Sherrington-Kirkpatrick model [58], whose Hamiltonian is equal to
$$H = -\sum_{(i, j)} J_{ij}\, \mathbf{s}_i \cdot \mathbf{s}_j.$$
Here, the notation $(i, j)$ indicates the sum over all spin interactions, as described. The Sherrington-Kirkpatrick
model is employed by Parisi [54] for the purpose of exploring transition properties of magnetisation, using an approach known as mean field theory.
Given that spin interactions occur over short range, an elementary approach to representing a
system considers only nearest neighbour interactions between spins. In a two dimensional lattice
model, the graph of spin interactions is then defined as $E = \bigl\{\{s_i, s_j\} \mid s_i, s_j \in S,\ d(s_i, s_j) = 1\bigr\}$,
where $d(s_i, s_j)$ is the block distance between spins $s_i$, $s_j$. This is illustrated in Figure 2.2(c). The
Hamiltonian of such a system is
$$H = -\sum_{\langle i,j \rangle} J_{ij}\, \mathbf{s}_i \cdot \mathbf{s}_j$$
where the notation $\langle i,j \rangle$ indicates the sum over nearest neighbour spin interactions. This model,
due to Edwards and Anderson [22], is the subject of work undertaken during the course of
this project.
Bonds
The exchange energy between two spins is governed by the magnitude of the coupling constant
J . When dealing with multiple interactions, these bond strengths are often selected from a
probability distribution. This distribution is a continuous uniform or Gaussian distribution for
many modelling purposes [58, 60, 52]. When dealing with the Sherrington-Kirkpatrick model,
the exchange energy distribution often includes the property of exponential decay over spin
distance [12]. Another commonly used distribution [36, 21] permits only coupling constants
J ∈ {1, −1}, such that both values are equally probable.
Other distributions have also been employed for defining coupling constants, such as the
twin peaked Gaussian [15]. Ermilov et al. [23] provide an investigation of the implications for
interactions with arbitrarily distributed bonds. In this project, the equiprobable $\pm J$ variant of
spin coupling is considered.
2.2.2 Spin models
As with the approaches to modelling spin interaction, the spin object itself may be modelled
to varying levels of complexity. Most realistically, in a quantum Heisenberg model, each spin is described by its quantum state in three dimensions, so that the Hamiltonian for a two
dimensional Edwards-Anderson model becomes†
$$H = -\frac{1}{2} \sum_{\langle i\,j \rangle} \Bigl( J_x\, \sigma_i^x \sigma_j^x + J_y\, \sigma_i^y \sigma_j^y + J_z\, \sigma_i^z \sigma_j^z \Bigr)$$
where $\sigma_k^x$, $\sigma_k^y$, $\sigma_k^z$ are Pauli matrices corresponding to spin $s_k$.
Alternatively, a classical Heisenberg formulation is also possible, as employed by Ding
[19], Kawamura [42]: Here, spins are represented as three-dimensional real-valued unit vectors,
so that exchange energy between spins si, s j is calculated by means of the inner vector product,
as described in Equation 2.1. A simplification achieved by discretising spin state exists in the
so-called Potts model [63]. Here, a spin may assume a state $s_i \in \{1, \ldots, k\}$, where $k$ is the total
number of states. The Hamiltonian of a system of spins with nearest-neighbour interaction is
expressed as
$$H = -\sum_{\langle i,j \rangle} J_{ij} \cos\bigl(\theta(s_i) - \theta(s_j)\bigr)$$
with $\theta(s_i) = 2\pi s_i / k$.
The Potts model may be simplified further by considering the case $k = 2$: Define the Potts
correlation function $\gamma(s_i, s_j) = \cos\bigl(\theta(s_i) - \theta(s_j)\bigr)$. Given that
$\theta\colon \{1, 2\} \to \{\pi, 2\pi\}$, the mapping
$$\gamma(s_i, s_j) = \begin{cases} 1, & s_i = s_j \\ -1, & s_i \neq s_j \end{cases}$$
is a sufficient definition for the correlation function in the described case. Alternatively, $\gamma(s_i, s_j) = s_i s_j$, with $s_i, s_j \in \{1, -1\}$.
†cf. Baxter [8]
This leads to the definition of system energy as
$$H = -\sum_{\langle i,j \rangle} J_{ij}\, s_i s_j,$$
with $s_i, s_j \in \{-1, 1\}$.
When combined with nearest neighbour interactions and constant $J$, this archetypal model
of spin interaction is known as the Ising model [11]. As formulated, in the Ising model, a
spin's state effects an exchange energy whose sign is inverted if the spin's neighbour assumes the
opposing state. In this respect, the model spin object is an abstraction of electron state which
discards the consequences of orbital movement, considering only intrinsic angular momentum.
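As a concrete illustration of this Hamiltonian, the short C sketch below evaluates $H = -\sum_{\langle i,j \rangle} J_{ij}\, s_i s_j$ for an aperiodic L×L lattice. The row-major array layout, the separate horizontal and vertical coupling arrays and the function name are assumptions made for this illustration only; they do not reflect the project's actual data structures.

#include <stddef.h>

/* Energy H = -sum_<i,j> J_ij s_i s_j of an aperiodic L x L Ising lattice.
 * spin[r*L + c] : spin at row r, column c, taking the value +1 or -1
 * Jh[r*L + c]   : coupling between (r,c) and its right neighbour (r,c+1)
 * Jv[r*L + c]   : coupling between (r,c) and its lower neighbour (r+1,c)
 * (hypothetical layout, assumed for this sketch only)                      */
double ising_energy(const int *spin, const double *Jh, const double *Jv, size_t L)
{
    double H = 0.0;
    for (size_t r = 0; r < L; ++r) {
        for (size_t c = 0; c < L; ++c) {
            if (c + 1 < L)   /* horizontal bond */
                H -= Jh[r * L + c] * spin[r * L + c] * spin[r * L + c + 1];
            if (r + 1 < L)   /* vertical bond */
                H -= Jv[r * L + c] * spin[r * L + c] * spin[(r + 1) * L + c];
        }
    }
    return H;
}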
While comparatively restrictive, an adaptation of the Ising model has been the subject of
intense research in its originating field of statistical physics [8]. In addition to certain applica-
tions in investigating the behaviour of neural networks [4] and biological evolution [49], this
model has proven popular in examining the properties of materials in the field of condensed
matter physics [26]. One application involves investigating the properties of materials collec-
tively known as spin glasses. These possess distinctive properties, which are described in the
following.
2.3 The Ising spin glass
Spin glasses are substances which are characterised by structural disorder. This is the case for
chemical glasses or certain types of dilute metal alloys. These materials possess highly irreg-
ular microscopic structure, which has implications for magnetic interactions between ions. In
particular, disorder results in a distribution of ferromagnetic and antiferromagnetic interactions,
which are the origin of the phenomenon known as frustration.
The dynamics of spin glasses are such that there exists a critical phase transition tempera-
ture, above which the system behaves like a conventional paramagnet or ferromagnet. Below
the transition temperature however, a magnetic disorder manifests itself, called the spin glass
phase. This magnetic disorder is responsible for the system’s unique behaviour.
Frustration, the second component to characteristic behaviour, arises when a system’s ener-
getically optimal state is the result of combined interactions which cannot individually assume
optimum state. Instead, the global optimum requires certain interactions to be suboptimal. De-
pending on the constituent interactions, this may imply that there exist multiple state configura-
tions which yield the energetic optimum.
An example of this principle is shown in Figure 2.3(a). Here, three Ising spins $s_0, s_1, s_2 \in \{1, -1\}$ interact in a triangular lattice. Because bonds are not consistently ferromagnetic, it is
apparent that some interactions require spins with opposing orientations in order to be optimal. This is
the case for the antiferromagnetic bond between spins s1, s2. For either optimal configuration of
Figure 2.3: Frustrated systems: (a) three spins $s_0$, $s_1$, $s_2$; (b) four-spin 'plaquette'
the spin pair, it is not possible, however, to set $s_0$ so that optimality of the remaining interactions
is satisfied. Similarly, when evaluating the system commencing with pairs s0, s1 or s0, s2, it is
not possible to set the remaining spin so that all interactions are satisfied. It follows that there
exists no configuration of this system in which all interactions are optimal.
In the n-dimensional lattice Ising realisation of a spin glass, the smallest structure capable
of exhibiting frustration is shown in Figure 2.3(b). Considering all $2^4$ combinations of positive
and negative coupling constants, it can be seen that frustrated interactions occur for odd num-
bers of antiferromagnetic or ferromagnetic bonds. For larger systems, it is possible to analyse
frustration by decomposing the lattice into subsystems of this kind. In this context, the square
substructure is termed a plaquette.
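Stated as a rule that can be checked mechanically, a plaquette is frustrated exactly when the product of its four coupling constants is negative, i.e. when an odd number of them are antiferromagnetic. A minimal C sketch follows; the ordering of the bonds around the square is an assumption of this illustration.

/* A four-spin plaquette is frustrated iff the product of its four bond
 * couplings is negative, i.e. an odd number of them are antiferromagnetic
 * (J < 0). The bond order around the square is assumed.                   */
int plaquette_is_frustrated(const double J[4])
{
    return (J[0] * J[1] * J[2] * J[3]) < 0.0;
}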
Uses of the Ising spin glass
The extent to which the Ising model departs from a realistic representation of magnetic phe-
nomena was previously described. Although the model's limited accuracy presents a disadvantage, its
comparative simplicity lends itself to certain analytical advantages: These advantages are based
on the fact that the ‘state space’ of a single spin is small, which has consequences for evaluating
sets of spin systems. Also, since spins interact only over nearest neighbour boundaries, it is
trivial to ‘decompose’ a system into its constituent subsystems, should this be required. Using
such a scheme, total exchange energy is the sum of internal subsystem energy and subsystem
interaction energy (Figure 2.4). This approach is employed in analytical methods described in
following chapters.
For experimental purposes, it is of interest to examine computationally the behaviour of
various realisations of spin glasses. As spin glasses are thermodynamic systems, knowledge
of ground state energy is of particular importance towards this aim. Formally, given an n-spin
system where $S = \{s_0, s_1, \ldots, s_{n-1}\}$ represents some configuration of these spins,
$$\operatorname*{argmin}_{S} H(S)$$
Figure 2.4: Subsystems and associated interaction energy
Figure 2.5: Clamping spins to determine interface energy (left: free; right: invert and clamp).
is the system’s ground state. The Hamiltonian H (S ) describes the energy of system configura-
tion S . In the case of the Ising model with real valued coupling constants, there exists a single
ground state configuration, and an equivalent configuration with all spins inverted. For systems
with discrete valued coupling constants, a number of degenerate ground states may exist. Pro-
vided an algorithm for determining ground states, it may be of interest to examine the effect of
system size on ground state energy.
Previous work investigates scaling with regard to a related quantity, the so-called interface
energy [15]. For an Ising-like model, interface energy is the absolute difference between ground
state energies, obtained when altering the model instance's spin configuration with respect to
a certain boundary condition (coupling constants are left unaltered). Figure 2.5 shows an ex-
ample, again using the two dimensional lattice Ising model. Here, ground state configurations
are obtained for two experimental instances: In the first instance, the entire set of spin config-
urations is considered. In the second instance, spins in the rightmost column are ‘clamped’:
Their state is equal to that of the previously obtained configuration, only inverted. Enforcing
this condition in the second instance allows the behaviour of adjacent spins to be examined.
A closely related aspect deals with exploring the behaviour of spin glass properties in the
limit N → ∞, where N is the system size. For certain purposes, it is beneficial to approxi-
mate this condition by introducing periodicity into spin interactions. In the Ising model, pairs
of boundary spins along dimensions with periodic boundary conditions interact in the manner
illustrated in Figure 2.2(b). This can easily be expressed mathematically by applying modular
arithmetic to the one dimensional Ising case $H = -\sum_i J_i\, s_i s_{i+1}$, requiring minor modification for
models with $d > 1$.
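For the one-dimensional case, the modular arithmetic amounts to wrapping the neighbour index; a small C sketch is given below (the array layout is assumed, and the sign convention follows the Hamiltonians used in this chapter).

#include <stddef.h>

/* Energy of a one-dimensional Ising chain with periodic boundary
 * conditions: H = -sum_i J_i s_i s_{(i+1) mod n}.
 * (layout of spin[] and J[] assumed for illustration)          */
double chain_energy_periodic(const int *spin, const double *J, size_t n)
{
    double H = 0.0;
    for (size_t i = 0; i < n; ++i)
        H -= J[i] * spin[i] * spin[(i + 1) % n];
    return H;
}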
In thermodynamic systems, attention must be given to the relation between macroscopic and
microscopic properties. To this end, an important object is the partition function, defined as
$$Z(T) = \sum_{S} e^{-H(S)/kT},$$
where $H(S)$ is the system energy, $T$ the absolute temperature and $k$ the Boltzmann constant.
The sum is over all (microscopic) system configurations $S$. Using the partition function, it is
possible to determine the probability $P(S)$ of a specific state as
$$P(S) = \frac{e^{-H(S)/kT}}{Z(T)}.$$
Fortunately, when examining an ensemble in the limit $T \to 0$, it turns out that $P(S) = 1$ iff $S$ is a ground
state configuration, and otherwise $P(S) = 0$. This fact has implications for computing ground state
energies of Ising spin glasses, the subject of this project.
Chapter 3
Computational Background
In the previous chapter, the Ising model was introduced. System energy was described as a type
of utility function for evaluating system configurations. The problem of obtaining ground state
energy was introduced.
In this chapter, finding ground states of the Ising spin glass is approached as a combinatorial
optimisation problem. In this context, existing solutions are examined, in addition to describing
two approaches implemented in this project, harmony search and dynamic programming. The
latter approach is the consequence of describing spin glass interactions as a Markov chain, which
lends itself to a formulation of the most likely sequence of events in the chain, i.e. the Viterbi
path [61].
3.1 Ising spin glass ground states and combinatorial optimisation
Formally, any instance of the Ising spin glass defines the energy function $E(S)$ with $E\colon \{1, -1\}^n \to \mathbb{R}$.
Here, $S = (s_1, s_2, \ldots, s_n)$ is an $n$-spin configuration, with each spin $s_i \in \{1, -1\}$. For convenience,
a notation for describing a configuration partitioned into $p$ disjoint subsystems is
introduced as $S = \{S_1, S_2, \ldots, S_p\}$. The real valued co-domain of $E(S)$ corresponds to the total
system energy. The total system energy of a partitioned system is
$$E(S) = \sum_{k=1}^{p} E(S_k) + \sum_{\langle i,j \rangle} \bigl\{ J_{ij}\, s_i s_j \;\big|\; s_i \in S_\alpha,\ s_j \in S_\beta \bigr\},$$
where $\langle i,j \rangle$ denotes nearest neighbour Ising interactions, as described in Chapter 2. The subsystems
$S_\alpha$, $S_\beta$ are disjoint. By decomposing spin interactions occurring within the entire system,
energy is expressed as the sum of subsystem energy and 'system boundary' energy.
Figure 3.1: Computing total system energy from subsystem interactions
Defining $E_b(S_i, S_j)$ as the system boundary energy between disjoint subsystems $S_i$, $S_j$,
$$E_b(S_i, S_j) = \sum_{\langle q,r \rangle} \bigl\{ J_{qr}\, s_q s_r \;\big|\; s_q \in S_i,\ s_r \in S_j \bigr\},$$
the total system energy can be defined as
$$E(S) = \sum_{k=1}^{p} E(S_k) + \sum_{\langle i,j \rangle} E_b(S_i, S_j)$$
where $\langle i,j \rangle$ denotes nearest neighbour interactions between subsystems, in analogy to nearest
neighbour interactions between individual spins. An example of system decomposition is pre-
sented in Figure 3.1, for a system with cyclic boundary interactions. Decomposition is relevant
to approaches described in this chapter.
Determining ground states
The ground state configuration of an Ising spin glass is defined as $S_{\min} = \operatorname*{argmin}_S E(S)$. The
domain of the evaluated function $E(S)$ implies that an exhaustive search of the system's state
space requires $2^{|S|}$ individual evaluations. Such a brute force approach might be implemented
using a depth-first traversal of the state space.
Clearly, using this method is only practicable for the very smallest of problem instances, as
the search space grows exponentially with the number of spins in the system. Therefore, it is
of interest to examine the possibility of restricting the search space, consequently reducing the
complexity of obtaining solutions to problem instances.
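For illustration, such an exhaustive search can be realised by interpreting the bits of a counter as a spin configuration and retaining the lowest energy seen. The sketch below is a simple iterative variant rather than the depth-first traversal mentioned above; the caller-supplied energy routine and the function name are assumptions of this illustration, and the approach is practical only for very small systems.

#include <float.h>
#include <stdint.h>

/* Exhaustive ground-state search over all 2^n configurations of an n-spin
 * system (n < 64). The energy function is supplied by the caller; 'best'
 * receives the bit pattern of the minimising configuration
 * (bit i set => s_i = +1, bit i clear => s_i = -1).                        */
double brute_force_ground_state(int n,
                                double (*energy)(uint64_t config, int n, void *ctx),
                                void *ctx,
                                uint64_t *best)
{
    double emin = DBL_MAX;
    for (uint64_t config = 0; config < ((uint64_t)1 << n); ++config) {
        double e = energy(config, n, ctx);
        if (e < emin) {
            emin = e;
            *best = config;
        }
    }
    return emin;
}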
The fact that the upper bound of search space size grows exponentially suggests that the
ground state problem belongs to the class of NP problems. Due to Barahona [6], it is shown
that in fact, certain cases of the problem are NP-complete, such as the two dimensional lattice
where every spin interacts with an external magnetic field, and the three dimensional lattice
model. Istrail generalises the proof of NP-completeness to any model where interactions are represented as a non-planar graph [16].
Fortunately, NP-completeness does not extend to planar instances of the Ising model: a polynomial-
time bound is shown by Barahona for the two dimensional, finite sized model. This fact implies
that obtaining ground states is not intractable for this case of the model, and motivates the devel-
opment of efficient algorithms which obtain exact solutions. The latter are defined as solutions
equivalent to those generated from an exhaustive search.
3.1.1 Approximate approaches for determining ground states
Regardless of NP-completeness, formulation of the ground state problem as a combinatorial
optimisation problem allows a second approach to be considered, involving the class of meta-
heuristic algorithms. Although these algorithms are typically only guaranteed to search exhaus-
tively as time goes towards infinity, many have been shown to produce optimal or near-optimal
solutions to a wide range of problems, provided sufficient execution time. It is therefore of
immediate interest to investigate the performance of these algorithms in the context of the Ising
spin glass.
By common definition, a metaheuristic is a heuristic applicable to solving a broad class
of problems [28]. In practice, this is achieved by defining a set of ‘black-box’ procedures,
i.e. routines specific to the problem. When dealing with combinatorial optimisation problems,
these routines typically include a utility function, whose purpose it is to evaluate candidate
solutions selected from the state space. Utility is then used to compare solutions amongst one
another.
To be of practical use for problems with large state spaces, a heuristic must arrive at a solu-
tion by considering some subset of this space its search space. The metaheuristic approach often
achieves this by random sampling [28], which may cause the algorithm to produce suboptimal
results. To apply a metaheuristic effectively, it may therefore be necessary to evaluate perfor-
mance against different combinations of algorithm parameters. Generating sufficient amounts
of samples may motivate parallel algorithmic approaches. Also, although it has been shown
that the average performance of optimisation algorithms is identical over the class of all optimisa-
tion problems [62], there may be significant performance differences between algorithms when
applied to a specific problem class. It is hence of interest to examine diverse metaheuristic
approaches in conjunction with the described optimisation problem.
Evolutionary algorithms
One class of metaheuristic is inspired by biological evolution. Here, a population of candidate
solutions is created and subsequently evolved in an iterative process, where individual 'parent' solutions are selected stochastically in order to generate 'offspring' solutions. The process of
selection is designed to favour solutions which exhibit high ‘fitness’, the latter evaluated using
a utility function. In a further biological analogy, offspring are generated by combining solution
parameters from both parents, prior to randomised modification (mutation). These new solutions
are then added to the population, which is typically maintained in order to stay in equilibrium.
The process is then repeated, terminating either on completing a specified number of iterations,
or when a convergence criterion is fulfilled.
Evolutionary algorithmic approaches applicable to combinatorial optimisation are known
as genetic algorithms [50]. The approach here involves representing a solution, i.e. the set of parameters supplied to the target function, as a string. After evaluating solution fitness as pre-
viously described, crossover is typically realised as a manipulation of substrings: For example,
one might generate offspring as a combination of permuted substrings from parent strings. Cor-
respondingly, mutation might be realised as a permutation of substring elements from a single
solution. It is evident that the multitude of possibilities in which selection, crossover and mu-
tation may be implemented, has the potential to cause deviations in the optimisation process’
performance.
Genetic algorithms have been applied to the spin glass ground state problem by Gropengiesser
[32], who considers two variants of the basic evolution procedure. In the first, the population is initialised to multiple instances of a single solution, to which mutation is then applied it-
eratively. Using a local search heuristic, mutations conducive to lowering the system energy
are accepted. In the second variant, the former regime is augmented with random parent se-
lection and crossover, such that every child solution replaces one of its parents. Results show
that performance is affected strongly by the method of admitting new candidate solutions to the
population, following mutation.
As one might expect, approaches incorporating local minimisation techniques have been shown
to improve optimisation performance, as implemented by Hempel et al. [40], using a so-called
hybrid genetic algorithm. This is in comparison to an early investigation by Sutton [59], using
a general evolutionary approach. Houdayer and Martin [41] report good performance for the
Ising model with discrete ± J bond distribution, using a Genetic Renormalisation algorithm.
Here, domain specific knowledge is incorporated into the optimisation process by recursively
partitioning the graph of spin interactions, in resemblance to the description at the beginning of
this chapter. A local optimisation process is then applied to the partitioned system.
Given the nature of the project, of special interest are methods of parallelising genetic al-
gorithms. In the general context of evolutionary computing, Cantu-Paz [14] describes a coarse
grained approach known as the ‘island’ method. In the distributed memory paradigm, processes
are arranged in a toroidal grid, each executing the algorithm in parallel. After each iteration,
a subpopulation of local solutions is selected based on fitness, and exported to neighbouring
processes asynchronously. As an alternative, a fine grained scheme may also be used, wherecrossover is allowed to take place between solutions residing at diff erent processes.
Simulated annealing
Simulated annealing is a technique readily applicable to calculating ground states, as it is based
on the principles in statistical physics which underpin the Ising model. The technique is derived
from the Metropolis-Hastings algorithm [37], in which a probability distribution is sampled in-
directly by means of a first-order Markov chain. That is, the distribution of a generated sample is
sufficiently defined by the value of its predecessor. In simulated annealing, a candidate solution
S in the state space is associated with the probability
$$P(S) \propto e^{-H(S)/(kT)},$$
the state probability of a canonical ensemble, which was introduced in Chapter 2.
Optimisation is performed by initialising a random solution configuration and sampling
proximate configurations in the state space by stochastic parameter modification: Specifically
for the Ising model, this would involve perturbing spins by inverting their state. The new con-
figuration is accepted if the perturbation resulted in lower system energy, otherwise the state is
accepted with probability $e^{-\Delta H/(kT)}$, where $\Delta H$ is the change in system energy. Of importance is
the value of temperature T , which is initialised to a certain value and decreased monotonically
towards zero according to a specific annealing schedule, as the algorithm progresses.
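The acceptance rule just described can be stated in a few lines of C; the sketch below covers a single Metropolis decision (setting $k = 1$ and using rand() for the uniform variate are simplifying assumptions of this illustration).

#include <math.h>
#include <stdlib.h>

/* One Metropolis acceptance decision for a proposed spin flip.
 * delta_H : change in system energy caused by the flip
 * T       : current temperature of the annealing schedule (k = 1 assumed)
 * Returns 1 if the flip is accepted, 0 otherwise.                          */
int metropolis_accept(double delta_H, double T)
{
    if (delta_H <= 0.0)
        return 1;                          /* downhill moves always accepted */
    double u = (double)rand() / ((double)RAND_MAX + 1.0);
    return u < exp(-delta_H / T);          /* uphill with Boltzmann probability */
}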
In Chapter 2, it was mentioned that as $T$ approaches zero, $P(S) = 1$ iff $S$ is a ground
state. A consequence of this fact for the optimisation process is that if T is initialised to a
finite temperature and decreased sufficiently slowly, the algorithm is guaranteed to arrive at the
system’s globally optimal state [51]. In practice, execution time is restricted to a fraction of that
required for an exhaustive search, so that the annealing process becomes an approximation.
Simulated annealing was first applied to the spin glass problem by Kirkpatrick, Gelatt and
Vecchi [44]. It is important to note that the choice of annealing schedule significantly affects
the algorithm’s ability to arrive at an optimal solution. This is because temperature influences
the amount of selectivity involved as state space is explored. Conversely, it follows that the
solution landscape particular to a problem usually affects the accuracy of solutions obtained by
the algorithm using a particular schedule.
Ram et al. describe an approach to parallelising the algorithm [55]. Clustering simulated
annealing is based on the observation that a good initial solution typically reduces the amount
of iterations required for the algorithm to converge. After executing the algorithm on multiple
processing elements with diff erent initial states, an exchange of partial results takes place to
determine the most favourable solution. This result is then redistributed to all processing ele-
ments, in order to repeat the process a set number of iterations, after which the final solution is
determined.
Harmony search
A recently developed optimisation algorithm is due to Geem [27]. Known as harmony search,
this algorithm has been applied to a number of optimisation problems such as structural design
[45] and data mining [25]. Harmony search can be considered an evolutionary algorithm, as it
maintains a population of candidate solutions, which compete with one another for permanency
and influence generation of successive candidates.
Inspired by the improvisational process exhibited by musicians playing in an ensemble, har-
mony search iteratively evolves new solutions as a composite of existing solutions. As with
genetic algorithms, a utility function determines whether a newly generated solution is included
in the candidate set. In addition to devising a probabilistic scheme for combining parameters
from existing solutions, new solutions are modified according to a certain probability. This is
designed to improve exploration of the state space, similar to genetic mutation.
Formally, the algorithm defines an ordered set $\sigma = (\sigma^1, \sigma^2, \ldots, \sigma^m)$ of $m$ candidate solutions,
where each candidate is an $n$-tuple $\sigma^k = (\sigma^k_1, \sigma^k_2, \ldots, \sigma^k_n)$. Algorithm parameters are the
memory selection rate $P_{mem}$, the so-called pitch adjustment rate $P_{adj}$ and the distance bandwidth
$\beta \in \mathbb{R}$. Random variables $X \in \{1, 2, \ldots, m\}$ and $Y \in [0, 1)$ are also defined. Using a termination
criterion such as the number of completed iterations, the algorithm performs the following steps
on the set of initially random candidates:
• Generate: $\sigma^\nu = (\tau(1), \tau(2), \ldots, \tau(n))$, where
$$\tau(i) = \begin{cases} \sigma^X_i & \text{if } Y \le P_{mem} \\ \text{random parameter value} & \text{if } Y > P_{mem} \end{cases}$$
• Update: for $1 \le i \le n$, $\sigma^\nu_i \leftarrow \sigma^\nu_i + \beta$ iff $Y \le P_{adj}$
• Replace:
– $w \leftarrow \operatorname*{argmax}_w \{\sigma^w\}$
– $\sigma^\nu \leftarrow \min\{\sigma^w, \sigma^\nu\}$
– $\sigma^w \leftarrow \sigma^\nu$
In the first step, the algorithm generates a new candidate, whose parameters are selected at
random both from existing solutions in the population and from a probability distribution. In a
further stochastic procedure using random variable Y , solution parameters are modified. This
step is of particular significance for continuous optimisation problems; it may be preferable
to omit it in other cases. Finally, the population is updated by replacing its worst solution, if
the generated candidate is of higher utility. The process is then repeated, using the updated
population.
An application of harmony search to the discrete Ising ground state problem is trivial, by
assigning each solution the ordered set of spins defined at the beginning of this chapter, i.e. $\sigma^k = (s_1, s_2, \ldots, s_n)$.
Because the set of solution parameter values is discrete and small, the effect
of modifying solutions due to distance bandwidth β can be consolidated into the algorithm’s
‘generation’ step. The process thus consists solely of generating and conditionally replacing
existing solutions in memory, governed by parameters m (the candidate population size) and
Pmem (the memory selection rate). Work undertaken for this project examines the performance
of this algorithm for finding Ising spin glass ground states.
3.1.2 Exact methods for determining ground states
Graph theoretic methods
Returning to the spin glass as an exactly solvable model, it is necessary to examine the graph
representation of spin interactions more closely. An undirected graph $G = (V, E)$ is described
by a set of vertices $V = \{v_1, v_2, \ldots, v_n\}$ and edges $E \subseteq \{\{v_i, v_j\} \mid v_i, v_j \in V\}$. Given an Ising spin
glass model $S = \{s_1, s_2, \ldots, s_n\}$, let $S = V$ and $E = \{\{s_i, s_j\} \mid J_{ij} > 0\}$, where $J_{ij}$ is the bond
strength between spins $s_i$, $s_j$. The set of vertices is partitioned into subsets $S^+$, $S^-$ such that
$S^+ = \{s_i \mid s_i = 1\}$, $S^- = \{s_i \mid s_i = -1\}$.
Grotschel et al. [29] provide a description of a method which is the basis of algorithms
developed by Barahona et al. [7]. Here, the system's Hamiltonian is described in terms of $S^+$
and $S^-$ as
$$H(S) = -\sum_{\{i,j\} \in E(S^+)} J_{ij}\, s_i s_j \;-\; \sum_{\{i,j\} \in E(S^-)} J_{ij}\, s_i s_j \;-\; \sum_{\{i,j\} \in \delta(S^+)} J_{ij}\, s_i s_j$$
where $E(T) = \{\{s_i, s_j\} \mid s_i, s_j \in T\}$ and $\delta(T) = \{\{s_i, s_j\} \mid s_i \in T,\ s_j \in S \setminus T\}$. Considering the effect
of opposing spin interactions, the Hamiltonian can be rewritten as
$$H(S) = -\sum_{\{i,j\} \in E(S^+)} J_{ij} \;-\; \sum_{\{i,j\} \in E(S^-)} J_{ij} \;+\; \sum_{\{i,j\} \in \delta(S^+)} J_{ij},$$
from which it follows that
$$H(S) + \sum_{\{i,j\} \subseteq S} J_{ij} = 2 \sum_{\{i,j\} \in \delta(S^+)} J_{ij}.$$
The ground state energy can now be formulated in terms of the function $\delta$ as
$$H_{\min} = \min_{S^+ \subseteq S} \left( 2 \sum_{\{i,j\} \in \delta(S^+)} J_{ij} \;-\; \sum_{\{i,j\} \subseteq S} J_{ij} \right).$$
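As a quick check of this expression, consider the smallest possible instance: two spins joined by a single ferromagnetic bond $J_{12} = J > 0$, so that $\sum_{\{i,j\} \subseteq S} J_{ij} = J$. For $S^+ = \emptyset$ or $S^+ = \{s_1, s_2\}$ the cut $\delta(S^+)$ is empty and the expression evaluates to $-J$; for $S^+ = \{s_1\}$ or $S^+ = \{s_2\}$ the bond is cut and it evaluates to $2J - J = J$. The minimum, $-J$, is attained when both spins lie on the same side of the cut, i.e. when they are aligned, in agreement with direct evaluation of $H(S) = -J\, s_1 s_2$.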
Because the co-domain of δ consists of edges which define a cut of the graph of spin interac-
tions, i.e. a partition of nodes into two disjoint sets, obtaining ground states is now described in
graph theoretical terms as a cut optimisation: As formulated, ground state energy is expressed
as the minimum cut of a weighted graph. Equivalently the problem can be formulated as a
maximisation, if the signs of interaction energies are inverted.

Hadlock [34] shows further that finding a maximum cut of a planar graph is equivalent
to determining a maximum weighted matching of a graph, for which there exist polynomial
time algorithms. Bieche et al. [10] and Barahona [6] follow this approach, where a graph is
constructed based on interactions between spin plaquettes. A recent similar approach due to
Pardella and Liers [53] allows very large systems to be solved exactly.
De Simone et al. employ a method known as ‘branch-and-cut’. Here, the cut optimisa-
tion problem is initially expressed as an integer programming problem. In integer program-
ming, the objective is to determine $\max\left\{ u^T x \mid Ax \le b \right\}$, where the components of vector $x \in \mathbb{Z}^n$
are determined subject to constraints defined by vectors $u$, $b$ and matrix $A$. During execution, branch-and-cut specifically employs the linear relaxation of the programming problem, where
it is permitted that $x \in \mathbb{R}^n$. This relaxation is combined with the branch and bound algorithm,
which is invoked when a non-integral solution of x is determined. Substituting the non-integral
component with integers, the problem is divided using a further algorithm, which recursively
generates a tree of subproblems. By maintaining bounds on solution utility, it is possible to
identify partial solutions which are guaranteed to be suboptimal. Since these are not required
to be subdivided further, the search tree is pruned. Liers et al. [46] describe the branch-and-cut
algorithm in detail, which permits tractable computation of spin glass models consisting of $50^2$
spins without periodic boundaries.
Transfer matrix
A technique applicable to various problems in statistical mechanics is the transfer matrix method
[8]. The requirement is as described at the beginning of this chapter, where a system is described
in terms of adjacently interacting subsystems. Using the definition of system state probability,
a matrix describing interactions is defined as $A = [p_{ij}]$, where $p_{ij} = P(S^i_{k+1} \mid S^j_k)$, given
subsystems $S_{k+1}, S_k$ assuming states $S^i_{k+1} \in 2^{S_{k+1}}$, $S^j_k \in 2^{S_k}$. Conditional independence from other
systems is assumed, i.e. $P(S^i_{k+1} \mid S^j_k) = P(S^i_{k+1} \mid S^j_k, S_1, S_2, \ldots, S_p)$. Here, the notation $2^S$ denotes
the set of all spin configurations of system $S$.
By implications of conditional state probability, given an initial subsystem it is possible to
evaluate the state of successive subsystems via a series of matrix multiplications. Problems such
as determining the partition function can be solved using eigenanalysis, an example of which is
given in [15]. The transfer matrix approach due to Onsager allows the partition function of the
two-dimensional Ising model to be formulated [39].
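As a toy illustration of how such a matrix of conditional state probabilities might be assembled, the following C sketch builds a 2 × 2 transfer-style matrix for two adjacent single-spin subsystems, assuming $P(S^i_{k+1} \mid S^j_k) \propto \exp(-(H(S^i_{k+1}) + H_b(S^i_{k+1}, S^j_k))/kT)$ and normalising each column; the coupling and temperature values are invented for the example.

#include <math.h>
#include <stdio.h>

int main(void)
{
    const double J = 1.0, kT = 1.0;
    const int s[2] = {-1, 1};
    double A[2][2];

    for (int j = 0; j < 2; j++) {               /* current state S^j_k           */
        double z = 0.0;
        for (int i = 0; i < 2; i++)             /* successor state S^i_{k+1}     */
            z += exp(J * s[i] * s[j] / kT);     /* H_b = -J s s', no field term  */
        for (int i = 0; i < 2; i++)
            A[i][j] = exp(J * s[i] * s[j] / kT) / z;   /* column-normalised      */
    }
    printf("%f %f\n%f %f\n", A[0][0], A[0][1], A[1][0], A[1][1]);
    return 0;
}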
In the next section, the framework of Markov chain theory is used to examine in detail
probabilistic interactions within the Ising spin glass. The Markov transition matrix is equivalent
to the transfer matrix, hence it follows that methods for system properties are closely related.
The chosen approach exposes a dynamic programming formulation of the ground state problem
with implications for further parallelisation.
3.2 A dynamic programming approach to spin glass ground states
A system $S$ is described by a set of states $\mathcal{S} = \{S_1, S_2, \ldots, S_n\}$, for example spin configurations
$\mathcal{S} = \{S_i \mid S_i \in 2^S\}$. Again, $2^S$ denotes the set of all system configurations. Residing in state $S_\tau$,
the system undergoes a series of non-deterministic state transitions, such that each successive
system configuration $S_\tau$ is determined from the assignment $S_\tau = t(S_\sigma)$. The map $t: 2^S \to 2^S$
is defined using a vector of random variables $v = (v_{S_1}, v_{S_2}, \ldots, v_{S_n})$, where $v_{S_i}$ is a random
successor state the system may assume when in state $S_i$. The probability mass function of these random variables is defined as
$$f_{v_{S_i}}(S_j) = P(v_{S_i} = S_j \mid S_i).$$
Given an initial distribution of states, it may be of interest to determine the most likely sequence
of states. For this purpose, it is useful to examine the system in terms of its Markov properties.
3.2.1 Markov chains
Define a sequence of states $C = (S_{x_1}, S_{x_2}, \ldots, S_{x_m})$. The sequence is said to fulfil the first-order Markov property, if the value of any single state sufficiently determines the probability
distribution of the state's successor in the sequence, i.e.
$$\forall i \quad P(S_{x_{i+1}} \mid S_{x_i}) = P(S_{x_{i+1}} \mid S_{x_i}, S_{x_{i-1}}, \ldots, S_{x_1}).$$
Formulating the probabilities of state transitions in matrix form is convenient for evaluating
the behaviour of the sequence after finite or infinite state emissions: Define the transition matrix
between sequence elements $i$, $i+1$ as
$$M_{i,i+1} = \begin{pmatrix}
P(S_1 \mid S_1) & P(S_1 \mid S_2) & \cdots & P(S_1 \mid S_n) \\
P(S_2 \mid S_1) & P(S_2 \mid S_2) & \cdots & P(S_2 \mid S_n) \\
\vdots & \vdots & \ddots & \vdots \\
P(S_n \mid S_1) & P(S_n \mid S_2) & \cdots & P(S_n \mid S_n)
\end{pmatrix},$$
where $P(S_\tau \mid S_\sigma)$ denotes the probability of the emission $S_\tau$ as the $(i+1)$th element in the chain after the $i$th emission, $S_\sigma$. It follows that the probability distribution of states $d = (P(S_1), P(S_2), \ldots, P(S_n))^T$
after $m$ sequence emissions can be evaluated as
$$d = \left(\prod_{k=1}^{m} M_{k,k+1}\right) d_0 \qquad (3.1)$$
where vector $d_0$ is the initial state distribution. If for all $k$, $M_{k,k+1} = M_{k-1,k}$, the Markov chain
is termed time-homogeneous. Such a chain may be represented by a directed, weighted graph
as shown in Figure 3.2, where nodes represent states and labelled edges represent transition
probabilities. A detailed discussion of further Markov chain properties is provided by Meyn
and Tweedie [48].

Figure 3.2: Example first-order Markov chain with states a, b, c
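As a concrete illustration of Equation 3.1 for a time-homogeneous chain such as the one in Figure 3.2, the following small C program propagates an initial distribution through repeated matrix / vector products; the three-state transition probabilities are invented for the example.

#include <stdio.h>

#define NSTATES 3

/* one application of Equation 3.1: d'_i = sum_j P(S_i | S_j) d_j */
static void step(const double M[NSTATES][NSTATES], double d[NSTATES])
{
    double next[NSTATES] = {0.0};
    for (int i = 0; i < NSTATES; i++)
        for (int j = 0; j < NSTATES; j++)
            next[i] += M[i][j] * d[j];
    for (int i = 0; i < NSTATES; i++) d[i] = next[i];
}

int main(void)
{
    /* each column j sums to one: it is the successor distribution of S_j */
    double M[NSTATES][NSTATES] = {
        {0.1, 0.5, 0.0},
        {0.9, 0.0, 0.6},
        {0.0, 0.5, 0.4}
    };
    double d[NSTATES] = {1.0, 0.0, 0.0};      /* initial state distribution */

    for (int m = 0; m < 10; m++) step(M, d);  /* distribution after 10 emissions */
    printf("%f %f %f\n", d[0], d[1], d[2]);
    return 0;
}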
By current definition, state emission is governed by an amount of ‘memory’, in that preced-
ing sequence values influence state output at any given point in the sequence. The first-order
Markov chain, where states are conditionally dependent on a single, immediate predecessor, is
the simplest instance of a Markov process.

When extending the amount of chain memory, i.e. increasing the number of preceding states
which determine the distribution of output states, the order-n Markov chain must be considered.
A generalisation of the archetypal first-order model, the distribution of an emitted state depends
on n immediate predecessors in the sequence. Following the definition of the first-order model,
the requirement for an order-$n$ chain is
$$\forall i \quad P(S_{x_i} \mid S_{x_{i-1}}, S_{x_{i-2}}, \ldots, S_{x_{i-n}}) = P(S_{x_i} \mid S_{x_{i-1}}, S_{x_{i-2}}, \ldots, S_{x_1}),$$
i.e. knowledge of the preceding $n$ states sufficiently defines the probability of state $S_{x_i}$ in the sequence. Both models have implications for algorithm design.
3.2.2 Ising state behaviour as a Markov chain
In context of the previously described Markov model, the following approach examines Ising
interactions within the two-dimensional lattice without boundary conditions. Initially, the lattice
lattice is partitioned into rows, as shown in Figure 3.1. Clearly, interactions between individual
rows occur in nearest-neighbour fashion, significantly along a single dimension. That is, for an
$n \times m$ spin system, the partition is defined as $S = \{S_1, S_2, \ldots, S_n\}$ with $S_i \in \{1, -1\}^m$, $1 \le i \le n$.
The energy of the system is
$$\sum_{i=1}^{n} H(S_i) + \sum_{i=2}^{n} H_b(S_{i-1}, S_i) = H(S_1) + \sum_{i=2}^{n} \left[ H(S_i) + H_b(S_{i-1}, S_i) \right],$$
where H (S i) is the Hamiltonian of subsystem S i and H b(S i, S j) is the boundary energy between
subsystems S i, S j, as previously defined.
Since $\bigcup_i S_i = S$, the entire lattice's state is sufficiently described by the states of its constituent
rows. Recall that because this is a statistical mechanical model, state is probabilistic, with
$P(S) \propto e^{-H(S)/(kT)}$. Using the described partitioning scheme, it turns out that subsystem state
probability fulfils the property of a first-order Markov chain (cf. Appendix C).
3.2.3 The ground state sequence
Given the Markov property under the chosen representation of Ising interactions, the implications of ground state for the chain of states $(S_1^{x_1}, S_2^{x_2}, \ldots, S_n^{x_n})$ are next examined. Formally, the
probability $P_{\text{gnd}}$ of obtaining ground state energy $\min_{S \in \mathcal{S}}\{H(S)\}$ is
$$P_{\text{gnd}} \propto \exp\left(-\frac{1}{kT} \min_{S \in \mathcal{S}}\{H(S)\}\right) \propto \max_{S \in \mathcal{S}} \left\{ \exp\left(-\frac{1}{kT} H(S)\right) \right\},$$
from which it is clear that Pgnd must be maximised, in order to infer the ground state configura-
tion. This configuration is given by the sequence
$$\operatorname*{argmax}_{(S_1, S_2, \ldots, S_n)} \left\{ P(S_1) \prod_{i=2}^{n} P(S_i \mid S_{i-1}) \right\},$$
which is the most likely sequence of emitted states in a first-order Markov chain.
This result is of significance for obtaining an algorithm for computing ground states, be-
cause there exists a well-known approach due to Viterbi [61]. The basis of the Viterbi algo-
rithm is the observation that the optimal state for the first symbol emission in the chain is simply
$\operatorname{argmin}_{S_1} H(S_1)$. Augmenting the size of considered subproblems, optimum solutions are de-
termined successively, until the size of the set of considered problems equals the originally
specified problem. At this point, the optimisation is complete.
The probability of the most likely sequence of emissions $(S_1^{\mu_1}, S_2^{\mu_2}, \ldots, S_n^{\mu_n})$, known as the
Viterbi path, can be obtained from the recurrent formulation
$$P_{\text{viterbi}}(S_i) = \begin{cases} \max_{S_i} \{P(S_i)\} & i = 1 \\ \max_{S_{i-1}} \{P(S_i \mid S_{i-1})\, P_{\text{viterbi}}(S_{i-1})\} & i > 1, \end{cases}$$
by evaluating $\max_{S_n}\{P_{\text{viterbi}}(S_n)\}$. It follows that the actual sequence can be formulated as
$$\text{viterbi}(i) = \begin{cases} \operatorname{argmax}_{S_i} \{P_{\text{viterbi}}(S_i)\} & i = 1 \\ \operatorname{argmax}_{S_i} \{P_{\text{viterbi}}(S_i)\} + \text{viterbi}(i-1) & i > 1, \end{cases}$$
determined by evaluating viterbi($n$). In this case, the '+' operator denotes symbol concatenation,
so that $(S_1^{\mu_1}, S_2^{\mu_2}, \ldots, S_n^{\mu_n}) = S_1^{\mu_1} + S_2^{\mu_2} + \ldots + S_n^{\mu_n}$.

Figure 3.3: Illustrating the principle of optimality. Paths within the dashed circle are known to
be optimal. Using this information, optimal paths for a larger subproblem can be computed.
It is important to note that the recursive definition of the Viterbi path differs from the (suboptimal) approach of optimising every conditional probability $P(S_i \mid S_{i-1})$ individually. Instead, the
path is defined as the optimum of incremented subproblems, whose own solutions are in turn
optimal. Schematically depicted in Figure 3.3, this approach is an application of the principle
of optimality due to Bellman [9]. Consequently, the Viterbi algorithm is an instance of the
dynamic programming problem, recursively defined for all $x \in X$ as
$$V(x) = \max_{y \in \Gamma(x)} \left\{ F(x, y) + \gamma\, V(y) \right\},$$
where $\Gamma$ is a map and $0 \le \gamma \le 1$ is the so-called discount factor. The function $V(x)$ is known as
the value function, and is optimised using $F(x, y)$.
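For concreteness, the following small C program iterates exactly this recursion to a fixed point (value iteration) on a three-state example; the payoff table $F$, the discount $\gamma = 0.9$ and the choice of $\Gamma(x)$ as the full state set are invented purely for illustration.

#include <math.h>
#include <stdio.h>

#define NX    3
#define GAMMA 0.9

int main(void)
{
    double F[NX][NX] = {               /* F[x][y]: immediate payoff of moving x -> y */
        {0.0, 1.0, 0.0},
        {0.0, 0.0, 2.0},
        {1.0, 0.0, 0.0}
    };
    double V[NX] = {0.0, 0.0, 0.0};

    for (;;) {                         /* iterate V(x) = max_y {F(x,y) + gamma V(y)} */
        double Vnew[NX], delta = 0.0;
        for (int x = 0; x < NX; x++) {
            double best = -INFINITY;
            for (int y = 0; y < NX; y++) {
                double v = F[x][y] + GAMMA * V[y];
                if (v > best) best = v;
            }
            Vnew[x] = best;
            delta = fmax(delta, fabs(best - V[x]));
        }
        for (int x = 0; x < NX; x++) V[x] = Vnew[x];
        if (delta < 1e-9) break;       /* stop at the fixed point of the recursion */
    }
    printf("V = %f %f %f\n", V[0], V[1], V[2]);
    return 0;
}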
The concrete algorithm for computing the Viterbi path probability avoids the overhead and
backtracking suggested by the aforementioned recursive formulation. It involves an iterative
loop to increment the size of the considered system:
opt[] := 1
for i := 1 to n
    for S_i^j ∈ S_i
        pmax := -∞
        for S_{i-1}^k ∈ S_{i-1}
            p := P(S_i^j | S_{i-1}^k) * opt[k]
            if p > pmax
                pmax := p
        optNew[j] := pmax
    opt := optNew
In the listing, $S_i^j$ denotes configuration $j$ of subsystem $S_i$, according to previous convention.
The array opt[] records the optimum path probability for preceding subsystems S 1, S 2, . . . , S i
for every iteration i of the algorithm. Elements of the array are initially set to unity. A second
array optNew[] is used to store updated path probabilities, which are subsequently copied to
opt[] after each iteration of the outer loop. Although the values of optimal state emissions are
discarded in this pseudocode, it is possible to retain them by storing them in an associative data
structure. An implementation of this approach is presented in Chapter 6.
Examining the algorithm’s time complexity, it is apparent that execution time is proportional
to the product of the three loops' lengths, since these assume a nested structure. That is,
$$t(n) \propto n \left|2^{S_1}\right|^2,$$
where $n$ is the number of subsystems, and $2^{S_1}$ is the set of configurations of subsystem $S_1$. It
follows that if the spin lattice has dimensions $n \times m$, it is
$$t(n,m) \propto n\,2^{2m}, \quad \text{which is } O\!\left(n\,2^{2m}\right).$$
By further observation it turns out that the Viterbi path can also be used to evaluate system
energy (cf. Appendix D). This provides a dynamic programming solution to the two dimensional
lattice without boundary conditions, which is
$$H_{\min}(S_i) = \begin{cases} \min_{S_i} \{H(S_i)\} & i = 1 \\ \min_{S_{i-1}} \{H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_{i-1})\} & i > 1. \end{cases} \qquad (3.2)$$
3.2.4 Boundary conditions
It is of interest to examine the effects of introducing cyclic boundary conditions on state optimality, using the described approach. As the latter involves partitioning the spin lattice into
rows, it is possible to differentiate between energetic contributions occurring within subsystems
$S_1, S_2, \ldots, S_n$, and energetic contributions occurring between these. It is apparent that horizontal boundary conditions have an effect on subsystem energy, whereas vertical boundary conditions affect subsystem
interactions.
The first effect is caused by horizontal boundary interactions, as these involve spins located
at the outermost positions of each spin row. The Hamiltonian $H(S_i)$ thus effectively includes an
additional term to account for an additional pairwise interaction. The Hamiltonian of the entire
lattice is $\sum_{i=1}^{n} H(S_i) + \sum_{i=2}^{n} H_b(S_i, S_{i-1})$, which sufficiently accounts for horizontal boundary interactions within the system. Since the recursive formulation of ground state energy in Equation
3.2 also computes the sum of all subsystem Hamiltonians and their interactions, the existing dynamic program formulations and algorithms can be left unmodified. It follows that the al-
gorithmic complexity of computing ground states does not increase for the case with cyclic
boundaries along a single dimension.
In contrast, the vertical cyclic boundary condition results in pairwise interactions between
subsystems $S_1, S_n$, i.e. the initial and ultimate spin rows. Here, each row-constituent spin $s_j \in S_k$ ($k \in \{1, n\}$) potentially has a non-zero bond interaction with its neighbour $s_{j}' \in S_{k'}$ ($k' \in \{1, n\} \setminus \{k\}$). Consequently, the Hamiltonian for the entire lattice is given by $\sum_{i=1}^{n} H(S_i) + \sum_{i=2}^{n} H_b(S_i, S_{i-1}) + H_b(S_1, S_n)$, where the latter term is the interaction energy between the two
boundary systems in question. Here, it follows that the proposed existing solution does not
yield the ground state energy, as the recursive formulation does not include the additional term.
Configuration optimality is therefore not guaranteed, for the case with cyclic boundaries along
both lattice dimensions.
As a modification of the original dynamic programming solution, it is conjectured that the
ground state configuration can be determined by evaluating the set of problem instances where
both boundary rows are assigned spin configurations in advance, i.e.
$$H_{\min} = \min_{S_1, S_n} \left\{ H_{\min}(S_n, S_n, S_1) \right\},$$
with
$$H_{\min}(S_n, S_i, S_1) = \begin{cases} H(S_i) + H_b(S_1, S_n) & i = 1 \\ \min_{S_{i-1}} \left\{ H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_n, S_{i-1}, S_1) \right\} & i > 1. \end{cases}$$
Adapting the previous algorithm, this formulation implies that the execution time $t'(n)$ is
$$t'(n) \propto \left|2^{S_1}\right|\, t(n),$$
where $n$ is the number of subsystems, $2^{S_1}$ is the set of configurations of $S_1$ and $t(n)$ is the
execution time of the previously specified algorithm. Therefore,
$$t'(n,m) \propto 2^m\, n\, 2^{2m} \propto n\, 2^{3m}, \quad \text{which is } O\!\left(n\, 2^{3m}\right),$$
where the system consists of $n \times m$ spins.
Proof of the conjecture is by induction. Since interactions within the system occur in a
regular lattice, the two adjacent boundary subsystems can be chosen arbitrarily, so the recursive
formulation becomes
$$H_{\min}(S_j, S_i, S_{j+1}) = \begin{cases} H(S_i) + H_b(S_i, S_{i-1}) & i = j + 1 \\ \min_{S_{i-1}} \left\{ H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_j, S_{i-1}, S_{j+1}) \right\} & \text{otherwise,} \end{cases}$$
with subsystems $S_0, S_1, \ldots, S_{n-1}$, boundary subsystems $S_j, S_{j+1}$ and subsystem interactions
mod $n$. It follows that the ground state energy is defined as
$$H_{\min} = \min_{S_j, S_{j+1}} \left\{ H_{\min}(S_j, S_n, S_{j+1}) \right\}.$$
Choosing boundary subsystems $S_{j+1}, S_{j+2}$, the formulation further becomes
$$H_{\min}(S_{j+1}, S_i, S_{j+2}) = \begin{cases} H(S_i) + H_b(S_i, S_{i-1}) & i = j + 2 \\ \min_{S_{i-1}} \left\{ H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_{j+1}, S_{i-1}, S_{j+2}) \right\} & \text{otherwise,} \end{cases}$$
which clearly is the optimal sequence of emitted states, given states $S_{j+1}, S_{j+2}$. As the ground
state configuration can be deduced from $\min_{S_{j+1}, S_{j+2}} \left\{ H_{\min}(S_{j+1}, S_n, S_{j+2}) \right\}$, the sequence remains optimal also for this case. Therefore, the sequence is optimal for all $j$, i.e.
$$\forall_{0 \le i < n}\, \exists_{k, S_i}\; \forall_{0 \le j < n}\, \exists_{k', S_j} \quad H\!\left(S_j^{k'} \cup \bar{S}_j\right) < H\!\left(S_i^{k} \cup \bar{S}_i\right), \qquad (3.3)$$
using the notation $\bar{S}_j$ to denote $S \setminus S_j$.
3.2.5 An order-n Markov approach to determining ground states
Having introduced the Markov model for both the first-order case and its higher-order extension,
it is of interest to examine whether the latter lends itself to a more powerful formulation of Ising
system state probability: Previously, the approach consisted of a row-wise system decomposi-
tion, which resulted in a sequence of subsystems with nearest-neighbour interactions along one
dimension. Reducing subsystem size, it is apparent that interactions between subsystems are no
longer restricted to occurring along one dimension.
Consider the extreme case, where a subsystem consists of a single spin. For the two-
dimensional $n \times m$ spin lattice, there exist subsystems $S = \{S_0, S_1, \ldots, S_{nm-1}\}$. The system's
total energy is the result of horizontal and vertical interactions between subsystems, which may
be evaluated by sliding a window across the entire lattice, as shown in Figure 3.4. For each spin,
this window considers the interactions originating from a vertical and horizontal predecessor.
Figure 3.4: Sliding a unit-spin window across a lattice
Formally, the Hamiltonian is expressed as
$$H(S) = \sum_{i=0}^{nm-1} \left[ H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m}) \right],$$
where H b(S i, S i−m) is the interaction energy between S i and its vertical predecessor. Simi-
larly H b(S i, S i−1) is the interaction due to horizontal predecessor S i−1. Also, subsystem indices
are computed mod (nm), in order to evaluate interactions occurring across lattice boundaries.
Here, it indeed turns out that a higher-order formulation of system state is possible (cf. Appendix
C), namely
$$P(S) = \prod_{i=0}^{nm-1} P(S_i \mid S_{i-1}, S_{i-2}, \ldots, S_{i-m-1}),$$
from which ground state probability can be formulated as
$$P_{\text{viterbi}}(S_i, S_{i-1}, \ldots, S_{i-m}) = \begin{cases} P(S_i, S_{i-1}, \ldots, S_{i-m}) & i \le m \\ \max_{S_{i-m-1}} \left\{ P(S_i \mid S_{i-1}, \ldots, S_{i-m-1})\, P_{\text{viterbi}}(S_{i-1}, \ldots, S_{i-m-1}) \right\} & i > m, \end{cases}$$
for the lattice without cyclic boundary interactions. As previously described, this probability
can be used to determine the actual ground state configuration, and can be reformulated to
determine ground state energy. It follows that the algorithm for obtaining solutions to this dynamic programming problem is also a modification of the previous approach:
opt[] := 1
for i := m to n*m
    for (S_i^{j0}, S_{i-1}^{j1}, ..., S_{i-m}^{jm}) ∈ (S_i, S_{i-1}, ..., S_{i-m})
        if i > m
            pmax := -∞
            for S_{i-m-1}^k ∈ S_{i-m-1}
                p := P(S_i^{j0} | S_{i-1}^{j1}, ..., S_{i-m-1}^k) * opt[(S_{i-1}^{j1}, ..., S_{i-m-1}^k)]
                if p > pmax
                    pmax := p
            optNew[(S_i^{j0}, S_{i-1}^{j1}, ..., S_{i-m}^{jm})] := pmax
        else
            p := P(S_i^{j0}, S_{i-1}^{j1}, ..., S_{i-m}^{jm})
            optNew[(S_i^{j0}, S_{i-1}^{j1}, ..., S_{i-m}^{jm})] := p
    opt := optNew
The above pseudocode consists of three nested loops, the outermost of which is responsible
for calculating the probability $P(S_i \mid S_{i-1}, S_{i-2}, \ldots, S_{i-m-1})$ for iteratively increasing $i$. The loop
thus effectively specifies a sliding window of size $m+1$, which is moved across the lattice in the
fashion previously described. For each position of the window all spin configurations are evaluated, using the associative data structure opt[] to obtain the probabilities of preceding window
configurations. These are referenced by the tuple $(S_i^{j_0}, S_{i-1}^{j_1}, \ldots, S_{i-m}^{j_m})$, which represents a window configuration. The algorithm is for the case without cyclic boundary conditions, therefore
the window is not required to precede position $i = m+1$; at this position, window configuration
probability is unconditional.
Adapting the algorithm for calculating ground state energy, where the statement

    p := P(S_i^{j0} | S_{i-1}^{j1}, ..., S_{i-m-1}^k) * opt[(S_{i-1}^{j1}, ..., S_{i-m-1}^k)]

becomes a summation of subsystem energies, the optimisation proceeds by determining ener-
getically minimal preceding window states for each position of the window on the system lat-
tice. In this form, the algorithm performs identically to the transfer matrix optimisation scheme
described in [15]. It follows that the described scheme must have equivalent computational
complexity. An analysis thereof confirms this assumption: given that the lattice consists of $n \times m$ spins,
the algorithm's execution time is proportional to
$$t(n,m) \propto (nm - m - 1)\left|2^{(S_1, S_2, \ldots, S_{m+2})}\right|,$$
where $2^{(S_1, S_2, \ldots, S_m)}$ is the set of configurations of tuple $(S_1, S_2, \ldots, S_m)$. Therefore,
$$t(n,m) \propto (nm - m - 1)\,2^{m+2} + 2^{m+1}, \quad \text{which is } O\!\left((nm - m - 1)\,2^{m+2}\right) = O\!\left(nm\,2^m\right).$$
Although not considered in further detail, the opportunity for further modification of this al-
gorithm presents itself, to account for cyclic boundary interactions within the spin lattice. This
entails invoking the algorithm for specified configurations of the spin tuple (S 1, S 2, . . . , S 1+m),
similar to the algorithm employing a row-wise lattice decomposition. This is conjectured to in-
crease the algorithmic complexity to $O(nm\,2^m\,2^m)$, since there are $O(2^m)$ possible configurations
of the specified spin tuple.
In the following chapter, parallelisation strategies are described for the harmony search
heuristic, the first-order Markov chain solution, and as an extension the aforementioned higher-
order modification.
Chapter 4
Parallelisation Strategies
To be of practical use, a computational solution to a given problem must be able to be implemented on a machine architecture, such that the algorithm completes within a reasonable
amount of time. While computational complexity provides a means of qualitatively evaluating
problem tractability, the properties of the machine determine the amount of time required for
solving a particular problem instance.
To reduce machine execution time, an approach applicable to physical architectures is to
increase the processing rate of machine instructions. This may be achieved in practice by in-
creasing the machine’s CPU clock rate, improving memory bandwidth, and augmenting the ar-
chitecture by additional features such as registers, caches and pipelining. In general terms, this
requires no conceptual modification to the algorithm, although the algorithm's performance is usually amenable to optimisation for the respective architecture.
The second approach to increasing machine performance involves parallelisation. Here, per-
formance is improved by distributing computation among a set of processing elements. With
the exception of algorithms with implicit parallelism in operations on data structures in combi-
nation with vector processing architectures, it is necessary to adapt the algorithm and devise a
scheme for achieving this distribution. For message passing architectures, this includes defining
explicit communication operations.
In the following, the potential for implementing parallel versions of harmony search and
dynamic programming methods is considered, with regard to MIMD architectures.
4.1 Harmony search
In the previous chapter, harmony search was described as a probabilistic algorithm employing
an evolutionary strategy for both discrete and continuous optimisation. As such, it performs
a heuristic evaluation of problem state space, i.e. search is non-exhaustive. Since improving
performance motivates parallelisation, it is necessary to examine the heuristic for the purpose of
defining performance relevant characteristics.

Figure 4.1: Using parallelism to improve heuristic performance. (a) No distribution (serial); (b) 'weak scaling'; (c) 'strong scaling'.
For any heuristic algorithm, on the one hand performance can be quantified by the search process' accuracy. The latter is influenced by the algorithm's state space traversal policy, significantly by the size of the search space. It follows that performance can be improved by enlarging
the search space, since in the limit of search space towards state space, solution optimality is
guaranteed.
On the other hand, it may be of interest to restrict the heuristic’s execution time, as previ-
ously described for the general class of halting algorithms. In this case, the task is to increase
the rate at which search is performed.
Using parallelism to improve either of these characteristics, it is apparent that distribution
of computation among processors bears similarity to the concepts of strong scaling and weak
scaling, commonly encountered in parallel performance analysis. Whereas weak scaling implies increasing the number of processing elements while keeping the problem size constant
(therefore varying the fraction of computation assigned to a processor), strong scaling increases
the problem size with the number of processors (therefore keeping the fraction of computation
assigned to a processor constant). Similarly, in the case of the heuristic, parallelism can either
be applied for the purpose of distributing a search space of constant size (weak scaling), or for
increasing the size of the search space (strong scaling). Using a tree model, an example of this
relationship is shown in Figure 4.1.
4.1.1 Harmony search performance
The evolutionary strategy used by harmony search for combinatorial optimisation consists of ini-
tial candidate generation, followed by iterative randomised candidate recombination (including
randomised mutation) and solution replacement. The algorithm is probabilistic, hence search
is a random walk, whose average length is influenced by the memory choosing rate (Figure
4.2(a)). Also, the number of solution vectors influences search, such that for NVECTORS=1,
the optimisation becomes greedy: This is because a single solution is retained, which is only
replaced when a solution of higher utility is found. For larger NVECTORS, i.e. maintaining a
Figure 4.3: Parallelisation strategies for population based heuristics. (a) Master-slave; (b) coarse-grained; (c) fine-grained.
4.1.2 Existing approaches
Parallelisation methods for metaheuristic algorithms were briefly mentioned in Chapter 3. These
are considered in more detail, in order to assess their potential adaptation for harmony search.
Cantu-Paz [14] provides an overview of parallelisation schemes for evolutionary algorithms.
Although these are discussed specifically in context of genetic algorithms, they are also appli-
cable to other evolutionary heuristics, such as those introduced by Koza for generating software
programs [5]. Cantu-Paz discerns between three classes of approach, known as global master-
slave, fine-grained and coarse-grained, respectively. These differ in the way the evolutionary process is distributed amongst processors and the extent to which solutions are communicated
amongst them.
Depicted schematically in Figure 4.3(a), the master-slave approach implements a single pop-
ulation; offspring are generated from potentially any parent solutions in the population (termed
panmixia). This is achieved by assigning the population to a single master processor, allowing
slave processors to access and modify individual solutions. Slave processors may be tasked with
evaluating solution fitness, whereas the master is responsible for selection and crossover. It is
possible to consider both a synchronous variant, where solutions are retrieved and modified in
discrete generations, and an asynchronous variant, where a slave may initiate a retrieval in ad-
vance of its peers. Either is suited for implementation on shared-memory or message passing
architectures; however, it is noted that the heterogeneous organisation of processes into master and slaves makes the approach generally less suitable for massively parallel architectures.
In the coarse-grained approach (Figure 4.3(b)), the evolutionary process is no longer pan-
mictic. The set of solutions which forms the population is partitioned among processors, so
that optimisation progresses primarily within semi-isolated ‘demes’ [14]. To allow evolution to
progress globally, demes exchange a proportion of their population with neighbours in a prede-
fined graph topology. This allows solutions of high utility to propagate across the graph, which
promotes convergence towards a common, global solution. On the other hand, the insularity
of subpopulations permits a high degree of diversity, allowing multiple local optima to be ap-
proached independently, thereby preventing early convergence. Previous work includes investi-
gations based on coarse-grained approaches, using both fixed toroidal or hypercubic topologies
and dynamic topologies . The distributed approach makes this technique particularly attractive
for implementation on message passing architectures.
The fine-grained approach, shown in Figure 4.3(c), is also based on distributing the solu-
tion population amongst processors. However in contrast, exchange of solutions occurs more
frequently during the evolutionary process: Instead of periodically initiating migration between
subpopulations, selection itself takes place between processor-assigned demes, which in the
most extreme case consists of a single solution. Depending on the specified network topology,
it may be practicable to select from all subpopulations within a certain vicinity from the initiat-
ing deme, which results in an overlapping selection scheme. Cantu-Paz notes that if this vicinity
is equal to the network diameter for all nodes, evolution regains panmixia. Suited for massively parallel architectures due to its scalability, this approach appears to be especially effective
because of its flexibility.
Aside from evolutionary algorithms, a potentially relevant approach to parallelising a heuris-
tic is presented by Ram et al. [55]. Here, the simulated annealing algorithm is executed indepen-
dently by multiple processors, where each initialises search with a random configuration. This
allows parallel exploration of the search space, in analogy to the effect achieved by executing an
evolutionary process such as genetic algorithms using disjoint subpopulations: since annealing
proceeds independently, the process executed by each processor potentially converges towards a
different local optimum. To counteract state space exploration, periodically the most promising
solution is determined and exchanged between processors. Akin to migrating solutions between
demes, this promotes global convergence towards a single solution. The number of algorithm
iterations required for convergence is hence reduced. In their implementation, Ram et al. em-
ploy a collective exchange scheme for communicating solutions between individual annealing
processes. However, the neighbourhood exchange scheme described by Cantu-Paz is equally
applicable.
4.1.3 Proposed parallelisation scheme
In the described approaches, parallelism is applied with the intention of enhancing the explo-
rative or exploitative properties of heuristics: Whereas the coarse-grained evolutionary approach
improves exploration alone through parallel selection, the remaining approaches include an el-
ement of parallel search exploitation, by propagating promising solutions in order to accelerate
solution convergence. The method used by Ram et al. can be viewed as a simplification of
the coarse-grained evolutionary approach, where the graph defining solution exchanges is fully
connected.
Having stated the motivation for parallelising harmony search, the opportunity is given to
apply the described approaches to this heuristic. Given that harmony search is an evolutionary
algorithm, distributed state space exploration and exploitation are readily adapted from parallel
genetic algorithms.
Figure 4.4 schematically depicts the proposed parallelisation scheme. Here, optimisation
takes place in distributed fashion, so that the heuristic is executed by multiple processors, each
assigned a set of solution vectors. To allow solutions to be exchanged between processors,
the latter are arranged in a ring. Periodically, processors send solutions to their successors,
while receiving these from predecessors. This reflects the behaviour of the aforementioned
fine-grained approach. In addition however, processors are organised into a twofold hierarchy,
where subordinate processors are not directly involved in cyclic exchange of solutions. Instead,
these exchange solutions using collective operations, based on the scheme described by Ram
et al. Subordinate processors are grouped in such a way that each subgroup includes a ‘ring
exchange’ processor. It follows that collective exchanges consider solutions obtained through
the cyclic exchange process.
Although the proposed scheme is comparatively involved, it allows the behaviour of the
heuristic to be altered by introducing a bias towards search space exploration or conversely
search space exploitation: If the size of subgroups is equal to the total number of processors,
communication is restricted to collective solution exchanges, so that rapid convergence is pro-
moted. In this case, effectively only a single subgroup exists. Providing that communication
occurs at short intervals to ensure that similar solution vectors are held in memory, it is specu-
lated that the algorithm will exhibit the described ‘weak scaling’ behaviour while increasing the
number of processors. On the other hand, for unit subgroup size, collective solution exchanges
are absent from the distributed search process. As a consequence, the ring-based approach is
reinstated. Here, the expectation is that the heuristic will emphasise explorative search, and
therefore exhibit ‘strong scaling’ behaviour when increasing the number of processors.
It is apparent that there are a multitude of parameters which influence parallel optimisation,
in addition to the memory choosing rate and number of solution vectors defined by serial har-
mony search. These include the total number of processors involved in search, and the size
of subgroups. Also, of significance is the rate at which solutions are exchanged, both for the
ring and collective subgroup operations. Finally, the latter two operations must be defined in
detail; these may for example involve selecting solutions at random, or communicating the most
promising solutions.

Figure 4.4: Harmony search parallelisation scheme (legend: processor, cyclic exchange, collective exchange)
The following describes a pseudocode prototype of a parallel harmony search algorithm for
obtaining Ising spin glass ground states, using the message passing model:

1  Solution[] solutions := initialise_random_solutions(NVECTORS);
2
3  for (i := 1; !has_converged(); i++) {
4      Solution solution = new Solution;
5
6      float highest_energy = compute_highest_energy(solutions);
7      int highest_energy_vector = compute_highest_energy_vector(solutions);
8
9      for (j := 1; j <= solution.length; j++) {
10         if (rand(0, 1) < MEMORY_CHOOSING_RATE) {
11             solution[j] := solutions[rand()][j];
12         } else {
13             solution[j] := random_spin();
14         }
15     }
16     if (spinglass_energy(solution) < highest_energy) {
17         solutions[highest_energy_vector] := solution;
18     }
19     if (PROCESSOR_ID mod ZONE_SIZE = 0) {
20         msg_send(solutions[rand()], (PROCESSOR_ID + ZONE_SIZE) mod N_PROCESSORS);
21         msg_rcv(rcv_solution);
22         copy_min(rcv_solution, solutions[rand()]);
23     }
24     if (i mod ZONEEXBLOCK = 0) {
25         reduce_min_zone(solutions[highest_energy_vector]);
26     }
27 }
As with serial harmony search, the algorithm consists of an iterative loop, whose purpose it
is to generate successive solutions and evaluate their utility. The proposed algorithm involves
terminating the loop when the most favourable configurations held by processes have identical
energies. Although a more obvious approach might involve a less stringent termination criterion,
it is thought that using this scheme, the number of iterations until termination provides a rea-
sonable means of evaluating solution exploitation. Within the loop, solutions with random spins
are generated, based on the configuration of existing solutions (lines 9–15), and replaced (lines
16–18). The constants NVECTORS and MEMORY_CHOOSING_RATE control the number of
retained solution vectors and the memory choosing rate, respectively. Following this, each loop
iteration contains communication instructions for processors involved in ring exchange of solu-
tions: Lines 20 and 21 swap random solution vectors between processors, following which the
function copy_min() on line 22 copies the value of the energetically more favourable argument
to its complementary argument. In this way, energetically favourable solutions are propagated
within a ring of search processes. There are (N_PROCESSORS / ZONE_SIZE) such processors
in the ring.
In addition, solutions are periodically exchanged between subgroups of processes, using the
collective operation reduce_min_zone(). This performs a reduction based on the most favourable
of argument solutions. As defined, the operation involves the highest energy solutions held by
each search process. The operation is executed at a rate determined by the constant ZONEEXBLOCK. Subgroup size is influenced by the value of constant ZONE_SIZE. When equal
to N_PROCESSORS, there exists a single group for which collective operations are defined,
whereas ring communications are without effect. Conversely, for unit ZONE_SIZE all processes
are involved in ring communications, whereas collective operations are without effect.
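To make the two communication patterns concrete, the following hedged C/MPI sketch shows one possible realisation of the ring exchange and of the zone-wide reduction; the constants, the zone communicator (which could be obtained with MPI_Comm_split) and the function names are illustrative, and the project's actual MPI implementation (Chapter 6) may differ.

#include <mpi.h>

#define N_SPINS   64
#define ZONE_SIZE 4

/* Ring exchange (cf. lines 19-23 of the listing): intended to be called only
   by the ring processes, i.e. those with rank % ZONE_SIZE == 0.              */
static void ring_exchange(int *sol, double *energy, int rank, int nprocs,
                          MPI_Comm comm)
{
    int next = (rank + ZONE_SIZE) % nprocs;
    int prev = (rank - ZONE_SIZE + nprocs) % nprocs;
    int recv_sol[N_SPINS];
    double recv_e;

    MPI_Sendrecv(sol, N_SPINS, MPI_INT, next, 0,
                 recv_sol, N_SPINS, MPI_INT, prev, 0, comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(energy, 1, MPI_DOUBLE, next, 1,
                 &recv_e, 1, MPI_DOUBLE, prev, 1, comm, MPI_STATUS_IGNORE);

    if (recv_e < *energy) {                 /* copy_min(): keep the better solution */
        for (int i = 0; i < N_SPINS; i++) sol[i] = recv_sol[i];
        *energy = recv_e;
    }
}

/* Zone-wide reduction (cf. reduce_min_zone on line 25): the process holding
   the lowest-energy argument broadcasts it to the rest of its zone.          */
static void zone_reduce(int *sol, double *energy, MPI_Comm zone_comm)
{
    struct { double e; int rank; } in, out;
    MPI_Comm_rank(zone_comm, &in.rank);
    in.e = *energy;
    MPI_Allreduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MINLOC, zone_comm);
    MPI_Bcast(sol, N_SPINS, MPI_INT, out.rank, zone_comm);
    *energy = out.e;
}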
4.2 Dynamic programming approaches
In the previous chapter, exact solutions to the ground state problem were presented, based on
modelling spin interactions as Markov chains. The latter in turn were used to arrive at dynamic
programming formulations of the respective optimisation problems. Run-time complexities are
lower than the $2^{nm}$ bound required for finding the ground states of the $n \times m$ spin lattice using
brute force, nevertheless they are high enough to merit investigating parallelisation strategies.
4.2.1 First-order Markov chain approach
Parallelisation is based on an approach by Grama et al. [30], where a dynamic programming
problem which is serial and monadic is decomposed into a tabular arrangement of solutions
to subproblems of increasing size. The order of operations required to solve the problem is
equivalent to the order of individual scalar multiplications and additions required for a series
of matrix / vector multiplications. The parallelisation approach is therefore given by parallel
matrix / vector multiplication, which is well studied.
A dynamic programming problem is monadic if its optimisation equation contains a single
recursive term. That is, given the function $c = g(f(x_1), f(x_2), \ldots, f(x_n))$, which assigns a cost
to the solution constructed from subproblems $x_1, x_2, \ldots, x_n$, monadicity exists when $g$ is defined
as $f(j) \otimes a(j, x)$, where $\otimes$ is an associative operator. In this form, each solution depends on a
single subproblem.
Furthermore, a dynamic programming problem is serial, if there are no cycles in the graph
of dependencies between subproblems. More formally, the graph $G = (V, E)$ is defined by the
set of nodes $V$, where each node represents a subproblem. An edge between nodes exists, if the
optimisation equation contains a recursive term indicating a dependency between subproblems.
Examining the optimisation equation for lattice ground state energy (without cyclic boundary conditions),
$$H_{\min}(S_i) = \begin{cases} \min_{S_i} \{H(S_i)\} & i = 1 \\ \min_{S_{i-1}} \{H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_{i-1})\} & i > 1, \end{cases}$$
it is apparent that the equation is monadic. To establish existence of the serial property, the
graph of subproblem dependencies is visualised (Figure 4.5(a)). As depicted, rows of nodes
represent states of subsystems S i, which characterise the values of subproblems. Since there
are $n$ subsystems, there are $n \cdot 2^{|S_1|}$ nodes in the graph. Since a subproblem may assume as many
values as there are values of its preceding dependency, the graph has a trellis-like structure con-
sisting of bipartite graph segments. Because this organisation into individual levels is acyclic,
the dynamic programming problem is serial.
The graph is modified to include information on system energy. Given the pair of nodes
associated with subsystem configurations $S_i^k$, $S_{i-1}^l$, define the weight function
$$w(S_i^k, S_{i-1}^l) = w_i^{k,l} = H(S_i^k) + H_b(S_i^k, S_{i-1}^l),$$
for $1 < i \le n$. Further define an additional node $\alpha$, such that the set of graph edges is extended to
$E = E \cup \{(\alpha, S_1^k) \mid 1 \le k \le q\}$ for $q$ subsystem configurations. For $i = 1$, the weight function is
defined as $w(\alpha, S_1^k) = H(S_1^k)$. Minimising system energy is then equivalent to obtaining
$\min_k p(\alpha, S_n^k)$, where $p(\alpha, S_n^k)$ is the minimum path between nodes $\alpha$ and $S_n^k$.
Figure 4.5: Graph of subproblem dependencies for an n = 3, m = 2 spin problem. (a) First-order; (b) higher-order.
A further observation is that the minimum paths $p(\alpha, S_i^k)$, $1 \le k \le q$, are expressed as
$$\begin{aligned}
p(\alpha, S_i^1) &= \min\left\{ w_i^{1,1} + p(\alpha, S_{i-1}^1),\; w_i^{1,2} + p(\alpha, S_{i-1}^2),\; \ldots,\; w_i^{1,q} + p(\alpha, S_{i-1}^q) \right\}, \\
p(\alpha, S_i^2) &= \min\left\{ w_i^{2,1} + p(\alpha, S_{i-1}^1),\; w_i^{2,2} + p(\alpha, S_{i-1}^2),\; \ldots,\; w_i^{2,q} + p(\alpha, S_{i-1}^q) \right\}, \\
&\;\;\vdots \\
p(\alpha, S_i^q) &= \min\left\{ w_i^{q,1} + p(\alpha, S_{i-1}^1),\; w_i^{q,2} + p(\alpha, S_{i-1}^2),\; \ldots,\; w_i^{q,q} + p(\alpha, S_{i-1}^q) \right\},
\end{aligned}$$
for $i > 1$. For $i = 1$, $p(\alpha, S_i^k) = w(\alpha, S_i^k)$. In an analogy to matrix / vector multiplication,
where addition is substituted by minimisation and multiplication is substituted by addition, the
equations are equivalent to
$$p_i = M_{i,i-1} \times p_{i-1},$$
where $p_i = [\,p(\alpha, S_i^1)\;\; p(\alpha, S_i^2)\;\; \ldots\;\; p(\alpha, S_i^q)\,]^T$. For $i > 1$, the matrix is defined as
$$M_{i,i-1} = \begin{pmatrix}
w_i^{1,1} & w_i^{1,2} & \cdots & w_i^{1,q} \\
w_i^{2,1} & w_i^{2,2} & \cdots & w_i^{2,q} \\
\vdots & \vdots & \ddots & \vdots \\
w_i^{q,1} & w_i^{q,2} & \cdots & w_i^{q,q}
\end{pmatrix},$$
otherwise
$$M_{i,i-1} = \begin{pmatrix}
w(\alpha, S_i^1) & w(\alpha, S_i^1) & \cdots & w(\alpha, S_i^1) \\
w(\alpha, S_i^2) & w(\alpha, S_i^2) & \cdots & w(\alpha, S_i^2) \\
\vdots & \vdots & \ddots & \vdots \\
w(\alpha, S_i^q) & w(\alpha, S_i^q) & \cdots & w(\alpha, S_i^q)
\end{pmatrix}.$$
Figure 4.6: Parallel matrix operations. Numerals indicate order of vector elements. (a) Basic; (b) improved.
Using a sequence of $n$ matrix / vector operations, it is now possible to compute minimum paths
$p(\alpha, S_i^k)$, by initialising $p_0$ to a $q$-component zero vector: the first operation $M_{1,0} \times p_0$ yields
minimum paths $p(\alpha, S_1^k)$ for $1 \le k \le q$. Retaining the value of the resulting vector as the
argument for the next matrix / vector operation, minimum paths $p(\alpha, S_2^k)$ for $1 \le k \le q$ are
computed. The process is continued, until minimum paths $p(\alpha, S_n^k)$ have been computed. The
minimum vector component then corresponds to ground state energy.
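The substitution of operators can be made concrete with a few lines of C: the sketch below computes one step $p_i = M_{i,i-1} \times p_{i-1}$ of the modified product, with multiplication replaced by addition and addition by minimisation; the fixed dimension q = 4 is purely illustrative.

#define Q 4

/* (min, +) matrix/vector product: out[k] = min_l { M[k][l] + p[l] } */
static void minplus_matvec(const double M[Q][Q], const double p[Q], double out[Q])
{
    for (int k = 0; k < Q; k++) {
        double best = 1e300;
        for (int l = 0; l < Q; l++)
            if (M[k][l] + p[l] < best) best = M[k][l] + p[l];
        out[k] = best;
    }
}

Applying minplus_matvec() n times, starting from a zero vector, reproduces the sequence of operations described above.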
Matrix operation parallelisation
A simple approach to parallelising the matrix / vector operation is shown in Figure 4.6(a). Here,
the matrix is distributed in such a way that each processor stores the values of $q/p$ rows, where $p$
is the number of processors. Each is responsible for computing the same fraction of components
of the resulting vector. It follows that the latter is assembled from partial results computed by
each processor. In the message passing model, this can be achieved using a gather operation.
For the required purpose, it is necessary for each processor to access all components of the
resulting vector subsequently. Therefore, it is practical to gather collectively. The algorithm is
described in the following pseudocode, where $M_{k,l}^{i,i-1}$ denotes the component in row $1 \le k \le q$,
column $1 \le l \le q$ of matrix $M_{i,i-1}$:
Float[] p
Float[] p'
for k := (proc_id * q/p + 1) to ((proc_id + 1) * q/p)
    Float minval := ∞
    for l := 1 to q
        if p[l] + M[i,i-1][k,l] < minval
            minval := p[l] + M[i,i-1][k,l]
    p'[k] := minval
all_gather(p', p)
In the pseudocode, the outer loop is responsible for iterating through matrix rows. For each row,
elements are added to vector components stored in p. The minimum sum becomes a component
of the vector p'. Matrix rows are assigned to processors based on the processor identifier proc_id, whose value is in the range [0, number of processors). The computation concludes with the
collective operation all_gather().
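A hedged C/MPI sketch of this row-distributed step is given below; it assumes, for simplicity, that q is divisible by the number of processors and that the full matrix is available on every process, which need not match the project implementation.

#include <mpi.h>

#define Q 8                                    /* 2^m row configurations        */

static void parallel_step(const double M[Q][Q], double p_vec[Q], MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    int rows = Q / nprocs;                     /* q/p rows per processor        */
    double part[Q];                            /* local slice of the result     */

    for (int k = rank * rows; k < (rank + 1) * rows; k++) {
        double best = 1e300;
        for (int l = 0; l < Q; l++)            /* (min, +) over one matrix row  */
            if (p_vec[l] + M[k][l] < best) best = p_vec[l] + M[k][l];
        part[k - rank * rows] = best;
    }
    /* assemble the full result on every processor for the next iteration       */
    MPI_Allgather(part, rows, MPI_DOUBLE, p_vec, rows, MPI_DOUBLE, comm);
}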
Examining the algorithm's computational complexity, it can be seen that execution time is
$t(q) \propto \frac{q}{p}\,q$. Since determining ground state energy requires $n$ iterations of the algorithm, where
$n$ is the number of rows in the spin lattice, total execution time is $t(n,q) \propto \frac{n q^2}{p}$. Considering that
the lattice contains $m = \log_2(q)$ spin columns, execution time expressed in terms of lattice size
is $O\!\left(\frac{n}{p}\,2^{2m}\right)$, which is cost optimal in comparison to the serial algorithm presented in Chapter 3.
Memory efficient matrix / vector computation
Alternatively, it is possible to perform the desired matrix / vector computation using a parallel
algorithm with reduced memory requirements for the vectors p, p'. In resemblance to Cannon's
algorithm [13], it can be observed that although all processors access the vector p in its entirety, individual components need not be accessed simultaneously, as in the approach described above.
Instead, the vector can be distributed between processors, so that each holds $q/p$ components.
Computation commences with each processor performing additions of matrix elements associated with its allocated vector components. After the latter have been processed, all processors perform a cyclic shift of vector components, which allows the minimisation operation to
progress further. This procedure is repeated until processors have completed the minimisation
operation on their assigned rows. The approach is illustrated in Figure 4.6(b), for which the
modified pseudocode is:
Float[] p
Float[] p'
for k := (proc_id * q/p + 1) to ((proc_id + 1) * q/p)
    Float minval := ∞
    for l := 1 to q
        if (l mod q/p) = 1
            cyclic_shift(p)
        if p[(l-1) mod q/p + 1] + M[i,i-1][k,l] < minval
            minval := p[(l-1) mod q/p + 1] + M[i,i-1][k,l]
    p'[(k-1) mod q/p + 1] := minval
Here, the previously defined loop has been adapted to index the components of the distributed
vectors. Since the result vector p' becomes an operand in successive iterations of the algorithm,
performing a collective operation on p' is not necessary; this vector is thus distributed identically
to p.
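Again purely as a hedged sketch, the cyclic-shift variant might look as follows in C with MPI, assuming q divisible by p and the full matrix available locally; p_loc and out_loc each hold the q/p components owned by the calling process.

#include <mpi.h>

#define Q 8

static void parallel_step_shift(const double M[Q][Q], double p_loc[],
                                double out_loc[], MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    int chunk = Q / nprocs;                    /* q/p components per processor  */
    int next  = (rank + 1) % nprocs;
    int prev  = (rank - 1 + nprocs) % nprocs;

    for (int k = 0; k < chunk; k++) out_loc[k] = 1e300;

    for (int step = 0; step < nprocs; step++) {
        /* p_loc currently holds the components owned by (rank + step) mod p    */
        int owner = (rank + step) % nprocs;
        for (int k = 0; k < chunk; k++) {      /* this processor's matrix rows  */
            int row = rank * chunk + k;
            for (int l = 0; l < chunk; l++) {
                int col = owner * chunk + l;
                double v = p_loc[l] + M[row][col];
                if (v < out_loc[k]) out_loc[k] = v;
            }
        }
        /* rotate the operand slice to the next processor in the ring           */
        MPI_Sendrecv_replace(p_loc, chunk, MPI_DOUBLE, prev, 0, next, 0,
                             comm, MPI_STATUS_IGNORE);
    }
}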
In Chapter 3, a serial algorithm was presented for the ground state energy of the lattice
with cyclic boundary conditions. This involved evaluating the boundaryless ground state en-
ergy $H_{\min}$ for all configurations of boundary subsystems $S_1, S_n$. To adapt the parallel matrix
algorithm for this problem, define the weight function between nodes $\alpha, S_1^k$ as $w(\alpha, S_1^k) = H(S_1^k) + H_b(S_1^k, S_n^l)$, for boundary subsystem configuration $S_n^l$. The ground state energy can
then be obtained by performing the described series of matrix operations for all configurations
of subsystem $S_n$. For each configuration $S_n^k$, the final result vector contains the minimum path
lengths $p_n = [\,p(\alpha, S_n^1)\; \ldots\; p(\alpha, S_n^k)\; \ldots\; p(\alpha, S_n^q)\,]^T$, of which the relevant component is retained.
The ground state energy is the minimum of these retained components. The complexity of the
entire computation is $O\!\left(\frac{n}{p}\,2^{3m}\right)$, executed on $p$ processors, for an $n$-row, $m$-column lattice. In
comparison to the serial algorithm, this is cost optimal.
4.2.2 Higher-order Markov chain approach
It remains to develop a parallel solution to the approach based on the higher-order Markov chain.
For this model, it was formulated that ground state probability is
$$P_{\text{viterbi}}(S_i, S_{i-1}, \ldots, S_{i-m}) = \begin{cases} P(S_i, S_{i-1}, \ldots, S_{i-m}) & i \le m \\ \max_{S_{i-m-1}} \left\{ P(S_i \mid S_{i-1}, \ldots, S_{i-m-1})\, P_{\text{viterbi}}(S_{i-1}, \ldots, S_{i-m-1}) \right\} & i > m, \end{cases}$$
where $m$ is the number of lattice columns. By the relation between state probability and energy,
in analogy to the approach based on row-wise lattice decomposition shown in Chapter 3, it was
shown that
$$H_{\min}(S_i, S_{i-1}, \ldots, S_{i-m}) = \begin{cases} H(S_i, S_{i-1}, \ldots, S_{i-m}) & i \le m \\ \min_{S_{i-m-1}} \left\{ H_b(S_i, (S_{i-1}, \ldots, S_{i-m-1})) + H_{\min}(S_{i-1}, \ldots, S_{i-m-1}) \right\} & i > m, \end{cases}$$
where $H(S_i, S_{i-1}, \ldots, S_{i-m})$ is the energy of the ordered set of subsystems $(S_i, S_{i-1}, \ldots, S_{i-m})$
and $H_b(S_i, (S_{i-1}, \ldots, S_{i-m-1}))$ is the interaction energy between system $S_i$ and the ordered set
$(S_{i-1}, \ldots, S_{i-m-1})$. Examining this optimisation equation, it can be seen that it is monadic,
since it contains a single recursive term. As each level of recursion effects a unit decrease
of indices of the tuple (S i, S i−1, . . . , S i−m), there are no cyclic dependencies between subprob-
lems. The dynamic programming formulation is therefore also serial. Considering this sim-
ilarity, the opportunity is given to adapt the parallel matrix based computation to solve this
dynamic programming problem. To achieve this, the weighted graph of subproblems is re-established, with an edge connecting two nodes if the recursive formulation indicates depen-
dency. For an $n \times m$ spin lattice, there are $(n-1)\,m\,2^m$ nodes in the graph, because each tuple
$(S_i, S_{i-1}, \ldots, S_{i-m})$ has $2^m$ configurations and a solution is constructed from $(n-1)\,m$ subproblems. A given subproblem corresponds to a certain position of the sliding window on the lattice, as described in Chapter 3. The function $w((S_i, S_{i-1}, \ldots, S_{i-m}), (S_{i-1}, S_{i-2}, \ldots, S_{i-m-1})) = H_b(S_i, (S_{i-1}, \ldots, S_{i-m-1}))$, defined for $i > m$, describes the weight of an edge. As before,
the graph is extended with an additional node $\alpha$, so that the set of edges is defined as $E = E \cup \{(\alpha, (S_1, S_2, \ldots, S_{m+1})) \mid \text{for all configurations of } (S_1, \ldots, S_{m+1})\}$. For $i \le m$, define the
weight function $w(\alpha, (S_i, S_{i-1}, \ldots, S_{i-m})) = H(S_i, S_{i-1}, \ldots, S_{i-m})$. This results in a trellis-like graph, shown in Figure 4.5(b). Minimising system energy is equivalent to obtaining
$$\min_{(S_{nm}, S_{nm-1}, \ldots, S_{nm-m})} \left\{ p\left(\alpha, (S_{nm}, S_{nm-1}, \ldots, S_{nm-m})\right) \right\},$$
where the function p is the minimum path between two nodes in the graph.
Previously, matrices of edge weights between trellis segments were used to compute min-
imum paths, for which the parallel matrix operation was presented. From the optimisation
equation and Figure 4.5(b), it is observed that each node at a given level is connected to
only two nodes at the preceding level. This is because there are two configurations of tuple
(S i−1, S i−2, . . . , S i−m−1) for any specified tuple (S i, S i−1, . . . , S i−m). Assigning infinite weights
to unconnected nodes between trellis levels, it follows that the matrices are sparse, with regard
to infinite valued elements.
Providing matrix sparseness can be exploited, an adaptation of the existing parallel algorithm will execute in $t(n,m) \propto (n-1)\,m\,\frac{1}{p}\,2^m$ time on $p$ processors, since each matrix contains
$2^m$ rows distributed between processors. With a total of $(n-1)\,m$ matrix operations, the ground
state energy of the lattice without cyclic boundary conditions can be obtained in $O\!\left(\frac{nm}{p}\,2^m\right)$ time.
This is cost optimal in comparison to the serial algorithm described in Chapter 3. Using bit
string representations of spin tuples in combination with shift operations, an approach which
considers matrix sparseness is described in Chapter 6.
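Although the Chapter 6 representation is not reproduced here, the following sketch illustrates the kind of bit-string manipulation alluded to: a window of m+1 spins is packed into an (m+1)-bit integer, and its two possible predecessor windows are obtained with a single shift, which is what makes the sparse structure of the matrices straightforward to exploit. The bit layout chosen here is an assumption.

#include <stdio.h>

#define M 4                                   /* lattice columns               */

/* bit j of w holds spin S_{i-j}; the predecessor window drops S_i and gains
   a free spin S_{i-m-1} in the top position                                   */
static unsigned predecessor(unsigned w, unsigned new_spin)
{
    return (w >> 1) | (new_spin << M);
}

int main(void)
{
    unsigned w = 0x15 & ((1u << (M + 1)) - 1);    /* example window 10101b     */
    printf("window %#x -> predecessors %#x and %#x\n",
           w, predecessor(w, 0), predecessor(w, 1));
    return 0;
}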
Chapter 5
The Project
In previous chapters, the theoretical background to the ground state optimisation problem was described. Having described the two approaches identified for solving this problem, this chapter
deals with undertaken practical work towards their implementation and evaluation.
5.1 Project description
The purpose of the project is to conduct practical investigation into parallel algorithms for de-
termining ground states of the Ising spin glass. Specifically, the project deals with the two-
dimensional Edwards-Anderson model, i.e. the Ising model with lattice aligned spins, in which
spins are able to assume two discrete states.

Investigations deal with a method for obtaining spin glass ground states exactly. The method
is based on the transfer matrix method, in which the statistical-mechanical properties of the lat-
tice system are used to obtain solutions. It follows that one project objective is to develop a
parallel algorithm based on the Transfer Matrix method. As an additional objective, the project
includes investigating an alternative parallel algorithm, with which solutions to the ground state
problem are obtained heuristically. The performance of both parallel algorithms is to be evalu-
ated; in the case of the heuristic this entails evaluating solution accuracy.
Investigation requires that algorithms are developed in software. The software should be
self-contained: From the user's perspective, the software should offer sufficient functionality to be useful as a research tool, allowing various types of problem instance to be solved using the
implemented algorithms. The software should be able to be executed on a wide range of MIMD
multiprocessing architectures.
multiprocessing architectures.
5.1.1 Available resources
There are two computing resources available for the project. The first of these, Ness, is a shared
memory multiprocessor system [2]. It has a total of 32 back-end processors, which are parti-
tioned into two interconnected groups. This configuration allows a single job to request 16 pro-
cessors at maximum. The system is constructed from AMD 64-bit Opteron processors, which
have a clock frequency of 2.6GHz. Jobs are submitted to the back-end from a dual processor
front-end, which executes the Sun Grid Engine scheduling system. The back-end has 32 × 2GB
of RAM. The system is based on the Linux operating system, providing Fortran, C and Java
programming environments. Both shared memory and message passing model programming
are supported, using the MPI and OpenMP programming interfaces. Ness does not implement
a budget system for CPU time, however access to queues is restricted according to the amount
of requested computation time.
Also available is the supercomputing resource HPCx [3]. This consists of a cluster of IBM
P575 shared memory nodes, each containing 16 processors and 32GB of RAM. For executing
jobs, the system consists of 160 compute nodes. Nodes are constructed from Power5 proces-
sors, which have a clock frequency of 1.5GHz. The processor architecture allows for 6.0 Gflop/s
theoretical peak performance. Inter-node communication is supported using IBM High Perfor-
mance Switch interconnects. These provide a maximum unidirectional inter-node bandwidth
of 2GB/s, at MPI latencies of 4–6 µs [24]. Based on the AIX operating system, the serial and
parallel programming environments are similar to those provided on Ness. The job scheduler,
LoadLeveler, provides queues for serial and parallel jobs, using a budget system for CPU time.
5.2 Project preparation
Before commencing the project, an initial phase was designated to project preparation. This
consisted of investigating the problem background and defining the project's aims. Potential
approaches to solving the spin glass problem were identified and implemented as prototype soft-
ware. Project process activities were carried out, consisting of a risk analysis and scheduling. A
software development model was decided upon.
5.2.1 Initial investigations
Access to an existing serial transfer matrix code was provided before commencing the project
preparation phase. The potential was given for a code level analysis of parallelism; this approach
was considered an alternative to basing an implementation on the mathematical formulation of
the optimisation problem, which was subsequently undertaken. With a view to implementing
the parallel approach described by Grama et al. [30], initial work consisted of investigating the
exact optimisation technique described in Chapter 3.
The harmony search algorithm was identified as a potential secondary approach to com-
pare to the envisaged exact ground state solver. After initialising a CVS repository for project
source code and experiment data, a serial implementation of the heuristic was evaluated, in
order to assess the algorithm’s suitability for further parallelisation. The evaluation consisted
Figure 5.1: Spin glass structure design (spinglass.h) — fields: int xSize, int ySize, double[] weights, Spin[] initialSpins, Spin[] spins, boolean[] clamps
of determining solution accuracy, based on ground states obtained for a collection of random
spin glasses, using an implementation of a brute force algorithm. As discussed in Chapter 7, results
suggest that solution accuracy might be increased using a parallel implementation of the
algorithm.
5.2.2 Design and implementation
A basic software framework was developed, to facilitate the collation of performance data. This
framework consisted of a set of utilities, implementing rudimentary functionality for creating
spin glass problem instances and evaluating their energy. Based on this, a design for a more
extensive framework was created, based on the following list of client operations on a spin glass
API:
• Initialisation of spin lattices with specific boundary conditions
• Destruction of spin lattices
• Calculation of system energy
• Bond randomisation
Also, a spin glass data structure was designed. Shown in Figure 5.1, this consists of instance
variables for storing the height and width of the spin lattice. The values of spins themselves are
stored in an associative array-like data structure, as are the values of coupling constants. The
former are stored two-dimensionally in row major fashion, while the latter require an additional
dimension. In the design, two 2-dimensional arrays store vertical and horizontal bonds, again
using a row major storage scheme. To record whether a spin is clamped to a specific state, the
data structure includes a further array. Finally, the initial values of spins are stored. This stores
the actual state to which a spin is clamped, allowing the primary spin array to be reserved for
computation.
A schema of the framework is shown in Figure 5.2. This includes an interface for per-
forming input / output operations: It allows representations of coupling constants to be read from
Figure 5.2: Software framework design — the SpinGlass API (spinGlass_new(), spinGlass_remove(), spinGlass_energy()), the IO interface (readBonds(), writeBonds(), readClamps(), writeClamps()), the writebonds and writeclamps utilities, and a transferMatrixSolver implementing the Solver interface
files, similarly a function allows the clamping state of spins to be read. These operations are
complemented by functionality for writing representations to file.
The IO operations are required by the two utilities writebonds and writeclamps, which fa-
cilitate creating spin glass problem instances. These are responsible for writing data to files,
which are subsequently read by solver utilities. The format of clamping state files is specified as
a UNIX UTF-8 encoded text file, containing the symbols ‘1’ and ‘0’. These provide a represen-
tation of whether a spin is clamped, such that a string encodes the state of a lattice row. Strings
consist of the aforementioned symbols, separated by whitespace. Spin clamps are stored in the
file as consecutive strings, separated by line feed characters. The file format for spin coupling
constants is similar: Here, symbols are floating point numbers in decimal notation, again sep-
arated by whitespace and line feed characters. The format reflects the design of the spin glass
data structure, in that two consecutive blocks retain values of vertical and horizontal bonds. The
format specifies that these blocks are separated by a single blank line.
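For illustration, a clamp file for a hypothetical 2 × 3 lattice in which only the third spin of the first row is clamped could read

0 0 1
0 0 0

while a corresponding bond file would contain two blocks of floating point values separated by a blank line, the first holding vertical and the second horizontal coupling constants. The values below are arbitrary, and the exact number of values per block depends on the lattice size and boundary conditions:

0.5 -1.0 0.25
1.0 0.75 -0.5

-1.0 0.3
0.8 -0.2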
Figure 5.2 also shows the design of the spin glass API. This exports functionality to client
solvers, which themselves implement a simple interface for solving spin glass instances. A
solver uses the IO interface to construct a spin glass instance from bond and clamp state files.
Thereafter, it invokes its implementation of a ground state algorithm. The latter utilises further
API operations, to evaluate spin glass energy. Finally, the spin glass instance is destroyed, after
an output of the determined solution has been generated.
5.2.3 Implementation language and tools
During the course of software design, the choice of implementation language and tools was
considered. The C language was selected due to its general widespread use as a development
language on high performance systems, and availability of compilers both on the two computa-
tion resources and development machines. To ensure portability, ANSI C 89 was selected as the
implementation standard.
To expedite software development, it was decided to implement the software using the GLib
library [1]. This is a cross-platform collection of utility functions which implement general
purpose data structures, parsers etc. Macros and type definitions are provided, which potentially
reduce the amount of required pointer casts in a code. This in turn has an impact on cast errors
and debugging time.
A build management system, the GNU autotools suite, was also selected. Widely used in conjunction with the C and
C++ programming languages on UNIX based systems, it allows makefiles to be generated
semi-automatically and configured for different target systems. This was considered useful for
providing an application package for a variety of systems.
Given that one of the available computing resources, HPCx, is a clustered system, the MPI
message passing library was chosen for parallel development. For this reason, the algorithms
described in Chapter 4 are given for the message passing model. Although a hybrid shared
memory / message passing approach using MPI and, for example, OpenMP would have been possible,
this was considered beyond the scope of the project.
5.2.4 Choice of development model
For the choice of software development model, multiple factors were taken into account. These
included the amount of time available, the required functionality and overall software complex-
ity.
Intuitively, implementation can be realised in two phases, each relating to one of the two
algorithms. From previous experience and design requirements, it was assumed that each of
the implementation tasks would involve a relatively small amount of written code. Instead, im-
plementation eff ort was assumed to focus on distribution of data, communication patterns and
algorithm correctness. Therefore, it was thought that the approach of applying staged delivery to
each phase would be advantageous to the project. Following the design of the framework’s over-
all architecture with multiple ground state solvers, this approach involves discrete design / imple-
mentation / testing activities associated with one release for each ground state solver. Developing
each ground state solver is associated with iteratively augmenting software functionality.
5.2.5 Project schedule
The devised project schedule is shown in Appendix A. Based on an available time frame of 16
weeks, the schedule accounts for all project deliverables, implementation goals and exploratory
aims. Therefore both a practical component, consisting of software development and evaluation,
and the project report and presentation are included.
Risk                          | Type                     | Impact   | Likelihood | Action
Data loss                     | Schedule                 | High     | Low        | Avoid
Lack of time                  | Schedule, Scope          | High     | Moderate   | Reduce
Unavailable testing resources | Schedule, Quality, Scope | High     | Low        | Avoid
Algorithmic complexity        | Scope, Schedule          | Moderate | Moderate   | Avoid

Table 5.1: Identified project risks
The practical component is split into two distinct phases. Each of these corresponds to the
development and evaluation of the dynamic programming and harmony search based ground
state solvers. A development / evaluation iteration is comprised of tasks for designing, imple-
menting, debugging and testing software, before gathering performance data. Following devel-
opment and evaluation, tasks are specified for producing the report and presentation. A single
week is left unallocated for making amendments to the produced work.
The implementation, debugging and testing tasks required for software development are
scheduled in parallel, as it was thought that this best reflects the nature of the chosen develop-
ment model, where functionality is integrated iteratively. Evaluation tasks are interleaved with
software development, so as to minimise the eff ects of unavailable resources, should these have
occurred.
5.2.6 Risk analysis
To assess the chance of the project’s successful completion, potentially detrimental factors were
considered. Such factors include those aff ecting the project plan and scheduling, software qual-
ity and software scope. Table 5.1 lists risks identified during project preparation by type, esti-
mated impact, likelihood of occurring and proposed action.
Judging from the product of impact and likelihood of occurrence, the most significant risk is
lack of time. As the time frame for completing the project and required deliverables was short,
this was conceivable. To counteract this, care was taken to define project goals rigorously to
avoid feature creep; furthermore, all tasks were scheduled within a 15 week time frame, allowing
for a further week as float time.
The remaining risks were avoided by ensuring sufficient computing time on parallel ma-
chines (pertaining to unavailable resources), backups and software version control (pertaining
to data loss) and sufficient background research (pertaining to sophistication of algorithms). As
a fallback action in the event of not being able to implement the researched transfer matrix
scheme, the possibility of performing a code level analysis of an existing serial transfer ma-
trix solver code was given. As a caveat, this approach would have offered less insight into the
underpinnings of parallelism in the transfer matrix method.
5.2.7 Changes to project schedule
A number of changes were made to the project schedule. These concerned both the order of
scheduled tasks and their estimated duration.
Most significantly, developing the parallel harmony search solver proved to require less
time than envisaged in the project schedule; it claimed only two schedule weeks in comparison
to the four weeks assigned during preparation. As a result, it was possible to implement a more
advanced exact parallel solver, as previously described.
Also, the original decision to designate performance evaluation to a single task for each of
the two solver types proved impractical. Instead, data were gathered separately for each comput-
ing resource, with subtasks for each variant of the exact solver. Separating evaluation between
machines was prompted by the fact that implementing experiments on HPCx was delayed due to
compilation issues with the required version of the GLib library.
Furthermore, after devising the original project schedule, the communicated date for the
presentation proved to be after the date for the remaining deliverables. The time gained was
allocated to completing the project report.
5.2.8 Overview of project tasks
The following provides a description of tasks undertaken during the project, as an account of
the extent to which the project schedule was adhered to.
In weeks 1 and 2, the ideas presented in Chapter 3 were developed as a basic serial exact
ground state solver code. The parallelisation method using collective operations, discussed in
Chapter 4 was also implemented. In both cases, the algorithms were based on the spin lattice
without boundary conditions.
In week 2, timing data were collected for the previously implemented serial solver. In addi-
tion, scaling data for the parallel solver were collected on the Ness computing resource. Work
commenced on implementing the improved parallel ground state solver using cyclic commu-
nication patterns, also described in Chapter 4. The improved parallel ground state solver was
completed in week 3. In week 4, further scaling performance data were collected on Ness for
this code. Remaining time in week 4 was used to conduct a code review, based on the entirety
of implemented software.
In week 5, work commenced on developing the harmony search ground state solver. Both
serial and parallel code was completed in week 6, during which the dynamic programming
code was modified to support solving systems with cyclic boundary conditions. In week 6,
performance data for the dynamic programming code were collected on the HPCx machine.
In week 7, further performance data were gathered on HPCx. This was to evaluate the
dynamic programming code with cyclic communication patterns. Also, routines were developed
for evaluating harmony search performance, which was subsequently evaluated in week 8.
In weeks 9 and 10 a further modification to the exactly solving dynamic programming ap-
proach was implemented, based on the higher-order Markov chain theory described in Chapter
3. This was for the spin glass model without cyclic boundary conditions. In week 10, perfor-
mance data were gathered for this algorithm. The remaining time was used to complete the project report and perform a final revision of
all deliverables.
Chapter 6
Software Implementation
6.1 Introduction
The implemented software is a framework for experimenting with two-dimensional lattice spin
glass ground state problems. It consists of utilities which assist with generating spin glass in-
stances, which may be subsequently solved using either exact or heuristic based solver utilities.
The latter provide information on both the energy and spin configuration of ground states. While
aimed primarily at generating solutions using parallel algorithms, it is also possible to reconfig-
ure the software to use serial computation only.
The software is implemented in the C programming language. The GNU C compiler was
used on the development system. To increase C90 standard conformity, the compiler flags -ansi and -pedantic were used. Development took place predominantly on a 32 bit single processor Linux
system, on which gcc 4.1.2 and gdb 6.6 were installed. The MPI implementation was MPICH2,
version 1.0.6. To assist with debugging, the Valgrind suite was used to check for memory leaks.
The version control system CVS was used extensively during implementation. Based on a
central repository stored on the Ness machine, version control was used as a means of retrieving
the entire code base and synchronising code modifications between machines.
The build management system used for the software is the GNU autotools suite. This is used
to automatically configure the software prior to compiling it on the target architecture. Instructions
on how this can be achieved are given in Appendix E.
In the following, an overview of the software framework is given.
6.2 Implementation overview
From the user’s perspective, the framework consists of a set of binary executables. These are:
• genbonds
• genclamps
• sbforce
• dpsolver
• dpsolverfast
• hmsolver
The two utilities genbonds and genclamps are used to generate random coupling constants
and specify the clamping state of spins in the lattice, respectively. As implemented, the utilities
produce character based representations as described in the design in Chapter 5. The utilities
write to the standard output. Using UNIX shell redirection, this output can be stored in files, in
preparation for invoking a ground state solver on the data. Using these utilities therefore facilitates
creating instance data. Both genbonds and genclamps use standard command line options
for specifying spin lattice dimensions and related parameters. For example, lattice dimensions
are specified using --xSize=x --ySize=y, for a system with x rows and y columns.
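As an example of the intended workflow, a random 4 × 4 instance might be generated and solved as follows. The file names and the options naming the input files to the solver are illustrative placeholders, not the documented option names, and the MPI launcher depends on the target system:

genbonds --xSize=4 --ySize=4 > bonds.dat
genclamps --xSize=4 --ySize=4 > clamps.dat
mpirun -np 4 dpsolver --bonds=bonds.dat --clamps=clamps.dat > solution.dat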
The remaining executables correspond to implementations of algorithms described in Chap-
ters 3 and 4: For testing purposes, the sbforce utility implements a simple exhaustive search,
hmsolver implements the harmony search algorithm in its parallel realisation. Similarly, dpsolver and dp-
solverfast provide exact solvers based on dynamic programming approaches. As before, all of
these executables use command line parameters for specifying options. In this case, the most
significant parameters are those for specifying bond and clamp configuration files. These utilities write solutions to standard output.
From the perspective of implementation, the software is constructed using a modular ap-
proach. Also based on the design described in the previous chapter, there exist various library
modules, which provide functionality such as IO and spin glass manipulation. These are utilised
by client modules, which include implementations of of ground state solvers. By means of C
headers, client modules are able to reference APIs. API implementations are used to generate
separate binary executables through the linking process.
Appendix B includes a UML class schema of the relationships between source code modules
and headers. As shown, source code modules reference the headers arrays.h, gstatefinder.h, io.h, random.h and spinglass.h. Their purpose is as follows:
• arrays.h Specifies multidimensional array operations
• gstatefinder.h Specifies the interface to be implemented by ground state solvers
• io.h Defines IO operations
• random.h Defines randomisation functions
• spinglass.h Defines the spin glass data structure and operations
As shown in Figure B.1, multidimensional arrays are used by the dynamic programming based
solvers, as befits the algorithms’ requirements for associative data structures. The IO header
is used by module main.c, which implements an entry point for all executables. Further-
more, gstatefinder.h is included by main.c, bforce_gstate_finder.c, dp_gstate_finder.c and har-
mony_gstate_finder.c, the latter three implementing exhaustive search, dynamic programming
and harmony search, respectively. Whereas dp_gstate_finder.c implements the basic exact op-
timisation algorithm described in Chapter 3, a further module dp_gstate_finder_fast.c provides
an implementation of the improved dynamic programming algorithm, described in the same
chapter.
6.3 Source code structure
From the description of source module and header purpose, the following provides a more de-
tailed description of the implementation. This is given at function level for a selection of the
code base, to illustrate core functionality.
6.3.1 Library functionality
arrays.h
As previously mentioned, the implementation of the exactly solving algorithm requires access
to multidimensional arrays. Given the restriction in C to defining single-dimensional dynamic
arrays (aside from using static arrays), it is necessary to use pointer arithmetic and casts to implement
multidimensional arrays. Confining implementation to source module arrays.c, functions are
provided for constructing and destroying arrays in two and three dimensions of arbitrary size.
Returning pointer types, the constructor functions allow data elements to be accessed using
conventional array syntax, while preserving memory contiguity. These functions are invoked
repeatedly by dp_gstate_finder.c and dp_gstate_finder_fast.c. While a less involved approach
might have offered increased performance, implementing the dynamic programming algorithm
otherwise was considered too cumbersome, given the allocated time for software development.
As an alternative, the header defines macros which emulate a multidimensional array, based
on performing arithmetic on a single pointer. Although syntactically less convenient, this ap-
proach requires fewer dereferencing operations to access a pointer element. For performance
reasons, the approach is utilised by the spin glass library functions in spinglass.c.
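As a minimal sketch of this macro-based scheme (the macro and function names below are illustrative, not those of the actual header), two-dimensional access on a contiguous row-major block can be emulated as follows:

#include <glib.h>

/* Emulated 2D access: element (row, col) of a row-major block with
 * ySize columns, addressed through a single pointer. */
#define ARRAY2D(ptr, row, col, ySize) ((ptr)[(row) * (ySize) + (col)])

/* Allocate a contiguous xSize-by-ySize block of doubles, zero-initialised. */
static gdouble *flat_array_alloc(gint xSize, gint ySize)
{
    return g_new0(gdouble, xSize * ySize);
}

/* Example use:
 *     gdouble *block = flat_array_alloc(3, 4);
 *     ARRAY2D(block, 2, 1, 4) = 0.5;    (element in row 2, column 1)
 *     g_free(block);
 */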
io.h
Header io.h defines six functions, responsible for reading and writing files containing repre-
sentations of spin state, clamping state and coupling constants. Three functions responsible
for reading from file are of the form *read(char *fileName, int *xSize, int *ySize). All parameters
are passed by reference; the value of fileName is read upon invoking the
function, whereas xSize and ySize hold the spin lattice dimensions after the function call has
been completed. The function returns a pointer to the state data read from file.
Complementary functions for writing to file are of the form write(struct SpinGlass *spin-
Glass, char *fileName). Here, the parameters consist of a pointer to an instance of the spin
glass abstract data type (described in the previous chapter), and the name of the file to write to.
The function return type is void.
The file-reading functions in io.c are implemented using a single static function, GQueue *parse_file(). As the name suggests, this provides simple parsing capabilities, using a loop to
iterate through string tokens obtained from the standard library function strtok(). Recording and
verifying counts of symbols on each line, tokens are added to a queue. This queue is returned
by the function. Dequeuing elements stored in the queue, the aforementioned reading functions
then construct data structures representing spin glass parameters.
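A minimal sketch of this parsing scheme is given below. It is simplified relative to the real parse_file(); in particular, the per-line symbol-count verification is omitted and the function name is illustrative:

#include <glib.h>
#include <stdio.h>
#include <string.h>

/* Read a whitespace/newline separated text file into a queue of token
 * strings. Returns a GQueue of g_strdup'ed tokens; the caller frees them. */
static GQueue *parse_file_sketch(const char *fileName)
{
    GQueue *tokens = g_queue_new();
    FILE *file = fopen(fileName, "r");
    char line[1024];

    if (file == NULL)
        return tokens;

    while (fgets(line, sizeof line, file) != NULL) {
        char *token = strtok(line, " \t\n");
        while (token != NULL) {
            g_queue_push_tail(tokens, g_strdup(token));
            token = strtok(NULL, " \t\n");
        }
    }
    fclose(file);
    return tokens;
}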
spinglass.h
The spin glass data structure is defined in header spinglass.h. Using a C struct type, the fol-
lowing fields are defined:
struct SpinGlass {
    gint xSize;
    gint ySize;
    Spin *spins;
    gdouble *weights;
    gboolean *clamps;
    Spin *initialSpins;
};
As given by the design description in Chapter 5, the structure specifies variables ∗ for storing
lattice dimensions. An enumeration type defines the Spin type; the pointer field is used to
reference a memory block storing the state of spins. The enumeration defines integer states
UP=1 and DOWN=-1. Spins’ states are stored using a row-major scheme. This matches the
access method using a single pointer, defined in arrays.h. Coupling constants, clamping states
and the field initialSpins store states similarly. The latter field provides an account of spin state
distinct from the spins field, which stores the state of spins while performing optimisation. Using
two separate fields allows lattice configurations to be compared before and after optimisation.

∗ GLib specifies wrappers for standard C types; motivation for their use is discussed in the GLib documentation [1].

Figure 6.1: Functions provided by spinglass.c — (a) row_energy(), (b) interrow_energy(), (c) ensemble_delta()
Header functions in spinglass.h are grouped into four categories, associated with allocating
memory for the data type, computing lattice energy, writing lattice properties to file, and mis-
cellaneous activities. All functions operate on the spin glass data structure, which is passed by
reference from a caller function.
The purpose of the memory related functions is as described in the design: These ensure that
the spin glass structure is initialised and terminated correctly. The constructor function is of the
form *spinglass_alloc(gint xSize, gint ySize, Spin *initialSpins, gdouble *weights, gboolean
*clamps); it requires as parameters the lattice dimensions, initial spin configuration, coupling
constants, and clamping states. The function returns a pointer to a newly allocated data structure
(fields are assigned according to the supplied parameters). To assist in freeing memory after use,
the function spinglass_free() is implemented.
Lattice energy is computed using a collection of five functions. The simplest of these is de-
fined as spinglass_energy(struct SpinGlass *spinGlass), which returns as a floating point num-
ber the energy arising from all interactions in the lattice. For convenience, spinglass(struct Sp-
inGlass *spinGlass, Spin *conf) returns the energy due to the coupling constants specified in *spin-
Glass; however, the configuration is given as a separate array *conf. A comparison between the
remaining three energy calculating functions is given in Figure 6.1: The spinglass_row_energy()
function determines the energy of a spin row (considering horizontal bonds), whereas interrow_energy() uses vertical bonds to calculate the interaction energy between adjacent rows.
With ensemble_delta(), the energetic contribution between a single spin and its predecessors in
the horizontal and vertical dimensions is calculated.
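By way of illustration, a function of the first kind might be realised on the row-major layout as follows. This is a sketch only: the sign convention E = −Σ J·s·s and the bond indexing are assumptions, and the horizontal bonds are passed in explicitly rather than being taken from the weights field:

/* Energy contribution of the horizontal bonds within row r (sketch). */
static gdouble row_energy_sketch(const struct SpinGlass *sg, gint r,
                                 const gdouble *horizontalBonds)
{
    gdouble energy = 0.0;
    gint c;

    for (c = 0; c < sg->ySize - 1; c++) {
        gdouble j = horizontalBonds[r * (sg->ySize - 1) + c];
        energy -= j * sg->spins[r * sg->ySize + c]
                    * sg->spins[r * sg->ySize + c + 1];
    }
    return energy;
}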
The file output functions in spinglass.c are used to implement the output functions in io.c.
The functions are of the form write(struct SpinGlass *spinglass, FILE *file), i.e. arguments
include a pointer to a spin glass structure and a file pointer. If required, this allows spin glass
properties to be easily echoed to screen, using the file pointer stdout.
Finally, miscellaneous functions include get_random_spins() (used to generate random spin
configurations, while considering spin clamping state), has_vertical_boundary() (used to deter-
mine whether cyclic boundary interactions are present along the lattice's vertical dimension),
and correlate(). The latter is used to compare spin configurations between spin glass structures in terms of differing spin state.
6.3.2 Client functionality
Having described library functionality provided by the software, attention is now given to the
code modules utilising this functionality. These include the entry point module main.c, and
more importantly, the modules implementing optimisation algorithms. Note that the code base
includes additional modules for the utilities genbonds and genclamps. These do not make use
of library functions; as their implementation is trivial, these are not considered in further detail.
The source code for all algorithms is provided in Appendix F.
main.c
Module main.c uses the standard argument processing library provided by GLib to implement
execution parameter parsing for solver utilities. This requires a number of auxiliary data types
and structures, which are defined as static global and local variables in the module’s main()
function. The latter is responsible for reading file name arguments associated with specific flags,
describing the locations of coupling constant and clamping state files. Also, a file describing a
spin configuration to compare the solution to may be specified. After parsing program arguments, the presence of required and optional parameters is verified.
A local function init() then initialises a spin glass data structure, using previously described
function spinglass_alloc(). Optimisation is then initiated by invoking the header defined func-
tion find_ground_states(). After the solution has been obtained, spinglass_correlate() performs
a comparison, should the related flag have been specified. After deallocating the data structure,
init() and main() terminate. Since each optimisation algorithm implements find_ground_states()
in its own module and links with main.c, the main() function is provided by the same mod-
ule for all utilities. This promotes code reuse and facilitates extending the code base with new
algorithms.
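The mechanism can be summarised by the following sketch of the solver interface (the exact prototype in gstatefinder.h may differ, for example in its return type or parameters):

/* gstatefinder.h (sketch): each solver module provides exactly one
 * implementation of this function; main.c invokes it once the spin glass
 * instance has been constructed from the bond and clamp files. */
void find_ground_states(struct SpinGlass *spinGlass);

Linking main.c against bforce_gstate_finder.c, dp_gstate_finder.c, dp_gstate_finder_fast.c or harmony_gstate_finder.c then produces the corresponding executable without modification to main.c.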
bforce_gstate_finder.c
To generate ground truth data for testing purposes, module bforce_gstate_finder.c implements
a brute force ground state solver. The solver is based on an infix traversal of state space. This
is achieved using a function find_ground_states(), which is called recursively. A conditional
statement restricts recursion depth, based on a variable whose value represents the position of a
window on the spin lattice. For each invocation of the function, the state of the spin under the
window is flipped. Before and after flipping spin state, recursive calls are performed, in each
case advancing the window by one spin. The base case effects evaluation of system energy. If
the system energy is found to be lower than the recorded minimum, energy and configuration are
output before updating the minimum. Since the search is exhaustive, the ground state configuration is eventually output.
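The traversal can be sketched as follows (a simplified serial fragment relying on the declarations in spinglass.h; the real module additionally honours clamped spins and records the minimising configuration):

/* Recursively enumerate all spin configurations by an infix traversal:
 * visit the subtree with the current spin state, flip the spin under the
 * window, then visit the subtree with the flipped state. pos is the index
 * of the spin under the window and n the total number of spins. */
static void enumerate(struct SpinGlass *sg, gint pos, gint n, gdouble *minEnergy)
{
    if (pos == n) {                      /* base case: evaluate this configuration */
        gdouble energy = spinglass_energy(sg);
        if (energy < *minEnergy)
            *minEnergy = energy;
        return;
    }
    enumerate(sg, pos + 1, n, minEnergy);
    sg->spins[pos] = (sg->spins[pos] == UP) ? DOWN : UP;   /* flip spin under window */
    enumerate(sg, pos + 1, n, minEnergy);
}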
harmony_gstate_finder.c
Serial and parallel harmony search algorithms were described in Chapters 3 and 4. The se-
rial algorithm consists of initial random solution generation (characterised by the parameter
NVECTORS) followed by an iterative process, in which low-utility solutions are replaced. Re-
placement is based on combining the components of stored solutions, using randomisation. The
latter is controlled by the memory choosing rate parameter. The parallelisation strategy involves
a collection of harmony search processes which exchange solutions between each other, using a
hierarchical system of nearest-neighbour and collective communication patterns.
Excepting the number of processes, module harmony_gstate_finder.c defines all parameters
controlling the behaviour of harmony search using preprocessor directives. These parameters
include the number of solutions held by a process (NVECTORS), the memory choosing rate,
the number of iterations before performing a collective communication operation, and the size
of subgroups involved in collective communications.
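A sketch of how these parameters are exposed (NVECTORS, ZONEEXBLOCK and ZONE_SIZE follow the names used in the text; the memory choosing rate identifier and all numerical values are placeholders, not those used in the experiments):

#define NVECTORS        10    /* solution vectors held by each process          */
#define MEM_CHOOSE_RATE 0.9   /* probability of drawing a component from memory */
#define ZONEEXBLOCK     100   /* iterations between collective exchanges        */
#define ZONE_SIZE       4     /* processes per subgroup                         */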
In addition to the module's entry function find_ground_states(), the implementation consists
of seven static functions, responsible for initialising and finalising message passing communications, collectively evaluating solution energy, and verifying the algorithm's state of convergence.
When the entry function is invoked, the implementation begins by allocating memory for a
single solution vector *neighbourSpins, which is used to store data from nearest-neighbour ring
communications. After initialising communications, solution vectors are generated randomly
and assigned to elements of an array Spin *spins[NVECTORS]. The latter is the collection
of solution vectors used during the heuristic process. The actual heuristic consists of a loop
executed directly after the aforementioned solution generation, which is of the form:
for (i = 1; get_stabilised_status() == FALSE; i++) {
    /* Create new vector */

    /* Compute highest energy vector */

    /* Set vector components */

    /* Replace vector in memory, if new vector is of higher fitness */

    /* Perform communication operations */
}
As shown, the loop's execution is controlled by get_stabilised_status(), responsible for eval-
uating the state of convergence. Within the loop body, memory for a new solution vector is
allocated; like all other solution vectors, the memory block consists of xSize × ySize elements
of type Spin, where xSize × ySize are the dimensions of the spin lattice. After determining
the solution vector with highest energy, the values of the new solution vector's components are
set from existing vectors, according to the algorithm described in Chapter 3. Following this,
the new solution’s energy is determined. The highest energy solution is replaced, if compari-
son yields that the new solution’s energy is lower. Communication routines are executed, after
which the process begins anew.
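The component-setting step can be sketched as follows (a fragment using GLib's random number routines; MEM_CHOOSE_RATE and newSpins are the placeholder names introduced above, and the pitch-adjustment step of harmony search is omitted for brevity):

/* Assemble the new candidate vector: with probability MEM_CHOOSE_RATE take
 * component j from a randomly chosen stored solution vector, otherwise
 * assign it a random spin state. */
for (j = 0; j < xSize * ySize; j++) {
    if (g_random_double() < MEM_CHOOSE_RATE)
        newSpins[j] = spins[g_random_int_range(0, NVECTORS)][j];
    else
        newSpins[j] = g_random_boolean() ? UP : DOWN;
}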
The hierarchical communication scheme is implemented using two separate conditional
statements, responsible for performing nearest-neighbour ring communications and collective
operations:
1  if (Solver_ProcID % ZONE_SIZE == 0) {
2      gint random = g_random_int_range(0, NVECTORS);
3      MPI_Sendrecv(spins[random], 1, TypeArray, (Solver_ProcID + ZONE_SIZE) % Solver_NProcs, 0,
           neighbourSpins, 1, TypeArray, MPI_ANY_SOURCE, MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
4      reduction_function(neighbourSpins, spins[random], NULL, NULL);
5  }
6
7  if (i % ZONEEXBLOCK == 0) {
8      reduce_minimal_spin_vector(spins[maxVector], Solver_Zone);
9  }
The exchange begins by processes selecting solutions at random (line 2) and sending them to
their neighbours. Ring communication is performed using the send / receive operation in line 3,
where each process with ID Solver_ProcID sends to process ID ((Solver_ProcID + ZONE_SIZE)
mod Solver_NProcs). Here, Solver_NProcs is the total number of processes and ZONE_SIZE
is the number of processes in a subgroup. In this way, ZONE_SIZE controls the number of
processes involved in ring communications. Every random solution is received into the memory
block referenced by *neighbourSpins. Whether this is committed to a process' solution set
spins[] depends on the result of applying reduction_function(). The latter performs identically
to the copy_min() function in Chapter 4, copying the energetically minimal argument to its
complement. Consequently, line 4 is responsible for accepting or rejecting solutions received
in the ring exchange operation. Line 7 performs the aforementioned collective operation; this
involves each subgroup performing a reduction on their least favourable solutions, using the
communicator Solver_Zone. The communicator refers to all processes in a subgroup based on
the instruction
MPI_Comm_split(COMM, Solver_ProcID / ZONE_SIZE, 0, &Solver_Zone);
which partitions the set of all processes, such that processes with equal Solver_ProcID /
ZONE_SIZE share the same subgroup. The function reduce_minimal_spin_vector() is itself
based on the MPI_Allreduce() operation, using reduction_function() as a custom reduction op-
erator. The frequency of reduction is controlled by the value of the constant ZONEEXBLOCK.
After the optimisation loop has terminated, the function find_ground_states() performs a
number of operations to finalise optimisation, such as determining the most favourable solution
held hitherto in the solution set among processes. The obtained configuration data are copied to
the spins field of the spin glass data structure, and the solution is output by invoking the function
spinglass_write_spins(). Memory for storing solution vectors is deallocated, following which
MPI communications are terminated.
To complete the description of the harmony search module, it remains to detail the function
which controls the heuristic's termination, get_stabilised_status(). Like the collective operation
used for exchanging solutions between processes, this is based on reduction operations used to
determine whether the most favourable solutions held by processes have equal energy. This is
achieved with the instructions
compute_lowest_energy(&minEnergy, &minVector);
MPI_Allreduce(&minEnergy, &globalMinEnergy, 1, MPI_DOUBLE, MPI_MIN, COMM);
if (minEnergy == globalMinEnergy) localHasOptimum = TRUE;
MPI_Allreduce(&localHasOptimum, &allHaveOptimum, 1, MPI_INT, MPI_LAND, COMM);
the first of which determines the lowest energy locally, the second the lowest energy glob-
ally, followed by a further reduction to determine whether all processes possess solutions with
energies corresponding to that of the globally most favourable solution. This implements the
termination condition described in Chapter 4.
dp_gstate_finder.c
In Chapter 3, it was established that the ground state energy of the Ising spin glass can be
obtained using an algorithm consisting of nested loops. Based on formulating ground state
energy as a dynamic programming problem, approaches to parallelisation inspired by those used
for matrix / vector multiplication were presented in Chapter 4. The basic O(nm 2^(2m)) time serial
algorithm for computing the ground state energy of the lattice without cyclic boundary conditions
leads to two parallel variants, using a collective communication operation between processes, or alternatively a cyclic shift operation. The latter was shown to be more memory efficient. To
account for cyclic boundary conditions in more than one dimension, the algorithm is required
to be executed for all configurations of an arbitrary spin row (cf. Theorem 3.3). In the collective
variant, the basic algorithm for systems without cyclic boundary conditions is given by the
pseudocode
Float[] p
Float[] p′
for k := (proc_id * q/p + 1) to ((proc_id + 1) * q/p)
    Float minval := ∞
    for l := 1 to q
        if p[l] + M_{i,i−1}[k,l] < minval
            minval := p[l] + M_{i,i−1}[k,l]
    p′[k] := minval
all_gather(p′, p)
which is executed n times for an n × m spin lattice, using vector p′ as argument p in
successive iterations of the algorithm and matrices M_{i,i−1} to store interaction energies between
configurations of spin rows i and i − 1. The latter are evaluated in the ith iteration of the algorithm.
The all_gather() operation combines the vector distributed among p processors into a single vec-
tor. Upon termination, vector p contains ground state energies for all configurations of the nth
spin row, from which the ground state energy of the entire lattice can be obtained by determining
the minimum vector component.
As described, the algorithm is capable only of computing ground state energy; implicit
information on actual ground state configuration is discarded. To enable this information to
be computed, it is necessary to retain at each iteration of the algorithm the value of l yielding
the assignment p′[k] := minval, for all values of k. This corresponds to retaining the optimal
configuration of row i − 1 for each of the q configurations of row i, with 1 < i ≤ n. This requires
a two-dimensional array.
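In outline, recording and later recovering the configuration proceeds as in the following fragment (a sketch; the helper index_of_minimum() and the array conf are illustrative, while minPathConf corresponds to the two-dimensional array described below):

/* During iteration i, record for every configuration k of row i the
 * configuration l of row i-1 that produced the minimum: */
minPathConf[i][k] = argmin_l;

/* After the final iteration, backtrack from the optimal configuration of
 * the last row to recover the full ground state configuration: */
conf[n - 1] = index_of_minimum(p, q);
for (i = n - 2; i >= 0; i--)
    conf[i] = minPathConf[i + 1][conf[i + 1]];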
Module dp_gstate_finder.c implements the basic dynamic programming algorithm, suited
for both serial and parallel execution. Both parallel variants based on collective and cyclic shift
operations are implemented. To promote code reuse, this is achieved by using preprocessor
directives for conditional compilation.
Similar to the implementation of harmony search, in addition to the entry function find_ground_states(),
the module consists of six static functions. These are responsible for initialising and finalising
message passing, computing ground state energy, manipulating spin rows and applying the ob-
tained ground state configuration to the spin glass data structure.
Given the parallel algorithm in either of its variants, a problem the implementation must
address is how to distribute the set of configurations a spin row may assume, among processes.
This amounts to distributing the rows of matrices M_{i,i−1} among processes, where each row ac-
counts for a unique configuration of spin row i. As spins assume binary state, a simple approach
is to represent spin subsystems as bit strings, e.g. assigning spin values +1 → 1, −1 → 0.
Exploiting the fact that processes are addressed using integer numbers in MPI, the bit string
representation can be split into a prefix and suffix, where the prefix is given by the process
number. For an m spin subsystem and p processors, prefixes consist of log_2 p bits, suffixes of
m − log_2 p bits. Providing the number of processes is a power of 2, it is possible to enumerate
all possible spin configurations by each process considering its process number prefix, and all
suffixes 0 ≤ k < 2^(m − log_2 p). This is the approach implemented in dp_gstate_finder.c.
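A sketch of this enumeration scheme (the function and variable names are illustrative; m is the number of spins in a row, p the number of processes, and rank the MPI process number):

/* Enumerate the row configurations assigned to one process: the process
 * number forms the bit-string prefix, and all suffixes are appended.
 * Assumes p is a power of two and m fits into an unsigned int. */
static void enumerate_row_configs(gint rank, gint p, gint m)
{
    gint prefixBits = 0, suffixBits;
    guint suffix, config;

    while ((1 << prefixBits) < p)
        prefixBits++;                       /* prefixBits = log2(p) */
    suffixBits = m - prefixBits;

    for (suffix = 0; suffix < (1u << suffixBits); suffix++) {
        config = ((guint) rank << suffixBits) | suffix;
        /* bit b of config encodes the state of spin b in the row:
         * 1 corresponds to +1, 0 to -1 */
        (void) config;
    }
}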
When find_ground_states() is invoked, the implementation begins by initialising message
passing, following which the function get_minimum_path() is invoked. This is responsible for
initiating a series of further function calls, based on a loop which iterates through each row
in the lattice. After allocating memory for an array *minPath, get_minimum_path() allocates **minPathConf, the two-dimensional array used to record optimal subsystem configurations.
The aforementioned loop then commences; for each spin row i, the function
get_optimal_prestates(spinGlass, minPathPartial, minPathConf[i], i, trellisCols, 0);
is invoked, which performs the parallel matrix / vector operation previously described in
pseudocode. The arguments are the spin glass data structure to optimise, a memory block cor-
responding to vector p, the matrix row to hold the optimal states of row i − 1, the current spin
row, and the total number of elements in p. The final argument is used to enforce a particular
configuration of the final spin row. In the absence of cyclic boundary conditions its value is not
significant. Should the spin glass indeed possess cyclic boundary conditions, the loop over spin
rows is repeated for all configurations of this row, and the lowest obtained energy is accepted as
the ground state energy.
Using conditional compilation based on the constant CYCLIC_EXCHANGE, two imple-
mentations of get_optimal_prestates() are provided, to account for both variants of the parallel
algorithm. If CYCLIC_EXCHANGE is left undefined, a further constant USE_MPI allows con-
trol over whether message passing communications are used. If the latter is left undefined, the
optimisation proceeds serially.
Both implementations of get_optimal_prestates() are based on the pseudocode designs pre-
viously discussed, using control flow instructions for dealing with spin rows when i = 1, for which
cyclic boundary interactions must be considered. In contrast to the presented pseudocode, the
elements of matrices M_{i,i−1} are not stored explicitly in a data structure. Instead, loop variables
are used to determine matrix elements on demand, which are computed by invoking the func-
tions defined in spinglass.h on the spin glass instance. To this end, of importance is the function
adjust_spin_row(), which modifies a spin glass instance according to the bit string representation
of a spin row.
The collective implementation of get_optimal_prestates() begins by allocating the array
*minPathNew, which is equivalent to vector p′ in the pseudocode, with elements distributed
among processes. Elements of *minPathNew are assigned values, based on elements in *min-
Path and interaction energies arising from the examined spin rows. Having completed this
evaluation, distributed vector elements are combined and reassigned to *minPath, using the in-
struction
MPI_Allgather(minPathNew, trellisCols / Solver_NProcs, MPI_DOUBLE,
              minPath, trellisCols / Solver_NProcs, MPI_DOUBLE, COMM);
where trellisCols / Solver_NProcs is the number of vector components stored by each pro-
cess, COMM is the global communicator and MPI_DOUBLE is the data type of the vector elements.
Figure 6.2: Schematic of operations performed by get_optimal_prestates() (basic dynamic pro-
gramming, collective operations): for each configuration of row i, optimum states of row i − 1
are determined from minPath, and the results held in minPathNew are then gathered. In contrast,
when using cyclic communications, processes evaluate different configurations of row i − 1,
shifting elements in minPath.
A schematic depiction of the optimisation process for a single invocation of get_optimal_prestates()
is shown in Figure 6.2.
Similar in its operation, the realisation of get_optimal_prestates() using cyclic shift oper-
ations between processes distributes vector p′ among processes using the array *minPathNew.
However, instead of assigning all components of vector p to each process, these too are dis-
tributed among processes, in *minPath. This requires multiple communication operations as optimisation
progresses for a single spin row. Here, elements in *minPath are examined in parallel by each
process; however, since each only retains a fraction of the components in p, it is necessary to perform
a cyclic shift of data. It turns out that as iteration through elements in *minPath progresses, it is
possible to communicate elements residing at neighbouring processes in advance. This suggests
a nonblocking communication scheme, which is implemented in the software module. The non-
blocking communication scheme utilises MPI_Issend(), MPI_Wait() and MPI_Recv() instructions inserted
into the optimisation loops (cf. Appendix F).
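In outline, one step of the shift might be expressed as follows (a simplified fragment: the buffer size, message tag, shift direction and the placement relative to the energy evaluation are assumptions standing in for the scheme used in the module):

MPI_Request request;

/* Post the send of the locally held block of p to the neighbouring process,
 * so that communication overlaps computation on the same (read-only) data. */
MPI_Issend(minPath, blockSize, MPI_DOUBLE,
           (rank + 1) % nprocs, 0, MPI_COMM_WORLD, &request);

/* ... evaluate interaction energies using the current contents of minPath ... */

/* Receive the neighbouring block needed for the next pass, then complete the send. */
MPI_Recv(recvBuffer, blockSize, MPI_DOUBLE,
         (rank + nprocs - 1) % nprocs, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
MPI_Wait(&request, MPI_STATUS_IGNORE);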
After get_optimal_prestates() has been invoked for all spin rows, it remains to obtain the
ground state energy from *minPath and the corresponding ground state configuration from
**minPathConf. Since the latter stores optimal configurations of preceding spin rows, for each
spin row, the ground state configuration can be recovered. This is achieved by determining the
optimum configuration of the final spin row, and traversing through matrix rows, referencing
preceding subsystem configurations. Function set_optimal_config() performs this activity. It is
invoked by get_minimum_path(), following which the ground state configuration is output using
spinglass_write_spins().
Figure 6.3: Sliding window for improved dynamic programming
dp_gstate_finder_fast.c
In Chapter 3, an improved serial algorithm for computing ground states was presented. In
contrast to the previous algorithm, instead of considering interacting spin rows in the lattice,
subsystems can be considered as positions of a 'sliding window'. This window spans spin rows
horizontally, such that the total number of spins it covers is equal to the number of columns in the lattice
plus one. As with the row-wise approach, optimisation is achieved by comparing adjacent
subsystems. Here, adjacent subsystems are those obtained by advancing the sliding window by
one spin (Figure 6.3).
In Chapter 4, it was suggested that the matrix / vector approach can be used to arrive at an im-
proved parallel algorithm. As previously, matrices retain interaction energies between adjacent
subsystems. However, as a caveat of the sliding window approach, interacting subsystems must
share spin configurations in the overlapping region between window positions. This means that
for every subsystem configuration, it is only necessary to evaluate interactions with two config-
urations of the preceding subsystem.
The module dp_gstate_finder_fast.c implements the improved algorithm for obtaining ground
states, for the lattice without cyclic boundary conditions. Similar in structure to dp_gstate_finder.c,
the module consists of a function get_minimum_path(), which is responsible for performing the
main optimisation. Given a spin glass instance, it proceeds to invoke get_optimal_prestates() in
a loop which iterates through all subsystems in the lattice.
Two main differences arise from the 'sliding window' approach to subsystems. Firstly,
adjusting spin configurations based on bit strings requires a 'leading spin' to be referenced
in the spin lattice, instead of a spin row. For this reason, the module implements the func-
tion adjust_spin_ensemble(), whose arguments include the problem instance and the referential
spin. Secondly, interaction between subsystems involves the energy introduced by a single
spin interacting with its vertical and horizontal neighbours (Figure 6.1(c)). Therefore, function
get_optimal_prestates() utilises the library function spinglass_ensemble_delta().
Invoking get_optimal_prestates() serves the same purpose as previously, namely to record
optimal energies for subsystems of increasing size, recording configuration data in a two-dimensional array.
Again, this is achieved using an argument *minPath which corresponds to vector p in the pseu-
docode algorithm. After the function has returned, this array stores data equivalent to vector
p. The computation performed by get_optimal_prestates() is shown in Figure 6.4. Here, ele-
ments corresponding to vector p are computed in parallel, such that interactions between each
corresponding subsystem configuration and preceding subsystems in both of its two states are
compared. Given the irregular pattern in which elements in *minPath are accessed, the ap-proach using a collective operation to combine elements of the resulting array *minPathNew is
favourable.
The method of determining configurations of preceding subsystems to evaluate involves ma-
nipulating the subsystem’s bit string representation. Given a bit string where the most significant
bit describes the leading spin's state, conducting a left arithmetic shift reveals permissible con-
figurations of the preceding subsystem (the least significant bit may assume 1 or 0). Figure 6.4
illustrates bit strings corresponding to subsystem configurations, for a 2 × 2 spin lattice.
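This manipulation can be sketched as follows (w denotes the window size in spins, i.e. the number of lattice columns plus one; the variable names are illustrative):

/* config encodes the current window position, most significant bit = leading
 * spin. A left shift drops the leading spin; the vacated least significant bit
 * corresponds to the additional spin of the preceding window, which may be
 * either down (0) or up (1). */
guint mask  = (1u << w) - 1u;               /* keep only the w window bits     */
guint pred0 = ((config << 1) & mask);       /* preceding window, new spin down */
guint pred1 = ((config << 1) & mask) | 1u;  /* preceding window, new spin up   */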
Once optimisation has completed, as with dp_gstate_finder.c it remains to restore the ground
state configuration from data stored in **minPathConf. Again, this is achieved using a function set_optimal_config(). In this case, each row of **minPathConf yields information on the optimum
state of one spin. The final row is used to infer the state of an entire subsystem. The entire
ground state configuration can then be output.
Figure 6.4: Schematic of operations performed by get_optimal_prestates() (improved dynamic
programming), executed on four processors P1–P4. The problem instance is a 2 × 2 spin lattice;
minPath and minPathNew hold the eight 3-bit window configurations for window positions i − 1
and i, the optimum states of position i − 1 are determined for each configuration of position i,
and the results held in minPathNew are gathered.
Chapter 7
Performance Evaluation
So far, approaches to determining spin glass ground states have been presented. These include
exactly solving methods based on dynamic programming, and the harmony search heuristic.
Both approaches are implemented in software, suited for serial and parallel execution using
MPI. The dynamic programming implementation incorporates two variants, which are referred
to as the basic and improved algorithms. Previous complexity analysis showed that the improved
algorithm requires less run time than its counterpart.
In examining techniques for parallelising these exact and heuristic algorithms, further al-
ternatives were described in Chapter 4. In the case of the dynamic programming algorithms,
approaches based on collective and cyclic communication patterns were given. The latter are
implemented using nonblocking synchronous send operations in MPI. Both collective and cyclic
variants are applicable to the basic dynamic programming algorithm, whereas the improved dy-
namic programming algorithm relies solely on collective communications.
In this chapter, the aforementioned solver implementations are examined in terms of their
performance. Data are presented against varying parameters and interpreted. For the parallel
exact solvers, a comparison is given between attainable performance on the Ness and HPCx
machines.
7.1 Serial performance
In the development process, serial versions of ground state solvers were implemented prior to
their parallel analogues. For the exact algorithms, besides facilitating an incremental develop-
ment strategy, this allowed an initial evaluation of performance, in order to gauge the possible
behaviour of parallel dynamic programming. Similarly, performance data for serial harmony
search were examined, in particular to assess the accuracy of solutions generated by the algo-
rithm.
Figure 7.1: Execution times for serial dynamic programming (basic algorithm). Time (s) is plotted against the number of spins.
7.1.1 Dynamic programming
Execution time data for serial dynamic programming were gathered on Ness. The experimental
procedure involved invoking both variants of the algorithm on the machine’s back-end, against
varying problem sizes. Timing data were recorded using the shell’s time command. While
offering limited accuracy and resolution, this method was deemed sufficient, considering the
magnitude of execution times. The source code was compiled using the gcc compiler, supplying
the -O2 optimisation flag. Random problem instances were generated as square-lattice k-spin systems without cyclic boundary conditions.
Basic algorithm
Results for basic dynamic programming are shown in Figure 7.1. As shown, problem instances are generated for systems of up to 14² spins. As one would expect, execution time rises monotonically, such that the recorded time for 14² spins is approximately 42 min. Considering the observations made in Chapter 3 about the algorithm's asymptotic behaviour, the graph appears to confirm an exponential relationship between system size and execution time.
To examine run time behaviour more closely, the data are visualised as a logarithmic plot (Figure 7.2). Here, it is apparent that execution time cannot be accurately approximated by the function $f(k) = \alpha e^{\beta k}$: its logarithm, $\ln(f(k)) = \ln(\alpha) + \beta k$, corresponds to a straight line, whereas the log-plotted data do not. Also, the plot shows near-constant values for the first three data points. This is likely to result from limited timing resolution.
In Chapter 3, the algorithm's asymptotic complexity was shown to be $O(\sqrt{k}\,2^{2\sqrt{k}})$, for a
8/3/2019 Peter Alexander Foster
http://slidepdf.com/reader/full/peter-alexander-foster 89/183
7.1. Serial performance 71
-12
-10
-8
-6
-4
-2
0
2
4
6
8
0 20 40 60 80 100 120 140 160 180 200
l g ( T i m e ) ( s )
Spins
Serial dynamic programming code performanceCurve fit
Figure 7.2: Log execution times for serial dynamic programming (basic algorithm)
square lattice k-spin system without cyclic boundary interactions. From this fact, it is clear that a more accurate model of execution time must consider an exponential relationship with respect to the root of the system size. The function $f(k) = \alpha e^{\beta\sqrt{k}}$ is thought to be an adequate approximation.
Figure 7.2 includes a fit of the function $\ln(f(k)) = \ln(\alpha) + \beta\sqrt{k}$ to the log-plotted data points. The first three data points are excluded from the fit. The fit was obtained using the Marquardt-Levenberg algorithm implemented in Gnuplot. With asymptotic standard errors of 0.9365% and 0.8656% respectively, values of $\alpha = 1.77111 \times 10^{-6}$ and $\beta = 1.50197$ were computed. The value $\beta/\ln 2 = 2.1667$ bears similarity to the theoretical value of 2 in the exponential term of the algorithm's asymptotic complexity. The greater value may be attributed to approximation using constant $\alpha$.
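Spelling the comparison out, the fitted natural-exponential form can be rewritten with base 2, so that the fitted coefficient becomes directly comparable with the exponent of the complexity bound:

$$f(k) = \alpha e^{\beta\sqrt{k}} = \alpha\,2^{(\beta/\ln 2)\sqrt{k}}, \qquad \frac{\beta}{\ln 2} = \frac{1.50197}{0.693147} \approx 2.17,$$

which is compared against the factor 2 appearing in $2^{2\sqrt{k}}$.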
Improved algorithm
Results for improved dynamic programming are shown in Figure 7.3. Here, problem instances
were generated in the range of k = [4, 361] spins. Comparison with Figure 7.1 reveals that
as expected, execution times are lower. As a practical advantage, this allowed the algorithm’s
performance to be evaluated against larger problem instances during experimentation.
A log plot of these data is shown in Figure 7.4. As before, this representation reveals near-
constant execution time for the first data points in the series. A unique feature is the data point
at k = 49, which is an outlier in what appears to be another exponential curve against $\sqrt{k}$. It is speculated that the outlier is due to caching effects: the Opteron 1218 processor on Ness has a 64 KiB L1 data cache, which is likely to be sufficient for containing the optimisation data held in
[Plot: execution time (s) against number of spins.]
Figure 7.3: Execution times for serial dynamic programming (improved algorithm)
[Plot: lg(time) (s) against number of spins, with curve fit.]
Figure 7.4: Log execution times for serial dynamic programming (improved algorithm)
[Plot: resident memory consumption (KiB) against number of spins.]
Figure 7.5: Memory consumption for serial dynamic programming (basic algorithm)
**minPathConf and *minPath (cf. Chapter 6): the former requires $6 \times 7 \times 2^{8} \times 4$ bytes = 42 KiB, the latter $2^{8} \times 4$ bytes = 1 KiB. The spin glass data structure is estimated to require less than 1 KiB, yielding a total of less than 64 KiB (considering the size of additional memory blocks).
Fitting the log plot to the function used for analysing basic dynamic programming, $\ln(f(k)) = \ln(\alpha) + \beta\sqrt{k}$, allows further comparison of the two algorithms. Using the same procedure for producing the fit, the obtained values are $\alpha = 1.0845 \times 10^{-5}$ and $\beta = 1.2275$, with asymptotic standard errors of 0.8924% and 0.9401%, respectively. The value of $\beta$ is close to the theoretical value of 1 in the exponential term of the algorithm's complexity function; compared to basic dynamic programming, execution time is observed to grow at a slower rate, as expected.
Memory consumption
Brief experiments were conducted to assess memory consumed by the dynamic programming
implementations. Considering resident memory values, as reported by the top process utility,
data were recorded by initiating computation using increasingly large problem sizes. For both
algorithms, as allocated memory remains constant for the majority of computation, it was not
necessary to execute until termination.
Plots of memory consumption are shown in Figures 7.5 and 7.6. For basic dynamic programming, the data reveal that to avoid swapping on a machine with 4 GiB of memory (e.g. Ness), the maximum problem size is a 24 × 24 spin lattice. With improved dynamic programming, the maximum problem size decreases to 19 × 19 spins. This behaviour is expected, since **minPathConf contains $O(\sqrt{k}\,2^{\sqrt{k}})$ vs. $O(k\,2^{\sqrt{k}})$ elements, for a k-spin square lattice. Again using a log plot approach (Figures 7.7 and 7.8), the data are fit to the function $f(k) = \beta k^{\alpha} 2^{\sqrt{k}}$, whose logarithm is
[Plot: lg(resident memory consumption) (KiB) against number of spins, with curve fit.]
Figure 7.6: Log memory consumption for serial dynamic programming (basic algorithm)
$\ln(\beta) + \alpha \ln(k) + \sqrt{k}\,\ln(2)$. For basic dynamic programming, the obtained values are $\alpha = -9.46851$ and $\beta = 40.42$ (asymptotic standard errors 1.401% and 1.924%, respectively). The values for improved dynamic programming are $\alpha = -6.76659$ and $\beta = 27.1801$ (asymptotic standard errors 2.092% and 2.844%). Comparing the two values of $\beta$, it is apparent that between the two variants of dynamic programming there exists a trade-off between execution time and memory efficiency: in terms of execution time, improved dynamic programming is preferable, whereas for memory consumption, the basic algorithm is preferable.
7.1.2 Harmony search
Serial harmony search was evaluated by comparing solutions generated by the heuristic to ground truth, based on a 6 × 6 spin problem instance with uniformly distributed bonds in the range [−1, 1). Ground truth was obtained by conducting an exhaustive search on the problem instance. While varying the number of solution vectors used, the search was executed multiple times. Results were used to compute mean error, standard error and error rate values. Totalling 80 executions for each value of NVECTORS, results are presented in Table 7.1.
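For illustration, bonds distributed uniformly in [−1, 1), as used for this test instance, can be drawn as in the following sketch; drand48() is merely one possible generator, and the project code may use a different one:

    #include <stdlib.h>

    /* Illustrative sketch only: fill an array of coupling constants with
     * values distributed uniformly in [-1, 1). The generator used by the
     * actual project code may differ. */
    void random_bonds(double *J, int nbonds, long seed)
    {
        int i;
        srand48(seed);
        for (i = 0; i < nbonds; i++)
            J[i] = 2.0 * drand48() - 1.0;   /* uniform in [-1, 1) */
    }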
As shown, the standard and mean error values improve monotonically when increasing the algorithm's memory capacity. The error rate, however, does not improve when increasing memory to NVECTORS=50; the algorithm's ability to find the exact ground state decreases under the specified parameter value. Despite this, $\mu_e$ and $\sigma_e$ suggest that a large NVECTORS benefits solution quality in general. This is in agreement with the behaviour of 'solution exploration' described in Chapter 3. Exploring the algorithm's behaviour against large NVECTORS is indeed the motivation behind developing parallel harmony search.
8/3/2019 Peter Alexander Foster
http://slidepdf.com/reader/full/peter-alexander-foster 93/183
7.1. Serial performance 75
[Plot: resident memory consumption (KiB) against number of spins.]
Figure 7.7: Memory consumption for serial dynamic programming (improved algorithm)
[Plot: lg(resident memory consumption) (KiB) against number of spins, with curve fit.]
Figure 7.8: Log memory consumption for serial dynamic programming (improved algorithm)
         NVECTORS = 1   NVECTORS = 2   NVECTORS = 10   NVECTORS = 50
µe       1.84           1.55           0.97            0.83
σe       0.83           0.77           0.77            0.61
e        0.06           0.10           0.14            0.10

Table 7.1: Mean error µe, standard error σe and error rate e of serial harmony search ground states, for increasing solution memory NVECTORS. Results are based on the ground truth value −30.7214. The error rate is defined as the number of correctly obtained ground state configurations over the total number of algorithm invocations.
Optimisation flags Execution time
-O0 10.682s
-O1 10.542s
-O2 6.354s
-O3 6.340s
-O3 -funroll-loops 4.043s
-O3 -funroll-loops -ftree-loop-im 4.043s
-O3 -funroll-loops -ftree-loop-im -funswitch-loops 4.043s
Table 7.2: Serial execution times for basic dynamic programming on Ness, for various GCC 4.0
optimisation flags
7.2 Parallel performance
The architecture of the Ness and HPCx machines was described in Chapter 5. In the following,
the method and results of performance assessment are presented for the implemented parallel
algorithms. As with the serial algorithms, results are interpreted.
7.2.1 Dynamic programming
Since the dynamic programming algorithms are deterministic, parallel performance can be assessed directly in terms of execution time. That is, given the parallel execution time $T_p$ on p processors and the serial execution time $T_s$, performance can be described in terms of the parallel efficiency $T_s/(p\,T_p)$.
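As a point of reference, these measures follow directly from the recorded timings; a trivial helper (illustrative only, with hypothetical example values) is:

    /* Illustrative helpers: speedup and parallel efficiency from a measured
     * serial time ts and a parallel time tp obtained on p processors.
     * Example (hypothetical values): ts = 2520 s, tp = 180 s, p = 16 gives
     * a speedup of 14 and a parallel efficiency of 0.875. */
    double speedup(double ts, double tp)
    {
        return ts / tp;
    }

    double parallel_efficiency(double ts, double tp, int p)
    {
        return ts / (tp * (double)p);
    }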
In preparation for experiments on Ness, serial execution time was measured against vari-
ous combinations of gcc compiler flags, based on the basic dynamic programming algorithm
and an 11 × 11 test spin problem. Using the -O3 optimisation level with the -funroll-loops flag for automated loop unrolling offered the greatest gain in performance over unoptimised code.
Timing data are shown in Table 7.2. This behaviour is not surprising, since the code is heavily
reliant on loops for processing spin glass data structures. In contrast, rudimentary analysis of
the source code reveals few cases where performance would likely benefit from loop-invariant
motion (pertaining to other optimisation flags used).
On HPCx, the same test spin problem was used to assess execution time on the machine’s
serial job node. Here, the effect of target architecture optimisation was considered, using the xlc_r re-entrant compiler, version 8.0. For all tests, 64-bit compilation was enabled using the
-q64 flag. Timing data are listed in Table 7.3. The set of compiler flags used for parallel
performance evaluation was -qhot -qarch=pwr5 -O5 -Q -qstrict.
The parallel environment on HPCx allows control over a number of settings [3], potentially influencing distributed application performance. Specifically, these settings affect the protocol used for communicating between shared memory nodes, including use of remote direct memory
[Plot: execution time (s) against number of processors; series for 10×10 to 15×15 spin lattices.]
Figure 7.9: Parallel execution time for dynamic programming (basic algorithm, Ness)
[Plot: parallel efficiency against number of processors; series for 10×10 to 15×15 spin lattices.]
Figure 7.10: Parallel efficiency for dynamic programming (basic algorithm, Ness)
8/3/2019 Peter Alexander Foster
http://slidepdf.com/reader/full/peter-alexander-foster 97/183
7.2. Parallel performance 79
[Plot: application time / total execution time against number of processors; series for 10×10 to 15×15 spin lattices.]
Figure 7.11: Vampir trace summary for dynamic programming (basic algorithm, Ness)
10 × 10 instance, at a rate decreasing against p.
To interpret these results, recall that the basic dynamic programming algorithm requires a sequence of $\sqrt{k}$ blocking collective gather operations to complete computation. For each of these operations, each processor contributes $2^{\sqrt{k}}$ elements. After the ground state energy has been obtained from the array *minPath, the ground state configuration is recovered from **minPathConf through a similar sequence of $\sqrt{k}$ gather operations.
Clearly, scalability is affected by the size of problem instances, since this influences the number and size of messages sent between processors. If the cost of a single collective gather is approximated as $t_{\mathrm{gather}} = p\,(T_0 + m/B)$, where p is the number of processors, $T_0$ the message initialisation cost, m the message size and B the bandwidth, it follows that for constant message size, the overall cost relates linearly to p. This serves as a possible explanation for the linear reduction in parallel efficiency observed for the majority of problem instances in Figure 7.10. The increase in efficiency for larger problem instances can then be attributed to the fact that computing the ground state energy requires $\propto m^2/p$ operations per processor (cf. Chapter 4). Consequently, for constant p, the fraction $m/(m^2/p)$ diminishes as m is increased; communication costs thus become less significant as the problem size increases. It is speculated that the 10 × 10 spin lattice causes severe imbalance between communication and computation, so that the amount of computation is closely approximated by a constant, regardless of p.
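To make the argument concrete, the sketch below evaluates the simple cost model with hypothetical values for $T_0$ and B (neither is a measured property of Ness or HPCx):

    #include <stdio.h>

    /* Sketch of the cost model discussed above: a collective gather is
     * approximated as t_gather = p * (T0 + m/B); per-processor computation
     * is taken as m*m/p, so the communication/computation ratio behaves
     * like p/m. T0 and B below are hypothetical, not measured, values. */
    int main(void)
    {
        const double T0 = 5.0e-6;   /* assumed message initialisation cost (s) */
        const double B  = 1.0e9;    /* assumed bandwidth (bytes/s)             */
        const double m  = 4096.0;   /* example message size (bytes)            */
        int p;

        for (p = 2; p <= 16; p *= 2) {
            double t_gather = p * (T0 + m / B);
            double ratio    = m / ((m * m) / p);   /* ~ p/m */
            printf("p=%2d  t_gather=%.2e s  comm/comp ratio=%.2e\n",
                   p, t_gather, ratio);
        }
        return 0;
    }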
Figure 7.11 shows the fraction $T_c/T_m$ of parallel computation time over communication time. These data were gathered by re-linking the compiled source code with the Vampir library and recording summary data as reported by the application's trace utility. Time spent on tracer API calls is omitted. As a general trend, it is observed from the plot that increasing the number of
[Plot: execution time (s) against number of processors; series for 10×10 to 15×15 spin lattices.]
Figure 7.12: Parallel execution time for dynamic programming (basic algorithm, cyclic com-
munications, Ness)
processors does indeed increase the proportion of time spent on communication. For the 14×14,
15×15 lattices, $T_c/T_m$ does not decrease monotonically with p. This may be due to the accuracy
of trace data, which indicate a non-monotonic relation between lattice sizes and scalability.
Having examined the performance of basic dynamic programming using collective operations, a similar procedure is given for the approach based on cyclic communication. In Figures 7.12, 7.13 and 7.14, plots of execution time, parallel efficiency and the fraction $T_c/T_m$ are shown. From Figure 7.12, it is again observed that increasing the processor count causes execution time to diminish, with the exception of the 10×10 lattice. For the latter, performance appears to degrade more profoundly than with the collective variant of the algorithm, to the extent that execution time on 16 processors exceeds that obtained for a single processor. For larger processor counts and the remaining problem instances, performance appears to degrade uniformly; this effect is shown more clearly in Figure 7.13. Here, parallel efficiency fluctuates in the range of [1, 4] processors, before decreasing monotonically for each examined problem instance. Significantly, scalability does not improve monotonically as lattice size is increased. Nevertheless, it is possible to group problem instances into two categories, such that the smaller 10×10 and 11×11 lattices result in parallel efficiency in the range [.4, .5] on four processors, with the remainder attaining [.8, .99] efficiency. Increasing the processor count to 16, parallel efficiency drops to [.01, .2] and [.4, .5] for the respective groups. From Figure 7.14, it is observed that communication costs become significant for all problem sizes as the processor count increases: for p = 16, the fraction $T_c/T_m$ lies in the range [.4, .5] for all examined lattices, except the 10 × 10 lattice, for which the fraction is further diminished due to communication costs.
[Plot: parallel efficiency against number of processors; series for 10×10 to 15×15 spin lattices.]
Figure 7.13: Parallel efficiency for dynamic programming (basic algorithm, cyclic communica-
tions, Ness)
[Plot: application time / total execution time against number of processors; series for 10×10 to 15×15 spin lattices.]
Figure 7.14: Vampir trace summary for dynamic programming (basic algorithm, cyclic commu-
nications, Ness)
[Plot: execution time (s) against number of processors; series for 10×10 to 22×22 spin lattices.]
Figure 7.15: Parallel execution time for dynamic programming (improved algorithm, Ness)
Comparing the two variants' performance, it is observed that using collective communications reduces execution time on few processors. This suggests that, in this case, the collective communication is less expensive than the cyclic operations. Recall also that the cyclic variant of the algorithm requires additional conditional statements, which increases the number of branch instructions in the code. Scalability of the cyclic variant is significantly reduced, indicating that problem instances significantly larger than 15 × 15 spins are required to obtain favourable efficiency at p > 16 processors. It is possible that sufficiently large problem instances might expose the cyclic approach as advantageous; however, these are not explored due to restricted experimental time scales. For the examined problem sizes, reduced scalability is thought to be influenced by synchronisation overhead, such that the amount of computation within the nested loops∗ is not sufficient to merit overlapping communications.
Results for improved dynamic programming executed on Ness are shown in Figures 7.15, 7.16 and 7.17. For all examined problem instances, parallel execution times behave similarly to those observed for the 10 × 10 lattice using basic dynamic programming: increasing the processor count causes performance to degrade severely for smaller lattices, such that parallel efficiency drops to around 20% at p = 4 processors. Larger lattices result in slightly enhanced parallel efficiency; however, increasing to p = 16 causes near-uniform degradation to around 10%. Figure 7.17 shows performance degradation from the perspective of computation and communication time. The fraction $T_c/T_m$ behaves as expected in relation to Figure 7.16, indicating that performance degradation is due to communication costs. In comparison to basic dynamic programming using cyclic communications, the effect of increasing processors is further pronounced, such that $T_c/T_m$ is reduced to under 20% at 16 processors.
∗cf. Chapter 4
[Plot: parallel efficiency against number of processors; series for 10×10 to 22×22 spin lattices.]
Figure 7.16: Parallel efficiency for dynamic programming (improved algorithm, Ness)
[Plot: application time / total execution time against number of processors; series for 10×10 to 22×22 spin lattices.]
Figure 7.17: Vampir trace summary for dynamic programming (improved algorithm, Ness)
[Plot: execution time (s) against number of processors; series for 11×11 to 16×16 spin lattices.]
Figure 7.18: Parallel execution time for dynamic programming (basic algorithm, HPCx)
Comparing the basic and improved variants of the algorithm, it appears there exists a trade-off between scalability and algorithmic complexity. Whereas basic dynamic programming has higher algorithmic complexity, results show favourable scalability up to 16 processors. In contrast, improved dynamic programming is a more efficient algorithm in terms of complexity; however, its scalability is considerably diminished on Ness for the examined problem sizes. A possible explanation for this behaviour is provided by the number of communication operations, which is $O(k)$ for the improved variant, versus $O(\sqrt{k})$ required for the basic variant, for a k-spin lattice. Given that communication takes place every $O(2^{2\sqrt{k}})$ instructions, versus every $O(2^{\sqrt{k}})$ instructions for the basic (collective) and improved algorithms, respectively, it is clear that the ratio of computation against communication is lower for the improved algorithm. Since communications are blocking in both cases, it follows that for improved dynamic programming, a greater proportion of execution time is due to communication operations. As a consequence, this reduces scalability.
Performance on HPCx
Plots of performance data on HPCx for basic dynamic programming using collective communications are shown in Figures 7.18 and 7.19. Because of how the machine's resources are grouped into logical partitions, and the implications of this for time budgeting, the processor count was scaled as $16 \cdot 2^n$, albeit to greater magnitude than on Ness. For small problem sizes, behaviour is as observed on Ness, where increasing the processor count effects little improvement in execution time. Scalability improves as problem size is increased, to the extent that parallel efficiency is
[Plot: parallel efficiency against number of processors; series for 11×11 to 16×16 spin lattices.]
Figure 7.19: Parallel efficiency for dynamic programming (basic algorithm, HPCx)
greater than 95% for lattices with 15 × 15 and 16 × 16 spins solved on 256 processors. A distinct feature is observed for the 15 × 15 lattice, where superlinear speedup appears to occur in the range of [16, 128] processors.
In Figures 7.20, 7.21, results for the algorithm variant using cyclic communications are
shown. In comparison to the collective approach, again performance improves as problem size
is increased. However, the obtained parallel efficiency is around 60% at 256 processors, for a
16 × 16 spin lattice. This decline in performance is similar to that observed on Ness. In contrast,
on HPCx, increasing parallel efficiency reflects the ordering of problem sizes more accurately.
Fluctuations observed on Ness are not present; for all examined problem instances execution
time decreases monotonically against the number of processors. As with the collective variant,
parallel efficiency obtained for the 15 × 15 lattice exceeds that for the 16 × 16 lattice, on 16
and 32 processors. In contrast, scaling performance is not sufficient for superlinear speedup, as
previously noted.
Results for improved dynamic programming on HPCx are shown in Figures 7.22 and 7.23. Here, performance drops rapidly for all explored problem sizes, such that executing on 16 processors reduces parallel efficiency to below 50%. Increasing the number of processors, efficiency tails off further; at 256 processors, it is less than 10%. Significantly, in resemblance to the aforementioned results, the largest examined problem instance does not result in the most scalable computation: the 22 × 22 lattice falls behind the 18 × 18 and 20 × 20 instances in terms of parallel efficiency. This phenomenon is observed for all evaluated processor counts.
Concluding from performance data on HPCx, the three algorithm variants exhibit varying
degrees of scalability. From most to least scalable, the algorithms are ordered as:
[Plot: execution time (s) against number of processors; series for 11×11 to 16×16 spin lattices.]
Figure 7.20: Parallel execution time for dynamic programming (basic algorithm, cyclic com-
munications, HPCx)
[Plot: parallel efficiency against number of processors; series for 11×11 to 16×16 spin lattices.]
Figure 7.21: Parallel efficiency for dynamic programming (basic algorithm, cyclic communica-
tions, HPCx)
[Plot: execution time (s) against number of processors; series for 12×12 to 22×22 spin lattices.]
Figure 7.22: Parallel execution time for dynamic programming (improved algorithm, HPCx)
[Plot: parallel efficiency against number of processors; series for 12×12 to 22×22 spin lattices.]
Figure 7.23: Parallel efficiency for dynamic programming (improved algorithm, HPCx)
[Plot: parallel efficiency against number of processors for a 16×16 spin lattice, comparing improved DP, basic collective DP and basic cyclic DP.]
Figure 7.24: Summary of parallel efficiencies on HPCx
• Basic algorithm using collective communications
• Basic algorithm using cyclic communications
• Improved algorithm using collective communications
This ordering is as observed on Ness; however, scalability is higher on HPCx for each of the variants. This is attributed to lower communication costs on HPCx, resulting from the higher
message passing bandwidth available on the machine. A summary of the algorithms’ parallel
efficiency on HPCx is shown in Figure 7.24, based on a 16 × 16 lattice.
7.2.2 Harmony search
The parallel harmony search algorithm introduced in Chapter 4 is based on a combination of
two types of communication operation. Considering additional algorithm parameters, the algorithm exhibits a high degree of flexibility; this leads to a potentially large set of algorithm
variants. The latter must be considered when examining performance. To restrict the space
of algorithm variants, it was decided to confine the behaviour of communication operations:
Hence, cyclic operations are based on exchanging random solution vectors between processes,
such that favourable solutions are retained. Collective operations take place between process
groups of specified size. Cyclic operations are executed every iteration of the harmony search
algorithm, while collective operations are executed periodically.
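One way such process subgroups might be realised is sketched below, under the assumption that ZONESIZE divides the total process count; this is an illustration rather than a description of the project's implementation:

    #include <mpi.h>

    /* Illustrative sketch: split MPI_COMM_WORLD into subgroups of ZONESIZE
     * consecutive ranks. Collective solution exchanges could then use
     * zone_comm, while cyclic exchanges use MPI_COMM_WORLD. ZONESIZE is a
     * placeholder compile-time constant here. */
    #define ZONESIZE 4

    void make_zone_comm(MPI_Comm *zone_comm)
    {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        /* Processes sharing the same colour (rank / ZONESIZE) form one subgroup. */
        MPI_Comm_split(MPI_COMM_WORLD, rank / ZONESIZE, rank, zone_comm);
    }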
The question arises of how to assess the heuristic's parallel performance. For a deterministic
algorithm, such as the exact dynamic programming based solver, performance is characterised
[(a) Non-heuristic: execution time against processors. (b) Heuristic: execution time against processors and solution accuracy.]
Figure 7.25: Conceptual representation of properties relevant to parallel performance
by scalability. Scalability is quantified in terms of the algorithm’s execution time against the
number of processors on which it is executed. From the latter, measures such as speedup and
parallel efficiency can be computed. This leads to a two-dimensional space (Figure 7.25(a)),
which may be explored experimentally; for a given problem size, it may for example be of interest to approximate the function which maps the number of processors to execution time. In the case of heuristic algorithms, however, an additional dimension is significant for characterising performance, namely the accuracy of generated solutions. As a result, the space in which performance is evaluated is three-dimensional (Figure 7.25(b)). Experimental exploration may involve assessing the relation between accuracy and execution time for a given number of processors. Another possibility might involve approximating the boundary surface in the space, provided such a surface exists.
From the discussion in Chapter 3, it is evident that quantifying solution accuracy is non-
trivial: It is necessary to define a measure to compare solutions with one another. An obvious
approach is to use the utility function, if defined by the heuristic. However, it might prove
advantageous to employ a measure more reflective of the problem’s solution landscape, for
example considering the distribution of solution utility values.
In the following description of an attempt at performance evaluation, parallel harmony
search was executed on a number of test instances, while varying the number of processes and
a selection of algorithm parameters. As previously explained, the algorithm possesses a significant number of parameters. Given the specified communication strategies, these include the
number of solution vectors NVECTORS, the memory choosing rate, and the rate of performing
collective operations ZONEEXBLOCK.
Experiment series are based on three lattice sizes of 12 × 12, 14 × 14 and 16 × 16 spins.
For each size, five instances were generated, using random uniform bond distributions in the
range [−1, 1). The procedure for every configuration of parameters and process count involved
executing the algorithm on each lattice instance five times. Result data were then collected and
mean values computed. A single data point used in visualisation corresponds to the mean result
obtained for a given lattice size instance.
Evidently, using several problem instances multiplies the number of times the parallel algorithm must be invoked. As a compromise to reduce the number of invocations, the two parameters NVECTORS and the memory choosing rate were held constant. More importantly, the three-dimensional space to explore is adapted, such that execution time is replaced by the number of loop iterations executed by harmony search. This is thought to better reflect the performance property of state space exploitation, described in Chapter 3. An advantage of the parallel algorithm's design is that it terminates when all processes hold identical solution vectors (cf. Chapter 4). Consequently, the aforementioned performance property can be seen as a 'dependent variable' reflecting solution exploitation, which need not be considered when permuting algorithm parameters. Effectively, this allows performance assessment to be divided between exploring the relation of the number of processes to accuracy, and of the number of processes to algorithm iterations.
Experiments were carried out on Ness, using up to 16 processors. The size of processor
subgroups ZONESIZE was varied in the range [1, 16], so that the number of processors lies in
the range [ZONESIZE, 16] for each experiment. The parameter ZONEEXBLOCK was variably assigned the values $10^2$, $10^3$ and $10^4$. For each lattice instance, solution accuracy was characterised
in terms of energetic aberrance from ground truth data obtained using dynamic programming.
Also, solution configurations were compared using the Hamming distance [35]†. Finally, the
number of algorithm iterations was recorded.
Performance results
In Figure 7.27, performance data for ZONEEXBLOCK = $10^2$ are shown, against varying processor numbers, lattice sizes and ZONESIZE. Quantitatively, the plot corresponds to the series of experiments where collective operations are performed frequently among processes. As the algorithm is defined, solutions are exchanged at a constant rate between process groups. The latter however vary in size with the parameter ZONESIZE, as previously mentioned. Given a subgroup size, the smallest collection of processes consists of a single subgroup; in general, the processor count must be a multiple of ZONESIZE. For this reason, curves in the plot vary in length. As an example of reading the plot, consider the curves s16 which range from 4 to 16 processes; these correspond to invoking the algorithm with a subgroup size of 4. As a special case, for each plot there exist two curves per lattice size in the range [1, 16]. These correspond to subgroup sizes of 1 and 2.
Figure 7.27 describes $\Delta E$, the difference between ground truth and mean solution energies,
†The implemented algorithm takes the complement of spin configurations into account, where all spin states are
inverted.
[Plot: iterations against number of processors; series s12, s14, s16.]
Figure 7.26: Parallel harmony search convergence durations (ZONEEXBLOCK= 100)
[Plot: ∆E against number of processors; series s12, s14, s16.]
Figure 7.27: Parallel harmony search energetic aberrance (ZONEEXBLOCK= 100)
8/3/2019 Peter Alexander Foster
http://slidepdf.com/reader/full/peter-alexander-foster 110/183
92 Chapter 7. Performance Evaluation
[Plot: ∆E against number of processors; series s12, s14, s16.]
Figure 7.28: Parallel harmony search energetic aberrance (ZONEEXBLOCK = 1000)
against the number of processors p. On initial consideration, it is observed that increasing the processor count reduces aberrance in some cases: accuracy for one of the 16 × 16 spin lattice series improves from around −160 to −60 at 16 processors. It turns out that this series corresponds to the parameter value ZONESIZE = 1. Similar improvements occur for the 12 × 12 and 14 × 14 lattices, from −120 and −85 to −35 and −17, respectively. However, increasing ZONESIZE to 2 effects an increase in solution accuracy in all cases, such that little improvement in accuracy is observed when increasing p.
Comparing Figures 7.27, 7.28 and 7.29 allows insight to be gained into the effect of varying the frequency of collective exchanges within processor subgroups. For increasing ZONEEXBLOCK, the effect of p becomes less significant: with the exception of the experiment series conducted for ZONESIZE = 1, all processor counts yield energetic aberrances in the approximate range [−20, −10]. For ZONESIZE = 1, behaviour is consistent for all values of ZONEEXBLOCK, to the extent that increasing p effects a significant increase in solution accuracy, as observed for ZONEEXBLOCK = $10^2$.
From the previous observations, two conclusions can be drawn with regard to solution exploration. Firstly, it appears that increasing the value of ZONEEXBLOCK causes solution exploration to improve, given that accuracy as characterised by $\Delta E$ improves. This is in agreement with the assumption made in Chapter 4, where solution exploration and exploitation were described as opposing qualities in the search process. Assuming that collectively exchanging solutions benefits solution exploitation, an obvious consequence of reducing the frequency of this operation is increased solution accuracy. Secondly, from the increase in solution accuracy between subgroups sized 1 and 2, it is concluded that, contrary to prior expectation, the ring-based
[Plot: ∆E against number of processors; series s12, s14, s16.]
Figure 7.29: Parallel harmony search energetic aberrance (ZONEEXBLOCK = 10000)
scheme of exchanging solutions contains an element of solution exploitation. In increasing the
size of subgroups, more opportunity is evidently given for diverse solution ‘islands’, since there
exist processes only participating in infrequent collective operations. A possible explanation for
the increase in accuracy against p is the circumference of the ring in which processes exchange
solutions. For large circumferences, propagating a solution across the ring becomes increasingly involved. This also improves solution diversity.
Figures 7.26, 7.30 and 7.31 show performance results in terms of algorithm iterations until convergence. The scheme is identical to that used to visualise solution aberrance. In Figure 7.26, results for ZONEEXBLOCK = $10^2$ (where collective operations occur frequently) show that increasing p above ZONESIZE causes a reduction in the number of iterations for all lattice and process subgroup sizes. As previously observed, an exception is formed by the series executed for unit ZONESIZE, where the number of iterations increases against the processor count. Also, the maximum iteration counts occur for ZONESIZE = 16.
These results are interpreted as follows. Firstly, the reduction in convergence times against p is attributed to the solution exploitation property of ring-based communications: as p is increased, so too does the number of processor subgroups. Since the latter exchange solutions frequently, convergence is promoted between those processes involved in ring communications. Convergence between the remaining processors is affected by the rate of subgroup communications. Secondly, when no cyclic communications take place, it follows that convergence is only promoted by collective communications, which in all experiments occur infrequently in comparison to cyclic communications. This serves as an explanation for the peak convergence times when ZONESIZE = p. Thirdly, for unit ZONESIZE convergence times are comparatively short, which is attributed to the
[Plot: iterations against number of processors; series s12, s14, s16.]
Figure 7.30: Parallel harmony search convergence durations (ZONEEXBLOCK= 10000)
[Plot: iterations against number of processors; series s12, s14, s16.]
Figure 7.31: Parallel harmony search convergence durations (ZONEEXBLOCK = 1000)
8/3/2019 Peter Alexander Foster
http://slidepdf.com/reader/full/peter-alexander-foster 113/183
7.2. Parallel performance 95
[Plot: Hamming distance against number of processors; series s12, s14, s16.]
Figure 7.32: Parallel harmony search solution Hamming distance (ZONEEXBLOCK= 100)
absence of processes exempt from cyclic communications. Since the latter occur frequently,
convergence is promoted especially rapidly.
Figures 7.32, 7.33 and 7.34 plot the Hamming distances of generated solutions against processors, for all conducted experiment series. This metric is designed to expose accuracy in terms of the number of dissimilar spin states in solutions generated by the heuristic. Increasing the number of processors to 16 appears to decrease the Hamming distance slightly, for all lattice instances. It is observed that distances are approximately equal to k/2, where k is the number of spins. This suggests that the distribution of spin configurations against system energy might be uniform. Considering this, the metric does not appear expressive of solution accuracy.
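For reference, the metric can be computed as in the following sketch (spins assumed to be stored as ±1 integers; the spin-flip complement noted earlier is handled by taking the smaller of d and k − d):

    /* Illustrative sketch: Hamming distance between two k-spin configurations
     * stored as +1/-1 integers, taking the global spin-flip complement into
     * account, since a configuration and its complement have equal energy. */
    int hamming_distance(const int *a, const int *b, int k)
    {
        int i, d = 0;
        for (i = 0; i < k; i++)
            if (a[i] != b[i])
                d++;
        return d < k - d ? d : k - d;
    }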
Overall, results indicate that parallel harmony search does improve solution accuracy. However, it must be considered that the improvements shown in Figures 7.27, 7.28 and 7.29 are marginal. Also, it is noted that comparatively good performance is achieved on few processors, provided algorithm parameters are selected carefully. Cyclic communications were observed to contain a significant element of solution exploitation. Unsurprisingly, considering the latter, the lowest energetic aberrance is achieved when communications are minimised. The attempt to quantify accuracy in terms of the Hamming distance highlights the difficulty of obtaining solutions heuristically: the spin glass problem appears to have a rough solution landscape, which poses a difficulty for finding ground states using harmony search. In all conducted experiment series, only suboptimal solutions were found.
Because of their fundamental differences, comparison between the examined exact approaches and harmony search is difficult to achieve. Whereas dynamic programming places exact demands on computation due to its deterministic nature, the heuristic is flexible in terms of
[Plot: Hamming distance against number of processors; series s12, s14, s16.]
Figure 7.33: Parallel harmony search solution Hamming distance (ZONEEXBLOCK= 1000)
[Plot: Hamming distance against number of processors; series s12, s14, s16.]
Figure 7.34: Parallel harmony search solution Hamming distance (ZONEEXBLOCK= 10000)
resources, albeit at the expense of accuracy. All dynamic programming approaches were shown to benefit from the high bandwidth communications found on HPCx. The codes are thus suited for execution on non-vector supercomputer machines with many processors. In contrast, depending on algorithm parameters, the heuristic's execution performance on a commodity cluster system with low latency Gigabit Ethernet may prove adequate. This is estimated from a 153 s execution time on Ness, corresponding to around 20000 iterations of harmony search on 16 processors, for a 256-spin lattice. Guest [33] provides an overview of message passing performance on commodity systems, which suggests reasonable bandwidth would be obtained.
Chapter 8
Conclusion
In the previous chapters, the implemented parallel optimisation software was described and experimental results presented. Given the project's scope, there exist numerous possibilities for conducting further work. Based on theoretical and practical aspects described in this dissertation,
the following discusses such possibilities briefly, before concluding.
8.1 Further work
In Chapter 2, the spin glass problem was introduced. Here, it was established that the Ising
spin glass is a simplification of spin interaction. The two objects defining the exchange energy
between spins are the spins themselves and the coupling constants. In general, the graph of spin interactions can be arbitrary. Spins assume states whose representation can vary in complexity, from the classical or quantum Heisenberg formulation of state to the binary Ising formulation. Coupling constants may be chosen from arbitrary distributions, for example discrete or continuous (e.g. Gaussian) distributions.
8.1.1 Algorithmic approaches
Considering that the project is concerned with the Ising spin glass, the opportunity presents
itself to explore the behaviour of more involved models. As an intermediate model between
Heisenberg and Ising formulations, one might implement the Potts model, where spins assume
discrete state. Provided that the model of spin interactions is left unaltered, this model appears
comparatively simple to implement: Applying the framework of subsystems and subsystem
interactions to the Potts model, it is apparent that the total energy of a system is still the sum
of subsystem energies and the interaction energies between them. However, for a p-state model, the number of states a k-spin subsystem can assume is $p^k$, instead of $2^k$. The consequence of greater diversity is that the computational complexity of basic dynamic programming increases to $O(n\,p^{2m})$ for an $n \times m$ lattice. Similarly, improved dynamic programming has a complexity of
$O(nm\,p^{m})$. A further ramification of spin state concerns the algorithm's implementation, which is based on bit-string representations of subsystems. Clearly, allowing more than binary state requires the code to be redesigned. A possible approach might involve representing subsystems as linked lists of integers. A likely consequence of this for all algorithms would be reduced performance from additional memory operations.
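One possible replacement for the bit-string representation is sketched below: a subsystem of k spins with p states each is held in an integer array and enumerated in mixed-radix (base-p) order. This is merely one option; the linked-list representation mentioned above would behave analogously.

    /* Illustrative sketch: enumerate the p^k configurations of a k-spin Potts
     * subsystem held as an array of digits in [0, p). Returns 0 once the
     * enumeration wraps around to the all-zero configuration. */
    int next_potts_config(int *state, int k, int p)
    {
        int i;
        for (i = 0; i < k; i++) {
            if (++state[i] < p)
                return 1;   /* no carry: state now holds a new configuration */
            state[i] = 0;   /* carry into the next digit */
        }
        return 0;           /* wrapped around: enumeration complete */
    }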
One might also consider extending the algorithms to higher dimensions. While this is trivial in the case of the heuristic, the dynamic programming approaches require the notion of a subsystem to be extended into higher dimensions: whereas basic dynamic programming is based on a sequence of interacting spin rows for the square lattice, it is necessary to consider a sequence of interacting lattices for the cubic lattice. The relation is analogous between hypercubes of d and d + 1 dimensions. As a caveat, the algorithms become computationally expensive: the basic algorithm requires $O(n\,2^{2n^{d-1}})$ time for an $n^d$-spin Ising hypercubic lattice, since there are n (d − 1)-dimensional subsystems in the lattice. For the improved algorithm, the sliding window approach is based on a sequence of (d − 2)-dimensional subsystems, yielding a time complexity of $O(n^d\,2^{n^{d-1}})$. It is assumed that both algorithms' parallel performance will degrade,
since higher-dimensional data are required to be communicated between processes. This places
greater requirements on message passing bandwidth.
Another possibility for further work involves applying the framework described in Chapter
3 to more general models of spin interaction: For an arbitrary graph of interacting spins, the
concept of probabilistic spin configuration $(s_1, s_2, \ldots, s_n)$ can be expressed as

$$P(s_1, s_2, \ldots, s_n) = \prod_{i=1}^{n} P(s_i \mid \Pi_i),$$

where $\Pi_i$ is the set of precursor spins associated with spin $s_i$. The task is then to arrive at
a formulation of optimum spin configuration, as shown in Chapter 3. It is believed that the
resulting dynamic programming problem must be both non-serial and polyadic, since the graph
may contain cycles, and since a spin is permitted to have multiple ancestors. This is likely to
have consequences for the complexity of the corresponding optimisation algorithm.
Of particular interest is the algorithm described by Pardella and Liers [53]. This provides
a polynomial time solution to the planar spin glass problem, allowing ground states to be de-
termined exactly, for problem instances far larger than those examined in this project. The
approach is based on combining the cut optimisation problem with the notion of ‘Kasteleyn
cities’, i.e. complete graphs which are subgraphs in the dual lattice representing plaquette frus-
trations in the spin lattice. Pardella and Liers apply the algorithm to a 3000 ×3000 lattice, which
represents an improvement over previous graph theoretical approaches [46]. Parallelisation of
cut optimisation might be achieved using the approach described by Diaz, Gibbons et al. [18].
8.1.2 Existing code
Besides implementing additional algorithms for spin glass optimisation, further work might be conducted on the existing code base. Possible additional features include augmenting functionality to allow algorithm parameters to be controlled at runtime, or implementing further bond distributions. Unlike basic dynamic programming, the improved dynamic programming algorithm does not support lattices with periodic boundary conditions. This can be implemented by adapting the approach described in Chapter 3, where the algorithm is invoked repeatedly, for different configurations of boundary spins.
More pertinent is optimisation of the existing code's performance. Considering the project's scope, it was decided to adopt a design promoting code maintainability, described in Chapters 5 and 6. Given additional time, it would be of interest to examine the cost of pointer operations, replacing them where possible by static arrays. Also, although state-of-the-art compilers were used during development and evaluation, there is potential for optimising kernel code segments: in the function get_optimal_prestates(), one might for example consider manual function inlining or loop unrolling. Similar treatment of the harmony search module is conceivable.
As implemented, the codes use MPI to achieve message passing parallelism. Although the algorithms are indeed based on the message passing architecture, one might consider a shared memory approach: given the method of state space decomposition, where configurations of spin subsystems are distributed equally among processes, the parallel for directive, as implemented e.g. in OpenMP, appears an obvious instrument for implementing shared memory versions of the algorithms.
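A minimal sketch of this idea is given below; subsystem_energy() and the array name are hypothetical stand-ins rather than parts of the existing code base:

    #include <omp.h>

    /* Hypothetical stand-in for the real per-configuration computation. */
    static double subsystem_energy(long conf)
    {
        return (double)(conf & 1);   /* placeholder value only */
    }

    /* Illustrative sketch: distribute the configurations of a spin row among
     * threads with a parallel for, mirroring how the MPI codes decompose the
     * configuration space among processes. */
    void relax_row(double *minPathNew, long nconf)
    {
        long c;
        #pragma omp parallel for schedule(static)
        for (c = 0; c < nconf; c++)
            minPathNew[c] = subsystem_energy(c);
    }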
8.1.3 Performance evaluation
In Chapter 7, performance data were gathered for the dynamic programming and harmony search algorithms. Scalability of the exact algorithms was examined on two machines. Further experimental work might be concerned with evaluating scalability on other machines, such as commodity clusters or the Blue Gene architecture, if available. A more detailed examination of performance on existing architectures might consider the implications of message passing latency and bandwidth, especially with regard to the dynamic programming code using asynchronous communications. It is also of interest to examine the scalability of harmony search; due to time constraints, the work undertaken considered only the algorithm's accuracy. Additionally, one might consider the effect of processor count and communication frequency on algorithm iterations (ideally the latter should remain constant). Finally, there exists the potential to experiment with alternative communication strategies, as proposed in this work.
8.2 Project summary
During the course of the project, software was developed to compute ground states of the Ising spin glass. The software includes implementations of serial and parallel optimisation algorithms. The latter include parallel dynamic programming algorithms, available in two variants. The first of these allows lattice instances with arbitrary boundary conditions to be solved, while the second is computationally more efficient. Performance was examined, indicating good scalability for the first variant; in contrast, scalability is limited for the second variant. Also, a further algorithm was examined, implementing a parallel ground state optimiser based on the harmony search heuristic. Its performance was examined in terms of solution accuracy and algorithm convergence.
In Chapter 5, the project’s goals were described. These consisted of developing an exact
ground state solver based on the transfer matrix method. As an additional objective, the investigation was to include an alternative, heuristic parallel algorithm. The performance of both algorithms was to be examined. It was intended that the software should be self-contained, offering sufficient functionality to be useful as a research tool.
In the light of the undertaken work, the project's goals are considered fulfilled to a considerable extent: the implemented software includes variants of exact optimisation algorithms. In theoretical work, the dynamic programming approach was shown to offer identical performance to transfer matrix based methods; therefore, both approaches are considered computationally equivalent. The described harmony search heuristic was also implemented. Both dynamic programming and harmony search are implemented as message passing codes. Performance was investigated as proposed, examining the scalability of the dynamic programming codes, and the accuracy of parallel harmony search. Although it remains of interest to examine the scalability of the alternative code, overall the project is considered a success.
8.3 Conclusion
In this dissertation, the Ising spin glass was introduced as a combinatorial optimisation problem.
The theoretical background was discussed, identifying and developing solutions to the problem.
A description of undertaken project work was provided. Implemented software was described
and experimental results were presented. Finally, possibilities for further work were identified.
Appendix A
Project Schedule
[Gantt chart, weeks 1–16: detailed design, implementation, debugging, testing and performance evaluation (two cycles), followed by report, presentation, and submission/corrections.]
Figure A.1: Project schedule
Appendix B
UML Chart
[Diagram: relationships among the modules io.c/io.h, spinglass.c/spinglass.h, main.c, random.c/random.h, arrays.c/arrays.h, gstatefinder.h, bforce_gstate_finder.c, dp_gstate_finder.c, dp_gstate_finder_fast.c and harmony_gstate_finder.c.]
Figure B.1: UML class diagram of source code module and header relationships
Appendix C
Markov Properties of Spin Lattice
Decompositions
C.1 First-order property of row-wise decomposition
Using a row-wise decomposition strategy of spin rows, the system state probability is expressed as

$$P(S) = \frac{1}{Z(T)} \exp\!\left(-\frac{1}{kT}\left[H(S_1) + \sum_{i=2}^{n}\bigl(H(S_i) + H_b(S_{i-1}, S_i)\bigr)\right]\right)$$
$$\phantom{P(S)} = \frac{1}{Z(T)} \exp\!\left(-\frac{1}{kT}H(S_1)\right)\prod_{i=2}^{n}\exp\!\left(-\frac{1}{kT}\bigl(H(S_i) + H_b(S_{i-1}, S_i)\bigr)\right).$$

The partition function is expanded in a similar manner to account for subsystems, as

$$Z(T) = \sum_{S \in \mathcal{S}} \exp\!\left(-\frac{1}{kT}H(S)\right) = \sum_{S_1}\exp\!\left(-\frac{1}{kT}H(S_1)\right)\prod_{i=2}^{n}\sum_{S_i}\exp\!\left(-\frac{1}{kT}\bigl(H(S_i) + H_b(S_{i-1}, S_i)\bigr)\right) = \prod_{i=1}^{n} Z_i(T),$$

with

$$Z_i(T) = \begin{cases} \sum_{S_i}\exp\!\left(-\frac{1}{kT}H(S_i)\right) & i = 1 \\ \sum_{S_i}\exp\!\left(-\frac{1}{kT}\bigl(H(S_i) + H_b(S_{i-1}, S_i)\bigr)\right) & 1 < i \le n. \end{cases}$$

Substituting Z(T) in Equation C.1, the system state probability is

$$P(S) = \frac{1}{Z_1(T)}\exp\!\left(-\frac{1}{kT}H(S_1)\right)\prod_{i=2}^{n}\frac{1}{Z_i(T)}\exp\!\left(-\frac{1}{kT}\bigl(H(S_i) + H_b(S_{i-1}, S_i)\bigr)\right) = P(S_1)\prod_{i=2}^{n}P(S_i \mid S_{i-1}),$$
which shows that the chosen approach fulfils the property of a first-order Markov chain; the conditional probability $P(S_i \mid S_{i-1})$ arises from the dependence of row $S_i$ on its predecessor's configuration.
C.2 Higher-order property of unit spin decomposition
Applying an analogous approach to determining system state probability, P(S) is expressed as

    P(S) = \frac{1}{Z(T)} \exp\left[ -\frac{1}{kT} \sum_{i=0}^{nm-1} \bigl( H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m}) \bigr) \right]
         = \frac{1}{Z(T)} \prod_{i=0}^{nm-1} \exp\left[ -\frac{1}{kT} \bigl( H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m}) \bigr) \right].
With Z(T) = \prod_{i=0}^{nm-1} Z_i(T) and Z_i(T) = \sum_{S_i} \exp\left[ -\frac{1}{kT} \bigl( H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m}) \bigr) \right], it follows that

    P(S) = \prod_{i=0}^{nm-1} P(S_i \mid S_{i-1}, S_{i-m}).
Recall that ground state information can be obtained by optimising P(S). For this particular model, the ground state configuration is obtained by maximising P(S), i.e.

    \operatorname*{arg\,max}_{S_0, S_1, \ldots, S_{nm-1}} \ \prod_{i=0}^{nm-1} P(S_i \mid S_{i-1}, S_{i-m}).
Next, it is necessary to adapt the Viterbi path formulation, in order to arrive at a recursive expression of ground state energy for the higher-order Markov model. Disregarding cyclic boundary interactions in the model, and noting that P(S_i \mid S_{i-1}, S_{i-m}) = P(S_i) for i = 0, a prototypical approach is

    P_{viterbi}(S_i) = \begin{cases} \max_{S_i} \{ P(S_i) \} & i = 0 \\ \max_{S_{i-1}, S_{i-m}} \{ P(S_i \mid S_{i-1}, S_{i-m}) \, Q_{viterbi}(S_{i-1}) \, Q_{viterbi}(S_{i-m}) \} & i > 0. \end{cases}
Unfortunately, there exists a caveat against recursively stating

    P_{viterbi}(S_i) = \max_{S_{i-1}, S_{i-m}} \{ P(S_i \mid S_{i-1}, S_{i-m}) \, P_{viterbi}(S_{i-1}) \, P_{viterbi}(S_{i-m}) \},

because by definition, the probability of subsystem S_i assuming a given state is conditionally dependent on subsystems S_{i-1} and S_{i-m}, which in turn are both conditionally dependent on subsystem S_{i-m-1}. This ordering requires that identical sets of subsystem configurations are considered when evaluating the terms P_{viterbi}(S_{i-1}) and P_{viterbi}(S_{i-m}). The mapping Q_{viterbi} must reflect this behaviour in terms of P_{viterbi}.
A solution to the dependency problem of vertical and horizontal predecessor spins can be obtained by increasing the order of the Markov model to m + 1. As a result, system state probability is given by the product

    P(S) = \prod_{i=0}^{nm-1} P(S_i \mid S_{i-1}, S_{i-2}, \ldots, S_{i-m-1}),
from which ground state probability can be formulated as

    P_{viterbi}(S_i, S_{i-1}, \ldots, S_{i-m}) = \begin{cases} P(S_i, S_{i-1}, \ldots, S_{i-m}) & i \le m \\ \max_{S_{i-m-1}} \{ P(S_i \mid S_{i-1}, \ldots, S_{i-m-1}) \, P_{viterbi}(S_{i-1}, \ldots, S_{i-m-1}) \} & i > m. \end{cases}
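To illustrate how this higher-order recursion can be realised computationally, the sketch below (an illustration only, using hypothetical names rather than the dissertation's implementation) encodes an (m+1)-spin window as a bit string and enumerates its possible predecessor configurations; the index arithmetic mirrors that used by dp_gstate_finder_fast.c in Appendix F.

/* Minimal sketch: a window of m+1 spins is encoded as a bit string, with the
 * lowest bit holding the oldest spin and the highest bit the leading spin.
 * Advancing the window by one spin discards the oldest bit and appends a new
 * leading bit, so each window state has exactly two possible predecessors. */
#include <stdio.h>

int main(void)
{
    const int m = 4;                   /* m columns; the window holds m+1 spins */
    const long nConf = 1L << (m + 1);  /* number of window configurations       */
    long conf = 19;                    /* an arbitrary example window state     */

    /* Shift the shared spins up by one position and clear the bit that held
     * the leading spin; the freed lowest bit belongs to the spin entering the
     * predecessor window and may take either value.                           */
    long base = ((conf << 1) | nConf) ^ nConf;
    for (int k = 0; k < 2; k++)
        printf("predecessor %d of state %ld: %ld\n", k, conf, base + k);
    return 0;
}

Because only two predecessor states need to be examined per window configuration, the minimisation at each sliding-window position is reduced accordingly.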
Appendix D
The Viterbi Path
D.1 Evaluating the Viterbi path in terms of system energy
It is of interest to examine the behaviour of system state probability, which is present in the recursive formulation of the Viterbi path, and evaluated in the described pseudocode algorithm. Taking the natural logarithm of the state probability, it is observed that

    \ln(P(S)) = \ln\left( \frac{1}{Z(T)} \exp\left[ -\frac{1}{kT} H(S) \right] \right) = \ln\left( \frac{1}{Z(T)} \right) - \frac{H(S)}{kT} \propto -H(S).
Using this result, the natural logarithm of the conditionally dependent state probability P(S_i \mid S_{i-1}) is

    \ln(P(S_i \mid S_{i-1})) = \ln\left( \frac{P(S_i, S_{i-1})}{P(S_{i-1})} \right) = \ln(P(S_i, S_{i-1})) - \ln(P(S_{i-1}))
                             \propto -\bigl( H(S_i) + H(S_{i-1}) + H_b(S_i, S_{i-1}) \bigr) + H(S_{i-1})
                             \propto -\bigl( H(S_i) + H_b(S_i, S_{i-1}) \bigr),
which allows system probability to be evaluated quantitatively in terms of its Hamiltonian. This in turn permits reformulation of the dynamic programming optimisation problem:

    \ln(P_{viterbi}(S_i)) = \begin{cases} \max_{S_i} \{ \ln(P(S_i)) \} & i = 1 \\ \max_{S_{i-1}} \{ \ln(P(S_i \mid S_{i-1})) + \ln(P_{viterbi}(S_{i-1})) \} & i > 1 \end{cases}

    \ln(P_{viterbi}(S_i)) = \begin{cases} c \, \min_{S_i} \{ H(S_i) \} & i = 1 \\ \min_{S_{i-1}} \{ H(S_i) + H_b(S_i, S_{i-1}) + c \ln(P_{viterbi}(S_{i-1})) \} & i > 1, \end{cases}
with c \in \mathbb{R}. It is trivial to apply the same approach to the recursive function viterbi(i), which evaluates to the actual sequence of emitted states in the Viterbi path, and to the described pseudocode algorithm.
Setting c = 1, the evaluated optimal sequence remains the Viterbi path. Further substitution yields

    H_{min}(S_i) = \begin{cases} \min_{S_i} \{ H(S_i) \} & i = 1 \\ \min_{S_{i-1}} \{ H(S_i) + H_b(S_i, S_{i-1}) + H_{min}(S_{i-1}) \} & i > 1, \end{cases}    (D.1)

which is the Hamiltonian of the system (S_1, S_2, \ldots, S_i) whose states are equal to those emitted by the Viterbi algorithm. Since the Viterbi path corresponds to the most probable system state, H_{min} is the system's ground state energy. This provides a solution to the ground state problem for the two-dimensional lattice without vertical or horizontal boundary interactions.
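For concreteness, the following minimal sketch (an illustration of the recursion in Equation D.1 only, not the project's solver; ground_state_energy, row_energy and inter_row_energy are assumed names standing in for the Hamiltonian terms H(S_i) and H_b(S_i, S_{i-1})) evaluates the recursion by enumerating row configurations as bit strings:

#include <stdlib.h>
#include <string.h>
#include <float.h>

/* Sketch of Equation D.1: returns the minimum over all configurations of
 * H(S_1) + sum_{i>1} [ H(S_i) + H_b(S_i, S_{i-1}) ] for a lattice of nRows
 * rows with nCols spins per row.  Rows are encoded as bit strings. */
double ground_state_energy(int nRows, int nCols,
                           double (*row_energy)(int row, unsigned long conf),
                           double (*inter_row_energy)(int row, unsigned long prev,
                                                      unsigned long cur))
{
    unsigned long nConf = 1UL << nCols;            /* 2^k row configurations */
    double *hMin = malloc(nConf * sizeof *hMin);   /* H_min(S_{i-1})         */
    double *hNew = malloc(nConf * sizeof *hNew);   /* H_min(S_i)             */

    /* Base case: H_min(S_1) = H(S_1) for every configuration of the first row */
    for (unsigned long c = 0; c < nConf; c++)
        hMin[c] = row_energy(0, c);

    /* Recursive case: minimise over the preceding row's configurations */
    for (int i = 1; i < nRows; i++) {
        for (unsigned long c = 0; c < nConf; c++) {
            double best = DBL_MAX;
            for (unsigned long p = 0; p < nConf; p++) {
                double e = hMin[p] + row_energy(i, c) + inter_row_energy(i, p, c);
                if (e < best) best = e;
            }
            hNew[c] = best;
        }
        memcpy(hMin, hNew, nConf * sizeof *hMin);
    }

    /* The ground state energy is the minimum over the final row's states */
    double best = DBL_MAX;
    for (unsigned long c = 0; c < nConf; c++)
        if (hMin[c] < best) best = hMin[c];

    free(hMin);
    free(hNew);
    return best;
}

Tracking, for each configuration, which predecessor attained the minimum allows the corresponding spin assignment to be recovered, as performed by the viterbi(i) backtracking step.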
Appendix E
Software usage
The following provides instructions on how to install and use the software described in this dissertation.
Requirements
The software requires the library glib-2.0 to be installed. By default, this library is expected
to reside in the directory /usr/lib, with headers located at /usr/include/glib-2.0 and /usr/lib/glib-
2.0/include. These settings may be changed by modifying the file Makefile.am. An implementation of MPI, such as MPICH2, is also required.
Configure and compile
The software is delivered as a compressed tarball with the .tar.gz file name extension. It is unpacked by issuing
tar xvzf ising.tar.gz
at the command prompt. Following this, it is necessary to initiate configuration by issuing
./configure
from within the package’s root directory. Environment variables are used to specify configuration options, including the compiler used (which defaults to mpicc). For example, to disable
optimisation, the necessary commands are:
export CFLAGS=-O0; ./configure
Providing configuration was successful, compilation is initiated using
make
Usage
Upon completion, the source directory contains the binaries genbonds, genclamps, sbforce, dpsolver, dpsolverfast, hmsolver, whose purpose is described in chapter 6. Most significantly, the solver utilities dpsolver, dpsolverfast, hmsolver operate on spin bond configuration files, which are generated using genbonds. To generate a sample 12 × 12 spin configuration file BONDS, the required command is

./genbonds -x 12 -y 12 > BONDS

which can then be solved, e.g. using improved dynamic programming on a single process, by invoking

./dpsolverfast -b BONDS
Multiprocessing is enabled either by invoking mpiexec directly, or by using one of the SUN
GridEngine scripts located inside the source root directory. All utilities support the -? flag for
displaying a list of command line options.
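For example (the exact launch syntax depends on the MPI implementation and batch system in use), a four-process run of the basic dynamic programming solver on the generated bond file could be started with

mpiexec -n 4 ./dpsolver -b BONDS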
Appendix F
Source Code Listings
/*
 * File: main.c
 *
 * Implements common entry point for ground state solver utilities.
 * Responsible for processing command line options and initiating computation
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "io.h"
#include "gstatefinder.h"

/* These store values of command line arguments */
static gchar *spinConfig = NULL;
static gchar *bondConfig = NULL;
static gchar *clampConfig = NULL;
static gchar *compSpinConfig = NULL;

/* Data structure for command line processing.
 * Specifies properties of command line options */
static GOptionEntry entries[] =
{
    { "spin-initial-config", 's', G_OPTION_FLAG_OPTIONAL_ARG, G_OPTION_ARG_FILENAME, &spinConfig, "Initial spin configuration file", "spinConfig" },
    { "bond-config", 'b', 0, G_OPTION_ARG_FILENAME, &bondConfig, "Initial bond configuration file", "bondConfig" },
    { "clamp-config", 'c', G_OPTION_FLAG_OPTIONAL_ARG, G_OPTION_ARG_FILENAME, &clampConfig, "Initial spin clamp configuration file", "clampConfig" },
    { "spin-comparison-config", 'x', G_OPTION_FLAG_OPTIONAL_ARG, G_OPTION_ARG_FILENAME, &compSpinConfig, "Spin configuration to compare result with", "compSpinConfig" },
    { NULL }
};

static void initialise_computation();

int main(int argc, char *argv[]) {

    /* Initialise data structure for argument processing */
    GError *error = NULL;
    GOptionContext *context;

    context = g_option_context_new("- Calculate spin glass ground states");
    g_option_context_add_main_entries(context, entries, NULL);
    /* Parse arguments */
    g_option_context_parse(context, &argc, &argv, &error);

    /* Handling of required arguments */
    if (bondConfig == NULL) {
        g_fprintf(stderr, "Please specify an input bond configuration file.\n");
        exit(EXIT_FAILURE);
    }
    if (clampConfig != NULL && spinConfig == NULL) {
        g_fprintf(stderr, "Specifying a clamp configuration file requires the use of an initial spin configuration file.\n");
        exit(EXIT_FAILURE);
    }

    initialise_computation();

    g_option_context_free(context);
    return (EXIT_SUCCESS);
}

void initialise_computation() {
    gint xSize, ySize, xSize1, ySize1;

    /* Used to construct spin glass structure */
    gdouble *weights = NULL;
    gboolean *clamps = NULL;
    Spin *spins = NULL;
    Spin *compSpins = NULL;

    struct SpinGlass *spinGlass;

    /* Read weights from previously obtained file name */
    weights = read_weights(bondConfig, &xSize, &ySize);

    if (clampConfig != NULL) {
        /* Read spin clamps from previously obtained file name */
        clamps = read_clamps(clampConfig, &xSize1, &ySize1);

        /* Check that sizes of spin and clamp matrices match */
        if (xSize != xSize1 || ySize != ySize1) {
            g_fprintf(stderr, "Error: Bond and clamp matrix sizes do not match. Aborting\n");
            exit(EXIT_FAILURE);
        }
    }

    if (spinConfig != NULL) {
        /* Read initial spin configuration from previously obtained file name */
        spins = read_spins(spinConfig, &xSize1, &ySize1);

        if (xSize != xSize1 || ySize != ySize1) {
            g_fprintf(stderr, "Error: Bond and spin configuration matrix sizes do not match. Aborting\n");
            exit(EXIT_FAILURE);
        }
    }

    if (compSpinConfig != NULL) {
        /* Read comparison spin configuration from previously obtained file name */
        compSpins = read_spins(compSpinConfig, &xSize1, &ySize1);

        if (xSize != xSize1 || ySize != ySize1) {
            g_fprintf(stderr, "Error: Reference spin configuration and bond matrix sizes do not match. Aborting\n");
            exit(EXIT_FAILURE);
        }
    }

    /* Initialise spin glass */
    spinGlass = spinglass_alloc(xSize, ySize, spins, weights, clamps);

    /* Compute ground state */
    find_ground_states(spinGlass);

    if (compSpins != NULL) {
        /* Compare resulting configuration to specified reference configuration */
        gint distance;
        struct SpinGlass *spinGlass2 = spinglass_alloc(xSize, ySize, compSpins, NULL, NULL);
        distance = spinglass_correlate(spinGlass, spinGlass2);

        g_printf("Correlation distance: %d\n", distance);
        spinglass_free(spinGlass2);
    }

    spinglass_free(spinGlass);
}
/*
 * File: dp_gstate_finder.c
 *
 * Implements serial and parallel basic dynamic programming algorithms
 *
 */

#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "arrays.h"
#include "gstatefinder.h"

/* CYCLIC_EXCHANGE defines cyclic communication patterns */
#define CYCLIC_EXCHANGE

/* USE_MPI defines parallel code */
#define USE_MPI
#ifdef USE_MPI
#include <mpi.h>
#endif

/* Defines data type for message passing */
#define T_INT MPI_LONG_LONG_INT

/* Constant alias */
#define IGNORE_BITMASK TRUE

/* Communications data */
static gint SolverNProcs = 1;
static gint SolverProcID = 0;
#define COMM MPI_COMM_WORLD
static guint64 SolverProcessorMask = 0;
/* Communications data */

/* Adjust row of spins according to bit string representation
 * spinGlass (write)   the spin glass structure to manipulate
 * row                 specifies the spin row in the range [0,NROWS)
 * conf                the bit string representation of a spin row
 * ignoreBitmask       if TRUE, the process ID does not influence the bit string */
static void adjust_spin_row(struct SpinGlass *spinGlass, gint row, t_int conf, gboolean ignoreBitmask);

/* Determine ground state and configuration of a spin glass instance
 * spinGlass           spin glass instance */
static void get_minimum_path(struct SpinGlass *spinGlass);

/* Determine optimum configurations of spin row row-1, for all configurations of row row
 * spinGlass (read/write)    spin glass instance
 * minPath (read/write)      stores minimum path (i.e. ground state energy) of subsystem before and after incrementing row row
 * minPathConf (read/write)  stores optimum configurations of rows
 * row                       row of the spin lattice to process
 * trellisCols               number of spin row configurations
 * finalRowConf              used to specify final row's configuration, if cyclic boundary conditions are present */
static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, t_int *minPathConf, gint row, t_int trellisCols, t_int finalConfRow);

/* Set the configuration of spin rows, based on optimum configurations
 * spinGlass (write)    spin glass to manipulate
 * minPathConf (read)   stores optimum spin row configurations
 * conf                 optimum configuration of ultimate spin row */
static void set_optimal_config(struct SpinGlass *spinGlass, t_int **minPathConf, t_int conf);

/* Initialise message passing communications */
static void init_comms(struct SpinGlass *spinGlass);

/* Terminate message passing communications */
static void term_comms(void);


gdouble find_ground_states(struct SpinGlass *spinGlass) {

    gdouble energy;

    if (spinGlass->ySize > 63) {
        g_fprintf(stderr, "Error: The specified spin lattice exceeds a count of 63 columns\n");
    }

    init_comms(spinGlass);

    get_minimum_path(spinGlass);

    term_comms();

    /* Master process outputs spin glass ground state */
    if (SolverProcID == 0) {
        energy = spinglass_energy(spinGlass);
        g_printf("Energy: %E\n", energy);
        spinglass_write_spins(spinGlass, stdout);
    }

    return energy;
}

static void adjust_spin_row(struct SpinGlass *spinGlass, gint row, t_int conf, gboolean ignoreBitmask) {
    /* ... (body missing from this transcript) ... */
}

/* ... (the opening of the collective-communication variant of get_optimal_prestates
   is missing from this transcript; the listing resumes inside its loop over the
   current row's configurations) ... */

            /* Set current spin row configuration */
            adjust_spin_row(spinGlass, row, j, !IGNORE_BITMASK);

            for (k = 0; k < trellisCols; k++) {
                gdouble interRowEnergy;  /* Energetic contribution of current and previous row */
                gdouble rowEnergy;       /* Energetic contribution of current row */

                /* Set preceding spin row configuration */
                adjust_spin_row(spinGlass, previousRow, k, IGNORE_BITMASK);

                /* Calculate energetic contributions */
                interRowEnergy = spinglass_inter_row_energy(spinGlass, previousRow);
                rowEnergy = spinglass_row_energy(spinGlass, row);

                if (minPath[k] + interRowEnergy + rowEnergy < path) {
                    path = minPath[k] + interRowEnergy + rowEnergy;
                    conf = k;
                }
            }

            /* Record optimum paths to examined state */
            minPathConf[j] = conf;
            minPathNew[j] = path;
        }
    }

#ifdef USE_MPI
    /* Exchange minimum paths */
    MPI_Allgather(minPathNew, trellisCols/SolverNProcs, MPI_DOUBLE, minPath, trellisCols/SolverNProcs, MPI_DOUBLE, COMM);
#else
    for (j = 0; j < trellisCols; j++) minPath[j] = minPathNew[j];
#endif

    g_free(minPathNew);
}
#endif

/* Cyclic variant */
#ifdef CYCLIC_EXCHANGE
static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, t_int *minPathConf, gint row, t_int trellisCols, t_int finalRowConf) {

    t_int j, k;

    /* Compute neighbour process ID */
    gint leftNeighbour = (SolverProcID - 1 + SolverNProcs) % SolverNProcs;

    /* Stores updated minimum path data */
    gdouble *minPathNew = g_new0(gdouble, trellisCols/SolverNProcs);
    gdouble *buffer = g_new0(gdouble, trellisCols/SolverNProcs);

    gint previousRow;

    if (row == 0) {
        previousRow = (spinGlass->xSize) - 1;

        /* Set preceding row configuration */
        adjust_spin_row(spinGlass, previousRow, finalRowConf, IGNORE_BITMASK);

        for (j = 0; j < trellisCols/SolverNProcs; j++) {
            minPathConf[j] = finalRowConf;  /* Theoretically redundant */

            /* Set current spin row configuration */
            adjust_spin_row(spinGlass, row, j, !IGNORE_BITMASK);

            /* Calculate energetic contribution */
            minPathNew[j] = spinglass_row_energy(spinGlass, row) + spinglass_inter_row_energy(spinGlass, previousRow);
        }
    } else {
        MPI_Request request;
        previousRow = row - 1;

        /* Iterate through subset of current row's states */
        for (j = 0; j < trellisCols/SolverNProcs; j++) {
            gdouble path = G_MAXDOUBLE;
            t_int conf;

            /* Set spin row configuration */
            adjust_spin_row(spinGlass, row, j, !IGNORE_BITMASK);

            /* Iterate through *all* states of preceding spin row */
            for (k = 0; k < trellisCols; k++) {
                gdouble interRowEnergy;
                gdouble rowEnergy;

                /* Set previous row configuration ID */
                t_int cID = (SolverProcID * (trellisCols/SolverNProcs) + k) % trellisCols;

                /* Initiate neighbour rotation of minpath */
                if (k == 0) MPI_Issend(minPath, trellisCols/SolverNProcs, MPI_DOUBLE, leftNeighbour, 0, COMM, &request);

                /* Set preceding spin row configuration */
                adjust_spin_row(spinGlass, previousRow, cID, IGNORE_BITMASK);

                /* Calculate energetic contributions */
                interRowEnergy = spinglass_inter_row_energy(spinGlass, previousRow);
                rowEnergy = spinglass_row_energy(spinGlass, row);

                if (k % (trellisCols/SolverNProcs) == 0 && k != 0) {
                    /* Receive data */
                    MPI_Recv(buffer, trellisCols/SolverNProcs, MPI_DOUBLE, (SolverProcID + 1) % SolverNProcs, MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
                    MPI_Wait(&request, MPI_STATUS_IGNORE);
                    memcpy(minPath, buffer, trellisCols/SolverNProcs * sizeof(gdouble));
                    /* ... receive data */
                    /* Send data */
                    MPI_Issend(minPath, trellisCols/SolverNProcs, MPI_DOUBLE, leftNeighbour, 0, COMM, &request);
                    /* Send data */
                }

                if (minPath[k % (trellisCols/SolverNProcs)] + interRowEnergy + rowEnergy < path) {
                    path = minPath[k % (trellisCols/SolverNProcs)] + interRowEnergy + rowEnergy;
                    conf = cID;
                }
            }

            minPathConf[j] = conf;
            minPathNew[j] = path;

            /* Receive data */
            MPI_Recv(buffer, trellisCols/SolverNProcs, MPI_DOUBLE, (SolverProcID + 1) % SolverNProcs, MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
            MPI_Wait(&request, MPI_STATUS_IGNORE);
            memcpy(minPath, buffer, trellisCols/SolverNProcs * sizeof(gdouble));
        }
    }

    for (j = 0; j < trellisCols/SolverNProcs; j++) minPath[j] = minPathNew[j];

    /* Free memory */
    g_free(minPathNew);
    g_free(buffer);
}
#endif

static void get_minimum_path(struct SpinGlass *spinGlass) {
    t_int j;
    guint i;

    guint trellisRows = spinGlass->xSize;
    t_int trellisCols = 1 << (spinGlass->ySize);

    gdouble path = G_MAXDOUBLE;
    t_int conf;

    /* Stores minimum path to currently examined subsystem for each of its states */
#ifdef CYCLIC_EXCHANGE
    gdouble *minPathPartial = g_new0(gdouble, trellisCols/SolverNProcs);
    gdouble *minPath = g_new0(gdouble, trellisCols);  /* Stores minimum path data of a subsystem in a subset of its states */
#else
    gdouble *minPath = g_new0(gdouble, trellisCols);
    gdouble *minPathPartial = minPath;
#endif

    t_int **minPathConf = array_new_2D(trellisRows, trellisCols/SolverNProcs);  /* Stores optimal configurations of preceding subsystem, given subsystem i in state j */

    if (!spinglass_has_vertical_boundary(spinGlass)) {
        for (i = 0; i < trellisRows; i++) {
            get_optimal_prestates(spinGlass, minPathPartial, minPathConf[i], i, trellisCols, 0);  /* Last argument is zero, since we don't care about vertical boundary */
        }

#ifdef CYCLIC_EXCHANGE
        MPI_Allgather(minPathPartial, trellisCols/SolverNProcs, MPI_DOUBLE, minPath, trellisCols/SolverNProcs, MPI_DOUBLE, COMM);
#endif

        /* Get minimum path */
        for (j = 0; j < trellisCols; j++) {
            if (minPath[j] < path) {
                path = minPath[j];
                conf = j;
            }
        }
        set_optimal_config(spinGlass, minPathConf, conf);

    } else {
        t_int **retainedMinPathConf = array_new_2D(trellisRows, trellisCols/SolverNProcs);

        for (j = 0; j < trellisCols; j++) {
            for (i = 0; i < trellisRows; i++) {
                /* Last argument corresponds to fixed spin for boundary interaction */
                get_optimal_prestates(spinGlass, minPathPartial, minPathConf[i], i, trellisCols, j);
            }

#ifdef CYCLIC_EXCHANGE
            MPI_Allgather(minPathPartial, trellisCols/SolverNProcs, MPI_DOUBLE, minPath, trellisCols/SolverNProcs, MPI_DOUBLE, COMM);
#endif

            /* Track energy */
            if (minPath[j] < path) {
                path = minPath[j];
                conf = j;
                /* Retain states stored in minConf */
                memcpy(&(retainedMinPathConf[0][0]), &(minPathConf[0][0]), trellisRows * (trellisCols/SolverNProcs) * sizeof(t_int));
            }
        }

        set_optimal_config(spinGlass, retainedMinPathConf, conf);
        array_free_2D(retainedMinPathConf);
    }

    g_free(minPath);
    array_free_2D(minPathConf);
#ifdef CYCLIC_EXCHANGE
    g_free(minPathPartial);
#endif
}

static void set_optimal_config(struct SpinGlass *spinGlass, t_int **minPathConf, t_int conf) {
    gint i;
    guint trellisRows = spinGlass->xSize;
    t_int trellisCols = 1 << (spinGlass->ySize);

#ifdef USE_MPI
    t_int *minPathConfRow = g_new0(t_int, trellisCols);  /* Used to store exchanged (complete) row configuration data */
#endif

    /* Iterate through spin rows in reverse */
    for (i = trellisRows - 1; i >= 0; i--) {
        /* Set row configuration */
        adjust_spin_row(spinGlass, i, conf, IGNORE_BITMASK);

        /* Reference optimum configuration of preceding spin row */
#ifdef USE_MPI
        MPI_Allgather(minPathConf[i], trellisCols/SolverNProcs, T_INT, minPathConfRow, trellisCols/SolverNProcs, T_INT, COMM);
        conf = minPathConfRow[conf];
#else
        conf = minPathConf[i][conf];
#endif
    }

#ifdef USE_MPI
    g_free(minPathConfRow);
#endif
}

static void init_comms(struct SpinGlass *spinGlass) {

#ifdef USE_MPI
    gdouble binaryPlaces;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(COMM, &SolverNProcs);
    MPI_Comm_rank(COMM, &SolverProcID);

    /* Check processor count is a power of two or unity */
    if (SolverNProcs > 1 && SolverNProcs % 2 != 0) {
        g_fprintf(stderr, "The processor count must be a power of two. Aborting.\n");
        exit(EXIT_FAILURE);
    }

    /* Create processor mask */
    SolverProcessorMask = SolverProcID;
    binaryPlaces = (log((gdouble)SolverNProcs) / log(2.0));
    SolverProcessorMask <<= (spinGlass->ySize) - (gint)binaryPlaces;  /* Shift log2(Nprocs) bits left */
#endif
}

static void term_comms() {

#ifdef USE_MPI
    MPI_Finalize();
#endif
}
/*
 * File: dp_gstate_finder_fast.c
 *
 * Implements serial and parallel improved dynamic programming algorithms
 *
 */

#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "arrays.h"
#include "gstatefinder.h"

/* USE_MPI defines parallel code */
#define USE_MPI
#ifdef USE_MPI
#include <mpi.h>
#endif

/* Defines data type for message passing */
#define T_INT MPI_LONG_LONG_INT

/* Constant alias */
#define IGNORE_BITMASK TRUE

/* Communications data */
static gint SolverNProcs = 1;
static gint SolverProcID = 0;
#define COMM MPI_COMM_WORLD
static t_int SolverProcessorMask = 0;
/* Communications data */

/* Adjust group of spins according to bit string representation
 * spinGlass (write)   the spin glass structure to manipulate
 * leadingSpin         specifies sliding window position in the range [ySize, xSize*ySize)
 * conf                the bit string representation of a spin row
 * ignoreBitmask       if TRUE, the process ID does not influence the bit string */
static void adjust_spin_ensemble(struct SpinGlass *spinGlass, gint leadingSpin, t_int conf, gboolean ignoreBitmask);

/* Determine ground state and configuration of a spin glass instance
 * spinGlass           spin glass instance */
static void get_minimum_path(struct SpinGlass *spinGlass);

/* Determine optimum configurations of spin group leadingSpin-1, for all configurations of group leadingSpin
 * spinGlass (read/write)    spin glass instance
 * minPath (read/write)      stores minimum path (i.e. ground state energy) of subsystem before and after incrementing by spin leadingSpin
 * minPathConf (read/write)  stores optimum configurations of spin groups
 * leadingSpin               position of sliding window in the range [ySize, xSize*ySize)
 * trellisCols               number of spin group configurations
 * finalRowConf              used to specify final row's configuration, if cyclic boundary conditions are present */
static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, t_int *minPathConf, gint leadingSpin, t_int trellisCols);

/* Set the configuration of spin groups, based on optimum configurations
 * spinGlass (write)    spin glass to manipulate
 * minPathConf (read)   stores optimum spin group configurations
 * conf                 optimum configuration of spin group at ultimate sliding window position */
static void set_optimal_config(struct SpinGlass *spinGlass, t_int **minPathConf, t_int conf);

/* Initialise message passing communications */
static void init_comms(struct SpinGlass *spinGlass);

/* Terminate message passing communications */
static void term_comms(void);


gdouble find_ground_states(struct SpinGlass *spinGlass) {

    gdouble energy;

    if (spinGlass->ySize > 63) {
        g_fprintf(stderr, "Error: The specified spin lattice exceeds a count of 63 columns\n");
    }

    init_comms(spinGlass);

    get_minimum_path(spinGlass);

    term_comms();

    /* Master process outputs spin glass ground state */
    if (SolverProcID == 0) {
        energy = spinglass_energy(spinGlass);
        g_printf("Energy: %E\n", energy);
        spinglass_write_spins(spinGlass, stdout);
    }

    return energy;
}

static void adjust_spin_ensemble(struct SpinGlass *spinGlass, gint leadingSpin, t_int conf, gboolean ignoreBitmask) {
    gint i;
    Spin spin;

#ifdef USE_MPI
    /* Row configuration is dependent on processor ID, which is a bit prefix */
    if (!ignoreBitmask) conf = conf | SolverProcessorMask;
#endif

    for (i = 0; i <= spinGlass->ySize; i++) {
        if (conf % 2 != 0) spin = UP;
        else spin = DOWN;

        /* Set spin at position i within sliding window */
        spinGlass->spins[leadingSpin - (spinGlass->ySize) + i] = spin;

        conf = conf >> 1;
    }
}

static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, t_int *minPathConf, gint leadingSpin, t_int trellisCols) {
    t_int j;
    t_int k;

    /* Stores updated minimum path data */
    gdouble *minPathNew = g_new0(gdouble, trellisCols/SolverNProcs);

    if (leadingSpin == spinGlass->ySize) {
        /* spinGlass->ySize corresponds to the first spin in the second row of the lattice */

        for (j = 0; j < trellisCols/SolverNProcs; j++) {
            /* Set current spin group configuration */
            adjust_spin_ensemble(spinGlass, leadingSpin, j, !IGNORE_BITMASK);

            /* Calculate energetic contribution */
            minPathNew[j] = spinglass_ensemble_delta(spinGlass, leadingSpin) + spinglass_row_energy(spinGlass, 0);
        }

    } else {
        for (j = 0; j < trellisCols/SolverNProcs; j++) {
            gdouble path = G_MAXDOUBLE;
            gdouble ensembleEnergy;
            t_int confIndex, conf;

            /* Set current spin ensemble configuration */
            adjust_spin_ensemble(spinGlass, leadingSpin, j, !IGNORE_BITMASK);
            ensembleEnergy = spinglass_ensemble_delta(spinGlass, leadingSpin);

            /* Calculate index for accessing preceding ensemble configuration */
            confIndex = (((j | SolverProcessorMask) << 1) | trellisCols) ^ trellisCols;

            for (k = 0; k < 2; k++) {
                /* Minimise on sum of ensemble energies */
                if (minPath[confIndex + k] + ensembleEnergy < path) {
                    path = minPath[confIndex + k] + ensembleEnergy;
                    conf = confIndex + k;
                }
            }

            /* Record optimum paths to examined state */
            minPathConf[j] = conf;
            minPathNew[j] = path;
        }
    }

#ifdef USE_MPI
    /* Exchange minimum paths */
    MPI_Allgather(minPathNew, trellisCols/SolverNProcs, MPI_DOUBLE, minPath, trellisCols/SolverNProcs, MPI_DOUBLE, COMM);
#else
    for (j = 0; j < trellisCols; j++) minPath[j] = minPathNew[j];
#endif

    g_free(minPathNew);
}

static void get_minimum_path(struct SpinGlass *spinGlass) {
    t_int j;
    guint i;

    guint trellisRows = (spinGlass->xSize - 1) * spinGlass->ySize;
    t_int trellisCols = 1 << (spinGlass->ySize + 1);

    gdouble path = G_MAXDOUBLE;
    t_int conf;

    gdouble *minPath = g_new0(gdouble, trellisCols);

    t_int **minPathConf = array_new_2D(trellisRows, trellisCols/SolverNProcs);  /* Stores optimal configurations of preceding subsystem, given subsystem i in state j */

    for (i = 0; i < trellisRows; i++) {
        get_optimal_prestates(spinGlass, minPath, minPathConf[i], spinGlass->ySize + i, trellisCols);
    }
    /* Find optimum configuration of spin group at ultimate sliding window position */
    for (j = 0; j < trellisCols; j++) {
        if (minPath[j] < path) {
            path = minPath[j];
            conf = j;
        }
    }
    set_optimal_config(spinGlass, minPathConf, conf);

    g_free(minPath);
    array_free_2D(minPathConf);
}

static void set_optimal_config(struct SpinGlass *spinGlass, t_int **minPathConf, t_int conf) {
    gint i;
    guint trellisRows = (spinGlass->xSize - 1) * spinGlass->ySize;
    t_int trellisCols = 1 << (spinGlass->ySize + 1);

#ifdef USE_MPI
    t_int *minPathConfRow = g_new0(t_int, trellisCols);  /* Used to store exchanged (complete) row configuration data */
#endif

    for (i = trellisRows - 1; i > 0; i--) {

        /* Set spinGlass spin according to leading spin configuration */
        gint spinVal = conf >> (spinGlass->ySize);
        gint leadingSpin = (spinGlass->xSize * spinGlass->ySize - 1) - (trellisRows - 1 - i);
        if (spinVal != 0) (spinGlass->spins)[leadingSpin] = UP;
        else (spinGlass->spins)[leadingSpin] = DOWN;

#ifdef USE_MPI
        MPI_Allgather(minPathConf[i], trellisCols/SolverNProcs, T_INT, minPathConfRow, trellisCols/SolverNProcs, T_INT, COMM);
        conf = minPathConfRow[conf];
#else
        conf = minPathConf[i][conf];
#endif
    }

    /* Set ensemble configuration due to first leading spin */
    adjust_spin_ensemble(spinGlass, spinGlass->ySize, conf, IGNORE_BITMASK);


#ifdef USE_MPI
    g_free(minPathConfRow);
#endif
}

static void init_comms(struct SpinGlass *spinGlass) {

#ifdef USE_MPI
    gdouble binaryPlaces;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(COMM, &SolverNProcs);
    MPI_Comm_rank(COMM, &SolverProcID);

    /* Check processor count is a power of two or unity */
    if (SolverNProcs > 1 && SolverNProcs % 2 != 0) {
        g_fprintf(stderr, "The processor count must be a power of two. Aborting.\n");
        exit(EXIT_FAILURE);
    }

    /* Create processor mask */
    SolverProcessorMask = SolverProcID;
    binaryPlaces = (log((gdouble)SolverNProcs) / log(2.0));
    SolverProcessorMask <<= (spinGlass->ySize) + 1 - (gint)binaryPlaces;  /* Shift log2(Nprocs) bits left */
#endif
}

static void term_comms() {

#ifdef USE_MPI
    MPI_Finalize();
#endif
}
/*
 * File: harmony_gstate_finder.c
 *
 * Implements parallel harmony ground state solver
 *
 */

#include <glib.h>
#include <glib/gprintf.h>
#include <mpi.h>
#include <string.h>

#include "spinglass.h"
#include "gstatefinder.h"
#include "random.h"


/* Serial algorithm parameters */
#define NVECTORS 10
#define MEMORY_CHOOSING_RATE 0.95

/* Parallel algorithm parameters */
#define ITERBLOCK 100
#define ZONEEXBLOCK 100

/* Common spin glass data */
static struct SpinGlass *spinGlass;
static Spin *spins[NVECTORS];
static gint xSize;
static gint ySize;
/* Common spin glass data */

/* Communications data */
#define COMM MPI_COMM_WORLD
#define ZONE_SIZE 16
static MPI_Datatype TypeArray;
static MPI_Op ReductionOp;
static MPI_Comm SolverZone;
static gint SolverProcID = 0;
static gint SolverNProcs = 1;
/* Communications data */

/* Determine highest energy spin glass held by this process
 * highestEnergy (write)   the energy of the obtained solution vector
 * vectorNum (write)       the index of the solution vector as stored in the array spins[] */
static void compute_highest_energy(gdouble *highestEnergy, gint *vectorNum);

/* Determine lowest energy spin glass held by this process
 * lowestEnergy (write)    the energy of the obtained solution vector
 * vectorNum (write)       the index of the solution vector as stored in the array spins[] */
static void compute_lowest_energy(gdouble *lowestEnergy, gint *vectorNum);

/* Determine the algorithm's convergence status, based on solution vectors held by each process
 * returns TRUE, if the algorithm has converged */
static gboolean get_stabilised_status(void);

/* Collectively obtain energetically minimal solution vector held by processes
 * spinVector (read/write)  specifies solution vector to perform reduction on, based on energy
 * comm (read)              MPI communicator to specify processes involved in reduction */
static void reduce_minimal_spin_vector(Spin *spinVector, MPI_Comm comm);

/* Defines operation, on which reduction is based
 * vector1, vector2 (read/write)  operation arguments
 * length (read)                  length of vectors
 * dataType (read)                data type used for communications */
static void reduction_function(Spin *vector1, Spin *vector2, gint *length, MPI_Datatype *dataType);

/* Initialise message passing communications */
static void init_comms(void);

/* Terminate message passing communications */
static void term_comms(void);

gdouble find_ground_states(struct SpinGlass *paramSpinGlass) {
    gint i, j;

    /* Used to store energy and identifier of highest energy vector in memory */
    gdouble highestEnergy;
    gint maxVector;

    /* Used to store energy and identifier of lowest energy vector in memory */
    gdouble minEnergy;
    gint minVector;

    /* Used for communicating spin vectors */
    Spin *neighbourSpins = g_new(Spin, paramSpinGlass->xSize * paramSpinGlass->ySize);

    /* Store spin glass globally */
    spinGlass = paramSpinGlass;
    xSize = paramSpinGlass->xSize;
    ySize = paramSpinGlass->ySize;

    init_comms();

    /* Initialise by generating random vectors */
    for (i = 0; i < NVECTORS; i++) spins[i] = spinglass_get_random_spins(spinGlass);

    /* Begin iterative process */
    for (i = 1; get_stabilised_status() == FALSE; i++) {
        /* Create new vector */
        Spin *newSpins = g_new(Spin, xSize * ySize);

        /* Compute initial highest energy vector */
        compute_highest_energy(&highestEnergy, &maxVector);

        /* Set vector components */
        for (j = 0; j < xSize * ySize; j++) {
            if (spinGlass->clamps != NULL && (spinGlass->clamps)[j]) {
                /* Clamping condition */
                newSpins[j] = spinGlass->spins[j];
            } else if (rand_continuous(0, 1) < MEMORY_CHOOSING_RATE) {
                /* Memory selection condition */
                newSpins[j] = spins[g_random_int_range(0, NVECTORS)][j];
            } else if (rand_cointoss()) {
                newSpins[j] = UP;
            } else {
                newSpins[j] = DOWN;
            }
        }

        /* Replace vector in memory, if the new vector is fitter */
        if (spinglass_energy_conf(spinGlass, newSpins) < highestEnergy) {

            g_free(spins[maxVector]);  /* Free previous vector */
            spins[maxVector] = newSpins;
        } else {
            g_free(newSpins);
        }

        if (SolverProcID % ZONE_SIZE == 0) {
            /* Periodic exchange of spin vectors between neighbouring zones */
            /* Highest energy vector is replaced by random vector */
            gint random = g_random_int_range(0, NVECTORS);
            MPI_Sendrecv(spins[random], 1, TypeArray, (SolverProcID + ZONE_SIZE) % SolverNProcs, 0, neighbourSpins, 1, TypeArray, MPI_ANY_SOURCE, MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
            reduction_function(neighbourSpins, spins[random], NULL, NULL);
        }

        /* Zone internal vector exchange */
        if (i % ZONEEXBLOCK == 0) {
            reduce_minimal_spin_vector(spins[maxVector], SolverZone);
        }
    }

    /* Determine minimum vector, copy configuration back to original structure */
    compute_lowest_energy(&minEnergy, &minVector);
    reduce_minimal_spin_vector(spins[minVector], COMM);
    memcpy(spinGlass->spins, spins[minVector], sizeof(Spin) * xSize * ySize);

    /* Master process outputs solution */
    if (SolverProcID == 0) {
        printf("Stabilised after %d iterations.\n", i);
        g_printf("Energy: %E\n", minEnergy);
        spinglass_write_spins(spinGlass, stdout);
    }

    term_comms();

    for (i = 0; i < NVECTORS; i++) g_free(spins[i]);
    g_free(neighbourSpins);

    return minEnergy;
}

static gboolean get_stabilised_status(void) {
    gdouble minEnergy;
    gdouble globalMinEnergy;
    gboolean localHasOptimum = FALSE;
    gboolean allHaveOptimum;

    gint minVector;

    /* Perform reduction on lowest energy solutions */
    compute_lowest_energy(&minEnergy, &minVector);
    MPI_Allreduce(&minEnergy, &globalMinEnergy, 1, MPI_DOUBLE, MPI_MIN, COMM);

    /* Determine whether all processes retain identical lowest energy solutions */
    if (minEnergy == globalMinEnergy) localHasOptimum = TRUE;
    MPI_Allreduce(&localHasOptimum, &allHaveOptimum, 1, MPI_INT, MPI_LAND, COMM);

    return (allHaveOptimum);
}

static void compute_highest_energy(gdouble *highestEnergy, gint *vectorNum) {
    gint i;

    *highestEnergy = -G_MAXDOUBLE;

    for (i = 0; i < NVECTORS; i++) {
        /* Iterate through all solution vectors, determine highest energy */
        gdouble energy = spinglass_energy_conf(spinGlass, spins[i]);
        if (energy > *highestEnergy) {
            *highestEnergy = energy;
            *vectorNum = i;
        }
    }
}

static void compute_lowest_energy(gdouble *lowestEnergy, gint *vectorNum) {
    gint i;

    *lowestEnergy = G_MAXDOUBLE;

    for (i = 0; i < NVECTORS; i++) {
        /* Iterate through all solution vectors, determine lowest energy */
        gdouble energy = spinglass_energy_conf(spinGlass, spins[i]);
        if (energy < *lowestEnergy) {
            *lowestEnergy = energy;
            *vectorNum = i;
        }
    }
}

static void reduce_minimal_spin_vector(Spin *spinVector, MPI_Comm comm) {
    Spin *newSpins = g_new(Spin, xSize * ySize);

    MPI_Allreduce(spinVector, newSpins, 1, TypeArray, ReductionOp, comm);
    memcpy(spinVector, newSpins, xSize * ySize * sizeof(Spin));
    g_free(newSpins);
}

static void reduction_function(Spin *vector1, Spin *vector2, gint *length, MPI_Datatype *dataType) {
    gdouble energy1, energy2;

    energy1 = spinglass_energy_conf(spinGlass, vector1);
    energy2 = spinglass_energy_conf(spinGlass, vector2);

    /* Operation condition */
    if (energy1 < energy2) {
        memcpy(vector2, vector1, xSize * ySize * sizeof(Spin));
    }
}

static void init_comms(void) {
    MPI_Datatype spinType;

    MPI_Init(NULL, NULL);

    MPI_Comm_size(COMM, &SolverNProcs);
    MPI_Comm_rank(COMM, &SolverProcID);
    if (SolverProcID == 0) printf("NProcs: %d, zone size: %d\n", SolverNProcs, ZONE_SIZE);

    /* Split communicator */
    MPI_Comm_split(COMM, SolverProcID / ZONE_SIZE, 0, &SolverZone);

    /* Initialise reduction operation */
    MPI_Op_create((MPI_User_function *)reduction_function, 1, &ReductionOp);
    MPI_Type_vector(1, sizeof(Spin), sizeof(Spin), MPI_BYTE, &spinType);
    MPI_Type_vector(xSize, ySize, ySize, spinType, &TypeArray);
    MPI_Type_commit(&TypeArray);
}

static void term_comms(void) {
    MPI_Comm_free(&SolverZone);
    MPI_Type_free(&TypeArray);
    MPI_Finalize();
}
8/3/2019 Peter Alexander Foster
http://slidepdf.com/reader/full/peter-alexander-foster 156/183
138 Chapter F. Source Code Listings
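The listing above closes the parallel Harmony Search solver. Its key MPI ingredients are a derived datatype, TypeArray, describing one complete spin vector, and a user-defined commutative reduction, ReductionOp, built from reduction_function; an MPI_Allreduce over this pair leaves every rank holding a lowest-energy configuration among those contributed. The fragment below is a minimal sketch of that pattern and is not part of the project sources; the identifiers candidate and result are hypothetical, while the remaining names are taken from the listing.

    /* Sketch: combine one candidate spin vector per rank, keeping a lowest-energy one */
    Spin *candidate = spin_glass_get_random_spins(spinGlass);   /* rank-local candidate    */
    Spin *result    = g_new(Spin, xSize * ySize);               /* receives reduced vector */
    MPI_Allreduce(candidate, result, 1, TypeArray, ReductionOp, COMM);
    /* reduction_function compares energies pairwise, so result now holds a configuration
     * whose energy is minimal over all participating ranks */
    g_free(candidate);
    g_free(result);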
/*
 * File: spinglass.h
 *
 * Specifies spin glass operation interface and spin glass data structure
 *
 */

#include <glib.h>
#include <stdio.h>

#ifndef SPINGLASS_H
#define SPINGLASS_H

/* Constants for spin glass IO */
#define STR_SPIN_UP    "+"
#define STR_SPIN_DOWN  "-"
#define STR_CLAMPED    "1"
#define STR_UNCLAMPED  "0"
#define WEIGHT_FMT     "%lf"

/* Spin data type */
typedef enum Spin {
    UP   =  1,
    DOWN = -1
} Spin;

/* Spin glass structure */
struct SpinGlass {
    /* Lattice dimensions */
    gint xSize;
    gint ySize;

    /* Vector of spin states */
    Spin *spins;

    /* Stores coupling constants. Data are stored as two row-major mappings of spins to vectors,
     * such that vertical bonds precede horizontal bonds. */
    gdouble *weights;
    /* Stores clamping states similarly */
    gboolean *clamps;
    /* Stores initial spin configuration */
    Spin *initialSpins;
};

/* Construct a new spin glass structure
 *   xSize         lattice rows
 *   ySize         lattice columns
 *   initialSpins  (read) vector of initial spin states. If NULL, a vector of UP spins is allocated
 *   weights       (read) vector of bonds. If NULL, zero weights are initialised
 *   clamps        (read) vector of clamping states
 *   returns       spin glass data structure */
struct SpinGlass *spin_glass_alloc(gint xSize, gint ySize, Spin *initialSpins, gdouble *weights, gboolean *clamps);

/* Destruct a spin glass structure. Performs deep deallocation.
 *   spinGlass     spin glass data structure */
void spin_glass_free(struct SpinGlass *spinGlass);

/* Determine total energy of spin glass
 *   spinGlass     (read) spin glass data structure, whose spin states and bonds are referenced
 *   returns       total energy, accounting for cyclic boundary interactions */
gdouble spin_glass_energy(struct SpinGlass *spinGlass);

/* Determine total energy of spin glass using alternative spin vector
 *   spinGlass     (read) spin glass data structure, whose bonds are referenced
 *   conf          (read) vector of spins whose states are referenced
 *   returns       total energy, accounting for cyclic boundary interactions */
gdouble spin_glass_energy_conf(struct SpinGlass *spinGlass, Spin *conf);

/* Determine energy of spin row
 *   row           spin row in range [0, NROWS)
 *   spinGlass     (read) spin glass data structure
 *   returns       total energy, accounting for cyclic boundary interactions */
gdouble spin_glass_row_energy(struct SpinGlass *spinGlass, gint row);

/* Determine energy resulting from vertical interactions between two rows row, row+1
 *   spinGlass     (read) spin glass data structure
 *   row           row in spin lattice, in the range [0, NROWS)
 *   returns       row energy, accounting for cyclic boundary interactions */
gdouble spin_glass_inter_row_energy(struct SpinGlass *spinGlass, gint row);

/* Determine energy between spin and its neighbours immediately above and to the left of it
 *   spinGlass     (read) spin glass data structure
 *   leadingSpin   spin position in the range [0, XSIZE*YSIZE), with row-major enumeration */
gdouble spin_glass_ensemble_delta(struct SpinGlass *spinGlass, gint leadingSpin);

/* Write spin states to file
 *   file          (read) file to write to
 *   spinGlass     (read) spin glass data structure */
void spin_glass_write_spins(struct SpinGlass *spinGlass, FILE *file);

/* Write spin states to file
 *   conf          (read) spin configuration vector to output
 *   spinGlass     (read) used to specify lattice dimensions
 *   file          (read) file to write to */
void spin_glass_write_spins_conf(struct SpinGlass *spinGlass, Spin *conf, FILE *file);

/* Write coupling constants to file
 *   spinGlass     (read) spin glass data structure
 *   file          (read) file to write to */
void spin_glass_write_weights(struct SpinGlass *spinGlass, FILE *file);

/* Write clamping states to file
 *   spinGlass     (read) spin glass data structure
 *   file          (read) file to write to */
void spin_glass_write_clamps(struct SpinGlass *spinGlass, FILE *file);

/* Generate random spins based on uniform distribution, accounting for clamped spins
 *   spinGlass     (read) used to specify lattice dimensions and clamping states
 *   returns       vector of spins storing lattice configuration */
Spin *spin_glass_get_random_spins(struct SpinGlass *spinGlass);

/* Determine whether spin glass has cyclic vertical boundary interactions
 *   spinGlass     (read) spin glass data structure
 *   returns       TRUE if condition present */
gboolean spin_glass_has_vertical_boundary(struct SpinGlass *spinGlass);

/* Compare spin states of two spin glasses
 *   spinGlass1    (read) spin glass data structure
 *   spinGlass2    (read) spin glass data structure
 *   returns       minimum number of differing spins, considering spinGlass1's inversion */
gint spin_glass_correlate(struct SpinGlass *spinGlass1, struct SpinGlass *spinGlass2);

#endif /* SPINGLASS_H */
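As a brief illustration of the interface specified above, the following fragment (not part of the original sources) builds a small 2 by 2 lattice with uniform ferromagnetic bonds, prints its energy and spin configuration, and releases it. The lattice size and the unit couplings are assumptions made purely for the example.

    #include <stdio.h>
    #include "spinglass.h"

    static void spin_glass_example(void) {
        gint i;
        /* Bond vector: first 2*2 vertical bonds, then 2*2 horizontal bonds */
        gdouble *weights = g_new(gdouble, 2*2*2);
        for (i = 0; i < 2*2*2; i++) weights[i] = 1.0;

        /* NULL initial spins: all spins start UP; NULL clamps: no spin is fixed */
        struct SpinGlass *glass = spin_glass_alloc(2, 2, NULL, weights, NULL);

        printf("Energy of the all-UP state: %f\n", spin_glass_energy(glass));
        spin_glass_write_spins(glass, stdout);

        spin_glass_free(glass);   /* deep deallocation also frees the weight vector */
    }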
/*
 * File: spinglass.c
 *
 * Implements spin glass operation interface
 *
 */

#include <stdio.h>
#include <string.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "arrays.h"
#include "random.h"

struct SpinGlass *spin_glass_alloc(gint xSize, gint ySize, Spin *initialSpins, gdouble *weights, gboolean *clamps) {
    gint i;

    struct SpinGlass *spinGlass = g_new(struct SpinGlass, 1);

    spinGlass->xSize = xSize;
    spinGlass->ySize = ySize;
    if (xSize < 2 || ySize < 2) {
        g_fprintf(stderr, "Warning: Tried to construct spin glass with dimensions %d by %d\n", xSize, ySize);
    }

    /* Allocate spin matrix */
    if (initialSpins == NULL) {
        spinGlass->spins = g_new(Spin, xSize*ySize);
        /* Assign default values */
        for (i = 0; i < xSize*ySize; i++) (spinGlass->spins)[i] = UP;
        spinGlass->initialSpins = NULL;
    } else {
        spinGlass->spins = initialSpins;
        /* Set initial spins */
        spinGlass->initialSpins = g_new(Spin, xSize*ySize);
        memcpy(spinGlass->initialSpins, spinGlass->spins, sizeof(Spin)*xSize*ySize);
    }

    /* Allocate bond weight matrix - stores vertical bonds, then horizontal bonds */
    if (weights == NULL) spinGlass->weights = g_new0(gdouble, xSize*ySize*2);
    else spinGlass->weights = weights;

    spinGlass->clamps = clamps;

    return spinGlass;
}

void spin_glass_free(struct SpinGlass *spinGlass) {
    /* Free all fields */
    if (spinGlass->spins != NULL) g_free(spinGlass->spins);
    if (spinGlass->initialSpins != NULL) g_free(spinGlass->initialSpins);
    if (spinGlass->weights != NULL) g_free(spinGlass->weights);
    if (spinGlass->clamps != NULL) g_free(spinGlass->clamps);

    g_free(spinGlass);
}

gdouble spin_glass_row_energy(struct SpinGlass *spinGlass, gint row) {
    gint i;
    gdouble energy = 0;

    gdouble weight;        /* Bond weight */
    Spin spin0, spin1;     /* Neighbour spins */

    gint xSize = spinGlass->xSize;
    gint ySize = spinGlass->ySize;
    Spin *spins = spinGlass->spins;
    gdouble *weights = spinGlass->weights;

    /* Iterate through row spins */
    for (i = 0; i < ySize; i++) {
        spin0 = ArrayAccess2D(spins, ySize, row, i);

        /* Calculate horizontal bond energy */
        weight = ArrayAccess3D(weights, ySize, xSize, row, i, 1);
        if (i < ySize-1) spin1 = ArrayAccess2D(spins, ySize, row, i+1);
        else spin1 = ArrayAccess2D(spins, ySize, row, 0);
        energy += weight * spin0 * spin1;

        /* Set energy to G_MAXDOUBLE, if spin0 state is impermissible due to its clamp state */
        if (spinGlass->clamps != NULL) {
            gboolean clamp = ArrayAccess2D(spinGlass->clamps, ySize, row, i);
            if (clamp == TRUE && spin0 != ArrayAccess2D(spinGlass->initialSpins, ySize, row, i)) {
                energy = -G_MAXDOUBLE;
            }
        }
    }

    return -1 * energy;
}

gdouble spin_glass_inter_row_energy(struct SpinGlass *spinGlass, gint row) {
    gint i;
    gdouble energy = 0;

    gdouble weight;
    Spin spin0, spin1;

    gint xSize = spinGlass->xSize;
    gint ySize = spinGlass->ySize;
    Spin *spins = spinGlass->spins;
    gdouble *weights = spinGlass->weights;

    /* Iterate through row spins, accumulating energy */
    for (i = 0; i < ySize; i++) {
        spin0 = ArrayAccess2D(spins, ySize, row, i);

        /* Calculate vertical bond energy */
        weight = ArrayAccess3D(weights, ySize, xSize, row, i, 0);
        if (row < xSize-1) spin1 = ArrayAccess2D(spins, ySize, row+1, i);
        else spin1 = ArrayAccess2D(spins, ySize, 0, i);
        energy += weight * spin0 * spin1;
    }

    return -1 * energy;
}

gdouble spin_glass_ensemble_delta(struct SpinGlass *spinGlass, gint leadingSpin) {
    gdouble energy = 0;

    Spin spin0, spin1;
    Spin *spins = spinGlass->spins;
    gdouble *weights = spinGlass->weights;

    gint row = leadingSpin / spinGlass->ySize;
    gint column = leadingSpin % spinGlass->ySize;

    gint ySize = spinGlass->ySize;
    gint xSize = spinGlass->xSize;
    gdouble weight;

    if (row > 0) {
        /* Calculate vertical component */
        spin0 = ArrayAccess2D(spins, ySize, row, column);
        spin1 = ArrayAccess2D(spins, ySize, row-1, column);
        weight = ArrayAccess3D(weights, ySize, xSize, row-1, column, 0);
        energy += weight * spin0 * spin1;
    }

    if (column > 0) {
        /* Calculate horizontal component */
        spin0 = ArrayAccess2D(spins, ySize, row, column-1);
        spin1 = ArrayAccess2D(spins, ySize, row, column);
        weight = ArrayAccess3D(weights, ySize, xSize, row, column-1, 1);
        energy += weight * spin0 * spin1;
    }

    return -1 * energy;
}

gdouble spin_glass_energy(struct SpinGlass *spinGlass) {

    gdouble energy = 0;

    gint i;
    /* Total energy is sum of rows' energies and row interactions */
    for (i = 0; i < spinGlass->xSize; i++) {
        energy += spin_glass_inter_row_energy(spinGlass, i);
        energy += spin_glass_row_energy(spinGlass, i);
    }

    return energy;
}

gdouble spin_glass_energy_conf(struct SpinGlass *spinGlass, Spin *conf) {
    gdouble energy;

    Spin *currentSpins = spinGlass->spins;
    spinGlass->spins = conf;
    energy = spin_glass_energy(spinGlass);
    spinGlass->spins = currentSpins;

    return energy;
}

void spin_glass_write_spins(struct SpinGlass *spinGlass, FILE *file) {
    gint i, j;
    Spin spin;

    /* Iterate through spins and format output */
    for (i = 0; i < spinGlass->xSize; i++) {
        for (j = 0; j < spinGlass->ySize; j++) {
            spin = ArrayAccess2D(spinGlass->spins, spinGlass->ySize, i, j);
            if (spin == UP) {
                g_fprintf(file, "%s", STR_SPIN_UP);
            } else {
                g_fprintf(file, "%s", STR_SPIN_DOWN);
            }

            g_fprintf(file, "%s", " ");
        }

        g_fprintf(file, "%s", "\n");
    }
}

void spin_glass_write_spins_conf(struct SpinGlass *spinGlass, Spin *conf, FILE *file) {
    Spin *currentSpins = spinGlass->spins;
    spinGlass->spins = conf;
    spin_glass_write_spins(spinGlass, file);
    spinGlass->spins = currentSpins;
}

void spin_glass_write_weights(struct SpinGlass *spinGlass, FILE *file) {
    gint i, j, k;
    gdouble weight;

    /* Iterate through weights and format output */
    for (k = 0; k < 2; k++) {
        for (i = 0; i < spinGlass->xSize; i++) {
            for (j = 0; j < spinGlass->ySize; j++) {
                weight = ArrayAccess3D(spinGlass->weights, spinGlass->ySize, spinGlass->xSize, i, j, k);
                g_fprintf(file, WEIGHT_FMT " ", weight);
            }

            g_fprintf(file, "%s", "\n");
        }

        g_fprintf(file, "%s", "\n");
    }
}

void spin_glass_write_clamps(struct SpinGlass *spinGlass, FILE *file) {
    gint i, j;
    gboolean clamp;

    /* Iterate through clamps and format output */
    for (i = 0; i < spinGlass->xSize; i++) {
        for (j = 0; j < spinGlass->ySize; j++) {
            clamp = ArrayAccess2D(spinGlass->clamps, spinGlass->ySize, i, j);
            if (clamp) {
                g_fprintf(file, "%s", STR_CLAMPED);
            } else {
                g_fprintf(file, "%s", STR_UNCLAMPED);
            }

            g_fprintf(file, "%s", " ");
        }

        g_fprintf(file, "%s", "\n");
    }
}

Spin *spin_glass_get_random_spins(struct SpinGlass *spinGlass) {
    gint total = spinGlass->xSize * spinGlass->ySize;
    gint i;

    /* Allocate spins */
    Spin *spins = g_new(Spin, total);

    /* Assign spin values */
    for (i = 0; i < total; i++) {
        if (spinGlass->clamps != NULL && (spinGlass->clamps)[i]) {
            /* Clamped status */
            spins[i] = (spinGlass->spins)[i];
        } else {
            /* Assign random spin values */
            gboolean randomVal = rand_coin_toss();
            if (randomVal == TRUE) {
                spins[i] = UP;
            } else {
                spins[i] = DOWN;
            }
        }
    }

    return spins;
}

gboolean spin_glass_has_vertical_boundary(struct SpinGlass *spinGlass) {
    gboolean hasVerticalBoundary = FALSE;

    gint ySize = spinGlass->ySize;
    gint xSize = spinGlass->xSize;

    gint i;
    /* Iterate through spins in ultimate row, checking for non-zero vertical bond weights */
    for (i = 0; i < ySize && !hasVerticalBoundary; i++) {
        gdouble weight = ArrayAccess3D(spinGlass->weights, ySize, xSize, xSize-1, i, 0);
        if (weight != 0) hasVerticalBoundary = TRUE;
    }

    return hasVerticalBoundary;
}

gint spin_glass_correlate(struct SpinGlass *spinGlass1, struct SpinGlass *spinGlass2) {
    gint i, j, k;
    gint finalDistance = G_MAXINT;
    gint distance;

    for (k = 0; k < 2; k++) {
        /* Repeat, comparing both original and inverse of spinGlass1 */
        distance = 0;

        for (i = 0; i < spinGlass1->xSize; i++) {
            for (j = 0; j < spinGlass1->ySize; j++) {
                Spin spin1 = ArrayAccess2D(spinGlass1->spins, spinGlass1->ySize, i, j);
                Spin spin2 = ArrayAccess2D(spinGlass2->spins, spinGlass2->ySize, i, j);
                if (k == 0) {
                    if (spin1 != spin2) distance++;
                } else {
                    if (spin1 == spin2) distance++;
                }
            }
        }
        if (distance < finalDistance) finalDistance = distance;
    }

    return finalDistance;
}
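For reference, the quantity computed by spin_glass_energy is the zero-field Ising energy

    E(s) = -\sum_{\langle i,j \rangle} J_{ij} s_i s_j, \qquad s_i \in \{+1, -1\},

with the sum running over horizontal and vertical nearest-neighbour bonds, including the cyclic boundary terms; spin_glass_row_energy supplies the horizontal contributions of one row and spin_glass_inter_row_energy the vertical contributions between a row and its successor.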
/*
 * File: io.h
 *
 * Specifies IO operation interface
 *
 */

#include "spinglass.h"

#ifndef IO_H
#define IO_H

/* For file input routines using fgets() */
#define MAX_LINE_LEN 100000

/* Read spin configuration from file
 *   fileName  (read) file name from which to initiate reading
 *   xSize     (write) number of rows in the obtained configuration
 *   ySize     (write) number of columns in the obtained configuration
 *   returns   vector of spins, stored in row major order */
Spin *read_spins(gchar *fileName, gint *xSize, gint *ySize);

/* Read spin clamping state from file
 *   fileName  (read) file name from which to initiate reading
 *   xSize     (write) number of rows in the obtained configuration
 *   ySize     (write) number of columns in the obtained configuration
 *   returns   vector of spin clamp states, stored in row major order */
gboolean *read_clamps(gchar *fileName, gint *xSize, gint *ySize);

/* Read spin bond configuration from file
 *   fileName  (read) file name from which to initiate reading
 *   xSize     (write) number of rows in the obtained configuration
 *   ySize     (write) number of columns in the obtained configuration
 *   returns   vector of spin bonds, stored in row major order;
 *             data for vertical bonds precede those for horizontal bonds */
gdouble *read_weights(gchar *fileName, gint *xSize, gint *ySize);


/* Write spin configuration to file
 *   spinGlass (read) data structure storing spin glass data
 *   fileName  (read) file name to write data to */
void write_spins(struct SpinGlass *spinGlass, gchar *fileName);

/* Write spin clamping state to file
 *   spinGlass (read) data structure storing spin glass data
 *   fileName  (read) file name to write data to */
void write_clamps(struct SpinGlass *spinGlass, gchar *fileName);

/* Write spin bond configuration to file
 *   spinGlass (read) data structure storing spin glass data
 *   fileName  (read) file name to write data to */
void write_weights(struct SpinGlass *spinGlass, gchar *fileName);

#endif /* IO_H */
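To make the expected file layout concrete, the following is an illustrative sketch of input files for a hypothetical 2-row by 3-column lattice; the numerical values are invented for the example. A spin file contains one lattice row per line, using the tokens defined in spinglass.h:

    + - +
    - - +

A bond file contains 2 * xSize rows of ySize values each: the first xSize rows hold the vertical bonds and the remaining xSize rows the horizontal bonds, so read_weights reports xSize = nRows/2:

     1.0 -1.0  0.5
     0.0  1.0 -0.5
    -1.0  1.0  1.0
     0.5 -0.5  0.0

A clamp file uses 1 (clamped) and 0 (unclamped) in the same row-by-row layout.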
/*
 * File: io.c
 *
 * Implements IO operations specified in io.h
 *
 */

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "io.h"

/* Parses a file, adding tokens to a queue
 *   fileName  (read) file name to read from
 *   xSize     (write) number of token rows contained in the file
 *   ySize     (write) number of token columns contained in the file
 *   returns   queue containing parsed tokens */
static GQueue *parse_file(gchar *fileName, gint *xSize, gint *ySize) {
    gint nRows = 0;
    gint nCols = 0;
    gint nColCheck = 0;

    GQueue *tokenQueue = g_queue_new();

    FILE *file = fopen(fileName, "r");
    gchar line[MAX_LINE_LEN+1];
    if (file != NULL) {

        /* Read lines until end of file, process if non zero length */
        while (NULL != fgets(line, MAX_LINE_LEN, file)) {
            if (strlen(line) > 0 && line[0] != '\n') {
                gchar *token;
                nRows++;

                nColCheck = 0;
                /* Tokenise lines */
                token = strtok(line, " \t\n");
                while (token != NULL) {
                    gchar *tokenMem = g_malloc(strlen(token)+1);
                    strcpy(tokenMem, token);

                    nColCheck++;

                    /* Add token to queue */
                    g_queue_push_tail(tokenQueue, tokenMem);
                    token = strtok(NULL, " \t\n");
                }

                /* Check for matching row lengths */
                if (nCols == 0) nCols = nColCheck;
                if (nColCheck != nCols) {
                    g_fprintf(stderr, "Error: The input data matrix does not contain rows of equal lengths.\n");
                    exit(-1);
                }
            }
        }
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.\n", fileName);
        exit(-1);
    }

    fclose(file);

    *xSize = nRows;
    *ySize = nCols;

    return tokenQueue;
}

Spin *read_spins(gchar *fileName, gint *xSize, gint *ySize) {
    /* Retrieve tokens in file */
    GQueue *tokenQueue = parse_file(fileName, xSize, ySize);
    Spin *spins = g_new(Spin, (*xSize)*(*ySize));

    int i = 0;

    /* Process tokens */
    while (g_queue_get_length(tokenQueue) > 0) {
        /* Get token */
        gchar *token = g_queue_pop_head(tokenQueue);

        /* Check whether strings assume expected values */
        if (strcmp(token, STR_SPIN_UP) == 0) {
            spins[i] = UP;
        } else if (strcmp(token, STR_SPIN_DOWN) == 0) {
            spins[i] = DOWN;
        } else {
            g_fprintf(stderr, "Error: Unrecognised spin data.\n");
            exit(-1);
        }

        g_free(token);
        i++;
    }

    g_queue_free(tokenQueue);
    return spins;
}


void write_spins(struct SpinGlass *spinGlass, gchar *fileName) {
    /* Open file, delegate to spin glass module */

    FILE *file = fopen(fileName, "w");

    if (file != NULL) {
        spin_glass_write_spins(spinGlass, file);
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.", fileName);
    }

    fclose(file);
}

gboolean *read_clamps(gchar *fileName, gint *xSize, gint *ySize) {
    /* Retrieve tokens in file */
    GQueue *tokenQueue = parse_file(fileName, xSize, ySize);
    gboolean *clamps = g_new(gboolean, (*xSize)*(*ySize));

    int i = 0;

    /* Process tokens */
    while (g_queue_get_length(tokenQueue) > 0) {
        /* Get token */
        gchar *token = g_queue_pop_head(tokenQueue);

        /* Check whether strings assume expected values */
        if (strcmp(token, STR_CLAMPED) == 0) {
            clamps[i] = TRUE;
        } else if (strcmp(token, STR_UNCLAMPED) == 0) {
            clamps[i] = FALSE;
        } else {
            g_fprintf(stderr, "Error: Unrecognised clamp data.\n");
            exit(-1);
        }

        g_free(token);
        i++;
    }

    g_queue_free(tokenQueue);
    return clamps;
}

void write_clamps(struct SpinGlass *spinGlass, gchar *fileName) {
    /* Open file, delegate to spin glass module */
    FILE *file = fopen(fileName, "w");

    if (file != NULL) {
        spin_glass_write_clamps(spinGlass, file);
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.", fileName);
    }

    fclose(file);
}

gdouble *read_weights(gchar *fileName, gint *xSize, gint *ySize) {
    gint nRows, nCols;
    gint i = 0;

    /* Retrieve tokens in file */
    GQueue *tokenQueue = parse_file(fileName, &nRows, &nCols);
    gdouble *weights = g_new(gdouble, (nRows*nCols));

    /* Account for vertical and horizontal weights stored in file */
    *xSize = nRows / 2;
    *ySize = nCols;

    /* Simple check for matching vertical/horizontal bond numbers */
    if (nRows % 2 == 1) {
        g_fprintf(stderr, "Odd number of data rows detected when reading bond file. Should be even.\n");
        exit(-1);
    }

    /* Process tokens */
    while (g_queue_get_length(tokenQueue) > 0) {
        /* Get token */
        gchar *token = g_queue_pop_head(tokenQueue);
        gdouble weightVal = 0;

        /* Convert to double */
        if (sscanf(token, WEIGHT_FMT, &weightVal) != 1) {
            g_fprintf(stderr, "Error: Unrecognised bond data.\n");
            exit(-1);
        }

        weights[i++] = weightVal;

        g_free(token);
    }

    g_queue_free(tokenQueue);
    return weights;
}

void write_weights(struct SpinGlass *spinGlass, gchar *fileName) {
    /* Open file, delegate to spin glass module */
    FILE *file = fopen(fileName, "w");

    if (file != NULL) {
        spin_glass_write_weights(spinGlass, file);
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.", fileName);
    }

    fclose(file);
}
/*
 * File: arrays.h
 *
 * Specifies array operation interface
 * and defines macros for array operations
 *
 */

#include <glib.h>

#ifndef ARRAYS_H
#define ARRAYS_H

/* Emulates two-dimensional array access
 *   array    pointer to data
 *   i, j     array indices */
#define ArrayAccess2D(array, row_length, i, j) ((array)[(i)*(row_length) + (j)])

/* Emulates three-dimensional array access
 *   array    pointer to data
 *   i, j, k  array indices */
#define ArrayAccess3D(array, row_length, column_length, i, j, k) ((array)[(column_length)*(row_length)*(k) + (i)*(row_length) + (j)])


/* Array data types */
typedef guint64 t_int;
typedef gdouble t_double;

/* Construct two-dimensional array. Data contiguity is ensured
 *   nRows    number of rows
 *   nCols    number of columns
 *   returns  pointer to allocated data */
t_int **array_new_2D(t_int nRows, t_int nColumns);

/* Destruct two-dimensional array previously allocated with array_new_2D()
 *   array    the array to destruct */
void array_free_2D(t_int **array);

/* Construct three-dimensional array. Data contiguity is ensured
 *   nRows    number of rows
 *   nCols    number of columns
 *   nZ       size of third dimension
 *   returns  pointer to allocated data */
t_double ***array_new_3D(t_int nZ, t_int nRows, t_int nColumns);


/* Destruct three-dimensional array previously allocated with array_new_3D()
 *   array    the array to destruct */
void array_free_3D(t_double ***array);

int array_utest(void);

#endif /* ARRAYS_H */
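The two macros above flatten multi-dimensional indices onto contiguous storage. As an illustration of how they address a spin glass's bond vector (following the layout documented in spinglass.h, with row_length = ySize and column_length = xSize), the sketch below shows the expansions; it is explanatory only and not part of the sources.

    /* Vertical bond attached to lattice site (row, col): plane k = 0 */
    ArrayAccess3D(weights, ySize, xSize, row, col, 0)
        /* expands to weights[0*xSize*ySize + row*ySize + col] */

    /* Horizontal bond attached to lattice site (row, col): plane k = 1 */
    ArrayAccess3D(weights, ySize, xSize, row, col, 1)
        /* expands to weights[1*xSize*ySize + row*ySize + col] */

That is, the k = 0 plane holds the vertical bonds and the k = 1 plane the horizontal bonds, consistent with the calls made in spinglass.c.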
/*
 * File: arrays.c
 *
 * Implements array operation interface specified in arrays.h
 *
 */

#include <glib.h>
#include <stdio.h>

#include "arrays.h"

t_int **array_new_2D(t_int nRows, t_int nColumns) {
    gint i;

    /* Allocate pointer block */
    t_int **array = g_malloc(nRows * sizeof(t_int *));
    /* Allocate data block */
    array[0] = g_malloc(nRows * nColumns * sizeof(t_int));
    /* Assign data offsets */
    for (i = 1; i < nRows; i++) array[i] = array[0] + i*nColumns;

    return array;
}

void array_free_2D(t_int **array) {
    g_free(array[0]);
    g_free(array);
}

t_double ***array_new_3D(t_int nZ, t_int nRows, t_int nColumns) {
    gint i;

    /* Allocate pointer block */
    t_double ***array = g_malloc(nZ * sizeof(t_double **));
    /* Allocate second pointer block */
    array[0] = g_malloc(nZ * nRows * sizeof(t_double *));
    /* Allocate data block */
    array[0][0] = g_malloc(nZ * nRows * nColumns * sizeof(t_double));

    /* Assign plane pointer offsets */
    for (i = 0; i < nZ; i++) array[i] = array[0] + nRows*i;
    /* Assign row pointer offsets into the data block */
    for (i = 0; i < nZ*nRows; i++) (*array)[i] = (*array)[0] + i*nColumns;

    return array;
}

void array_free_3D(t_double ***array) {

    g_free(array[0][0]);
    g_free(array[0]);
    g_free(array);
}

int array_utest(void) {

    gint i, j, k;
    t_int **array = array_new_2D(10, 10);
    t_double ***array2 = array_new_3D(5, 32, 32);

    for (i = 0; i < 10; i++) {
        for (j = 0; j < 10; j++) array[i][j] = i*10 + j;
    }
    for (i = 0; i < 10; i++) {
        for (j = 0; j < 10; j++) g_assert(array[i][j] == i*10 + j);
    }

    array_free_2D(array);

    for (i = 0; i < 5; i++) {
        for (j = 0; j < 32; j++) {
            for (k = 0; k < 32; k++) {
                array2[i][k][j] = i*1024 + k*32 + j;
                g_assert(array2[i][k][j] == i*1024 + k*32 + j);
            }
        }
    }

    for (i = 0; i < 5; i++) {
        for (j = 0; j < 32; j++) {
            for (k = 0; k < 32; k++) {
                g_assert(array2[i][k][j] == i*1024 + k*32 + j);
            }
        }
    }

    array_free_3D(array2);

    return 0;
}
/*
 * File: random.h
 *
 * Defines interface for random number generation
 *
 */

#include <glib.h>

/* Generate continuously distributed random double in the range [lower, upper)
 *   lower   lower limit
 *   upper   upper limit */
gdouble rand_continuous(gdouble lower, gdouble upper);

/* Generate equally distributed random boolean
 */
gboolean rand_coin_toss();
/*
 * File: random.c
 *
 * Implements interface for random number generation
 *
 */

#include <stdio.h>
#include <glib.h>
#include "random.h"

gdouble rand_continuous(gdouble lower, gdouble upper) {
    return g_random_double_range(lower, upper);
}

gboolean rand_coin_toss() {
    gboolean value = g_random_boolean();
    return value;
}
/*
 * File: bforce_gstatefinder.c
 *
 * Implements brute force ground state finder
 *
 */

#include <glib.h>
#include <glib/gprintf.h>
#include <stdio.h>
#include "spinglass.h"
#include "gstatefinder.h"

static void find_ground_states_brute_force(gint leadingSpin, gdouble *minEnergy, struct SpinGlass *spinGlass);

gdouble find_ground_states(struct SpinGlass *spinGlass) {
    gint nSpins = spinGlass->xSize * spinGlass->ySize;
    gdouble minEnergy = G_MAXDOUBLE;

    /* Initiate brute force evaluation */
    find_ground_states_brute_force(nSpins, &minEnergy, spinGlass);

    return minEnergy;
}

/* Recursive brute force ground state evaluation
 *   leadingSpin  spin 'window' position, used to specify state to be flipped. Used to evaluate base case
 *   minEnergy    (read/write) records current minimum energy. For each invocation of the function,
 *                states are output if their energy is no higher than the value currently held by this variable
 *   spinGlass    (read/write) spin glass data structure whose spins are manipulated during search */
static void find_ground_states_brute_force(gint leadingSpin, gdouble *minEnergy, struct SpinGlass *spinGlass) {
    /* Base case */
    if (leadingSpin == 0) {
        /* Compute energy */
        gdouble energy = spin_glass_energy(spinGlass);

        if (energy < *minEnergy) {
            *minEnergy = energy;
        }

        if (energy == *minEnergy) {
            g_printf("\nLeaf node with energy %E\n", energy);
            g_printf("Is current ground state\n");
            spin_glass_write_spins(spinGlass, stdout);
        }

    } else {
        /* Recurse with leading spin unchanged */
        find_ground_states_brute_force(leadingSpin-1, minEnergy, spinGlass);
        /* Flip spin */
        spinGlass->spins[leadingSpin-1] *= DOWN;
        /* Recurse with leading spin flipped */
        find_ground_states_brute_force(leadingSpin-1, minEnergy, spinGlass);
    }
}
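The recursion above enumerates the full configuration space: each call either leaves the leading spin as it is or flips it before descending, so an n-spin lattice gives rise to 2^n leaf evaluations of spin_glass_energy, each itself of O(n) cost; the brute-force finder is therefore only practical for very small lattices. The following stand-alone snippet, included purely as an illustration, prints how quickly the search space grows:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        int n;
        for (n = 9; n <= 36; n += 9)   /* 3x3 up to 6x6 lattices */
            printf("n = %2d spins: 2^n = %.3e configurations\n", n, pow(2.0, n));
        return 0;
    }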
/*
 * File: gstatefinder.h
 *
 * Specifies interface for ground state solvers
 */

#include "spinglass.h"

#ifndef GSTATEFINDER_H
#define GSTATEFINDER_H

/* Determine ground states of spin glass
 *   spinGlass  (read) the spin glass to evaluate */
gdouble find_ground_states(struct SpinGlass *spinGlass);

#endif /* GSTATEFINDER_H */
Bibliography
[1] The GLib library. http://library.gnome.org/devel/glib/, 2008. Accessed 2 July, 2008.
[2] The Ness user guide. http://www2.epcc.ed.ac.uk/ness/documentation/index.html, 2008.
Accessed 2 July, 2008.
[3] User’s guide to the HPCx service.
http://www.hpcx.ac.uk/support/documentation/UserGuide/HPCxuser/HPCxuser.html,
2008. Accessed 2 July, 2008.
[4] D.J. Amit, H. Gutfreund, and H. Sompolinsky. Spin-glass models of neural networks.
Physical Review A, 32(2):1007–1018, 1985.
[5] D. Andre and J.R. Koza. Parallel genetic programming: a scalable implementation using
the transputer network architecture. Advances in genetic programming: volume 2, pages
317–337, 1996.
[6] F. Barahona. On the computational complexity of Ising spin glass models. J. Phys. A:
Math. Gen, 15(10):3241–3253, 1982.
[7] F. Barahona, M. Grotschel, M. Junger, and G. Reinelt. An application of combinato-
rial optimization to statistical physics and circuit layout design. Operations Research,
36(3):493–513, 1988.
[8] R.J. Baxter. Exactly solved models in statistical mechanics. Academic Press, London;
Tokyo, 1982.
[9] R. Bellman. Dynamic Programming. Science, 153(3731):34–37, 1966.
[10] I. Bieche, R. Maynard, R. Rammal, and JP Uhry. On the ground states of the frustration
model of a spin glass by a matching method of graph theory. J. Phys. A: Math. Gen,
13:2553–2576, 1980.
[11] S.G. Brush. History of the Lenz-Ising Model. Rev. Mod. Phys., 39(4):883–893, Oct 1967.
[12] M. Campanino, E. Olivieri, and A.C.D. van Enter. One dimensional spin glasses with po-
tential decay 1/r 1+g. Absence of phase transitions and cluster properties. Communications
in Mathematical Physics, 108(2):241–255, 1987.
[13] Lynn Elliot Cannon. A cellular computer to implement the Kalman filter algorithm. PhD
thesis, Montana State University, Bozeman, MT, USA, 1969.
[14] E. Cantu-Paz. A survey of parallel genetic algorithms. Calculateurs Paralleles, Reseaux
et Systems Repartis, 10(2):141–171, 1998.
[15] A. Carter. Finite-size scaling studies of Ising spin glasses. PhD thesis, Department of
Physics and Astronomy, University of Manchester, 2003.
[16] B.A. Cipra. The Ising Model Is NP-Complete. SIAM News, 33(6), 2000.
[17] D. de Fontaine and J. Kulik. Application of the ANNNI model to long-period superstruc-
tures. ACTA METALLURG., 33(2):145–165, 1985.
[18] J. Dıaz, A. Gibbons, G.E. Pantziou, M.J. Serna, P.G. Spirakis, and J. Toran. Parallel
algorithms for the minimum cut and the minimum length tree layout problems. Theoretical
Computer Science, 181(2):267–287, 1997.
[19] H.Q. Ding. Monte Carlo simulations of Quantum systems on massively parallel computers.
Proceedings of the 1993 ACM / IEEE conference on Supercomputing, pages 34–43, 1993.
[20] P.A.M. Dirac. On the Theory of Quantum Mechanics. Proceedings of the Royal Society
of London. Series A, Containing Papers of a Mathematical and Physical Character (1905-
1934), 112(762):661–677, 1926.
[21] B. Drossel and M.A. Moore. The ± J spin glass in Migdal-Kadanoff approximation. The
European Physical Journal B Condensed Matter , 2001.
[22] S.F. Edwards and P.W. Anderson. Theory of spin glasses. Journal of Physics F: Metal
Physics, 5(5):965–974, 1975.
[23] A.N. Ermilov, A.N. Kireev, and A.M. Kurbatov. Investigation of models of spin glass with
arbitrary distributions of the coupling constants. Theoretical and Mathematical Physics,
49(3):1071–1076, December 1981.
[24] Chochia et al. IBM High Performance Switch on System p5 575 Server - Performance.
http://www-03.ibm.com/systems/p/hardware/whitepapers/575_hpc_perf.html, 2008. Ac-
cessed 2 July, 2008.
[25] R. Forsati, M. Mahdavi, M. Kangavari, and B. Safarkhani. Web page clustering using Har-
mony Search optimization. Electrical and Computer Engineering, 2008. CCECE 2008.
Canadian Conference on, pages 001601–001604, 2008.
[26] M. Gabay and G. Toulouse. Coexistence of Spin-Glass and Ferromagnetic Orderings.
Physical Review Letters, 47(3):201–204, 1981.
[27] Z.W. Geem, J.H. Kim, et al. A New Heuristic Optimization Algorithm: Harmony Search.
SIMULATION , 76(2):60, 2001.
[28] F. Glover and G.A. Kochenberger. Handbook of Metaheuristics. Springer, 2003.
[29] C.D. Godsil, M. Grotschel, and D.J.A. Welsh. Combinatorics in statistical physics. Hand-
book of combinatorics (vol. 2), pages 1925–1954, 1996.
[30] A. Grama, V. Kumar, A. Gupta, and G. Karypis. Introduction to Parallel Computing:
Design and Analysis of Algorithms. Addison-Wesley, 2003.
[31] D.J. Griffiths. Introduction to Quantum Mechanics. Prentice Hall, 1995.
[32] U. Gropengiesser. The ground state energy of the ± J spin glass. A comparison of vari-
ous biologically motivated algorithms. Journal of Statistical Physics, 79(5-6):1005–1012,
1995.
[33] M.F. Guest. Communications Benchmarks on High-End and Commodity-Class Com-
puters. http://www.cse.scitech.ac.uk/disco/Benchmarks/pmb.2004/index.htm, 2008. Ac-
cessed 2 July, 2008.
[34] F. Hadlock. Finding a maximum cut of a planar graph in polynomial time. SIAM Journal
on Computing, 4(3):221–225, 1975.
[35] R.W. Hamming. Error Detecting and Error Correcting Codes. Computer Arithmetic, II ,
29(2):147–160, 1990.
[36] A.K. Hartmann. Scaling of stiffness energy for three-dimensional ± J Ising spin glasses.
Physical Review E , 59(1):84–87, 1999.
[37] W.K. Hastings. Monte Carlo sampling methods using Markov chains and their applica-
tions. Biometrika, 57(1):97–109, 1970.
[38] W. Heisenberg. Mehrkörperproblem und Resonanz in der Quantenmechanik. Zeitschrift
für Physik, 38(6):411–426, 1926.
[39] P.C. Hemmer, H. Holden, and S.K. Ratkje. The collected works of Lars Onsager: with
commentary. World Scientific, Singapore; River Edge, NJ, 1996.
[40] G. Hempel, G. Blaschke, and KF Pal. The ground state energy of the Edwards-Anderson
Ising spin glass with a hybrid genetic algorithm. Physica A, 223(3):283–292, 1996.
[41] J. Houdayer and O.C. Martin. Hierarchical approach for computing spin glass ground
states. Physical Review E , 64(5):56704, 2001.
[42] H. Kawamura. Chiral ordering in Heisenberg spin glasses in two and three dimensions.
Physical Review Letters, 68(25):3785–3788, 1992.
[43] J.H. Kim, Z.W. Geem, and E.S. Kim. Parameter estimation of the nonlinear Muskingum
model using harmony search. Journal of the American Water Resources Association,
37(5):1131–1138, 2001.
[44] S. Kirkpatrick, C.D. Gelatt Jr., and M.P. Vecchi. Optimization by Simulated Annealing.
Biology and Computation: A Physicist’s Choice, 1994.
[45] K.S. Lee and Z.W. Geem. A new structural optimization method based on the harmony
search algorithm. Computers and Structures, 82(9-10):781–798, 2004.
[46] F. Liers, M. Junger, G. Reinelt, and G. Rinaldi. Computing Exact Ground States of Hard
Ising Spin Glass Problems by Branch-and-Cut. New Optimization Algorithms in Physics,
June 2005.
[47] B.M. McCoy and T.T. Wu. The two-dimensional Ising model. Harvard University Press,
Cambridge, Mass., 1973.
[48] S.P. Meyn and R.L. Tweedie. Markov chains and stochastic stability. Springer-Verlag
London, 1993.
[49] M. Mezard, G. Parisi, and M.A. Virasoro. Spin glass theory and beyond . World Scientific
Teaneck, NJ, USA, 1987.
[50] T.M. Mitchell. Machine learning. McGraw-Hill, 1997.
[51] D. Mitra, F. Romeo, and A. Sangiovanni-Vincentelli. Convergence and Finite-Time Behavior of Simulated Annealing. Advances in Applied Probability, 18(3):747–771, 1986.
[52] C.M. Newman and D.L. Stein. Blocking and Persistence in the Zero-Temperature Dynam-
ics of Homogeneous and Disordered Ising Models. Physical Review Letters, 82(20):3944–
3947, 1999.
[53] G. Pardella and F. Liers. Exact Ground States of Huge Two-Dimensional Planar Ising Spin
Glasses. Arxiv preprint arXiv:0801.3143, 2008.
[54] G. Parisi. Infinite Number of Order Parameters for Spin-Glasses. Physical Review Letters,
43(23):1754–1756, 1979.
[55] D.J. Ram, TH Sreenivas, and K.G. Subramaniam. Parallel Simulated Annealing Algo-
rithms. Journal of Parallel and Distributed Computing, 37(2):207–212, 1996.
[56] J. Randa. Axial next-nearest-neighbor Ising (ANNNI) and extended-ANNNI models in
external fields. Physical Review Letters, 32(1):413–416, 1985.
[57] W. Selke. The ANNNI model-Theoretical analysis and experimental application. Physics
Reports, 170(4):213–264, 1988.
[58] D. Sherrington and S. Kirkpatrick. Solvable Model of a Spin-Glass. Physical Review
Letters, 35(26):1792–1796, 1975.
[59] P. Sutton, D.L. Hunter, and N. Jan. Short Communication: The ground state energy of the
± J spin glass from the genetic algorithm. J. Phys. I France, 4:1281–1285, 1994.
[60] D.J. Thouless, P.W. Anderson, and R.G. Palmer. Solution of 'Solvable model of a spin
glass'. Philosophical Magazine, 35(3):593–601, 1977.