Parallel Combinatorial Optimisation for Finding Ground
States of Ising Spin Glasses
Peter Alexander Foster
MSc in High Performance Computing
The University of Edinburgh
Year of Presentation: 2008
To my Parents
Abstract
This dissertation deals with the Ising spin glass ground state problem. An exact approach to
this optimisation problem is described, based on combining the Markov chain framework with
dynamic programming. Resulting algorithms allow ground states of the aperiodic $k^2$-spin lattice
to be computed in $O(k\,2^{2k})$ time, which is subsequently improved to $O(k^2\,2^k)$, thus resembling
transfer matrix approaches. Based on parallel matrix-vector multiplication, cost-optimal parallel
algorithms for the message passing architecture are described, using collective or alternatively
cyclic communications. In addition, a parallel realisation of the Harmony Search heuristic is
described. The implementation of both exact and heuristic approaches using MPI is detailed, as
is an application framework, which allows spin glass problems to be generated and solved.
Dynamic programming codes are evaluated on a small-scale AMD Opteron based SMP
system and a large-scale IBM P575 based cluster, HPCx. On both systems, parallel efficiencies
above 90% are obtained on 16 and 256 processors, respectively, when executing the $O(k\,2^{2k})$
algorithm on problem sizes $\geq 14^2$ spins. For the improved algorithm, while computationally
less expensive, scalability is considerably diminished. Results for the parallel heuristic approach
suggest marginal improvements in solution accuracy over serial Harmony Search, under certain
conditions. However, the examined optimisation problem appears to pose a challenge to obtaining
near-optimum solutions using this heuristic.
Acknowledgements
I sincerely thank my project supervisor, Dr. Adam Carter, for guidance throughout the project,
and for commenting on this dissertation prior to its submission.
In addition, I am grateful for funding awarded by the Engineering and Physical Sciences Re-
search Council.
Table of Contents
1 Introduction 1
2 The Spin Glass 3
2.1 Introduction to magnetic systems . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Modelling magnetic systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Spin interaction models . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.2 Spin models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 The Ising spin glass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Computational Background 13
3.1 Ising spin glass ground states and combinatorial optimisation . . . . . . . . . . 13
3.1.1 Approximate approaches for determining ground states . . . . . . . . . 15
3.1.2 Exact methods for determining ground states . . . . . . . . . . . . . . 19
3.2 A dynamic programming approach to spin glass ground states . . . . . . . . . 21
3.2.1 Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.2 Ising state behaviour as a Markov chain . . . . . . . . . . . . . . . . . 22
3.2.3 The ground state sequence . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.4 Boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.5 An order-n Markov approach to determining ground states . . . . . . . 27
4 Parallelisation Strategies 31
4.1 Harmony search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.1 Harmony search performance . . . . . . . . . . . . . . . . . . . . . . 32
4.1.2 Existing approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1.3 Proposed parallelisation scheme . . . . . . . . . . . . . . . . . . . . . 36
4.2 Dynamic programming approaches . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2.1 First-order Markov chain approach . . . . . . . . . . . . . . . . . . . . 39
4.2.2 Higher-order Markov chain approach . . . . . . . . . . . . . . . . . . 43
5 The Project 45
5.1 Project description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.1.1 Available resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.2 Project preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2.1 Initial investigations . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2.2 Design and implementation . . . . . . . . . . . . . . . . . . . . . . . 47
5.2.3 Implementation language and tools . . . . . . . . . . . . . . . . . . . 48
5.2.4 Choice of development model . . . . . . . . . . . . . . . . . . . . . . 49
5.2.5 Project schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2.6 Risk analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2.7 Changes to project schedule . . . . . . . . . . . . . . . . . . . . . . . 51
5.2.8 Overview of project tasks . . . . . . . . . . . . . . . . . . . . . . . . 51
6 Software Implementation 53
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.2 Implementation overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.3 Source code structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.3.1 Library functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.3.2 Client functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
7 Performance Evaluation 69
7.1 Serial performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.1.1 Dynamic programming . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.1.2 Harmony search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.2 Parallel performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.2.1 Dynamic programming . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.2.2 Harmony search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8 Conclusion 99
8.1 Further work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.1.1 Algorithmic approaches . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.1.2 Existing code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.1.3 Performance evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.2 Project summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
A Project Schedule 103
B UML Chart 105
C Markov Properties of Spin Lattice Decompositions 107
C.1 First-order property of row-wise decomposition . . . . . . . . . . . . . . . . . 107
C.2 Higher-order property of unit spin decomposition . . . . . . . . . . . . . . . . 108
D The Viterbi Path 111
D.1 Evaluating the Viterbi path in terms of system energy . . . . . . . . . . . . . . 111
E Software usage 113
F Source Code Listings 115
List of Figures
2.1 Types of spin interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Graphs of spin interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Frustrated systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Subsystems and associated interaction energy . . . . . . . . . . . . . . . . . . 10
2.5 Clamping spins to determine interface energy. . . . . . . . . . . . . . . . . . . 10
3.1 Computing total system energy from subsystem interactions . . . . . . . . . . 14
3.2 Example first-order Markov chain with states a, b, c . . . . . . . . . . . . . . . 22
3.3 Illustrating the principle of optimality. Paths within the dashed circle are known
to be optimal. Using this information, optimal paths for a larger subproblem can
be computed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4 Sliding a unit-spin window across a lattice . . . . . . . . . . . . . . . . . . . . 28
4.1 Using parallelism to improve heuristic performance . . . . . . . . . . . . . . . 32
4.2 Conceptual illustration of harmony search behaviour within search space . . . . 33
4.3 Parallelisation strategies for population based heuristics . . . . . . . . . . . . . 34
4.4 Harmony search parallelisation scheme . . . . . . . . . . . . . . . . . . . . . 37
4.5 Graph of subproblem dependencies for an n = 3, m = 2 spin problem . . . . . . 40
4.6 Parallel matrix operations. Numerals indicate order of vector elements. . . . . . 41
5.1 Spin glass structure design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2 Software framework design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.1 Functions provided by spinglass.c . . . . . . . . . . . . . . . . . . . . . . . . 57
6.2 Schematic of operations performed by get_optimal_prestates() (basic dynamic
programming, collective operations). In contrast, when using cyclic communi-
cations, processes evaluate different configurations of row i−1, shifting elements
in minPath. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6.3 Sliding window for improved dynamic programming . . . . . . . . . . . . . . 65
6.4 Schematic of operations performed by get_optimal_prestates() (improved dynamic
programming), executed on four processors. The problem instance is a 2×2 spin
lattice. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.1 Execution times for serial dynamic programming (basic algorithm) . . . . . . . 70
7.2 Log execution times for serial dynamic programming (basic algorithm) . . . . 71
7.3 Execution times for serial dynamic programming (improved algorithm) . . . . 72
7.4 Log execution times for serial dynamic programming (improved algorithm) . . 72
7.5 Memory consumption for serial dynamic programming (basic algorithm) . . . . 73
7.6 Log memory consumption for serial dynamic programming (basic algorithm) . 74
7.7 Memory consumption for serial dynamic programming (improved algorithm) . 75
7.8 Log memory consumption for serial dynamic programming (improved algorithm) 75
7.9 Parallel execution time for dynamic programming (basic algorithm, Ness) . . . 78
7.10 Parallel efficiency for dynamic programming (basic algorithm, Ness) . . . . . . 78
7.11 Vampir trace summary for dynamic programming (basic algorithm, Ness) . . . 79
7.12 Parallel execution time for dynamic programming (basic algorithm, cyclic com-
munications, Ness) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
7.13 Parallel efficiency for dynamic programming (basic algorithm, cyclic commu-
nications, Ness) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.14 Vampir trace summary for dynamic programming (basic algorithm, cyclic com-
munications, Ness) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.15 Parallel execution time for dynamic programming (improved algorithm, Ness) . 82
7.16 Parallel efficiency for dynamic programming (improved algorithm, Ness) . . . 83
7.17 Vampir trace summary for dynamic programming (improved algorithm, Ness) . 83
7.18 Parallel execution time for dynamic programming (basic algorithm, HPCx) . . 84
7.19 Parallel efficiency for dynamic programming (basic algorithm, HPCx) . . . . . 85
7.20 Parallel execution time for dynamic programming (basic algorithm, cyclic com-
munications, HPCx) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.21 Parallel efficiency for dynamic programming (basic algorithm, cyclic commu-
nications, HPCx) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.22 Parallel execution time for dynamic programming (improved algorithm, HPCx) 87
7.23 Parallel efficiency for dynamic programming (improved algorithm, HPCx) . . . 87
7.24 Summary of parallel efficiencies on HPCx . . . . . . . . . . . . . . . . . . . . 88
7.25 Conceptual representation of properties relevant to parallel performance . . . . 89
7.26 Parallel harmony search convergence durations (ZONEEXBLOCK= 100) . . . 91
7.27 Parallel harmony search energetic aberrance (ZONEEXBLOCK= 100) . . . . . 91
7.28 Parallel harmony search convergence durations (ZONEEXBLOCK= 1000) . . . 92
7.29 Parallel harmony search energetic aberrance (ZONEEXBLOCK= 1000) . . . . 93
7.30 Parallel harmony search convergence durations (ZONEEXBLOCK= 10000) . . 94
7.31 Parallel harmony search energetic aberrance (ZONEEXBLOCK= 10000) . . . 94
7.32 Parallel harmony search solution Hamming distance (ZONEEXBLOCK= 100) . 95
7.33 Parallel harmony search solution Hamming distance (ZONEEXBLOCK= 1000) 96
7.34 Parallel harmony search solution Hamming distance (ZONEEXBLOCK= 10000) 96
A.1 Project schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
B.1 UML class diagram of source code module and header relationships . . . . . . 106
List of Tables
5.1 Identified project risks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
7.1 Mean error µe, standard error σe and error rate e of serial harmony search
ground states for increasing solution memory NVECTORS. Results are based
on the ground truth value −30.7214. Error rate is defined as the number of cor-
rectly obtained ground state configurations over the total number of algorithm
invocations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.2 Serial execution times for basic dynamic programming on Ness, for various
GCC 4.0 optimisation flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.3 Serial execution times for basic dynamic programming on HPCx, for various
xlc optimisation flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.4 Results for parallel basic dynamic programming on HPCx using 32 processors,
for combinations of user space (US) or IP communications in conjunction with
the bulkxfer directive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Chapter 1
Introduction
This dissertation describes aspects concerned with obtaining solutions to an optimisation problem, namely finding ground states of the Ising spin glass. Attention is given to parallel ap-
proaches, their implementation, and their performance.
The first half of this work is devoted to theoretical aspects: The Ising spin glass is a model
relevant to statistical physics and other fields. In Chapter 2, the origins of this model are de-
scribed. The relation is drawn between the project's physical background and the
aforementioned optimisation problem. The Ising spin glass is but one possibility for modelling
materials exhibiting glass-like properties; Chapter 2 also exposes its relation to more involved
models. In Chapter 3, the theoretical background of optimisation is examined. Existing ap-
proaches are reviewed. The two approaches bearing significance to undertaken practical work
are detailed, namely dynamic programming and the harmony search heuristic. Parallelisation
strategies are described in Chapter 4, based on dynamic programming and harmony search.
Having examined theoretical aspects, practical aspects are then considered: Chapter 5 de-
scribes work relevant to project organisation. It includes a description of the project’s objectives
and identified risks. This chapter is relevant to practical work undertaken during the project. The
software implemented as a result of this practical work is described in Chapter 6. Software function-
ality is detailed, in addition to implemented libraries and the source code’s structure. In Chapter
7, the implemented codes are evaluated. Experimental procedures are described, alongside pa-
rameters used for testing. Results are presented and interpreted. Finally, Chapter 8 concludes
the work. The project’s objectives are reviewed in relation to undertaken practical work. Also,
possibilities for further work are explored.
Chapter 2
The Spin Glass
2.1 Introduction to magnetic systems
The phenomenon of magnetism is ubiquitously harnessed in modern technology; it crucially
underpins many applications in areas such as automotive engineering, information processing
and telecommunications. While known since antiquity, the scientific process has enabled an
increasingly accurate understanding of magnetic phenomena. In current research, investigating
the magnetic properties of physical systems remains of great interest in the field of condensed
matter physics. One physical system, the spin glass, is the subject of such investigations. It
forms the background of work undertaken during the course of this project.
Given a physical system, it is possible to characterise its magnetic properties by examining
the relation between interactions occurring between internal subsystems, and the system’s ex-
ternal magnetic moment. The system’s external magnetic moment is a manifestation of these
interactions. More generally, all externally observable magnetic properties are the result of indi-
vidual subsystems’ properties. This concept is applicable both to microscopic and macroscopic
systems, for single or multiple subsystems: As an extreme case, one might consider a single
electron a system, as it possesses an intrinsic magnetic moment. In contrast, the interactions
within a three dimensional crystalline solid, for example, are considerably complex and moti-
vate current investigations. This complexity is chiefly due to magnetic interactions at atomic
scale.
At the atomic level, the electron effects magnetism not only as a result of its intrinsic field,
but also as a consequence of its orbital motion. The former is associated with a binary state,
known as spin, which describes the particle’s internal angular momentum. It is spin which
determines the direction of the electron’s intrinsic magnetic moment. In contrast, orbital motion
contributes towards the particle’s external angular momentum, since it describes the particle’s
movement about the nucleus. Atomic magnetic fields depend both on orbital configuration and
spin alignment, where each electron contributes towards the atom’s net magnetic moment.
In general, an electron’s state is governed by quantum properties, which are subject to the
Pauli exclusion principle [31]. This asserts that no two fermions of the same kind, such as electrons,
may assume identical quantum states simultaneously. This has important consequences for
the spin configuration of interacting electrons and therefore influences the magnetic properties of multiatomic systems.
The first implication of the exclusion principle is that for two electrons possessing iden-
tical orbital movement, spins must antialign to satisfy state uniqueness. Consequently, the
electrons’ intrinsic magnetic moments antialign, causing net cancellation of these fields for the
particle pair.
The second implication relates to minimising a system’s energy: For interacting electrons
with different orbital motion, the Pauli exclusion principle states that parallel spin alignment will
be favoured, since it guarantees that orbital movement remains disjoint. Because of electrostatic
repulsion, decreasing proximity between electrons lowers the system’s energy. It is this relation
which allows certain materials to retain a magnetic field, the result of a surplus of aligned spins,
as opposed to a disordered spin configuration, in a favourable energetic state.
It turns out that the difficulty in determining a system’s magnetic properties stems from the
complexity of spin interactions: The structure of a specified material may be irregular, resulting
in differing ranges between electron orbitals. The type of atomic bonds and electron config-
urations present in the material is also influential, since these influence the orbital energy of
electrons. It was previously mentioned that a system’s energy is sought to be minimised. This
energy depends on the proximity in which interactions occur and hence behaves characteristi-
cally for the examined system.
The energy associated with spin interaction is expressed exactly in the so-called exchange
energy, first formulated by Heisenberg [38] and Dirac [20]. Based on consequences of the Pauli
exclusion principle for the wavefunction of a system consisting of multiple fermions, the system
wavefunction is defined for combinations of aligned and antialigned spins. These wavefunctions
are then used to compute the exchange energy
$$J = 2 \int\!\!\!\int \Psi_1^*(r_1)\, \Psi_2^*(r_2)\, V_I(r_1, r_2)\, \Psi_2(r_1)\, \Psi_1(r_2)\; \mathrm{d}r_1\, \mathrm{d}r_2$$
where $\Psi_1$, $\Psi_2$ are wavefunctions of the interacting particles with locations $r_1$, $r_2$ on the real line, and
$V_I$ is the interaction energy.
Using eigenanalysis∗, it is furthermore possible to express the contribution towards the system's
Hamiltonian arising from spin interaction, which depends on $J$ and the spin operands $\mathbf{s}_1$, $\mathbf{s}_2$
for a pair of spins:
$$-J\,(\mathbf{s}_1 \cdot \mathbf{s}_2) \qquad (2.1)$$
∗An explanation is given by Griffiths [31]
Figure 2.1: Types of spin interaction: (a) ferromagnetic, (b) antiferromagnetic
This object is of fundamental importance for describing the interaction energy of large sys-
tems, since these may be described in terms of their underlying interacting subsystems. It is
employed, in simplified form, in models such as the Ising model [47] used in this project. The interaction
variable $J$ is known as the coupling constant. Although it assumes a positive
real value for spin interactions where parallel alignment is favoured, it is important to note that
antiparallel alignment is also favoured in many materials. Bearing this in mind, positive J are
associated with ferromagnetic coupling, whilst negative J are associated with antiferromagnetic
coupling. Figures 2.1(a), 2.1(b) illustrate these interactions.
2.2 Modelling magnetic systems
As currently described, the simplest type of magnetic interaction is expressed by defining two
fundamental operands and an associated coupling constant. Together with the coupling con-
stant, these fundamental operands are evaluated using an interaction operator. The operands
are commonly spins, whose state may be described using either a unit vector or an integer, for example.
2.2.1 Spin interaction models
Because spin coupling is a symmetric relation, it is possible to describe interactions occurring
amongst multiple spins by considering the set $E \subseteq \bigl\{\{s_i, s_j\} \mid s_i, s_j \in S,\ i \neq j\bigr\}$ of pairwise bonds
amongst spins in a spin set $S$, given the weight function $w\colon \{s_i, s_j\} \to \mathbb{R}$. This corresponds to
an undirected weighted graph. In the graph, the absence of the edge between two spins $s_k$, $s_l$
is equivalent to the zero-coupled edge $w(\{s_k, s_l\}) = 0$. An example of such a graph is shown
in Figure 2.2(a). Given this general case of an undirected graph, there are three specialisations
which have been used extensively to investigate the properties of magnetic systems consisting
of many spins.
In terms of spin interactions, a comparatively involved model is the so-called Axial Next
Nearest Neighbour Interaction (ANNNI) model. Here, spins are arranged conceptually as a
lattice in Euclidean n-space, with bond edges defined between neighbouring spins along each
dimension. In addition to these bonds, interactions for each spin are extended in a ‘next spin
but one’ fashion along each dimensions. That is, interactions are defined by conducting a walk
Figure 2.2: Graphs of spin interactions: (a) general undirected case, (b) ANNNI model, (c) EA model
of length l ≤ 2 along the lattice in each dimension, given an initial node. A spin therefore
interacts with n ≤ 4d partner spins, as displayed in Figure 2.2(b). This model has been employed
extensively in research [57, 17, 56].

If the ANNNI model is modified by extending the length of the walk to infinity in arbitrary
direction, the graph defined by spin interactions $E$ becomes fully connected: $E = \bigl\{\{s_i, s_j\} \mid s_i, s_j \in S,\ i \neq j\bigr\}$.
This realisation of lattice interactions is known as the Sherrington-Kirkpatrick model [58], whose Hamiltonian is equal to
$$H = -\sum_{(i, j)} J_{ij}\, \mathbf{s}_i \cdot \mathbf{s}_j.$$
Here, the notation $(i, j)$ indicates the sum over all spin interactions, as described. The Sherrington-Kirkpatrick
model is employed by Parisi [54] for the purpose of exploring transition properties of magnetisation, using an approach known as mean field theory.
Given that spin interactions occur over short range, an elementary approach to representing a
system considers only nearest neighbour interactions between spins. In a two dimensional lattice
model, the graph of spin interactions is then defined as $E = \bigl\{\{s_i, s_j\} \mid s_i, s_j \in S,\ d(s_i, s_j) = 1\bigr\}$,
where $d(s_i, s_j)$ is the block distance between spins $s_i$, $s_j$. This is illustrated in Figure 2.2(c). The
Hamiltonian of such a system is
$$H = -\sum_{\langle i,j \rangle} J_{ij}\, \mathbf{s}_i \cdot \mathbf{s}_j$$
where the notation $\langle i,j \rangle$ indicates the sum over nearest neighbour spin interactions. This model,
due to Edwards and Anderson [22], is the subject of work undertaken during the course of
this project.
Bonds
The exchange energy between two spins is governed by the magnitude of the coupling constant
J . When dealing with multiple interactions, these bond strengths are often selected from a
probability distribution. This distribution is a continuous uniform or Gaussian distribution for
many modelling purposes [58, 60, 52]. When dealing with the Sherrington-Kirkpatrick model,
the exchange energy distribution often includes the property of exponential decay over spin
distance [12]. Another commonly used distribution [36, 21] permits only coupling constants
J ∈ {1, −1}, such that both values are equally probable.
Other distributions have also been employed for defining coupling constants, such as the
twin peaked Gaussian [15]. Ermilov et al. [23] provide an investigation of the implications for
interactions with arbitrarily distributed bonds. In this project, the equiprobable $\pm J$ variant of
spin coupling is considered.
2.2.2 Spin models
As with the approaches to modelling spin interaction, the spin object itself may be modelled
to varying levels of complexity. Most realistically, in a quantum Heisenberg model, each spin is described by its quantum state in three dimensions, so that the Hamiltonian for a two
dimensional Edwards-Anderson model becomes†
$$H = -\frac{1}{2} \sum_{\langle i\,j \rangle} \Bigl( J_x\, \sigma_i^x \sigma_j^x + J_y\, \sigma_i^y \sigma_j^y + J_z\, \sigma_i^z \sigma_j^z \Bigr)$$
where $\sigma_k^x$, $\sigma_k^y$, $\sigma_k^z$ are Pauli matrices corresponding to spin $s_k$.
Alternatively, a classical Heisenberg formulation is also possible, as employed by Ding
[19], Kawamura [42]: Here, spins are represented as three-dimensional real-valued unit vectors,
so that exchange energy between spins si, s j is calculated by means of the inner vector product,
as described in Equation 2.1. A simplification achieved by discretising spin state exists in the
so-called Potts model [63]. Here, a spin may assume a state $s_i \in \{1, \ldots, k\}$, where $k$ is the total
number of states. The Hamiltonian of a system of spins with nearest-neighbour interaction is
expressed as
$$H = -\sum_{\langle i,j \rangle} J_{ij} \cos\bigl(\theta(s_i) - \theta(s_j)\bigr)$$
with $\theta(s_i) = 2\pi s_i / k$.
The Potts model may be simplified further by considering the case $k = 2$: Define the Potts
correlation function $\gamma(s_i, s_j) = \cos\bigl(\theta(s_i) - \theta(s_j)\bigr)$. Given that
$\theta\colon \{1, 2\} \to \{\pi, 2\pi\}$, the mapping
$$\gamma(s_i, s_j) = \begin{cases} 1, & s_i = s_j \\ -1, & s_i \neq s_j \end{cases}$$
is a sufficient definition for the correlation function in the described case. Alternatively, $\gamma(s_i, s_j) = s_i s_j$, with $s_i, s_j \in \{1, -1\}$.
†cf. Baxter [8]
This leads to the definition of system energy as
$$H = -\sum_{\langle i,j \rangle} J_{ij}\, s_i s_j,$$
with $s_i, s_j \in \{-1, 1\}$.
When combined with nearest neighbour interactions and constant $J$, this archetypal model
of spin interaction is known as the Ising model [11]. As formulated, in the Ising model, a
spin's state effects an exchange energy whose sign is inverted if the spin's neighbour assumes the
opposing state. In this respect, the model spin object is an abstraction of electron state which
discards the consequences of orbital movement, considering only intrinsic angular momentum.
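As a concrete illustration of this Hamiltonian, the short C sketch below evaluates $H = -\sum_{\langle i,j \rangle} J_{ij}\, s_i s_j$ for an aperiodic L×L lattice. The row-major array layout, the separate horizontal and vertical coupling arrays and the function name are assumptions made for this illustration only; they do not reflect the project's actual data structures.

#include <stddef.h>

/* Energy H = -sum_<i,j> J_ij s_i s_j of an aperiodic L x L Ising lattice.
 * spin[r*L + c] : spin at row r, column c, taking the value +1 or -1
 * Jh[r*L + c]   : coupling between (r,c) and its right neighbour (r,c+1)
 * Jv[r*L + c]   : coupling between (r,c) and its lower neighbour (r+1,c)
 * (hypothetical layout, assumed for this sketch only)                      */
double ising_energy(const int *spin, const double *Jh, const double *Jv, size_t L)
{
    double H = 0.0;
    for (size_t r = 0; r < L; ++r) {
        for (size_t c = 0; c < L; ++c) {
            if (c + 1 < L)   /* horizontal bond */
                H -= Jh[r * L + c] * spin[r * L + c] * spin[r * L + c + 1];
            if (r + 1 < L)   /* vertical bond */
                H -= Jv[r * L + c] * spin[r * L + c] * spin[(r + 1) * L + c];
        }
    }
    return H;
}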
While comparatively restrictive, an adaptation of the Ising model has been the subject of
intense research in its originating field of statistical physics [8]. In addition to certain applica-
tions in investigating the behaviour of neural networks [4] and biological evolution [49], this
model has proven popular in examining the properties of materials in the field of condensed
matter physics [26]. One application involves investigating the properties of materials collec-
tively known as spin glasses. These possess distinctive properties, which are described in the
following.
2.3 The Ising spin glass
Spin glasses are substances which are characterised by structural disorder. This is the case for
chemical glasses or certain types of dilute metal alloys. These materials possess highly irreg-
ular microscopic structure, which has implications for magnetic interactions between ions. In
particular, disorder results in a distribution of ferromagnetic and antiferromagnetic interactions,
which are the origin of the phenomenon known as frustration.
The dynamics of spin glasses are such that there exists a critical phase transition tempera-
ture, above which the system behaves like a conventional paramagnet or ferromagnet. Below
the transition temperature however, a magnetic disorder manifests itself, called the spin glass
phase. This magnetic disorder is responsible for the system’s unique behaviour.
Frustration, the second component to characteristic behaviour, arises when a system’s ener-
getically optimal state is the result of combined interactions which cannot individually assume
optimum state. Instead, the global optimum requires certain interactions to be suboptimal. De-
pending on the constituent interactions, this may imply that there exist multiple state configura-
tions which yield the energetic optimum.
An example of this principle is shown in Figure 2.3(a). Here, three Ising spins $s_0, s_1, s_2 \in \{1, -1\}$ interact in a triangular lattice. Because bonds are not consistently ferromagnetic, it is
apparent that some interactions require spins with opposing orientations in order to be optimal. This is
the case for the antiferromagnetic bond between spins s1, s2. For either optimal configuration of
Figure 2.3: Frustrated systems: (a) three spins $s_0$, $s_1$, $s_2$; (b) four-spin 'plaquette'
the spin pair, it is not possible, however, to set $s_0$ so that optimality of the remaining interactions
is satisfied. Similarly, when evaluating the system commencing with pairs s0, s1 or s0, s2, it is
not possible to set the remaining spin so that all interactions are satisfied. It follows that there
exists no configuration of this system in which all interactions are optimal.
In the n-dimensional lattice Ising realisation of a spin glass, the smallest structure capable
of exhibiting frustration is shown in Figure 2.3(b). Considering all $2^4$ combinations of positive
and negative coupling constants, it can be seen that frustrated interactions occur for odd num-
bers of antiferromagnetic or ferromagnetic bonds. For larger systems, it is possible to analyse
frustration by decomposing the lattice into subsystems of this kind. In this context, the square
substructure is termed a plaquette.
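Stated as a rule that can be checked mechanically, a plaquette is frustrated exactly when the product of its four coupling constants is negative, i.e. when an odd number of them are antiferromagnetic. A minimal C sketch follows; the ordering of the bonds around the square is an assumption of this illustration.

/* A four-spin plaquette is frustrated iff the product of its four bond
 * couplings is negative, i.e. an odd number of them are antiferromagnetic
 * (J < 0). The bond order around the square is assumed.                   */
int plaquette_is_frustrated(const double J[4])
{
    return (J[0] * J[1] * J[2] * J[3]) < 0.0;
}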
Uses of the Ising spin glass
The extent to which the Ising model departs from a realistic representation of magnetic phe-
nomena was previously described. Although the model's limited accuracy presents a disadvantage, its
comparative simplicity lends itself to certain analytical advantages: These advantages are based
on the fact that the ‘state space’ of a single spin is small, which has consequences for evaluating
sets of spin systems. Also, since spins interact only over nearest neighbour boundaries, it is
trivial to ‘decompose’ a system into its constituent subsystems, should this be required. Using
such a scheme, total exchange energy is the sum of internal subsystem energy and subsystem
interaction energy (Figure 2.4). This approach is employed in analytical methods described in
following chapters.
For experimental purposes, it is of interest to examine computationally the behaviour of
various realisations of spin glasses. As spin glasses are thermodynamic systems, knowledge
of ground state energy is of particular importance towards this aim. Formally, given an n-spin
system where $S = \{s_0, s_1, \ldots, s_{n-1}\}$ represents some configuration of these spins,
$$\operatorname*{argmin}_{S} H(S)$$
Figure 2.4: Subsystems and associated interaction energy
Figure 2.5: Clamping spins to determine interface energy (left: free; right: invert and clamp).
is the system’s ground state. The Hamiltonian H (S ) describes the energy of system configura-
tion S . In the case of the Ising model with real valued coupling constants, there exists a single
ground state configuration, and an equivalent configuration with all spins inverted. For systems
with discrete valued coupling constants, a number of degenerate ground states may exist. Pro-
vided an algorithm for determining ground states, it may be of interest to examine the effect of
system size on ground state energy.
Previous work investigates scaling with regard to a related quantity, the so-called interface
energy [15]. For an Ising-like model, interface energy is the absolute difference between ground
state energies, obtained when altering the model instance's spin configuration with respect to
a certain boundary condition (coupling constants are left unaltered). Figure 2.5 shows an ex-
ample, again using the two dimensional lattice Ising model. Here, ground state configurations
are obtained for two experimental instances: In the first instance, the entire set of spin config-
urations is considered. In the second instance, spins in the rightmost column are ‘clamped’:
Their state is equal to that of the previously obtained configuration, only inverted. Enforcing
this condition in the second instance allows the behaviour of adjacent spins to be examined.
A closely related aspect deals with exploring the behaviour of spin glass properties in the
limit N → ∞, where N is the system size. For certain purposes, it is beneficial to approxi-
mate this condition by introducing periodicity into spin interactions. In the Ising model, pairs
of boundary spins along dimensions with periodic boundary conditions interact in the manner
illustrated in Figure 2.2(b). This can easily be expressed mathematically by applying modular
arithmetic to the one dimensional Ising case $H = -\sum_i J_i\, s_i s_{i+1}$, requiring minor modification for
models with $d > 1$.
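For the one-dimensional case, the modular arithmetic amounts to wrapping the neighbour index; a small C sketch is given below (the array layout is assumed, and the sign convention follows the Hamiltonians used in this chapter).

#include <stddef.h>

/* Energy of a one-dimensional Ising chain with periodic boundary
 * conditions: H = -sum_i J_i s_i s_{(i+1) mod n}.
 * (layout of spin[] and J[] assumed for illustration)          */
double chain_energy_periodic(const int *spin, const double *J, size_t n)
{
    double H = 0.0;
    for (size_t i = 0; i < n; ++i)
        H -= J[i] * spin[i] * spin[(i + 1) % n];
    return H;
}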
In thermodynamic systems, attention must be given to the relation between macroscopic and
microscopic properties. To this end, an important object is the partition function, defined as
$$Z(T) = \sum_{S} e^{-H(S)/kT},$$
where $H(S)$ is the system energy, $T$ the absolute temperature and $k$ the Boltzmann constant.
The sum is over all (microscopic) system configurations $S$. Using the partition function, it is
possible to determine the probability $P(S)$ of a specific state as
$$P(S) = \frac{e^{-H(S)/kT}}{Z(T)}.$$
Fortunately, when examining an ensemble in the limit $T \to 0$, it turns out that $P(S) = 1$ iff $S$ is a ground
state configuration, and otherwise $P(S) = 0$. This fact has implications for computing ground state
energies of Ising spin glasses, the subject of this project.
Chapter 3
Computational Background
In the previous chapter, the Ising model was introduced. System energy was described as a type
of utility function for evaluating system configurations. The problem of obtaining ground state
energy was introduced.
In this chapter, finding ground states of the Ising spin glass is approached as a combinatorial
optimisation problem. In this context, existing solutions are examined, in addition to describing
two approaches implemented in this project, harmony search and dynamic programming. The
latter approach is the consequence of describing spin glass interactions as a Markov chain, which
lends itself to a formulation of the most likely sequence of events in the chain, i.e. the Viterbi
path [61].
3.1 Ising spin glass ground states and combinatorial optimisation
Formally, any instance of the Ising spin glass defines the energy function $E(S)$ with $E\colon \{1, -1\}^n \to \mathbb{R}$.
Here, $S = (s_1, s_2, \ldots, s_n)$ is an $n$-spin configuration, with each spin $s_i \in \{1, -1\}$. For convenience,
a notation for describing a configuration partitioned into $p$ disjoint subsystems is
introduced as $S = \{S_1, S_2, \ldots, S_p\}$. The real valued co-domain of $E(S)$ corresponds to the total
system energy. The total system energy of a partitioned system is
$$E(S) = \sum_{k=1}^{p} E(S_k) + \sum_{\langle i,j \rangle} \bigl\{ J_{ij}\, s_i s_j \;\big|\; s_i \in S_\alpha,\ s_j \in S_\beta \bigr\},$$
where $\langle i,j \rangle$ denotes nearest neighbour Ising interactions, as described in Chapter 2. The subsystems
$S_\alpha$, $S_\beta$ are disjoint. By decomposing spin interactions occurring within the entire system,
energy is expressed as the sum of subsystem energy and 'system boundary' energy.
Figure 3.1: Computing total system energy from subsystem interactions
Defining $E_b(S_i, S_j)$ as the system boundary energy between disjoint subsystems $S_i$, $S_j$,
$$E_b(S_i, S_j) = \sum_{\langle q,r \rangle} \bigl\{ J_{qr}\, s_q s_r \;\big|\; s_q \in S_i,\ s_r \in S_j \bigr\},$$
the total system energy can be defined as
$$E(S) = \sum_{k=1}^{p} E(S_k) + \sum_{\langle i,j \rangle} E_b(S_i, S_j)$$
where $\langle i,j \rangle$ denotes nearest neighbour interactions between subsystems, in analogy to nearest
neighbour interactions between individual spins. An example of system decomposition is pre-
sented in Figure 3.1, for a system with cyclic boundary interactions. Decomposition is relevant
to approaches described in this chapter.
Determining ground states
The ground state configuration of an Ising spin glass is defined as $S_{\min} = \operatorname*{argmin}_S E(S)$. The
domain of the evaluated function $E(S)$ implies that an exhaustive search of the system's state
space requires $2^{|S|}$ individual evaluations. Such a brute force approach might be implemented
using a depth-first traversal of the state space.
Clearly, using this method is only practicable for the very smallest of problem instances, as
the search space grows exponentially with the number of spins in the system. Therefore, it is
of interest to examine the possibility of restricting the search space, consequently reducing the
complexity of obtaining solutions to problem instances.
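For illustration, such an exhaustive search can be realised by interpreting the bits of a counter as a spin configuration and retaining the lowest energy seen. The sketch below is a simple iterative variant rather than the depth-first traversal mentioned above; the caller-supplied energy routine and the function name are assumptions of this illustration, and the approach is practical only for very small systems.

#include <float.h>
#include <stdint.h>

/* Exhaustive ground-state search over all 2^n configurations of an n-spin
 * system (n < 64). The energy function is supplied by the caller; 'best'
 * receives the bit pattern of the minimising configuration
 * (bit i set => s_i = +1, bit i clear => s_i = -1).                        */
double brute_force_ground_state(int n,
                                double (*energy)(uint64_t config, int n, void *ctx),
                                void *ctx,
                                uint64_t *best)
{
    double emin = DBL_MAX;
    for (uint64_t config = 0; config < ((uint64_t)1 << n); ++config) {
        double e = energy(config, n, ctx);
        if (e < emin) {
            emin = e;
            *best = config;
        }
    }
    return emin;
}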
The fact that the upper bound of search space size grows exponentially suggests that the
ground state problem belongs to the class of NP problems. Due to Barahona [6], it is shown
that in fact, certain cases of the problem are NP-complete, such as the two dimensional lattice
where every spin interacts with an external magnetic field, and the three dimensional lattice
model. Istrail generalises the proof of NP-completeness to any model where interactions are represented as a non-planar graph [16].
Fortunately, NP-completeness does not extend to planar instances of the Ising model: a polynomial-
time bound is shown by Barahona for the two dimensional, finite sized model. This fact implies
that obtaining ground states is not intractable for this case of the model, and motivates the devel-
opment of efficient algorithms which obtain exact solutions. The latter are defined as solutions
equivalent to those generated from an exhaustive search.
3.1.1 Approximate approaches for determining ground states
Regardless of NP-completeness, formulation of the ground state problem as a combinatorial
optimisation problem allows a second approach to be considered, involving the class of meta-
heuristic algorithms. Although these algorithms are typically only guaranteed to search exhaus-
tively as time goes towards infinity, many have been shown to produce optimal or near-optimal
solutions to a wide range of problems, provided sufficient execution time. It is therefore of
immediate interest to investigate the performance of these algorithms in the context of the Ising
spin glass.
By common definition, a metaheuristic is a heuristic applicable to solving a broad class
of problems [28]. In practice, this is achieved by defining a set of ‘black-box’ procedures,
i.e. routines specific to the problem. When dealing with combinatorial optimisation problems,
these routines typically include a utility function, whose purpose it is to evaluate candidate
solutions selected from the state space. Utility is then used to compare solutions amongst one
another.
To be of practical use for problems with large state spaces, a heuristic must arrive at a solu-
tion by considering some subset of this space its search space. The metaheuristic approach often
achieves this by random sampling [28], which may cause the algorithm to produce suboptimal
results. To apply a metaheuristic effectively, it may therefore be necessary to evaluate perfor-
mance against different combinations of algorithm parameters. Generating sufficient amounts
of samples may motivate parallel algorithmic approaches. Also, although it has been shown
that the average performance of optimisation algorithms is identical over the class of all optimisa-
tion problems [62], there may be significant performance differences between algorithms when
applied to a specific problem class. It is hence of interest to examine diverse metaheuristic
approaches in conjunction with the described optimisation problem.
Evolutionary algorithms
One class of metaheuristic is inspired by biological evolution. Here, a population of candidate
solutions is created and subsequently evolved in an iterative process, where individual 'parent' solutions are selected stochastically in order to generate 'offspring' solutions. The process of
selection is designed to favour solutions which exhibit high ‘fitness’, the latter evaluated using
a utility function. In a further biological analogy, offspring are generated by combining solution
parameters from both parents, prior to randomised modification (mutation). These new solutions
are then added to the population, which is typically maintained in order to stay in equilibrium.
The process is then repeated, terminating either on completing a specified number of iterations,
or when a convergence criterion is fulfilled.
Evolutionary algorithmic approaches applicable to combinatorial optimisation are known
as genetic algorithms [50]. The approach here involves representing a solution, i.e. the set of parameters supplied to the target function, as a string. After evaluating solution fitness as pre-
viously described, crossover is typically realised as a manipulation of substrings: For example,
one might generate offspring as a combination of permuted substrings from parent strings. Cor-
respondingly, mutation might be realised as a permutation of substring elements from a single
solution. It is evident that the multitude of possibilities in which selection, crossover and mu-
tation may be implemented, has the potential to cause deviations in the optimisation process’
performance.
Genetic algorithms have been applied to the spin glass ground state problem by Gropengiesser
[32], who considers two variants of the basic evolution procedure. In the first, the population is initialised to multiple instances of a single solution, to which mutation is then applied it-
eratively. Using a local search heuristic, mutations conducive to lowering the system energy
are accepted. In the second variant, the former regime is augmented with random parent se-
lection and crossover, such that every child solution replaces one of its parents. Results show
that performance is affected strongly by the method of admitting new candidate solutions to the
population, following mutation.
As one might expect, approaches incorporating local minimisation techniques have been shown
to improve optimisation performance, as implemented by Hempel et al. [40], using a so-called
hybrid genetic algorithm. This is in comparison to an early investigation by Sutton [59], using
a general evolutionary approach. Houdayer and Martin [41] report good performance for the
Ising model with discrete ± J bond distribution, using a Genetic Renormalisation algorithm.
Here, domain specific knowledge is incorporated into the optimisation process by recursively
partitioning the graph of spin interactions, in resemblance to the description at the beginning of
this chapter. A local optimisation process is then applied to the partitioned system.
Given the nature of the project, of special interest are methods of parallelising genetic al-
gorithms. In the general context of evolutionary computing, Cantu-Paz [14] describes a coarse
grained approach known as the ‘island’ method. In the distributed memory paradigm, processes
are arranged in a toroidal grid, each executing the algorithm in parallel. After each iteration,
a subpopulation of local solutions is selected based on fitness, and exported to neighbouring
processes asynchronously. As an alternative, a fine grained scheme may also be used, wherecrossover is allowed to take place between solutions residing at diff erent processes.
Simulated annealing
Simulated annealing is a technique readily applicable to calculating ground states, as it is based
on the principles in statistical physics which underpin the Ising model. The technique is derived
from the Metropolis-Hastings algorithm [37], in which a probability distribution is sampled in-
directly by means of a first-order Markov chain. That is, the distribution of a generated sample is
sufficiently defined by the value of its predecessor. In simulated annealing, a candidate solution
S in the state space is associated with the probability
$$P(S) \propto e^{-H(S)/(kT)},$$
the state probability of a canonical ensemble, which was introduced in Chapter 2.
Optimisation is performed by initialising a random solution configuration and sampling
proximate configurations in the state space by stochastic parameter modification: Specifically
for the Ising model, this would involve perturbing spins by inverting their state. The new con-
figuration is accepted if the perturbation resulted in lower system energy, otherwise the state is
accepted with probability $e^{-\Delta H/(kT)}$, where $\Delta H$ is the change in system energy. Of importance is
the value of temperature T , which is initialised to a certain value and decreased monotonically
towards zero according to a specific annealing schedule, as the algorithm progresses.
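The acceptance rule just described can be stated in a few lines of C; the sketch below covers a single Metropolis decision (setting $k = 1$ and using rand() for the uniform variate are simplifying assumptions of this illustration).

#include <math.h>
#include <stdlib.h>

/* One Metropolis acceptance decision for a proposed spin flip.
 * delta_H : change in system energy caused by the flip
 * T       : current temperature of the annealing schedule (k = 1 assumed)
 * Returns 1 if the flip is accepted, 0 otherwise.                          */
int metropolis_accept(double delta_H, double T)
{
    if (delta_H <= 0.0)
        return 1;                          /* downhill moves always accepted */
    double u = (double)rand() / ((double)RAND_MAX + 1.0);
    return u < exp(-delta_H / T);          /* uphill with Boltzmann probability */
}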
In Chapter 2, it was mentioned that as $T$ approaches zero, $P(S) = 1$ iff $S$ is a ground
state. A consequence of this fact for the optimisation process is that if T is initialised to a
finite temperature and decreased sufficiently slowly, the algorithm is guaranteed to arrive at the
system’s globally optimal state [51]. In practice, execution time is restricted to a fraction of that
required for an exhaustive search, so that the annealing process becomes an approximation.
Simulated annealing was first applied to the spin glass problem by Kirkpatrick, Gelatt and
Vecchi [44]. It is important to note that the choice of annealing schedule significantly affects
the algorithm’s ability to arrive at an optimal solution. This is because temperature influences
the amount of selectivity involved as state space is explored. Conversely, it follows that the
solution landscape particular to a problem usually affects the accuracy of solutions obtained by
the algorithm using a particular schedule.
Ram et al. describe an approach to parallelising the algorithm [55]. Clustering simulated
annealing is based on the observation that a good initial solution typically reduces the amount
of iterations required for the algorithm to converge. After executing the algorithm on multiple
processing elements with diff erent initial states, an exchange of partial results takes place to
determine the most favourable solution. This result is then redistributed to all processing ele-
ments, in order to repeat the process a set number of iterations, after which the final solution is
determined.
Harmony search
A recently developed optimisation algorithm is due to Geem [27]. Known as harmony search,
this algorithm has been applied to a number of optimisation problems such as structural design
[45] and data mining [25]. Harmony search can be considered an evolutionary algorithm, as it
maintains a population of candidate solutions, which compete with one another for permanency
and influence generation of successive candidates.
Inspired by the improvisational process exhibited by musicians playing in an ensemble, har-
mony search iteratively evolves new solutions as a composite of existing solutions. As with
genetic algorithms, a utility function determines whether a newly generated solution is included
in the candidate set. In addition to devising a probabilistic scheme for combining parameters
from existing solutions, new solutions are modified according to a certain probability. This is
designed to improve exploration of the state space, similar to genetic mutation.
Formally, the algorithm defines an ordered set $\sigma = (\sigma^1, \sigma^2, \ldots, \sigma^m)$ of $m$ candidate solutions,
where each candidate is an $n$-tuple $\sigma^k = (\sigma^k_1, \sigma^k_2, \ldots, \sigma^k_n)$. Algorithm parameters are the
memory selection rate $P_{mem}$, the so-called pitch adjustment rate $P_{adj}$ and the distance bandwidth
$\beta \in \mathbb{R}$. Random variables $X \in \{1, 2, \ldots, m\}$ and $Y \in [0, 1)$ are also defined. Using a termination
criterion such as the number of completed iterations, the algorithm performs the following steps
on the set of initially random candidates:
• Generate: $\sigma^\nu = (\tau(1), \tau(2), \ldots, \tau(n))$, where
$$\tau(i) = \begin{cases} \sigma^X_i & \text{if } Y \le P_{mem} \\ \text{random parameter value} & \text{if } Y > P_{mem} \end{cases}$$
• Update: for $1 \le i \le n$, $\sigma^\nu_i \leftarrow \sigma^\nu_i + \beta$ iff $Y \le P_{adj}$
• Replace:
– $w \leftarrow \operatorname*{argmax}_w \{\sigma^w\}$
– $\sigma^\nu \leftarrow \min\{\sigma^w, \sigma^\nu\}$
– $\sigma^w \leftarrow \sigma^\nu$
In the first step, the algorithm generates a new candidate, whose parameters are selected at
random both from existing solutions in the population and from a probability distribution. In a
further stochastic procedure using random variable Y , solution parameters are modified. This
step is of particular significance for continuous optimisation problems; it may be preferable
to omit it in other cases. Finally, the population is updated by replacing its worst solution, if
the generated candidate is of higher utility. The process is then repeated, using the updated
population.
An application of harmony search to the discrete Ising ground state problem is trivial, by
assigning each solution the ordered set of spins defined at the beginning of this chapter, i.e. $\sigma^k = (s_1, s_2, \ldots, s_n)$.
Because the set of solution parameter values is discrete and small, the effect
of modifying solutions due to distance bandwidth β can be consolidated into the algorithm’s
‘generation’ step. The process thus consists solely of generating and conditionally replacing
existing solutions in memory, governed by parameters m (the candidate population size) and
Pmem (the memory selection rate). Work undertaken for this project examines the performance
of this algorithm for finding Ising spin glass ground states.
3.1.2 Exact methods for determining ground states
Graph theoretic methods
Returning to the spin glass as an exactly solvable model, it is necessary to examine the graph
representation of spin interactions more closely. An undirected graph $G = (V, E)$ is described
by a set of vertices $V = \{v_1, v_2, \ldots, v_n\}$ and edges $E \subseteq \{\{v_i, v_j\} \mid v_i, v_j \in V\}$. Given an Ising spin
glass model $S = \{s_1, s_2, \ldots, s_n\}$, let $S = V$ and $E = \{\{s_i, s_j\} \mid J_{ij} > 0\}$, where $J_{ij}$ is the bond
strength between spins $s_i$, $s_j$. The set of vertices is partitioned into subsets $S^+$, $S^-$ such that
$S^+ = \{s_i \mid s_i = 1\}$, $S^- = \{s_i \mid s_i = -1\}$.
Grotschel et al. [29] provide a description of a method which is the basis of algorithms
developed by Barahona et al. [7]. Here, the system's Hamiltonian is described in terms of $S^+$
and $S^-$ as
$$H(S) = -\sum_{\{i,j\} \in E(S^+)} J_{ij}\, s_i s_j \;-\; \sum_{\{i,j\} \in E(S^-)} J_{ij}\, s_i s_j \;-\; \sum_{\{i,j\} \in \delta(S^+)} J_{ij}\, s_i s_j$$
where $E(T) = \{\{s_i, s_j\} \mid s_i, s_j \in T\}$ and $\delta(T) = \{\{s_i, s_j\} \mid s_i \in T,\ s_j \in S \setminus T\}$. Considering the effect
of opposing spin interactions, the Hamiltonian can be rewritten as
$$H(S) = -\sum_{\{i,j\} \in E(S^+)} J_{ij} \;-\; \sum_{\{i,j\} \in E(S^-)} J_{ij} \;+\; \sum_{\{i,j\} \in \delta(S^+)} J_{ij},$$
from which it follows that
$$H(S) + \sum_{\{i,j\} \subseteq S} J_{ij} = 2 \sum_{\{i,j\} \in \delta(S^+)} J_{ij}.$$
The ground state energy can now be formulated in terms of the function $\delta$ as
$$H_{\min} = \min_{S^+ \subseteq S} \left( 2 \sum_{\{i,j\} \in \delta(S^+)} J_{ij} \;-\; \sum_{\{i,j\} \subseteq S} J_{ij} \right).$$
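As a quick check of this expression, consider the smallest possible instance: two spins joined by a single ferromagnetic bond $J_{12} = J > 0$, so that $\sum_{\{i,j\} \subseteq S} J_{ij} = J$. For $S^+ = \emptyset$ or $S^+ = \{s_1, s_2\}$ the cut $\delta(S^+)$ is empty and the expression evaluates to $-J$; for $S^+ = \{s_1\}$ or $S^+ = \{s_2\}$ the bond is cut and it evaluates to $2J - J = J$. The minimum, $-J$, is attained when both spins lie on the same side of the cut, i.e. when they are aligned, in agreement with direct evaluation of $H(S) = -J\, s_1 s_2$.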
Because the co-domain of δ consists of edges which define a cut of the graph of spin interac-
tions, i.e. a partition of nodes into two disjoint sets, obtaining ground states is now described in
graph theoretical terms as a cut optimisation: As formulated, ground state energy is expressed
as the minimum cut of a weighted graph. Equivalently the problem can be formulated as a
maximisation, if the signs of interaction energies are inverted.

Hadlock [34] shows further that finding a maximum cut of a planar graph is equivalent
to determining a maximum weighted matching of a graph, for which there exist polynomial
time algorithms. Bieche et al. [10] and Barahona [6] follow this approach, where a graph is
constructed based on interactions between spin plaquettes. A recent similar approach due to
Pardella and Liers [53] allows very large systems to be solved exactly.
De Simone et al. employ a method known as ‘branch-and-cut’. Here, the cut optimisa-
tion problem is initially expressed as an integer programming problem. In integer program-
ming, the objective is to determine $\max\left\{ u^T x \mid Ax \le b \right\}$, where the components of vector $x \in \mathbb{Z}^n$
are determined subject to constraints defined by vectors $u$, $b$ and matrix $A$. During execution, branch-and-cut specifically employs the linear relaxation of the programming problem, where
it is permitted that $x \in \mathbb{R}^n$. This relaxation is combined with the branch and bound algorithm,
which is invoked when a non-integral solution of x is determined. Substituting the non-integral
component with integers, the problem is divided using a further algorithm, which recursively
generates a tree of subproblems. By maintaining bounds on solution utility, it is possible to
identify partial solutions which are guaranteed to be suboptimal. Since these are not required
to be subdivided further, the search tree is pruned. Liers et al. [46] describe the branch-and-cut
algorithm in detail, which permits tractable computation of spin glass models consisting of $50^2$
spins without periodic boundaries.
Transfer matrix
A technique applicable to various problems in statistical mechanics is the transfer matrix method
[8]. The requirement is as described at the beginning of this chapter, where a system is described
in terms of adjacently interacting subsystems. Using the definition of system state probability,
a matrix describing interactions is defined as $A = [p_{ij}]$, where $p_{ij} = P(S^i_{k+1} \mid S^j_k)$, given
subsystems $S_{k+1}, S_k$ assuming states $S^i_{k+1} \in 2^{S_{k+1}}$, $S^j_k \in 2^{S_k}$. Conditional independence from other
systems is assumed, i.e. $P(S^i_{k+1} \mid S^j_k) = P(S^i_{k+1} \mid S^j_k, S_1, S_2, \ldots, S_p)$. Here, the notation $2^S$ denotes
the set of all spin configurations of system $S$.
By implications of conditional state probability, given an initial subsystem it is possible to
evaluate the state of successive subsystems via a series of matrix multiplications. Problems such
as determining the partition function can be solved using eigenanalysis, an example of which is
given in [15]. The transfer matrix approach due to Onsager allows the partition function of the
two-dimensional Ising model to be formulated [39].
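As a toy illustration of how such a matrix of conditional state probabilities might be assembled, the following C sketch builds a 2 × 2 transfer-style matrix for two adjacent single-spin subsystems, assuming $P(S^i_{k+1} \mid S^j_k) \propto \exp(-(H(S^i_{k+1}) + H_b(S^i_{k+1}, S^j_k))/kT)$ and normalising each column; the coupling and temperature values are invented for the example.

#include <math.h>
#include <stdio.h>

int main(void)
{
    const double J = 1.0, kT = 1.0;
    const int s[2] = {-1, 1};
    double A[2][2];

    for (int j = 0; j < 2; j++) {               /* current state S^j_k           */
        double z = 0.0;
        for (int i = 0; i < 2; i++)             /* successor state S^i_{k+1}     */
            z += exp(J * s[i] * s[j] / kT);     /* H_b = -J s s', no field term  */
        for (int i = 0; i < 2; i++)
            A[i][j] = exp(J * s[i] * s[j] / kT) / z;   /* column-normalised      */
    }
    printf("%f %f\n%f %f\n", A[0][0], A[0][1], A[1][0], A[1][1]);
    return 0;
}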
In the next section, the framework of Markov chain theory is used to examine in detail
probabilistic interactions within the Ising spin glass. The Markov transition matrix is equivalent
to the transfer matrix, hence it follows that methods for system properties are closely related.
The chosen approach exposes a dynamic programming formulation of the ground state problem
with implications for further parallelisation.
3.2 A dynamic programming approach to spin glass ground states
A system $S$ is described by a set of states $\mathcal{S} = \{S_1, S_2, \ldots, S_n\}$, for example spin configurations
$\mathcal{S} = \{S_i \mid S_i \in 2^S\}$. Again, $2^S$ denotes the set of all system configurations. Residing in state $S_\tau$,
the system undergoes a series of non-deterministic state transitions, such that each successive
system configuration $S_\tau$ is determined from the assignment $S_\tau = t(S_\sigma)$. The map $t: 2^S \to 2^S$
is defined using a vector of random variables $v = (v_{S_1}, v_{S_2}, \ldots, v_{S_n})$, where $v_{S_i}$ is a random
successor state the system may assume when in state $S_i$. The probability mass function of these random variables is defined as
$$f_{v_{S_i}}(S_j) = P(v_{S_i} = S_j \mid S_i).$$
Given an initial distribution of states, it may be of interest to determine the most likely sequence
of states. For this purpose, it is useful to examine the system in terms of its Markov properties.
3.2.1 Markov chains
Define a sequence of states $C = (S_{x_1}, S_{x_2}, \ldots, S_{x_m})$. The sequence is said to fulfil the first-order Markov property, if the value of any single state sufficiently determines the probability
distribution of the state's successor in the sequence, i.e.
$$\forall i \quad P(S_{x_{i+1}} \mid S_{x_i}) = P(S_{x_{i+1}} \mid S_{x_i}, S_{x_{i-1}}, \ldots, S_{x_1}).$$
Formulating the probabilities of state transitions in matrix form is convenient for evaluating
the behaviour of the sequence after finite or infinite state emissions: Define the transition matrix
between sequence elements $i$, $i+1$ as
$$M_{i,i+1} = \begin{pmatrix}
P(S_1 \mid S_1) & P(S_1 \mid S_2) & \cdots & P(S_1 \mid S_n) \\
P(S_2 \mid S_1) & P(S_2 \mid S_2) & \cdots & P(S_2 \mid S_n) \\
\vdots & \vdots & \ddots & \vdots \\
P(S_n \mid S_1) & P(S_n \mid S_2) & \cdots & P(S_n \mid S_n)
\end{pmatrix},$$
where $P(S_\tau \mid S_\sigma)$ denotes the probability of the emission $S_\tau$ as the $(i+1)$th element in the chain after the $i$th emission, $S_\sigma$. It follows that the probability distribution of states $d = (P(S_1), P(S_2), \ldots, P(S_n))^T$
after $m$ sequence emissions can be evaluated as
$$d = \left(\prod_{k=1}^{m} M_{k,k+1}\right) d_0 \qquad (3.1)$$
where vector $d_0$ is the initial state distribution. If for all $k$, $M_{k,k+1} = M_{k-1,k}$, the Markov chain
is termed time-homogeneous. Such a chain may be represented by a directed, weighted graph
as shown in Figure 3.2, where nodes represent states and labelled edges represent transition
probabilities. A detailed discussion of further Markov chain properties is provided by Meyn
and Tweedie [48].

Figure 3.2: Example first-order Markov chain with states a, b, c
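As a concrete illustration of Equation 3.1 for a time-homogeneous chain such as the one in Figure 3.2, the following small C program propagates an initial distribution through repeated matrix / vector products; the three-state transition probabilities are invented for the example.

#include <stdio.h>

#define NSTATES 3

/* one application of Equation 3.1: d'_i = sum_j P(S_i | S_j) d_j */
static void step(const double M[NSTATES][NSTATES], double d[NSTATES])
{
    double next[NSTATES] = {0.0};
    for (int i = 0; i < NSTATES; i++)
        for (int j = 0; j < NSTATES; j++)
            next[i] += M[i][j] * d[j];
    for (int i = 0; i < NSTATES; i++) d[i] = next[i];
}

int main(void)
{
    /* each column j sums to one: it is the successor distribution of S_j */
    double M[NSTATES][NSTATES] = {
        {0.1, 0.5, 0.0},
        {0.9, 0.0, 0.6},
        {0.0, 0.5, 0.4}
    };
    double d[NSTATES] = {1.0, 0.0, 0.0};      /* initial state distribution */

    for (int m = 0; m < 10; m++) step(M, d);  /* distribution after 10 emissions */
    printf("%f %f %f\n", d[0], d[1], d[2]);
    return 0;
}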
By current definition, state emission is governed by an amount of ‘memory’, in that preced-
ing sequence values influence state output at any given point in the sequence. The first-order
Markov chain, where states are conditionally dependent on a single, immediate predecessor, is
the simplest instance of a Markov process.

When extending the amount of chain memory, i.e. increasing the number of preceding states
which determine the distribution of output states, the order-n Markov chain must be considered.
A generalisation of the archetypal first-order model, the distribution of an emitted state depends
on n immediate predecessors in the sequence. Following the definition of the first-order model,
the requirement for an order-$n$ chain is
$$\forall i \quad P(S_{x_i} \mid S_{x_{i-1}}, S_{x_{i-2}}, \ldots, S_{x_{i-n}}) = P(S_{x_i} \mid S_{x_{i-1}}, S_{x_{i-2}}, \ldots, S_{x_1}),$$
i.e. knowledge of the preceding $n$ states sufficiently defines the probability of state $S_{x_i}$ in the sequence. Both models have implications for algorithm design.
3.2.2 Ising state behaviour as a Markov chain
In context of the previously described Markov model, the following approach examines Ising
interactions within the two-dimensional lattice without boundary conditions. Initially, the lattice
lattice is partitioned into rows, as shown in Figure 3.1. Clearly, interactions between individual
rows occur in nearest-neighbour fashion, significantly along a single dimension. That is, for an
$n \times m$ spin system, the partition is defined as $S = \{S_1, S_2, \ldots, S_n\}$ with $S_i \in \{1, -1\}^m$, $1 \le i \le n$.
The energy of the system is
$$\sum_{i=1}^{n} H(S_i) + \sum_{i=2}^{n} H_b(S_{i-1}, S_i) = H(S_1) + \sum_{i=2}^{n} \left[ H(S_i) + H_b(S_{i-1}, S_i) \right],$$
where H (S i) is the Hamiltonian of subsystem S i and H b(S i, S j) is the boundary energy between
subsystems S i, S j, as previously defined.
Since $\bigcup_i S_i = S$, the entire lattice's state is sufficiently described by the states of its constituent
rows. Recall that because this is a statistical mechanical model, state is probabilistic, with
$P(S) \propto e^{-H(S)/(kT)}$. Using the described partitioning scheme, it turns out that subsystem state
probability fulfils the property of a first-order Markov chain (cf. Appendix C).
3.2.3 The ground state sequence
Given the Markov property under the chosen representation of Ising interactions, the implications of ground state for the chain of states $(S_1^{x_1}, S_2^{x_2}, \ldots, S_n^{x_n})$ are next examined. Formally, the
probability $P_{\text{gnd}}$ of obtaining ground state energy $\min_{S \in \mathcal{S}}\{H(S)\}$ is
$$P_{\text{gnd}} \propto \exp\left(-\frac{1}{kT} \min_{S \in \mathcal{S}}\{H(S)\}\right) \propto \max_{S \in \mathcal{S}} \left\{ \exp\left(-\frac{1}{kT} H(S)\right) \right\},$$
from which it is clear that Pgnd must be maximised, in order to infer the ground state configura-
tion. This configuration is given by the sequence
$$\operatorname*{argmax}_{(S_1, S_2, \ldots, S_n)} \left\{ P(S_1) \prod_{i=2}^{n} P(S_i \mid S_{i-1}) \right\},$$
which is the most likely sequence of emitted states in a first-order Markov chain.
This result is of significance for obtaining an algorithm for computing ground states, be-
cause there exists a well-known approach due to Viterbi [61]. The basis of the Viterbi algo-
rithm is the observation that the optimal state for the first symbol emission in the chain is simply
$\operatorname{argmin}_{S_1} H(S_1)$. Augmenting the size of considered subproblems, optimum solutions are de-
termined successively, until the size of the set of considered problems equals the originally
specified problem. At this point, the optimisation is complete.
The probability of the most likely sequence of emissions $(S_1^{\mu_1}, S_2^{\mu_2}, \ldots, S_n^{\mu_n})$, known as the
Viterbi path, can be obtained from the recurrent formulation
$$P_{\text{viterbi}}(S_i) = \begin{cases} \max_{S_i} \{P(S_i)\} & i = 1 \\ \max_{S_{i-1}} \{P(S_i \mid S_{i-1})\, P_{\text{viterbi}}(S_{i-1})\} & i > 1, \end{cases}$$
by evaluating $\max_{S_n}\{P_{\text{viterbi}}(S_n)\}$. It follows that the actual sequence can be formulated as
$$\text{viterbi}(i) = \begin{cases} \operatorname{argmax}_{S_i} \{P_{\text{viterbi}}(S_i)\} & i = 1 \\ \operatorname{argmax}_{S_i} \{P_{\text{viterbi}}(S_i)\} + \text{viterbi}(i-1) & i > 1, \end{cases}$$
determined by evaluating viterbi($n$). In this case, the '+' operator denotes symbol concatenation,
so that $(S_1^{\mu_1}, S_2^{\mu_2}, \ldots, S_n^{\mu_n}) = S_1^{\mu_1} + S_2^{\mu_2} + \ldots + S_n^{\mu_n}$.

Figure 3.3: Illustrating the principle of optimality. Paths within the dashed circle are known to
be optimal. Using this information, optimal paths for a larger subproblem can be computed.
It is important to note that the recursive definition of the Viterbi path differs from the (suboptimal) approach of optimising every conditional probability $P(S_i \mid S_{i-1})$ individually. Instead, the
path is defined as the optimum of incremented subproblems, whose own solutions are in turn
optimal. Schematically depicted in Figure 3.3, this approach is an application of the principle
of optimality due to Bellman [9]. Consequently, the Viterbi algorithm is an instance of the
dynamic programming problem, recursively defined for all $x \in X$ as
$$V(x) = \max_{y \in \Gamma(x)} \left\{ F(x, y) + \gamma\, V(y) \right\},$$
where $\Gamma$ is a map and $0 \le \gamma \le 1$ is the so-called discount factor. The function $V(x)$ is known as
the value function, and is optimised using $F(x, y)$.
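For concreteness, the following small C program iterates exactly this recursion to a fixed point (value iteration) on a three-state example; the payoff table $F$, the discount $\gamma = 0.9$ and the choice of $\Gamma(x)$ as the full state set are invented purely for illustration.

#include <math.h>
#include <stdio.h>

#define NX    3
#define GAMMA 0.9

int main(void)
{
    double F[NX][NX] = {               /* F[x][y]: immediate payoff of moving x -> y */
        {0.0, 1.0, 0.0},
        {0.0, 0.0, 2.0},
        {1.0, 0.0, 0.0}
    };
    double V[NX] = {0.0, 0.0, 0.0};

    for (;;) {                         /* iterate V(x) = max_y {F(x,y) + gamma V(y)} */
        double Vnew[NX], delta = 0.0;
        for (int x = 0; x < NX; x++) {
            double best = -INFINITY;
            for (int y = 0; y < NX; y++) {
                double v = F[x][y] + GAMMA * V[y];
                if (v > best) best = v;
            }
            Vnew[x] = best;
            delta = fmax(delta, fabs(best - V[x]));
        }
        for (int x = 0; x < NX; x++) V[x] = Vnew[x];
        if (delta < 1e-9) break;       /* stop at the fixed point of the recursion */
    }
    printf("V = %f %f %f\n", V[0], V[1], V[2]);
    return 0;
}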
The concrete algorithm for computing the Viterbi path probability avoids the overhead and
backtracking suggested by the aforementioned recursive formulation. It involves an iterative
loop to increment the size of the considered system:
opt[] := 1
for i := 1 to n
    for S_i^j ∈ S_i
        pmax := -∞
        for S_{i-1}^k ∈ S_{i-1}
            p := P(S_i^j | S_{i-1}^k) * opt[k]
            if p > pmax
                pmax := p
        optNew[j] := pmax
    opt := optNew
In the listing, $S_i^j$ denotes configuration $j$ of subsystem $S_i$, according to previous convention.
The array opt[] records the optimum path probability for preceding subsystems S 1, S 2, . . . , S i
for every iteration i of the algorithm. Elements of the array are initially set to unity. A second
array optNew[] is used to store updated path probabilities, which are subsequently copied to
opt[] after each iteration of the outer loop. Although the values of optimal state emissions are
discarded in this pseudocode, it is possible to retain them by storing them in an associative data
structure. An implementation of this approach is presented in Chapter 6.
Examining the algorithm’s time complexity, it is apparent that execution time is proportional
to the product of the three loops' lengths, since these assume a nested structure. That is,
$$t(n) \propto n \left|2^{S_1}\right|^2,$$
where $n$ is the number of subsystems, and $2^{S_1}$ is the set of configurations of subsystem $S_1$. It
follows that if the spin lattice has dimensions $n \times m$, it is
$$t(n,m) \propto n\,2^{2m}, \quad \text{which is } O\!\left(n\,2^{2m}\right).$$
By further observation it turns out that the Viterbi path can also be used to evaluate system
energy (cf. Appendix D). This provides a dynamic programming solution to the two dimensional
lattice without boundary conditions, which is
$$H_{\min}(S_i) = \begin{cases} \min_{S_i} \{H(S_i)\} & i = 1 \\ \min_{S_{i-1}} \{H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_{i-1})\} & i > 1. \end{cases} \qquad (3.2)$$
3.2.4 Boundary conditions
It is of interest to examine the effects of introducing cyclic boundary conditions on state optimality, using the described approach. As the latter involves partitioning the spin lattice into
rows, it is possible to differentiate between energetic contributions occurring within subsystems
$S_1, S_2, \ldots, S_n$, and energetic contributions occurring between these. It is apparent that horizontal boundary conditions have an effect on subsystem energy, whereas vertical boundary conditions affect subsystem
interactions.
The first effect is caused by horizontal boundary interactions, as these involve spins located
at the outermost positions of each spin row. The Hamiltonian $H(S_i)$ thus effectively includes an
additional term to account for an additional pairwise interaction. The Hamiltonian of the entire
lattice is $\sum_{i=1}^{n} H(S_i) + \sum_{i=2}^{n} H_b(S_i, S_{i-1})$, which sufficiently accounts for horizontal boundary interactions within the system. Since the recursive formulation of ground state energy in Equation
3.2 also computes the sum of all subsystem Hamiltonians and their interactions, the existing dynamic program formulations and algorithms can be left unmodified. It follows that the al-
gorithmic complexity of computing ground states does not increase for the case with cyclic
boundaries along a single dimension.
In contrast, the vertical cyclic boundary condition results in pairwise interactions between
subsystems $S_1, S_n$, i.e. the initial and ultimate spin rows. Here, each row-constituent spin $s_j \in S_k$ ($k \in \{1, n\}$) potentially has a non-zero bond interaction with its neighbour $s_{j}' \in S_{k'}$ ($k' \in \{1, n\} \setminus \{k\}$). Consequently, the Hamiltonian for the entire lattice is given by $\sum_{i=1}^{n} H(S_i) + \sum_{i=2}^{n} H_b(S_i, S_{i-1}) + H_b(S_1, S_n)$, where the latter term is the interaction energy between the two
boundary systems in question. Here, it follows that the proposed existing solution does not
yield the ground state energy, as the recursive formulation does not include the additional term.
Configuration optimality is therefore not guaranteed, for the case with cyclic boundaries along
both lattice dimensions.
As a modification of the original dynamic programming solution, it is conjectured that the
ground state configuration can be determined by evaluating the set of problem instances where
both boundary rows are assigned spin configurations in advance, i.e.
$$H_{\min} = \min_{S_1, S_n} \left\{ H_{\min}(S_n, S_n, S_1) \right\},$$
with
$$H_{\min}(S_n, S_i, S_1) = \begin{cases} H(S_i) + H_b(S_1, S_n) & i = 1 \\ \min_{S_{i-1}} \left\{ H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_n, S_{i-1}, S_1) \right\} & i > 1. \end{cases}$$
Adapting the previous algorithm, this formulation implies that the execution time $t'(n)$ is
$$t'(n) \propto \left|2^{S_1}\right|\, t(n),$$
where $n$ is the number of subsystems, $2^{S_1}$ is the set of configurations of $S_1$ and $t(n)$ is the
execution time of the previously specified algorithm. Therefore,
$$t'(n,m) \propto 2^m\, n\, 2^{2m} \propto n\, 2^{3m}, \quad \text{which is } O\!\left(n\, 2^{3m}\right),$$
where the system consists of $n \times m$ spins.
Proof of the conjecture is by induction. Since interactions within the system occur in a
regular lattice, the two adjacent boundary subsystems can be chosen arbitrarily, so the recursive
formulation becomes
$$H_{\min}(S_j, S_i, S_{j+1}) = \begin{cases} H(S_i) + H_b(S_i, S_{i-1}) & i = j + 1 \\ \min_{S_{i-1}} \left\{ H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_j, S_{i-1}, S_{j+1}) \right\} & \text{otherwise,} \end{cases}$$
with subsystems $S_0, S_1, \ldots, S_{n-1}$, boundary subsystems $S_j, S_{j+1}$ and subsystem interactions
mod $n$. It follows that the ground state energy is defined as
$$H_{\min} = \min_{S_j, S_{j+1}} \left\{ H_{\min}(S_j, S_n, S_{j+1}) \right\}.$$
Choosing boundary subsystems $S_{j+1}, S_{j+2}$, the formulation further becomes
$$H_{\min}(S_{j+1}, S_i, S_{j+2}) = \begin{cases} H(S_i) + H_b(S_i, S_{i-1}) & i = j + 2 \\ \min_{S_{i-1}} \left\{ H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_{j+1}, S_{i-1}, S_{j+2}) \right\} & \text{otherwise,} \end{cases}$$
which clearly is the optimal sequence of emitted states, given states $S_{j+1}, S_{j+2}$. As the ground
state configuration can be deduced from $\min_{S_{j+1}, S_{j+2}} \left\{ H_{\min}(S_{j+1}, S_n, S_{j+2}) \right\}$, the sequence remains optimal also for this case. Therefore, the sequence is optimal for all $j$, i.e.
$$\forall_{0 \le i < n}\, \exists_{k, S_i}\; \forall_{0 \le j < n}\, \exists_{k', S_j} \quad H\!\left(S_j^{k'} \cup \bar{S}_j\right) < H\!\left(S_i^{k} \cup \bar{S}_i\right), \qquad (3.3)$$
using the notation $\bar{S}_j$ to denote $S \setminus S_j$.
3.2.5 An order-n Markov approach to determining ground states
Having introduced the Markov model for both the first-order case and its higher-order extension,
it is of interest to examine whether the latter lends itself to a more powerful formulation of Ising
system state probability: Previously, the approach consisted of a row-wise system decomposi-
tion, which resulted in a sequence of subsystems with nearest-neighbour interactions along one
dimension. Reducing subsystem size, it is apparent that interactions between subsystems are no
longer restricted to occurring along one dimension.
Consider the extreme case, where a subsystem consists of a single spin. For the two-
dimensional $n \times m$ spin lattice, there exist subsystems $S = \{S_0, S_1, \ldots, S_{nm-1}\}$. The system's
total energy is the result of horizontal and vertical interactions between subsystems, which may
be evaluated by sliding a window across the entire lattice, as shown in Figure 3.4. For each spin,
this window considers the interactions originating from a vertical and horizontal predecessor.
Figure 3.4: Sliding a unit-spin window across a lattice
Formally, the Hamiltonian is expressed as
$$H(S) = \sum_{i=0}^{nm-1} \left[ H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m}) \right],$$
where H b(S i, S i−m) is the interaction energy between S i and its vertical predecessor. Simi-
larly H b(S i, S i−1) is the interaction due to horizontal predecessor S i−1. Also, subsystem indices
are computed mod (nm), in order to evaluate interactions occurring across lattice boundaries.
Here, it indeed turns out that a higher-order formulation of system state is possible (cf. Appendix
C), namely
$$P(S) = \prod_{i=0}^{nm-1} P(S_i \mid S_{i-1}, S_{i-2}, \ldots, S_{i-m-1}),$$
from which ground state probability can be formulated as
$$P_{\text{viterbi}}(S_i, S_{i-1}, \ldots, S_{i-m}) = \begin{cases} P(S_i, S_{i-1}, \ldots, S_{i-m}) & i \le m \\ \max_{S_{i-m-1}} \left\{ P(S_i \mid S_{i-1}, \ldots, S_{i-m-1})\, P_{\text{viterbi}}(S_{i-1}, \ldots, S_{i-m-1}) \right\} & i > m, \end{cases}$$
for the lattice without cyclic boundary interactions. As previously described, this probability
can be used to determine the actual ground state configuration, and can be reformulated to
determine ground state energy. It follows that the algorithm for obtaining solutions to this dynamic programming problem is also a modification of the previous approach:
opt[] := 1
for i := m to n*m
    for (S_i^{j0}, S_{i-1}^{j1}, ..., S_{i-m}^{jm}) ∈ (S_i, S_{i-1}, ..., S_{i-m})
        if i > m
            pmax := -∞
            for S_{i-m-1}^k ∈ S_{i-m-1}
                p := P(S_i^{j0} | S_{i-1}^{j1}, ..., S_{i-m-1}^k) * opt[(S_{i-1}^{j1}, ..., S_{i-m-1}^k)]
                if p > pmax
                    pmax := p
            optNew[(S_i^{j0}, S_{i-1}^{j1}, ..., S_{i-m}^{jm})] := pmax
        else
            p := P(S_i^{j0}, S_{i-1}^{j1}, ..., S_{i-m}^{jm})
            optNew[(S_i^{j0}, S_{i-1}^{j1}, ..., S_{i-m}^{jm})] := p
    opt := optNew
The above pseudocode consists of three nested loops, the outermost of which is responsible
for calculating the probability $P(S_i \mid S_{i-1}, S_{i-2}, \ldots, S_{i-m-1})$ for iteratively increasing $i$. The loop
thus effectively specifies a sliding window of size $m+1$, which is moved across the lattice in the
fashion previously described. For each position of the window all spin configurations are evaluated, using the associative data structure opt[] to obtain the probabilities of preceding window
configurations. These are referenced by the tuple $(S_i^{j_0}, S_{i-1}^{j_1}, \ldots, S_{i-m}^{j_m})$, which represents a window configuration. The algorithm is for the case without cyclic boundary conditions, therefore
the window is not required to precede position $i = m+1$; at this position, window configuration
probability is unconditional.
Adapting the algorithm for calculating ground state energy, where the statement

    p := P(S_i^{j0} | S_{i-1}^{j1}, ..., S_{i-m-1}^k) * opt[(S_{i-1}^{j1}, ..., S_{i-m-1}^k)]

becomes a summation of subsystem energies, the optimisation proceeds by determining ener-
getically minimal preceding window states for each position of the window on the system lat-
tice. In this form, the algorithm performs identically to the transfer matrix optimisation scheme
described in [15]. It follows that the described scheme must have equivalent computational
complexity. An analysis thereof confirms this assumption: given that the lattice consists of $n \times m$ spins,
the algorithm's execution time is proportional to
$$t(n,m) \propto (nm - m - 1)\left|2^{(S_1, S_2, \ldots, S_{m+2})}\right|,$$
where $2^{(S_1, S_2, \ldots, S_m)}$ is the set of configurations of tuple $(S_1, S_2, \ldots, S_m)$. Therefore,
$$t(n,m) \propto (nm - m - 1)\,2^{m+2} + 2^{m+1}, \quad \text{which is } O\!\left((nm - m - 1)\,2^{m+2}\right) = O\!\left(nm\,2^m\right).$$
Although not considered in further detail, the opportunity for further modification of this al-
gorithm presents itself, to account for cyclic boundary interactions within the spin lattice. This
entails invoking the algorithm for specified configurations of the spin tuple (S 1, S 2, . . . , S 1+m),
similar to the algorithm employing a row-wise lattice decomposition. This is conjectured to in-
crease the algorithmic complexity to $O(nm\,2^m\,2^m)$, since there are $O(2^m)$ possible configurations
of the specified spin tuple.
In the following chapter, parallelisation strategies are described for the harmony search
heuristic, the first-order Markov chain solution, and as an extension the aforementioned higher-
order modification.
Chapter 4
Parallelisation Strategies
To be of practical use, a computational solution to a given problem must be able to be implemented on a machine architecture, such that the algorithm completes within a reasonable
amount of time. While computational complexity provides a means of qualitatively evaluating
problem tractability, the properties of the machine determine the amount of time required for
solving a particular problem instance.
To reduce machine execution time, an approach applicable to physical architectures is to
increase the processing rate of machine instructions. This may be achieved in practice by in-
creasing the machine’s CPU clock rate, improving memory bandwidth, and augmenting the ar-
chitecture by additional features such as registers, caches and pipelining. In general terms, this
requires no conceptual modification to the algorithm, although the algorithm's performance is usually amenable to optimisation for the respective architecture.
The second approach to increasing machine performance involves parallelisation. Here, per-
formance is improved by distributing computation among a set of processing elements. With
the exception of algorithms with implicit parallelism in operations on data structures in combi-
nation with vector processing architectures, it is necessary to adapt the algorithm and devise a
scheme for achieving this distribution. For message passing architectures, this includes defining
explicit communication operations.
In the following, the potential for implementing parallel versions of harmony search and
dynamic programming methods is considered, with regard to MIMD architectures.
4.1 Harmony search
In the previous chapter, harmony search was described as a probabilistic algorithm employing
an evolutionary strategy for both discrete and continuous optimisation. As such, it performs
a heuristic evaluation of problem state space, i.e. search is non-exhaustive. Since improving
performance motivates parallelisation, it is necessary to examine the heuristic for the purpose of
defining performance relevant characteristics.

Figure 4.1: Using parallelism to improve heuristic performance. (a) No distribution (serial); (b) 'weak scaling'; (c) 'strong scaling'.
For any heuristic algorithm, on the one hand performance can be quantified by the search process' accuracy. The latter is influenced by the algorithm's state space traversal policy, significantly by the size of the search space. It follows that performance can be improved by enlarging
the search space, since in the limit of search space towards state space, solution optimality is
guaranteed.
On the other hand, it may be of interest to restrict the heuristic’s execution time, as previ-
ously described for the general class of halting algorithms. In this case, the task is to increase
the rate at which search is performed.
Using parallelism to improve either of these characteristics, it is apparent that distribution
of computation among processors bears similarity to the concepts of strong scaling and weak
scaling, commonly encountered in parallel performance analysis. Whereas weak scaling implies increasing the number of processing elements while keeping the problem size constant
(therefore varying the fraction of computation assigned to a processor), strong scaling increases
the problem size with the number of processors (therefore keeping the fraction of computation
assigned to a processor constant). Similarly, in the case of the heuristic, parallelism can either
be applied for the purpose of distributing a search space of constant size (weak scaling), or for
increasing the size of the search space (strong scaling). Using a tree model, an example of this
relationship is shown in Figure 4.1.
4.1.1 Harmony search performance
The evolutionary strategy used by harmony search for combinatorial optimisation consists of ini-
tial candidate generation, followed by iterative randomised candidate recombination (including
randomised mutation) and solution replacement. The algorithm is probabilistic, hence search
is a random walk, whose average length is influenced by the memory choosing rate (Figure
4.2(a)). Also, the number of solution vectors influences search, such that for NVECTORS=1,
the optimisation becomes greedy: This is because a single solution is retained, which is only
replaced when a solution of higher utility is found. For larger NVECTORS, i.e. maintaining a
Figure 4.3: Parallelisation strategies for population based heuristics. (a) Master-slave; (b) coarse-grained; (c) fine-grained.
4.1.2 Existing approaches
Parallelisation methods for metaheuristic algorithms were briefly mentioned in Chapter 3. These
are considered in more detail, in order to assess their potential adaptation for harmony search.
Cantu-Paz [14] provides an overview of parallelisation schemes for evolutionary algorithms.
Although these are discussed specifically in context of genetic algorithms, they are also appli-
cable to other evolutionary heuristics, such as those introduced by Koza for generating software
programs [5]. Cantu-Paz discerns between three classes of approach, known as global master-
slave, fine-grained and coarse-grained, respectively. These differ in the way the evolutionary process is distributed amongst processors and the extent to which solutions are communicated
amongst them.
Depicted schematically in Figure 4.3(a), the master-slave approach implements a single pop-
ulation; offspring are generated from potentially any parent solutions in the population (termed
panmixia). This is achieved by assigning the population to a single master processor, allowing
slave processors to access and modify individual solutions. Slave processors may be tasked with
evaluating solution fitness, whereas the master is responsible for selection and crossover. It is
possible to consider both a synchronous variant, where solutions are retrieved and modified in
discrete generations, and an asynchronous variant, where a slave may initiate a retrieval in ad-
vance of its peers. Either is suited for implementation on shared-memory or message passing
architectures; however, it is noted that the heterogeneous organisation of processes into master and slaves makes the approach generally less suitable for massively parallel architectures.
In the coarse-grained approach (Figure 4.3(b)), the evolutionary process is no longer pan-
mictic. The set of solutions which forms the population is partitioned among processors, so
that optimisation progresses primarily within semi-isolated ‘demes’ [14]. To allow evolution to
progress globally, demes exchange a proportion of their population with neighbours in a prede-
fined graph topology. This allows solutions of high utility to propagate across the graph, which
promotes convergence towards a common, global solution. On the other hand, the insularity
of subpopulations permits a high degree of diversity, allowing multiple local optima to be ap-
proached independently, thereby preventing early convergence. Previous work includes investi-
gations based on coarse-grained approaches, using both fixed toroidal or hypercubic topologies
and dynamic topologies . The distributed approach makes this technique particularly attractive
for implementation on message passing architectures.
The fine-grained approach, shown in Figure 4.3(c), is also based on distributing the solu-
tion population amongst processors. However in contrast, exchange of solutions occurs more
frequently during the evolutionary process: Instead of periodically initiating migration between
subpopulations, selection itself takes place between processor-assigned demes, which in the
most extreme case consists of a single solution. Depending on the specified network topology,
it may be practicable to select from all subpopulations within a certain vicinity from the initiat-
ing deme, which results in an overlapping selection scheme. Cantu-Paz notes that if this vicinity
is equal to the network diameter for all nodes, evolution regains panmixia. Suited for massively parallel architectures due to its scalability, this approach appears to be especially effective
because of its flexibility.
Aside from evolutionary algorithms, a potentially relevant approach to parallelising a heuris-
tic is presented by Ram et al. [55]. Here, the simulated annealing algorithm is executed indepen-
dently by multiple processors, where each initialises search with a random configuration. This
allows parallel exploration of the search space, in analogy to the effect achieved by executing an
evolutionary process such as genetic algorithms using disjoint subpopulations: since annealing
proceeds independently, the process executed by each processor potentially converges towards a
different local optimum. To counteract state space exploration, periodically the most promising
solution is determined and exchanged between processors. Akin to migrating solutions between
demes, this promotes global convergence towards a single solution. The number of algorithm
iterations required for convergence is hence reduced. In their implementation, Ram et al. em-
ploy a collective exchange scheme for communicating solutions between individual annealing
processes. However, the neighbourhood exchange scheme described by Cantu-Paz is equally
applicable.
4.1.3 Proposed parallelisation scheme
In the described approaches, parallelism is applied with the intention of enhancing the explo-
rative or exploitative properties of heuristics: Whereas the coarse-grained evolutionary approach
improves exploration alone through parallel selection, the remaining approaches include an el-
ement of parallel search exploitation, by propagating promising solutions in order to accelerate
solution convergence. The method used by Ram et al. can be viewed as a simplification of
the coarse-grained evolutionary approach, where the graph defining solution exchanges is fully
connected.
Having stated the motivation for parallelising harmony search, the opportunity is given to
apply the described approaches to this heuristic. Given that harmony search is an evolutionary
algorithm, distributed state space exploration and exploitation are readily adapted from parallel
genetic algorithms.
Figure 4.4 schematically depicts the proposed parallelisation scheme. Here, optimisation
takes place in distributed fashion, so that the heuristic is executed by multiple processors, each
assigned a set of solution vectors. To allow solutions to be exchanged between processors,
the latter are arranged in a ring. Periodically, processors send solutions to their successors,
while receiving these from predecessors. This reflects the behaviour of the aforementioned
fine-grained approach. In addition however, processors are organised into a twofold hierarchy,
where subordinate processors are not directly involved in cyclic exchange of solutions. Instead,
these exchange solutions using collective operations, based on the scheme described by Ram
et al. Subordinate processors are grouped in such a way that each subgroup includes a ‘ring
exchange’ processor. It follows that collective exchanges consider solutions obtained through
the cyclic exchange process.
Although the proposed scheme is comparatively involved, it allows the behaviour of the
heuristic to be altered by introducing a bias towards search space exploration or conversely
search space exploitation: If the size of subgroups is equal to the total number of processors,
communication is restricted to collective solution exchanges, so that rapid convergence is pro-
moted. In this case, effectively only a single subgroup exists. Providing that communication
occurs at short intervals to ensure that similar solution vectors are held in memory, it is specu-
lated that the algorithm will exhibit the described ‘weak scaling’ behaviour while increasing the
number of processors. On the other hand, for unit subgroup size, collective solution exchanges
are absent from the distributed search process. As a consequence, the ring-based approach is
reinstated. Here, the expectation is that the heuristic will emphasise explorative search, and
therefore exhibit ‘strong scaling’ behaviour when increasing the number of processors.
It is apparent that there are a multitude of parameters which influence parallel optimisation,
in addition to the memory choosing rate and number of solution vectors defined by serial har-
mony search. These include the total number of processors involved in search, and the size
of subgroups. Also, of significance is the rate at which solutions are exchanged, both for the
ring and collective subgroup operations. Finally, the latter two operations must be defined in
detail; these may for example involve selecting solutions at random, or communicating the most
promising solutions.

Figure 4.4: Harmony search parallelisation scheme (legend: processor, cyclic exchange, collective exchange)
The following describes a pseudocode prototype of a parallel harmony search algorithm for
obtaining Ising spin glass ground states, using the message passing model:

1  Solution[] solutions := initialise_random_solutions(NVECTORS);
2
3  for (i := 1; !has_converged(); i++) {
4      Solution solution = new Solution;
5
6      float highest_energy = compute_highest_energy(solutions);
7      int highest_energy_vector = compute_highest_energy_vector(solutions);
8
9      for (j := 1; j <= solution.length; j++) {
10         if (rand(0, 1) < MEMORY_CHOOSING_RATE) {
11             solution[j] := solutions[rand()][j];
12         } else {
13             solution[j] := random_spin();
14         }
15     }
16     if (spinglass_energy(solution) < highest_energy) {
17         solutions[highest_energy_vector] := solution;
18     }
19     if (PROCESSOR_ID mod ZONE_SIZE = 0) {
20         msg_send(solutions[rand()], (PROCESSOR_ID + ZONE_SIZE) mod N_PROCESSORS);
21         msg_rcv(rcv_solution);
22         copy_min(rcv_solution, solutions[rand()]);
23     }
24     if (i mod ZONEEXBLOCK = 0) {
25         reduce_min_zone(solutions[highest_energy_vector]);
26     }
27 }
As with serial harmony search, the algorithm consists of an iterative loop, whose purpose it
is to generate successive solutions and evaluate their utility. The proposed algorithm involves
terminating the loop when the most favourable configurations held by processes have identical
energies. Although a more obvious approach might involve a less stringent termination criterion,
it is thought that using this scheme, the number of iterations until termination provides a rea-
sonable means of evaluating solution exploitation. Within the loop, solutions with random spins
are generated, based on the configuration of existing solutions (lines 9–15), and replaced (lines
16–18). The constants NVECTORS and MEMORY_CHOOSING_RATE control the number of
retained solution vectors and the memory choosing rate, respectively. Following this, each loop
iteration contains communication instructions for processors involved in ring exchange of solu-
tions: Lines 20 and 21 swap random solution vectors between processors, following which the
function copy_min() on line 22 copies the value of the energetically more favourable argument
to its complementary argument. In this way, energetically favourable solutions are propagated
within a ring of search processes. There are (N_PROCESSORS / ZONE_SIZE) such processors
in the ring.
In addition, solutions are periodically exchanged between subgroups of processes, using the
collective operation reduce_min_zone(). This performs a reduction based on the most favourable
of argument solutions. As defined, the operation involves the highest energy solutions held by
each search process. The operation is executed at a rate determined by the constant ZONEEXBLOCK. Subgroup size is influenced by the value of constant ZONE_SIZE. When equal
to N_PROCESSORS, there exists a single group for which collective operations are defined,
whereas ring communications are without effect. Conversely, for unit ZONE_SIZE all processes
are involved in ring communications, whereas collective operations are without effect.
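To make the two communication patterns concrete, the following hedged C/MPI sketch shows one possible realisation of the ring exchange and of the zone-wide reduction; the constants, the zone communicator (which could be obtained with MPI_Comm_split) and the function names are illustrative, and the project's actual MPI implementation (Chapter 6) may differ.

#include <mpi.h>

#define N_SPINS   64
#define ZONE_SIZE 4

/* Ring exchange (cf. lines 19-23 of the listing): intended to be called only
   by the ring processes, i.e. those with rank % ZONE_SIZE == 0.              */
static void ring_exchange(int *sol, double *energy, int rank, int nprocs,
                          MPI_Comm comm)
{
    int next = (rank + ZONE_SIZE) % nprocs;
    int prev = (rank - ZONE_SIZE + nprocs) % nprocs;
    int recv_sol[N_SPINS];
    double recv_e;

    MPI_Sendrecv(sol, N_SPINS, MPI_INT, next, 0,
                 recv_sol, N_SPINS, MPI_INT, prev, 0, comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(energy, 1, MPI_DOUBLE, next, 1,
                 &recv_e, 1, MPI_DOUBLE, prev, 1, comm, MPI_STATUS_IGNORE);

    if (recv_e < *energy) {                 /* copy_min(): keep the better solution */
        for (int i = 0; i < N_SPINS; i++) sol[i] = recv_sol[i];
        *energy = recv_e;
    }
}

/* Zone-wide reduction (cf. reduce_min_zone on line 25): the process holding
   the lowest-energy argument broadcasts it to the rest of its zone.          */
static void zone_reduce(int *sol, double *energy, MPI_Comm zone_comm)
{
    struct { double e; int rank; } in, out;
    MPI_Comm_rank(zone_comm, &in.rank);
    in.e = *energy;
    MPI_Allreduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MINLOC, zone_comm);
    MPI_Bcast(sol, N_SPINS, MPI_INT, out.rank, zone_comm);
    *energy = out.e;
}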
4.2 Dynamic programming approaches
In the previous chapter, exact solutions to the ground state problem were presented, based on
modelling spin interactions as Markov chains. The latter in turn were used to arrive at dynamic
programming formulations of the respective optimisation problems. Run-time complexities are
lower than the $2^{nm}$ bound required for finding the ground states of the $n \times m$ spin lattice using
brute force, nevertheless they are high enough to merit investigating parallelisation strategies.
4.2.1 First-order Markov chain approach
Parallelisation is based on an approach by Grama et al. [30], where a dynamic programming
problem which is serial and monadic is decomposed into a tabular arrangement of solutions
to subproblems of increasing size. The order of operations required to solve the problem is
equivalent to the order of individual scalar multiplications and additions required for a series
of matrix / vector multiplications. The parallelisation approach is therefore given by parallel
matrix / vector multiplication, which is well studied.
A dynamic programming problem is monadic if its optimisation equation contains a single
recursive term. That is, given the function $c = g(f(x_1), f(x_2), \ldots, f(x_n))$, which assigns a cost
to the solution constructed from subproblems $x_1, x_2, \ldots, x_n$, monadicity exists when $g$ is defined
as $f(j) \otimes a(j, x)$, where $\otimes$ is an associative operator. In this form, each solution depends on a
single subproblem.
Furthermore, a dynamic programming problem is serial, if there are no cycles in the graph
of dependencies between subproblems. More formally, the graph $G = (V, E)$ is defined by the
set of nodes $V$, where each node represents a subproblem. An edge between nodes exists, if the
optimisation equation contains a recursive term indicating a dependency between subproblems.
Examining the optimisation equation for lattice ground state energy (without cyclic boundary conditions),
$$H_{\min}(S_i) = \begin{cases} \min_{S_i} \{H(S_i)\} & i = 1 \\ \min_{S_{i-1}} \{H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_{i-1})\} & i > 1, \end{cases}$$
it is apparent that the equation is monadic. To establish existence of the serial property, the
graph of subproblem dependencies is visualised (Figure 4.5(a)). As depicted, rows of nodes
represent states of subsystems S i, which characterise the values of subproblems. Since there
are $n$ subsystems, there are $n \cdot 2^{|S_1|}$ nodes in the graph. Since a subproblem may assume as many
values as there are values of its preceding dependency, the graph has a trellis-like structure con-
sisting of bipartite graph segments. Because this organisation into individual levels is acyclic,
the dynamic programming problem is serial.
The graph is modified to include information on system energy. Given the pair of nodes
associated with subsystem configurations $S_i^k$, $S_{i-1}^l$, define the weight function
$$w(S_i^k, S_{i-1}^l) = w_i^{k,l} = H(S_i^k) + H_b(S_i^k, S_{i-1}^l),$$
for $1 < i \le n$. Further define an additional node $\alpha$, such that the set of graph edges is extended to
$E = E \cup \{(\alpha, S_1^k) \mid 1 \le k \le q\}$ for $q$ subsystem configurations. For $i = 1$, the weight function is
defined as $w(\alpha, S_1^k) = H(S_1^k)$. Minimising system energy is then equivalent to obtaining
$\min_k p(\alpha, S_n^k)$, where $p(\alpha, S_n^k)$ is the minimum path between nodes $\alpha$ and $S_n^k$.
Figure 4.5: Graph of subproblem dependencies for an n = 3, m = 2 spin problem. (a) First-order; (b) higher-order.
A further observation is that the minimum paths $p(\alpha, S_i^k)$, $1 \le k \le q$, are expressed as
$$\begin{aligned}
p(\alpha, S_i^1) &= \min\left\{ w_i^{1,1} + p(\alpha, S_{i-1}^1),\; w_i^{1,2} + p(\alpha, S_{i-1}^2),\; \ldots,\; w_i^{1,q} + p(\alpha, S_{i-1}^q) \right\}, \\
p(\alpha, S_i^2) &= \min\left\{ w_i^{2,1} + p(\alpha, S_{i-1}^1),\; w_i^{2,2} + p(\alpha, S_{i-1}^2),\; \ldots,\; w_i^{2,q} + p(\alpha, S_{i-1}^q) \right\}, \\
&\;\;\vdots \\
p(\alpha, S_i^q) &= \min\left\{ w_i^{q,1} + p(\alpha, S_{i-1}^1),\; w_i^{q,2} + p(\alpha, S_{i-1}^2),\; \ldots,\; w_i^{q,q} + p(\alpha, S_{i-1}^q) \right\},
\end{aligned}$$
for $i > 1$. For $i = 1$, $p(\alpha, S_i^k) = w(\alpha, S_i^k)$. In an analogy to matrix / vector multiplication,
where addition is substituted by minimisation and multiplication is substituted by addition, the
equations are equivalent to
$$p_i = M_{i,i-1} \times p_{i-1},$$
where $p_i = [\,p(\alpha, S_i^1)\;\; p(\alpha, S_i^2)\;\; \ldots\;\; p(\alpha, S_i^q)\,]^T$. For $i > 1$, the matrix is defined as
$$M_{i,i-1} = \begin{pmatrix}
w_i^{1,1} & w_i^{1,2} & \cdots & w_i^{1,q} \\
w_i^{2,1} & w_i^{2,2} & \cdots & w_i^{2,q} \\
\vdots & \vdots & \ddots & \vdots \\
w_i^{q,1} & w_i^{q,2} & \cdots & w_i^{q,q}
\end{pmatrix},$$
otherwise
$$M_{i,i-1} = \begin{pmatrix}
w(\alpha, S_i^1) & w(\alpha, S_i^1) & \cdots & w(\alpha, S_i^1) \\
w(\alpha, S_i^2) & w(\alpha, S_i^2) & \cdots & w(\alpha, S_i^2) \\
\vdots & \vdots & \ddots & \vdots \\
w(\alpha, S_i^q) & w(\alpha, S_i^q) & \cdots & w(\alpha, S_i^q)
\end{pmatrix}.$$
Figure 4.6: Parallel matrix operations. Numerals indicate order of vector elements. (a) Basic; (b) improved.
Using a sequence of $n$ matrix / vector operations, it is now possible to compute minimum paths
$p(\alpha, S_i^k)$, by initialising $p_0$ to a $q$-component zero vector: the first operation $M_{1,0} \times p_0$ yields
minimum paths $p(\alpha, S_1^k)$ for $1 \le k \le q$. Retaining the value of the resulting vector as the
argument for the next matrix / vector operation, minimum paths $p(\alpha, S_2^k)$ for $1 \le k \le q$ are
computed. The process is continued, until minimum paths $p(\alpha, S_n^k)$ have been computed. The
minimum vector component then corresponds to ground state energy.
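The substitution of operators can be made concrete with a few lines of C: the sketch below computes one step $p_i = M_{i,i-1} \times p_{i-1}$ of the modified product, with multiplication replaced by addition and addition by minimisation; the fixed dimension q = 4 is purely illustrative.

#define Q 4

/* (min, +) matrix/vector product: out[k] = min_l { M[k][l] + p[l] } */
static void minplus_matvec(const double M[Q][Q], const double p[Q], double out[Q])
{
    for (int k = 0; k < Q; k++) {
        double best = 1e300;
        for (int l = 0; l < Q; l++)
            if (M[k][l] + p[l] < best) best = M[k][l] + p[l];
        out[k] = best;
    }
}

Applying minplus_matvec() n times, starting from a zero vector, reproduces the sequence of operations described above.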
Matrix operation parallelisation
A simple approach to parallelising the matrix / vector operation is shown in Figure 4.6(a). Here,
the matrix is distributed in such a way that each processor stores the values of $q/p$ rows, where $p$
is the number of processors. Each is responsible for computing the same fraction of components
of the resulting vector. It follows that the latter is assembled from partial results computed by
each processor. In the message passing model, this can be achieved using a gather operation.
For the required purpose, it is necessary for each processor to access all components of the
resulting vector subsequently. Therefore, it is practical to gather collectively. The algorithm is
described in the following pseudocode, where $M_{k,l}^{i,i-1}$ denotes the component in row $1 \le k \le q$,
column $1 \le l \le q$ of matrix $M_{i,i-1}$:
Float[] p
Float[] p'
for k := (proc_id * q/p + 1) to ((proc_id + 1) * q/p)
    Float minval := ∞
    for l := 1 to q
        if p[l] + M[i,i-1][k,l] < minval
            minval := p[l] + M[i,i-1][k,l]
    p'[k] := minval
all_gather(p', p)
In the pseudocode, the outer loop is responsible for iterating through matrix rows. For each row,
elements are added to vector components stored in p. The minimum sum becomes a component
of the vector p'. Matrix rows are assigned to processors based on the processor identifier proc_id, whose value is in the range [0, number of processors). The computation concludes with the
collective operation all_gather().
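A hedged C/MPI sketch of this row-distributed step is given below; it assumes, for simplicity, that q is divisible by the number of processors and that the full matrix is available on every process, which need not match the project implementation.

#include <mpi.h>

#define Q 8                                    /* 2^m row configurations        */

static void parallel_step(const double M[Q][Q], double p_vec[Q], MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    int rows = Q / nprocs;                     /* q/p rows per processor        */
    double part[Q];                            /* local slice of the result     */

    for (int k = rank * rows; k < (rank + 1) * rows; k++) {
        double best = 1e300;
        for (int l = 0; l < Q; l++)            /* (min, +) over one matrix row  */
            if (p_vec[l] + M[k][l] < best) best = p_vec[l] + M[k][l];
        part[k - rank * rows] = best;
    }
    /* assemble the full result on every processor for the next iteration       */
    MPI_Allgather(part, rows, MPI_DOUBLE, p_vec, rows, MPI_DOUBLE, comm);
}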
Examining the algorithm's computational complexity, it can be seen that execution time is
$t(q) \propto \frac{q}{p}\,q$. Since determining ground state energy requires $n$ iterations of the algorithm, where
$n$ is the number of rows in the spin lattice, total execution time is $t(n,q) \propto \frac{n q^2}{p}$. Considering that
the lattice contains $m = \log_2(q)$ spin columns, execution time expressed in terms of lattice size
is $O\!\left(\frac{n}{p}\,2^{2m}\right)$, which is cost optimal in comparison to the serial algorithm presented in Chapter 3.
Memory efficient matrix / vector computation
Alternatively, it is possible to perform the desired matrix / vector computation using a parallel
algorithm with reduced memory requirements for the vectors p, p'. In resemblance to Cannon's
algorithm [13], it can be observed that although all processors access the vector p in its entirety, individual components need not be accessed simultaneously, as in the approach described above.
Instead, the vector can be distributed between processors, so that each holds $q/p$ components.
Computation commences with each processor performing additions of matrix elements associated with its allocated vector components. After the latter have been processed, all processors perform a cyclic shift of vector components, which allows the minimisation operation to
progress further. This procedure is repeated until processors have completed the minimisation
operation on their assigned rows. The approach is illustrated in Figure 4.6(b), for which the
modified pseudocode is:
Float[] p
Float[] p'
for k := (proc_id * q/p + 1) to ((proc_id + 1) * q/p)
    Float minval := ∞
    for l := 1 to q
        if (l mod q/p) = 1
            cyclic_shift(p)
        if p[(l-1) mod q/p + 1] + M[i,i-1][k,l] < minval
            minval := p[(l-1) mod q/p + 1] + M[i,i-1][k,l]
    p'[(k-1) mod q/p + 1] := minval
Here, the previously defined loop has been adapted to index the components of the distributed
vectors. Since the result vector p' becomes an operand in successive iterations of the algorithm,
performing a collective operation on p' is not necessary; this vector is thus distributed identically
to p.
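Again purely as a hedged sketch, the cyclic-shift variant might look as follows in C with MPI, assuming q divisible by p and the full matrix available locally; p_loc and out_loc each hold the q/p components owned by the calling process.

#include <mpi.h>

#define Q 8

static void parallel_step_shift(const double M[Q][Q], double p_loc[],
                                double out_loc[], MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    int chunk = Q / nprocs;                    /* q/p components per processor  */
    int next  = (rank + 1) % nprocs;
    int prev  = (rank - 1 + nprocs) % nprocs;

    for (int k = 0; k < chunk; k++) out_loc[k] = 1e300;

    for (int step = 0; step < nprocs; step++) {
        /* p_loc currently holds the components owned by (rank + step) mod p    */
        int owner = (rank + step) % nprocs;
        for (int k = 0; k < chunk; k++) {      /* this processor's matrix rows  */
            int row = rank * chunk + k;
            for (int l = 0; l < chunk; l++) {
                int col = owner * chunk + l;
                double v = p_loc[l] + M[row][col];
                if (v < out_loc[k]) out_loc[k] = v;
            }
        }
        /* rotate the operand slice to the next processor in the ring           */
        MPI_Sendrecv_replace(p_loc, chunk, MPI_DOUBLE, prev, 0, next, 0,
                             comm, MPI_STATUS_IGNORE);
    }
}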
In Chapter 3, a serial algorithm was presented for the ground state energy of the lattice
with cyclic boundary conditions. This involved evaluating the boundaryless ground state en-
ergy $H_{\min}$ for all configurations of boundary subsystems $S_1, S_n$. To adapt the parallel matrix
algorithm for this problem, define the weight function between nodes $\alpha, S_1^k$ as $w(\alpha, S_1^k) = H(S_1^k) + H_b(S_1^k, S_n^l)$, for boundary subsystem configuration $S_n^l$. The ground state energy can
then be obtained by performing the described series of matrix operations for all configurations
of subsystem $S_n$. For each configuration $S_n^k$, the final result vector contains the minimum path
lengths $p_n = [\,p(\alpha, S_n^1)\; \ldots\; p(\alpha, S_n^k)\; \ldots\; p(\alpha, S_n^q)\,]^T$, of which the relevant component is retained.
The ground state energy is the minimum of these retained components. The complexity of the
entire computation is $O\!\left(\frac{n}{p}\,2^{3m}\right)$, executed on $p$ processors, for an $n$-row, $m$-column lattice. In
comparison to the serial algorithm, this is cost optimal.
4.2.2 Higher-order Markov chain approach
It remains to develop a parallel solution to the approach based on the higher-order Markov chain.
For this model, it was formulated that ground state probability is
$$P_{\text{viterbi}}(S_i, S_{i-1}, \ldots, S_{i-m}) = \begin{cases} P(S_i, S_{i-1}, \ldots, S_{i-m}) & i \le m \\ \max_{S_{i-m-1}} \left\{ P(S_i \mid S_{i-1}, \ldots, S_{i-m-1})\, P_{\text{viterbi}}(S_{i-1}, \ldots, S_{i-m-1}) \right\} & i > m, \end{cases}$$
where $m$ is the number of lattice columns. By the relation between state probability and energy,
in analogy to the approach based on row-wise lattice decomposition shown in Chapter 3, it was
shown that
$$H_{\min}(S_i, S_{i-1}, \ldots, S_{i-m}) = \begin{cases} H(S_i, S_{i-1}, \ldots, S_{i-m}) & i \le m \\ \min_{S_{i-m-1}} \left\{ H_b(S_i, (S_{i-1}, \ldots, S_{i-m-1})) + H_{\min}(S_{i-1}, \ldots, S_{i-m-1}) \right\} & i > m, \end{cases}$$
where $H(S_i, S_{i-1}, \ldots, S_{i-m})$ is the energy of the ordered set of subsystems $(S_i, S_{i-1}, \ldots, S_{i-m})$
and $H_b(S_i, (S_{i-1}, \ldots, S_{i-m-1}))$ is the interaction energy between system $S_i$ and the ordered set
$(S_{i-1}, \ldots, S_{i-m-1})$. Examining this optimisation equation, it can be seen that it is monadic,
since it contains a single recursive term. As each level of recursion effects a unit decrease
of indices of the tuple (S i, S i−1, . . . , S i−m), there are no cyclic dependencies between subprob-
lems. The dynamic programming formulation is therefore also serial. Considering this sim-
ilarity, the opportunity is given to adapt the parallel matrix based computation to solve this
dynamic programming problem. To achieve this, the weighted graph of subproblems is re-established, with an edge connecting two nodes if the recursive formulation indicates depen-
dency. For an $n \times m$ spin lattice, there are $(n-1)\,m\,2^m$ nodes in the graph, because each tuple
$(S_i, S_{i-1}, \ldots, S_{i-m})$ has $2^m$ configurations and a solution is constructed from $(n-1)\,m$ subproblems. A given subproblem corresponds to a certain position of the sliding window on the lattice, as described in Chapter 3. The function $w((S_i, S_{i-1}, \ldots, S_{i-m}), (S_{i-1}, S_{i-2}, \ldots, S_{i-m-1})) = H_b(S_i, (S_{i-1}, \ldots, S_{i-m-1}))$, defined for $i > m$, describes the weight of an edge. As before,
the graph is extended with an additional node $\alpha$, so that the set of edges is defined as $E = E \cup \{(\alpha, (S_1, S_2, \ldots, S_{m+1})) \mid \text{for all configurations of } (S_1, \ldots, S_{m+1})\}$. For $i \le m$, define the
weight function $w(\alpha, (S_i, S_{i-1}, \ldots, S_{i-m})) = H(S_i, S_{i-1}, \ldots, S_{i-m})$. This results in a trellis-like graph, shown in Figure 4.5(b). Minimising system energy is equivalent to obtaining
$$\min_{(S_{nm}, S_{nm-1}, \ldots, S_{nm-m})} \left\{ p\left(\alpha, (S_{nm}, S_{nm-1}, \ldots, S_{nm-m})\right) \right\},$$
where the function p is the minimum path between two nodes in the graph.
Previously, matrices of edge weights between trellis segments were used to compute min-
imum paths, for which the parallel matrix operation was presented. From the optimisation
equation and Figure 4.5(b), it is observed that each node at a given level is connected to
only two nodes at the preceding level. This is because there are two configurations of tuple
(S i−1, S i−2, . . . , S i−m−1) for any specified tuple (S i, S i−1, . . . , S i−m). Assigning infinite weights
to unconnected nodes between trellis levels, it follows that the matrices are sparse, with regard
to infinite valued elements.
Providing matrix sparseness can be exploited, an adaptation of the existing parallel algorithm will execute in $t(n,m) \propto (n-1)\,m\,\frac{1}{p}\,2^m$ time on $p$ processors, since each matrix contains
$2^m$ rows distributed between processors. With a total of $(n-1)\,m$ matrix operations, the ground
state energy of the lattice without cyclic boundary conditions can be obtained in $O\!\left(\frac{nm}{p}\,2^m\right)$ time.
This is cost optimal in comparison to the serial algorithm described in Chapter 3. Using bit
string representations of spin tuples in combination with shift operations, an approach which
considers matrix sparseness is described in Chapter 6.
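Although the Chapter 6 representation is not reproduced here, the following sketch illustrates the kind of bit-string manipulation alluded to: a window of m+1 spins is packed into an (m+1)-bit integer, and its two possible predecessor windows are obtained with a single shift, which is what makes the sparse structure of the matrices straightforward to exploit. The bit layout chosen here is an assumption.

#include <stdio.h>

#define M 4                                   /* lattice columns               */

/* bit j of w holds spin S_{i-j}; the predecessor window drops S_i and gains
   a free spin S_{i-m-1} in the top position                                   */
static unsigned predecessor(unsigned w, unsigned new_spin)
{
    return (w >> 1) | (new_spin << M);
}

int main(void)
{
    unsigned w = 0x15 & ((1u << (M + 1)) - 1);    /* example window 10101b     */
    printf("window %#x -> predecessors %#x and %#x\n",
           w, predecessor(w, 0), predecessor(w, 1));
    return 0;
}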
Chapter 5
The Project
In previous chapters, the theoretical background to the ground state optimisation problem was described. Having described the two approaches identified for solving this problem, this chapter
deals with undertaken practical work towards their implementation and evaluation.
5.1 Project description
The purpose of the project is to conduct practical investigation into parallel algorithms for de-
termining ground states of the Ising spin glass. Specifically, the project deals with the two-
dimensional Edwards-Anderson model, i.e. the Ising model with lattice aligned spins, in which
spins are able to assume two discrete states.

Investigations deal with a method for obtaining spin glass ground states exactly. The method
is based on the transfer matrix method, in which the statistical-mechanical properties of the lat-
tice system are used to obtain solutions. It follows that one project objective is to develop a
parallel algorithm based on the Transfer Matrix method. As an additional objective, the project
includes investigating an alternative parallel algorithm, with which solutions to the ground state
problem are obtained heuristically. The performance of both parallel algorithms is to be evalu-
ated; in the case of the heuristic this entails evaluating solution accuracy.
Investigation requires that algorithms are developed in software. The software should be
self-contained: From the user's perspective, the software should offer sufficient functionality to be useful as a research tool, allowing various types of problem instance to be solved using the
implemented algorithms. The software should be able to be executed on a wide range of MIMD
multiprocessing architectures.
multiprocessing architectures.
5.1.1 Available resources
There are two computing resources available for the project. The first of these, Ness, is a shared
memory multiprocessor system [2]. It has a total of 32 back-end processors, which are parti-
tioned into two interconnected groups. This configuration allows a single job to request 16 pro-
cessors at maximum. The system is constructed from AMD 64-bit Opteron processors, which
have a clock frequency of 2.6GHz. Jobs are submitted to the back-end from a dual processor
front-end, which executes the Sun Grid Engine scheduling system. The back-end has 32 × 2GB
of RAM. The system is based on the Linux operating system, providing Fortran, C and Java
programming environments. Both shared memory and message passing model programming
are supported, using the MPI and OpenMP programming interfaces. Ness does not implement
a budget system for CPU time, however access to queues is restricted according to the amount
of requested computation time.
Also available is the supercomputing resource HPCx [3]. This consists of a cluster of IBM
P575 shared memory nodes, each containing 16 processors and 32GB of RAM. For executing
jobs, the system consists of 160 compute nodes. Nodes are constructed from Power5 proces-
sors, which have a clock frequency of 1.5GHz. The processor architecture allows for 6.0 Gflop/s
theoretical peak performance. Inter-node communication is supported using IBM High Perfor-
mance Switch interconnects. These provide a maximum unidirectional inter-node bandwidth
of 2GB/s, at MPI latencies of 4–6 µs [24]. Based on the AIX operating system, the serial and
parallel programming environments are similar to those provided on Ness. The job scheduler,
LoadLeveler, provides queues for serial and parallel jobs, using a budget system for CPU time.
5.2 Project preparation
Before commencing the project, an initial phase was designated to project preparation. This
consisted of investigating the problem background and defining the project's aims. Potential
approaches to solving the spin glass problem were identified and implemented as prototype soft-
ware. Project process activities were carried out, consisting of a risk analysis and scheduling. A
software development model was decided upon.
5.2.1 Initial investigations
Access to an existing serial transfer matrix code was provided before commencing the project
preparation phase. The potential was given for a code level analysis of parallelism; this approach
was considered an alternative to basing an implementation on the mathematical formulation of
the optimisation problem, which was subsequently undertaken. With a view to implementing
the parallel approach described by Grama et al. [30], initial work consisted of investigating the
exact optimisation technique described in Chapter 3.
The harmony search algorithm was identified as a potential secondary approach to com-
pare to the envisaged exact ground state solver. After initialising a CVS repository for project
source code and experiment data, a serial implementation of the heuristic was evaluated, in
order to assess the algorithm’s suitability for further parallelisation. The evaluation consisted
Figure 5.1: Spin glass structure design (spinglass.h) — fields: int xSize, int ySize, double[] weights, Spin[] initialSpins, Spin[] spins, boolean[] clamps
of determining solution accuracy, based on ground states obtained for a collection of random
spin glasses, using an implementation of a brute force algorithm. As discussed in Chapter 7, results
suggest that solution accuracy might be increased using a parallel implementation of the
algorithm.
5.2.2 Design and implementation
A basic software framework was developed, to facilitate the collation of performance data. This
framework consisted of a set of utilities, implementing rudimentary functionality for creating
spin glass problem instances and evaluating their energy. Based on this, a design for a more
extensive framework was created, based on the following list of client operations on a spin glass
API:
• Initialisation of spin lattices with specific boundary conditions
• Destruction of spin lattices
• Calculation of system energy
• Bond randomisation
Also, a spin glass data structure was designed. Shown in Figure 5.1, this consists of instance
variables for storing the height and width of the spin lattice. The values of spins themselves are
stored in an associative array-like data structure, as are the values of coupling constants. The
former are stored two-dimensionally in row major fashion, while the latter require an additional
dimension. In the design, two 2-dimensional arrays store vertical and horizontal bonds, again
using a row major storage scheme. To record whether a spin is clamped to a specific state, the
data structure includes a further array. Finally, the initial values of spins are stored. This stores
the actual state to which a spin is clamped, allowing the primary spin array to be reserved for
computation.
A schema of the framework is shown in Figure 5.2. This includes an interface for per-
forming input / output operations: It allows representations of coupling constants to be read from
Figure 5.2: Software framework design — the SpinGlass API (spinGlass_new(), spinGlass_remove(), spinGlass_energy()), the IO interface (readBonds(), writeBonds(), readClamps(), writeClamps()), the writebonds and writeclamps utilities, and a transferMatrixSolver implementing the Solver interface
files, similarly a function allows the clamping state of spins to be read. These operations are
complemented by functionality for writing representations to file.
The IO operations are required by the two utilities writebonds and writeclamps, which fa-
cilitate creating spin glass problem instances. These are responsible for writing data to files,
which are subsequently read by solver utilities. The format of clamping state files is specified as
a UNIX UTF-8 encoded text file, containing the symbols ‘1’ and ‘0’. These provide a represen-
tation of whether a spin is clamped, such that a string encodes the state of a lattice row. Strings
consist of the aforementioned symbols, separated by whitespace. Spin clamps are stored in the
file as consecutive strings, separated by line feed characters. The file format for spin coupling
constants is similar: Here, symbols are floating point numbers in decimal notation, again sep-
arated by whitespace and line feed characters. The format reflects the design of the spin glass
data structure, in that two consecutive blocks retain values of vertical and horizontal bonds. The
format specifies that these blocks are separated by a single blank line.
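For illustration, a clamp file for a hypothetical 2 × 3 lattice in which only the third spin of the first row is clamped could read

0 0 1
0 0 0

while a corresponding bond file would contain two blocks of floating point values separated by a blank line, the first holding vertical and the second horizontal coupling constants. The values below are arbitrary, and the exact number of values per block depends on the lattice size and boundary conditions:

0.5 -1.0 0.25
1.0 0.75 -0.5

-1.0 0.3
0.8 -0.2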
Figure 5.2 also shows the design of the spin glass API. This exports functionality to client
solvers, which themselves implement a simple interface for solving spin glass instances. A
solver uses the IO interface to construct a spin glass instance from bond and clamp state files.
Thereafter, it invokes its implementation of a ground state algorithm. The latter utilises further
API operations, to evaluate spin glass energy. Finally, the spin glass instance is destroyed, after
an output of the determined solution has been generated.
5.2.3 Implementation language and tools
During the course of software design, the choice of implementation language and tools was
considered. The C language was selected due to its general widespread use as a development
language on high performance systems, and availability of compilers both on the two computa-
tion resources and development machines. To ensure portability, ANSI C 89 was selected as the
implementation standard.
To expedite software development, it was decided to implement the software using the GLib
library [1]. This is a cross-platform collection of utility functions which implement general
purpose data structures, parsers etc. Macros and type definitions are provided, which potentially
reduce the amount of required pointer casts in a code. This in turn has an impact on cast errors
and debugging time.
A build management system, the GNU autotools suite, was also selected. Widely used in conjunction with the C and
C++ programming languages on UNIX based systems, it allows makefiles to be generated
semi-automatically and configured for different target systems. This was considered useful for
providing an application package for a variety of systems.
Given that one of the available computing resources, HPCx, is a clustered system, the MPI
message passing library was chosen for parallel development. For this reason, the algorithms
described in Chapter 4 are given for the message passing model. Although a hybrid shared
memory / message passing approach using MPI and, for example, OpenMP would have been possible,
this was considered beyond the scope of the project.
5.2.4 Choice of development model
For the choice of software development model, multiple factors were taken into account. These
included the amount of time available, the required functionality and overall software complex-
ity.
Intuitively, implementation can be realised in two phases, each relating to one of the two
algorithms. From previous experience and design requirements, it was assumed that each of
the implementation tasks would involve a relatively small amount of written code. Instead, im-
plementation eff ort was assumed to focus on distribution of data, communication patterns and
algorithm correctness. Therefore, it was thought that the approach of applying staged delivery to
each phase would be advantageous to the project. Following the design of the framework’s over-
all architecture with multiple ground state solvers, this approach involves discrete design / imple-
mentation / testing activities associated with one release for each ground state solver. Developing
each ground state solver is associated with iteratively augmenting software functionality.
5.2.5 Project schedule
The devised project schedule is shown in Appendix A. Based on an available time frame of 16
weeks, the schedule accounts for all project deliverables, implementation goals and exploratory
aims. Therefore both a practical component, consisting of software development and evaluation,
and the project report and presentation are included.
Risk                          | Type                     | Impact   | Likelihood | Action
Data loss                     | Schedule                 | High     | Low        | Avoid
Lack of time                  | Schedule, Scope          | High     | Moderate   | Reduce
Unavailable testing resources | Schedule, Quality, Scope | High     | Low        | Avoid
Algorithmic complexity        | Scope, Schedule          | Moderate | Moderate   | Avoid

Table 5.1: Identified project risks
The practical component is split into two distinct phases. Each of these corresponds to the
development and evaluation of the dynamic programming and harmony search based ground
state solvers. A development / evaluation iteration is comprised of tasks for designing, imple-
menting, debugging and testing software, before gathering performance data. Following devel-
opment and evaluation, tasks are specified for producing the report and presentation. A single
week is left unallocated for making amendments to the produced work.
The implementation, debugging and testing tasks required for software development are
scheduled in parallel, as it was thought that this best reflects the nature of the chosen develop-
ment model, where functionality is integrated iteratively. Evaluation tasks are interleaved with
software development, so as to minimise the eff ects of unavailable resources, should these have
occurred.
5.2.6 Risk analysis
To assess the chance of the project’s successful completion, potentially detrimental factors were
considered. Such factors include those aff ecting the project plan and scheduling, software qual-
ity and software scope. Table 5.1 lists risks identified during project preparation by type, esti-
mated impact, likelihood of occurring and proposed action.
Judging from the product of impact and likelihood of occurrence, the most significant risk is
lack of time. As the time frame for completing the project and required deliverables was short,
this was conceivable. To counteract this, care was taken to define project goals rigorously to
avoid feature creep; furthermore, all tasks were scheduled within a 15 week time frame, allowing
for a further week as float time.
The remaining risks were avoided by ensuring sufficient computing time on parallel ma-
chines (pertaining to unavailable resources), backups and software version control (pertaining
to data loss) and sufficient background research (pertaining to sophistication of algorithms). As
a fallback action in the event of not being able to implement the researched transfer matrix
scheme, the possibility of performing a code level analysis of an existing serial transfer ma-
trix solver code was given. As a caveat, this approach would have offered less insight into the
underpinnings of parallelism in the transfer matrix method.
5.2.7 Changes to project schedule
A number of changes were made to the project schedule. These concerned both the order of
scheduled tasks and their estimated duration.
Most significantly, developing the parallel harmony search solver proved to require less
time than envisaged in the project schedule; it claimed only two schedule weeks in comparison
to the four weeks assigned during preparation. As a result, it was possible to implement a more
advanced exact parallel solver, as previously described.
Also, the original decision to designate performance evaluation to a single task for each of
the two solver types proved impractical. Instead, data were gathered separately for each comput-
ing resource, with subtasks for each variant of the exact solver. Separating evaluation between
machines was prompted by the fact that implementing experiments on HPCx was delayed due to
compilation issues with the required version of the GLib library.
Furthermore, after devising the original project schedule, the communicated date for the
presentation proved to be after the date for the remaining deliverables. The time gained was
allocated to completing the project report.
5.2.8 Overview of project tasks
The following provides a description of tasks undertaken during the project, as an account of
the extent to which the project schedule was adhered to.
In weeks 1 and 2, the ideas presented in Chapter 3 were developed as a basic serial exact
ground state solver code. The parallelisation method using collective operations, discussed in
Chapter 4 was also implemented. In both cases, the algorithms were based on the spin lattice
without boundary conditions.
In week 2, timing data were collected for the previously implemented serial solver. In addi-
tion, scaling data for the parallel solver were collected on the Ness computing resource. Work
commenced on implementing the improved parallel ground state solver using cyclic commu-
nication patterns, also described in Chapter 4. The improved parallel ground state solver was
completed in week 3. In week 4, further scaling performance data were collected on Ness for
this code. Remaining time in week 4 was used to conduct a code review, based on the entirety
of implemented software.
In week 5, work commenced on developing the harmony search ground state solver. Both
serial and parallel code was completed in week 6, during which the dynamic programming
code was modified to support solving systems with cyclic boundary conditions. In week 6,
performance data for the dynamic programming code were collected on the HPCx machine.
In week 7, further performance data were gathered on HPCx. This was to evaluate the
dynamic programming code with cyclic communication patterns. Also, routines were developed
for evaluating harmony search performance, which was subsequently evaluated in week 8.
In weeks 9 and 10 a further modification to the exactly solving dynamic programming ap-
proach was implemented, based on the higher-order Markov chain theory described in Chapter
3. This was for the spin glass model without cyclic boundary conditions. In week 10, perfor-
mance data were gathered for this algorithm. The remaining time was used to complete the project report and perform a final revision of
all deliverables.
Chapter 6
Software Implementation
6.1 Introduction
The implemented software is a framework for experimenting with two-dimensional lattice spin
glass ground state problems. It consists of utilities which assist with generating spin glass in-
stances, which may be subsequently solved using either exact or heuristic based solver utilities.
The latter provide information on both the energy and spin configuration of ground states. While
aimed primarily at generating solutions using parallel algorithms, it is also possible to reconfig-
ure the software to use serial computation only.
The software is implemented in the C programming language. The GNU C compiler was
used on the development system. To increase C90 standard conformity, the compiler flags -ansi and -pedantic were used. Development took place predominantly on a 32 bit single processor Linux
system, on which gcc 4.1.2 and gdb 6.6 were installed. The MPI implementation was MPICH2,
version 1.0.6. To assist with debugging, the Valgrind suite was used to check for memory leaks.
The version control system CVS was used extensively during implementation. Based on a
central repository stored on the Ness machine, version control was used as a means of retrieving
the entire code base and synchronising code modifications between machines.
The build management system used for the software is the GNU autotools suite. This is used
to automatically configure the software prior to compiling it on the target architecture. Instructions
on how this can be achieved are given in Appendix E.
In the following, an overview of the software framework is given.
6.2 Implementation overview
From the user’s perspective, the framework consists of a set of binary executables. These are:
• genbonds
• genclamps
• sbforce
• dpsolver
• dpsolverfast
• hmsolver
The two utilities genbonds and genclamps are used to generate random coupling constants
and specify the clamping state of spins in the lattice, respectively. As implemented, the utilities
produce character based representations as described in the design in Chapter 5. The utilities
write to the standard output. Using UNIX shell redirection, this output can be stored in files, in
preparation for invoking a ground state solver on the data. Using these utilities therefore facilitates
creating instance data. Both genbonds and genclamps use standard command line options
for specifying spin lattice dimensions and related parameters. For example, lattice dimensions
are specified using --xSize=x --ySize=y, for a system with x rows and y columns.
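As an example of the intended workflow, a random 4 × 4 instance might be generated and solved as follows. The file names and the options naming the input files to the solver are illustrative placeholders, not the documented option names, and the MPI launcher depends on the target system:

genbonds --xSize=4 --ySize=4 > bonds.dat
genclamps --xSize=4 --ySize=4 > clamps.dat
mpirun -np 4 dpsolver --bonds=bonds.dat --clamps=clamps.dat > solution.dat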
The remaining executables correspond to implementations of algorithms described in Chap-
ters 3 and 4: For testing purposes, the sbforce utility implements a simple exhaustive search,
hmsolver implements the harmony search algorithm in its parallel realisation. Similarly, dpsolver and dp-
solverfast provide exact solvers based on dynamic programming approaches. As before, all of
these executables use command line parameters for specifying options. In this case, the most
significant parameters are those for specifying bond and clamp configuration files. These utilities write solutions to standard output.
From the perspective of implementation, the software is constructed using a modular ap-
proach. Also based on the design described in the previous chapter, there exist various library
modules, which provide functionality such as IO and spin glass manipulation. These are utilised
by client modules, which include implementations of of ground state solvers. By means of C
headers, client modules are able to reference APIs. API implementations are used to generate
separate binary executables through the linking process.
Appendix B includes a UML class schema of the relationships between source code modules
and headers. As shown, source code modules reference the headers arrays.h, gstatefinder.h, io.h, random.h and spinglass.h. Their purpose is as follows:
• arrays.h Specifies multidimensional array operations
• gstatefinder.h Specifies the interface to be implemented by ground state solvers
• io.h Defines IO operations
• random.h Defines randomisation functions
• spinglass.h Defines the spin glass data structure and operations
As shown in Figure B.1, multidimensional arrays are used by the dynamic programming based
solvers, as befits the algorithms’ requirements for associative data structures. The IO header
is used by module main.c, which implements an entry point for all executables. Further-
more, gstatefinder.h is included by main.c, bforce_gstate_finder.c, dp_gstate_finder.c and har-
mony_gstate_finder.c, the latter three implementing exhaustive search, dynamic programming
and harmony search, respectively. Whereas dp_gstate_finder.c implements the basic exact op-
timisation algorithm described in Chapter 3, a further module dp_gstate_finder_fast.c provides
an implementation of the improved dynamic programming algorithm, described in the same
chapter.
6.3 Source code structure
From the description of source module and header purpose, the following provides a more de-
tailed description of the implementation. This is given at function level for a selection of the
code base, to illustrate core functionality.
6.3.1 Library functionality
arrays.h
As previously mentioned, the implementation of the exactly solving algorithm requires access
to multidimensional arrays. Given the restriction in C to defining single-dimensional dynamic
arrays (aside from using static arrays), it is necessary to use pointer arithmetic and casts to implement
multidimensional arrays. Confining implementation to source module arrays.c, functions are
provided for constructing and destroying arrays in two and three dimensions of arbitrary size.
Returning pointer types, the constructor functions allow data elements to be accessed using
conventional array syntax, while preserving memory contiguity. These functions are invoked
repeatedly by dp_gstate_finder.c and dp_gstate_finder_fast.c. While a less involved approach
might have offered increased performance, implementing the dynamic programming algorithm
otherwise was considered too cumbersome, given the allocated time for software development.
As an alternative, the header defines macros which emulate a multidimensional array, based
on performing arithmetic on a single pointer. Although syntactically less convenient, this ap-
proach requires fewer dereferencing operations to access a pointer element. For performance
reasons, the approach is utilised by the spin glass library functions in spinglass.c.
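As a minimal sketch of this macro-based scheme (the macro and function names below are illustrative, not those of the actual header), two-dimensional access on a contiguous row-major block can be emulated as follows:

#include <glib.h>

/* Emulated 2D access: element (row, col) of a row-major block with
 * ySize columns, addressed through a single pointer. */
#define ARRAY2D(ptr, row, col, ySize) ((ptr)[(row) * (ySize) + (col)])

/* Allocate a contiguous xSize-by-ySize block of doubles, zero-initialised. */
static gdouble *flat_array_alloc(gint xSize, gint ySize)
{
    return g_new0(gdouble, xSize * ySize);
}

/* Example use:
 *     gdouble *block = flat_array_alloc(3, 4);
 *     ARRAY2D(block, 2, 1, 4) = 0.5;    (element in row 2, column 1)
 *     g_free(block);
 */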
io.h
Header io.h defines six functions, responsible for reading and writing files containing repre-
sentations of spin state, clamping state and coupling constants. Three functions responsible
for reading from file are of the form *read(char *fileName, int *xSize, int *ySize). All parameters
are passed by reference; the value of fileName is read upon invoking the
function, whereas xSize and ySize hold the spin lattice dimensions after the function call has
been completed. The function returns a pointer to the state data read from file.
Complementary functions for writing to file are of the form write(struct SpinGlass *spin-
Glass, char *fileName). Here, the parameters consist of a pointer to an instance of the spin
glass abstract data type (described in the previous chapter), and the name of the file to write to.
The function return type is void.
The file-reading functions in io.c are implemented using a single static function, GQueue *parse_file(). As the name suggests, this provides simple parsing capabilities, using a loop to
iterate through string tokens obtained from the standard library function strtok(). Recording and
verifying counts of symbols on each line, tokens are added to a queue. This queue is returned
by the function. Dequeuing elements stored in the queue, the aforementioned reading functions
then construct data structures representing spin glass parameters.
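A minimal sketch of this parsing scheme is given below. It is simplified relative to the real parse_file(); in particular, the per-line symbol-count verification is omitted and the function name is illustrative:

#include <glib.h>
#include <stdio.h>
#include <string.h>

/* Read a whitespace/newline separated text file into a queue of token
 * strings. Returns a GQueue of g_strdup'ed tokens; the caller frees them. */
static GQueue *parse_file_sketch(const char *fileName)
{
    GQueue *tokens = g_queue_new();
    FILE *file = fopen(fileName, "r");
    char line[1024];

    if (file == NULL)
        return tokens;

    while (fgets(line, sizeof line, file) != NULL) {
        char *token = strtok(line, " \t\n");
        while (token != NULL) {
            g_queue_push_tail(tokens, g_strdup(token));
            token = strtok(NULL, " \t\n");
        }
    }
    fclose(file);
    return tokens;
}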
spinglass.h
The spin glass data structure is defined in header spinglass.h. Using a C struct type, the fol-
lowing fields are defined:
struct SpinGlass {
    gint xSize;
    gint ySize;
    Spin *spins;
    gdouble *weights;
    gboolean *clamps;
    Spin *initialSpins;
};
As given by the design description in Chapter 5, the structure specifies variables ∗ for storing
lattice dimensions. An enumeration type defines the Spin type; the pointer field is used to
reference a memory block storing the state of spins. The enumeration defines integer states
UP=1 and DOWN=-1. Spins’ states are stored using a row-major scheme. This matches the
access method using a single pointer, defined in arrays.h. Coupling constants, clamping states
and the field initialSpins store states similarly. The latter field provides an account of spin state
distinct from the spins field, which stores the state of spins while performing optimisation. Using
two separate fields allows lattice configurations to be compared before and after optimisation.

∗ GLib specifies wrappers for standard C types; motivation for their use is discussed in the GLib documentation [1].

Figure 6.1: Functions provided by spinglass.c — (a) row_energy(), (b) interrow_energy(), (c) ensemble_delta()
Header functions in spinglass.h are grouped into four categories, associated with allocating
memory for the data type, computing lattice energy, writing lattice properties to file, and mis-
cellaneous activities. All functions operate on the spin glass data structure, which is passed by
reference from a caller function.
The purpose of the memory related functions is as described in the design: These ensure that
the spin glass structure is initialised and terminated correctly. The constructor function is of the
form *spinglass_alloc(gint xSize, gint ySize, Spin *initialSpins, gdouble *weights, gboolean
*clamps); it requires as parameters the lattice dimensions, initial spin configuration, coupling
constants, and clamping states. The function returns a pointer to a newly allocated data structure
(fields are assigned according to the supplied parameters). To assist in freeing memory after use,
the function spinglass_free() is implemented.
Lattice energy is computed using a collection of five functions. The simplest of these is de-
fined as spinglass_energy(struct SpinGlass *spinGlass), which returns as a floating point num-
ber the energy arising from all interactions in the lattice. For convenience, spinglass(struct Sp-
inGlass *spinGlass, Spin *conf) returns the energy due to the coupling constants specified in *spin-
Glass; however, the configuration is given as a separate array *conf. A comparison between the
remaining three energy calculating functions is given in Figure 6.1: The spinglass_row_energy()
function determines the energy of a spin row (considering horizontal bonds), whereas interrow_energy() uses vertical bonds to calculate the interaction energy between adjacent rows.
With ensemble_delta(), the energetic contribution between a single spin and its predecessors in
the horizontal and vertical dimensions is calculated.
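By way of illustration, a function of the first kind might be realised on the row-major layout as follows. This is a sketch only: the sign convention E = −Σ J·s·s and the bond indexing are assumptions, and the horizontal bonds are passed in explicitly rather than being taken from the weights field:

/* Energy contribution of the horizontal bonds within row r (sketch). */
static gdouble row_energy_sketch(const struct SpinGlass *sg, gint r,
                                 const gdouble *horizontalBonds)
{
    gdouble energy = 0.0;
    gint c;

    for (c = 0; c < sg->ySize - 1; c++) {
        gdouble j = horizontalBonds[r * (sg->ySize - 1) + c];
        energy -= j * sg->spins[r * sg->ySize + c]
                    * sg->spins[r * sg->ySize + c + 1];
    }
    return energy;
}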
The file output functions in spinglass.c are used to implement the output functions in io.c.
The functions are of the form write(struct SpinGlass *spinglass, FILE *file), i.e. arguments
include a pointer to a spin glass structure and a file pointer. If required, this allows spin glass
properties to be easily echoed to screen, using the file pointer stdout.
Finally, miscellaneous functions include get_random_spins() (used to generate random spin
configurations, while considering spin clamping state), has_vertical_boundary() (used to deter-
mine whether cyclic boundary interactions are present along the lattice's vertical dimension),
and correlate(). The latter is used to compare spin configurations between spin glass structures in terms of differing spin state.
6.3.2 Client functionality
Having described library functionality provided by the software, attention is now given to the
code modules utilising this functionality. These include the entry point module main.c, and
more importantly, the modules implementing optimisation algorithms. Note that the code base
includes additional modules for the utilities genbonds and genclamps. These do not make use
of library functions; as their implementation is trivial, these are not considered in further detail.
The source code for all algorithms is provided in Appendix F.
main.c
Module main.c uses the standard argument processing library provided by GLib to implement
execution parameter parsing for solver utilities. This requires a number of auxiliary data types
and structures, which are defined as static global and local variables in the module’s main()
function. The latter is responsible for reading file name arguments associated with specific flags,
describing the locations of coupling constant and clamping state files. Also, a file describing a
spin configuration to compare the solution to may be specified. After parsing program arguments, the presence of required and optional parameters is verified.
A local function init() then initialises a spin glass data structure, using previously described
function spinglass_alloc(). Optimisation is then initiated by invoking the header defined func-
tion find_ground_states(). After the solution has been obtained, spinglass_correlate() performs
a comparison, should the related flag have been specified. After deallocating the data structure,
init() and main() terminate. Since each optimisation algorithm implements find_ground_states()
in its own module and links with main.c, the main() function is provided by the same mod-
ule for all utilities. This promotes code reuse and facilitates extending the code base with new
algorithms.
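The mechanism can be summarised by the following sketch of the solver interface (the exact prototype in gstatefinder.h may differ, for example in its return type or parameters):

/* gstatefinder.h (sketch): each solver module provides exactly one
 * implementation of this function; main.c invokes it once the spin glass
 * instance has been constructed from the bond and clamp files. */
void find_ground_states(struct SpinGlass *spinGlass);

Linking main.c against bforce_gstate_finder.c, dp_gstate_finder.c, dp_gstate_finder_fast.c or harmony_gstate_finder.c then produces the corresponding executable without modification to main.c.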
bforce_gstate_finder.c
To generate ground truth data for testing purposes, module bforce_gstate_finder.c implements
a brute force ground state solver. The solver is based on an infix traversal of state space. This
is achieved using a function find_ground_states(), which is called recursively. A conditional
statement restricts recursion depth, based on a variable whose value represents the position of a
window on the spin lattice. For each invocation of the function, the state of the spin under the
window is flipped. Before and after flipping spin state, recursive calls are performed, in each
case advancing the window by one spin. The base case effects evaluation of system energy. If
the system energy is found to be lower than the recorded minimum, energy and configuration are
output before updating the minimum. Since the search is exhaustive, the ground state configuration is eventually output.
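The traversal can be sketched as follows (a simplified serial fragment relying on the declarations in spinglass.h; the real module additionally honours clamped spins and records the minimising configuration):

/* Recursively enumerate all spin configurations by an infix traversal:
 * visit the subtree with the current spin state, flip the spin under the
 * window, then visit the subtree with the flipped state. pos is the index
 * of the spin under the window and n the total number of spins. */
static void enumerate(struct SpinGlass *sg, gint pos, gint n, gdouble *minEnergy)
{
    if (pos == n) {                      /* base case: evaluate this configuration */
        gdouble energy = spinglass_energy(sg);
        if (energy < *minEnergy)
            *minEnergy = energy;
        return;
    }
    enumerate(sg, pos + 1, n, minEnergy);
    sg->spins[pos] = (sg->spins[pos] == UP) ? DOWN : UP;   /* flip spin under window */
    enumerate(sg, pos + 1, n, minEnergy);
}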
harmony_gstate_finder.c
Serial and parallel harmony search algorithms were described in Chapters 3 and 4. The se-
rial algorithm consists of initial random solution generation (characterised by the parameter
NVECTORS) followed by an iterative process, in which low-utility solutions are replaced. Re-
placement is based on combining the components of stored solutions, using randomisation. The
latter is controlled by the memory choosing rate parameter. The parallelisation strategy involves
a collection of harmony search processes which exchange solutions between each other, using a
hierarchical system of nearest-neighbour and collective communication patterns.
Excepting the number of processes, module harmony_gstate_finder.c defines all parameters
controlling the behaviour of harmony search using preprocessor directives. These parameters
include the number of solutions held by a process (NVECTORS), the memory choosing rate,
the number of iterations before performing a collective communication operation, and the size
of subgroups involved in collective communications.
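A sketch of how these parameters are exposed (NVECTORS, ZONEEXBLOCK and ZONE_SIZE follow the names used in the text; the memory choosing rate identifier and all numerical values are placeholders, not those used in the experiments):

#define NVECTORS        10    /* solution vectors held by each process          */
#define MEM_CHOOSE_RATE 0.9   /* probability of drawing a component from memory */
#define ZONEEXBLOCK     100   /* iterations between collective exchanges        */
#define ZONE_SIZE       4     /* processes per subgroup                         */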
In addition to the module's entry function find_ground_states(), the implementation consists
of seven static functions, responsible for initialising and finalising message passing communications, collectively evaluating solution energy, and verifying the algorithm's state of convergence.
When the entry function is invoked, the implementation begins by allocating memory for a
single solution vector *neighbourSpins, which is used to store data from nearest-neighbour ring
communications. After initialising communications, solution vectors are generated randomly
and assigned to elements of an array Spin *spins[NVECTORS]. The latter is the collection
of solution vectors used during the heuristic process. The actual heuristic consists of a loop
executed directly after the aforementioned solution generation, which is of the form:
for (i = 1; get_stabilised_status() == FALSE; i++) {
    /* Create new vector */

    /* Compute highest energy vector */

    /* Set vector components */

    /* Replace vector in memory, if new vector is of higher fitness */

    /* Perform communication operations */
}
As shown, the loop's execution is controlled by get_stabilised_status(), responsible for eval-
uating the state of convergence. Within the loop body, memory for a new solution vector is
allocated; like all other solution vectors, the memory block consists of xSize × ySize elements
of type Spin, where xSize × ySize are the dimensions of the spin lattice. After determining
the solution vector with highest energy, the values of the new solution vector's components are
set from existing vectors, according to the algorithm described in Chapter 3. Following this,
the new solution’s energy is determined. The highest energy solution is replaced, if compari-
son yields that the new solution’s energy is lower. Communication routines are executed, after
which the process begins anew.
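The component-setting step can be sketched as follows (a fragment using GLib's random number routines; MEM_CHOOSE_RATE and newSpins are the placeholder names introduced above, and the pitch-adjustment step of harmony search is omitted for brevity):

/* Assemble the new candidate vector: with probability MEM_CHOOSE_RATE take
 * component j from a randomly chosen stored solution vector, otherwise
 * assign it a random spin state. */
for (j = 0; j < xSize * ySize; j++) {
    if (g_random_double() < MEM_CHOOSE_RATE)
        newSpins[j] = spins[g_random_int_range(0, NVECTORS)][j];
    else
        newSpins[j] = g_random_boolean() ? UP : DOWN;
}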
The hierarchical communication scheme is implemented using two separate conditional
statements, responsible for performing nearest-neighbour ring communications and collective
operations:
1  if (Solver_ProcID % ZONE_SIZE == 0) {
2      gint random = g_random_int_range(0, NVECTORS);
3      MPI_Sendrecv(spins[random], 1, TypeArray, (Solver_ProcID + ZONE_SIZE) % Solver_NProcs, 0,
           neighbourSpins, 1, TypeArray, MPI_ANY_SOURCE, MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
4      reduction_function(neighbourSpins, spins[random], NULL, NULL);
5  }
6
7  if (i % ZONEEXBLOCK == 0) {
8      reduce_minimal_spin_vector(spins[maxVector], Solver_Zone);
9  }
The exchange begins by processes selecting solutions at random (line 2) and sending them to
their neighbours. Ring communication is performed using the send / receive operation in line 3,
where each process with ID Solver_ProcID sends to process ID ((Solver_ProcID + ZONE_SIZE)
mod Solver_NProcs). Here, Solver_NProcs is the total number of processes and ZONE_SIZE
is the number of processes in a subgroup. In this way, ZONE_SIZE controls the number of
processes involved in ring communications. Every random solution is received into the memory
block referenced by *neighbourSpins. Whether this is committed to a process' solution set
spins[] depends on the result of applying reduction_function(). The latter performs identically
to the copy_min() function in Chapter 4, copying the energetically minimal argument to its
complement. Consequently, line 4 is responsible for accepting or rejecting solutions received
in the ring exchange operation. Line 7 performs the aforementioned collective operation; this
involves each subgroup performing a reduction on their least favourable solutions, using the
communicator Solver_Zone. The communicator refers to all processes in a subgroup based on
the instruction
MPI_Comm_split(COMM, Solver_ProcID / ZONE_SIZE, 0, &Solver_Zone);
which partitions the set of all processes, such that processes with equal Solver_ProcID /
ZONE_SIZE share the same subgroup. The function reduce_minimal_spin_vector() is itself
based on the MPI_Allreduce() operation, using reduction_function() as a custom reduction op-
erator. The frequency of reduction is controlled by the value of the constant ZONEEXBLOCK.
After the optimisation loop has terminated, the function find_ground_states() performs a
number of operations to finalise optimisation, such as determining the most favourable solution
held hitherto in the solution set among processes. The obtained configuration data are copied to
the spins field of the spin glass data structure, and the solution is output by invoking the function
spinglass_write_spins(). Memory for storing solution vectors is deallocated, following which
MPI communications are terminated.
To complete the description of the harmony search module, it remains to detail the function
which controls the heuristic's termination, get_stabilised_status(). Like the collective operation
used for exchanging solutions between processes, this is based on reduction operations used to
determine whether the most favourable solutions held by processes have equal energy. This is
achieved with the instructions
compute_lowest_energy(&minEnergy, &minVector);
MPI_Allreduce(&minEnergy, &globalMinEnergy, 1, MPI_DOUBLE, MPI_MIN, COMM);
if (minEnergy == globalMinEnergy) localHasOptimum = TRUE;
MPI_Allreduce(&localHasOptimum, &allHaveOptimum, 1, MPI_INT, MPI_LAND, COMM);
the first of which determines the lowest energy locally, the second the lowest energy glob-
ally, followed by a further reduction to determine whether all processes possess solutions with
energies corresponding to that of the globally most favourable solution. This implements the
termination condition described in Chapter 4.
dp_gstate_finder.c
In Chapter 3, it was established that the ground state energy of the Ising spin glass can be
obtained using an algorithm consisting of nested loops. Based on formulating ground state
energy as a dynamic programming problem, approaches to parallelisation inspired by those used
for matrix / vector multiplication were presented in Chapter 4. The basic O(nm 2^(2m)) time serial
algorithm for computing the ground state energy of the lattice without cyclic boundary conditions
leads to two parallel variants, using a collective communication operation between processes, or alternatively a cyclic shift operation. The latter was shown to be more memory efficient. To
account for cyclic boundary conditions in more than one dimension, the algorithm is required
to be executed for all configurations of an arbitrary spin row (cf. Theorem 3.3). In the collective
variant, the basic algorithm for systems without cyclic boundary conditions is given by the
pseudocode
Float[] p
Float[] p′
for k := (proc_id * q/p + 1) to ((proc_id + 1) * q/p)
    Float minval := ∞
    for l := 1 to q
        if p[l] + M_{i,i−1}[k,l] < minval
            minval := p[l] + M_{i,i−1}[k,l]
    p′[k] := minval
all_gather(p′, p)
which is executed n times for an n × m spin lattice, using vector p′ as argument p in
successive iterations of the algorithm and matrices M_{i,i−1} to store interaction energies between
configurations of spin rows i and i − 1. The latter are evaluated in the ith iteration of the algorithm.
The all_gather() operation combines the vector distributed among p processors into a single vec-
tor. Upon termination, vector p contains ground state energies for all configurations of the nth
spin row, from which the ground state energy of the entire lattice can be obtained by determining
the minimum vector component.
As described, the algorithm is capable only of computing ground state energy; implicit
information on actual ground state configuration is discarded. To enable this information to
be computed, it is necessary to retain at each iteration of the algorithm the value of l yielding
the assignment p′[k] := minval, for all values of k. This corresponds to retaining the optimal
configuration of row i − 1 for each of the q configurations of row i, with 1 < i ≤ n. This requires
a two-dimensional array.
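In outline, recording and later recovering the configuration proceeds as in the following fragment (a sketch; the helper index_of_minimum() and the array conf are illustrative, while minPathConf corresponds to the two-dimensional array described below):

/* During iteration i, record for every configuration k of row i the
 * configuration l of row i-1 that produced the minimum: */
minPathConf[i][k] = argmin_l;

/* After the final iteration, backtrack from the optimal configuration of
 * the last row to recover the full ground state configuration: */
conf[n - 1] = index_of_minimum(p, q);
for (i = n - 2; i >= 0; i--)
    conf[i] = minPathConf[i + 1][conf[i + 1]];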
Module dp_gstate_finder.c implements the basic dynamic programming algorithm, suited
for both serial and parallel execution. Both parallel variants based on collective and cyclic shift
operations are implemented. To promote code reuse, this is achieved by using preprocessor
directives for conditional compilation.
Similar to the implementation of harmony search, in addition to the entry function find_ground_states(),
the module consists of six static functions. These are responsible for initialising and finalising
message passing, computing ground state energy, manipulating spin rows and applying the ob-
tained ground state configuration to the spin glass data structure.
Given the parallel algorithm in either of its variants, a problem the implementation must
address is how to distribute the set of configurations a spin row may assume, among processes.
This amounts to distributing the rows of matrices M_{i,i−1} among processes, where each row ac-
counts for a unique configuration of spin row i. As spins assume binary state, a simple approach
is to represent spin subsystems as bit strings, e.g. assigning spin values +1 → 1, −1 → 0.
Exploiting the fact that processes are addressed using integer numbers in MPI, the bit string
representation can be split into a prefix and suffix, where the prefix is given by the process
number. For an m spin subsystem and p processors, prefixes consist of log_2 p bits, suffixes of
m − log_2 p bits. Providing the number of processes is a power of 2, it is possible to enumerate
all possible spin configurations by each process considering its process number prefix, and all
suffixes 0 ≤ k < 2^(m − log_2 p). This is the approach implemented in dp_gstate_finder.c.
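A sketch of this enumeration scheme (the function and variable names are illustrative; m is the number of spins in a row, p the number of processes, and rank the MPI process number):

/* Enumerate the row configurations assigned to one process: the process
 * number forms the bit-string prefix, and all suffixes are appended.
 * Assumes p is a power of two and m fits into an unsigned int. */
static void enumerate_row_configs(gint rank, gint p, gint m)
{
    gint prefixBits = 0, suffixBits;
    guint suffix, config;

    while ((1 << prefixBits) < p)
        prefixBits++;                       /* prefixBits = log2(p) */
    suffixBits = m - prefixBits;

    for (suffix = 0; suffix < (1u << suffixBits); suffix++) {
        config = ((guint) rank << suffixBits) | suffix;
        /* bit b of config encodes the state of spin b in the row:
         * 1 corresponds to +1, 0 to -1 */
        (void) config;
    }
}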
When find_ground_states() is invoked, the implementation begins by initialising message
passing, following which the function get_minimum_path() is invoked. This is responsible for
initiating a series of further function calls, based on a loop which iterates through each row
in the lattice. After allocating memory for an array *minPath, get_minimum_path() allocates **minPathConf, the two-dimensional array used to record optimal subsystem configurations.
The aforementioned loop then commences; for each spin row i, the function
get_optimal_prestates(spinGlass, minPathPartial, minPathConf[i], i, trellisCols, 0);
is invoked, which performs the parallel matrix / vector operation previously described in
pseudocode. The arguments are the spin glass data structure to optimise, a memory block cor-
responding to vector p, the matrix row to hold the optimal states of row i − 1, the current spin
row, and the total number of elements in p. The final argument is used to enforce a particular
configuration of the final spin row. In the absence of cyclic boundary conditions its value is not
significant. Should the spin glass indeed possess cyclic boundary conditions, the loop over spin
rows is repeated for all configurations of this row, and the lowest obtained energy is accepted as
the ground state energy.
Using conditional compilation based on the constant CYCLIC_EXCHANGE, two imple-
mentations of get_optimal_prestates() are provided, to account for both variants of the parallel
algorithm. If CYCLIC_EXCHANGE is left undefined, a further constant USE_MPI allows con-
trol over whether message passing communications are used. If the latter is left undefined, the
optimisation proceeds serially.
Both implementations of get_optimal_prestates() are based on the pseudocode designs pre-
viously discussed, using control flow instructions for dealing with spin rows when i = 1, for which
cyclic boundary interactions must be considered. In contrast to the presented pseudocode, the
elements of matrices M_{i,i−1} are not stored explicitly in a data structure. Instead, loop variables
are used to determine matrix elements on demand, which are computed by invoking the func-
tions defined in spinglass.h on the spin glass instance. To this end, of importance is the function
adjust_spin_row(), which modifies a spin glass instance according to the bit string representation
of a spin row.
The collective implementation of get_optimal_prestates() begins by allocating the array
*minPathNew, which is equivalent to vector p′ in the pseudocode, with elements distributed
among processes. Elements of *minPathNew are assigned values, based on elements in *min-
Path and interaction energies arising from the examined spin rows. Having completed this
evaluation, distributed vector elements are combined and reassigned to *minPath, using the in-
struction
MPI_Allgather(minPathNew, trellisCols / Solver_NProcs, MPI_DOUBLE,
              minPath, trellisCols / Solver_NProcs, MPI_DOUBLE, COMM);
where trellisCols / Solver_NProcs is the number of vector components stored by each pro-
cess, COMM is the global communicator and MPI_DOUBLE is the data type of the vector elements.
Figure 6.2: Schematic of operations performed by get_optimal_prestates() (basic dynamic pro-
gramming, collective operations): for each configuration of row i, optimum states of row i − 1
are determined from minPath, and the results held in minPathNew are then gathered. In contrast,
when using cyclic communications, processes evaluate different configurations of row i − 1,
shifting elements in minPath.
A schematic depiction of the optimisation process for a single invocation of get_optimal_prestates()
is shown in Figure 6.2.
Similar in its operation, the realisation of get_optimal_prestates() using cyclic shift oper-
ations between processes distributes vector p′ among processes using the array *minPathNew.
However, instead of assigning all components of vector p to each process, these too are dis-
tributed among processes, in *minPath. This requires multiple communication operations as optimisation
progresses for a single spin row. Here, elements in *minPath are examined in parallel by each
process; however, since each only retains a fraction of the components in p, it is necessary to perform
a cyclic shift of data. It turns out that as iteration through elements in *minPath progresses, it is
possible to communicate elements residing at neighbouring processes in advance. This suggests
a nonblocking communication scheme, which is implemented in the software module. The non-
blocking communication scheme utilises MPI_Issend(), MPI_Wait() and MPI_Recv() instructions inserted
into the optimisation loops (cf. Appendix F).
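In outline, one step of the shift might be expressed as follows (a simplified fragment: the buffer size, message tag, shift direction and the placement relative to the energy evaluation are assumptions standing in for the scheme used in the module):

MPI_Request request;

/* Post the send of the locally held block of p to the neighbouring process,
 * so that communication overlaps computation on the same (read-only) data. */
MPI_Issend(minPath, blockSize, MPI_DOUBLE,
           (rank + 1) % nprocs, 0, MPI_COMM_WORLD, &request);

/* ... evaluate interaction energies using the current contents of minPath ... */

/* Receive the neighbouring block needed for the next pass, then complete the send. */
MPI_Recv(recvBuffer, blockSize, MPI_DOUBLE,
         (rank + nprocs - 1) % nprocs, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
MPI_Wait(&request, MPI_STATUS_IGNORE);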
After get_optimal_prestates() has been invoked for all spin rows, it remains to obtain the
ground state energy from *minPath and the corresponding ground state configuration from
**minPathConf. Since the latter stores optimal configurations of preceding spin rows, for each
spin row, the ground state configuration can be recovered. This is achieved by determining the
optimum configuration of the final spin row, and traversing through matrix rows, referencing
preceding subsystem configurations. Function set_optimal_config() performs this activity. It is
invoked by get_minimum_path(), following which the ground state configuration is output using
spinglass_write_spins().
Figure 6.3: Sliding window for improved dynamic programming
dp_gstate_finder_fast.c
In Chapter 3, an improved serial algorithm for computing ground states was presented. In
contrast to the previous algorithm, instead of considering interacting spin rows in the lattice,
subsystems can be considered as positions of a 'sliding window'. This window spans spin rows
horizontally, such that the total number of spins it covers is equal to the number of columns in the lattice
plus one. As with the row-wise approach, optimisation is achieved by comparing adjacent
subsystems. Here, adjacent subsystems are those obtained by advancing the sliding window by
one spin (Figure 6.3).
In Chapter 4, it was suggested that the matrix / vector approach can be used to arrive at an im-
proved parallel algorithm. As previously, matrices retain interaction energies between adjacent
subsystems. However, as a caveat of the sliding window approach, interacting subsystems must
share spin configurations in the overlapping region between window positions. This means that
for every subsystem configuration, it is only necessary to evaluate interactions with two config-
urations of the preceding subsystem.
The module dp_gstate_finder_fast.c implements the improved algorithm for obtaining ground
states, for the lattice without cyclic boundary conditions. Similar in structure to dp_gstate_finder.c,
the module consists of a function get_minimum_path(), which is responsible for performing the
main optimisation. Given a spin glass instance, it proceeds to invoke get_optimal_prestates() in
a loop which iterates through all subsystems in the lattice.
Two main differences arise from the 'sliding window' approach to subsystems. Firstly,
adjusting spin configurations based on bit strings requires a 'leading spin' to be referenced
in the spin lattice, instead of a spin row. For this reason, the module implements the func-
tion adjust_spin_ensemble(), whose arguments include the problem instance and the referential
spin. Secondly, interaction between subsystems involves the energy introduced by a single
spin interacting with its vertical and horizontal neighbours (Figure 6.1(c)). Therefore, function
get_optimal_prestates() utilises the library function spinglass_ensemble_delta().
Invoking get_optimal_prestates() serves the same purpose as previously, namely to record
optimal energies for subsystems of increasing size, recording configuration data in a two-dimensional array.
Again, this is achieved using an argument *minPath which corresponds to vector p in the pseu-
docode algorithm. After the function has returned, this array stores data equivalent to vector
p. The computation performed by get_optimal_prestates() is shown in Figure 6.4. Here, ele-
ments corresponding to vector p are computed in parallel, such that interactions between each
corresponding subsystem configuration and preceding subsystems in both of its two states are
compared. Given the irregular pattern in which elements in *minPath are accessed, the ap-proach using a collective operation to combine elements of the resulting array *minPathNew is
favourable.
The method of determining configurations of preceding subsystems to evaluate involves ma-
nipulating the subsystem’s bit string representation. Given a bit string where the most significant
bit describes the leading spin's state, conducting a left arithmetic shift reveals permissible con-
figurations of the preceding subsystem (the least significant bit may assume 1 or 0). Figure 6.4
illustrates bit strings corresponding to subsystem configurations, for a 2 × 2 spin lattice.
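This manipulation can be sketched as follows (w denotes the window size in spins, i.e. the number of lattice columns plus one; the variable names are illustrative):

/* config encodes the current window position, most significant bit = leading
 * spin. A left shift drops the leading spin; the vacated least significant bit
 * corresponds to the additional spin of the preceding window, which may be
 * either down (0) or up (1). */
guint mask  = (1u << w) - 1u;               /* keep only the w window bits     */
guint pred0 = ((config << 1) & mask);       /* preceding window, new spin down */
guint pred1 = ((config << 1) & mask) | 1u;  /* preceding window, new spin up   */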
Once optimisation has completed, as with dp_gstate_finder.c it remains to restore the ground
state configuration from data stored in **minPathConf. Again, this is achieved using a function set_optimal_config(). In this case, each row of **minPathConf yields information on the optimum
state of one spin. The final row is used to infer the state of an entire subsystem. The entire
ground state configuration can then be output.
Figure 6.4: Schematic of operations performed by get_optimal_prestates() (improved dynamic
programming), executed on four processors P1–P4. The problem instance is a 2 × 2 spin lattice;
minPath and minPathNew hold the eight 3-bit window configurations for window positions i − 1
and i, the optimum states of position i − 1 are determined for each configuration of position i,
and the results held in minPathNew are gathered.
Chapter 7
Performance Evaluation
So far, approaches to determining spin glass ground states have been presented. These include
exactly solving methods based on dynamic programming, and the harmony search heuristic.
Both approaches are implemented in software, suited for serial and parallel execution using
MPI. The dynamic programming implementation incorporates two variants, which are referred
to as the basic and improved algorithms. Previous complexity analysis showed that the improved
algorithm requires less run time than its counterpart.
In examining techniques for parallelising these exact and heuristic algorithms, further al-
ternatives were described in Chapter 4. In the case of the dynamic programming algorithms,
approaches based on collective and cyclic communication patterns were given. The latter are
implemented using nonblocking synchronous send operations in MPI. Both collective and cyclic
variants are applicable to the basic dynamic programming algorithm, whereas the improved dy-
namic programming algorithm relies solely on collective communications.
In this chapter, the aforementioned solver implementations are examined in terms of their
performance. Data are presented against varying parameters and interpreted. For the parallel
exact solvers, a comparison is given between attainable performance on the Ness and HPCx
machines.
7.1 Serial performance
In the development process, serial versions of ground state solvers were implemented prior to
their parallel analogues. For the exact algorithms, besides facilitating an incremental develop-
ment strategy, this allowed an initial evaluation of performance, in order to gauge the possible
behaviour of parallel dynamic programming. Similarly, performance data for serial harmony
search were examined, in particular to assess the accuracy of solutions generated by the algo-
rithm.
Figure 7.1: Execution times for serial dynamic programming (basic algorithm). Time (s) is plotted against the number of spins.
7.1.1 Dynamic programming
Execution time data for serial dynamic programming were gathered on Ness. The experimental
procedure involved invoking both variants of the algorithm on the machine’s back-end, against
varying problem sizes. Timing data were recorded using the shell’s time command. While
offering limited accuracy and resolution, this method was deemed sufficient, considering the
magnitude of execution times. The source code was compiled using the gcc compiler, supplying
the -O2 optimisation flag. Random problem instances were generated as square-lattice k-spin systems without cyclic boundary conditions.
Basic algorithm
Results for basic dynamic programming are shown in Figure 7.1. As shown, problem instances are generated for systems of up to 14² spins. As one would expect, execution time rises monotonically, such that the recorded time for 14² spins is approximately 42 min. Considering the observations made in Chapter 3 about the algorithm's asymptotic behaviour, the graph appears to confirm an exponential relationship between system size and execution time.
To examine run time behaviour more closely, the data are visualised as a logarithmic plot (Figure 7.2). Here, it is apparent that execution time cannot be accurately approximated by the function $f(k) = \alpha e^{\beta k}$: its logarithm, $\ln(f(k)) = \ln(\alpha) + \beta k$, corresponds to a straight line, whereas the log-plotted data do not. Also, the plot shows near-constant values for the first three data points. This is likely to result from limited timing resolution.
In Chapter 3, the algorithm's asymptotic complexity was shown to be $O(\sqrt{k}\,2^{2\sqrt{k}})$, for a
8/3/2019 Peter Alexander Foster
http://slidepdf.com/reader/full/peter-alexander-foster 89/183
7.1. Serial performance 71
-12
-10
-8
-6
-4
-2
0
2
4
6
8
0 20 40 60 80 100 120 140 160 180 200
l g ( T i m e ) ( s )
Spins
Serial dynamic programming code performanceCurve fit
Figure 7.2: Log execution times for serial dynamic programming (basic algorithm)
square lattice k-spin system without cyclic boundary interactions. From this fact, it is clear that a more accurate model of execution time must consider an exponential relationship with respect to the root of the system size. The function $f(k) = \alpha e^{\beta\sqrt{k}}$ is thought to be an adequate approximation.
Figure 7.2 includes a fit of the function $\ln(f(k)) = \ln(\alpha) + \beta\sqrt{k}$ to the log-plotted data points. The first three data points are excluded from the fit. The fit was obtained using the Marquardt-Levenberg algorithm implemented in Gnuplot. With asymptotic standard errors of 0.9365% and 0.8656% respectively, values of $\alpha = 1.77111 \times 10^{-6}$ and $\beta = 1.50197$ were computed. The value $\beta/\ln 2 = 2.1667$ bears similarity to the theoretical value of 2 in the exponential term of the algorithm's asymptotic complexity. The greater value may be attributed to approximation using constant $\alpha$.
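Spelling the comparison out, the fitted natural-exponential form can be rewritten with base 2, so that the fitted coefficient becomes directly comparable with the exponent of the complexity bound:

$$f(k) = \alpha e^{\beta\sqrt{k}} = \alpha\,2^{(\beta/\ln 2)\sqrt{k}}, \qquad \frac{\beta}{\ln 2} = \frac{1.50197}{0.693147} \approx 2.17,$$

which is compared against the factor 2 appearing in $2^{2\sqrt{k}}$.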
Improved algorithm
Results for improved dynamic programming are shown in Figure 7.3. Here, problem instances
were generated in the range of k = [4, 361] spins. Comparison with Figure 7.1 reveals that
as expected, execution times are lower. As a practical advantage, this allowed the algorithm’s
performance to be evaluated against larger problem instances during experimentation.
A log plot of these data is shown in Figure 7.4. As before, this representation reveals near-
constant execution time for the first data points in the series. A unique feature is the data point
at k = 49, which is an outlier in what appears to be another exponential curve against $\sqrt{k}$. It is speculated that the outlier is due to caching effects: the Opteron 1218 processor on Ness has a 64 KiB L1 data cache, which is likely to be sufficient for containing the optimisation data held in
[Plot: execution time (s) against number of spins.]
Figure 7.3: Execution times for serial dynamic programming (improved algorithm)
[Plot: lg(time) (s) against number of spins, with curve fit.]
Figure 7.4: Log execution times for serial dynamic programming (improved algorithm)
[Plot: resident memory consumption (KiB) against number of spins.]
Figure 7.5: Memory consumption for serial dynamic programming (basic algorithm)
**minPathConf and *minPath (cf. Chapter 6): the former requires $6 \times 7 \times 2^{8} \times 4$ bytes = 42 KiB, the latter $2^{8} \times 4$ bytes = 1 KiB. The spin glass data structure is estimated to require less than 1 KiB, yielding a total of less than 64 KiB (considering the size of additional memory blocks).
Fitting the log plot to the function used for analysing basic dynamic programming, $\ln(f(k)) = \ln(\alpha) + \beta\sqrt{k}$, allows further comparison of the two algorithms. Using the same procedure for producing the fit, the obtained values are $\alpha = 1.0845 \times 10^{-5}$ and $\beta = 1.2275$, with asymptotic standard errors of 0.8924% and 0.9401%, respectively. The value of $\beta$ is close to the theoretical value of 1 in the exponential term of the algorithm's complexity function; compared to basic dynamic programming, execution time is observed to grow at a slower rate, as expected.
Memory consumption
Brief experiments were conducted to assess memory consumed by the dynamic programming
implementations. Considering resident memory values, as reported by the top process utility,
data were recorded by initiating computation using increasingly large problem sizes. For both
algorithms, as allocated memory remains constant for the majority of computation, it was not
necessary to execute until termination.
Plots of memory consumption are shown in Figures 7.5 and 7.6. For basic dynamic programming, the data reveal that to avoid swapping on a machine with 4 GiB of memory (e.g. Ness), the maximum problem size is a 24 × 24 spin lattice. With improved dynamic programming, the maximum problem size decreases to 19 × 19 spins. This behaviour is expected, since **minPathConf contains $O(\sqrt{k}\,2^{\sqrt{k}})$ vs. $O(k\,2^{\sqrt{k}})$ elements, for a k-spin square lattice. Again using a log plot approach (Figures 7.7 and 7.8), the data are fit to the function $f(k) = \beta k^{\alpha} 2^{\sqrt{k}}$, whose logarithm is
[Plot: lg(resident memory consumption) (KiB) against number of spins, with curve fit.]
Figure 7.6: Log memory consumption for serial dynamic programming (basic algorithm)
$\ln(\beta) + \alpha \ln(k) + \sqrt{k}\,\ln(2)$. For basic dynamic programming, the obtained values are $\alpha = -9.46851$ and $\beta = 40.42$ (asymptotic standard errors 1.401% and 1.924%, respectively). The values for improved dynamic programming are $\alpha = -6.76659$ and $\beta = 27.1801$ (asymptotic standard errors 2.092% and 2.844%). Comparing the two values of $\beta$, it is apparent that between the two variants of dynamic programming there exists a trade-off between execution time and memory efficiency: in terms of execution time, improved dynamic programming is preferable, whereas for memory consumption, the basic algorithm is preferable.
7.1.2 Harmony search
Serial harmony search was evaluated by comparing solutions generated by the heuristic to ground truth, based on a 6 × 6 spin problem instance with uniformly distributed bonds in the range [−1, 1). Ground truth was obtained by conducting an exhaustive search on the problem instance. While varying the number of solution vectors used, the search was executed multiple times. Results were used to compute mean error, standard error and error rate values. Totalling 80 executions for each value of NVECTORS, results are presented in Table 7.1.
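For illustration, bonds distributed uniformly in [−1, 1), as used for this test instance, can be drawn as in the following sketch; drand48() is merely one possible generator, and the project code may use a different one:

    #include <stdlib.h>

    /* Illustrative sketch only: fill an array of coupling constants with
     * values distributed uniformly in [-1, 1). The generator used by the
     * actual project code may differ. */
    void random_bonds(double *J, int nbonds, long seed)
    {
        int i;
        srand48(seed);
        for (i = 0; i < nbonds; i++)
            J[i] = 2.0 * drand48() - 1.0;   /* uniform in [-1, 1) */
    }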
As shown, the standard and mean error values improve monotonically when increasing the algorithm's memory capacity. The error rate, however, does not improve when increasing memory to NVECTORS=50; the algorithm's ability to find the exact ground state decreases under the specified parameter value. Despite this, $\mu_e$ and $\sigma_e$ suggest that a large NVECTORS benefits solution quality in general. This is in agreement with the behaviour of 'solution exploration' described in Chapter 3. Exploring the algorithm's behaviour against large NVECTORS is indeed the motivation behind developing parallel harmony search.
8/3/2019 Peter Alexander Foster
http://slidepdf.com/reader/full/peter-alexander-foster 93/183
7.1. Serial performance 75
[Plot: resident memory consumption (KiB) against number of spins.]
Figure 7.7: Memory consumption for serial dynamic programming (improved algorithm)
[Plot: lg(resident memory consumption) (KiB) against number of spins, with curve fit.]
Figure 7.8: Log memory consumption for serial dynamic programming (improved algorithm)
         NVECTORS = 1   NVECTORS = 2   NVECTORS = 10   NVECTORS = 50
µe       1.84           1.55           0.97            0.83
σe       0.83           0.77           0.77            0.61
e        0.06           0.10           0.14            0.10

Table 7.1: Mean error µe, standard error σe and error rate e of serial harmony search ground states, for increasing solution memory NVECTORS. Results are based on the ground truth value −30.7214. The error rate is defined as the number of correctly obtained ground state configurations over the total number of algorithm invocations.
Optimisation flags Execution time
-O0 10.682s
-O1 10.542s
-O2 6.354s
-O3 6.340s
-O3 -funroll-loops 4.043s
-O3 -funroll-loops -ftree-loop-im 4.043s
-O3 -funroll-loops -ftree-loop-im -funswitch-loops 4.043s
Table 7.2: Serial execution times for basic dynamic programming on Ness, for various GCC 4.0
optimisation flags
7.2 Parallel performance
The architecture of the Ness and HPCx machines was described in Chapter 5. In the following,
the method and results of performance assessment are presented for the implemented parallel
algorithms. As with the serial algorithms, results are interpreted.
7.2.1 Dynamic programming
Since the dynamic programming algorithms are deterministic, parallel performance can be assessed directly in terms of execution time. That is, given the parallel execution time $T_p$ on p processors and the serial execution time $T_s$, performance can be described in terms of the parallel efficiency $T_s/(p\,T_p)$.
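As a point of reference, these measures follow directly from the recorded timings; a trivial helper (illustrative only, with hypothetical example values) is:

    /* Illustrative helpers: speedup and parallel efficiency from a measured
     * serial time ts and a parallel time tp obtained on p processors.
     * Example (hypothetical values): ts = 2520 s, tp = 180 s, p = 16 gives
     * a speedup of 14 and a parallel efficiency of 0.875. */
    double speedup(double ts, double tp)
    {
        return ts / tp;
    }

    double parallel_efficiency(double ts, double tp, int p)
    {
        return ts / (tp * (double)p);
    }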
In preparation for experiments on Ness, serial execution time was measured against vari-
ous combinations of gcc compiler flags, based on the basic dynamic programming algorithm
and an 11 × 11 test spin problem. Using the -O3 optimisation level with the -funroll-loops flag for automated loop unrolling offered the greatest gain in performance over unoptimised code.
Timing data are shown in Table 7.2. This behaviour is not surprising, since the code is heavily
reliant on loops for processing spin glass data structures. In contrast, rudimentary analysis of
the source code reveals few cases where performance would likely benefit from loop-invariant
motion (pertaining to other optimisation flags used).
On HPCx, the same test spin problem was used to assess execution time on the machine’s
serial job node. Here, the effect of target architecture optimisation was considered, using the xlc_r re-entrant compiler, version 8.0. For all tests, 64-bit compilation was enabled using the
-q64 flag. Timing data are listed in Table 7.3. The set of compiler flags used for parallel
performance evaluation was -qhot -qarch=pwr5 -O5 -Q -qstrict.
The parallel environment on HPCx allows control over a number of settings [3], potentially influencing distributed application performance. Specifically, these settings affect the protocol used for communicating between shared memory nodes, including use of remote direct memory
[Plot: execution time (s) against number of processors; series for 10×10 to 15×15 spin lattices.]
Figure 7.9: Parallel execution time for dynamic programming (basic algorithm, Ness)
[Plot: parallel efficiency against number of processors; series for 10×10 to 15×15 spin lattices.]
Figure 7.10: Parallel efficiency for dynamic programming (basic algorithm, Ness)
8/3/2019 Peter Alexander Foster
http://slidepdf.com/reader/full/peter-alexander-foster 97/183
7.2. Parallel performance 79
[Plot: application time / total execution time against number of processors; series for 10×10 to 15×15 spin lattices.]
Figure 7.11: Vampir trace summary for dynamic programming (basic algorithm, Ness)
10 × 10 instance, at a rate decreasing against p.
To interpret these results, recall that the basic dynamic programming algorithm requires a sequence of $\sqrt{k}$ blocking collective gather operations to complete computation. For each of these operations, each processor contributes $2^{\sqrt{k}}$ elements. After the ground state energy has been obtained from the array *minPath, the ground state configuration is recovered from **minPathConf through a similar sequence of $\sqrt{k}$ gather operations.
Clearly, scalability is affected by the size of problem instances, since this influences the number and size of messages sent between processors. If the cost of a single collective gather is approximated as $t_{\mathrm{gather}} = p\,(T_0 + m/B)$, where p is the number of processors, $T_0$ the message initialisation cost, m the message size and B the bandwidth, it follows that for constant message size, the overall cost relates linearly to p. This serves as a possible explanation for the linear reduction in parallel efficiency observed for the majority of problem instances in Figure 7.10. The increase in efficiency for larger problem instances can then be attributed to the fact that computing the ground state energy requires $\propto m^2/p$ operations per processor (cf. Chapter 4). Consequently, for constant p, the fraction $m/(m^2/p)$ diminishes as m is increased; communication costs thus become less significant as the problem size increases. It is speculated that the 10 × 10 spin lattice causes severe imbalance between communication and computation, so that the amount of computation is closely approximated by a constant, regardless of p.
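To make the argument concrete, the sketch below evaluates the simple cost model with hypothetical values for $T_0$ and B (neither is a measured property of Ness or HPCx):

    #include <stdio.h>

    /* Sketch of the cost model discussed above: a collective gather is
     * approximated as t_gather = p * (T0 + m/B); per-processor computation
     * is taken as m*m/p, so the communication/computation ratio behaves
     * like p/m. T0 and B below are hypothetical, not measured, values. */
    int main(void)
    {
        const double T0 = 5.0e-6;   /* assumed message initialisation cost (s) */
        const double B  = 1.0e9;    /* assumed bandwidth (bytes/s)             */
        const double m  = 4096.0;   /* example message size (bytes)            */
        int p;

        for (p = 2; p <= 16; p *= 2) {
            double t_gather = p * (T0 + m / B);
            double ratio    = m / ((m * m) / p);   /* ~ p/m */
            printf("p=%2d  t_gather=%.2e s  comm/comp ratio=%.2e\n",
                   p, t_gather, ratio);
        }
        return 0;
    }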
Figure 7.11 shows the fraction $T_c/T_m$ of parallel computation time over communication time. These data were gathered by re-linking the compiled source code with the Vampir library and recording summary data as reported by the application's trace utility. Time spent on tracer API calls is omitted. As a general trend, it is observed from the plot that increasing the number of
[Plot: execution time (s) against number of processors; series for 10×10 to 15×15 spin lattices.]
Figure 7.12: Parallel execution time for dynamic programming (basic algorithm, cyclic com-
munications, Ness)
processors does indeed increase the proportion of time spent on communication. For the 14×14,
15×15 lattices, $T_c/T_m$ does not decrease monotonically with p. This may be due to the accuracy
of trace data, which indicate a non-monotonic relation between lattice sizes and scalability.
Having examined the performance of basic dynamic programming using collective operations, a similar procedure is given for the approach based on cyclic communication. In Figures 7.12, 7.13 and 7.14, plots of execution time, parallel efficiency and the fraction $T_c/T_m$ are shown. From Figure 7.12, it is again observed that increasing the processor count causes execution time to diminish, with the exception of the 10×10 lattice. For the latter, performance appears to degrade more profoundly than with the collective variant of the algorithm, to the extent that execution time on 16 processors exceeds that obtained for a single processor. For larger processor counts and the remaining problem instances, performance appears to degrade uniformly; this effect is shown more clearly in Figure 7.13. Here, parallel efficiency fluctuates in the range of [1, 4] processors, before decreasing monotonically for each examined problem instance. Significantly, scalability does not improve monotonically as lattice size is increased. Nevertheless, it is possible to group problem instances into two categories, such that the smaller 10×10 and 11×11 lattices result in parallel efficiency in the range [.4, .5] on four processors, with the remainder attaining [.8, .99] efficiency. Increasing the processor count to 16, parallel efficiency drops to [.01, .2] and [.4, .5] for the respective groups. From Figure 7.14, it is observed that communication costs become significant for all problem sizes as the processor count increases: for p = 16, the fraction $T_c/T_m$ lies in the range [.4, .5] for all examined lattices, except the 10 × 10 lattice, for which the fraction is further diminished due to communication costs.
[Plot: parallel efficiency against number of processors; series for 10×10 to 15×15 spin lattices.]
Figure 7.13: Parallel efficiency for dynamic programming (basic algorithm, cyclic communica-
tions, Ness)
[Plot: application time / total execution time against number of processors; series for 10×10 to 15×15 spin lattices.]
Figure 7.14: Vampir trace summary for dynamic programming (basic algorithm, cyclic commu-
nications, Ness)
[Plot: execution time (s) against number of processors; series for 10×10 to 22×22 spin lattices.]
Figure 7.15: Parallel execution time for dynamic programming (improved algorithm, Ness)
Comparing the two variants' performance, it is observed that using collective communications reduces execution time on few processors. This suggests that, in this case, the collective communication is less expensive than the cyclic operations. Recall also that the cyclic variant of the algorithm requires additional conditional statements, which increases the number of branch instructions in the code. Scalability of the cyclic variant is significantly reduced, indicating that problem instances significantly larger than 15 × 15 spins are required to obtain favourable efficiency at p > 16 processors. It is possible that sufficiently large problem instances might expose the cyclic approach as advantageous; however, these are not explored due to restricted experimental time scales. For the examined problem sizes, reduced scalability is thought to be influenced by synchronisation overhead, such that the amount of computation within the nested loops∗ is not sufficient to merit overlapping communications.
Results for improved dynamic programming executed on Ness are shown in Figures 7.15, 7.16 and 7.17. For all examined problem instances, parallel execution times behave similarly to those observed for the 10 × 10 lattice using basic dynamic programming: increasing the processor count causes performance to degrade severely for smaller lattices, such that parallel efficiency drops to around 20% at p = 4 processors. Larger lattices result in slightly enhanced parallel efficiency; however, increasing to p = 16 causes near-uniform degradation to around 10%. Figure 7.17 shows performance degradation from the perspective of computation and communication time. The fraction $T_c/T_m$ behaves as expected in relation to Figure 7.16, indicating that performance degradation is due to communication costs. In comparison to basic dynamic programming using cyclic communications, the effect of increasing processors is further pronounced, such that $T_c/T_m$ is reduced to under 20% at 16 processors.
∗cf. Chapter 4
[Plot: parallel efficiency against number of processors; series for 10×10 to 22×22 spin lattices.]
Figure 7.16: Parallel efficiency for dynamic programming (improved algorithm, Ness)
[Plot: application time / total execution time against number of processors; series for 10×10 to 22×22 spin lattices.]
Figure 7.17: Vampir trace summary for dynamic programming (improved algorithm, Ness)
[Plot: execution time (s) against number of processors; series for 11×11 to 16×16 spin lattices.]
Figure 7.18: Parallel execution time for dynamic programming (basic algorithm, HPCx)
Comparing the basic and improved variants of the algorithm, it appears there exists a trade-off between scalability and algorithmic complexity. Whereas basic dynamic programming has higher algorithmic complexity, results show favourable scalability up to 16 processors. In contrast, improved dynamic programming is a more efficient algorithm in terms of complexity; however, its scalability is considerably diminished on Ness for the examined problem sizes. A possible explanation for this behaviour is provided by the number of communication operations, which is $O(k)$ for the improved variant, versus $O(\sqrt{k})$ required for the basic variant, for a k-spin lattice. Given that communication takes place every $O(2^{2\sqrt{k}})$ instructions, versus every $O(2^{\sqrt{k}})$ instructions for the basic (collective) and improved algorithms, respectively, it is clear that the ratio of computation against communication is lower for the improved algorithm. Since communications are blocking in both cases, it follows that for improved dynamic programming, a greater proportion of execution time is due to communication operations. As a consequence, this reduces scalability.
Performance on HPCx
Plots of performance data on HPCx for basic dynamic programming using collective communications are shown in Figures 7.18 and 7.19. Because of how the machine's resources are grouped into logical partitions, and the implications of this for time budgeting, the processor count was scaled as $16 \cdot 2^n$, albeit to greater magnitude than on Ness. For small problem sizes, behaviour is as observed on Ness, where increasing the processor count effects little improvement in execution time. Scalability improves as problem size is increased, to the extent that parallel efficiency is
[Plot: parallel efficiency against number of processors; series for 11×11 to 16×16 spin lattices.]
Figure 7.19: Parallel efficiency for dynamic programming (basic algorithm, HPCx)
greater than 95% for lattices with 15 × 15 and 16 × 16 spins solved on 256 processors. A distinct feature is observed for the 15 × 15 lattice, where superlinear speedup appears to occur in the range of [16, 128] processors.
In Figures 7.20, 7.21, results for the algorithm variant using cyclic communications are
shown. In comparison to the collective approach, again performance improves as problem size
is increased. However, the obtained parallel efficiency is around 60% at 256 processors, for a
16 × 16 spin lattice. This decline in performance is similar to that observed on Ness. In contrast,
on HPCx, increasing parallel efficiency reflects the ordering of problem sizes more accurately.
Fluctuations observed on Ness are not present; for all examined problem instances execution
time decreases monotonically against the number of processors. As with the collective variant,
parallel efficiency obtained for the 15 × 15 lattice exceeds that for the 16 × 16 lattice, on 16
and 32 processors. In contrast, scaling performance is not sufficient for superlinear speedup, as
previously noted.
Results for improved dynamic programming on HPCx are shown in Figures 7.22 and 7.23. Here, performance drops rapidly for all explored problem sizes, such that executing on 16 processors reduces parallel efficiency to below 50%. Increasing the number of processors, efficiency tails off further; at 256 processors, it is less than 10%. Significantly, in resemblance to the aforementioned results, the largest examined problem instance does not result in the most scalable computation: the 22 × 22 lattice falls behind the 18 × 18 and 20 × 20 instances in terms of parallel efficiency. This phenomenon is observed for all evaluated processor counts.
Concluding from performance data on HPCx, the three algorithm variants exhibit varying
degrees of scalability. From most to least scalable, the algorithms are ordered as:
[Plot: execution time (s) against number of processors; series for 11×11 to 16×16 spin lattices.]
Figure 7.20: Parallel execution time for dynamic programming (basic algorithm, cyclic com-
munications, HPCx)
[Plot: parallel efficiency against number of processors; series for 11×11 to 16×16 spin lattices.]
Figure 7.21: Parallel efficiency for dynamic programming (basic algorithm, cyclic communica-
tions, HPCx)
[Plot: execution time (s) against number of processors; series for 12×12 to 22×22 spin lattices.]
Figure 7.22: Parallel execution time for dynamic programming (improved algorithm, HPCx)
[Plot: parallel efficiency against number of processors; series for 12×12 to 22×22 spin lattices.]
Figure 7.23: Parallel efficiency for dynamic programming (improved algorithm, HPCx)
[Plot: parallel efficiency against number of processors for a 16×16 spin lattice, comparing improved DP, basic collective DP and basic cyclic DP.]
Figure 7.24: Summary of parallel efficiencies on HPCx
• Basic algorithm using collective communications
• Basic algorithm using cyclic communications
• Improved algorithm using collective communications
This ordering is as observed on Ness; however, scalability is higher on HPCx for each of the variants. This is attributed to lower communication costs on HPCx, resulting from the higher
message passing bandwidth available on the machine. A summary of the algorithms’ parallel
efficiency on HPCx is shown in Figure 7.24, based on a 16 × 16 lattice.
7.2.2 Harmony search
The parallel harmony search algorithm introduced in Chapter 4 is based on a combination of
two types of communication operation. Considering additional algorithm parameters, the algorithm exhibits a high degree of flexibility; this leads to a potentially large set of algorithm
variants. The latter must be considered when examining performance. To restrict the space
of algorithm variants, it was decided to confine the behaviour of communication operations:
Hence, cyclic operations are based on exchanging random solution vectors between processes,
such that favourable solutions are retained. Collective operations take place between process
groups of specified size. Cyclic operations are executed every iteration of the harmony search
algorithm, while collective operations are executed periodically.
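One way such process subgroups might be realised is sketched below, under the assumption that ZONESIZE divides the total process count; this is an illustration rather than a description of the project's implementation:

    #include <mpi.h>

    /* Illustrative sketch: split MPI_COMM_WORLD into subgroups of ZONESIZE
     * consecutive ranks. Collective solution exchanges could then use
     * zone_comm, while cyclic exchanges use MPI_COMM_WORLD. ZONESIZE is a
     * placeholder compile-time constant here. */
    #define ZONESIZE 4

    void make_zone_comm(MPI_Comm *zone_comm)
    {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        /* Processes sharing the same colour (rank / ZONESIZE) form one subgroup. */
        MPI_Comm_split(MPI_COMM_WORLD, rank / ZONESIZE, rank, zone_comm);
    }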
The question arises of how to assess the heuristic's parallel performance. For a deterministic
algorithm, such as the exact dynamic programming based solver, performance is characterised
[(a) Non-heuristic: execution time against processors. (b) Heuristic: execution time against processors and solution accuracy.]
Figure 7.25: Conceptual representation of properties relevant to parallel performance
by scalability. Scalability is quantified in terms of the algorithm’s execution time against the
number of processors on which it is executed. From the latter, measures such as speedup and
parallel efficiency can be computed. This leads to a two-dimensional space (Figure 7.25(a)),
which may be explored experimentally; for a given problem size, it may for example be of interest to approximate the function which maps the number of processors to execution time. In the case of heuristic algorithms, however, an additional dimension is significant for characterising performance, namely the accuracy of generated solutions. As a result, the space in which performance is evaluated is three-dimensional (Figure 7.25(b)). Experimental exploration may involve assessing the relation between accuracy and execution time for a given number of processors. Another possibility might involve approximating the boundary surface in the space, provided such a surface exists.
From the discussion in Chapter 3, it is evident that quantifying solution accuracy is non-
trivial: It is necessary to define a measure to compare solutions with one another. An obvious
approach is to use the utility function, if defined by the heuristic. However, it might prove
advantageous to employ a measure more reflective of the problem’s solution landscape, for
example considering the distribution of solution utility values.
In the following description of an attempt at performance evaluation, parallel harmony
search was executed on a number of test instances, while varying the number of processes and
a selection of algorithm parameters. As previously explained, the algorithm possesses a significant number of parameters. Given the specified communication strategies, these include the
number of solution vectors NVECTORS, the memory choosing rate, and the rate of performing
collective operations ZONEEXBLOCK.
Experiment series are based on three lattice sizes of 12 × 12, 14 × 14 and 16 × 16 spins.
For each size, five instances were generated, using random uniform bond distributions in the
range [−1, 1). The procedure for every configuration of parameters and process count involved
executing the algorithm on each lattice instance five times. Result data were then collected and
mean values computed. A single data point used in visualisation corresponds to the mean result
obtained for a given lattice size instance.
Evidently, using several problem instances multiplies the number of times the parallel algorithm must be invoked. As a compromise to reduce the number of invocations, the two parameters NVECTORS and the memory choosing rate were held constant. More importantly, the three-dimensional space to explore is adapted, such that execution time is replaced by the number of loop iterations executed by harmony search. This is thought to better reflect the performance property of state space exploitation, described in Chapter 3. An advantage of the parallel algorithm's design is that it terminates when all processes hold identical solution vectors (cf. Chapter 4). Consequently, the aforementioned performance property can be seen as a 'dependent variable' reflecting solution exploitation, which need not be considered when permuting algorithm parameters. Effectively, this allows performance assessment to be divided between exploring the relation of the number of processes to accuracy, and of the number of processes to algorithm iterations.
Experiments were carried out on Ness, using up to 16 processors. The size of processor
subgroups ZONESIZE was varied in the range [1, 16], so that the number of processors lies in
the range [ZONESIZE, 16] for each experiment. The parameter ZONEEXBLOCK was variably assigned the values $10^2$, $10^3$ and $10^4$. For each lattice instance, solution accuracy was characterised
in terms of energetic aberrance from ground truth data obtained using dynamic programming.
Also, solution configurations were compared using the Hamming distance [35]†. Finally, the
number of algorithm iterations was recorded.
Performance results
In Figure 7.27, performance data for ZONEEXBLOCK = $10^2$ are shown, against varying processor numbers, lattice sizes and ZONESIZE. Quantitatively, the plot corresponds to the series of experiments where collective operations are performed frequently among processes. As the algorithm is defined, solutions are exchanged at a constant rate between process groups. The latter however vary in size with the parameter ZONESIZE, as previously mentioned. Given a subgroup size, the smallest collection of processes consists of a single subgroup; in general, the processor count must be a multiple of ZONESIZE. For this reason, curves in the plot vary in length. As an example of reading the plot, consider the curves s16 which range from 4 to 16 processes; these correspond to invoking the algorithm with a subgroup size of 4. As a special case, for each plot there exist two curves per lattice size in the range [1, 16]. These correspond to subgroup sizes of 1 and 2.
Figure 7.27 describes $\Delta E$, the difference between ground truth and mean solution energies,
†The implemented algorithm takes the complement of spin configurations into account, where all spin states are
inverted.
[Plot: iterations against number of processors; series s12, s14, s16.]
Figure 7.26: Parallel harmony search convergence durations (ZONEEXBLOCK= 100)
[Plot: ∆E against number of processors; series s12, s14, s16.]
Figure 7.27: Parallel harmony search energetic aberrance (ZONEEXBLOCK= 100)
8/3/2019 Peter Alexander Foster
http://slidepdf.com/reader/full/peter-alexander-foster 110/183
92 Chapter 7. Performance Evaluation
[Plot: ∆E against number of processors; series s12, s14, s16.]
Figure 7.28: Parallel harmony search energetic aberrance (ZONEEXBLOCK = 1000)
against the number of processors p. On initial consideration, it is observed that increasing the processor count reduces aberrance in some cases: accuracy for one of the 16 × 16 spin lattice series improves from around −160 to −60 at 16 processors. It turns out that this series corresponds to the parameter value ZONESIZE = 1. Similar improvements occur for the 12 × 12 and 14 × 14 lattices, from −120 and −85 to −35 and −17, respectively. However, increasing ZONESIZE to 2 effects an increase in solution accuracy in all cases, such that little improvement in accuracy is observed when increasing p.
Comparing Figures 7.27, 7.28 and 7.29 allows insight to be gained into the effect of varying the frequency of collective exchanges within processor subgroups. For increasing ZONEEXBLOCK, the effect of p becomes less significant: with the exception of the experiment series conducted for ZONESIZE = 1, all processor counts yield energetic aberrances in the approximate range [−20, −10]. For ZONESIZE = 1, behaviour is consistent for all values of ZONEEXBLOCK, to the extent that increasing p effects a significant increase in solution accuracy, as observed for ZONEEXBLOCK = $10^2$.
From the previous observations, two conclusions can be drawn with regard to solution exploration. Firstly, it appears that increasing the value of ZONEEXBLOCK causes solution exploration to improve, given that accuracy as characterised by $\Delta E$ improves. This is in agreement with the assumption made in Chapter 4, where solution exploration and exploitation were described as opposing qualities in the search process. Assuming that collectively exchanging solutions benefits solution exploitation, an obvious consequence of reducing the frequency of this operation is increased solution accuracy. Secondly, from the increase in solution accuracy between subgroups sized 1 and 2, it is concluded that, contrary to prior expectation, the ring-based
[Plot: ∆E against number of processors; series s12, s14, s16.]
Figure 7.29: Parallel harmony search energetic aberrance (ZONEEXBLOCK = 10000)
scheme of exchanging solutions contains an element of solution exploitation. In increasing the
size of subgroups, more opportunity is evidently given for diverse solution ‘islands’, since there
exist processes only participating in infrequent collective operations. A possible explanation for
the increase in accuracy against p is the circumference of the ring in which processes exchange
solutions. For large circumferences, propagating a solution across the ring becomes increasingly involved. This also improves solution diversity.
Figures 7.26, 7.30 and 7.31 show performance results in terms of algorithm iterations until convergence. The scheme is identical to that used to visualise solution aberrance. In Figure 7.26, results for ZONEEXBLOCK = $10^2$ (where collective operations occur frequently) show that increasing p above ZONESIZE causes a reduction in the number of iterations for all lattice and process subgroup sizes. As previously observed, an exception is formed by the series executed for unit ZONESIZE, where the number of iterations increases against the processor count. Also, the maximum iteration counts occur for ZONESIZE = 16.
These results are interpreted as follows. Firstly, the reduction in convergence times against p is attributed to the solution exploitation property of ring-based communications: as p is increased, so too does the number of processor subgroups. Since the latter exchange solutions frequently, convergence is promoted between those processes involved in ring communications. Convergence between the remaining processors is affected by the rate of subgroup communications. Secondly, when no cyclic communications take place, it follows that convergence is only promoted by collective communications, which in all experiments occur infrequently in comparison to cyclic communications. This serves as an explanation for the peak convergence times when ZONESIZE = p. Thirdly, for unit ZONESIZE convergence times are comparatively short, which is attributed to the
[Plot: iterations against number of processors; series s12, s14, s16.]
Figure 7.30: Parallel harmony search convergence durations (ZONEEXBLOCK= 10000)
[Plot: iterations against number of processors; series s12, s14, s16.]
Figure 7.31: Parallel harmony search convergence durations (ZONEEXBLOCK = 1000)
8/3/2019 Peter Alexander Foster
http://slidepdf.com/reader/full/peter-alexander-foster 113/183
7.2. Parallel performance 95
[Plot: Hamming distance against number of processors; series s12, s14, s16.]
Figure 7.32: Parallel harmony search solution Hamming distance (ZONEEXBLOCK= 100)
absence of processes exempt from cyclic communications. Since the latter occur frequently,
convergence is promoted especially rapidly.
Figures 7.32, 7.33 and 7.34 plot the Hamming distances of generated solutions against processors, for all conducted experiment series. This metric is designed to expose accuracy in terms of the number of dissimilar spin states in solutions generated by the heuristic. Increasing the number of processors to 16 appears to decrease the Hamming distance slightly, for all lattice instances. It is observed that distances are approximately equal to k/2, where k is the number of spins. This suggests that the distribution of spin configurations against system energy might be uniform. Considering this, the metric does not appear expressive of solution accuracy.
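For reference, the metric can be computed as in the following sketch (spins assumed to be stored as ±1 integers; the spin-flip complement noted earlier is handled by taking the smaller of d and k − d):

    /* Illustrative sketch: Hamming distance between two k-spin configurations
     * stored as +1/-1 integers, taking the global spin-flip complement into
     * account, since a configuration and its complement have equal energy. */
    int hamming_distance(const int *a, const int *b, int k)
    {
        int i, d = 0;
        for (i = 0; i < k; i++)
            if (a[i] != b[i])
                d++;
        return d < k - d ? d : k - d;
    }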
Overall, results indicate that parallel harmony search does improve solution accuracy. However, it must be considered that the improvements shown in Figures 7.27, 7.28 and 7.29 are marginal. Also, it is noted that comparatively good performance is achieved on few processors, provided algorithm parameters are selected carefully. Cyclic communications were observed to contain a significant element of solution exploitation. Unsurprisingly, considering the latter, the lowest energetic aberrance is achieved when communications are minimised. The attempt to quantify accuracy in terms of the Hamming distance highlights the difficulty of obtaining solutions heuristically: the spin glass problem appears to have a rough solution landscape, which poses a difficulty for finding ground states using harmony search. In all conducted experiment series, only suboptimal solutions were found.
Because of their fundamental differences, comparison between the examined exact approaches and harmony search is difficult to achieve. Whereas dynamic programming places exact demands on computation due to its deterministic nature, the heuristic is flexible in terms of
[Plot: Hamming distance against number of processors; series s12, s14, s16.]
Figure 7.33: Parallel harmony search solution Hamming distance (ZONEEXBLOCK= 1000)
[Plot: Hamming distance against number of processors; series s12, s14, s16.]
Figure 7.34: Parallel harmony search solution Hamming distance (ZONEEXBLOCK= 10000)
resources, albeit at the expense of accuracy. All dynamic programming approaches were shown to benefit from the high bandwidth communications found on HPCx. The codes are thus suited for execution on non-vector supercomputer machines with many processors. In contrast, depending on algorithm parameters, the heuristic's execution performance on a commodity cluster system with low latency Gigabit Ethernet may prove adequate. This is estimated from a 153 s execution time on Ness, corresponding to around 20000 iterations of harmony search on 16 processors, for a 256-spin lattice. Guest [33] provides an overview of message passing performance on commodity systems, which suggests reasonable bandwidth would be obtained.
Chapter 8
Conclusion
In the previous chapters, the implemented parallel optimisation software was described and experimental results presented. Given the project's scope, there exist numerous possibilities for conducting further work. Based on theoretical and practical aspects described in this dissertation,
the following discusses such possibilities briefly, before concluding.
8.1 Further work
In Chapter 2, the spin glass problem was introduced. Here, it was established that the Ising
spin glass is a simplification of spin interaction. The two objects defining the exchange energy
between spins are the spins themselves and the coupling constants. In general, the graph of spin interactions can be arbitrary. Spins assume states whose representation can vary in complexity, from the classical or quantum Heisenberg formulation of state to the binary Ising formulation. Coupling constants may be chosen from arbitrary distributions, for example discrete or continuous (e.g. Gaussian) distributions.
8.1.1 Algorithmic approaches
Considering that the project is concerned with the Ising spin glass, the opportunity presents
itself to explore the behaviour of more involved models. As an intermediate model between
Heisenberg and Ising formulations, one might implement the Potts model, where spins assume
discrete state. Provided that the model of spin interactions is left unaltered, this model appears
comparatively simple to implement: Applying the framework of subsystems and subsystem
interactions to the Potts model, it is apparent that the total energy of a system is still the sum
of subsystem energies and the interaction energies between them. However, for a p-state model, the number of states a k-spin subsystem can assume is $p^k$, instead of $2^k$. The consequence of greater diversity is that the computational complexity of basic dynamic programming increases to $O(n\,p^{2m})$ for an $n \times m$ lattice. Similarly, improved dynamic programming has a complexity of
$O(nm\,p^{m})$. A further ramification of spin state concerns the algorithm's implementation, which is based on bit-string representations of subsystems. Clearly, allowing more than binary state requires the code to be redesigned. A possible approach might involve representing subsystems as linked lists of integers. A likely consequence of this for all algorithms would be reduced performance from additional memory operations.
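One possible replacement for the bit-string representation is sketched below: a subsystem of k spins with p states each is held in an integer array and enumerated in mixed-radix (base-p) order. This is merely one option; the linked-list representation mentioned above would behave analogously.

    /* Illustrative sketch: enumerate the p^k configurations of a k-spin Potts
     * subsystem held as an array of digits in [0, p). Returns 0 once the
     * enumeration wraps around to the all-zero configuration. */
    int next_potts_config(int *state, int k, int p)
    {
        int i;
        for (i = 0; i < k; i++) {
            if (++state[i] < p)
                return 1;   /* no carry: state now holds a new configuration */
            state[i] = 0;   /* carry into the next digit */
        }
        return 0;           /* wrapped around: enumeration complete */
    }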
One might also consider extending the algorithms to higher dimensions. While this is trivial in the case of the heuristic, the dynamic programming approaches require the notion of a subsystem to be extended into higher dimensions: whereas basic dynamic programming is based on a sequence of interacting spin rows for the square lattice, it is necessary to consider a sequence of interacting lattices for the cubic lattice. The relation is analogous between hypercubes of d and d + 1 dimensions. As a caveat, the algorithms become computationally expensive: the basic algorithm requires $O(n\,2^{2n^{d-1}})$ time for an $n^d$-spin Ising hypercubic lattice, since there are n (d − 1)-dimensional subsystems in the lattice. For the improved algorithm, the sliding window approach is based on a sequence of (d − 2)-dimensional subsystems, yielding a time complexity of $O(n^d\,2^{n^{d-1}})$. It is assumed that both algorithms' parallel performance will degrade,
since higher-dimensional data are required to be communicated between processes. This places
greater requirements on message passing bandwidth.
Another possibility for further work involves applying the framework described in Chapter
3 to more general models of spin interaction: For an arbitrary graph of interacting spins, the
concept of probabilistic spin configuration $(s_1, s_2, \ldots, s_n)$ can be expressed as

$$P(s_1, s_2, \ldots, s_n) = \prod_{i=1}^{n} P(s_i \mid \Pi_i),$$

where $\Pi_i$ is the set of precursor spins associated with spin $s_i$. The task is then to arrive at
a formulation of optimum spin configuration, as shown in Chapter 3. It is believed that the
resulting dynamic programming problem must be both non-serial and polyadic, since the graph
may contain cycles, and since a spin is permitted to have multiple ancestors. This is likely to
have consequences for the complexity of the corresponding optimisation algorithm.
Of particular interest is the algorithm described by Pardella and Liers [53]. This provides
a polynomial time solution to the planar spin glass problem, allowing ground states to be de-
termined exactly, for problem instances far larger than those examined in this project. The
approach is based on combining the cut optimisation problem with the notion of ‘Kasteleyn
cities’, i.e. complete graphs which are subgraphs in the dual lattice representing plaquette frus-
trations in the spin lattice. Pardella and Liers apply the algorithm to a 3000 ×3000 lattice, which
represents an improvement over previous graph theoretical approaches [46]. Parallelisation of
cut optimisation might be achieved using the approach described by Diaz, Gibbons et al. [18].
8.1.2 Existing code
Besides implementing additional algorithms for spin glass optimisation, further work might be conducted on the existing code base. Possible additional features include augmenting functionality to allow algorithm parameters to be controlled at runtime, or implementing further bond distributions. Unlike basic dynamic programming, the improved dynamic programming algorithm does not support lattices with periodic boundary conditions. This can be implemented by adapting the approach described in Chapter 3, where the algorithm is invoked repeatedly, for different configurations of boundary spins.
More pertinent is optimisation of the existing code's performance. Considering the project's scope, it was decided to adopt a design promoting code maintainability, described in Chapters 5 and 6. Given additional time, it would be of interest to examine the cost of pointer operations, replacing them where possible by static arrays. Also, although state-of-the-art compilers were used during development and evaluation, there is potential for optimising kernel code segments: in the function get_optimal_prestates(), one might for example consider manual function inlining or loop unrolling. Similar treatment of the harmony search module is conceivable.
As implemented, the codes use MPI to achieve message passing parallelism. Although the algorithms are indeed based on the message passing architecture, one might consider a shared memory approach: given the method of state space decomposition, where configurations of spin subsystems are distributed equally among processes, the parallel for directive, as implemented e.g. in OpenMP, appears an obvious instrument for implementing shared memory versions of the algorithms.
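A minimal sketch of this idea is given below; subsystem_energy() and the array name are hypothetical stand-ins rather than parts of the existing code base:

    #include <omp.h>

    /* Hypothetical stand-in for the real per-configuration computation. */
    static double subsystem_energy(long conf)
    {
        return (double)(conf & 1);   /* placeholder value only */
    }

    /* Illustrative sketch: distribute the configurations of a spin row among
     * threads with a parallel for, mirroring how the MPI codes decompose the
     * configuration space among processes. */
    void relax_row(double *minPathNew, long nconf)
    {
        long c;
        #pragma omp parallel for schedule(static)
        for (c = 0; c < nconf; c++)
            minPathNew[c] = subsystem_energy(c);
    }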
8.1.3 Performance evaluation
In Chapter 7, performance data were gathered for the dynamic programming and harmony search algorithms. Scalability of the exact algorithms was examined on two machines. Further experimental work might be concerned with evaluating scalability on other machines, such as commodity clusters or the Blue Gene architecture, if available. A more detailed examination of performance on existing architectures might consider the implications of message passing latency and bandwidth, especially with regard to the dynamic programming code using asynchronous communications. It is also of interest to examine the scalability of harmony search; due to time constraints, the work undertaken considered only the algorithm's accuracy. Additionally, one might consider the effect of processor count and communication frequency on algorithm iterations (ideally the latter should remain constant). Finally, there exists the potential to experiment with alternative communication strategies, as proposed in this work.
8.2 Project summary
During the course of the project, software was developed to compute ground states of the Ising spin glass. The software includes implementations of serial and parallel optimisation algorithms. The latter include parallel dynamic programming algorithms, available in two variants. The first of these allows lattice instances with arbitrary boundary conditions to be solved, while the second is computationally more efficient. Performance was examined, indicating good scalability for the first variant; in contrast, scalability is limited for the second variant. Also, a further algorithm was examined, implementing a parallel ground state optimiser based on the harmony search heuristic. Its performance was examined in terms of solution accuracy and algorithm convergence.
In Chapter 5, the project’s goals were described. These consisted of developing an exact
ground state solver based on the transfer matrix method. As an additional objective, the investigation was to include an alternative, heuristic parallel algorithm. The performance of both algorithms was to be examined. It was intended that the software should be self-contained, offering sufficient functionality to be useful as a research tool.
In the light of the undertaken work, the project's goals are considered fulfilled to a considerable extent: the implemented software includes variants of exact optimisation algorithms. In theoretical work, the dynamic programming approach was shown to offer identical performance to transfer matrix based methods; therefore, both approaches are considered computationally equivalent. The described harmony search heuristic was also implemented. Both dynamic programming and harmony search are implemented as message passing codes. Performance was investigated as proposed, examining the scalability of the dynamic programming codes, and the accuracy of parallel harmony search. Although it remains of interest to examine the scalability of the alternative code, overall the project is considered a success.
8.3 Conclusion
In this dissertation, the Ising spin glass was introduced as a combinatorial optimisation problem.
The theoretical background was discussed, identifying and developing solutions to the problem.
A description of undertaken project work was provided. Implemented software was described
and experimental results were presented. Finally, possibilities for further work were identified.
Appendix A
Project Schedule
[Gantt chart, weeks 1–16: detailed design, implementation, debugging, testing and performance evaluation (two cycles), followed by report, presentation, and submission/corrections.]
Figure A.1: Project schedule
Appendix B
UML Chart
[Diagram: relationships among the modules io.c/io.h, spinglass.c/spinglass.h, main.c, random.c/random.h, arrays.c/arrays.h, gstatefinder.h, bforce_gstate_finder.c, dp_gstate_finder.c, dp_gstate_finder_fast.c and harmony_gstate_finder.c.]
Figure B.1: UML class diagram of source code module and header relationships
Appendix C
Markov Properties of Spin Lattice
Decompositions
C.1 First-order property of row-wise decomposition
Using a row-wise decomposition strategy of spin rows, the system state probability is expressed as

$$P(S) = \frac{1}{Z(T)} \exp\!\left(-\frac{1}{kT}\left[H(S_1) + \sum_{i=2}^{n}\bigl(H(S_i) + H_b(S_{i-1}, S_i)\bigr)\right]\right)$$
$$\phantom{P(S)} = \frac{1}{Z(T)} \exp\!\left(-\frac{1}{kT}H(S_1)\right)\prod_{i=2}^{n}\exp\!\left(-\frac{1}{kT}\bigl(H(S_i) + H_b(S_{i-1}, S_i)\bigr)\right).$$

The partition function is expanded in a similar manner to account for subsystems, as

$$Z(T) = \sum_{S \in \mathcal{S}} \exp\!\left(-\frac{1}{kT}H(S)\right) = \sum_{S_1}\exp\!\left(-\frac{1}{kT}H(S_1)\right)\prod_{i=2}^{n}\sum_{S_i}\exp\!\left(-\frac{1}{kT}\bigl(H(S_i) + H_b(S_{i-1}, S_i)\bigr)\right) = \prod_{i=1}^{n} Z_i(T),$$

with

$$Z_i(T) = \begin{cases} \sum_{S_i}\exp\!\left(-\frac{1}{kT}H(S_i)\right) & i = 1 \\ \sum_{S_i}\exp\!\left(-\frac{1}{kT}\bigl(H(S_i) + H_b(S_{i-1}, S_i)\bigr)\right) & 1 < i \le n. \end{cases}$$

Substituting Z(T) in Equation C.1, the system state probability is

$$P(S) = \frac{1}{Z_1(T)}\exp\!\left(-\frac{1}{kT}H(S_1)\right)\prod_{i=2}^{n}\frac{1}{Z_i(T)}\exp\!\left(-\frac{1}{kT}\bigl(H(S_i) + H_b(S_{i-1}, S_i)\bigr)\right) = P(S_1)\prod_{i=2}^{n}P(S_i \mid S_{i-1}),$$
which shows that the chosen approach fulfils the property of a first-order Markov chain; the conditional probability $P(S_i \mid S_{i-1})$ arises from the dependence of row $S_i$ on its predecessor's configuration.
C.2 Higher-order property of unit spin decomposition
Applying an analogous approach to determining system state probability, P(S) is expressed as

    P(S) = \frac{1}{Z(T)} \exp\left[ -\frac{1}{kT} \sum_{i=0}^{nm-1} \bigl( H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m}) \bigr) \right]
         = \frac{1}{Z(T)} \prod_{i=0}^{nm-1} \exp\left[ -\frac{1}{kT} \bigl( H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m}) \bigr) \right].
With Z(T) = \prod_{i=0}^{nm-1} Z_i(T) and Z_i(T) = \sum_{S_i} \exp\left[ -\frac{1}{kT} \bigl( H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m}) \bigr) \right], it follows that

    P(S) = \prod_{i=0}^{nm-1} P(S_i \mid S_{i-1}, S_{i-m}).
Recall that ground state information can be obtained by optimising P(S). For this particular model, the ground state configuration is obtained by maximising P(S), i.e.

    \operatorname*{arg\,max}_{S_0, S_1, \ldots, S_{nm-1}} \ \prod_{i=0}^{nm-1} P(S_i \mid S_{i-1}, S_{i-m}).
Next, it is necessary to adapt the Viterbi path formulation, in order to arrive at a recursive expression of ground state energy for the higher-order Markov model. Disregarding cyclic boundary interactions in the model, and noting that P(S_i \mid S_{i-1}, S_{i-m}) = P(S_i) for i = 0, a prototypical approach is

    P_{viterbi}(S_i) = \begin{cases} \max_{S_i} \{ P(S_i) \} & i = 0 \\ \max_{S_{i-1}, S_{i-m}} \{ P(S_i \mid S_{i-1}, S_{i-m}) \, Q_{viterbi}(S_{i-1}) \, Q_{viterbi}(S_{i-m}) \} & i > 0. \end{cases}
Unfortunately, there exists a caveat against recursively stating

    P_{viterbi}(S_i) = \max_{S_{i-1}, S_{i-m}} \{ P(S_i \mid S_{i-1}, S_{i-m}) \, P_{viterbi}(S_{i-1}) \, P_{viterbi}(S_{i-m}) \},

because by definition, the probability of subsystem S_i assuming a given state is conditionally dependent on subsystems S_{i-1} and S_{i-m}, which in turn are both conditionally dependent on subsystem S_{i-m-1}. This ordering requires that identical sets of subsystem configurations are considered when evaluating the terms P_{viterbi}(S_{i-1}) and P_{viterbi}(S_{i-m}). The mapping Q_{viterbi} must reflect this behaviour in terms of P_{viterbi}.
A solution to the dependency problem of vertical and horizontal predecessor spins can be obtained by increasing the order of the Markov model to m + 1. As a result, system state probability is given by the product

    P(S) = \prod_{i=0}^{nm-1} P(S_i \mid S_{i-1}, S_{i-2}, \ldots, S_{i-m-1}),
from which ground state probability can be formulated as

    P_{viterbi}(S_i, S_{i-1}, \ldots, S_{i-m}) = \begin{cases} P(S_i, S_{i-1}, \ldots, S_{i-m}) & i \le m \\ \max_{S_{i-m-1}} \{ P(S_i \mid S_{i-1}, \ldots, S_{i-m-1}) \, P_{viterbi}(S_{i-1}, \ldots, S_{i-m-1}) \} & i > m. \end{cases}
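To illustrate how this higher-order recursion can be realised computationally, the sketch below (an illustration only, using hypothetical names rather than the dissertation's implementation) encodes an (m+1)-spin window as a bit string and enumerates its possible predecessor configurations; the index arithmetic mirrors that used by dp_gstate_finder_fast.c in Appendix F.

/* Minimal sketch: a window of m+1 spins is encoded as a bit string, with the
 * lowest bit holding the oldest spin and the highest bit the leading spin.
 * Advancing the window by one spin discards the oldest bit and appends a new
 * leading bit, so each window state has exactly two possible predecessors. */
#include <stdio.h>

int main(void)
{
    const int m = 4;                   /* m columns; the window holds m+1 spins */
    const long nConf = 1L << (m + 1);  /* number of window configurations       */
    long conf = 19;                    /* an arbitrary example window state     */

    /* Shift the shared spins up by one position and clear the bit that held
     * the leading spin; the freed lowest bit belongs to the spin entering the
     * predecessor window and may take either value.                           */
    long base = ((conf << 1) | nConf) ^ nConf;
    for (int k = 0; k < 2; k++)
        printf("predecessor %d of state %ld: %ld\n", k, conf, base + k);
    return 0;
}

Because only two predecessor states need to be examined per window configuration, the minimisation at each sliding-window position is reduced accordingly.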
Appendix D
The Viterbi Path
D.1 Evaluating the Viterbi path in terms of system energy
It is of interest to examine the behaviour of system state probability, which is present in the recursive formulation of the Viterbi path, and evaluated in the described pseudocode algorithm. Taking the natural logarithm of the state probability, it is observed that

    \ln(P(S)) = \ln\left( \frac{1}{Z(T)} \exp\left[ -\frac{1}{kT} H(S) \right] \right) = \ln\left( \frac{1}{Z(T)} \right) - \frac{H(S)}{kT} \propto -H(S).
Using this result, the natural logarithm of the conditionally dependent state probability P(S_i \mid S_{i-1}) is

    \ln(P(S_i \mid S_{i-1})) = \ln\left( \frac{P(S_i, S_{i-1})}{P(S_{i-1})} \right) = \ln(P(S_i, S_{i-1})) - \ln(P(S_{i-1}))
                             \propto -\bigl( H(S_i) + H(S_{i-1}) + H_b(S_i, S_{i-1}) \bigr) + H(S_{i-1})
                             \propto -\bigl( H(S_i) + H_b(S_i, S_{i-1}) \bigr),
which allows system probability to be evaluated quantitatively in terms of its Hamiltonian. This in turn permits reformulation of the dynamic programming optimisation problem:

    \ln(P_{viterbi}(S_i)) = \begin{cases} \max_{S_i} \{ \ln(P(S_i)) \} & i = 1 \\ \max_{S_{i-1}} \{ \ln(P(S_i \mid S_{i-1})) + \ln(P_{viterbi}(S_{i-1})) \} & i > 1 \end{cases}

    \ln(P_{viterbi}(S_i)) = \begin{cases} c \, \min_{S_i} \{ H(S_i) \} & i = 1 \\ \min_{S_{i-1}} \{ H(S_i) + H_b(S_i, S_{i-1}) + c \ln(P_{viterbi}(S_{i-1})) \} & i > 1, \end{cases}
with c \in \mathbb{R}. It is trivial to apply the same approach to the recursive function viterbi(i), which evaluates to the actual sequence of emitted states in the Viterbi path, and to the described pseudocode algorithm.
Setting c = 1, the evaluated optimal sequence remains the Viterbi path. Further substitution yields

    H_{min}(S_i) = \begin{cases} \min_{S_i} \{ H(S_i) \} & i = 1 \\ \min_{S_{i-1}} \{ H(S_i) + H_b(S_i, S_{i-1}) + H_{min}(S_{i-1}) \} & i > 1, \end{cases}    (D.1)

which is the Hamiltonian of the system (S_1, S_2, \ldots, S_i) whose states are equal to those emitted by the Viterbi algorithm. Since the Viterbi path corresponds to the most probable system state, H_{min} is the system's ground state energy. This provides a solution to the ground state problem for the two-dimensional lattice without vertical or horizontal boundary interactions.
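For concreteness, the following minimal sketch (an illustration of the recursion in Equation D.1 only, not the project's solver; ground_state_energy, row_energy and inter_row_energy are assumed names standing in for the Hamiltonian terms H(S_i) and H_b(S_i, S_{i-1})) evaluates the recursion by enumerating row configurations as bit strings:

#include <stdlib.h>
#include <string.h>
#include <float.h>

/* Sketch of Equation D.1: returns the minimum over all configurations of
 * H(S_1) + sum_{i>1} [ H(S_i) + H_b(S_i, S_{i-1}) ] for a lattice of nRows
 * rows with nCols spins per row.  Rows are encoded as bit strings. */
double ground_state_energy(int nRows, int nCols,
                           double (*row_energy)(int row, unsigned long conf),
                           double (*inter_row_energy)(int row, unsigned long prev,
                                                      unsigned long cur))
{
    unsigned long nConf = 1UL << nCols;            /* 2^k row configurations */
    double *hMin = malloc(nConf * sizeof *hMin);   /* H_min(S_{i-1})         */
    double *hNew = malloc(nConf * sizeof *hNew);   /* H_min(S_i)             */

    /* Base case: H_min(S_1) = H(S_1) for every configuration of the first row */
    for (unsigned long c = 0; c < nConf; c++)
        hMin[c] = row_energy(0, c);

    /* Recursive case: minimise over the preceding row's configurations */
    for (int i = 1; i < nRows; i++) {
        for (unsigned long c = 0; c < nConf; c++) {
            double best = DBL_MAX;
            for (unsigned long p = 0; p < nConf; p++) {
                double e = hMin[p] + row_energy(i, c) + inter_row_energy(i, p, c);
                if (e < best) best = e;
            }
            hNew[c] = best;
        }
        memcpy(hMin, hNew, nConf * sizeof *hMin);
    }

    /* The ground state energy is the minimum over the final row's states */
    double best = DBL_MAX;
    for (unsigned long c = 0; c < nConf; c++)
        if (hMin[c] < best) best = hMin[c];

    free(hMin);
    free(hNew);
    return best;
}

Tracking, for each configuration, which predecessor attained the minimum allows the corresponding spin assignment to be recovered, as performed by the viterbi(i) backtracking step.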
Appendix E
Software usage
The following provides instructions on how to install and use the software described in this dissertation.
Requirements
The software requires the library glib-2.0 to be installed. By default, this library is expected
to reside in the directory /usr/lib, with headers located at /usr/include/glib-2.0 and /usr/lib/glib-
2.0/include. These settings may be changed by modifying the file Makefile.am. An implementation of MPI, such as MPICH2, is also required.
Configure and compile
The software is delivered as a compressed tarball with the .tar.gz file name extension. It is unpacked by issuing
tar xvzf ising.tar.gz
at the command prompt. Following this, it is necessary to initiate configuration by issuing
./configure
from within the package’s root directory. Environment variables are used to specify configuration options, including the compiler used (which defaults to mpicc). For example, to disable
optimisation, the necessary commands are:
export CFLAGS=-O0; ./configure
Providing configuration was successful, compilation is initiated using
make
Usage
Upon completion, the source directory contains the binaries genbonds, genclamps, sbforce, dpsolver, dpsolverfast, hmsolver, whose purpose is described in chapter 6. Most significantly, the solver utilities dpsolver, dpsolverfast, hmsolver operate on spin bond configuration files, which are generated using genbonds. To generate a sample 12 × 12 spin configuration file BONDS, the required command is

./genbonds -x 12 -y 12 > BONDS

which can then be solved, e.g. using improved dynamic programming on a single process, by invoking

./dpsolverfast -b BONDS
Multiprocessing is enabled either by invoking mpiexec directly, or by using one of the SUN
GridEngine scripts located inside the source root directory. All utilities support the -? flag for
displaying a list of command line options.
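For example (the exact launch syntax depends on the MPI implementation and batch system in use), a four-process run of the basic dynamic programming solver on the generated bond file could be started with

mpiexec -n 4 ./dpsolver -b BONDS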
Appendix F
Source Code Listings
/*
 * File: main.c
 *
 * Implements common entry point for ground state solver utilities.
 * Responsible for processing command line options and initiating computation
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "io.h"
#include "gstatefinder.h"

/* These store values of command line arguments */
static gchar *spinConfig = NULL;
static gchar *bondConfig = NULL;
static gchar *clampConfig = NULL;
static gchar *compSpinConfig = NULL;

/* Data structure for command line processing.
 * Specifies properties of command line options */
static GOptionEntry entries[] =
{
    { "spin-initial-config", 's', G_OPTION_FLAG_OPTIONAL_ARG, G_OPTION_ARG_FILENAME, &spinConfig, "Initial spin configuration file", "spinConfig" },
    { "bond-config", 'b', 0, G_OPTION_ARG_FILENAME, &bondConfig, "Initial bond configuration file", "bondConfig" },
    { "clamp-config", 'c', G_OPTION_FLAG_OPTIONAL_ARG, G_OPTION_ARG_FILENAME, &clampConfig, "Initial spin clamp configuration file", "clampConfig" },
    { "spin-comparison-config", 'x', G_OPTION_FLAG_OPTIONAL_ARG, G_OPTION_ARG_FILENAME, &compSpinConfig, "Spin configuration to compare result with", "compSpinConfig" },
    { NULL }
};

static void initialise_computation();

int main(int argc, char *argv[]) {

    /* Initialise data structure for argument processing */
    GError *error = NULL;
    GOptionContext *context;

    context = g_option_context_new("- Calculate spin glass ground states");
    g_option_context_add_main_entries(context, entries, NULL);
    /* Parse arguments */
    g_option_context_parse(context, &argc, &argv, &error);

    /* Handling of required arguments */
    if (bondConfig == NULL) {
        g_fprintf(stderr, "Please specify an input bond configuration file.\n");
        exit(EXIT_FAILURE);
    }
    if (clampConfig != NULL && spinConfig == NULL) {
        g_fprintf(stderr, "Specifying a clamp configuration file requires the use of an initial spin configuration file.\n");
        exit(EXIT_FAILURE);
    }

    initialise_computation();

    g_option_context_free(context);
    return (EXIT_SUCCESS);
}

void initialise_computation() {
    gint xSize, ySize, xSize1, ySize1;

    /* Used to construct spin glass structure */
    gdouble *weights = NULL;
    gboolean *clamps = NULL;
    Spin *spins = NULL;
    Spin *compSpins = NULL;

    struct SpinGlass *spinGlass;

    /* Read weights from previously obtained file name */
    weights = read_weights(bondConfig, &xSize, &ySize);

    if (clampConfig != NULL) {
        /* Read spin clamps from previously obtained file name */
        clamps = read_clamps(clampConfig, &xSize1, &ySize1);

        /* Check that sizes of spin and clamp matrices match */
        if (xSize != xSize1 || ySize != ySize1) {
            g_fprintf(stderr, "Error: Bond and clamp matrix sizes do not match. Aborting\n");
            exit(EXIT_FAILURE);
        }
    }

    if (spinConfig != NULL) {
        /* Read initial spin configuration from previously obtained file name */
        spins = read_spins(spinConfig, &xSize1, &ySize1);

        if (xSize != xSize1 || ySize != ySize1) {
            g_fprintf(stderr, "Error: Bond and spin configuration matrix sizes do not match. Aborting\n");
            exit(EXIT_FAILURE);
        }
    }

    if (compSpinConfig != NULL) {
        /* Read comparison spin configuration from previously obtained file name */
        compSpins = read_spins(compSpinConfig, &xSize1, &ySize1);

        if (xSize != xSize1 || ySize != ySize1) {
            g_fprintf(stderr, "Error: Reference spin configuration and bond matrix sizes do not match. Aborting\n");
            exit(EXIT_FAILURE);
        }
    }

    /* Initialise spin glass */
    spinGlass = spinglass_alloc(xSize, ySize, spins, weights, clamps);

    /* Compute ground state */
    find_ground_states(spinGlass);

    if (compSpins != NULL) {
        /* Compare resulting configuration to specified reference configuration */
        gint distance;
        struct SpinGlass *spinGlass2 = spinglass_alloc(xSize, ySize, compSpins, NULL, NULL);
        distance = spinglass_correlate(spinGlass, spinGlass2);

        g_printf("Correlation distance: %d\n", distance);
        spinglass_free(spinGlass2);
    }

    spinglass_free(spinGlass);
}
/*
 * File: dp_gstate_finder.c
 *
 * Implements serial and parallel basic dynamic programming algorithms
 *
 */

#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "arrays.h"
#include "gstatefinder.h"

/* CYCLIC_EXCHANGE defines cyclic communication patterns */
#define CYCLIC_EXCHANGE

/* USE_MPI defines parallel code */
#define USE_MPI
#ifdef USE_MPI
#include <mpi.h>
#endif

/* Defines data type for message passing */
#define T_INT MPI_LONG_LONG_INT

/* Constant alias */
#define IGNORE_BITMASK TRUE

/* Communications data */
static gint SolverNProcs = 1;
static gint SolverProcID = 0;
#define COMM MPI_COMM_WORLD
static guint64 SolverProcessorMask = 0;
/* Communications data */

/* Adjust row of spins according to bit string representation
 * spinGlass (write)   the spin glass structure to manipulate
 * row                 specifies the spin row in the range [0,NROWS)
 * conf                the bit string representation of a spin row
 * ignoreBitmask       if TRUE, the process ID does not influence the bit string */
static void adjust_spin_row(struct SpinGlass *spinGlass, gint row, t_int conf, gboolean ignoreBitmask);

/* Determine ground state and configuration of a spin glass instance
 * spinGlass           spin glass instance */
static void get_minimum_path(struct SpinGlass *spinGlass);

/* Determine optimum configurations of spin row row-1, for all configurations of row row
 * spinGlass (read/write)    spin glass instance
 * minPath (read/write)      stores minimum path (i.e. ground state energy) of subsystem before and after incrementing row row
 * minPathConf (read/write)  stores optimum configurations of rows
 * row                       row of the spin lattice to process
 * trellisCols               number of spin row configurations
 * finalRowConf              used to specify final row's configuration, if cyclic boundary conditions are present */
static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, t_int *minPathConf, gint row, t_int trellisCols, t_int finalConfRow);

/* Set the configuration of spin rows, based on optimum configurations
 * spinGlass (write)    spin glass to manipulate
 * minPathConf (read)   stores optimum spin row configurations
 * conf                 optimum configuration of ultimate spin row */
static void set_optimal_config(struct SpinGlass *spinGlass, t_int **minPathConf, t_int conf);

/* Initialise message passing communications */
static void init_comms(struct SpinGlass *spinGlass);

/* Terminate message passing communications */
static void term_comms(void);


gdouble find_ground_states(struct SpinGlass *spinGlass) {

    gdouble energy;

    if (spinGlass->ySize > 63) {
        g_fprintf(stderr, "Error: The specified spin lattice exceeds a count of 63 columns\n");
    }

    init_comms(spinGlass);

    get_minimum_path(spinGlass);

    term_comms();

    /* Master process outputs spin glass ground state */
    if (SolverProcID == 0) {
        energy = spinglass_energy(spinGlass);
        g_printf("Energy: %E\n", energy);
        spinglass_write_spins(spinGlass, stdout);
    }

    return energy;
}

static void adjust_spin_row(struct SpinGlass *spinGlass, gint row, t_int conf, gboolean ignoreBitmask) {
    /* ... (body missing from this transcript) ... */
}

/* ... (the opening of the collective-communication variant of get_optimal_prestates
   is missing from this transcript; the listing resumes inside its loop over the
   current row's configurations) ... */

            /* Set current spin row configuration */
            adjust_spin_row(spinGlass, row, j, !IGNORE_BITMASK);

            for (k = 0; k < trellisCols; k++) {
                gdouble interRowEnergy;  /* Energetic contribution of current and previous row */
                gdouble rowEnergy;       /* Energetic contribution of current row */

                /* Set preceding spin row configuration */
                adjust_spin_row(spinGlass, previousRow, k, IGNORE_BITMASK);

                /* Calculate energetic contributions */
                interRowEnergy = spinglass_inter_row_energy(spinGlass, previousRow);
                rowEnergy = spinglass_row_energy(spinGlass, row);

                if (minPath[k] + interRowEnergy + rowEnergy < path) {
                    path = minPath[k] + interRowEnergy + rowEnergy;
                    conf = k;
                }
            }

            /* Record optimum paths to examined state */
            minPathConf[j] = conf;
            minPathNew[j] = path;
        }
    }

#ifdef USE_MPI
    /* Exchange minimum paths */
    MPI_Allgather(minPathNew, trellisCols/SolverNProcs, MPI_DOUBLE, minPath, trellisCols/SolverNProcs, MPI_DOUBLE, COMM);
#else
    for (j = 0; j < trellisCols; j++) minPath[j] = minPathNew[j];
#endif

    g_free(minPathNew);
}
#endif

/* Cyclic variant */
#ifdef CYCLIC_EXCHANGE
static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, t_int *minPathConf, gint row, t_int trellisCols, t_int finalRowConf) {

    t_int j, k;

    /* Compute neighbour process ID */
    gint leftNeighbour = (SolverProcID - 1 + SolverNProcs) % SolverNProcs;

    /* Stores updated minimum path data */
    gdouble *minPathNew = g_new0(gdouble, trellisCols/SolverNProcs);
    gdouble *buffer = g_new0(gdouble, trellisCols/SolverNProcs);

    gint previousRow;

    if (row == 0) {
        previousRow = (spinGlass->xSize) - 1;

        /* Set preceding row configuration */
        adjust_spin_row(spinGlass, previousRow, finalRowConf, IGNORE_BITMASK);

        for (j = 0; j < trellisCols/SolverNProcs; j++) {
            minPathConf[j] = finalRowConf;  /* Theoretically redundant */

            /* Set current spin row configuration */
            adjust_spin_row(spinGlass, row, j, !IGNORE_BITMASK);

            /* Calculate energetic contribution */
            minPathNew[j] = spinglass_row_energy(spinGlass, row) + spinglass_inter_row_energy(spinGlass, previousRow);
        }
    } else {
        MPI_Request request;
        previousRow = row - 1;

        /* Iterate through subset of current row's states */
        for (j = 0; j < trellisCols/SolverNProcs; j++) {
            gdouble path = G_MAXDOUBLE;
            t_int conf;

            /* Set spin row configuration */
            adjust_spin_row(spinGlass, row, j, !IGNORE_BITMASK);

            /* Iterate through *all* states of preceding spin row */
            for (k = 0; k < trellisCols; k++) {
                gdouble interRowEnergy;
                gdouble rowEnergy;

                /* Set previous row configuration ID */
                t_int cID = (SolverProcID * (trellisCols/SolverNProcs) + k) % trellisCols;

                /* Initiate neighbour rotation of minpath */
                if (k == 0) MPI_Issend(minPath, trellisCols/SolverNProcs, MPI_DOUBLE, leftNeighbour, 0, COMM, &request);

                /* Set preceding spin row configuration */
                adjust_spin_row(spinGlass, previousRow, cID, IGNORE_BITMASK);

                /* Calculate energetic contributions */
                interRowEnergy = spinglass_inter_row_energy(spinGlass, previousRow);
                rowEnergy = spinglass_row_energy(spinGlass, row);

                if (k % (trellisCols/SolverNProcs) == 0 && k != 0) {
                    /* Receive data */
                    MPI_Recv(buffer, trellisCols/SolverNProcs, MPI_DOUBLE, (SolverProcID + 1) % SolverNProcs, MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
                    MPI_Wait(&request, MPI_STATUS_IGNORE);
                    memcpy(minPath, buffer, trellisCols/SolverNProcs * sizeof(gdouble));
                    /* ... receive data */
                    /* Send data */
                    MPI_Issend(minPath, trellisCols/SolverNProcs, MPI_DOUBLE, leftNeighbour, 0, COMM, &request);
                    /* Send data */
                }

                if (minPath[k % (trellisCols/SolverNProcs)] + interRowEnergy + rowEnergy < path) {
                    path = minPath[k % (trellisCols/SolverNProcs)] + interRowEnergy + rowEnergy;
                    conf = cID;
                }
            }

            minPathConf[j] = conf;
            minPathNew[j] = path;

            /* Receive data */
            MPI_Recv(buffer, trellisCols/SolverNProcs, MPI_DOUBLE, (SolverProcID + 1) % SolverNProcs, MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
            MPI_Wait(&request, MPI_STATUS_IGNORE);
            memcpy(minPath, buffer, trellisCols/SolverNProcs * sizeof(gdouble));
        }
    }

    for (j = 0; j < trellisCols/SolverNProcs; j++) minPath[j] = minPathNew[j];

    /* Free memory */
    g_free(minPathNew);
    g_free(buffer);
}
#endif

static void get_minimum_path(struct SpinGlass *spinGlass) {
    t_int j;
    guint i;

    guint trellisRows = spinGlass->xSize;
    t_int trellisCols = 1 << (spinGlass->ySize);

    gdouble path = G_MAXDOUBLE;
    t_int conf;

    /* Stores minimum path to currently examined subsystem for each of its states */
#ifdef CYCLIC_EXCHANGE
    gdouble *minPathPartial = g_new0(gdouble, trellisCols/SolverNProcs);
    gdouble *minPath = g_new0(gdouble, trellisCols);  /* Stores minimum path data of a subsystem in a subset of its states */
#else
    gdouble *minPath = g_new0(gdouble, trellisCols);
    gdouble *minPathPartial = minPath;
#endif

    t_int **minPathConf = array_new_2D(trellisRows, trellisCols/SolverNProcs);  /* Stores optimal configurations of preceding subsystem, given subsystem i in state j */

    if (!spinglass_has_vertical_boundary(spinGlass)) {
        for (i = 0; i < trellisRows; i++) {
            get_optimal_prestates(spinGlass, minPathPartial, minPathConf[i], i, trellisCols, 0);  /* Last argument is zero, since we don't care about vertical boundary */
        }

#ifdef CYCLIC_EXCHANGE
        MPI_Allgather(minPathPartial, trellisCols/SolverNProcs, MPI_DOUBLE, minPath, trellisCols/SolverNProcs, MPI_DOUBLE, COMM);
#endif

        /* Get minimum path */
        for (j = 0; j < trellisCols; j++) {
            if (minPath[j] < path) {
                path = minPath[j];
                conf = j;
            }
        }
        set_optimal_config(spinGlass, minPathConf, conf);

    } else {
        t_int **retainedMinPathConf = array_new_2D(trellisRows, trellisCols/SolverNProcs);

        for (j = 0; j < trellisCols; j++) {
            for (i = 0; i < trellisRows; i++) {
                /* Last argument corresponds to fixed spin for boundary interaction */
                get_optimal_prestates(spinGlass, minPathPartial, minPathConf[i], i, trellisCols, j);
            }

#ifdef CYCLIC_EXCHANGE
            MPI_Allgather(minPathPartial, trellisCols/SolverNProcs, MPI_DOUBLE, minPath, trellisCols/SolverNProcs, MPI_DOUBLE, COMM);
#endif

            /* Track energy */
            if (minPath[j] < path) {
                path = minPath[j];
                conf = j;
                /* Retain states stored in minConf */
                memcpy(&(retainedMinPathConf[0][0]), &(minPathConf[0][0]), trellisRows * (trellisCols/SolverNProcs) * sizeof(t_int));
            }
        }

        set_optimal_config(spinGlass, retainedMinPathConf, conf);
        array_free_2D(retainedMinPathConf);
    }

    g_free(minPath);
    array_free_2D(minPathConf);
#ifdef CYCLIC_EXCHANGE
    g_free(minPathPartial);
#endif
}

static void set_optimal_config(struct SpinGlass *spinGlass, t_int **minPathConf, t_int conf) {
    gint i;
    guint trellisRows = spinGlass->xSize;
    t_int trellisCols = 1 << (spinGlass->ySize);

#ifdef USE_MPI
    t_int *minPathConfRow = g_new0(t_int, trellisCols);  /* Used to store exchanged (complete) row configuration data */
#endif

    /* Iterate through spin rows in reverse */
    for (i = trellisRows - 1; i >= 0; i--) {
        /* Set row configuration */
        adjust_spin_row(spinGlass, i, conf, IGNORE_BITMASK);

        /* Reference optimum configuration of preceding spin row */
#ifdef USE_MPI
        MPI_Allgather(minPathConf[i], trellisCols/SolverNProcs, T_INT, minPathConfRow, trellisCols/SolverNProcs, T_INT, COMM);
        conf = minPathConfRow[conf];
#else
        conf = minPathConf[i][conf];
#endif
    }

#ifdef USE_MPI
    g_free(minPathConfRow);
#endif
}

static void init_comms(struct SpinGlass *spinGlass) {

#ifdef USE_MPI
    gdouble binaryPlaces;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(COMM, &SolverNProcs);
    MPI_Comm_rank(COMM, &SolverProcID);

    /* Check processor count is a power of two or unity */
    if (SolverNProcs > 1 && SolverNProcs % 2 != 0) {
        g_fprintf(stderr, "The processor count must be a power of two. Aborting.\n");
        exit(EXIT_FAILURE);
    }

    /* Create processor mask */
    SolverProcessorMask = SolverProcID;
    binaryPlaces = (log((gdouble)SolverNProcs) / log(2.0));
    SolverProcessorMask <<= (spinGlass->ySize) - (gint)binaryPlaces;  /* Shift log2(Nprocs) bits left */
#endif
}

static void term_comms() {

#ifdef USE_MPI
    MPI_Finalize();
#endif
}
/*
 * File: dp_gstate_finder_fast.c
 *
 * Implements serial and parallel improved dynamic programming algorithms
 *
 */

#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "arrays.h"
#include "gstatefinder.h"

/* USE_MPI defines parallel code */
#define USE_MPI
#ifdef USE_MPI
#include <mpi.h>
#endif

/* Defines data type for message passing */
#define T_INT MPI_LONG_LONG_INT

/* Constant alias */
#define IGNORE_BITMASK TRUE

/* Communications data */
static gint SolverNProcs = 1;
static gint SolverProcID = 0;
#define COMM MPI_COMM_WORLD
static t_int SolverProcessorMask = 0;
/* Communications data */

/* Adjust group of spins according to bit string representation
 * spinGlass (write)   the spin glass structure to manipulate
 * leadingSpin         specifies sliding window position in the range [ySize, xSize*ySize)
 * conf                the bit string representation of a spin row
 * ignoreBitmask       if TRUE, the process ID does not influence the bit string */
static void adjust_spin_ensemble(struct SpinGlass *spinGlass, gint leadingSpin, t_int conf, gboolean ignoreBitmask);

/* Determine ground state and configuration of a spin glass instance
 * spinGlass           spin glass instance */
static void get_minimum_path(struct SpinGlass *spinGlass);

/* Determine optimum configurations of spin group leadingSpin-1, for all configurations of group leadingSpin
 * spinGlass (read/write)    spin glass instance
 * minPath (read/write)      stores minimum path (i.e. ground state energy) of subsystem before and after incrementing by spin leadingSpin
 * minPathConf (read/write)  stores optimum configurations of spin groups
 * leadingSpin               position of sliding window in the range [ySize, xSize*ySize)
 * trellisCols               number of spin group configurations
 * finalRowConf              used to specify final row's configuration, if cyclic boundary conditions are present */
static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, t_int *minPathConf, gint leadingSpin, t_int trellisCols);

/* Set the configuration of spin groups, based on optimum configurations
 * spinGlass (write)    spin glass to manipulate
 * minPathConf (read)   stores optimum spin group configurations
 * conf                 optimum configuration of spin group at ultimate sliding window position */
static void set_optimal_config(struct SpinGlass *spinGlass, t_int **minPathConf, t_int conf);

/* Initialise message passing communications */
static void init_comms(struct SpinGlass *spinGlass);

/* Terminate message passing communications */
static void term_comms(void);


gdouble find_ground_states(struct SpinGlass *spinGlass) {

    gdouble energy;

    if (spinGlass->ySize > 63) {
        g_fprintf(stderr, "Error: The specified spin lattice exceeds a count of 63 columns\n");
    }

    init_comms(spinGlass);

    get_minimum_path(spinGlass);

    term_comms();

    /* Master process outputs spin glass ground state */
    if (SolverProcID == 0) {
        energy = spinglass_energy(spinGlass);
        g_printf("Energy: %E\n", energy);
        spinglass_write_spins(spinGlass, stdout);
    }

    return energy;
}

static void adjust_spin_ensemble(struct SpinGlass *spinGlass, gint leadingSpin, t_int conf, gboolean ignoreBitmask) {
    gint i;
    Spin spin;

#ifdef USE_MPI
    /* Row configuration is dependent on processor ID, which is a bit prefix */
    if (!ignoreBitmask) conf = conf | SolverProcessorMask;
#endif

    for (i = 0; i <= spinGlass->ySize; i++) {
        if (conf % 2 != 0) spin = UP;
        else spin = DOWN;

        /* Set spin at position i within sliding window */
        spinGlass->spins[leadingSpin - (spinGlass->ySize) + i] = spin;

        conf = conf >> 1;
    }
}

static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, t_int *minPathConf, gint leadingSpin, t_int trellisCols) {
    t_int j;
    t_int k;

    /* Stores updated minimum path data */
    gdouble *minPathNew = g_new0(gdouble, trellisCols/SolverNProcs);

    if (leadingSpin == spinGlass->ySize) {
        /* spinGlass->ySize corresponds to the first spin in the second row of the lattice */

        for (j = 0; j < trellisCols/SolverNProcs; j++) {
            /* Set current spin group configuration */
            adjust_spin_ensemble(spinGlass, leadingSpin, j, !IGNORE_BITMASK);

            /* Calculate energetic contribution */
            minPathNew[j] = spinglass_ensemble_delta(spinGlass, leadingSpin) + spinglass_row_energy(spinGlass, 0);
        }

    } else {
        for (j = 0; j < trellisCols/SolverNProcs; j++) {
            gdouble path = G_MAXDOUBLE;
            gdouble ensembleEnergy;
            t_int confIndex, conf;

            /* Set current spin ensemble configuration */
            adjust_spin_ensemble(spinGlass, leadingSpin, j, !IGNORE_BITMASK);
            ensembleEnergy = spinglass_ensemble_delta(spinGlass, leadingSpin);

            /* Calculate index for accessing preceding ensemble configuration */
            confIndex = (((j | SolverProcessorMask) << 1) | trellisCols) ^ trellisCols;

            for (k = 0; k < 2; k++) {
                /* Minimise on sum of ensemble energies */
                if (minPath[confIndex + k] + ensembleEnergy < path) {
                    path = minPath[confIndex + k] + ensembleEnergy;
                    conf = confIndex + k;
                }
            }

            /* Record optimum paths to examined state */
            minPathConf[j] = conf;
            minPathNew[j] = path;
        }
    }

#ifdef USE_MPI
    /* Exchange minimum paths */
    MPI_Allgather(minPathNew, trellisCols/SolverNProcs, MPI_DOUBLE, minPath, trellisCols/SolverNProcs, MPI_DOUBLE, COMM);
#else
    for (j = 0; j < trellisCols; j++) minPath[j] = minPathNew[j];
#endif

    g_free(minPathNew);
}

static void get_minimum_path(struct SpinGlass *spinGlass) {
    t_int j;
    guint i;

    guint trellisRows = (spinGlass->xSize - 1) * spinGlass->ySize;
    t_int trellisCols = 1 << (spinGlass->ySize + 1);

    gdouble path = G_MAXDOUBLE;
    t_int conf;

    gdouble *minPath = g_new0(gdouble, trellisCols);

    t_int **minPathConf = array_new_2D(trellisRows, trellisCols/SolverNProcs);  /* Stores optimal configurations of preceding subsystem, given subsystem i in state j */

    for (i = 0; i < trellisRows; i++) {
        get_optimal_prestates(spinGlass, minPath, minPathConf[i], spinGlass->ySize + i, trellisCols);
    }
    /* Find optimum configuration of spin group at ultimate sliding window position */
    for (j = 0; j < trellisCols; j++) {
        if (minPath[j] < path) {
            path = minPath[j];
            conf = j;
        }
    }
    set_optimal_config(spinGlass, minPathConf, conf);

    g_free(minPath);
    array_free_2D(minPathConf);
}

static void set_optimal_config(struct SpinGlass *spinGlass, t_int **minPathConf, t_int conf) {
    gint i;
    guint trellisRows = (spinGlass->xSize - 1) * spinGlass->ySize;
    t_int trellisCols = 1 << (spinGlass->ySize + 1);

#ifdef USE_MPI
    t_int *minPathConfRow = g_new0(t_int, trellisCols);  /* Used to store exchanged (complete) row configuration data */
#endif

    for (i = trellisRows - 1; i > 0; i--) {

        /* Set spinGlass spin according to leading spin configuration */
        gint spinVal = conf >> (spinGlass->ySize);
        gint leadingSpin = (spinGlass->xSize * spinGlass->ySize - 1) - (trellisRows - 1 - i);
        if (spinVal != 0) (spinGlass->spins)[leadingSpin] = UP;
        else (spinGlass->spins)[leadingSpin] = DOWN;

#ifdef USE_MPI
        MPI_Allgather(minPathConf[i], trellisCols/SolverNProcs, T_INT, minPathConfRow, trellisCols/SolverNProcs, T_INT, COMM);
        conf = minPathConfRow[conf];
#else
        conf = minPathConf[i][conf];
#endif
    }

    /* Set ensemble configuration due to first leading spin */
    adjust_spin_ensemble(spinGlass, spinGlass->ySize, conf, IGNORE_BITMASK);


#ifdef USE_MPI
    g_free(minPathConfRow);
#endif
}

static void init_comms(struct SpinGlass *spinGlass) {

#ifdef USE_MPI
    gdouble binaryPlaces;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(COMM, &SolverNProcs);
    MPI_Comm_rank(COMM, &SolverProcID);

    /* Check processor count is a power of two or unity */
    if (SolverNProcs > 1 && SolverNProcs % 2 != 0) {
        g_fprintf(stderr, "The processor count must be a power of two. Aborting.\n");
        exit(EXIT_FAILURE);
    }

    /* Create processor mask */
    SolverProcessorMask = SolverProcID;
    binaryPlaces = (log((gdouble)SolverNProcs) / log(2.0));
    SolverProcessorMask <<= (spinGlass->ySize) + 1 - (gint)binaryPlaces;  /* Shift log2(Nprocs) bits left */
#endif
}

static void term_comms() {

#ifdef USE_MPI
    MPI_Finalize();
#endif
}
/*
 * File: harmony_gstate_finder.c
 *
 * Implements parallel harmony ground state solver
 *
 */

#include <glib.h>
#include <glib/gprintf.h>
#include <mpi.h>
#include <string.h>

#include "spinglass.h"
#include "gstatefinder.h"
#include "random.h"


/* Serial algorithm parameters */
#define NVECTORS 10
#define MEMORY_CHOOSING_RATE 0.95

/* Parallel algorithm parameters */
#define ITERBLOCK 100
#define ZONEEXBLOCK 100

/* Common spin glass data */
static struct SpinGlass *spinGlass;
static Spin *spins[NVECTORS];
static gint xSize;
static gint ySize;
/* Common spin glass data */

/* Communications data */
#define COMM MPI_COMM_WORLD
#define ZONE_SIZE 16
static MPI_Datatype TypeArray;
static MPI_Op ReductionOp;
static MPI_Comm SolverZone;
static gint SolverProcID = 0;
static gint SolverNProcs = 1;
/* Communications data */

/* Determine highest energy spin glass held by this process
 * highestEnergy (write)   the energy of the obtained solution vector
 * vectorNum (write)       the index of the solution vector as stored in the array spins[] */
static void compute_highest_energy(gdouble *highestEnergy, gint *vectorNum);

/* Determine lowest energy spin glass held by this process
 * lowestEnergy (write)    the energy of the obtained solution vector
 * vectorNum (write)       the index of the solution vector as stored in the array spins[] */
static void compute_lowest_energy(gdouble *lowestEnergy, gint *vectorNum);

/* Determine the algorithm's convergence status, based on solution vectors held by each process
 * returns TRUE, if the algorithm has converged */
static gboolean get_stabilised_status(void);

/* Collectively obtain energetically minimal solution vector held by processes
 * spinVector (read/write)  specifies solution vector to perform reduction on, based on energy
 * comm (read)              MPI communicator to specify processes involved in reduction */
static void reduce_minimal_spin_vector(Spin *spinVector, MPI_Comm comm);

/* Defines operation, on which reduction is based
 * vector1, vector2 (read/write)  operation arguments
 * length (read)                  length of vectors
 * dataType (read)                data type used for communications */
static void reduction_function(Spin *vector1, Spin *vector2, gint *length, MPI_Datatype *dataType);

/* Initialise message passing communications */
static void init_comms(void);

/* Terminate message passing communications */
static void term_comms(void);

gdouble find_ground_states(struct SpinGlass *paramSpinGlass) {
    gint i, j;

    /* Used to store energy and identifier of highest energy vector in memory */
    gdouble highestEnergy;
    gint maxVector;

    /* Used to store energy and identifier of lowest energy vector in memory */
    gdouble minEnergy;
    gint minVector;

    /* Used for communicating spin vectors */
    Spin *neighbourSpins = g_new(Spin, paramSpinGlass->xSize * paramSpinGlass->ySize);

    /* Store spin glass globally */
    spinGlass = paramSpinGlass;
    xSize = paramSpinGlass->xSize;
    ySize = paramSpinGlass->ySize;

    init_comms();

    /* Initialise by generating random vectors */
    for (i = 0; i < NVECTORS; i++) spins[i] = spinglass_get_random_spins(spinGlass);

    /* Begin iterative process */
    for (i = 1; get_stabilised_status() == FALSE; i++) {
        /* Create new vector */
        Spin *newSpins = g_new(Spin, xSize * ySize);

        /* Compute initial highest energy vector */
        compute_highest_energy(&highestEnergy, &maxVector);

        /* Set vector components */
        for (j = 0; j < xSize * ySize; j++) {
            if (spinGlass->clamps != NULL && (spinGlass->clamps)[j]) {
                /* Clamping condition */
                newSpins[j] = spinGlass->spins[j];
            } else if (rand_continuous(0, 1) < MEMORY_CHOOSING_RATE) {
                /* Memory selection condition */
                newSpins[j] = spins[g_random_int_range(0, NVECTORS)][j];
            } else if (rand_cointoss()) {
                newSpins[j] = UP;
            } else {
                newSpins[j] = DOWN;
            }
        }

        /* Replace vector in memory, if the new vector is fitter */
        if (spinglass_energy_conf(spinGlass, newSpins) < highestEnergy) {

            g_free(spins[maxVector]);  /* Free previous vector */
            spins[maxVector] = newSpins;
        } else {
            g_free(newSpins);
        }

        if (SolverProcID % ZONE_SIZE == 0) {
            /* Periodic exchange of spin vectors between neighbouring zones */
            /* Highest energy vector is replaced by random vector */
            gint random = g_random_int_range(0, NVECTORS);
            MPI_Sendrecv(spins[random], 1, TypeArray, (SolverProcID + ZONE_SIZE) % SolverNProcs, 0, neighbourSpins, 1, TypeArray, MPI_ANY_SOURCE, MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
            reduction_function(neighbourSpins, spins[random], NULL, NULL);
        }

        /* Zone internal vector exchange */
        if (i % ZONEEXBLOCK == 0) {
            reduce_minimal_spin_vector(spins[maxVector], SolverZone);
        }
    }

    /* Determine minimum vector, copy configuration back to original structure */
    compute_lowest_energy(&minEnergy, &minVector);
    reduce_minimal_spin_vector(spins[minVector], COMM);
    memcpy(spinGlass->spins, spins[minVector], sizeof(Spin) * xSize * ySize);

    /* Master process outputs solution */
    if (SolverProcID == 0) {
        printf("Stabilised after %d iterations.\n", i);
        g_printf("Energy: %E\n", minEnergy);
        spinglass_write_spins(spinGlass, stdout);
    }

    term_comms();

    for (i = 0; i < NVECTORS; i++) g_free(spins[i]);
    g_free(neighbourSpins);

    return minEnergy;
}

static gboolean get_stabilised_status(void) {
    gdouble minEnergy;
    gdouble globalMinEnergy;
    gboolean localHasOptimum = FALSE;
    gboolean allHaveOptimum;

    gint minVector;

    /* Perform reduction on lowest energy solutions */
    compute_lowest_energy(&minEnergy, &minVector);
    MPI_Allreduce(&minEnergy, &globalMinEnergy, 1, MPI_DOUBLE, MPI_MIN, COMM);

    /* Determine whether all processes retain identical lowest energy solutions */
    if (minEnergy == globalMinEnergy) localHasOptimum = TRUE;
    MPI_Allreduce(&localHasOptimum, &allHaveOptimum, 1, MPI_INT, MPI_LAND, COMM);

    return (allHaveOptimum);
}

static void compute_highest_energy(gdouble *highestEnergy, gint *vectorNum) {
    gint i;

    *highestEnergy = -G_MAXDOUBLE;

    for (i = 0; i < NVECTORS; i++) {
        /* Iterate through all solution vectors, determine highest energy */
        gdouble energy = spinglass_energy_conf(spinGlass, spins[i]);
        if (energy > *highestEnergy) {
            *highestEnergy = energy;
            *vectorNum = i;
        }
    }
}

static void compute_lowest_energy(gdouble *lowestEnergy, gint *vectorNum) {
    gint i;

    *lowestEnergy = G_MAXDOUBLE;

    for (i = 0; i < NVECTORS; i++) {
        /* Iterate through all solution vectors, determine lowest energy */
        gdouble energy = spinglass_energy_conf(spinGlass, spins[i]);
        if (energy < *lowestEnergy) {
            *lowestEnergy = energy;
            *vectorNum = i;
        }
    }
}

static void reduce_minimal_spin_vector(Spin *spinVector, MPI_Comm comm) {
    Spin *newSpins = g_new(Spin, xSize * ySize);

    MPI_Allreduce(spinVector, newSpins, 1, TypeArray, ReductionOp, comm);
    memcpy(spinVector, newSpins, xSize * ySize * sizeof(Spin));
    g_free(newSpins);
}

static void reduction_function(Spin *vector1, Spin *vector2, gint *length, MPI_Datatype *dataType) {
    gdouble energy1, energy2;

    energy1 = spinglass_energy_conf(spinGlass, vector1);
    energy2 = spinglass_energy_conf(spinGlass, vector2);

    /* Operation condition */
    if (energy1 < energy2) {
        memcpy(vector2, vector1, xSize * ySize * sizeof(Spin));
    }
}

static void init_comms(void) {
    MPI_Datatype spinType;

    MPI_Init(NULL, NULL);

    MPI_Comm_size(COMM, &SolverNProcs);
    MPI_Comm_rank(COMM, &SolverProcID);
    if (SolverProcID == 0) printf("NProcs: %d, zone size: %d\n", SolverNProcs, ZONE_SIZE);

    /* Split communicator */
    MPI_Comm_split(COMM, SolverProcID / ZONE_SIZE, 0, &SolverZone);

    /* Initialise reduction operation */
    MPI_Op_create((MPI_User_function *)reduction_function, 1, &ReductionOp);
    MPI_Type_vector(1, sizeof(Spin), sizeof(Spin), MPI_BYTE, &spinType);
    MPI_Type_vector(xSize, ySize, ySize, spinType, &TypeArray);
    MPI_Type_commit(&TypeArray);
}

static void term_comms(void) {
    MPI_Comm_free(&SolverZone);
    MPI_Type_free(&TypeArray);
    MPI_Finalize();
}
8/3/2019 Peter Alexander Foster
http://slidepdf.com/reader/full/peter-alexander-foster 156/183
138 Chapter F. Source Code Listings
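The listing above closes the parallel Harmony Search solver. Its key MPI ingredients are a derived datatype, TypeArray, describing one complete spin vector, and a user-defined commutative reduction, ReductionOp, built from reduction_function; an MPI_Allreduce over this pair leaves every rank holding a lowest-energy configuration among those contributed. The fragment below is a minimal sketch of that pattern and is not part of the project sources; the identifiers candidate and result are hypothetical, while the remaining names are taken from the listing.

    /* Sketch: combine one candidate spin vector per rank, keeping a lowest-energy one */
    Spin *candidate = spin_glass_get_random_spins(spinGlass);   /* rank-local candidate    */
    Spin *result    = g_new(Spin, xSize * ySize);               /* receives reduced vector */
    MPI_Allreduce(candidate, result, 1, TypeArray, ReductionOp, COMM);
    /* reduction_function compares energies pairwise, so result now holds a configuration
     * whose energy is minimal over all participating ranks */
    g_free(candidate);
    g_free(result);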
/*
 * File: spinglass.h
 *
 * Specifies spin glass operation interface and spin glass data structure
 *
 */

#include <glib.h>
#include <stdio.h>

#ifndef SPINGLASS_H
#define SPINGLASS_H

/* Constants for spin glass IO */
#define STR_SPIN_UP    "+"
#define STR_SPIN_DOWN  "-"
#define STR_CLAMPED    "1"
#define STR_UNCLAMPED  "0"
#define WEIGHT_FMT     "%lf"

/* Spin data type */
typedef enum Spin {
    UP   =  1,
    DOWN = -1
} Spin;

/* Spin glass structure */
struct SpinGlass {
    /* Lattice dimensions */
    gint xSize;
    gint ySize;

    /* Vector of spin states */
    Spin *spins;

    /* Stores coupling constants. Data are stored as two row-major mappings of spins to vectors,
     * such that vertical bonds precede horizontal bonds. */
    gdouble *weights;
    /* Stores clamping states similarly */
    gboolean *clamps;
    /* Stores initial spin configuration */
    Spin *initialSpins;
};

/* Construct a new spin glass structure
 *   xSize         lattice rows
 *   ySize         lattice columns
 *   initialSpins  (read) vector of initial spin states. If NULL, a vector of UP spins is allocated
 *   weights       (read) vector of bonds. If NULL, zero weights are initialised
 *   clamps        (read) vector of clamping states
 *   returns       spin glass data structure */
struct SpinGlass *spin_glass_alloc(gint xSize, gint ySize, Spin *initialSpins, gdouble *weights, gboolean *clamps);

/* Destruct a spin glass structure. Performs deep deallocation.
 *   spinGlass     spin glass data structure */
void spin_glass_free(struct SpinGlass *spinGlass);

/* Determine total energy of spin glass
 *   spinGlass     (read) spin glass data structure, whose spin states and bonds are referenced
 *   returns       total energy, accounting for cyclic boundary interactions */
gdouble spin_glass_energy(struct SpinGlass *spinGlass);

/* Determine total energy of spin glass using alternative spin vector
 *   spinGlass     (read) spin glass data structure, whose bonds are referenced
 *   conf          (read) vector of spins whose states are referenced
 *   returns       total energy, accounting for cyclic boundary interactions */
gdouble spin_glass_energy_conf(struct SpinGlass *spinGlass, Spin *conf);

/* Determine energy of spin row
 *   row           spin row in range [0, NROWS)
 *   spinGlass     (read) spin glass data structure
 *   returns       total energy, accounting for cyclic boundary interactions */
gdouble spin_glass_row_energy(struct SpinGlass *spinGlass, gint row);

/* Determine energy resulting from vertical interactions between two rows row, row+1
 *   spinGlass     (read) spin glass data structure
 *   row           row in spin lattice, in the range [0, NROWS)
 *   returns       row energy, accounting for cyclic boundary interactions */
gdouble spin_glass_inter_row_energy(struct SpinGlass *spinGlass, gint row);

/* Determine energy between spin and its neighbours immediately above and to the left of it
 *   spinGlass     (read) spin glass data structure
 *   leadingSpin   spin position in the range [0, XSIZE*YSIZE), with row-major enumeration */
gdouble spin_glass_ensemble_delta(struct SpinGlass *spinGlass, gint leadingSpin);

/* Write spin states to file
 *   file          (read) file to write to
 *   spinGlass     (read) spin glass data structure */
void spin_glass_write_spins(struct SpinGlass *spinGlass, FILE *file);

/* Write spin states to file
 *   conf          (read) spin configuration vector to output
 *   spinGlass     (read) used to specify lattice dimensions
 *   file          (read) file to write to */
void spin_glass_write_spins_conf(struct SpinGlass *spinGlass, Spin *conf, FILE *file);

/* Write coupling constants to file
 *   spinGlass     (read) spin glass data structure
 *   file          (read) file to write to */
void spin_glass_write_weights(struct SpinGlass *spinGlass, FILE *file);

/* Write clamping states to file
 *   spinGlass     (read) spin glass data structure
 *   file          (read) file to write to */
void spin_glass_write_clamps(struct SpinGlass *spinGlass, FILE *file);

/* Generate random spins based on uniform distribution, accounting for clamped spins
 *   spinGlass     (read) used to specify lattice dimensions and clamping states
 *   returns       vector of spins storing lattice configuration */
Spin *spin_glass_get_random_spins(struct SpinGlass *spinGlass);

/* Determine whether spin glass has cyclic vertical boundary interactions
 *   spinGlass     (read) spin glass data structure
 *   returns       TRUE if condition present */
gboolean spin_glass_has_vertical_boundary(struct SpinGlass *spinGlass);

/* Compare spin states of two spin glasses
 *   spinGlass1    (read) spin glass data structure
 *   spinGlass2    (read) spin glass data structure
 *   returns       minimum number of differing spins, considering spinGlass1's inversion */
gint spin_glass_correlate(struct SpinGlass *spinGlass1, struct SpinGlass *spinGlass2);

#endif /* SPINGLASS_H */
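As a brief illustration of the interface specified above, the following fragment (not part of the original sources) builds a small 2 by 2 lattice with uniform ferromagnetic bonds, prints its energy and spin configuration, and releases it. The lattice size and the unit couplings are assumptions made purely for the example.

    #include <stdio.h>
    #include "spinglass.h"

    static void spin_glass_example(void) {
        gint i;
        /* Bond vector: first 2*2 vertical bonds, then 2*2 horizontal bonds */
        gdouble *weights = g_new(gdouble, 2*2*2);
        for (i = 0; i < 2*2*2; i++) weights[i] = 1.0;

        /* NULL initial spins: all spins start UP; NULL clamps: no spin is fixed */
        struct SpinGlass *glass = spin_glass_alloc(2, 2, NULL, weights, NULL);

        printf("Energy of the all-UP state: %f\n", spin_glass_energy(glass));
        spin_glass_write_spins(glass, stdout);

        spin_glass_free(glass);   /* deep deallocation also frees the weight vector */
    }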
/*
 * File: spinglass.c
 *
 * Implements spin glass operation interface
 *
 */

#include <stdio.h>
#include <string.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "arrays.h"
#include "random.h"

struct SpinGlass *spin_glass_alloc(gint xSize, gint ySize, Spin *initialSpins, gdouble *weights, gboolean *clamps) {
    gint i;

    struct SpinGlass *spinGlass = g_new(struct SpinGlass, 1);

    spinGlass->xSize = xSize;
    spinGlass->ySize = ySize;
    if (xSize < 2 || ySize < 2) {
        g_fprintf(stderr, "Warning: Tried to construct spin glass with dimensions %d by %d\n", xSize, ySize);
    }

    /* Allocate spin matrix */
    if (initialSpins == NULL) {
        spinGlass->spins = g_new(Spin, xSize*ySize);
        /* Assign default values */
        for (i = 0; i < xSize*ySize; i++) (spinGlass->spins)[i] = UP;
        spinGlass->initialSpins = NULL;
    } else {
        spinGlass->spins = initialSpins;
        /* Set initial spins */
        spinGlass->initialSpins = g_new(Spin, xSize*ySize);
        memcpy(spinGlass->initialSpins, spinGlass->spins, sizeof(Spin)*xSize*ySize);
    }

    /* Allocate bond weight matrix - stores vertical bonds, then horizontal bonds */
    if (weights == NULL) spinGlass->weights = g_new0(gdouble, xSize*ySize*2);
    else spinGlass->weights = weights;

    spinGlass->clamps = clamps;

    return spinGlass;
}

void spin_glass_free(struct SpinGlass *spinGlass) {
    /* Free all fields */
    if (spinGlass->spins != NULL) g_free(spinGlass->spins);
    if (spinGlass->initialSpins != NULL) g_free(spinGlass->initialSpins);
    if (spinGlass->weights != NULL) g_free(spinGlass->weights);
    if (spinGlass->clamps != NULL) g_free(spinGlass->clamps);

    g_free(spinGlass);
}

gdouble spin_glass_row_energy(struct SpinGlass *spinGlass, gint row) {
    gint i;
    gdouble energy = 0;

    gdouble weight;        /* Bond weight */
    Spin spin0, spin1;     /* Neighbour spins */

    gint xSize = spinGlass->xSize;
    gint ySize = spinGlass->ySize;
    Spin *spins = spinGlass->spins;
    gdouble *weights = spinGlass->weights;

    /* Iterate through row spins */
    for (i = 0; i < ySize; i++) {
        spin0 = ArrayAccess2D(spins, ySize, row, i);

        /* Calculate horizontal bond energy */
        weight = ArrayAccess3D(weights, ySize, xSize, row, i, 1);
        if (i < ySize-1) spin1 = ArrayAccess2D(spins, ySize, row, i+1);
        else spin1 = ArrayAccess2D(spins, ySize, row, 0);
        energy += weight * spin0 * spin1;

        /* Set energy to G_MAXDOUBLE, if spin0 state is impermissible due to its clamp state */
        if (spinGlass->clamps != NULL) {
            gboolean clamp = ArrayAccess2D(spinGlass->clamps, ySize, row, i);
            if (clamp == TRUE && spin0 != ArrayAccess2D(spinGlass->initialSpins, ySize, row, i)) {
                energy = -G_MAXDOUBLE;
            }
        }
    }

    return -1 * energy;
}

gdouble spin_glass_inter_row_energy(struct SpinGlass *spinGlass, gint row) {
    gint i;
    gdouble energy = 0;

    gdouble weight;
    Spin spin0, spin1;

    gint xSize = spinGlass->xSize;
    gint ySize = spinGlass->ySize;
    Spin *spins = spinGlass->spins;
    gdouble *weights = spinGlass->weights;

    /* Iterate through row spins, accumulating energy */
    for (i = 0; i < ySize; i++) {
        spin0 = ArrayAccess2D(spins, ySize, row, i);

        /* Calculate vertical bond energy */
        weight = ArrayAccess3D(weights, ySize, xSize, row, i, 0);
        if (row < xSize-1) spin1 = ArrayAccess2D(spins, ySize, row+1, i);
        else spin1 = ArrayAccess2D(spins, ySize, 0, i);
        energy += weight * spin0 * spin1;
    }

    return -1 * energy;
}

gdouble spin_glass_ensemble_delta(struct SpinGlass *spinGlass, gint leadingSpin) {
    gdouble energy = 0;

    Spin spin0, spin1;
    Spin *spins = spinGlass->spins;
    gdouble *weights = spinGlass->weights;

    gint row = leadingSpin / spinGlass->ySize;
    gint column = leadingSpin % spinGlass->ySize;

    gint ySize = spinGlass->ySize;
    gint xSize = spinGlass->xSize;
    gdouble weight;

    if (row > 0) {
        /* Calculate vertical component */
        spin0 = ArrayAccess2D(spins, ySize, row, column);
        spin1 = ArrayAccess2D(spins, ySize, row-1, column);
        weight = ArrayAccess3D(weights, ySize, xSize, row-1, column, 0);
        energy += weight * spin0 * spin1;
    }

    if (column > 0) {
        /* Calculate horizontal component */
        spin0 = ArrayAccess2D(spins, ySize, row, column-1);
        spin1 = ArrayAccess2D(spins, ySize, row, column);
        weight = ArrayAccess3D(weights, ySize, xSize, row, column-1, 1);
        energy += weight * spin0 * spin1;
    }

    return -1 * energy;
}

gdouble spin_glass_energy(struct SpinGlass *spinGlass) {

    gdouble energy = 0;

    gint i;
    /* Total energy is sum of rows' energies and row interactions */
    for (i = 0; i < spinGlass->xSize; i++) {
        energy += spin_glass_inter_row_energy(spinGlass, i);
        energy += spin_glass_row_energy(spinGlass, i);
    }

    return energy;
}

gdouble spin_glass_energy_conf(struct SpinGlass *spinGlass, Spin *conf) {
    gdouble energy;

    Spin *currentSpins = spinGlass->spins;
    spinGlass->spins = conf;
    energy = spin_glass_energy(spinGlass);
    spinGlass->spins = currentSpins;

    return energy;
}

void spin_glass_write_spins(struct SpinGlass *spinGlass, FILE *file) {
    gint i, j;
    Spin spin;

    /* Iterate through spins and format output */
    for (i = 0; i < spinGlass->xSize; i++) {
        for (j = 0; j < spinGlass->ySize; j++) {
            spin = ArrayAccess2D(spinGlass->spins, spinGlass->ySize, i, j);
            if (spin == UP) {
                g_fprintf(file, "%s", STR_SPIN_UP);
            } else {
                g_fprintf(file, "%s", STR_SPIN_DOWN);
            }

            g_fprintf(file, "%s", " ");
        }

        g_fprintf(file, "%s", "\n");
    }
}

void spin_glass_write_spins_conf(struct SpinGlass *spinGlass, Spin *conf, FILE *file) {
    Spin *currentSpins = spinGlass->spins;
    spinGlass->spins = conf;
    spin_glass_write_spins(spinGlass, file);
    spinGlass->spins = currentSpins;
}

void spin_glass_write_weights(struct SpinGlass *spinGlass, FILE *file) {
    gint i, j, k;
    gdouble weight;

    /* Iterate through weights and format output */
    for (k = 0; k < 2; k++) {
        for (i = 0; i < spinGlass->xSize; i++) {
            for (j = 0; j < spinGlass->ySize; j++) {
                weight = ArrayAccess3D(spinGlass->weights, spinGlass->ySize, spinGlass->xSize, i, j, k);
                g_fprintf(file, WEIGHT_FMT " ", weight);
            }

            g_fprintf(file, "%s", "\n");
        }

        g_fprintf(file, "%s", "\n");
    }
}

void spin_glass_write_clamps(struct SpinGlass *spinGlass, FILE *file) {
    gint i, j;
    gboolean clamp;

    /* Iterate through clamps and format output */
    for (i = 0; i < spinGlass->xSize; i++) {
        for (j = 0; j < spinGlass->ySize; j++) {
            clamp = ArrayAccess2D(spinGlass->clamps, spinGlass->ySize, i, j);
            if (clamp) {
                g_fprintf(file, "%s", STR_CLAMPED);
            } else {
                g_fprintf(file, "%s", STR_UNCLAMPED);
            }

            g_fprintf(file, "%s", " ");
        }

        g_fprintf(file, "%s", "\n");
    }
}

Spin *spin_glass_get_random_spins(struct SpinGlass *spinGlass) {
    gint total = spinGlass->xSize * spinGlass->ySize;
    gint i;

    /* Allocate spins */
    Spin *spins = g_new(Spin, total);

    /* Assign spin values */
    for (i = 0; i < total; i++) {
        if (spinGlass->clamps != NULL && (spinGlass->clamps)[i]) {
            /* Clamped status */
            spins[i] = (spinGlass->spins)[i];
        } else {
            /* Assign random spin values */
            gboolean randomVal = rand_coin_toss();
            if (randomVal == TRUE) {
                spins[i] = UP;
            } else {
                spins[i] = DOWN;
            }
        }
    }

    return spins;
}

gboolean spin_glass_has_vertical_boundary(struct SpinGlass *spinGlass) {
    gboolean hasVerticalBoundary = FALSE;

    gint ySize = spinGlass->ySize;
    gint xSize = spinGlass->xSize;

    gint i;
    /* Iterate through spins in ultimate row, checking for non-zero vertical bond weights */
    for (i = 0; i < ySize && !hasVerticalBoundary; i++) {
        gdouble weight = ArrayAccess3D(spinGlass->weights, ySize, xSize, xSize-1, i, 0);
        if (weight != 0) hasVerticalBoundary = TRUE;
    }

    return hasVerticalBoundary;
}

gint spin_glass_correlate(struct SpinGlass *spinGlass1, struct SpinGlass *spinGlass2) {
    gint i, j, k;
    gint finalDistance = G_MAXINT;
    gint distance;

    for (k = 0; k < 2; k++) {
        /* Repeat, comparing both original and inverse of spinGlass1 */
        distance = 0;

        for (i = 0; i < spinGlass1->xSize; i++) {
            for (j = 0; j < spinGlass1->ySize; j++) {
                Spin spin1 = ArrayAccess2D(spinGlass1->spins, spinGlass1->ySize, i, j);
                Spin spin2 = ArrayAccess2D(spinGlass2->spins, spinGlass2->ySize, i, j);
                if (k == 0) {
                    if (spin1 != spin2) distance++;
                } else {
                    if (spin1 == spin2) distance++;
                }
            }
        }
        if (distance < finalDistance) finalDistance = distance;
    }

    return finalDistance;
}
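For reference, the quantity computed by spin_glass_energy is the zero-field Ising energy

    E(s) = -\sum_{\langle i,j \rangle} J_{ij} s_i s_j, \qquad s_i \in \{+1, -1\},

with the sum running over horizontal and vertical nearest-neighbour bonds, including the cyclic boundary terms; spin_glass_row_energy supplies the horizontal contributions of one row and spin_glass_inter_row_energy the vertical contributions between a row and its successor.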
/*
 * File: io.h
 *
 * Specifies IO operation interface
 *
 */

#include "spinglass.h"

#ifndef IO_H
#define IO_H

/* For file input routines using fgets() */
#define MAX_LINE_LEN 100000

/* Read spin configuration from file
 *   fileName  (read) file name from which to initiate reading
 *   xSize     (write) number of rows in the obtained configuration
 *   ySize     (write) number of columns in the obtained configuration
 *   returns   vector of spins, stored in row major order */
Spin *read_spins(gchar *fileName, gint *xSize, gint *ySize);

/* Read spin clamping state from file
 *   fileName  (read) file name from which to initiate reading
 *   xSize     (write) number of rows in the obtained configuration
 *   ySize     (write) number of columns in the obtained configuration
 *   returns   vector of spin clamp states, stored in row major order */
gboolean *read_clamps(gchar *fileName, gint *xSize, gint *ySize);

/* Read spin bond configuration from file
 *   fileName  (read) file name from which to initiate reading
 *   xSize     (write) number of rows in the obtained configuration
 *   ySize     (write) number of columns in the obtained configuration
 *   returns   vector of spin bonds, stored in row major order;
 *             data for vertical bonds precede those for horizontal bonds */
gdouble *read_weights(gchar *fileName, gint *xSize, gint *ySize);


/* Write spin configuration to file
 *   spinGlass (read) data structure storing spin glass data
 *   fileName  (read) file name to write data to */
void write_spins(struct SpinGlass *spinGlass, gchar *fileName);

/* Write spin clamping state to file
 *   spinGlass (read) data structure storing spin glass data
 *   fileName  (read) file name to write data to */
void write_clamps(struct SpinGlass *spinGlass, gchar *fileName);

/* Write spin bond configuration to file
 *   spinGlass (read) data structure storing spin glass data
 *   fileName  (read) file name to write data to */
void write_weights(struct SpinGlass *spinGlass, gchar *fileName);

#endif /* IO_H */
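To make the expected file layout concrete, the following is an illustrative sketch of input files for a hypothetical 2-row by 3-column lattice; the numerical values are invented for the example. A spin file contains one lattice row per line, using the tokens defined in spinglass.h:

    + - +
    - - +

A bond file contains 2 * xSize rows of ySize values each: the first xSize rows hold the vertical bonds and the remaining xSize rows the horizontal bonds, so read_weights reports xSize = nRows/2:

     1.0 -1.0  0.5
     0.0  1.0 -0.5
    -1.0  1.0  1.0
     0.5 -0.5  0.0

A clamp file uses 1 (clamped) and 0 (unclamped) in the same row-by-row layout.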
/*
 * File: io.c
 *
 * Implements IO operations specified in io.h
 *
 */

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "io.h"

/* Parses a file, adding tokens to a queue
 *   fileName  (read) file name to read from
 *   xSize     (write) number of token rows contained in the file
 *   ySize     (write) number of token columns contained in the file
 *   returns   queue containing parsed tokens */
static GQueue *parse_file(gchar *fileName, gint *xSize, gint *ySize) {
    gint nRows = 0;
    gint nCols = 0;
    gint nColCheck = 0;

    GQueue *tokenQueue = g_queue_new();

    FILE *file = fopen(fileName, "r");
    gchar line[MAX_LINE_LEN+1];
    if (file != NULL) {

        /* Read lines until end of file, process if non zero length */
        while (NULL != fgets(line, MAX_LINE_LEN, file)) {
            if (strlen(line) > 0 && line[0] != '\n') {
                gchar *token;
                nRows++;

                nColCheck = 0;
                /* Tokenise lines */
                token = strtok(line, " \t\n");
                while (token != NULL) {
                    gchar *tokenMem = g_malloc(strlen(token)+1);
                    strcpy(tokenMem, token);

                    nColCheck++;

                    /* Add token to queue */
                    g_queue_push_tail(tokenQueue, tokenMem);
                    token = strtok(NULL, " \t\n");
                }

                /* Check for matching row lengths */
                if (nCols == 0) nCols = nColCheck;
                if (nColCheck != nCols) {
                    g_fprintf(stderr, "Error: The input data matrix does not contain rows of equal lengths.\n");
                    exit(-1);
                }
            }
        }
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.\n", fileName);
        exit(-1);
    }

    fclose(file);

    *xSize = nRows;
    *ySize = nCols;

    return tokenQueue;
}

Spin *read_spins(gchar *fileName, gint *xSize, gint *ySize) {
    /* Retrieve tokens in file */
    GQueue *tokenQueue = parse_file(fileName, xSize, ySize);
    Spin *spins = g_new(Spin, (*xSize)*(*ySize));

    int i = 0;

    /* Process tokens */
    while (g_queue_get_length(tokenQueue) > 0) {
        /* Get token */
        gchar *token = g_queue_pop_head(tokenQueue);

        /* Check whether strings assume expected values */
        if (strcmp(token, STR_SPIN_UP) == 0) {
            spins[i] = UP;
        } else if (strcmp(token, STR_SPIN_DOWN) == 0) {
            spins[i] = DOWN;
        } else {
            g_fprintf(stderr, "Error: Unrecognised spin data.\n");
            exit(-1);
        }

        g_free(token);
        i++;
    }

    g_queue_free(tokenQueue);
    return spins;
}


void write_spins(struct SpinGlass *spinGlass, gchar *fileName) {
    /* Open file, delegate to spin glass module */

    FILE *file = fopen(fileName, "w");

    if (file != NULL) {
        spin_glass_write_spins(spinGlass, file);
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.", fileName);
    }

    fclose(file);
}

gboolean *read_clamps(gchar *fileName, gint *xSize, gint *ySize) {
    /* Retrieve tokens in file */
    GQueue *tokenQueue = parse_file(fileName, xSize, ySize);
    gboolean *clamps = g_new(gboolean, (*xSize)*(*ySize));

    int i = 0;

    /* Process tokens */
    while (g_queue_get_length(tokenQueue) > 0) {
        /* Get token */
        gchar *token = g_queue_pop_head(tokenQueue);

        /* Check whether strings assume expected values */
        if (strcmp(token, STR_CLAMPED) == 0) {
            clamps[i] = TRUE;
        } else if (strcmp(token, STR_UNCLAMPED) == 0) {
            clamps[i] = FALSE;
        } else {
            g_fprintf(stderr, "Error: Unrecognised clamp data.\n");
            exit(-1);
        }

        g_free(token);
        i++;
    }

    g_queue_free(tokenQueue);
    return clamps;
}

void write_clamps(struct SpinGlass *spinGlass, gchar *fileName) {
    /* Open file, delegate to spin glass module */
    FILE *file = fopen(fileName, "w");

    if (file != NULL) {
        spin_glass_write_clamps(spinGlass, file);
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.", fileName);
    }

    fclose(file);
}

gdouble *read_weights(gchar *fileName, gint *xSize, gint *ySize) {
    gint nRows, nCols;
    gint i = 0;

    /* Retrieve tokens in file */
    GQueue *tokenQueue = parse_file(fileName, &nRows, &nCols);
    gdouble *weights = g_new(gdouble, (nRows*nCols));

    /* Account for vertical and horizontal weights stored in file */
    *xSize = nRows / 2;
    *ySize = nCols;

    /* Simple check for matching vertical/horizontal bond numbers */
    if (nRows % 2 == 1) {
        g_fprintf(stderr, "Odd number of data rows detected when reading bond file. Should be even.\n");
        exit(-1);
    }

    /* Process tokens */
    while (g_queue_get_length(tokenQueue) > 0) {
        /* Get token */
        gchar *token = g_queue_pop_head(tokenQueue);
        gdouble weightVal = 0;

        /* Convert to double */
        if (sscanf(token, WEIGHT_FMT, &weightVal) != 1) {
            g_fprintf(stderr, "Error: Unrecognised bond data.\n");
            exit(-1);
        }

        weights[i++] = weightVal;

        g_free(token);
    }

    g_queue_free(tokenQueue);
    return weights;
}

void write_weights(struct SpinGlass *spinGlass, gchar *fileName) {
    /* Open file, delegate to spin glass module */
    FILE *file = fopen(fileName, "w");

    if (file != NULL) {
        spin_glass_write_weights(spinGlass, file);
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.", fileName);
    }

    fclose(file);
}
/*
 * File: arrays.h
 *
 * Specifies array operation interface
 * and defines macros for array operations
 *
 */

#include <glib.h>

#ifndef ARRAYS_H
#define ARRAYS_H

/* Emulates two-dimensional array access
 *   array    pointer to data
 *   i, j     array indices */
#define ArrayAccess2D(array, row_length, i, j) ((array)[(i)*(row_length) + (j)])

/* Emulates three-dimensional array access
 *   array    pointer to data
 *   i, j, k  array indices */
#define ArrayAccess3D(array, row_length, column_length, i, j, k) ((array)[(column_length)*(row_length)*(k) + (i)*(row_length) + (j)])


/* Array data types */
typedef guint64 t_int;
typedef gdouble t_double;

/* Construct two-dimensional array. Data contiguity is ensured
 *   nRows    number of rows
 *   nCols    number of columns
 *   returns  pointer to allocated data */
t_int **array_new_2D(t_int nRows, t_int nColumns);

/* Destruct two-dimensional array previously allocated with array_new_2D()
 *   array    the array to destruct */
void array_free_2D(t_int **array);

/* Construct three-dimensional array. Data contiguity is ensured
 *   nRows    number of rows
 *   nCols    number of columns
 *   nZ       size of third dimension
 *   returns  pointer to allocated data */
t_double ***array_new_3D(t_int nZ, t_int nRows, t_int nColumns);


/* Destruct three-dimensional array previously allocated with array_new_3D()
 *   array    the array to destruct */
void array_free_3D(t_double ***array);

int array_utest(void);

#endif /* ARRAYS_H */
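The two macros above flatten multi-dimensional indices onto contiguous storage. As an illustration of how they address a spin glass's bond vector (following the layout documented in spinglass.h, with row_length = ySize and column_length = xSize), the sketch below shows the expansions; it is explanatory only and not part of the sources.

    /* Vertical bond attached to lattice site (row, col): plane k = 0 */
    ArrayAccess3D(weights, ySize, xSize, row, col, 0)
        /* expands to weights[0*xSize*ySize + row*ySize + col] */

    /* Horizontal bond attached to lattice site (row, col): plane k = 1 */
    ArrayAccess3D(weights, ySize, xSize, row, col, 1)
        /* expands to weights[1*xSize*ySize + row*ySize + col] */

That is, the k = 0 plane holds the vertical bonds and the k = 1 plane the horizontal bonds, consistent with the calls made in spinglass.c.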
/*
 * File: arrays.c
 *
 * Implements array operation interface specified in arrays.h
 *
 */

#include <glib.h>
#include <stdio.h>

#include "arrays.h"

t_int **array_new_2D(t_int nRows, t_int nColumns) {
    gint i;

    /* Allocate pointer block */
    t_int **array = g_malloc(nRows * sizeof(t_int *));
    /* Allocate data block */
    array[0] = g_malloc(nRows * nColumns * sizeof(t_int));
    /* Assign data offsets */
    for (i = 1; i < nRows; i++) array[i] = array[0] + i*nColumns;

    return array;
}

void array_free_2D(t_int **array) {
    g_free(array[0]);
    g_free(array);
}

t_double ***array_new_3D(t_int nZ, t_int nRows, t_int nColumns) {
    gint i;

    /* Allocate pointer block */
    t_double ***array = g_malloc(nZ * sizeof(t_double **));
    /* Allocate second pointer block */
    array[0] = g_malloc(nZ * nRows * sizeof(t_double *));
    /* Allocate data block */
    array[0][0] = g_malloc(nZ * nRows * nColumns * sizeof(t_double));

    /* Assign plane pointer offsets */
    for (i = 0; i < nZ; i++) array[i] = array[0] + nRows*i;
    /* Assign row pointer offsets into the data block */
    for (i = 0; i < nZ*nRows; i++) (*array)[i] = (*array)[0] + i*nColumns;

    return array;
}

void array_free_3D(t_double ***array) {

    g_free(array[0][0]);
    g_free(array[0]);
    g_free(array);
}

int array_utest(void) {

    gint i, j, k;
    t_int **array = array_new_2D(10, 10);
    t_double ***array2 = array_new_3D(5, 32, 32);

    for (i = 0; i < 10; i++) {
        for (j = 0; j < 10; j++) array[i][j] = i*10 + j;
    }
    for (i = 0; i < 10; i++) {
        for (j = 0; j < 10; j++) g_assert(array[i][j] == i*10 + j);
    }

    array_free_2D(array);

    for (i = 0; i < 5; i++) {
        for (j = 0; j < 32; j++) {
            for (k = 0; k < 32; k++) {
                array2[i][k][j] = i*1024 + k*32 + j;
                g_assert(array2[i][k][j] == i*1024 + k*32 + j);
            }
        }
    }

    for (i = 0; i < 5; i++) {
        for (j = 0; j < 32; j++) {
            for (k = 0; k < 32; k++) {
                g_assert(array2[i][k][j] == i*1024 + k*32 + j);
            }
        }
    }

    array_free_3D(array2);

    return 0;
}
/*
 * File: random.h
 *
 * Defines interface for random number generation
 *
 */

#include <glib.h>

/* Generate continuously distributed random double in the range [lower, upper)
 *   lower   lower limit
 *   upper   upper limit */
gdouble rand_continuous(gdouble lower, gdouble upper);

/* Generate equally distributed random boolean
 */
gboolean rand_coin_toss();
/*
 * File: random.c
 *
 * Implements interface for random number generation
 *
 */

#include <stdio.h>
#include <glib.h>
#include "random.h"

gdouble rand_continuous(gdouble lower, gdouble upper) {
    return g_random_double_range(lower, upper);
}

gboolean rand_coin_toss() {
    gboolean value = g_random_boolean();
    return value;
}
/*
 * File: bforce_gstatefinder.c
 *
 * Implements brute force ground state finder
 *
 */

#include <glib.h>
#include <glib/gprintf.h>
#include <stdio.h>
#include "spinglass.h"
#include "gstatefinder.h"

static void find_ground_states_brute_force(gint leadingSpin, gdouble *minEnergy, struct SpinGlass *spinGlass);

gdouble find_ground_states(struct SpinGlass *spinGlass) {
    gint nSpins = spinGlass->xSize * spinGlass->ySize;
    gdouble minEnergy = G_MAXDOUBLE;

    /* Initiate brute force evaluation */
    find_ground_states_brute_force(nSpins, &minEnergy, spinGlass);

    return minEnergy;
}

/* Recursive brute force ground state evaluation
 *   leadingSpin  spin 'window' position, used to specify state to be flipped. Used to evaluate base case
 *   minEnergy    (read/write) records current minimum energy. For each invocation of the function,
 *                states are output if their energy is no higher than the value currently held by this variable
 *   spinGlass    (read/write) spin glass data structure whose spins are manipulated during search */
static void find_ground_states_brute_force(gint leadingSpin, gdouble *minEnergy, struct SpinGlass *spinGlass) {
    /* Base case */
    if (leadingSpin == 0) {
        /* Compute energy */
        gdouble energy = spin_glass_energy(spinGlass);

        if (energy < *minEnergy) {
            *minEnergy = energy;
        }

        if (energy == *minEnergy) {
            g_printf("\nLeaf node with energy %E\n", energy);
            g_printf("Is current ground state\n");
            spin_glass_write_spins(spinGlass, stdout);
        }

    } else {
        /* Recurse with leading spin unchanged */
        find_ground_states_brute_force(leadingSpin-1, minEnergy, spinGlass);
        /* Flip spin */
        spinGlass->spins[leadingSpin-1] *= DOWN;
        /* Recurse with leading spin flipped */
        find_ground_states_brute_force(leadingSpin-1, minEnergy, spinGlass);
    }
}
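The recursion above enumerates the full configuration space: each call either leaves the leading spin as it is or flips it before descending, so an n-spin lattice gives rise to 2^n leaf evaluations of spin_glass_energy, each itself of O(n) cost; the brute-force finder is therefore only practical for very small lattices. The following stand-alone snippet, included purely as an illustration, prints how quickly the search space grows:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        int n;
        for (n = 9; n <= 36; n += 9)   /* 3x3 up to 6x6 lattices */
            printf("n = %2d spins: 2^n = %.3e configurations\n", n, pow(2.0, n));
        return 0;
    }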
/*
 * File: gstatefinder.h
 *
 * Specifies interface for ground state solvers
 */

#include "spinglass.h"

#ifndef GSTATEFINDER_H
#define GSTATEFINDER_H

/* Determine ground states of spin glass
 *   spinGlass  (read) the spin glass to evaluate */
gdouble find_ground_states(struct SpinGlass *spinGlass);

#endif /* GSTATEFINDER_H */
Bibliography
[1] The GLib library. http://library.gnome.org/devel/glib/, 2008. Accessed 2 July, 2008.
[2] The Ness user guide. http://www2.epcc.ed.ac.uk/ness/documentation/index.html, 2008.
Accessed 2 July, 2008.
[3] User’s guide to the HPCx service.
http://www.hpcx.ac.uk/support/documentation/UserGuide/HPCxuser/HPCxuser.html,
2008. Accessed 2 July, 2008.
[4] D.J. Amit, H. Gutfreund, and H. Sompolinsky. Spin-glass models of neural networks.
Physical Review A, 32(2):1007–1018, 1985.
[5] D. Andre and J.R. Koza. Parallel genetic programming: a scalable implementation using
the transputer network architecture. Advances in genetic programming: volume 2, pages
317–337, 1996.
[6] F. Barahona. On the computational complexity of Ising spin glass models. J. Phys. A:
Math. Gen, 15(10):3241–3253, 1982.
[7] F. Barahona, M. Grotschel, M. Junger, and G. Reinelt. An application of combinato-
rial optimization to statistical physics and circuit layout design. Operations Research,
36(3):493–513, 1988.
[8] R.J. Baxter. Exactly solved models in statistical mechanics. Academic Press, London;
Tokyo, 1982.
[9] R. Bellman. Dynamic Programming. Science, 153(3731):34–37, 1966.
[10] I. Bieche, R. Maynard, R. Rammal, and JP Uhry. On the ground states of the frustration
model of a spin glass by a matching method of graph theory. J. Phys. A: Math. Gen,
13:2553–2576, 1980.
[11] S.G. Brush. History of the Lenz-Ising Model. Rev. Mod. Phys., 39(4):883–893, Oct 1967.
[12] M. Campanino, E. Olivieri, and A.C.D. van Enter. One dimensional spin glasses with po-
tential decay 1/r 1+g. Absence of phase transitions and cluster properties. Communications
in Mathematical Physics, 108(2):241–255, 1987.
[13] Lynn Elliot Cannon. A cellular computer to implement the Kalman filter algorithm. PhD
thesis, Montana State University, Bozeman, MT, USA, 1969.
[14] E. Cantu-Paz. A survey of parallel genetic algorithms. Calculateurs Paralleles, Reseaux
et Systems Repartis, 10(2):141–171, 1998.
[15] A. Carter. Finite-size scaling studies of Ising spin glasses. PhD thesis, Department of
Physics and Astronomy, University of Manchester, 2003.
[16] B.A. Cipra. The Ising Model Is NP-Complete. SIAM News, 33(6), 2000.
[17] D. de Fontaine and J. Kulik. Application of the ANNNI model to long-period superstruc-
tures. ACTA METALLURG., 33(2):145–165, 1985.
[18] J. Dıaz, A. Gibbons, G.E. Pantziou, M.J. Serna, P.G. Spirakis, and J. Toran. Parallel
algorithms for the minimum cut and the minimum length tree layout problems. Theoretical
Computer Science, 181(2):267–287, 1997.
[19] H.Q. Ding. Monte Carlo simulations of Quantum systems on massively parallel computers.
Proceedings of the 1993 ACM / IEEE conference on Supercomputing, pages 34–43, 1993.
[20] P.A.M. Dirac. On the Theory of Quantum Mechanics. Proceedings of the Royal Society
of London. Series A, Containing Papers of a Mathematical and Physical Character (1905-
1934), 112(762):661–677, 1926.
[21] B. Drossel and M.A. Moore. The ± J spin glass in Migdal-Kadanoff approximation. The
European Physical Journal B Condensed Matter , 2001.
[22] S.F. Edwards and P.W. Anderson. Theory of spin glasses. Journal of Physics F: Metal
Physics, 5(5):965–974, 1975.
[23] A.N. Ermilov, A.N. Kireev, and A.M. Kurbatov. Investigation of models of spin glass with
arbitrary distributions of the coupling constants. Theoretical and Mathematical Physics,
49(3):1071–1076, December 1981.
[24] Chochia et al. IBM High Performance Switch on System p5 575 Server - Performance.
http://www-03.ibm.com/systems/p/hardware/whitepapers/575_hpc_perf.html, 2008. Ac-
cessed 2 July, 2008.
[25] R. Forsati, M. Mahdavi, M. Kangavari, and B. Safarkhani. Web page clustering using Har-
mony Search optimization. Electrical and Computer Engineering, 2008. CCECE 2008.
Canadian Conference on, pages 001601–001604, 2008.
[26] M. Gabay and G. Toulouse. Coexistence of Spin-Glass and Ferromagnetic Orderings.
Physical Review Letters, 47(3):201–204, 1981.
[27] Z.W. Geem, J.H. Kim, et al. A New Heuristic Optimization Algorithm: Harmony Search.
SIMULATION , 76(2):60, 2001.
[28] F. Glover and G.A. Kochenberger. Handbook of Metaheuristics. Springer, 2003.
[29] C.D. Godsil, M. Grotschel, and D.J.A. Welsh. Combinatorics in statistical physics. Hand-
book of combinatorics (vol. 2), pages 1925–1954, 1996.
[30] A. Grama, V. Kumar, A. Gupta, and G. Karypis. Introduction to Parallel Computing:
Design and Analysis of Algorithms. Addison-Wesley, 2003.
[31] D.J. Griffiths. Introduction to Quantum Mechanics. Prentice Hall, 1995.
[32] U. Gropengiesser. The ground state energy of the ± J spin glass. A comparison of vari-
ous biologically motivated algorithms. Journal of Statistical Physics, 79(5-6):1005–1012,
1995.
[33] M.F. Guest. Communications Benchmarks on High-End and Commodity-Class Com-
puters. http://www.cse.scitech.ac.uk/disco/Benchmarks/pmb.2004/index.htm, 2008. Ac-
cessed 2 July, 2008.
[34] F. Hadlock. Finding a maximum cut of a planar graph in polynomial time. SIAM Journal
on Computing, 4(3):221–225, 1975.
[35] R.W. Hamming. Error Detecting and Error Correcting Codes. Computer Arithmetic, II ,
29(2):147–160, 1990.
[36] A.K. Hartmann. Scaling of stiffness energy for three-dimensional ± J Ising spin glasses.
Physical Review E , 59(1):84–87, 1999.
[37] W.K. Hastings. Monte Carlo sampling methods using Markov chains and their applica-
tions. Biometrika, 57(1):97–109, 1970.
[38] W. Heisenberg. Mehrkörperproblem und Resonanz in der Quantenmechanik. Zeitschrift
für Physik, 38(6):411–426, 1926.
[39] P.C. Hemmer, H. Holden, and S.K. Ratkje. The collected works of Lars Onsager: with
commentary. World Scientific, Singapore; River Edge, NJ, 1996.
[40] G. Hempel, G. Blaschke, and KF Pal. The ground state energy of the Edwards-Anderson
Ising spin glass with a hybrid genetic algorithm. Physica A, 223(3):283–292, 1996.
[41] J. Houdayer and O.C. Martin. Hierarchical approach for computing spin glass ground
states. Physical Review E , 64(5):56704, 2001.
[42] H. Kawamura. Chiral ordering in Heisenberg spin glasses in two and three dimensions.
Physical Review Letters, 68(25):3785–3788, 1992.
[43] J.H. Kim, Z.W. Geem, and E.S. Kim. Parameter estimation of the nonlinear Muskingum
model using harmony search. Journal of the American Water Resources Association,
37(5):1131–1138, 2001.
[44] S. Kirkpatrick, C.D. Gelatt Jr., and M.P. Vecchi. Optimization by Simulated Annealing.
Biology and Computation: A Physicist’s Choice, 1994.
[45] K.S. Lee and Z.W. Geem. A new structural optimization method based on the harmony
search algorithm. Computers and Structures, 82(9-10):781–798, 2004.
[46] F. Liers, M. Junger, G. Reinelt, and G. Rinaldi. Computing Exact Ground States of Hard
Ising Spin Glass Problems by Branch-and-Cut. New Optimization Algorithms in Physics,
June 2005.
[47] B.M. McCoy and T.T. Wu. The two-dimensional Ising model. Harvard University Press,
Cambridge, Mass., 1973.
[48] S.P. Meyn and R.L. Tweedie. Markov chains and stochastic stability. Springer-Verlag
London, 1993.
[49] M. Mezard, G. Parisi, and M.A. Virasoro. Spin glass theory and beyond . World Scientific
Teaneck, NJ, USA, 1987.
[50] T.M. Mitchell. Machine learning. McGraw-Hill, 1997.
[51] D. Mitra, F. Romeo, and A. Sangiovanni-Vincentelli. Convergence and Finite-Time Behavior of Simulated Annealing. Advances in Applied Probability, 18(3):747–771, 1986.
[52] C.M. Newman and D.L. Stein. Blocking and Persistence in the Zero-Temperature Dynam-
ics of Homogeneous and Disordered Ising Models. Physical Review Letters, 82(20):3944–
3947, 1999.
[53] G. Pardella and F. Liers. Exact Ground States of Huge Two-Dimensional Planar Ising Spin
Glasses. Arxiv preprint arXiv:0801.3143, 2008.
[54] G. Parisi. Infinite Number of Order Parameters for Spin-Glasses. Physical Review Letters,
43(23):1754–1756, 1979.
[55] D.J. Ram, TH Sreenivas, and K.G. Subramaniam. Parallel Simulated Annealing Algo-
rithms. Journal of Parallel and Distributed Computing, 37(2):207–212, 1996.
[56] J. Randa. Axial next-nearest-neighbor Ising (ANNNI) and extended-ANNNI models in
external fields. Physical Review Letters, 32(1):413–416, 1985.
[57] W. Selke. The ANNNI model-Theoretical analysis and experimental application. Physics
Reports, 170(4):213–264, 1988.
[58] D. Sherrington and S. Kirkpatrick. Solvable Model of a Spin-Glass. Physical Review
Letters, 35(26):1792–1796, 1975.
[59] P. Sutton, D.L. Hunter, and N. Jan. Short Communication: The ground state energy of the
± J spin glass from the genetic algorithm. J. Phys. I France, 4:1281–1285, 1994.
[60] D.J. Thouless, P.W. Anderson, and R.G. Palmer. Solution of 'Solvable model of a spin
glass'. Philosophical Magazine, 35(3):593–601, 1977.