Testing and analysis of a computational model of human rhythm perception

Open Research Online
The Open University’s repository of research publications and other research outputs

Testing and Analysis of a Computational Model of Human Rhythm Perception

Thesis

How to cite: Angelis, Vassilis (2014). Testing and Analysis of a Computational Model of Human Rhythm Perception. PhD thesis, The Open University.

For guidance on citations see FAQs.

© 2014 Vassilis Angelis

Version: Version of Record

Copyright and Moral Rights for the articles on this site are retained by the individual authors and/or other copyright owners. For more information on Open Research Online’s data policy on reuse of materials please consult the policies page.

oro.open.ac.uk

TESTING AND ANALYSIS OF A COMPUTATIONAL MODEL OF HUMAN RHYTHM PERCEPTION

Vassilis Angelis

A thesis submitted to The Open University, UK

for the degree of

Doctor of Philosophy

Computing Department

Faculty of Mathematics, Computing and Technology

The Open University, UK

Supervisors: Simon Holland, Martin Clayton, Paul J. Upton

August 2013 –

Vassilis Angelis: Testing and analysis of a computational model of human rhythm perception, © August 2013

Dedicated to all my family

ABSTRACT

This thesis presents an original methodology, as detailed below, applied to the testing of an existing computational model of human rhythm perception. Since the computational model instantiates neural resonance theory (Large and Snyder, 2009), the thesis also tests that theory. Neural resonance theory is a key target for testing since, as contrasted with many other theories of human rhythm perception, it has relatively strong physiological plausibility. Rather than simply matching overt features of human rhythm perception, neural resonance theory shows how these features might plausibly emerge from low-level properties of interacting neurons.

The thesis tests the theory using several distinct research methods. The model stood up well to each family of tests, subject to limitations that are analysed in detail.

Firstly, the responses of the model to several types of polyrhythmic stimuli were compared with existing empirical data on human responses regarding beat identification to the same stimuli, at a variety of tempi. Polyrhythmic stimuli closely resemble real-life complex rhythmical stimuli such as music, and were used for the first time to test the model. It was found that the set of categories of response predicted by the model matched human behaviour.

Secondly, the model was systematically analysed by exploring the degree of dependence of its behaviour on the values of its parameters (sensitivity analysis). The behaviour of the model was found to retain consistency in the face of systematic numerical manipulation of its parameters.

Thirdly, the behaviour of the model was compared to that of related models. In particular, the focal computational model, which balances physiological plausibility with mathematical convenience, was compared with other models that relate more directly to brain physiology. In each case, all relevant behaviours were found to be closely in line.

Lastly, the outputs of the model under polyrhythmic stimuli were analysed to make new testable predictions about previously unobserved human behaviour regarding the time it takes for people to perceive beat in polyrhythms. These predictions led to the design and conduct of new human experimental studies. It was found that the model had successfully predicted previously unobserved aspects of human behaviour; more specifically, it predicted the timescale within which people start to perceive beat in a given polyrhythmic stimulus.

PUBLICATIONS

Chapter 6 and parts of Chapter 5 are based on the following journal paper.

Angelis, V., Holland, S., Upton, J. P., and Clayton, M. (2013). Testing a computational model of rhythm perception using polyrhythmic stimuli. Journal of New Music Research, 42(1): 47-60.

. . . for being so inspiring
that you get our neurons firing
and spontaneously re-wiring
OU, we owe you

Open University 40th Anniversary Poem, by Matt Harvey

ACKNOWLEDGMENTS

Big thank you to my supervisors Martin Clayton, Paul J. Upton, and Simon Holland for the sheer support to me from the very beginning to the very end, and for helping me to realise the most out of my doctoral studies. Always indebted.

Thank you to Agota Solyomi for coping and bearing with me all this time and for studying together with me over the weekends. Santiago Ramon y Cajal would have been very pleased to know you.

Thanks to Marian Petre for setting me up in the thesis writing-up orbit, for listening to me and for sharing her insights with all of us. We have been fortunate to have you around. Thanks to fellow PhD students; thank you Andy Milne for keeping up the good work and for bringing the brocots together, thank you Jean Barbier for really caring about things, thank you Alexandra Pawlik for your companionship. Thanks to Shirley, Jacky and Ann from the Music Department for helping me out with administration issues.

Thanks to my participants for their interest in participating to my studies. Thank you to Edward Large and other peers for their advice on publications related to this work. Thank you to my examiners Prof Uwe Grimm, and Prof Eduardo Miranda whose work on brain-computer music interfacing has certainly set the current work in motion from as early as 2002.

Contents

1 Introduction
2 Human rhythm perception
  2.1 The temporal structure of stimuli
    2.1.1 Metricality
    2.1.2 Temporal nuances
  2.2 Beat perception
    2.2.1 Beat as mental pulsation
    2.2.2 Multiplicity of beat perception
    2.2.3 Dynamic perception of beat
  2.3 Meter perception
  2.4 Polyrhythm perception
    2.4.1 Polyrhythmic stimuli
    2.4.2 Human tapping behaviours in polyrhythm perception
  2.5 Summary
3 Computational models of beat and meter perception
  3.1 Internal Clock Models
  3.2 Probabilistic models
  3.3 Oscillator Models
  3.4 Filter Array Models
  3.5 Issues of Physiological Plausibility
  3.6 Types of input encoding: symbolic and/or acoustic
  3.7 Causal and non-causal models
  3.8 Summary
4 Neural resonance model
  4.1 Physiological plausibility of the neural resonance model
  4.2 Historical background in modelling a neural oscillator
  4.3 Mathematical derivation of the neural resonance model
    4.3.1 Mathematical derivation of a neural oscillator
    4.3.2 Canonical derivation of a neural oscillator
  4.4 Properties of neural oscillator
    4.4.1 Spontaneous oscillation
    4.4.2 Entrainment
    4.4.3 Higher Order Resonance
  4.5 Simulation of rhythm perception with the neural resonance model
5 Methodology
  5.1 The methodology of validation
  5.2 Simulations (Tests)
    5.2.1 Input encoding
    5.2.2 Frequency range of the bank of oscillators
    5.2.3 Number of oscillators
    5.2.4 Duration of simulation
    5.2.5 Connectivity and number of networks
    5.2.6 Numerical solvers and nominal values of parameters
  5.3 Experimental method
    5.3.1 Participants
    5.3.2 Polyrhythmic stimulus and presentation rates
    5.3.3 Experimental task
    5.3.4 Stimulus presentation
    5.3.5 Method of comparing modelled with empirical data
    5.3.6 Justification of selected method of comparison
6 Validation of the model against existing empirical data
  6.1 Preparing to compare human with model behaviour
  6.2 Simulations with isochronous polyrhythmic stimuli
    6.2.1 Simulation with a 4:3 polyrhythmic stimulus repeating once per second
    6.2.2 Discussion on simulation
    6.2.3 Simulations using the 4:3 polyrhythm at different tempi
    6.2.4 Simulation with a 3:2 polyrhythmic stimulus repeating once per second
    6.2.5 Simulation with a 2:5 polyrhythmic stimulus repeating once per second
    6.2.6 Simulation with a 3:5 polyrhythmic stimulus repeating once per second
    6.2.7 Simulation with a 4:5 polyrhythmic stimulus repeating once per second
  6.3 Generalising the frequency response pattern
  6.4 Simulations with non-isochronous polyrhythmic stimuli
    6.4.1 Preparation of polyrhythmic stimulus comprising non-isochronous pulse trains
    6.4.2 Stimulating the model
  6.5 Summary
7 Model analysis
  7.1 Parameter sensitivity analysis
    7.1.1 Initial conditions of each oscillator: amplitude and phase
    7.1.2 Driving variable parameters: external polyrhythmic stimulus
    7.1.3 State variable parameters
  7.2 Cross-model comparison with the Wilson-Cowan model
    7.2.1 Frequency response pattern of the Wilson-Cowan model to a 4:3 polyrhythmic stimulus with 1 Hz repetition frequency
    7.2.2 Frequency detuning in the neural resonance model
    7.2.3 Superimposed frequency response patterns for both the neural resonance and the Wilson-Cowan model
  7.3 Summary
8 Model predictions about human behaviour
  8.1 Predictions to extend the categories of human tapping behaviours associated with polyrhythm perception
  8.2 Prediction of the timescale people start to tap to polyrhythms
    8.2.1 Measuring relaxation time
    8.2.2 Modifying the original hypothesis
    8.2.3 Method limitations in evaluating the original hypothesis
  8.3 Experimental studies in human perception of polyrhythms
    8.3.1 Experimental design and participants
    8.3.2 Data management and visualisation
    8.3.3 Raw data
    8.3.4 Identifying the category of tapping behaviour: an example
    8.3.5 Creating a dataset for each combination, i.e., polyrhythm type and tempo
    8.3.6 Visualisation of data using boxplots
    8.3.7 Empirical results of tapping along with a 4:3 polyrhythm for different tempi
  8.4 Evaluation of the model’s predictions
    8.4.1 Extension of categories of tapping behaviours
    8.4.2 Evaluating oscillator relaxation time
  8.5 Summary
9 Conclusions, limitations and further work
  9.1 Conclusions
    9.1.1 Validation against existing empirical data
    9.1.2 Evaluation of model’s predictions
  9.2 Limitations
  9.3 Further Work
  9.4 Final Reflection
Bibliography

List of Figures

Figure 2.1 Binary-branched representation of a metrical structure.
Figure 2.2 Categories of periodic tapping behaviours along with a 4:3 polyrhythm.
Figure 4.1 The anatomy of a neuron
Figure 4.2 Schematic representation of the neural oscillator.
Figure 4.3 Schematic representation of the neural oscillator.
Figure 4.4 A limit cycle of a neural oscillator exhibiting self-sustained oscillation (spontaneous oscillation).
Figure 4.5 Trajectories of spontaneous and damped oscillations.
Figure 4.6 An example of a neural resonance model that consists of an array of oscillators with frequencies from 0.5 Hz to 8.0 Hz.
Figure 4.7 Average frequency responses of the neural resonance model in the presence of a series of metronomic clicks.
Figure 5.1 Amplitude response of oscillators to a 4:3 polyrhythmic stimulus.
Figure 6.1 Time series representation of a 4:3 polyrhythmic stimulus.
Figure 6.2 Frequency response of the neural resonance model in the presence of a 4:3 polyrhythm.
Figure 6.3 Response of a series of linear oscillators in the presence of a 4:3 polyrhythmic stimulus that repeats once per second. The polyrhythm consists of two pulses 4 Hz and 3 Hz.
Figure 6.4 Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 2400 msec.
Figure 6.5 Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 2000 msec.
Figure 6.6 Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 1800 msec.
Figure 6.7 Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 1600 msec.
Figure 6.8 Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 1400 msec.
Figure 6.9 Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 1200 msec.
Figure 6.10 Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 800 msec.
Figure 6.11 Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 600 msec.
Figure 6.12 Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 400 msec.
Figure 6.13 Frequency response of the model in the presence of a 3:2 polyrhythm repeating once every 1000 msec.
Figure 6.14 Frequency response of the model in the presence of a 2:5 polyrhythm repeating once every 1000 msec.
Figure 6.15 Frequency response of the model in the presence of a 3:5 polyrhythm repeating once every 1000 msec.
Figure 6.16 Frequency response of the model in the presence of a 4:5 polyrhythm repeating once every 1000 msec.
Figure 6.17 Time series representation of a 4:3 polyrhythm performed by a human at 60 bpm.
Figure 6.18 Frequency response of the model in the presence of a 4:3 polyrhythm performed by a human at 60 bpm.
Figure 6.19 Time series representation of a 4:3 polyrhythm performed by a human at 60 bpm with no metronomic guidance.
Figure 6.20 Frequency response of the model in the presence of a 4:3 polyrhythm performed by a human at 60 bpm with no metronomic guidance.
Figure 7.1 Amplitude response of the neural resonance model over a period of 12.5 seconds where the initial amplitude for all oscillators is 0.7.
Figure 7.2 Amplitude response of the neural resonance model over a period of 12.5 seconds where the initial amplitude for all oscillators is close to 0.
Figure 7.3 Superimposed frequency response patterns for different initial amplitude of the oscillators 0.5 (blue) and near 0 (green).
Figure 7.4 Superimposed frequency response patterns of the neural resonance model for the three different ways of encoding a polyrhythmic stimulus.
Figure 7.5 Superimposed frequency response patterns for different amplitudes of a 4:3 polyrhythmic pattern with 1 Hz repetition frequency.
Figure 7.6 Frequency response patterns for (α = 0) (blue), and (α = −0.1) (green).
Figure 7.7 Frequency response patterns corresponding to sensitivity analysis for bifurcation parameter (α) and nonlinearity parameter (ε).
Figure 7.8 Frequency response patterns for (α > 0) (blue) and (α = 0) (green).
Figure 7.9 Frequency response patterns corresponding to sensitivity analysis for saturation parameter (β) and nonlinearity parameter (ε).
Figure 7.10 Frequency response patterns of the Wilson-Cowan model in the presence of a 4:3 polyrhythmic stimulus with 1 Hz repetition frequency for a range of different amplitudes of the stimulus.
Figure 7.11 Frequency response pattern of the neural resonance model in the presence of a 4:3 polyrhythmic stimulus with 1 Hz repetition frequency for a negative value of the frequency detuning parameter (δ).
Figure 7.12 Superimposed frequency response patterns of Wilson-Cowan and neural resonance model.
Figure 8.1 Amplitude responses of the bank of oscillators in the presence of a 4:3 polyrhythmic stimulus with 1 Hz repetition frequency.
Figure 8.2 Excerpt of pooled raw empirical data
Figure 8.3 Timing data for successive taps in the presence of a 4:3 polyrhythmic stimulus repeating once every 1000 msec.
Figure 8.4 An example of a dismissed trial.
Figure 8.5 Empirical tapping data
Figure 8.6 Multiple boxplot summarizing the categories of tapping and the range of time it takes for people to start tapping along with the given 4:3 polyrhythm repeating once every 1000 msec.
Figure 8.7 Descriptive statistics for the 4:3 polyrhythm repeating once per second.
Figure 8.8 Multiple boxplots for the range of tapping modes and their corresponding period of time within which people start tapping along with a 4:3 polyrhythm repeating once every 2500 msec.
Figure 8.9 Multiple boxplots for the range of tapping modes and their corresponding period of time within which people start tapping along with a 4:3 polyrhythm repeating once every 2000 msec.
Figure 8.10 Multiple boxplots for the range of tapping modes and their corresponding period of time within which people start tapping along with a 4:3 polyrhythm repeating once every 1800 msec.
Figure 8.11 Multiple boxplots for the range of tapping modes and their corresponding period of time within which people start tapping along with a 4:3 polyrhythm repeating once every 1600 msec.
Figure 8.12 Multiple boxplots for the range of tapping modes and their corresponding period of time within which people start tapping along with a 4:3 polyrhythm repeating once every 1400 msec.
Figure 8.13 Multiple boxplots for the range of tapping modes and their corresponding period of time within which people start tapping along with a 4:3 polyrhythm repeating once every 1200 msec.
Figure 8.14 Multiple boxplots for the range of tapping modes and their corresponding period of time within which people start tapping along with a 4:3 polyrhythm repeating once every 800 msec.
Figure 8.15 Multiple boxplots for the range of tapping modes and their corresponding period of time within which people start tapping along with a 4:3 polyrhythm repeating once every 600 msec.
Figure 8.16 Multiple boxplots for the range of tapping modes and their corresponding period of time within which people start tapping along with a 4:3 polyrhythm repeating once every 400 msec.
Figure 8.17 Temporal development of resonances among the bank of oscillators in the presence of a 4:3 polyrhythm repeating once per second.
Figure 8.18 The period of time it takes for people to start tapping in a given mode along with a 4:3 polyrhythm repeating once every 1000 msec.
Figure 8.19 The modes of tapping and their corresponding periods of time (boxplots) within which people start tapping along with a 3:2 polyrhythm repeating once every 1000 msec.
Figure 8.20 Temporal development of resonances in the network in the presence of a 3:2 polyrhythm repeating once every 1000 msec.
Figure 8.21 The modes of tapping and their corresponding periods of time (boxplots) within which people start tapping along with a 2:5 polyrhythm repeating once every 1000 msec.
Figure 8.22 Temporal development of resonances in the network in the presence of a 5:2 polyrhythm repeating once every 1000 msec.
Figure 9.1 Frequency response of a network of interconnected oscillators in the presence of a 4:3 polyrhythmic stimulus of 1 Hz repetition frequency.

List of Tables

Table 2.1 Metrical hierarchy representation
Table 2.2 Tapping rates along with different polyrhythms
Table 5.1 Potential periodic human tapping behaviours along with a 4:3 polyrhythm expressed in frequencies for a series of repetition rates.
Table 6.1 Periodic tapping behaviours along with a 4:3 polyrhythmic stimulus expressed in terms of frequencies, when the repetition frequency of the polyrhythmic pattern is 1 Hz.
Table 6.2 A generic frequency response pattern of the neural resonance model in the presence of a two-pulse train polyrhythm.
Table 7.1 Range of numerical values for parameters (α), (β) and (ε). Values are calculated based on a ±10% uniform variation around nominal values.
Table 7.2 Combinations of different runs for parameters (α) and (ε) considering both single and multiple sensitivity analysis.
Table 7.3 Combinations of runs regarding parameters (β) and (ε) considering both single and multiple parameter analysis.

1 INTRODUCTION

People dancing, musicians playing music together and concertgoers tapping their feet while watching a band perform are all examples of behaviours which are mediated through the perception of musical rhythm. Various theories have been suggested to account for human rhythm perception: some are informed by musicological analysis, others approach rhythm perception as an engineering task, and yet others seek to understand the physiological underpinnings that give rise to the perception of rhythm. The perception of rhythm is a fundamental catalyst for the creation and appreciation of the art of music. Research shows that listening to music triggers the generation of emotions, and that engagement with rhythmic music coordinates brain activity; both effects can play a profound role as a therapeutic medium for various psychological and biological diseases associated with brain malfunction (Thaut et al., 2005). Understanding the physiological underpinnings of rhythm perception can therefore shed light on these areas and promote human well-being in a non-invasive way. The theory of neural resonance has relatively strong physiological plausibility: rather than simply matching overt features of human rhythm perception, it shows how these features might plausibly emerge from low-level properties of neurons. As a result, the theory of neural resonance can be used as a medium to advance our understanding of how the brain perceives rhythm and to inform the design of non-invasive ways of promoting healthy brain functionality.

In this thesis we focus principally on one theory of rhythm perception, namely the theory of neural resonance (Large and Snyder, 2009), according to which key aspects of musical experience [such as rhythm perception] can be explained directly in terms of brain dynamics (Large, 2008). The physiological plausibility of the neural resonance theory, in combination with its scope and competency (which are described in detail later), makes it an excellent candidate theory of rhythm perception. The aim of this thesis is to evaluate the theory of neural resonance by examining the extent to which the theory can provide an account for human perception of a complex class of rhythms known as polyrhythms. Thus, the research question in general terms can be stated as follows.

To what extent can the theory of neural resonance provide an account for human perception of polyrhythms?

This question can be broken down into two halves.

Question 1: To what extent can the theory of neural resonance provide an account for existing empirical data on how humans perceive polyrhythms?

Question 2: Can we make predictions about human behaviour in polyrhythm perception based on the theory of neural resonance and subsequently test these predictions?

In order to address the above research questions, a four-fold set of research objectives will be outlined below. But first of all, to help illuminate these objectives, we will quickly sketch the bare bones of neural resonance theory. Briefly, neural resonance theory suggests that there are a number of different populations of neurons in the brain where each population behaves like an oscillator, each with its own natural frequency. In the presence of an external rhythmic stimulus some of these populations react to the stimulus and a pattern of multiple responses is established. The theory proposes that this pattern is responsible for giving rise to the perception of rhythm. The neural resonance theory may be instantiated as a computational model using a bank of oscillators, where each oscillator simulates a single neural population. The particular computational model used in this thesis (Large et al., 2010) (described and justified in detail in Chapter 4) will from now onwards be referred to as the ‘neural resonance model’.
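As a rough illustration of the idea of a bank of oscillators responding selectively to a periodic input, the following Python sketch drives a handful of damped, Hopf-type oscillators with a sinusoid and records how strongly each one responds. It is not the neural resonance model of Large et al. (2010): the oscillator equation, the parameter values `alpha` and `beta`, the sinusoidal `metronome` stimulus, the integration scheme and all function names are assumptions chosen only to keep the example short.

```python
import cmath
import math


def simulate_bank(freqs_hz, stimulus, duration=10.0, dt=0.001,
                  alpha=-1.0, beta=-1.0):
    """Drive a bank of Hopf-type oscillators; return mean late amplitudes.

    Each oscillator follows dz/dt = z*(alpha + i*2*pi*f + beta*|z|^2) + x(t).
    This is an illustrative stand-in, not the model used in the thesis.
    """
    n_steps = int(round(duration / dt))
    settle = n_steps // 2                # average over the second half only
    z = [0.01 + 0j for _ in freqs_hz]    # small non-zero start (assumed)
    total = [0.0 for _ in freqs_hz]
    for step in range(n_steps):
        x = stimulus(step * dt)
        for k, f in enumerate(freqs_hz):
            growth = alpha + 2j * math.pi * f + beta * abs(z[k]) ** 2
            # Exponential Euler step: rotate/decay the state, then add input.
            z[k] = z[k] * cmath.exp(growth * dt) + x * dt
            if step >= settle:
                total[k] += abs(z[k])
    return [t / (n_steps - settle) for t in total]


def metronome(t, rate_hz=2.0):
    """Sinusoidal stand-in for a periodic stimulus at `rate_hz` (assumed)."""
    return math.cos(2 * math.pi * rate_hz * t)


freqs = [0.5 * k for k in range(1, 9)]   # natural frequencies 0.5 Hz .. 4.0 Hz
amps = simulate_bank(freqs, metronome)
strongest = freqs[amps.index(max(amps))]
print(strongest)                          # natural frequency of the strongest responder
```

With these illustrative settings the oscillator whose natural frequency matches the 2 Hz input should respond most strongly, which is the sense in which a "pattern of responses" across the bank reflects the periodicities of the stimulus.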

Various tests have already been carried out on the model (Large, 2010) to provide empirical support for the neural resonance theory with respect to key aspects of musical experience, including aspects of not just rhythm but also tonality perception. The objective of this thesis goes beyond existing tests of the theory by conducting a series of trials that examine the extent to which the neural resonance theory can account for aspects of human polyrhythm perception. Thus the thesis deals with a class of rhythms that has not been previously studied in the context of the theory and the model.

Polyrhythms refer to a particular subset of a class of rhythms where at least two different rhythms are present in parallel. In its simplest instantiation a polyrhythm can be created by superimposing at least two different streams of pulses where each one has a different periodicity. In existing experimental studies carried out by music psychologists, polyrhythms are considered an advantageous rhythmic stimulus in comparison to other simple rhythms (Handel, 1984) because they balance one of the more complex forms of rhythmical structure to be encountered in authentic musical pieces with ease of experimental control. Using polyrhythms to test the neural resonance model inherits similar advantages: it provides an opportunity to use stimuli that encompass some of the complexity of authentic musical contexts, while allowing carefully graded comparisons to be arranged.
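The construction of a simple polyrhythm by superimposing two pulse streams can be sketched in a few lines of Python. The example below builds the onset times of a 4:3 polyrhythm whose pattern repeats once per second, i.e. one stream of 4 evenly spaced pulses and one of 3 evenly spaced pulses per cycle. The function names and the choice of exact fractions for onset times are assumptions made for illustration, not part of the thesis.

```python
from fractions import Fraction


def pulse_onsets(pulses_per_cycle, cycle_seconds=1, n_cycles=1):
    """Onset times (in seconds) of an isochronous pulse stream."""
    period = Fraction(cycle_seconds) / pulses_per_cycle
    return [k * period for k in range(pulses_per_cycle * n_cycles)]


def polyrhythm_onsets(m, n, cycle_seconds=1, n_cycles=1):
    """Superimpose an m-pulse and an n-pulse stream sharing one cycle."""
    merged = set(pulse_onsets(m, cycle_seconds, n_cycles))
    merged |= set(pulse_onsets(n, cycle_seconds, n_cycles))
    return sorted(merged)  # coincident onsets (e.g. at t = 0) appear once


onsets = polyrhythm_onsets(4, 3)
print([str(t) for t in onsets])  # ['0', '1/4', '1/3', '1/2', '2/3', '3/4']
```

Exact fractions are used so that the coinciding onsets of the two streams at the start of each cycle are merged reliably; a floating-point version would need a tolerance when comparing onset times.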

Additionally, the use of polyrhythms to test the neural resonance model

is particularly advantageous because it bypasses limitations of the model

(or any comparable model) in dealing with melodic or harmonic elements

of a stimulus. The multiple rhythms of a polyrhythm afford an alternative

way to represent the full range of conflicting rhythmical cues in real stimuli

conveyed by several physical characteristics of the stimulus such as pitch.

The objective of the thesis can now be presented broken down into four

sub-objectives:

1) Conduct tests to examine whether the neural resonance model can

provide an account for existing empirical data on how humans perceive

polyrhythms.

2) Conduct model analysis to limit the sources of uncertainty with regard

to the physiological plausibility of the assumptions of the theory/model

and to examine the dependency of the model on the numerical values of

its parameters.

3) Use the neural resonance model to make predictions about previously

unobserved human behaviour in polyrhythm perception.

4) Conduct novel experimental studies in human polyrhythm perception

in order to evaluate the predictions made in 3.

Briefly, Chapter 2 contains background information on the phenomenon

of human rhythm perception with an emphasis on fundamental aspects

to be accounted for in modelling attempts. Chapter 3 reviews a range

of computational models of rhythm perception that exist in the literature.

Chapter 4 is devoted to introducing the neural resonance model in detail.

Chapter 5 discusses the general methodological position taken in this thesis

and it describes the methods adopted in Chapters 6 and 8. Chapters 6-8

contain original contributions. Chapter 6 validates the neural resonance

model against existing empirical data, Chapter 7 focuses on analysing the

neural resonance model and Chapter 8 evaluates specific predictions about

human behaviour made by the model. Chapter 9 is devoted to conclusions,

limitations and future work.

More specifically, Chapter 2 discusses the role of the structure of an

external stimulus in eliciting rhythm perception, as well as the challenges it

imposes in modelling attempts. Polyrhythmic stimuli are introduced as

the main type of stimulus used in this thesis. Also, it introduces beat

and meter perception as the main aspects of human rhythm perception

in order to highlight particular characteristics of the phenomenon that

any theory or model of rhythm perception should address. Chapter 3

reviews a range of computational models of rhythm perception, with

an emphasis on underlining some of the modelling dimensions, such

as different computational methods and scope. Chapter 4 begins by

discussing the physiological plausibility of the neural resonance model;

it then introduces its mathematical formulation and level of abstraction,

and reviews studies whereby the model is used to simulate human rhythm

perception. Chapter 5 is concerned with the methodology and methods of

this thesis. It begins by briefly discussing the notion of validation in order to

indicate appropriate ways of interpreting the results obtained by the studies

of this thesis. It continues by describing the method of setting up the

neural resonance model for running simulations to obtain modelled data.

The method of conducting experimental studies in human polyrhythm

perception is described following predictions about human behaviour made

by the model. The chapter finishes by explaining and justifying the adopted

method of comparison between modelled and empirical data in this thesis,

which is employed in Chapters 6 and 8.

Chapter 6 focuses on answering the first half of the research question

based on evidence obtained by pursuing sub-objective 1 in the above list. In

particular, the chapter is concerned with the outcome of a series of tests of

the model using polyrhythmic stimuli and how these outcomes compare

with existing empirical data. The outcome of the above comparisons is

the type of evidence used to address the first half of the research question,

which is to examine the extent to which the theory of neural resonance can

provide an account for existing empirical data on how humans perceive

polyrhythms.

In particular, the type of human behaviour that the model aims to account

for is the ways people tap along with a polyrhythm in a periodic manner,

a behaviour similar to that of concertgoers tapping their foot along with a

performance. The study is conducted on the basis of five different types of

polyrhythms each one presented at ten different tempi, and empirical data

are borrowed from the study of Handel and Oshinsky (1981) in human

polyrhythm perception. Lastly, the outcome of the simulations suggests

that the way the neural resonance model responds in the presence of

a two-rhythm polyrhythm can be generalised, which may be used as a

reference to inform and update the general consensus on how humans

perceive polyrhythms.

Chapter 7 is concerned with tackling sources of uncertainty arising from

the modelling assumptions of the neural resonance model and the nominal

numerical values assigned to the parameters of the model (sub-objective 2).

In particular, the method of parameter sensitivity analysis is employed

to examine how sensitive the model’s behaviour is to the particular

numerical values of its parameters. Uncertainty with regard to

biological assumptions is tackled by employing the method of cross-model

comparison, whereby the behaviour of the neural resonance model is

compared with the behaviour of another widely accepted biological model

regarding neural dynamics, namely the Wilson-Cowan model (Wilson and

Cowan, 1972). The neural resonance model responds consistently

regardless of systematic alterations to its parameter values, and

it produces responses similar to those of the Wilson-Cowan model; thus the results

presented in Chapters 6 and 8 can be considered robust. To the best of our

knowledge, this is the first study in which parameter sensitivity analysis

is applied to the neural resonance model.

Chapter 8 introduces predictions made by the neural resonance model

about human behaviour related to polyrhythm perception. In particular,

the behaviour concerned here is the time it takes for people to

start tapping along with a polyrhythm. These predictions drive the

design of novel experimental studies in human polyrhythm perception.

Briefly, 37 participants were recruited to take part in the study. The

experimental variables monitored the time it takes for people to start

tapping to polyrhythms, and the mode of tapping, i.e., tapping frequency.

The empirical results were subsequently compared with the model’s

predictions. The model predicts, to a certain degree, the time it takes for

people to start tapping. Additionally, the empirical

data reveal new ways in which people tap (or perceive beat) along

with polyrhythmic stimuli. The time it takes for people to start tapping

along with a range of polyrhythms is considered an experimental variable,

which, to the best of our knowledge, is monitored for the first time in the

literature. Chapter 8 contains the type of evidence defined by sub-objectives

3 and 4 and thus, it is used to answer the second half of the research

question, which examines whether the model can make predictions about

aspects of behaviour related to human perception of polyrhythms.

2 HUMAN RHYTHM PERCEPTION

A fundamental aspect of the phenomenon of human rhythm perception is

the perceptual identification of periodicity in the presence of a rhythmic

stimulus. There are many everyday behavioural manifestations of this

phenomenon, for example when audiences clap their hands in time with

a song. In fact, as we will see later, one of the principal methodologies

for the experimental investigation of rhythm perception is to ask people

to tap periodically along with rhythmic stimuli. Rhythmic periodicity can

be characterised in terms of beat and meter, the two principal perceptual

aspects of rhythm perception. Beat and meter are briefly introduced below,

before a more comprehensive treatment in Sections 2.2 and 2.3.

An intuitive way of thinking about beat perception is to think of it as the

feeling of an ongoing, steady, and unaccented pulsation in the presence of a

rhythmic stimulus. Meter perception corresponds to the feeling of a regular

pattern of strong pulses interposed by a series of weak pulses. Research

in the area of beat and meter perception employs empirical methods to

illuminate various facets of both processes. In this chapter we describe

fundamental aspects of both beat and meter perception that any theory of

rhythm perception should be able to account for.

In Section 2.1 we begin by discussing the role of the structure of stimuli

in eliciting rhythm perception. That is, beat and meter perception arise

in the presence of stimuli that exhibit certain objectively measurable

structures. Section 2.1 introduces these structural characteristics of

rhythmic stimuli before we start analysing the accompanying perceptual

phenomena. Sections 2.2 and 2.3 duly move on to consider fundamental

aspects of beat and meter perception. Section 2.4 is concerned with

polyrhythms, the particular class of rhythmic stimuli considered in this

thesis. Section 2.5 summarises the chapter.

2.1 the temporal structure of stimuli

The way humans identify periodicity in the context of listening to stimuli

depends on their temporal structure. More specifically, the temporal

structure of stimuli determines whether a rhythm can be perceived or not,

and constrains the various ways beat and meter may be perceived. In other

words, the temporal structure of a stimulus can be thought of as a temporal

matrix that defines the points in time where a stimulus event has to occur.

For the purposes of this thesis, we refer to this structure as metrical because

of its organised form and we provide a detailed description of that form in

Subsection 2.1.1. In Subsection 2.1.2 we consider some possible nuances

that may be present in a metrically organized structure in order to describe

more precisely the actual way rhythmic information is conveyed to and

perceived by humans. Before we continue, we should note that metricality

refers to objectively measurable structural characteristics, and we dissociate it

from the act of perceiving meter, which is the result of listening to some

stimulus with a metrical structure. In this section we deal with the former,

while meter perception is addressed in Section 2.3.

2.1.1 Metricality

The main characteristic of the organisation of a metrical structure is the

co-existence of multiple levels of periodicities that comply with particular

constraints (listed below) (Patel et al., 2005). These

periodicities are related to each other in such a way that over time the

metrical structure behaves like a repeating pattern (London, 2004). A

metric pattern can first be labelled by the total number of elements or

time points it involves. The time points are those points in time that

define the periodicities. The aforementioned total number of time points

usually relates to the smallest period present in the structure, and it is

the total number of points encountered in one repetition of the metric

pattern. London refers to this label as the N-cycle of the metric pattern

where N is the number of points in the cycle. Given that all other periods

in the structure can be expressed as multiples of the smallest one, we can

then express all periods as an (N/n)-cycle where (n) is the number of

times we have to multiply the smallest period to get another one. For

example a (N/2)-cycle corresponds to a period twice the length of the

smallest period and it has (N/2) points. According to London (2004)

cyclical representations of metrical structure follow five well-formedness

constraints that arise due to specific psychological and cognitive limits

in humans. The constraints listed below are the underlying

constraints with which a metrical structure complies.

• WFC 1: The inter-onset intervals (IOIs) between the time points on the

N-cycle (and all sub-cycles) must be categorically equivalent. That is,

they must be nominally isochronous and must be at least 100 msec

approximately.

• WFC 2: Each cycle - the N-cycle and all subcycles - must be continuous,

that is, they must form a closed loop.

• WFC 3: The N-cycle and all subcycles must begin and end at the same

temporal location, that is they must all be in phase.

• WFC 4: The N-cycle and all subcycles must all span the same amount

of time, that is, all cumulative periods must be equivalent. The

maximum span for any cycle may not be greater than 5 seconds

approximately.

• WFC 5: Each subcycle must connect nonadjacent time points on the next

lowest cycle. For example, a level with a period defined by a quarter

note has a period twice as long in comparison to a period defined by

an eighth note in another level. That is every quarter note connects

two nonadjacent eighth notes.

An example of a stimulus that complies with the above constraints is a

drumbeat whereby a drummer plays 16th notes on the hi-hat and quarter

notes on the bass drum at the same time. There are two different periods

defined by the durations of the 16th and the quarter notes. The relationship

between 16th notes and quarter notes remains invariant over time; that is,

four 16th notes are repeated for every quarter note. Also, the onset of the

quarter note coincides (at least approximately) with the onset of the first

16th note.
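
The drumbeat example can be checked against the constraints with a short sketch. The function below is only an illustration under simplifying assumptions: it encodes simplified readings of WFC 1, 4 and 5 (not London's full formulation), the function and key names are hypothetical, and the tempo of 120 quarter notes per minute (125 ms per 16th note) is assumed, since the text gives no tempo.

```python
def check_metric_cycle(n_points, base_ioi_ms, level_factors):
    """Check an N-cycle against simplified readings of WFC 1, 4 and 5.

    n_points      -- N, the number of time points in one repetition of the pattern
    base_ioi_ms   -- nominal inter-onset interval of the fastest level (ms)
    level_factors -- for each slower level, its period as a multiple of the base IOI
    """
    levels = [1] + sorted(level_factors)
    return {
        # WFC 1 (simplified): the shortest IOI is at least about 100 ms.
        "ioi_at_least_100_ms": base_ioi_ms >= 100,
        # WFC 4 (simplified): every level fits the cycle a whole number of times,
        # and the cycle spans no more than about 5 seconds.
        "levels_span_cycle": all(n_points % f == 0 for f in levels),
        "span_at_most_5_s": n_points * base_ioi_ms <= 5000,
        # WFC 5 (simplified): each level joins nonadjacent points of the level below,
        # i.e. its period is an integer multiple (at least 2) of the next faster one.
        "levels_nest": all(
            slow % fast == 0 and slow // fast >= 2
            for fast, slow in zip(levels, levels[1:])
        ),
    }


# One bar of 16th notes (N = 16) with quarter notes on top (factor 4).
# Assumed tempo: 120 quarter notes per minute, i.e. 125 ms per 16th note.
drumbeat = check_metric_cycle(n_points=16, base_ioi_ms=125, level_factors=[4])
print(drumbeat)
```

Under these assumptions every check passes for the drumbeat, whereas a structure whose levels have periods of 3 and 4 base IOIs fails the nesting check, because neither level connects points of the other.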

For the purposes of this thesis we will revisit WFC 5 in order to

explore the types of ratios between periods from different levels that can

be encountered in a metrical stimulus. It is useful to note that some

authors have used ’metrical’ and ’polyrhythmic’ as mutually exclusive

terms, but in line with other authors, and for the purposes of this thesis,

we will consider polyrhythm to be simply a special case of metrical rhythm.

Amongst other things, this has the considerable benefit that we can refer

to London’s well-formedness constraints while discussing polyrhythms.

Polyrhythms are described in detail in Section 2.4. In order to

distinguish polyrhythms from metrical rhythms that are not polyrhythmic,

we need to revisit WFC 5 and discuss its implications regarding the possible

ratios between two periods from different levels.

WFC 5 implies that two periods from two different levels relate to each

other by standing in a ratio such as 2:1, 3:1, 4:1, etc. For example, a 4:1

ratio implies that a period of one level is 4 times longer than the period

of another level. This matches the earlier example of a drummer playing

16th notes on the hi-hat on top of quarter notes on the bass drum. Another

example would be three triplet eighth notes (quavers) on top of one quarter

note, representing a 3:1 ratio.

Trivially, any ratio between two integers may be interpreted as defining a

rational number. So for example the ratios 2:1, 3:1, 4:1 may be interpreted

as defining the rational numbers 2/1, 3/1, 4/1 respectively. Clearly, these

fractions happen to reduce to the integers 2, 3 and 4. However, there are

other ratios commonly found between periods in rhythms that correspond

to rational numbers that cannot be reduced to integers, for example 2:3 and

4:3 ratios. Such ratios in no way hinder the perception of beat or

meter, and for that reason we consider them valid ratios that can be found

in a metrical structure. For example, the duration of a triplet quarter note

is two thirds of the duration of a standard quarter note, i.e., three triplet quarter

notes have the same duration as two standard quarter notes. Ratios of this

kind are very common in polyrhythmic stimuli, as discussed in detail in

Section 2.4.
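
The distinction between ratios that reduce to integers and those that do not can be made concrete with exact rational arithmetic. The following is a minimal sketch; the helper name `reduces_to_integer` is hypothetical and is introduced only for illustration.

```python
from fractions import Fraction


def reduces_to_integer(a, b):
    """True if the ratio a:b (or its inverse b:a) reduces to an integer."""
    ratio = Fraction(a, b)
    larger = max(ratio, 1 / ratio)
    return larger.denominator == 1


for a, b in [(2, 1), (3, 1), (4, 1), (2, 3), (4, 3)]:
    kind = "integer" if reduces_to_integer(a, b) else "non-integer"
    print(f"{a}:{b} -> {kind}")

# Duration of a triplet quarter note in units of one standard quarter note.
triplet_quarter = Fraction(2, 3)
# Three triplet quarter notes last as long as two standard quarter notes.
print(3 * triplet_quarter == 2)
```

The ratios 2:1, 3:1 and 4:1 are classified as integer ratios, while 2:3 and 4:3 are not, which is the property used later to separate polyrhythmic from non-polyrhythmic metrical stimuli.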

2.1.2 Temporal nuances

The notion of periodicity discussed above does not need to imply strict

isochronicity. WFC 1 states that successive periods in one level should

be categorically equivalent and nominally isochronous. That is, a series

of events associated with one level of periodicity does not need to occur

absolutely periodically. For example, the periods defined by the 16th

notes played by a drummer will deviate from being perfectly isochronous,

however they are considered to be categorically equivalent. Furthermore, in

musical performances musicians employ purposeful temporal fluctuations

in their playing in order to convey musical meaning. The above type

of deviation from perfect periodicity in the underlying structure of the

stimulus is also known as expressive variation - what musicians might call

"pushing and pulling the time" (London, 2004, p.28). Another temporal

nuance in the structure of a stimulus can be observed in complex musical

rhythms whereby not all points in time - where an event onset is expected

to occur - are actually employed by a stimulus event. For example, in

syncopated stimuli events occur in some positions in the metrical structure

while leaving nearby positions empty in a non-regular way (Fitch and

Rosenfeld, 2007). According to Large (2008), both temporal deviation from

perfect isochrony and syncopation constitute forms of complexity that are

the most troublesome for modelling rhythm perception.

This completes our survey of objectively measurable features of stimuli

that afford beat and meter perception. In the next section we turn to the

associated topic of analysing the accompanying perceptual phenomena.

The aim is to identify fundamental aspects of

both beat and meter perception that any theory of rhythm perception

should account for.

2.2 beat perception

In this section, three attributes of beat perception are considered. These are

expressed in a statement form as follows:

• Beat takes the form of a perceived pulsation in the mind of the listener,

• Beat is perceived subjectively, i.e., it is not an explicit part of the stimulus

and it can be felt in a number of different ways,

• Beat is dynamic in its nature, and constantly influenced by the stimulus.

2.2.1 Beat as mental pulsation

In the introduction of this chapter it was mentioned that beat perception

corresponds to the identification of an ongoing, steady, and unaccented

pulsation that is attributed to a rhythmic (or metrical) stimulus. Cooper and

Meyer (1960) describe beat perception as ’a series of regularly recurring,

precisely equivalent’ psychological events that arise in response to a

metrical stimulus. Cooper and Meyer’s view in seeing beat as a perceptual

construct rather than a property of the stimulus is commonly shared

by music theorists, such as London (2004), and Lerdahl and Jackendoff

(1983). The dissociation between perception of beat and the stimulus

further indicates a human ability to feel pulsation independently of the

presence of a stimulus, thus it further supports the view of beat as a

perceptual construct, which is felt subjectively. In fact, empirical studies in

tracking complex sequences of metrically structured rhythms have shown

that humans can synchronise to complex rhythms in different subjective

ways (Large et al., 2002). The behavioural manifestation of the above type

of synchronization is tapping behaviours that correspond to different levels

of periodicities where the latter may or may not be marked by a stimulus

event (e.g., syncopated stimuli) (Large et al., 2002).

2.2.2 Multiplicity of beat perception

The variability of human behaviour in perceiving a beat given a particular

stimulus is a main characteristic of beat perception. A brief example of this

kind of behavioural variation in beat perception and its link to the metrical

structure of a particular stimulus is spelled out below.

Figure 2.1 below uses the example of a simple monophonic melody

(labelled ’surface level’) consisting of a series of note events to illustrate

the multiplicity of beat perception. In this figure, the different levels of

beat perceived by different people are explicitly indicated. In concrete

terms, each different level in the hierarchy may be thought of as an

indication of one possible way of tapping one’s foot along with a song. This

characterization in terms of ’foot tapping’ has its limits: level n-2 is probably

too fast to tap a foot to, given that it corresponds to 16th notes. The middle

level (hierarchical level n) corresponds to quarter notes with respect to the

provided notation at surface level and it most likely represents the most

popular level for people to tap to. Where any metrical structure has a beat

level that is the most popular for people to tap to, this level is referred to

as the tactus (Lerdahl and Jackendoff, 1983). Drake et al. (2000) note that

while individuals may gravitate to one particular beat level, they

can generally change to either higher or lower levels if they wish (e.g., by

doubling or halving the tapping rate) and still feel synchronised with the

music.

Figure 2.1 is derived from Drake et al. (2000, p. 253).

Figure 2.1: Binary-branched representation of a metrical structure. The vertical lines delineate potential levels of beat perception in the presence of the rhythmical stimulus at the surface level.

2.2.3 Dynamic perception of beat

The perception of beat helps individuals to create expectations about when

events are likely to occur in the future. We could think of that process

as the principal mechanism that allows people to play music together or

to dance to a song. However, it should be stated that these expectations

are dynamic in their nature and constantly influenced and challenged by

the presence of a stimulus. That is, in order for those expectations to be

retained and fulfilled, the stimulus should exhibit relative temporal stability

over the course of its presentation, a stability that is nonetheless flexible

enough to accommodate minor temporal fluctuations related to the event

onsets of a stimulus. As discussed earlier in Subsection 2.1.2, temporal

fluctuations (particularly local nuances) occur naturally in the structure of

a stimulus performed by humans, e.g., expressive variation. Rankin, Large,

and Fink (2009) have found that the fluctuation patterns of expressive piano

performance (by means of analyzing the time series of the performance) are

structured, and have demonstrated that people are still able to perceive a

steady beat in such performance. That is, the perception of beat is flexible

enough to retain its stability even in cases where the stimulus exhibits small

deviation from perfect periodicity.

For a more extreme example, an empirical study by Large, Fink, and

Kelso (2002) used a series of clicks with temporal fluctuations chosen

purposefully to be larger than the fluctuations normally found in expressive

variation. The clicks featured tempo perturbations of plus and minus 15%

around a central inter-onset interval of 400 msec. This study showed that

humans can successfully re-synchronise tapping within 5 clicks after the

perturbation and without having to stop tapping during the adaptation

phase. In general, people are able to adapt to both tempo and phase

perturbations of simple and complex rhythmic patterns (Large et al., 2002;

Thaut et al., 1998; Repp, 2002). Additionally, in cases where individuals

have control over the production of a stimulus, its temporal structure can

be altered on demand and consequently, the perceived beat is also changed.

For example, musicians can influence the ongoing perceived periodicity

either by gradually accelerating (accelerando) or decelerating (ritardando)

their playing as part of the stylistic execution of a musical piece, and still

be able to perceive a beat. The aforementioned types of adaptability cast

light on the dynamic nature of beat perception: the beat

is anticipated on the basis of recent events, and changes

in the presentation of a stimulus constantly influence and challenge the

perception of the beat. In general, we can think of beat perception as a dynamic

act of identifying a pulse that runs through a metrical stimulus.
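
To give a sense of the size of the tempo perturbations mentioned above, the arithmetic can be sketched as follows. This is only an illustration: the step-change design, the number of clicks and the function name `click_onsets` are assumptions made here and are not taken from the study of Large, Fink, and Kelso (2002).

```python
def click_onsets(base_ioi_ms, n_before, n_after, percent_change):
    """Onset times (ms) of clicks whose IOI changes by a fixed percentage.

    The first n_before intervals use the base IOI; the following n_after
    intervals use the perturbed IOI. Integer arithmetic avoids rounding noise.
    """
    new_ioi = base_ioi_ms * (100 + percent_change) // 100
    iois = [base_ioi_ms] * n_before + [new_ioi] * n_after
    onsets = [0]
    for ioi in iois:
        onsets.append(onsets[-1] + ioi)
    return new_ioi, onsets


slower_ioi, slower_onsets = click_onsets(400, n_before=2, n_after=2, percent_change=15)
faster_ioi, faster_onsets = click_onsets(400, n_before=2, n_after=2, percent_change=-15)
print(slower_ioi, slower_onsets)
print(faster_ioi, faster_onsets)
```

A change of plus or minus 15% around 400 msec corresponds to inter-onset intervals of 460 msec and 340 msec respectively.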

The discussion so far has highlighted several characteristics of beat

perception such as its dynamic and subjective nature, and also, the

multiplicity of beat perception, i.e., different ways of attributing a steady

beat to a given stimulus (see Figure 2.1). Some of the key features of beat

perception can be summarised in the words of Cooper and Meyer (1960, p. 3):

’Though generally established and supported by objective

stimuli (sounds), the sense of pulse (beat) may exist subjectively.

A sense of pulse, once established, tends to be continued in the

mind and musculature of the listener, even though the sound

has stopped. For instance, objective pulses may cease or may

fail for a time to coincide with the previously established pulse

series. When this occurs, the human need for the security of an

actual stimulus or for simplicity of response generally makes

such passages point toward the re-establishment of objective

pulses or to a return to pulse coincidence’.

Having considered some of the key features of beat perception, we will

now consider the key features of meter perception.

2.3 meter perception

In meter perception there is a sense of grouping of a series of successive

beats. This is marked by the perception of certain periodically recurring

beats as louder or more emphasised compared to the rest in the series.

Those beats that are perceived as louder are generally perceived as the

first in a pattern. Meter perception therefore can be described as a feeling

of regularly positioned strong beats called metrical accents (Lerdahl and

Jackendoff, 1983) interposed by a series of weak pulses. This strength is not

necessarily associated with phenomenal accents, e.g., sforzandi, sudden

changes in dynamics or timbre, or long notes (Lerdahl and Jackendoff,

1983), but rather the issue is one of perceived loudness. A common example

of perceived metrical accents is the phenomenon of subjective accentuation

whereby an isochronous series of notes of identical frequency and intensity

are grouped in a pattern of two or four notes with the first note subjectively

perceived as louder (Bolton, 1894). Bolton further notes that the recurrence

of accented sounds forms a secondary rhythm out of the primary rhythm.

This secondary rhythm is sub-harmonically related to the main rhythmic

frequency. That is, meter perception may be viewed as an instance of

the ability of humans to spontaneously hear subharmonics of a rhythmic

frequency presented to them, a characteristic that repeatedly arises in the

way people perceive polyrhythms, as we will see in Section 2.4.

In general, given a metrical stimulus, metrical accents will arise in

temporal locations where the beats of many different levels of periodicities

come into phase (Large, 2000). In other words, a time point which is

perceived as a beat on two different levels of pulsation is ’structurally

stronger’ than a point which is felt as a beat in only one level (Clayton

et al., 2005, p. 31). Thus, the perception of meter is based not only on

the existence of several levels of periodicities, but also on the fact that

different levels come into phase, which was previously described as one of

the constraints in the way different levels of periodicity relate to each other

(see Subsection 2.1.1 - WFC 3). The more beats from different levels come

into phase, the structurally stronger that point is perceived to be. That

means that if a stimulus event occurs at a point in time of a structurally

stronger beat then it may be perceived as louder than if it occurs at a

structurally weaker beat. Table 2.1 below is Lerdahl and Jackendoff’s

classic representation for illustrating structural strength and its association

with phenomenal loudness. For example, events occurring at beat position

1 would be felt more strongly than events occurring at beat position 3,

and events occurring at beat position 2 would be felt more weakly than events

occurring at beat position 3 or 1.

Table 2.1: Lerdahl and Jackendoff’s diagrammatic representation of metrical hierarchy based on several beats.

Beats     1  2  3  4  1  2  3  4
Level 1   •  •  •  •  •  •  •  •
Level 2   •     •     •     •
Level 3   •           •
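
The structural strength shown in Table 2.1 can be reproduced by counting, for each time point, how many levels of periodicity have a beat there. The sketch below assumes that the three levels mark every point, every second point and every fourth point; the function name `metrical_strength` is hypothetical.

```python
def metrical_strength(n_positions, level_periods):
    """Count, for each time point, the levels that have a beat at that point."""
    return [
        sum(1 for period in level_periods if position % period == 0)
        for position in range(n_positions)
    ]


# Assumed periods of Levels 1, 2 and 3, in units of the fastest beat.
strengths = metrical_strength(8, [1, 2, 4])
beat_labels = [1, 2, 3, 4, 1, 2, 3, 4]
for label, strength in zip(beat_labels, strengths):
    print(label, "x" * strength)
```

Beat position 1 receives a count of three, position 3 a count of two, and positions 2 and 4 a count of one, which matches the ordering of felt strength described in the text.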

2.4 polyrhythm perception

In this section we introduce polyrhythmic stimuli in detail and we describe

the different behaviourally overt ways in which people perceive beat and

meter in a polyrhythm.

2.4.1 Polyrhythmic stimuli

In Subsection 2.1.1 we discussed how, in the case of metrical music that is

not polyrhythmic, the ratios between the periods of different levels of beat

reduce to integers (e.g. 2:1, 3:1 and 4:1). By contrast, polyrhythmic stimuli

in their simplest form consist of two or more different isochronous pulse

trains which are in phase, but the ratios of whose periods do not reduce

to integers (e.g., 2:3, 3:4, etc). In more complex examples, one or more

isochronous pulse trains might be replaced with a complex pattern or figure

in such a way that the actual or implied isochronous pulse trains have

periods with ratios that do not reduce to integers.

To take a simple example, a polyrhythm could consist of a 3-pulse train

dividing the duration of a shared repeating pattern into three equal IOIs,

while a 2-pulse train divides the duration of the pattern into two equal IOIs.
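
This simple example can be written out explicitly. The sketch below expresses onset times as fractions of one pattern repetition, so that the coincidence of pulses can be tested exactly; the function name `pulse_train` is hypothetical.

```python
from fractions import Fraction


def pulse_train(pulses_per_cycle, n_cycles=1):
    """Onset times of an isochronous pulse train, in units of one pattern repetition."""
    return [Fraction(k, pulses_per_cycle) for k in range(pulses_per_cycle * n_cycles)]


three = pulse_train(3)  # onsets at 0, 1/3, 2/3
two = pulse_train(2)    # onsets at 0, 1/2

# Points where both trains have a pulse, and the composite pattern heard overall.
coincidences = sorted(set(three) & set(two))
composite = sorted(set(three) | set(two))
print(coincidences)
print(composite)

# The IOIs of the two trains stand in a 2:3 ratio, which does not reduce to an integer.
print(Fraction(1, 3) / Fraction(1, 2))
```

Within one repetition the two trains coincide only at the start of the pattern, and the composite pattern has four distinct onsets.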

In Subsection 2.1.1 we have noted that for the purposes of this thesis,

and in line with London’s (2004) Well Formedness Constraints for metrical

structures, polyrhythmic structure can be thought of as a special case

of metrical structure. As previously described, a metrical structure

contains different periodicities that co-exist in parallel and satisfy particular

constraints, such as that different periods stay invariant over time, and the

onsets of the periods come into phase regularly. Polyrhythmic structure

complies with the above constraints and therefore it is considered metrical

for the purposes of this thesis.

2.4.2 Human tapping behaviours in polyrhythm perception

One of the simplest forms of a polyrhythm consists of just two rhythms

(pulse trains), e.g., rhythms m and n. Just as with simple meters,

different people may perceive different beats or meters in polyrhythms,

or the same people may have different perceptions of the same stimulus

on different occasions. According to (Pressing et al., 1996) the most

frequently occurring ways of perceiving beat or meter in polyrhythms, as

behaviourally manifested by tapping along in a periodic way, are as follows:

(a) Tapping along with the m pulse train.

(b) Tapping along with the n pulse train.

Additionally, other empirically observed modes of tapping along with a

polyrhythm include (Handel and Oshinsky, 1981):

(c) Tapping along with the points of co-occurrence of the two pulse trains

(i.e., once per pattern repetition or measure).

(d) Tapping along with every second element of one of the pulse trains, e.g.,

every other element of an m-pulse train or an n-pulse train.

Figure 2.2 below illustrates the cases a to d of the above list, relabeled as

cases 1 to 5, i.e., the two cases in d) are considered independently. To make

the illustration concrete, a 4:3 polyrhythmic stimulus is used as an example

in Figure 2.2.

Figure 2.2: Categories of periodic tapping behaviours along with a 4:3 polyrhythm. The two measures of the polyrhythmic pattern depicted above are sufficient to illustrate the cyclicality of the above tapping options.

According to Handel and Oshinsky (1981), 80% of the behaviourally

observed responses - for a number of different polyrhythms and tempi

- correspond to tapping along with the elements of one of the pulse

trains. The second major class (12% of the observed behaviours) of periodic tapping along with a polyrhythmic stimulus is to tap in synchrony with the co-occurrence of the two pulses, or once per

measure (coincidence of pulse trains). The third class (6%) corresponds to

humans tapping periodically along with every other element of a 4-pulse

train (most common), and every other element of a 3-pulse train (least

common). An influential factor that shapes the human tapping preferences

is the repetition frequency of the polyrhythm. Empirical results from the

Handel and Oshinsky study demonstrate that the choices made by listeners

in the interpretation of a polyrhythm, as revealed by regular tapping,

depend in part on the absolute tempo.

The empirical results reported by Handel and Oshinsky also reveal limits

on the tapping speed that humans can exhibit. For example, in the case

of a 4:3 polyrhythm, when the polyrhythm repeats once every 600 msec,

the fastest tapping observed corresponds to tapping along with the 3-pulse

train i.e. tapping once every 200 msec. The slowest tapping rate for a 4:3

polyrhythm repeating once per second corresponds to tapping once every

1000 msec (i.e. tap along with the coincidence of the two pulse-trains).

The table below illustrates minimum and maximum tapping speeds for all

of the different types of polyrhythms and tempi explored by Handel and

Oshinsky. Tapping speeds range from one tap every 120 msec to one tap

every 1200 msec.

Table 2.2: Fastest and slowest tapping speeds along with different polyrhythms reported by Handel and Oshinsky (1981).

Polyrhythm type | Fastest tapping | Slowest tapping
3/4 | 200 msec (tapping along with the 3-pulse train when the polyrhythm repeats once per 600 msec) | 1000 msec (tapping along with the coincidence of the two pulse trains when the polyrhythm repeats once per 1000 msec)
2/3 | 200 msec (tapping along with the 2-pulse train when the polyrhythm repeats once per 400 msec) | 1200 msec (tapping along with the coincidence of the two pulse trains when the polyrhythm repeats once per 1200 msec)
3/5 | 160 msec (tapping along with every other element of the 5-pulse train when the polyrhythm repeats once per 400 msec) | 800 msec (tapping along with the coincidence of the two pulse trains when the polyrhythm repeats once per 800 msec)
2/5 | 120 msec (tapping along with the 5-pulse train when the polyrhythm repeats once per 600 msec) | 1200 msec (tapping along with the 2-pulse train when the polyrhythm repeats once per 2400 msec)
4/5 | 150 msec (tapping along with the 4-pulse train when the polyrhythm repeats once per 600 msec) | 600 msec (tapping along with the coincidence of the two pulse trains when the polyrhythm repeats once per 600 msec)
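
The tap intervals in Table 2.2 follow from simple arithmetic on the repetition period of the polyrhythm. The sketch below recomputes them; the helper name and its `every` parameter are our own illustrative choices, and tapping on the coincidence of the two trains is treated as a train with a single pulse per cycle.

```python
def tap_interval_ms(cycle_ms, pulses, every=1):
    """Interval between taps when tapping on every `every`-th pulse of a train."""
    return cycle_ms * every / pulses

# Fastest tapping per polyrhythm type, from the cycle durations in Table 2.2.
fastest = {
    "3/4": tap_interval_ms(600, 3),
    "2/3": tap_interval_ms(400, 2),
    "3/5": tap_interval_ms(400, 5, every=2),  # every other element of the 5-pulse train
    "2/5": tap_interval_ms(600, 5),
    "4/5": tap_interval_ms(600, 4),
}

# Slowest tapping; pulses=1 stands for one tap per cycle (coincidence).
slowest = {
    "3/4": tap_interval_ms(1000, 1),
    "2/3": tap_interval_ms(1200, 1),
    "3/5": tap_interval_ms(800, 1),
    "2/5": tap_interval_ms(2400, 2),
    "4/5": tap_interval_ms(600, 1),
}
```

The smallest value in `fastest` and the largest value in `slowest` reproduce the overall range of 120 msec to 1200 msec quoted in the text.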

2.5 summary

Both beat and meter are perceptual mechanisms that involve the

identification of a periodicity in association with the presence of a metrical

stimulus. Certain aspects of beat and meter perception, such as the ability

of humans to accommodate small temporal fluctuations (both local and

wider) in the structure of a stimulus without disrupting the general feeling

of an underlying periodicity have been identified as some of the principal

aspects of the phenomenon of rhythm perception that any theory of rhythm

perception should be able to account for. Lastly, given that polyrhythms

have been characterised as metrical, different ways people perceive beat

and meter in polyrhythms have been described in terms of regular tapping,

and empirical data regarding how fast people tap along with polyrhythms

have been summarised.

3 COMPUTATIONAL MODELS OF BEAT AND METER PERCEPTION

In the previous chapter, various phenomena were itemized and described

that any model of rhythm perception should be able to account for. Firstly,

candidate models ought to be able to account for the perception of beat and

meter as outlined in the previous chapter. Secondly, they should be able

to deal with the resilience of beat and meter perception in the presence

of various kinds of temporal variation, such as time-varying tempi, also

discussed in that chapter. Thirdly, they should be able to deal with the

full variety of types of metrical structures (in the sense of ’metrical’ as

defined in the previous chapter) including, for example, polyrhythms. The

above considerations can be applied both to theoretical and computational

models of rhythm perception. Depending on the modeler’s perspective,

these considerations can be interpreted either as a series of modelling tasks

that have to be implemented and incorporated into a model, or as a series

of criteria that models should comply with.

In this chapter, we will review computational (as opposed to purely

theoretical) models of rhythm perception. The review is representative

rather than exhaustive. Models are categorised and critiqued in several

ways. First of all, each model is critiqued in terms of the degree to which

it is able to account for the various phenomena of rhythm perception that

must be accounted for, as itemized above. Secondly, consideration is made

of the extent to which each model is, or is not, physiologically plausible.

We consider four principal categories of models. In the list below, one or

more implementations is cited against each category of model. Each of the

four categories is discussed in detail in Sections 3.1 - 3.4.

3.1 Internal clock models (Povel and Essens, 1985)

3.2 Probabilistic models (Laroche, 2001; Seppänen, 2001)

3.3 Oscillator models (Church and Broadbent, 1990; Large and Kolen, 1994; McAuley, 1995; Toiviainen, 1998)

3.4 Filter array models (Oppenheim and Schafer, 1975; Scheirer, 1998; McAngus Todd et al., 1999; Eck, 2007; Klapuri et al., 2006)

Having critiqued each of the four categories as listed above, we then discuss

three general emergent issues as follows.

3.5 Issues of physiological plausibility

3.6 Issues of input encoding in models

3.7 Causal vs. non-causal models

3.1 internal clock models

Internal clock models are associated with some of the first attempts in

modelling human perception of rhythm. The core idea of these models

is that humans possess one or more internal clocks of some kind and

some kind of way of effectively counting and storing (or at any rate

estimating) the number of ticks between two events of interest. Models

vary according to whether there is assumed to be a single such clock, or

multiple such clocks, and whether the tick period is assumed to be fixed

(e.g., at 1 millisecond), or whether it may be possible for the tick period to

be influenced by the period of external stimuli of interest. In some models,

two or more clocks are assumed to be related hierarchically (Povel and

Essens, 1985). Povel and Essens posit that two characteristics need to be

defined to determine the setting of a clock. These are its unit length (tick

length) and its starting time (i.e., the time that counting starts). The starting

time of the clock is always defined relative to the time of some stimulus

event and by default the two times are assumed to be the same. As already

noted, in some variant models the unit length may be assumed to take a

value based on the interval periods present in a stimulus (within certain

limits).

Povel and Essens postulate that a whole family of clocks with different

tick rates is judged ("scored") against any incoming stimulus. The judgment

or score is based on the extent to which the ticks of any particular clock

coincide, or fail to coincide, with the incoming events. The clock with

the highest score is the most strongly favoured ("induced") clock and is

chosen to represent the clock that humans possess in perceiving rhythm.

Its temporal unit corresponds to a fundamental pulsation of the stimulus

such as that of the tactus. The clock then calculates a second pulsation,

which is the subdivision of its unit.
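
The idea of scoring a family of candidate clocks against a stimulus can be sketched as follows. This is a deliberately simplified illustration, not the published scoring rule of Povel and Essens: here a clock's score is just the proportion of its ticks that coincide with an onset, the stimulus is a hypothetical pattern on a grid of its shortest interval, and all names are our own.

```python
def score_clock(pattern, unit, start):
    """Proportion of the clock's ticks that fall on an onset (1) rather than silence (0)."""
    ticks = range(start, len(pattern), unit)
    hits = sum(pattern[t] for t in ticks)
    return hits / len(ticks)

def induce_clock(pattern, units):
    """Return (score, unit, start) of the best-scoring candidate clock."""
    candidates = [
        (score_clock(pattern, unit, start), unit, start)
        for unit in units
        for start in range(unit)
    ]
    # max() keeps the first best candidate, so ties go to the shorter unit.
    return max(candidates, key=lambda c: c[0])

# 1 = onset, 0 = silence; hypothetical 16-step pattern.
pattern = [1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0]
best = induce_clock(pattern, units=[2, 3, 4])
```

For this pattern the clock with a unit of 4 grid steps starting at step 0 is induced, because every one of its ticks coincides with an onset, whereas the faster clock with unit 2 ticks once on silence.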

Because the model is hierarchical, it satisfies our criterion of being able

to produce multiple responses capable of reflecting the metrical structure

of a stimulus. Despite the fact that this model is considered hierarchical,

the version of the model described in the Povel and Essens (1985) paper

only captures two metrical levels, for example the periodicity related to the

tactus level, and one subdivision. However, Povel and Essens claim that

the model can be extended to identify more of the levels of the metrical

structure of a stimulus. But there is another, deeper limitation of this

model. For example, given a polyrhythmic stimulus the model may be able

to identify a pulsation of one pulse-train but then information with regard

to the pulsation underlying another pulse-train in the polyrhythm would

be undiscovered, because the period defined by one of the pulse trains in a

polyrhythm cannot be an integer multiple of the period defined by another

pulse train in the same polyrhythm.

With regard to the ability of the internal clock model to capture the temporal nuances of a stimulus described in Chapter 2, Povel and Essens (1985) note that future developments of the model could allow for a clock that continuously adapts its unit length, that is, one that can deal with stimuli that exhibit temporal fluctuations with respect to perfect isochrony. The internal clock model therefore has the potential to comply with one more of the criteria set out at the beginning of this chapter.

Lastly, given Povel and Essens’s assumption that people have access to

an internal clock, they implicitly identify their model as a model that

is related to physiology. Despite the appeal to physiological processes

implicit in Povel and Essens’s assumption, the computational formalism

of the model utilises optimisation methods - where a "best" clock is chosen

among a series of candidate clocks to represent the perception of rhythm in

a stimulus. This multiplicity of clocks, and the choice amongst them, is not

explicitly linked to physiology.

3.2 probabilistic models

In this section we review computational models that model rhythm

perception using probability theory. The key idea is that successive

stimulus events define periodicities that can be analysed in order to predict

when in the future an event is expected to occur. An example of a

probabilistic model that processes audio files of actual musical stimuli is

the Laroche model (2001). The above model is known as a ’beat model’,

which means that it explicitly focuses on identifying just one level of

periodicity, e.g., the tactus. However, as noted in Chapter 2 the notion of

beat perception is associated with the perception of multiple periodicities,

so the description of the above model as a ’beat model’ may be slightly

misleading, because it identifies only one level of periodicity.

The computational process of the model in finding the beat is as follows.

Initially, the model assumes that there is a constant tempo in a stimulus. It

then identifies the first beat in a signal by means of transient analysis, i.e.,

times at which the energy of the signal in some frequency band increases

sharply. Finally, the model defines a series of points in time at which there

is a maximum probability to observe a beat (transient). The probability

density function (PDF) around the projected beat times can be represented

by any symmetrical distribution with a mean equal to zero, e.g., normal

distribution. However, the variance of the distribution must be adjusted as

a function of the tempo so the PDF does not spill into adjacent beats and

does not become too small either.
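
The role of the distribution around the projected beat times can be illustrated with a small sketch. It assumes a constant tempo, a known first beat, and an unnormalised Gaussian shape whose standard deviation is a fixed fraction of the beat period; that proportional rule and the value 0.1 are our assumptions, chosen only to show how the spread can be tied to the tempo.

```python
import math

def beat_likelihood(t, first_beat, period, rel_sigma=0.1):
    """Unnormalised Gaussian weight for observing a transient at time t (seconds)."""
    # Signed distance from t to the nearest projected beat time.
    offset = (t - first_beat + period / 2) % period - period / 2
    # Spread scales with the beat period (illustrative assumption).
    sigma = rel_sigma * period
    return math.exp(-0.5 * (offset / sigma) ** 2)
```

With a beat period of 0.5 seconds the weight peaks on each projected beat and is negligible halfway between beats, so the distribution neither spills into adjacent beats nor collapses to a point.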

Laroche’s model is able to process audio stimuli that closely resemble actual musical stimuli, but only under the assumption that the tempo is constant. As already described, however, musical stimuli are very likely to exhibit temporal nuances such as tempo variation over time. For such cases, Laroche mentions that dealing with time-varying tempi is much more difficult and that the algorithm should be improved.

Seppänen (2001) created a model of rhythm perception that analyses

actual music stimuli at more than one metrical level. Similarly to

the Laroche model, Seppänen’s model utilizes probabilistic methods in

its processing. The model analyses a signal first at the tatum level

(defined below), then at the tactus level, and finally, at all subordinate

levels in between the tactus and the tatum level. For that reason it is

more hierarchical than Laroche’s single level analysis. The tatum level

corresponds to the smallest metrical unit, that is the metrical level that

resembles the fastest pulsation.

Processing of the stimulus begins with a transformation of the audio signal into a symbolic representation (see Section 3.6 for more on symbolic representation). An onset detector tracks changes in the root-mean-square amplitude envelope of the signal and emits event onsets at points of rapid level increase. The tatum level is then estimated by deriving the greatest common divisor of a number of inter-onset intervals (IOIs). The IOIs are calculated by considering pairs of both adjacent and non-adjacent onsets. The model estimates the tatum level in real time based on the current signal information (see also Section 3.7). After the segmentation

of the signal at the tatum level a psychoacoustic accentuation analysis takes

place at each tatum onset in order to estimate the tactus (beat) level. The

psychoacoustic accentuation is estimated by analyzing a number of signal

characteristics such as onset power, onset spectral shape, bass level, etc. The

beat is then estimated by observing the identified accentuated event onsets

for periodicities near 100 BPM (beats per minute).
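
The tatum estimation step described above can be sketched for an idealised case. We assume onset times that are already quantised to whole milliseconds, so that an exact integer greatest common divisor exists; real onset data would require an error-tolerant approximation, which is not attempted here. The function names and the example onsets are hypothetical.

```python
from functools import reduce
from itertools import combinations
from math import gcd

def inter_onset_intervals(onsets_ms):
    """IOIs between every pair of onsets, adjacent and non-adjacent."""
    return [later - earlier for earlier, later in combinations(sorted(onsets_ms), 2)]

def estimate_tatum(onsets_ms):
    """Greatest common divisor of all pairwise IOIs, in milliseconds."""
    return reduce(gcd, inter_onset_intervals(onsets_ms))

onsets = [0, 250, 500, 875, 1000, 1500]  # hypothetical onset times in ms
tatum = estimate_tatum(onsets)
```

For these onsets the estimated tatum is 125 msec, the largest interval of which every pairwise IOI is a whole multiple.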

Seppänen’s model allows for accelerandos and ritardandos in the stimulus, and thus it can accommodate stimuli that deviate from perfect isochrony. The only criticism with regard to the criteria we set out at the beginning of this chapter is that Seppänen’s model does not account for all possible levels in a metrical structure, since it is confined to identifying

levels between the tactus and the tatum level. Also, despite being a

competent model for analyzing music-like stimuli, it is not explicitly related

to physiological mechanisms underlying human rhythm perception.

3.3 oscillator models

Oscillator models simulate rhythm perception by utilizing the generic

properties of an oscillator such as its natural frequency, and its ability to

adapt its phase and frequency (in the case of adaptive oscillators) in the

presence of a rhythmic stimulus. Without using adaptive oscillators, the

same effect can be obtained by using a set of oscillators.

For example, a set of oscillators can capture any level of periodicity in a

metrical stimulus, as long as one oscillator has the corresponding natural

frequency. Alternatively, a single oscillator with an adaptive frequency can

do the same job. Similarly, temporal fluctuations in phase and frequency

(either intentional or unintentional) can be tracked either by an adaptive

oscillator or a set of oscillators.
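
Phase and period adaptation can be illustrated with a generic error-correcting oscillator. The sketch below is not the formulation of any of the models reviewed in this section: it assumes that every onset corresponds to one beat, applies a simple linear correction to both the predicted beat time and the period, and uses gain values chosen only for illustration.

```python
def track_beat(onsets, period, phase_gain=0.5, period_gain=0.3):
    """Follow a sequence of onset times with one adaptive oscillator.

    Returns the period estimate after each onset following the first.
    """
    next_beat = onsets[0] + period
    periods = []
    for onset in onsets[1:]:
        error = onset - next_beat        # positive when the onset arrives late
        period += period_gain * error    # period (frequency) adaptation
        next_beat += phase_gain * error  # phase adaptation
        next_beat += period              # predict the following beat
        periods.append(period)
    return periods

# Isochronous stimulus with a 0.6 s IOI; the oscillator starts at 0.5 s.
onsets = [0.6 * k for k in range(40)]
periods = track_beat(onsets, period=0.5)
```

Starting from a period of 0.5 seconds, the estimate settles on the stimulus period of 0.6 seconds after a short damped oscillation around it.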

Based on the generic properties of phase and period adaptation of an

oscillator, different models that account for either beat or meter perception

have been developed and evaluated in a number of situations. As early

as 1984, Dannenberg developed a single oscillator model to identify beat

in rhythmic patterns, which were either speeding-up or slowing-down

(Dannenberg, 1984). Large and Kolen (1994) developed a model using a

type of oscillator that was capable of tracking the beat in a complex piece

of real improvised melodic performance. McAuley (1995) also created a

model using a similar type of oscillator to the one used by Large and

Kolen, which was able to follow large tempo changes (e.g., ritardando),

thus the term adaptive oscillator was coined. McAuley (1995) evaluated his

model by employing a series of rhythmical patterns of varying complexity

originally used by Povel and Essens (1985) to test their model and compared

the output of his model with empirical results from people tapping along

with the same patterns to eventually gain supportive results for his model.

Also, Toiviainen (1998) created a model of beat tracking again based on a

similar oscillator to the one used by Large and Kolen (1994). Toiviainen’s

model was designed to perform tempo tracking of a musical performance

in real time, and hence it was used to creatively - and interactively - play

back a predefined accompaniment in synchrony with the performance.

Furthermore, according to Toiviainen (1998, p. 64) the model could track the beat in rhythmically complex inputs, such as melodies with trills, grace notes and syncopation; however, Toiviainen did not provide an accurate description of explicit examples of beat tracking performance.

A reasonable extension of the single oscillator models is to consider a

network of oscillators rather than a single unit, in order to address the

various levels of periodicity in the metrical structure of a stimulus. One of the first uses of a multiple-oscillator model to account for experimental results related to the perception of time in rhythmical stimuli was reported by Collyer, Broadbent, and Church (1994). In particular, Collyer

and colleagues asked participants to tap in an unconstrained manner along

with a stimulus, and reported tapping preferences that followed a bimodal

distribution with the two modes exhibiting inter-tap intervals of 272 and

450 msec. The two different ways of tapping along with the same stimulus

were taken as indicative of the multiplicity of beat perception, and were

eventually simulated by using the Church and Broadbent (1990) model of

multiple oscillators. Miller, Scarborough, and Jones (1992) considered a

network of coupled oscillators and observed its resonances to rhythmical

patterns. A similar approach of interconnected units has been suggested by

Large and Kolen (1994) and further refined in later work (Large and Jones, 1999; Large, 2000; Large and Palmer, 2002; Large et al., 2010), in which a bank of gradient frequency oscillators with filter-like behaviour has been developed and found to account for a series of challenging intricacies of rhythm perception, such as the stability of beat perception in the face of syncopated stimuli (Large, 2010). Furthermore, the oscillators used in the Large,

Almonte, and Velasco (2010) model have a strong similarity to biophysical

models of neural activity related to both single neurons and populations

of neurons, as we will see. As a result, the theory of neural resonance has

been proposed to account for rhythm perception (Large and Snyder, 2009).

The theory suggests that when a network of oscillators spanning a range of

natural frequencies is stimulated with a musical rhythm, a multi-frequency

pattern of oscillations is established, where the latter accounts for a range

of characteristics of the phenomenon of rhythm perception. Because of

the tight link between the mathematical abstractions of this model and the

physiology of neurons explored in more detail in Section 3.5 and in the

next chapter, taken together with the high degree to which neural resonance

models are able to account for the various phenomena of rhythm perception

noted in the previous chapter, we will see that neural resonance models are

good prima facie candidates for a physiologically plausible computational

theory of rhythm perception.

3.4 filter array models

The idea of having an array of filters to account for the perception of the

metrical structure of a stimulus has also been explored by Oppenheim and

Schafer (1975). Oppenheim and Schafer used an array of linear band-pass

filters each of different frequency to identify the multiple periodicities

associated with the metrical structure of the stimulus. More recent models that utilize similar band-pass (or comb) filters include Scheirer’s (1998) model of a gradient-frequency bank of comb filters, Todd et al.’s (1999) model of passband filters, Eck’s (2007) autocorrelation model, and Klapuri et al.’s (2006) model of resonating comb filters. Linear filters such as those described above have been quite successful in accounting for the identification of beat and meter in various musical styles. However, some challenges remain, such as the inability to retain a stable beat in the face of syncopated stimuli and the time delay in capturing information about metrical structure at the level of the measure (Large, 2000). Nevertheless, some of those models, such as that of McAngus Todd, O’Boyle, and Lee (1999), are of particular interest due to the effort they put into creating a "holistic" model of rhythm perception. That is, they try to incorporate the empirically observed influence of motor production on rhythm perception, and they are also concerned with how physical stimuli are processed neurobiologically. As Eric Clarke comments on an earlier version of

Todd’s model (McAngus Todd and Lee, 1994): "the model regards the

tactus as the result of combining the output of the band-pass filter bank

with a sensory-motor filter that has fixed and relatively narrow tuning

characteristics and represents the intrinsic pendular dynamics of the human

body" (Clarke, 1999).
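
The way a bank of comb filters picks out a periodicity, as in the filter array models discussed in this section, can be shown with a minimal sketch. It uses a standard feedback comb filter applied directly to a hypothetical onset train; the band decomposition and envelope extraction stages of complete systems are omitted, and the function names and the feedback gain of 0.9 are our assumptions.

```python
def comb_filter_energy(signal, delay, alpha=0.9):
    """Output energy of the feedback comb filter y[n] = alpha*y[n-delay] + (1-alpha)*x[n]."""
    y = [0.0] * len(signal)
    for n, x in enumerate(signal):
        feedback = y[n - delay] if n >= delay else 0.0
        y[n] = alpha * feedback + (1 - alpha) * x
    return sum(v * v for v in y)

def strongest_period(signal, delays):
    """Delay (in samples) whose comb filter resonates most with the signal."""
    return max(delays, key=lambda d: comb_filter_energy(signal, d))

# Impulse train with one onset every 8 samples (hypothetical stimulus).
signal = [1.0 if n % 8 == 0 else 0.0 for n in range(400)]
best = strongest_period(signal, delays=range(2, 33))
```

The filter whose delay matches the stimulus period of 8 samples accumulates the most energy. Filters at integer multiples of that delay also resonate, but they build up more slowly over a finite stimulus, which is one illustration of the delay in capturing information at slower metrical levels.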

3.5 issues of physiological plausibility

Thaut et al. (2005) note that by studying the physiology and neurology

of brain function in music we can obtain a great deal of knowledge

about general brain function, in regard to the perception of complex

auditory stimuli, rhythm processing, language processing, and feeling of

emotion. Furthermore, Michon (1967) argues that mathematical models

in and of themselves are psychologically void of sense if they are not

somehow associated with physical and mental functions, like memory and

order discrimination, which are known to play a role in time perception

(Creelman, 1962; Sherrick Jr et al., 1961). For example, a central assumption

in Michon’s model of timing in temporal tracking, which focuses on

simulating the way people perform in sensorimotor synchronization tasks,

is that people are able to retain timing information in their memory

about the duration of successive stimulus events and about the errors

they perform in trying to produce a tapping behaviour that is in perfect

synchrony with the onsets of the stimulus events. These two pieces of

temporal information are drawn from memory in order to direct future

behaviours with the aim to optimise the degree of synchrony between

behaviour and event onsets. From that perspective, Michon’s model

is physiologically related, at least in principle, to mental functions like

memory.

In general, models whose mathematical abstractions are tightly linked

to the physiology of neurons are known as "neural" models. As noted in

the previous section, one such model is the model of neural resonance. In

general, the mathematical abstraction of such "neural" models is derived

from well-established biophysical models of single neurons, e.g., Hodgkin

and Huxley (1952), and networks or populations of neurons such as

Wilson and Cowan (1972). Eck (2002) applied a relaxation oscillator model

of neural spiking dynamics to the task of modeling beat induction in

rhythmical patterns. This so-called relaxation oscillator has been used to model a range of biological oscillatory processes, including neural spiking (Hodgkin and Huxley, 1952) and heartbeat pacemaking (Van der Pol and Van der Mark, 1928). In fact, Eck’s oscillator is a type of FitzHugh-Nagumo

relaxation oscillator (Fitzhugh, 1961; Nagumo et al., 1962), which is a two-variable model of the neural action potential, i.e., a simplification of the much more complicated, but strongly biophysical, model of Hodgkin and Huxley (1952). Eck’s model has been tested using the rhythmical patterns suggested by Povel and Essens, and made good predictions in 34 out of the 35 cases. The same tests were also repeated after injecting noise (up to 20%) into the rhythmical patterns, so as to simulate natural temporal fluctuations in real stimuli. Despite the degradation in performance, Eck reports an 82.9% success rate in finding the beat.
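
To give a feel for the behaviour of a two-variable relaxation oscillator of this family, the sketch below integrates the textbook form of the FitzHugh-Nagumo equations with a forward Euler scheme. The parameter values are commonly used illustrative ones, not those of Eck's model, and the constant input current stands in for a stimulus; how Eck couples the oscillator to rhythmic input is not reproduced here.

```python
def fitzhugh_nagumo(current, t_end=400.0, dt=0.05, a=0.7, b=0.8, eps=0.08):
    """Integrate the FitzHugh-Nagumo equations with forward Euler.

    dv/dt = v - v**3 / 3 - w + current
    dw/dt = eps * (v + a - b * w)
    Returns the trace of the fast (voltage-like) variable v.
    """
    v, w = -1.2, -0.625  # start close to the resting state for current = 0
    trace = []
    for _ in range(int(t_end / dt)):
        dv = v - v ** 3 / 3 - w + current
        dw = eps * (v + a - b * w)
        v += dt * dv
        w += dt * dw
        trace.append(v)
    return trace

def count_spikes(trace, threshold=0.0):
    """Number of upward crossings of the threshold by the fast variable."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if prev < threshold <= cur)

resting = count_spikes(fitzhugh_nagumo(current=0.0))
driven = count_spikes(fitzhugh_nagumo(current=0.5))
```

Without input the system stays at rest, whereas a sufficiently strong constant input produces sustained, spike-like oscillations: slow recovery alternating with fast jumps, which is what characterises a relaxation oscillator.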

3.6 types of input encoding : symbolic and/or acoustic

A generic distinction among different models of rhythm perception, such

as those discussed above, relates to the type of input that is fed into the

model. In particular, the type of the input could either be an acoustical

signal such as an audio file or a symbolic representation of a stimulus

such as a MIDI file representing a musical score. Natural stimuli exist

in the form of acoustic signals, so an argument has been made (Scheirer,

2000) contesting the validity of symbolic models in modelling human

rhythm perception. Despite this, Seppänen (2001) notes that insofar as models that process acoustic signals apply a signal-to-symbolic transform when processing the stimulus, there is not much sense in differentiating symbolic systems from acoustic signal processing systems. Additionally, there exist models that are flexible in terms of the nature of the stimulus they consume. For example, Dixon (2001) proposes a symbolic MIDI tracker

that can also consume acoustic signals. Similarly, the Large, Almonte, and

Velasco (2010) neural resonance model provides great flexibility in encoding

the stimulus, where MIDI, acoustic and functional representations can all

be employed to encode a rhythmical stimulus.

3.7 causal and non-causal models

Another distinction between models of rhythm perception is whether a

model is causal or not. The causality or non-causality of a model is based on

whether the model’s output requires considering the stimulus as a whole

in advance of producing the output. In cases where the output of the

model is determined on the basis of looking at the stimulus as a whole

then the model is classified as non-causal, and causal otherwise. Causality

is particularly important in cases where the aim is to produce a model that

corresponds to the natural way people perceive rhythm, which is in essence

a causal way, i.e., people start to perceive rhythm almost immediately after

listening to a song. However, non-causality can be justified based on a

hypothesis that humans would alter earlier percepts in retrospect, based

on a later input, but only within the perceptual present of approximately 4

seconds (Parncutt, 1994). That is, people have the ability to retain in their

memory 4 seconds of a stimulus and perceive a rhythm retrospectively by

processing those 4 seconds in their mind.
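
The distinction can be made concrete with a toy period estimator written in both styles. The median of the inter-onset intervals is used purely as a placeholder estimator and is not taken from any of the models reviewed here; the onset times are hypothetical.

```python
from statistics import median

def causal_period_estimates(onsets):
    """Period estimate after each new onset, using only the onsets heard so far."""
    estimates = []
    for i in range(2, len(onsets) + 1):
        heard = onsets[:i]
        iois = [b - a for a, b in zip(heard, heard[1:])]
        estimates.append(median(iois))
    return estimates

def noncausal_period_estimate(onsets):
    """Single period estimate computed from the whole stimulus at once."""
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    return median(iois)

onsets = [0, 400, 900, 1400, 1900, 2400]  # hypothetical onset times in ms
running = causal_period_estimates(onsets)
overall = noncausal_period_estimate(onsets)
```

The causal version produces an output as soon as two onsets have been heard and revises it as more arrive, whereas the non-causal version produces a single answer only once the whole stimulus is available.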

There are several models of rhythm perception that are causal. For

example, in Scheirer’s (1998) model the analysis of tempo and beat is

performed causally. Toiviainen’s (1998) interactive MIDI accompanist model

tracks the tempo of a musical performance in real time. Large and

Jones (1999) introduce their model as a dynamic model that builds on

the notions of expectancy and entrainment to describe real-time tracking

of time-varying events. Cemgil, Kappen, Desain, and Honing (2000)

formulate a tempo tracking system in a probabilistic framework that

facilitates a reasonable first estimate of the presented stimulus.

3.8 summary

The discussion in Chapter 3 has provided an overview of a number

of computational models of rhythm perception, such as clock-based,

probabilistic, and oscillatory models. We have discussed the differences in

their purposes, e.g., modelling either beat or meter perception, and pointed

to limitations and advantages at least with regard to a number of criteria

that are considered fundamental and challenging in modelling rhythm

perception, such as tolerance to temporal variation in the structure of the

stimuli, and capability of processing different forms of metrical structure.

As noted in Section 3.3 and Section 3.5, the neural resonance model shows

a strong link between the mathematical abstractions and the physiology of

neurons, and has good potential to account for the various phenomena of

rhythm perception. It is not the only model for which claims of this kind

can be made, but it is a good prima facie candidate for a physiologically

plausible computational theory of rhythm perception. Consequently, the

next chapter is completely devoted to the neural resonance theory and its

model. The main focus is on describing the mathematical derivation of the

model and its level of abstraction. Also, emphasis is given to understanding

the physiological plausibility of the model in some detail. In order to do

this, we will have to review some basic topics in brain physiology, and

hypotheses about how this may relate to rhythm perception.


4 neural resonance model

In the previous chapter, the neural resonance model (Large et al., 2010)

was identified as a physiologically plausible candidate model of rhythm

perception, since it allows rhythm perception to be explained on the basis

of brain dynamics (as explored in depth in this chapter). The neural

resonance theory proposes that there are populations of neurons in the

brain, and that each population behaves like an oscillatory system with its

own macroscopic properties. These properties include synchronization of

firing among the constituent neurons, and a collective natural frequency.

Furthermore, when these populations of neurons are exposed to a regular

stimulus, some of the populations start to resonate, i.e., start to oscillate

with greater amplitude than in the absence of a stimulus. In physical

terms, the amplitude of the oscillation corresponds to the amplitude of

the electrical activity associated with the firing of a population of neurons.

The neural resonance theory asserts that the collective resonance patterns

formed in the presence of a metrical stimulus are responsible for giving rise

to the perception of beat and meter. For example, the perception of a beat

in a song emerges as a result of a population of neurons firing at a rate that

creates the perceived beat.

In the first section of this chapter, the physiological and functional

characteristics of the populations of neurons simulated by the model are

described. We then introduce the neural resonance model in mathematical

detail, while being careful to retain the link to physiological processes and


structures wherever possible. In the second part of the chapter we focus

on existing studies where earlier versions of the neural resonance model

were tested using various rhythmical stimuli, and in some cases validated

against empirical data. Reviewing such studies is useful for framing the

area of contribution of the present thesis, and also provides examples of

the validation of computational models against empirical data. This is one

of the validation methods employed in this thesis, as explained in detail in

Chapter 5.

4.1 physiological plausibility of the neural resonance model

The main unit in a population of neurons is a single neuron, and for

that reason we will consider single neurons first. In particular, we focus

on describing the physiology of neurons with particular attention to their

action potential, which controls the production of discrete electrochemical

signals. Neurons, together with glia cells, are the main cellular units that

constitute the human brain. The number of neurons in an adult brain can

be over 100 billion while the glia cells number about ten times more than

that, with different concentration ratios at different parts of the brain. The

neurons form networks of interactions by creating links called synapses.

Synapses form the pathways through which neurotransmitters are passed

from one neuron to the other. The release of neurotransmitters at a synapse is triggered by

the propagation of an electrical pulse (action potential) within the neuron.

The main parts of a neuron are dendrites (loosely speaking, inputs), soma

(main body), and an axon (loosely, output) (Figure 4.1)1.

1 Figure 4.1 derived from Arbib (2003), p. 1045


Action potentials propagate down the axon towards their terminals

where they cause the release of varied neurotransmitters. The pulse-like

propagation of action potentials allows cells whose elements are in

continuous operation to function for many purposes as discrete processes.

Thus the production of an action potential is also called firing of the

neuron. When the neuron is at rest, the potential inside the cell is negative

(-70mV) relative to the outside of the cell. This potential is created by

ATP-fuelled pumps in the wall of the neuron that consume one molecule

of ATP to pump three sodium cations out for every two potassium cations

pumped in. However, external stimuli, such as the reception of excitatory

neurotransmitters from one or more excitatory presynaptic neurons (i.e.,

"input neurons") may cause ion channels to open in the axon membrane

of the cell, allowing an influx of positive sodium ions (Na+), which causes

depolarization, thus allowing the membrane potential to rise to values

more positive than -70mV. At around -55mV there is a threshold that, once

reached, triggers the initiation of the action potential, i.e., a further

rapid rise and then a fall in the potential. The maximum magnitude of

the potential can reach 100mV. By contrast, interaction with presynaptic

inhibitory neurons (which are less well-understood) may for example

decrease the permeability of the axon membrane to positive sodium ions

leaking back in to the neuron, or may increase permeability to an influx

of negative chloride ions (Cl-) and to an efflux of positive potassium ions

(K+), any of which tends to hyperpolarise the potential of the

neuron. The fact that the neuron has many dendrites as inputs means that

both excitatory and inhibitory activity can take place at the same time.


Figure 4.1: The anatomy of a neuron

Of particular interest for the purposes of this thesis is the functional

characteristic whereby individual neurons exhibit regular periodic firing

(Kalat, 2012). One such type of neuron is the thalamic reticular neuron:

these have a pacemaker role in activating the sleep spindle rhythm - a

type of firing activity that masks external stimuli during sleep in order to

prevent sleep disruption (Wang and Rinzel, 1993). Additionally, not only

single neurons but also ensembles of them with a particular local circuitry in

various brain structures, such as the cerebellum, hippocampus, olfactory

cortex, and neocortex have been found to produce periodic activity (Rakic,

1975; Shepherd, 1976). Hoppensteadt and Izhikevich (1996a, p. 118) explain

that one of the basic mechanisms for periodic activity among a population

of neurons is that local populations of excitatory and inhibitory neurons

have extensive and strong synaptic connections so that action potentials

generated by the former excite the latter, which, in turn, reciprocally

inhibit the former. A neural population with functional characteristics

such as periodic firing is called a neural oscillator. Physiological evidence

for the existence of such populations (neural oscillators) is provided by the

work of Mountcastle (1957) and Hubel et al. (1963). Figure 4.2 illustrates


schematically a neural oscillator whereby one neuron from each type is

depicted, i.e., excitatory and inhibitory. In practice, several thousand

neurons of each type make up a typical neural oscillator2. These ensembles

of neurons resemble the type of populations of neurons that the neural

resonance model is concerned with. In particular the neural resonance

model simulates the dynamics of such populations of neurons in the

presence of a metrical stimulus.

Figure 4.2: Schematic representation of the neural oscillator. It consists of excitatory (white) and inhibitory (shaded) populations of neurons. For simplicity one neuron from each population is pictured.

2 Figure 4.2 derived from Hoppensteadt and Izhikevich (1997), p.118


4.2 historical background in modelling a neural oscillator

The development of mathematical models to simulate activity of neurons is

a long-standing process, with models focusing both on single neurons and

populations of neurons. Also, there are different levels of mathematical

abstraction in the development of these models, some being very detailed

with respect to the physiology of neurons and others more abstract, where

the interest is to capture properties (functional characteristics) of the neural

activity. The neural resonance model belongs to the second category and it

focuses on capturing the properties of a neural oscillator in the presence of

a rhythmical stimulus, without its parameters being explicitly related to the

physiology of neurons. However, its development is based on models that

are quite detailed with respect to the physiology of neurons. In the next

two paragraphs we will describe those models that have been influential in

the development of the neural resonance model, which further allows us to

illustrate models of both single neurons and populations of neurons using

different mathematical abstractions.

One of the most popular biological models, Hodgkin and Huxley’s

(1952) model, closely describes the physiology of the action potential of

a single neuron. This model mimics the firing of single neurons as

described in the previous section. The model has been derived from

empirical observations of the electrodynamics of a living squid’s axon. The

electrodynamics of the neuron are modelled by using a system of non-linear

differential equations that describe the behaviour of continuous currents

on the membrane potential (Koch and Grothe, 2003). This model uses the

differential equations to describe the initiation and propagation of action

potentials in the axon. One drawback of this biological model is that


it requires detailed knowledge of many parameters, which additionally

introduces a difficulty in defining all the degrees of freedom among the

interaction of the parameters. Furthermore, being a single-cell model, it

does not take into consideration the interaction of neurons within neural

populations.

A seminal attempt to describe neural oscillations due to interactions

of populations of excitatory and inhibitory neurons was the Wilson and

Cowan (1973) model. It is based on the Fitzhugh (1961) and Nagumo,

Arimoto, and Yoshizawa (1962) model of single-cell activity, which is a

simplification, i.e. a more abstracted version, of the Hodgkin and Huxley

(1952) model. The overall activity of the neural oscillator is modelled by

attributing a differential equation to describe the activity of each given

local population of either inhibitory or excitatory neurons (the equations

have been omitted here as a qualitative description is sufficient for our

purposes). The free parameters of the Wilson-Cowan model are highly

biological. In Wilson and Cowan’s paper, ‘A Mathematical Theory of the

Functional Dynamics of Cortical and Thalamic Nervous Tissue’ (1973), a

long list of symbols on page 57 describes parameters such as post-synaptic

membrane potential, surface density of excitatory neurons, and synaptic

operating delay, all directly related to physiology and functionality of

neurons.

The neural resonance model (Large et al., 2010) derives from the

Wilson and Cowan (1972) model. In particular, the neural resonance

model employs formal simplification methods that allow focusing on

the macroscopic properties of neural oscillators as a system, and thus, it

bypasses the need for an exhaustive list of physiological parameters and

their interactions such as those in the Wilson-Cowan model. In fact, the

neural resonance model is a ‘canonical’ model, a notion generally used to


refer to the simplest (in analytical terms) of a class of equivalent dynamical

systems. In our case, the dynamical system is the neural oscillator itself.

Other examples of canonical models that focus on neural oscillators are the

Aronson et al. (1990) model and the Hoppensteadt and Izhikevich (1996a)

model. The neural resonance model borrows the mathematical abstraction

of a neural oscillator as developed in the Hoppensteadt and Izhikevich

(1996a) paper, and extends it by assuming a network of heterogeneous

frequency oscillators, i.e., oscillators with different natural frequencies,

instead of homogeneous.

The formal mathematical introduction of the neural resonance model, its

properties and its parameters are the subject of the next section. Regardless

of the level of mathematical abstraction chosen to describe the phenomenon

of neural oscillation, the mathematical properties of oscillators introduced

in this section are universal, and thus expected to be observed in any

mathematical approach related to oscillators.

4.3 mathematical derivation of the neural resonance model

This section begins with the mathematical representation of a single

neural oscillator. The model comprises a collection of such oscillators,

each one with its own natural frequency. In the mathematical expression

of a neural oscillator that follows, Dale’s principle (1935) is assumed

throughout, whereby excitatory neurons may have only excitatory

synaptic connections with other neurons and inhibitory neurons may have

only inhibitory synaptic connections (Figure 4.3).


Figure 4.3: Schematic representation of the neural oscillator. It consists of excitatory (white) and inhibitory (shaded) populations of neurons. For simplicity one neuron from each population is pictured. Dale’s principle is assumed, i.e., the white arrow in the red box denotes excitatory synaptic connection, while the black arrow denotes inhibitory synaptic connection. This simple coupling structure allows both excitatory and inhibitory modes to impose negative feedback on each other, thus moderating both extremes, and fostering steady oscillation of the pair.


4.3.1 Mathematical derivation of a neural oscillator

The activity of a single neural oscillator (labelled by i = 1,2, ..., n) can be

thought of as a dynamical system and it can be modelled by a system

of two differential equations, one describing the activity of an excitatory

neuron and the other the activity of an inhibitory neuron (Equation 1). Each

differential equation is a function of (x), (y) and (λ). (x) is the activity of

the excitatory neuron, and (y) is the activity of the inhibitory neuron. The

index (i) is used to distinguish different oscillators, but it can be disregarded

momentarily as we only deal with a single neural oscillator. Lambda is a

bifurcation parameter, which for certain numerical values corresponds to

a particular qualitative change (dynamical mode) in the properties of the

neural oscillator.

$$
\begin{aligned}
\dot{x}_i &= f_i(x_i, y_i, \lambda) \\
\dot{y}_i &= g_i(x_i, y_i, \lambda)
\end{aligned}
\tag{1}
$$

The externally measurable activity of excitatory and inhibitory neurons

corresponds, at a macroscopic neurophysiological level, to the number of

action potentials per unit time (Hoppensteadt and Izhikevich, 1996a, p. 119).

Thus, the typical measure of neural oscillation in a given neighborhood is

the varying potential monitored by a local electrode which detects the net

difference between excitatory and inhibitory postsynaptic potentials in the

neighborhood of the electrode (Wilson and Cowan, 1972). The parameter

(λ), the bifurcation parameter, defines the dynamical mode of the neural

oscillator. For the purposes of this thesis, the focus is on the dynamical

mode of the neural oscillator near an Andronov-Hopf bifurcation point

(Andronov et al., 1971). Hoppensteadt and Izhikevich note that while an

Andronov-Hopf bifurcation is only one of many possible bifurcations of


relevance for the dynamics of neural oscillation in general, it corresponds

to a transition from equilibrium (resting) to periodic activity (firing). Thus,

this mode is of most biological relevance for our purposes (Hoppensteadt

and Izhikevich, 1996a), and is most relevant to the neural resonance model

at the focus of this thesis (as considered in more detail in section 4.3.2).

Before we expand Equation 1 to deal with multiple neural oscillators,

we briefly cover the idea of gaining information about the solution of a

system of one neural oscillator by considering a geometric representation

of the system’s solution rather than an analytical solution. The reason

for doing this is to introduce some useful notions that will facilitate

the discussion as we go. Geometric methods in dynamical systems are

quite common, especially when the differential equations governing the

system are nonlinear and therefore arriving at a solution analytically is

challenging. For example, Winfree (2001) shows how geometric methods

of dynamics can be applied to the explication of biological oscillations,

especially circadian and heart rhythms. In general, a solution of Equation 1

demands a mathematical expression of how the dependent variables (x) and

(y) vary over time (t). In order to understand a geometric representation of

the solution we can define an abstract space with coordinates x and y.

The two-dimensional space (Figure 4.4) represents the phase space of the

neural oscillator as a dynamical system. In general, the dimensions could

be as many as the number of variables needed to characterise the state of the

system, and thus, systems in general can be characterized as an n-dimensional

system or an nth-order system on that basis. Within the above space a

trajectory can be drawn to correspond to the solution of the system over

time in a particular mode and for given initial conditions. Interestingly, in

many cases geometric reasoning allows us to draw the trajectories without

actually solving the system (Strogatz, 1994). As an example, Figure 4.4


shows a trajectory that corresponds to a neural oscillator firing periodically

(self-sustained oscillation), i.e., spontaneous oscillation. This trajectory

is known as a stable limit cycle. The limit cycle is a closed and isolated

trajectory, which means that neighboring trajectories spiral either toward

or away from the limit cycle (Strogatz, 1994, p. 196).

Figure 4.4: A limit cycle of a neural oscillator exhibiting self-sustained oscillation (spontaneous oscillation).
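
The geometric reasoning described above can be illustrated with a small worked example. The system below is a standard textbook-style planar system chosen only for illustration; it is not the neural oscillator itself, and the polar coordinates (r) and (θ) are symbols introduced here as an assumption.

```latex
\[
\begin{aligned}
\dot{x} &= x - y - x\,(x^2 + y^2), \\
\dot{y} &= x + y - y\,(x^2 + y^2),
\end{aligned}
\qquad\Longleftrightarrow\qquad
\begin{aligned}
\dot{r} &= r\,(1 - r^2), \\
\dot{\theta} &= 1,
\end{aligned}
\qquad x = r\cos\theta,\; y = r\sin\theta .
\]
```

Since the radial velocity is positive for 0 < r < 1 and negative for r > 1, every trajectory other than the fixed point at the origin spirals onto the circle r = 1. That circle is therefore a stable limit cycle, and this conclusion is reached from the sign of the radial velocity alone, without solving the equations.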

While a single neural oscillator can behave as outlined above, the neural

resonance model is concerned with a network of gradient frequency

oscillators. In order to model such an oscillator that is able to interconnect

with other oscillators in a network, it is necessary to model its interactions

with all the other oscillators in the network. In order to achieve this, we

need to extend Equation 1 as follows. Equation 2 is a system of two

differential equations describing the activity of a single neural oscillator

that is coupled with other oscillators in a network. The first set of terms

on the right hand side of the equations is the same as in Equation 1. The


second set of terms refers to the connection of a single oscillator with the

rest of the oscillators in a network. Parameter (ε) refers to the strength of

the connection between oscillators.

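$$
\begin{aligned}
\dot{x}_i &= f_i(x_i, y_i, \lambda) + \varepsilon\, p_i(x_1, y_1, \dots, x_n, y_n, \varepsilon) \\
\dot{y}_i &= g_i(x_i, y_i, \lambda) + \varepsilon\, q_i(x_1, y_1, \dots, x_n, y_n, \varepsilon)
\end{aligned}
\tag{2}
$$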

Here the functions p and q represent synaptic connections from the whole

network of oscillators onto the i-th neural oscillator. The parameter (ε >

0) represents the strength of synaptic connection between different neural

oscillators. In general, parameter (ε) is assumed to be small on

neurophysiological grounds. More about the hypothesis of weak

synaptic connections can be found in Hoppensteadt and Izhikevich (1995).

Equation 2 is of the right form to describe a neural oscillator weakly

connected to all of the other neural oscillators in a network, but to make

this equation useful for our purposes, we need to extend it again. The

next step in extending the dynamical system is to consider its dynamics

in the presence of an additional external input to the i-th oscillator. The

external input resembles the presence of an external rhythmic stimulus of

a pulse-like nature. Equation 3 is a system of two differential equations

capturing the activity of a single neural oscillator in the presence of a

rhythmic stimulus. The oscillator is coupled with other oscillators in the

network. The parameters are the same as in Equation 2. The only new

parameter is the external input, $\rho_{x_i}(t)$ and $\rho_{y_i}(t)$, which is incorporated into the

coupling term. Large, Almonte, and Velasco (2010, Equation 5) incorporate

the external input in the following way:

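$$
\begin{aligned}
\dot{x}_i &= f_i(x_i, y_i, \lambda) + \varepsilon\, p_i(x_1, y_1, \dots, x_n, y_n, \rho_{x_i}(t), \varepsilon) \\
\dot{y}_i &= g_i(x_i, y_i, \lambda) + \varepsilon\, q_i(x_1, y_1, \dots, x_n, y_n, \rho_{y_i}(t), \varepsilon)
\end{aligned}
\tag{3}
$$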


One can think of Equations 2 and 3 as being a generalization of the

physiologically plausible Wilson and Cowan (1972) model of a neural

oscillator in an oscillatory neural network (Hoppensteadt and Izhikevich,

1996a).

4.3.2 Canonical derivation of a neural oscillator

In order to facilitate the analysis of a neural oscillator, we can focus

on its state near the Andronov-Hopf bifurcation point. A focus on

this particular bifurcation point is physiologically justifiable, because it

resembles the states of resting and firing, as outlined in the previous section.

This focus also allows us to reduce the relevant equation to

a much simpler form (a canonical representation) through

the application of normal form theory (Large et al., 2010). The exact

derivation is omitted here, but a detailed procedure for obtaining normal

forms of a system of equations like the one described by Equation 2 can

be found in (Hoppensteadt and Izhikevich, 1996a; Wiggins, 2003; Kahn

and Zarmi, 1998). While realistic mathematical models of physical and

biological processes can be complicated (e.g., Hodgkin and Huxley’s model

of action potential), simplified forms can ease analysis while still capturing

essential aspects of the phenomena being modelled.

The application of normal form methods to the system of differential

equations described in Equation 2 simplifies it into a single equation for

each (i) - rather than two equations for each (i) (Equation 4). However,

variable (z) below is a complex valued state, which means that it has both


real and imaginary parts. Equation 4 below is a canonical representation of

Equation 2.

$$
\dot{z}_i = b_i z_i + d_i z_i |z_i|^2 + \sum_{j \neq i}^{n} c_{ij} z_j + \text{h.o.t.}
\tag{4}
$$

The meaning of the variables is momentarily withheld until we derive

the fully expanded equation that incorporates an external stimulus and

captures interaction between oscillators of different natural frequency. For

example, the term h.o.t. (higher order terms) captures the interactions

between oscillators of different frequencies and still needs to be written out. As

previously noted, Equation 3 considers the dynamical system of a neural

oscillator in the presence of an external rhythmical stimulus. According to

Large, Almonte, and Velasco (2010), the process of simplifying Equation

3 - by using the normal form method in a similar way as in Equation

2 - is problematic, and known methods for deriving a canonical model

cannot be applied. In order to overcome this difficulty, Large et al. (2010)

suggest that the external input s(t) to oscillator (zi) can be included in

the coupling term as follows. Consider the coupling term - the summation

term on the right-hand side - of Equation 4, i.e., $\sum_{j \neq i}^{n} c_{ij} z_j$. This term is

related to the total amount of synaptic connection between a single neural

oscillator and the rest of the oscillators in a network. The complex-valued

coefficient (cij) denotes the values of synaptic connections between neural

oscillators, which depend upon (pi) and (qi) from Equation 2. In particular,

the absolute value (|cij|) encodes the strength of connection from the j-th to

the i-th oscillators, whereas the argument of (cij) encodes phase information

(Hoppensteadt and Izhikevich, 1996a). The term $x_i = \sum_{j \neq i}^{n} c_{ij} z_j + s(t)$ is

the Large, Almonte, and Velasco (2010) suggestion for incorporating the

external input into the coupling term, and as a result Equation 4 becomes


Equation 5. Note that the above (xi) is not the same (xi) as the one in

Equations 1, 2 and 3. The variables will be defined shortly.

$$
\dot{z}_i = b_i z_i + d_i z_i |z_i|^2 + \sum_{j \neq i}^{n} c_{ij} z_j + s(t) + \text{h.o.t.}
\tag{5}
$$

Equation 5 can be simplified into Equation 6 as follows:

$$
\dot{z}_i = b_i z_i + d_i z_i |z_i|^2 + x_i(t) + \text{h.o.t.}
\tag{6}
$$

Equation 6 is a canonical representation of a neural oscillator with an

external stimulus incorporated into the coupling term. The higher order

terms in Equation 6 may or may not have an effect on the dynamics of

the system. We will treat these two cases in turn. In cases where the

interacting neural oscillators have very close natural frequencies the effect

of the higher order terms will be negligible (Large et al., 2010). This is

the case considered by Hoppensteadt and Izhikevich (1996a) whereby a

network of neural oscillators of homogeneous frequencies is assumed.3
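
The homogeneous-frequency case can be sketched numerically. The following is a minimal illustration, assuming Equation 4 with the higher order terms dropped, two oscillators of equal natural frequency, and a real, positive coupling coefficient; all numerical values, the function names, and the choice of a classical Runge-Kutta integrator are assumptions made here for illustration and are not taken from the cited papers.

```python
import cmath
import math


def network_rhs(z, b, d, c):
    """dz_i/dt = b_i z_i + d_i z_i |z_i|^2 + sum_{j != i} c_ij z_j (h.o.t. dropped)."""
    n = len(z)
    out = []
    for i in range(n):
        coupling = sum(c[i][j] * z[j] for j in range(n) if j != i)
        out.append(b[i] * z[i] + d[i] * z[i] * abs(z[i]) ** 2 + coupling)
    return out


def rk4_step(z, dt, b, d, c):
    """One classical fourth-order Runge-Kutta step for the whole network."""
    k1 = network_rhs(z, b, d, c)
    k2 = network_rhs([zi + 0.5 * dt * k for zi, k in zip(z, k1)], b, d, c)
    k3 = network_rhs([zi + 0.5 * dt * k for zi, k in zip(z, k2)], b, d, c)
    k4 = network_rhs([zi + dt * k for zi, k in zip(z, k3)], b, d, c)
    return [zi + dt * (s1 + 2 * s2 + 2 * s3 + s4) / 6
            for zi, s1, s2, s3, s4 in zip(z, k1, k2, k3, k4)]


# Illustrative (assumed) parameter values: b = alpha + i*omega, d = beta1 + i*delta1.
omega = 2 * math.pi                      # same natural frequency for both oscillators
b = [complex(1.0, omega)] * 2
d = [complex(-1.0, 0.0)] * 2
c = [[0.0, 0.2], [0.2, 0.0]]             # real, positive coupling: no preferred phase lag

# Start with equal amplitudes and a phase difference of 2 radians.
z = [cmath.rect(0.5, 0.0), cmath.rect(0.5, 2.0)]
dt = 2e-3
for _ in range(int(round(50.0 / dt))):
    z = rk4_step(z, dt, b, d, c)

phase_difference = cmath.phase(z[1] / z[0])
print(phase_difference)
```

With these assumed values the two oscillators pull each other into phase, which is one concrete sense in which a connection between oscillators of equal natural frequency can be functionally significant. A complex-valued coupling coefficient would instead favour a non-zero phase lag.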

The neural resonance model we are considering in this thesis is a gradient

frequency neural network, which comprises a series of neural oscillators

of different natural frequencies. In this case, the effect of the higher

order terms in Equation 6 will not be negligible. Large, Almonte, and

Velasco (2010) suggest that the development of a heterogeneous frequency

oscillator network is of a critical relevance to understanding aspects of

auditory processing, such as rhythm perception. In order to capture

interactions between oscillators of different frequencies and also, responses

of an oscillator to an input whose frequency is not close to

3 According to Hoppensteadt and Izhikevich, oscillators of equal natural frequencies display functionally significant connections while oscillators of different natural frequencies display functionally insignificant connections (see Figure 4, Hoppensteadt and Izhikevich, 1996, p.120).


its natural frequency, the higher order terms need to be expanded. Equation

7 below is the fully expanded canonical representation of a neural oscillator

(zi) that is part of a gradient frequency neural network that captures all the

necessary interactions for modelling rhythm perception.

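$$
\dot{z}_i = b_i z_i + d_i z_i |z_i|^2 + \frac{k_i\, \varepsilon\, z_i |z_i|^4}{1 - \varepsilon |z_i|^2} + \frac{x_i(t)}{1 - \sqrt{\varepsilon}\, x_i(t)} \cdot \frac{1}{1 - \sqrt{\varepsilon}\, x_i(t)}
\tag{7}
$$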

We are now in a position where the variables of the above equation can be

explained. The parameters (bi), (di) and (ki) are all complex-valued and are

defined as follows:

$$
\begin{aligned}
b &= \alpha + i\omega \\
d &= \beta_1 + i\delta_1 \\
k &= \beta_2 + i\delta_2
\end{aligned}
$$

Equation 7 is two-dimensional, because z is a complex variable, having

real (Re(z)) and imaginary (Im(z)) parts. Its coefficients have both real (α, β1, β2)

and imaginary (iω, iδ1, iδ2) parts; however, (ω), (δ1) and (δ2) are

all real. The parameters are (α), the bifurcation parameter, (β1) and

(β2), the non-linear saturation parameters, (ω), the eigenfrequency of

the oscillator, and (δ1) and (δ2), the frequency detuning parameters. The

bifurcation parameter (α) determines the nature of the interaction between

excitatory and inhibitory inputs (i.e., either spontaneous oscillation or

damped oscillation - Figure 4.5). The saturation parameters (β1) and

(β2) prevent the amplitude from increasing indefinitely when α > 0 (Figure

4.5). The eigenfrequency (ω) is the natural frequency of the oscillator.

The frequency detuning parameters (δ1) and (δ2) describe how the

instantaneous frequency4 of the oscillator also depends on its amplitude.

4 Instantaneous frequency is dφ/dt i.e., angular velocity
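
The separate roles of the real and imaginary parameters can be made explicit by rewriting the intrinsic terms of Equation 7 in polar form. The following is a sketch under two assumptions made here for illustration: the input is switched off, (xi(t) = 0), and the state is written as z = r e^{iφ}, with amplitude (r) and phase (φ) as in footnote 4. Substituting and separating real and imaginary parts gives:

```latex
\[
\begin{aligned}
z = r e^{i\phi} \quad &\Rightarrow \quad \dot{z} = \left(\dot{r} + i\, r\, \dot{\phi}\right) e^{i\phi}, \\[4pt]
\dot{r} &= r \left( \alpha + \beta_1 r^2 + \frac{\varepsilon\, \beta_2\, r^4}{1 - \varepsilon r^2} \right), \\[4pt]
\dot{\phi} &= \omega + \delta_1 r^2 + \frac{\varepsilon\, \delta_2\, r^4}{1 - \varepsilon r^2}.
\end{aligned}
\]
```

In this form the real parameters (α), (β1) and (β2) appear only in the amplitude equation, while (ω), (δ1) and (δ2) appear only in the phase equation. When (δ1) and (δ2) are zero the instantaneous frequency equals (ω) regardless of amplitude; otherwise it shifts with (r), which is the sense in which these parameters detune the oscillator.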


4.4 properties of neural oscillator

The remaining two sections of this chapter introduce and focus on

those properties of a neural oscillator that account for aspects of

rhythm perception. These are a) spontaneous oscillation, b) entrainment,

and c) higher order resonance, which have been previously suggested

(Large, 2008) to account for diverse phenomena, including: non-linear

resonance in the cochlea; phase locked responses of auditory neurons; and

entrainment of rhythmic responses in distributed cortical areas. Briefly,

spontaneous oscillation refers to the activity of a neural oscillator that is

not being stimulated by an external stimulus, i.e., self-powered oscillation.

Entrainment describes the activity whereby the oscillator may adjust either

its frequency or phase in the presence of some external rhythmical

stimulus. Higher order resonance captures two distinct phenomena: firstly,

interaction among multiple neural oscillators, each one with different

natural frequency, and secondly, the response of each oscillator in the

presence of a rhythmical stimulus with a pulse frequency that is not close

to the natural frequency of that oscillator. The ways the above behaviours

are mapped to the free parameters of the neural resonance model will be

described below.
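
As a rough illustration of entrainment as described above, the sketch below drives a single spontaneously oscillating unit with a stimulus whose frequency is slightly different from the oscillator's natural frequency. It rests on several assumptions that are not taken from the model itself: the stimulus is a complex sinusoid rather than a pulse train, it enters as a simple additive term rather than through the input terms of Equation 7, the higher order terms are dropped, and all names and numerical values are chosen here for illustration.

```python
import cmath
import math


def driven_rhs(z, t, alpha, omega, beta1, force, omega_stim):
    """dz/dt = z*(alpha + i*omega + beta1*|z|^2) + force*exp(i*omega_stim*t)."""
    intrinsic = z * (complex(alpha, omega) + beta1 * abs(z) ** 2)
    stimulus = force * cmath.exp(1j * omega_stim * t)
    return intrinsic + stimulus


def relative_phases(omega_stim, alpha=1.0, omega=2 * math.pi, beta1=-1.0,
                    force=0.5, dt=1e-3, t_end=60.0):
    """Integrate with classical RK4 and return the oscillator's phase relative
    to the stimulus halfway through the run and at the end of the run."""
    args = (alpha, omega, beta1, force, omega_stim)
    z = 0.1 + 0j
    steps = int(round(t_end / dt))
    samples = []
    for n in range(steps):
        t = n * dt
        k1 = driven_rhs(z, t, *args)
        k2 = driven_rhs(z + 0.5 * dt * k1, t + 0.5 * dt, *args)
        k3 = driven_rhs(z + 0.5 * dt * k2, t + 0.5 * dt, *args)
        k4 = driven_rhs(z + dt * k3, t + dt, *args)
        z += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        if n + 1 in (steps // 2, steps):
            t_now = (n + 1) * dt
            # Remove the stimulus rotation to obtain the relative phase.
            samples.append(cmath.phase(z * cmath.exp(-1j * omega_stim * t_now)))
    return samples


natural = 2 * math.pi
# Stimulus slightly faster than the oscillator's natural frequency.
mid_phase, end_phase = relative_phases(omega_stim=natural + 0.2)
print(mid_phase, end_phase)
```

If the oscillator has entrained, the two printed relative phases agree: the oscillator has given up its own frequency and rotates at the stimulus frequency with a fixed phase offset. Increasing the frequency mismatch or weakening the stimulus in this sketch would eventually break the lock.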

4.4.1 Spontaneous oscillation

Spontaneous oscillation accounts for a dynamic state of the neural oscillator

whereby periodic activity can be achieved independently of the presence

of a rhythmic stimulus. Near an Andronov-Hopf bifurcation point,

spontaneous oscillation is one of the potential dynamical states of a neural

oscillator. Spontaneous oscillation occurs when its bifurcation parameter

is positive (α > 0). Figure 4.5 below illustrates how the amplitude of the

oscillator stabilises in the spontaneous oscillation state5. In cases where

(α < 0), the oscillator displays its second dynamical state, a damped

oscillation, which means that the oscillation decays after some time (Figure

4.5). Bifurcation therefore can be thought of as the critical point whereby

the system changes from one dynamical state to another. The bifurcation

parameter responsible for controlling the transition between qualitatively

different states of the oscillator is parameter (α) and the critical point occurs

when (α = 0).

Figure 4.5: Trajectories of spontaneous and damped oscillations. For (α > 0), the amplitude of the oscillator increases up to a stable saturation point, while for (α < 0) the amplitude decays to zero.

We can see that, in the spontaneous oscillation mode, the oscillation reaches

a particular amplitude, then stabilises, and repeats itself over time. This

particular trajectory is an example of a limit cycle in dynamical systems.

This dynamical state is of particular interest because it can be used to

5 Figure 4.5 derived from (Large, 2008, p. 21)

simulate the observed periodicity of real neural oscillators (Section 4.1).

In terms of rhythm perception, this property accounts for, and addresses,

situations where the beat is felt even in the absence of stimulus events, e.g.,

pauses in a stream of note events.
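
The two dynamical states described in this subsection can be sketched numerically. The following is a minimal illustration only, not the implementation used in this thesis: it assumes that the amplitude r of the oscillator follows the truncated form dr/dt = αr + βr³, where the saturation coefficient β (negative here), the initial amplitude and the step size are all assumptions chosen for the example.

```python
import math


def simulate_amplitude(alpha, beta=-1.0, r0=0.1, dt=0.001, t_end=50.0):
    """Euler-integrate dr/dt = alpha*r + beta*r**3 and return the final amplitude.

    beta, r0, dt and t_end are illustrative assumptions, not values from the thesis.
    """
    r = r0
    for _ in range(int(round(t_end / dt))):
        r += dt * (alpha * r + beta * r ** 3)
    return r


# alpha > 0: spontaneous oscillation, amplitude saturates at sqrt(-alpha / beta).
spontaneous = simulate_amplitude(alpha=1.0)
# alpha < 0: damped oscillation, amplitude decays towards zero.
damped = simulate_amplitude(alpha=-1.0)

print(f"alpha = +1: final amplitude {spontaneous:.4f} (theory {math.sqrt(1.0):.4f})")
print(f"alpha = -1: final amplitude {damped:.2e}")
```

Under these assumptions the amplitude for α > 0 settles at √(−α/β), while for α < 0 it decays towards zero, mirroring the two trajectories of Figure 4.5.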

4.4.2 Entrainment

Spontaneous oscillation, by definition, is achieved independently of any

external stimulus. In the case of entrainment, an established oscillation is

coupled to an external stimulus of some frequency (f). That is, the oscillator

develops a stable phase relation with the stimulus, i.e., phase locking.

A typical example of phase locking is when both the oscillator and the

external stimulus are constantly in phase, i.e., perfect synchrony. In-phase

locking can be observed only when both the oscillator and the external

stimulus share the same frequency. If, for example, the frequency of an

oscillator were twice the frequency of the stimulus, then the phase

locking would alternate between in-phase locking and anti-phase locking.

Entrainment is central to the neural resonance model in that the theory asserts

that rhythm perception can be explained by the entrainment of physically

real neural oscillators in the presence of a rhythmic stimulus.
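
Phase locking can be illustrated with a much simpler system than the neural resonance model. The sketch below is a toy example under stated assumptions: it integrates a generic phase equation dφ/dt = Δω − K sin φ (an Adler-type equation, swapped in here purely for illustration; it is not the canonical model), where φ is the relative phase between oscillator and stimulus, Δω is the frequency mismatch and K is a hypothetical coupling strength.

```python
import math


def final_relative_phase(detuning_hz, coupling, phi0=1.0, dt=0.001, t_end=60.0):
    """Euler-integrate d(phi)/dt = delta_omega - coupling * sin(phi).

    phi is the oscillator-stimulus relative phase in radians. All parameter
    values are illustrative assumptions.
    """
    delta_omega = 2.0 * math.pi * detuning_hz
    phi = phi0
    for _ in range(int(round(t_end / dt))):
        phi += dt * (delta_omega - coupling * math.sin(phi))
    return phi


# Small mismatch (|delta_omega| < K): the relative phase settles, i.e. phase locking.
locked = final_relative_phase(detuning_hz=0.05, coupling=1.0)
# Large mismatch (|delta_omega| > K): the relative phase keeps drifting, no locking.
drifting = final_relative_phase(detuning_hz=0.5, coupling=1.0)

print(f"locked relative phase:   {locked:.4f} rad")
print(f"drifting relative phase: {drifting:.1f} rad")
```

In this toy system a stable phase relation exists only when the frequency mismatch is small relative to the coupling, which is the qualitative behaviour that entrainment refers to.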

4.4.3 Higher Order Resonance

When the neural resonance model is considered without the higher order

terms (h.o.t.), it is only possible to model entrainment between a stimulus

and an individual oscillator whose natural frequencies stand close to each

other or exhibit simple integer ratios such as 2:1, 3:1, 4:1 (i.e., the frequency

of the stimulus is more or less an integer multiple of the frequency of

the oscillator). The response of an oscillator that is not related to the

stimulus in any of the above ways is inhibited. By contrast, in the fully

expanded canonical model (Equation 7), any oscillator whose frequency

stands in a low-integer ratio (or a close approximation thereof) to the

stimulus pulse frequency can resonate; this includes not only the ratios

noted above, but also ratios such as 1:2 and 3:2.

For example, Figure 4.6 6 illustrates the responses of a network of neural

oscillators in the presence of a rhythmic stimulus (a sinusoid of 2Hz).

The responses of the network occur in a way that denotes sub-harmonic,

harmonic and more complex ratios of the stimulus frequency. Apart

from illustrating higher order resonance, Figure 4.6 addresses additional

properties of a neural oscillator such as frequency detuning (e.g., high

amplitude levels in the 1:1 ratio lead to a small mismatch between stimulus

frequency and oscillator frequency), sensitivity to weak signals and high

frequency selectivity, which have been previously observed as distinctive

aspects of human rhythm perception (Large, 2008).

Figure 4.6: An example of a neural resonance model that consists of an array of oscillators with frequencies from 0.5 Hz to 8.0 Hz. The amplitude responses result from stimulating the model with a 2 Hz sinusoid of various amplitudes.

6 Figure 4.6 derived from (Large, 2008)

Aspects of the illustrated responses of the neural resonance model to

external stimuli (Figure 4.6) play key roles in modelling the way humans

perceive rhythm. For example, the multiple responses of the model to

a ’monochromatic’ stimulus suggest how the neural resonance theory

might cast light on the ability of listeners to attend to multiple levels of the

metrical structure of a stimulus under a wide variety of task conditions (e.g.

playing an instrument). Figure 4.6 reveals that the strongest response is

found at the stimulus frequency (f = 2 Hz), but responses are also observed

at harmonics (e.g., 2f, 4f) and sub-harmonics (e.g., f/2) as well as rational

ratios (e.g., 3f/2) of the stimulus frequency. In other words, the responses

of the oscillators in the presence of a stimulus may be able to account for

human perception of metrical structure in simple rhythmic stimuli.

A strength of the neural resonance model is the demonstration of how

higher-order interactions between oscillator and stimulus can give rise

to stable oscillations and multi-frequency patterns of oscillation. These

multi-frequency patterns of responses are very promising for testing the

neural resonance model with more complex rhythmical stimuli such as

polyrhythms, which is the main theme of this thesis.

4.5 simulation of rhythm perception with the neural

resonance model

The theory of neural resonance has been proposed quite recently (Large,

2008; Large and Snyder, 2009) but computational implementations that

utilise the idea of resonance to model human rhythm perception go back a

number of years (Large and Kolen, 1994, 1998; Large and Jones, 1999; Large,

2000; Large and Palmer, 2002; Large, 2008). With this in mind, we review

cases where predecessor models of the one used in this thesis (Large et al.,

2010) have been utilised to account for human rhythm perception using

various stimuli such as simple metronomic clicks and stimuli that exhibit

temporal variation and syncopation.

Figure 4.7 (below) illustrates an example of a resonant pattern produced

by the neural resonance model in the presence of a simple rhythmical

stimulus. In this case, the stimulus was a series of metronomic clicks with

a tempo of 60 beats per minute (bpm) presented for a 100 second period

of stimulation. In general, the resulting resonant pattern is dynamic, i.e., it

evolves during the period while the metronomic clicks stimulate the bank

of oscillators. However, the graph below corresponds to the average activity

of each individual oscillator (blue dot) over the second half of the stimulation

period (final 50 seconds). Averaging over the final part of the stimulation

period ensures that appropriate time has been given for all oscillators to

respond, and that the amplitude responses are measured only after the

oscillators have reached a steady amplitude.
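
The averaging step described above can be sketched as follows. The function names and the data layout (one list of amplitude samples |z| per oscillator, keyed by natural frequency in Hz) are assumptions made for illustration and do not reflect the actual interface of the software.

```python
def steady_state_response(amplitude_trace):
    """Mean amplitude |z| over the second half of a trace (e.g. the final 50 s of 100 s)."""
    half = len(amplitude_trace) // 2
    tail = amplitude_trace[half:]
    return sum(tail) / len(tail)


def average_responses(traces_by_frequency):
    """Map each oscillator's natural frequency to its averaged late-stage amplitude."""
    return {
        frequency: steady_state_response(trace)
        for frequency, trace in traces_by_frequency.items()
    }


# Hypothetical traces: one oscillator that grows and stabilises, one that dies out.
example = {
    1.0: [0.0, 0.5, 0.5, 0.5],
    3.0: [0.1, 0.0, 0.0, 0.0],
}
print(average_responses(example))
```

Discarding the first half of each trace keeps the transient build-up of the response out of the average.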

We can see that the neural resonance model exhibits peak responses,

which are related in three different ways to the fundamental rhythmic

frequency of the given stimulus: harmonically; sub-harmonically; and in

a more complex fashion. In the case of Figure 4.7, a series of clicks played at

60 bpm implies a frequency of 1 Hz (i.e., one click per second). Consequently,

the fundamental pulse frequency of the stimulus is 1 Hz. We can see that

there are peaks at 0.5 Hz, 1 Hz, 1.5 Hz, 2 Hz, etc.
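
The arithmetic behind these peak locations can be made explicit. The short sketch below converts a tempo in bpm to a pulse frequency and lists the frequencies at a few of the ratios mentioned above; the selection of ratios and all names are illustrative assumptions.

```python
from fractions import Fraction

# Ratios relative to the stimulus pulse frequency (an illustrative selection).
RATIOS = {
    "sub-harmonic (1:2)": Fraction(1, 2),
    "fundamental (1:1)": Fraction(1, 1),
    "complex (3:2)": Fraction(3, 2),
    "harmonic (2:1)": Fraction(2, 1),
}


def bpm_to_hz(bpm):
    """Convert a tempo in beats per minute (an integer) to a frequency in Hz."""
    return Fraction(bpm, 60)


def expected_peaks(bpm):
    """Frequencies (Hz) at which peaks would be expected for the ratios above."""
    pulse = bpm_to_hz(bpm)
    return {label: float(pulse * ratio) for label, ratio in RATIOS.items()}


peaks = expected_peaks(60)
for label, frequency in peaks.items():
    print(f"{label}: {frequency} Hz")
```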

In terms of modelling human rhythm perception, some of the responses

in Figure 4.7 can be interpreted in the following way. The highest peak

in Figure 4.7, around the 1 Hz oscillator, corresponds in effect to listeners

perceiving quarter notes (using the convention that the tempo of the

clicks refers to quarter notes, i.e., ♩= 60). Similarly, the peak around

the 2 Hz oscillator corresponds to listeners perceiving/playing eighth

notes; the peak around the 1.5 Hz oscillator corresponds to listeners

perceiving/playing triplet crotchets; the peak around the 0.5 Hz oscillator

corresponds to listeners perceiving/playing half notes, and so on.

[Figure 4.7 plot, titled "Frequency Response": x-axis, Oscillator Frequency (Hz), from 0.25 to 16 Hz; y-axis, Response Amplitude (|z|), from 0 to 0.7.]

Figure 4.7: Average frequency responses of the neural resonance model after being stimulated by a series of clicks played at 60 bpm. The frequency of clicks is 1 Hz, i.e., one click per second. The highest peak corresponds to oscillators around the 1 Hz oscillator, where the latter exhibits the maximum amplitude. Peaks related to oscillators with harmonically, subharmonically and more complexly related frequencies to the 1 Hz oscillator can be observed. Some of these responses can be said to resemble the human perception of different metric levels associated with a given stimulus, e.g., quarter notes (1 Hz), half-notes (0.5 Hz), 16th notes (4 Hz).

The above interpretation of Figure 4.7 using the neural resonance model

illustrates how some of the principal aspects of rhythm perception, such

as those described in Chapter 2, can be taken into account by the theory

of neural resonance. For example, the model responds dynamically in

the presence of a stimulus, and it identifies multiple levels of periodicity

similar to those associated with human perception of beat and meter. The

metronomic clicks are nonetheless perfectly isochronous: the ability of

the model to cope with temporal fluctuations will be examined in the

following examples.

In Figure 4.7 the activated oscillators and their corresponding amplitude

responses comprise the two main features of the neural resonance model’s

behaviour as a result of being stimulated by some simple periodic stimulus.

Henceforth, for conciseness and clarity, such data (as in Figure 4.7)

are referred to as the frequency response pattern of the model.

Large and Kolen (1994) tested a model consisting of six oscillatory units

of different natural frequencies covering a range of two octaves with a

freely improvised melody performance collected in a study of musical

improvisation conducted by Large, Palmer, and Pollack (1995). The

performance displayed temporal variations both at local levels (expressive

variation) and as rather large changes in tempo (ritardando). Two

out of the six oscillators acquired stable modes when stimulated by the

above performance, while the rest of the oscillators never

stabilised. Based on the metrical structure of the performance,

the two stabilised oscillators corresponded to the periodicity levels

resembling the quarter note and the half note.

In another study, Large (2000) investigated the response of a bank of

oscillators similar to the one above in the presence of a series of musical

excerpts of ragtime music as stimuli. Ragtime is a representative example

of a syncopated type of stimulus, thus in effect the study was testing

the ability of the model to identify a pulse in a syncopated stimulus.

Syncopation amounts to a violation of the temporal expectations embodied

in the perceived metrical structure and it can be operationalized as the

lack of events on strong beats (or sometimes the occurrence of unstressed

events), accompanied by the placement of stressed events on weak beats (Large,

2000). A computer was used to generate musical events resembling a piano

performance in a metronomically precise manner, i.e., expressive timing

and tempo variation were restricted. This type of musical excerpt has

been previously used by Snyder and Krumhansl (2000) as a means of

identifying cues to pulse finding by humans tapping along with the ragtime

examples. The effect of syncopation as a cue in pulse finding was evaluated

by considering two versions of the performance. The first version was the

full performance and the second one resembled the part of the performance

that would have been played by the right hand, since the syncopation in

ragtime is usually performed by the right hand, while the left holds a steady

bass rhythm. Large stimulated the model with both the full version and

the syncopated version and he compared the responses of the model with

human tapping behaviour reported by Snyder and Krumhansl (2000). In

particular, he compared two aspects of human behaviour with the behaviour

of the model. The first point of comparison was the mode of tapping,

i.e., tapping frequency, and the second was the time it took for people

to start tapping along with the ragtime excerpts. The human modes of

tapping were compared with the most active oscillators presented by the

model, and the time to start tapping was compared with the time it took

for the corresponding oscillator to start resonating. The outcome of the

two comparisons suggested that the simulation squares well with human

performance (Large, 2000). Additionally, the outcome of the comparisons

showed similarity between human behaviour and model response in terms

of the degree of disruption caused by the level of syncopation. More

specifically, when people were confronted only with the syncopated part

of the performance more turbulence was observed in tapping behaviour -

for example, switching from tapping the beat to tapping the off-beat - in

comparison to the full version. This behavioural finding was juxtaposed by

Large (2000) with the model’s deterioration in stabilizing to one beat in the

presence of the syncopated part of the ragtime stimulus.

Another revealing example of pulse finding using the neural resonance

model can be seen in the case of the highly syncopated son clave rhythmic

pattern (Large, 2010). In particular, in the son clave rhythm, note events

occur at only half the points in time where the basic beat (tactus) is expected,

and notes often occur on relatively weak beats. Nevertheless, the neural

resonance model identifies the underlying beat.

In this chapter, we have outlined the physiological plausibility of the

neural resonance model by describing the physiological and functional

characteristics of a particular type of population of neurons (neural

oscillator) that the model aims to simulate. In particular, the

model simulates the dynamics of these populations of neurons in the

presence of a rhythmic stimulus. According to the theory of neural

resonance, the characteristics of emerging multiple resonances give rise

to rhythm perception. We have also described how these dynamics

result from particular functional characteristics of oscillators such as

entrainment, spontaneous oscillation, and higher order resonance. An

extended mathematical description together with the level of mathematical

abstraction of the neural resonance model has been provided. The

development of the neural resonance model and its relation to various

models has been briefly considered.

In Chapter 5, we will describe the process of setting up the neural

resonance model for simulations to be run. Setting up the neural resonance

model constitutes part of the general method for conducting validation

tests with the model, preparatory to the comparison of the output of the

model with empirical human data. Chapter 5 also introduces the method

of obtaining empirical human data and also the method of comparing

modelled with empirical data.

5 METHODOLOGY

The main aim of this chapter is to discuss the kinds of evidence used in

this thesis, the methods of obtaining this evidence, and how such evidence

can be used to address the research question, namely, to what extent

the theory of neural resonance can provide an account of human perception

of polyrhythms. In order to address this question we will introduce and

reflect on issues related to the identification and collection of modelled

and empirical data, and discuss issues related to the drawing of inferences

from comparisons of such data. In particular, the aforementioned issues

are directly related to the three different methods used in this thesis.

The first method describes the process of running simulations (tests) with

the computational model of neural resonance in order to collect modelled

data in response to selected polyrhythmic stimuli, while the second method

describes the process of obtaining empirical data derived from human

tapping behaviour in response to polyrhythms. While some relevant

human empirical data can be found in the literature, others need to be

obtained by conducting novel experiments. The third method describes the

process of comparing modelled data with empirical data. In general, there

are two types of comparisons between modelled and empirical data, each

one associated with one of the two halves of the research question. The

first type of comparison between modelled and empirical data considers

the modes of periodic tapping along with a polyrhythmic stimulus. In this

case, existing empirical data are borrowed from the study of Handel and

Oshinsky (1981) and modelled data are obtained from running simulations

with the neural resonance model, in particular, by referring to the frequency

response pattern of the model. This type of comparison is the theme of

Chapter 6 and it aims to answer the first half of the research question,

which is: To what extent can the theory of neural resonance provide an

account for existing empirical data on how humans perceive polyrhythms?

The second type of comparison considers the time it takes for people to start

tapping along with a given polyrhythmic stimulus. In this case, the model

makes a prediction based on a hypothesis suggested by Large (2000), which

is described in detail in Subsection 5.2.1 of this chapter, and experiments are

conducted in order to obtain empirical data regarding the time it takes

for people to start tapping along with a polyrhythmic stimulus. This

type of comparison between predicted and empirically observed time to

start tapping is the theme of Chapter 8 and it aims to answer the second

half of the research question, which is: Can we make novel predictions

about human behaviour in polyrhythm perception based on the theory of

neural resonance and subsequently test these predictions? The studies in

Chapters 6 and 8 can be thought of as validation studies for testing the

theory of neural resonance against empirical observations in polyrhythm

perception. For that reason, the chapter begins by briefly reflecting on the

methodological approach of validation studies.

5.1 the methodology of validation

Validation encapsulates one of the fundamental methodological rules in

scientific discovery, which is the need for the testing of theories against

experience by observation. In general, validation can be thought of as the

process of exposing a theory to empirical scrutiny. The nature of inferences

deriving from the comparison between real world and simulated data may

among other options lead to falsifying or corroborating a theory. According

to Popper (1959), corroboration refers to the process whereby a theory

withstands detailed and severe tests and is not superseded by another

theory in the course of scientific progress, and as a result we may say that

it has ’proved its mettle’ or that it is corroborated by past experience. In

this thesis the results of the validation studies presented in Chapters 6 and

8 suggest that the theory of neural resonance is corroborated further.

In the remainder of this chapter, we introduce in detail the methods

involved in this thesis. In particular, we begin by describing the method of

acquiring modelled data by running simulations with the model of neural

resonance, and we continue with the method of obtaining empirical data

by conducting experiments. Finally, we describe the method of comparison

between modelled and empirical data, whereby responses of the neural

resonance model in the presence of polyrhythmic stimuli are compared

with empirical data of human tapping behaviour in polyrhythm perception.

5.2 simulations (tests)

In order to explore the extent to which the neural resonance theory provides

an account for empirical data on human perception of polyrhythms we

need to choose appropriate stimuli to feed into the neural resonance

model, and then collect data on how the model behaves in response to

these stimuli. This behaviour can then be compared with human tapping

behaviour in response to the same stimuli. Recall that the model of neural

resonance is a bank of oscillators tuned to a range of frequencies from

low to high, reflecting the hypothesized existence of a number of neural

oscillators in the brain that can potentially respond to the presence of

rhythmical stimuli. The computational model of neural resonance, as

described in Chapter 4, has been developed by Edward Large

and his colleagues and it is implemented as a MATLAB toolbox for

research in nonlinear resonance using gradient frequency neural oscillator

networks. This model is not generally available, but we were able to

obtain it, with full documentation and source code for the purpose of this

study. Below we consider the following components related to setting

up the neural resonance model for the purposes of running the simulations:

a) Input encoding

b) Frequency range of the bank of oscillators

c) Number of oscillators

d) Duration of simulation/stimulation

e) Connectivity of oscillators and number of networks

5.2.1 Input encoding

In order to run simulations using the neural resonance model there is a

technical choice to be made regarding encoding a polyrhythmic stimulus.

There are three possible ways: functions, audio files sampling the stimulus

and MIDI files representing times of events. For the series of simulations

related to the discussion in Chapter 6 and Chapter 8 we have utilised the

bump function (a pulse generating function) to represent the polyrhythmic

stimulus. However, we have conducted additional tests using both of the

alternative methods mentioned above with no major effect on the

results presented here (see Chapter 7).

The choice of polyrhythmic stimuli presented to the model in this

thesis takes advantage of their link to existing empirical data related to

polyrhythm perception, allowing direct comparison with the output of the

simulations. The study of Handel and Oshinsky (1981) is the main reference

for addressing the aforementioned issues.

In the Handel and Oshinsky study the polyrhythmic stimulus presented

to people consists of two individual pulse trains, each of which is delivered

through a different loudspeaker placed in front of the participants. There

are five different types of polyrhythms presented at ten different rates. All

polyrhythms are combinations of two pulse trains exclusively. The options

for each pulse train are a 2-pulse, a 3-pulse, a 4-pulse or a 5-pulse train. In

Chapter 6 an extensive description of the polyrhythmic stimulus is given.

The physical characteristics of the stimulus, such as the amplitude and the

pitch of each pulse train, and the presentation tempo, are experimental

variables that have been found to influence the way people perceive rhythm

(Handel and Oshinsky, 1981). For example, in some of their experiments

where the focus is on examining the influence of the tempo in polyrhythm

perception (i.e., both the amplitude and the pitch for each pulse train

are kept the same), in the case of a really fast repetition rate, e.g., 400

msec, of a 4:3 polyrhythmic stimulus, people choose to tap along with the

coincidence of the two pulse trains every 400 msec. For slower repetition

rates e.g., 1200 msec, people choose to tap along with either of the pulse

trains. For the purposes of this thesis, there is a particular advantage of

having a pool of existing empirical data that focus exclusively on individual

physical characteristics of the stimulus that may influence the way humans

perceive rhythm in a polyrhythm. The advantage relates to the fact that

the current version of the neural resonance model is limited in taking into

account the full range of physical characteristics of a stimulus such as pitch

information. Physical characteristics such as pitch and timbre have been

found to influence rhythm perception, e.g., the human preference for attending

to a low-pitched pulse train rather than a high-pitched one (Handel and Oshinsky, 1981).

From that perspective we can conduct tests with the neural resonance

model that fully resemble existing experimental set ups, i.e., we can feed

into the model the same polyrhythmic stimulus that people were listening

to and subsequently compare human tapping behaviour with the output

of the neural resonance model.
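
To make the structure of such a stimulus concrete, the following sketch computes the onset times of the two pulse trains of an m:n polyrhythm within one repetition of the pattern. It is an illustration under stated assumptions (evenly spaced pulses, both trains starting together at time zero, hypothetical function names) and is not the input encoding used by the software.

```python
from fractions import Fraction


def pulse_train_onsets(n_pulses, repetition_period):
    """Onset times (in seconds) of n evenly spaced pulses within one repetition."""
    period = Fraction(repetition_period)
    return [period * k / n_pulses for k in range(n_pulses)]


def polyrhythm_onsets(m, n, repetition_period):
    """Onsets of both pulse trains of an m:n polyrhythm, keyed by pulse count."""
    return {
        m: pulse_train_onsets(m, repetition_period),
        n: pulse_train_onsets(n, repetition_period),
    }


def coincidences(m, n, repetition_period):
    """Times within one repetition at which both pulse trains sound together."""
    trains = polyrhythm_onsets(m, n, repetition_period)
    return sorted(set(trains[m]) & set(trains[n]))


# A 4:3 polyrhythm repeating every 1.2 seconds (one of the rates in the study).
trains = polyrhythm_onsets(4, 3, "1.2")
print([float(t) for t in trains[4]])
print([float(t) for t in trains[3]])
print([float(t) for t in coincidences(4, 3, "1.2")])
```

For a 4:3 polyrhythm the two trains coincide only once per repetition, which corresponds to the point people tap along with at fast repetition rates.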

5.2.2 Frequency range of the bank of oscillators

The range of natural frequencies of the bank of oscillators is defined

based on the following two aspects of human rhythm perception. The

first aspect is related to the sensorimotor-synchronisation (SMS) range

of human tapping behaviour. This range helps us define the highest

and the lowest frequency of the oscillators in the bank in a way that

reflects the highest and lowest tapping frequencies empirically observed

in sensorimotor synchronisation tasks (Repp, 2005). The identification of

those limits is based on reviewing the literature (London, 2004; Snyder and

Large, 2004; Pressing et al., 1996; Repp, 2005). The second aspect is related

to the observed tapping frequencies in polyrhythm perception reported by

Handel and Oshinsky (1981), which inform us about the granularity of the

range.

The human perception of beat and meter implies that events can be

perceived as being both discrete and periodic at the same time, that is,

events are perceived distinctly in a regular manner. The shortest inter-onset

interval (IOI) necessary for perceiving two events as being distinct is 12.5

msec (Snyder and Large, 2004). With regard to perceiving (or tracking) a

series of events as being periodic, London (2004) suggests that the 100 msec

interval corresponds to the shortest interval we can hear or perform as an

element of a rhythmic figure. This limit is well documented in the literature

as the lower limit for perceiving potential periodicity in successive events

(London, 2004; Snyder and Large, 2004; Pressing et al., 1996; Repp, 2005).

Regarding the upper limit for perceiving periodic events there are several

suggestions about how long it can be, and as Repp (2005) notes, it is a less

sharply defined limit. For example, London (2004, p. 27) suggests that

the longest interval (upper limit) we can "hear or perform as an element

of rhythmic figure is around 5 to 6 secs, a limit set by our capacities to

hierarchically integrate successive events into a stable pattern". Repp (2005)

quotes Fraisse’s upper limit of 1800 msec, and in a more recent study (Repp,

2008) he points to the difficulty of providing evidence for a sharply

defined upper limit in sensorimotor synchronisation (SMS) tasks. In this

study, we are mainly interested in capturing tapping frequencies that are

observed in Handel and Oshinsky’s study and therefore, we consider the

upper limit with regard to these observations. As previously mentioned, we

focus on tapping behaviours observed in a case where the sole experimental

variable is the tempo of the polyrhythm. In such a situation, the concrete

behaviour with which it is appropriate to associate an upper limit on

periodicity is tapping along with the unit pattern of the polyrhythm for

a repetition rate of 1 second or 1000 msec.

As a result we can set the natural frequencies of the oscillators of the

neural resonance model in a way that is arbitrary, but nonetheless informed

by the following considerations:

• the limits observed in SMS studies (as noted above),

• the tapping frequencies observed in Handel and Oshinsky’s study,

• potential tapping frequencies related to plausibly perceivable repetitive

patterns in a polyrhythmic stimulus.

For clarity of presentation it helps to express the sensorimotor

synchronisation (SMS) range in terms of frequencies, trivially by making

use of the F=1/T formula, where F is the frequency and T is the tapping

period in seconds. For example, tapping once every 100 msec implies a

tapping frequency of 10 Hz, while tapping once every 2000 msec implies a

frequency of 0.5 Hz.

Given our focus on polyrhythms as stimuli, Table 5.1 below illustrates

an example of potential tapping frequencies associated with a 4:3

polyrhythmic stimulus for the range of presentation rates studied by

Handel and Oshinsky (1981). The repetition rate of the polyrhythmic

pattern is given in terms of seconds in the first row, and it has been

transformed into frequency terms in the subsequent rows (rounded to 2 d.p.

where necessary).

Table 5.1: Potential periodic human tapping behaviours along with a 4:3 polyrhythm expressed in frequencies for a series of repetition rates.

Repetition rate (sec)                 2.4    2.0    1.8    1.6    1.4    1.2    1.0    0.8    0.6    0.4
Pattern’s frequency (Hz)             0.42   0.50   0.56   0.63   0.71   0.83   1.00   1.25   1.67   2.50
Fundamental of 4-pulse train (Hz)    1.67   2.00   2.22   2.50   2.86   3.33   4.00   5.00   6.67  10.00
1st sub-harmonic of 4-pulse (Hz)     0.83   1.00   1.11   1.25   1.43   1.67   2.00   2.50   3.33   5.00
Fundamental of 3-pulse train (Hz)    1.25   1.50   1.67   1.88   2.14   2.50   3.00   3.75   5.00   7.50
1st sub-harmonic of 3-pulse (Hz)     0.63   0.75   0.83   0.94   1.07   1.25   1.50   1.88   2.50   3.75
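
The entries of Table 5.1 follow from the F=1/T formula. As a hedged sketch (the function names are hypothetical, and half-up rounding to 2 d.p. is assumed because it reproduces the tabulated values), one column of the table can be recomputed as follows:

```python
from decimal import Decimal, ROUND_HALF_UP


def to_hz(value):
    """Round a frequency to 2 decimal places, with halves rounded up (an assumption)."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def tapping_frequencies(repetition_rate_sec, pulses=(4, 3)):
    """Potential tapping frequencies (Hz) for one repetition rate of a polyrhythm.

    repetition_rate_sec is given as a string such as "2.4" so that the decimal
    value is represented exactly.
    """
    period = Decimal(repetition_rate_sec)
    column = {"pattern": to_hz(Decimal(1) / period)}
    for n in pulses:
        fundamental = Decimal(n) / period  # n pulses per repetition of the pattern
        column[f"{n}-pulse fundamental"] = to_hz(fundamental)
        column[f"{n}-pulse 1st sub-harmonic"] = to_hz(fundamental / 2)
    return column


for rate in ("2.4", "1.6", "0.4"):
    print(rate, tapping_frequencies(rate))
```

Exact decimal arithmetic is used here rather than binary floating point so that values lying exactly on a rounding boundary, such as 0.625 Hz, round the same way as in the table.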

Once the frequency range to be incorporated into the neural resonance

model is decided in principle, e.g., 0.42 Hz to 10 Hz according to the

minimum and maximum values of Table 5.1, this decision needs to

be executed within the constraints of the controls of the software that

implements the neural resonance model. In practice, the current version

of the software requires the selection of a central frequency and an even

number of octaves. The octaves are added symmetrically on either side

of the central frequency. So, for example, to obtain a frequency range that

covers tapping frequencies between 0.42 Hz and 10 Hz, we can set the

central frequency of the bank at 2 Hz and add three octaves

on either side, i.e., six octaves in total. In this case, the frequency range will

be from 0.25 Hz to 16 Hz.
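
The symmetric-octave rule can be checked with a few lines of Python. This is only a sketch: the function name and its arguments are hypothetical and are not taken from the software that implements the neural resonance model.

```python
def bank_frequency_range(centre_hz, total_octaves):
    """Return (lowest, highest) frequency in Hz of a bank that spans
    `total_octaves` octaves placed symmetrically around `centre_hz`."""
    if total_octaves % 2 != 0:
        # Assumption taken from the text: an even number of octaves is required.
        raise ValueError("the number of octaves is assumed to be even")
    half = total_octaves // 2
    # Each octave below halves the frequency; each octave above doubles it.
    return centre_hz / 2 ** half, centre_hz * 2 ** half


if __name__ == "__main__":
    low, high = bank_frequency_range(2.0, 6)
    print(f"range: {low} Hz to {high} Hz")
```

With a 2 Hz centre and six octaves, the returned range encloses the 0.42 Hz to 10 Hz span of Table 5.1.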

5.2.3 Number of oscillators

One can define the granularity of the frequency range by making sure

that for each empirically observed or potential tapping frequency there is

an oscillator with a closely related natural frequency. In fact, maximum

resonance occurs for particular ratios such as 1:1, 2:1, 3:2, etc.; thus, we have

to make sure that the oscillators are packed tightly enough in frequency

space in order to match the frequencies defined by the above ratios. For

example, for a 4:3 polyrhythm repeating once every 0.6 seconds, the frequency

of the 4-pulse train is approximately 6.67 Hz (see Table 5.1). As a result, the bank of

oscillators should contain oscillators with frequencies such as 6.67 Hz,

2 × 6.67 Hz, (3/2) × 6.67 Hz and so on. A large number of oscillators per octave,

such as 128, provides a sufficient level of granularity. We have also tried 256

oscillators per octave with no observed difference in the way the neural

resonance model responds. As a result the total number of oscillators is

768 inasmuch as there are 6 octaves and each octave accommodates 128

oscillators.
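
As an illustration of the granularity argument, the sketch below builds a bank of natural frequencies and finds the oscillator closest to a target frequency. It assumes that the oscillators are spaced evenly in log-frequency and that the upper end point (16 Hz) is excluded so that the total is exactly 768; neither detail is confirmed by the text, and the helper names are illustrative.

```python
import numpy as np


def oscillator_frequencies(lowest_hz=0.25, octaves=6, per_octave=128):
    """Natural frequencies spaced evenly in log-frequency (assumed layout)."""
    k = np.arange(octaves * per_octave)
    return lowest_hz * 2.0 ** (k / per_octave)


def nearest_oscillator(freqs, target_hz):
    """Index and natural frequency of the oscillator closest to `target_hz`,
    measured as a distance in octaves."""
    idx = int(np.argmin(np.abs(np.log2(freqs / target_hz))))
    return idx, float(freqs[idx])


if __name__ == "__main__":
    freqs = oscillator_frequencies()
    for target in (1.0, 1.5, 2.0, 3.0, 4.0):
        idx, f = nearest_oscillator(freqs, target)
        print(f"target {target} Hz -> oscillator {idx} at {f:.4f} Hz")
```

Under these assumptions neighbouring oscillators differ by a factor of 2^(1/128), so any target frequency inside the range is matched to within roughly 0.3%.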

5.2.4 Duration of simulation

The duration of simulation of the model should be long enough in order

for the oscillators to reach a steady state. One way that steady states can

be determined is by obtaining spectrograms covering the simulation period

and identifying a point in time at which the amplitude responses of the

oscillators appear to be stable. Figure 5.1 below illustrates a simulation

period of 7 seconds using a 4:3 polyrhythm which is repeated once every

second. The y-axis represents the frequency range of the oscillators,

i.e., 0.25 Hz to 16 Hz. The redness indicates amplitude levels and in

effect shows which oscillators resonate the most in the presence of this

particular polyrhythmic stimulus. For example the oscillators with 3Hz

and 4Hz natural frequencies exhibit the highest amplitude (at least for the

illustrated time period), followed by oscillators, which are harmonically

and subharmonically related to the 3 Hz and 4 Hz frequencies. The

x-axis represents time, which manifests the evolution of the dynamics of

the system over the duration of the simulation. This graph allows us to

illustrate the importance of allowing sufficient time for the system to reach

a steady state and also the importance of averaging the activity only after a

steady state has been reached. For example, if we had only considered the

first two seconds of stimulation and averaged over 100% of that duration we

would have observed only dynamics over the transient phase of the system.

In that case important information about the sub-harmonic responses of

the system with regard to the fundamental frequencies of the stimulus may

have been lost, which is a crucial piece of information for the arguments

made in this thesis. From that perspective, a duration of 12 seconds for

stimulating the neural resonance model with a polyrhythmic stimulus is

sufficient to obtain a frequency response pattern whereby the resonant

oscillators are in steady state.

Figure 5.1: Amplitude response of oscillators to a 4:3 polyrhythmic stimulus. The

polyrhythmic pattern repeats once every second over a period of 7

seconds.

An interesting aspect with regard to the time it takes for the oscillators

to reach a steady state is the potential association of that time with the

time it takes for people to start tapping along with a polyrhythm. A

hypothesis has been suggested and tested by Large (2000) whereby the

time needed for oscillators to reach a steady state squares well with the

time it takes for people to start tapping along with a rhythmic stimulus,

and this hypothesis is tested in this thesis with respect to polyrhythms. In

the Handel and Oshinsky study the polyrhythmic stimuli were initially

presented for 15 seconds, followed by a 5-second silent pause after which

the participants were asked to start tapping. However, the exact time it

takes before participants start tapping is not documented in that paper and

further research did not result in identifying any other paper that explores

this aspect of human behaviour. The hypothesis formulated by Large (2000)

in combination with the fact that Handel and Oshinsky (1981) did not consider

the time it takes for people to start tapping along with a polyrhythm as an

experimental variable opens up an opportunity for making new predictions

about human tapping behaviour based on the model’s outputs which can

be tested by conducting novel experiments with people tapping along with

polyrhythmic stimuli (Chapter 8).

5.2.5 Connectivity and number of networks

In principle, we can have the oscillators of the bank interconnected to each

other in order to form a network of oscillators, which is believed to be

a more accurate representation of the underlying connectedness of neural

populations in the auditory nervous system (Hoppensteadt and Izhikevich,

1996a). However, in this particular series of simulations, partly for reasons

of simplicity, we examine the individual responses of each of the oscillators

to the external polyrhythmic stimulus, i.e., there is no internal coupling

among the oscillators. This chosen arrangement "focuses the response of

the network to the external input" (Large et al., 2010, p. 4). As a further

choice in setting up the model, we could choose to have more than one bank

of oscillators. One argument in favour of assuming more than one bank of

interconnected oscillators (network of oscillators) is related to the fact that

processing of any auditory stimulus takes place in more than one area in

the brain, thus utilising more than one network would be more plausible.

As Velasco suggests (personal communication), two linked networks might

be small patches of tissue in the same local brain area, or they could be

two areas that are far away from each other, for instance, primary auditory

cortex and supplementary motor area (SMA).

Taking into account the above considerations, in the conducted

simulations we have employed a single network of non-interconnected

oscillators, which is the simplest option for starting to test the neural

resonance model. In effect, the simplified version of the model puts

the focus on its non-linear resonance feature, i.e., nonlinear coupling

between stimulus and oscillators. Further investigations involving both

interconnections and more than one network are reserved for future work.

5.2.6 Numerical solvers and nominal values of parameters

The output of the simulations is based on solving numerically a series of

128 ∗ 6 = 768 differential equations, each one describing a neural oscillator

with a unique natural frequency within the 0.25 Hz to 16 Hz range, in the

presence of a particular polyrhythm, i.e., type and presentation rate, over

a particular duration (simulation time), and a set of nominal numerical

values of the parameters that define the dynamic mode of the oscillator.
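
To make the numerical task concrete, the sketch below integrates a small bank of oscillators with a fixed-step fourth-order Runge-Kutta scheme. It is not Equation 7 of Chapter 4 and not the solver used in the thesis: it assumes a truncated Hopf normal form, dz/dt = z(alpha + i·2πf + beta·|z|²) + c·x(t), with linear stimulus coupling. The parameter values, the step size and the function name are placeholders, not the nominal values discussed in Chapter 7. Because the higher-order coupling terms are omitted, the sketch shows only the solver loop and the 1:1 resonance, not the harmonic and sub-harmonic responses of the full model.

```python
import numpy as np


def simulate_bank(freqs_hz, stimulus, dt, alpha=-0.5, beta=-1.0, coupling=1.0):
    """Integrate dz/dt = z(alpha + i*2*pi*f + beta*|z|^2) + coupling*x(t)
    for every natural frequency f, using fixed-step RK4.

    `stimulus` must be sampled every dt/2 seconds (RK4 evaluates midpoints),
    i.e. it holds 2*n_steps + 1 samples. Returns |z| with shape
    (n_steps + 1, n_oscillators).
    """
    omega = 2j * np.pi * np.asarray(freqs_hz, dtype=float)
    n_steps = (len(stimulus) - 1) // 2

    def rhs(z, x):
        # Assumed truncated normal form; not the thesis's Equation 7.
        return z * (alpha + omega + beta * np.abs(z) ** 2) + coupling * x

    z = np.zeros(len(omega), dtype=complex)
    amplitude = np.empty((n_steps + 1, len(omega)))
    amplitude[0] = np.abs(z)
    for n in range(n_steps):
        x0, x_mid, x1 = stimulus[2 * n], stimulus[2 * n + 1], stimulus[2 * n + 2]
        k1 = rhs(z, x0)
        k2 = rhs(z + 0.5 * dt * k1, x_mid)
        k3 = rhs(z + 0.5 * dt * k2, x_mid)
        k4 = rhs(z + dt * k3, x1)
        z = z + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        amplitude[n + 1] = np.abs(z)
    return amplitude


if __name__ == "__main__":
    dt, n_steps = 0.002, 6000                      # 12 s of simulated time
    t = np.arange(2 * n_steps + 1) * (dt / 2)      # stimulus grid at dt/2
    amp = simulate_bank([1.0, 2.0, 3.0, 4.0, 5.0], np.cos(2 * np.pi * 3.0 * t), dt)
    print(amp[int(0.8 * n_steps):].mean(axis=0))   # mean amplitude, last 20%
```

In this toy setting a sinusoidal 3 Hz input drives the 3 Hz oscillator most strongly; the same loop applies unchanged to a bank of 768 natural frequencies.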

The nominal values are defined on the basis of certain criteria such

as the fact that we are interested in the dynamics of an oscillator near

an Andronov-Hopf bifurcation point. However, the selection of certain

numerical solvers and numerical values for running the simulations raises

questions regarding the stability and consistency of the model in producing

certain results. Additionally, Equation 7, derived in Chapter 4, is

a canonical representation of a neural oscillator as a dynamic system and,

as a result, it is more abstract, at least in comparison to the Wilson and

Cowan (1972) model with its extended range of physiological parameters.

The above concerns can be thought of as sources of uncertainty in the

modelling process in general and in the observed outputs of the neural

resonance model in particular. For that reason the notion of validation can

be extended to include additional techniques, such as parameter sensitivity

analysis and cross-model comparison that can help with limiting the degree

of uncertainty. The above issues are addressed in Chapter 7 - Model

Analysis. Additionally, detailed discussion on the nominal values of the

parameters is also reserved for Chapter 7.

5.3 experimental method

This section is devoted to describing the experimental method of this

thesis. Briefly, the method refers to conducting experiments with people

listening to polyrhythms and monitoring tapping behaviours. The reason

for conducting novel experiments with people listening to polyrhythms has

been briefly touched upon in Subsection 5.2.4 of this chapter. For example,

in discussing how to define the duration of the simulation an observation

was made whereby the time it takes for people to start tapping along with

a polyrhythm can be considered as an experimental variable itself. Also,

results from the first study, which are presented in Chapter 6, show that the

model predicts a wider range of tapping behaviours compared to the one

reported by Handel and Oshinsky (1981), therefore an additional reason for

conducting experiments with people has been identified. Lastly, conducting

experiments with people listening to polyrhythms is a good opportunity to

test for reproducibility of the data provided by Handel and Oshinsky.

5.3.1 Participants

There were a total of 37 participants. They were recruited across faculties

at the Open University, UK, and they represented a wide range of musical

training and ages. No musical experience or expertise was required. The

only requirement to participate in the study was to have a general interest

or habit in tapping to music e.g., tapping on the steering wheel while

listening to music in the car. The musical background was recorded

by asking the participants to fill in a personal data form prior to the

experiment. No distinction related to age, sex or musical background was

made in the analysis presented in Chapter 8, although we have collected

such information. The study was approved by the Open University’s

Human Research Ethics Committee (HREC) and covered by the Open

University’s insurance indemnity scheme.

5.3.2 Polyrhythmic stimulus and presentation rates

The experimental set up is similar to the one described by Handel and

Oshinsky (1981). Five types of polyrhythms at ten different rates formed

the 50 conditions that were presented to people. The types of polyrhythms

are 3:2, 4:3, 5:2, 5:3 and 5:4 and the presentation rates in terms of

seconds/pattern repetition are 2.4, 2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 0.8, 0.6, and

0.4 sec. Each pulse train is conveyed by a metronomic click of 30 msec

duration, thus the two pulse trains sound identical.
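
The 50 conditions and the click onsets of each pulse train can be enumerated as in the sketch below. This is an illustrative reconstruction in Python, not the Max/MSP patch used in the experiment; the function names and the use of a seeded shuffle for the random ordering of conditions are assumptions.

```python
import itertools
import random

POLYRHYTHMS = [(3, 2), (4, 3), (5, 2), (5, 3), (5, 4)]
PERIODS_S = [2.4, 2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 0.8, 0.6, 0.4]


def build_conditions(seed=None):
    """All (polyrhythm, repetition period) pairs in a random order."""
    conditions = list(itertools.product(POLYRHYTHMS, PERIODS_S))
    random.Random(seed).shuffle(conditions)
    return conditions


def click_onsets(pulses, period_s, repetitions=1):
    """Onset times (s) of an isochronous train with `pulses` clicks per
    pattern repetition of length `period_s`."""
    return [rep * period_s + k * period_s / pulses
            for rep in range(repetitions)
            for k in range(pulses)]


if __name__ == "__main__":
    print(len(build_conditions(seed=0)), "conditions")
    print("4-pulse onsets:", click_onsets(4, 1.0))
    print("3-pulse onsets:", click_onsets(3, 1.0))
```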

5.3.3 Experimental task

The participants were asked to listen to the presented polyrhythm and start

tapping the perceived beat as soon as they felt it. The task of tapping

the beat was distinguished from shadowing or copying the pattern, i.e.,

participants were asked to tap in a steady pulse-like manner. We

emphasised that participants should start tapping as soon as they felt the

beat, so that the first tap would be as close as possible to the time it takes for them to

start perceiving a beat. Also, they were instructed to retain a single beat

- once it was chosen - until the end of the trial. In situations where an

odd pulse is presented and people feel like tapping along with every other

element of the odd pulse, it consequently forces them into a tapping mode

that alternates between in phase and out of phase locking with the unit

pattern, which is likely to make them feel like changing beat reference.

Additionally, participants were asked to start tapping using their finger

by pressing the spacebar on a QWERTY keyboard, and to refrain from

first tapping with their foot and then copying this with their finger. On each

trial the participants were responsible for initiating the presentation of the

polyrhythmic stimulus on demand - with a small delay to provide them

time to get ready. Participants were asked to tap approximately 7-10

times before they could stop the current trial and move on to the next

one, or to keep tapping until they felt that their perception of the beat had

settled. When the polyrhythm is first heard a timer begins. Each time the

participant presses the spacebar the current value of the timer in msec is

recorded. At the end of the trial the ID of the polyrhythm together with the

timing data described above constitute the data associated with one trial or

condition. The inter-tap times, in combination with the ID, provide sufficient

information to derive which beat was chosen by a participant. For example,

consider one condition where a 4:3 polyrhythm is presented at the rate of

one repetition per second. Then if the inter-tap-time data is around 250

msec it means that the participant has chosen to tap along with the 4-pulse

train. Also, the first tap is taken to estimate the time it takes for participants

to perceive a beat in a given polyrhythm. The same process was repeated

for all 50 conditions, and in combination with the 37 participants a pool of

1850 trials was obtained. For the above experimental design we used the

programming language Max/MSP to implement a patch that would carry

out the task.
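
The derivation of the chosen beat from the recorded timer values can be sketched as follows. This is a hypothetical Python reconstruction, not the Max/MSP patch: the use of the median inter-tap time, the 10% tolerance, and all names are assumptions introduced for illustration.

```python
import statistics


def classify_taps(tap_times_ms, m, n, period_s, tolerance=0.1):
    """Match a trial's tapping frequency to the candidate beats of an m:n
    polyrhythm whose pattern repeats every `period_s` seconds."""
    if len(tap_times_ms) < 2:
        raise ValueError("at least two taps are needed to form an interval")
    intervals = [later - earlier
                 for earlier, later in zip(tap_times_ms, tap_times_ms[1:])]
    tap_hz = 1000.0 / statistics.median(intervals)
    pattern_hz = 1.0 / period_s
    candidates = {
        "pattern repetition": pattern_hz,
        f"{m}-pulse train": m * pattern_hz,
        f"every other element of the {m}-pulse train": m * pattern_hz / 2,
        f"{n}-pulse train": n * pattern_hz,
        f"every other element of the {n}-pulse train": n * pattern_hz / 2,
    }
    # Pick the candidate with the smallest relative frequency deviation.
    label, target_hz = min(candidates.items(),
                           key=lambda item: abs(tap_hz - item[1]) / item[1])
    if abs(tap_hz - target_hz) / target_hz > tolerance:
        label = "unclassified"
    return {"first_tap_ms": tap_times_ms[0], "tap_hz": tap_hz, "beat": label}


if __name__ == "__main__":
    taps = [1480, 1735, 1982, 2236, 2484, 2731, 2985]  # made-up timer values
    print(classify_taps(taps, m=4, n=3, period_s=1.0))
```

For some polyrhythms two candidates coincide (for 3:2, every other element of the 2-pulse train equals the pattern repetition); in that case the sketch reports whichever label is listed first.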

5.3.4 Stimulus presentation

The experimental session took place in a room with just the experimenter

and participant present. Participants were seated in front of two adjacent

speakers, each of which presented one pulse train. All patterns were

presented at a comfortable listening level. The order of presentation of

each condition was random, as was the presentation of each pulse from the

speakers.

5.3.5 Method of comparing modelled with empirical data

In the last part of this chapter, the nature of comparison between modelled

and empirical data is discussed. In particular, there are two aspects of

human tapping behaviour in polyrhythm perception that are compared

with the responses of the neural resonance model in the presence of

polyrhythmic stimuli. The first aspect of human tapping behaviour in

polyrhythm perception is the range of periodic tapping modes, i.e., tapping

frequencies, that people exhibit when confronted by a polyrhythmic

stimulus. The second aspect is the time it takes for people to start tapping

along with a polyrhythmic stimulus.

In the first case the comparison between human tapping modes and

responses of the neural resonance model in the presence of a commonly

shared polyrhythmic stimulus is conducted on the following basis. The

human tapping modes are expressed in terms of tapping frequencies (see

for example Table 5.1 - Subsection 5.2.2) for different types of polyrhythms

and different presentation rates. Next, the neural resonance model is

stimulated with a polyrhythmic stimulus from the same range as the

one presented to people and a frequency response pattern emerges. The

frequency response pattern is the average amplitude response of each

oscillator in the presence of a polyrhythmic stimulus over the last 20%

of the total duration of the simulation. The reason for averaging over

the last 20% is the need to observe the dynamics of

the oscillators once a steady state is reached, i.e., ignoring the transient

stage, as has already been described. The comparison between empirical

and modelled data is then based on identifying the extent to which the

resonances of the frequency response pattern of the model match human

tapping frequencies. An example of such comparison was provided in

Chapter 4, whereby the frequency response pattern of the neural resonance

model in the presence of a series of metronomic clicks has been matched

with the way humans perform various levels of the metrical structure such

as eighth, quarter, half and sixteenth notes. Chapter 6 is devoted to the

above type of comparison for all types and tempi of polyrhythms presented

in the Handel and Oshinsky study. In particular, the types of polyrhythms

that are fed into the neural resonance model are 3:2, 4:3, 2:5, 3:5, 4:5 and the

repetition rates in sec are 2.4, 2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 0.8, 0.6 and 0.4. The

obtained frequency response patterns of the model are then compared with

the human tapping frequencies reported by Handel and Oshinsky (1981).
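
The averaging step that produces the frequency response pattern can be sketched as below. The array layout (time samples in rows, oscillators in columns), the simple local-maximum rule for locating resonance peaks, and the function names are assumptions made for illustration; they are not taken from the software used in the thesis.

```python
import numpy as np


def frequency_response(amplitude, fraction=0.2):
    """Mean amplitude of each oscillator over the final `fraction` of the
    simulation. `amplitude` has shape (n_time_samples, n_oscillators)."""
    amplitude = np.asarray(amplitude, dtype=float)
    start = int(round((1.0 - fraction) * amplitude.shape[0]))
    return amplitude[start:].mean(axis=0)


def main_oscillators(freqs_hz, response, min_height=0.0):
    """(frequency, amplitude) pairs at local maxima of the response."""
    r = np.asarray(response, dtype=float)
    is_peak = (r[1:-1] > r[:-2]) & (r[1:-1] >= r[2:]) & (r[1:-1] > min_height)
    return [(float(freqs_hz[i]), float(r[i]))
            for i in np.flatnonzero(is_peak) + 1]


if __name__ == "__main__":
    # Toy record: 10 time samples, 5 oscillators, activity only at the end.
    toy = np.zeros((10, 5))
    toy[8:] = [0.0, 1.0, 0.0, 3.0, 0.0]
    resp = frequency_response(toy)
    print(resp, main_oscillators([1, 2, 3, 4, 5], resp))
```

For a 12-second simulation, `fraction=0.2` corresponds to averaging over the final 2.4 seconds.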

In the second case, the comparison of human behaviour and model

responses is made by comparing the time it takes for people to start

tapping along with a polyrhythm and the time it takes for an oscillator

to reach a steady state, i.e., its relaxation time. This comparison is based on a hypothesis made

by Large (2000) who observed that the time it takes for humans to start

tapping along with a stimulus squares well with the relaxation time of an

oscillator. Chapter 8 is fully devoted to presenting the results of the above

comparison. Empirical data for this comparison are obtained by conducting

novel experiments following the method described in Section 5.3. The

modelled data are obtained from the amplitude response graphs of the

bank of oscillators (see Figure 5.1) for different types of polyrhythms and

presentation rates.
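
One possible way of reading a time to steady state off an amplitude response is sketched below. The criterion used here (the amplitude stays within ±5% of its final level, where the final level is the mean over the last 20% of the record) is an assumption introduced for illustration; the thesis identifies the steady state from the amplitude response graphs and does not prescribe this rule.

```python
import numpy as np


def time_to_steady_state(times_s, amplitude, band=0.05, fraction=0.2):
    """Earliest time after which `amplitude` stays within a relative `band`
    of its final level. Returns None if it never settles in the record."""
    t = np.asarray(times_s, dtype=float)
    a = np.asarray(amplitude, dtype=float)
    start = int(round((1.0 - fraction) * len(a)))
    final = a[start:].mean()
    outside = np.abs(a - final) > band * abs(final)
    if not outside.any():
        return float(t[0])
    last_outside = int(np.flatnonzero(outside)[-1])
    if last_outside + 1 >= len(a):
        return None
    return float(t[last_outside + 1])


if __name__ == "__main__":
    t = np.linspace(0.0, 12.0, 1201)
    envelope = 1.0 - np.exp(-t)          # toy amplitude with a 1 s time constant
    print(time_to_steady_state(t, envelope))
```

The value obtained for an oscillator could then be set against the first-tap times recorded in the experiment.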

5.3.6 Justification of selected method of comparison

We should note that the neural resonance model simulates populations

of neurons and their dynamics in the presence of a rhythmical stimulus

over time. From that perspective, the empirical data should ideally be

collected from monitoring electrical activity of the brain over time by

using brain-imaging techniques, and in particular, imaging of the brain

while it is confronted by a polyrhythmic stimulus. However, obtaining

this kind of empirical data requires identification and spatial mapping of

each single neural oscillator independently, which is beyond the reach

of this thesis. Identifying individual oscillators that behave in the way

the neural resonance theory hypothesises would have provided strong

empirical evidence in support of the theory, but this type of evidence

is currently impractical to obtain.

In fact, most theories that attempt to explain human behaviours do not

involve descriptions at the level of neuronal activity. Instead, the approach

is to explain how brain dynamics might implement abstract computational

mechanisms required by cognitive theories, e.g., (Jackendoff, 2003). The

theory of neural resonance on the other hand can be thought of as a

behaviour-level theory according to which musical behaviour may not

require postulation of abstract computational mechanisms, but may be

explainable directly in terms of neurodynamics (Large et al., 2010). From

that perspective, a comparison that involves simulated brain dynamics

of the neural resonance model with empirical data related to tapping

behaviours seems to be reasonable. Indeed, a study by Pollok, Gross,

Müller, Aschersleben, and Schnitzler (2005) has shown that when people

are asked to tap along with a periodic stimulus, activity in the brain that

shares the same frequency with the tapping frequency can be observed.

Additionally, brain activity with frequency that corresponds to the second

and third harmonic of the tapping frequency has also been identified.

Lastly, in Chapter 4, a number of existing validation studies of the

neural resonance model were described in which the standard technique

is to compare modelled brain dynamics with human tapping behaviours

observed in rhythm perception experiments, which is exactly the same

method of comparison as the one utilised in this thesis.

6 VALIDATION OF THE MODEL AGAINST EXISTING EMPIRICAL DATA

In this chapter we examine the extent to which the behaviour of the neural

resonance model in the presence of polyrhythmic stimuli can account

for polyrhythm perception as this is manifested by tapping behaviours

reported in the empirical study of Handel and Oshinsky (1981). In

particular, we express regular tapping behaviours by people in terms of

tapping frequencies, which can then be compared with the frequency

response pattern of the neural resonance model. The neural resonance

model was stimulated with similar polyrhythmic stimuli to those used by

Handel and Oshinsky (1981) in their experiments with human subjects. In

particular, Handel and Oshinsky used polyrhythmic stimuli comprising

two pulse trains each one perfectly isochronous, i.e., completely regular.

Furthermore, the model was stimulated with the same type of

polyrhythms, but this time the pulse trains exhibit temporal fluctuations

with respect to perfect isochrony. This provides a form of stimulus that

more closely resembles the context within which actual music is performed and perceived.

Two different stimuli were used to stimulate the neural resonance model.

In the first one, a musician performs a polyrhythm in the presence of a

metronomic click, thus tempo variation is restricted but local deviations

from perfect isochrony are expected. In the second case, the polyrhythm

is performed in the absence of a metronomic click (after four initial clicks),

which allows for tempo variation over time as well as local deviations from

perfect isochrony. The last two cases aim to test the model with stimuli that

resemble the way actual polyrhythmic music is performed.

Lastly, we explore whether the behaviour of the model can provide

further insight with regard to the formation of human tapping preferences

as a function of absolute tempo (Handel and Oshinsky, 1981). In

particular, we examine whether different tempi of a particular polyrhythm

influence the way the model behaves and whether an assumed difference

in the behaviour of the model can be somehow linked to the patterns of

preferences empirically observed.

The chapter begins by restating the categories of human tapping

behaviours in polyrhythm perception observed in the Handel and Oshinsky

study (see Chapter 2), which provides the reference point for making

comparisons between empirical and modelled data. Several different

polyrhythms over a range of tempi are featured in Handel and Oshinsky’s

work, and also in the present research. It is useful for purposes of

exposition, and in order to avoid confusion, to select one representative

polyrhythm to use as a workhorse example. We have chosen the 4:3

polyrhythm with a 1 Hz repetition frequency as a good representative for

the family of polyrhythms consisting of two pulse trains (3:2, 2:5, 3:5, 4:5).

The output of the neural resonance model is then presented and the extent

to which human tapping behaviour can be understood on the basis of that

output is discussed.

6.1 preparing to compare human with model behaviour

In Chapter 2 four categories were identified to describe the way people

perceive beat and/or meter in polyrhythms. In Handel and Oshinsky’s

words (1981, p.2):

"Polyrhythms permit many possible meter organisations: (1) a

pulse train can define the meter, achieved by tapping each event

in that pulse train; (2) a pattern repetition can define the meter,

achieved by tapping each co-occurence of the 2-pulse trains; and

(3) alternative elements of a pulse train can define the meter,

achieved, for example by tapping every second element of a

4-pulse train" .

For example, given a 4:3 polyrhythm the general categories identified by

Handel and Oshinsky with regard to how people perceive beat and meter

in polyrhythms are as follows.

1. Tapping along with the coincidence of the two pulse trains

2. Tapping along with the 4-pulse train

3. Tapping along with every other element of the 4-pulse train

4. Tapping along with the 3-pulse train

5. Tapping along with every other element of the 3-pulse train

As previously noted, in order to facilitate comparison between the

tapping behaviours listed above and the frequency response pattern of the

neural resonance model in the presence of some rhythmic stimulus it is

useful to express the former in frequency terms. Recall that the way the

model responds in the presence of a rhythmical stimulus is to resonate at

frequencies related to the metrical structure of the stimulus (see Chapter

4, 4.5). Therefore by expressing tapping behaviours in frequency terms we

can make a direct comparison between the frequency response pattern of

the neural resonance model and the tapping frequencies for a commonly

shared type of polyrhythmic stimulus, e.g., a 4:3 polyrhythm with 1 Hz

repetition frequency.

Table 6.1 below expresses in frequency terms the human tapping

behaviours listed above for a case where the 4:3 polyrhythmic pattern is

repeated once per second¹.

Table 6.1: Periodic tapping behaviours along with a 4:3 polyrhythmic stimulus expressed in terms of frequencies, when the repetition frequency of the polyrhythmic pattern is 1 Hz.

1. Tapping along with the 4-pulse train.                            4 Hz
2. Tapping along with the 3-pulse train.                            3 Hz
3. Tapping along with the co-occurrence of the two pulses.          1 Hz
4. Tapping along with every second element of the 4-pulse train.    2 Hz
5. Tapping along with every second element of the 3-pulse train.    1.5 Hz

Before we move on to consider the way the neural resonance model

responds to the presence of such a stimulus it is worth noting and briefly

analysing the relation of the tapping behaviours to the constituent pulse

trains. On the one hand, we should note that in the above example

the polyrhythm consists of two pulse trains exclusively, with rhythmic

frequencies of 4 Hz and 3 Hz. On the other hand, the final three tapping

behaviours noted in Table 6.1, cases 3 to 5, imply a tapping behaviour

that is subharmonically related to the fundamental frequencies of the

two physically presented pulse trains. This kind of relation represents

an important characteristic of human rhythm perception, which is the

ability of people to perceptually construct a rhythm that is subharmonically

related to a presented rhythm. In fact, this relation indicates the ability of

people to perceive metrical structure in a rhythmic stimulus, as discussed in

detail in Chapter 2 (see Section 2.3). There it was further suggested that any

model that aims to simulate rhythm perception should be able to account

for this.

¹ In Chapter 5, tapping behaviours were interpreted in terms of frequencies for the whole range of polyrhythmic pattern durations in the Handel and Oshinsky study.

6.2 simulations with isochronous polyrhythmic stimuli

Our next step is to apply polyrhythmic stimuli to the neural resonance

model. As previously noted, for purposes of exposition we have so far

focused on a 4:3 polyrhythm presented at 1 Hz, but in fact Handel and

Oshinsky used five different types of polyrhythms, and presented these

stimuli to human subjects at a range of ten different tempi. A similar

range of stimuli and a similar range of tempi to those used by Handel

and Oshinsky were applied to the neural resonance model. More generally, the

polyrhythmic stimuli were structurally regular, i.e., individual pulse trains

are isochronous.

6.2.1 Simulation with a 4:3 polyrhythmic stimulus repeating once per second

The duration of the simulation was decided on the basis of allowing

enough time for the system (bank of oscillators) to reach a steady state. In

this simulation the duration is 12 seconds, which was decided by observing

the amplitude response of the bank of oscillators (see Chapter 5, Subsection

5.2.4). Figure 6.1 is a time series representation of a 4:3 polyrhythmic

stimulus repeating once per second composed of two pulses with rhythmic

frequencies of 4 Hz and 3 Hz respectively. The stimulus was created using

the ’bump’ function.
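
A stimulus of this kind can be approximated as the sum of two periodic trains of short pulses. The sketch below is only a stand-in: the definition of the 'bump' function is not given in this section, so a raised-cosine pulse of 30 msec is assumed here, and the function names are hypothetical.

```python
import numpy as np


def pulse_train(t, rate_hz, width_s=0.03):
    """Train of raised-cosine pulses (an assumed stand-in for 'bump'),
    one pulse every 1/rate_hz seconds, each `width_s` seconds long."""
    phase = np.mod(t, 1.0 / rate_hz)              # time since the latest onset
    bump = 0.5 * (1.0 - np.cos(2.0 * np.pi * phase / width_s))
    return np.where(phase < width_s, bump, 0.0)


def polyrhythm(t, m=4, n=3, repetition_hz=1.0, width_s=0.03):
    """Sum of an m-pulse and an n-pulse train sharing one repetition rate."""
    return (pulse_train(t, m * repetition_hz, width_s)
            + pulse_train(t, n * repetition_hz, width_s))


if __name__ == "__main__":
    t = np.arange(0.0, 2.0, 0.001)                # two cycles of the pattern
    x = polyrhythm(t)
    print(x.shape, float(x.max()))
```

Where the two trains coincide, at the start of each pattern repetition, the pulses add, so the combined signal peaks at twice the height of a single pulse.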

Figure 6.1: Time series representation of the 4:3 polyrhythmic stimulus. The stimulus is a combination of two pulses of 4 Hz and 3 Hz respectively. The graph represents two cycles of completion of the polyrhythmic pattern.

Figure 6.2 below shows the frequency response pattern of the neural

resonance model in the presence of the 4:3 polyrhythmic stimulus with 1

Hz presentation rate. The frequency response pattern shown is averaged

over the last 20% of the stimulation period, i.e., 2.4 seconds.

Each amplitude peak in the frequency response graph results from the

activation of several oscillators (each dot represents one of 768 oscillators).

However, in each peak there is generally one particular oscillator that

exhibits a maximal amplitude response and we refer to it as the main

oscillator. For ease of comparison, in Figure 6.2 some amplitude peaks have

been labelled in bold type to indicate that they correspond to the categories

of human tapping frequencies described by Handel and Oshinsky. These

main oscillators are listed below:

An oscillator with 3 Hz natural frequency

An oscillator with 4 Hz natural frequency

An oscillator with 1 Hz natural frequency

An oscillator with 2 Hz natural frequency

An oscillator with 1.5 Hz natural frequency

In addition to the peaks that correspond to human observed tapping

frequencies, there are other peaks. These peaks generally correspond to

oscillators with natural frequencies harmonically related to the frequencies

of the main oscillators noted above. For example, in Figure 6.2 below we

note peaks corresponding to the 2nd and 3rd harmonics of the 3 Hz and 4 Hz oscillators, and also peaks corresponding to the 5th and 7th harmonics

of the 1 Hz oscillator. However, these peaks (grey-labelled) do not appear to correspond directly to any of the categories of human behaviour described in the Handel and Oshinsky study. Despite this, some of these harmonic responses, such as 6 Hz and 8 Hz, are attainable tapping frequencies for people, according to the literature on the limits of sensorimotor synchronisation (see Chapter 5, Subsection 5.2.2).

This kind of observation opens up an opportunity for repeating

experiments similar to the Handel and Oshinsky study to refine our

understanding about the range of tapping behaviour in polyrhythms

further. The results of these empirical studies are presented in Chapter

8.

Figure 6.2: Frequency response of the neural resonance model in the presence of a 4:3 polyrhythm. The figure illustrates the average amplitude over the last 20% of the stimulation time. The main oscillators in the black labelled peaks represent oscillators with natural frequencies matching human tapping frequencies (e.g., fundamental of polyrhythm, 1st sub-harmonic of the 3-pulse train, 1st sub-harmonic of the 4-pulse train or 2nd harmonic of the polyrhythm, fundamental of the 3-pulse and 4-pulse train). The grey-labelled peaks are formed by oscillators with natural frequencies that correspond to harmonics of both pulse trains (e.g., 2nd and 3rd harmonics of both 4-pulse and 3-pulse train, and harmonics of the polyrhythm’s fundamental frequency), and more odd overtones.

6.2.2 Discussion on simulation

We have already mentioned that the output of the simulation accounts for

all the categories of human tapping frequencies observed in the Handel and

Oshinsky study. In particular, with regard to the five tapping behaviours

related to a 4:3 polyrhythmic stimulus (the five cases in Table 6.1), the

neural resonance model produces an output that matches all of them.

As mentioned earlier, it is interesting that the output of the neural

resonance model includes resonances that are subharmonically related to

the fundamental frequencies of the two pulse trains of the polyrhythmic

stimulus. There is no pulse train explicitly present in the initial 4:3

polyrhythmic stimulus that matches any of these subharmonic responses,

i.e., the polyrhythmic stimulus is solely comprised of two pulses of 4 Hz

and 3 Hz. Mathematically, these subharmonic responses by the neural

resonance model can be attributed to the expansion of the higher order

terms of the canonical formulae described in Chapter 4. In particular,

this expansion leads to resonant monomials that correspond to harmonics,

subharmonics, and higher order combinations of the input frequencies

(Large et al., 2010). It is instructive to compare the frequency response of

the neural resonance model with a bank of linear oscillators stimulated with

the same 4:3 polyrhythmic stimulus. Figure 6.3 below shows the response

of a bank of linear oscillators.

Figure 6.3: Response of a series of linear oscillators in the presence of a 4:3 polyrhythmic stimulus that repeats once per second. The polyrhythm consists of two pulses of 4 Hz and 3 Hz.

In comparison to the human tapping behaviours (Table 6.1) and to the

output of the neural resonance model (Figure 6.2), it is worth noting

that there is no response at all corresponding to any of the subharmonic

responses. This bank of linear oscillators is essentially the neural resonance

model with its nonlinear parameter (ε) set to zero. Thus, the subharmonic

responses in the output of the neural resonance model may be attributed to

the non-linear coupling between the stimulus and the oscillators.
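One way to see why a linear bank cannot produce these responses is to inspect the spectrum of the stimulus itself: in the steady state, a linear time-invariant system can only respond at frequencies that carry energy in its input. The sketch below, which for simplicity encodes each pulse as a single-sample impulse rather than a bump (an assumption made for the example, as are all names and the sampling rate), evaluates the Fourier magnitude of a 4:3 stimulus at the frequencies discussed above.

```python
import cmath

SAMPLE_RATE = 240   # Hz; divisible by 3 and 4, so both trains are exactly periodic
DURATION = 2        # seconds; a whole number of cycles for every probed frequency


def impulse_train(freq_hz):
    """Unit impulses at an integer rate of freq_hz pulses per second."""
    period = SAMPLE_RATE // freq_hz
    return [1.0 if k % period == 0 else 0.0
            for k in range(SAMPLE_RATE * DURATION)]


def spectral_magnitude(signal, freq_hz):
    """Magnitude of the Fourier coefficient of `signal` at freq_hz."""
    acc = sum(x * cmath.exp(-2j * cmath.pi * freq_hz * k / SAMPLE_RATE)
              for k, x in enumerate(signal))
    return abs(acc) / len(signal)


# 4:3 polyrhythm repeating once per second: a 4 Hz train plus a 3 Hz train.
stimulus = [a + b for a, b in zip(impulse_train(4), impulse_train(3))]
magnitudes = {f: spectral_magnitude(stimulus, f)
              for f in (1.0, 1.5, 2.0, 3.0, 4.0)}
```

Under these assumptions the magnitudes at 3 Hz and 4 Hz are non-zero, whereas those at 1 Hz, 1.5 Hz and 2 Hz vanish up to rounding error, so any response at those frequencies has to arise from the nonlinearity rather than from the stimulus spectrum.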

6.2.3 Simulations using the 4:3 polyrhythm at different tempi

By considering a single polyrhythm at a single tempo, we have already

been able to demonstrate that the neural resonance model can account

for tapping behaviours that are not accounted for by any simple linear

resonator model. However, the available empirical data on human tapping

behaviour from Handel and Oshinsky extends much further, since it

encompasses different polyrhythms at different tempi. Consequently, the

next step is to consider what happens when the neural resonance model is

stimulated with the 4:3 polyrhythm, at different tempi.

The following graphs illustrate the frequency response pattern of the

neural resonance model in the presence of a 4:3 polyrhythmic stimulus at

all the remaining nine repetition frequencies of the Handel and Oshinsky

study. The frequency response pattern of the model is in essence the same

for all different repetition frequencies, apart from being shifted gradually to

the right on the x-axis. As a result, the discussion regarding the frequency

response pattern of the model in the presence of a 4:3 polyrhythm repeating

once per second applies for all different repetition frequencies presented

below. For each tempo, the duration of stimulation, of which the last 20% is averaged to give the frequency response, was chosen to allow sufficient time for the oscillators to reach relaxation (steady state). Given

that the lower the frequency of an oscillator the longer it takes for it to

relax, polyrhythmic stimuli with slow repetition frequencies will need to

stimulate the bank of oscillators for longer periods than stimuli with fast

repetition frequencies.

Figure 6.4 shows the frequency response pattern resulting from a 4:3

polyrhythmic pattern stimulus that was repeated once every 2400 msec for

approximately 28.8 seconds. The two pulse trains comprising the actual

polyrhythmic stimulus presented to the model have rhythmic frequencies

of 1.25 Hz (3-pulse train) and 1.67 Hz (4-pulse train). For the other tempi

used in presenting the current polyrhythm, information about the tempo

and the rhythmic frequencies of the pulse trains comprising the stimulus

is provided exclusively in the captions of the respective figures (Figures 6.4 - 6.12).
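The rhythmic frequencies quoted in the captions follow directly from the repetition period: an n-pulse train repeated once every T seconds has a rhythmic frequency of n/T Hz. A small sketch (function and variable names assumed for illustration) that tabulates these values for the ten repetition periods used in this section:

```python
# Repetition periods of the 4:3 pattern in milliseconds, as used in this section.
PERIODS_MS = (2400, 2000, 1800, 1600, 1400, 1200, 1000, 800, 600, 400)


def pulse_train_frequencies(period_ms, pulses=(3, 4)):
    """Rhythmic frequency in Hz of each n-pulse train for a given period."""
    return tuple(n * 1000.0 / period_ms for n in pulses)


table = {p: pulse_train_frequencies(p) for p in PERIODS_MS}

for period_ms, (f3, f4) in table.items():
    print(f"{period_ms:>5} ms   3-pulse: {f3:5.2f} Hz   4-pulse: {f4:5.2f} Hz")
```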

Figure 6.4: Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 2400 msec. The polyrhythmic stimulus consists of a 1.25 Hz 3-pulse train and a 1.67 Hz 4-pulse train.

Figure 6.5: Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 2000 msec. The polyrhythmic stimulus consists of a 1.5 Hz 3-pulse train and a 2 Hz 4-pulse train.

Figure 6.6: Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 1800 msec. The stimulus consists of a 1.67 Hz 3-pulse train and a 2.22 Hz 4-pulse train.

Figure 6.7: Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 1600 msec. The stimulus consists of a 1.88 Hz 3-pulse train and a 2.5 Hz 4-pulse train.

Figure 6.8: Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 1400 msec. The stimulus consists of a 2.14 Hz 3-pulse train and a 2.86 Hz 4-pulse train.

Figure 6.9: Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 1200 msec. The stimulus consists of a 2.5 Hz 3-pulse train and a 3.33 Hz 4-pulse train.

Figure 6.10: Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 800 msec. The stimulus consists of a 3.75 Hz 3-pulse train and a 5 Hz 4-pulse train.

Figure 6.11: Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 600 msec. The stimulus consists of a 5 Hz 3-pulse train and a 6.67 Hz 4-pulse train.

Figure 6.12: Frequency response of the model in the presence of a 4:3 polyrhythm repeating once every 400 msec. The stimulus consists of a 7.5 Hz 3-pulse train and a 10 Hz 4-pulse train.

We have now stimulated the model with a single polyrhythm at a range

of tempi. The neural resonance model has been able to account for the

range of tapping behaviours empirically observed (Handel and Oshinsky,

1981). Consideration of new tempi has not so far unearthed any new

phenomena to be accounted for. In the beginning of the chapter we

mentioned that we are interested in examining whether different tempi of

a particular polyrhythm influence the way the model behaves and whether

an assumed difference in the behaviour of the model can be somehow

linked to the patterns of preferences empirically observed by Handel and

Oshinsky (1981). Empirical data suggest that a rhythmic figure is perceived

differently through transpositions in time. In the case of polyrhythms,

the formation of human preferences in perceiving a regular beat along

with a polyrhythm and their relative changes through transposition of

the polyrhythm in time may be seen as a manifestation of this dynamic

perception. Also, a brain imaging study by Iversen, Repp, and Patel

(2009), showed that the interpretation of a rhythm in a particular way was

reflected in the magnitude of the amplitude responses of brain activity. In

particular, Iversen et al. tested whether voluntary metrical interpretation

modulates brain responses. The stimulus consisted of a repeating sequence

of two tones followed by a rest (symbolised TT0). The two tones were

identical. In different trials, participants were instructed to adopt one of

two metrical interpretations of the TT0 phrase by mentally placing the

beat on either the first or the second tone. By comparing the event-related potentials associated with each tone, they found that the amplitude response in the brain was stronger (by 64% on average) when a tone was chosen to represent the beat; thus, preference for one specific beat produced an enhanced amplitude response in the brain. As a result, a simple hypothesis we wanted to test was that the relative preferences for different beats in a polyrhythm are associated with the magnitude of the amplitude response of the corresponding oscillators in the model. For example, the largest (human) preference for a particular way of tapping would be reflected in the strongest amplitude response of a corresponding oscillator in the frequency response pattern of the model.

However, the frequency response pattern remains invariant for different tempi apart from some small amplitude fluctuations in the subharmonic responses with respect to the input frequencies. We have repeated the

same tests with a different form of stimulus encoding (square waves)

and the frequency response pattern remained completely invariant over

different tempi. The reason we have not selected square waves to represent

our polyrhythmic stimuli is to avoid a theoretical limitation that follows

from the use of square waves for encoding the pulse trains. Namely, the

decomposition of a square wave into an infinite set of simple oscillating

functions only contains components of odd integer harmonic frequencies.

The bump function, by contrast, avoids this limitation by also containing

even harmonics of the fundamental frequencies. In Chapter 7 we include

superimposed frequency response patterns for the different forms of

stimulus encoding. The next step is to consider what happens when the

neural resonance model is stimulated with polyrhythms other than 4:3.
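The difference between the two encodings in terms of harmonic content can be checked numerically. The sketch below computes the first six harmonic magnitudes of one period of a 50% duty-cycle square wave and of a narrow rectangular pulse; the narrow pulse is only an assumed stand-in for a pulse-like encoding such as the bump function, not the bump function itself, and all names are chosen for illustration.

```python
import cmath

N = 240  # number of samples in one period of each waveform


def harmonic_magnitude(signal, harmonic):
    """Magnitude of the Fourier series coefficient at the given harmonic."""
    acc = sum(x * cmath.exp(-2j * cmath.pi * harmonic * k / len(signal))
              for k, x in enumerate(signal))
    return abs(acc) / len(signal)


# Square wave: +1 for the first half of the period, -1 for the second half.
square = [1.0 if k < N // 2 else -1.0 for k in range(N)]

# Stand-in pulse: on for the first tenth of the period, off otherwise.
narrow = [1.0 if k < N // 10 else 0.0 for k in range(N)]

square_h = [harmonic_magnitude(square, h) for h in range(1, 7)]
narrow_h = [harmonic_magnitude(narrow, h) for h in range(1, 7)]
```

With these definitions the 2nd, 4th and 6th entries of square_h are zero up to rounding error, while every entry of narrow_h is non-zero, which illustrates the limitation of the square-wave encoding described above.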

In particular, we now consider the frequency response patterns of

the neural resonance model in the presence of the remainder of the

polyrhythms used by Handel and Oshinsky in their study. These are the

2:3, 2:5, 3:5, and 4:5 polyrhythms. Having observed previously that tempo does not alter the relative pattern of peaks in the frequency response pattern, we simplify the presentation, without loss of generality for the issues being considered, by taking the tempo for each type of polyrhythm to be one repetition per second. Thus, in the case of a 3:2 polyrhythm, the constituent pulse trains have rhythmic frequencies of 2 Hz and 3 Hz. Consequently, the type of polyrhythm determines the rhythmic frequencies of each pulse train.

6.2.4 Simulation with a 3:2 polyrhythmic stimulus repeating once per second

Figure 6.13 below is the frequency response pattern of the neural resonance

model in the presence of a 3:2 polyrhythmic stimulus repeating once per

second for 12 seconds. The response is the average of the activity over

the last 20% of the stimulation period. We can see from the graph that

the frequency response pattern is in line with the general framework of

empirical findings on how people perceive polyrhythm. With regard to

Handel and Oshinsky’s categories, the frequency responses at 2 Hz and

3 Hz reflect the first category noted above, i.e., tapping along with each

of the pulse trains. The frequency response at 1 Hz reflects the second

category whereby people tap along with the co-occurrence of the two pulse trains. However, one may equally interpret this as tapping to every other element of the 2-pulse train. Similarly, a frequency response at 1.5 Hz

reflects the third category whereby people choose to tap along with every

other element of the 3-pulse train.

Interestingly, despite the multiplicity of categories noted in Handel and

Oshinsky’s framework for classifying how people perceive a beat/meter

in polyrhythms, their results for the 3:2 polyrhythm did not include any

empirical observations that fell outside the first two categories of their

framework (i.e., tapping along with the 2-pulse train, the 3-pulse train, and

the co-occurrence of the two pulses). However, this interesting gap may

be due to the fact that in their analysis they have combined observations

related to the third category with observations related to the first. So for

example if a participant chose to tap every other element of the 3-pulse

train, in the analysis this behaviour was interpreted as tapping along

with the three pulse train since according to Handel and Oshinsky "these

responses indicate the salience of the given pulse train" (1981, p. 3).

In the case of the 3:2 polyrhythmic stimulus, as in the case of the 4:3

polyrhythm, the neural resonance model gives frequency responses that

cannot be associated with reported tapping behaviours in the empirical

study. Of course for frequencies beyond 10 Hz, tapping is quite likely

impossible for people, as suggested by data on the upper and lower limits

of sensorimotor synchronization tasks (Repp, 2005). However, there is a

range of frequency responses, in particular between 4 Hz and 10 Hz, which

are potentially attainable by people as tapping speeds. For example the

frequency responses at 4 Hz and 6 Hz resemble the 2nd harmonics of the

fundamental frequencies of the two pulse trains, which could be interpreted

as a tapping behaviour whereby one focuses on either of the pulse trains

and taps twice as fast as the rate of the pulse train.

A particularly interesting frequency response occurs at 5 Hz, which

resembles the addition of the input frequencies, i.e., 3 Hz plus 2 Hz equals

5 Hz. Although we can imagine people listening to a 3:2 polyrhythm repeating once per second and tapping along with it at a 5 Hz tapping frequency, we can expect such tapping behaviour to be more challenging to produce than tapping behaviours that are harmonically or subharmonically related to the input frequencies.

An alternative explanation, albeit within a different context, is the following; it should be taken as parenthetical to the main thread of this chapter. We have already discussed that the bank of oscillators of the neural resonance model was originally designed to model actual neural oscillators in the human brain (see Section 4.1). From that perspective,

the frequency response pattern of the neural resonance model simulates

activity in the brain and in particular neural oscillators resonating to the

presence of a polyrhythmic stimulus, so the 5 Hz response can be seen as a

component of brain activity. Indeed, Langner (2007) observed that neurons

in the inferior colliculus of the gerbil (desert rat) respond at ratios such as

3:2 and 5:2 with respect to the frequency of a pulse-like stimulus. So in the

aforementioned example, the 5 Hz response of the neural resonance model

can be interpreted as a response along the above lines, i.e., a response that

stands in a 5:2 ratio with respect to the 2 Hz input frequency, and it may be

understood solely as a neural response.

Figure 6.13: Frequency response of the model in the presence of a 3:2 polyrhythm repeating once every 1000 msec. The stimulus consists of a 2 Hz pulse train and a 3 Hz pulse train.

6.2.5 Simulation with a 2:5 polyrhythmic stimulus repeating once per second

In the next simulation the frequency response pattern of the neural

resonance model was obtained using a 2:5 polyrhythm as a stimulus.

Similarly to the previous simulations the pattern was repeated once per

second for reasons explained earlier and consistency of presentation. Figure

6.14 below shows two frequency responses that match the input frequencies,

i.e., 2 Hz and 5 Hz, and at least four harmonics of the input frequencies,

including the 2nd and 3rd harmonics of the fundamentals, i.e., 4 Hz and 10

Hz (2nd harmonics), and 6 Hz and 15 Hz (3rd harmonics), and the 4th and

6th harmonics of the 2 Hz input frequency, i.e., 8 Hz and 12 Hz. Four further

responses are subharmonically related to the input frequencies, as follows.

The strongest subharmonic response is at 1 Hz, followed by one at 2.5 Hz,

and another one at 1.5 Hz. Additionally, there are frequency responses

at 3 Hz and 7 Hz which are of a similar nature to the 5 Hz response

in the 3:2 simulation, i.e., implying a response obtained as a result of

subtracting and summing the input frequencies respectively. Considering

the empirical human data for people tapping along with a 5:2 polyrhythm,

Handel and Oshinsky report only a limited range of tapping behaviours

compared to the frequency response pattern of the model. In particular,

Handel and Oshinsky report that when people are asked to tap along with

a 2:5 polyrhythm they choose to tap either along with one of the pulse

trains (2-pulse or 5-pulse trains), or along with the coincidence of the pulse

trains. These behaviours correspond to tapping frequencies of 2 Hz, 5 Hz, and 1 Hz respectively, all of which are present in the frequency response pattern of the model below.

Figure 6.14: Frequency response of the model in the presence of a 2:5 polyrhythm repeating once every 1000 msec. The stimulus consists of a 2 Hz pulse train and a 5 Hz pulse train.

6.2.6 Simulation with a 3:5 polyrhythmic stimulus repeating once per second

The next simulation is concerned with the frequency response of the neural

resonance model in the presence of a 3:5 polyrhythm repeating once per

second. Therefore the input frequencies to the model are two pulse trains

of 3 Hz and 5 Hz. In Figure 6.15 below we can observe that the frequency

responses are structurally similar to the responses in previous simulations.

So for example, we get responses that resemble the input frequencies of 3

Hz and 5 Hz, harmonics of those frequencies, e.g., 6 Hz, 9 Hz, and 12 Hz

(harmonics of the 3 Hz input frequency) and 10 Hz and 15 Hz (harmonics of

the 5 Hz input frequency), the first subharmonics of the input frequencies,

i.e., 1.5 Hz and 2.5 Hz, a subharmonic response at 1 Hz, which implies the

frequency repetition of the pattern, and responses at 2 Hz and 8 Hz as a

result of subtracting and summing the input frequencies respectively.

Figure 6.15: Frequency response of the model in the presence of a 3:5 polyrhythm repeating once every 1000 msec. The stimulus consists of a 3 Hz pulse train and a 5 Hz 5-pulse train.

The empirical findings reported by Handel and Oshinsky (1981) on how people tap along with a 3:5 polyrhythm fall within their main categories (in terms of preferences): tapping along with either one of the pulse trains, or tapping along with the co-occurrence of the pulse trains. In Figure 6.15 these tapping behaviours are represented by frequency responses at 3 Hz, 5 Hz and 1 Hz respectively.

6.2.7 Simulation with a 4:5 polyrhythmic stimulus repeating once per second

The next simulation concerns the last type of polyrhythm used in the

Handel and Oshinsky study. This is a 4:5 polyrhythm repeating once per

second so the input frequencies are 4 Hz and 5 Hz. The frequency response

pattern can be explained in a similar manner to the analysis of the previous

simulations. Therefore, regardless of the type of polyrhythm presented to

the neural resonance model the pattern of responses is structurally similar.

This gives us the opportunity to generalise the type of expected responses

in cases where the neural resonance model is stimulated by a polyrhythmic

stimulus of two pulse trains. We develop the idea below and the frequency

response pattern related to the 4:5 polyrhythm (Figure 6.16) is used to

support the generalisation.

Figure 6.16: Frequency response of the model in the presence of a 4:5 polyrhythm repeating once every 1000 msec. The stimulus consists of a 4 Hz pulse train and a 5 Hz pulse train.

6.3 Generalising the frequency response pattern

Assuming that the input frequencies of a polyrhythmic stimulus are

expressed as f1 and f2 respectively, it is expected that the neural resonance

model would resonate at the following frequencies in the presence of any

polyrhythmic stimulus consisting of two pulse trains (Table 6.2).

Table 6.2: A generic frequency response pattern of the neural resonance model in the presence of a two-pulse train polyrhythm.

a) Input frequencies f1 and f2

b) Harmonics of input frequencies, e.g., 2f1, 3f1, 4f1, 2f2, 3f2, etc.

c) 1st subharmonics of f1 and f2, i.e., f1/2, f2/2

d) Repetition frequency of the polyrhythmic pattern

e) Responses at f2-f1 and f2+f1
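The rows of Table 6.2 can be written out as a small calculation. In the sketch below, the function name, the argument names and the cut-off on the number of harmonics are assumptions made for illustration; exact fractions are used so that subharmonics such as 1.5 Hz are represented without rounding.

```python
from fractions import Fraction


def predicted_responses(n1, n2, repetition_hz=1, max_harmonic=3):
    """Expected response frequencies (Hz) for an n1:n2 polyrhythm.

    n1 and n2 are the numbers of pulses per cycle of the two trains, and
    repetition_hz is the number of cycles of the whole pattern per second.
    """
    rep = Fraction(repetition_hz)
    f1, f2 = n1 * rep, n2 * rep
    lo, hi = sorted((f1, f2))
    return {
        "a": [f1, f2],                                   # input frequencies
        "b": [k * f for f in (f1, f2)                    # harmonics
              for k in range(2, max_harmonic + 1)],
        "c": [f1 / 2, f2 / 2],                           # 1st subharmonics
        "d": [rep],                                      # repetition frequency
        "e": [hi - lo, hi + lo],                         # difference and sum
    }


three_two = predicted_responses(3, 2)
two_five = predicted_responses(2, 5)
```

For the 3:2 polyrhythm repeating once per second, row e) evaluates to 1 Hz and 5 Hz, and for the 2:5 polyrhythm to 3 Hz and 7 Hz, in agreement with the responses described for those simulations.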

Figure 6.16 above contains responses that can be interpreted on the

basis of Table 6.2. In particular, cases a), c), and d) of the above generalization resemble the general framework of tapping behaviours related to polyrhythm perception reported in the Handel and Oshinsky study.

Furthermore, cases b), and e) in Table 6.2 suggest a possible extension of

the above general framework, i.e., the ways people perceive beat or meter

in polyrhythms. We have already discussed that the frequency responses

falling within categories b) and e) can be attainable by people based on

the upper and lower limits of sensorimotor synchronization tasks reported

in the literature and also imply the ability of people to perceive metrical

structure in a given rhythm (see 6.1.2).

The suggested extension, which derives from the general behaviour of the neural resonance model in the presence of a polyrhythmic stimulus, can be tested by conducting experiments with people similar to those done by Handel and Oshinsky. In fact, the suggested extension is indicative of a

very important process in scientific discovery, which is the use of a model to

make predictions about human behaviour previously unobserved. We have

valued this opportunity and for that reason we conducted experiments to

examine the validity of the aforementioned predictions. The results of these

experiments are discussed extensively in Chapter 8.

6.4 Simulations with non-isochronous polyrhythmic stimuli

In modelling rhythm perception, the task of accommodating stimuli that

deviate from perfect isochrony has been described as one of the most

troublesome to deal with (Large, 2008). In Chapter 4, we have discussed

a case whereby the neural resonance model is stimulated with a freely

improvised melody in order to test whether the beat of the melody can be

identified. Indeed, the frequency response pattern gives rise to resonances,

which correspond to the periodicity levels of quarter and half note. Given

the focus of this thesis on polyrhythmic stimuli, we wanted to stimulate the model with polyrhythmic stimuli performed by a single person in order to examine further the competence of the neural resonance model to deal with polyrhythmic stimuli of non-isochronous pulse trains. The frequency response pattern of the neural resonance model in the presence of a polyrhythm with non-isochronous pulse trains can be compared with a pattern related to the same polyrhythm but with isochronous pulse trains.

An additional motivation for examining polyrhythms made of non-isochronous pulse trains is the relevance of such stimuli to real music and, consequently, the opportunity of testing the neural resonance model with stimuli that resemble real stimuli as closely as possible.

For example, much African music is fundamentally polyrhythmic and it

is performed live on demand in order to mark a wide range of social

occasions related to everyday life. Performing live means that a perfectly

isochronous polyrhythmic stimulus like those we considered so far in this

chapter is unlikely to be achieved by humans. In Chapter 2 we discussed

that temporal fluctuations from perfect isochrony occur as a result of

stylistic decisions but also because of people’s natural tendency to deviate

from perfect isochrony even if they are asked to tap periodically. Despite

this deviation from perfect regularity in musical structure people can still

perceive an underlying periodicity. A typical behavioural manifestation

of perceiving regularity in African polyrhythmic music is dance. Dancing

gestures can be triggered by spoken word, vocal music, and instrumental

music (Agawu, 1995). Dancing gestures in African music exhibit several

types of synchronisation to the metrical structure of polyrhythmic music.

For example, one such type of synchronisation is in-phase synchronisation

with one particular rhythm. Therefore, the perception and the embodiment

of rhythm as these are manifested through dancing are associated with the

presence of polyrhythmic stimuli that exhibit temporal fluctuations - with

regard to a perfect isochrony - in their metrical structure.

6.4.1 Preparation of polyrhythmic stimulus comprising non-isochronous pulse

trains

Two different polyrhythmic stimuli were used to stimulate the neural

resonance model. We asked a performer to produce those two polyrhythms

in the following way. In the first case, the polyrhythm is performed in

the presence of a metronomic click, thus tempo variation is restricted but

local deviations from perfect isochrony are expected - given that a musician

performs the polyrhythm. In the second case, the polyrhythm is performed

in the absence of a metronomic click (after four initial clicks), which allows

for tempo variation over time as well as local deviations from perfect

isochrony. For convenience of presentation, we asked the musician to perform a 4:3 polyrhythm at a tempo of 60 bpm, i.e., to repeat the polyrhythmic pattern several times at a rate of once per second. We also asked the musician to tap the polyrhythm on a MIDI keyboard in order to record his performance. More specifically, the polyrhythm was performed bimanually,

i.e., the right hand index finger playing the 4-pulse train and the left hand

index finger playing the 3-pulse train.

Figure 6.17 is an excerpt of a time series representation of the performed

polyrhythm, which also illustrates the form of the signal that is fed into

the neural resonance model. In this case, the polyrhythm is performed

while the performer is listening to the metronome. The strongest response

corresponds to the co-occurrence of the two pulse trains, i.e., the 1st pulse of

both pulse trains. Then there are 5 events between the two successive peak

responses. The 1st, 3

rd and 5th of those are the 2

nd, 3rd, and 4

th pulses of the

4-pulse train. The 2nd and 4

th events correspond to the 2nd and 3

rd pulse

of the 3-pulse train. The most obvious temporal fluctuation in Figure 6.17

is that the polyrhythm is drifting slightly to the right gradually over time,

i.e., there is a deceleration, which indicates a delayed start of repeating the

pattern with regard to the one-second period indicated by the inter-onset

interval of the metronomic click. Additional fluctuation may exist between
successive events, for example in the relative position of the 4th event between
the 3rd and 5th events.
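The ordering of events described above can be checked against an idealised, perfectly isochronous 4:3 cycle. The sketch below is only an illustration under that assumption; the function name `polyrhythm_events` is hypothetical and the onsets are computed, not taken from the recorded performance.

```python
from fractions import Fraction

def polyrhythm_events(m, n, period=1):
    """Return sorted (onset, train, pulse_number) tuples for one cycle of an m:n polyrhythm.

    Onsets are exact fractions of the cycle period (an integer number of seconds here).
    """
    events = []
    for pulses in (m, n):
        for k in range(pulses):
            events.append((Fraction(k, pulses) * period, pulses, k + 1))
    return sorted(events)

events = polyrhythm_events(4, 3)

# Drop the coinciding first pulses at time 0 (the strongest response in the figure)
# and keep the events that fall between two successive co-occurrences.
between = [e for e in events if e[0] > 0]

for position, (onset, train, pulse) in enumerate(between, start=1):
    print(f"event {position}: t = {float(onset):.3f} s, "
          f"pulse {pulse} of the {train}-pulse train")
```

With a one-second cycle this lists five intermediate events, alternating between the 4-pulse and the 3-pulse train, which matches the description of the time series.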

Figure 6.17: Time series representation of a 4:3 polyrhythm performed by a human

at 60 bpm.

6.4.2 Stimulating the model

Figure 6.18 below is the frequency response pattern of the neural resonance

model in the presence of the above 4:3 polyrhythmic stimulus. We can see

that despite the temporal fluctuation from perfect isochrony embedded in

the pulse trains of the polyrhythmic stimulus the pattern is structurally

similar to the one related to the stimulation of the model with a polyrhythm

comprised of isochronous pulse trains. In other words, the frequency

response pattern conforms to the generalised framework of the model’s

responses suggested previously (see Table 6.2). The only difference is

a missing response corresponding to the 1st subharmonic of the 3-pulse

train. This is likely due to the form of stimulus encoding of the human

performance (MIDI). In Chapter 7 we will see that this particular response

is significantly decreased also for the isochronous polyrhythm when the

type of the encoding is a MIDI file.

Figure 6.18: Frequency response of the model in the presence of a 4:3 polyrhythm

performed by a human at 60 bpm.

The following simulation is the last of this chapter and is similar to the

previous one, however the polyrhythm was performed without listening

to the metronome, thus the polyrhythmic stimulus is liable to exhibit

tempo variation as well as local fluctuations from perfect isochrony. In

particular, in the current case four clicks at 60 bpm were provided in

advance to the performance of the 4:3 polyrhythm and then the musician

performed the polyrhythm for about 30 seconds without listening to

the metronome. A time series representation of the first 9.5 seconds

of the performed polyrhythmic signal is shown in Figure 6.19. This

excerpt illustrates the temporal deviation occurring in the performance. In

particular if we concentrate on the strongest amplitude response, which

corresponds to the co-occurrence of the two pulse-trains, we can see that

the second and third repetitions of the polyrhythmic pattern are ahead

of a hypothetical metronomic click occurring every second. If the first

hypothetical click is taken to be at 0.5 seconds in the x-axis, then all

numerical points in the x-axis demarcate a hypothetical metronomic click.

The performance starts approximately at 500 msec, a point in time that

marks the first co-occurrence of the two pulse trains. It is therefore expected

that the second repetition of the polyrhythmic pattern will start at 1.5 sec,

insofar as the performer was asked to perform the polyrhythm at 60 bpm,

i.e., to repeat the pattern once per second. However, we can see in the

graph that the second repetition of the polyrhythmic pattern starts slightly

ahead of time, i.e., the first repetition was a bit too fast. During the second

repetition it seems like the performer may have felt that the first repetition

was too fast, so he holds the second repetition for longer than 1 second in order

to return to a one repetition per second mode. Despite this correction,

the third repetition starts slightly delayed with respect to a hypothetical

click at 2.5 seconds and also it lasts a bit longer than 1 second - maybe

indicating a continuation of the previous correction strategy. Over the next

two repetitions (4th and 5th) the polyrhythmic pattern is in phase with a

hypothetical click and therefore the one repetition per second is successful.

From repetition 6 onwards the performance starts drifting slightly to the

right with respect to the hypothetical clicks, which means that the repetition

of the polyrhythmic pattern is slower than expected. This performance

is a typical example of temporal deviation from perfect isochrony and it

shows the kinds of temporal fluctuation related to human performance

discussed in Chapter 2 (see Subsection 2.1.3). In this last simulation the

neural resonance model has been stimulated with the first 15 seconds of

the above performance. Figure 6.20 corresponds to the frequency response

pattern of the model averaged over the last 20% of the simulation time, i.e.,

the last 3 seconds. We can see that the pattern is similar to the two previous

cases where a 4:3 polyrhythmic stimulus with repetition rate of 1 second

Figure 6.19: Time series representation of a 4:3 polyrhythm performed by a human

at 60 bpm with no metronomic guidance.

comprised of isochronous pulse trains was presented to the model, and

also, it conforms to the generalized framework suggested in Table 6.2.
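The averaging window used for the frequency response (the last 20% of a 15-second stimulation, i.e., the last 3 seconds) can be sketched as follows. This is a minimal illustration: the helper name `mean_over_final_fraction`, the 100 Hz sample rate and the amplitude trace are all hypothetical and are not the model's actual output.

```python
def mean_over_final_fraction(samples, fraction=0.2):
    """Average the trailing `fraction` of a sequence of amplitude samples."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must lie in (0, 1]")
    start = int(round(len(samples) * (1 - fraction)))
    window = samples[start:]
    return sum(window) / len(window)

# Hypothetical amplitude trace of one oscillator: 15 s sampled at 100 Hz,
# ramping up linearly for 12 s and then staying flat at 0.3.
sample_rate = 100
trace = [min(i / sample_rate / 12.0, 1.0) * 0.3 for i in range(15 * sample_rate)]

window_seconds = 0.2 * len(trace) / sample_rate  # length of the averaging window
steady = mean_over_final_fraction(trace)         # mean amplitude inside that window

print(f"window: last {window_seconds:.1f} s, mean amplitude: {steady:.3f}")
```

Averaging only the final part of the run is a simple way to discard the initial transient before reading off the response of each oscillator.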

Figure 6.20: Frequency response of the model in the presence of a 4:3 polyrhythm

performed by a human at 60 bpm with no metronomic guidance.

6.5 summary

Overall, based on the output of all the simulations presented in this

chapter we can see that the neural resonance model produces responses

that can be matched to tapping behaviours observed by Handel and

Oshinsky (1981) for a range of polyrhythms and repetition rates. Also,

the model can accommodate temporally variant polyrhythmic stimuli that

are closer to the kind of rhythmical stimuli occurring in performed music.

A general framework to capture the frequency response pattern of the

neural resonance in the presence of a two pulse-train polyrhythm has been

suggested. Inspection of the latter in combination with empirical studies

about the upper and lower limits in sensorimotor synchronisation tasks

suggests that there is an opportunity for conducting empirical experiments

in order to potentially observe attainable tapping frequencies, which have

not been reported in the literature previously. Finally, the hypothesis that

the magnitude of the amplitude responses in the model may be indicative

of how relative tapping preferences are forming through transposition of

the polyrhythm in time was not verified, given that the frequency response

pattern remains invariant for different tempi.

7 model analysis

In general, the process of validating mathematical models of biological

systems involves the comparison between modelled data and empirical

data. Furthermore, the extent to which a model manages to produce data

similar to those empirically observed serves as a measure of the degree of

validity of a model, subject to certain provisos. For example, the process

of obtaining simulated and empirical data is in general accompanied by a

degree of uncertainty regarding the adequacy of the collected data, on both

empirical and simulated sides, to support inferences about the validity of

a model. Some of the main sources of uncertainty when gathering data

from a model to be tested, are related to the fact that any simulated dataset

will be based on modelling assumptions about the natural system to be

modelled. These assumptions may be incomplete or wrong (O’Neill and

Gardner, 1979). In the case of a parameterised model, further issues may

arise, since any simulated dataset may depend on specific numerical values

of the parameters.

In Chapter 5 (see Subsection 5.2.6) we have briefly mentioned that the set

of nominal values of the parameters used for testing of the neural resonance

model could be seen as a potential source of uncertainty with regard to

the obtained modelled data. Another such source of uncertainty relates to

the issue that the neural resonance model is a canonical representation of

the more biological Wilson-Cowan model. In order to address these and

other issues, we will now discuss model validation techniques designed to

limit uncertainty. For example, such techniques include testing a model for

consistency in producing certain results for a range of numerical settings,

and comparing the performance of a model with the performance of other

similarly purposed models. In general, similar-purpose models typically

simulate the same biological system but they may make slightly different

biological hypotheses and/or use different mathematical formulae. In the

present research, both of the above two techniques for limiting uncertainty

about the assumptions behind the neural resonance model are used. Firstly,

a technique called parameter sensitivity analysis is employed, under which

the numerical values of the parameters of the neural resonance model

(detailed below) are systematically manipulated. Secondly, a technique

called cross-model comparison is employed to limit uncertainty. In this

case, the behaviour of the neural resonance model is compared with the

behaviour of the Wilson-Cowan model. The Wilson-Cowan model is a

well-respected and highly biologically plausible model for capturing the

dynamics of neural populations (Wilson and Cowan, 1972). Thus, by

comparing its behaviour with the behaviour of the neural resonance model

we can limit uncertainty regarding the biological assumptions made by the

latter.

7.1 parameter sensitivity analysis

In the next few sections, sensitivity analysis is conducted on a range of

parameters of the neural resonance model. The neural resonance model

contains many different parameters. In particular, parameters appear in

several places such as in the state variable of the neural resonance model,

i.e., the equation that describes a neural oscillator, in the driving variable, i.e.,

the external stimulus, and in the initial conditions, i.e., initial amplitude and

phase of the oscillators.

The state variable parameters (see Chapter 4 - Equation 7) account

for the dynamics of a neural oscillator over time in the presence of a

rhythmic stimulus. The external stimulus is an independent variable

(driving variable) that influences the dynamics of the state variable. The

only parameter associated with the external stimulus is the amplitude of

the latter, but analysis can also be performed for different types of stimulus

encoding, e.g., MIDI vs. audio files (see Chapter 5 - Subsection 5.2.1). The

initial conditions are related to the amplitude and phase value of each

oscillator in the bank just before the stimulation starts. The next three

subsections describe in detail the systematic manipulation of the numerical

values for all types of parameters briefly introduced above, starting with the

initial conditions of the oscillators, then the external stimulus, and finally,

the parameters of the state variable of the neural resonance model.

7.1.1 Initial conditions of each oscillator: amplitude and phase

Initial conditions refer to the initial numerical values given to the

observable outputs of the state variable of a model at the starting point

of a simulation. In the case of the neural resonance model these outputs

are the amplitude and the phase of the oscillators comprising the bank.

For example, the output of the simulations presented in Chapter 6 is the

average value of the amplitude of each oscillator over the last 20% of the

stimulation with an external polyrhythm. From that perspective, initial

numerical values should be given to both the amplitude and the phase of

each oscillator in the bank at time zero. As a result, we need to identify

those initial numerical values, and to clarify why a certain numerical value

is chosen instead of any other. To address this point, at least in the context

of this thesis, it is useful to remember that we are mostly concerned with

examining the amplitude responses of the neural resonance model in the

presence of some stimulus. That is we want to examine which oscillators

in the bank will resonate in the presence of some polyrhythmic stimulus.

The resonance phenomenon can briefly be described as the increase in

amplitude of an oscillation of a physical system due to the exposure to a

periodic external force of which the driving frequency is equal or very close

to the natural frequency of the physical system (van Noorden and Moelants,

1999). In effect, the focus is on the increase of the amplitude value of each

oscillator resulting from resonance activity, i.e., only one of the two main

metrics of the state variable of the neural resonance model (the other being

the phase), thus the discussion below is exclusively focused on the initial

condition of the amplitude. With regard to the phase of the oscillators, all

oscillators are in-phase in the beginning of a simulation, i.e., relative phase

between any pair of oscillators in the bank is zero.

In the presence of a polyrhythmic stimulus some of the oscillators in

the bank are expected to resonate (exhibit an increase in their amplitude)

regardless of their initial amplitude. One reasonable numerical setting for

the initial amplitude of each oscillator in the bank is a value close to zero, a

numerical value that is used to indicate baseline activity, i.e., neural activity

in the absence of external stimuli. In fact, the value of the initial amplitude

of the oscillators in the simulations presented in Chapter 6 is indeed close

to zero (10^−10). However, the response of the neural resonance model can

be examined for a numerical range regarding the initial amplitude of the

oscillators such as between 0 (inclusive) and 1 (exclusive). For any value

within this range we would expect to observe resonance activity similar to

that presented in Chapter 6. Indeed, the following three figures illustrate

that the resonance effect is independent of the initial amplitude of the

oscillators, and that the initial amplitude of the oscillators only corresponds

to a transient state of the system. Figure 7.1 and Figure 7.2 illustrate the

effect of resonance for initial amplitudes 0.7 and 10^−10 respectively in a
12.5 second stimulation period using a polyrhythmic stimulus consisting of
two square waves at 3 Hz and 4 Hz. Figure 7.3 illustrates the combined results

of the average activity of the bank of oscillators over the last 20% of the

stimulation.
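The independence from the initial amplitude can be illustrated with a much-reduced sketch: a single oscillator driven exactly at its natural frequency. Everything beyond the nominal values α = 0 and β1 = −1 is an assumption made for illustration: the stimulus is a complex sinusoid rather than square waves, the coupling is linear (the ε → 0 limit), the forcing strength 0.2 is arbitrary, and the function names are hypothetical. It is not the bank of oscillators used in the thesis.

```python
import cmath
import math

def simulate(initial_amplitude, freq_hz=3.0, force=0.2, alpha=0.0, beta1=-1.0,
             duration=12.5, dt=0.001):
    """Integrate dz/dt = z*(alpha + i*omega + beta1*|z|^2) + force*exp(i*omega*t)
    with a fourth-order Runge-Kutta scheme and return |z| after every step."""
    omega = 2 * math.pi * freq_hz

    def dz(t, z):
        return (z * (alpha + 1j * omega + beta1 * abs(z) ** 2)
                + force * cmath.exp(1j * omega * t))

    z = complex(initial_amplitude, 0.0)  # initial phase is zero
    amplitudes = []
    for n in range(int(round(duration / dt))):
        t = n * dt
        k1 = dz(t, z)
        k2 = dz(t + dt / 2, z + dt / 2 * k1)
        k3 = dz(t + dt / 2, z + dt / 2 * k2)
        k4 = dz(t + dt, z + dt * k3)
        z += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        amplitudes.append(abs(z))
    return amplitudes

def tail_mean(values, fraction=0.2):
    """Mean of the last `fraction` of the samples (the averaging window)."""
    tail = values[int(len(values) * (1 - fraction)):]
    return sum(tail) / len(tail)

high = tail_mean(simulate(0.7))     # start well above the resonant amplitude
low = tail_mean(simulate(1e-10))    # start near zero (baseline activity)

print(f"averaged amplitude, start 0.7:   {high:.4f}")
print(f"averaged amplitude, start 1e-10: {low:.4f}")
```

In this reduced setting both runs settle on the same amplitude (the cube root of the forcing strength), so the initial value only shapes the transient.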

[Figure: oscillator amplitude (colour scale) plotted against time (sec) and oscillator frequency (Hz, 0.25 to 16 Hz).]

Figure 7.1: Amplitude response of the neural resonance model over a period of 12.5 seconds where the initial amplitude for all oscillators is 0.7. Gradually, the oscillators drop the levels of the initial amplitude except from those oscillators that resonate in the presence of the polyrhythmic stimulus. Some of the resonating oscillators, which are easy to spot, have the following frequencies: 12 Hz, 9 Hz, 4 Hz, 3 Hz, 2 Hz, 1.5 Hz, and 1 Hz.

[Figure: oscillator amplitude (colour scale) plotted against time (sec) and oscillator frequency (Hz, 0.25 to 16 Hz).]

Figure 7.2: Amplitude response of the neural resonance model over a period of 12.5 seconds where the initial amplitude for all oscillators is close to 0. Over time, the oscillators that resonate in the presence of the polyrhythmic stimulus exhibit higher levels of amplitude. Some of the resonating oscillators, which are easy to spot, have the following frequencies: 12 Hz, 9 Hz, 7 Hz, 4 Hz, 3 Hz, and 1 Hz.

[Figure: "Frequency Response"; response amplitude (|z|) plotted against oscillator frequency (Hz, 0.25 to 16 Hz).]

Figure 7.3: Superimposed frequency response patterns for different initial amplitude of the oscillators 0.5 (blue) and near 0 (green). The resonance pattern is similar in both cases, i.e., similar oscillators resonate in the presence of the polyrhythmic stimulus regardless of the initial amplitude. The high amplitude levels of the low frequency oscillators in the blue pattern correspond to a transient stage.

7.1.2 Driving variable parameters: external polyrhythmic stimulus

In general, driving variables are independent variables, which are used

to influence a state variable. They are dynamic variables acting as an

input to a state variable. Driving variables have no input. Systematic

numerical manipulation of the parameters of a driving variable allows

us to examine their effect on the general behaviour of a model, and the

consistency of the latter in producing qualitatively similar results. In

the simulations presented in Chapter 6 this type of driving variable is

the external polyrhythmic stimulus used to stimulate the neural resonance

model. The parameter we are concerned with here is the amplitude of the

stimulus. However, the different ways of encoding the external stimulus,

e.g., MIDI and audio files (see Chapter 5 - Subsection 5.2.1) can be subjects

for sensitivity analysis as well. The results of applying parameter sensitivity

analysis with regard to the amplitude of the stimulus and its encoding type

are presented below.

In Chapter 6 the polyrhythmic stimulus was encoded using functions.

Figure 7.4 below combines frequency response patterns of the neural

resonance model for all three potential ways of stimulus encoding i.e.,

functions, audio files, and MIDI files. The fact that the frequency response

patterns of the neural resonance model are structurally the same for all

the above ways of stimulus encoding indicates that the model behaves

consistently regardless of the chosen way of stimulus encoding. The blue

frequency response pattern corresponds to using the bump function for

encoding the stimulus, the green to using audio files, and the red to using

MIDI files for encoding the stimulus. By comparing the frequency response

patterns we can see that the peak responses are similar for all the patterns

regardless of the encoding technique. However, we should note the absence

of even harmonics in the audio encoding which uses square waves to

represent the polyrhythm. In Chapter 6 we mentioned that this is normal

since the decomposition of a squarewave into an infinite set of simple

oscillating functions only contains components of odd integer harmonic

frequencies.
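The absence of even harmonics in a square wave can be checked numerically. The sketch below assumes a symmetric square wave with a 50% duty cycle and unit period (an assumption about the audio encoding, which is not specified here), and the function names are hypothetical.

```python
import math

def harmonic_magnitude(signal, n, samples=10_000):
    """Magnitude of the n-th Fourier component of a period-1 signal,
    estimated with the midpoint rule over one period."""
    a = b = 0.0
    for k in range(samples):
        t = (k + 0.5) / samples
        value = signal(t)
        a += value * math.cos(2 * math.pi * n * t)
        b += value * math.sin(2 * math.pi * n * t)
    return 2.0 / samples * math.hypot(a, b)

def square_wave(t):
    """Symmetric square wave with period 1 and a 50% duty cycle (assumed)."""
    return 1.0 if (t % 1.0) < 0.5 else -1.0

magnitudes = {n: harmonic_magnitude(square_wave, n) for n in range(1, 7)}

for n, magnitude in magnitudes.items():
    print(f"harmonic {n}: {magnitude:.6f}")
```

The odd components follow 4/(nπ) while the even ones vanish up to rounding error, which is why a stimulus built from square waves places no energy at the even harmonics.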

Figure 7.4: Superimposed frequency response patterns of the neural resonance

model for the three different ways of encoding a polyrhythmic stimulus, i.e., audio files, MIDI files and bump function.

We now proceed to examine what happens to the frequency response

pattern of the bank by considering a systematic manipulation of the

amplitude of the polyrhythmic stimulus. The stimulus is encoded using

the bump function because it takes less time to alter the amplitude value

compared to the other ways of encoding. The amplitude value of the

stimulus is manipulated around a central value (in this example 0.4) within

a certain range (0.2 - 0.6). These values can be defined arbitrarily, taking

into consideration certain limitations on the stimulus amplitude imposed

by the mathematical derivation of the model (Large et al., 2010). Figure

7.5 below illustrates the frequency response patterns for three numerical

values of the stimulus’s amplitude. The red response corresponds to the

largest amplitude (0.6), the blue to the medium one (0.4), and the green to

the smallest amplitude (0.2). In general, we can see that the magnitude of

the response of the oscillators follows the changes in the amplitude of the

stimulus, but nonetheless the resonance pattern is similar in all three cases.

Figure 7.5: Superimposed frequency response patterns for different amplitudes of

a 4:3 polyrhythmic pattern with 1 Hz repetition frequency.

In general, we can see that the frequency response patterns are

structurally the same regardless of the differences in the amplitude of the
stimulus and its encoding type. Thus, the behaviour of the neural resonance
model is consistent in producing similar results to those presented in
Chapter 6 regardless of any small variations in the value of the amplitude

of the stimulus and its encoding type.

7.1.3 State variable parameters

In this section we apply sensitivity analysis to the parameters of the state

variable of the neural resonance model, which is represented by Equation 7

(Chapter 4). The equation will be repeated below to facilitate the discussion

and illustrate the parameters. Similarly to the previous sections, the

generally desirable expectation is that a slight change in the parameter

values will not create significant changes in the frequency response pattern

of the neural resonance model. The idea that a systematic, but limited

in numerical range, manipulation of the values of the parameters of a

model will not cause a significant change in the responses of the model, is

expected due to an intuitive belief that most real systems will not respond

violently to small changes in the values of the operating parameters or

variables (Haefner, 2005). The section begins by disclosing the nominal

numerical values given to the parameters in the simulations of Chapter 6,

and continues by applying sensitivity analysis to some of the parameters

comprising the state variable of the neural resonance model.

7.1.3.1 Nominal values of state variable parameters

In Chapter 4 the mathematical formulation of the neural resonance model

was presented and the functionality of each parameter has been described.

However, at that point we did not disclose the numerical values of the

parameters during a simulation. Also, in Chapter 5 we briefly mentioned

that the nominal values of the parameters of Equation 7 that represents a

neural oscillator are defined based on certain criteria such as the fact that

we are interested in the dynamics of an oscillator near an Andronov-Hopf

bifurcation point. There we pointed forward to this chapter as a more

appropriate place to include a detailed discussion on the nominal values

of the parameters, because parameter sensitivity analysis deals exclusively

with manipulation of numerical values of parameters.

The following equation is Equation 7 introduced in Chapter 4. Parameters

b, d, and k, are complex variables (disclosed below), which reveal the full

range of parameters associated with the state variable (z) of the neural

resonance model.

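$$
\dot{z}_i = b_i z_i + d_i z_i |z_i|^2 + \frac{k_i \varepsilon z_i |z_i|^4}{1 - \varepsilon |z_i|^2} + \frac{x_i(t)}{1 - \sqrt{\varepsilon}\, x_i(t)} \cdot \frac{1}{1 - \sqrt{\varepsilon}\, x_i(t)} \tag{8}
$$

where,

$$
b = \alpha + i\omega, \qquad d = \beta_1 + i\delta_1, \qquad k = \beta_2 + i\delta_2
$$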

Thus, the range of parameters associated with the neural resonance

model is (α,β1,β2, δ1, δ2,ω) and (ε). Parameters (α,β1,β2, δ1, δ2), and (ω)

relate to the oscillatory nature of the system. Parameter (α) corresponds

to a bifurcation parameter, which is in effect a parameter denoting the

system's transition from one dynamic state to another. In this thesis,

the bifurcation parameter takes a nominal value equal to zero, which

implies that the system lies at a critical point called Andronov-Hopf

bifurcation point. As previously mentioned the reason for taking

parameter (α) equal to 0 is related to the dynamics of the original system

of equations describing the interaction of excitatory and inhibitory neural

populations. When a dynamical system like the above is near one of its

bifurcation points (e.g., Andronov-Hopf) the interaction of the excitatory

and inhibitory populations may result either in a steady oscillation or a

damped oscillation, i.e., the neural oscillator is in a transient mode. For

positive values, the oscillatory system moves on to spontaneous oscillation

while for negative values it moves on to a damped oscillation. Sensitivity

analysis for parameter (α) can therefore be applied by considering a

systematic numerical manipulation within a range, which has its centre

around the nominal value zero.

Parameters (β1) and (β2) are damping parameters that prevent the

amplitude of the oscillation from becoming infinitely large when (α)

is positive. In other words, they function as a saturation point (limit)

around which the amplitude of the oscillation is bound to stabilise. In

the simulations of Chapter 6 some of the oscillators exhibit an increase in

their amplitude due to the resonance effect, therefore parameters (β1) and

(β2) control the resonance amplitude. The nominal values of parameters

(β1) and (β2) should be negative to express the above functionality. In

the simulations of Chapter 6, only parameter (β1) had a negative nominal

value of -1, while (β2) was set to 0 for simplification. The numerical

range employed for conducting sensitivity analysis with respect to (β1) is

discussed in the following section.
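The roles of (α) and (β1) can be made concrete with the amplitude equation of an unforced oscillator. Writing z = r·exp(iφ) and setting the stimulus and (β2) to zero reduces the oscillator equation to dr/dt = αr + β1·r³. The sketch below integrates this reduced equation; the function name, the starting amplitude 0.5, the 200-second duration and the step size are illustrative assumptions.

```python
import math

def final_amplitude(alpha, beta1=-1.0, r0=0.5, duration=200.0, dt=0.01):
    """Forward-Euler integration of dr/dt = alpha*r + beta1*r**3.

    Returns the amplitude reached after `duration` seconds.
    """
    r = r0
    for _ in range(int(round(duration / dt))):
        r += dt * (alpha * r + beta1 * r ** 3)
    return r

results = {alpha: final_amplitude(alpha) for alpha in (-0.1, 0.0, 0.1)}

for alpha, r in results.items():
    print(f"alpha = {alpha:+.1f}: amplitude after 200 s = {r:.6f}")

# For alpha > 0 and beta1 < 0 the amplitude saturates at sqrt(-alpha / beta1).
print(f"predicted limit for alpha = 0.1: {math.sqrt(0.1):.6f}")
```

A negative (α) gives a damped oscillation that decays exponentially, (α) = 0 gives a slow algebraic decay, and a positive (α) gives a spontaneous oscillation whose amplitude is bounded by the negative (β1).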

Parameters (δ1) and (δ2) are frequency detuning parameters that define

how the peaks in the frequency response pattern begin to detune as the

strength of the stimulus increases. Frequency detuning was illustrated in

Figure 4.6 (Chapter 4) for positive values of (δ1) and (δ2). Large (2008)

suggested that frequency detuning - for negative values - can address some

behavioural observations, such as the tendency of people to tap ahead in

time when asked to tap along with a metronomic click. This tendency

of tapping ahead of time is illustrated in Figure 7.11 where for example

the maximum resonance for a pulse train of 3 Hz rhythmic frequency

corresponds to an oscillator with natural frequency slightly larger than 3

Hz. In the simulations presented in Chapter 6 both (δ1) and (δ2) parameters

were set to zero for simplicity. We have already discussed how the sign of

the value of parameters (δ1) and (δ2) influences the direction of detuning.

For that reason we can infer that a parameter sensitivity analysis whereby

the values of parameters (δ1) and (δ2) take a small increase or a small

decrease around their nominal value zero would detune the frequency

response pattern either to the left or to the right respectively.

Parameter (ω) is the angular frequency of the oscillator. It refers to the

product of the natural frequency of each one of the oscillators in the bank

multiplied by 2π. The frequency of each oscillator is fixed, thus it makes no

sense to apply sensitivity analysis for (ω) either.

Lastly, parameter (ε) controls the type of coupling between oscillators

and the external stimulus. When the value of (ε) is 0 the coupling is

linear, while for positive values it becomes nonlinear. Linear coupling

implies that only the oscillators with natural frequencies close to those of

the input frequencies of the stimulus and their harmonics will resonate.

However, if (ε) is set to 1 then there is a nonlinear coupling between the

bank of oscillators and the stimulus, and the full range of resonances such

as subharmonics and higher order combinations of the input frequencies

emerges. In the simulations of Chapter 6 the nominal value of parameter

(ε) was 1, so parameter sensitivity analysis will be considered for a range

of values around 1 but strictly non-zero in order to retain nonlinearity.
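One way to see why a non-zero (ε) makes the coupling nonlinear is to expand the stimulus factor of the oscillator equation as a geometric series. This is a standard identity, offered here as an illustrative aside rather than as part of the model's derivation, and it holds when $|\sqrt{\varepsilon}\,x_i(t)| < 1$:

```latex
\frac{x_i(t)}{1 - \sqrt{\varepsilon}\, x_i(t)}
  = x_i(t) + \sqrt{\varepsilon}\, x_i(t)^2 + \varepsilon\, x_i(t)^3 + \dots
```

For ε = 0 only the linear term $x_i(t)$ remains. For ε > 0 the higher powers of the stimulus appear, and powers of a signal that contains two frequencies contain sums and differences of those frequencies. The expansion covers only this one factor, so it does not by itself account for the subharmonic responses.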

7.1.3.2 Setting numerical range for state variable parameters

Sensitivity analysis examines whether the systematic numerical

manipulation of the parameters of the state variable described above

has an effect in changing the frequency response pattern of the neural

resonance model. In the previous section we revisited the parameters

related to the state variable of the model, and we disclosed the nominal

numerical values that were used in the simulations of Chapter 6. Also, we

discussed that sensitivity analysis will be applied only to some of those

parameters, and in particular, to parameters (α) with nominal value 0, (β1)

with nominal value −1, and (ε) with nominal value 1.

The sensitivity analysis entails running multiple simulations whereby

numerical values are changed systematically for the three parameters

mentioned above. The coefficients of similarity r^2 of the frequency

response patterns are then calculated. For each parameter we need to

define a numerical range around the nominal value previously described.

Additionally, we have to determine how many numerical values within

each numerical range for each parameter are chosen for the analysis. For

example, by choosing four values for each of the three parameters the

total number of simulations to run will be 4^3 = 64. Within this total

number of combinations, some simulations will only have the value of

a single parameter changed with respect to the nominal values, others

will have the value of two parameters changed, and some others will

have the values changed of all three parameters. The first category is

described as single parameter sensitivity analysis, whereas the last two

are described as multiple parameter sensitivity analysis. In some cases,

where the number of simulations is large the analysis can be confined by

choosing multiple parameter analysis exclusively. Another way to bring

down the total number of simulations to be run is by taking fewer values

for each parameter within its range. The exact number of runs is discussed

in the following sections.
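The counting argument above can be sketched by enumerating the full grid of runs. The four values per parameter below are hypothetical, chosen only so that each list contains the nominal value plus three alternatives; the variable names are likewise illustrative.

```python
from collections import Counter
from itertools import product

nominal = {"alpha": 0.0, "beta1": -1.0, "epsilon": 1.0}

# Hypothetical grid: the nominal value plus three alternatives per parameter.
grid = {
    "alpha": [0.0, 0.03, 0.06, 0.1],
    "beta1": [-1.0, -0.9, -1.05, -1.1],
    "epsilon": [1.0, 0.9, 1.05, 1.1],
}

names = list(grid)
runs = [dict(zip(names, values))
        for values in product(*(grid[name] for name in names))]

def changed(run):
    """Number of parameters that differ from their nominal value in this run."""
    return sum(run[name] != nominal[name] for name in names)

counts = Counter(changed(run) for run in runs)

print(f"total runs: {len(runs)}")
for k in sorted(counts):
    print(f"runs with {k} parameter(s) changed: {counts[k]}")
```

Under these assumptions the 64 runs split into one all-nominal run, 9 single-parameter runs and 54 multiple-parameter runs (27 with two parameters changed and 27 with all three).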

We have already introduced the parameters of the state variable. The

nominal values of these parameters were discussed along with some

implication about the range of values for each parameter that can be

potentially used in the parameter sensitivity analysis. In general, the

range of values for each parameter corresponds to the degree of alteration

around the nominal value. There are two strategies for determining the

amount by which parameters can be altered, namely the uniform and the

variable approach (Haefner, 2005). In the uniform approach, all parameters

are altered by the same percentage with respect to their nominal value.

In order to apply the variable approach we need to know the variance of

the nominal value, if any, for specific parameters. However, since we do

not have information about the variance of nominal values for certain

parameters regarding the neural resonance model, we can employ the

uniform approach. Often the percentage of alteration in the uniform

approach is ±10% around the nominal value, but values ranging from

2% to 20% can also be used (Haefner, 2005). Table 7.1 below shows the

numerical values of the parameters (α), (β), and (ε) after adopting a ±10%

uniform variation around the nominal value.

Table 7.1: Range of numerical values for parameters (α), (β) and (ε). Values are calculated based on a ±10% uniform variation around nominal values.

                 Parameter (α)   Parameter (β)   Parameter (ε)
Nominal value    0               −1              1
+10%             0.1             −0.9            1.1
−10%             −0.1            −1.1            0.9
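The values in Table 7.1 can be reproduced with a short sketch. One point needs care: 10% of a nominal value of 0 is 0, so a purely relative variation would leave (α) unchanged. The sketch therefore assumes that a zero nominal value is varied by an absolute step of 0.1, which is our reading of the table and not something stated explicitly in the text. The function name uniform_variation and the argument zero_step are hypothetical.

```python
def uniform_variation(nominal, fraction=0.10, zero_step=0.1):
    """Return the (-10%, nominal, +10%) values of a uniform variation.

    For a non-zero nominal value the step is fraction * |nominal|.
    For a zero nominal value a relative step would vanish, so the
    absolute step `zero_step` is used instead (an assumption made
    here to match the alpha column of Table 7.1).
    """
    step = fraction * abs(nominal) if nominal != 0 else zero_step
    return (round(nominal - step, 10), nominal, round(nominal + step, 10))

nominal_values = {"alpha": 0.0, "beta": -1.0, "epsilon": 1.0}
for name, value in nominal_values.items():
    print(name, uniform_variation(value))
```

Note that for the negative nominal value of (β) the +10% entry is the less negative value (−0.9), as in the table.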

The developers of the neural resonance model (Large et al., 2010) have

made suggestions about the range of values for parameter (α). In particular,

they suggest that parameter (α) can take values between 0 (inclusive) and

a small positive value such as 0.1. The reason for suggesting this range is to

focus on the spontaneous oscillation mode (for small positive values), which can

account for simulating the endogenous periodicity associated with rhythm

perception in people. In other words, in the spontaneous oscillation mode,


once an oscillation is established it can be retained even in the absence

of an external stimulus, which according to Large (2010) is a potential

mechanism that explains how people are able to retain a perceived rhythm

during pauses between events or even in highly syncopated rhythms.

Large’s range differs from the general guidelines mentioned above, under

which the lower limit of parameter (α), with nominal value 0, is −0.1

(the −10% entry in Table 7.1). However, nothing prevents us from

considering negative values for (α): the polyrhythmic stimuli used

in the simulations of Chapter 6 are not syncopated and have no long pauses,

so the spontaneous oscillation mode is not strictly

required. In other words, given the presence of a polyrhythmic

stimulus of the kind described so far, and given that

even a damped oscillator (α < 0) has the ability to resonate, we can

expect the frequency response of the model to be similar to the one where

critical¹ (α = 0) or spontaneous oscillation (α > 0) is considered. Figure

7.6 below illustrates the responses of the neural resonance model at the

critical Andronov-Hopf point (α = 0) and also, when the oscillator is

damped (α = −0.1). Although the two graphs are highly similar,

with r² = 0.9463, the responses at 1.5 Hz, 2 Hz, and 5 Hz are noticeably

lower in the damped mode. In a sense, the system in its

damped oscillation mode suppresses its own ability to resonate in the presence

of a stimulus. This is reasonable: the stronger the damping,

the weaker the resonance. In

a further run in which we set (α = −1), which is not depicted

here, the resonances were clearly suppressed for all the oscillators.
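The relation between damping and resonance can be illustrated with a linear forced oscillator, ż = z(α + i2πf) + F·exp(i2πf₀t), whose steady-state amplitude for α < 0 is F / sqrt(α² + (2π(f − f₀))²). This is only a linearised sketch: it omits the nonlinear terms of the neural resonance model, and the forcing amplitude F = 1 is an arbitrary illustrative value.

```python
import math

def steady_state_amplitude(alpha, f_osc, f_stim, force=1.0):
    """Steady-state amplitude of a linear damped oscillator (alpha < 0)
    with natural frequency f_osc (Hz) driven at frequency f_stim (Hz)."""
    if alpha >= 0:
        raise ValueError("the linear steady state only exists for alpha < 0")
    detuning = 2.0 * math.pi * (f_osc - f_stim)
    return force / math.hypot(alpha, detuning)

# Peak response (oscillator tuned exactly to the stimulus) for two damping values.
for alpha in (-0.1, -1.0):
    print(alpha, steady_state_amplitude(alpha, f_osc=3.0, f_stim=3.0))
```

In this simplified setting the peak response is F/|α|, so moving from α = −0.1 to α = −1 reduces the peak by a factor of ten, which is in line with the suppression observed in the additional run.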

¹ A critical point is a non-equilibrium transient state of a dynamical system that lies between two different stable dynamic modes.


Figure 7.6: Frequency response patterns for (α = 0) (blue), and (α = −0.1) (green).

The patterns are very similar, with r² = 0.9463.

7.1.3.3 Sensitivity analysis for parameters (α) and (ε)

In Figure 7.6 above we saw that even for a small negative value of the parameter

(α = −0.1) the resonances in the frequency response pattern start to weaken.

Additionally, a large negative value of parameter (α) would eventually turn

the system into a rigid system that does not respond to a stimulus.

We can take advantage of this to limit the total number of

runs we have to perform in parameter sensitivity analysis by disregarding

the cases in which (α) takes a negative value. Table 7.2 below includes all

possible runs for a ±10% uniform variation around the nominal values of

parameters (α) and (ε) (negative (α) included).


Table 7.2: Combinations of different runs for parameters (α) and (ε) considering both single and multiple sensitivity analysis.

Runs   Parameter (α)    Parameter (ε)    Color
1      Nominal (0)      Nominal (1)      Blue
2      Nominal (0)      +10% (1.1)       Green
3      Nominal (0)      −10% (0.9)       Red
4      +10% (0.1)       Nominal (1)      Light Blue
5      +10% (0.1)       +10% (1.1)       Purple
6      +10% (0.1)       −10% (0.9)       Golden
7      −10% (−0.1)      Nominal (1)      n/a
8      −10% (−0.1)      −10% (0.9)       n/a
9      −10% (−0.1)      +10% (1.1)       n/a

Run 1 corresponds to the original simulations (Chapter 6) with nominal

values (α = 0, β₁ = −1, and ε = 1). Run 2, run 3, and run 4 imply single

parameter analysis because only parameter (ε) is altered in the first two

cases, and only parameter (α) in the third case. Run 5 and run 6 imply

multiple parameter analysis because two parameters are altered at the same

time with respect to the nominal values. Figure 7.7 illustrates the frequency

response patterns of the first six runs described in Table 7.2 in the presence

of a 4:3 polyrhythmic stimulus with 1 Hz repetition frequency.


Figure 7.7: Frequency response patterns corresponding to sensitivity analysis for

bifurcation parameter (α) and nonlinearity parameter (ε). The stimulus is a 4:3 polyrhythmic pattern with 1 Hz repetition frequency. The numerical values of the parameters of each run are listed in Table 7.2.

All six runs appear to reduce to two distinct patterns

of responses. In fact, by calculating the coefficient of similarity between

pairs of runs we find that run 1 (nominal values) and run 2 are highly

similar, with r² = 0.998. Similarly, run 1 is highly similar to run 3, with

r² = 0.998. Run 2 and run 3 are also similar, with r² = 0.992. With these

degrees of similarity among the first three runs we can consider the nominal

run as representative of runs 1, 2 and 3. In the first three runs only the

value of parameter (ε) is altered and we can observe that such numerical

manipulation within the described range has virtually no effect in altering

the frequency response pattern; thus the neural resonance model produces

responses that are robust to numerical manipulation of

parameter (ε) alone.
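The pairwise comparison of runs can be sketched as below. The sketch assumes that the coefficient of similarity r² is the squared Pearson correlation between two response curves sampled at the same oscillator frequencies; the curves used here are short invented lists that only demonstrate the calculation, and the function names are hypothetical.

```python
from itertools import combinations

def r_squared(x, y):
    """Squared Pearson correlation between two equally sampled curves."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("curves must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r squared is undefined for a constant curve")
    return (sxy * sxy) / (sxx * syy)

def pairwise_r_squared(curves):
    """Return r squared for every pair of named curves."""
    return {
        (a, b): r_squared(curves[a], curves[b])
        for a, b in combinations(sorted(curves), 2)
    }

# Invented amplitude values, for illustration only.
curves = {
    "run 1": [0.10, 0.40, 0.15, 0.60, 0.20],
    "run 2": [0.11, 0.41, 0.14, 0.62, 0.19],
    "run 3": [0.09, 0.39, 0.16, 0.58, 0.21],
}
for pair, value in pairwise_r_squared(curves).items():
    print(pair, round(value, 4))
```

Because r² is insensitive to a constant offset or a positive rescaling of a curve, two patterns can share the same resonating oscillators and still differ in overall level, a point worth keeping in mind when interpreting low r² values.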


Similarly, calculation of the r² index between run 4 and run 5, and

between run 4 and run 6 gives 0.998 for both cases. Run 5 and run 6 have

r² = 0.995. Under these circumstances we can take run 4 as representative

of runs 4, 5 and 6. The responses of the two groups of runs identified

above are combined in Figure 7.8 below. The green graph represents the

first group of runs (1-3) and the blue one represents the second group of

runs (4-6). The coefficient of similarity for these two graphs is 0.4368, which

indicates a weak degree of similarity. The difference between the frequency

response pattern of the second group of runs and that of the first

group is due to the increase of parameter (α) from its nominal value of 0 to 0.1,

since we have shown that the numerical manipulation of parameter

(ε) has virtually no impact on the behaviour of the neural resonance model.

Figure 7.8: Frequency response patterns for (α > 0) (blue) and (α = 0) (green).

Despite the relatively low degree of similarity we can see that the

resonance effect is similar in both patterns. In particular, in the blue graph

we can still see that there are resonating oscillators with natural frequencies


similar to those observed in the green graph, such as 1 Hz, 1.5 Hz, 2 Hz, 3 Hz

and 4 Hz. The differences are that the peak responses of the resonating

oscillators are stronger when parameter (α) is small positive, and that the

amplitude floor for the oscillators lying between the resonating ones

in the frequency range 1 Hz - 16 Hz is raised when (α > 0). However,

when parameter (β) (the damping parameter, which prevents

the oscillation amplitude from becoming

large when (α) is small positive) is given a large negative value, the two graphs are highly correlated,

with r² equal to 0.9406 (not depicted here). From that perspective, it can

be said that the frequency response patterns of the neural resonance model

reported in Chapter 6 are consistent for a range of numerical values of

parameters (α) and (ε).

7.1.3.4 Sensitivity analysis for parameters (β) and (ε)

The set of simulations (runs) presented below assumes that parameter

(α) is small positive throughout the different runs. This is because the

main use of parameter (β) is to prevent the amplitude of a spontaneous

oscillation (α > 0) from becoming infinitely large. Both single and multiple

parameter sensitivity analysis are applied to parameters (β) and (ε). The numerical

range and the specific values for the simulations for parameters (β) and

(ε), are calculated according to the uniform approach. The total number

of runs in this section is summarised in Table 7.3. The asterisk next to each run

number distinguishes these runs from the runs described in Table 7.2.


Table 7.3: Combinations of runs regarding parameters (β) and (ε) considering both single (Run 2* and Run 3*) and multiple (Run 4* to Run 7*) parameter analysis. Range of values is calculated based on a uniform variation of ±10% around nominal values.

Runs   Parameter (β)    Parameter (ε)    Color
1*     Nominal (−1)     Nominal (1)      Brown
2*     +10% (−0.9)      Nominal (1)      Blue
3*     −10% (−1.1)      Nominal (1)      Green
4*     +10% (−0.9)      −10% (0.9)       Red
5*     −10% (−1.1)      +10% (1.1)       Light Blue
6*     +10% (−0.9)      +10% (1.1)       Purple
7*     −10% (−1.1)      −10% (0.9)       Golden
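The role of (β) in bounding a spontaneous oscillation can be illustrated with the cubic truncation of the amplitude equation, dr/dt = αr + βr³, whose non-zero fixed point is r* = sqrt(−α/β) for α > 0 and β < 0. This is a sketch under the assumption that the higher-order terms scaled by (ε) and the stimulus are ignored; it is not the full model used in our simulations, and the function names are hypothetical.

```python
import math

def spontaneous_amplitude(alpha, beta):
    """Fixed point of dr/dt = alpha*r + beta*r**3 for alpha > 0, beta < 0."""
    if alpha <= 0 or beta >= 0:
        raise ValueError("requires alpha > 0 and beta < 0")
    return math.sqrt(-alpha / beta)

def integrate_amplitude(alpha, beta, r0=0.01, dt=0.001, duration=200.0):
    """Forward-Euler integration of the truncated amplitude equation."""
    r = r0
    for _ in range(int(duration / dt)):
        r += dt * (alpha * r + beta * r ** 3)
    return r

alpha = 0.1
for beta in (-0.9, -1.0, -1.1):
    print(beta, round(spontaneous_amplitude(alpha, beta), 4),
          round(integrate_amplitude(alpha, beta), 4))
```

In this truncated form a ±10% change in (β) shifts the saturation amplitude by only about 5%, because the amplitude depends on the square root of (β).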

Figure 7.9 illustrates the frequency response pattern of the neural resonance

model for the seven runs described in Table 7.3 above in the presence of

a 4:3 polyrhythmic stimulus. The frequency response patterns exhibit a

high degree of similarity among them. In this set of runs parameter (α)

has been kept fixed (α = 0.1). In fact the resonance pattern is similar to

the one produced by runs 4, 5, and 6 in the first set of runs (Table 7.2),

where (α) was small positive. There we saw that, despite the low

similarity suggested by the coefficient r² between the graphs corresponding

to (α = 0) and (α = 0.1), the same oscillators resonate across

the different runs, which suggests that the model’s qualitative behaviour is not affected by

small numerical manipulations of the parameters.


Figure 7.9: Frequency response patterns corresponding to sensitivity analysis for

saturation parameter (β) and nonlinearity parameter (ε). The dynamic system is in spontaneous oscillation mode (α > 0). The stimulus is a 4:3 polyrhythmic pattern with 1 Hz repetition frequency. The numerical values of the parameters of each run are listed in Table 7.3.

So far we have employed parameter sensitivity analysis for a range of

parameters related to the state variable of the neural resonance model.

In general, we could say that parameters (β) and (ε) make almost no

difference to the frequency response pattern, whereas a positive value for parameter

(α) raises the floor and a negative value suppresses the ability

of the system to resonate. The frequency response pattern of each of the

simulations in this subsection has been suggestive of a model that produces

consistent results despite any numerical manipulation of its parameters

within a justified range, and therefore, it can be argued that the results

presented in Chapter 6 are robust and not significantly dependent on the

chosen numerical values of the parameters.


7.2 Cross-model comparison with the Wilson-Cowan model

In the introduction to this chapter, we noted that false assumptions

regarding the actual biological processes to be modelled can introduce

uncertainty in the adequacy of modelled data for making comparisons

with empirical data. As argued in the introduction, one way to deal

with this uncertainty is to consider alternative similar-purpose models,

ideally models which are considered to be particularly accurate or the best

available in some relevant respect, regardless of their complexity, and to

compare the behaviours of the two models (Haefner, 2005).

As indicated earlier, our first step in such a process has therefore been

to choose an alternative similar purpose model to the neural resonance

model. The model we have chosen to use here is the Wilson-Cowan

model, which describes the dynamics of interaction between excitatory

and inhibitory neurons. In fact, the neural resonance model derives

from the Wilson-Cowan model, therefore we could say that in many

respects the former shares indirectly the same biological assumptions as

the Wilson-Cowan model. In Chapter 4 we noted the general view of

the Wilson-Cowan model as a highly biologically plausible model; it

can therefore be considered as the model that limits uncertainty regarding biological

assumptions more than any other obvious candidate model.

The second step in the process of cross-model comparison is to stimulate

both models using the same polyrhythmic stimulus and to compare their

behaviours. In cases where these behaviours are qualitatively similar it can

be argued that the neural resonance model can be used interchangeably

with the Wilson-Cowan model; thus, any potential uncertainty with respect to the

biological assumptions of the neural resonance model is limited to the

extent that the assumptions of the Wilson-Cowan model are sound.


The Wilson-Cowan model simulates the dynamics of a neural oscillator,

i.e., the properties of the interaction of two types of neurons (excitatory and

inhibitory) that comprise a population of neurons (neural oscillator), with

parameters corresponding to physical aspects of such a system. A detailed

description of the equations of a Wilson-Cowan oscillator can be found in the

paper by Campbell and Wang (1996), ‘Synchronisation and desynchronisation

in a network of locally coupled Wilson-Cowan (WC) oscillators’. We will

not duplicate that description here. However, it will be useful for our

purposes to consider aspects of the parameters used in these equations.

In running simulations with the Wilson-Cowan model, values must be

assigned to the parameters of the WC oscillators, just as in the case of the

neural resonance model. Some of the parameters of a WC oscillator are

determined externally. These parameters are:

a) a parameter for self excitation in the excitatory neuron,

b) a parameter that denotes the strength of the coupling from the inhibitory

unit to the excitatory one,

c) a parameter of the natural frequency of the oscillator, and

d) two parameters reflecting the threshold of activation for both excitatory

and inhibitory neurons.

Also, there are two parameters, whose values are calculated internally

based on a set of nominal values given to the previous parameters. These

are:

a) a parameter for self excitation of the whole unit (neural oscillator), and

b) a parameter that denotes the strength of the coupling from the excitatory

unit to the inhibitory one.
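To make the roles of these parameters more concrete, the following sketch integrates a generic excitatory/inhibitory pair of the Wilson-Cowan type. It uses a common textbook form with a logistic activation function and is not the exact formulation of Campbell and Wang (1996) or the one used in our simulations. All parameter names and default values are placeholders chosen for illustration, the mapping onto the parameters listed above is loose, and the time constant tau merely stands in for the natural-frequency parameter.

```python
import math

def sigmoid(u):
    """Logistic activation function, bounded between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-u))

def simulate_wilson_cowan(stimulus, dt=0.001, tau=0.05,
                          a=10.0, b=10.0, c=10.0, d=2.0,
                          theta_x=2.0, theta_y=5.0):
    """Forward-Euler integration of a generic excitatory/inhibitory pair.

    x: excitatory activity, y: inhibitory activity
    a: self-excitation of the excitatory unit
    b: coupling from the inhibitory to the excitatory unit
    c: coupling from the excitatory to the inhibitory unit
    d: self-inhibition of the inhibitory unit
    theta_x, theta_y: activation thresholds
    All default values are placeholders, not values from the literature.
    """
    x, y = 0.0, 0.0
    trace = []
    for s in stimulus:
        dx = (-x + sigmoid(a * x - b * y - theta_x + s)) / tau
        dy = (-y + sigmoid(c * x - d * y - theta_y)) / tau
        x, y = x + dt * dx, y + dt * dy
        trace.append((x, y))
    return trace

# Drive the unit with a 1 Hz train of brief pulses for five seconds.
dt = 0.001
stimulus = [1.0 if (i * dt) % 1.0 < 0.05 else 0.0 for i in range(5000)]
trace = simulate_wilson_cowan(stimulus, dt=dt)
print(trace[-1])
```

A bank of such units with different time scales, all driven by the same stimulus, would be the analogue of the oscillator bank used for the neural resonance model.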


In the following simulations we use a nominal set of values

taken from the literature (Large et al., 2010) for the Wilson-Cowan model,

which serves as the standard of comparison. It is beyond the

scope of this thesis to recursively apply parameter sensitivity analysis to

the Wilson-Cowan model itself. As a result, we have to keep in mind that

the discussion below is based on the responses produced given a particular

nominal set and that for different values it is possible that the observed

responses would have been different.

7.2.1 Frequency response pattern of the Wilson-Cowan model to a 4:3

polyrhythmic stimulus with 1 Hz repetition frequency.

Figure 7.10 illustrates the frequency response pattern of the Wilson-Cowan

model when stimulated with a 4:3 polyrhythmic stimulus for a range

of stimulus amplitudes. Different amplitudes

are used here mainly to illustrate the frequency detuning property of

the Wilson-Cowan model. The pattern is qualitatively similar to the

neural resonance model, e.g., the resonating oscillators reflect the general

resonance pattern suggested in Chapter 6 (Table 6.2). Also, similar to

the neural resonance model, we can see that as the stimulus amplitude

increases, the higher-order resonances at harmonic, subharmonic and more

complex ratios with respect to the fundamental frequencies of the stimulus

become more prominent. For example if we concentrate on the light blue

trace we can see that for the 4:3 polyrhythmic stimulus, the responses we

get correspond to those of the neural resonance model. In Figure 7.12

responses of both models are combined to illustrate this, but before that

we need to elaborate on the observed frequency detuning in the frequency


response pattern of the Wilson-Cowan model. Figure 7.10 suggests that

detuning increases in line with an increase in the stimulus amplitude.

Figure 7.10: Frequency response patterns of the Wilson-Cowan model in the

presence of a 4:3 polyrhythmic stimulus with 1 Hz repetition frequency for a range of different amplitudes of the stimulus.

7.2.2 Frequency detuning in the neural resonance model

Earlier in this chapter we have mentioned that frequency-detuning

parameters are also present in the neural resonance model, i.e., parameters

(δ₁) and (δ₂). In the simulations of Chapter 6 no detuning has been

observed in the frequency response patterns because the parameter related

to detuning had taken a nominal value of 0, as noted earlier in this chapter,

and therefore was inactive. However, by changing the value of parameter

(δ) to −9, which is the nominal value used by Large, Almonte, and Velasco

(2010), we can expect detuning to arise in the frequency response pattern


of the neural resonance model. This is illustrated in Figure 7.11 below. We

can see that the peaks are slightly bent to the right as a result of a negative

value of (δ).
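One way to see why a non-zero (δ) produces detuning is to look at the cubic truncation of the oscillator equation in polar form, where the phase evolves as dφ/dt = 2πf + δr². The instantaneous frequency therefore depends on the amplitude r. The sketch below computes this amplitude-dependent frequency. It assumes the truncated form with the natural frequency expressed in Hz and no frequency scaling of the parameters, which may differ from the implementation used in our simulations; the amplitude values are illustrative.

```python
import math

def effective_frequency(f_natural, amplitude, delta):
    """Instantaneous frequency (Hz) of the truncated oscillator,
    given dphi/dt = 2*pi*f_natural + delta * amplitude**2."""
    return f_natural + delta * amplitude ** 2 / (2.0 * math.pi)

delta = -9.0  # value of the detuning parameter quoted in the text
for amplitude in (0.0, 0.1, 0.2, 0.3):
    print(amplitude, round(effective_frequency(3.0, amplitude, delta), 4))
```

Under these assumptions a negative (δ) lowers the effective frequency as the amplitude grows, so an oscillator whose natural frequency lies slightly above a stimulus frequency is pulled towards it. This is one plausible reading of why the peaks lean to the right along the natural-frequency axis.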

Figure 7.11: Frequency response pattern of the neural resonance model in the

presence of a 4:3 polyrhythmic stimulus with 1 Hz repetition frequency for a negative value of the frequency detuning parameter (δ).

7.2.3 Superimposed frequency response patterns for both the neural resonance

and the Wilson-Cowan model

Finally, by superimposing the frequency response patterns of both models

in Figure 7.12 below we can observe the structural similarity between the

two responses. The coefficient of similarity is r² = 0.8677.


Figure 7.12: Superimposed frequency response patterns of the Wilson-Cowan and

neural resonance models. The two patterns are quite similar, with r² = 0.8677.

7.3 Summary

In this chapter we have applied systematic techniques to limit the

uncertainty that arises in relation to reliance on data from the neural

resonance model. In particular, two such techniques were employed:

parameter sensitivity analysis and cross-model comparison. Firstly, in the

case of parameter sensitivity analysis, systematic numerical manipulation

was applied to the parameters of the state variable, the parameters relating

to the driving variable, and to the initial condition of the amplitude of the

oscillators. This allowed us to rule out dependency of the results presented

in Chapter 6 on specific numerical values.

Secondly, we compared the behaviour of the neural resonance model

with the behaviour of the Wilson-Cowan model in order to limit

uncertainty with regard to the biological assumptions of the neural

resonance model. The Wilson-Cowan model was chosen as the model


that limits uncertainty about biological assumptions more than any other

candidate model. Simulations using the Wilson-Cowan model were found to

produce outputs qualitatively similar to those of the neural resonance

model. However, for completeness we should note that certain nominal

values were used to run the simulations with the Wilson-Cowan model

without recursively applying sensitivity analysis to the parameters of the

Wilson-Cowan model.

The neural resonance model derives from the Wilson-Cowan model,

and shares its biological assumptions in a less explicit but more

mathematically convenient form. Given that the Wilson-Cowan model is generally

considered (Chapter 4) to be the most biologically faithful of the candidate

models, and bearing in mind the similarity of responses between the two

models, we are able to place a reasonable limit on uncertainty with regard

to the biological assumptions made by the neural resonance model.


8 MODEL PREDICTIONS ABOUT HUMAN BEHAVIOUR

In this chapter the neural resonance model is used to make predictions

about human tapping behaviour in polyrhythm perception, which are then

compared with empirical data collected by conducting novel experiments

with people. In the next few sections we will focus in detail on two

as-yet-unexamined predictions made by the neural resonance model about

human tapping behaviour in association with polyrhythm perception.

These predictions are as follows.

1. The categorisation used by Handel and Oshinsky (1981) to

summarize observed human tapping behaviours in polyrhythm perception

is incomplete - new kinds of tapping are predicted based on the behaviour

of the model (restated below).

2. The time it takes for an oscillator to relax yields

predictions about the relative times humans will take to start tapping

in different ways to a polyrhythm. Our specific predictions are based

on a refined version of a hypothesis suggested by Large (2000). Large

previously made some limited tests of the hypothesis (as discussed below).

We then discuss how the predictions inform the design of the

experimental study and present the data obtained from doing the actual

experiment. We conclude by evaluating the model’s predictions by

comparing modelled data with empirical data. We will see that the two

principal results are as follows. Firstly, there is an empirical confirmation

of a prediction by the model of a category of human tapping behaviour not


noticed in previous empirical studies. Secondly, the time it takes for people

to start tapping along with a polyrhythm - as soon as they feel a beat -

shares the same timescale with the temporal development of the resonance

dynamics in the network for reaching a steady state.

8.1 Predictions to extend the categories of human tapping

behaviours associated with polyrhythm perception

The essence of the first of the two new predictions is simple: the

frequency response pattern of the neural resonance model predicts that

the categories of human tapping behaviours observed in the empirical

study of Handel and Oshinsky (1981) are not exhaustive, and new kinds

of tapping are predicted. In Chapter 2, we saw that for a polyrhythm,

which consists of two pulse trains (m) and (n), the categories of observed

tapping behaviours reported by the Handel and Oshinsky study expressed

in rhythmic frequencies were as follows:

1. Tapping along with the fundamental frequency of (m).

2. Tapping along with the fundamental frequency of (n).

3. Tapping along with the 1st subharmonic of (m).

4. Tapping along with the 1st subharmonic of (n).

5. Tapping along with the repetition frequency of the polyrhythmic pattern.

In Chapter 6, we saw that the neural resonance model not only captures

all the aforementioned frequencies when stimulated by an n:m polyrhythm

for a range of tempi, but also, it produces several other frequency responses

that include:


1. the 2nd harmonic of (n),

2. the 2nd harmonic of (m),

3. the 3rd harmonic of (n),

4. the 3rd harmonic of (m), and

5. the 1st and 2nd subharmonic of the pattern’s repetition frequency.

Subject to one important proviso, the above components of the frequency

response pattern of the model may be taken as predictions for observable

human tapping behaviours in polyrhythm perception. The

proviso, which is also the reason for concentrating on the particular

components noted above rather than on others from the full range one can observe

in a frequency response pattern, was discussed in Chapters 5 and 6 and is

as follows: some, but not all, of these additional components

correspond to humanly attainable tapping frequencies, namely those whose

underlying rhythmic frequencies fall within the range of upper and lower

limits reported in studies of sensorimotor synchronization tasks (Repp,

2005). Given that proviso, in order to provide empirical evidence

in support of the above predictions, we decided to undertake a study

similar to, but more comprehensive than, that of Handel and Oshinsky.
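The two lists of frequency components above can be generated for any n:m polyrhythm with a short sketch. It assumes that the k-th harmonic of a frequency f is k·f and that the k-th subharmonic is f/(k + 1), so that the 1st subharmonic is f/2; the 2nd harmonics of 6 Hz and 8 Hz for the 4:3 stimulus agree with this convention. The function names are hypothetical, and the tapping limits are left as arguments because their numerical values are not restated here.

```python
from fractions import Fraction

def candidate_tapping_frequencies(n, m, repetition_hz=1):
    """Return (observed, additional) frequency components, in Hz, for an
    n:m polyrhythm whose pattern repeats `repetition_hz` times per second."""
    rep = Fraction(repetition_hz)
    f_n, f_m = n * rep, m * rep
    observed = {
        "fundamental of (m)": f_m,
        "fundamental of (n)": f_n,
        "1st subharmonic of (m)": f_m / 2,
        "1st subharmonic of (n)": f_n / 2,
        "repetition frequency": rep,
    }
    additional = {
        "2nd harmonic of (n)": 2 * f_n,
        "2nd harmonic of (m)": 2 * f_m,
        "3rd harmonic of (n)": 3 * f_n,
        "3rd harmonic of (m)": 3 * f_m,
        "1st subharmonic of repetition": rep / 2,
        "2nd subharmonic of repetition": rep / 3,
    }
    return observed, additional

def within_limits(frequencies, min_hz, max_hz):
    """Keep only the components inside a given tapping-rate range."""
    return {k: f for k, f in frequencies.items() if min_hz <= f <= max_hz}

observed, additional = candidate_tapping_frequencies(n=4, m=3)
for label, f in {**observed, **additional}.items():
    print(f"{label}: {float(f):.3f} Hz")
```

Applying within_limits with the synchronization limits taken from the literature would then yield the subset of additional components that can be treated as testable predictions.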

8.2 Prediction of the timescale people start to tap to

polyrhythms

The second prediction is based on modifying a hypothesis suggested and

subjected to limited tests by Large (2000). According to the original

hypothesis, the time it takes for humans to start tapping along with a

given stimulus - as soon as they feel a beat - can be associated with the

time it takes for oscillators to relax, i.e., to reach a peaked and


approximately stabilized amplitude response (Large, 2000). In this thesis,

we adopt a modified version of Large’s hypothesis for the following two

reasons. Firstly, we argue that the method Large uses to test his hypothesis

(in particular the summary statistics he refers to) needs to be revised

for concrete reasons that will be described in the text. Secondly, we propose

an alternative way of measuring the relaxation time of the oscillators’

resonances focusing on the timescale of the relaxation as opposed to an

absolute point in time where relaxation occurs. In general, relaxation time

corresponds to the time it takes for a perturbed system to return to

equilibrium. By convention, the simplest theoretical description

of relaxation as a function of time t is an exponential law exp(−t/τ), where

(τ) is the relaxation time. In our case relaxation time refers to the time it

takes for an oscillator to reach a stable resonance pattern in the presence

of a stimulus, that is, the time it takes for an oscillator to reach the maximum

magnitude around which it stabilises, allowing for some small

fluctuation.
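A simple way to operationalise this definition on a sampled amplitude trace is sketched below: relaxation is taken as the first time after which the amplitude stays within a tolerance band around its final value. The 5% band and the function name settling_time are assumptions made for illustration, not the criterion used later in this chapter, and the trace is a synthetic exponential approach with τ = 1 s.

```python
import math

def settling_time(times, amplitudes, tolerance=0.05):
    """First time after which the amplitude stays within
    +/- tolerance * |final value| of its final value."""
    final = amplitudes[-1]
    band = tolerance * abs(final)
    # Walk backwards to find the last sample that lies outside the band.
    for i in range(len(amplitudes) - 1, -1, -1):
        if abs(amplitudes[i] - final) > band:
            return times[i + 1]
    return times[0]

# Synthetic trace: amplitude approaching 1 with relaxation time tau.
tau = 1.0
times = [i * 0.01 for i in range(2001)]
amplitudes = [1.0 - math.exp(-t / tau) for t in times]
print(settling_time(times, amplitudes))
```

For a purely exponential approach the 5% criterion is met after roughly 3τ, since exp(−3) ≈ 0.05. This shows how a timescale (τ) and an absolute point in time are related but not identical measures.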

8.2.1 Measuring relaxation time

In order to measure relaxation time, we can refer to graphs that show the

development of resonances, i.e., the way the amplitude of an oscillator

increases over time, for all the oscillators in the bank. Figure 8.1

below shows the development of such dynamics in the presence of a

4:3 polyrhythmic stimulus with 1 Hz repetition frequency. The x-axis

corresponds to time in seconds, the y-axis corresponds to frequencies of

the oscillators in the bank (in this case 768 oscillators from 0.25 Hz to 16 Hz

as discussed in Chapter 5). The color code represents the amplitude of the

oscillator over time. We can see that at the beginning of the stimulation (0


seconds) the amplitude of all the oscillators is close to zero. This is related

to the chosen initial condition of the amplitude of all the oscillators in the

bank (see Chapter 7).

We can see that oscillators with natural frequencies around the input

frequencies 3 Hz and 4 Hz of the pulse trains comprising the polyrhythm

start to resonate straight after the 1st second of the stimulation period,

and within the next three seconds of the stimulation period they attain

a quasi-stable prominent amplitude. The time to reach this amplitude

is the relaxation time. The oscillators with natural frequencies

that are subharmonically related to the stimulus input frequencies, in

particular those with frequencies 2 Hz and 1 Hz, start to resonate

later than the previous set of oscillators, from the 2nd second onwards,

and less strongly. They also need more time to

relax. In addition to the

previous oscillators, there are two oscillators that are harmonically related

(in fact corresponding to the 2nd harmonics of the fundamental frequencies

of the pulse trains) to the stimulus input frequencies and exhibit similar

behaviour to the first group of oscillators described above. These are the

oscillators with 6 Hz and 8 Hz frequencies. Other oscillators also resonate

in the presence of the 4:3 polyrhythm and are all related to the general

framework of resonances described in Chapter 6 (see Table 6.2).


Figure 8.1: Amplitude responses of the bank of oscillators in the presence of a 4:3

polyrhythmic stimulus with 1 Hz repetition frequency.

The above amplitude responses in the presence of a polyrhythmic

stimulus in combination with the hypothesis that the time it takes for

oscillators to relax may be indicative of the time it takes for people to start

tapping (Large, 2000) can be used to make testable predictions about the

time it takes for humans to perceive a beat in polyrhythms.

8.2.2 Modifying the original hypothesis

We modify Large’s hypothesis by focusing on how the dynamics of each
oscillator develop over time, rather than on the absolute point in time at
which the oscillators relax. This lets us make predictions about the
timescale within which people start to tap along with a polyrhythm as soon
as they feel a beat. In other words, our modified hypothesis concerns the
timescale over which the dynamics develop rather than exact points in time,
such as the moment relaxation is reached, and it therefore allows us to
make predictions about the relative time it takes for people to start
tapping.

8.2.3 Method limitations in evaluating the original hypothesis

The way Large (2000) evaluates his hypothesis is by comparing the absolute

relaxation time of an oscillator with the average time it takes for people to

start tapping. However, this method may not be the most appropriate: our
experimental observations suggest that the time to start tapping is not
normally distributed, so the mean may be a misleading measure of location.

Also, to make his comparisons, Large (2000) uses the mean time associated
with one particular empirical dataset among the three datasets available in
the empirical study he refers to. In other words, it seems that a specific
empirical result was selected to match the behaviour of the model.
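To illustrate why the mean can mislead for skewed data, the following minimal sketch compares the mean and the median of a small right-skewed sample. The start times are hypothetical values chosen for illustration only; they are not taken from our experiment or from the study Large refers to.

```python
from statistics import mean, median

# Hypothetical times to start tapping (msec), deliberately right-skewed:
# most values are small, two late starters pull the mean upwards.
start_times_ms = [2000, 2500, 3000, 3500, 4000, 12000, 20000]

mean_ms = mean(start_times_ms)      # sensitive to the two late starters
median_ms = median(start_times_ms)  # middle value of the sorted sample

print(f"mean   = {mean_ms:.0f} msec")
print(f"median = {median_ms} msec")
```

In this invented sample the mean lies well above the median, so it describes neither the typical participant nor the late starters.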

8.3 Experimental studies in human perception of polyrhythms

According to the predictions described in the previous section, the aim

of our new proposed experiments with people is to monitor two main

variables. These are the time it takes for people to start tapping along

with a polyrhythmic stimulus and the category of the tapping behaviour.

In Chapter 5 we have described the methodology for obtaining the above

type of empirical data.


8.3.1 Experimental design and participants

The experiment is concerned with obtaining data for all five types of

polyrhythmic stimuli and for all ten different repetition rates employed

in the Handel and Oshinsky study. As a result, 50 combinations were

presented randomly to each one of the 37 participants of the experiment,

which gave rise to 1850 trials in total. 18 female and 19 male participants

were recruited randomly across the Open University campus. The mean

age of the group was 40 years old with standard deviation 11 years. 10

participants (27%) had 0 years of formal musical training, 12 participants

(32%) had between 1 and 5 (inclusive) years of training, and the remaining

15 participants (41%) stated formal training experience between 5 and 34

years. Almost all the participants (92%) stated that they are music
enthusiasts, i.e., selected either 4 or 5 on a 5-point scale, with 5
representing the most enthusiastic.

8.3.2 Data management and visualisation

Below we present an example that illustrates the process of pooling raw
data, sorting the 1850 trials into classes, analysing each trial within each
class to identify the category of tapping behaviour (by finding the median
of the distribution of inter-tap intervals), creating summary datasets for
each class that capture both the category of tapping and the time it takes
to start tapping, and visualizing these datasets using boxplots. The total
number of summaries at the end of the analysis equals the number of
combinations presented to people, i.e., 50. Each summary contains one
column per observed category of tapping behaviour, and the numerical
entries in a column are the times it took people to start tapping in that
category. In this chapter we present 10 summaries for the continuing
example of a 4:3 polyrhythm repeating at all the different tempi, and a
further two summaries for a 3:2 and a 2:5 polyrhythm repeating once per
second; these suffice to communicate the argument of the chapter.

8.3.3 Raw data

Figure 8.2 is an excerpt of the spreadsheet where raw data were imported

for initial sorting. It contains six randomly selected complete trials among

three participants. In particular, there are three complete trials from

participant 8, two trials from participant 9, and one trial from participant

10. Each trial can be distinguished by its ID at the top of the list; the
remaining numerical entries are the points in time, in msec after the
participant started listening to the polyrhythmic stimulus, at which a tap
occurred. For example, one trial of participant 8 has the ID
1/ 0. 3. 4. 0. The first number (1) corresponds to the repetition frequency
of the polyrhythm, which in this case is 1 Hz. The remaining sequence of
numbers (after the forward slash) denotes the type of polyrhythm presented
to the participant, which in this case is a 4:3 polyrhythm. The first
numerical entry below the ID for that trial is 4004, which means that the
first tap in the trial was recorded 4004 msec after the polyrhythm was
first heard. In that trial, participant 8 decided to finish the trial after
tapping 8 times along with the polyrhythmic stimulus.


Figure 8.2: Excerpt of pooled raw empirical data

In total, the raw data contain timing information on tapping behaviour for
all 1850 trials. Next, we discuss how to identify which tapping category
was chosen by the participant and whether that category complied with the
regular pulse-like tapping required for the purposes of the experiment. For
example, in some trials participants chose to shadow (copy) part or all of
the polyrhythmic pattern despite the instruction to refrain from doing so,
changed the mode of tapping during the course of the trial, or gradually
drifted out of phase with respect to the targeted beat and eventually
abandoned the trial. In such cases the trials were dismissed from further
analysis.


8.3.4 Identifying the category of tapping behaviour: an example

A single trial from those illustrated above can be used as an example to
explain how the category of tapping behaviour for each individual trial is
identified. The numerical entries correspond to the points in time at which
successive taps occur, so we can calculate the inter-tap interval (ITI) in
msec, which is the time that elapses between two successive taps. Figure 8.3
illustrates all the ITIs associated with the first trial of participant 8.
To identify the chosen category of tapping we calculated the median of the
ITI entries. We chose the median ITI rather than the average ITI as the
statistical summary because the median is less susceptible to outlying
observations caused, for example, by participants drifting significantly or
losing ‘steadiness’ in their tapping. The median is calculated by putting
the numerical entries in ascending order and selecting the middle value. If
the total number of entries is odd, the median is the single value in the
middle; if it is even, the median is the average of the two middle values.
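The ITI and median calculation described above can be sketched as follows. The tap times are hypothetical values invented for illustration (8 taps, hence 7 ITIs); they are not the entries of Figure 8.3, and the function name is our own.

```python
from statistics import median

def inter_tap_intervals(tap_times_ms):
    """Return the intervals (msec) between each pair of successive taps."""
    return [later - earlier
            for earlier, later in zip(tap_times_ms, tap_times_ms[1:])]

# Hypothetical tap times in msec after stimulus onset (not the thesis data).
taps = [4004, 5010, 6001, 7015, 8000, 9012, 10050, 11015]

itis = inter_tap_intervals(taps)
median_iti = median(itis)  # sorts internally and picks the middle value

print(itis)
print(median_iti)
```

For an even number of entries `statistics.median` returns the average of the two middle values, which matches the rule stated above.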


Figure 8.3: Timing data for successive taps in the presence of a 4:3 polyrhythmic
stimulus repeating once every 1000 msec. The third column corresponds to
inter-tap intervals in msec for successive pairs of taps (ITIs). The median
is also calculated in the bottom left-hand side of the third column and is
used to identify the mode of tapping chosen by the participant. In this
trial the participant taps approximately every 1000 msec, therefore the
mode of tapping is along with the co-occurrence of the two pulse trains.

In the above trial, the participant starts tapping after 4004 msec, which

means that he starts tapping for the first time at the beginning of the

5th repetition of the 4:3 polyrhythmic pattern. The median of the ITIs is

approximately 1000 msec, therefore the participant chooses to tap along

with the co-occurrence of the two pulse trains. Over the course of the trial

there is a small deviation from perfect periodicity in the tapping behaviour,

which is commonly identified in empirical studies where people are asked

to tap in a regular way. In fact, empirical studies have shown that for a
musically trained and practiced individual at least 68% of the
asynchronies, i.e., deviations from perfectly periodic tapping, fall within
a range of 1 standard deviation, which is 2% of the IOI (Repp, 2000). For
example, in the above case, where the target is to tap along with the
co-occurrence of the two pulse trains, i.e., once every 1000 msec, a
standard deviation of 2% of the IOI is 1000 × 0.02 = 20 msec. Following
Repp’s suggestion, approximately 68% of the asynchronies in this trial
would fall within the range from 980 msec to 1020 msec. Indeed, 5 out of 7
of the above ITIs (approximately 71%) fall within this range. This result
implies a trained and practiced individual according to Repp’s findings.
Interestingly, in the personal data form filled in prior to the experiment,
participant 8 reported 15 years of formal musical training and 45 years of
practicing experience. However, a collective analysis of all 50 trials
undertaken by that participant would be needed to provide a more accurate
view of the participant’s sensorimotor synchronization skills.
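The two calculations used in this paragraph, namely locating the first tap within the repeating pattern and counting the ITIs that fall within 2% of the target interval, can be sketched as below. The ITI values are hypothetical (chosen so that 5 of the 7 fall inside the window, as in the trial discussed), and the function names and the default of 0.02 are our own illustrative choices.

```python
def repetition_of_first_tap(first_tap_ms, period_ms):
    """1-based index of the pattern repetition during which the first tap falls."""
    return first_tap_ms // period_ms + 1

def share_within_window(itis_ms, target_ms, relative_sd=0.02):
    """Fraction of ITIs within target +/- relative_sd * target, plus the window."""
    half_width = target_ms * relative_sd
    lower, upper = target_ms - half_width, target_ms + half_width
    inside = [iti for iti in itis_ms if lower <= iti <= upper]
    return len(inside) / len(itis_ms), (lower, upper)

# Hypothetical ITIs in msec (not the entries of Figure 8.3).
itis = [1006, 991, 1014, 985, 1012, 1038, 965]

repetition = repetition_of_first_tap(4004, 1000)
share, window = share_within_window(itis, target_ms=1000)

print(repetition)   # a first tap at 4004 msec falls in the 5th repetition
print(window)       # window from 980 to 1020 msec
print(round(share, 2))
```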

In any case, for trials with similar characteristics to this, such as small

deviation around the targeted beat and consistency in keeping the same

beat, the use of the median to identify the category of tapping is meaningful.

The above process of calculating the median and as a result identifying the

category of tapping behaviour was repeated for all 1850 trials. Some of the

trials were dismissed from further analysis on the basis described in the

last paragraph of the previous section. An example of a dismissed trial is

illustrated below (Figure 8.4). The presented polyrhythm is of a 4:3 type

repeating once every 600 msec.

Figure 8.4: An example of a dismissed trial.


Based on the first three ITI entries in column 3 of Figure 8.4, we may
infer that the participant starts tapping without having an explicit
pulse-like perception already formed. From the 4th ITI onwards we may
further infer that the participant copies part of the polyrhythmic
stimulus, inasmuch as there is an alternation of two ITIs with rough
durations of around 400 msec and 800 msec.
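One way such a trial could be flagged automatically is to check how many ITIs lie close to the median ITI. The sketch below is a hypothetical heuristic only: the trials in this study were assessed as described in the text, and the tolerance (20% of the median) and the required share (70%) are arbitrary illustrative thresholds, as are the ITI values.

```python
from statistics import median

def looks_pulse_like(itis_ms, tolerance=0.2, min_share=0.7):
    """Hypothetical check: True if most ITIs lie near the median ITI.

    tolerance: allowed relative deviation from the median (assumed value).
    min_share: fraction of ITIs that must lie within it (assumed value).
    """
    centre = median(itis_ms)
    close = [iti for iti in itis_ms if abs(iti - centre) <= tolerance * centre]
    return len(close) / len(itis_ms) >= min_share

# Invented examples: a steady trial and one that alternates ~400/~800 msec
# after three irregular initial intervals.
steady = [1006, 991, 1014, 985, 1012, 1038, 965]
alternating = [650, 310, 520, 400, 800, 410, 790, 395, 805]

print(looks_pulse_like(steady))
print(looks_pulse_like(alternating))
```

A heuristic of this kind would only pre-select candidates for dismissal; it cannot by itself distinguish shadowing from drifting or from a change of tapping mode.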

The above processing of raw data (in terms of ITI and median calculation)
leaves a pool of fewer than 1850 usable trials, and for each of those
trials we have two pieces of information that can eventually be compared
with the neural resonance model predictions: the category of tapping
behaviour and the time it takes the participant to start tapping in this
way along with the polyrhythmic stimulus.

One convenient way to get an idea of how many trials are dismissed

is to look at the collective results for each individual combination, i.e.,

polyrhythmic type and presentation rate, which is presented in the next

section. For example, one particular combination would ideally contain 37

trials assuming that all the participants perform in line with the instructions

of the experiment. However, it is expected that the frequency of successful

trials varies across combinations. In the next section we give an example of

creating a collective dataset that provides information about the categories

of tapping behaviours and their corresponding times, i.e., the time it takes

to start tapping, with regard to a specific combination, i.e., a 4:3 polyrhythm

repeating once per second.


8.3.5 Creating a dataset for each combination, i.e., polyrhythm type and tempo

The total number of trials is organized into 50 datasets, each one

representing a different combination. For each combination the number

of columns of the dataset corresponds to the total number of observed

categories of tapping behaviours. The numerical entries in each column are
independent of each other and correspond to the time it takes different
participants to start tapping. Figure 8.5 illustrates one such dataset for
a combination in which the type of polyrhythm is 4:3 and the repetition
period is 1000 msec.

Figure 8.5: Collective data for a single combination (in this case a 4:3 polyrhythm
repeating once per second). Each column represents a single category of
tapping and each numerical entry corresponds to the time it takes for a
single participant to start tapping.

The adopted convention for describing the tapping categories is to express
them in relation to the pulse trains of the polyrhythmic pattern. In
Figure 8.5, the first column corresponds to tapping along with the 4-pulse
train and is labeled ‘4-pulse train’. The second column corresponds to
tapping along with the 3-pulse train and is labeled ‘3-pulse train’. The
third column corresponds to tapping along with the co-occurrence of the two
pulse trains and, following Handel and Oshinsky’s convention, is labeled
‘unit meter’. The fourth column corresponds to tapping along with every
other element of the 4-pulse train and is therefore labeled ‘1st
subharmonic of the 4-pulse train’. The last column corresponds to a tapping
mode in which the participant groups 6 pulses from the 4-pulse train and is
labeled ‘grouping 6 pulses of a 4-pulse’. A similar convention is employed
for all 50 datasets. The numerical entries in each column correspond to the
time it took participants to start tapping in the presence of the
polyrhythmic stimulus. The above dataset therefore contains information
about the categories of tapping behaviour along with a 4:3 polyrhythm, the
time to start tapping, and the preference for each category, expressed as
the ratio of the number of entries in that category to the total number of
entries. The total number of numerical entries is 37, which means that in
this particular combination no trial has been dismissed. In the next
section we introduce boxplots as a convenient way to visualise concisely
the dataset shown in Figure 8.5.
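A dataset of this kind, with one column of start times per tapping category and a preference ratio per category, could be assembled along the following lines. This is a minimal sketch: the four trials are invented for illustration (the real dataset in Figure 8.5 has 37 entries), and the function names are our own.

```python
from collections import defaultdict

def build_summary(trials):
    """Group start times (msec) by tapping category for one combination."""
    summary = defaultdict(list)
    for category, start_time_ms in trials:
        summary[category].append(start_time_ms)
    return dict(summary)

def category_preferences(summary):
    """Entries per category divided by the total number of entries."""
    total = sum(len(times) for times in summary.values())
    return {category: len(times) / total
            for category, times in summary.items()}

# Invented (category, time to start tapping in msec) pairs for one combination.
trials = [
    ("unit meter", 4004),
    ("4-pulse train", 2510),
    ("unit meter", 3020),
    ("3-pulse train", 5340),
]

summary = build_summary(trials)
preferences = category_preferences(summary)
print(summary)
print(preferences)
```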

8.3.6 Visualisation of data using boxplots

Considering the two experimental variables described above, i.e., category

of tapping behaviour and time it takes to start tapping, the most

appropriate graphical display/visualization of the obtained data is the use

of multiple boxplots. Figure 8.6 illustrates a graph of multiple boxplots

corresponding to the data presented in Figure 8.5.


Figure 8.6: Multiple boxplot summarizing the categories of tapping and the range
of time it takes for people to start tapping along with the given 4:3
polyrhythm repeating once every 1000 msec.

The x-axis corresponds to the time that elapses after the polyrhythm is

first heard. On the y-axis the multiple categories of tapping behaviours are

listed. Each category has an associated boxplot. Each boxplot consists of a
grey box containing the middle 50% of the timing observations. The right
edge of the box corresponds to the upper quartile Q3 (75% of all
observations lie at or below it) and the left edge to the lower quartile Q1
(25% of all observations lie at or below it). The line splitting the box is
the median time it takes for people to start tapping. The difference
between the upper and lower quartiles is the inter-quartile range (IQR),
which is also used to define the lower and upper adjacent values. These are
the observations furthest from the median on either side of the box that
still lie within 1.5 times the IQR of the nearest end of the box, i.e., the
nearest quartile. The lower and upper adjacent values are marked by the
ends of the whiskers (the lines that may extend from either end of the
box). The stars correspond to outliers, i.e., observations that do not fall
in the range described above.
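The quantities that define a boxplot can be computed as in the sketch below. It assumes the "inclusive" quartile convention of Python's `statistics.quantiles`; statistical packages differ in how they interpolate quartiles, so the values need not match those of the software used to draw our figures. The sample values are invented.

```python
from statistics import quantiles

def boxplot_stats(values):
    """Quartiles, IQR, adjacent values (whisker ends) and outliers."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "lower_adjacent": inside[0],   # most extreme value within the fences
        "upper_adjacent": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Invented times to start tapping (msec) for one tapping category.
times_ms = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 30000]
stats = boxplot_stats(times_ms)
print(stats)
```

Here the late value of 30000 msec lies more than 1.5 times the IQR above Q3, so it would be drawn as a star rather than extending the whisker.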

Figure 8.7 provides complementary details about the exact numerics

regarding the multiple boxplot presented above. For example, it provides

an accurate picture regarding the number of people who have chosen to
tap in a certain way. This information is useful for creating the frequency

distribution of the categories of tapping behaviour.

Figure 8.7: Descriptive statistics for the 4:3 polyrhythm repeating once per second.

The visualization of tapping data using boxplots in the way described
above allows us to access information with regard to the two central
experimental variables of the empirical tests, namely the categories of
tapping behaviours and the corresponding time it takes for people to start
tapping, both in terms of range and median. In the last part of this
section we list graphs of multiple boxplots for the remaining combinations
with respect to a 4:3 polyrhythm repeating at the remaining 9 tempi.
Multiple boxplot graphs regarding the 3:2 and 5:2 polyrhythm are presented
in the last section of the chapter.


8.3.7 Empirical results of tapping along with a 4:3 polyrhythm for different tempi

Figure 8.8: Multiple boxplots for the range of tapping modes and their
corresponding period of time within which people start tapping along with a
4:3 polyrhythm repeating once every 2500 msec.

Figure 8.9: Multiple boxplots for the range of tapping modes and their
corresponding period of time within which people start tapping along with a
4:3 polyrhythm repeating once every 2000 msec.

Figure 8.10: Multiple boxplots for the range of tapping modes and their
corresponding period of time within which people start tapping along with a
4:3 polyrhythm repeating once every 1800 msec.

Figure 8.11: Multiple boxplots for the range of tapping modes and their
corresponding period of time within which people start tapping along with a
4:3 polyrhythm repeating once every 1600 msec.

Figure 8.12: Multiple boxplots for the range of tapping modes and their
corresponding period of time within which people start tapping along with a
4:3 polyrhythm repeating once every 1400 msec.

Figure 8.13: Multiple boxplots for the range of tapping modes and their
corresponding period of time within which people start tapping along with a
4:3 polyrhythm repeating once every 1200 msec.

Figure 8.14: Multiple boxplots for the range of tapping modes and their
corresponding period of time within which people start tapping along with a
4:3 polyrhythm repeating once every 800 msec.

Figure 8.15: Multiple boxplots for the range of tapping modes and their
corresponding period of time within which people start tapping along with a
4:3 polyrhythm repeating once every 600 msec.

Figure 8.16: Multiple boxplots for the range of tapping modes and their
corresponding period of time within which people start tapping along with a
4:3 polyrhythm repeating once every 400 msec.


8.4 Evaluation of the model’s predictions

The third and last section of this chapter focuses on examining whether the

predictions on human tapping behaviour made by the neural resonance

model (described in Section 7.1) can be observed in the empirical data

presented in the previous section. The comparison is two-fold, one

concerning the categories of tapping and the other the period of time it

takes for people to start tapping. The section begins by comparing the

predicted categories of tapping with those found in the experiments.

8.4.1 Extension of categories of tapping behaviours

A quick review of the boxplots presented in the previous section reveals

that in general, tapping behaviours agree with the findings reported by

Handel and Oshinsky (1981). In other words, people tap along with either of
the pulse trains, along with the co-occurrence of the two pulse trains, or
along with every other element of the pulse trains. However, there are
cases where the tapping behaviour goes beyond these categories yet is in
line with the predictions of the neural resonance model. In particular,
these are behaviours that correspond to tapping the 2nd harmonic of the
4-pulse train when the pattern is repeated once every 2500 msec (5
observations) and 2000 msec (1 observation).

Also, there are additional empirical observations of tapping behaviours not
observed in the Handel and Oshinsky study. These are less frequent than the
rest of the observations, but they nonetheless imply pulse-like tapping
along with the presented polyrhythm. Furthermore, for these categories of
tapping behaviour there are no frequency responses in the neural resonance
model with which they could be associated. In these behaviours people
choose to tap the first subharmonic of the unit pattern when the stimulus
is repeated once every 800 msec (1 observation), 600 msec (2 observations)
and 400 msec (1 observation). Another novel tapping behaviour is found
where the 4:3 polyrhythm is repeated once per 1000 msec and the participant
groups 6 pulses of the 4-pulse train. The inter-onset interval of those
pulses is 1000 : 4 = 250 msec, so the participant produces a pulse-like
response with a period of 6 × 250 = 1500 msec, or 0.667 Hz in terms of
rhythmic frequency. A second such observation relates to the 4:3
polyrhythmic pattern repeating once every 1200 msec, where the participant
groups three pulses of the 4-pulse train. The inter-onset interval of those
pulses is 1200 : 4 = 300 msec, so the participant produces pulse-like
tapping with a period of 3 × 300 = 900 msec, or 1.11 Hz. The two behaviours
are similar in that both group pulses of the 4-pulse train: 3 pulses in the
latter case and 6 in the former.
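The arithmetic behind these two grouping behaviours can be reproduced with the short sketch below. The function name and signature are our own and are given for illustration only.

```python
def grouped_tapping(repetition_period_ms, pulses_per_repetition, pulses_grouped):
    """Return (IOI in msec, tapping period in msec, tapping frequency in Hz)."""
    ioi_ms = repetition_period_ms / pulses_per_repetition
    period_ms = pulses_grouped * ioi_ms
    frequency_hz = 1000.0 / period_ms
    return ioi_ms, period_ms, frequency_hz

# 4:3 polyrhythm repeating every 1000 msec, grouping 6 pulses of the 4-pulse train.
six_grouped = grouped_tapping(1000, 4, 6)

# 4:3 polyrhythm repeating every 1200 msec, grouping 3 pulses of the 4-pulse train.
three_grouped = grouped_tapping(1200, 4, 3)

print(six_grouped)
print(three_grouped)
```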

In general, on the one hand, the model correctly predicts the extension of

the categories of tapping behaviours to include for example 2nd harmonics

of a pulse train. On the other hand, the experiment has identified new but

relatively rare ways of how people choose to tap periodically along with a

given polyrhythm and for these cases there is no prediction from the model

associated with them.

8.4.2 Evaluating oscillator relaxation time

The second part of the evaluation is concerned with the neural resonance

model’s predictions about the relative time it takes for people to start

tapping in different modes. The rationale of the comparison is a
modification of a hypothesis suggested by Large (2000), according to which
the relaxation time of an oscillator can be used to simulate the time it
takes for people to start tapping, as soon as they feel a beat, along with
a given rhythm. Large (2000) tested his hypothesis by comparing his model
with the time it took humans to start tapping along with ragtime stimuli.
For this purpose, the relaxation time of a network of oscillators was
compared with the beats to start tapping (BST) measure (Snyder and
Krumhansl, 2000), explained in detail below. Based on the outcome of this
comparison

Large (2000, p. 555) argues "the model’s relaxation time squares reasonably

well with human performance". In this section, a similar comparison is

undertaken for the case in which people are tapping to polyrhythms.

A note should be made to consider the difference between the time

needed for people to perceive a beat and the time it takes to translate

the perceived beat into an action. Firstly, we may reasonably assume that

beat perception precedes tapping to the beat. Secondly, since the neural

resonance model simulates the development of neural dynamics that give

rise to the perception of beat, then the rationale suggested by Large (2000)

whereby tap time is directly compared with oscillator dynamics may need

to be adjusted to accommodate the time difference between perception and

action. In the experiments described in the previous section a modest effort

has been made to minimise the time difference between perception and

action by emphasising to the participants to start tapping as soon as they

feel the beat.

Before we continue with the evaluation of the prediction with which this
section is concerned, we set out the similarities and differences between
the set-up of the study adopted in this thesis (e.g., model details and
experimental variables) and the one in the Large (2000) study. Furthermore,
we give a detailed version of the argument that a comparison between the
time it takes for people to start tapping and the time it takes for an
oscillator to relax, as conducted by Large (2000), should preferably focus
on a timescale rather than exact points in time.

8.4.2.1 Differences and similarities between studies

The network of oscillators in the Large (2000) study is similar to the neural

resonance model used in this thesis. It is important to note that the
relaxation time of the oscillators depends on the numerical values given to
the free parameters of the model (Large, 2000, p. 541-545); however, Large
states that they were not explicitly adjusted to maximize goodness of fit
to the empirical data described below. The relaxation times of the simulations

presented in this section are obtained using the nominal values described

in Chapter 7.

There is a slight difference between the two set-ups in the way the time it

takes for people to start tapping is quantified. In Large’s study the source

of empirical data is from the Snyder and Krumhansl (2000) study. There

the time it takes for people to start tapping is expressed in terms of beats to

start tapping. To reflect this, Large expresses the time it takes for an
oscillator to reach relaxation as a number of beats. For example, if one of
the resonating oscillators has a period of 250 msec and needs 1500 msec to
relax, then the latter divided by the former gives the time needed to relax
in beats: 1500 : 250 = 6 beats are needed before this oscillator reaches
relaxation. In this thesis, however, neither the time it takes for people
to start tapping nor the time it takes for oscillators to relax is
expressed in beats; both are expressed in time units (msec).
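The conversion between the two ways of expressing relaxation time can be sketched as follows. The function names are our own, and the values are those of the worked example above.

```python
def relaxation_in_beats(relaxation_time_ms, oscillator_period_ms):
    """Express a relaxation time as a number of the oscillator's own beats."""
    return relaxation_time_ms / oscillator_period_ms

def relaxation_in_msec(relaxation_beats, oscillator_period_ms):
    """Inverse conversion: from beats back to time units (msec)."""
    return relaxation_beats * oscillator_period_ms

beats = relaxation_in_beats(1500, 250)
msec = relaxation_in_msec(beats, 250)
print(beats, msec)
```

Note that a value in beats depends on the period of the particular oscillator, whereas a value in msec allows oscillators with different periods to be compared directly on a common timescale.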


8.4.2.2 Limitations in the method of testing the original hypothesis

In general, there is no reason to doubt that a modelled oscillator reaches

relaxation at some fixed point in time (given a particular set of initial

conditions). Large’s results in testing his hypothesis suggest that this
time squares well with the mean time it takes for people to start tapping.
However, in the following three paragraphs we identify weaknesses and
limits in Large’s arguments and claims.

As previously noted, Large (2000) refers to the Snyder and Krumhansl

(2000) study in order to obtain empirical data regarding the time it takes

for people to start tapping. In that empirical study there are three groups of

participants reported. Two of the groups consist of musically experienced

participants and one group of non-experienced. All three groups are

asked to identify and tap a beat along with a ragtime musical stimulus.

In the first group of 8 musically experienced participants Snyder and
Krumhansl report a mean BST time of 5 and 6 beats for the pitch-varied and
monotonic versions of the stimulus respectively. In the second group, the
12 musically experienced participants were asked to perform the same task
while listening to the same stimulus. The mean BST time was 3 to 4 beats in
this case. Snyder and Krumhansl report a very low standard error for both
mean BST times, meaning that these sample means are quite close to the
population mean. Lastly, the mean BST time for the group of 8 musically
inexperienced participants was 9 and 11 beats for the pitch-varied and
monotonic versions of the ragtime stimulus respectively.

Taken together for both experienced and inexperienced participants the

mean BST ranges from 3 beats to 9 beats for the pitch-varied version of

the ragtime stimulus, and 4 to 11 for the monotonic version. Despite this

variation on the mean BST time, Large focuses on the mean BST of the

particular group of 12 experienced participants to conduct the comparison


with the BST time predicted by the model, and thus, his argument that the

model’s relaxation time squares reasonably well with human performance

relies on choosing a particular group of participants.

Furthermore, Large (2000) focuses exclusively on the mean BST time and

compares it with the exact point in time at which an oscillator relaxes.

However, a mean BST time is a measure of location that summarises the

variation of BST time observed among participants. In fact, Snyder and

Krumhansl report variation with regard to the BST time. For example, for

the specific experiment, which is also the reference point for the Large

(2000) study, Snyder and Krumhansl report that 75% of the musically

experienced subjects (9 out of the 12) have started tapping after 4 beats,

while 10 beats are needed for 95% of the subjects to start tapping. From

that perspective a measure that focuses on the variation of BST time would

be more appropriate than the mean BST time. Also, we have already

mentioned that the empirical data regarding the absolute time to start

tapping do not conform with a normal distribution, thus calculation of

mean may be inappropriate. A quick way to check that our own empirical

data are not distributed normally is by observing that the boxplots are not

symmetrical in general even for those cases where a relatively large number

of observations are associated with a particular boxplot. For example, in

Figure 8.6 (repeated below as Figure 8.18) the boxplot corresponding to

people tapping along with every other element of the 4-pulse train

(1st subharmonic of the 4-pulse train) contains 15 observations and is

not symmetrical. In another example, where the frequency of a particular

tapping behaviour exceeds 25 observations (Figure 8.9, boxplot for the 3-pulse

train, 26 observations), the boxplot for that behaviour implies a

right-skewed distribution, so the mean is not the best

numerical summary for the center of that distribution. In any case,

a larger sample of observations may be necessary to reveal the exact shape

of the distribution of the time it takes for people to start tapping along with a

given rhythm, and thus to inform which type of summary statistics

should be used.
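
The point about summary statistics can be illustrated with a minimal sketch. The Python fragment below is not part of the analysis reported in this thesis: the sample values are made up purely for illustration (they are not data from our experiments or from Snyder and Krumhansl), and the helper name summarise is hypothetical. It contrasts the mean with the median and interquartile range for a right-skewed sample, and uses the quartile (Bowley) skewness coefficient as a simple numerical counterpart of an asymmetrical boxplot.

```python
import statistics


def summarise(times_ms):
    """Return location and spread summaries for a sample of start-tapping times."""
    # Quartiles computed with the "inclusive" method (sample treated as
    # containing its own minimum and maximum).
    q1, q2, q3 = statistics.quantiles(times_ms, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(times_ms),
        "median": q2,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        # Bowley (quartile) skewness: 0 when the box is symmetrical about the
        # median, positive when the upper half of the box is longer (right skew).
        "bowley_skew": (q3 + q1 - 2 * q2) / (q3 - q1),
    }


# Hypothetical, made-up values in msec (NOT data from any study).
sample = [1000, 1200, 1300, 1500, 1600, 1800, 2100, 2600, 3400, 5200, 9000]
summary = summarise(sample)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

For a sample of this shape the mean is pulled towards the long right tail and lies above the median, which is one reason why the median and quartiles may be the safer summary when the distribution is skewed.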

For these reasons, we suggest that trying to compare the mean time

it takes for people to start tapping with the particular time at which

an oscillator reaches relaxation may not be the most appropriate way to

support the hypothesis whereby a model like the neural resonance model

can make predictions about timing aspects of human tapping behaviour.

Instead, we suggest that the focus should be on the temporal development

of the oscillator dynamics, which, as we will see, shares the same timescale

as the period of time within which people start tapping to rhythms.

In this way, the neural resonance model can be used to make plausible

predictions about timing aspects of human tapping behaviour.

8.4.2.3 Evaluation

Figure 8.1, which is repeated below as Figure 8.17 for convenience, shows

the development of the oscillatory dynamics over time in the presence of

a 4:3 polyrhythm with 1 Hz repetition frequency. For some oscillators the

transient stage (the stage until relaxation is reached) is longer than for others.

For example, the oscillators with frequencies 3 Hz and 4 Hz relax after 4

seconds, while the oscillators of 2 Hz and 1 Hz frequency are still in their

transient stage in the 4th sec of the stimulation. Also, the oscillators that

reflect the metrical structure of the 4:3 polyrhythm (e.g., 4 Hz, 3 Hz, 1 Hz,

etc.) start to resonate from the very early stages (after the 1st second) of

the stimulation and they stay in transience for some time before they reach

relaxation.
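
To make the notion of a transient stage concrete, the following Python sketch estimates a relaxation time numerically. It rests on several assumptions that should be stated plainly: it uses a simplified Hopf-type oscillator with linear forcing rather than the full canonical model used in this thesis, the stimulus is a single complex sinusoid rather than a polyrhythmic pulse train, and the parameter values (alpha, beta, force) and function names are hypothetical choices made only for illustration. Relaxation is taken, as an assumed criterion, to be the earliest time after which the amplitude stays within a small relative tolerance of its final value.

```python
import cmath
import math


def simulate_amplitude(f_osc, f_stim, alpha=-1.0, beta=-1.0, force=0.5,
                       duration=10.0, dt=0.001):
    """Integrate dz/dt = z*(alpha + 2*pi*i*f_osc + beta*|z|^2)
                         + force*exp(2*pi*i*f_stim*t)
    with the classical 4th-order Runge-Kutta scheme.
    Returns (times, amplitudes), starting from z = 0."""
    def dz(t, z):
        intrinsic = z * (alpha + 2j * math.pi * f_osc + beta * abs(z) ** 2)
        stimulus = force * cmath.exp(2j * math.pi * f_stim * t)
        return intrinsic + stimulus

    z, t = 0j, 0.0
    times, amps = [0.0], [0.0]
    for step in range(int(round(duration / dt))):
        k1 = dz(t, z)
        k2 = dz(t + dt / 2, z + dt / 2 * k1)
        k3 = dz(t + dt / 2, z + dt / 2 * k2)
        k4 = dz(t + dt, z + dt * k3)
        z += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t = (step + 1) * dt
        times.append(t)
        amps.append(abs(z))
    return times, amps


def relaxation_time(times, amps, tol=0.02):
    """Earliest time after which the amplitude stays within a relative
    tolerance `tol` of its final value (an assumed, illustrative criterion)."""
    final = amps[-1]
    last_outside = -1
    for i, a in enumerate(amps):
        if abs(a - final) > tol * final:
            last_outside = i
    return times[min(last_outside + 1, len(times) - 1)]


# A 4 Hz oscillator driven at its own frequency (hypothetical parameters).
times, amps = simulate_amplitude(f_osc=4.0, f_stim=4.0)
t_relax = relaxation_time(times, amps)
print(f"final amplitude {amps[-1]:.3f}, relaxation after about {t_relax:.2f} s")
```

With these illustrative parameters the amplitude starts to grow immediately and settles over a period of a few seconds, so the quantity of interest is a timescale rather than a single instant. The numerical value obtained depends directly on the chosen tolerance and parameters.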

Figure 8.17: Temporal development of resonances among the bank of oscillators

in the presence of a 4:3 polyrhythm repeating once per second. The period of time at which oscillators start to resonate begins almost immediately after stimulation. The transient stage (the time until relaxation is reached) is different for each oscillator, but nonetheless it spans a period of several seconds. For example, oscillators with frequencies around 3 Hz and 4 Hz relax within the first 4 seconds of the stimulation.

Figure 8.6 is repeated below to illustrate the periods of time within which

tapping modes along with a 4:3 polyrhythm repeating once per second

take place. There is no distinction between experienced and inexperienced

participants. We can see that the periods of time to start tapping share

the same timescale as the periods of transient dynamics shown above.

For example, for all 5 categories of tapping associated with this particular

combination, people start tapping after 1000 msec. This is similar to

the time it takes for the resonances to start developing as shown in the

oscillator dynamics of Figure 8.1. Also, the temporal period within

which each empirical tapping mode is observed is similar to the transient

period in the oscillatory dynamics, i.e., first seconds after the stimulus is

heard/presented.

Figure 8.18: The period of time it takes for people to start tapping in a given mode

along with a 4:3 polyrhythm repeating once every 1000 msec.

In general, the timescale of the period within which people start tapping

along with a polyrhythm and the period of transient dynamics is of the

same magnitude for all types of polyrhythms and presentation rates. We

further present empirical data related to two more types of polyrhythm, i.e.,

3:2 and 2:5 repeating once every second, and the corresponding predictions

by the model. An exhaustive presentation of graphs covering all 50

combinations is possible; however, it is omitted here for the sake of

brevity.

Figure 8.19: The modes of tapping and their corresponding periods of time

(boxplots) within which people start tapping along with a 3:2 polyrhythm repeating once every 1000 msec.

Figure 8.20: Temporal development of resonances in the network in the presence

of a 3:2 polyrhythm repeating once every 1000 msec.

Figure 8.21: The modes of tapping and their corresponding periods of time

(boxplots) within which people start tapping along with a 2:5 polyrhythm repeating once every 1000 msec.

Figure 8.22: Temporal development of resonances in the network in the presence

of a 2:5 polyrhythm repeating once every 1000 msec.

8.5 summary

In this chapter we used our previous analysis of the response of the neural

resonance model to polyrhythms to create two new classes of prediction

about human behavior in tapping to polyrhythms. In order to test these

new predictions, we carried out a new empirical human study on tapping

to polyrhythms. This new study collected substantially more data and

analysed it in more depth than the earlier study by Handel

and Oshinsky (1981).

The first of the predictions posited that the general categories of

tapping behaviours along with a polyrhythm reported by the study of

Handel and Oshinsky were incomplete, and that specific new kinds

of human tapping behavior should be observable. These predicted

and previously undocumented human tapping behaviours were

subsequently observed. In the second prediction we referred to a

hypothesis due to Large (2000) whereby a connection can be established

between the evolution of oscillator dynamics and time taken for people to

start tapping along with a rhythm as soon as they feel a beat. The need

for a modified version of Large’s hypothesis was argued and a modified,

more defensible version was framed accordingly. In particular, this more

plausible version of the original hypothesis suggested that the timescale

of a relaxation, as opposed to an absolute point in time where relaxation

occurs, is more appropriate for measuring the phenomenon. The modified

version of the hypothesis suggested a novel set of summary statistics for

evaluating the revised hypothesis.

To reflect on this summary in a little more detail, the nature of the predictions

of the model led to conducting a series of novel empirical experiments in

polyrhythm perception in order to acquire the necessary data. In particular,

we have conducted an extended version of the study reported by Handel

and Oshinsky (1981) with many more data points, supporting a deeper

level of analysis. A range of tapping modes were identified and the time it

took for people to start tapping along with the polyrhythm was analysed.

Subsequent comparisons between empirical data and predictions showed

that the model correctly predicted the extension of the categories of tapping

to include 2nd harmonics of the input frequencies and subharmonics of the

repetition frequency of the pattern, and also that the period of the temporal

development of the dynamics in the model shares the same timescale as the period of time

within which people start tapping along with a polyrhythm.

Lastly, the empirical data have shown tapping modes that are neither

predicted by the neural resonance model nor found

in previous empirical studies, a finding that further complements our

understanding of the way people perceive polyrhythms. A series of

limitations have been exposed and interpreted as opportunities for future

work.

9 CONCLUSIONS, LIMITATIONS AND FURTHER WORK

This thesis has tested and analysed the theory of neural resonance, a

physiologically plausible theory of human rhythm perception, in the

context of modeling certain aspects of human polyrhythm perception.

According to the theory, human rhythm perception is directly related to

neurodynamics (Large, 2008). We have investigated the extent to which the

theory accounts for certain aspects of human polyrhythm perception, in

particular the way people tap along with polyrhythmic stimuli in a periodic

way and the time it takes for them to start tapping. The former human

behaviour is associated with the identification of a beat in a polyrhythmic

stimulus and the latter is associated with the time it takes to perceive a beat.

We have used the theory to make new predictions about human behaviour

and we have carried out new empirical studies to test those predictions. In

the following three sections we review the results, discuss the limitations

and identify directions for future work.

9.1 conclusions

In general, we have focused on two behavioural aspects of human

polyrhythm perception which the neural resonance model addresses. In

particular, the neural resonance model responds to appropriate stimuli with

responses that reflect both the different ways in which people tap regularly

along with polyrhythmic stimuli, and the timescale according to which they

begin to tap. The methodology of this work involves stimulation of the

model with different polyrhythmic stimuli at different tempi, and analysis

both of the frequency response pattern and of the development of the

resonant dynamics. In particular, the proposed methodology for studying

the second aspect of the human behaviour described above, constitutes

an original alteration to an existing methodology for concrete reasons

that have been described in Chapter 8. Briefly, we proposed that the

timescale of relaxation should be considered rather than a specific point

in time where relaxation occurs, and secondly, we proposed a different set

of summary statistics for describing the empirical data, as more appropriate

than those used by the existing methodology. Additionally, we checked

for consistency in the model’s behaviour by systematically manipulating

the numerical values of the parameters of the state variable, the initial

conditions of the oscillators, and the amplitude of the polyrhythmic

stimulus, and found consistency of results for a range of numerical

values (within an appropriately defined range). This methodology was

developed for the purposes of this thesis and, to the best of our

knowledge, this is the first time that such an analysis has been published.

Furthermore, we examined the degree to which the neural resonance model

behaves similarly to the less abstracted and more directly biologically

rooted model of Wilson and Cowan (1972), in order to limit the degree

of uncertainty of the former regarding its biological assumptions. We

have found that the two models behave similarly and thus, concluded

that we can place a reasonable limit on uncertainty with regard to the

biological assumptions made by the neural resonance model. The theory

of neural resonance, unlike many theories, relates high-level perception, such as

rhythm perception, to the behaviour of neurons, so it is important to test it

in order to explore its limits and strengths in our effort to understand the

human brain. As imaging techniques become more widespread in the future,

new techniques will become available to test the model. Nonetheless, the

existing research has unearthed previously unseen human behaviour, thus

making the theory of neural resonance a good candidate for examining

physiologically plausible ways in which high-level behaviour can be

explained in terms of brain dynamics. In the introduction of this thesis

we have underlined the importance of rhythmic stimuli in regulating brain

activity and briefly discussed the therapeutic applications of interventions

that utilise rhythmical stimuli to treat various psychological and biological

diseases associated with brain malfunction. Theories of rhythm such as

the neural resonance theory may lead to a better understanding of the

physiological underpinnings of rhythm perception which could further

lead to the development of carefully tuned and customised interventions

to promote human well-being in a non-invasive way.

9.1.1 Validation against existing empirical data

The first research question seeks to examine whether the neural resonance

model can account for existing empirical data in polyrhythm perception.

The question as stated in Chapter 1 is repeated below for convenience.

Question 1: To what extent can the theory of neural resonance provide an

account for existing empirical data on how humans perceive polyrhythms?

In order to address the above question we conducted a series of tests

with the model. The design of the tests mirrored existing experimental

studies in human polyrhythm perception, and took advantage of existing

empirical data about how people perceive the beat in polyrhythmic stimuli

under different circumstances. This allowed the behaviour of the model to

be compared with human behaviour. In particular, our tests mirrored the

experimental study of Handel and Oshinsky (1981) in which people were

asked to tap along with a polyrhythm in a regular pulse-like way. Handel

and Oshinsky presented five different types of polyrhythms to people at ten

different tempi. We used the same polyrhythmic stimuli to stimulate the

neural resonance model. Our tests showed that the neural resonance model

produced frequency response patterns whose categories matched the full

range of human tapping behaviour reported by Handel and Oshinsky. This

was the case for diverse polyrhythms and diverse tempi as used by Handel

and Oshinsky.

9.1.2 Evaluation of model’s predictions

The second research question seeks to examine the model’s ability in

predicting aspects of human behaviour in polyrhythm perception. The

question as originally posed in Chapter 1 is repeated below.

Question 2: Can we make novel predictions about human behaviour

in polyrhythm perception based on the theory of neural resonance and

subsequently test these predictions empirically?

We saw in Chapter 6 that when the model is stimulated with a polyrhythm,

parts of the generalised frequency response pattern include resonances

not identified in Handel and Oshinsky’s studies of human behaviour.

Some of these resonances correspond to tempi faster than humans can tap,

according to limits discovered in sensorimotor synchronization studies (see

Chapter 5, Subsection 5.2.2). However, other resonances fall in the humanly

tappable range. These resonances can be thought of as predictions made

by the model about human behaviour. More specifically, the predictions

suggest that (subject to tempo limits) human tapping behaviour should

include tapping responses that reflect 2nd harmonics of the fundamental

frequencies of the polyrhythm. We carried out a larger scale, more detailed

version of the experiments conducted by Handel and Oshinsky, and found

humans behaving in the way predicted by the model and overlooked by the

earlier experimenters. More specifically, we identified tapping frequencies

that correspond to 2nd harmonics of a 4-pulse train of a 4:3 polyrhythm

repeating once every 2500 msec and 2000 msec. We also framed and tested

a second prediction about human tapping behaviour concerning the relative

time it takes for people to start tapping along with different polyrhythms

(Chapter 8). This prediction was derived from the relative time it takes

for an oscillator to reach relaxation in the presence of different rhythmic

stimuli. This prediction was also tested by conducting novel experiments

with people tapping to polyrhythms. More specifically, the relative time it

took for people to start tapping along with a polyrhythmic stimulus was

monitored for all five types of polyrhythms and for all repetition rates

reported by Handel and Oshinsky. To the best of our knowledge, our

experiment appears to be the first occasion in any literature that the relative

time to start tapping along with a polyrhythm has been empirically tested.

Our analysis showed that the relative timescale within which people start to

tap along with different polyrhythms matches the relative timescale within

which the resonances of the oscillator develop until they reach relaxation.
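
The tapping frequencies referred to above follow from simple arithmetic, which the sketch below spells out. It is an illustration only: the function name polyrhythm_frequencies is hypothetical, and it assumes the usual convention that the 2nd harmonic of a pulse train is twice its frequency, while the 1st subharmonic is half the frequency (tapping along with every other element). Exact fractions are used so that frequencies in Hz can be compared without rounding error.

```python
from fractions import Fraction


def polyrhythm_frequencies(n, m, period_ms):
    """Frequencies (in Hz) associated with an n:m polyrhythm whose pattern
    repeats once every `period_ms` milliseconds."""
    rep = Fraction(1000, period_ms)  # repetition frequency of the pattern
    return {
        "repetition": rep,
        "1st subharmonic of repetition": rep / 2,
        f"{n}-pulse train": n * rep,
        f"{m}-pulse train": m * rep,
        # Assumed convention: 2nd harmonic = 2 x fundamental frequency.
        f"2nd harmonic of {n}-pulse train": 2 * n * rep,
        f"2nd harmonic of {m}-pulse train": 2 * m * rep,
    }


# The two 4:3 conditions mentioned in the text.
slow = polyrhythm_frequencies(4, 3, 2500)
fast = polyrhythm_frequencies(4, 3, 2000)

for label, freqs in (("2500 msec", slow), ("2000 msec", fast)):
    print(label)
    for name, hz in freqs.items():
        print(f"  {name}: {float(hz):.2f} Hz")
```

Under these assumptions, the 2nd harmonic of the 4-pulse train corresponds to 3.2 Hz for the 2500 msec repetition and 4 Hz for the 2000 msec repetition. Changing the repetition period rescales every frequency by the same factor, so the ratios between the components stay fixed.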

9.2 limitations

In this section we discuss the limitations of the research.

1. The model predicts categories of tapping but fails to predict the

relative popularity of those categories at different tempi (Chapter 6).

A reasonable candidate hypothesis (based on some neurophysiological

evidence - subsection 6.2.3) would be to use the magnitude of the amplitude

response of the different oscillators in the model to predict the relative

preferences of the listeners for different beats in a polyrhythm. In other

words, the strongest amplitude in the frequency response pattern of the

model might be expected to predict the most popular human preference

for a particular way of tapping, and so on for the next strongest amplitude,

etc. In fact, such a prediction fails to match the human relative popularity of

tapping categories.

2. The model not only fails to track the relative popularity of tapping

categories at different tempi - it features a limitation that appears to

virtually guarantee this failure. The problem is that the relative popularity

of human tapping categories changes for different tempi. By contrast, in

the case of the model, the effect of different tempi is simply to shift the

amplitude pattern of the frequency response along the x-axis (frequency

axis). In other words, when humans tap to a polyrhythm in a regular way,

the preferences of tapping observed are a function of absolute tempo. For

example, for fast presentation rates of n:m polyrhythms there is a higher

preference in tapping along with the co-occurrence of the two constituent

pulse trains than for slow presentation rates (Handel and Oshinsky, 1981).

However, when stimulating the model with an n:m polyrhythm (Chapter

6) at various tempi the frequency response pattern of the model remains

invariant in terms of the relative amplitude levels.

3. The model does not predict all human categories of tapping, although it

represents all common ones. More specifically, in our own experimental

studies reported in Chapter 8 we identified some very rare tapping

behaviours in which participants grouped six pulses of the 4-pulse train of a

4:3 polyrhythm repeating once per 1000 msec, and grouped three pulses of

the 4-pulse train of a 4:3 polyrhythm repeating once per 1200 msec. These

behaviours, although rare, are not predicted by the model.

4. In this research, we chose to configure the neural resonance model using

a single bank of oscillators and without interconnectivity among oscillators.

There is some evidence that the perception of rhythm takes place in

multiple, spatially distinct brain regions (Chen et al., 2008; Karabanov et al.,

2009). Thus there can be arguments for more complex models, using

interconnected multiple banks of oscillators. However, the model we have

chosen has the advantage of simplicity and parsimony. Also, it is unclear

how more complex models should be parameterized within a very large

configuration space while retaining physiological plausibility.

5. Although the model accepts input in a variety of forms (Chapters 5

and 7), in each case, physical characteristics of the stimulus such as pitch

and timbre have been abstracted from the stimulus, leaving bare rhythmic

information. Consequently, the model gives little indication how these

musical dimensions might affect rhythmic processing (Jones, 1987).

9.3 further work

1. The existing model predicts categories of tapping but the amplitudes

of resonances do not predict the relative popularity of those categories at

different tempi. It may be unreasonable to expect that they should, but

if we make the assumption (based on some evidence provided by brain

imaging studies) that some hidden connection exists, then various issues

need resolving. For example, the effect of tempo changes on the model

is unlike the effect on human tapping behaviour, as the amplitude pattern

of the frequency response is simply shifted along the x-axis. It could be

that the model correctly models the state of oscillators in the brain, but

that other factors moderate tapping behaviour such as musculo-skeletal

capabilities of how fast one can tap, or it could be that our model of

oscillators in the brain is wrong. One useful project would be to see if

either of these hypotheses can be supported or ruled out by the application

of brain imaging techniques.

2. One family of useful future research projects would be to explore the

extent to which extended models using interconnected banks of oscillators

might be able to model certain aspects of human rhythm perception more

faithfully than the single bank model of oscillators used in the present

work. For example, non-connectivity focuses the response of the network

exclusively on external rhythmic stimulus (Large et al., 2010), whereas in

some cases a better model might include influences from other processes

(e.g. motor activity, or remembered rhythmic patterns). Such processes

might be modelled by multiple connected banks of oscillators. There is

some physiological justification for such variants of the model. Recent

brain-imaging studies (Chen et al., 2008; Karabanov et al., 2009) suggest that

rhythmic information is represented across broad cortical and subcortical

networks and that both auditory and motor areas play key roles in both

rhythm perception and production. Velasco (personal communication)

suggests that more than one bank of oscillators might be used to

model two communicating distant areas of the brain, such as the primary

auditory cortex and supplementary motor area, with the output of one

bank being employed to stimulate another. Clearly, such directed stimulation

could be used in principle to model afferent connections where peripheral

information is carried inwards towards parts of the central nervous system,

e.g., parts along the auditory pathway and the auditory cortex. Brain

imaging techniques could be used to further constrain exploration of such

models in physiologically plausible ways.

3. A related family of useful future research projects could explore the

extent to which extended models using interconnectivity among oscillators

in a single bank (as opposed to connections between banks of oscillators)

might be able to model certain aspects of human rhythm perception more

faithfully than the unconnected oscillators used in the present work. There

is at least general physiological evidence for considering more complicated

models. For example, interconnected oscillators within one network

are a more accurate representation of the underlying connectedness of

neural populations in the auditory nervous system than the non-connected

oscillators are (Hoppensteadt and Izhikevich, 1996a). Tuning the model

used in this thesis and comparing with human behavior might also help

explore these issues. For example, as a preliminary step towards such a

pilot, we briefly explored the frequency response pattern of the network

in the presence of a 4:3 polyrhythm (Figure 9.1) with connectivity among

oscillators in one network switched on, i.e., giving parameter (c) (Equation

5 - Chapter 4) an arbitrarily chosen small numerical value (0.1).

Figure 9.1: Frequency response of a network of interconnected oscillators in the

presence of a 4:3 polyrhythmic stimulus of 1 Hz repetition frequency.

4. One way to model learning processes in rhythm cognition involves the

use of connections between individual oscillators and between multiple

oscillator banks. Loosely speaking, one could start with all oscillators

weakly connected to each other, strengthen connections between

oscillators whenever they oscillate at the same time, and allow all other

connections to atrophy. This version of Hebbian learning (Hoppensteadt

and Izhikevich, 1996b) could model the way that learning appears to bias

perception and action towards the familiar and away from the unfamiliar.

Large (2011) has suggested that the neural resonance model can be adapted

to demonstrate such processes, for example the creation of tonal fields

in the central nervous system that reflect different musical cultures. The

design, implementation, testing and clear dissemination of new versions

of the model for exploring these issues would constitute a valuable future

contribution to knowledge.

5. A further possibility for future work lies in improving the way the

oscillators’ relaxation time is estimated for the purposes of comparison with

the time it takes for people to start tapping along with the polyrhythm.

In the thesis, for pragmatic reasons, relaxation time is calculated by

producing amplitude response graphs of the network in the presence of

a polyrhythmic stimulus, and subsequently, by qualitatively observing the

approximate point in time where the amplitude reaches its peak response

and becomes stabilized. However, a more rigorous way to calculate the time

to relaxation would be by expressing relaxation time as a function of time

(Section 8.2). A small project taking advantage of this method would allow

the identification of the exact point in time at which an oscillator relaxes,

allowing more precise tests of the model.

6. The present thesis motivates various further empirical tests of human

behavior. For example, we previously conducted empirical work (Chapter

8) to measure the time it takes to perceive a beat and tap along with

polyrhythms. In particular we measured the absolute time it takes for

people to start tapping along with polyrhythms. Snyder and Krumhansl

(2000) made similar measurements using ragtime, but they focused on the

number of beats passed before people started to tap. In specific cases, either

measurement can trivially be transformed and expressed in terms of the

other, but the general question of principle remains whether the time it

takes for people to perceive beat is a function of absolute time or depends

on a number of beats.

7. Empirical work shows that a number of physical characteristics of a

stimulus are sources of interacting information from which people extract

rhythmicity. For example, the preference of attending to a low-pitch pulse

train in comparison to a high-pitch pulse train (of approximately similar

tempi) when tapping indicates that tonality information influences rhythm

perception. Empirical studies using brain-imaging techniques can help us

understand how these empirically observed instances of rhythm perception

translate into neural correlates. Results of such studies can help us refine

our modeling aims to reflect an increased understanding of the exact

neurophysiological process in a perceptual phenomenon like that of rhythm

perception.

9.4 final reflection

This thesis has exposed the theory of neural resonance to empirical and

theoretical scrutiny by examining the extent to which it can be used to

model aspects of human polyrhythm perception. We have shown that

the neural resonance model can account for existing empirical data about

how people perceive beat and meter in polyrhythms. We have used

the model to make new predictions about human behavior. We have

demonstrated empirically that humans behave in the predicted manner. We

have analysed limitations of the model and theory, made comparison with

competing models and theories, and identified physiologically motivated

ways to refine the model. As a result, this thesis has cast new light on

explanations of the phenomenon of rhythm perception directly in terms of

brain dynamics, and identified likely paths to further illumination.

BIBLIOGRAPHY

Victor Kofi Agawu. African rhythm: a Northern Ewe perspective. CUP Archive,

1995. (Cited on page 117.)

AA Andronov, EA Leontovich, II Gordon, and AG Maier. Theory of

dynamical systems on a plane. Israel Program of Scientific Translations,

Jerusalem, 1971. (Cited on page 50.)

D.G. Aronson, G.B. Ermentrout, and N. Kopell. Amplitude response of

coupled oscillators. Physica D: Nonlinear Phenomena, 41(3):403–449, 1990.

(Cited on page 48.)

Thaddeus L Bolton. Rhythm. The American Journal of Psychology, 6(2):

145–238, 1894. (Cited on page 20.)

Shannon Campbell and DeLiang Wang. Synchronization and

desynchronization in a network of locally coupled Wilson-Cowan

oscillators. Neural Networks, IEEE Transactions on, 7(3):541–554, 1996.

(Cited on page 149.)

Ali Taylan Cemgil, Bert Kappen, Peter Desain, and Henkjan Honing. On

tempo tracking: Tempogram representation and Kalman filtering. Journal

of New Music Research, 29(4):259–273, 2000. (Cited on page 39.)

Joyce L Chen, Virginia B Penhune, and Robert J Zatorre. Listening to

musical rhythms recruits motor regions of the brain. Cerebral Cortex, 18

(12):2844–2854, 2008. (Cited on pages 199 and 200.)

Russell M Church and Hilary A Broadbent. Alternative representations of

time, number, and rate. Cognition, 37(1):55–81, 1990. (Cited on pages 28

and 34.)

Eric F Clarke. Rhythm and timing in music. The psychology of music, 2:

473–500, 1999. (Cited on page 36.)

Martin Clayton, Rebecca Sager, and Udo Will. In time with the music:

the concept of entrainment and its significance for ethnomusicology.

European Meetings in Ethnomusicology, 11:3–142, 2005. (Cited on page 20.)

Charles E. Collyer, Hilary A. Broadbent, and Russell M. Church. Preferred

rates of repetitive tapping and categorical time production. Perception

& Psychophysics, 55:443–453, 1994. ISSN 0031-5117. doi: 10.3758/

BF03205301. URL http://dx.doi.org/10.3758/BF03205301. (Cited on

page 34.)

C Douglas Creelman. Human discrimination of auditory duration. The

Journal of the Acoustical Society of America, 34:582, 1962. (Cited on page 37.)

Roger B Dannenberg. An on-line algorithm for real-time accompaniment.

In Proceedings of the 1984 International Computer Music Conference, pages

193–198. Computer Music Association, 1984. (Cited on page 33.)

Simon Dixon. An interactive beat tracking and visualisation system. In

Proceedings of the international computer music conference, pages 215–218,

2001. (Cited on page 38.)

Carolyn Drake, Mari Riess Jones, and Clarisse Baruch. The development

of rhythmic attending in auditory sequences: attunement, referent

period, focal attending. Cognition, 77(3):251–288, 2000. doi:

10.1016/S0010-0277(00)00106-2. (Cited on page 16.)

Douglas Eck. Finding downbeats with a relaxation oscillator. Psychological

Research, 66(1):18–25, 2002. (Cited on page 37.)

Douglas Eck. Beat tracking using an autocorrelation phase matrix.

In Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE

International Conference on, volume 4, pages IV–1313. IEEE, 2007. (Cited

on pages 28 and 36.)

W Tecumseh Fitch and Andrew J Rosenfeld. Perception and production

of syncopated rhythms. Music Perception, 25(1):43–58, 2007. (Cited on

page 14.)

R. Fitzhugh. Impulses and physiological states in theoretical models of

nerve membrane. Biophysical journal, 1(6):445–466, 1961. (Cited on

pages 38 and 47.)

James W Haefner. Modeling biological systems: principles and applications.

Springer, 2005. (Cited on pages 134, 139, and 148.)

Stephen Handel. Using polyrhythms to study rhythm. Music Perception,

pages 465–484, 1984. (Cited on page 3.)

Stephen Handel and James Oshinsky. The meter of syncopated auditory

polyrhythms. Attention, Perception, & Psychophysics, 30(1):1–9, 1981. (Cited

on pages 6, 22, 69, 73, 74, 76, 80, 82, 83, 87, 89, 90, 105, 113, 123, 155, 156,

190, 191, and 198.)

Alan L. Hodgkin and Andrew F. Huxley. A quantitative description of

membrane current and its application to conduction and excitation in

nerve. The Journal of physiology, 117(4):500, 1952. (Cited on pages 37, 38,

46, and 47.)

Frank C. Hoppensteadt and Eugene M. Izhikevich. Canonical models

for bifurcations from equilibrium in weakly connected neural networks.

World Congress on Neural Networks, 1(2):80–83, 1995. (Cited on page 53.)

Frank C. Hoppensteadt and Eugene M. Izhikevich. Synaptic organizations

and dynamical properties of weakly connected neural oscillators.

Biological Cybernetics, 75(2):117–127, 1996a. (Cited on pages 44, 48, 50,

51, 54, 55, 56, 80, and 201.)

Frank C Hoppensteadt and Eugene M Izhikevich. Synaptic organizations

and dynamical properties of weakly connected neural oscillators II.

Learning phase information. Biological Cybernetics, 75(2):129–135, 1996b.

(Cited on page 202.)

Dan H. Hubel, Thomas N. Wiesel, et al. Receptive fields of cells in

striate cortex of very young, visually inexperienced kittens. Journal of

neurophysiology, 26(6):994–1002, 1963. (Cited on page 44.)

John R. Iversen, Bruno H. Repp, and Aniruddh D. Patel. Top-down control

of rhythm perception modulates early auditory responses. Annals of the

New York Academy of Sciences, 1169(1):58–73, 2009. (Cited on page 106.)

Ray Jackendoff. Précis of foundations of language: brain, meaning,

grammar, evolution. Behavioral and Brain Sciences, 26(6):651–665, 2003.

(Cited on page 88.)

Mari Riess Jones. Dynamic pattern structure in music: Recent theory

and research. Perception & Psychophysics, 41(6):621–634, 1987. (Cited on

page 199.)

Peter B Kahn and Yair Zarmi. Nonlinear dynamics. Exploration through normal

forms, volume 1. 1998. (Cited on page 54.)

James W. Kalat. Biological Psychology. Thomson Wadsworth, 2012. (Cited

on page 44.)

Anke Karabanov, Örjan Blom, Lea Forsman, and Fredrik Ullén. The dorsal

auditory pathway is involved in performance of both visual and auditory

rhythms. Neuroimage, 44(2):480–488, 2009. (Cited on pages 199 and 200.)

A.P. Klapuri, A.J. Eronen, and J.T. Astola. Analysis of the meter of acoustic

musical signals. Audio, Speech, and Language Processing, IEEE Transactions

on, 14(1):342–355, 2006. (Cited on pages 28 and 36.)

Ursula Koch and Benedikt Grothe. Hyperpolarization-activated current

in the inferior colliculus: distribution and contribution to temporal

processing. Journal of neurophysiology, 90(6):3679–3687, 2003. (Cited on

page 46.)

Gerald Langner. Temporal processing of periodic signals in the auditory

system: Neuronal representation of pitch, timbre, and harmonicity. Z.

Audiol, 46(1):80–21, 2007. (Cited on page 109.)

Edward Large. Musical tonality, neural resonance and hebbian learning.

Mathematics and Computation in Music, pages 115–125, 2011. (Cited on

page 202.)

Edward Large, Philip Fink, and Scott Kelso. Tracking simple and complex

sequences. Psychological Research, 66(1):3–17, 2002. (Cited on pages 15, 16,

and 18.)

Edward W. Large. On synchronizing movements to music. Human

Movement Science, 19(4):527–566, 2000. (Cited on pages 20, 35, 36, 62, 65,

66, 67, 70, 79, 80, 87, 155, 157, 158, 160, 161, 180, 181, 182, 183, and 190.)

Edward W. Large. Resonating to musical rhythm: theory and experiment.

Psychology of time, pages 189–232, 2008. (Cited on pages 2, 14, 58, 61, 62,

116, 136, and 193.)

Edward W Large. Neurodynamics of music. In Music perception, pages

201–231. Springer, 2010. (Cited on pages 3, 35, 67, and 140.)

Edward W. Large and Mari Riess Jones. The dynamics of attending: How

people track time-varying events. Psychological Review, 106(1):119–159,

1999. (Cited on pages 35, 39, and 62.)

Edward W. Large and John F. Kolen. Resonance and the perception

of musical meter. Connection Science, 6(2-3):177–208, 1994. (Cited on

pages 28, 33, 34, 35, 62, and 65.)

Edward W Large and John F Kolen. Method and apparatus of analysis

of signals from non-stationary processes possessing temporal structure

such as music, speech, and other event sequences, May 12 1998. US

Patent 5,751,899. (Cited on page 62.)

Edward W. Large and Caroline Palmer. Perceiving temporal regularity in

music. Cognitive Science, 26(1):1–37, 2002. (Cited on pages 35 and 62.)

Edward W. Large and John S. Snyder. Pulse and meter as neural resonance.

Annals of the New York Academy of Sciences, 1169(1):46–57, 2009. (Cited on

pages v, 2, 35, and 62.)

Edward W. Large, Caroline Palmer, and James B. Pollack. Reduced memory

representations for music. Cognitive Science, 19(1):53–93, 1995. (Cited on

page 65.)

Edward W. Large, Felix V. Almonte, and Marc J. Velasco. A canonical model

for gradient frequency neural networks. Physica D: Nonlinear Phenomena,

239(12):905–911, 2010. (Cited on pages 3, 35, 38, 41, 47, 53, 55, 56, 63, 80,

88, 97, 133, 139, 150, 151, and 200.)

Jean Laroche. Estimating tempo, swing and beat locations in audio

recordings. In Applications of Signal Processing to Audio and Acoustics, 2001

IEEE Workshop on the, pages 135–138. IEEE, 2001. (Cited on pages 28

and 31.)

Fred Lerdahl and Ray S Jackendoff. A generative theory of tonal music. The

MIT Press, 1983. (Cited on pages 15, 16, and 19.)

Justin London. Hearing in time. OUP USA, 2004. (Cited on pages 11, 14, 15,

22, 74, and 75.)

Neil P McAngus Todd, Donald J O’Boyle, and Christopher S Lee. A

sensory-motor theory of rhythm, time perception and beat induction.

Journal of New Music Research, 28(1):5–28, 1999. (Cited on pages 28 and 36.)

NP McAngus Todd and Chris Lee. An auditory-motor model of beat

induction. In Proceedings of the International Computer Music Conference,

pages 88–88. International Computer Music Association, 1994. (Cited on

page 36.)

J.D. McAuley. Perception of time as phase: Toward an adaptive-oscillator model of

rhythmic pattern processing. PhD thesis, 1995. (Cited on pages 28 and 33.)

Leonard Meyer and Grosvenor Cooper. The rhythmic structure of music,

1960. (Cited on pages 15 and 19.)

John Albertus Michon. Timing in temporal tracking. Institute for Perception

RVO-TNO Soesterberg, The Netherlands, 1967. (Cited on page 37.)

Benjamin O Miller, Don L Scarborough, and Jacqueline A Jones. On the

perception of meter. In Understanding music with AI, pages 428–447. MIT

Press, 1992. (Cited on page 34.)

Vernon B. Mountcastle. Modality and topographic properties of single

neurons of cat's somatic sensory cortex. 1957. (Cited on page 44.)

J. Nagumo, S. Arimoto, and S. Yoshizawa. An active pulse transmission

line simulating nerve axon. Proceedings of the IRE, 50(10):2061–2070, 1962.

(Cited on pages 38 and 47.)

RV O’Neill and RH Gardner. Sources of uncertainty in ecological

models. Methodology in Systems Modelling and Simulation, North-Holland,

Amsterdam, pages 447–463, 1979. (Cited on page 125.)

Alan V Oppenheim and Ronald W Schafer. Digital signal processing. 1975.

(Cited on pages 28 and 35.)

Richard Parncutt. A perceptual model of pulse salience and metrical accent

in musical rhythms. Music Perception, pages 409–464, 1994. (Cited on

page 39.)

Aniruddh D. Patel, John R. Iversen, Yanqing Chen, and Bruno H. Repp.

The influence of metricality and modality on synchronization with a beat.

Experimental Brain Research, 163(2):226–238, 2005. (Cited on page 11.)

Bettina Pollok, Joachim Gross, Katharina Müller, Gisa Aschersleben, and

Alfons Schnitzler. The cerebral oscillatory network associated with

auditorily paced finger movements. Neuroimage, 24(3):646–655, 2005.

(Cited on page 88.)

Karl R Popper. The logic of scientific discovery. London: Hutchinson, 1, 1959.

(Cited on page 71.)

Dirk J. Povel and Peter Essens. Perception of temporal patterns. Music

Perception, pages 411–440, 1985. (Cited on pages 28, 29, 30, and 34.)

Jeff Pressing, Jeff Summers, and Jon Magill. Cognitive multiplicity in

polyrhythmic pattern performance. Journal of Experimental Psychology:

Human Perception and Performance, 22(5):1127, 1996. (Cited on pages 22,

74, and 75.)

Pasko Rakic. Local circuit neurons. Neurosciences Research Program Bulletin,

13(3):295, 1975. (Cited on page 44.)

Summer K Rankin, Edward W Large, and Philip W Fink. Fractal tempo

fluctuation and pulse prediction. Music Perception, 26(5):401–413, 2009.

(Cited on page 17.)

Bruno Repp. Sensorimotor synchronization: A review of the tapping

literature. Psychonomic Bulletin & Review, 12(6):969–992, 2005. (Cited on

pages 74, 75, 108, and 157.)

Bruno H Repp. Subliminal temporal discrimination revealed in

sensorimotor coordination. Rhythm perception and production, pages

129–142, 2000. (Cited on page 166.)

Bruno H Repp. Phase correction in sensorimotor synchronization:

Nonlinearities in voluntary and involuntary responses to perturbations.

Human Movement Science, 21(1):1–37, 2002. (Cited on page 18.)

Bruno H Repp. Multiple temporal references in sensorimotor

synchronization with metrical auditory sequences. Psychological Research,

72(1):79–98, 2008. (Cited on page 75.)

Eric D Scheirer. Tempo and beat analysis of acoustic musical signals.

The Journal of the Acoustical Society of America, 103:588, 1998. (Cited on

pages 28, 36, and 39.)

Eric D Scheirer. Music-listening systems. PhD thesis, Massachusetts Institute

of Technology, 2000. (Cited on page 38.)

Jarno Seppänen. Computational models of musical meter recognition. PhD

thesis, 2001. (Cited on pages 28, 31, and 38.)

GM Shepherd. Local circuit neurons. MIT Press, 1976. (Cited on page 44.)

CE Sherrick Jr et al. Perceived order in different sense modalities. Journal

of experimental psychology, 62:423, 1961. (Cited on page 37.)

Joel Snyder and Carol L Krumhansl. Tapping to ragtime: Cues to pulse

finding. Music Perception, 18(4):455–489, 2000. (Cited on pages 66, 180,

181, 182, and 203.)

Joel S. Snyder and Edward W. Large. Tempo dependence of middle- and

long-latency auditory responses: power and phase modulation of the EEG

at multiple time-scales. Clinical Neurophysiology, 115(8):1885–1895, 2004.

(Cited on pages 74 and 75.)

Steven H Strogatz. Nonlinear dynamics and chaos. Reading: Perseus

Books, 1994. (Cited on pages 51 and 52.)

Michael Thaut et al. Rhythm Music and the Brain: Scientific Foundations and

Clinical Applications. Routledge, 2005. (Cited on pages 1 and 36.)

Michael H Thaut, Bin Tian, and Mahmood R Azimi-Sadjadi. Rhythmic

finger tapping to cosine-wave modulated metronome sequences:

Evidence of subliminal entrainment. Human Movement Science, 17(6):

839–863, 1998. (Cited on page 18.)

Petri Toiviainen. An interactive MIDI accompanist. Computer Music Journal,

pages 63–75, 1998. (Cited on pages 28, 34, and 39.)

Balth Van der Pol and Jan Van der Mark. LXXII. The heartbeat considered as

a relaxation oscillation, and an electrical model of the heart. The London,

Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 6(38):

763–775, 1928. (Cited on page 37.)

Leon van Noorden and Dirk Moelants. Resonance in the perception of

musical pulse. Journal of New Music Research, 28(1):43–66, 1999. (Cited on

page 128.)

X.-J. Wang and J. Rinzel. Spindle rhythmicity in the reticularis

thalami nucleus: Synchronization among mutually inhibitory neurons.

Neuroscience, 53(4):899–904, 1993. (Cited on page 44.)

Stephen Wiggins. Introduction to applied nonlinear dynamical systems and chaos,

volume 2. Springer, 2003. (Cited on page 54.)

H. R. Wilson and J. D. Cowan. A mathematical theory of the functional

dynamics of cortical and thalamic nervous tissue. Biological Cybernetics,

13(2):55–80, 1973. (Cited on page 47.)

Hugh R. Wilson and Jack D. Cowan. Excitatory and inhibitory interactions

in localized populations of model neurons. Biophysical Journal, 12(1):1–24,

1972. doi: 10.1016/S0006-3495(72)86068-5. (Cited on pages 7, 37, 47, 50,

54, 81, 126, and 194.)

Arthur T Winfree. The geometry of biological time, volume 12. Springer Verlag,

2001. (Cited on page 51.)

colophon

This document was typeset using the typographical look-and-feel

classicthesis developed by André Miede. The style was inspired

by Robert Bringhurst’s seminal book on typography “The Elements of

Typographic Style”. classicthesis is available for both LaTeX and LyX:

http://code.google.com/p/classicthesis/

Happy users of classicthesis usually send a real postcard to the author,

a collection of postcards received so far is featured here:

http://postcards.miede.de/

Final Version as of April 26, 2014 (classicthesis).

D E C L A R A T I O N

Put your declaration here.

Milton Keynes near Bletchley, the city of Codebreakers, August 2013

Vassilis Angelis