DIFFUSION-BASED MUSIC ANALYSIS:
A NON-LINEAR APPROACH FOR VISUALIZATION AND
INTERPRETATION OF THE GEOMETRY OF MUSIC
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF MUSIC
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Gregory Kennedy Sell
October 2010
Abstract
Diffusion mapping is a non-linear data analysis method based on a model of the data
as states in a random walk. Through this approach, the global structure of the data
is built up from local connectivity rather than pure distance. This diffusion-based
approach is advantageous because, by relying only on local connectivity, it remains
robust and meaningful in high-dimensional spaces, where Euclidean distance does not,
without requiring any assumptions about the structure of the data. Also, the diffusion mapping format
leads directly into meaningful low-dimensional spaces for visualization of the data’s
structure.
In this dissertation, I will examine the effectiveness of diffusion mapping as a tool
for analysis and visualization of music theory and, through these demonstrations,
make an argument for its potential in the field. Diffusion has never been applied to
music at this level before, nor has it been used analytically at a comparable level in
any other field. It will be shown that the approach is not only capable of organiz-
ing and visualizing music, but also, through those organizations and visualizations,
communicating the underlying music theory used in creating the data sets.
First, I will show that notes within a diffusion space plot the fundamental geo-
metrical shape underlying the intervals of diatonic music, using only those intervals
themselves as input to the system. Furthermore, by combining two or three of these
simple intervals, the diffusion space can easily recreate historically significant musical
visualizations. In both of these cases, the diffusion process requires very basic input
and then automatically organizes the notes into a meaningful and insightful visualiza-
tion. This same process can be applied to temporal events, automatically extracting
the geometry complementary to the patterns of meter and hemiolas.
Diffusion geometry can also be used to organize and group music based on musical
characteristics. Specifically, this will be demonstrated in several key-based organiza-
tions. To accomplish this, I will also perform clustering in the diffusion space. As a
part of this process, I will propose a novel metric in diffusion space called the
diffusion time constant τ. Due to the flexibility of diffusion, key-based organization
can be performed from both distributional and functional approaches, and, in the
distributional case, the performance of the Krumhansl-Schmuckler algorithm can be
improved significantly on a commonly used test corpus, Bach’s The Well-Tempered
Clavier, Books 1 and 2.
Musical excerpts will also be visualized as trajectories in the musical space, po-
sitioning the notes of the scale structurally based on the musical relationships in
the score itself. By animating this visual to follow the trajectory through time, a
new musical analysis and experience tool is introduced. Elements such as harmonic
rhythm, harmonic movement, and repetitive sections can easily be perceived in this
space. This visualization, it will also be shown, is largely robust to temporal vari-
ations and musical errors, plotting versions of the same melody in a recognizably
similar structure.
It will also be demonstrated that, though the majority of the work presented uses
exclusively symbolic representations, the same principles and tools can be applied to
audio signals by layering multiple diffusion maps.
Throughout this process, the applicability of machine learning methods in music
analysis to diffusion space will be examined, in the context of key finding and meter
induction in particular.
This work makes novel contributions to both the fields of diffusion mapping and
computational music analysis. In the diffusion domain, this dissertation firstly offers
a comprehensive guide to the diffusion algorithm designed for a reader with only
moderate mathematical background. Additionally, the diffusion time constant and
the subsequent hierarchical clustering in diffusion space are both new extensions to
diffusion mapping, and it will be shown that the metric is meaningful for both musical
analysis and parsing of arbitrary data sets.
In the music domain, this dissertation contributes a completely new analysis
method. By treating music as data points, musical sets can be organized based on
attributes such as key. And the visualization capabilities of diffusion also have a great
deal of potential in music analysis as a means for understanding musical relationships
and for interacting with musical structure in a new way.
Acknowledgments
Over the course of a full undergraduate and graduate career in one school, there have
been far too many friendships and collaborations to fully recount them all here. But,
some have been so remarkable and appreciated that they must be properly recognized
before beginning the work that follows.
Any discussion of my academic career must begin with my advisor, Jonathan
Berger. Unbelievably, Jonathan taught my first class at CCRMA, all the way back
in my sophomore year, the class that convinced me I belonged at CCRMA. Jonathan
has always been an advisor in the fullest sense, willingly extending his duties well
beyond the classroom and research lab into life and its many decisions. Without
Jonathan Berger, my life would be impossibly different, and for his role in helping
me find my way to this milestone, I will always be grateful.
Chris Chafe and Ge Wang, who, along with Jonathan, make up my committee,
also have my deepest gratitude. They were extraordinarily accommodating to the
challenges and complications involved in finishing a PhD from the opposite side of
the country. They offered wonderful support and inspiring insights into my work.
Jonathan Abel and Paul DeMarinis, who served as the additional members of my
defense committee, were also fantastically accommodating to the hectic scheduling
I forced upon them. Having such a wonderfully creative and supportive committee
made this whole process a true pleasure.
Professor Ronald Coifman from Yale University also merits special recognition.
His guidance through the world of diffusion has been both patient and masterful.
His creativity in generating new ideas and approaches, combined with a remarkably
brilliant mathematical mind, is truly rare, and it has been the highest honor
to spend time working with him. Without his help and guidance, this dissertation
simply would not exist.
I would also like to recognize the many others in the CCRMA community and
beyond who have helped me throughout my academic career. Malcolm Slaney, Les
Atlas, and Julius Smith have provided great guidance in both research and life. And
I always enjoyed and will miss the many conversations, both research-related and
diversionary, with my fellow students Kyogu Lee, Gautham Mysore, Ed Berdahl,
David Yeh, Nelson Lee, and the rest of the DSP group. CCRMA has been my home
for many years, and I am sad to leave its welcoming faces and beautiful views.
Of course, I must mention my family. They supported me long before this research
began, and they will be there long after. I can only hope that this work is able to
honor the sacrifices they made for me.
And finally, the biggest thanks of all goes to my wife Tara. She spent every day in
the trenches with me, and her support and encouragement kept me pushing forward.
Without her in my life, I am sure this dissertation would have taken my mind long
ago.
Contents
Abstract iv
Acknowledgments vii
1 Overview 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2.1 Assumption-free Music Analysis . . . . . . . . . . . . . . . . . 2
1.2.2 Computational Music Theory Analysis . . . . . . . . . . . . . 2
1.2.3 Musical Visualization . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Organization of Content . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Background 5
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Key Finding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Krumhansl-Schmuckler key-finding algorithm . . . . . . . . . 6
2.2.2 Other key-finding algorithms . . . . . . . . . . . . . . . . . . . 12
2.3 Meter Induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.1 Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.2 Other meter induction methods . . . . . . . . . . . . . . . . . 14
2.4 Musical Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4.1 Note or Pitch-Class Visualizations . . . . . . . . . . . . . . . . 15
2.4.2 Key Visualizations . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.3 Other visualizations . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5 Machine Learning in Music Analysis . . . . . . . . . . . . . . . . . . 21
2.5.1 Unsupervised . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5.2 Supervised . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3 Diffusion Maps 26
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Diffusion Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.1 Data sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.2 Affinity function . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.3 Affinity-derived Markov matrices . . . . . . . . . . . . . . . . 29
3.2.4 Eigenvectors of a Markov matrix . . . . . . . . . . . . . . . . 36
3.2.5 Eigenvalue decay and the meaning of the eigenvectors . . . . . 37
3.2.6 Diffusion maps and diffusion distance . . . . . . . . . . . . . . 39
3.2.7 Diffusion time constant . . . . . . . . . . . . . . . . . . . . . . 42
3.2.8 Hierarchical clustering with the diffusion time constant . . . . 47
3.2.9 Comparison to other methods . . . . . . . . . . . . . . . . . . 48
3.3 Applying diffusion distance to music analysis . . . . . . . . . . . . . . 54
4 Diffusion-based Music Theory Analysis 57
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2 Tonal Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2.1 Input Representation . . . . . . . . . . . . . . . . . . . . . . . 58
4.2.2 Affinity Function . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2.3 Geometric representations of pitch-class intervals . . . . . . . 59
4.2.4 Recreating note-based visualizations . . . . . . . . . . . . . . 72
4.3 Metrical Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.3.1 Metric geometry . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3.2 Visualizing hemiolas . . . . . . . . . . . . . . . . . . . . . . . 85
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5 Diffusion-based Musical Applications 88
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.2 Key Finding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.2.1 Key-Finding Characteristics from the Diffusion Time Constant 89
5.2.2 Functional Code-Based Key Organization . . . . . . . . . . . . 94
5.2.3 Extending the K-S Algorithm with Clustering . . . . . . . . . 99
5.3 Meter Induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.4 Visualization of Trajectories . . . . . . . . . . . . . . . . . . . . . . . 104
5.4.1 Twinkle, Twinkle, Little Star . . . . . . . . . . . . . . . . . . 106
5.4.2 Prelude No. 1 in C major (BWV 846) from The Well-Tempered
Clavier, Book 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.4.3 Robustness to Performance Noise . . . . . . . . . . . . . . . . 113
5.4.4 Audio Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6 Conclusions and Future Work 119
6.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . 119
6.1.1 Diffusion Time Constant . . . . . . . . . . . . . . . . . . . . . 119
6.1.2 Assumption-Free Music Analysis . . . . . . . . . . . . . . . . 120
6.1.3 Musical Visualizations . . . . . . . . . . . . . . . . . . . . . . 121
6.2 Future Work and Extensions . . . . . . . . . . . . . . . . . . . . . . . 121
6.2.1 Audio Signal-based Analysis . . . . . . . . . . . . . . . . . . . 121
6.2.2 Improved Visualization Platform . . . . . . . . . . . . . . . . 122
6.2.3 Comparison of Diffusion Spaces . . . . . . . . . . . . . . . . . 123
6.2.4 Implications for Non-Tonal Western or Non-Western Music . . 124
6.2.5 Inverting Diffusion Space to Audio . . . . . . . . . . . . . . . 125
6.2.6 Examination of Less Prominent Dimensions of Map . . . . . . 126
6.2.7 Dual Diffusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
6.3 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Bibliography 129
List of Tables
4.1 The pitch-class intervals and their inversions . . . . . . . . . . . . . . 60
5.1 Accuracy for various interval-based key-finding approaches using near-
est neighbors in the diffusion time constant. . . . . . . . . . . . . . . 95
5.2 Accuracy for the K-S key-finding algorithm before and after process-
ing the data with a filter derived from hierarchical clustering in the
diffusion space. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.3 Accuracy for the meter induction task using nearest neighbors with
both Euclidean distance and the diffusion time constant. . . . . . . . 102
List of Figures
2.1 The Krumhansl and Kessler key profiles for major (top) and minor
(bottom) keys, from [76] . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 The Tonnetz, harmonic network, or table of tonal relatives (from [29]),
including the visualizations of parallel (P), leading-tone (L), and rela-
tive (R) triad relationships from Neo-Riemannian theory. . . . . . . . 16
2.3 Several geometric representations proposed by Shepard, from [45]. . . 17
2.4 The Spiral Array, from [13] . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5 Schoenberg’s spatial mapping of keys (from [45]), with the key region
for C major marked . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.6 The MDS derived mapping of the keys from Krumhansl and Kessler’s
key-profiles, and its 2D mapping from the angles of the circles, from [46]. 19
2.7 Two harmonic visualizations of Mozart’s Sonatina No. 1 in C major
K439B (Viennese), 1st Movement, from [62]. . . . . . . . . . . . . . . 20
3.1 Two example data sets that will be used to demonstrate the process
of calculating diffusion distance. In both cases, color indicates the
distribution from which a sample was drawn . . . . . . . . . . . . . . 28
3.2 The probabilities of a random walk concluding in each data point (with
high probability shown in lighter color) at different time scales for the
cluster-based data set. The columns show the case for three different
starting points, shown as a red dot in each. . . . . . . . . . . . . . . . 31
3.3 The probabilities of a random walk concluding in each data point (with
high probability shown in lighter color) at different time scales for the
circle-based data set. The columns show the case for three different
starting points, shown as a red dot in each. . . . . . . . . . . . . . . . 32
3.4 The probability matrix P t for the cluster data set (Fig. 3.1(a)) at
several values of t. Axis labels correspond to the cluster numbering
from Fig. 3.1(a). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.5 Diffusion distance matrices Dt for different values of t for the cluster
data set, where dark color means a short distance. . . . . . . . . . . . 41
3.6 Diffusion distance matrices Dt for different values of t for the circle
data set, where dark color means a short distance. . . . . . . . . . . . 42
3.7 The circle data set plotted in φ1, φ2, and φ3. . . . . . . . . . . . . . . 43
3.8 The diffusion time constant matrices for the example data sets, demon-
strating that data points within structural elements have small time
constants. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.9 The Euclidean distances for the circle data set, where the data points
within the same circle are not close together. . . . . . . . . . . . . . . 47
3.10 Hierarchical trees for both data sets from the diffusion time constants,
showing that structural elements at multiple levels are accurately ex-
tracted. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.1 The pitch classes connected by semitone intervals plotted in the first
two dimensions of the diffusion map. . . . . . . . . . . . . . . . . . . 61
4.2 The pitch classes connected by major 2nd intervals plotted in the first
three dimensions of the diffusion map. . . . . . . . . . . . . . . . . . 62
4.3 The pitch classes connected by major 2nd intervals plotted in dimen-
sions 2, 3, and 4 of the diffusion map. . . . . . . . . . . . . . . . . . . 63
4.4 The pitch classes connected by minor 3rd intervals plotted in the first
three dimensions of the diffusion map. . . . . . . . . . . . . . . . . . 64
4.5 The pitch classes connected by minor 3rd intervals plotted in dimen-
sions 3, 4, and 5 of the diffusion map. . . . . . . . . . . . . . . . . . . 65
4.6 The pitch classes connected by minor 3rd intervals plotted in dimen-
sions 3, 4, and 5 of the diffusion map, viewed from a different angle
than Fig. 4.5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.7 The pitch classes connected by major 3rd intervals plotted in the first
three dimensions of the diffusion map. . . . . . . . . . . . . . . . . . 67
4.8 The pitch classes connected by major 3rd intervals plotted in dimen-
sions 4, 5, and 6 of the diffusion map. . . . . . . . . . . . . . . . . . . 68
4.9 The pitch classes connected by perfect 4th intervals plotted in the first
two dimensions of the diffusion map. . . . . . . . . . . . . . . . . . . 69
4.10 The pitch classes connected by tritone intervals plotted in the first
three dimensions of the diffusion map. . . . . . . . . . . . . . . . . . 70
4.11 Several geometric representations of intervals appear in the diffusion
space created with the major chord. . . . . . . . . . . . . . . . . . . . 71
4.12 Shepard’s chromatic helix in diffusion space, resulting from the com-
bination of minor 2nd and octave intervals with the full note set, with
and without the minor 2nd intervals drawn in. . . . . . . . . . . . . . 73
4.13 Zooming in on two octaves of the chromatic helix from Fig. 4.12(b). . 74
4.14 Shepard’s double helix in diffusion space, resulting from the combina-
tion of perfect 5th and octave intervals with the full note set, with and
without the major 2nd intervals drawn in. . . . . . . . . . . . . . . . . 76
4.15 Zooming in on two octaves of the double helix from Fig. 4.14(b). . . . 77
4.16 Several other interpretations of the note organization in which Shep-
ard’s double helix exists. . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.17 The diffusion space created with minor 2nd and two octave plus major
3rd intervals with an approximation of the Spiral Array represented by
the lines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.18 The Krumhansl-Kessler key space from Fig. 2.6(a) remade with diffu-
sion. Dots represent major keys and circles represent minor keys. . . 81
4.19 Duple-meter beat trains separated completely from triple meter beat
trains and organized into a square. . . . . . . . . . . . . . . . . . . . 82
4.20 Triple-meter beat trains separated completely from duple meter beat
trains and organized into a triangle. . . . . . . . . . . . . . . . . . . . 83
4.21 Hemiolas based in units of 2 shaped into a square, similarly to the
metric case shown in Fig. 4.19. . . . . . . . . . . . . . . . . . . . . . 86
4.22 Hemiolas based in units of 3 shaped into a triangle, similarly to the
metric case shown in Fig. 4.20. . . . . . . . . . . . . . . . . . . . . . 86
5.1 Diffusion time constants between the pitch classes for major and minor
keys. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.2 Diffusion time constants between notes separated by various intervals
for the major subset. . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.3 Diffusion time constants between notes separated by various intervals
for the minor subset. . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.4 Key profiles derived from the diffusion time constant compared to the
K-K key profiles for major (top) and minor (bottom) keys. . . . . . . 93
5.5 Confusions for all 6 functional key-finding experiments with notewor-
thy confusions labeled. . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.6 Diffusion maps for all 6 functional key-finding experiments. . . . . . . 97
5.7 The hierarchical tree created from pitch-class distributions labeled with
key with errors circled in red. . . . . . . . . . . . . . . . . . . . . . . 100
5.8 The first three dimensions of the diffusion map for meter classification
on the Essen folksong database colored by meter label. . . . . . . . . 103
5.9 The same as Fig. 5.8 with the test data indicated by larger size, and
errors in labeling of the test data shown largest. . . . . . . . . . . . . . 104
5.10 The melody of Twinkle, Twinkle, Little Star . . . . . . . . . . . . . . 106
5.11 The trajectory for Twinkle, Twinkle, Little Star with the individual
notes marked. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.12 Score of Bach’s Prelude No. 1 in C major (BWV 846) from The Well-
Tempered Clavier, Book 1. . . . . . . . . . . . . . . . . . . . . . . . . 109
5.13 The trajectory for Bach’s Prelude No. 1 in C major (BWV 846) from
The Well-Tempered Clavier, Book 1. . . . . . . . . . . . . . . . . . . 110
5.14 The trajectory for Bach’s Prelude No. 1 in C major (BWV 846) from
The Well-Tempered Clavier, Book 1, with only the first four measures
shown. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.15 Several trajectories for Bach’s Prelude No. 1 in C major (BWV
846) from The Well-Tempered Clavier, Book 1, with different levels of
performance-like noise synthetically added . . . . . . . . . . . . . . . 114
5.16 The trajectory for Twinkle, Twinkle, Little Star, derived here from an
audio signal instead of symbolic music (as was the case in Fig. 5.11). 116
Chapter 1
Overview
1.1 Introduction
This dissertation presents and develops a non-linear data analysis method called dif-
fusion mapping and examines its applicability to music analysis. Diffusion analysis
has been used for dimensional reduction, extraction of structure, classification, orga-
nization, and visualization of various data sets. Each of these assets will be examined
in the context of music analysis.
This chapter serves to contextualize the work that follows by discussing the moti-
vation for such research. The organization of the remaining content of the dissertation
is also briefly summarized.
1.2 Motivation
The central goal of much of computational music research is to teach computers to
listen to music, and to simulate how humans might process and interpret it. This
would not only be an impressive accomplishment and a significant contribution to
perceptual computation, it would also allow for automating many complicated musical
tasks, including musical similarity and recommendation. Significant progress has been
made, but there is still a great deal of work to be done. Furthermore, automatic
processes are not the only application for computational music understanding. It
could provide a valuable tool in computer-assisted musical teaching and analysis, not
to mention interactive search and organization.
The work presented in this dissertation focuses on advancing research in com-
putational ‘understanding’ of music. The work is primarily motivated by the need for
computational improvements in analysis and visualization of basic elements of mu-
sic theory, and the work is especially significant because it is based entirely on an
assumption-free framework.
1.2.1 Assumption-free Music Analysis
One of the most significant aspects of the diffusion-based music analyses to be pre-
sented is that they require no prior assumptions about the incoming musical signals.
This is a valuable characteristic, one that has largely been missing from other
computational music analyses. Typically, musical assumptions are manually hardwired into
the system or estimated from large databases of training music.
However, in the work to follow, brief and individual excerpts of music can be
analyzed without any assumptions at all, beyond the fundamental assumption that
notes are different, and so the musical relationships can be extracted on as fine a
level as is desired for the application or task. This sort of system has not been tested
or examined before, and so creating that sort of contribution was one of the main
motivations behind the work.
1.2.2 Computational Music Theory Analysis
One key aspect of the perceptual musical listening experience is the concept of music
theory, the set of rules and guidelines that determine the harmonic and temporal
relationships in music. Musical expectation, in which a listener anticipates the pro-
gression of music, and its fulfillment or violation, are all products of music theory,
and most listeners have an underlying concept of music theory, regardless of formal
training or experience.
However, little work has been done in computationally extracting and analyzing
music theory on a fundamental level. Most research has focused on higher-level music
theory concepts like key, meter, genre, and mood, because those have more direct
applications in classification. However, all of these concepts are built on more basic
and atomic concepts such as pitch interval, and little work has been directed at this
level. Building a computational understanding of music theory from the ground up
requires starting from these atomic units, and demonstrating a process that fills in
this groundwork is another core motivation of this work.
1.2.3 Musical Visualization
Musical visualization is a particularly interesting research area. Its wide range of
implementations and applications seems limited only by the imagination. Associating
visual projections with musical signals can provide, at the very least, a dynamic
multimedia experience. Ideally, visualizations offer a new way to analyze and parse
the music, and through this new approach provide a pathway to deeper understanding.
As will be discussed, diffusion mapping is ideal for creating low-dimensional spaces
for visualization of data or, in this case, music. So, at the core of the motivation for
this work is the fact that, while creating a system for assumption-free computational
analysis of music theory, an approach for automatic visualization is developed as well.
Visualizations will be created toward many goals, ranging from geometric represen-
tations of intervals to mapping musical excerpts to trajectories in diffusion space,
and through these demonstrations, the role of visualizations will be explored in the
context of a new and automatic visualization tool.
1.3 Organization of Content
Following this introduction, this dissertation will begin with a thorough review of
previous work in several relevant fields to this work. Key finding, meter induction,
visualization, and machine learning will all be addressed.
This is followed by a thorough and complete introduction to diffusion theory.
While all of the work in this dissertation is geared toward musical applications, the
descriptions of diffusion analysis are universally applicable. It is hoped that this
chapter will serve as a reference for others with new ideas for applications of diffusion
mapping beyond those approaches presented here.
We will then dive into the world of musical applications of diffusion mapping.
First, the experiments will focus on the most basic aspects of music theory, examining
the geometric visualizations of intervals and short rhythmic patterns. In this context,
it is also shown that many past visualizations of musical space are specific examples
of the type of analysis performed in this diffusion space.
The next level of theoretical analysis, focusing on key and meter, is analyzed in
the context of both historical methods and also assumption-free code-based methods.
At this level, a process for creating musical trajectories in a diffusion space based on
the musical relationships is introduced and analyzed for a few examples.
Finally, accompanying the conclusion, a series of directions for future work is
proposed. Because the work presented in this dissertation primarily deals with new
research fields like ground-up computational music theory analysis and assumption-
free theory extraction as well as highly interpretable and customizable fields like
musical visualization, there is a great deal of future work to be considered.
Chapter 2
Background
2.1 Introduction
This work introduces a new analytic system to the world of computational music
analysis, and so it necessarily involves many fields. What follows is a review of research
in key finding, meter induction, musical visualization, and machine learning in music
analysis. The potential for diffusion geometry in computational music analysis and
music information retrieval extends far beyond these applications, but these are the
areas on which the work presented here will focus. While an attempt was made at
broad coverage within these fields, this should by no means be considered a full
review of computational music analysis, music information retrieval, or music theory.
2.2 Key Finding
The key of a musical piece refers specifically to which note is the tonal center, but it
also establishes roles for all 12 of the pitch classes. A key can be in major or minor
mode, each of which defines a different set of expectations for the tonal or harmonic
progressions and cadences. While determining the key is often intuitive for a human
listener, creating a similarly effective computational model has not yet been fully
achieved.
In the field of key finding for symbolic music, the largest shadow is undoubtedly
cast by the Krumhansl-Schmuckler (K-S) algorithm. Not only is the approach used
extensively throughout the field, but many subsequent algorithms can easily be seen
as extensions of the original principles, populating a field called distributional key
finding. Others have proposed many unique algorithms, largely called structural key-
finding algorithms, but the K-S algorithm and the continuations of that work are the
most prominent in the field.
2.2.1 Krumhansl-Schmuckler key-finding algorithm
Initially, Krumhansl and Kessler [46] derived a set of key profiles (also sometimes
called tonal hierarchies, or K-K profiles), seen in Fig. 2.1, from the results of percep-
tual tests. In the tests, listeners were asked to rate how well a certain tone follows
a musical sequence, such as a scale, chord, or cadence. Essentially, the study aimed
to find how well notes of the scale perceptually fit in with musical elements designed
to establish a key. By averaging the results across users and transposing results to
a common key (under the reasonable assumption that the study was unaffected by
transposition), they derived a major key profile and a minor key profile.
The K-S key-finding algorithm [43] correlates these key profiles with an input vec-
tor created from the total duration time of each of the 12 pitch classes. By correlating
with each of the 12 possible shifted orientations of the key profiles, the key of the
excerpt can be estimated as the one with the highest correlation.
\[
\text{Key Estimate} = \operatorname*{argmax}_{n \in \{0,1,\dots,11\}}
\frac{\sum_m \bigl(k_n(m) - \bar{k}_n\bigr)\bigl(x(m) - \bar{x}\bigr)}
     {\sqrt{\sum_m \bigl(k_n(m) - \bar{k}_n\bigr)^2 \,\sum_m \bigl(x(m) - \bar{x}\bigr)^2}}
\tag{2.1}
\]
In this equation kn is a key profile shifted by n, and so kn(m) is the score for the
mth pitch class in the key profile; k̄n is the mean of the key profile. x is the input
vector, and so, for a given musical excerpt, x(m) quantifies the amount of time that
the mth pitch class is playing; x̄ is the mean of the input vector. Note that the correlation
is calculated for both major and minor key profiles.
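As a concrete illustration, Eq. (2.1) can be sketched in a few lines of Python. The profile values are the published Krumhansl-Kessler data (Krumhansl, 1990); the function names and the shifted-profile construction are illustrative choices of this sketch, not Krumhansl and Schmuckler's implementation.

```python
# Krumhansl-Kessler key profiles (index 0 = tonic), from Krumhansl (1990).
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def correlation(k, x):
    """Pearson correlation between a key profile k and input vector x (Eq. 2.1)."""
    n = len(k)
    mk, mx = sum(k) / n, sum(x) / n
    num = sum((k[i] - mk) * (x[i] - mx) for i in range(n))
    den = (sum((k[i] - mk) ** 2 for i in range(n))
           * sum((x[i] - mx) ** 2 for i in range(n))) ** 0.5
    return num / den

def estimate_key(x):
    """Return (tonic pitch class, mode) of the best-correlating of the 24 keys."""
    best = None
    for n in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Shift the profile so that pitch class n is the tonic.
            k = [profile[(m - n) % 12] for m in range(12)]
            r = correlation(k, x)
            if best is None or r > best[0]:
                best = (r, n, mode)
    return best[1], best[2]
```

Given a 12-element duration vector x, estimate_key(x) returns, for example, (7, "major") when the pitch-class durations best match the G-major profile.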
Figure 2.1: The Krumhansl and Kessler key profiles for major (top) and minor (bottom) keys, from [76]
This equation is a standard correlation, with the mean subtracted and the variance
normalized so that the major and minor key scores can be compared. Temperley [76]
suggested eliminating the denominator from this equation for easier calculation,
though with the Krumhansl-Kessler key profiles, this invalidates comparisons between
major and minor correlation scores¹.
The K-S algorithm works quite well, and has become one of the most widely
used key-finding algorithms. It is easy to implement and runs sufficiently fast for
most applications. It also requires very little music theory to understand and is
conceptually straightforward. On a high level, it makes sense that certain notes will
be played more than others in a given key. The fact that this intuition is confirmed
with Krumhansl and Kessler’s perceptual data further validates the approach.
However, there are also several well-known and well-founded concerns with the
K-S algorithm. Researchers have questioned the effectiveness of ignoring timing in-
formation as well as the methods themselves in the perceptual test that derived the
key profiles.
Concerns regarding the exclusion of local timing information
The primary concern with the K-S algorithm is that it is a distributional algorithm,
meaning the input vector ignores all timing and sequencing information, using only
the frequency of a pitch class to determine the key. Many have argued that such a
purely statistical metric is not sufficient for key identification, suggesting that
ordering, location, grouping, and intervals, among others, are also relevant traits (often called structural or
functional attributes [24, 9, 25]). It is relatively simple to generate example melodies
that share their pitch distribution but induce different keys (such as G-E-D-C
implying C major while E-C-D-G indicates G major), demonstrating that other
information is needed in these cases. In a study by Brown et al. [7], it is demonstrated
that the presence of a sequenced tritone (a so-called rare interval) influences
a listener's key decision.
Several studies have examined the significance of this limitation. West and Fryer
[88] asked subjects to rate a probe tone as a tonic after the notes of a scale are played
¹ Temperley addresses this by proposing other key profiles, as will be discussed later.
in a random order. Since the results do not show consensus for either musicians
or non-musicians, it is concluded that sequencing information is necessary for key
estimation. In [55], Matsunaga and Abe present musicians and non-musicians with
similar randomly ordered pitch sequences, this time generated from the sequence
{C,D,E,G,A,B}, and ask them to identify the key. This time, the results gener-
ally suggest that the pitch set is an important characteristic in key identification,
as listeners were able to agree on the key of the sequence in many cases. But, in
other cases, significant disagreement among the listeners indicates that pitch-class
distribution alone is not sufficient for defining a key. Interestingly, in both of these
studies, musicians and non-musicians responded similarly, suggesting that key per-
ception does not require or improve significantly with musical training. Two other
studies [72, 79] draw the randomly ordered sequences from statistical distributions,
and in both cases the listeners agree on the key in a reasonable number of cases. In
general, the perceptual tests confirm that pitch distributions are useful in identifying
the key, though they are not always completely sufficient.
Unfortunately, the loss of local timing information is unavoidable with distribu-
tional key-finding algorithms (in fact, it is fundamental to them). However, in [79],
Temperley and Marvin observe that most of the examples used in the above stud-
ies are relatively short (sometimes only a few notes long) and simple. So, somewhat
counterintuitively, the lack of local timing may be less of a concern for longer excerpts,
when more samples (notes) give a theoretically more accurate pitch distribution.
Huron and Parncutt [36] also proposed an extension of the K-S algorithm that
tries to account for these timing concerns. Their model includes weightings related to
perceptual salience of the input, and also to the decay over time (a memory effect).
By incorporating these features, they are able to improve the model’s performance
on several examples from Krumhansl and Kessler’s original report that were shown
to have time dependence.
Concerns regarding the exclusion of global timing information
A related concern to the lack of local timing information is the lack of global timing.
Distributional key-finding algorithms must assume that the key is stationary over
the entire excerpt used for the estimate. This makes the inclusion of a modulation
extremely difficult to handle. When a full score is collapsed down to 12 duration
measurements and given only one key estimate, it is simply not possible to find and
label modulations, should that be desired. Furthermore, a modulation will corrupt the
one key estimate that is given. This is because, if there is a modulation somewhere
in the score, then the notes will be drawn from two key distributions, that of the
original key and that of the modulated key. As a result, the pitch distribution will
be a hybrid of the two, and will not fit as well into a classification of either key.
This effect is not always undesirable. The hybridization of the input vector as a
result of content from multiple keys can be used as a tool for understanding the music
itself [78]. Also, a study showed that the strength of a melody’s key match correlates
to a perceptual judgment of that melody’s tonal structure [75].
However, if finding modulations is desired, these issues can also easily be solved
by limiting the duration of the musical excerpt used. The first example of this was
proposed by Krumhansl [43], in which a separate key decision is made for each measure
of the score by combining the current measure’s input vector with weightings of the
neighboring measures.
This idea was also extended by Shmulevich and Yli-Harja [70], in which a fully
sliding window is used. The output can then be filtered with either a median filter or
a graph-based smoothing, resulting in a stepwise set of key judgments. An advantage
to this approach is that the very definition of a modulation (a full change in key,
as opposed, for example, to tonicization, in which harmonic movement only briefly
moves towards a secondary key) can be controlled by changing the parameters of the
smoother. The authors also show that this approach can be used for musical pattern
recognition.
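The sliding-window idea can be sketched as follows. The per-window estimator is pluggable (for example, a K-S correlation), and a simple median filter stands in for the paper's smoothing stage; all names are illustrative, and median-filtering key indices presumes an ordering of keys, which is itself a simplification of the published method.

```python
def median_filter(keys, width=3):
    """Median-smooth a sequence of key indices; the ends are left unchanged."""
    half = width // 2
    out = list(keys)
    for i in range(half, len(keys) - half):
        out[i] = sorted(keys[i - half:i + half + 1])[half]
    return out

def track_keys(windows, estimate_key, width=3):
    """One key estimate per sliding window, smoothed into stepwise judgments."""
    return median_filter([estimate_key(w) for w in windows], width)
```

Widening the filter demands that a key persist longer before it is reported, which is one way of drawing the line between a full modulation and a brief tonicization.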
Temperley [76, 77] proposed an approach for handling modulations by assigning
a penalty to a key change. In this system, the optimal key judgment is then deter-
mined by some combination of how well a key fits the pitch distribution with the
penalty assigned to each key. This approach naturally leads to a Bayesian framing
of the algorithm, in which the key correlation scores and key-change penalties are
represented with probabilities. Additionally, a variation on the input vector was pro-
posed, in which, within a short window, only a binary present/absent metric is kept
for each of the 12 pitch classes, preventing the algorithm from overvaluing anomalous
pitch repetitions (though this obviously also increases sensitivity to notes that appear
sparsely).
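The binary variant of the input vector is simple to construct; the sketch below assumes MIDI note numbers as input and is an illustration, not Temperley's code.

```python
def flat_input_vector(window_notes):
    """Binary present/absent vector over the 12 pitch classes for one window.

    window_notes: iterable of MIDI note numbers sounding in the window.
    Repetitions of a pitch class have no extra effect, so an anomalously
    repeated note cannot dominate the estimate.
    """
    x = [0] * 12
    for note in window_notes:
        x[note % 12] = 1
    return x
```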
Concerns regarding the probe-tone method
Another concern about the K-S algorithm involves the perceptual test itself. It has
been suggested that the probe-tone method is biased by the timing of the test tone,
causing subjects to judge how the tone fits at the final position of the sequence
rather than simply how it fits with the sequence. Aarden [2] tested this hypothesis,
with results suggesting that listeners are indeed biased to hear the probe tone as the
phrase-final note. One common extension to address this issue is to replace the key
profiles derived by Krumhansl and Kessler.
Drawing from the fundamental assumption of distributional key-finding that a key
is determined statistically, Aarden derived new key profiles learned from the statistics
of a musical corpus. Tests showed that these new profiles performed better than
the original K-K profiles. Temperley and Marvin [79] also used profiles statistically
derived from a musical corpus to create the musical sequences used in their perceptual
study.
Temperley [76] tried a modification on the key profiles in which entries are rounded
and simplified, leading to profiles with most scores shared between the major and
minor keys, with only the 3rd, 6th, and 7th degrees differing, as is the case in the scales.
The major and minor profiles also have equal norms (this allows for the elimination
of the denominator in the correlation in Eq. (2.1), as previously discussed).
Hu and Saul [35] learned key profiles from a musical corpus using Latent Dirichlet
Allocation (LDA). In this context, key itself was never explicitly defined; rather, musical scores
were analyzed and organized into 24 categories, or topics, and the key profiles were
derived as the mechanism for assigning these topics.
It is worth noting, though, that in all of these cases, the new key profiles share
many of their characteristics with the original K-K key profiles.
2.2.2 Other key-finding algorithms
Most key-finding algorithms outside the K-S family are referred to as structural or
functional algorithms, so called because they incorporate timing and ordering
information.
One early example is a rule-based system proposed by Longuet-Higgins and Steedman
[52, 51]. In this approach, the notes of a melody are examined one at a time.
Starting from the beginning, each note eliminates from consideration all keys in which
the note is absent. This process is continued, moving along the melody, until only
one key remains. If multiple keys remain at the end of the melody, then the key is
selected among the remaining candidates based on the first note in the melody. The
theory behind this approach is based on the observation that the key-tones for a given
key occupy a compact space in the Tonnetz, seen in Fig. 2.2.
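The elimination idea can be sketched as follows; this version handles only the 12 major keys and uses a crude tie-break, so it is a simplification of the published algorithm, and all names are mine.

```python
# Sketch of the elimination idea behind the Longuet-Higgins/Steedman
# algorithm, simplified to the 12 major keys (the original also covers minor
# keys and has more refined rules, e.g. for notes that would eliminate every
# remaining candidate). Pitches are pitch classes 0-11, with 0 = C.
MAJOR_SCALE = {0, 2, 4, 5, 7, 9, 11}  # scale degrees in semitones above the tonic

def eliminate_key(melody_pcs):
    candidates = set(range(12))          # tonics of the 12 major keys
    for pc in melody_pcs:
        remaining = {t for t in candidates if (pc - t) % 12 in MAJOR_SCALE}
        if len(remaining) == 1:
            return remaining.pop()       # a unique key has been found
        if remaining:                    # ignore fully-eliminating notes (simplification)
            candidates = remaining
    # Tie-break among survivors: prefer the key whose tonic is the first note.
    first = melody_pcs[0]
    return first if first in candidates else min(candidates)
```

For the melody G-B-C-F (pitch classes 7, 11, 0, 5), the first three notes leave C major and G major as candidates, and the F then uniquely selects C major.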
The rare-interval approach to key-finding [9, 7] is another structural algorithm
that seeks out sequential intervals that are seen as specific to certain keys. The
most prominent example is the tritone, which is largely unique to a single key, such
as between the 4th and 7th degrees of a major scale. It is suggested that the most
perceptually likely method for key identification is to determine a tonal center based
on evidence of these rarely occurring intervals.
Rizo et al. [61] proposed a key-finding algorithm derived from a tree representation
of melodies [60]. The tree structure is built so that time is represented
from left to right and duration of a note is represented by tree level. By iteratively
assigning a key to each leaf in the tree and moving up the representation, one key
estimate is left in the end.
The Center of Effect Generator (CEG) is a distributional key-finding algorithm
(like the K-S algorithm) proposed by Chew [13, 14], based on her geometric
representation, the Spiral Array (seen in Fig. 2.4). Each combination of notes is represented
by a single point, called the center of effect (c.e.), and the key is defined based on
the distance between the excerpt’s c.e. and the c.e. calculated for a key. Like
Longuet-Higgins and Steedman’s algorithm, the CEG is based on the observation
that key-based tonal notes form compact spaces and shapes in the Spiral Array. In
fact, the Spiral Array is based on the Tonnetz, so it is not surprising that key-finding
algorithms in each space are grounded in similar properties.
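A simplified sketch of the c.e. idea follows. All constants and names are illustrative: the true Spiral Array calibrates the radius and rise, spells enharmonics separately along an unbounded line of fifths (here folded mod 12, a sketch artifact), and defines key points hierarchically from tonic, dominant, and subdominant chords, whereas this sketch represents each major key by its tonic triad's c.e. alone.

```python
import math

# Pitch classes sit on a helix that rises by perfect fifths, a quarter turn
# per fifth; the "center of effect" (c.e.) of an excerpt is the
# duration-weighted average of its pitch positions.
R, H = 1.0, 0.4  # helix radius and rise per fifth (illustrative values)

def position(pc):
    k = (7 * pc) % 12          # steps along the line of fifths from C
    angle = k * math.pi / 2    # a quarter turn per fifth
    return (R * math.sin(angle), R * math.cos(angle), k * H)

def center_of_effect(notes):
    """notes: list of (pitch_class, duration) pairs."""
    total = sum(d for _, d in notes)
    return tuple(sum(position(pc)[i] * d for pc, d in notes) / total
                 for i in range(3))

def ceg_major_key(notes):
    """Major key (tonic pitch class) whose triad c.e. is nearest the excerpt's c.e."""
    ce = center_of_effect(notes)
    triad = lambda t: [(t, 1), ((t + 4) % 12, 1), ((t + 7) % 12, 1)]
    return min(range(12),
               key=lambda t: math.dist(ce, center_of_effect(triad(t))))
```

Because every duration-weighted combination of notes collapses to a single point, comparing an excerpt to a key reduces to a distance computation in three dimensions.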
2.3 Meter Induction
Meter generally refers to the organization of the rhythm and beats in music. It
consists of a periodic system of emphasized down beats, and is generally broken into
segments based on either 2 (duple meter) or 3 (triple meter) beats, or one of several
common multiples. As with key, determining the meter is usually natural and
intuitive for a human listener, but, though great strides have been made,
computational models have not yet achieved comparable results. The meter of a musical
piece can be valuable information on its own, and is also important for tasks like
musical transcription [66] and editing [11].
The field of meter induction is also not quite as widely studied as key finding,
but some past work has aimed at computationally extracting meter. Beat tracking
and tempo estimation from real audio are related rhythmic tasks, but they are
different from meter induction and therefore will not be dealt with here (though a
few systems have tried to incorporate them all together on some level [31, 42]). The
diffusion-based organization to follow is based on the autocorrelation approach for
meter induction, but several other methods will be discussed as well.
Most meter induction methods are based on symbolic data, but can also be applied
to musical signals, typically after extracting onset information.
2.3.1 Autocorrelation
The concept of applying autocorrelation to meter induction was first proposed by
Brown [8]. Using an autocorrelation function to analyze the rhythmic timing of events
is designed to capitalize on the periodic structure of meter. Presumably, the music
has repetitions of rhythmic sequences to create the sense of meter, and these rhythmic
sequences occur at similar places within the metrical structure. The autocorrelation
function will show spikes at the timing difference between the repeated events, and
that timing difference is a clue toward the meter.
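A minimal sketch of this approach, assuming the score has been quantized to an onset train at some fixed tick resolution (the names and the unnormalized autocorrelation are illustrative choices):

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation of an onset strength train x."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]

def dominant_period(onsets, max_lag):
    """Nonzero lag with the strongest autocorrelation: a candidate beat period."""
    ac = autocorrelation(onsets, max_lag)
    return max(range(1, max_lag + 1), key=lambda lag: ac[lag])
```

For an onset train with a note every three ticks, the autocorrelation peaks at lag 3 and its multiples, pointing toward a triple grouping at that level.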
Autocorrelation has also been applied to isochronous melodies using pitch infor-
mation [86]. In addition to being a relevant example of autocorrelation, this is also
one of the few times that pitch information has been incorporated into meter induc-
tion, even though pitch information has been shown to be perceptually relevant to
meter [32].
Toiviainen and Eerola [81, 82] extended these models by weighting the rhythmic
events based on several criteria, including duration, melodic accent, melodic interval,
and melodic trajectory. Results suggested that autocorrelation functions based on
only the onset timing with no accent weighting at all yielded the most accurate meter
classification.
2.3.2 Other meter induction methods
Longuet-Higgins and Steedman [52] proposed a system for meter induction very simi-
lar to their key-finding algorithm. In the meter version, rhythmic events are analyzed
one at a time starting from the beginning, eliminating meters that are unlikely to
see that event with each new input. This rule-based system is the metrical dual of
the key-finding version: where that algorithm analyzes pitch classes to eliminate
keys, this one uses rhythmic timing to eliminate meters.
The Generative Theory of Tonal Music (GTTM) [49] is an important example of
computational music analysis with strong ties to meter. Fundamentally, it generates
structure and understanding of a musical piece with a predictive model based on a
set of analysis rules. In the context of meter induction, GTTM strictly ties meter to
tonal harmony by organizing music into a metrical structure derived from multiple
levels of meter and rhythm. This approach aims to extract and explain the sense
of strong and weak beats that music creates for a listener, a key aspect of meter.
Based on this accent structure, the predictive model and its set of rules
determine the meter that a listener is likely to experience for a given musical excerpt.
Large and Kolen [47] proposed a system for determining meter based on the
oscillations of resonators. In it, a bank of oscillators organizes in response to a rhythmic input,
and the resulting organization helps determine the meter. This process is suggested
to model human perception of meter.
Parncutt [59] suggested another model for the perception of musical meter. In-
corporating several aspects of GTTM, the model estimates the accent structure of a
rhythmic input through a multi-stage process in which different types of accents are
separately estimated. The model then estimates both the meter and the expressive
timing of the input. The model was designed with reference to several perceptual
tests, and its behavior matches the experimental results for those same tests.
Dixon [26] suggested using multiple metrical hypotheses, determining the meter
as the hypothesis that best fits with the inter-onset timing data. This specific system
was used on the very difficult task of determining meter for expressive performances,
in which the tempo is not guaranteed to be consistent throughout the piece. Including
a perceptual estimation of the metrical relevance of rhythmic events in the hypothesis
search as a weighting improves the results as well. These perceptual estimations make
use of pitch information in addition to the temporal information.
2.4 Musical Visualization
Visualizing music has long been a tool for analysis and understanding of musical
structure on many levels. From analyzing chords to estimating keys, mapping the
musical data to some corresponding geometric space has been extensively used to
help find a deeper understanding of music theory and its underlying principles.
Generally speaking, visualizations of music theory can mostly be separated into
two categories: those which organize the pitch classes (C, C♯, D, D♯, ...) or notes
(pitch class with octave information), and those which visualize the keys (C major, c
minor, C♯ major, c♯ minor, ...). As a notational matter, uppercase letters indicate a
major key, while lowercase letters indicate a minor key.
2.4.1 Note or Pitch-Class Visualizations
One of the earliest attempts to visualize music theory is the Tonnetz, also known as
the table of tonal relations or the harmonic network, which plots the 12 pitch classes
(a) A region of the traditional 2-D Tonnetz and three contextual inversions (sharing a common edge) with a C-major triad. (b) The axis system of the traditional 2-D Tonnetz (P5 and M3 axes).
Figure 2.2: The Tonnetz, harmonic network, or table of tonal relatives (from [29]), including the visualizations of parallel (P), leading-tone (L), and relative (R) triad relationships from Neo-Riemannian theory.
in a two dimensional space, seen in Fig. 2.2. An early version was first proposed by
Leonhard Euler [27] as a 4 × 3 matrix with notes separated by a fifth on one axis
and by a major third on the other. Later, Arthur von Oettingen [57] expanded on
the concept by transposing the visualization and extending it as an infinitely repeating
pattern. The visualization was more recently revived in music theory by David
Lewin [50] in 1982, who, along with others [37, 15, 16], recognized the relevance
of the mapping to the fundamental triads of Neo-Riemannian theory. This leads
to interesting visualizations of music theory and voice-leading in which the musical
processes are visualized as rotations and transformations of triangles in the Tonnetz.
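These triangle flips have a compact algebraic form: on (root, quality) pairs, each of the P, L, and R operations is an involution that moves the root by a fixed interval. The encoding below is an illustrative sketch of the standard definitions, not drawn from any particular source.

```python
# Neo-Riemannian operations on triads, encoded as (root pitch class, quality).
# The Tonnetz visualizes each as a flip of a triad's triangle across one edge.
def P(root, quality):  # parallel: C major <-> C minor
    return root, ("minor" if quality == "major" else "major")

def R(root, quality):  # relative: C major <-> A minor
    if quality == "major":
        return (root + 9) % 12, "minor"
    return (root + 3) % 12, "major"

def L(root, quality):  # leading-tone exchange: C major <-> E minor
    if quality == "major":
        return (root + 4) % 12, "minor"
    return (root + 8) % 12, "major"
```

Each operation applied twice returns the original triad, which is exactly what the shared-edge flip in the Tonnetz depicts.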
In addition to uses for music theory, the Tonnetz has been used as a tool for key
estimation, based on the observation that the primary chords for a certain key form
compact shapes in a unique region for that key, though in this case the Tonnetz was a
2D projection of a proposed 3D mapping that used the octave as the third dimension
[52, 51]. Other 3D extensions have been used as well [29]. The Tonnetz has also been
animated for visualizing the evolution of consonance throughout a score [6].
However, an inconvenience of the Tonnetz visualization is that, in order to ac-
count for the cyclic aspects of Western music theory, it must repeat infinitely in each
direction. Along with being cumbersome, this requires that each note occupy multiple
Figure 2.3: Several geometric representations proposed by Shepard, from [45].
locations in the projection, a conceptually problematic proposition. For this reason,
many other theory-based visualizations use circular shapes.
One such visualization was proposed by Shepard [69], in which the notes are
organized as a helical structure, increasing chromatically as they ascend the spiral
(Fig. 2.3). The helix is organized so that octaves of the same pitch class are directly
above or below each other. A projection down the octave dimension (the height of the
helix) yields the chromatic circle. The representation was also extended to include
the perfect fifth, leading to a double helix (labeled (c) in Fig. 2.3).
Chew [13, 14] proposed a representation that built off of the previous visualiza-
tions. Her mapping, called the Spiral Array (Fig. 2.4), organizes the notes as an
ascending spiral, increasing by perfect fifths and rotating 360° every four notes. This
creates a vertical axis that relates the major thirds. Chew also observes that this ar-
rangement is essentially a reshaping of the Tonnetz in which the redundancies overlap
by rolling it into a spiral shape. In the Spiral Array, chords occupy unique triangular
spaces. By projecting these triangles onto a representative point, the chords then
create a similar spiral within the note spiral. Then, by similarly defining a key based
on its tonic, dominant, and subdominant chords, a unique triangle for each key can
be projected on its own point, creating yet another spiral within the note and chord
Figure 2.4: The Spiral Array, from [13]
spirals. This creates an interesting space for simultaneous visualization and analysis
of multiple levels of music.
Others have created more mathematically derived spaces to geometrically map
the notes and chords [33, 84, 10]. These spaces differ from the others in that
they are based on musical operations rather than simply on notes or chords. One such
space is, by design, unaffected by octaves, permutations, transpositions, inversions,
and cardinality changes (the so-called OPTIC operations), because it is argued that
the musical listening experience is largely immune to these operations as well.
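The effect of the O, P, and T operations can be sketched as a normal form: two chords are OPT-equivalent exactly when they reduce to the same interval pattern. (Inversion and cardinality equivalence, the I and C of OPTIC, are omitted here, and the function name is mine.)

```python
def opt_class(notes):
    """Normal form of a chord under octave (O), permutation (P), and
    transposition (T) equivalence: the lexicographically smallest tuple of
    intervals above a chosen root, over all rotations of the pitch-class set."""
    pcs = sorted({n % 12 for n in notes})            # O and P: sorted pitch classes
    rotations = [pcs[i:] + pcs[:i] for i in range(len(pcs))]
    # T: measure intervals above each candidate root and keep the smallest.
    return min(tuple((p - rot[0]) % 12 for p in rot) for rot in rotations)
```

Under this reduction, every major triad in any octave, voicing, or transposition maps to the same tuple, while major and minor triads, which are related by inversion rather than transposition, remain distinct.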
2.4.2 Key Visualizations
Schoenberg [67] developed a mapping, seen in Fig. 2.5, that is somewhat similar to the
Tonnetz (Fig. 2.2). However, unlike the Tonnetz, the elements in the visualization are
the 24 keys (rather than the 12 pitch classes). This mapping organizes keys such that
dominants are neighbors along the vertical axis while the horizontal axis places
relative and parallel major/minor keys adjacent to one another. The design is such that the nearest neighbors
for a key are also the most common modulations seen in Western music.
Another mapping of the keys originates from the key-profiles that Krumhansl and
Kessler [46] derived from perceptual data. Using Multidimensional Scaling (MDS)
on the inter-key distances created from the key-profiles, they found a 4D space (Fig.
Figure 2.5: Schoenberg's spatial mapping of keys (from [45]), with the key region for C major marked
(a) 4D space: reproduction of Figure 4 from Krumhansl and Kessler, the four-dimensional MDS solution of the intercorrelations between the 24 major and minor key profiles (the circle of fifths for major and minor keys appears in dimensions 1 and 2; dimensions 3 and 4 group keys separated by a major third, with each major key flanked by its parallel and relative minors).
(b) 2D remapping: reproduction of Figure 5 from the same paper, the equivalent two-dimensional map of the scaling solution, whose identified opposite edges form a torus; the circle of fifths and the parallel and relative relations for the C major key are noted.
Figure 2.6: The MDS derived mapping of the keys from Krumhansl and Kessler’skey-profiles, and its 2D mapping from the angles of the circles, from [46].
Krumhansl and Kessler's MDS analysis of the key-profile correlations produced a four-dimensional solution (Fig. 2.6(a)) in which two dimensions create major and minor circles of fifths (though the
circles are related by a major second instead of the more common relative or parallel
relationship) and the other two dimensions group major or minor keys separated by
a major third. Krumhansl and Kessler observed that these two circular shapes could
be transformed to a single 2D space in which the coordinates are defined by the
angle within each circle that a key occupies, seen in Fig. 2.6(b). Interestingly, this
transformation creates a space almost identical to Schoenberg’s.
Many other key-based visualizations have sought to map the key of a given musical
excerpt, oftentimes utilizing one of the visualizations already discussed for the process.
Mardirossian and Chew [54] animate the process as the music progresses, dynamically
CHAPTER 2. BACKGROUND 20
(a) Linear scale (b) Log scale
Figure 2.7: Two harmonic visualizations of Mozart's Sonatina No. 1 in C major, K439B (Viennese), 1st Movement, from [62].
changing circles’ color (key) and size (confidence in that key estimate) to visualize
the key-based content of the piece. Toiviainen and Krumhansl [83] use contour maps
to visualize the strength of key decisions for an excerpt. Both of these visualizations
show the relative strength of keys, which offers great insight into the tonal content
of an excerpt. Animating the process visualizes the evolution of that tonality over
time, as well. The KeyGram and Key Correlation visualizations in [30] take similar
but slightly different approaches to the previous visualizations.
Sapp [63, 64] proposed a completely unique space for key estimation visualization.
The system hierarchically colors according to the key estimates derived from any key
detection algorithm performed on sliding windows of multiple lengths. Sapp presents
two versions of the visualization: a triangular shape that is linear in window length
(Fig. 2.7(a)), and a rounding of the triangle to give equal resolution to different
time scales (Fig. 2.7(b)). The plots are suggested as a means for comparing key
estimation algorithms. Sapp also observes that the hierarchical nature of the plots
gives a Schenkerian representation of the musical selection, and also complements
Lerdahl and Jackendoff’s hierarchical computational analysis [49].
2.4.3 Other visualizations
Not all visualizations use notes or keys as the basic unit for graphing. One common
goal is to geometrically represent an entire musical excerpt or melody as an object.
Foote and Cooper [28] suggest using a self-similarity matrix to visualize the structure
of music. Bello [5], as a step towards grouping music, demonstrated that performing
MDS on a self-similarity matrix graphs an excerpt as a trajectory through a low-
dimensional space. It should be noted that this approach is similar to the diffusion-
based visualizations to be presented.
Online, there are also several visualizations for the structure of a musical excerpt.
Wattenberg [87] creates a series of arcs and semicircles to show the structural repe-
titions and relationships in a musical piece. Malinowski [53] animates the score with
color and flowing circles to represent the performance of the piece.
In the field of music visualization, there have also been a few particularly innovative
and unique examples. Aarden and Huron [1] created a program for plotting the
geographical origins of certain aspects of music theory onto a map, visualizing the
cultural origins of musical entities. Janata et al [40] show the regions of the brain
activated by certain tonal events. While this research was not intended specifically as
a visualization of the music, the mapping of the neural activity is itself a visualization
of the music, passed through the filter of the perceptual system.
2.5 Machine Learning in Music Analysis
Machine learning has permeated most engineering fields, and computational music
analysis is no exception. In musical signal analysis, machine learning is widely used.
Statistical learning approaches have been proposed for artist identification, instru-
ment identification, note/chord extraction, cover song identification, and music rec-
ommendation, among other tasks.
On the symbolic level, machine learning has played an important role, though not
quite as pervasive as in signal analysis. This is potentially, in part, because the
knowledge needed in symbolic music analysis is oftentimes more easily incorporated
directly than learned from data. Additionally, many of the high-level tasks
in music analysis are based on human perception, which is possible to incorporate
into a machine learning system, but requires large quantities of manually labelled
data, which is always difficult to find. So, instead of using musical data to teach an
algorithm music theory and human judgment, those aspects are sometimes included
in the algorithm itself.
There are several examples where researchers chose to incorporate musical knowl-
edge for a classification task rather than use pure machine learning. In the K-S
key-finding algorithm, Krumhansl and Kessler's perceptually derived key profiles [46]
were incorporated into the algorithm directly. Temperley's synthetic profiles
[76] were also created manually. Chew’s Center of Effect Generator (CEG) key-finding
algorithm [13] also incorporates prior knowledge of music theory, including the chord
composition of a key, rather than learning it from data. In meter classification, it is
common to recognize that, because the vast majority of western music is written in
either duple or triple meter, an autocorrelation feature will have stronger peaks at
lag times that are either multiples of 2 or 3 [8, 82].
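To make the profile-matching idea concrete, here is a minimal sketch of a K-S-style key finder: a 12-bin pitch-class histogram is correlated against 24 rotations of a major and a minor template, and the best-scoring rotation names the key. The template values and the example histogram below are illustrative placeholders, not Krumhansl and Kessler's published ratings.

```python
import numpy as np

# Illustrative major/minor templates (placeholders, NOT the published
# K-K profiles): heavy weight on the tonic, then the fifth, then the third.
MAJOR = np.array([5, 1, 2, 1, 3, 2, 1, 4, 1, 2, 1, 2], dtype=float)
MINOR = np.array([5, 1, 2, 3, 1, 2, 1, 4, 2, 1, 2, 1], dtype=float)
NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

def find_key(pc_hist):
    """Correlate a 12-bin pitch-class histogram with all 24 rotated
    profiles and return the best-matching key label."""
    best, best_r = None, -np.inf
    for tonic in range(12):
        for name, profile in (("major", MAJOR), ("minor", MINOR)):
            r = np.corrcoef(pc_hist, np.roll(profile, tonic))[0, 1]
            if r > best_r:
                best, best_r = f"{NAMES[tonic]} {name}", r
    return best

# A C-major-like histogram (note counts for the pitch classes C..B).
hist = np.array([10, 0, 6, 0, 7, 5, 0, 9, 0, 5, 0, 4], dtype=float)
print(find_key(hist))
```

With real profiles in place of the placeholders, this is the whole mechanism: the musical knowledge lives in the template, not in a trained model.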
Another popular application of machine learning principles is in the field of algo-
rithmic composition [58, 23, 34]. Algorithmic computing in general is a field in which
music is composed by following a set of rules or instructions, often by a computer in
recent years, though a computer is not necessary. In many cases, however, machine
learning principles are included so that the system can learn the rules from a musical
set. A model is created based on the existing music, and then that model can be
used to create new music. Preexisting knowledge can often be incorporated in these
systems as well.
However, there often remains motivation for utilizing pure machine learning in
symbolic music analysis.
2.5.1 Unsupervised
Unsupervised machine learning typically relies on automatic pattern or structure
recognition within the data rather than requiring the patterns to be defined by known
labels. In this regard, it is well suited for musical applications, because most attributes
of music are well structured and labels are difficult to come by for high-level tasks.
One example of unsupervised learning is the key-finding algorithm proposed by
Hu and Saul [35]. In this system, key-profiles were derived in an unsupervised fashion
using Latent Dirichlet Allocation (LDA). Instead of defining the 24 keys, this system
categorizes the musical corpus into 24 distinct labels, defined by their rating from
the unknown key-profiles. With only the assumption that the key-profiles are related
through transposition or shifting, the algorithm properly classifies the key more often
than the K-S algorithm on the test data. This algorithm is also advantageous because
it does not require any labeling at all.
Juhasz [41] designed an unsupervised learning system for extracting motives from
a folk song database by searching for repeated patterns with a self organizing map.
By performing this analysis on the melodies from different regions, the cultures them-
selves can be grouped based on their common musical traits.
2.5.2 Supervised
Some of the most straightforward uses of supervised machine learning occur in key-
finding. In several extensions of the K-S algorithm ([2, 79]), key profiles derived
from statistical analyses of musical corpora replace the K-K profiles. In this approach,
common pitch distributions for major and minor keys are learned from the data, a
clear example of supervised learning.
Several machine learning algorithms were examined in the context of style classi-
fication by Dannenberg et al [21]. In the work, Bayesian classifiers, linear classifiers,
and neural networks were all tested for the task of classifying styles based on 5 seconds
of trumpet music. All approaches performed with reasonably high success.
Tzanetakis et al [85] used pitch distributions (here referred to as folded Pitch
Histograms) to classify music based on genre. Here, k-Nearest Neighbors, a common
supervised learning algorithm, is used to assign genre labels to unknown music based
on the known labels.
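The k-Nearest Neighbors step itself is simple enough to sketch generically; the toy "histograms" and labels below are invented for illustration and are not Tzanetakis's features or data.

```python
import numpy as np

def knn_label(query, train_X, train_y, k=3):
    """Label a query feature vector by majority vote of its k nearest
    training examples under Euclidean distance."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

# Toy 12-bin "pitch histograms" with made-up genre labels.
train_X = np.array([[8, 0, 5, 0, 6, 5, 0, 7, 0, 4, 0, 3],   # strongly diatonic
                    [9, 1, 4, 1, 7, 4, 0, 8, 0, 5, 0, 2],   # strongly diatonic
                    [5, 4, 5, 4, 5, 4, 5, 5, 4, 5, 4, 5],   # nearly chromatic
                    [4, 5, 4, 5, 4, 5, 4, 4, 5, 4, 5, 4]],  # nearly chromatic
                   dtype=float)
train_y = np.array(["classical", "classical", "jazz", "jazz"])

query = np.array([8, 0, 4, 1, 6, 4, 0, 7, 0, 4, 0, 3.0])
print(knn_label(query, train_X, train_y))
```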
Basili et al [4] also applied machine learning to the problem of genre classification.
In this study, many simple musical features such as meter changes and instrument
classes are used as input for numerous machine learning algorithms. In the end, no
algorithm stands out above the others, but it is concluded that even simple musical
features can result in reasonably good classification.
Hidden Markov Models (HMMs) were used by Chai and Vercoe [12] to classify
folk music melodies into their country of origin based on their tonal progression. In
this approach, separate HMMs are built for each region, encoding the different tonal
relationships for each case. It is then determined for an unknown melody which set
of tonal relationships were more likely to have produced that melody. The method
performs well, despite the high-level task it is asked to perform.
A similar approach was used by Mavromatis [56], in which HMMs were modeled
after Greek church chants. Using the results, it is possible to draw conclusions about
the theory behind the composition of the melodies.
Lee and Slaney [48] used an HMM system as well, except the states were chords
and the HMMs were built for each key. While this system operated on signal input,
the supervised learning portion was performed with MIDI information. Interestingly,
the transition probabilities derived from the model show common chord relationships
from harmonic progressions. So, in this case, music theory was derived from a musical
corpus.
Stamatatos and Widmer [73] devised an approach for distinguishing musical per-
formers playing the same piece using timing, articulation, and dynamic information.
Promising results were achieved in the classification by using discriminant analysis
on the features.
2.6 Conclusion
In this chapter, we summarized some of the most significant work related to the music
experiments to follow in later chapters, specifically related to visualizations, key find-
ing, meter induction, and machine learning. Through these experiments, diffusion-
based music analysis will make novel contributions to each of these fields. However,
before progressing to these novel contributions, we will first need to thoroughly review
the mathematical theory behind diffusion mapping and the low-dimensional spaces it
creates for organizing the structure of high-dimensional data.
Chapter 3
Diffusion Maps
3.1 Introduction
Diffusion distance is a Euclidean metric in a non-linear space defined by diffusion
maps [20, 17, 18]. The process relates data points to each other based on local ge-
ometry. This approach is advantageous because local geometry avoids the pitfalls
of Euclidean distance in high-dimensional space, does not require any distribution
assumptions on the data set, and is highly robust to noise. Beyond these clear
advantages, the space created with diffusion maps also provides a low-dimensional
visualization of the structure of the data set, organized hierarchically from global to
local, and accurately stores the structure of the data in an efficient and light-weight
representation.
Diffusion has already been used in clustering [80], classification [74], data process-
ing for multiple applications [20], wavelet analysis [19], and non-linear independent
components analysis [71], among others. However, the applicability of diffusion to
a higher-level, structured rule-based system like music has not yet been examined.
Izmirli [38] used diffusion to classify tonal and atonal music with success, but ap-
proached the task as a classification problem and only touched on the implications of
the work for extracting and visualizing musical concepts and theory. He did observe,
however, that the low-dimensional visualization of the data set resembles the circle
of fifths, hinting at the possibility. In this work, the diffusion process will be applied
to symbolic musical data with the goal of extracting, interpreting, and visualizing
characteristics of Western music theory.
3.2 Diffusion Theory
3.2.1 Data sets
To introduce diffusion, first we must define a data set X , consisting of K data points
in N -dimensional space.
X = {x0, x1, ..., xK−1}, xi ∈ RN
Example data sets
Two examples of data sets will be used throughout this chapter, one consisting of
clusters and another of circles. Both are embedded in 2-D space (R2) for easier
plotting.
One data set, seen in Fig. 3.1(a), is drawn equally from 8 separate Gaussian
distributions. The means of the Gaussians were designed to create a hierarchical
structure. On the lowest level, each of the clusters is its own structure. Then, clusters
1 and 2 are very close, as are 7 and 8. Cluster 3 is also well connected to 1 and 2.
Then, up another level, clusters 1, 2, 3, 7, and 8 form a larger group on the left while
4, 5, and 6 create another group on the right. This data set will be used to show the
hierarchical nature of diffusion distance.
The second data set (Fig. 3.1(b)) is drawn from three circular distributions, in
which the radius varies slightly from a set value, but the angle is selected completely
randomly. This data set will show the advantages of using a connectivity-based dis-
tance, rather than Euclidean distance. Using these two completely different data sets
will also demonstrate the distribution-free properties of random walk- and diffusion-
based metrics.
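Data sets of this kind can be synthesized in a few lines; the means, spreads, and sample counts below are illustrative stand-ins, not the exact parameters behind Fig. 3.1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Cluster set: points drawn equally from 8 Gaussians whose means are laid
# out so that some clusters sit close together (hierarchical structure).
means = np.array([[-12, 4], [-10, 2], [-8, 6], [8, 4],
                  [10, 0], [12, 4], [-12, -4], [-10, -6]], dtype=float)
clusters = np.vstack([m + rng.normal(scale=0.8, size=(50, 2)) for m in means])

# Circle set: three rings with a slightly noisy radius and a completely
# random angle.
rings = []
for radius in (1.0, 2.0, 3.0):
    theta = rng.uniform(0, 2 * np.pi, 200)
    r = radius + rng.normal(scale=0.05, size=200)
    rings.append(np.column_stack((r * np.cos(theta), r * np.sin(theta))))
circles = np.vstack(rings)

print(clusters.shape, circles.shape)
```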
(a) Clusters
(b) Circles
Figure 3.1: Two example data sets that will be used to demonstrate the process of calculating diffusion distance. In both cases, color indicates the distribution from which a sample was drawn.
3.2.2 Affinity function
In the data space, a function k(xm, xn) measures some affinity between two points in
the set X . This function must satisfy two criteria:
1. Symmetry → k(xm, xn) = k(xn, xm)
2. Non-negative → k(xm, xn) ≥ 0
The affinity k can be any function that fulfills these properties, but, in practice, it is
often selected to be the exponential distance function
k(xm, xn) = exp(−σ||xm − xn||2) (3.1)
where the parameter σ is selected based on the data. In the context of this work, we
will also often use cosine distance for the affinity function
k(xm, xn) = (xmT xn) / (||xm|| ||xn||) = ( Σp=0..N−1 xm(p) xn(p) ) / √( (Σp=0..N−1 xm(p)²) (Σp=0..N−1 xn(p)²) )   (3.2)
Note that, in order for the cosine distance function to satisfy non-negativity, all data
points xi must be non-negative as well.
The collection of affinities for the data set X can be used to create a Markov
process by defining p(xn|xm), the probability of moving to xn if starting at xm.
p(xn|xm) = k(xm, xn) / d(xm),   where   d(xm) = Σp=0..K−1 k(xm, xp)   (3.3)
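The chain of definitions above, from affinity to transition probability, translates directly into a few lines of NumPy; the Gaussian affinity of Eq. (3.1) is used here, and the σ value and sample points are arbitrary choices for illustration.

```python
import numpy as np

def markov_matrix(X, sigma=1.0):
    """Build the transition probabilities of Eq. (3.3) from the Gaussian
    affinity of Eq. (3.1): k(xm, xn) = exp(-sigma * ||xm - xn||^2)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-sigma * sq_dists)      # affinity matrix: symmetric, >= 0
    d = A.sum(axis=1)                  # degree d(xm) = sum_p k(xm, xp)
    return A / d[:, None]              # row m holds p(. | xm)

# Two nearby points and one distant point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
P = markov_matrix(X)
assert np.allclose(P.sum(axis=1), 1.0)  # each row is a distribution
```

Note the convention here indexes rows by the starting point; since the affinity matrix is symmetric, only the normalization differs between the row and column conventions.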
3.2.3 Affinity-derived Markov matrices
If we define a matrix Pmn = p(xn|xm), then we can step the Markov process forward
t steps by calculating P t. The entries of this matrix, pt(xn|xm), give the probability
of ending up in location xn after t steps if the starting location is xm.
All random walks have a stationary distribution π(xn), which defines the proba-
bility of ending up in each location as time approaches infinity. This is given by the
limit of the probability function pt for any starting point xp (in the limit, the starting
point is irrelevant).
π(xn) = lim t→∞ pt(xn|xp)   (3.4)
The eigenvector decomposition will give greater insight into this equation shortly.
If the graph is fully connected, meaning that it is possible to reach any point in the
data set from any other point in the data set, then the stationary distribution π is
unique. If it is not fully connected, then there will be multiple stationary distributions
πi, one for each separate component of the graph.
The random walk process prioritizes connectivity rather than simply distances,
and so data points that are highly connected are measured as close. This approach
is advantageous for several reasons. First, it is more robust to noise than Euclidean
distance. After all, perturbing the points in a data set will have a greater effect on
individual distances than on the connectivity. Second, it is distribution-free, meaning
that there are no assumptions made about the structures or shapes of the data.
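These claims can be checked numerically on a tiny hand-built example: stepping a four-point walk forward many steps makes the starting point irrelevant, and the resulting distribution is exactly the normalized degree function d from Eq. (3.3). The points and σ below are arbitrary.

```python
import numpy as np

# Two loose groups on a line: {0, 0.2} and {2.0, 2.2}.
X = np.array([[0.0], [0.2], [2.0], [2.2]])
A = np.exp(-(X - X.T) ** 2)               # Gaussian affinity, sigma = 1
P = A / A.sum(axis=1, keepdims=True)      # row-stochastic Markov matrix

Pt = np.linalg.matrix_power(P, 1000)
# After many steps every row (i.e., every starting point) carries the
# same distribution: the starting point no longer matters.
assert np.allclose(Pt[0], Pt[3], atol=1e-6)

pi = Pt[0]
assert np.allclose(pi @ P, pi)            # pi is stationary: pi P = pi
assert np.allclose(pi, A.sum(axis=1) / A.sum())  # pi is d, normalized
```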
“Hot Potato” analogy
A concrete way to visualize this process is to envision a game of “Hot Potato” in
which an object is passed between players. The object starts with one person who
tosses it to someone else, and that person then tosses to someone else, and so on.
Now let us say that, when someone is holding the hot potato, they are more likely to
toss it to someone close to them than far away. They also have the option of holding
on to the object, which, in this scenario, is the most likely outcome.
The probability distribution pt(xn|xm) then gives the probability of person xn
holding the hot potato after t tosses if it started at person xm. We can conceptually
draw a few simple conclusions based off this analogy. First, if t is small, xm and xn
Figure 3.2: The probabilities of a random walk concluding in each data point (with high probability shown in lighter color) at different time scales for the cluster-based data set. The columns, pt(·|x340), pt(·|x221), and pt(·|x610), show the case for three different starting points, shown as a red dot in each; t increases down the rows.
Figure 3.3: The probabilities of a random walk concluding in each data point (with high probability shown in lighter color) at different time scales for the circle-based data set. The columns, pt(·|x1), pt(·|x201), and pt(·|x501), show the case for three different starting points, shown as a red dot in each; t increases down the rows.
(a) P^100  (b) P^500  (c) P^5000  (d) P^100000
Figure 3.4: The probability matrix P^t for the cluster data set (Fig. 3.1(a)) at several values of t. Axis labels correspond to the cluster numbering from Fig. 3.1(a).
will almost certainly be close to each other. Second, if t is large, it makes sense that
someone in a crowded area is more likely to end up with the hot potato than someone
in a secluded area. Another reasonable conclusion is that, if the hot potato is in one
crowd, it is unlikely to move into another crowd in a small number of steps if the two
crowds are not well connected. Finally, and most importantly, the hot potato is more
likely to move between two points with many pathways of small steps between them
rather than two points the same distance apart with few pathways connecting them.
In other words, connectivity is more important than proximity.
All of these conclusions apply to data viewed as a random walk, and they can
easily be seen for the data in the next section. We will also revisit this analogy to
interpret the diffusion time constant.
Markov matrices for example data sets
The power of this probabilistic approach is visualized in Figs. 3.2 and 3.3 for the
cluster and circle data sets, respectively. These figures show the probabilities of the
random walk from three randomly selected starting points. In the cluster case (Fig.
3.2), the plots demonstrate how, as the time scale increases, the random walk spreads
outward from its point of origin. In the beginning, all three cases stick to their local
region. In the middle case, the probabilities divide the clusters into their left and
right subgroup. As t continues to increase indefinitely, all starting points yield the
same stationary distribution.
The circle case (Fig. 3.3) shows a similar progression. In the short-term case,
the activity stays in the neighborhood of the starting point. For this data set, the
middle-range time distribution is the most interesting. In this case, all three starting
points have a very high probability of spreading somewhere throughout their indi-
vidual circle. This indicates that, on some level, we have successfully segmented the
three circles. Diffusion distance will further clarify this separation. If t continues
to increase, the distributions of all three starting points once again approach the
stationary distribution π.
The hierarchical nature of the parameter t can also be shown at this point for the
cluster data set. Fig. 3.4 shows the full probability matrix P t for several values of
t. For t = 100, the smallest value plotted, seen in Fig. 3.4(a), the clusters are all
clearly seen in the matrix as a set of blocks along the diagonal. Notice that clusters 1 and
2 are also starting to group together, as are 7 and 8 to a lesser extent. For t = 500
in Fig. 3.4(b), the next level of structure is extracted, grouping clusters 1, 2, and 3
into one set, 4, 5, and 6 into another, and 7 and 8 into a third. By increasing t to
5000 as in Fig. 3.4(c), the two higher level sets are grouped together. Finally, by
increasing t much higher (t = 100000 in Fig. 3.4(d)), all of the data points have the
same distribution, the stationary distribution.
In these examples, it is made clear that treating the data set as a random walk
is a valuable and powerful tool. Sets from two completely different distributions are
properly organized and even the hierarchical sub-structures can be extracted with
ease.
Prior uses of a Markov matrix for data analysis
It has been previously shown that calculating the Markov matrix P for the data set
X can be a very powerful tool.
Tishby and Slonim [80] first introduced the concept of analyzing data sets as
random walks. Their approach looked for stable regions in the decay rate of the
mutual information of the probability functions p as the time t increased, and used
those stable regions to define clusters in the data set. Szummer and Jaakkola [74]
used the Markov matrix of the data to assign labels to a data set with only a small
number of labels known.
However, in both of these cases, high-level, computation-intensive algorithms like
EM are required to process the data. Though the steps and logic that take the data
to the Markov matrix are relatively clear and straightforward, there is no similarly
clear and straightforward means of analyzing the data itself. It is for this reason that
we use the Markov matrix P to move the data into diffusion space.
3.2.4 Eigenvectors of a Markov matrix
The first step towards a diffusion map from the Markov matrix requires the eigenvec-
tor decomposition. The eigenvectors of a matrix are vectors that, when multiplied by
the matrix, are only scaled and not rotated at all.
Pν = λν
φP = λφ
Here we see that there can be both right eigenvectors (ν) and left eigenvectors (φ),
and, in both cases, the vectors are scaled by λ, called the eigenvalue.
It is possible to do a full eigenvector decomposition, in which a matrix is defined
by its eigenvectors and eigenvalues.
P = ΛEΛ⁻¹ = Σl=0..K−1 λl φl νlT
P^t = ΛE^tΛ⁻¹ = Σl=0..K−1 λl^t φl νlT   (3.5)
Here, φl are the full set of left eigenvectors and νl are the right eigenvectors, and
λl are the corresponding eigenvalues. It is a property of Markov matrices that the first
eigenvalue will be equal to one, and all eigenvalues will be less than or equal to one in magnitude.
|λ0| = 1, |λl| ≤ 1 (3.6)
For a connected graph, the inequality is strict.
It is also a property that the first left eigenvector φ0 is constant for all data points
and the first right eigenvector ν0 is proportional to the stationary distribution of the
random walk π (when multiplied by φ0, the proportionality becomes an equality).
The stationary distribution is also the function d from Eq. (3.3), normalized to sum to one.
ν0(xm) ∝ π(xm) = φ0(xm) ν0(xm) = d(xm) / Σp=0..K−1 d(xp)   (3.7)
Another critical property of the eigenvalue decomposition is shown in Eq. (3.5),
where stepping the random walk forward t steps requires only raising the eigenvalues
to the t power. As a result, calculating the probabilities pt(xn|xm) can be done
efficiently with the eigendecomposition. Instead of self-multiplying a K × K matrix t
times, the calculation only requires K scalar exponential calculations. This property
is also essential for the diffusion distance, to be introduced shortly.
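This efficiency property can be sketched in NumPy. One standard implementation device (not discussed in the text) exploits the fact that P = D⁻¹A is similar to the symmetric matrix D^(−1/2) A D^(−1/2), so a symmetric eigensolver yields the left and right eigenvectors stably, and P^t is recovered by powering only the eigenvalues, as in Eq. (3.5).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
A = np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1))  # affinity
d = A.sum(axis=1)
P = A / d[:, None]                              # Markov matrix

# Symmetrization trick: P is similar to D^-1/2 A D^-1/2.
lam, U = np.linalg.eigh(A / np.sqrt(np.outer(d, d)))
lam, U = lam[::-1], U[:, ::-1]                  # sort descending

assert np.isclose(lam[0], 1.0)                  # first eigenvalue is one
assert np.all(np.abs(lam) <= 1 + 1e-10)         # the rest are no larger

right = U / np.sqrt(d)[:, None]                 # right eigenvectors nu_l
left = U * np.sqrt(d)[:, None]                  # left eigenvectors phi_l

# Eq. (3.5): t steps cost only K scalar exponentiations, not t
# matrix multiplications.
t = 50
Pt_eig = (right * lam**t) @ left.T
Pt_pow = np.linalg.matrix_power(P, t)
assert np.allclose(Pt_eig, Pt_pow, atol=1e-8)
```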
It is also important to note that eigenvectors can be efficiently calculated for data
points not included in the original data set using the Nystrom method [3, 89].
φl(y) = (1 / (K λl)) Σn=0..K−1 k(y, xn) φl(xn)   for y ∉ X
This property is extremely valuable for large data sets, because it allows for the
eigenvectors to be calculated on a representative subsampling and then extended, rather
than requiring a full eigenvector decomposition on the entire set. This also makes it
possible for eigenvector methods to be used for query-based applications.
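The extension idea can be illustrated with the affinity matrix's own eigenvectors; note that this sketch uses a plain 1/λl normalization for the unnormalized kernel case, a simplification of the form given above. Extending a point that is already in the set should reproduce its known eigenvector entry, which makes for an easy sanity check.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))

def affinity(a, b, sigma=0.5):
    return np.exp(-sigma * ((a - b) ** 2).sum(-1))

A = affinity(X[:, None], X[None, :])       # 40 x 40 affinity matrix
lam, V = np.linalg.eigh(A)                 # symmetric: real eigenpairs
lam, V = lam[::-1], V[:, ::-1]             # sort descending

def extend(y, l):
    """Nystrom-style extension: evaluate eigenvector l at a new point y
    by weighting the known entries with y's affinities to the set."""
    return affinity(y[None, :], X) @ V[:, l] / lam[l]

# Sanity check: extending a point that IS in X reproduces its entry.
assert np.isclose(extend(X[7], 0), V[7, 0])
```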
3.2.5 Eigenvalue decay and the meaning of the eigenvectors
The magnitude restrictions in Eq. (3.6) imply that, as t increases, the powered eigenvalues λ_l^t will decrease. Therefore, for larger t, eigenvectors corresponding to sufficiently small eigenvalues can be ignored, assuming there is some tolerance for error, meaning that the calculation of P^t can be made even more efficient. Naturally, smaller eigenvalues approach zero at smaller values of t than larger eigenvalues do.
The decay of the eigenvalues also gives some insight into the meaning of the
eigenvectors themselves. As noted above, the probability of moving from one data
point to another in t steps can be written as a function of the eigenvectors.
p_t(x_n|x_m) = Σ_{l=0}^{K−1} λ_l^t φ_l(x_m) ν_l(x_n)    (3.8)
In other words, the probability p_t is a weighted sum of products of the eigenvectors. However, for different times t, only the weighting changes in the sum, while the products of the eigenvectors remain the same.
This realization, along with the previously noted property of smaller eigenvalues
decaying faster than larger eigenvalues, gives valuable insight into the meaning of the
eigenvectors. All of the non-trivial eigenvectors are required to calculate the proba-
bility for t = 1, and then, as t increases, they will disappear in reverse order of the
eigenvalue magnitude. This means that the eigenvectors with large eigenvalues en-
code the information for the long-term probability function, and the eigenvectors with
smaller eigenvalues encode the information for the short-term probability function.
To approach this in a different way, let us start with the limit presented in Eq.
(3.4) that states that the probability function for any starting state approaches the
stationary distribution as t approaches infinity. Another way to derive this is to observe that, as t approaches infinity, all eigenvalues with magnitude strictly less than 1, when raised to the t power, approach zero. For a connected graph, all eigenvalues except λ_0 satisfy this, and so all terms but l = 0 disappear.
lim_{t→∞} p_t(x_n|x_m) = Σ_{l=0}^{K−1} λ_l^∞ φ_l(x_m) ν_l(x_n) = φ_0(x_m) ν_0(x_n)
As a brief aside, note that this derivation reinforces Eq. (3.7), and it also offers
further insight into the limit in Eq. (3.4). First of all, it gives a very simple and
clean interpretation of a previously complicated limit, because all of the eigenvectors
but one simply disappear from the summation. Also, the irrelevance of the starting
location is confirmed, because, as previously stated, the left eigenvector φ0 is constant
for all data points, and therefore the limit is unaffected by changing the initial data
point.
Returning to the relevance of the eigenvalues to the meaning of the eigenvectors, let us slowly decrease t. Because λ_0 = 1, the weight of the stationary distribution will
remain unchanged, regardless of t. However, as t decreases, the eigenvectors with
larger eigenvalues start to appear and significantly affect the summation, followed by
those with smaller eigenvalues.
In this way, the eigenvectors encode the evolution of the probability function over time, and the eigenvalues encode the time scale at which the corresponding eigenvectors are most relevant. This interpretation is extremely important in understanding the value of
diffusion distance.
3.2.6 Diffusion maps and diffusion distance
Based on the properties of the eigenvectors of P , a diffusion map Ψt is proposed, in
which the coordinates are the scaled eigenvectors.
Ψ_t(x_n) = [λ_1^t φ_1(x_n), λ_2^t φ_2(x_n), ..., λ_L^t φ_L(x_n)]    (3.9)
Note that the first eigenvector (l = 0) is excluded from the mapping, because the first
left eigenvector φ0 is constant for all data points, and therefore would be a trivial
inclusion. For perfect representation, L, the number of dimensions of the diffusion
map, is equal to one less than the number of data points (K − 1). Alternatively, if t
is large enough to zero out smaller eigenvalues, or some error is acceptable, L can be
set smaller to allow for more efficient calculation and storage.
The diffusion map also provides an opportunity for visualization of the data set.
While the dimensionality of the map itself will often be more than 3, the trait observed
above that the eigenvectors are organized and categorized based on their relevant time
scale means that subsets of the dimensions can be meaningfully visualized together,
unlike the original data space, where there is no known significance to any particular
set of dimensions.
Observe that the Euclidean norm between two data points in this diffusion space is determined by the differences between the values of the eigenvectors at those data points. This metric is called the diffusion distance, D_t.
D_t(x_m, x_n) = ||Ψ_t(x_m) − Ψ_t(x_n)|| = sqrt( Σ_{l=1}^{L} λ_l^{2t} (φ_l(x_m) − φ_l(x_n))^2 )    (3.10)
The weighted eigenvectors in the diffusion distance are very similar to those in
Eq. (3.8), in which the probability of moving from one point to another in t steps is
calculated from the eigenvectors. Extending from this observation, we see the diffusion
distance can also be interpreted as a weighted norm of the differences between the
probability distributions of the two data points.
D_t(x_m, x_n) = sqrt( Σ_{p=0}^{K−1} (p_t(x_p|x_m) − p_t(x_p|x_n))^2 / d(x_p) )
The weighting by d(xp) is necessary to compensate for the absence of the right
vectors νl from Eq. (3.8). A high-level description of the diffusion distance in this
interpretation is that it measures how different xm and xn are as starting points. A
small diffusion distance between two data points means that random walks starting
at each point have a similar set of probabilities for the finishing point. The prob-
ability matrices in Fig. 3.4 reinforce the notion that this is a meaningful metric.
As can clearly be seen, data points within the same structural element have similar
distributions. Diffusion distance will measure these data points as close.
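The equivalence of the two forms of the diffusion distance can be checked numerically. The sketch below uses illustrative data, and computes the eigenvectors through the symmetric conjugate of P, a normalization choice under which Eq. (3.10) holds exactly; it builds the diffusion map of Eq. (3.9) and compares the two distances:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2))                       # illustrative data
Kmat = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))
d = Kmat.sum(axis=1)
P = Kmat / d[:, None]                             # random-walk matrix

# Eigenvectors via the symmetric conjugate A = D^{-1/2} K D^{-1/2}; this
# normalization of the φ_l makes Eq. (3.10) exact.
A = Kmat / np.sqrt(np.outer(d, d))
lam, U = np.linalg.eigh(A)
order = np.argsort(-lam)
lam, U = lam[order], U[:, order]
phi = U / np.sqrt(d)[:, None]                     # φ_0 is constant, as noted above

t = 10
Psi = lam[1:] ** t * phi[:, 1:]                   # diffusion map, Eq. (3.9)

m, n = 0, 5
D_map = np.linalg.norm(Psi[m] - Psi[n])           # Euclidean norm in map space
Pt = np.linalg.matrix_power(P, t)
D_prob = np.sqrt((((Pt[m] - Pt[n]) ** 2) / d).sum())   # d-weighted probability form
assert np.isclose(D_map, D_prob)
```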
Diffusion distances for the example data sets
Diffusion distances Dt for a few values of t are shown in Fig. 3.5 for the cluster data
set and Fig. 3.6 for the circle data set.
In the cluster case, the conclusions are similar to those from Fig. 3.4, where the
probability matrices for the cluster data set are shown. However, instead of having to
infer the closeness of the data points by looking for similar distributions, the diffusion
distance actually quantifies the closeness directly. So, for t = 100, shown in Fig. 3.5(a), it is clear that the clusters all have small diffusion distances within their
Figure 3.5: Diffusion distance matrices D_t for different values of t for the cluster data set, where dark color means a short distance. Panels: (a) D_100, (b) D_500, (c) D_5000.
sets. The same is true for the other hierarchical structures as t is increased in Figs.
3.5(b) and 3.5(c).
In the circle case, the ability of the metric to group the circular distributions is
outlined clearly. If the time parameter is set large enough for the connectivity to
spread sufficiently, then the three circles are completely grouped together and clearly
separated in the distances, as seen for t = 5000 in Fig. 3.6(b).
The separation of the circles can also be clearly seen by visualizing the first three (non-trivial) eigenvectors, φ_1, φ_2, and φ_3, seen in Fig. 3.7. The points in the three
circles have clearly been separated from each other in this 3-D space. Some of the
Figure 3.6: Diffusion distance matrices D_t for different values of t for the circle data set, where dark color means a short distance. Panels: (a) D_100, (b) D_5000.
characteristics in the original circles are preserved as well. The relative size is clearly
visualized, with cluster 3’s large radius still well represented, cluster 2 the second
largest, and cluster 1 in a tight ball. Also, the circular shape is preserved, most easily
seen for cluster 3.
3.2.7 Diffusion time constant
The diffusion distance exists not only in the spatial dimensions defined by the dif-
fusion maps, but also in the dimension of time, as represented by the parameter t.
While this parameter is key to some of the most unique features of diffusion, most sig-
nificantly the hierarchical organization of structure, there are some situations where
it is undesirable.
One such example is clustering in diffusion space. While the diffusion distance D_t could be used as the metric for clustering, this would require a unique parameter selection, and even if a reasonable t were selected, fixing the parameter
eliminates its hierarchical properties.
For cases where the parametric diffusion distance is suboptimal, a novel metric, the diffusion time constant τ, is proposed. The metric is defined as the minimum value of the time parameter t for which the diffusion distance is less than
Figure 3.7: The circle data set plotted in φ_1, φ_2, and φ_3.
or equal to some sufficiently small tolerance δ. The definition can be posed as a simple
optimization problem.
minimize t
subject to D_t(x_m, x_n) ≤ δ
Because the diffusion distance for any two data points decreases in the time di-
mension (a property from Eq. (3.10)), it is easy to see that this problem is solved
when the diffusion distance is equal to the tolerance level δ. So, the diffusion time
constant τ is the value of t that satisfies this relationship.
D_{τ(x_m, x_n)}(x_m, x_n) = δ    (3.11)
Essentially, the diffusion time constant represents the amount of time it takes for
two data points to become indistinguishable starting points in the random walk.
This is more easily visualized in the context of the “Hot Potato” analogy from
Section 3.2.3, in which the probability distribution pt(xq|xp) is seen as the likelihood
of a hot potato getting passed from person xp to xq in t steps. In this context, the
diffusion time constant τ(xm, xn) is the number of tosses it would take before it would
be impossible to determine whether the hot potato started with person xm or xn.
In Section 3.2.3, it was suggested in the context of the analogy that the hot potato
would likely be in the local vicinity for a small number of tosses, and would only move
between poorly connected crowds after a large number of tosses. This also makes a
meaningful statement about the diffusion time constant. Data points within a cluster
will then become indistinguishable starting points before objects across clusters. In
this way, at a high level, the ability of the metric to simultaneously encode hierarchical
data can be understood.
Solving for τ is not entirely trivial. Eq. (3.10) shows that the diffusion distance D_t is the sum of a series of exponential functions in t, so solving for a specific value of t requires solving this sum of exponentials. Unfortunately, sums of exponentials cannot, in general, be solved in closed form.
Newton’s Method
One common method for finding solutions to sums of exponentials is using Newton’s
method (also called the Newton-Raphson method), which can be used to iteratively
find the zeros of non-linear functions [90].
t_{n+1} = t_n − f(t_n) / f'(t_n)

In this notation, f'(t_n) represents the first derivative of the function f evaluated at t_n.
By initializing with some arbitrary value t0, this process solves for the zeros of
f(t), often in only a few steps.
Newton’s method can be used to find the diffusion time constant by defining f(t)
from Eq. (3.11) (after squaring both sides of the equation).
f(t) = D_t(x_m, x_n)^2 − δ^2
By incorporating Eq. (3.10), all unknowns except for t are eliminated.
f(t) = Σ_{l≥1} λ_l^{2t} (φ_l(x_m) − φ_l(x_n))^2 − δ^2
The zero-crossing for this function f(t) is the diffusion time constant τ , and therefore
applying Newton’s method will solve for τ .
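A sketch of this procedure, assuming the eigenvalues and eigenvectors are already available. Writing λ_l^{2t} as (λ_l^2)^t keeps the base positive, which gives a well-defined derivative for the Newton update; the demo data and tolerance are illustrative choices:

```python
import numpy as np

def diffusion_time_constant(lam, phi, m, n, delta, t0=1.0, iters=50):
    """Newton's method for tau: find the root of
    f(t) = sum_{l>=1} lam_l^{2t} (phi_l(x_m) - phi_l(x_n))^2 - delta^2."""
    c = (phi[m, 1:] - phi[n, 1:]) ** 2      # squared eigenvector differences
    b = lam[1:] ** 2                         # positive bases: (lam_l^2)^t = lam_l^{2t}
    c, b = c[b > 0], b[b > 0]                # drop null eigenvalues before the log
    t = t0
    for _ in range(iters):
        bt = b ** t
        f = (c * bt).sum() - delta ** 2
        fp = (c * bt * np.log(b)).sum()      # f'(t); log(b) < 0 for |lam_l| < 1
        t -= f / fp
    return t

# Demo on an illustrative kernel: the returned tau satisfies D_tau = delta (Eq. 3.11).
rng = np.random.default_rng(2)
X = rng.normal(size=(8, 2))
Kmat = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))
d = Kmat.sum(axis=1)
lam, U = np.linalg.eigh(Kmat / np.sqrt(np.outer(d, d)))
order = np.argsort(-lam)
lam, U = lam[order], U[:, order]
phi = U / np.sqrt(d)[:, None]

m, n = 0, 5
c = (phi[m, 1:] - phi[n, 1:]) ** 2
delta = 0.5 * np.sqrt(((lam[1:] ** 2) * c).sum())   # half of D_1(x_m, x_n)
tau = diffusion_time_constant(lam, phi, m, n, delta)
D_tau = np.sqrt((((lam[1:] ** 2) ** tau) * c).sum())
assert np.isclose(D_tau, delta)
```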
Diffusion time constants for the example data sets
Fig. 3.8 shows the diffusion time constant matrices for both data sets.
For the cluster data set, the diffusion time constants seen in Fig. 3.8(a) clearly
Figure 3.8: The diffusion time constant matrices for the example data sets, demonstrating that data points within structural elements have small time constants. Panels: (a) Clusters, (b) Circles.
group the lowest level clusters together, as seen by the strong block diagonal structure.
The higher level groupings can be seen to a certain extent, especially in the closeness
of clusters 1 and 2 as well as 7 and 8, but how the higher levels organize is not
entirely evident. This is because the diffusion time constant is not meant for visual
interpretation, but rather as a metric for tasks such as clustering, which will be shown
shortly.
The diffusion time constants for the circle data set, seen in Fig. 3.8(b), clearly group the circular elements together, though not as drastically as the diffusion distance for t = 5000 shown in Fig. 3.6(b). While there is, not surprisingly, some
variation within the groups, the time constants between data points within a circle
are less than the time constant with any other data point. This figure is in contrast
to the Euclidean distance (Fig. 3.9), which does not have only small distances within
the circles. This can most obviously be seen in the outer circle (circle 3), where the
distances between data points on opposite sides are larger than distances to any other
data point, even those in the inner circles.
Figure 3.9: The Euclidean distances for the circle data set, where the data points within the same circle are not close together.
3.2.8 Hierarchical clustering with the diffusion time constant
The diffusion time constant metric is intended for uses where selecting the time pa-
rameter t for the diffusion distance or simply limiting the diffusion distance to one
time scale is undesirable. Hierarchical clustering is one of these uses.
Hierarchical clustering organizes a data set into a tree where each branch represents a cluster. The data is grouped into low-level clusters, those clusters are themselves subsequently clustered, and so on. This process is represented in the tree structure.
The specific algorithm that will be used in this work is the agglomerative complete-
link clustering algorithm [39]. This algorithm measures the distance between two
clusters as the maximum of the distances between all points within the clusters.
Complete-link clustering typically creates a cleaner structure than single-link cluster-
ing, which uses the minimum of the distances instead. The complete-link algorithm
is as follows:
1. Consider each data point as an object, and measure the distances between all
objects.
2. Find the two objects with the smallest distance between them, and group those
two objects together as a new object.
3. Measure the distances between this new object and all other objects. The
distance is the maximum distance between all data points in the objects.
4. Repeat steps 2 and 3 until only one object remains.
This process is easy to implement, and it can be used in diffusion space by using the
diffusion time constant as the distance metric.
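A minimal sketch of these four steps (written for clarity rather than efficiency; the input is any precomputed distance matrix, such as a matrix of diffusion time constants):

```python
import numpy as np

def complete_link(D):
    """Agglomerative complete-link clustering on a distance matrix D,
    following the four steps in the text. Returns the merge history as
    (cluster_a, cluster_b, distance) tuples."""
    n = D.shape[0]
    clusters = {i: [i] for i in range(n)}   # step 1: each point is an object
    merges = []
    next_id = n
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        for i, a in enumerate(ids):          # step 2: find the closest pair
            for b in ids[i + 1:]:
                # step 3: complete link = max pairwise distance between members
                dist = max(D[p, q] for p in clusters[a] for q in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((a, b, dist))          # group the pair as a new object
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges                            # step 4: repeat until one object

# Demo on four points on a line: the two tight pairs merge first.
x = np.array([0.0, 1.0, 10.0, 11.0])
D = np.abs(x[:, None] - x[None, :])
merges = complete_link(D)
```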
In addition to the obvious organizational advantages, creating a hierarchical clus-
ter can directly lead to applications like active learning [22].
Clustering the example data sets
The hierarchical clusters derived from the complete-link algorithm for the example
data sets can be seen in Fig. 3.10.
The cluster-based data set (seen in Fig. 3.10(a)) is particularly interesting. Not only are the low-level clusters well grouped, with only a few errors, but the clusters are also properly organized in the hierarchy. Clusters 1 and 2 are grouped together
first, as are 7 and 8. Cluster 3 joins with 1 and 2, and then they are all grouped
together (1, 2, 3, 7, and 8) to form one large structural element. On the other side,
clusters 4 and 6 are grouped together first, and then joined by cluster 5 to form the
other global element. These hierarchies match up exactly with the intuition outlined
previously in section 3.2.1.
The hierarchy for the circle data set (Fig. 3.10(b)) is much simpler, since the hi-
erarchical structure is not as layered, but the three circles are still perfectly clustered,
which is the goal for this data set.
3.2.9 Comparison to other methods
Diffusion is by no means the only approach for identifying the essential structure in a
high-dimensional data set. Here, we will compare diffusion mapping to principal com-
ponents analysis, multidimensional scaling, ISOMAP, and locally linear embedding,
which all attempt to find a low-dimensional space that still captures some element of
Figure 3.10: Hierarchical trees for both data sets from the diffusion time constants, showing that structural elements at multiple levels are accurately extracted. Panels: (a) Clusters, (b) Circles.
the original data’s structure. All of these methods, like diffusion mapping, are also
unsupervised.
Each algorithm takes a different approach to try to solve the same problem, which
is to find a low-dimensional representation that captures the important elements of
the structure of the high-dimensional data set. Because they are different, though,
each has its own share of advantages and disadvantages.
The argument here is that diffusion mapping still offers the most natural definition
of connectivity, that the hierarchical ordering of the dimensions is the most clearly un-
derstandable, and that, unlike the other methods, there is not a fundamental concept
of approximation in its derivation. All of these are seen as advantages.
Principal components analysis
Principal components analysis (PCA) is one of the classic methods of dimensional
reduction for data sets. The method essentially identifies the directions in the data’s
high dimensional space in which the data has the highest covariance. In these direc-
tions, the differences between the data points should be most prominent, and, as a
result, this approach is widely used in applications like machine learning.
However, it is also fundamentally different from diffusion mapping. First of all, the structure is not built upon connectivity, but rather on second-order statistics. As a
result, the principal components are defined purely upon the covariance of the data,
with no concern for the actual structure. Essentially, PCA is built on the assumption
that the data is a large cloud, and it finds the directions over which the cloud is most
spread out. While this is useful for classification tasks, there is no real insight gained
into the actual structure of the data. There is no guarantee that the dimensions with
the highest covariance will also be the dimensions that best represent the structure.
Also, because PCA is a linear projection of the data, structures that require
more than three dimensions to represent (for example, a ribbon that winds through 5
dimensions of space) will not be reshaped to fit cleanly into a smaller set of dimensions.
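A minimal PCA sketch via the singular value decomposition (the standard computation, shown here for reference; the data is illustrative):

```python
import numpy as np

def pca(X, dim=2):
    """Project data onto the directions of highest covariance (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T                       # coordinates on the top components

# The first component captures the most-spread direction of the data "cloud".
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([5.0, 1.0, 0.2])
Y = pca(X, dim=2)
assert Y.var(axis=0)[0] >= Y.var(axis=0)[1]
```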
Classical multidimensional scaling
Classical multidimensional scaling (MDS) is a non-linear approach, so, in theory, it is able to reorient high-dimensional structures for low-dimensional visualization. The algorithm solves a least-squares optimization problem that finds a low-dimensional data set {..., y_m, y_n, ...} ∈ Y whose pairwise distances are similar to those of the original data set {..., x_m, x_n, ...} ∈ X.
argmin_Y Σ_{m,n} || ||y_m − y_n|| − ||x_m − x_n|| ||
MDS does give nice low-dimensional representations of the original data set. How-
ever, it has a few shortcomings in comparison to a method like diffusion mapping.
First of all, it is fundamentally a statistical approximation of the data, in that the dimensions are defined by the minimal squared error. This is not as natural an organization of the dimensions as in diffusion mapping, where the dimensions are organized by the time parameter of a random walk; instead, understanding the organization fundamentally requires an understanding of the error.
Furthermore, squared error emphasizes avoiding large errors. While this is usually
an advantageous characteristic, in this case it will place a higher priority on accu-
racy for large distances, since a small percentage error will be more costly for two
data points that are far apart than two points that are close together. This is ex-
actly the opposite of what we usually desire for dimensional reduction, which is the
preservation of local geometry. Also, MDS is based only on the proximity between data points, rather than incorporating any kind of connectivity; diffusion mapping is fundamentally built on connectivity and local geometry.
Finally, MDS typically defines the relationships between two points based on their
distance, and this is largely inflexible in order to keep the optimization problem
reasonably calculable. In the diffusion process, the affinity between two points is
much more flexible, requiring only symmetry and non-negativity.
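For reference, the classical variant of MDS also admits a closed-form solution via double-centering of the squared-distance matrix, rather than the iterative least-squares optimization described above; a minimal sketch:

```python
import numpy as np

def classical_mds(D, dim=2):
    """Closed-form classical MDS: double-center the squared distances and
    embed with the top eigenvectors of the resulting Gram matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # inner-product (Gram) matrix
    lam, V = np.linalg.eigh(B)
    order = np.argsort(-lam)[:dim]               # top dim eigenpairs
    return V[:, order] * np.sqrt(np.clip(lam[order], 0.0, None))
```

For a distance matrix that is exactly Euclidean in `dim` dimensions, this reconstruction recovers the original pairwise distances up to rotation.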
Locally linear embedding
Locally linear embedding (LLE) is another non-linear dimensional reduction algorithm, and it is based on approximations similar to those in MDS. However, it prioritizes structure more highly by viewing the data through an approximation of its local orientations.
The method begins by approximating each data point as a linear combination of
its nearest neighbors. This can be accomplished by solving for the weights Wmn with
a least-squares optimization.
argmin_{W_{m,n}} Σ_n ||x_m − W_{m,n} x_n||
Through the weights Wmn, the entire data set is represented as a collection of linear
functions defined by local geometry. Dimensional reduction can then be accomplished
by finding the best low-dimensional set of data points {..., ym, yn, ...} ∈ Y that share
these local linear relationships.
argmin_Y Σ_{m,n} ||y_m − W_{m,n} y_n||
LLE can be seen as a variation of MDS and, as a result, shares many of its advantages. Additionally, because it fundamentally incorporates local geometry, it fixes that shortcoming of MDS.
However, it is still built on a statistical approximation, and, in fact, incorporates two such optimizations. This makes it even more difficult to conceptualize the effects of the error. Furthermore, the linear approximation at a local level, while usually acceptable,
forces an assumption onto the structure of the data, which is that it can be reasonably
modeled at a local level as a linear function. This is not always the case, and no such
assumption is required for diffusion mapping.
Isomap
Isomap is another extension of MDS that incorporates local geometry. However, in this case, the distance is based on connectivity rather than on a locally linear approximation, as was used in LLE.
The Isomap algorithm begins by determining some limited set of neighbors for
each data point in X . The distance between two points xm and xn, d(xm, xn), is
then defined as the shortest pathway between them that is drawn exclusively through
the neighbors. So, if the two points are neighbors, then d(xm, xn) is the distance
between them. If the two points are not neighbors but they share a neighbor xp, then
d(xm, xn) is defined as the distance from xm to xp plus the distance from xp to xn.
This is extended to however many pathways are necessary to connect the two data
points.
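The neighbor-graph distance described above can be sketched with a k-nearest-neighbor graph and Floyd-Warshall shortest paths (the neighbor count k and the data are illustrative choices):

```python
import numpy as np

def isomap_distances(X, k=3):
    """Geodesic distances for Isomap: build a k-nearest-neighbor graph,
    then take shortest paths through it (Floyd-Warshall)."""
    n = X.shape[0]
    E = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))  # Euclidean distances
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        nbrs = np.argsort(E[i])[1:k + 1]        # k nearest neighbors (self excluded)
        G[i, nbrs] = E[i, nbrs]
        G[nbrs, i] = E[i, nbrs]                  # keep the graph symmetric
    for p in range(n):                           # relax paths through each node p
        G = np.minimum(G, G[:, [p]] + G[[p], :])
    return G
```

On points sampled along an arc, the resulting graph distance between the endpoints follows the arc and therefore exceeds the straight-line Euclidean distance, which is exactly the behavior Isomap relies on.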
Once these distances are established, then the MDS algorithm is applied with
these distances used as the distance between data points in the original set X .
argmin_Y Σ_{m,n} || ||y_m − y_n|| − d(x_m, x_n) ||
The Isomap algorithm is a very elegant way to incorporate connectivity into the
MDS algorithm. However, there are still a few potential drawbacks that remain. The
first is that it is still fundamentally built on a minimization of statistical error, which
is only a drawback in its ability to be understood intuitively.
Another minor difference is that the Isomap process requires a strict definition of
the set of neighbors for a data point, which is necessary because the lack of connectiv-
ity between some data points is what differentiates Isomap from MDS. In the diffusion
case, this sort of strict distinction is not necessary; a smoothly decaying affinity function is typically used instead. It is not difficult to envision a situation where a small change in the strict neighbor definition required by Isomap would significantly change the distance function d(x_m, x_n) for points along the border of a neighborhood. This problem does not need to be confronted in diffusion mapping.
The final drawback is more subtle, but is grounded in the definition of distance as
the minimal pathway between two data points. This means that there is no concept of how well-connected two points are, but only of whether they are connected. In the case of diffusion, structure is defined not only by the shortest path between two data points, but also by how many short paths connect them. In the case
of Isomap, bringing two data points close together only requires one pathway. This
difference may not matter in most cases, or may even be advantageous to some, but
it is another fundamental difference between Isomaps and diffusion maps.
3.3 Applying diffusion distance to music analysis
Diffusion distance has several features that suggest it fits well with musical organization and analysis: the connectivity-based, distribution-free metric holds promise for both theoretical analysis and the organization of musical collections.
For theoretical analysis, one of the most important characteristics of diffusion
distance is that it is distribution-free, meaning no assumptions are made about the
structure or orientation of the data. This is essential for music analysis, because the
structure and organization of music varies greatly. While traditional Western music
theory does outline a series of relationships used in Western music, they can be used
in numerous ways, as demonstrated by the wide range of music that has been created
while following that theory. Entering into an analysis process with as few assumptions
as possible about the music itself allows for a clean slate and avoids the bias of trying
to fit a music piece into an inappropriate system. It also allows for an individual
work to define its own structure, rather than forcing it to conform to an exterior one.
This trait is useful for any data analysis, but is especially appropriate in the musical
context.
If a musical work is allowed to create its own space, as it is in diffusion, this also
means that the space itself becomes an interesting tool for analysis. Different musical
works drawing from different theoretical backgrounds will create different spaces. The
flexibility of diffusion also allows for two works to be combined in a separate analysis,
giving multiple avenues for analysis and comparison. A long term goal in diffusion
research should be to fully harness this power in the musical domain.
The distribution-free aspect is also important for theoretical analysis because the
theory and rules are well known in a musical space, but translating them into an
arbitrary geometrical space is not trivial. Diffusion distance allows for the use of
different feature sets that project the musical data into any variety of spaces without
requiring any knowledge about the corresponding structural distortions caused by
the projection. This is especially useful in a relatively young field like computational
music analysis, because the ease of the system encourages experimentation with new
feature sets and approaches.
Connectivity in diffusion also has intuitive analogues in music for connecting notes: chords and temporal proximity. This approach will be fundamental to the work to follow; its value is that it allows the connections between the notes to be dictated by musical relationships, so that the resulting visualizations are grounded in the underlying music theory.
Organizing by local geometry extends diffusion's usefulness to databases as well. As is the case with theoretical geometry, structure in collections of songs is unlikely to come from a known and easily described distribution.
Finally, the ease of visualization with diffusion is extremely valuable to music
analysis and organization. First of all, visualization has long been a desired and
sought after tool in music for assistance in analysis, new intuitions, and also simply
to create a multimedia musical experience. Also, musical organization is a difficult
prospect. Music makes a layered and multi-structured database, and, as a result, there
are numerous ways to organize a musical database. The visualizations in diffusion can
serve as a valuable tool in assisted searches. By pulling out structures in a database
and organizing them hierarchically, the user’s task of evaluating the database and
finding the desired organization is made much easier.
The remainder of this work will test the validity of some of these applications.
Diffusion maps will be used to build fundamental geometric representations of mu-
sic theory without any prior knowledge required. Hierarchical clustering in diffusion
space will be used to classify real musical scores based on key and meter. And visu-
alization in diffusion space will be used to represent musical excerpts as trajectories
in a custom space built automatically based on the musical structure itself. Finally,
a machine-learning-based extension for dual diffusion will be proposed as a potential
system for user-guided database organization.
Chapter 4

Diffusion-based Music Theory Analysis
4.1 Introduction
This chapter will focus on creating geometric representations of the fundamentals of music theory using diffusion spaces built only on simple units like intervals and short rhythmic sequences. This will demonstrate, on a basic musical level, the ability
of diffusion geometry to identify and extract the foundational aspects of a tonal or
rhythmic sequence and display it in the corresponding visual space. In all cases, the
input will be the simplest musical representation: binary vectors coding only whether
a particular note is on or off at a particular time.
First, we will begin with tonal-based theory, specifically in the realm of intervals.
Pitch-class intervals (collapsing the entire set of musical notes to the 12 notes within
one octave) can be interpreted in a geometrical context, as seen in several of the
note-based visualizations presented in section 2.4.1. In this case, we will use diffusion
maps to visualize the relationships between notes created by the intervals and the
basic shapes to which those relationships correspond.
By expanding the note set to include multiple octaves, it then becomes possible
to recreate versions of the visualizations from section 2.4.1 directly. These high-level
representations of Western tonal theory can easily be constructed with diffusion maps
by simply selecting the appropriate intervals to connect the notes.
Metrical structures can also be visualized with diffusion maps. Duple and triple
meter beat trains will not only be separated completely, but also organized into the
simple geometric shapes that correspond to the periodicity of the meter itself: a
square and a triangle. The same process can be performed with similar results for
beat trains that more closely resemble hemiolas, in which 3 beats occur in the time
of 2, or vice versa.
Geometric representations of musical relationships can be a valuable tool for un-
derstanding and communicating the implications and properties of those relation-
ships. Diffusion adds an extra level of simplification by automatically identifying and
extracting those representations from the musical relationships themselves.
4.2 Tonal Theory
Two cases of note sets for tonal theory will be examined separately. First, only pitch
classes will be considered. In this context, all intervals between the limited set of
notes can be separately analyzed and explored. Then, an expanded note set that
spans multiple octaves will be used to recreate helical structures that are historically
relevant to music analysis.
4.2.1 Input Representation
These diffusion spaces are created by following the process outlined in section 3.2.
For input, the full set of notes in use for the given experiment np is combined with
the full set of all occurrences of the interval in question iq. To represent the set of
notes combined with the interval i, we will use the notation Xi.
Xi = {..., np−1, np, np+1, ..., iq−1, iq, iq+1, ...}
Both the notes and the intervals are represented as binary vectors with one dimension
corresponding to every note in the note set. The note vectors will each be all zeros
except for a single 1 (indicating which note it represents) and all interval vectors will
have exactly two notes active.
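This input construction can be sketched in a few lines of NumPy (an illustrative helper, not the dissertation's own code): each pitch class is a one-hot vector, and each occurrence of the interval is a two-hot vector.

```python
import numpy as np

def build_interval_dataset(distance, n_notes=12):
    """Pitch classes plus every occurrence of a pitch-class interval,
    all encoded as binary vectors over the 12-note set."""
    notes = np.eye(n_notes, dtype=int)                 # one-hot note vectors
    intervals = np.zeros((n_notes, n_notes), dtype=int)
    for root in range(n_notes):
        intervals[root, root] = 1
        intervals[root, (root + distance) % n_notes] = 1   # wraps around the octave
    return np.vstack([notes, intervals])

X_m2 = build_interval_dataset(1)   # the 12 pitch classes + all 12 minor 2nds
```

For the tritone (distance 6), this construction lists each of the 6 distinct intervals twice; deduplicating those rows is a cosmetic detail.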
4.2.2 Affinity Function
Unless otherwise stated, all experiments utilize the cosine distance affinity function,
shown in Eq. (3.2). This function is well suited here because it defines the similarity
between two data points exclusively by how many notes they share in common relative
to how many notes they could share in common.
k(xm, xn) = (total number of notes xm and xn share in common) / (total number of notes that are active in either xm or xn)
This means that two notes will not have any direct affinity, and so any relationship
they have in the experiments is defined completely by their connectivity through the
intervals.
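Restated for binary vectors, the affinity above is the count of shared active notes divided by the count of notes active in either vector. A small sketch (illustrative names, not the author's code):

```python
import numpy as np

def affinity(x, y):
    """Shared active notes over notes active in either vector
    (the affinity described for Eq. 3.2, applied to binary vectors)."""
    return np.sum(x & y) / np.sum(x | y)

c = np.zeros(12, dtype=int); c[0] = 1          # the note C
d = np.zeros(12, dtype=int); d[2] = 1          # the note D
cd = c | d                                     # the major 2nd between C and D

print(affinity(c, d))    # 0.0: two single notes share nothing directly
print(affinity(c, cd))   # 0.5: a note and an interval containing it
```

The zero note-to-note affinity is exactly why, in these experiments, any relationship between two notes arises only through their connectivity via the intervals.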
4.2.3 Geometric representations of pitch-class intervals
We begin with an atomic unit of tonal music theory: the interval. The interval is the
number of musical steps between two notes, and discussions of the interval usually
involve notes played in succession or simultaneously.
In this section, only pitch-class intervals will be used, and so the note set will
consist exclusively of the 12 pitch classes {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}. For conve-
nience, for the remainder of this chapter, the pitch classes will be referred to by the
note set where pitch class 0 = C: {C, C♯, D, D♯, E, F, F♯, G, G♯, A, A♯, B}.

Ignoring the octave essentially means that the distance between two notes can be
represented in two ways: the distance between the notes, and the distance outside
the notes (where the pitch-class representation wraps back around). The latter will
be referred to as the inversion. The intervals and their inversions for each distance
are shown in Table 4.1. It can easily be seen that all intervals are represented by the
interval/inversion pairs for distances of 1-6 semitones (distances of 7-11 semitones
simply repeat the intervals with the interval/inversion pairing reversed). Note that,
Semitones   Interval            Inversion
1           minor 2nd (m2)      major 7th (M7)
2           major 2nd (M2)      minor 7th (m7)
3           minor 3rd (m3)      major 6th (M6)
4           major 3rd (M3)      minor 6th (m6)
5           perfect 4th (P4)    perfect 5th (P5)
6           tritone (T)         tritone (T)
7           perfect 5th (P5)    perfect 4th (P4)
8           minor 6th (m6)      major 3rd (M3)
9           major 6th (M6)      minor 3rd (m3)
10          minor 7th (m7)      major 2nd (M2)
11          major 7th (M7)      minor 2nd (m2)

Table 4.1: The pitch-class intervals and their inversions
because of the cyclic nature of the pitch-class set itself, visualizations for interval
distances of 7-11 semitones would exactly replicate those of their inversions in the
1-6 semitone range. As a result, only distances up to 6 semitones will be examined in this section.
The exercise of visualizing the intervals is useful for a few reasons. First of all,
demonstrating that diffusion maps graph the pitch-class set in a geometry meaningful
to the interval itself is a strong first step toward showing the value of diffusion geom-
etry to music theory analysis. Secondly, the visualizations themselves can be a more
direct and straightforward way to understand the intervals and the relationships they
create. In many cases, a simple visual representation can immediately communicate
the musical relationships that it would take paragraphs to explain. Clearly, as the
rich history addressed in section 2.4.1 demonstrates, many scholars in the field have
seen this value in visualizations as well.
For notational ease, all notes with accidentals (e.g. C♯ or D♭) will be referred to
in their sharp form. In many of these cases, standard musical notation would more
often use a flat, but for consistency and notational efficiency, only the sharp will be
used. As an example, stepping up a minor 3rd from C moves to E♭. This note will be
referred to as D♯ for ease, even though E♭ is the proper functional notation, because
both notations refer to the same pitch class.
Figure 4.1: The pitch classes connected by semitone intervals plotted in the first two dimensions of the diffusion map.
1 semitone: Minor 2nd or Major 7th
The minor 2nd connects two notes that are adjacent to each other in the chromatic
scale (e.g. C and C♯). A noteworthy property of the minor 2nd is that it fully
connects the pitch-class set, in that any pitch class can reach any other pitch class
by only taking steps of one semitone. Furthermore, starting at any note and moving
by one semitone in the upward direction will move through every other note before
returning to the starting point (the same, of course, is true with downward stepping
as well).
These two properties are extremely relevant for diffusion. The connectivity of the
interval means that there will only be one stationary distribution for the input set
Xm2, which is built from the pitch classes and semitone intervals. The property of
unidirectional movement passing through all pitch classes and then returning to the
original note would also suggest a cyclic or circular shape would be an appropriate
geometric representation.
Fig. 4.1 shows the first two dimensions of the diffusion map, with the minor 2nd
Figure 4.2: The pitch classes connected by major 2nd intervals plotted in the first three dimensions of the diffusion map.
intervals represented by the lines connecting the notes. The plot clearly takes on a
cyclic shape, as expected, and immediately communicates how the pitch classes are
organized in the context of a semitone: moving in any one direction will pass through
all pitch classes before returning to the starting point.
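The whole pipeline behind this figure can be sketched end to end, following the process of section 3.2: intersection-over-union affinities, row normalization to a Markov matrix, and an eigendecomposition through the symmetric conjugate matrix. This is an illustrative reconstruction, not the author's implementation; by the cyclic symmetry of the input, the 12 note points should land at equal radii on the expected circle.

```python
import numpy as np

def diffusion_map(X):
    """Affinity -> Markov normalization -> eigenvectors, computed via
    the symmetric matrix D^(-1/2) W D^(-1/2) so that eigh applies."""
    Xf = X.astype(float)
    inter = Xf @ Xf.T                                     # shared active notes
    sums = Xf.sum(axis=1)
    W = inter / (sums[:, None] + sums[None, :] - inter)   # intersection / union
    d = W.sum(axis=1)
    vals, vecs = np.linalg.eigh(W / np.sqrt(np.outer(d, d)))
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    return vals, vecs / np.sqrt(d)[:, None]   # right eigenvectors of D^-1 W

notes = np.eye(12)
m2 = np.array([np.roll([1, 1] + [0] * 10, r) for r in range(12)])
vals, phi = diffusion_map(np.vstack([notes, m2]))

emb = vals[1:3] * phi[:, 1:3]              # skip the trivial eigenvector phi_0
radii = np.linalg.norm(emb[:12], axis=1)   # the 12 notes sit on one circle
```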
2 semitones: Major 2nd or Minor 7th
The major 2nd, unlike the semitone, does not fully connect the pitch-class set. Instead,
the notes are broken into two separate and exclusive sets: {C, D, E, F♯, G♯, A♯} and
{C♯, D♯, F, G, A, B}. Within these subsets, however, it is once again the case that
unidirectional movement will pass through all notes (within the subset) before re-
turning to the starting point. The subsets, as they were previously written, show the
order of this path.
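This splitting behavior is purely number-theoretic: stepping by d semitones from any note traces a cycle through 12 / gcd(12, d) pitch classes, so the interval partitions the pitch-class set into gcd(12, d) exclusive subsets. A small sketch:

```python
from math import gcd

def subsets(distance, n_notes=12):
    """Partition the pitch classes into the exclusive cycles reachable
    by repeatedly stepping the given interval distance."""
    seen, groups = set(), []
    for start in range(n_notes):
        if start in seen:
            continue
        cycle, pc = [], start
        while pc not in seen:
            seen.add(pc)
            cycle.append(pc)
            pc = (pc + distance) % n_notes
        groups.append(cycle)
    return groups

for d in range(1, 7):
    assert len(subsets(d)) == gcd(12, d)

print(subsets(2))   # [[0, 2, 4, 6, 8, 10], [1, 3, 5, 7, 9, 11]]
```

This single formula predicts every case in this section: one cycle for the minor 2nd and perfect 4th, two for the major 2nd, three for the minor 3rd, four for the major 3rd, and six for the tritone.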
The existence of two exclusive subsets should be prominently represented in any
visual manifestation of the intervallic relationship. Fig. 4.2 shows the first 3 dimensions
of the diffusion map created for the data set XM2, consisting of the pitch classes and all
Figure 4.3: The pitch classes connected by major 2nd intervals plotted in dimensions 2, 3, and 4 of the diffusion map.
major 2nd intervals. The two subsets have been completely separated. This separation
is seen in the highest ranking dimensions because this aspect of the geometry is the
most fundamental to the representation.
In higher dimensions of the diffusion map (corresponding to higher eigenvectors, or
smaller eigenvalues), the expected cyclical shapes are seen (Fig. 4.3), this time hexag-
onal (since there are six elements in each subset). Interestingly, the two hexagons are
oriented in different directions, further illustrating the separation of the two subsets.
It is necessary to move to higher eigenvectors to see this structure in this shape be-
cause the first two eigenvectors necessarily represent the two stationary distributions,
one for each subset.
The geometric representations communicate very clearly and intuitively that the
two sets are completely separate, and that there is no way to move between the sets
using only the major 2nd interval. At the same time, the proper structure within each
subset is also clearly communicated.
Figure 4.4: The pitch classes connected by minor 3rd intervals plotted in the first three dimensions of the diffusion map.
Figure 4.5: The pitch classes connected by minor 3rd intervals plotted in dimensions 3, 4, and 5 of the diffusion map.
3 semitones: Minor 3rd or Major 6th
The minor 3rd interval also divides the pitch-class set into a group of subsets, this
time three subsets with four elements each: {C, D♯, F♯, A}, {C♯, E, G, A♯}, and
{D, F, G♯, B}. Fig. 4.4 shows clearly that the diffusion space separates these three
subsets with the prominent eigenvectors.
The subsets, which are cyclic within themselves like the previous two examples,
have four elements each, indicating that the proper geometric representation should
be a square for each subset. Fig. 4.5 shows these squares, visualized in dimensions 3,
4, and 5 of the diffusion map. The first two dimensions, plus the eigenvector φ0 which
is excluded from the map (see section 3.2.6 for an explanation), combine to represent
the three stationary distributions that exist due to the three separate subsets within
the pitch-class set.
And, like the hexagons for major 2nd intervals, these squares should vary in their
directional orientation to represent the separation of the subsets. This can be seen
Figure 4.6: The pitch classes connected by minor 3rd intervals plotted in dimensions 3, 4, and 5 of the diffusion map, viewed from a different angle than Fig. 4.5.
Figure 4.7: The pitch classes connected by major 3rd intervals plotted in the first three dimensions of the diffusion map.
somewhat in Fig. 4.5, but a change in angle shows the orientation of the three squares
even more clearly, as seen in Fig. 4.6. Here, it is very clear that the orientation
of the squares represents the exclusivity of the subsets, though the actual relative
orientation of the squares is not entirely obvious, due to the necessary projection
onto a two-dimensional plane for plotting. In the three-dimensional space, the three
squares are oriented similarly to the x-, y-, and z-planes in a Cartesian coordinate
system.
4 semitones: Major 3rd or Minor 6th
The major 3rd interval divides the pitch-class set into four subsets of three elements
each: {C, E, G♯}, {C♯, F, A}, {D, F♯, A♯}, and {D♯, G, B}. As in the previous cases,
these subsets are exclusive of each other, but they are also fully and cyclically con-
nected within the subset.
Because the subsets now have only three elements each, the desired geometrical
Figure 4.8: The pitch classes connected by major 3rd intervals plotted in dimensions 4, 5, and 6 of the diffusion map.
representation would be a series of separate and exclusive triangles.
Fig. 4.7 shows that, once again, the subsets are immediately and clearly separated
in the first few dimensions of the diffusion map. Because there are now four stationary
distributions necessary, one for each subset, we must go all the way to dimensions
4, 5, and 6 of the diffusion map to see the geometrical structure of the intervallic
relationship, seen in Fig. 4.8. Here, four triangles are prominently on display, as
expected.
Unlike the previous examples for the major 2nd and minor 3rd, these triangles are
not oriented as cleanly with relation to each other as the hexagons in Fig. 4.3 or
the squares in Fig. 4.6. This is likely because there are four triangles, or one more
than the three dimensions that we can visualize. It is mathematically impossible to
orient four planes orthogonally in three-dimensional space. As a result, all we can
visualize is some three-dimensional projection of the four-dimensional space in which
these triangles are orthogonal to each other.
Figure 4.9: The pitch classes connected by perfect 4th intervals plotted in the first two dimensions of the diffusion map.
Note that the apparent distortion of some of the triangles (such as that for the
subset {C, E, G♯}, shown in the darkest black) is only because of the angle of
perspective on the space. If viewed from an angle normal to the triangle's plane, all
triangles appear more like the triangle for subset {D♯, G, B}, shown in the lightest gray.
5 semitones: Perfect 4th or Perfect 5th
Like the minor 2nd, the perfect 4th fully connects the pitch-class set. In other words,
starting from any note in the set, any other note can be reached by moving only
through perfect 4th intervals. This means that we should, in concept, have a very
similar geometric representation for the perfect 4th as was seen in Fig. 4.1 for the
minor 2nd, except with the pitch classes located differently to reflect the different
intervallic relationship.
In Fig. 4.9, we see that this is indeed the case in the diffusion space created for the
data set XP4. Instead of creating a chromatic circle, for this set, the representation is
the circle of fifths, a commonly used orientation in music theory. Because the graph
Figure 4.10: The pitch classes connected by tritone intervals plotted in the first three dimensions of the diffusion map.
is fully connected, this graph is displayed by the first two dimensions of the diffusion
map, demonstrating its prominence.
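The circle-of-fifths ordering in Fig. 4.9 follows from the full connectivity of the 5-semitone step: repeatedly stepping by a perfect 4th (or its inversion, the perfect 5th, which traces the same circle in the opposite direction) visits every pitch class before returning home. Sketch:

```python
def step_cycle(distance, start=0, n_notes=12):
    """Note order obtained by repeatedly stepping an interval upward."""
    order, pc = [], start
    for _ in range(n_notes):
        order.append(pc)
        pc = (pc + distance) % n_notes
    return order

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
print([NAMES[pc] for pc in step_cycle(7)])
# ['C', 'G', 'D', 'A', 'E', 'B', 'F#', 'C#', 'G#', 'D#', 'A#', 'F']
```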
6 semitones: Tritone
The tritone is a critical interval for Western music. The tension created by a tritone,
and the movement toward resolution of that tension, is one of the primary driving
forces in harmonic progressions. However, despite this significant musical role, the
underlying geometry is far less exciting. The tritone breaks the pitch-class set into six
separate subsets: {C, F♯}, {C♯, G}, {D, G♯}, {D♯, A}, {E, A♯}, and {F, B}. Because
there are only two elements in the subsets, they necessarily should create lines in the
geometric representation.
In Fig. 4.10, the diffusion space for this case is shown. Clearly, the subsets are all
divided, and the lines create the expected, albeit unexciting, geometric representation.
Figure 4.11: Several geometric representations of intervals appear in the diffusion space created with the major chord. (a) Minor 3rd squares; (b) Major 3rd triangles; (c) Perfect 4th circle (circle of fifths).
Major Triad
To show how these interval representations appear in more complex musical struc-
tures, let us use a major triad as an example. The intervallic content of a major triad
(e.g., C–E–G) includes a major 3rd, a minor 3rd, and a perfect 5th. By creating a
data set consisting of the pitch-class set and all 12 major chords, we are essentially
combining the data sets from the experiments for those three intervals.
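Building that combined data set is a one-line extension of the interval construction: each major triad is a three-hot vector with active notes at the root, 4 semitones up, and 7 semitones up (illustrative code, not the author's):

```python
import numpy as np

def major_triads(n_notes=12):
    """Binary vectors for the 12 major triads: root, major 3rd (+4
    semitones), and perfect 5th (+7 semitones) above the root."""
    triads = np.zeros((n_notes, n_notes), dtype=int)
    for root in range(n_notes):
        for offset in (0, 4, 7):
            triads[root, (root + offset) % n_notes] = 1
    return triads

# the pitch-class set plus all 12 major chords
X_triads = np.vstack([np.eye(12, dtype=int), major_triads()])
```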
Fig. 4.11 shows a few of the diffusion space dimensions for this data set, and
all three expected geometric shapes can easily be found: the minor 3rd squares (Fig.
4.11(a)), the major 3rd triangles (Fig. 4.11(b)), and the perfect 4th circle.
An interesting observation that can be made from these shapes is how the inter-
play between the major and minor 3rd is represented. Notice that, in Fig. 4.11(a),
the minor 3rd squares are oriented in such a way that the major 3rd subsets are
represented in the columns formed by their corners. The opposite is true for the ma-
jor 3rd triangles in Fig. 4.11(b), where their corners create columns of the minor 3rd
subsets. In this way, even the relationship between the two intervals is represented
in the diffusion space.
And so, through this example, we see that the intervallic relationships present
themselves in the geometry of musical elements that are built from those intervals,
and that an additional layer of information is added by linking the intervals together
in a meaningful way.
4.2.4 Recreating note-based visualizations
Now that we have established the effectiveness of diffusion for creating simple
geometric visualizations of musical intervals, we can try to create some more
sophisticated representations. In section 2.4.1, several historical visualizations of the
notes are reviewed and discussed. These visualization schemes are the result of careful
thought and work by their creators with the intended goal of graphically representing
important relationships in music theory. Yet, with diffusion, we can easily create
these same visualizations in diffusion space with only a few intervals. In a sense,
many historical visualizations are special-case diffusion spaces derived from the spe-
cific intervals considered essential to those organizations.
However, in order to accomplish this, one issue must be addressed. Many of
the visualizations previously discussed require a single pitch class to occupy multiple
spaces at the same time. In a theoretical representation, this is not a problem, because
it can simply be accepted that each location is equal to the others, and any of the
infinite repetitions can be arbitrarily selected. However, in diffusion space, this is not
possible. Each data point is located based on the output of the eigenfunctions. These
eigenfunctions necessarily can only give one output for a single input, and therefore
Figure 4.12: Shepard's chromatic helix in diffusion space, resulting from the combination of minor 2nd and octave intervals with the full note set, with and without the minor 2nd intervals drawn in. (a) Pitch classes only; (b) minor 2nd intervals connected.
a note in diffusion space cannot simultaneously occupy multiple locations.
As was previously discussed, the use of helices and spirals in music theory
geometry was intended to alleviate this issue. Chew's Spiral Array [13], for example,
is the Tonnetz [27, 57] wound into a spiral. So, we would expect to get a similar
representation in diffusion space.
However, while the use of helices and spirals removes the redundancy in the polar
dimension that moves around the structure, it does not remove redundancies up and
down the helices. We will address this problem here by including octaves in the
note set. In the case of Shepard’s chromatic helix and double helix [69], octave, or
pitch height, is already included in the representation, so this addition is consistent
with the original work. However, in the case of Chew’s Spiral Array, the space
described does not include pitch height, so that will be added artificially for the
diffusion representation to avoid the intractable problem of placing the same object
in multiple locations.
Figure 4.13: Zooming in on two octaves of the chromatic helix from Fig. 4.12(b).
Shepard’s Chromatic Helix
Shepard’s chromatic helix is fundamentally built on two intervals. The first is the
minor 2nd, which is the interval that steps around the helix. The second interval is
the octave, which establishes the period of a cycle around the helix. As seen in
Shepard’s original figure, the chromatic helix lines up the octaves vertically. In the
representation, this alignment also means that collapsing down the octaves projects
onto the chromatic circle (which was the diffusion-derived representation for the
minor 2nd in Fig. 4.1).
So, by organizing the notes in the note set (built with multiple octaves), we find
the organization in Fig. 4.12(a), where the notes are organized into some sort of
cylindrical shape. The vertical axis also clearly separates based on pitch height.
However, it is difficult to visually diagnose the representation in any finer detail.
To assist in understanding, Fig. 4.12(b) connects the minor 2nd intervals. This
clarifies the picture a great deal, as the intervals form an ascending helix around
the outside. By simply using the minor 2nd and octave for an organization, we have
created Shepard’s chromatic helix. For easier visualization of this, Fig. 4.13 zooms
in on two octaves from the middle of the helix. In this zoomed view, it is easy to see
that the helix is beautifully recreated.
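Constructing the input behind Figs. 4.12 and 4.13 only requires expanding the note set across several octaves and adding two interval families, minor 2nds and octaves; unlike the pitch-class experiments, nothing wraps around. A sketch (the octave count here is an arbitrary choice):

```python
import numpy as np

def helix_dataset(n_octaves=5):
    """Note set spanning several octaves plus the two interval types
    behind Shepard's chromatic helix: minor 2nds (adjacent notes) and
    octaves (notes 12 semitones apart). No octave wraparound."""
    n = 12 * n_octaves
    rows = [np.eye(n, dtype=int)]              # one-hot note vectors
    for gap in (1, 12):                        # minor 2nd, then octave
        for lo in range(n - gap):
            v = np.zeros(n, dtype=int)
            v[lo] = v[lo + gap] = 1
            rows.append(v)
    return np.vstack(rows)

X = helix_dataset()
# 60 notes + 59 minor 2nds + 48 octaves = 167 rows
```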
This approach also confirms that Shepard’s proposed visualization is the fun-
damental representation for a musical system based on the minor 2nd and octave.
However, Shepard clearly recognized that this wasn’t the only way to view the rela-
tionships of the notes, as he also proposed the double helix.
Shepard’s Double Helix
Shepard’s double helix is based on the perfect 5th, major 2nd, and octave. The octave
once again establishes the period of a helical cycle and defines the vertical axis. The
major 2nd is fundamental to each of the two helices, as that is the interval that each
covers with each step. And the representation is designed so that collapsing the octave
projects the helices onto the circle of fifths (establishing the importance of the perfect
5th).
To create this representation in diffusion space, the octave is obviously necessary.
Figure 4.14: Shepard's double helix in diffusion space, resulting from the combination of perfect 5th and octave intervals with the full note set, with and without the major 2nd intervals drawn in. (a) Pitch classes only; (b) major 2nd intervals connected.
The perfect 5th also must be included, it would seem, in order to establish the proper
circular orientation (as was the case with the chromatic helix and the minor 2nd).
However, as it turns out, the major 2nd is unnecessary to create the visualization.
The implications of this will be discussed shortly.
The diffusion space created with the octave and the perfect 5th is shown in Fig.
4.14(a). The major 2nd intervals are connected in Fig. 4.14(b), visualizing the double
helix shape that is created. To more clearly see it, the representation is once again
zoomed to two octaves in Fig. 4.15. Notice that the “helices” could more accurately
be described as hexagons (the fundamental geometric representation of the major 2nd,
shown in diffusion space in Fig. 4.3) stretched in the vertical dimension.
However, it is important to consider that we were able to construct this shape
without including the major 2nd, which, in Shepard’s original formulation, played a
central role in the representation. The reason for this is that the double helix is not
essential to creating this organization of the notes. In fact, if the perfect 5th were re-
placed with the major 2nd, the two helices would exist completely separately, because
there would be no relationship between them. If the helices are to be intertwined, as
is the case in Shepard’s original representation, then the major 2nd is insufficient for
Figure 4.15: Zooming in on two octaves of the double helix from Fig. 4.14(b).
constructing the proper note organization. Rather, it is simply a way to interpret it.
In fact, this organization of the notes can be interpreted in several ways. A few of
these interpretations are shown in Fig. 4.16. We have already seen that it can be viewed
as two intertwined helices ascending by major 2nd intervals. However, it is also possible
to view it as 5 separate helices ascending by perfect 4th intervals (Fig. 4.16(a)), or 7
separate helices ascending by perfect 5th intervals (Fig. 4.16(b)). It is even possible
to interpret the space as 12 helices ascending by octaves (Fig. 4.16(c)).
So, we can now see that Shepard’s double helix is really an interpretation of a
note organization, which is itself based on the octave and perfect 5th interval. And it
was through diffusion analysis that we learned this distinction.
Chew’s Spiral Array
Recreating Chew’s Spiral Array is slightly more challenging. This is because the
pitch height (which must be included to constrain one note to one location) does not
incorporate quite as cleanly as it does in Shepard’s helical representations. However,
the proper intervals can be derived with a little understanding of the space in which
the Spiral Array exists.
First of all, the spiral ascends by perfect 5th intervals, with four steps per cycle.
In the array, this leads to a vertical jump by a major 3rd. However, in our space, in
which octave must be accounted for, this vertical jump actually must be by a major
3rd plus two octaves (which is 28 semitones, or four steps of 7 semitones). So, in the
note organization of the Spiral Array adjusted to include pitch height, the vertical
interval is a major 3rd plus two octaves.
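The vertical interval falls out of simple arithmetic: four perfect-5th steps per turn of the spiral total 28 semitones, which is exactly a major 3rd raised by two octaves.

```python
PERFECT_5TH, OCTAVE, MAJOR_3RD = 7, 12, 4    # sizes in semitones

four_steps = 4 * PERFECT_5TH                 # one full turn of the spiral: 28 semitones
assert four_steps == MAJOR_3RD + 2 * OCTAVE  # = 4 + 24, a major 3rd plus two octaves
```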
It seems obvious that the second fundamental interval would be the perfect 5th,
since that is the unit for the spiral’s ascension. However, as it turns out, this is not
the correct approach. When the pitch height is added, the note space in which the
Spiral Array exists is actually broken into 7 Spiral Arrays. This is because, as was
the case with the major 2nd in Shepard’s double helix, the combination of the perfect
5th and major 3rd plus two octaves does not fully connect the note set. So, with the
perfect 5th interval as the building block, the note set would be broken into exclusive
subsets.
Figure 4.16: Several other interpretations of the note organization in which Shepard's double helix exists. (a) Perfect 4th helices; (b) Perfect 5th helices; (c) Octave helices.
Figure 4.17: The diffusion space created with minor 2nd and two-octave-plus-major-3rd intervals, with an approximation of the Spiral Array represented by the lines.
Figure 4.18: The Krumhansl-Kessler key space from Fig. 2.6(a) remade with diffusion. Dots represent major keys and circles represent minor keys. (a) φ1 and φ2; (b) φ3, φ4, and φ5.
So, like in the case of Shepard’s double helix, we must instead construct a space
in which the Spiral Array can exist. This can be accomplished with the minor 2nd,
creating one external spiral that ascends chromatically. By adding in the major 3rd
plus two octaves to align the spiral vertically, a space is created in which all 7 Spiral
Arrays can exist intertwined with each other. Fig. 4.17 shows this note space with
one of the Spiral Arrays drawn in by connecting the perfect 5th intervals, starting at
an arbitrary note. The spiral appears as a square because a cycle in the array only
takes four steps.
In this case, the addition of the pitch height complicated the formulation of Chew’s
Spiral Array, which is quite elegant in the pitch-class framework. However, even with
the added complication, the visualization can still be easily discerned and more deeply
understood through the diffusion formulation.
Krumhansl and Kessler’s Key Space
As an exercise, Krumhansl and Kessler [46] used Multi-Dimensional Scaling (MDS)
to create a low dimensional visualization for the keys based on the K-K key profiles.
This key space can be seen in Fig. 2.6(a).
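The diffusion version of this experiment starts from the 24 K-K key profiles: each key is a 12-dimensional probe-tone rating vector obtained by rotating the major or minor profile to the key's tonic. The sketch below uses the commonly quoted profile values (they should be checked against Krumhansl and Kessler's published table) and a true cosine affinity, since the data are no longer binary:

```python
import numpy as np

# K-K probe-tone profiles as commonly tabulated (verify against the source)
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

# 24 key profiles: each key is the base profile rotated to its tonic
keys = np.vstack([np.roll(MAJOR, k) for k in range(12)] +
                 [np.roll(MINOR, k) for k in range(12)])

# cosine affinity between key profiles (the binary intersection-over-union
# form no longer applies to real-valued ratings)
norms = np.linalg.norm(keys, axis=1)
W = (keys @ keys.T) / np.outer(norms, norms)
```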
In Fig. 4.18, the same low dimensional spaces are shown in the first several
Figure 4.19: Duple-meter beat trains separated completely from triple-meter beat trains and organized into a square.
dimensions of the diffusion maps for the K-K key profiles. The first two dimensions
(Fig. 4.18(a)) form the exact same space in which majors and minors separated by a
major 2nd are grouped together in perfect 5th circles. The second grouping occurs in
the next three dimensions, where keys separated by major 3rd intervals are grouped
together, with each group consisting of only major or minor keys.
4.3 Metrical Structure
Geometric representations for metrical structure have not seen the same attention as
their tonal counterparts. However, metric components of music are equally relevant
to music understanding, and, as will be seen, we can organize them with diffusion in
equally insightful and interesting ways.
Figure 4.20: Triple-meter beat trains separated completely from duple-meter beat trains and organized into a triangle.
4.3.1 Metric geometry
The first metric experiment compares beat trains based on different meters. For this,
the data set is built of a group of beat trains. The affinity between two beat trains
is defined by the number of downbeats that overlap when the two trains are aligned
beat-for-beat.
The beat trains themselves are built on periodicities of 2, 3, 4, 6, 8, or 9 beats.
This gives a good representation for duple (2, 4, and 8 beat periodicities) and triple
(3, 6, and 9 beat periodicities) meters.
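This construction can be sketched directly in code. The binary encoding and the 72-beat common length below are illustrative assumptions (72 is divisible by every periodicity used), not the dissertation's exact implementation:

```python
import numpy as np

def beat_train(period, length=72):
    """A binary beat train with a downbeat every `period` beats. A length
    of 72 is divisible by 2, 3, 4, 6, 8, and 9, so every train completes
    a whole number of cycles."""
    train = np.zeros(length, dtype=int)
    train[::period] = 1
    return train

def affinity(a, b):
    """Number of downbeats that coincide when two trains are aligned beat-for-beat."""
    return int(np.sum(a * b))
```

Under this encoding, trains within the duple family share many downbeats (every downbeat of the 4-train lands on the 2-train), while duple and triple trains coincide only where the least common multiple of their periods divides the position.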
By building a graph on these beat trains with the affinity described above, several
interesting results are found. The first is that the duple and triple meters are com-
pletely separated in the eigenvectors. This is to say that there are several eigenvectors
where all duple meter beat trains have a non-zero value while all triple meter beat
trains have a zero value, and other eigenvectors where the opposite is true. These
eigenvectors could be used for metric classification.
This revelation alone is quite significant. The system used was given no clues as
to what is significant about these beat trains or how they can be organized or dis-
tinguished. The only information used was how many downbeats overlap. However,
through the connectivity-based analysis metric of diffusion, the two metric founda-
tions are separated completely.
How they are separated is also quite interesting. The first two eigenvectors that
identify duple meter are shown in Fig. 4.19. They clearly organize the duple meter
beat trains into a square. In Fig. 4.20, the first two eigenvectors in which the triple
meter beat trains are organized are seen to orient them into a triangle. These two
shapes are quite interesting, because they geometrically represent the mathematical
foundation of the metrical systems themselves.
Triple meter is based on periodic units of three, which geometrically most obvi-
ously correlates with a triangle. Duple meter is based on periodic units of two, though
the most common form is based on units of four, which is most closely connected to
the geometric square.
This is a very impressive example of the organizational abilities of diffusion mapping. By presenting the analysis system with only the number of overlapping downbeats for beat trains, the beat trains are separated into metrical units and then organized into shapes that embody the metrical foundation on which they were built. It shows the power of diffusion as an analytic tool for music theory that it is possible to extract all of this without incorporating any prior knowledge of the data into the analysis.
4.3.2 Visualizing hemiolas
A similar experiment can be repeated with a hemiola-based data set. In this example,
the data once again consists of beat trains that are built on periodicities of 2, 3, 4,
6, or 8 beats (9 beat periodicities were excluded in this case). And, the affinity was
once again defined by the number of overlapping downbeats shared in common by
two beat trains.
However, the key difference between this experiment and the previous metrical
experiment is that here all beat trains were considered to be one measure long. In
the metrical case, the beats were aligned, and so this meant that measures were not
necessarily aligned. In this case, measures are aligned, which will mean that beats will
not necessarily align. This experiment is hemiola-based because it compares different
ways of breaking down a single measure into a varying number of units.
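The measure-aligned variant can be sketched the same way. Here the 24-position measure grid is an assumed convenience (24 is divisible by every division used), not the dissertation's exact choice:

```python
import numpy as np

MEASURE = 24  # divisible by 2, 3, 4, 6, and 8, so every division lands on the grid

def hemiola_train(divisions):
    """One measure divided into `divisions` equal units, 1 marking each unit onset."""
    train = np.zeros(MEASURE, dtype=int)
    train[:: MEASURE // divisions] = 1
    return train

def affinity(a, b):
    # Measures, not beats, are aligned: compare position-for-position
    # within the single shared measure.
    return int(np.sum(a * b))
```

Note that a 2-division and a 3-division of the measure now share only the measure downbeat itself, which is the essence of the hemiola comparison.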
We see in Figs. 4.21 and 4.22 that, despite this different approach to aligning the
beat trains, the results are quite similar. Once again, the beat trains that divide the
measure into units based in 2 are organized into a square, and those that divide the
measure into units based on 3 are in a triangle.
One noteworthy difference between the metric case and these hemiola results is that, in the metric case, the duple-meter beat trains were all located at (0, 0) in the triple-meter organizations, and vice versa for the duple-meter organizations. In these hemiola graphs, the opposite is true: the shape of one type of beat train forces the other type to its external corners rather than into the center. The reasons for this difference are not entirely clear.
Figure 4.21: Hemiolas based in units of 2 shaped into a square, similarly to the metric case shown in Fig. 4.19.
Figure 4.22: Hemiolas based in units of 3 shaped into a triangle, similarly to the metric case shown in Fig. 4.20.
4.4 Conclusion
In this chapter, we examined the diffusion maps created for several atomic units of music theory. By starting with the simplest and most fundamental of these units, we are able to build a foundation for understanding musical relationships with diffusion.
First, the geometry created with the intervallic relationships within the pitch-class
set was graphed. With lessons learned from these initial experiments, it then became
possible to recreate significant historical note-based visualizations using only a few
simple intervals. This not only gave insight into the process of diffusion mapping for
music, but also into the visualizations themselves. Finally, we looked at rhythmic
organizations, where it was shown that diffusion maps can extract high-level musical
information from even the simplest of inputs.
While these examples, by design, limit themselves to simple musical concepts, the
tasks on display are by no means trivial. Organizing intervals into their fundamen-
tal geometry (with that geometry hierarchically organized into the most prominent
eigenvectors) is a valuable tool. Completely and automatically separating beat trains
into their metrical units is another significant accomplishment. However, in all of
these cases, the most promising aspect is the ability to extract and communicate fun-
damental insights into the mathematical patterns behind the music theory. This is an
extremely interesting prospect, and shows that there is great potential for diffusion-
based analysis in the world of music theory.
Chapter 5
Diffusion-based Musical
Applications
5.1 Introduction
The visualizing and analytic capabilities of diffusion mapping lend themselves to inter-
esting applications in computational music analysis and music information retrieval.
Here, we will examine the role diffusion can play in enhancing key finding, meter
induction, and music visualization.
During the course of this exploration, several machine learning algorithms will
be demonstrated in diffusion space, showing that this approach provides an effective
front-end addition to high-level machine learning tasks.
5.2 Key Finding
Defining the key of a musical excerpt, whether in symbolic form or in an audio clip, is a highly sought-after task. Teaching a computer to understand musical key is a profound first step towards teaching a computer to understand music in general.
With diffusion, it is possible to create distributional or functional key-finding
algorithms, depending on what sort of features are used for relating the musical
excerpts. And, in fact, we will first show that the fundamental bases of some of the
CHAPTER 5. DIFFUSION-BASED MUSICAL APPLICATIONS 89
most popular algorithms for each approach can be easily derived from the diffusion
time constant.
For all key-finding examples, Bach’s The Well-Tempered Clavier, Books 1 and
2, are used. This database was selected for several reasons. First of all, it gives
nearly equal treatment to all keys, with two preludes and two fugues in each key.
Also, the even distribution of fugues and preludes offers a good mix of harmonic
styles. Finally, these pieces, and much of Bach’s music in general, provide excellent
examples of Western music theory in real scores with a master’s adherence to the
harmonic rules while still offering a great deal of variety.
5.2.1 Key-Finding Characteristics from the Diffusion Time
Constant
This first experiment tries to extract some understanding of key in Western music by
using the full database to organize the 12 pitch classes in a diffusion space.
First, the whole database is transposed into the same key and separated into major
and minor subsets. Then, all unique combinations of notes (intervals and chords) are
extracted and counted for each subset. Using this information, the twelve pitch-
classes are organized into two separate diffusion spaces, one for major keys and one
for minor keys. This process is very similar to the method used to create geometric
interval representations in Section 4.2.3.
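The diffusion-space construction follows the standard diffusion-map recipe (affinity matrix, random-walk normalization, eigendecomposition). A minimal sketch, with all parameter choices illustrative rather than the dissertation's exact implementation:

```python
import numpy as np

def diffusion_map(W, n_dims=3, t=1):
    """Minimal diffusion-map sketch from a symmetric affinity matrix W.

    The rows of W are normalized into a random-walk transition matrix; its
    non-trivial eigenvectors, scaled by eigenvalues raised to the diffusion
    time t, give the embedding coordinates."""
    d = W.sum(axis=1)
    # The symmetric conjugate D^{-1/2} W D^{-1/2} shares eigenvalues with
    # the random walk P = D^{-1} W, so a Hermitian solver can be used.
    S = W / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]            # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]          # right eigenvectors of P
    # Skip the trivial pair (eigenvalue 1, constant eigenvector).
    return (vals[1:n_dims + 1] ** t) * psi[:, 1:n_dims + 1]
```

On a small affinity matrix with two loosely connected groups, the first embedding dimension separates the groups, mirroring how strongly connected pitch classes cluster together in the spaces described here.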
The diffusion time constant matrix τ for these two organizations is shown in Fig.
5.1. Analyzed in this form, there isn’t too much to be concluded or learned. It is
clear that the notes 1 and 3 semitones (minor 2nd and minor 3rd, respectively) from
the tonic are furthest removed from the other pitch classes in the major key (Fig.
5.1(a)). 6, 8, and 10 are also somewhat distanced, and this now leaves us with only
the major scale.
In the minor subset (Fig. 5.1(b)), 1 and 6 are most removed, with 4, 9, and, to a lesser extent, 11 removed, this time deriving the minor scale.
It is noteworthy that the least removed pitch classes of those outside the scale (6 for major and 11 for minor) each serve a relevant role in their respective modes. In the
Figure 5.1: Diffusion time constants between the pitch classes for major and minor keys. (a) Major. (b) Minor.
major case, the tritone of the tonic (6 semitones away) often appears in the secondary
dominant. And, in minor modes, the leading tone (11 semitones above the tonic or
one below) is often used in cadences and other moments when a particularly strong
movement to the tonic is desired.
So, some musical understanding can be gleaned from the time constants in the
matrix view. However, plotting this same information in a few slightly different
contexts can yield much deeper insight.
Rare-interval interpretations
Fig. 5.2 shows the relative time constants for different intervals in major keys. So, in
Fig. 5.2(a), the time constants between all pitch classes with only 1 semitone between
them are shown. The labels show the root of the interval, meaning, for example, that
the bar labeled 2 shows the time constant between pitch class 2 and the pitch class
one interval up from 2. The corresponding plots for the minor keys can be seen in
Fig. 5.3.
These plots tell a much more interesting story regarding the keys, especially in the
context of rare intervals and functional key finding. A rare interval, generally, is one
Figure 5.2: Diffusion time constants between notes separated by various intervals for the major subset. (a) Minor 2nd. (b) Major 2nd. (c) Minor 3rd. (d) Major 3rd. (e) Perfect 4th. (f) Tritone.
Figure 5.3: Diffusion time constants between notes separated by various intervals for the minor subset. (a) Minor 2nd. (b) Major 2nd. (c) Minor 3rd. (d) Major 3rd. (e) Perfect 4th. (f) Tritone.
that occurs only in specific key-dependent situations and therefore can be used in key
identification. In these diffusion time constant plots, a rare interval could be defined
as one that has a few small time constants paired with mostly large time constants,
indicating that the interval has only a few situations where it is highly relevant to the
key and otherwise is highly unusual. Looking at the plots in Figs. 5.2 and 5.3 with this criterion, we can draw some interesting conclusions.
First, the tritone, which was one of the motivations for the rare-interval theory,
is not particularly effective. The tritones with small diffusion time constants are not drastically different from the others, and, more importantly, the tritone is cyclic in 6
semitones, meaning the diffusion time constants repeat in the plot. This is intuitively
obvious, since the tritone bisects the octave, so, in the pitch-class set, the tritone
{B,F} is identical to the tritone {F,B}. Unfortunately, this means there are only
6 tritone intervals possible, and the tritone in C major is indistinguishable from the
tritone in F♯ major (not to mention c minor and f♯ minor). Additionally, the relative
locations of the characteristic intervals in the plots for the major and minor keys are
separated by a minor 3rd, suggesting there may also be some confusion between the
relative major and minor keys. This analysis suggests that the tritone may not be
the best interval for rare-interval key finding.
Another interval that, at first glance, shows potential for key finding is the major
3rd. In both the major and minor subsets, there are only three intervals with small
diffusion time constants, and the rest are quite a bit larger. However, comparing the
case for major keys (Fig. 5.2(d)) with minor keys (Fig. 5.3(d)) reveals that the two
are very similar, offset by a minor 3rd. This suggests that the major 3rd interval would
struggle to distinguish relative major and minor keys. However, the potential for the
interval to distinguish all other keys promotes it to an interval worth investigating.
The final interval that appears to have potential for rare-interval key finding,
according to these plots, is the minor 2nd. In both major and minor keys, the intervals
common to the key are separated by a non-tritone distance, so no two keys will have
the same profile. However, like the major 3rd, comparing the major and minor keys
reveals that, in both cases, the small diffusion time constants are oriented the same
distance apart. Once again, the relative major and minor keys will likely be confused.
Figure 5.4: Key profiles derived from the diffusion time constant compared to the K-K key profiles for major (top) and minor (bottom) keys.
The possibility of using the minor 2nd for rare-interval key finding has previously been
suggested as well [44].
All of these intervals show promise but also potential drawbacks. We will examine
their effectiveness for key finding in diffusion space in Section 5.2.2.
K-K key profile interpretations
We can also compare the first row of the τ matrices to the K-K key profiles, since, in
this context, these two metrics are trying to measure the same relationship between
the pitch classes and the tonic. In order to put these two metrics on the same
scale, we will actually compare the K-K profiles to the negative exponential of the
time constant e−τ . This inverts the time constant’s orientation, so larger is a more
significant relationship and smaller is less significant, as was the case for the K-K key
profiles.
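The rescaling is straightforward. The sketch below assumes the K-K major profile values as published by Krumhansl and Kessler, with a τ row supplied by the diffusion analysis; the function name is illustrative:

```python
import numpy as np

# K-K major key profile (Krumhansl & Kessler), indexed by semitones above the tonic.
KK_MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                     2.52, 5.19, 2.39, 3.66, 2.29, 2.88])

def comparable_profiles(tau_row, kk_profile):
    """Put a row of diffusion time constants and a K-K profile on one scale:
    e^{-tau} flips the time constant's orientation (large = closely related),
    and both vectors are normalized to a maximum of 1."""
    diffusion_profile = np.exp(-np.asarray(tau_row, dtype=float))
    return (diffusion_profile / diffusion_profile.max(),
            kk_profile / kk_profile.max())
```

The two normalized vectors can then be plotted side by side, as in Fig. 5.4.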
The comparison of these two metrics can be seen in Fig. 5.4 (with the K-K key
profiles normalized to a maximum of 1 for comparison). The two are astoundingly
similar, showing identical relationships with only small scaling differences between
the two. This demonstrates that the diffusion space created by the database and its
organization of the notes actually corresponds almost identically to perceptual data.
By no means is this a suggestion that the diffusion process is in any way related to
perception, but it is highly relevant that the two create the same hierarchy for the
pitch classes.
A more likely conclusion is that both the K-K key profiles and the diffusion time
constant provide accurate representations of a fundamental pitch hierarchy that exists
in the perceptual system and therefore has guided much of Western music theory as
well.
5.2.2 Functional Code-Based Key Organization
We now circle back to experimentally examine the conclusions drawn on rare intervals
in Section 5.2.1. Using the diffusion time constants, several suggestions were made
about the viability of the tritone, major 3rd, and minor 2nd in key-finding applications.
Here, we will test these intervals plus a few other functional characteristics of music.
Method
These experiments are conducted on the Bach database, though all works are trans-
posed into all keys to fully use the data. Each track is represented by the number
of occurrences of the interval or set of intervals used in the particular experiment.
The intervals are separately counted for simultaneous occurrences, steps up by the
interval, and steps down by the interval. These counts are then used to create an
organization of the pieces in a diffusion space, and the diffusion time constant is
subsequently calculated.
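The three-way interval count might be sketched as follows; the flat (onset, pitch) score representation and the adjacent-pair traversal are simplifying assumptions, not the dissertation's exact feature extraction:

```python
def interval_counts(notes, interval):
    """Count a single interval's occurrences three ways: sounding
    simultaneously, stepping up, and stepping down.

    `notes` is a hypothetical flattened score: a time-ordered list of
    (onset, pitch) pairs. Only adjacent pairs are inspected, a
    simplification of a full score traversal."""
    simultaneous = up = down = 0
    for (t1, p1), (t2, p2) in zip(notes, notes[1:]):
        step = p2 - p1
        if t2 == t1:
            # Same onset: a harmonic (simultaneous) interval.
            if abs(step) % 12 == interval:
                simultaneous += 1
        elif step > 0 and step % 12 == interval:
            up += 1
        elif step < 0 and (-step) % 12 == interval:
            down += 1
    return simultaneous, up, down
```

The triple (simultaneous, up, down) for each tested interval then forms the feature vector for a piece.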
In order to evaluate the key-finding capabilities, 90% of the data is randomly
assigned as training data (using the same set for all tests), and the keys for the other 10% of the data are determined using a k-nearest neighbors algorithm, defining the neighbors as those closest in the diffusion time constant.
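The evaluation step can be sketched as a k-nearest-neighbor vote over the precomputed τ matrix; the helper name and the choice of k below are illustrative:

```python
import numpy as np
from collections import Counter

def knn_key_labels(tau, labels, train_idx, test_idx, k=5):
    """Label test pieces by a k-nearest-neighbor vote, with neighborhoods
    defined by the diffusion time constant matrix `tau` (small = close)."""
    predictions = []
    for i in test_idx:
        # Rank training pieces by their time constant to the test piece.
        nearest = sorted(train_idx, key=lambda j: tau[i, j])[:k]
        votes = Counter(labels[j] for j in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions
```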
Method      Accuracy
Tritone     55.65%
Major 3rd   76.52%
Minor 2nd   73.04%
Dominant    73.91%
Combo       91.30%
Harmonic    93.91%

Table 5.1: Accuracy for various interval-based key-finding approaches using nearest neighbors in the diffusion time constant.
The tritone, major 3rd, and minor 2nd were all tested separately. A dominant-
based interval relationship was also tested on its own, in which the scores are searched
for occurrences of a descent by a perfect 5th, which is the root progression of the
dominant cadence. Then, all four of these were combined for one test. Finally,
a full harmonic examination was performed, counting the occurrences of all note
combinations simultaneously (in this case, temporal progression was not used).
Results
The results for all of the experiments, which can be seen in Table 5.1, show that all
methods work reasonably well, since the random threshold for 24 classes is 4.17%, and
all methods perform significantly above this. The relative results also support much of
the analysis from Section 5.2.1. First of all, the tritone gives the worst performance,
which was expected based on the potential confusion between parallel majors and
minors as well as keys separated by a tritone. Major 3rd and minor 2nd approaches
performed significantly better, which was also expected, since only relative major and
minor key confusion appears fundamentally problematic for the intervals.
The confusions for these methods can be seen in Fig. 5.5, where incorrect estimates
are shown for the major and minor pieces with important types of mistakes labeled.
The corresponding diffusion maps (in the predominant dimensions) are shown in Fig.
5.6. In the diffusion maps, it is possible to see why the common confusions for each
approach occur.
Figure 5.5: Confusions for all 6 functional key-finding experiments with noteworthy confusions labeled. (a) Tritone. (b) Major 3rd. (c) Minor 2nd. (d) Dominant. (e) Combo. (f) Harmonic.
Figure 5.6: Diffusion maps for all 6 functional key-finding experiments. (a) Tritone. (b) Major 3rd. (c) Minor 2nd. (d) Dominant. (e) Combo. (f) Harmonic.
In the tritone case, the errors are mostly distributed, though there are peaks at
both tritone intervals, as expected. This confusion trend, seen in Fig. 5.5(a), is
reinforced in the diffusion map (Fig. 5.6(a)) in which only 6 of the keys can even be
seen. This is because the confusion between tritone-separated keys is so significant
that the keys one tritone above (or below) the visible keys are covered up visually.
Note that the tritone is the only key-finding algorithm shown here that ever confuses
a key with a key one tritone away. It is also interesting that the majority of the errors
for the tritone interval guess a minor key.
However, for the major 3rd and minor 2nd experiments (Figs. 5.5(b) and 5.5(c),
respectively), relative major and minor confusions clearly dominate the errors. This
was also predicted from the diffusion time constant plots in Figs. 5.2 and 5.3, and,
in the diffusion maps (Figs. 5.6(b) and 5.6(c), respectively), this same trend can be
seen, where the separation between the dominants (along the circle of fifths that the
structure is built around) is much more significant than the separation between the
relative majors/minors.
The dominant approach is intended to utilize the prevalence of the dominant
cadence as a key-finding cue. Most Western music has a dominant cadence to the
tonic at the end of the piece, and often at the end of phrases or other fundamental
segments as well. This approach certainly has its potential flaws as well, particularly
in that, not only do parallel major and minor keys have the same roots for the
dominant cadence, but the perfect 5th interval can occur in numerous contexts and
is not necessarily key-specific. The hope, though, is that the profile of perfect 5th
intervals will indicate the proper key. And, it is clear in Table 5.1 that this is a
relatively effective approach, yielding comparable results to the best rare-interval
results. And, as expected, many of the errors are relative major and minor confusions,
as seen in Fig. 5.5(d). Interestingly, the map (Fig. 5.6(d)) would lead us to believe
that the common error would be between relative majors and minors, rather than
parallel, but the confusions show this not to be the case. The parallel major/minor
confusion is likely represented in a lower (though clearly still significant) dimension
of the map.
One noteworthy observation in these four approaches is that their diffusion maps
Method                   Accuracy
K-S                      93.75%
K-S + diffusion filter   98.96%

Table 5.2: Accuracy for the K-S key-finding algorithm before and after processing the data with a filter derived from hierarchical clustering in the diffusion space.
(Figs. 5.6(a)-5.6(d)) show different orientations of the keys, particularly with regard to
the connection between major and minor keys. While they are all fundamentally built
on two circles of fifths, those circles are not positioned in the same way in every case
(for example, in Fig. 5.6(b) the minor circle is within the major circle, while, in Fig.
5.6(c) they are stacked on top of each other). This suggests that the approaches may
be complementary, which is also reinforced by the different types of error encountered.
So, we would expect that combining them would yield good improvement, and this
turns out to be the case. Even though none of these methods achieves greater than
77% accuracy, combining them all gives an accuracy of 91.30%. The majority of the
errors are once again relative major and minor mistakes.
Finally, the best results are seen from the full code set of all harmonic combinations
of notes. Even without temporal progressions, a full count of all note combinations
correctly identifies 93.91% of the keys, and once again, the majority of the errors are
confusions between the relative major and minor keys.
5.2.3 Extending the K-S Algorithm with Clustering
It is also possible to use distributional inputs for diffusion-based key finding. One
simple example is to use the pitch-class distributions used in the K-S key-finding
algorithm. Clustering the Bach database (this time without transposing into all
keys) into a hierarchical tree from the diffusion time constant yields the tree in Fig.
5.7. In this plot, the three incorrectly grouped works are circled in red (this equates
to an accuracy of 96.88%). These errors are Prelude No. 10 in E minor (BWV 855)
from Book 1 and Prelude No. 2 in C minor (BWV 871) and Prelude No. 11 in F
minor (BWV 881) from Book 2.
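Such a tree could be built, for example, with SciPy's agglomerative clustering, treating τ directly as a pairwise distance. The average-linkage choice below is an assumption, not necessarily the dissertation's:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_by_tau(tau, n_clusters):
    """Agglomerative tree from a symmetric diffusion time constant matrix,
    treating tau directly as a pairwise distance."""
    tau = np.array(tau, dtype=float)      # copy before modifying
    np.fill_diagonal(tau, 0.0)            # a piece is zero distance from itself
    tree = linkage(squareform(tau), method='average')
    return tree, fcluster(tree, t=n_clusters, criterion='maxclust')
```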
Figure 5.7: The hierarchical tree created from pitch-class distributions, labeled with key, with errors circled in red.
If the K-S key-finding algorithm is applied directly to these same pitch-class dis-
tributions, it correctly labels 93.75% of the pieces, incorrectly labeling Fugue No. 3
in C♯ major (BWV 848) and Prelude No. 21 in B♭ (BWV 866) from Book 1 and
Prelude and Fugue No. 15 in G major (BWV 884), Fugue No. 19 in A major (BWV
888), and Prelude No. 23 in B major (BWV 892) from Book 2 (6 total mistakes).
However, these mistakes are completely different from those made in the diffusion
clustering, and, as it turns out, are a different type of mistake.
Diffusion Cluster Filtering
We can extend the K-S key-finding algorithm by inserting the hierarchical clustering shown in Fig. 5.7 as a pre-processing step. We can take advantage of the unsupervised grouping that has been performed in the diffusion space by filtering each object together with those grouped near it, essentially making objects deemed to be similar even more similar.
To create the filter for this, we first define the distance between two works in the
tree as simply the number of branches between them, or the number of steps in the
tree it takes to move from one data point to another. A filter can then be defined
from this distance. In this case, we will use a simple filter that only filters objects
that are within 3 steps of each other, and gives an extra weight to each step closer
than 3 that the two objects are. This matrix then needs to be normalized for filtering.
f(x_m, x_n) = \frac{i(x_m, x_n)}{\sum_{p=0}^{K-1} i(x_m, x_p)}, \quad \text{where } i(x_m, x_n) = \begin{cases} 3 & \text{0 steps between } x_m \text{ and } x_n \\ 2 & \text{2 steps between } x_m \text{ and } x_n \\ 1 & \text{3 steps between } x_m \text{ and } x_n \end{cases}
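This filter can be sketched directly from a matrix of pairwise tree-step counts; the function names are illustrative:

```python
import numpy as np

def filter_weights(steps):
    """Normalized filter from pairwise tree distances (steps[m][n] = number
    of branches between works m and n), using the piecewise weights
    i = 3, 2, 1 for 0, 2, and 3 steps and 0 otherwise."""
    steps = np.asarray(steps)
    i = np.zeros(steps.shape, dtype=float)
    i[steps == 0] = 3.0
    i[steps == 2] = 2.0
    i[steps == 3] = 1.0
    # Normalize each row so the filter acts as a weighted average.
    return i / i.sum(axis=1, keepdims=True)

def apply_filter(f, features):
    """Replace each work's pitch-class distribution with a weighted average
    of its tree neighbors' distributions before running K-S key finding."""
    return f @ features
```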
By inserting this filter in front of the K-S key-finding algorithm, the accuracy
improves to 98.96%, only erring in the label of Fugue No. 15 in G major (BWV 884)
from Book 2. The two performances can be seen side-by-side in Table 5.2.
Method                    Accuracy
Euclidean                 89.86%
Diffusion time constant   91.30%

Table 5.3: Accuracy for the meter induction task using nearest neighbors with both Euclidean distance and the diffusion time constant.
5.3 Meter Induction
Meter is another high-level musical attribute that humans extract effortlessly but computers struggle with. As with key, algorithms that determine meter are an important step towards a full computational understanding of music.
Method
Most approaches for computational meter induction use a feature set derived from
autocorrelation of some representation of the musical signal. Sometimes these repre-
sentations incorporate high-level information such as melodic contour or perceptual
accent.
Here, we will use one of the simplest representations, which is to only count
onsets. To extract this vector from a musical score, simply count how many notes
have their onsets at any given rhythmic unit. This is collected into a time series,
which we then input into an autocorrelation function. One small extension is then made: the coefficients of the autocorrelation function at lags that are multiples of 2, 3, 4, etc. are summed together. These sums are collected into one vector which is used as the input. This extension is simply added to encourage the grouping of duple and triple meters.
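A sketch of this feature extraction, with the mean-centering and lag-0 normalization added as illustrative choices:

```python
import numpy as np

def meter_features(onset_counts, max_period=9):
    """Meter-induction features: autocorrelate the onset-count series, then
    sum the coefficients at multiples of each candidate period so that
    related duple (or triple) periodicities reinforce each other."""
    x = np.asarray(onset_counts, dtype=float)
    x = x - x.mean()                                   # center before autocorrelation
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]  # lags 0 .. len-1
    ac = ac / ac[0]                                    # normalize by lag-0 energy
    feats = []
    for period in range(2, max_period + 1):
        lags = np.arange(period, len(ac), period)      # period, 2*period, ...
        feats.append(ac[lags].sum())
    return np.array(feats)
```

For a strongly triple onset pattern, the summed coefficient at period 3 dominates the one at period 2, which is exactly the grouping behavior the extension is meant to encourage.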
The onset-based vectors were calculated for 686 melodies of Germanic origin from
the Essen folksong database [65]. The classification was then performed with a k-
nearest neighbors algorithm in both Euclidean and diffusion space. The training set
was randomly selected as 80% of the data, and the remaining data was used for
testing.
Figure 5.8: The first three dimensions of the diffusion map for meter classification on the Essen folksong database, colored by meter label.
Results
The classification accuracy for both the diffusion space and Euclidean space can be seen in Table 5.3. Though the nearest neighbors algorithm performs quite well in Euclidean space, the move to diffusion space improves the accuracy by about 1.5 percentage points.
However, beyond this improvement, diffusion offers the advantage of visualization.
The first three dimensions of the diffusion map are shown in Fig. 5.8 with data
points colored by meter. In the diffusion space, the folksongs with duple and triple meter are organized approximately into their own planes. The duple plane and triple plane meet, forming a V-shaped organization. However, at the joining point, there is mixing between the two data sets, indicating the separation for those melodies is not as clear.
Fig. 5.9 shows the same plot, but resized according to errors. The training data
is essentially gone from this plot. Test data that was labeled correctly is plotted
with smaller dots and mistaken labels are plotted with large dots. As we would
expect from Fig. 5.8, the errors are concentrated along the spine of the V where the
Figure 5.9: The same as Fig. 5.8, with the test data indicated by larger size, and errors in labeling of the test data largest.
duple and triple meter planes meet. We could potentially predict the reliability of a
classification based on its proximity to this spine.
So, in addition to a small improvement to the results, moving the classification to
diffusion space also offers the ability to visualize the classification itself. Based on the
location of a data point with an unknown label in this space, it can be determined
how certain the classification is, and, on a higher level, how strong the metric identity
of the music itself is.
5.4 Visualization of Trajectories
Creating a visualization of a musical excerpt can provide an interesting artistic tool
for experiencing music in a multimedia environment. A visual analog can provide
a different type of insight into a musical piece, and it can also create intuitive and
natural representations of musical concepts without requiring a rigorous background
in music theory. In this way, a good visualization can serve as an analytic tool, an
artistic extension, and a didactic mechanism, all at the same time.
Diffusion mapping is ideal for creating musical trajectories for several reasons.
First of all, as has been shown repeatedly, the dimensions of the diffusion map hi-
erarchically represent the global structure of a data set. This allows for automatic
creation of the trajectory. Also, any affinity function can be used to relate sections of
the music. So, the flexibility of the diffusion process gives the opportunity to design a
system for plotting desired characteristics. The fact that the diffusion space in which
the trajectory exists is created specifically for the data is advantageous as well, since
this means there is no need for some universal space that works less well for some
excerpts than for others.
To create the trajectories that follow, a very simple representation was used. A
musical score is broken into small, overlapping windows, only one or two beats in
duration. The affinity between these windows is defined exclusively by how many
notes they share in common at the same relative time.
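A minimal sketch of this affinity, assuming a hypothetical window encoding as a set of (beat offset, pitch) pairs (the dissertation's actual data representation may differ):

```python
def window_affinity(w1, w2):
    """Affinity between two score windows, defined (as in the text) as the
    number of note events the windows share at the same relative time.
    Each window is a set of (beat_offset, pitch_name) pairs -- an encoding
    chosen for this sketch, not taken from the dissertation."""
    return len(w1 & w2)

# Two overlapping one-beat windows from a C major arpeggio:
w1 = {(0.0, "C4"), (0.5, "E4")}
w2 = {(0.0, "C4"), (0.5, "G4")}
affinity = window_affinity(w1, w2)   # the windows share only the C4 at offset 0.0
```

Windows drawn from the same chord share many such events and receive a high affinity, while harmonically unrelated windows share few or none.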
This approach creates an interesting diffusion space, where the trajectory of the
music is the movement of the excerpt within that space. Different regions will cor-
respond to harmonic distinctions defined by the interval combinations in the music,
because chords that share notes will also share a relatively high affinity. Also, the
circular connectivity of inversions and variations on chords reinforces this closeness
in diffusion space. These harmonic regions will be related to each other by harmonic
pathways created by the presence of intermediate note combinations between those
regions (such as a G major chord and an E minor chord being connected by their shared notes {G, B}).
However, the time window will also connect harmonic regions that are located
temporally close to each other in the music. As the music progresses from one chord
or note to another, the sliding windows that cover that transition create steps between
those two states. This essentially creates a pathway for the diffusion process to travel
between the two states, and the more pathways that are created, the closer the regions
will be in the diffusion space.
So, the diffusion space will have harmonic regions that are drawn close to each
other by harmonic and temporal commonalities. The collection of these pathways
determines the organization.
Figure 5.10: The melody of Twinkle, Twinkle, Little Star
All of the trajectories to be shown can also be seen as animations online [68].
This is highly recommended, because the addition of temporal movement visualizes
elements like harmonic progression and rhythm. Also, watching the animations in
sync with the music makes it much easier to understand the harmonic layout and the
regions of the diffusion space.
5.4.1 Twinkle, Twinkle, Little Star
Fig. 5.11 shows the trajectory created in the first three dimensions of the diffusion
space for the melody Twinkle, Twinkle, Little Star (Fig. 5.10). This melody is mono-
phonic, and so the orientation of the space is completely determined by the temporal
relationships. That makes this trajectory a particularly good initial example, since
the entire organization is the product of only one characteristic: the melody itself.
To help see how the temporal pathways shape the harmonic regional distribution, the
notes in the melody are marked in the trajectory.
It is very easy to see in this picture how the temporal connections form a pathway
between the notes, and how these connections create the organization in the diffusion
space. Moving along the exterior of the circle in a clockwise direction follows the con-
nections of the first four measures of the melody. These pathways show particularly
clearly how the organization is affected by the connectivity, because the A is only
connected to the G, and as a result is positioned very close to it. The A shares no
direct temporal relationship in this particular melody with any note but G, and this is
clearly reflected in the trajectory. The circular shape of the first four measures also
shows that the phrase starts and ends in the same state (the tonic).

Figure 5.11: The trajectory for Twinkle, Twinkle, Little Star with the individual notes marked.

The pathway connecting the D and G is a product of the movement from measure
6 to 7. This is the only interval in the entire melody that is not stated in the first
four measures. As a result, the D and G are pulled closer to each other. This is
reflected in the vertical dimension. The pathway between the D and G is the spine
of a saddle shape that the exterior circle creates, with the two ends (C on one side
and F and G on the other) dipping downward. So, in this way, D and G are pulled
together while still maintaining the distance between the notes that do not share a
direct connection.
Through these connections, a shape is created that represents the melodic move-
ments of Twinkle, Twinkle, Little Star and the corresponding harmonic implications.
However, it is important to understand that, because no preconditions of music theory
were imposed on the diffusion mapping, the visualization is designed entirely based
on the music. Purely in the context of this melody, C and E have no relationship
except through D. A has no relationship to any other note except through G (and, in
the case of E, through multiple notes). Diffusion mapping takes these relationships
and maps them appropriately in a low-dimensional space.
5.4.2 Prelude No. 1 in C major (BWV 846) from The Well-
Tempered Clavier, Book 1
Bach's Prelude No. 1 in C major (BWV 846) from The Well-Tempered Clavier,
Book 1 (Fig. 5.12), with its arpeggiated character, more sophisticated underlying
harmonic progression, and significantly longer duration, is far more elaborate than
Twinkle, Twinkle, Little Star, and as a result its trajectory, seen in Fig. 5.13, is
much more complex.
One of the most pronounced characteristics of this trajectory is the sections that
jump away from the most common region. In Fig. 5.13, the lower left portion is the
area of most common activity, with the paths leading out into the upper right as the
unusual projectiles. These sections correspond approximately to parts of measures
14-17 and 20-24. Looking at the score, these are sections of tonal drift and harmonic
unrest. In the first section (14-18), the unrest is set off by a diminished seventh in
measure 14, and it is followed by several progressions that do not strongly pronounce
the key or push harmonic movement forward (an inverted tonic, inverted IV7, and ii7).

Figure 5.12: Score of Bach's Prelude No. 1 in C major (BWV 846) from The Well-Tempered Clavier, Book 1.

Figure 5.13: The trajectory for Bach's Prelude No. 1 in C major (BWV 846) from The Well-Tempered Clavier, Book 1.
The original key is not restored until the dominant-tonic statement in measures 18-19.
A similar tonic drift occurs in measures 20-24, in which the subdominant is tonicized
(m. 20-21), followed by a movement toward the dominant. Once the dominant is
restored in measure 24, the trajectory has returned to the lower left region of Fig.
5.13.
Another noteworthy characteristic of the piece that is visually communicated in
the trajectory is the cyclic nature of the tonal progressions. With few exceptions,
every measure consists of an arpeggiated chord repeated twice (with two smaller rep-
etitions of the final three notes within each repetition). These cycles are represented
in the trajectory by circular paths.
On a general level, the trajectory is essentially built of relatively straight paths
with junction points where the path turns sharply. At each of these junction points,
there is a thick joint that, when viewed from the appropriate angle, is seen to be
circular. Several of these circular junctions can be seen in Fig. 5.13, with the most
clear in the center of the plot in orange or near the upper right in green.
These circular junctions communicate the cyclic nature of the tonal progressions
in the prelude. Also, the scale of the circles is accurate to the musical experience.
This piece is not one large cyclical movement (unlike the opening phrase of Twinkle,
Twinkle, Little Star). Instead, it is built from numerous atomic cycles that create
the harmony of the piece. This element is precisely represented in the trajectory.
To give a clearer picture of these circular junctions, the section of the trajectory
that corresponds to the first four measures is shown on its own in Fig. 5.14. Here,
the greater shape is a closed triangle, because the first four measures happen to begin
and end in the tonic. However, the elbows of the triangle can clearly be seen here to
have a circular shape. Each of these circles represents a chord, and they then group
together by their associations to form the tonal regions. Here, the corners of the
triangle correspond to the tonic, dominant, and subdominant tonal regions. As we
already showed in Fig. 5.13, these harmonic regions all align together, separate from
the regions of tonal drift already discussed.
Figure 5.14: The trajectory for Bach's Prelude No. 1 in C major (BWV 846) from The Well-Tempered Clavier, Book 1, with only the first four measures shown.
5.4.3 Robustness to Performance Noise
The connectivity-based organization approach that diffusion utilizes is highly robust
to noise and distortion. This is because, while low-level noise may have a significant
effect on the specific distances between data points, it takes a great deal more noise
to significantly affect the orientation of the local geometry, and even more to affect
the global geometry. So, in the diffusion space, the dimensions corresponding to
larger eigenvalues (and therefore representing the global structure) should be largely
unaffected by reasonable amounts of noise. Instead, the noise is restricted to the
dimensions with smaller eigenvalues, since the noise is a largely local effect. Of
course, everything has a breaking point, but the hierarchically structured nature of
the diffusion space makes the process more robust to noise.
To demonstrate this, performance-like noise was synthetically added to Bach’s
Prelude No. 1 in C major (BWV 846) from The Well-Tempered Clavier, Book 1.
The timing of the note onsets was varied by an amount determined by a sinusoidal
oscillation with random noise added in. Furthermore, occasional random notes were
added in, to simulate mistakes.
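The timing-distortion part of this noise model can be sketched as follows. The oscillation rate and the amplitude of the random component are illustrative values chosen for this sketch; the dissertation does not specify them.

```python
import math, random

def jitter_onsets(onsets, depth, rate=0.5, noise=0.02, seed=0):
    """Perturb note-onset times with a sinusoidal oscillation plus random
    noise, in the spirit of the robustness experiment.  `depth` is the peak
    sinusoidal timing deviation in seconds (e.g. 0.05, 0.1, 0.2); `rate`
    (Hz) and `noise` (seconds) are assumed values, not taken from the text."""
    rng = random.Random(seed)
    return [t + depth * math.sin(2 * math.pi * rate * t)
              + rng.uniform(-noise, noise)
            for t in onsets]

onsets = [i * 0.25 for i in range(16)]        # a bar of even sixteenth notes
low  = jitter_onsets(onsets, depth=0.05)      # mild distortion
high = jitter_onsets(onsets, depth=0.20)      # severe distortion
```

Errant notes would be simulated separately by inserting occasional random pitches into the note list.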
Fig. 5.15 shows the trajectories for these signals, with several levels of distortion.
The lowest level (Fig. 5.15(a)) distorts the timing by ±0.05 seconds and adds in five
errant notes. With this minor noise included, the trajectory still looks identical to
the undistorted trajectory (Fig. 5.13). Increasing the distortion to ±0.1 seconds and
10 errant notes, the trajectory (Fig. 5.15(b)) still looks very similar. Even at the
highest level of distortion, with the timing adjusted by ±0.2 seconds and 20 mistakes
added, the trajectory is still recognizably similar, although at this point the effects
are shown by the instability of the pathways and small warbles in the trajectory.
However, considering the level of distortion in the music at this point, this is still an
impressive demonstration of robustness.
It is also worth noting that, in all of the cases, the harmonic regional layout is
still unaffected, because, even with all the distortion, the harmonicity of the music
has not changed.
The growing instability of the trajectory and variations in the timing information
are more salient in the animations available online [68], and the consistency of the
regional layout despite these distortions is clearly visible as well.

Figure 5.15: Several trajectories for Bach's Prelude No. 1 in C major (BWV 846) from The Well-Tempered Clavier, Book 1, with different levels of performance-like noise synthetically added: (a) low noise, (b) mid-level noise, (c) high noise.
5.4.4 Audio Signals
So far in this work, only symbolic musical representations have been used for the
demonstrations and experiments. However, the abilities of diffusion extend into the
domain of audio signals as well.
To demonstrate this, we will recreate the trajectory for Twinkle, Twinkle, Little
Star (Fig. 5.11) from an audio signal of the melody, played by a piano. However, in
order to accomplish this, a second level of diffusion needs to be added, creating what
we will call a diffusion super graph. In this case, the process breaks the harmonic and
temporal relationships into two separate levels.
The first layer of diffusion takes the Discrete Fourier Transform (DFT) of a win-
dowed segment, 100ms long. Only the magnitude of these Fourier coefficients is used
as the features for the window. The time slices can then be organized with a diffu-
sion map based on the affinity of the Fourier coefficients, which will group individual
notes (or, for a more musically rich signal, chords) together into their harmonic re-
gions. However, unlike in the symbolic trajectories, in this case only the harmonic
relationships will create this space, because the temporal progression is ignored at
this level. But, the physics of resonating sound, with multiple harmonic frequency
components in a single sound source, does play an important role at this level. The
affinity between two frames will be determined in large part by the number of har-
monics they share in common. So, obviously two slices of the same note will have a
high affinity. But, important relationships like the perfect 5th will also have a relevant
role here.
The second layer of the graph tracks the movement of the signal within this space
and breaks this movement down into small sections, several frames long. The distance
between two such sections (which can be derived from the diffusion distance, the diffusion
time constant, or the Euclidean distance in a subspace of the full diffusion space) then
defines the affinity of the second layer of the super graph. Theoretically, additional
layers can be added to the graph indefinitely, if desired.
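A minimal sketch of the first layer's features and affinity follows. The plain O(N²) DFT and the Gaussian kernel are assumptions made for this sketch (a real system would use an FFT, and the dissertation does not commit to this exact kernel here); the example only illustrates why two frames of the same pitch receive a high affinity while frames of different pitches do not.

```python
import cmath, math

def dft_magnitudes(frame):
    """Magnitudes of the DFT of one windowed segment -- the features used
    by the first layer of the diffusion super graph.  A plain O(N^2) DFT
    is used here for clarity."""
    N = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N)))
            for k in range(N // 2)]

def spectral_affinity(a, b):
    """A Gaussian affinity on magnitude spectra (one common choice,
    assumed for this sketch)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2)

N = 64
tone_a = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]        # one pitch
tone_b = [math.sin(2 * math.pi * 5 * n / N + 0.3) for n in range(N)]  # same pitch, new phase
tone_c = [math.sin(2 * math.pi * 9 * n / N) for n in range(N)]        # different pitch
same = spectral_affinity(dft_magnitudes(tone_a), dft_magnitudes(tone_b))
diff = spectral_affinity(dft_magnitudes(tone_a), dft_magnitudes(tone_c))
```

Because only magnitudes are kept, the phase shift between the first two tones is invisible and their affinity is essentially 1, while the third tone shares no spectral peak with them and its affinity collapses toward 0.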
Figure 5.16: The trajectory for Twinkle, Twinkle, Little Star, derived here from an audio signal instead of symbolic music (as was the case in Fig. 5.11).
By following this two-layered approach, the trajectory seen in Fig. 5.16 was
created from the audio signal. A quick comparison shows the layout to be quite
similar to the trajectory derived from the symbolic data in Fig. 5.11. The same
circular exterior is seen, and the spine across the top of the overall saddle shape is
present as well. The most noteworthy difference is that the small loop that connects
G and A (in the upper right of the symbolic trajectory) is stretched out significantly
further (comparatively to the rest of the structure). This is likely because of the
largely unconnected harmonic sets for these two notes. So, while the two notes are
pulled together temporally, G feels a stronger harmonic connection to other notes,
most particularly C and D. D and A have a harmonic relationship as well, but their
lack of temporal proximity appears to override this connection in the organization.
The success of this experiment makes two important points. First, the diffusion
analysis in this work is not specifically restricted to symbolic music and can be ex-
tended to audio music as well, an important feature for general applicability to many
music applications. Second, this suggests that the diffusion analysis is in fact ex-
tracting deeper musical information than simply what is available at face value. The
audio signal and symbolic data share no superficial characteristics in common. They
are completely different representations of the exact same musical sequence, but the
only way that similarity can be seen is through an understanding of that underly-
ing musical structure. Diffusion mapping creates a space in which these different
representations map to that musical structure consistently.
5.5 Conclusion
This chapter presents a wide range of applications for diffusion mapping in music
analysis. In key analysis, diffusion space was used for functional key analysis, and
the organizations created with structural input improved the performance of the K-S
key-finding algorithm through a cluster-based filter built in the diffusion space. In
meter induction, not only does the diffusion-based approach improve on the same
algorithm in Euclidean-space, but the regions of confusion can easily be seen in the
diffusion space.
Trajectories in diffusion space are a particularly interesting application. The visu-
alizations created were shown as means for analyzing and understanding the music.
The visualizations project characteristics of the music into an intuitive representation
that communicate the musical structure without requiring an actual understanding
of the theory behind the structure. Circular structures, big and small, communicate
cyclical sections of music. Tonal drift is shown through regional drift. In this way,
these trajectories are not unlike the geometric representations shown in Chapter 4,
in that they give an intuitive visual representation of high-level musical concepts.
Combining this ability with the creation of the same trajectory from symbolic
and audio representations of a melody demonstrates clearly that the structures and
analyses created in diffusion space are based on deeper musical concepts rather
than simply superficial features. This is a powerful conclusion and summarizes the
important role that diffusion analysis can play in musical applications.
Chapter 6
Conclusions and Future Work
6.1 Summary of Contributions
This dissertation has introduced a fundamentally novel approach for music analysis
at multiple levels, and so it is difficult to isolate the contributions. The type of
analysis performed is completely new, particularly the theoretical-level analysis in
Chapter 4. However, there are a few particularly salient developments that are worth
emphasizing here.
6.1.1 Diffusion Time Constant
The diffusion time constant is a new metric for diffusion space, and makes a contribu-
tion to the field of diffusion mapping. It extends the diffusion distance in cases where
selecting an optimal time parameter t is difficult, or where the diffusion distance is
insufficient for simultaneously representing the full hierarchy of the structure.
Because finding the diffusion time constant requires utilizing Newton’s method
for solving the sum of exponentials, it can take a significant amount of computation
to calculate, and therefore is not ideal for every application. In many cases, using the
diffusion distance or selecting a few optimal dimensions of the diffusion map provides
a perfectly sufficient metric for analysis. But, in certain cases, such as automatic
processes where the time parameter t cannot be manually selected, or hierarchical
CHAPTER 6. CONCLUSIONS AND FUTURE WORK 120
clustering where encoding all levels of the structure is desired, the diffusion time
constant provides a valuable extension to diffusion analysis.
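The computation alluded to above — Newton's method applied to a sum of exponentials — can be sketched as follows. The particular function solved here, f(t) = Σₖ cₖ λₖ²ᵗ (the form of the squared diffusion distance at diffusion time t), and the half-decay stopping criterion are illustrative assumptions for this sketch; the dissertation's exact definition of the diffusion time constant is given in its Chapter 3 and is not reproduced here.

```python
import math

def time_constant(lams, coefs, frac=0.5, t0=0.0, iters=50):
    """Newton's method on the sum of exponentials
        f(t) = sum_k c_k * lam_k**(2*t),
    returning the time t at which f has decayed to `frac` of its t=0 value.
    `lams` are eigenvalues in (0, 1); `coefs` are positive weights."""
    target = frac * sum(coefs)
    t = t0
    for _ in range(iters):
        f  = sum(c * lam ** (2 * t) for c, lam in zip(coefs, lams))
        fp = sum(c * lam ** (2 * t) * 2 * math.log(lam)   # derivative df/dt
                 for c, lam in zip(coefs, lams))
        step = (f - target) / fp
        t -= step
        if abs(step) < 1e-12:
            break
    return t

# Single-mode sanity check: lam**(2t) = 1/2  =>  t = ln(1/2) / (2 ln lam)
t_half = time_constant([0.8], [1.0])
```

Since f is convex and decreasing for eigenvalues in (0, 1), Newton's iteration from t = 0 converges monotonically to the root, which is what makes this computation reliable if somewhat expensive.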
6.1.2 Assumption-Free Music Analysis
All analyses performed in this work were accomplished with as few assumptions as
possible. In this way, we were actually able to derive music theory, rather than just
confirm that the music fits a set of preconceived notions. This type of analysis, partic-
ularly on the interval and meter level, is completely new and unique. Computational
music analysis has previously sought many elements of music like phrasing and theme,
but the approach presented here of building a computational understanding of music
theory from the ground up is novel.
The lack of assumptions is particularly significant, because previous computational
approaches often tried to incorporate musical knowledge, rather than exclude it. In
some ways, the inclusion of prior knowledge can be valuable, but the assumption-free
analysis shown here offers a new perspective on music. First of all, it allows for the
truly individual analysis of a single musical work. The trajectories shown in Section
5.4 are unique in this way, because not only is their movement through the diffusion
space based on the music, but the space itself is based only on the music. As a result,
the diffusion space is catered specifically to the relationships created in the music,
rather than trying to adapt the representation to fit a preconceived musical space.
On a higher level, the use of assumption-free analysis also demonstrates how
fundamental some of the mathematical relationships of music are. The diffusion
approach presented here does not even truly utilize the knowledge that the data is
music. Instead, it is only a series of objects connected by relationships defined by the
“data.” But, still many significant representations of music theory are created at a
fundamental level. This is a powerful realization about the structural importance of
music theory in the musical data. And, it can only be truly demonstrated by starting
without any assumptions at all.
6.1.3 Musical Visualizations
One of the most exciting and interesting contributions of this work is an easy and
conceptually reasonable process for creating musical trajectories for visualization.
As will be discussed in the Future Work section, there are many ways to extend
these visualizations for the future, but the work accomplished here is already quite
significant and novel.
By breaking symbolic music into a time series of sliding windows, the piece is
visualized as a pathway through a diffusion space created specifically for the musical
relationships. One easy application of this visualization would be for a multime-
dia musical experience, where the visual pathway is directly related to the auditory
experience. But, because the regions within the space are organized based on the
relationships within the music, the movement of the pathway within the space is
meaningful on an analytic level as well. Harmonic regions that are commonly asso-
ciated within an excerpt will be brought closer together while those that are only
rarely connected will be pushed further apart. In this way, a long distance
travelled by the pathway could relate to the perceptual cost of a rare harmonic shift.
The type of movement can be informative as well, representing temporal patterns.
Music has previously been visualized as pathways through a space, but the assumption-
free and automatic process that diffusion allows is a unique and interesting contribu-
tion to computational music analysis.
6.2 Future Work and Extensions
The work presented in this dissertation inspires the possibility of many future projects.
Through both improvements and extensions, there are many avenues to progress from
here.
6.2.1 Audio Signal-based Analysis
The most obvious extension is to perform the musical analyses presented in this
work on musical signals rather than symbolic music. The feasibility for this was
demonstrated for one trajectory in Section 5.4.4, but a more thorough examination
would be beneficial.
A particularly interesting effect of the move to the signal domain is briefly men-
tioned in Section 5.4.4, the effect of the harmonics created by resonant bodies. In
diffusion spaces derived from symbolic music, the spatial relationships are based ex-
clusively on the relationships in the music itself, either temporally or harmonically.
However, acoustic notes resonate at multiple frequencies simultaneously, the funda-
mental and several frequencies at multiples of the fundamental. The relative overlap
of these partials adds an extra relationship for the organization, in addition to the
musical relationships already examined in the symbolic context.
Expanding into the signal domain also allows for the organization of other types
of information, such as instrumentation or timbre. This would add yet another layer
of organization and analysis.
The signal domain also presents the opportunity for other applications for organi-
zation. For example, the first layer of the super graph in Section 5.4.4, in which the
tonal relationships between different spectral coefficients are established, suggests that
diffusion organization could potentially be used for note transcription. Other signal-
level classification tasks such as instrument identification could be solved similarly as
well.
6.2.2 Improved Visualization Platform
The visualizations presented in this work and on the web were all created in Matlab.
This environment is sufficient for the examinations here, but a more visually dynamic
and interactive platform would be an extremely valuable extension. Representing the
data points as dots in a static 2-D projection gives a good starting point, but it is
not hard to envision numerous possible improvements.
One such improvement would be to plot the trajectories as real trajectories instead
of a series of dots. The pathways would be more clearly marked in this context, and
it would more accurately represent the flow of the music, rather than the discretized
concept that the dots present. Identifying clear structures in the diffusion space, such
as cones or planes, and plotting them as these shapes rather than just the data points,
could be a valuable extension for making the visualizations more accessible as well.
The viewing perspective could also be changed in several ways. Interactive control
of the angle during an animation in particular would allow for the structures to be
more easily seen. An even more interesting possibility is for the user to move the
perspective within the diffusion space, creating the concept of flying through the
data. And, in the trajectory case, the perspective could even ride the trajectory
through time, as if on a roller coaster.
These extensions to the visualization would be a direct way to improve the us-
ability of diffusion-based music analysis and enhance the user experience of systems
derived from the analysis. As shown in this work, there is a great deal of poten-
tial for diffusion mapping as a musical visualization tool, for exploration, education,
and artistic experience, but the potential would be vastly enhanced with a visually
dynamic and fluidly interactive platform.
6.2.3 Comparison of Diffusion Spaces
Throughout this work, diffusion spaces are created from small sets of musical data.
Because the diffusion analysis is assumption-free, these spaces are based entirely on
their musical input, and so they are unique to those musical sets. Unfortunately, one
drawback to the musical representations existing in different spaces is that they cannot
be directly compared to each other. Because the spaces have different meanings and
orientations, they cannot simply be treated as the same space for comparison.
So, it would be a significant extension to develop a mechanism for comparing these
unique diffusion spaces and the trajectories or structures within them. This would be
valuable for several reasons. First, it would allow for a deeper understanding of the
diffusion spaces themselves, since they could be contextualized with each other. The
truly unique characteristics of a certain musical space could be more readily identified
based on its comparison to other spaces. This would also provide an interesting
metric for comparing the works themselves, possibly even extending to a database
organization.
The ability to compare diffusion spaces would also open up the possibility for
deeper uses of the super graph concept introduced in the context of the audio signal
visualization in Section 5.4.4. Layers of diffusion could be connected through this
extension, creating simultaneous organizations of multiple musical levels. Diffusion
maps could be created for every measure of a musical excerpt, which could then be
organized into a map for the entire work, which could then be compared to other
works to organize a corpus, and so on.
There are several approaches that could work for solving the problem. One way
would be to compare the relative orientations of some set of reference points common
to both graphs (such as the pitch classes). Another would be to look for similar
shapes in the musical trajectories in the diffusion space, an approach that has been
previously used in comparison of images. And there are countless other potential
methods. There are many intriguing possibilities that follow from the development of
a comparison between diffusion spaces, well worth the effort of developing a reliable
means of calculating the relationship.
6.2.4 Implications for Non-Tonal Western or Non-Western
Music
All examples presented in this work used tonal Western music or tonal Western mu-
sic theory for the musical data. However, as has been mentioned many times, the
diffusion methods presented here are completely assumption-free. There was no rea-
son why the examples needed to use only tonal Western music. The data was selected
only for familiarity and for its well-studied theoretical foundation.
It would be very interesting for future work to examine the visualizations and
geometries created by elements of other musical theories. Using data from music
designed around other tonalities or scales, or even from other sets of notes, would be
a very interesting variation.
It has also been suggested here that the geometric visualizations of elements of
music theory provide a more accessible means for understanding the relationships
implied by those elements. It would be very interesting to test this hypothesis on an
unknown musical framework, to see whether the deeper understanding found for Western music was predicated on a prior understanding of the theory or whether the same gains would appear in a more exploratory environment in which the music theory is not yet entirely understood by the user.
6.2.5 Inverting Diffusion Space to Audio
One particularly ambitious and interesting extension would be to develop a means for
inverting data in a diffusion space back into musical data, either symbolic or signal-
based. This would allow for interactive manipulation of music in diffusion space to
create a corresponding alteration of the music. This could even potentially lead to
sonification of non-musical data that had been organized into a diffusion space.
There are two fundamental ways to accomplish this. One is to invert the diffusion process itself (assuming the original data was musical); the other is to develop a sonification process directly from the diffusion space.
The first approach, inversion, is mathematically challenging, because the diffusion process is designed to preserve not the original data but the relationships between the data. The Markov matrix P is easily recreated from the eigenvectors by using the summation in the definition of the eigendecomposition from Eq. (3.8), and the scaling of the Markov matrix from the affinity matrix can be determined from the stationary distribution of the Markov matrix. Unfortunately, the original data cannot be recreated from the affinity matrix for the affinity functions used in this work, because they are rotation invariant. If the orientation of the data is stored separately early in the process, however, perfect inversion becomes possible. The picture grows more complicated once manipulations of the data in diffusion space are included, and so a deeper examination of this process is called for.
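Both observations can be verified numerically. The sketch below assumes a Gaussian affinity as a stand-in for the affinity functions used in this work; the names markov_from_data, reconstruct_markov, and sigma are illustrative. Because the affinity depends only on pairwise distances, rotating the input leaves the Markov matrix unchanged, while the spectral sum in the style of Eq. (3.8) recovers P exactly.

```python
import numpy as np

def markov_from_data(X, sigma=1.0):
    """Row-stochastic Markov matrix P from a Gaussian affinity on X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))     # depends only on pairwise distances
    return W / W.sum(axis=1, keepdims=True)  # normalize rows to sum to 1

def reconstruct_markov(P):
    """Rebuild P from the spectral sum  P = sum_k lam_k psi_k phi_k^T."""
    lam, psi = np.linalg.eig(P)              # right eigenvectors
    phi = np.linalg.inv(psi)                 # rows are the left eigenvectors
    return (psi * lam) @ phi                 # psi diag(lam) psi^{-1}
```

The reconstruction of P is exact; it is only the step from the affinity matrix back to the data coordinates that loses the orientation.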
The second approach, sonification, is completely free-form and can therefore be accomplished in essentially any way. However, the resulting musical output will not necessarily be as meaningful as in the inversion approach, because it is not automatically contextualized by the musical input. That is not to say that this approach is not
useful or potentially interesting (in fact, it is extremely intriguing), but it does sug-
gest that great care would need to be taken in designing an intuitive mapping for the
audio, a need common to most sonification applications.
6.2.6 Examination of Less Prominent Dimensions of Map
The majority of this work focused on the most prominent dimensions of the diffusion map, that is, the dimensions corresponding to the largest eigenvalues. These are the dimensions in which the more global aspects of the structure are visualized. Since the analyses presented here concerned the most significant aspects of structure, it made sense to stay mainly with the prominent dimensions.
A deeper examination of the less prominent dimensions of the diffusion maps is still needed, however, as this is where the more local aspects of the structure can be found. Furthermore, if two data sets are built on the same fundamental structure with small variations between them, the prominent dimensions will show a similar shape (as was the case with the performance-like noise in Section 5.4.3), and the structurally minor differences appear instead in the other dimensions. It would therefore be worth investigating whether these less prominent dimensions can be compared to determine the nature of the minor differences while still recognizing the fundamental similarity. If this sort of analysis could be achieved, it would be a very powerful tool.
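The eigenvalue ranking underlying this distinction can be made concrete with a small helper. The name split_diffusion_map is hypothetical, and the trivial constant eigenvector (eigenvalue 1) is assumed to have been dropped already.

```python
import numpy as np

def split_diffusion_map(eigvals, eigvecs, n_prominent):
    """Partition diffusion coordinates into prominent (global) and less
    prominent (local) dimensions by eigenvalue magnitude. eigvecs has
    one eigenvector per column, matching eigvals."""
    order = np.argsort(-np.abs(eigvals))           # largest eigenvalue first
    coords = eigvecs[:, order] * eigvals[order]    # eigenvalue-weighted coords
    return coords[:, :n_prominent], coords[:, n_prominent:]
```

The first block of coordinates carries the global structure; the second is where the structurally minor differences between closely related data sets would be sought.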
6.2.7 Dual Diffusion
To this point, the diffusion process has involved a series of data points with a mea-
surable affinity between them. This affinity is calculated on some set of features that
represent the data points, and then the data points are organized based on those
affinities.
A dual diffusion approach can instead be used. In this process, the data points
are treated in the exact same way, with affinities calculated based on the features.
However, in the feature space, affinities are also calculated between the features them-
selves. Then, both of these dimensions can be separately organized.
So, in this approach, two organizations can be created. In one, the data points
are organized based on common patterns in the features. In the second and new
organization, the features are organized based on common patterns in the data points.
This second map gives a completely new way of looking at the data.
Combining both of these views offers the opportunity to extract a deeper and more meaningful organization. An organization that describes both dimensions, instead of only one, gives a more fundamental description of the data. Such a method should also be more robust to noise and labeling errors, thanks to the enhanced and complementary understanding gained from the two different dimensions of analysis.
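The dual construction can be sketched by applying one diffusion routine twice: once to the rows of the data matrix (the data points) and once to its columns (the features). The sketch assumes a Gaussian affinity, and the names diffusion_coords and dual_diffusion are illustrative.

```python
import numpy as np

def diffusion_coords(M, sigma=1.0, k=2):
    """Leading non-trivial diffusion coordinates of the rows of M."""
    d2 = ((M[:, None, :] - M[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))       # Gaussian affinity between rows
    P = W / W.sum(axis=1, keepdims=True)       # row-stochastic Markov matrix
    lam, vecs = np.linalg.eig(P)
    order = np.argsort(-lam.real)
    idx = order[1:k + 1]                       # skip the trivial constant eigenvector
    return (vecs[:, idx] * lam[idx]).real

def dual_diffusion(X, sigma=1.0, k=2):
    """One map organizes the data points (rows), the other the features (columns)."""
    return diffusion_coords(X, sigma, k), diffusion_coords(X.T, sigma, k)
```

The two returned maps are the two organizations described above: the first clusters data points by their feature patterns, the second clusters features by their patterns across the data points.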
Looking at data from multiple angles, as suggested here, is not specific to diffusion; any organizational method could likely be extended to include this approach. However, diffusion mapping is particularly well suited to the process.
First, this dual diffusion approach addresses an important shortcoming of unsupervised data organization, of which diffusion mapping is an example. The deepest shortcoming of unsupervised organization is that it gives no insight into why an organization exists or what criteria define each cluster; this information typically must be extracted analytically. The dual approach solves this problem with the corresponding organization of the features, which yields a set of clusters of its own that correspond to groups of data points, and therefore offers another means of understanding the meaning of the structure.
Also, the hierarchical nature of diffusion space gives a valuable ranking of the organization, so, when the organizations must be adjusted or filtered to find one common to both the data and the features, the more fundamental levels of the organization can still be preserved.
Finally, the distribution-free nature of diffusion mapping is useful here, because there is no guarantee that the set viewed in terms of the features will follow the same type of distribution as the set viewed in terms of the data points. With diffusion, this is not a concern.
Applying this approach to musical database organization, in particular, could be
extremely innovative and valuable, because it would allow a user to not only extract
(and potentially modify) an organization but also to understand the common traits
in that organization, adding an extra element to the user’s interactivity with the
database. This would be especially ideal for musical discovery.
6.3 Concluding Remarks
This dissertation used diffusion mapping to build fundamental elements of Western music theory from the ground up, without any prior assumptions built into the system. Despite this lack of musical knowledge, the first dimensions of the diffusion map created geometric representations of musical elements that offer another means of understanding the musical relationships in Western music theory.
Extending this concept led to higher-level organizations based on key or meter,
and eventually led to the representation of musical excerpts as trajectories winding
through a unique diffusion space.
The work presented here attempts to provide a thorough foundation for diffusion-
based music analysis. However, it still only represents a beginning. The potential
for the non-linear analysis of music for geometric understanding and exploration is
far too vast for a complete examination in only one dissertation. The future work
suggested here offers many possible directions for the next research steps, though
these are likely only a small sample of the possible directions for diffusion research in
computational music analysis.
This work points to many more analytic and artistic experiments that remain to be done, both for the field of music analysis and for the field of diffusion analysis. The depth of the analysis, combined with the beauty of the visual space, makes this work far too intriguing to end here. Hopefully this work will inspire others to join me in driving diffusion-based music analysis further forward and in expanding our concepts of theoretically grounded music analysis, interactive multimedia musical education, and the interaction of audio and visual representations for a truly unique artistic experience.