Introduction to Music Informatics: I548/N560, Spring 2011
Instructor: Eric Nichols
http://tinyurl.com/Info548
Overview
Tues, Feb 15
- HW: questions?
- HW: contest and output format
- Dynamic Time Warping for Audio-to-MIDI alignment
- Symbolic Representations
- Reading: Dannenberg
Polyphonic Audio Matching and Alignment
Ning Hu, Roger B. Dannenberg and George Tzanetakis
- Goal: align polyphonic audio to a symbolic score
- Does not perform transcription
- Used to search MIDI databases for a match to a given audio recording
Motivation
- Query by Humming is an important problem, and it uses a symbolic database.
- Why is symbolic better than audio matching for this problem?
- Possible solution: do polyphonic transcription on the query. Then find best match.
- However, transcription is hard.
Idea
- Instead of transcription of the query, convert the symbolic database into audio!
- Instead of using an entire spectrum, convert to a chroma vector.
- Do dynamic time warping (DTW) on audio to look for matches.
Chroma Vector
- For each bin in the FFT:
  - Assign the bin to the nearest half-step
  - Remove octave information
- For each pitch class (1-12), average the value of its associated bins.
- For this paper: 0.25 seconds of audio per chroma vector. Nonoverlapping windows.
- Computing pitch from MIDI and vice versa:
  freq = 440 * 2^((MIDI-69) / 12.0)
  MIDI = 69 + 12*log(freq/440.0) / log(2)
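A minimal sketch of the steps on this slide, in plain Python. It assumes the FFT magnitudes for one 0.25-second window are already available as a list indexed by bin number; the function names, the 0-11 pitch-class numbering (0 = C, rather than the 1-12 used above), and skipping the DC bin are illustrative choices, not details taken from the paper.

```python
import math

def midi_to_freq(midi):
    """Frequency in Hz of a MIDI note number (A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)

def freq_to_midi(freq):
    """Fractional MIDI note number of a frequency in Hz."""
    return 69.0 + 12.0 * math.log(freq / 440.0) / math.log(2.0)

def chroma_from_spectrum(magnitudes, sample_rate, fft_size):
    """Average FFT bin magnitudes into 12 pitch classes (0 = C ... 11 = B).

    magnitudes[k] is assumed to be the magnitude of FFT bin k, whose
    center frequency is k * sample_rate / fft_size.
    """
    sums = [0.0] * 12
    counts = [0] * 12
    for k in range(1, len(magnitudes)):      # skip bin 0 (DC, no pitch)
        freq = k * sample_rate / fft_size
        nearest_half_step = round(freq_to_midi(freq))
        pitch_class = nearest_half_step % 12  # remove octave information
        sums[pitch_class] += magnitudes[k]
        counts[pitch_class] += 1
    # Average the bins assigned to each pitch class.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

For example, with sample_rate = 8000 and fft_size = 800 the bins are 10 Hz apart, so energy in bin 44 (440 Hz) lands in pitch class 9 (A).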
Chroma Vectors
Why chroma?
- Not very sensitive to spectral distribution: ignores many details of timbre by collapsing everything into one octave
- Mostly sensitive to fundamental pitches and chords
Converting MIDI to chroma
- Two possibilities:
  - Render the MIDI with a synthesizer, then compute the FFT and then the chroma vector.
  - Go directly from MIDI to chroma with a theoretical model (in this paper, it is assumed that no overtones are present in the chroma for each given MIDI pitch).
- One difficulty: dealing with percussive sounds
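The second possibility (going directly from MIDI to chroma with no overtones) might look roughly like the sketch below. The note tuple layout, the function name, and the weighting by velocity and by how much of the frame the note covers are assumptions made for illustration; the slides do not say how the paper weights notes. Percussion notes are assumed to have been filtered out beforehand.

```python
def chroma_from_midi_notes(notes, frame_start, frame_len=0.25):
    """Chroma vector for one frame, computed directly from MIDI notes.

    notes: list of (midi_pitch, onset_sec, offset_sec, velocity) tuples
    (a hypothetical layout). Each note contributes only to its own pitch
    class, i.e. no overtones are modeled.
    """
    chroma = [0.0] * 12
    frame_end = frame_start + frame_len
    for pitch, onset, offset, velocity in notes:
        # Portion of this frame during which the note is sounding.
        overlap = min(offset, frame_end) - max(onset, frame_start)
        if overlap > 0:
            # Assumed weighting: velocity scaled by fraction of frame covered.
            chroma[pitch % 12] += velocity * overlap / frame_len
    return chroma
```

Calling this once per 0.25-second frame gives a list of chroma vectors for a MIDI file, comparable frame by frame with the audio chroma.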
Chroma Similarity
- Now we have lists of chroma vectors for an audio query and for a database of MIDI files
- Normalize all vectors to have mean 0 and variance 1
  - This helps reduce differences in vectors due to absolute loudness
- Compute the Euclidean distance between vectors (0 distance = perfect match)
- Compute the entire similarity matrix between vector pairs.
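The normalization and distance steps above can be sketched as follows. The function names are illustrative, and returning an all-zero vector for a constant (e.g. silent) frame is an assumption made here to avoid dividing by zero.

```python
import math

def normalize(vec):
    """Shift and scale a vector to mean 0 and variance 1."""
    n = len(vec)
    mean = sum(vec) / n
    var = sum((x - mean) ** 2 for x in vec) / n
    if var == 0:
        return [0.0] * n  # assumption: constant frames map to all zeros
    std = math.sqrt(var)
    return [(x - mean) / std for x in vec]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(query, reference):
    """Matrix whose [i][j] entry is the distance between normalized
    query frame i and normalized reference frame j (0 = perfect match)."""
    q = [normalize(v) for v in query]
    r = [normalize(v) for v in reference]
    return [[euclidean(a, b) for b in r] for a in q]
```

Because of the normalization, a frame and a ten-times-louder copy of the same frame end up at distance (approximately) 0.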
Similarity Matrix
- Dark = highly similar
- Black diagonal = matching path
- Note start, end, and length disparity
DTW computation
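As a rough illustration, here is the textbook form of DTW over a precomputed distance matrix. It is a sketch, not necessarily the variant used in the paper: it forces the path to run from the first frame pair to the last, whereas the paper may handle start and end disparities differently. The function name and the three allowed steps (diagonal, vertical, horizontal) are assumptions.

```python
def dtw(dist):
    """Textbook dynamic time warping over a distance matrix.

    dist[i][j] is the distance between query frame i and reference frame j.
    Returns (total_cost, path), where path is a list of (i, j) pairs from
    (0, 0) to the last frame pair.
    """
    n, m = len(dist), len(dist[0])
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i query frames
    # with the first j reference frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = dist[i - 1][j - 1] + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance query only
                cost[i][j - 1],      # advance reference only
            )
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path
```

A low total cost (relative to path length) suggests the MIDI file matches the audio; the path itself gives the time alignment between the two.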
Results: 10 Beatles songs
Results 2
Results 3
Conclusion
- More sophisticated DTW could be used to speed up the search
- Gives an example of linking symbolic and audio domains
Discussion
- What elements/features of music should we represent?
- Can we create a “dream” representation?