Music Information Retrieval for Jazz


Transcript of Music Information Retrieval for Jazz

Page 1: Music Information Retrieval for Jazz

MIR for Jazz - Dan Ellis 2012-11-15 /29

1. Music Information Retrieval
2. Automatic Tagging
3. Musical Content
4. Future Work

Music Information Retrieval for Jazz

Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio

Dept. Electrical Eng., Columbia Univ., NY USA

{dpwe,thierry}@ee.columbia.edu http://labrosa.ee.columbia.edu/

Page 2: Music Information Retrieval for Jazz


Machine Listening

• Extracting useful information from sound... like (we) animals do

[Figure: a Task × Domain grid: Detect / Classify / Describe applied to Environmental Sound, Speech, and Music, with example tasks Environment Awareness, "Sound Intelligence", and Automatic Narration for environmental sound; VAD, Emotion, and ASR for speech; Speech/Music discrimination, Music Recommendation, and Music Transcription for music.]

Page 3: Music Information Retrieval for Jazz


1. The Problem

• We have a lot of music. Can computers help?

• Applications: archive organization ◦ music recommendation ◦ musicological insight?

Page 4: Music Information Retrieval for Jazz


Music Information Retrieval (MIR)

• Small field that has grown since ~2000
  musicologists, engineers, librarians; significant commercial interest

• MIR as the musical analog of text IR: find stuff in large archives

• Popular tasks: genre classification ◦ chord, melody, full transcription ◦ music recommendation

• Annual evaluations; "standard" test corpora - pop

Page 5: Music Information Retrieval for Jazz


2. Automatic Tagging

• Statistical Pattern Recognition: finding matches to training examples

• Need: feature design ◦ labeled training examples

• Applications: genre ◦ instrumentation ◦ artist ◦ studio ...

[Figure: the standard pattern-recognition pipeline: Sensor → signal → Pre-processing/segmentation → segment → Feature extraction → feature vector → Classification → class → Post-processing.]

Page 6: Music Information Retrieval for Jazz


Features: MFCC

• Mel-Frequency Cepstral Coefficients: the standard features from speech recognition

Processing chain: Sound → FFT |X[k]| (spectra) → Mel-scale frequency warp (audspec) → log |X[k]| → IFFT (cepstra) → Truncate → MFCCs
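The chain above can be sketched in code. This is a minimal, illustrative single-frame MFCC computation, not the exact feature extractor used in the talk; the filterbank size, window, and coefficient count are assumptions:

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_frame(x, sr, n_mel=20, n_ceps=13, fmin=0.0, fmax=None):
    """Compute MFCCs for one frame: FFT -> mel warp -> log -> DCT -> truncate."""
    fmax = fmax or sr / 2
    n_fft = len(x)
    spec = np.abs(np.fft.rfft(x * np.hamming(n_fft)))      # FFT |X[k]|
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)     # Hz -> mel
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)  # mel -> Hz
    edges = imel(np.linspace(mel(fmin), mel(fmax), n_mel + 2))
    fbank = np.zeros((n_mel, len(freqs)))
    for i in range(n_mel):                                 # triangular mel filters
        lo, c, hi = edges[i], edges[i + 1], edges[i + 2]
        up = (freqs - lo) / (c - lo)
        down = (hi - freqs) / (hi - c)
        fbank[i] = np.maximum(0.0, np.minimum(up, down))
    audspec = fbank @ spec                                 # mel spectrum
    logspec = np.log(audspec + 1e-10)                      # log
    return dct(logspec, type=2, norm='ortho')[:n_ceps]     # "IFFT" + truncate
```

The DCT here plays the role of the slide's IFFT step (the conventional real-valued variant).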

Page 7: Music Information Retrieval for Jazz


Representing Audio

• MFCCs are short-time features (25 ms)

• Sound is a “trajectory” in MFCC space

• Describe a whole track by its statistics

[Figure: audio ("VTS_04_0001") shown as a spectrogram, its MFCC features over time, and the track's MFCC covariance matrix.]
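Summarizing the trajectory by its statistics is straightforward; a minimal sketch of the per-track mean and covariance signature:

```python
import numpy as np

def track_signature(mfccs):
    """Summarize a track's MFCC trajectory by its mean and covariance.

    mfccs: array of shape (n_frames, n_coeffs).
    Returns (mean vector, covariance matrix): fixed-size statistics
    describing the whole recording regardless of its length.
    """
    mu = mfccs.mean(axis=0)
    cov = np.cov(mfccs, rowvar=False)
    return mu, cov
```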

Page 8: Music Information Retrieval for Jazz


MFCCs for Music

• Can resynthesize MFCCs by shaping noise
  gives an idea about the information retained

[Figure: "Freddy Freeloader": spectrograms of the original audio and its MFCC-based noise resynthesis.]
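One way to realize the noise-shaping idea is to invert the truncated cepstrum back to a smooth spectral envelope and impose it on white noise, frame by frame. This sketch is an assumption about the procedure, not the talk's implementation; in particular the crude mel-to-linear interpolation ignores the frequency warp:

```python
import numpy as np
from scipy.fftpack import idct

def resynth_noise_from_mfcc(mfccs, frame_len=512, hop=256, n_mel=20):
    """Shape white noise with the spectral envelope implied by MFCCs.

    mfccs: (n_frames, n_ceps). Zero-pads the truncated cepstrum, inverts
    the DCT to a smooth log-mel envelope, maps it back to linear-frequency
    bins, and filters noise frames. Illustrative sketch only.
    """
    n_frames, n_ceps = mfccs.shape
    n_bins = frame_len // 2 + 1
    out = np.zeros(n_frames * hop + frame_len)
    mel_axis = np.linspace(0, 1, n_mel)      # assumed simple channel spacing
    bin_axis = np.linspace(0, 1, n_bins)
    for t in range(n_frames):
        cep = np.zeros(n_mel)
        cep[:n_ceps] = mfccs[t]
        env = np.exp(idct(cep, type=2, norm='ortho'))   # smooth mel envelope
        env_lin = np.interp(bin_axis, mel_axis, env)    # back to FFT bins
        noise = np.fft.rfft(np.random.randn(frame_len))
        frame = np.fft.irfft(noise * env_lin) * np.hanning(frame_len)
        out[t * hop : t * hop + frame_len] += frame     # overlap-add
    return out
```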

Page 9: Music Information Retrieval for Jazz


Ground Truth

• MajorMiner: free-text tags for 10 s clips
  400 users, 7,500 unique tags, 70,000 taggings

• Example: drum, bass, piano, jazz, slow, instrumental, saxophone, soft, quiet, club, ballad, smooth, soulful, easy_listening, swing, improvisation, 60s, cool, light

Mandel & Ellis ’08

Page 10: Music Information Retrieval for Jazz


Classification

• MFCC features + human ground truth + standard machine learning tools

Pipeline: Sound → chop into 10-second blocks → MFCC (20 dims) with Δ and Δ² → per-block mean μ (60 dims) and covariance Σ (399 samples) → standardize across the train set → one-vs-all SVM (tuning C, γ) → evaluate by average precision against ground truth.
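The classification stage can be sketched with standard tools. This is an illustrative one-vs-all setup using scikit-learn; the exact kernel settings and feature layout in the talk are not specified here, so these are assumptions:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_tag_classifiers(features, tag_matrix, C=1.0, gamma='scale'):
    """Train one binary RBF-SVM per tag (one-vs-all).

    features: (n_clips, n_dims) per-clip statistics;
    tag_matrix: (n_clips, n_tags) binary ground truth. Sketch only."""
    scaler = StandardScaler().fit(features)   # standardize across train set
    X = scaler.transform(features)
    models = [SVC(C=C, gamma=gamma).fit(X, tag_matrix[:, j])
              for j in range(tag_matrix.shape[1])]
    return scaler, models

def predict_tags(scaler, models, features):
    """Predict the binary tag matrix for new clips."""
    X = scaler.transform(features)
    return np.stack([m.predict(X) for m in models], axis=1)
```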

Page 11: Music Information Retrieval for Jazz


Classification Results

• Classifiers trained from the top 50 tags

[Figure: per-tag classification scores for the 50 most common tags (drum, guitar, male, synth, rock, electronic, pop, vocal, bass, female, dance, techno, piano, jazz, hip_hop, rap, slow, beat, voice, 80s, ...), plus the spectrogram and tag outputs for an example track, "Soul Eyes".]

Page 12: Music Information Retrieval for Jazz


3. Musical Content

• MFCCs (and speech recognizers) don't respect pitch
  pitch is important ◦ visible in the spectrogram

• Pitch-related tasks: note transcription ◦ chord transcription ◦ matching by musical content (“cover songs”)

[Figure: spectrogram excerpt showing pitched note structure.]

Page 13: Music Information Retrieval for Jazz


Note Transcription


Poliner & Ellis ‘05,’06,’07

Classification:
• N binary SVMs (one for each note)
• Independent frame-level classification on a 10 ms grid
• Distance to the class boundary used as a posterior

Temporal Smoothing:
• Two-state (on/off) independent HMM for each note; parameters learned from training data
• Find the Viterbi sequence for each note

Training data and features:
• MIDI, multi-track recordings, playback piano, & resampled audio (less than 28 mins of training audio)
• Normalized magnitude STFT
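The temporal-smoothing step above can be sketched as a two-state Viterbi decode per note. The self-transition probability here is a fixed assumption rather than a value learned from training data as in the talk:

```python
import numpy as np

def smooth_note(posteriors, p_stay=0.9):
    """Two-state (off/on) Viterbi smoothing of frame-level note posteriors.

    posteriors: P(note on) per 10 ms frame; p_stay: self-transition
    probability (assumed, not learned here). Returns 0/1 per frame."""
    eps = 1e-12
    obs = np.log(np.stack([1.0 - posteriors, posteriors]) + eps)  # (2, T)
    trans = np.log(np.array([[p_stay, 1 - p_stay],
                             [1 - p_stay, p_stay]]))
    T = obs.shape[1]
    delta = obs[:, 0].copy()                 # best log-prob ending in state
    back = np.zeros((2, T), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + trans      # scores[prev, cur]
        back[:, t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + obs[:, t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):            # trace back best sequence
        path[t - 1] = back[path[t], t]
    return path                              # 1 = note on
```

A brief dip in the posteriors is bridged because two state switches cost more than one weak observation.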

Page 14: Music Information Retrieval for Jazz


Polyphonic Transcription

• Real music excerpts + ground truth

Frame-level transcription: estimate the fundamental frequency of all notes present on a 10 ms grid.

Note-level transcription: group frame-level predictions into note-level transcriptions by estimating onsets/offsets.

[Figure: MIREX 2007 results: frame-level Precision, Recall, Acc, Etot, Esubs, Emiss, Efa; note-level Precision, Recall, Ave. F-measure, Ave. Overlap.]

Page 15: Music Information Retrieval for Jazz


Chroma Features

• Idea: project onto 12 semitones, regardless of octave
  maintains the main “musical” distinction ◦ invariant to musical equivalence ◦ no need to worry about harmonics?

  C(b) = Σk B(12 log2(k/k0) − b) · W(k) · |X[k]|

  where W(k) is a weighting and B(·) selects bins whose pitch class matches b (mod 12)

[Figure: spectrogram (FFT bins) and the resulting 12-bin chromagram over time.]

Fujishima 1999

Warren et al. 2003
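The projection C(b) can be sketched for a single frame as below. The reference frequency and the particular weighting W(k) are assumptions for illustration (here a broad mid-frequency emphasis); the talk does not specify them:

```python
import numpy as np

def chroma_frame(x, sr, f_ref=27.5):
    """Project one frame's FFT magnitudes onto 12 pitch classes,
    following C(b) = sum_k B(12 log2(k/k0) - b) W(k) |X[k]|.
    f_ref=27.5 Hz puts pitch class 0 on A. Illustrative sketch."""
    n = len(x)
    mags = np.abs(np.fft.rfft(x * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, 1.0 / sr)
    chroma = np.zeros(12)
    valid = freqs > 0
    # B(.): round each bin to its nearest pitch class, mod 12
    pc = np.round(12 * np.log2(freqs[valid] / f_ref)).astype(int) % 12
    # W(k): smooth mid-frequency emphasis (an assumption)
    w = np.exp(-0.5 * (np.log2(freqs[valid] / 1000.0)) ** 2)
    np.add.at(chroma, pc, w * mags[valid])
    return chroma
```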

Page 16: Music Information Retrieval for Jazz


Chroma Resynthesis

• Chroma describes the notes in an octave... but not the octave

• Can resynthesize by presenting all octaves with a smooth envelope
  “Shepard tones” - the octave is ambiguous ◦ endless sequence illusion

[Figure: 12 Shepard tone spectra and a Shepard tone resynthesis spectrogram.]

  y_b(t) = Σ_{o=1..M} W(o + b/12) cos(2^(o + b/12) ω0 t)

Ellis & Poliner 2007
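The Shepard-tone formula above can be sketched directly; the Gaussian shape and placement of the envelope W are assumptions for illustration:

```python
import numpy as np

def shepard_tone(b, dur=1.0, sr=8000, n_oct=7, f0=27.5, center=4.0, width=1.5):
    """Synthesize pitch class b (0-11) as a Shepard tone:
    y_b(t) = sum_o W(o + b/12) cos(2^(o + b/12) * 2*pi*f0 * t),
    with W a smooth Gaussian envelope over log-frequency (assumed)."""
    t = np.arange(int(dur * sr)) / sr
    y = np.zeros_like(t)
    for o in range(1, n_oct + 1):
        pos = o + b / 12.0
        f = f0 * 2.0 ** pos
        if f >= sr / 2:                  # keep components below Nyquist
            continue
        w = np.exp(-0.5 * ((pos - center) / width) ** 2)
        y += w * np.cos(2 * np.pi * f * t)
    return y / np.max(np.abs(y))
```

Because the envelope, not the component frequencies, carries the loudness, the perceived octave is ambiguous, which is what makes the endless-sequence illusion possible.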

Page 17: Music Information Retrieval for Jazz


Chroma Example

• Simple Shepard tone resynthesis
  can also reimpose the broad spectrum from the MFCCs

[Figure: "Freddie": original spectrogram, chroma, and Shepard-tone resynthesis.]

Page 18: Music Information Retrieval for Jazz


Onset detection

• Simplest approach: the energy envelope

  e(n0) = Σ_{n=−W/2..W/2} w[n] |x(n + n0)|²

[Figure: "Harnoncourt Maracatu": spectrogram and onset-strength envelopes.]

Bello et al. 2005

• Emphasis on high frequencies? Replace Σ_f |X(f, t)| with Σ_f f · |X(f, t)|
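Both variants can be sketched in one function. The half-wave-rectified first difference at the end is a common refinement and an assumption here, not something the slide specifies:

```python
import numpy as np

def onset_envelope(x, sr, frame=512, hop=128, hf_weight=True):
    """Onset strength from windowed frames.

    hf_weight=False: plain energy, e(n0) = sum_n w[n] |x(n + n0)|^2.
    hf_weight=True: sum_f f * |X(f, t)|, the slide's high-frequency
    emphasis. Returns the half-wave-rectified first difference."""
    win = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    env = np.zeros(n_frames)
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    for i in range(n_frames):
        seg = x[i * hop : i * hop + frame] * win
        if hf_weight:
            env[i] = np.sum(freqs * np.abs(np.fft.rfft(seg)))  # sum_f f*|X(f,t)|
        else:
            env[i] = np.sum(seg ** 2)                          # sum_n w[n]|x|^2
    d = np.diff(env, prepend=env[0])
    return np.maximum(0.0, d)           # rectify: rises mark onsets
```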

Page 19: Music Information Retrieval for Jazz


Tempo Estimation

• Beat tracking (may) need a global tempo period τ
  otherwise the problem lacks “optimal substructure”

[Figure: onset strength envelope (part), raw autocorrelation, and windowed autocorrelation with the primary and secondary tempo periods marked.]

• Pick the peak in the onset envelope's autocorrelation after applying a “human preference” window; check for the sub-beat
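The peak-picking step can be sketched as follows. The log-Gaussian preference window centered near 120 BPM is a common choice but its parameters here are assumptions:

```python
import numpy as np

def estimate_tempo_period(onset_env, hop_sr, center_bpm=120.0, spread=0.9):
    """Pick the tempo period from the onset envelope's autocorrelation,
    weighted by a log-Gaussian "human preference" window (parameters
    assumed). onset_env: onset strength per frame; hop_sr: frames per
    second. Returns the chosen lag in frames."""
    env = onset_env - onset_env.mean()
    ac = np.correlate(env, env, mode='full')[len(env) - 1:]  # lags >= 0
    lags = np.arange(len(ac))
    bpm = 60.0 * hop_sr / np.maximum(lags, 1)
    pref = np.exp(-0.5 * (np.log2(bpm / center_bpm) / spread) ** 2)
    pref[0] = 0.0                       # exclude the zero-lag peak
    return int(np.argmax(ac * pref))
```

With a strongly periodic envelope, the window decides between the period and its multiples (the primary vs. secondary tempo in the figure).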

Page 20: Music Information Retrieval for Jazz


Beat Tracking by Dynamic Programming

• To optimize

  C({ti}) = Σ_{i=1..N} O(ti) + α Σ_{i=2..N} F(ti − ti−1, τp)

  define C*(t) as the best score up to time t, then build up recursively (with traceback P(t)):

  C*(t) = O(t) + max_τ {α F(t − τ, τp) + C*(τ)}
  P(t) = argmax_τ {α F(t − τ, τp) + C*(τ)}

  the final beat sequence {ti} is the best C* plus back-trace
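The recursion can be sketched directly. Here F is taken as the negative squared log-deviation from the ideal period, a standard choice; the search window and α value are assumptions:

```python
import numpy as np

def beat_track(onset, tau_p, alpha=100.0):
    """DP beat tracking per C*(t) = O(t) + max_tau {alpha*F(t-tau, tau_p) + C*(tau)}.

    onset: onset strength per frame; tau_p: tempo period in frames;
    F(dt, tau_p) = -(log(dt / tau_p))^2 penalizes off-period spacings."""
    T = len(onset)
    C = np.asarray(onset, dtype=float).copy()   # C*(t), init O(t)
    P = -np.ones(T, dtype=int)                  # traceback P(t)
    for t in range(T):
        # search predecessors within half to double the period
        cands = np.arange(max(t - 2 * tau_p, 0), max(t - tau_p // 2, 0))
        if len(cands) == 0:
            continue
        F = -np.log((t - cands) / tau_p) ** 2
        scores = alpha * F + C[cands]
        best = int(np.argmax(scores))
        if scores[best] > 0:                    # else start a fresh chain
            C[t] = onset[t] + scores[best]
            P[t] = cands[best]
    beats = [int(np.argmax(C))]                 # best final beat + back-trace
    while P[beats[-1]] >= 0:
        beats.append(int(P[beats[-1]]))
    return beats[::-1]
```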

Page 21: Music Information Retrieval for Jazz


Beat Tracking Results

• Prefers drums & steady tempo

[Figure: beat-tracking output and autocorrelation for "Soul Eyes" (period axis in 4 ms samples).]

Page 22: Music Information Retrieval for Jazz


Beat-Synchronous Chroma

• Record one chroma vector per beat
  a compact representation of the harmonies

[Figure: "Freddy": spectrogram, beat-synchronous chroma (time in beats), chroma resynthesis, and resynthesis + MFCC envelope.]
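Reducing frame-level chroma to one vector per beat is a simple averaging step; a minimal sketch:

```python
import numpy as np

def beat_sync_chroma(chroma, beat_frames):
    """Average frame-level chroma between consecutive beats.

    chroma: (n_frames, 12); beat_frames: sorted frame indices of the
    detected beats. Returns one 12-d vector per inter-beat segment."""
    segs = zip(beat_frames[:-1], beat_frames[1:])
    return np.stack([chroma[a:b].mean(axis=0) for a, b in segs])
```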

Page 23: Music Information Retrieval for Jazz


Chord Recognition

• Beat-synchronous chroma look like chords: can we transcribe them?

• Two approaches: manual templates (prior knowledge) ◦ learned models (from training data)

[Figure: beat-synchronous chromagram with chord readings C-E-G, B-D-G, A-C-E, A-C-D-F ...]
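The manual-template approach can be sketched as cosine matching against binary triad patterns. Restricting to 24 major/minor triads and using chroma bin 0 = C are assumptions for illustration:

```python
import numpy as np

PITCHES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def chord_templates():
    """Binary chroma templates for the 24 major/minor triads
    (the "manual templates" approach; chroma bin 0 = C assumed)."""
    temps = {}
    for r in range(12):
        for name, ivs in (('maj', [0, 4, 7]), ('min', [0, 3, 7])):
            t = np.zeros(12)
            t[[(r + i) % 12 for i in ivs]] = 1.0
            temps[f'{PITCHES[r]}:{name}'] = t / np.linalg.norm(t)
    return temps

def label_chord(chroma_vec, temps=None):
    """Label one beat-synchronous chroma vector with its best-matching
    triad by cosine similarity."""
    temps = temps or chord_templates()
    v = chroma_vec / (np.linalg.norm(chroma_vec) + 1e-12)
    return max(temps, key=lambda k: float(v @ temps[k]))
```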

Page 24: Music Information Retrieval for Jazz


Chord Recognition System

• Analogous to speech recognition
  Gaussian models of the features for each chord ◦ Hidden Markov Models for the chord transitions

Pipeline: Audio → beat track → beat-synchronous chroma (100-1600 Hz and 25-400 Hz band-pass variants) → root normalize. Training counts transitions (24×24 matrix) and fits 24 Gaussian models from the labels; testing un-normalizes and runs HMM Viterbi decoding to produce chord labels.

[Figure: system diagram and the learned 24×24 major/minor transition matrix (C D E F G A B c d e f g a b).]

Sheh & Ellis 2003
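The decoding stage can be sketched as a Viterbi pass over Gaussian chord models. The flat start prior and full-covariance Gaussians are assumptions; the trained means, covariances, and transition matrix would come from the counting/fitting steps above:

```python
import numpy as np

def viterbi_chords(feats, means, covs, logA):
    """Decode a chord sequence from beat-synchronous chroma with
    per-chord Gaussian emissions and an HMM transition matrix.

    feats: (T, d); means: (K, d); covs: (K, d, d); logA: (K, K) log
    transition matrix. Flat start prior assumed. Returns state indices."""
    T, K = len(feats), len(means)
    logB = np.empty((T, K))                     # log N(x; mu_k, Sigma_k)
    for k in range(K):
        d = feats - means[k]
        inv = np.linalg.inv(covs[k])
        _, logdet = np.linalg.slogdet(covs[k])
        logB[:, k] = -0.5 * (np.einsum('ti,ij,tj->t', d, inv, d)
                             + logdet + len(means[k]) * np.log(2 * np.pi))
    delta = logB[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        s = delta[:, None] + logA               # s[prev, cur]
        back[t] = s.argmax(axis=0)
        delta = s.max(axis=0) + logB[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```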

Page 25: Music Information Retrieval for Jazz


Chord Recognition

• Often works:

[Figure: "Let It Be": audio, beat-synchronous chroma, ground-truth chords (C, G, A:min, A:min/b7, F:maj7, F:maj6, C, G, F, C, ...), and recognized chords (C G a F C G F C G a).]

• But only about 60% of the time

Page 26: Music Information Retrieval for Jazz


What did the models learn?

• Chord model centers (means) indicate chord ‘templates’:

[Figure: "PCP_ROT family model means (train18)": learned mean chroma for the DIM, DOM7, MAJ, MIN, and MIN7 chord families, plotted over C D E F G A B C (for C-root chords).]

Page 27: Music Information Retrieval for Jazz


Chords for Jazz

• How many types?

[Figure: "Freddy": log-frequency spectrogram, beat-synchronous chroma, chord likelihoods + Viterbi path, and chord-based chroma reconstruction.]

Page 28: Music Information Retrieval for Jazz


Future Work

• Matching items
  cover songs / standards ◦ similar instruments, styles

• Analyzing musical content
  solo transcription & modeling ◦ musical structure

• And so much more...

[Figure: beat-chroma matrices for "Between the Bars" by Elliot Smith and by Glenn Phillips (shifted -17 beats, transposed 2 semitones), and their pointwise product.]

Page 29: Music Information Retrieval for Jazz


Summary

• Finding Musical Similarity at Large Scale

[Figure: system overview: music audio feeds low-level features, tempo and beat, melody and notes, and key and chords; these support classification and similarity and music structure discovery, serving browsing / discovery / production as well as modeling / generation / curiosity.]