Lecture 10: Music Analysis - Columbia University€¦ · Lecture 10: Music Analysis Music...

40
E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 1 EE E6820: Speech & Audio Processing & Recognition Lecture 10: Music Analysis Music Transcription Music Summarization Music Information Retrieval Music Similarity Browsing Dan Ellis <[email protected]> http://www .ee .columbia.edu/~dpw e/e6820/ Columbia University Dept. of Electrical Engineering Spring 2006 1 2 3 4

Transcript of Lecture 10: Music Analysis - Columbia University€¦ · Lecture 10: Music Analysis Music...

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 1

EE E6820: Speech & Audio Processing & Recognition

Lecture 10:Music Analysis

Music Transcription

Music Summarization

Music Information Retrieval

Music Similarity Browsing

Dan Ellis <[email protected]>http://www.ee.columbia.edu/~dpwe/e6820/

Columbia University Dept. of Electrical EngineeringSpring 2006

1

2

3

4

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 2

Music Transcription

• Basic idea: Recover the score

• Is it possible? Why is it hard?

- music students do it... but they are highly trained; know the rules

• Motivations

- for study: what was played?- highly compressed representation (e.g. MIDI)- the ultimate restoration system...

1

Time

Fre

quen

cy

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 50

1000

2000

3000

4000

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 3

Transcription framework

• Recover discrete events to explain signal

- analysis-by-synthesis?

• Exhaustive search?

- would be possible given exact note waveforms - .. or just a 2-dimensional ‘note’ template?

but superposition is not linear in

|

STFT

|

space

• Inference depends on all detected notes

- is this evidence ‘available’ or ‘used’?- full solution is exponentially complex

Note events{tk, pk, ik}

ObservationsX[k,n]synthesis

?

notetemplate

2-Dconvolution

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 4

Problems for transcription

• Music is practically worst case!

- note events are often synchronized

defeats common onset- notes have harmonic relations (2:3 etc.)

collision/interference between harmonics- variety of instruments, techniques, ...

• Listeners are very sensitive to certain errors

- .. and impervious to others

• Apply further constraints

- like our ‘music student’- maybe even the whole score (Scheirer)!

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 5

Spectrogram Modeling

• Sinusoid model

- as with synthesis, but signal is more complex

• Break tracks

- need to detect new ‘onset’ at single frequencies

• Group by onset & common harmonicity

- find sets of tracks that start around the same time

- + stable harmonic pattern

• Pass on to constraint-based filtering...

time / s

freq

/ H

z

0 1 2 3 40

500

1000

1500

2000

2500

3000

0 0.5 1 1.5 time / s0

0.020.040.06

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 6

Searching for multiple pitches

(Klapuri 2001)

• At each frame:

- estimate dominant

f

0

by checking for harmonics

- cancel it from spectrum- repeat until no

f

0

is prominent

audioframe

Harmonics enhancement Predominant f0 estimation

-20

0

20

40

60

0 1000 2000 3000 4000frq / Hz

leve

l / d

B

-10

0

10

20 0 1000 2000 3000 frq / Hz0

10

dB

0 100 200 300 f0 / Hz0

f0 spectral smoothing

Stop when no more prominent f0s

Subtract& iterate

0 1000 2000 3000 frq/Hz0

10

20

30

dB

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 7

Multi-Pitch Extraction Results

(Rob Turetsky)

• After continuity cleanup:

• Captures main notes, plus a lot else

- hand-tuned termination thresholds?

• (Evaluation?)

freq

/ kH

z

0

2

4

6

8

time / sec

f0 /

Hz

0 2 4 6 8 10 12 14 16 180

500

1000

Beatles - Lucy in the Sky with Diamonds - seg 1

MPE Pianoroll

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 8

Probabilistic Pitch Estimates

(Goto 2001)

• Generative probabilistic model of spectrum as weighted combination of tone models at different fundamental frequencies:

• ‘Knowledge’ in terms of tone models + prior distributions for

f

0

:

• EM (RT) results:

p x f( )( ) w F m,( ) p x f( ) F m,( )m∑ Fd∫=

x [cent]

x [cent]F

c(t)(1|F,1)

c(t)(1|F,2)

c(t)(2|F,1)

c(t)(3|F,1)

c(t)(4|F,1)

c(t)(2|F,2) c(t)(3|F,2)

c(t)(4|F,2)

F+1200

F+2400F+1902

fundamentalfrequency

p(x | F,1, (t)(F,1))

p(x | F,2, (t)(F,2))(m=2)Tone Model

(m=1)Tone Model

p(x,1 | F,2, (t)(F,2))

µ

µµ

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 9

Generative Model Fitting

(Walmsley et al. 1999)

• Generative model of harmonic complexes in the time domain:

• Too many parameters to solve by EM!

Use Markov chain Monte Carlo (MCMC)to find good solution

• Results?

di γ iqGi

qbi

qei+

q 1=

Q

∑=samples

voices harmonic

switch harmonic weights

bases

noise

bq

νq

ωq

Γqσ2ωq

σ2e

iHq

i γqii

di

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 10

Transcription as Pattern Recognition

(Graham Poliner)

• Existing methods use prior knowledge about the structure of pitched notes

- i.e. we

know

they have regular harmonics

• What if we didn’t know that, but just had examples and features?

- the classic pattern recognition problem

• Could use music signal as evidence for pitch class in a black-box classifier:

- nb: more than one class at once!

• But where can we get labeled training data?

Trainedclassifier

Audio

p("C0"|Audio)p("C#0"|Audio)p("D0"|Audio)p("D#0"|Audio)p("E0"|Audio)p("F0"|Audio)

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 11

Ground Truth Data

(Turetsky & Ellis 2003)

• Pattern classifiers need training data

- i.e. need

{signal, note-label}

sets- i.e. MIDI transcripts of real music...already exist?

• Idea: force-align MIDI and original

- can estimate time-warp relationships- recover accurate note events in real music!

"Don't you want me" (Human League), verse1

17 18 19 20 21 22 23 24 25 26

0

2

4

19 20 21 22 23 24 25 26 27 2840

60

80MIDI notesMIDI notes

MIDI ReplicaMIDI Replica

OriginalOriginal

freq

/ kH

zM

IDI #

0

2

4

freq

/ kH

z

time / sec

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 12

Features for MIDI alignments

• Features that will match between MIDI replicas and original audio...

• Pitch is key attribute to match

- narrowband spectral features (but: timing...)- emphasize 100 Hz - 2 kHz

• Local spectral variation, not absolute levels

- remove local average & normalize local range

-40

-20

0

20

0 500 1000 1500 2000-2

0

2

4

0 500 1000 1500 2000freq / Hz

leve

l / d

Bno

rmal

ized

OriginalDYWMB: Corresponding Frames @ ts = 20.3

MIDI replica

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 13

Alignment example

• Inner-product distance on normalized spectral slices (8192 pt @ 22050 Hz):

200 400 600 800 1000 1200

200

400

600

800

1000

1200

Original / 186 ms frames

DYWMB: Original - MIDI alignmentM

IDI r

eplic

a / 1

86 m

s fr

ames

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 14

Extracted training data

• Want labeled examples of notes (in every context) to train pattern recognizer- still perfecting alignment, but an example:

23 240

1

2

72 730

1

2

93 940

1

2

162 1630

1

2

31 32 40 41 48 49

80 81 85 86 88 89

146 147 153 154 158 159

167 168 186 187 191 192

freq

/ kH

z

time / sec

DYWMB: Alignments to MIDI note 57 mapped to Orig Audio

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 15

Polyphonic Piano Transcription(Poliner & Ellis 2006)

• Training data from player piano

• Independent classifiers for each note- plus a little HMM smoothing

• Nice results- .. when test data resembles training

i /level / dB

freq

/ pi

tch

0 1 2 3 4 5 6 7 8 9

A1

A2

A3

A4

A5

A6

-20

-10

0

10

20

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 16

Outline

Music Transcription

Music Summarization- Segmentation- Identifying repetition- Evaluation

Music Information Retrieval

Music Similarity Browsing

1

2

3

4

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 17

Music Summarization

• What does it mean to ‘summarize’?- compact representation of larger entity- maximize ‘information content’- sufficient to recognize known item

• So summarizing music?- short version e.g. <10% duration (< 20s for pop)- sufficient to identify style, artist- e.g. chorus or ‘hook’?

• Why? - browsing existing collection- discovery among unknown works- commerce...

2

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 18

Summarization Approach

(with thanks to Beth Logan)

convert to features

label features & segmentto discover song structure

use heuristics to choosekey phrase

0 1 1 2 2 3 3 2 2

0 1 1 2 2 3 3 2 2

song

key phrase

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 19

Segmentation

• Find contiguous regions that are internally similar and different from neighbors

• E.g. “self-similarity” matrix (Foote 1997)

- 2D convolution of checkerboard down diagonal= compare fixed windows at every point

0 500 1000 15000

200

400

600

800

1000

1200

1400

1600

1800

100 200 300 400 500

DYWMB - self similarity

time / 183ms frames

time

/ 183

ms

fram

es

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 20

BIC segmentation

• Want to use evidence from whole segment, not just local window

• Do ‘significance test’ on every possible division of every possible context

• Eventually, a boundary is found:

L(X;M0)

lastsegmentation point

currentcontext limit

candidateboundary

L(X1;M1) L(X2;M2)

time0 N

log L(X1;M1)L(X2;M2)L(X;M0)

≷ λ2 log(N)∆#(M)BIC:

15 16 17 18 19 20 21 22 23 24 25

-200

-100

0

time / min

BIC

score last

segpoint current

contextlimit

boundary passes BIC

no boundary found

with shorter context

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 21

HMM segmentation

• Recall, HMM Viterbi path is joint classification and segmentation- e.g. for singing/accompaniment segmentation

• But: HMM states need to be defined in advance- define a ‘generic set’? (MPEG7)- learn them from the piece to be segmented?

(Chu & Logan 2000, Peeters et. al 2002)

• Result is ‘anonymous’ state sequencecharacteristic of particular piece

# U2-The_Joshua_Tree-01-Where_The_Streets_Have_No_Name 3367717 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 3 3 3 3 3 3 3 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 1

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 22

Finding Repeats

• Music frequently repeats main phrases

• Repeats give off-diagonal ridges in Similarity matrix (Bartsch ’01)

• Or: clustering at phrase-level ...

0 500 1000 15000

200

400

600

800

1000

1200

1400

1600

1800

DYWMB - self similarity

time / 183ms frames

time

/ 183

ms

fram

es

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 23

Clustering-based summarization(Logan & Chu 2000)

• Find segments in song by greedy clustering:

• Biggest cluster chosen as “key phrase”- large contiguous block taken as example

divide into fixed length segments

calculate distortion between everysegment pair

find pair with lowest distortion

song features

combine thepair

0 1 2 3 1 5 6distortion > threshold

stop

0 1 2 3 4 5 6

0 1 2 3 4 5 6

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 24

Evaluating Summaries

• Hard to evaluate:What is the ‘right answer’?- difficult to construct or judge a summary until you

know the song...

• Bartsch & Wakefield: 93 songs, ‘chorus’ hand-marked, 70% frame-level precision-recall- aiming to find chorus/refrain

• Chu & Logan:18 Beatles #1 hits rated by 10 subjectsas Good/Average/Poor- “significantly better than random”

• Without a good metric, how to make choices to improve the algorithm?

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 25

Outline

Music Transcription

Music Summarization

Music Information Retrieval- What it could mean- Unsupervised clustering- Learned classification

Music Similarity Browsing

1

2

3

4

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 26

Music Information Retrieval

• Text-based searching concepts for music?- “musical Google”- finding a specific item- finding something vague- finding something new

• Significant commercial interest

• Basic idea:Project music into a space where neighbors are “similar”

• (Competition from human labeling)

3

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 27

Music IR: Queries & Evaluation

• What is the form of the query?

- Query by Humming- considerable attention, recent demonstrations- need/user base?

• Query by noisy example- “Name that tune” in a noisy bar- Shazam Ltd.: commercial deployment- database access is the hard part?

• Query by multiple examples- “Find me more stuff like this”

• Text queries? (Whitman & Smaragdis 2002)

• Evaluation problems- requires large, shareable music corpus!- requires a well-defined task

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 28

Unsupervised Clustering(Rauber, Pampalk, Merkl 2002)

• Map music into an auditory-based space

• Build ‘clusters’ of nearby → similar music- “Self-Organizing Maps”

(Kohonen)

• Look at the results:

- quantitative evaluation?

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

d3-kryptonitelimp-rearranged

limp-showpr-deadcellwildwildwest

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

fbs-praisekorn-freaklimp-99

rem-endoftheworld

limp-nobodylovespr-neverenough

limp-nobodylovespr-neverenoughlimp-nobodylovespr-neverenough

limp-nobodylovespr-neverenough

limp-nobodylovespr-neverenoughlimp-nobodylovespr-neverenough

limp-nobodylovespr-neverenoughlimp-nobodylovespr-neverenough

limp-nobodylovespr-neverenough

limp-nobodylovespr-neverenoughlimp-nobodylovespr-neverenough

limp-nobodylovespr-neverenough

limp-nobodylovespr-neverenoughlimp-nobodylovespr-neverenough

limp-nobodylovespr-neverenoughlimp-nobodylovespr-neverenough

limp-nobodylovespr-neverenough

limp-nobodylovespr-neverenoughlimp-nobodylovespr-neverenough

limp-nobodylovespr-neverenough

limp-nobodylovespr-neverenoughlimp-nobodylovespr-neverenough

limp-nobodylovespr-neverenoughlimp-nobodylovespr-neverenough

limp-nobodylovespr-neverenough

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

ga-anneclairega-heaven

limp-wanderingpr-binge

br-punkrocklimp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalematebr-punkrock

limp-stalemate limp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lessonlimp-lesson ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

ga-innocentga-time

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

addictga-lie

pinkpanthervm-classicalgas

vm-toccata

verve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetverve-bittersweetga-iwantit

ga-moneymilkparty

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

ga-iwantitga-moneymilk

party

nma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousaynma-poisonwhenyousay

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

backforgoodbigworld

br-anesthesiayoulearn

“Islands of Music”

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 29

Genre Classification(Tzanetakis et al. 2001)

• Classifying music into genres would get you some way towards finding “more like this”

• Genre labels are problematic, but they exist

• Real-time visualization of “GenreGram”:

- 9 spectral and 8 rhythm features every 200ms- 15 genres trained on 50 examples each,

single Gaussian model → ~ 60% correct

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 30

Artist Classification(Berenzweig et al. 2001)

• Artist label as available stand-in for genre

• Train MLP to classify frames among 21 artists

• Using only “voice” segments:Song-level accuracy improves 56.7% → 64.9%

0 10 20 30 40 50 60 70 80

Boards of CanadaSugarplastic

Belle & SebastianMercury Rev

CorneliusRichard Davies

Dj ShadowMouse on Mars

The Flaming LipsAimee Mann

WilcoXTCBeck

Built to SpillJason Falkner

OvalArto LindsayEric Matthews

The MolesThe Roots

Michael Penntrue voice

time / sec

Track 4 - Arto Lindsay (dynvox=Arto, unseg=Oval)

0 50 100 150 200

Boards of CanadaSugarplastic

Belle & SebastianMercury Rev

CorneliusRichard Davies

Dj ShadowMouse on Mars

The Flaming LipsAimee Mann

WilcoXTCBeck

Built to SpillJason Falkner

OvalArto Lindsay

Eric MatthewsThe MolesThe Roots

Michael Penntrue voice

time / sec

Track 117 - Aimee Mann (dynvox=Aimee, unseg=Aimee)

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 31

Artist Similarity

• Artist classes as a basis for overall similarity:Less corrupt than ‘record store genres’?

• But: what is similarity between artists?

- pattern recognition systems give a number...

• Need subjective ground truth:Collected via web site

www.musicseer.com

• Results:- 1800 users, 22,500 judgments collected over 6 months

backstreet_boys

whitney_

new_ron_carter

a

all_saints

annie_lennox

aqua

belinda_carlislers

celine_dionchristina_aguilera

eiffel_65

erasure

miroquai

janet_jacksonjessica_simpson lara_fabian

lauryn_hill

madonna

mariah_carey

nelly_furtado

pet_shop_boys

pr

roxette

sade

wain

sof

spice_girls

toni_braxtone

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 32

Outline

Music Transcription

Music Summarization

Music Information Retrieval

Music Similarity Browsing- Anchor space- Playola browser

1

2

3

4

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 33

Music Similarity Browsing

• Most interesting problem in music IR is finding new music- is there anything on mp3.com that I would like?

• Need a space where music/artists are arranged according to perceived similarity

• Particularly interested in little-known bands- little or no ‘community data’ (e.g. collab. filtering)- audio-based measures are critical

• Also need models of personal preference- where in the space is stuff I like- relative sensitivity to different dimensions

4

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 34

Anchor space

• A classifier trained for one artist (or genre) will respond partially to a similar artist

• A new artist will evoke a particular pattern of responses over a set of classifiers

• We can treat these classifier outputs as a new feature space in which to estimate similarity

• “Anchor space” reflects subjective qualities?

n-dimensionalvector in "Anchor

Space"Anchor

Anchor

Anchor

AudioInput

(Class j)

p(a1|x)

p(a2|x)

p(an|x)

GMMModeling

Conversion to Anchorspace

n-dimensionalvector in "Anchor

Space"Anchor

Anchor

Anchor

AudioInput

(Class i)

p(a1|x)

p(a2|x)

p(an|x)

GMMModeling

Conversion to Anchorspace

SimilarityComputation

KL-d, EMD, etc.

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 35

Anchor space visualization

• Comparing 2D projections of per-frame feature points in cepstral and anchor spaces:

- each artist represented by 5GMM- greater separation under MFCCs!- but: relevant information?

third cepstral coef

fifth

cep

stra

l coe

f

madonnabowie

Cepstral Features

1 0.5 0 0.5

0.8

0.6

0.4

0.2

0

0.2

0.4

0.6

Country

Ele

ctro

nica

madonnabowie

10

5

Anchor Space Features

15 10 5

15

0

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 36

Playola interface ( www.playola.org )

• Browser finds closest matches to single tracks or entire artists in anchor space

• Direct manipulation of anchor space axes

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 37

Evaluation

• Are recommendations good or bad?

• Subjective evaluation is the ground truth- .. but subjects don’t know the bands being

recommended- can take a long time to decide if a

recommendation is good

• Measure match to other similarity judgments- e.g. musicseer data:

Top rank agreement

0

10

20

30

40

50

60

70

80

cei cmb erd e3d opn kn2 rnd ANK

%

SrvKnw 4789x3.58

SrvAll 6178x8.93

GamKnw 7410x3.96GamAll 7421x8.92

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 38

Summary

• Music transcription:Hard, but some progress

• Music summarization:New, interesting problem

• Music IR:Alternative paradigms, lots of interest

Data-driven machine learning techniquesare valuable in each case

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 39

References

Mark A. Bartsch and Gregory H. Wakefield (2001) “To Catch a Chorus: Using Chroma-based Representations for Audio Thumbnailing”, Proc. WASPAA, Mohonk, Oct 2001. http://musen.engin.umich.edu/papers/ bartsch_wakefield_waspaa01_final.pdf

A. Berenzweig, D. Ellis, S. Lawrence (2002). “Using Voice Segments to Improve Artist Classification of Music “, Proc. AES-22 Intl. Conf. on Virt., Synth., and Ent. Audio. Espoo, Finland, June 2002.http://www.ee.columbia.edu/~dpwe/pubs/aes02-aclass.pdf

A. Berenzweig, D. Ellis, S. Lawrence (2002). “Anchor Space for Classification and Similarity Measurement of Music“, Proc. ICME-03, Baltimore, July 2003.http://www.ee.columbia.edu/~dpwe/pubs/icme03-anchor.pdf

J. Foote (1997), “A similarity measure for automatic audio classification”, Proc. AAAI Spring Symposium on Intelligent Integration and Use of Text, Image, Video, and Audio Corpora, March 1997.http://citeseer.nj.nec.com/foote97similarity.html

Masataka Goto (2001), “A Predominant-F0 Estimation Method for CD Recordings: MAP Estimation using EM Algorithm for Adaptive Tone Models”, Proc. ICASSP 2001, Salt Lake City, May 2001.http://staff.aist.go.jp/m.goto/PAPER/ICASSP2001goto.pdf

A. Klapuri, T. Virtanen, A. Eronen, J. Seppänen (2001), “Automatic transcription of musical recordings”, Proc. CRAC workshop, Eurospeech, Denmark, Sep 2001.http://www.cs.tut.fi/sgn/arg/klap/crac2001/crac2001.pdf

Beth Logan and Stephen Chu (2000), “Music summarization using key phrases”, Proc. IEEE ICASSP, Istanbul, June 2000. http://crl.research.compaq.com/publications/techreports/reports/2000-1.pdf

E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 40

References (2)

G. Peeters, A. La Burthe, X. Rodet (2002), “Toward automatic music audio summary generation from signal analysis”, Proc. ISMIR-02, Paris, October 2002.http://ismir2002.ircam.fr/proceedings%5C02-FP03-3.pdf

Andreas Rauber, Elias Pampalk and Dieter Merkl (2002), “Using Psychoacoustic models and Self-Organizing Maps to create a hierarchical structuring of music by musical styles”, Proc. ISMIR-02, Paris, October 2002.http://ismir2002.ircam.fr/proceedings%5C02-FP02-4.pdf

G. Tzanetakis, G. Essl, P. Cook (2001), “Automatic Musical Genre Classification of Audio Signals”, Proc. ISMIR-01, Bloomington, October 2001. http://ismir2001.indiana.edu/pdf/tzanetakis.pdf

P.J. Walmsley, S.J. Godsill, and P.J.W. Rayner (1999), “Polyphonic pitch tracking using joint Bayesian estimation of multiple frame parameters”, Proc. WASPAA, Mohonk, Oct 1999. http://www.ee.columbia.edu/~dpwe/papers/WalmGR99-polypitch.pdf

Brian Whitman and Paris Smaragdis (2002), “Combining Musical and Cultural Features for Intelligent Style Detection”, Proc. ISMIR-02, Paris, October 2002.http://ismir2002.ircam.fr/proceedings%5C02-FP02-1.pdf