Timbre Similarity Work by Aucouturier & Pachet Rebecca Fiebrink MUMT 611 3 March 2005.


Page 1: Timbre Similarity: Work by Aucouturier & Pachet

Rebecca Fiebrink
MUMT 611
3 March 2005

Page 2: Presentation Overview

Pachet & Aucouturier; why timbre similarity?
Basic approach to quantifying timbre and timbre similarity
“Finding songs that sound the same,” 2002
The CUIDADO project
P & A’s work in context
Practical and theoretical improvements, 2004
Remaining problems and future work

Page 3: Who are they?

Sony Computer Science Laboratory (CSL), Paris
François Pachet: music access and interaction, “interestingness”
Jean-Julien Aucouturier: PhD student
A host of papers on music browsing, genre, metadata, segmentation, …

Page 4: Why timbre similarity?

Electronic Music Distribution (EMD) systems:
  Move from mass-market to individualized distribution
  Collaborative filtering isn’t sufficient
  High-level, perceptually relevant descriptors play a complementary / competing role; allow for more interesting music browsing
Makes more sense than “melodic similarity”
Tied to genre, but not too tightly

Page 5: How to quantify timbre?

High-level descriptor for an entire song or piece
Mel Frequency Cepstral Coefficients (MFCCs) are building blocks
  Related to spectral envelope
  First few coefficients account for timbre envelope; later ones describe pitch
Derive a compact representation of a piece’s MFCC “space” and a way to compare representations for two pieces
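
Sketch (not from the original slides): MFCCs are commonly obtained by taking a discrete cosine transform (DCT-II) of log mel-band energies. The Python below assumes the mel filterbank step has already been done; the function name `cepstral_coefficients` and the toy input are illustrative only. It hints at why the first few coefficients capture the smooth envelope: a perfectly flat set of band energies puts everything into coefficient 0 and nothing into the higher ones.

```python
import math

def cepstral_coefficients(log_mel_energies, n_coeffs):
    """Unnormalized DCT-II of log mel-band energies; returns the first n_coeffs values."""
    n_bands = len(log_mel_energies)
    return [
        sum(e * math.cos(math.pi * n * (k + 0.5) / n_bands)
            for k, e in enumerate(log_mel_energies))
        for n in range(n_coeffs)
    ]

# Toy input (hypothetical): 16 mel bands with identical log energy.
flat = [2.0] * 16
coeffs = cepstral_coefficients(flat, 8)  # keep 8 coefficients, as in the 2002 system
```

Coefficient 0 is just the sum of the log energies, so it tracks overall level rather than spectral shape.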

Page 6: A & P’s implementation (2002)

Find first 8 MFCCs every 50 ms
Model song as mixture of 3 Gaussian densities over all possible MFCCs of length 8 (GMM = “Gaussian mixture model”)
Calculate “distance” between GMMs by sampling
  Sample from one GMM, compute likelihood of the samples given the other GMM
  Force symmetry and normalize
  Use 1000 samples
Store GMM information for each song and calculate similarity matrix
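
Sketch (not from the original slides): one way the sampling-based distance could look in Python. The slide only says "force symmetry and normalize", so the exact combination used here (self-likelihoods minus cross-likelihoods, averaged per sample) is an assumption, as are the diagonal covariances, the tuple layout `(weight, means, variances)`, and the 2-dimensional toy models standing in for 8-dimensional MFCC vectors.

```python
import math
import random

def log_density(x, gmm):
    """Log-density of point x under a diagonal-covariance GMM."""
    logs = []
    for weight, means, variances in gmm:
        lp = math.log(weight)
        for xi, m, v in zip(x, means, variances):
            lp += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(lp)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def sample(gmm, n, rng):
    """Draw n points: pick a component by weight, then sample each dimension."""
    weights = [w for w, _, _ in gmm]
    points = []
    for _ in range(n):
        _, means, variances = rng.choices(gmm, weights=weights)[0]
        points.append([rng.gauss(m, math.sqrt(v)) for m, v in zip(means, variances)])
    return points

def mean_log_likelihood(points, gmm):
    return sum(log_density(p, gmm) for p in points) / len(points)

def sampled_distance(a, b, n_samples=1000, seed=0):
    """Symmetrized, per-sample-normalized distance estimated by sampling (assumed form)."""
    rng = random.Random(seed)
    from_a = sample(a, n_samples, rng)
    from_b = sample(b, n_samples, rng)
    return (mean_log_likelihood(from_a, a) + mean_log_likelihood(from_b, b)
            - mean_log_likelihood(from_a, b) - mean_log_likelihood(from_b, a))

# Hypothetical toy models: (weight, means, variances) per component.
song_a = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
song_near = [(0.5, [0.5, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.5], [1.0, 1.0])]
song_far = [(1.0, [10.0, 10.0], [1.0, 1.0])]
```

With these toy models, `sampled_distance(song_a, song_far)` comes out larger than `sampled_distance(song_a, song_near)`, and comparing a model with itself gives a value of essentially zero.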

Page 7: Results of 2002 version

Same artist
  Harpsichord pieces: Bach - Wohltemperierte Clavier Fuga II in C minor and Bach - Wohltemperierte Clavier - Praeludium IV in C sharp minor
  Trip Hop: Portishead - Mysterons (live) and Portishead - Sour Times
Different artists, same genre
  Harpsichord pieces: Bach - Das Wohltemperierte Clavier - Praeludium IV in C sharp minor BWV849 and Couperin - Gavotte
  "Woman Rock Singer": Leah Andreone - It's OK and Meredith Brooks - Bitch
“Interesting” results
  "Classical" and "Pop": Beethoven - Romanze für Violine und Orchester Nr. 2 F-dur op.50 and Beatles - Eleanor Rigby
  "Trip Hop" and "Celtic Folk": Portishead - Mysterons and Alan Stivell - Arvor You (same kind of harpy theremin-like ambiance)

Page 8: Evaluating results

No ground truth exists
  Similarity is subjective
  People don’t hear timbre alone
Survey of 10 people: Is A more like B or C?
  Algorithm matches people 80% of time
One view: Divergence from expectation makes it useful
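
Sketch (not from the original slides): how an "Is A more like B or C?" survey could be scored against a distance measure. The function name, the nested-dict distance layout, and all values below are hypothetical toy data, not the survey results.

```python
def agreement_rate(triplets, distance, human_choices):
    """Fraction of (a, b, c) triplets where the measure's closer song matches the listener's pick."""
    matches = 0
    for (a, b, c), human in zip(triplets, human_choices):
        algorithm_pick = b if distance[a][b] <= distance[a][c] else c
        matches += algorithm_pick == human
    return matches / len(triplets)

# Hypothetical toy data: distances from song "A" to three other songs.
distance = {"A": {"B": 1.0, "C": 2.0, "D": 0.5}}
triplets = [("A", "B", "C"), ("A", "C", "D"), ("A", "B", "D"), ("A", "D", "C")]
human_choices = ["B", "D", "B", "D"]
rate = agreement_rate(triplets, distance, human_choices)
```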

Page 9: Generating “aha!”

Produce interesting matches: when genre and timbre are not correlated
Allow user control over size of “Aha!” exploration

Page 10: Using the measure: CUIDADO

Content-based Unified Interfaces and Descriptors for Audio and Music Databases available Online
2001-2003 European research project
“aims at developing a new chain of applications through the use of audio/music content descriptors, in the spirit of the MPEG-7 standard”
  design of appropriate description structures
  development of extractors for deriving high-level information from audio signals
  design and implementation of two applications: the Sound Palette and the Music Browser
(From the CUIDADO website)

Page 11: CUIDADO Music Browser

Client/server architecture for music browsing
Target audience: casual music lover
17,075 popular music titles with metadata

Picture from “The CUIDADO project”

Page 12: Music Browser Query Panel

Picture from “Popular music access”

Page 13: Using Timbre in the Music Browser

Nearest-neighbor search
  “Find me something that sounds like this song”
  Allow user control over size of exploration: “Aha” slider
    Same artist … Same genre … “interesting”
Playlist generation
  Example:
    1- Timbre continuity throughout the sequence
    2- Genre Cardinality: 30% Rock, 30% Folk, 30% Pop
    3- Genre Distribution: the titles of the same genre should be as separated as possible
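
Sketch (not from the original slides): a nearest-neighbor query with a slider-like setting. How the Music Browser actually implements the slider is not stated on the slide; the three levels below are an assumption read off "Same artist … Same genre … interesting", and the catalogue, field names, and distances are hypothetical.

```python
def timbre_neighbours(query, songs, distance, k=2, aha=0):
    """Return the k timbrally closest songs to `query`.

    aha=0 keeps everything, aha=1 drops songs by the same artist,
    aha=2 also drops songs of the same genre (assumed slider levels).
    """
    candidates = []
    for title, info in songs.items():
        if title == query:
            continue
        if aha >= 1 and info["artist"] == songs[query]["artist"]:
            continue
        if aha >= 2 and info["genre"] == songs[query]["genre"]:
            continue
        candidates.append(title)
    candidates.sort(key=lambda title: distance[query][title])
    return candidates[:k]

# Hypothetical toy catalogue and distances from the query song "q".
songs = {
    "q": {"artist": "A1", "genre": "rock"},
    "s1": {"artist": "A1", "genre": "rock"},
    "s2": {"artist": "A2", "genre": "rock"},
    "s3": {"artist": "A3", "genre": "folk"},
    "s4": {"artist": "A4", "genre": "classical"},
}
distance = {"q": {"s1": 0.1, "s2": 0.2, "s3": 0.3, "s4": 0.9}}
```

Raising `aha` pushes the results away from the obvious matches: with this toy data, level 0 returns `s1` and `s2`, while level 2 returns `s3` and `s4`.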

Page 14: Sample playlist

Arlo Guthrie - City Of New Orleans (Folk/Rock)
Belle & Sebastian - The boy done wrong again (Rock/Alternative)
Ben Harper - Pleasure & Pain (Pop/Blues)
Joni Mitchell - Borderline (Folk/Pop)
Badly Drawn Boy - Camping Next to Water (Rock/Alternative)
Rolling Stones - You Can’t always get what you want (Pop/Blues)
Nick Drake - One of these things first (Folk/Pop)
Radiohead - Motion Picture Soundtrack (Rock/Brit)
The Beatles - Mother Nature's Son (Pop/Brit)
Tracy Chapman - Talkin' about a Revolution (Rock/Folk)

Page 15: Work in Context

Several other researchers also use MFCCs with reasonable results: Baumann 2003, Berenzweig et al. 2002, Foote 1997, Kulesh 2003, Logan and Salomon 2001, …
Pampalk, Dixon, and Widmer 2003
  P & A’s work is relatively accurate
  Implementation is relatively slow
  Incorporating use of 1st MFCC integrates average dynamic level into results
Hard to compare one group’s work with another’s
Hard to propose future research directions beyond parameter tweaking

Page 16: Practical & Theoretical Improvements, 2004

A & P conducted extensive tests varying algorithms and parameters of 2002 system
  Can optimal parameter settings be found?
  What is the limit on improvement?
Evaluate in the context of CUIDADO Music Browser

Page 17: Optimal parameter values

Signal sample rate: higher is better
Distance sample rate (used to compare GMMs): higher is better, but little improvement over 1000
Sampling can perform as well as Earth Mover’s distance (EMD)
The number of MFCCs and the number of components in the GMM jointly affect the outcome:
  50 components and 20 MFCCs is optimal
  # components can be reduced without hurting performance much
30 ms is optimum window size
Adhering to above guidelines leads to absolute improvement of 16% to precision
Precision is underestimated: considers same-genre only
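
Sketch (not from the original slides): a same-genre precision score for one query, assuming precision is the fraction of returned neighbours that share the query's genre. The function name and genre labels are hypothetical. It illustrates why such a score can underestimate quality: a timbrally convincing match from another genre still counts as a miss.

```python
def same_genre_precision(query_genre, neighbour_genres):
    """Fraction of returned neighbours whose genre equals the query's genre."""
    hits = sum(1 for genre in neighbour_genres if genre == query_genre)
    return hits / len(neighbour_genres)

# Hypothetical neighbour list for a rock query; the folk and pop items
# count as misses even if they sound similar.
precision = same_genre_precision("rock", ["rock", "folk", "rock", "pop"])
```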

Page 18: Alternative algorithms

Several speech-processing algorithms were tried
  Mixed results
  No drastic improvements: 2% additional precision at most
HMM instead of GMM offers no improvement

Page 19: Conclusions of 2004 study

“Ceiling” of 65% precision (conservative estimate)
False positives remain a problem
  Jimi Hendrix != Joni Mitchell
  Due to “hubs” in nearest-neighbor space
Problems are inherent in approach itself?
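
Sketch (not from the original slides): one simple way to spot "hubs" is to count how often each song appears in other songs' k-nearest-neighbor lists. The function name and the toy distance table are hypothetical; a song that shows up in far more lists than the rest is a hub candidate and a likely source of false positives.

```python
from collections import Counter

def neighbour_occurrences(distance, k):
    """Count how many k-nearest-neighbor lists each song appears in."""
    counts = Counter({song: 0 for song in distance})
    for query, row in distance.items():
        ranked = sorted((other for other in row if other != query),
                        key=lambda other: row[other])
        counts.update(ranked[:k])
    return counts

# Hypothetical toy distances: "hub" is close to every other song.
distance = {
    "hub": {"a": 1.0, "b": 1.0, "c": 1.0},
    "a": {"hub": 1.0, "b": 3.0, "c": 4.0},
    "b": {"hub": 1.0, "a": 3.0, "c": 5.0},
    "c": {"hub": 1.0, "a": 4.0, "b": 5.0},
}
counts = neighbour_occurrences(distance, k=1)
```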

Page 20: Proposals for future work

Address perception of timbre
  Some frames are more important than others
  Some timbres more salient than others
  People assess similarity by choosing “This sounds like X” or “This doesn’t sound like X”

Page 21: Conclusions

High-level, perceptually based similarity has a place in electronic music distribution
Current systems for timbre similarity have some use
There is still room for new, innovative, and cross-disciplinary work