
Musical Genre Classification

Prepared by Elliot Sinyor for MUMT 611, March 3, 2005


2/36

Table of Contents

What is Genre?
Approaches to Genre Classification
- Manual
- Automatic
Related Work
- Soltau 1998 (prescriptive approach)
- Tzanetakis & Cook (prescriptive approach)
- Pachet et al. 2001 (emergent approach)
Conclusion


3/36

What is Genre?

A way of describing what an item shares with other items, as well as what differentiates it from other items.

From Aucouturier and Pachet: “The genesis of genre is therefore to be found in our natural and irrepressible tendency to classify”


4/36

What is Genre?

A&P separate genre descriptions into two broad categories: intentional vs. extensional.


5/36

What is Genre? - Intentional

- More subjective
- Relies on collective cultural knowledge
- Social/historical context, e.g. the 60s, hippies, Brit-pop


6/36

Problems with “Genre”

- What do the names mean? Rock? Pop?
- No fixed semantics. Amazon.com assigns genres by:
  - Period (“60s pop”)
  - Topic (“love song”)
  - Country of origin (“Japanese music”)
- Genre is based on extrinsic habits rather than intrinsic properties:
  - To a French person, C. Aznavour is “Variety”; to an English person, C. Aznavour is “French”


7/36

What is Genre? - Extensional

- Analysis-based: describes the music itself (tempo, timbre, pitch, language, etc.)
- (Sometimes) easier for automatic genre classification systems
- E.g. fast rock, mellow classical


8/36

Problems with “Genre”

- What granularity to use?
- By artist? Please Please Me vs. Sgt. Pepper
- By album? Revolution 9 vs. Helter Skelter vs. Mother Nature’s Son
- Does work for broad categories: Rock vs. Classical


9/36

Problems with “Genre”

- Does anyone agree?
  - Allmusic.com: 531 genres
  - Amazon.com: 719 genres
  - Mp3.com: 430 genres
- Only 70 words are common to the three taxonomies (Pachet and Cazaly 2000)


10/36

Approaches to Genre Classification

- Manual: musicologists and elbow grease
- Automatic:
  - Prescriptive: based on signal analysis
  - Emergent: uses existing human-entered metadata to group things together


11/36

Manual Classification

Dannenberg et al. 2001: building a taxonomy for the MSN Music Search Engine

- “Few hundred thousand songs”
- Hired full-time musicologists
- Took 30 human-years
- “The details of the taxonomy and the design methodology are, however, not available”


12/36

Manual Classification

Pachet and Cazaly 2001 (CUIDADO)

- Separated descriptors: country, instrumentation, artist type, etc. (“_____ Rock”)
- Too sensitive to musical evolution; difficult to build; difficult to maintain
- Changed focus to artists instead of titles
- In any case, insufficient for millions of titles


13/36

Prescriptive – History

- Originated from speech recognition work
- Most systems classified audio from TV into music/speech/environmental sound


14/36

Prescriptive – Various Approaches

- Saunders 1996: thresholding/ZCR techniques
- Scheirer and Slaney 1997: multiple features and statistical pattern recognition
- Kimber and Wilcox 1996: MFCCs and HMMs to classify into music, speech, laughter, and non-speech
- Zhang and Kuo 2001: rule-based system for classifying audio from movies and TV into:
  - Non-music: pure speech, non-harmonic environmental sound
  - Music: harmonic environmental sound, pure music, song, speech with music, environmental sound with music


15/36

Prescriptive

Soltau et al. 1998 – “Recognition of Music Types”

New approach – Explicit Time Modelling with Neural Network (ETM-NN)


16/36

Prescriptive – Soltau et al. 1998

In a nutshell:
- Transform the acoustic signal into a sequence of abstract sonic events
- Derive statistical patterns from the sequences and combine them into vectors that represent temporal structure
- Classify the vectors with a 3-layer feed-forward network
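As a rough illustration of the final stage only (not Soltau et al.'s actual ETM-NN; the feature vectors, sizes, and labels below are placeholders), a 3-layer feed-forward classifier could be sketched with scikit-learn:

```python
# Hypothetical sketch: a 3-layer (input / one hidden / output) feed-forward
# network classifying placeholder "temporal structure" vectors into four
# genres. The real ETM-NN derives its inputs from learned sonic events.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X = np.random.rand(360, 64)   # placeholder temporal-structure vectors
y = np.random.choice(["rock", "pop", "techno", "classical"], size=360)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000)  # one hidden layer
net.fit(X_train, y_train)
print("evaluation accuracy:", net.score(X_test, y_test))
```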


17/36

Prescriptive – Soltau et al. 1998

Experimental Results:
- 3 hours of data (360 samples, 30 s each)
- Four genres: Rock, Pop, Techno, Classical
- 67% training, 13% cross-validation, 20% evaluation
- ETM-NN vs. HMM, both using cepstral coefficients: ETM-NN 86.1%, HMM 79.2%


18/36

“Musical Genre Classification of Audio Signals” – Tzanetakis and Cook, 2002

Timbral Texture Features:

- Spectral {Centroid, Rolloff, Flux}, ZCR, MFCC (5 coefficients)
- Analysis window (over which features should be stable): 23 ms
- Texture window (the “minimum amount of time to identify a ‘texture’”): 43 analysis windows, about 1 s; a “memory of the past”
- Statistics (means, variances) of the features over the texture window
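A hedged sketch of the per-analysis-window feature extraction, assuming librosa is available (it is not what the paper used): the 512-sample window is illustrative (roughly 23 ms at 22050 Hz), and spectral flux is computed by hand since librosa has no direct helper for it.

```python
# Sketch: per-analysis-window timbral features, one column per ~23 ms window.
import numpy as np
import librosa

def frame_features(y, sr=22050, n_fft=512):
    hop = n_fft  # non-overlapping analysis windows
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    n = S.shape[1]

    centroid = librosa.feature.spectral_centroid(S=S, sr=sr)[0]
    rolloff  = librosa.feature.spectral_rolloff(S=S, sr=sr)[0]
    flux     = np.concatenate([[0.0], np.sqrt((np.diff(S, axis=1) ** 2).sum(axis=0))])
    zcr      = librosa.feature.zero_crossing_rate(y, frame_length=n_fft,
                                                  hop_length=hop)[0][:n]
    mfcc     = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=5, n_fft=n_fft,
                                    hop_length=hop)[:, :n]

    # 9 rows: centroid, rolloff, flux, ZCR, 5 MFCCs
    return np.vstack([centroid, rolloff, flux, zcr, mfcc])
```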



20/36

Timbral Texture Feature Vector

- Statistics (means, variances) of the features over the texture window: 19 dimensions
  - (mean, variance) of SC, SF, SR, ZCR, and 5 MFCCs: 18 dimensions
  - “Low energy” feature: the fraction of analysis windows in the texture window with less than the average RMS energy
- E.g. vocal music will have more silences, and so a higher low-energy value
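Continuing the sketch above (rms would be per-window RMS energy, e.g. from librosa.feature.rms; the function name is illustrative), one texture window collapses to the 19-dimensional vector like so:

```python
# Sketch: one 19-dimensional texture vector = mean and variance of the
# 9 frame features (18 dims) plus the low-energy fraction (1 dim).
def texture_vector(frames, rms, start, width=43):
    block = frames[:, start:start + width]   # 43 analysis windows (~1 s)
    r = rms[start:start + width]
    low_energy = np.mean(r < r.mean())       # fraction of quieter-than-average windows
    return np.concatenate([block.mean(axis=1), block.var(axis=1), [low_energy]])
```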


21/36

Rhythmic Content – “Beat Histogram”

- “Pitch detection with larger periods”
- Use a DWT to divide the signal into frequency bands
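A much-simplified sketch of the idea, assuming PyWavelets: the paper's pipeline also low-passes, downsamples, and normalizes each band's envelope before summing, and adds only the largest autocorrelation peaks to the histogram rather than accumulating every lag as done here.

```python
# Sketch: DWT octave bands -> rectified envelopes -> summed envelope ->
# autocorrelation -> histogram indexed by tempo in BPM.
import numpy as np
import pywt

def beat_histogram(y, sr=22050, levels=4):
    coeffs = pywt.wavedec(y, "db4", level=levels)    # octave-band decomposition
    n = len(y)
    envelope = np.zeros(n)
    for c in coeffs:
        e = np.abs(c) - np.abs(c).mean()             # rectify, remove DC
        # stretch each band's envelope back to full length before summing
        envelope += np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(e)), e)

    # fast autocorrelation via FFT ("pitch detection with larger periods")
    f = np.fft.rfft(envelope, 2 * n)
    ac = np.fft.irfft(f * np.conj(f))[:n]

    hist = np.zeros(201)                             # bins indexed by integer BPM
    lo, hi = int(60 * sr / 200), int(60 * sr / 40)   # lags covering 200..40 BPM
    for lag in range(lo, min(hi, n - 1)):
        hist[int(round(60.0 * sr / lag))] += ac[lag]
    return hist
```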

22/36

Rhythmic Content – “Beat Histogram”

[Figure: beat histogram (not reproduced in this transcript)]


23/36

Features taken from the BH:

- A0, A1: relative amplitudes (divided by the sum of amplitudes) of the first and second histogram peaks
- RA: ratio of the amplitude of the second peak to the amplitude of the first peak
- P1, P2: periods of the first and second peaks, in BPM
- SUM: overall sum of the histogram (an indication of beat strength)
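A sketch of reading these off a histogram array, using scipy's peak picker (the presentation does not specify one; assumes at least two peaks exist):

```python
# Sketch: extract A0, A1, RA, P1, P2, SUM from a beat histogram whose
# bins are indexed by integer BPM, as built above.
import numpy as np
from scipy.signal import find_peaks

def bh_features(hist):
    total = hist.sum()
    peaks, _ = find_peaks(hist)
    p1, p2 = peaks[np.argsort(hist[peaks])[::-1][:2]]   # two highest peaks
    return {"A0": hist[p1] / total, "A1": hist[p2] / total,
            "RA": hist[p2] / hist[p1],
            "P1": p1, "P2": p2,                          # bin index == BPM
            "SUM": total}
```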


24/36

Pitch Content Features

- Used an enhanced autocorrelation function to create folded (one octave) and unfolded (all notes) pitch histograms
- Pitches are mapped to MIDI note numbers
- Folded: common pitch classes; unfolded: pitch range (higher for jazz and classical)
- Features: FA0, UP0, UP1, IPO1 (interval between the 2 highest peaks), SUM
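A sketch of building the two histograms from a stream of pitch estimates. Here f0_hz is a hypothetical array of fundamentals (e.g. from an autocorrelation-based tracker); the mapping to MIDI is the standard n = 69 + 12 log2(f/440).

```python
# Sketch: folded (pitch-class) and unfolded (full-range) pitch histograms.
import numpy as np

def pitch_histograms(f0_hz):
    midi = np.round(69 + 12 * np.log2(np.asarray(f0_hz) / 440.0)).astype(int)
    midi = np.clip(midi, 0, 127)
    unfolded = np.bincount(midi, minlength=128)      # full pitch range
    folded = np.bincount(midi % 12, minlength=12)    # pitch classes (one octave)
    return folded, unfolded
```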


25/36

Experimental Results

Used GMM classifiers with diagonal covariance matrices
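A sketch of that classifier family, assuming scikit-learn (which postdates the paper): fit one diagonal-covariance GMM per genre on that genre's training vectors, then label a new vector by maximum log-likelihood.

```python
# Sketch: per-genre diagonal-covariance GMMs; classify by max likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmms(X_by_genre, n_components=3):
    # X_by_genre: {genre: (n_samples, n_features) training matrix} (hypothetical)
    return {g: GaussianMixture(n_components, covariance_type="diag").fit(X)
            for g, X in X_by_genre.items()}

def classify(gmms, x):
    scores = {g: m.score(x.reshape(1, -1)) for g, m in gmms.items()}
    return max(scores, key=scores.get)   # genre with highest log-likelihood
```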

26/36

Experimental Results

[Results figure not reproduced in this transcript]


27/36

Prescriptive – Some Results: (from A&P)

Gaussian and Gaussian Mixture Models, used in 48% of successful classification in Ermolinskiy et al.(2001) using 100 songs for each class in the training phase. This result has to be taken with care since the system uses only pitch information.

Tzanetakis et al. (2001) achieves a rather disappointing 57%, but also reports 75% in Tzanetakis and Cook (2000a) using 50 songs per class.

90% in Lambrou and Sandler (1998) and 75% in Deshpande et al. (2001) on a very small training and test set, which may not be representative.

Pye (2000) reports 90% on a total set of 175 songs. Soltau (1998) reports 80% with HMM, 86% with NN,

with a database of 360 songs.


28/36

Emergent

- Unlike prescriptive approaches, emergent classification is unsupervised
- Based on “cultural similarity from text documents”
- Can extract similarities that are not possible to extract from the audio signal


29/36

Emergent – Collaborative Filtering

- Shardanand & Maes 1995; Pestoni et al. 2001
- There are patterns in tastes: have users rate their music, match like-tasted users, recommend unknown items to them
- Problems:
  - Good for naïve profiles; bad for broad, eclectic tastes
  - Favors “middle of the road” items liked by a large proportion of users
  - Only works some time after the release of new music
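A toy illustration of the matching idea (all names hypothetical, not any cited system): cosine similarity between users' rating vectors, then similarity-weighted scores for items the target user has not rated.

```python
# Toy user-user collaborative filtering sketch.
# R is a hypothetical users x items rating matrix (0 = unrated).
import numpy as np

def recommend(R, user, k=5):
    norms = np.linalg.norm(R, axis=1)
    sims = (R @ R[user]) / (norms * norms[user] + 1e-9)  # cosine similarity
    sims[user] = 0.0                                     # exclude the user themself
    scores = sims @ R                                    # similarity-weighted ratings
    scores[R[user] > 0] = -np.inf                        # drop already-rated items
    return np.argsort(scores)[::-1][:k]                  # top-k recommendations
```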


30/36

Emergent – Co-occurrence Analysis

- Pachet et al. 2001
- Looks at online text sources for co-occurrences of songs (a.k.a. data mining)
- If two items appear in the same context (or share a common neighbour), this is evidence of some sort of similarity


31/36

Co-occurrence

Pachet et al. 2001, “Musical Data Mining for Electronic Music Distribution”

Sources used:
- Track-listing databases (CDDB): mostly compilations of similar artists
- Radio show playlists: specialty programs are better than daily commercial radio
- Lists made by experts


32/36

Co-occurrence

- Build a matrix where the value of entry (i, j) is the number of times title i co-occurs with title j
- What about indirect co-occurrence? E.g. Eleanor Rigby/Good Vibrations and Good Vibrations/God Only Knows together link Eleanor Rigby to God Only Knows
- Correlation measure, using the covariance matrices of each title
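A sketch of both steps (playlists given as lists of title indices; all names illustrative): direct co-occurrence counts first, then row correlation to surface indirect links like the one above, since two titles that co-occur with the same neighbours get similar rows.

```python
# Sketch: co-occurrence counting plus correlation of co-occurrence profiles.
import numpy as np
from itertools import combinations

def cooccurrence(playlists, n_titles):
    C = np.zeros((n_titles, n_titles))
    for pl in playlists:                     # each playlist: a list of title indices
        for i, j in combinations(set(pl), 2):
            C[i, j] += 1
            C[j, i] += 1
    return C

def correlation_similarity(C):
    return np.corrcoef(C)                    # correlate rows (titles' profiles)
```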


33/36

Experimental Results

- Using the distance functions, apply Ascendant Hierarchical Clustering
- Used the CDDB database; compared co-occurrence vs. correlation
- Manually examined the results: “70% of clusters had interesting similarities”
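Ascendant (i.e. agglomerative) hierarchical clustering on such a similarity matrix might be sketched with scipy; the linkage method and cluster count here are illustrative choices, not the paper's.

```python
# Sketch: turn a similarity matrix into distances, then cluster bottom-up.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_titles(sim, n_clusters=10):
    dist = 1.0 - sim                          # similarity -> distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```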

34/36

Experimental Results

[Clustering results figure not reproduced in this transcript]


35/36

Challenges

- Name format is not strictly enforced: The Beatles; Beatles, The; Beatles
- Difficult to characterize the nature of the similarities
- Cover songs can sound nothing alike
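An illustrative (and deliberately naïve) normalizer for the name-format problem above; a real system would need fuzzier matching than this.

```python
# Sketch: canonicalize artist-name variants before matching.
def normalize_artist(name):
    name = name.strip()
    if name.lower().endswith(", the"):       # "Beatles, The" -> "The Beatles"
        name = "The " + name[:-5].strip()
    if name.lower().startswith("the "):      # drop the article for matching
        name = name[4:]
    return name.casefold()

# normalize_artist("The Beatles") == normalize_artist("Beatles, The")
#                                 == normalize_artist("Beatles")   # -> "beatles"
```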


36/36

Conclusions and Future Directions

- “It seems that samples of Techno and Classical are easy to discriminate … Rock and Pop seems to be more difficult” – Soltau et al. 1998
- Manual classification is not feasible
- Why not combine prescriptive and emergent techniques?