Singer similarity / identification Francois Thibault MUMT 614B McGill University.

Page 1

Singer similarity / identification

Francois Thibault

MUMT 614B

McGill University

Page 2

Introduction

Relatively easy for humans to identify a singing voice in various contexts

Difficult to find time- and environment-invariant features for robust automatic identification

Growing demand for such systems as network databases keep expanding

Page 3

Background (1)

Significant research exists in speaker identification, but those systems perform poorly with singing voice (inadequate training)

Singer identification research can draw heavily on automatic instrument recognition systems

Artist / singer identification is much harder than song identification (because it requires context-invariant features)

Page 4

Background (2)

Often builds on speech / music discrimination systems

Acoustical features heavily used to create an N-dimensional Euclidean space: loudness, pitch, brightness, bandwidth, harmonicity

Often uses the same tools as style identification because each singer corresponds to a 'micro' style

Page 5

Kim and Whitman overview

Segmentation of vocal regions prior to the singer identification algorithm

Assumes singing regions display strong harmonic energy in the voice frequency range

Band-pass filter (200-2000 Hz)

Inverse comb filter bank to detect harmonicity

Identification classifier uses features based on LPC
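The harmonicity test can be illustrated with a minimal sketch. It is not Kim and Whitman's implementation: the band-pass stage is omitted, and the function names, the delay range and the energy-ratio measure are assumptions made for the example. An inverse comb filter y[n] = x[n] - x[n - d] cancels a signal whose period is d samples, so a strongly harmonic frame shows a very low output energy for at least one delay in the bank.

```python
def inverse_comb_energy_ratio(x, delay):
    """Output energy of y[n] = x[n] - x[n - delay], relative to input energy."""
    num = sum((x[n] - x[n - delay]) ** 2 for n in range(delay, len(x)))
    den = sum(x[n] ** 2 for n in range(delay, len(x)))
    # A silent frame is treated as non-harmonic (assumption for this sketch).
    return num / den if den > 0 else float("inf")


def harmonicity(x, min_delay, max_delay):
    """Run a bank of inverse comb filters and return (best_delay, lowest_ratio).

    A ratio near 0 means one filter almost cancelled the frame, i.e. the
    frame is close to periodic with that delay.
    """
    ratios = {d: inverse_comb_energy_ratio(x, d)
              for d in range(min_delay, max_delay + 1)}
    best = min(ratios, key=ratios.get)
    return best, ratios[best]
```

A frame would then be labelled vocal when the lowest ratio falls under some threshold; the threshold value is a tuning choice the slides do not give.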

Page 6

K & W feature extraction

Determine formant location and amplitude with a 12-pole linear predictor using the autocorrelation method

Augments low-frequency resolution without increasing model order by warping the frequency representation with a function approximating the Bark scale
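As a rough illustration of the autocorrelation method, the sketch below estimates linear-prediction coefficients with the Levinson-Durbin recursion. It is a plain, unwarped predictor: the Bark-scale frequency warping and any frame windowing are left out, and the function names are chosen for the example only.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns (a, err): a[0] == 1 and A(z) = 1 + sum_k a[k] z^-k is the
    prediction-error filter; err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc(frame, order=12):
    """12-pole predictor by default, matching the order quoted on the slide."""
    return levinson_durbin(autocorrelation(frame, order), order)
```

The formant locations and amplitudes would then be read from the roots or the spectral peaks of 1 / A(z); that step is not shown here.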

Page 7

K & W classification

Uses a Gaussian mixture model (GMM) to capture the behavior of a class

Parameters of the Gaussians determined by Expectation Maximization (EM)

Run PCA prior to EM (normalizes the data variance, good for EM)

SVMs compute the optimal hyperplane that linearly separates the classes
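To make the EM step concrete, here is a minimal one-dimensional GMM fit. It is only a sketch under simplifying assumptions: real feature vectors are multi-dimensional, the PCA step and the SVM alternative are not shown, and the initialisation scheme and function names are invented for the example.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, n_components=2, n_iter=100, min_var=1e-6):
    """Fit mixture weights, means and variances with EM."""
    lo, hi = min(data), max(data)
    grand_mean = sum(data) / len(data)
    total_var = sum((x - grand_mean) ** 2 for x in data) / len(data)
    # Assumed initialisation: means spread evenly over the data range.
    means = [lo + (hi - lo) * (k + 0.5) / n_components for k in range(n_components)]
    variances = [max(total_var, min_var)] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pk / s for pk in p])
        # M-step: re-estimate parameters from the responsibilities.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk <= 0:
                continue
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                min_var)
    return weights, means, variances


def log_likelihood(data, model):
    """Total log-likelihood of data under a fitted (weights, means, variances)."""
    weights, means, variances = model
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)) + 1e-300)
        for x in data)
```

With one model fitted per singer, an unknown excerpt would be assigned to the singer whose model gives the highest log-likelihood.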

Page 8

K & W results

Testbed contained more than 200 songs by 17 solo singers

Half for training, half for testing

Vocal segmentation inaccurate (~55%)

Experiments with GMM and SVM on complete songs and on vocal parts only

Overall results well short of human performance

Page 9

K & W Experimental results

Page 10

Liu and Huang overview

Singer classification of MP3 files

First segment audio into phonemes

Calculate a feature vector for each phoneme and store it with the associated singer for the training set

These feature vectors are used as discriminators for classification of unknown MP3 music objects

Page 11

L & H System Architecture

Page 12

L & H segmentation features

Phoneme segmentation is derived from polyphase filter coefficients by obtaining a frame energy measurement

Page 13

L & H phoneme database

Phonemes are separated by a minimum in frame energy (FE)
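The segmentation rule can be sketched as follows. Liu and Huang obtain frame energy from the MP3 polyphase filter coefficients; as a stand-in, this example computes it from time-domain samples, and the function names and the exact local-minimum test are assumptions.

```python
def frame_energies(samples, frame_size):
    """Energy of consecutive non-overlapping frames (time-domain stand-in)."""
    return [sum(s * s for s in samples[i:i + frame_size])
            for i in range(0, len(samples) - frame_size + 1, frame_size)]


def segment_at_energy_minima(energies):
    """Split a frame-energy sequence into segments at its local minima.

    Returns (start, end) frame-index pairs, end exclusive; each pair is one
    candidate phoneme.
    """
    boundaries = [i for i in range(1, len(energies) - 1)
                  if energies[i] < energies[i - 1] and energies[i] <= energies[i + 1]]
    segments = []
    start = 0
    for b in boundaries:
        segments.append((start, b))
        start = b
    segments.append((start, len(energies)))
    return segments
```

In practice the energy curve would likely need smoothing first, so that small fluctuations do not each produce a boundary.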

Page 14

L & H phoneme features

The phoneme features are obtained directly from the MDCT coefficients

Page 15

L & H classification (1)

Compares phoneme features with those in the phoneme database

Discriminating radius (a Euclidean distance) determines the uniqueness of a phoneme

Number of neighbors by the same singer within the discriminating radius is called the frequency (w)
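The frequency w can be computed as in the sketch below. The slide does not say how the discriminating radius is chosen; this example assumes it is the distance to the nearest phoneme sung by a different singer, which is only one plausible reading. The database layout and function names are likewise hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def discriminating_radius(index, database):
    """ASSUMPTION: distance to the nearest phoneme by a different singer.

    database is a list of (feature_vector, singer) pairs.
    """
    vec, singer = database[index]
    others = [euclidean(vec, v) for v, s in database if s != singer]
    return min(others) if others else float("inf")


def frequency(index, database):
    """Count same-singer phonemes lying inside the discriminating radius."""
    vec, singer = database[index]
    radius = discriminating_radius(index, database)
    return sum(1 for j, (v, s) in enumerate(database)
               if j != index and s == singer and euclidean(vec, v) < radius)
```

Under this reading, a phoneme with a large w sits in a region dominated by one singer and is therefore a stronger cue for that singer.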

Page 16

L & H classification (2)

kNN classifier used to guess the artist in unknown MP3 songs

For efficiency, only uses the first N phonemes in the unknown MP3

Find the k closest neighbors in the database and allow them to vote if the distance is within a threshold

For each neighbor, give a weighted vote dependent on frequency and distance

where w is frequency and
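A thresholded, weighted kNN vote of this kind can be sketched as below. The vote formula is not recoverable from the transcript, so the weight w / (1 + distance) used here is a placeholder assumption, not Liu and Huang's formula; the database layout, parameter names and default values are also hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(unknown_phonemes, database, k=3, threshold=2.0, n_first=None):
    """Guess the singer of an unknown song from its phoneme feature vectors.

    database: list of (feature_vector, singer, w) with w the phoneme frequency.
    n_first:  use only the first N phonemes of the unknown song (None = all).
    """
    votes = defaultdict(float)
    for query in unknown_phonemes[:n_first]:
        ranked = sorted(
            ((euclidean(query, vec), singer, w) for vec, singer, w in database),
            key=lambda item: item[0])
        for dist, singer, w in ranked[:k]:
            if dist <= threshold:
                # ASSUMED weighting: grows with frequency, shrinks with distance.
                votes[singer] += w / (1.0 + dist)
    return max(votes, key=votes.get) if votes else None
```

Returning None when no neighbor passes the threshold lets the caller treat the song as belonging to a singer outside the database.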

Page 17

L & H results

3 influencing factors

Number of neighbors (N)

Threshold for vote decision

Number of singers in database

Page 18

Other works…

Minnowmatch: MIR engine including artist classification using NN and SVM (Whitman, Flake, Lawrence (NEC))

Quest for ground truth in musical artist similarity: determining an accurate measure of similarity given the subjective nature of artist classification (Ellis, Whitman, Berenzweig, Lawrence)