Singer similarity / identification Francois Thibault MUMT 614B McGill University.
Singer similarity / identification
Francois Thibault
MUMT 614B
McGill University
Introduction
Relatively easy for humans to identify a singing voice in various contexts
Difficult to find time/environment-invariant features for robust automatic identification
Growing demand for such systems as network databases keep expanding
Background (1)
Significant research in speaker identification, but these systems perform poorly on singing voice (inadequate training)
Singer identification research can draw much from automatic instrument recognition systems
Artist / singer identification much harder than song identification (due to the need for context-invariant features)
Background (2)
Often builds on speech / music discrimination systems
Acoustical features heavily used to create an N-dimensional Euclidean space: loudness, pitch, brightness, bandwidth, harmonicity
Often uses the same tools as style identification because each singer corresponds to a ‘micro’ style
Kim and Whitman overview
Segmentation of vocal regions prior to singer identification algorithm
Assumes singing regions display strong harmonic energy in the voice frequency range
Band-pass filter (200-2000 Hz)
Inverse comb filter bank to detect harmonicity
Identification classifier uses features based on LPC
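The harmonicity test can be illustrated with a minimal sketch. It assumes the signal has already been band-pass filtered, and it uses a single first-order inverse comb y[n] = x[n] - x[n - T] per candidate delay T; the function names and the scoring rule are illustrative choices, not taken from Kim and Whitman. A delay that matches the pitch period cancels a harmonic signal almost completely, so a low residual energy suggests a vocal (harmonic) frame.

```python
import math

def inverse_comb_ratio(x, delay):
    """Energy of y[n] = x[n] - x[n - delay] relative to the energy of x."""
    num = sum((x[n] - x[n - delay]) ** 2 for n in range(delay, len(x)))
    den = sum(x[n] ** 2 for n in range(delay, len(x)))
    return num / den if den > 0 else float("inf")

def harmonicity(x, min_delay, max_delay):
    """Return (best_delay, score); a score near 1 means strongly periodic.

    The score 1 - min(ratio, 1) is an assumed, illustrative measure.
    """
    best_delay, best_ratio = min(
        ((d, inverse_comb_ratio(x, d)) for d in range(min_delay, max_delay + 1)),
        key=lambda pair: pair[1],
    )
    return best_delay, 1.0 - min(best_ratio, 1.0)
```

At an assumed sample rate of 8000 Hz, the 200-2000 Hz band corresponds to delays of 40 down to 4 samples, so `harmonicity(frame, 4, 40)` would scan that range.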
K & W feature extraction
Determine formant location and amplitude with a 12-pole linear predictor using the autocorrelation method
Increases low-frequency resolution without increasing model order by warping the frequency representation with a function approximating the Bark scale
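As a rough illustration of the autocorrelation method, the sketch below computes linear-prediction coefficients with the Levinson-Durbin recursion. It covers only the plain (unwarped) predictor; the Bark-scale frequency warping is not implemented, and the function names are hypothetical. Formant estimates would then come from the peaks of 1/|A(z)|.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a frame x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(x, order=12):
    """Return (a, err) with a[0] = 1, where A(z) = sum_k a[k] z^-k.

    Solved with the Levinson-Durbin recursion on the autocorrelation
    sequence; err is the final prediction-error energy.
    """
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # silent or degenerate frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

With this sign convention the predicted sample is -(a[1] x[n-1] + ... + a[p] x[n-p]), so a decaying exponential x[n] = 0.9^n yields a[1] close to -0.9.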
K & W classification
Uses a Gaussian mixture model (GMM) to capture the behavior of a class
Parameters of the Gaussians determined by Expectation Maximization (EM)
Run PCA prior to EM (normalizes the data variance, good for EM)
SVMs compute the optimal hyperplane that linearly separates the classes
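The EM fitting step can be sketched for the simplest case. This is a one-dimensional mixture, whereas the features in the slides are multi-dimensional and PCA-normalized; the function names, the fixed iteration count and the variance floor are assumptions made for illustration. For classification, one such model would be fitted per singer and an unknown excerpt assigned to the singer whose model gives the highest log-likelihood.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, means, variances, weights, iterations=50, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with EM, starting from the given parameters."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate parameters from the weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue  # empty component: leave it unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)
            weights[j] = nj / len(data)
    return means, variances, weights

def log_likelihood(data, means, variances, weights):
    """Total log-likelihood of data under the mixture."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for m, v, w in zip(means, variances, weights)) + 1e-300)
        for x in data
    )
```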
K & W results
Testbed contained more than 200 songs by 17 solo singers
Half for training, half for testing
Vocal segmentation inaccurate (~55%)
Experiments with GMM and SVM on complete songs and on vocal parts only
Overall results well short of human performance
K & W Experimental results
Liu and Huang overview
Singer classification of MP3 files
First segment the audio into phonemes
Calculate a feature vector for each phoneme and store it with the associated singer for the training set
These feature vectors are used as discriminators for classifying unknown MP3 music objects
L & H System Architecture
L & H segmentation features
Phoneme segmentation is derived from polyphase filter coefficients by obtaining a frame energy (FE) measurement
L & H phoneme database
Phonemes are separated by a minimum in FE
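Splitting at energy minima can be sketched as follows. The slides derive frame energy from the MP3 polyphase filter coefficients; this sketch substitutes plain time-domain samples as a stand-in, and the function names and the strict-then-loose minimum test are illustrative assumptions.

```python
def frame_energies(samples, frame_size):
    """Sum of squared samples per non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_size])
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

def segment_at_minima(energies):
    """Split the frame sequence at local energy minima.

    Returns (start, end) frame-index pairs, end exclusive; each pair is
    one candidate phoneme segment.
    """
    boundaries = [i for i in range(1, len(energies) - 1)
                  if energies[i] < energies[i - 1] and energies[i] <= energies[i + 1]]
    edges = [0] + boundaries + [len(energies)]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]
```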
L & H phoneme features
The phoneme features are obtained directly from the MDCT coefficients
L & H classification (1)
Compares phoneme features with those in the phoneme database
Discriminating radius (Euclidean distance) determines the uniqueness of a phoneme
Number of neighbors by the same singer within the discriminating radius is called the frequency (w)
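A small sketch of these two quantities follows. The slides do not define the discriminating radius precisely; the code assumes it is the Euclidean distance from a phoneme to the nearest database phoneme sung by a different singer, and it excludes the phoneme itself from its own frequency count. Both choices, and the function names, are assumptions.

```python
import math

def discriminating_radius(index, features, singers):
    """ASSUMPTION: distance to the nearest phoneme of a different singer."""
    return min(
        (math.dist(features[index], f)
         for f, s in zip(features, singers) if s != singers[index]),
        default=float("inf"),
    )

def frequency(index, features, singers):
    """Count same-singer phonemes strictly inside the discriminating radius."""
    radius = discriminating_radius(index, features, singers)
    return sum(
        1
        for i, (f, s) in enumerate(zip(features, singers))
        if i != index
        and s == singers[index]
        and math.dist(features[index], f) < radius
    )
```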
L & H classification (2)
kNN classifier used to guess the artist of unknown MP3 songs
For efficiency, only the first N phonemes of the unknown MP3 are used
Find the k closest neighbors in the database and allow each to vote if its distance is within a threshold
Each neighbor gives a weighted vote that depends on its frequency and distance
where w is frequency and
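The voting scheme can be sketched as below. The weighting formula is not recoverable from this transcript, so the weight w / (1 + d) used here is only a placeholder that grows with frequency and shrinks with distance; it is not Liu and Huang's formula. The database layout and parameter names are likewise hypothetical.

```python
import math
from collections import defaultdict

def classify(query_phonemes, database, k=3, threshold=1.0, first_n=None):
    """Guess the singer of an unknown song from its phoneme features.

    database: list of (feature_vector, singer, frequency) tuples.
    first_n:  use only the first N query phonemes (None = all).
    Returns the singer with the largest vote total, or None if no
    neighbor was close enough to vote.
    """
    votes = defaultdict(float)
    for q in query_phonemes[:first_n]:
        neighbors = sorted(
            ((math.dist(q, f), s, w) for f, s, w in database),
            key=lambda t: t[0],
        )[:k]
        for d, singer, w in neighbors:
            if d <= threshold:
                # PLACEHOLDER weight, assumed for illustration only.
                votes[singer] += w / (1.0 + d)
    return max(votes, key=votes.get) if votes else None
```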
L & H results
3 influencing factors
Number of neighbors (N)
Threshold for vote decision
Number of singers in database
Other works…
Minnowmatch: MIR engine including artist classification using NN and SVM (Whitman, Flake, Lawrence (NEC))
Quest for ground truth in musical artist similarity: determine an accurate measure of similarity given the subjective nature of artist classification (Ellis, Whitman, Berenzweig, Lawrence)