STATISTICAL TOOLS FOR AUDIO PROCESSING
![Page 1: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/1.jpg)
STATISTICAL TOOLS FOR AUDIO PROCESSING
Signal Image (Ecn)
Mathieu Lagrange
Some material taken from Dan Ellis courses
![Page 2: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/2.jpg)
Machine Learning
• Machine Learning deals with sub-problems in engineering and the sciences rather than the global “intelligence” issue!
• Applied
• A set of well-defined approaches, each within its limits, that can be applied to a problem set
• Classification / Pattern Recognition / Sequential Reasoning / Induction / Parameter Estimation, etc.
![Page 3: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/3.jpg)
Machine Learning
• Provides tools and reasoning for the design process of a given problem
• Is an empirical science
• Has a profound theoretical background
• Is extremely diverse
• Keep in mind that:
  • Algorithms SHALL NOT be applied blindly to your data/problem set!
  • The MATLAB Toolbox syndrome: examine the hypotheses and limitations of each approach before hitting enter!
  • Do not forget your own intelligence!
![Page 4: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/4.jpg)
Sample Example (I)
• Communication theory:
• Question: What should an optimal decoder do to recover Y from X?
• X is usually referred to as the observation and is a random variable.
• In most problems, the real state of the world (Y) is not observable to us, so we try to infer it from the observation.
![Page 5: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/5.jpg)
Sample Example (I)
• This is a typical classification problem
• Intuitive solution:
  • Threshold at 0.5
• But let’s make life more difficult!
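The threshold rule can be sketched in a few lines. The channel model below (bits Y in {0, 1} observed through additive Gaussian noise with standard deviation 0.2) is an assumption made for illustration, since the slide figure was not recovered; `threshold_decoder` is a hypothetical helper name.

```python
import numpy as np

def threshold_decoder(x, threshold=0.5):
    """Decide y_hat = 1 when the observation exceeds the threshold, else 0."""
    return (np.asarray(x) > threshold).astype(int)

# Assumed setup: random bits observed through additive Gaussian noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)            # hidden state of the world
x = y + 0.2 * rng.standard_normal(y.shape)   # noisy observation
y_hat = threshold_decoder(x)
error_rate = np.mean(y_hat != y)
print(f"error rate: {error_rate:.3f}")
```

With more noise the two classes overlap and a fixed threshold starts to make errors, which is where the next slides pick up.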
![Page 6: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/6.jpg)
![Page 7: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/7.jpg)
Sample Example (I)
• Simple Solution 2:
  • Try to find an optimal boundary (defined as g(x)) that can best separate the two classes.
  • Define the decision function as the + or - distance from this boundary.
  • I am thus assuming a family of functions g(x) that can discriminate the classes of X.
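As an illustration of a signed-distance decision function, here is a minimal sketch that assumes g(x) is linear, g(x) = (w·x + b) / ||w||. The slide does not specify the family of g, and the weights below are hypothetical values rather than a fitted boundary.

```python
import numpy as np

def g(x, w, b):
    """Signed distance from each row of x to the hyperplane w.x + b = 0."""
    w = np.asarray(w, dtype=float)
    return (np.asarray(x, dtype=float) @ w + b) / np.linalg.norm(w)

def classify(x, w, b):
    """Return +1 on the positive side of the boundary, -1 otherwise."""
    return np.where(g(x, w, b) >= 0, 1, -1)

w, b = np.array([1.0, 1.0]), -1.0   # hypothetical boundary x1 + x2 = 1
points = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(g(points, w, b))
print(classify(points, w, b))
```

Choosing the family of g (linear here) is itself a modelling assumption, which is the point the last bullet makes.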
![Page 8: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/8.jpg)
![Page 9: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/9.jpg)
Sample Example (I)
• In the real world things are not as simple
  • Consider the following 2-dimensional problem
  • Not hard to see the problem!
![Page 12: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/12.jpg)
Sample Example (I)
• In the real world things are not as simple
  • Consider the following 2-dimensional problem
1. To what extent does our solution generalize to new data?
  • The central aim of designing a classifier is to correctly classify novel input!
2. How do we know when we have collected an adequately large and representative set of examples for training?
3. How do we trade off model complexity against performance?
![Page 13: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/13.jpg)
Sample Example (II)
• This is a typical regression problem
• Polynomial Curve Fitting
![Page 14: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/14.jpg)
Sample Example (II)
• Polynomial Curve Fitting
  • Sum-of-squares Error Function
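The slide equations were not recovered. In the usual textbook notation (for instance Bishop, listed in the Resources), an order-M polynomial and its sum-of-squares error over N training pairs would read as follows. The symbols w, x_n, t_n, M and N are assumed here and not taken from the slide.

$$
y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j,
\qquad
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl( y(x_n, \mathbf{w}) - t_n \bigr)^2
$$

Fitting the curve then means choosing the coefficients w that minimize E(w).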
![Page 15: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/15.jpg)
Sample Example (II)
• Polynomial Curve Fitting
  • Figures: fits with 0th, 1st, 3rd and 9th order polynomials
![Page 19: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/19.jpg)
Sample Example (II)
• Polynomial Curve Fitting
  • Over-fitting
  • Root-Mean-Square (RMS) Error:
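Over-fitting can be reproduced with a short experiment. The sketch below assumes noisy samples of sin(2πx) as data (a common textbook choice, not confirmed by the slide) and measures the RMS error, taken here as the square root of the mean squared residual, on 10 training points and on 100 separate test points.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Assumed data: noisy samples of sin(2*pi*x) on [0, 1]."""
    x = np.linspace(0.0, 1.0, n)
    t = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)
    return x, t

def rms_error(w, x, t):
    """Root-mean-square error of polynomial coefficients w on (x, t)."""
    return np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))

x_train, t_train = make_data(10)
x_test, t_test = make_data(100)

results = {}
for order in (0, 1, 3, 9):
    w = np.polyfit(x_train, t_train, order)   # least-squares polynomial fit
    results[order] = (rms_error(w, x_train, t_train),
                      rms_error(w, x_test, t_test))
    print(order, results[order])
```

The training error can only go down as the order grows, and the 9th order polynomial passes through all 10 training points. The test error is the number that tells you whether the fit generalizes.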
![Page 20: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/20.jpg)
Sample Example (II)
• Polynomial Curve Fitting
  • Over-fitting and regularization
  • Effect of data set size (9th order polynomial)
![Page 21: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/21.jpg)
Sample Example (II)
• Polynomial Curve Fitting
  • Regularization
  • Penalize large coefficient values
  • 9th order polynomial with
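One common way to penalize large coefficients is a quadratic (ridge) penalty added to the sum-of-squares error. The sketch below assumes that form, penalizes all coefficients including the constant term for simplicity, and uses two hypothetical penalty weights; the value used on the slide was not recovered.

```python
import numpy as np

def fit_ridge_poly(x, t, order, lam):
    """Minimise 0.5*sum((Phi w - t)^2) + 0.5*lam*||w||^2 in closed form."""
    Phi = np.vander(x, order + 1, increasing=True)   # columns 1, x, ..., x^order
    A = Phi.T @ Phi + lam * np.eye(order + 1)
    return np.linalg.solve(A, Phi.T @ t)

# Assumed data: 10 noisy samples of sin(2*pi*x).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(x.size)

w_small = fit_ridge_poly(x, t, order=9, lam=1e-8)   # weak penalty
w_large = fit_ridge_poly(x, t, order=9, lam=1e-2)   # strong penalty
print(np.linalg.norm(w_small), np.linalg.norm(w_large))
```

A larger penalty shrinks the coefficient vector, which damps the wild oscillations of the unregularized 9th order fit at the cost of a slightly worse training error.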
![Page 22: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/22.jpg)
TRAIN MACHINES
• Interaction between
  • The machine
  • The designer
![Page 23: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/23.jpg)
The machine
• Pattern recognition in action:
• Examples:
  • Speaker Detection
  • Music genre classification
  • Many more
![Page 24: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/24.jpg)
The designer
• Pattern recognition design cycle:
• Examples:
  • Speaker Detection
  • Music genre classification
  • Many more
![Page 25: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/25.jpg)
Feature Extraction
• Right features are critical
  • Invariance under irrelevant modifications
• Theoretically equivalent features may act very differently in a particular classifier
  • Representations make important aspects explicit
  • Remove irrelevant information
• Feature design incorporates 'domain knowledge'
  • although more data -> less need for 'cleverness'
• Smaller 'feature space' (fewer dimensions)
  • Simpler models (fewer parameters)
  • Less training data needed
  • Faster training
![Page 26: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/26.jpg)
The right features for audio?
• Completely depends on the task at hand
  • Speaker recognition
  • Musical genre detection
• Most common perceptual dimensions
  • Loudness (Amplitude)
  • Pitch (Frequency)
  • Timbre (Spectral Envelope)
![Page 27: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/27.jpg)
What is important for humans?
![Page 28: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/28.jpg)
Frequency Decomposition
• A great idea that can be implemented in various ways:
  • Mechanically
  • In analog form
  • Numerically (fortunately)
![Page 29: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/29.jpg)
Discrete Fourier Transform (DFT)
![Page 30: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/30.jpg)
Short Time Fourier Transform (STFT)
• Want to localize energy in time and frequency
  • Break sound into short-time pieces
  • Calculate the DFT of each one
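The two steps can be sketched with NumPy as follows. The frame length, hop size, Hann window and 8 kHz sampling rate are illustrative assumptions, and the `stft` helper requires a signal at least one frame long.

```python
import numpy as np

def stft(signal, frame_length=512, hop_length=256):
    """Return a (num_frames, frame_length // 2 + 1) array of complex DFT frames."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_length)
    num_frames = 1 + (len(signal) - frame_length) // hop_length
    # Break the sound into overlapping short-time pieces and window each one.
    frames = np.stack([
        signal[i * hop_length : i * hop_length + frame_length] * window
        for i in range(num_frames)
    ])
    # Calculate the DFT of each piece (real-input FFT keeps non-negative bins).
    return np.fft.rfft(frames, axis=1)

sample_rate = 8000                          # assumed sampling rate in Hz
t = np.arange(sample_rate) / sample_rate    # one second of audio
tone = np.sin(2 * np.pi * 1000 * t)         # 1 kHz test tone
spectrogram = np.abs(stft(tone)) ** 2       # power in each time-frequency cell
peak_bin = spectrogram.mean(axis=0).argmax()
print(spectrogram.shape, peak_bin * sample_rate / 512)
```

Each row of `spectrogram` is one time frame and each column one frequency bin, so the energy of the test tone ends up in the bin closest to 1 kHz.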
![Page 31: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/31.jpg)
The Spectrogram
![Page 32: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/32.jpg)
Focus on the spectral envelope
![Page 33: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/33.jpg)
MFCCs?
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform (DCT) of the list of mel log powers.
5. The MFCCs are the amplitudes of the resulting spectrum.
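The five steps above can be sketched for a single frame with NumPy only. The mel formula used (2595·log10(1 + f/700)), the 26 filters, the 13 coefficients and the unnormalized DCT-II are common conventions assumed here, not choices stated on the slide. Real toolboxes differ in these details.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular overlapping filters, evenly spaced on the mel scale."""
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2), num_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    freqs = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    bank = np.zeros((num_filters, freqs.size))
    for i in range(num_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(frame, sample_rate, num_filters=26, num_coeffs=13):
    n_fft = len(frame)
    # 1. Fourier transform of a windowed excerpt.
    power = np.abs(np.fft.rfft(frame * np.hanning(n_fft))) ** 2
    # 2. Map the powers onto the mel scale with triangular windows.
    mel_power = mel_filterbank(num_filters, n_fft, sample_rate) @ power
    # 3. Log of the power in each mel band (small offset avoids log(0)).
    log_mel = np.log(mel_power + 1e-10)
    # 4. DCT-II of the log powers, written as an explicit cosine basis.
    n = np.arange(num_filters)
    basis = np.cos(np.pi * np.outer(np.arange(num_coeffs), n + 0.5) / num_filters)
    # 5. The MFCCs are the amplitudes of the resulting spectrum.
    return basis @ log_mel

frame = np.random.default_rng(0).standard_normal(512)   # stand-in for audio
coeffs = mfcc(frame, sample_rate=16000)
print(coeffs.shape)
```

Keeping only the first few DCT coefficients retains the smooth spectral envelope and discards fine detail such as pitch harmonics.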
![Page 34: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/34.jpg)
MFCCs Rules ?
![Page 35: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/35.jpg)
Example • Audio
![Page 36: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/36.jpg)
Potentials of the DCT step
• Pols observed that the main (principal) components capture most of the variance using a few smooth basis functions, smoothing away the pitch ripples
• Principal components of vowel spectra on a warped frequency scale aren't so far from the cosine basis functions
• Decorrelates the features
  • This is important because the MFCCs are in most cases modelled by Gaussians with diagonal covariance matrices
![Page 37: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/37.jpg)
Classification
• Given some data x and some classes Ci, the optimal classifier is
• Can model the class posterior directly
  • Nearest neighbor, SVMs, AdaBoost, neural nets
  • Leads to a discriminative model
• Can consider data likelihood
  • Thanks to the Bayes’ rule
  • Leads to a generative model
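The expression for the optimal classifier was not recovered from the slide. In the standard minimum-error setting it is the rule that picks the most probable class given the data, which in assumed notation would read:

$$
\hat{C} = \arg\max_{C_i} \; p(C_i \mid x)
$$

A discriminative model estimates the posterior p(C_i | x) directly, whereas a generative model reaches it through the likelihood p(x | C_i).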
![Page 38: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/38.jpg)
Basics on random variables
• Random variables have joint distributions p(x, y)
• Marginal distribution of y is
• Knowing one value in a joint distribution constrains the remainder
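The slide formulas were lost in extraction. For continuous variables, the standard definitions of the marginal and of the conditional distribution (the "constraint" that knowing y puts on x) would be, in assumed notation:

$$
p(y) = \int p(x, y) \, dx,
\qquad
p(x \mid y) = \frac{p(x, y)}{p(y)}
$$

For discrete variables the integral becomes a sum over the values of x.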
![Page 39: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/39.jpg)
Bayes Rule
• Bayes is powerful
• For generative models, it boils down to
Gaussian models
• Easiest way to model distributions is via a parametric model
  • Assume known form, estimate a few parameters
• Gaussian model is simple and useful:
• Parameters to fit:
  • Mean
  • Variance
![Page 41: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/41.jpg)
In d dimensions
• Described by
  • A d-dimensional mean
  • A d x d covariance matrix
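For reference, the standard Gaussian densities in one and in d dimensions are given below, with mean μ and variance σ² in the scalar case, and mean vector μ and covariance matrix Σ in the multivariate case. The symbol names are the usual ones and are assumed here, since the slide formulas were not recovered.

$$
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
$$

$$
p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} \, |\Sigma|^{1/2}} \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)
$$

A diagonal Σ treats the dimensions as independent, which is why decorrelated features such as MFCCs pair well with diagonal-covariance Gaussians.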
![Page 42: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/42.jpg)
Gaussian mixture models
• Single Gaussians cannot model
  • distributions with multiple modes
  • distributions with nonlinear correlations
• What about a weighted sum?
• Can fit anything given enough components
• Interpretation: each observation is generated by one of the Gaussians, chosen with probability
![Page 43: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/43.jpg)
Gaussian mixtures
• Can approximate nonlinear correlation
• Problem: estimate the parameters of the model
  • Easy if we knew which Gaussian generated each x
![Page 44: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/44.jpg)
Expectation-maximisation (EM)
• General procedure for estimating model parameters when some quantities are unknown (hidden)
  • e.g. which GMM component generated a point
• Iteratively update model parameters to maximize Q, the expected log-probability of observed data x and hidden data z
• E step: calculate using
• M step: find model that maximizes Q using
• Can prove that the likelihood is non-decreasing
  • hence maximum likelihood model
  • local optimum -> depends on initialization
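The symbols in the E and M steps were lost in extraction. In the standard formulation, with θ denoting the model parameters and θ_old their current estimate (assumed notation), the quantity Q is:

$$
Q(\theta, \theta_{\text{old}}) = \sum_{z} p(z \mid x, \theta_{\text{old}}) \, \log p(x, z \mid \theta)
$$

Under that reading, the E step computes the posterior p(z | x, θ_old) of the hidden data using the current parameters, and the M step finds the θ that maximizes Q using that posterior.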
![Page 45: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/45.jpg)
Fitting GMMs with EM
• Want to find
  • The parameters of the Gaussians
  • Weights / priors on Gaussians
  • That maximize the likelihood of training data x
• If one could assign each training sample x to a particular Gaussian, the estimation would be trivial (model fitting)
• Hence, we treat mixture indices, z, as hidden
  • Want to optimize Q of the form
  • Differentiate with respect to model parameters
  • Leads to update equations that are:
![Page 46: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/46.jpg)
Update equations
• Parameters that maximize Q
• Each involves a « soft assignment » of x_n to Gaussian k
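The update equations themselves were not recovered from the slide. The sketch below implements the standard EM updates for a one-dimensional GMM, where the "soft assignment" is the posterior probability that Gaussian k generated x_n. The synthetic data, the initialization and the `em_gmm_1d` helper are assumptions made for illustration.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_gmm_1d(x, means, variances, weights, num_iters=50):
    """Fit a 1-D Gaussian mixture with EM, starting from the given parameters."""
    x = np.asarray(x, dtype=float)
    means = np.array(means, dtype=float)
    variances = np.array(variances, dtype=float)
    weights = np.array(weights, dtype=float)
    log_likelihoods = []
    for _ in range(num_iters):
        # E step: soft assignment of each x_n to each Gaussian k.
        joint = weights * gaussian_pdf(x[:, None], means, variances)   # (N, K)
        log_likelihoods.append(np.log(joint.sum(axis=1)).sum())
        resp = joint / joint.sum(axis=1, keepdims=True)
        # M step: re-estimate weights, means and variances from soft counts.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return means, variances, weights, log_likelihoods

# Assumed data: two well-separated clusters around -5 and +5.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-5.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])
means, variances, weights, ll = em_gmm_1d(
    data, means=[data.min(), data.max()], variances=[1.0, 1.0], weights=[0.5, 0.5])
print(means, variances, weights)
```

The recorded log-likelihood should never decrease from one iteration to the next. The solution found is a local optimum, so a different initialization can lead to a different mixture.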
![Page 47: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/47.jpg)
E-M example
• Figures: start, then iterations 1 to 6 and iteration 20
(Fig. From A. Moore’s Tutorial)
![Page 55: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/55.jpg)
Density Estimation
(Fig. Wikipedia)
![Page 56: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/56.jpg)
What about K-means then?
• A special case of EM for GMMs, where
  • The membership assignment is thresholded
  • The Gaussians are fully described by their means
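A minimal K-means sketch makes the comparison with EM concrete: the soft assignment becomes a hard nearest-center decision, and only the means are updated. The synthetic 2-D data and the choice of two data points as initial centers are assumptions made for illustration.

```python
import numpy as np

def kmeans(x, centers, num_iters=20):
    """Plain K-means: hard assignment to the nearest center, then mean update."""
    x = np.asarray(x, dtype=float)
    centers = np.array(centers, dtype=float)
    for _ in range(num_iters):
        # Thresholded membership: each point belongs to its closest center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Each cluster is fully described by its mean.
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    return centers, labels

# Assumed data: two well-separated 2-D clusters.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(10.0, 1.0, (100, 2))])
centers, labels = kmeans(data, centers=data[[0, -1]])
print(centers)
```

Like EM, K-means only finds a local optimum, so the result depends on the initial centers.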
![Page 57: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/57.jpg)
K-means
![Page 60: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/60.jpg)
Now
• You have
  • Features for representing audio in a meaningful way
    • the MFCCs are able to compactly describe the spectral envelope
  • A tool to learn GMMs from training data (complete data for which you know the memberships)
  • Thanks to the Bayes’ theorem,
    • you know that given an observation, the model for which this observation has the maximum likelihood is the best one to consider.
• You can
  • Abstract recorded audio in a meaningful way
  • Learn models for each class
  • Given an unlabeled sample, decide which label is the most suitable
• So,
  • Have some rest and some food
  • See you this afternoon for some hands-on practice
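As a preview of the hands-on session, the "learn one model per class, then pick the most likely one" recipe can be sketched as follows. To stay short, each class is modelled by a single diagonal-covariance Gaussian instead of a GMM, class priors are assumed equal, and random vectors stand in for 13-dimensional MFCC frames. The class labels are hypothetical.

```python
import numpy as np

def fit_diag_gaussian(features):
    """Estimate per-dimension mean and variance from (num_frames, dim) features."""
    return features.mean(axis=0), features.var(axis=0) + 1e-6

def log_likelihood(features, mean, var):
    """Total log-likelihood of all frames under one diagonal Gaussian."""
    per_dim = -0.5 * (np.log(2 * np.pi * var) + (features - mean) ** 2 / var)
    return per_dim.sum()

def classify(features, models):
    """Return the label whose model gives the observation the highest likelihood."""
    scores = {label: log_likelihood(features, *params)
              for label, params in models.items()}
    return max(scores, key=scores.get)

# Stand-in training data: random vectors in place of real MFCC frames.
rng = np.random.default_rng(0)
train = {
    "speech": rng.normal(0.0, 1.0, (500, 13)),
    "music": rng.normal(2.0, 1.0, (500, 13)),
}
models = {label: fit_diag_gaussian(frames) for label, frames in train.items()}

unknown = rng.normal(2.0, 1.0, (50, 13))   # unlabeled sample to classify
print(classify(unknown, models))
```

Replacing `fit_diag_gaussian` with a GMM fitted by EM gives the full pipeline described in these slides.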
![Page 61: STATISTICAL TOOLS FOR AUDIO PROCESSING](https://reader033.fdocuments.us/reader033/viewer/2022052601/628cd52a79a7246fc776786b/html5/thumbnails/61.jpg)
Resources
• Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig, Prentice Hall.
• Pattern Classification, R. Duda, P. Hart, D. Stork, Wiley Interscience, 2000.
• Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006.
• Introduction to Machine Learning, Ethem Alpaydin, MIT Press, 2004.
• The Elements of Statistical Learning, T. Hastie, R. Tibshirani, J. Friedman, Springer Verlag, 2001. Also available online: http://www-stat.stanford.edu/~tibs/ElemStatLearn/