Hands On: Multimedia Methods for Large Scale Video Analysis (Lecture)
Dr. Gerald Friedland, [email protected]
Today
• More on Audio Features
• Recap: Some Basic Machine Learning
• Some Error Metrics
More on Features
• Mel-Frequency Cepstral Coefficients (MFCC)
Other (not explained here):
• LPC (Linear Prediction Coefficients)
• PLP (Perceptual Linear Predictive) features
• RASTA (see Morgan et al.)
• MSG (Modulation Spectrogram)
MFCC: Idea
MFCC is the power cepstrum of the signal, computed by the following pipeline:
Audio Signal → Pre-emphasis → Windowing → FFT → Mel-Scale Filterbank → Log-Scale → DCT → MFCC
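The lecture slides contain no code; the following is a minimal NumPy/SciPy sketch of the pipeline above. The parameter choices (25 ms window, 10 ms hop, 512-point FFT, 26 mel filters, 13 coefficients) are common defaults, not values from the lecture, and all names are illustrative.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, frame_ms=25, hop_ms=10,
         nfft=512, n_mels=26, n_ceps=13, preemph=0.97):
    # 1. Pre-emphasis: boost high frequencies
    sig = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # 2. Windowing: overlapping frames with a Hamming window
    #    (assumes the signal covers at least one frame)
    flen, hop = sr * frame_ms // 1000, sr * hop_ms // 1000
    n_frames = 1 + (len(sig) - flen) // hop
    idx = np.arange(flen)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(flen)
    # 3. FFT: power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # 4. Mel-scale filterbank: triangular filters, equally spaced in mel
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = mel_inv(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((nfft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, nfft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    # 5. Log-scale, 6. DCT: keep the first n_ceps cepstral coefficients
    feat = np.log(power @ fbank.T + 1e-10)
    return dct(feat, type=2, axis=1, norm='ortho')[:, :n_ceps]
```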
MFCC: Mel Scale
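The mel-scale figure is not reproduced in this transcript. A common definition of the mel scale, and presumably the one plotted on the slide, is:

```latex
m(f) = 2595 \,\log_{10}\!\left(1 + \frac{f}{700}\right), \qquad f \text{ in Hz}
```

It is roughly linear below 1 kHz and logarithmic above, mimicking human pitch perception.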
MFCC: Result
MFCC Variants and Derivatives
Derivatives:
• LFCC (no mel scale)
• AMFCC (anti mel scale)
Parameters:
• MFCC12 (often used for ASR)
• MFCC19 (often used in speaker ID, diarization)
• “delta”: frame-to-frame differences of the coefficients (“first derivative”)
• “deltadelta”: “second derivative”
• Short-term: usually calculated on a 10–50 ms window
Typical Machine Learning for Audio Analysis
Today:
• Gaussian Mixture Models
• Bayesian Information Criterion
Later:
• HMMs/FSAs
@home:
• Supervector Approaches
Recap: Architecture of Content Analysis Algorithms
The Data...
• ...should be plenty (“there is no data like more data”).
• Training set and test set must be different.
• The training set should consist of a representative sample for good results.
• If there is not enough data, significance must be tested.
Test/train data mismatches that will deteriorate accuracy
• Channel mismatch
• Domain mismatch
• Unseen test data
• Too many parameters in the training model (overfitting)
Types of Algorithms
• Classification/Identification
• Verification/Detection
• Estimation/Regression
Ground Truth
• Is never 100% accurate.
• Annotator agreement should be measured for high-accuracy tasks and low-confidence annotators.
Reminder: K-Means
Algorithm outline (Expectation Maximization):
Choose k initial means µ_i at random
loop
    for all samples x_j: assign x_j to its closest mean µ_i
    for all means µ_i: recalculate µ_i as the average of all x_j assigned to it
until the means µ_i are no longer updated significantly
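As a concrete reference, here is a minimal NumPy sketch of the outline above; the interface and the convergence tolerance are illustrative assumptions:

```python
import numpy as np

def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), k, replace=False)]   # k initial means at random
    for _ in range(n_iter):
        # Expectation: assign each sample to its closest mean
        d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Maximization: recompute each mean as the average of its members
        new = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                        else means[i] for i in range(k)])
        if np.linalg.norm(new - means) < tol:   # stop when means barely move
            break
        means = new
    return means, labels
```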
Reminder: Gaussian Mixtures
Reminder: Training of Mixture Models
Goal: Find the weights a_i (and means µ_i, covariances Σ_i) for

    p(x) = Σ_i a_i · N(x | µ_i, Σ_i)

Expectation: compute each component's responsibility for each sample,

    r_ij = a_i · N(x_j | µ_i, Σ_i) / Σ_k a_k · N(x_j | µ_k, Σ_k)

Maximization: re-estimate a_i, µ_i, Σ_i as responsibility-weighted averages over the samples.
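A minimal sketch of this EM loop for a diagonal-covariance mixture (the diagonal covariance is a simplifying assumption, common in audio work; all names are illustrative):

```python
import numpy as np

def gmm_em(X, k, n_iter=50, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = np.full(k, 1.0 / k)                       # mixture weights a_i
    mu = X[rng.choice(n, k, replace=False)]       # means mu_i
    var = np.tile(X.var(axis=0), (k, 1)) + 1e-6   # diagonal covariances
    for _ in range(n_iter):
        # E-step: log-responsibility of component i for sample x_j
        logp = (-0.5 * (((X[:, None, :] - mu) ** 2) / var).sum(-1)
                - 0.5 * np.log(2 * np.pi * var).sum(-1) + np.log(w))
        logp -= logp.max(axis=1, keepdims=True)   # for numeric stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances from responsibilities
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ (X ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var
```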
Magic Duo
~ 90% of audio papers use the combination of MFCCs and Gaussian Mixture Models to model audio signals!
Bayesian Information Criterion = “Acoustic Edge Detector”

    BIC(Θ) = log L(X | Θ) − λ · (K/2) · log N

where X is the sequence of features for a segment, Θ are the parameters of the statistical model for the segment, log L(X | Θ) is the log-likelihood of X under that model, K is the number of parameters of the model, N is the number of frames in the segment, and λ is an optimization parameter.
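A sketch of how BIC is typically turned into an “acoustic edge detector”: compare one Gaussian over a whole window against two Gaussians split at a hypothesized change point t, and keep the split if the likelihood gain beats the λ-weighted complexity penalty. This follows the standard ΔBIC segmentation recipe; the full-covariance Gaussian model and the function names are assumptions, not from the slides.

```python
import numpy as np

def gauss_loglik(X):
    """Log-likelihood of X under a single full-covariance Gaussian (ML fit)."""
    n, d = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(d)   # regularized covariance
    diff = X - mu
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (n * (d * np.log(2 * np.pi) + logdet)
                   + np.einsum('ij,jk,ik->', diff, inv, diff))

def delta_bic(X, t, lam=1.0):
    """Delta-BIC for a hypothesized change point t: positive favors a split."""
    n, d = X.shape
    K = d + d * (d + 1) / 2          # parameters of one full-covariance Gaussian
    split = gauss_loglik(X[:t]) + gauss_loglik(X[t:])
    joint = gauss_loglik(X)
    return split - joint - lam * 0.5 * K * np.log(n)
```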
Bayesian Information Criterion: Explanation
• BIC penalizes the complexity of the model (as measured by the number of parameters in the model).
• BIC measures the efficiency of the parameterized model in terms of predicting the data.
Bayesian Information Criterion: Properties
• BIC is a minimum description length criterion.
• BIC is independent of the prior.
• It is closely related to other penalized likelihood criteria such as RIC and the Akaike information criterion.
Some Error Metrics
• Classification error
• The types of errors
• ROC/DET curve
• Precision/Recall, F-Measure
• Word Error Rate
Classification Error

    error = wrong classifications / total classifications

• Usually expressed in %
• The simplest and most popular metric
Types of Errors
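The figure on this slide is not reproduced in the transcript; presumably it showed the standard 2×2 confusion matrix that the following metrics build on:

                      Actual positive        Actual negative
Predicted positive    True Positive (TP)     False Positive (FP)
Predicted negative    False Negative (FN)    True Negative (TN)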
ROC Curve
Receiver Operating Characteristic: True Positive Rate vs. False Positive Rate

    True Positive Rate (TPR) = TP / P = TP / (TP + FN)
    False Positive Rate (FPR) = FP / N = FP / (FP + TN)

• Invented in the 1940s (radar detection accuracy)
• Said to have become very popular after the Pearl Harbor incident
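A small NumPy sketch of how an ROC curve is traced by sweeping the detection threshold across classifier scores (the interface is illustrative):

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep a threshold over the scores and return (FPR, TPR) pairs."""
    order = np.argsort(-np.asarray(scores))      # descending score
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels == 1)                  # positives accepted so far
    fp = np.cumsum(labels == 0)                  # negatives accepted so far
    tpr = tp / max((labels == 1).sum(), 1)
    fpr = fp / max((labels == 0).sum(), 1)
    return fpr, tpr
```

Plotting tpr against fpr yields the ROC curve; the area under it (AUC) is a common single-number summary.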
DET Curve
• Detection Error Tradeoff: Miss (=FN) vs. False Alarm (=FP), non-linearly scaled
• Very useful for detection tasks (threshold tuning)
• Very popular in the retrieval community
• Equal Error Rate: the point where FN = FP
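The DET curve is the same trade-off drawn on normal-deviate (probit) axes, which tends to straighten the curves of well-behaved systems. A sketch of the axis warping, assuming SciPy and reusing the hypothetical roc_points above:

```python
import numpy as np
from scipy.stats import norm

def det_points(fpr, tpr):
    """Warp (false alarm, miss) rates onto normal-deviate axes for a DET plot."""
    fnr = 1.0 - np.asarray(tpr)                  # miss rate = FN / P
    eps = 1e-6                                   # keep ppf finite at 0 and 1
    x = norm.ppf(np.clip(fpr, eps, 1 - eps))     # warped false-alarm axis
    y = norm.ppf(np.clip(fnr, eps, 1 - eps))     # warped miss axis
    return x, y
```

The Equal Error Rate is read off where the curve crosses the diagonal (miss rate = false-alarm rate).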
Precision/Recall
• Precision = TP / (TP + FP): the fraction of detections that are correct
• Recall = TP / (TP + FN) = True Positive Rate: the fraction of positives that are found
• Became popular because of …
F-Measure
• Two numbers are hard to compare => F-Measure
• Harmonic mean of Precision and Recall:

    F = 2 · Precision · Recall / (Precision + Recall)

• Highly debated
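A minimal sketch computing precision, recall, and the F-measure from raw counts (the names and zero-division convention are illustrative):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F-measure) from counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # correctness of detections
    recall = tp / (tp + fn) if tp + fn else 0.0      # coverage of true positives
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean
    return precision, recall, f1

# e.g. 8 correct detections, 2 false alarms, 4 misses:
print(precision_recall_f1(8, 2, 4))   # -> (0.8, ~0.667, ~0.727)
```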
Word Error Rate
Metric for comparing speech recognizers:

    WER = (S + D + I) / N

where:
• S is the number of substitutions,
• D is the number of deletions,
• I is the number of insertions,
• N is the number of words in the reference.
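A sketch of WER computation via word-level Levenshtein alignment; dynamic programming finds the minimal number of substitutions, deletions, and insertions (the function name is illustrative):

```python
def wer(ref, hyp):
    """WER = (S + D + I) / N via Levenshtein alignment of word sequences."""
    ref, hyp = ref.split(), hyp.split()
    n, m = len(ref), len(hyp)
    # d[i][j] = minimal edits turning the first i reference words
    # into the first j hypothesis words
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # i deletions
    for j in range(m + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])    # match/substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # deletion, insertion
    return d[n][m] / max(n, 1)

print(wer("the cat sat", "the cat sat down"))   # 1 insertion / 3 words ≈ 0.33
```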
Next Week (Project Meeting)
• SEJITS
• Project Idea Sketches (from groups)
Next Week (Lecture)
• Visual Content Analysis