Hands On: Multimedia Methods for Large Scale Video Analysis (Lecture)
Dr. Gerald Friedland, [email protected]
Today
• More on Audio Features
• Recap: Some Basic Machine Learning
• Some Error Metrics
More on Features
• Mel-Frequency Cepstral Coefficients (MFCC)
Other (not explained here):
• LPC (Linear Prediction Coefficients)
• PLP (Perceptual Linear Predictive) features
• RASTA (see Morgan et al.)
• MSG (Modulation Spectrogram)
MFCC: Idea
MFCC is the power cepstrum of the signal, computed by the following pipeline:
Audio Signal → Pre-emphasis → Windowing → FFT → Mel-Scale Filterbank → Log-Scale → DCT → MFCC
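The lecture slides contain no code; the following is a minimal NumPy/SciPy sketch of the pipeline above. The parameter choices (25 ms window, 10 ms hop, 512-point FFT, 26 mel filters, 13 coefficients) are common defaults, not values from the lecture, and all names are illustrative.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, frame_ms=25, hop_ms=10,
         nfft=512, n_mels=26, n_ceps=13, preemph=0.97):
    # 1. Pre-emphasis: boost high frequencies
    sig = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # 2. Windowing: overlapping frames with a Hamming window
    #    (assumes the signal covers at least one frame)
    flen, hop = sr * frame_ms // 1000, sr * hop_ms // 1000
    n_frames = 1 + (len(sig) - flen) // hop
    idx = np.arange(flen)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(flen)
    # 3. FFT: power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # 4. Mel-scale filterbank: triangular filters, equally spaced in mel
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = mel_inv(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((nfft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, nfft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    # 5. Log-scale, 6. DCT: keep the first n_ceps cepstral coefficients
    feat = np.log(power @ fbank.T + 1e-10)
    return dct(feat, type=2, axis=1, norm='ortho')[:, :n_ceps]
```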
MFCC: Mel Scale
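The mel-scale figure is not reproduced in this transcript. A common definition of the mel scale, and presumably the one plotted on the slide, is:

```latex
m(f) = 2595 \,\log_{10}\!\left(1 + \frac{f}{700}\right), \qquad f \text{ in Hz}
```

It is roughly linear below 1 kHz and logarithmic above, mimicking human pitch perception.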
MFCC: Result
MFCC Variants and Derivatives
Derivatives:
• LFCC (no mel scale)
• AMFCC (anti mel scale)
Parameters:
• MFCC12 (often used for ASR)
• MFCC19 (often used in speaker ID, diarization)
• “delta”: frame-to-frame differences of the coefficients (“first derivative”)
• “deltadelta”: “second derivative”
• Short-term: usually calculated on a 10–50 ms window
Typical Machine Learning for Audio Analysis
Today:
• Gaussian Mixture Models
• Bayesian Information Criterion
Later:
• HMMs/FSAs
@home:
• Supervector Approaches
Recap: Architecture of Content Analysis Algorithms
The Data...
• ...should be plenty (“there is no data like more data”).
• Training set and test set must be different.
• The training set should consist of a representative sample for good results.
• If there is not enough data, significance must be tested.
Test/train data mismatches that will deteriorate accuracy
• Channel mismatch
• Domain mismatch
• Unseen test data
• Too many parameters in the training model (overfitting)
Types of Algorithms
• Classification/Identification
• Verification/Detection
• Estimation/Regression
Ground Truth
• Is never 100% accurate.
• Annotator agreement should be measured for high-accuracy tasks and low-confidence annotators.
Reminder: K-Means
Algorithm outline (Expectation Maximization):
Choose k initial means µ_i at random
loop
    for all samples x_j: assign x_j to its closest mean µ_i
    for all means µ_i: recalculate µ_i as the average of all x_j assigned to it
until the means µ_i are no longer updated significantly
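As a concrete reference, here is a minimal NumPy sketch of the outline above; the interface and the convergence tolerance are illustrative assumptions:

```python
import numpy as np

def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), k, replace=False)]   # k initial means at random
    for _ in range(n_iter):
        # Expectation: assign each sample to its closest mean
        d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Maximization: recompute each mean as the average of its members
        new = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                        else means[i] for i in range(k)])
        if np.linalg.norm(new - means) < tol:   # stop when means barely move
            break
        means = new
    return means, labels
```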
Reminder: Gaussian Mixtures
Reminder: Training of Mixture Models
Goal: Find the weights a_i (and means µ_i, covariances Σ_i) for

    p(x) = Σ_i a_i · N(x | µ_i, Σ_i)

Expectation: compute each component's responsibility for each sample,

    r_ij = a_i · N(x_j | µ_i, Σ_i) / Σ_k a_k · N(x_j | µ_k, Σ_k)

Maximization: re-estimate a_i, µ_i, Σ_i as responsibility-weighted averages over the samples.
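A minimal sketch of this EM loop for a diagonal-covariance mixture (the diagonal covariance is a simplifying assumption, common in audio work; all names are illustrative):

```python
import numpy as np

def gmm_em(X, k, n_iter=50, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = np.full(k, 1.0 / k)                       # mixture weights a_i
    mu = X[rng.choice(n, k, replace=False)]       # means mu_i
    var = np.tile(X.var(axis=0), (k, 1)) + 1e-6   # diagonal covariances
    for _ in range(n_iter):
        # E-step: log-responsibility of component i for sample x_j
        logp = (-0.5 * (((X[:, None, :] - mu) ** 2) / var).sum(-1)
                - 0.5 * np.log(2 * np.pi * var).sum(-1) + np.log(w))
        logp -= logp.max(axis=1, keepdims=True)   # for numeric stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances from responsibilities
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ (X ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var
```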
Magic Duo
~ 90% of audio papers use the combination of MFCCs and Gaussian Mixture Models to model audio signals!
Bayesian Information Criterion = “Acoustic Edge Detector”

    BIC(Θ) = log L(X | Θ) − λ · (K/2) · log N

where X is the sequence of features for a segment, Θ are the parameters of the statistical model for the segment, log L(X | Θ) is the log-likelihood of X under that model, K is the number of parameters of the model, N is the number of frames in the segment, and λ is an optimization parameter.
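A sketch of how BIC is typically turned into an “acoustic edge detector”: compare one Gaussian over a whole window against two Gaussians split at a hypothesized change point t, and keep the split if the likelihood gain beats the λ-weighted complexity penalty. This follows the standard ΔBIC segmentation recipe; the full-covariance Gaussian model and the function names are assumptions, not from the slides.

```python
import numpy as np

def gauss_loglik(X):
    """Log-likelihood of X under a single full-covariance Gaussian (ML fit)."""
    n, d = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(d)   # regularized covariance
    diff = X - mu
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (n * (d * np.log(2 * np.pi) + logdet)
                   + np.einsum('ij,jk,ik->', diff, inv, diff))

def delta_bic(X, t, lam=1.0):
    """Delta-BIC for a hypothesized change point t: positive favors a split."""
    n, d = X.shape
    K = d + d * (d + 1) / 2          # parameters of one full-covariance Gaussian
    split = gauss_loglik(X[:t]) + gauss_loglik(X[t:])
    joint = gauss_loglik(X)
    return split - joint - lam * 0.5 * K * np.log(n)
```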
Bayesian Information Criterion: Explanation
• BIC penalizes the complexity of the model (as measured by the number of parameters in the model).
• BIC measures the efficiency of the parameterized model in terms of predicting the data.
Bayesian Information Criterion: Properties
• BIC is a minimum description length criterion.
• BIC is independent of the prior.
• It is closely related to other penalized likelihood criteria such as RIC and the Akaike information criterion.
Some Error Metrics
• Classification error
• The types of errors
• ROC/DET curve
• Precision/Recall, F-Measure
• Word Error Rate
Classification Error

    error = wrong classifications / total classifications

• Usually expressed in %
• The simplest and most popular metric
Types of Errors
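The figure on this slide is not reproduced in the transcript; presumably it showed the standard 2×2 confusion matrix that the following metrics build on:

                      Actual positive        Actual negative
Predicted positive    True Positive (TP)     False Positive (FP)
Predicted negative    False Negative (FN)    True Negative (TN)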
ROC Curve
Receiver Operating Characteristic: True Positive Rate vs. False Positive Rate

    True Positive Rate (TPR) = TP / P = TP / (TP + FN)
    False Positive Rate (FPR) = FP / N = FP / (FP + TN)

• Invented in the 1940s (radar detection accuracy)
• Said to have become very popular after the Pearl Harbor incident
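A small NumPy sketch of how an ROC curve is traced by sweeping the detection threshold across classifier scores (the interface is illustrative):

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep a threshold over the scores and return (FPR, TPR) pairs."""
    order = np.argsort(-np.asarray(scores))      # descending score
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels == 1)                  # positives accepted so far
    fp = np.cumsum(labels == 0)                  # negatives accepted so far
    tpr = tp / max((labels == 1).sum(), 1)
    fpr = fp / max((labels == 0).sum(), 1)
    return fpr, tpr
```

Plotting tpr against fpr yields the ROC curve; the area under it (AUC) is a common single-number summary.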
DET Curve
• Detection Error Tradeoff: Miss (=FN) vs. False Alarm (=FP), non-linearly scaled
• Very useful for detection tasks (threshold tuning)
• Very popular in the retrieval community
• Equal Error Rate: the point where FN = FP
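The DET curve is the same trade-off drawn on normal-deviate (probit) axes, which tends to straighten the curves of well-behaved systems. A sketch of the axis warping, assuming SciPy and reusing the hypothetical roc_points above:

```python
import numpy as np
from scipy.stats import norm

def det_points(fpr, tpr):
    """Warp (false alarm, miss) rates onto normal-deviate axes for a DET plot."""
    fnr = 1.0 - np.asarray(tpr)                  # miss rate = FN / P
    eps = 1e-6                                   # keep ppf finite at 0 and 1
    x = norm.ppf(np.clip(fpr, eps, 1 - eps))     # warped false-alarm axis
    y = norm.ppf(np.clip(fnr, eps, 1 - eps))     # warped miss axis
    return x, y
```

The Equal Error Rate is read off where the curve crosses the diagonal (miss rate = false-alarm rate).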
Precision/Recall
• Precision = TP / (TP + FP): the fraction of detections that are correct
• Recall = TP / (TP + FN) = True Positive Rate: the fraction of positives that are found
• Became popular because of …
F-Measure
• Two numbers are hard to compare => F-Measure
• Harmonic mean of Precision and Recall:

    F = 2 · Precision · Recall / (Precision + Recall)

• Highly debated
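A minimal sketch computing precision, recall, and the F-measure from raw counts (the names and zero-division convention are illustrative):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F-measure) from counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # correctness of detections
    recall = tp / (tp + fn) if tp + fn else 0.0      # coverage of true positives
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean
    return precision, recall, f1

# e.g. 8 correct detections, 2 false alarms, 4 misses:
print(precision_recall_f1(8, 2, 4))   # -> (0.8, ~0.667, ~0.727)
```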
Word Error Rate
Metric for comparing speech recognizers:

    WER = (S + D + I) / N

where:
• S is the number of substitutions,
• D is the number of deletions,
• I is the number of insertions,
• N is the number of words in the reference.
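A sketch of WER computation via word-level Levenshtein alignment; dynamic programming finds the minimal number of substitutions, deletions, and insertions (the function name is illustrative):

```python
def wer(ref, hyp):
    """WER = (S + D + I) / N via Levenshtein alignment of word sequences."""
    ref, hyp = ref.split(), hyp.split()
    n, m = len(ref), len(hyp)
    # d[i][j] = minimal edits turning the first i reference words
    # into the first j hypothesis words
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # i deletions
    for j in range(m + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])    # match/substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # deletion, insertion
    return d[n][m] / max(n, 1)

print(wer("the cat sat", "the cat sat down"))   # 1 insertion / 3 words ≈ 0.33
```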
Next Week (Project Meeting)
• SEJITS
• Project Idea Sketches (from groups)
Next Week (Lecture)
• Visual Content Analysis