Hands On: Multimedia Methods for Large Scale Video ...fractor/fall2012/cs294-7-2012.pdf · 4 power...

Hands On: Multimedia Methods for Large Scale Video Analysis (Lecture)

Dr. Gerald Friedland

Today

• More on Audio Features
• Recap: Some Basic Machine Learning
• Some Error Metrics

More on Features

• Mel-Frequency Cepstral Coefficients (MFCC)

Other (not explained here):
• LPC (Linear Prediction Coefficients)
• PLP (Perceptual Linear Predictive) Features
• RASTA (see Morgan et al.)
• MSG (Modulation Spectrogram)

MFCC: Idea

Power cepstrum of the signal:

Audio Signal → Pre-emphasis → Windowing → FFT → Mel-Scale Filterbank → Log-Scale → DCT → MFCC
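The processing chain on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's implementation: the function names, the frame length (25 ms at 16 kHz), the hop size, the 26 filters, the pre-emphasis factor 0.97, and the particular mel formula are all assumptions chosen because they are common defaults.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel formula; other variants exist.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=12, pre_emph=0.97):
    # 1. Pre-emphasis: boost the high frequencies.
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2. Windowing: overlapping frames, each multiplied by a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3. FFT: power spectrum of each (zero-padded) frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 4. Mel-scale filterbank energies, 5. log scale (small offset avoids log(0)).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # 6. DCT-II of the log energies; keep coefficients 1..n_coeffs.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(n, 2 * n + 1) / (2 * n_filters))
    return (log_energies @ basis.T)[:, 1:n_coeffs + 1]

# Usage: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
features = mfcc(tone, sr)
print(features.shape)  # one row of 12 coefficients per frame
```

Dropping coefficient 0 and keeping the next 12 mirrors the "MFCC12" setting; production toolkits differ in normalization and liftering details.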

MFCC: Mel Scale

MFCC: Result

MFCC Variants and Derivatives

Derivatives:
• LFCC (no Mel scale)
• AMFCC (anti-Mel scale)

Parameters:
• MFCC12 (often used for ASR)
• MFCC19 (often used in speaker ID, diarization)
• “delta”: coefficients of neighboring frames subtracted (“first derivative”)
• “deltadelta”: “second derivative”
• Short term: usually calculated on a 10-50 ms window
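As a small illustration of the "delta" and "deltadelta" parameters, the sketch below takes plain frame-to-frame differences. The function name `delta` and the toy values are assumptions; many toolkits instead estimate the derivative by a regression over several neighboring frames.

```python
def delta(frames):
    """First-order difference between consecutive feature frames."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(frames, frames[1:])]

# Toy "MFCC" frames with two coefficients each (made-up values).
mfccs = [[1.0, 2.0], [1.5, 2.5], [3.0, 2.0]]
d = delta(mfccs)   # "delta": first derivative
dd = delta(d)      # "deltadelta": second derivative
print(d, dd)
```

Each differencing step shortens the sequence by one frame; real systems usually pad the edges so that all feature streams keep the same length.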

Typical Machine Learning for Audio Analysis

Today:
• Gaussian Mixture Models
• Bayesian Information Criterion

Later:
• HMMs/FSAs

@home:
• Supervector Approaches

Recap: Architecture of Content Analysis Algorithms

The Data...

• ...should be plenty (there is no data like more data).
• Training set and test set must be different.
• The training set should consist of a representative sample for good results.
• If there is not enough data, significance must be tested.

Test/train data mismatch that will deteriorate accuracy

• Channel mismatch
• Domain mismatch
• Unseen test data
• Too many parameters in training model (overfitting)

Types of Algorithms

• Classification/Identification
• Verification/Detection
• Estimation/Regression

Ground Truth

• Is never 100% accurate.
• Annotator agreement should be measured for high-accuracy tasks and for low-confidence annotators.

Reminder: K-Means

Algorithm Outline (Expectation Maximization)

Choose k initial means µi at random
loop
    for all samples xj:
        assign membership of each element to a mean (closest mean)
    for all means µi:
        calculate a new µi by averaging all values xj that were assigned members
until means µi are not updated significantly anymore
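The outline can be turned into a short runnable sketch. The function name `kmeans`, the squared Euclidean distance, the convergence tolerance, and the handling of empty clusters are assumptions made for illustration; samples are tuples of floats.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(samples, k, max_iter=100, tol=1e-9, seed=0):
    rng = random.Random(seed)
    means = rng.sample(samples, k)  # choose k initial means at random
    for _ in range(max_iter):
        # Assign each sample to its closest mean.
        clusters = [[] for _ in range(k)]
        for x in samples:
            nearest = min(range(k), key=lambda i: sq_dist(x, means[i]))
            clusters[nearest].append(x)
        # Recompute each mean as the average of its assigned members.
        new_means = []
        for i, members in enumerate(clusters):
            if members:
                new_means.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_means.append(means[i])  # empty cluster: keep the old mean
        shift = max(sq_dist(m, n) for m, n in zip(means, new_means))
        means = new_means
        if shift < tol:  # means are not updated significantly anymore
            break
    return means

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = sorted(kmeans(points, k=2))
print(centers)
```

On this well-separated toy data the two means settle on the centroids of the two groups regardless of which samples are drawn as initial means; on harder data the result depends on the initialization.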

Reminder: Gaussian Mixtures

Reminder: Training of Mixture Models

Goal: Find ai for

Expectation:

Maximization:
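To make the expectation and maximization steps concrete, here is a hedged sketch of the textbook EM updates for a one-dimensional Gaussian mixture. It is not necessarily the exact formulation used in the lecture; the names `gaussian_pdf` and `em_step`, the variance floor, and the toy data are assumptions.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_step(data, weights, means, variances):
    k = len(weights)
    # Expectation: responsibility of each component for each sample.
    resp = []
    for x in data:
        joint = [weights[i] * gaussian_pdf(x, means[i], variances[i]) for i in range(k)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # Maximization: re-estimate weights, means and variances from the responsibilities.
    new_w, new_m, new_v = [], [], []
    for i in range(k):
        n_i = sum(r[i] for r in resp)
        mean = sum(r[i] * x for r, x in zip(resp, data)) / n_i
        var = sum(r[i] * (x - mean) ** 2 for r, x in zip(resp, data)) / n_i
        new_w.append(n_i / len(data))
        new_m.append(mean)
        new_v.append(max(var, 1e-6))  # variance floor avoids a collapsing component
    return new_w, new_m, new_v

# Toy data drawn around 0 and around 5 (made-up values).
data = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]
weights, means, variances = [0.5, 0.5], [1.0, 4.0], [1.0, 1.0]
for _ in range(20):
    weights, means, variances = em_step(data, weights, means, variances)
print(weights, means, variances)
```

K-means is the special case in which every responsibility is rounded to 0 or 1 and the variances are ignored, which is why the two algorithms share the same alternating structure.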

Magic Duo

~ 90% of audio papers use the combination of MFCCs and Gaussian Mixture Models to model audio signals!

Bayesian Information Criterion = “Acoustic Edge Detector”

$$\mathrm{BIC} = \log p(X \mid \Theta) - \frac{\lambda}{2}\, K \log N$$

where X is the sequence of features for a segment, Θ are the parameters of the statistical model for the segment, K is the number of parameters for the model, N is the number of frames in the segment, and λ is an optimization parameter.
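The "acoustic edge detector" idea can be sketched as follows, assuming the common form BIC = log p(X|Θ) − (λ/2)·K·log N and, for brevity, a single one-dimensional Gaussian per segment (K = 2: mean and variance). The function names and toy values are assumptions; real systems use multivariate Gaussians or GMMs over MFCC frames.

```python
import math

def gaussian_log_likelihood(frames):
    # Log-likelihood of the frames under one Gaussian fitted to them (ML estimate).
    n = len(frames)
    mean = sum(frames) / n
    var = max(sum((x - mean) ** 2 for x in frames) / n, 1e-6)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def bic(frames, n_params=2, lam=1.0):
    # Likelihood term minus the complexity penalty.
    return gaussian_log_likelihood(frames) - 0.5 * lam * n_params * math.log(len(frames))

def delta_bic(frames, split, lam=1.0):
    # Positive: two separate models explain the data better than one,
    # which suggests an acoustic change ("edge") at the split point.
    return bic(frames[:split], lam=lam) + bic(frames[split:], lam=lam) - bic(frames, lam=lam)

quiet = [0.0, 0.1, -0.1, 0.05, -0.05] * 4
loud = [5.0, 5.1, 4.9, 5.05, 4.95] * 4
change = delta_bic(quiet + loud, split=20)
no_change = delta_bic(quiet + quiet, split=20)
print(change, no_change)
```

Raising λ increases the penalty for the extra parameters of the two-model hypothesis and therefore makes the detector report fewer change points.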

Bayesian Information Criterion: Explanation

• BIC penalizes the complexity of the model (measured by the number of parameters in the model).
• BIC measures the efficiency of the parameterized model in terms of predicting the data.

Bayesian Information Criterion: Properties

• BIC is a minimum description length criterion.
• BIC is independent of the prior.
• It is closely related to other penalized likelihood criteria such as RIC and the Akaike information criterion.

Some Error Metrics

• Classification error
• The types of errors
• ROC/DET Curve
• Precision/Recall, F-Measure
• Word Error Rate

Classification Error

$$\text{error} = \frac{\text{wrong classifications}}{\text{total classifications}}$$

• Usually expressed in %
• Simplest and most popular metric
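A minimal sketch of the classification error as a fraction of wrong decisions; the function name and the example labels are made up for illustration.

```python
def classification_error(predicted, reference):
    """Fraction of items whose predicted label differs from the reference label."""
    wrong = sum(p != r for p, r in zip(predicted, reference))
    return wrong / len(reference)

predicted = ["speech", "music", "speech", "noise"]
reference = ["speech", "speech", "speech", "noise"]
error = classification_error(predicted, reference)
print(f"{100 * error:.1f}%")
```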

Types of Errors

ROC Curve

• Invented in the 1940s (radar detection accuracy)
• Said to have become very popular after the Pearl Harbor incident

Receiver Operating Characteristic: True Positive Rate vs. False Positive Rate

True Positive Rate (TPR) = TP / P = TP / (TP + FN)
False Positive Rate (FPR) = FP / N = FP / (FP + TN)
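A ROC curve is traced by sweeping the decision threshold over the detector scores and recording one (FPR, TPR) point per threshold. The sketch below assumes that higher scores mean "more likely positive" and that labels are 1 (positive) or 0 (negative); the function name and the toy scores are illustrative.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per decision threshold, from strict to lenient."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is accepted
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```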

DET Curve

• Detection Error Tradeoff: Miss (=FN) vs. False Alarm (=FP), non-linearly scaled
• Very useful for detection tasks (threshold tuning)
• Very popular in the retrieval community
• Equal Error Rate: the point where the miss rate equals the false alarm rate (FN = FP)
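The equal error rate can be approximated by sweeping the threshold and picking the point where miss rate and false alarm rate are closest. This is a simple sketch on a finite score list (the function name and toy data are assumptions); evaluation toolkits usually interpolate between thresholds.

```python
def equal_error_rate(scores, labels):
    """Return (approximate EER, threshold) for scores where higher means 'target'."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for threshold in sorted(set(scores)):
        misses = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        false_alarms = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        miss_rate = misses / positives
        fa_rate = false_alarms / negatives
        gap = abs(miss_rate - fa_rate)
        if best is None or gap < best[0]:
            best = (gap, (miss_rate + fa_rate) / 2.0, threshold)
    return best[1], best[2]

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
eer, eer_threshold = equal_error_rate(scores, labels)
print(eer, eer_threshold)
```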

Precision/Recall

• Precision = TP / (TP + FP); Recall = True Positive Rate = TP / (TP + FN)
• Became popular because of Google

F-Measure

• Two numbers are hard to compare => F-Measure
• Harmonic mean of Precision and Recall
• Highly debated
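A short sketch of precision, recall, and the balanced F-measure computed from raw counts, using the usual definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The function name and the counts are made up; returning zeros when there are no true positives is a convention chosen here to avoid division by zero.

```python
def precision_recall_f(tp, fp, fn):
    if tp == 0:
        return 0.0, 0.0, 0.0  # convention for the degenerate case
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Example: 10 items retrieved, 8 of them relevant, 16 relevant items in total.
p, r, f = precision_recall_f(tp=8, fp=2, fn=8)
print(f"precision={p:.2f} recall={r:.2f} F={f:.3f}")
```

The harmonic mean stays close to the smaller of the two numbers, so a system cannot reach a high F-measure by maximizing only precision or only recall.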

Word Error Rate

Metric for comparing speech recognizers:

$$\mathrm{WER} = \frac{S + D + I}{N}$$

where:
• S is the number of substitutions,
• D is the number of deletions,
• I is the number of insertions,
• N is the number of words in the reference.
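The total S + D + I is the word-level edit (Levenshtein) distance between reference and hypothesis, so the word error rate can be sketched with a standard dynamic program. The function name and the example sentences are assumptions; scoring tools additionally normalize text and report S, D, and I separately.

```python
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j]: minimum number of edits turning the first i reference
    # words into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # only deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # only insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{100 * wer:.1f}%")
```

Because insertions count against the hypothesis but N only counts reference words, the word error rate can exceed 100%.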

Next Week (Project Meeting)

• SeJITs
• Project Idea Sketches (from groups)

Next Week (Lecture)

• Visual Content Analysis