The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task


The TUM Cumulative DTW Approach for the Spoken Web Search Task

Cyril Joder, Felix Weninger, Martin Wöllmer, Björn Schuller

Institute for Human-Machine Communication, Technische Universität München


Summary

• Not a “system”
• Low-level features only
• No ASR
• Little “engineering”
• Method of integrating discriminative training into DTW

Mediaeval 2012 Workshop 2


Cumulative DTW (CDTW)

• Limitations of DTW:
  – Only one local cost function (distance)
  – Usually manual parameter tuning
• Idea:
  – Use different local cost functions for each step
  – Automatic learning of these functions as a combination of general features


From DTW to CDTW

• Local cost function:

[Figure: step diagram. Cell (i, j) is reached from (i, j−1), (i−1, j), and (i−1, j−1); the three steps carry the weights α1, α2, α3.]

• Dynamic Programming:
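A minimal sketch of a weighted DTW recursion of the kind this slide describes may help here. It is not the authors' code: the Euclidean local distance, the assignment of α1, α2, α3 to the three steps, and the function name `weighted_dtw` are all assumptions made for illustration.

```python
import math


def weighted_dtw(x, y, alphas=(1.0, 1.0, 1.0)):
    """Classic DTW with a single local distance and one fixed weight per step.

    x, y: sequences of equal-dimension feature vectors (lists of floats).
    alphas: weights for the steps arriving from (i, j-1), (i-1, j), (i-1, j-1).
            This ordering is an assumption, not taken from the slides.
    """
    n_x, n_y = len(x), len(y)
    inf = float("inf")
    acc = [[inf] * n_y for _ in range(n_x)]

    def dist(a, b):
        # Local cost: Euclidean distance between two frames (assumed).
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    for i in range(n_x):
        for j in range(n_y):
            d = dist(x[i], y[j])
            if i == 0 and j == 0:
                acc[i][j] = d
                continue
            best = inf
            if j > 0:
                best = min(best, acc[i][j - 1] + alphas[0] * d)
            if i > 0:
                best = min(best, acc[i - 1][j] + alphas[1] * d)
            if i > 0 and j > 0:
                best = min(best, acc[i - 1][j - 1] + alphas[2] * d)
            acc[i][j] = best
    return acc[n_x - 1][n_y - 1]
```

In this form the only tunable quantities are the three step weights, which is the manual-tuning limitation listed on the previous slide.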


From DTW to CDTW

• Local step function:

[Figure: step diagram. Cell (i, j) is reached from (i, j−1), (i−1, j), and (i−1, j−1); each step now has its own cost function s1(i, j), s2(i, j), s3(i, j).]

• Dynamic Programming:


Softmax?

+ Differentiable
  – Allows for an optimization of the parameters
+ Combines several alignment paths
  – More robust to local changes
- Only gives a score (not the optimal path)
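The trade-off above can be illustrated with a small sketch in which the hard minimum of DTW is replaced by a soft minimum. Everything here is an assumption for illustration: the soft minimum is written as -log(sum(exp(-v))), each step cost is taken to be a linear combination of features, and the names `softmin`, `cdtw_score`, `weights`, and `features` are hypothetical.

```python
import math


def softmin(values):
    """Smooth stand-in for min: -log(sum(exp(-v))), computed stably."""
    m = min(values)
    return m - math.log(sum(math.exp(-(v - m)) for v in values))


def cdtw_score(x, y, weights, features):
    """Soft-min accumulation with one cost function per step type.

    weights[k]: weight vector of step k, where k = 0, 1, 2 stands for the
                steps from (i, j-1), (i-1, j), (i-1, j-1) (assumed ordering).
    features(a, b): feature vector for a pair of frames (hypothetical helper).
    """
    n_x, n_y = len(x), len(y)
    acc = [[0.0] * n_y for _ in range(n_x)]
    for i in range(n_x):
        for j in range(n_y):
            phi = features(x[i], y[j])
            # Step costs as linear combinations of the features (assumed).
            s = [sum(w * f for w, f in zip(weights[k], phi)) for k in range(3)]
            if i == 0 and j == 0:
                acc[i][j] = s[2]  # start cell treated as a diagonal step
                continue
            terms = []
            if j > 0:
                terms.append(acc[i][j - 1] + s[0])
            if i > 0:
                terms.append(acc[i - 1][j] + s[1])
            if i > 0 and j > 0:
                terms.append(acc[i - 1][j - 1] + s[2])
            acc[i][j] = softmin(terms)
    return acc[n_x - 1][n_y - 1]
```

Because every operation is smooth in the weights, the final score can be differentiated with respect to them. The soft minimum also mixes all incoming paths, so no single optimal path is available from this recursion.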


Features

• Acoustic descriptors: MFCC++ (D=36)
  – HTK, 25 ms, CMN, global normalization
• Features (k=1…D):
  – Local distance
  – “Local self-similarity”
  – Distance / product of the self-similarities


Decision

• Are the two sequences instances of the same word/expression?
• Learning of the parameters
  – Backpropagation (stochastic gradient descent)
  – Training data: queries/utterances of dev set

[Figure: decision computed from the score S(I, J) / (I + J)]


Search Procedure

Given query and utterance:

1) Feature extraction
2) Candidate search in the utterance
3) CDTW comparison
4) Score post-processing


Candidate Search

• Align query with entire utterance
  – CDTW with backtracking
  – “Scores” for each point
• Extract potential starts and ends
  – Peak-picking of scores
• Filter by duration
  – Only allow warping factors < 2
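The last two steps could be sketched as follows. This is a sketch under stated assumptions: peaks are taken to be strict local maxima, the warping factor is taken to be the ratio between candidate length and query length in either direction, and the helper names `pick_peaks` and `candidate_segments` are hypothetical.

```python
def pick_peaks(scores):
    """Return the indices that are strict local maxima of a score sequence."""
    return [
        t
        for t in range(1, len(scores) - 1)
        if scores[t - 1] < scores[t] > scores[t + 1]
    ]


def candidate_segments(starts, ends, query_len, max_warp=2.0):
    """Pair start and end frames and keep pairs with warping factor < max_warp.

    The warping factor is assumed to be the larger of
    (segment length / query length) and its inverse.
    """
    kept = []
    for s in starts:
        for e in ends:
            length = e - s + 1
            if length <= 0:
                continue  # end frame lies before the start frame
            warp = max(length / query_len, query_len / length)
            if warp < max_warp:
                kept.append((s, e))
    return kept
```

For example, with a 10-frame query, a 15-frame candidate has a warping factor of 1.5 and is kept, while a 5-frame candidate has a factor of exactly 2 and is dropped.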


CDTW Score Post-Processing

• Same decision function as for learning
  – Many false positives
  – Bias toward some queries
• Heuristic post-processing:
  – For each query, subtract a specific threshold
  – Threshold: 90th percentile of the CDTW scores for that query
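The per-query thresholding could look roughly like this. The percentile convention (linear interpolation between ranks) and the names `percentile` and `normalize_scores` are assumptions; the slides only state that the 90th percentile of each query's CDTW scores is subtracted.

```python
def percentile(values, q):
    """q-th percentile (0 to 100), linear interpolation between ranks."""
    v = sorted(values)
    pos = (len(v) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(v) - 1)
    return v[lo] + (v[hi] - v[lo]) * (pos - lo)


def normalize_scores(scores_by_query, q=90.0):
    """Subtract each query's own q-th percentile from all of its scores.

    scores_by_query maps a query id to the list of CDTW scores that the
    query obtained over all candidates (hypothetical data layout).
    """
    normalized = {}
    for query, scores in scores_by_query.items():
        if not scores:
            normalized[query] = []
            continue
        threshold = percentile(scores, q)
        normalized[query] = [s - threshold for s in scores]
    return normalized
```

After this step, roughly the top tenth of each query's candidates has a positive score, which puts queries with very different raw score ranges on a comparable scale.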


Results

run       devQ-devC   evalQ-devC   devQ-evalC   evalQ-evalC
P(miss)   55.6%       59.5%        60.2%        54.5%
P(FA)     1.18%       1.13%        1.17%        1.13%
ATWV      0.263       0.333        0.164        0.290

• Great improvement over naive DTW
  – ATWV = 0.065 on devQ-devC
• ATWV scores depend on the run


Results

• DET curves similar

• CDTW seems to generalize well

• Decision function has to be improved


Conclusion

• CDTW: promising results
  – Data-based approach with satisfactory results
  – Significantly outperforms (naive) DTW
  – Good generalization
• Future work:
  – Decision function
  – Acoustic descriptors
  – Integrate “hard” path constraints into search


Thank you.
