A principled approach for rejection threshold optimization. Dan Bohus, Alexander I. Rudnicky


a principled approach for rejection threshold optimization

Dan Bohus www.cs.cmu.edu/~dbohus
Alexander I. Rudnicky www.cs.cmu.edu/~air

Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 15217

2

understanding errors and rejection

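spoken dialog systems often misunderstand the user

to guard against this, they use confidence scores

common design pattern: compare the input confidence against a threshold; reject the utterance if the confidence is too low

rejection may in turn lead to false rejections (correctly understood utterances that get discarded)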
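
The design pattern above can be sketched in a few lines. This is only an illustration: the function names and the example turns are made up for this sketch, and the confidence scores are assumed to lie in [0, 1].

```python
def decide(confidence, threshold):
    """Accept the recognized input only if its confidence reaches the threshold."""
    return "accept" if confidence >= threshold else "reject"


def error_counts(turns, threshold):
    """Count both error types for a list of (confidence, is_correct) turns.

    A misunderstanding is an incorrect input that was accepted;
    a false rejection is a correct input that was rejected.
    """
    misunderstandings = 0
    false_rejections = 0
    for confidence, is_correct in turns:
        decision = decide(confidence, threshold)
        if decision == "accept" and not is_correct:
            misunderstandings += 1
        elif decision == "reject" and is_correct:
            false_rejections += 1
    return misunderstandings, false_rejections


# Hypothetical labeled turns: (confidence score, was the decoding correct?)
turns = [(0.92, True), (0.35, True), (0.71, False), (0.12, False)]
for th in (0.0, 0.5, 0.8):
    print(th, error_counts(turns, th))
```

Raising the threshold trades misunderstandings for false rejections, which is the tradeoff the following slides plot.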

3

rejection tradeoff

[Figure: misunderstandings vs. false rejections (y-axis, 0% to 75%) as a function of the rejection threshold (x-axis, 0 to 1)]

4

rejection tradeoff

[Figure: correctly vs. incorrectly transferred concepts per turn, in place of misunderstandings vs. false rejections, as a function of the rejection threshold (x-axis, 0 to 1)]

5

question

given this tradeoff, how can we optimize the rejection threshold in a principled fashion?

6

outline

current solutions

proposed approach

data

results

conclusion

7

current solutions

follow the ASR manual [Nuance documentation]

acknowledge the tradeoff and postulate costs: “misunderstandings are X times more costly than false rejections” [Raymond et al., 2004; Kawahara et al., 2000; Cuayahuitl et al., 2002]

costs are likely to differ across domains / systems, and across dialog states within a system

8

proposed approach

derive costs in a principled fashion

1. identify a set of variables involved in the tradeoff: correctly and incorrectly transferred concepts per turn (CTC, ITC)

2. choose a dialog performance metric: task completion (binary, kappa), denoted TC

3. build a regression model: $\mathrm{logit}(TC) \leftarrow C_0 + C_{CTC} \cdot CTC + C_{ITC} \cdot ITC$

4. optimize the threshold to maximize performance: $th^* = \arg\max_{th} \left( C_{CTC} \cdot CTC(th) + C_{ITC} \cdot ITC(th) \right)$
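
Step 4 can be sketched as a simple sweep over candidate thresholds. The code below is a minimal illustration, not the authors' implementation: it assumes each turn is stored as (confidence, correct concepts, incorrect concepts), that a rejected turn transfers no concepts, and that a grid of 101 thresholds is fine enough. The turn data is hypothetical; the two coefficients are the request(non-bool) values from the results slides.

```python
def transferred_per_turn(turns, threshold):
    """Return (CTC, ITC): concepts transferred per turn at a given threshold.

    turns: list of (confidence, n_correct_concepts, n_incorrect_concepts).
    Only turns whose confidence reaches the threshold transfer concepts.
    """
    n = len(turns)
    accepted = [t for t in turns if t[0] >= threshold]
    ctc = sum(t[1] for t in accepted) / n
    itc = sum(t[2] for t in accepted) / n
    return ctc, itc


def best_threshold(turns, c_ctc, c_itc, steps=100):
    """Grid search for the threshold maximizing c_ctc * CTC + c_itc * ITC."""

    def utility(th):
        ctc, itc = transferred_per_turn(turns, th)
        return c_ctc * ctc + c_itc * itc

    candidates = [k / steps for k in range(steps + 1)]
    return max(candidates, key=utility)  # first maximizer wins on ties


# Hypothetical turns; coefficients taken from the request(non-bool) rows.
turns = [(0.2, 0, 1), (0.3, 0, 1), (0.6, 1, 0), (0.9, 2, 0)]
th_star = best_threshold(turns, c_ctc=2.5514, c_itc=-3.441)
print(th_star)
```

With these made-up turns, any threshold above 0.3 and up to 0.6 rejects both incorrect turns while keeping both correct ones, so the search settles in that range.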

9

state-specific costs

costs are different in different dialog states

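compute CTC and ITC on a per-state basis:

$\mathrm{logit}(TC) \leftarrow C_0 + C_{CTC}^{state_1} \cdot CTC^{state_1} + C_{ITC}^{state_1} \cdot ITC^{state_1} + C_{CTC}^{state_2} \cdot CTC^{state_2} + C_{ITC}^{state_2} \cdot ITC^{state_2} + C_{CTC}^{state_3} \cdot CTC^{state_3} + C_{ITC}^{state_3} \cdot ITC^{state_3}$

optimize a separate threshold for each state:

$th_{state_x}^* = \arg\max_{th} \left( C_{CTC}^{state_x} \cdot CTC^{state_x}(th) + C_{ITC}^{state_x} \cdot ITC^{state_x}(th) \right)$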
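
A per-state version of the optimization might look as follows. Again this is a sketch under assumptions: the turn layout (state, confidence, correct concepts, incorrect concepts), the grid search, and the example turns are invented for illustration; only the coefficient values come from the results slides.

```python
from collections import defaultdict


def best_threshold(turns, c_ctc, c_itc, steps=100):
    """Grid search over thresholds; turns are (confidence, n_correct, n_incorrect)."""
    n = len(turns)

    def utility(th):
        accepted = [t for t in turns if t[0] >= th]
        ctc = sum(t[1] for t in accepted) / n
        itc = sum(t[2] for t in accepted) / n
        return c_ctc * ctc + c_itc * itc

    return max((k / steps for k in range(steps + 1)), key=utility)


def per_state_thresholds(turns, costs):
    """Optimize one threshold per dialog state.

    turns: list of (state, confidence, n_correct, n_incorrect).
    costs: dict mapping state -> (c_ctc, c_itc).
    """
    by_state = defaultdict(list)
    for state, confidence, n_correct, n_incorrect in turns:
        by_state[state].append((confidence, n_correct, n_incorrect))
    return {s: best_threshold(ts, *costs[s]) for s, ts in by_state.items()}


# Coefficients from the results slides; the turns themselves are hypothetical.
costs = {
    "open-request": (0.5518, -0.4067),
    "request(non-bool)": (2.5514, -3.441),
}
turns = [
    ("open-request", 0.2, 1, 1), ("open-request", 0.8, 2, 0),
    ("request(non-bool)", 0.2, 1, 1), ("request(non-bool)", 0.8, 2, 0),
]
thresholds = per_state_thresholds(turns, costs)
print(thresholds)
```

The two states see identical turns here, yet the optimizer keeps the low-confidence turn in the open-request state and rejects it in the request(non-bool) state, because an incorrectly transferred concept is far more costly in the latter.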

11

data

collected using RoomLine: a phone-based, mixed-initiative spoken dialog system for conference room reservations

Sphinx-2; utterance-level confidence annotator with scores in [0, 1]

46 participants (first-time users); 10 scenario-driven interactions

corpus: 449 dialog sessions; 8278 user turns; manually labeled decoded concept “correctness”

12

roomline states

71 “dialog states” total, clustered into 3 classes

open-request: How may I help you?

request(bool): Would you like a reservation for this room? Would you like a room with a projector?

request(non-bool): For what time would you like to reserve the room?

13

results: task success model

model predicting binary task success

          Baseline  Train    Cross-V  p
AVG-LL    -0.4655   -0.2952  -0.3059  < 10^-4
HARD      17.62%    11.66%   11.75%

cost coefficients

Variable                 Coeff    p       se
Const                    -2.3442  0.0416  1.1504
CTC / open-request       0.5518   0.0619  0.2955
ITC / open-request       -0.4067  0.3801  0.4634
CTC / request(bool)      3.3127   0.0010  1.0076
ITC / request(bool)      -0.5959  0.6491  1.3098
CTC / request(non-bool)  2.5514   0.0017  0.8137
ITC / request(non-bool)  -3.441   0.0018  1.1046
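
For readers unfamiliar with how such coefficients are obtained, here is a minimal logistic regression sketch. It is not the fitting procedure used for the table: it uses plain batch gradient ascent, a single (CTC, ITC) pair rather than one pair per state, synthetic sessions generated from a made-up model, and it does not compute standard errors or p-values.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=1.0, epochs=1000):
    """Fit logit(P(y=1)) = w[0] + w[1]*x1 + w[2]*x2 + ... by gradient ascent
    on the average log-likelihood. Returns the weight list [w0, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = yi - p  # gradient of the log-likelihood w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def synthetic_sessions(n=400, seed=0):
    """Generate fake sessions: features (CTC, ITC), label = task success."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        ctc, itc = rng.random(), rng.random()
        p_success = sigmoid(3.0 * ctc - 4.0 * itc)  # made-up generating model
        X.append((ctc, itc))
        y.append(1 if rng.random() < p_success else 0)
    return X, y


X, y = synthetic_sessions()
c0, c_ctc, c_itc = fit_logistic(X, y)
print(round(c0, 2), round(c_ctc, 2), round(c_itc, 2))
```

On data like this the fitted CTC coefficient comes out positive and the ITC coefficient negative, mirroring the signs in the table above. A statistics package would be the natural choice for real data, since it also reports standard errors and significance.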

14

results: threshold optimization

open-request

utility = 0.55 × CTC - 0.40 × ITC

[Figure: correctly and incorrectly transferred concepts per turn (y-axis, 0 to 1) as a function of the rejection threshold (x-axis, 0 to 1) for the open-request state, shown next to the cost coefficients table from the task success model]

15

results: threshold optimization

open-request: utility = 0.55 × CTC - 0.40 × ITC

request(bool): utility = 3.31 × CTC - 0.60 × ITC

request(non-bool): utility = 2.55 × CTC - 3.44 × ITC

[Figure: one panel per state, each plotting correctly and incorrectly transferred concepts per turn against the rejection threshold (x-axis, 0 to 1); the request(non-bool) panel additionally marks 0.6 on the threshold axis]

utility profiles are different across the three states

task duration models lead to similar results

16

conclusion

principled method for optimizing rejection threshold

determine costs for various types of understanding errors

data-driven approach

can derive state-specific costs

bridge mismatches between off-the-shelf confidence annotators and domain

17

thank you

18

fit for task success model

19

expected changes in task success

State              Variable  Current  New Estimate  Delta
open-request       CTC       0.54     0.89          +0.35
open-request       ITC       0.16     0.31          +0.15
request(bool)      CTC       0.84     0.86          +0.02
request(bool)      ITC       0.09     0.12          +0.03
request(non-bool)  CTC       0.72     0.66          -0.06
request(non-bool)  ITC       0.25     0.17          -0.08

              Current  New Estimate  Delta
Task success  82.75%   87.16%        +4.41%

Remains to be seen …
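
As a rough sanity check, the task success model can be evaluated at the CTC and ITC values listed on this slide. This is an approximation and an assumption of this sketch: it plugs the listed values straight into the fitted logit, whereas the slide's estimates may have been computed differently (for example, by averaging predictions over sessions). The coefficients are those from the cost coefficients table.

```python
import math

# Coefficients from the task success model (cost coefficients table).
COEFFS = {
    "Const": -2.3442,
    "CTC/open-request": 0.5518, "ITC/open-request": -0.4067,
    "CTC/request(bool)": 3.3127, "ITC/request(bool)": -0.5959,
    "CTC/request(non-bool)": 2.5514, "ITC/request(non-bool)": -3.441,
}


def predicted_success(values):
    """Evaluate P(task success) = sigmoid(C0 + sum_i C_i * x_i)."""
    z = COEFFS["Const"] + sum(COEFFS[name] * x for name, x in values.items())
    return 1.0 / (1.0 + math.exp(-z))


# CTC / ITC values from the slide, at the current and the new thresholds.
current = {
    "CTC/open-request": 0.54, "ITC/open-request": 0.16,
    "CTC/request(bool)": 0.84, "ITC/request(bool)": 0.09,
    "CTC/request(non-bool)": 0.72, "ITC/request(non-bool)": 0.25,
}
new = {
    "CTC/open-request": 0.89, "ITC/open-request": 0.31,
    "CTC/request(bool)": 0.86, "ITC/request(bool)": 0.12,
    "CTC/request(non-bool)": 0.66, "ITC/request(non-bool)": 0.17,
}

p_current = predicted_success(current)
p_new = predicted_success(new)
print(f"current: {p_current:.2%}  new: {p_new:.2%}")
```

Under this approximation both predictions land within about one percentage point of the 82.75% and 87.16% reported on the slide, and the predicted improvement points in the same direction.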

20

task duration model

Variable Coeff p se

Const 1.2750 0.0000 0.1019

CTC / oreq -0.1769 0.0000 0.0187

ITC / oreq -0.1567 0.0001 0.0401

CTC / req(bool) -0.7865 0.0000 0.0869

ITC / req(bool) -0.6389 0.0000 0.1297

CTC / req(non-bool) -0.5127 0.0000 0.0440

ITC / req(non-bool) 0.4256 0.0000 0.0851

21

Model 2: Resulting fit and coefficients

R^2 = 0.56
