a principled approach for rejection threshold optimization
Dan Bohus www.cs.cmu.edu/~dbohus
Alexander I. Rudnicky www.cs.cmu.edu/~air
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA, 15217
2
understanding errors and rejection
systems often misunderstand
use confidence scores
common design pattern
  compare input confidence against a threshold
  reject utterance if confidence is too low
may lead to false rejections
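The design pattern above can be sketched in a few lines. This is only an illustration: the `Hypothesis` class, the `accept` function, the example utterances, and the 0.5 threshold are assumptions made for the example and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A decoded utterance with its confidence score (hypothetical type)."""
    text: str
    confidence: float  # utterance-level confidence in [0, 1]


def accept(hyp: Hypothesis, threshold: float) -> bool:
    """Accept the hypothesis only if its confidence reaches the threshold."""
    return hyp.confidence >= threshold


# Toy inputs; the confidence values are made up for illustration.
hyps = [
    Hypothesis("i need a room for ten people", 0.82),
    Hypothesis("tomorrow at noon", 0.31),
]
decisions = [accept(h, threshold=0.5) for h in hyps]
```

If the second utterance was in fact decoded correctly, rejecting it is a false rejection; lowering the threshold avoids that at the risk of accepting more misunderstandings.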
3
rejection tradeoff
misunderstandings vs. false rejections
[figure: misunderstandings and false rejections (y-axis, 0% to 75%) against the rejection threshold (x-axis, 0 to 1)]
4
rejection tradeoff
misunderstandings vs. false rejections
correctly vs. incorrectly transferred concepts
[figure: correctly and incorrectly transferred concepts per turn against the rejection threshold (x-axis, 0 to 1)]
5
question
given this tradeoff, how can we optimize the rejection threshold in a principled fashion?
7
current solutions
follow ASR manual [Nuance documentation]
acknowledge the tradeoff + postulate costs
  “misunderstandings are X times more costly than false rejections” [Raymond et al., 2004; Kawahara et al., 2000; Cuayahuitl et al., 2002]
costs are likely to differ
  across domains / systems
  across dialog states within a system
8
proposed approach
derive costs in a principled fashion
1. identify a set of variables involved in the tradeoff
   correctly and incorrectly transferred concepts per turn (CTC, ITC)
2. choose a dialog performance metric
   task completion (binary, kappa) – TC
3. build a regression model
   $\mathrm{logit}(TC) \leftarrow C_0 + C_{CTC} \cdot CTC + C_{ITC} \cdot ITC$
4. optimize threshold to maximize performance
   $th^* = \arg\max_{th} \, (C_{CTC} \cdot CTC + C_{ITC} \cdot ITC)$
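Step 4 can be sketched as a simple grid search over candidate thresholds. The function name `sweep_threshold`, the toy turns, and the coefficient values 1.0 and -2.0 are assumptions made for illustration; the sketch also assumes that a rejected turn transfers no concepts and that CTC and ITC are averaged over all turns.

```python
def sweep_threshold(turns, c_ctc, c_itc, steps=100):
    """Return (threshold, utility) maximizing c_ctc * CTC + c_itc * ITC.

    turns: list of (confidence, n_correct, n_incorrect), where the concept
    counts are what would be transferred if the turn were accepted.
    """
    n_turns = len(turns)
    best_th, best_utility = 0.0, float("-inf")
    for i in range(steps + 1):
        th = i / steps
        # Only turns at or above the threshold transfer their concepts.
        accepted = [t for t in turns if t[0] >= th]
        ctc = sum(t[1] for t in accepted) / n_turns
        itc = sum(t[2] for t in accepted) / n_turns
        utility = c_ctc * ctc + c_itc * itc
        if utility > best_utility:  # strict: keep the lowest best threshold
            best_th, best_utility = th, utility
    return best_th, best_utility


# Toy data: (confidence, correct concepts, incorrect concepts) per turn.
toy_turns = [
    (0.9, 1, 0), (0.8, 1, 0), (0.6, 1, 0),
    (0.4, 0, 1), (0.3, 1, 0), (0.2, 0, 1),
]
best_th, best_utility = sweep_threshold(toy_turns, c_ctc=1.0, c_itc=-2.0)
```

With these toy numbers an incorrect concept costs twice what a correct one gains, so the search settles just above the confidence of the highest-scoring incorrect turn, even though that also rejects one correct turn.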
9
state-specific costs
costs are different in different dialog states
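CTC and ITC on a per-state basis
  $\mathrm{logit}(TC) \leftarrow C_0$
  $\quad + \; C_{CTC_{state1}} \cdot CTC_{state1} + C_{ITC_{state1}} \cdot ITC_{state1}$
  $\quad + \; C_{CTC_{state2}} \cdot CTC_{state2} + C_{ITC_{state2}} \cdot ITC_{state2}$
  $\quad + \; C_{CTC_{state3}} \cdot CTC_{state3} + C_{ITC_{state3}} \cdot ITC_{state3}$
  $\quad + \; \dots$
optimize a separate threshold for each state
  $th_{state_x}^* = \arg\max_{th} \, (C_{CTC_{state_x}} \cdot CTC_{state_x} + C_{ITC_{state_x}} \cdot ITC_{state_x})$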
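The per-state regressors could be computed per dialog roughly as follows. This is a sketch under assumptions: the function name `dialog_features` and the input layout are hypothetical, the state names are the three RoomLine classes used later in the slides, and each per-state value is averaged over the turns spent in that state (the slides do not say which normalization was used).

```python
from collections import defaultdict

# The three state classes named in the RoomLine slides.
STATES = ("open-request", "request(bool)", "request(non-bool)")


def dialog_features(turns):
    """Compute per-state CTC and ITC per turn for one dialog.

    turns: list of (state, n_correct, n_incorrect) for each user turn,
    counting only concepts that were actually transferred.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # state -> [turns, correct, incorrect]
    for state, n_correct, n_incorrect in turns:
        totals[state][0] += 1
        totals[state][1] += n_correct
        totals[state][2] += n_incorrect

    features = {}
    for state in STATES:
        n_turns, correct, incorrect = totals[state]
        # States never visited in this dialog contribute zeros.
        features["CTC/" + state] = correct / n_turns if n_turns else 0.0
        features["ITC/" + state] = incorrect / n_turns if n_turns else 0.0
    return features


example = dialog_features([
    ("open-request", 1, 0),
    ("request(bool)", 1, 0),
    ("request(bool)", 0, 1),
])
```

One such feature dictionary per dialog, paired with that dialog's binary task completion, would form one row of the regression data.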
11
data
collected using RoomLine
  phone-based, mixed-initiative spoken dialog system
  conference room reservations
  sphinx-2
  utterance-level confidence annotator [0-1]
46 participants (first-time users)
  10 scenario-driven interactions
corpus
  449 dialog sessions
  8278 user turns
  manually labeled decoded concept “correctness”
12
roomline states
71 “dialog states” total
clustered into 3 classes
  open-request
    How may I help you?
  request(bool)
    Would you like a reservation for this room?
    Would you like a room with a projector?
  request(non-bool)
    For what time would you like to reserve the room?
13
results: task success model
          Baseline   Train     Cross-V   p
AVG-LL    -0.4655    -0.2952   -0.3059   < 10^-4
HARD      17.62%     11.66%    11.75%
model predicting binary task success
cost coefficients
Variable                  Coeff     p        se
Const                     -2.3442   0.0416   1.1504
CTC / open-request         0.5518   0.0619   0.2955
ITC / open-request        -0.4067   0.3801   0.4634
CTC / request(bool)        3.3127   0.0010   1.0076
ITC / request(bool)       -0.5959   0.6491   1.3098
CTC / request(non-bool)    2.5514   0.0017   0.8137
ITC / request(non-bool)   -3.441    0.0018   1.1046
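One way to read the AVG-LL and HARD rows is as the average log-likelihood per dialog and the hard classification error. That reading, and the assumption that the baseline always predicts the overall success rate, are interpretations rather than statements from the slides. Under them, the baseline error of 17.62% implies an average log-likelihood within rounding of the reported -0.4655, as the sketch below computes.

```python
import math


def prior_only_baseline(success_rate):
    """Metrics for a model that always predicts the overall success rate.

    Returns (average log-likelihood per dialog, hard error rate).
    """
    p = success_rate
    # Successful dialogs score log(p); failed dialogs score log(1 - p).
    avg_ll = p * math.log(p) + (1 - p) * math.log(1 - p)
    # Always guessing the majority outcome is wrong for the minority class.
    hard_error = min(p, 1 - p)
    return avg_ll, hard_error


# Success rate implied by the reported baseline HARD error of 17.62%.
avg_ll, hard_error = prior_only_baseline(1 - 0.1762)
```

A fitted model improves on this baseline when its average log-likelihood is closer to zero and its hard error is lower, which is the pattern in the Train and Cross-V columns.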
14
results: threshold optimization
open-request
utility = 0.55 x CTC - 0.40 x ITC
[figure: correctly and incorrectly transferred concepts per turn (y-axis, 0 to 1) against the rejection threshold (x-axis, 0 to 1)]
15
results: threshold optimization
open-request
utility = 0.55 x CTC - 0.40 x ITC
[figure: correctly and incorrectly transferred concepts per turn (y-axis, 0 to 1) against the rejection threshold (x-axis, 0 to 1)]
request(bool)
utility = 3.31 x CTC - 0.60 x ITC
[figure: same quantities (y-axis, 0 to 3) against the rejection threshold (x-axis, 0 to 1)]
request(non-bool)
utility = 2.55 x CTC - 3.44 x ITC
[figure: same quantities (y-axis, 0 to 1) against the rejection threshold (x-axis, 0 to 1, with an additional mark at 0.6)]
utility profiles are different across the three states
task duration models lead to similar results
16
conclusion
principled method for optimizing rejection threshold
determine costs for various types of understanding errors
data-driven approach
can derive state-specific costs
bridge mismatches between off-the-shelf confidence annotators and domain
19
expected changes in task success
                          Current   New Estimate   Delta
open-request        CTC   0.54      0.89           +0.35
                    ITC   0.16      0.31           +0.15
request(bool)       CTC   0.84      0.86           +0.02
                    ITC   0.09      0.12           +0.03
request(non-bool)   CTC   0.72      0.66           -0.06
                    ITC   0.25      0.17           -0.08

                          Current   New Estimate   Delta
Task success              82.75%    87.16%         +4.41%
Remains to be seen …
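As a rough check on these numbers, the mean CTC and ITC values can be plugged into the task success model from the cost coefficients table. Evaluating the logistic model at the means is a shortcut assumed here, not necessarily how the reported estimates were obtained, so it lands near the 82.75% and 87.16% on the slide without having to match them exactly. The names `COEFFS` and `predicted_success` are illustrative.

```python
import math

# Coefficients of the task success model, as reported in the slides.
COEFFS = {
    "Const": -2.3442,
    "CTC/open-request": 0.5518,
    "ITC/open-request": -0.4067,
    "CTC/request(bool)": 3.3127,
    "ITC/request(bool)": -0.5959,
    "CTC/request(non-bool)": 2.5514,
    "ITC/request(non-bool)": -3.441,
}


def predicted_success(means):
    """Evaluate the logistic model at the given mean CTC/ITC values."""
    z = COEFFS["Const"] + sum(COEFFS[name] * value for name, value in means.items())
    return 1.0 / (1.0 + math.exp(-z))


# Mean concepts per turn at the current thresholds and at the new ones.
current = {
    "CTC/open-request": 0.54, "ITC/open-request": 0.16,
    "CTC/request(bool)": 0.84, "ITC/request(bool)": 0.09,
    "CTC/request(non-bool)": 0.72, "ITC/request(non-bool)": 0.25,
}
new = {
    "CTC/open-request": 0.89, "ITC/open-request": 0.31,
    "CTC/request(bool)": 0.86, "ITC/request(bool)": 0.12,
    "CTC/request(non-bool)": 0.66, "ITC/request(non-bool)": 0.17,
}
p_current = predicted_success(current)
p_new = predicted_success(new)
```

The direction of the change is the useful part: the model predicts higher task success at the new operating points, mainly because fewer incorrect concepts are transferred in the request(non-bool) state, where they are most costly.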
20
task duration model
Variable Coeff p se
Const 1.2750 0.0000 0.1019
CTC / oreq -0.1769 0.0000 0.0187
ITC / oreq -0.1567 0.0001 0.0401
CTC / req(bool) -0.7865 0.0000 0.0869
ITC / req(bool) -0.6389 0.0000 0.1297
CTC / req(non-bool) -0.5127 0.0000 0.0440
ITC / req(non-bool) 0.4256 0.0000 0.0851