CHI'07: Biases in Human Estimation of Interruptibility
Avrahami, Fogarty, Hudson
Biases in Human Estimation of Interruptibility:
Effects and Implications for Practice
Daniel Avrahami, James Fogarty & Scott Hudson
Introduction
Estimating someone else’s interruptibility is something we do every day: at home, at work, at the store
…but it’s also something we’re not always good at
This becomes much harder when done over a distance
Let’s try this together:
Estimating Interruptibility
1 (Highly Interruptible)   2   3   4   5 (Highly Non-Interruptible)
30 seconds
Prompting for Self-Report
1 (Highly Interruptible)   2   3   4   5 (Highly Non-Interruptible)
And the answer is…
3
Goals
A better understanding of the biases in estimating others’ interruptibility can inform the design of computer-mediated communication (CMC) and awareness systems
Provide insight into how people are likely to use different pieces of contextual information
For example, the state of an office door was significantly correlated with errors in estimation
A lot of previous work on CMC and awareness systems [cf. Fish’90, Mantei’91, Dourish’92, Bly’93, Adler’94, Tang’01]
Research Questions
Which contextual cues (e.g., working on the computer) affect the error in human estimation of another person’s interruptibility?
What is the source of a contextual cue’s effect on the bias in estimation (e.g., overrating the strength of a cue)?
Talk Outline
Study Design
Measures and Analysis
Results
Conclusions and Future work
Details (f-measures, etc.)
Study Design
Compare Self-Reports of Interruptibility (Reported) with Estimations of Interruptibility (Estimated)
Two groups of participants: Reporters (in their natural work environment) and Estimators (viewing video and audio recordings)
Reporters
Four high-level staff members (3 females, 1 male)
Audio and video recordings in their offices during a month-long period
Experience-sampling method used to collect self-reports of interruptibility at random intervals (30 minutes on average)
672 self-reports and over 600 hours of data
[Used in Hudson’03, Fogarty’05 for the creation of predictive models]
Estimators
40 subjects (Online recruitment, majority were students)
Watched 15- or 30-second clips of Reporters
Ensured that Estimators did not know the Reporters
Provided estimates of interruptibility for 60 clips each
Allowed to watch each clip as many times as wanted
Task lasted about 1 hour
[Used in Fogarty’05 to compare performance of estimators vs. ML]
Estimations vs. Self-Reports
Tested the relationship between Estimations and Reports:
Estimated Interruptibility was significantly correlated with Reported Interruptibility (p<.001)
(This is good: it means that Estimators examined the situation presented to them)
Estimated Interruptibility was significantly different from Reported Interruptibility (p<.001)
Estimators, on average, estimated Reporters to be more interruptible than reported
(Two outliers’ data excluded from the remaining analyses)
Measures and Analysis
Contextual Cues
Coded by six paid coders
For each 15-second segment, coded a large set of contextual cues that could be coded reliably:
• Reporter activities
• Guest activities
• Environmental cues
Inter-coder agreement was 93.4% (evaluated on 5% of the data)
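The talk does not say how inter-coder agreement was computed. As a minimal sketch, assuming simple percent agreement between two coders over the same segments, the calculation could look like this (the function name and the binary codes are hypothetical and not taken from the study):

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of segments on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of segments")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Hypothetical codes for one cue (1 = cue present, 0 = cue absent) on 8 segments.
coder_a = [1, 0, 1, 1, 0, 1, 0, 0]
coder_b = [1, 0, 1, 0, 0, 1, 0, 0]

print(f"{percent_agreement(coder_a, coder_b):.1f}%")  # 7 of 8 segments match: 87.5%
```

Percent agreement does not correct for chance agreement; a chance-corrected statistic would be a different measure.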
Contextual Cues (cont.)
Phone
Social Engagement
Computer
Desk
Papers
File Cabinet
Food
Writing
Door is Closed
Drink
Standing
Present
“Estimation Error”
Estimation Error = Reported – Estimated
[Figure: 5×5 grid with Reported (1–5) on the vertical axis and Estimated (1–5) on the horizontal axis. Each cell holds Reported – Estimated, ranging from -4 to 4, with 0 along the diagonal. The regions on either side of the diagonal are labeled Under-estimation and Over-estimation.]
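The definition of Estimation Error can be sketched in a few lines of Python. This is only an illustration of the formula on the 5-point scale; the function and variable names are assumptions made for this sketch.

```python
# Ratings use the talk's 5-point scale:
# 1 = highly interruptible, 5 = highly non-interruptible.
SCALE = range(1, 6)


def estimation_error(reported, estimated):
    """Estimation Error = Reported - Estimated."""
    return reported - estimated


# Rows are Reported values, columns are Estimated values.
error_grid = [[estimation_error(r, e) for e in SCALE] for r in SCALE]

for reported, row in zip(SCALE, error_grid):
    print(f"Reported={reported}: " + " ".join(f"{value:+d}" for value in row))
```

The diagonal (Reported equal to Estimated) is always 0, and the two sides of the diagonal carry errors of opposite sign.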
Analysis Approach
Step 1: Find which cues have an effect on Estimation Error
• Effect on Under-Estimation errors?
• Effect on Over-Estimation errors?
Step 2: Investigate the cause of a cue’s effect on Estimation Error
• Effect on Reported Interruptibility?
• Effect on Estimated Interruptibility?
For example:
Found that the Reporter using the computer had a
• Significant effect on Over-Estimation (no significant effect on Under-Estimation)
• Significant effect on Estimated Interruptibility, but no significant effect on Reported Interruptibility
“Considering a cue that is not significant”
A couple of notes on the analysis
A self-report determines the possible range of Estimation Errors:
Reported = 1, Error can be -4 … 0
Reported = 5, Error can be 0 … 4
=> Need to include the Reported Interruptibility as a control measure in the analysis of error
All done using Mixed Model analysis
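The range constraint noted above follows directly from Error = Reported – Estimated on a 1 to 5 scale. A minimal sketch (the helper name `error_range` is hypothetical) shows why the Reported value has to be controlled for: it fixes which errors are possible at all.

```python
def error_range(reported, lowest=1, highest=5):
    """Smallest and largest possible Reported - Estimated for a given self-report."""
    return reported - highest, reported - lowest


for reported in range(1, 6):
    low, high = error_range(reported)
    print(f"Reported={reported}: Error can be {low} ... {high}")
```

Only a Reported value of 3 allows errors that are symmetric around zero; at the ends of the scale, errors can go in one direction only.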
Results
Social Engagement
Under-estimated when the reporter was socially engaged
Over-estimated when the reporter wasn’t socially engaged
“Overrating the strength of a cue”
Phone
Over-estimated the reporter’s interruptibility when the reporter wasn’t using the phone
“Overrating the strength of a cue”
Reporter is Standing
Greater over-estimation error when the reporter was standing
Reporter standing was significantly correlated with a more interruptible situation (in both Reported and Estimated)
Link between physical transitions and interruptions [Ho’05]
“Overrating the strength of a cue”
Computer
Estimators were more likely to interpret a situation as more interruptible than reported when the Reporter was using the computer
Link to issues of online presence and availability
“Considering a cue that is not significant”
Door is Closed
Under-estimated when the door was closed
Correlation between the state of the door and Reported interruptibility was not significant
“Considering a cue that is not significant”
Drink
Estimators assessed Reporters as more interruptible when they were drinking
Correlation between drinking and Reported interruptibility was not significant
“Considering a cue that is not significant”
Conclusions and Future Work
Conclusions
Presented results from an in-depth analysis of causes for biases in human estimation of interruptibility
Compared self-reports, collected in the field, and estimations based on audio and video recordings
Conclusions (cont.)
Findings suggest that providing too much information may not only be a concern for privacy, but may also lead to errors in estimation
Sharing certain contextual cues will likely result in misinterpretations of a person’s interruptibility
A new system, informed by our results, could:
• Avoid exposing certain cues (or specific levels of cues)
• Enhance (or moderate) others
Future Work
Examine the effect of other clip durations
Examine the effect of degree of familiarity between reporter and estimator on estimation errors
Observe reporters in other settings and jobs
Investigate the use of these findings for effective creation of computer-supported communication and awareness systems
Acknowledgements
Yaakov Kareev
Darren Gergle
Laura Dabbish
Joonhwan Lee
This work was funded in part by NSF Grants IIS-0121560, IIS-0325351, and by DARPA Contract No. NBCHD030010
Thank you
for more info visit: www.cs.cmu.edu/~nx6
FAQ
Why separate over and under?
Shouldn’t use only absolute or squared error: merging over- and under-estimation can make it seem like there is no effect
Shouldn’t pool all signed errors together: over- and under-estimation errors will cancel each other out
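Both points can be illustrated with a small sketch. The error values below are invented for illustration and are not data from the study; the sketch also does not fix which sign corresponds to over- or under-estimation, it only separates the positive and negative parts of the signed error.

```python
from statistics import mean

# Hypothetical signed errors (Reported - Estimated), invented for illustration only.
errors_cue_present = [-2, -1, -2, -1]
errors_cue_absent = [2, 1, 2, 1]


def positive_part(errors):
    """Keep positive errors; replace everything else with 0."""
    return [max(e, 0) for e in errors]


def negative_part(errors):
    """Keep the magnitude of negative errors; replace everything else with 0."""
    return [max(-e, 0) for e in errors]


# Pooling signed errors: the two directions cancel out.
pooled_mean = mean(errors_cue_present + errors_cue_absent)

# Absolute error: both cue levels look identical, so the cue seems to have no effect.
abs_present = mean([abs(e) for e in errors_cue_present])
abs_absent = mean([abs(e) for e in errors_cue_absent])

print("pooled signed mean:", pooled_mean)
print("mean absolute error (present, absent):", abs_present, abs_absent)
print("mean positive part (present, absent):",
      mean(positive_part(errors_cue_present)), mean(positive_part(errors_cue_absent)))
print("mean negative part (present, absent):",
      mean(negative_part(errors_cue_present)), mean(negative_part(errors_cue_absent)))
```

In this toy data the cue flips the direction of the error, which is visible only once the two directions are analyzed separately.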
Is the length of the clips reasonable?
The information available to Estimators in this study (15/30 second video+audio clips) was similar to information available to users of media space systems, and far richer than information available in most awareness systems
Why not use a 2-point scale?
With 2 levels, an Estimation is either 100% correct or 100% incorrect
With 5 levels, we get degrees of error
Doesn’t make sense to ask Reporters and Estimators to make a binary decision when they:
• Don’t know what the interruption is about
• Don’t know whom the interruption is from
Couldn’t analyze the 5-point data as 2-point data
If Reported=1 and Estimated=4, should we count it as correct?
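The last point can be made concrete with a short sketch. The cutoff used here (only a rating of 5 counts as non-interruptible) is a hypothetical choice for illustration; the talk does not prescribe one.

```python
def is_non_interruptible(rating, cutoff=5):
    """Collapse a 5-point rating to a binary label using a hypothetical cutoff."""
    return rating >= cutoff


reported, estimated = 1, 4

five_point_error = reported - estimated
binary_match = is_non_interruptible(reported) == is_non_interruptible(estimated)

print("5-point Estimation Error:", five_point_error)
print("Counted as correct on a 2-point scale:", binary_match)
```

With this cutoff the estimate counts as correct on the 2-point scale even though it is three scale points away from the self-report.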