
End-User Debugging of Machine Learning Systems

Weng-Keen Wong
Oregon State University
School of Electrical Engineering and Computer Science
http://www.eecs.oregonstate.edu/~wong


Collaborators

• Margaret Burnett

• Simone Stumpf

• Tom Dietterich

• Jon Herlocker

• Erin Fitzhenry

• Lida Li

• Ian Oberst

• Vidya Rajaram

• Russell Drummond

• Erin Sullivan

(Faculty, graduate students, and undergraduates)


Papers

Stumpf, S., Rajaram, V., Li, L., Burnett, M., Dietterich, T., Sullivan, E., Drummond, R., Herlocker, J. (2007). Toward Harnessing User Feedback for Machine Learning. In Proceedings of IUI 2007.

Stumpf, S., Rajaram, V., Li, L., Wong, W.-K., Burnett, M., Dietterich, T., Sullivan, E., Herlocker, J. (2008). Interacting Meaningfully with Machine Learning Systems: Three Experiments. (Submitted to IJHCS)

Stumpf, S., Sullivan, E., Fitzhenry, E., Oberst, I., Wong, W.-K., Burnett, M. (2008). Integrating Rich User Feedback into Intelligent User Interfaces. In Proceedings of IUI 2008.


Motivation

Date: Mon, 28 Apr 2008 23:59:00 (PST)
From: John Doe <[email protected]>
To: Weng-Keen Wong <[email protected]>
Subject: CS 162 Assignment

I can’t get my Java assignment to work! It just won’t compile and it prints out lots of error messages! Please help!

public class MyFrame extends JFrame {
    private AsciiFrameManager reader;
    private JPanel displayPanel;

    public MyFrame(String filename) throws Exception {
        reader = new AsciiFrameManager(filename);
        displayPanel = new JPanel();
        ...

[Screenshot: which folder should this email be filed into (CS 162, John Doe, or Trash)?]

• Machine learning tool adapts to end user

• Similar situation in recommender systems, smart desktops, etc.


Motivation

Date: Mon, 28 Apr 2008 23:51:00 (PST)
From: Bella Bose <[email protected]>
To: Weng-Keen Wong <[email protected]>
Subject: Teaching Assignments

I’ve compiled the teaching preferences for all the faculty. Here are the teaching assignments for next year:

Fall Quarter
CS 160 (Computer Science Orientation) – Paul Paulson
CS 161 (Introduction to Programming I) – Chris Wallace
CS 162 (Introduction to Programming II) – Weng-Keen Wong
...

[Screenshot: the message has been filed into Trash]

• Machine Learning systems are great when they work correctly, aggravating when they don’t

• The end user is the only person at the computer

• Can we let end users correct machine learning systems?


Motivation

• Learn to correct behavior quickly
  - Sparse data on start
  - Concept drift
• Rich end-user knowledge
  - Effects of user feedback on accuracy?
  - Effects on users?


Overview

[Diagram: the machine learning algorithm provides explanations to the end user, and the end user provides feedback to the machine learning algorithm]


Related Work

Explanation

• Expert Systems (Swartout 83, Wick and Thompson 92)

• TREPAN (Craven and Shavlik 95)

• Description Logics (McGuinness 96)

• Bayesian networks (Lacave and Díez 00)

• Additive classifiers (Poulin et al. 06)

• Others (Crawford et al. 02, Herlocker et al. 00)

End user interaction

• Active Learning (Cohn et al. 96, many others)

• Constraints (Altendorf et al. 05, Huang and Mitchell 06)

• Ranks (Radlinski and Joachims 05)

• Feature Selection (Raghavan et al. 06)

• Crayons (Fails and Olsen 03)

• Programming by Demonstration (Cypher 93, Lau and Weld 99, Lieberman 01)


Outline

1. What types of explanations do end users understand? What types of corrective feedback could end users provide? (IUI 2007)

2. How do we incorporate this feedback into a ML algorithm? (IJHCS 2008)

3. What happens when we put this together? (IUI 2008)


What Types of Explanations Do End Users Understand?

• Think-aloud study with 13 participants
• Classify Enron emails
• Explanation systems: rule-based, keyword-based, similarity-based
• Findings:
  - Rule-based best, but not a clear winner
  - Evidence indicates multiple explanation paradigms are needed


What types of corrective feedback could end users provide?

Suggested corrective feedback in response to explanations:

1. Adjust importance of word
2. Add/remove word from consideration
3. Parse / extract text in a different way
4. Word combinations
5. Relationships between messages/people


Outline

1. What types of explanations do end users understand? What types of corrective feedback could end users provide? (IUI 2007)

2. How do we incorporate this feedback into a ML algorithm? (IJHCS 2008)

3. What happens when we put this together? (IUI 2008)


Incorporating Feedback into ML Algorithms

Two approaches:
• Constraint-based
• User co-training


Constraint-based approach

Constraints:

1. If the weight on a word is reduced or the word is removed, remove the word as a feature.

2. If the weight of a word is increased, the word is assumed to be important for that folder:

   P(Y = y_k | x_j = 1) ≥ P(Y = y_k' | x_j = 1)   for every other folder y_k'

3. If the weight of a word is increased, the word is a better predictor for that folder than other words:

   P(x_j = 1 | Y = y_k) ≥ P(x_j' = 1 | Y = y_k)   for every other word x_j'

Estimate the parameters for Naive Bayes using MLE, subject to these constraints.
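A minimal sketch of constraints 1 and 2 on a Bernoulli Naive Bayes model, in Python. The function and argument names here are hypothetical, and constraint 2 is only approximated by a post-hoc floor on the conditional probability rather than by a true constrained-MLE solver:

from collections import Counter
import math

def train_constrained_nb(messages, labels, removed_words, boosted_words, floor=0.9):
    # messages: list of word sets; labels: folder label per message
    # removed_words: words the user removed or down-weighted (constraint 1)
    # boosted_words: {folder: words the user up-weighted for that folder} (constraint 2)
    folders = sorted(set(labels))
    vocab = {w for m in messages for w in m} - set(removed_words)       # constraint 1: drop the feature
    prior = {f: labels.count(f) / len(labels) for f in folders}
    cond = {}
    for f in folders:
        docs = [m for m, y in zip(messages, labels) if y == f]
        counts = Counter(w for m in docs for w in (m & vocab))
        cond[f] = {w: (counts[w] + 1) / (len(docs) + 2) for w in vocab}  # smoothed MLE of P(x_j = 1 | Y = f)
        for w in boosted_words.get(f, set()) & vocab:                    # constraint 2, approximated by a floor
            cond[f][w] = max(cond[f][w], floor)
    return prior, cond, vocab

def classify(message, prior, cond, vocab):
    def log_posterior(f):
        return math.log(prior[f]) + sum(
            math.log(cond[f][w] if w in message else 1.0 - cond[f][w]) for w in vocab)
    return max(prior, key=log_posterior)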


Standard Co-training

Create classifiers C1 and C2 based on the two independent feature sets.

Repeat i times:
  Add the messages most confidently classified by either classifier to the training data
  Rebuild C1 and C2 with the new training data
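A compact sketch of this loop, assuming two feature views of the same messages and any classifier object with scikit-learn-style fit / predict_proba / classes_. The selection heuristic (highest predicted class probability from either view) and all names are illustrative:

import numpy as np

def co_train(clf1, clf2, X1, X2, y, U1, U2, iterations=10, k=5):
    # X1, X2: labeled data in the two views; y: labels; U1, U2: unlabeled data in the two views
    X1, X2, y = list(X1), list(X2), list(y)
    remaining = list(range(len(U1)))
    for _ in range(iterations):
        clf1.fit(np.array(X1), np.array(y))                      # rebuild C1 and C2
        clf2.fit(np.array(X2), np.array(y))
        if not remaining:
            break
        p1 = clf1.predict_proba(np.array([U1[i] for i in remaining]))
        p2 = clf2.predict_proba(np.array([U2[i] for i in remaining]))
        conf = np.maximum(p1.max(axis=1), p2.max(axis=1))        # confidence of the more confident classifier
        picked = list(np.argsort(-conf)[:k])                     # most confidently classified messages
        for pos in picked:
            i = remaining[pos]
            probs = p1[pos] if p1[pos].max() >= p2[pos].max() else p2[pos]
            X1.append(U1[i]); X2.append(U2[i])
            y.append(clf1.classes_[int(np.argmax(probs))])       # predicted label joins the training data
        remaining = [remaining[pos] for pos in range(len(remaining)) if pos not in set(picked)]
    return clf1, clf2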


User Co-training

C_user = “classifier” based on user feedback
C_ML = machine learning algorithm

For each “session” of user feedback:
  Add the messages most confidently classified by C_user to the training data
  Rebuild C_ML with the new training data


The inner loop is expanded below.


User Co-training

For each folder f, let the vector v_f = words whose weights the user increased for f

For each message m in the unlabeled set:
  For each folder f:
    Compute Prob_f from the machine learning classifier
    Score_f = (# of words in v_f appearing in the message) × Prob_f
  f_max = argmax over f in Folders of Score_f
  Score_other = max over f in Folders \ {f_max} of Score_f
  Score_m = Score_f_max − Score_other

Sort Score_m for all messages in decreasing order
Select the top k messages to add to the training set, along with their folder label f_max
Rebuild C_ML with the new training data
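A minimal sketch of one feedback session under this scheme, with messages represented as sets of words. prob_from_cml and rebuild_cml are hypothetical helpers standing in for the machine learning classifier's probability estimates and its retraining step:

def user_cotrain_session(prob_from_cml, rebuild_cml, folders, v,
                         unlabeled, training, labels, k=10):
    # v: {folder: set of words whose weight the user increased for that folder}
    scored = []
    for m in unlabeled:
        prob = prob_from_cml(m)                                  # Prob_f for each folder f
        score = {f: len(v.get(f, set()) & m) * prob[f] for f in folders}
        f_max = max(score, key=score.get)
        score_other = max(s for f, s in score.items() if f != f_max)
        scored.append((score[f_max] - score_other, m, f_max))    # Score_m
    scored.sort(key=lambda t: t[0], reverse=True)                # decreasing Score_m
    for _, m, f_max in scored[:k]:                               # top k messages, labeled f_max
        training.append(m)
        labels.append(f_max)
    return rebuild_cml(training, labels)                         # rebuild C_ML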


Constraint-based vs User co-training

Constraint-based:
• Difficult to set the “hardness” of a constraint
• Constraints often already satisfied
• End user can over-constrain the learning algorithm
• Slow

User co-training:
• Requires unlabeled emails in the inbox
• Better accuracy than constraint-based


Results

[Bar charts: classification accuracy (0–100%) by algorithm, one chart for feedback from the keyword-based paradigm and one for feedback from the similarity-based paradigm]


Outline

1. What types of explanations work for end users? What types of corrective feedback could end users provide? (IUI 2007)

2. How do we incorporate this feedback into a ML algorithm? (IJHCS 2008)

3. What happens when we put this together? (IUI 2008)


Experiment: Email program



Experiment: Procedure

• Intelligent email system to classify emails into folders
• 43 English-speaking, non-CS students
• Background questionnaire
• Tutorial (email program and folders)
• Experiment task on feedback set
  - Correct folders; add, remove, or change the weight on keywords
• 30 interaction logs
• Post-session questionnaire


Experiment: Data

• Enron data set
• 9 folders
• 50 training messages
  - 10 each for 5 folders, with folder labels
• 50 feedback messages
  - For use in the experiment
  - Same for each participant
• 1051 test messages
  - For evaluation after the experiment


Experiment: Classification algorithm “User co-training”

• Two classifiers: user and Naïve Bayes
• Slight modification to the user classifier:
  Score_f = sum of the weights of the words in v_f that appear in the message
• Weights can be modified interactively by the user
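A one-line sketch of that modified user-classifier score, where user_weights is a hypothetical {folder: {word: weight}} mapping maintained from the participant's keyword changes and a message is a set of words:

def user_score(message, user_weights):
    # Score_f = sum of the user-assigned weights of the v_f words appearing in the message
    return {folder: sum(w for word, w in fw.items() if word in message)
            for folder, fw in user_weights.items()}

# e.g. user_score({"java", "compile"}, {"CS 162": {"java": 2.0}, "Personal": {"mom": 1.5}})
#      returns {"CS 162": 2.0, "Personal": 0.0}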


Results: Accuracy improvements of rich feedback


[Chart: accuracy Δ per subject of rich feedback (participant folder labels and keyword changes) over folder feedback (participant folder labels only)]


Results: Accuracy improvements of rich feedback


[Chart: accuracy Δ per subject of rich feedback (participant folder labels and keyword changes) over the baseline (original Enron labels)]


Results: Accuracy summary

• 60% of participants saw accuracy improvements, some very substantial
• Some dramatic decreases
• More time between filing emails, or more folder assignments → higher accuracy


Interesting bits

1. Need to communicate the effects of the user’s corrective feedback

2. Unstable classifier period
   - With sparse training data, a single new training example can dramatically change the classifier’s decision boundaries
   - Wild fluctuations in the classifier’s predictions frustrate end users
   - Causes a “wall of red”


Interesting bits: Unstable classifier period

[Line chart: accuracy (0–0.7) vs. number of training data points (0–350)]

Moved test emails into the training set to look for the effect on accuracy (baseline, participant 101)


Interesting bits

3. “Unlearning” important, especially to correct undesirable changes

4. Gender differences
   - Females took longer to complete the task
   - Females added twice as many keywords
   - Females commented more on unlearning


Interesting directions for HCI

1. Gender differences
2. More directed debugging
3. Other forms of feedback
4. Communicating effects of corrective feedback
   - Users need to detect the system is listening to their feedback
5. Explanations
   - Form
   - Fidelity


Interesting directions for Machine Learning

1. Algorithms for learning from corrective feedback

2. Modeling reliability of user feedback

3. Explanations
4. Incorporating new features


Future work

ML Whyline (with Andy Ko)


For more information

[email protected]

www.eecs.oregonstate.edu/~wong
