Representation of Electronic Mail Filtering Profiles: A User Study

Michael J. Pazzani

Information and Computer Science

University of California, Irvine

[email protected]

http://www.ics.uci.edu/~pazzani

Issues Addressed

• Would you let an agent filter your mail?

• If you could examine its filtering criteria, would this increase acceptance?

• Comprehensible filters can reduce legal liability:

“This release of Outlook Express comes equipped with a new ‘junk’ e-mail filter. Insofar as Blue Mountain can ascertain, Microsoft’s e-mail filter relegates e-mail greeting cards sent from Blue Mountain’s web site to a ‘junk mail’ folder for immediate discard, rather than receipt by the user.”

• How should the mail filtering profile be represented?

Mail Filtering: Rule-based

• SpamFilter© by Novasoft

• Microsoft Outlook

Learning to Filter Mail

• Vector Space (TF-IDF): Segal, R. and Kephart, J. (1999). MailCat: An intelligent assistant for organizing e-mail. In Proceedings of the Third International Conference on Autonomous Agents.

• Rules: Cohen, W. (1996). Learning rules that classify e-mail.

• Bayesian: Sahami, M., Dumais, S., Heckerman, D., and Horvitz, E. (1998). A Bayesian approach to filtering junk e-mail.

• Support Vector Machines: Dumais, S., Platt, J., Heckerman, D., and Sahami, M. (1998). Inductive learning algorithms and representations for text categorization.

• Neural Networks: Lewis, D., Schapire, R., Callan, J., and Papka, R. (1996). Training algorithms for linear text classifiers.
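To ground these citations, here is a minimal sketch of the Bayesian approach in the spirit of Sahami et al. (1998): a multinomial naive Bayes filter with Laplace smoothing. The class names, tokenizer, and smoothing choices are illustrative assumptions, not the authors' implementation.

import math, re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "legit": Counter()}
        self.doc_counts = {"spam": 0, "legit": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def classify(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["legit"])
        total = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # log prior + Laplace-smoothed log likelihood of each token
            score = math.log(self.doc_counts[label] / total)
            n = sum(counts.values())
            for w in tokenize(text):
                score += math.log((counts[w] + 1) / (n + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

nb = NaiveBayesFilter()
nb.train("win free money on the internet", "spam")
nb.train("the homework problem for the meeting", "legit")
print(nb.classify("free money"))   # -> spam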

The paper I was going to write

• Word pairs increase user acceptance of learned rule-based e-mail filters

– Collect representative e-mail messages

– Learn rule-based models with and without word pairs

– Ask users to rate profiles learned under various conditions

– Demonstrate increased acceptance of models with word pairs

Assumptions

• Why Rules?

W. Cohen (1996) “the greater comprehensibility of the rules may be advantageous in a system that allows users to extend or otherwise modify a learned classifier.”

• Word Pairs: Treating two contiguous words as a single term

Restaurant Recommendation: Pazzani (in press)

“goat” vs. “goat cheese” “prime” vs. “prime rib”

General finding: Negligible increase in accuracy of learned profile

Intuition: Word pairs might make profiles much more understandable (see the sketch below)
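A minimal sketch of the word-pair assumption, treating each pair of contiguous words as an additional term; the tokenizer and function name are illustrative, not from the paper.

import re

def terms_with_pairs(text):
    words = re.findall(r"[\w']+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return set(words) | set(pairs)

# "goat cheese" becomes a term of its own, distinct from "goat" and "cheese".
print(terms_with_pairs("try the goat cheese and the prime rib"))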

Ripper Rules: Comprehensible, Acceptable?

Discard if the message contains our & internet
Discard if the message contains free & call
Discard if the message contains http & com
Discard if the message contains UCI & available
Discard if the message contains all & our & not
Discard if the message contains business & you
Discard if the message contains by & Humanities
Discard if the message contains over & you & can
Otherwise Forward
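These rules form an ordered decision list: the first matching rule fires, and the final default catches everything else. A minimal sketch of that semantics, assuming each rule is a set of required terms (Ripper itself is not shown here):

def apply_rules(message_terms, rules, default="Forward"):
    # rules: ordered list of (required_terms, action); first match wins.
    for required, action in rules:
        if required <= message_terms:      # every required term is present
            return action
    return default

rules = [({"our", "internet"}, "Discard"),
         ({"free", "call"}, "Discard")]
print(apply_rules({"call", "us", "toll", "free"}, rules))   # -> Discard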

Ripper Rules with Word Pairs: A “floor” effect

Discard if the message contains you can & to be
Discard if the message contains the UCI
Discard if the message contains the internet & if you
Discard if the message contains you can & you have
Discard if the message contains http://www
Discard if the message contains P.M.
Discard if the message contains you want
Discard if the message contains one of
Discard if the message contains there are
Discard if the message contains please contact
Otherwise Forward

Ripper Rules for Forwarding

Forward if the message contains I &not business &not you can
Forward if the message contains computer science
Forward if the message contains Subject Re:
Forward if the message contains in your &not free
Forward if the message contains I &not us
Forward if the message contains use the
Otherwise Discard

Ripper Rules with Style Features

Discard if the message has greater than 5% capital letters & does not contain I & does not contain computing
Discard if there is greater than 1 $ & not they
Discard if the message contains our & http
Discard if greater than 2% of the words are in ALL CAPS
Discard if the message contains please &not your
Otherwise Forward
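A minimal sketch of computing the style features these rules test; the exact definitions (what counts as a word, how percentages are taken) are my assumptions, since the slides do not spell them out.

def style_features(text):
    letters = [c for c in text if c.isalpha()]
    words = text.split()
    return {
        "pct_capitals": 100 * sum(c.isupper() for c in letters) / max(len(letters), 1),
        "dollar_signs": text.count("$"),
        "pct_all_caps_words": 100 * sum(w.isupper() for w in words) / max(len(words), 1),
    }

print(style_features("CALL NOW to win $1000 $$$"))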

FOCL Rules with Word Pairs

Discard if the message contains not I &not science
Discard if the message contains business &not Subject:Re
Discard if the message contains our & internet
Discard if the message contains income
Discard if the message contains you can &not all your
Discard if the message contains the UCI
Otherwise Forward

Ripper Rules: 80% accurate profile

Discard if the message contains the UCI & to the
Discard if the message contains the internet & you have
Discard if the message contains http://www & you can
Discard if the message contains are available
Discard if the message contains you will
Discard if the message contains web site
Discard if the message contains of the & we are
Discard if the message contains a new
Otherwise Forward

Evaluation Criteria for Mail Filtering

• Accuracy (and precision, recall, sensitivity, etc.)

• Efficiency (Learning and Classification)

• Cost Sensitivity

• Traceability: The ease with which the user can emulate the categorization using the model.

• Credibility: The degree to which the user believes the decision-making criteria will produce the desirable results.

• Accountability: The degree to which the representation allows a user to distinguish an accurate model from an inaccurate one.

Text classification for e-mail

Pilot Study: People are greater than 95% accurate

Willingness to use profiles

Text classification profiles

Goals:

– Create a profile the user can understand and edit

– Create a profile that makes errors easy to detect and correct

• Rule-based representation (similar to Outlook): disappointing results

• Speculations: representation issues

– Are weighted representations less understandable?

– Are “prototype” representations more understandable?

• Hypotheses

– Using word pairs as terms makes profiles more understandable

– Using the absence of words makes profiles less understandable

Prototype Representation

IF the message contains more of

papers particular business internet http money us

THAN

I me Re science problem talk ICS begins

THEN Discard

OTHERWISE Forward
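As I read the slide, the prototype profile is two term lists, and a message is discarded when it matches more discard-side terms than forward-side terms. A minimal sketch under that reading (the tie-breaking rule is my assumption):

def prototype_classify(message_terms, discard_terms, forward_terms):
    discard_hits = len(message_terms & discard_terms)
    forward_hits = len(message_terms & forward_terms)
    # Ties fall through to Forward (an assumption; the slide does not say).
    return "Discard" if discard_hits > forward_hits else "Forward"

discard = {"papers", "particular", "business", "internet", "http", "money", "us"}
forward = {"I", "me", "Re", "science", "problem", "talk", "ICS", "begins"}
print(prototype_classify({"business", "internet", "money", "I"}, discard, forward))  # -> Discard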

Linear Threshold

IF ( 11"remove" + 10"internet" + 8"http" + 7"call" + 7"business" +5"center"

+3"please" + 3"marketing" + 2"money" + 1"us" + 1"reply" + 1"my" + 1"free"

-14"ICS" - 10"me" - 8"science" - 6"thanks" - 6"meeting" - 5"problem"

-5"begins" - 5"I" - 3"mail" - 3"com" - 2"www" - 2"talk" - 2"homework"

-1"our" - 1"it" - 1"email" - 1"all" - 1) is positiveThen DELETEElse Forward

Linear Threshold with Pairs

IF ( 10"business" + 7"internet" + 6"you can" + 6"http" + 6"center"

+5"our" + 5"e-mail" + 3"money" + 2"the UCI" + 1"I have"

-13"ICS" - 10"I'm" - 7"science" - 7"com" - 6"but I" - 6"Subject: Re"

-5"I" - 4"thanks" - 4"problem" - 4"me" - 4"computer science"

-4"I can" - 2"talk" - 2"mail" - 1"my" - 2) is positive

Prototype Representation with Pairs

IF the message contains more of

com service us marketing financial ‘the UCI’ ‘http www’ ‘you can’ ‘removed from’

THAN

I me ICS learning ‘Subject: Re:’ function ‘talk begins’ ‘computer science’ ‘the end’

THEN Discard

OTHERWISE Forward

Prototype Representation: 80% accurate

IF the message contains more of

looking are over mailing expert reply ‘the subject’ ‘send an’ ‘at UCI’

THAN

done I research sorry science because minute overview similar ‘of it’ ‘need to’ ‘a minute’

THEN Discard

OTHERWISE Forward

Preferences

Algorithm             Mean Rating
Rules                   0.015
Rules (Pairs)          -0.135
Rules (Noise)          -0.105
Linear Model            0.421
Linear Model (Pairs)    0.518
Linear Model (Noise)   -0.120
Prototype               0.677
Prototype (Pairs)       1.06
Prototype (Noise)       0.195

• The following differences were highly significant (at least at the .005 level):

Prototype representations with word pairs received higher ratings than rule representations with word pairs, t(132) = 5.64. Inaccurate prototype models (learned from noisy training data) are less acceptable to users than accurate ones, t(132) = 4.88.

• The following differences were significant (at least at the .05 level):

Prototype representations with word pairs received higher ratings than linear model representations with word pairs, t(132) = 2.84. Inaccurate linear models are less acceptable to users than accurate ones, t(132) = 2.99.

• The following difference was marginally significant (between the .1 and .05 levels):

For prototype representations, using word pairs as terms increases user ratings, t(132) = 2.37.
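For readers who want to reproduce this kind of comparison, a minimal sketch using scipy's paired t-test; the rating arrays below are hypothetical stand-ins, not the study's data.

from scipy import stats

# One rating per subject per condition (made-up numbers).
prototype_pairs = [1.2, 0.9, 1.1, 0.8, 1.3]
rule_pairs = [-0.1, 0.0, -0.3, 0.1, -0.2]

t, p = stats.ttest_rel(prototype_pairs, rule_pairs)
print(f"t = {t:.2f}, p = {p:.4f}")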

Learning Prototype: A First Pass

• Genetic Algorithm

– Instance is a pair of term vectors

– 128 most informative terms

– Initialized with 10% of the features of each example

– Fitness function: number correct on training data

– Operators: breeding, mutation

• Results on mail, S&W data: as good as anything else

Algorithm      Mail   Goats  Sheep  Bands
Perceptron     83.6   65.7   80.0   65.9
Nearest        81.4   71.4   75.7   70.4
ID3            82.3   72.8   86.3   68.8
Naïve Bayes    90.1   72.6   81.4   70.7
Rocchio        84.9   70.1   78.5   67.6
Prototype      87.1   72.8   84.2   71.4
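A minimal sketch of the genetic-algorithm loop described above, using the two-term-set prototype classifier; the population size, mutation rate, and initialization from a random 10% of the vocabulary (rather than of each example) are my simplifying assumptions.

import random

def fitness(individual, data):
    # individual: (discard_terms, forward_terms); data: list of (terms, label).
    discard, forward = individual
    return sum((("Discard" if len(t & discard) > len(t & forward) else "Forward") == y)
               for t, y in data)

def breed(a, b):
    # Uniform crossover: shared terms are kept, others inherited with p = 0.5.
    return tuple({t for t in sa | sb if t in sa & sb or random.random() < 0.5}
                 for sa, sb in zip(a, b))

def mutate(ind, vocab, rate=0.02):
    # Flip each term's membership with a small probability.
    return tuple({t for t in vocab if (t in s) != (random.random() < rate)} for s in ind)

def learn_prototype(data, vocab, pop_size=50, generations=100):
    vocab = sorted(vocab)
    k = max(1, len(vocab) // 10)
    pop = [tuple(set(random.sample(vocab, k)) for _ in range(2)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, data), reverse=True)
        parents = pop[: pop_size // 2]           # keep the fitter half
        children = [mutate(breed(*random.sample(parents, 2)), vocab)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=lambda ind: fitness(ind, data))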

Algorithm Mail Goats Sheep BandsPerceptron 83.6 65.7 80.0 65.9Nearest 81.4 71.4 75.7 70.4ID3 82.3 72.8 86.3 68.8Naïve Bayes 90.1 72.6 81.4 70.7Rocchio 84.9 70.1 78.5 67.6Prototype 87.1 72.8 84.2 71.4