Chad Mills Program Manager Windows Live Safety Platform Microsoft.
Chad Mills
Program Manager
Windows Live Safety Platform
Microsoft
[Diagram: at Training Time, features go through training to produce feature weights; at Run Time, features and (feature, weight) pairs produce a score.]
Assumption:
◦ Spam words continue to appear in spam messages
◦ Good words continue to appear in good messages
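Spam message words: million, dollars, transfer, guardian
Good message words: March, community, social, fellow
Feature weights: (dollars, 0.2) (million, 0.1) (transfer, 0.1) (community, -0.01) (social, -0.01) (fellow, -0.01) (guardian, 0.03) (March, -0.08)
Spam message score: 0.37
Good message score: -0.11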
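The scoring step above can be sketched as a sum of the weights of the words in a message. This is a minimal illustration, assuming the score is a plain sum and that unknown words contribute nothing; the function name `content_score` is hypothetical. With the weights as transcribed, the good-message words sum to the slide's -0.11, while the spam-message words sum to 0.43 rather than the printed 0.37. The printed figure would match if guardian carried a weight of -0.03, but that is only a guess.

```python
# Feature weights exactly as transcribed from the slide.
WEIGHTS = {
    "dollars": 0.2, "million": 0.1, "transfer": 0.1, "guardian": 0.03,
    "community": -0.01, "social": -0.01, "fellow": -0.01, "March": -0.08,
}


def content_score(words, weights=WEIGHTS):
    """Sum the weights of the words; unknown words count as 0 (an assumption)."""
    return sum(weights.get(word, 0.0) for word in words)


spam_words = ["million", "dollars", "transfer", "guardian"]
good_words = ["March", "community", "social", "fellow"]

print(round(content_score(spam_words), 2))  # positive: leans spam
print(round(content_score(good_words), 2))  # negative: leans good
```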
From: "Chelsea Clark" <[email protected]>
Subject: Get Paid For your Opinion
<style>…<br Bij board bar atteindre jYST GCS re sonrisa fuse Kiviuq padded />
<br Star Honolulu />
<br Ons apporter />
opens NRSU syringe />
<br Jerusalem comfort HTTPS 2604 confidence Miles />
<br 27 mails Qty backwards Meditations bans sedative ect salve <br insightful />
Korean relations header greeting Airllines Phantom CVS Rae 504 1009 perf<br graphiques />
undertaking paced Liquidation reduction />…
Overall | Group of words
Good | newsletter peers month select these
Good | late click commissioner media
Good | smoothly off close support before
Good | okay sponsor rock go by ads
Good | none cases text membership
Good Message + Spammy Words (Free, Nigeria, Viagra) = Borderline Spam Message
Borderline Spam + Unknown Words (late, click, commissioner) → Inbox: late, click, commissioner = Good Words
Borderline Spam + Unknown Words (newsletter, select, month) → Junk Folder: newsletter, select, month = Non-Good Words
Chaff Spam [spam content] newsletter peers month select these late click commissioner media smoothly off close support before okay sponsor rock go by ads none cases text membership
Legitimate Mail
March is all about the Zune community. This month, you can help create a new feature for The Social, get tips from a fellow Zune user and find out the winners of the Your Zune Your Choice Awards.
◦ Sum of weights (content filter score)
◦ Average weight
◦ Standard Deviation
◦ Percent of words that are good
◦ Percent of words that are spam
◦ Number of features
◦ Maximum feature weight
◦ Number of strong spam words
◦ Etc.
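The metafeatures listed above are summary statistics over the weights of a message's features. A minimal sketch of the extraction follows. The function name, the 0.15 cutoff for a "strong" spam word, the use of the sample standard deviation, and treating negative weights as good and positive weights as spam are all assumptions, and the example weights are hypothetical rather than taken from the talk.

```python
import statistics


def extract_metafeatures(feature_weights, strong_spam_threshold=0.15):
    """Summarize the per-feature weights of one message.

    strong_spam_threshold is a hypothetical cutoff, not a value from the talk.
    """
    ws = list(feature_weights)
    if not ws:
        raise ValueError("message has no known features")
    n = len(ws)
    return {
        "sum": sum(ws),                                    # content filter score
        "mean": statistics.fmean(ws),
        "stdev": statistics.stdev(ws) if n > 1 else 0.0,   # sample std (assumed)
        "pct_good": sum(w < 0 for w in ws) / n,            # negative weight = good
        "pct_spam": sum(w > 0 for w in ws) / n,            # positive weight = spam
        "num_features": n,
        "max": max(ws),
        "num_strong_spam": sum(w >= strong_spam_threshold for w in ws),
    }


# Hypothetical weights: a short spam payload, then the same payload plus chaff.
naive = [0.2, 0.1, 0.1]
chaffed = naive + [-0.05] * 6

print(extract_metafeatures(naive))
print(extract_metafeatures(chaffed))
```

In this toy case the chaff pulls the sum down toward zero, but the maximum weight and the count of strong spam words are unchanged, which is the kind of signal a second model could pick up.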
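[Diagram: at Training Time, features and (feature, weight) pairs go through metafeature extraction, and the resulting metafeatures go through training to produce metafeature weights; at Run Time, features and feature weights go through metafeature extraction, and the metafeatures with the (metafeature, weight) pairs produce a score.]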
Feature weights: (dollars, 0.2) (million, 0.1) (transfer, 0.1) (community, -0.01) (social, -0.01) (fellow, -0.01) (guardian, 0.03) (March, -0.08)
Spam message (million, dollars, transfer, guardian): Sum: 0.37, σ: 0.09, Max: 0.2
Good message (March, community, social, fellow): Sum: -0.11, σ: 0.04, Max: -0.1
Features: (feature, weight) pairs
Metafeatures: (metafeature, weight) pairs
Spam message: (Sum: 0.37, 1.0) (σ: 0.09, 0.8) (Max: 0.2, 0.1) → score 1.9
Good message: (Sum: -0.11, -0.8) (σ: 0.04, -0.6) (Max: -0.1, -0.3) → score -1.7
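The final scores on this slide are the sums of the metafeature weights for each message. The sketch below reproduces that arithmetic, assuming each observed metafeature value is looked up as an item with its own learned weight; how values are binned or matched is not stated in the slides, and `metafeature_score` is a hypothetical name.

```python
# (metafeature, weight) pairs as transcribed from the slide.
spam_metafeatures = [("Sum: 0.37", 1.0), ("σ: 0.09", 0.8), ("Max: 0.2", 0.1)]
good_metafeatures = [("Sum: -0.11", -0.8), ("σ: 0.04", -0.6), ("Max: -0.1", -0.3)]


def metafeature_score(pairs):
    """Add up the weights of the metafeatures observed for one message."""
    return sum(weight for _name, weight in pairs)


print(round(metafeature_score(spam_metafeatures), 1))  # spam message
print(round(metafeature_score(good_metafeatures), 1))  # good message
```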
Hotmail Feedback Loop
◦ Messages classified by recipients
Training Set: 1,800,000 messages
◦ Ending on 5/20/07
Evaluation Set: 50,000 messages
◦ Data from 5/21/07
45% improvement in TP at low FP levels
At a reasonable False Positive rate:
◦ 98% of unique catches are chaff spam
◦ Caught 99.5% of chaff spam missed by regular content filter
◦ Similar types of False Positives as regular filter
Challenges Remaining
◦ Primarily just helped on spam with chaff
◦ Relies on base content filter to detect spam with obfuscated content (e.g. v1agra) or naïve spam without any chaff
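Results of this kind compare the true positive (TP) rate of two filters at the same false positive (FP) rate. A minimal sketch of that measurement is below, assuming a message is flagged when its score meets a threshold; the function name and the toy scores are hypothetical and are not data from the evaluation.

```python
def tp_rate_at_fp_rate(scores, labels, max_fp_rate):
    """Best TP rate over thresholds whose FP rate stays within max_fp_rate.

    labels[i] is True when message i is spam; a message is flagged when
    its score is >= the threshold (an assumed convention).
    """
    spam = [s for s, is_spam in zip(scores, labels) if is_spam]
    good = [s for s, is_spam in zip(scores, labels) if not is_spam]
    if not spam or not good:
        raise ValueError("need at least one spam and one good message")
    best = 0.0  # a threshold above every score flags nothing
    for threshold in sorted(set(scores)):
        fp_rate = sum(s >= threshold for s in good) / len(good)
        if fp_rate <= max_fp_rate:
            tp_rate = sum(s >= threshold for s in spam) / len(spam)
            best = max(best, tp_rate)
    return best


# Hypothetical scores: four spam messages followed by four good ones.
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.05]
labels = [True, True, True, True, False, False, False, False]

print(tp_rate_at_fp_rate(scores, labels, max_fp_rate=0.0))
print(tp_rate_at_fp_rate(scores, labels, max_fp_rate=0.25))
```

Running the same function on the scores of two filters over one evaluation set gives the two TP rates to compare.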
Spam messages with good word chaff have unnatural weight distributions
Metafeatures are able to identify and catch these messages
This resulted in a 45% improvement in TP
Gains were limited to spam with good word chaff