SP.a.M /\ TØ

Page 1: SP.a.M /\ TØ

SP.a.M/\TØ (Spamato): An Extendable Spam Filter System

by Keno Albrecht, Nicolas Burri, and Roger Wattenhofer

Page 2: SP.a.M /\ TØ

Spamato - Keno Albrecht - Second Conference on Email and Anti-Spam - July 21 & 22, 2005

Motivation

• Countless number of different spam filters
  – Google: 1,740,000 hits (not spam filters)
  – Freshmeat/Sourceforge: 404/420 projects
  – Several "once-only" research projects
• Client-side filtering (vs. server-side)
  – Email client add-on: Outlook (Express), …
  – Proxy: mediator between client and server
  – Stand-alone: proprietary “email clients”

Page 3: SP.a.M /\ TØ

Project Goal

• Build an extendable spam filter system to…
  – ease the development of filters; provide a filter container
  – help implement tools for common tasks
  – support as many email clients as possible
• Encourage filter developers to use our framework

Page 4: SP.a.M /\ TØ

Subject: Free Spam Filter System
To: [email protected]: [email protected]

Dear Spam Filter Developer,

This is your once-in-a-lifetime opportunity to use the free spam filter system Spamato. Spamato aims to bring a practical, easy-to-use, and effective spam filter technology to the user’s desktop. It has been designed to be used primarily as an add-on for several email clients. The combination of multiple filtering techniques leads to a high spam detection rate and a low false-positive rate. It offers a variety of features that simplify your life as a spam filter developer.

Do not reinvent the wheel!
Write your filter in an instant!

Use Spamato!
Visit our homepage at http://www.spamato.net. To unsubscribe click here.

The Spamato-Team

Page 5: SP.a.M /\ TØ

Architecture

Java
• platform independent

Depending on add-on:
• Visual Basic
• JavaScript
• …

Page 6: SP.a.M /\ TØ

Filtering Process

Emails are processed in five phases:

(1) Initialization
(2) Pre-Check
(3) Check
(4) Decision
(5) Post-Check

Page 7: SP.a.M /\ TØ

Filtering Process

(1) Initialization

• The email client receives an email, forwards it to Spamato, and waits for the check result.

[Diagram: the Spamato Base passes msg through the phases in turn.
  Pre-Check: Filter 1 … Filter N each run PreCheck(msg) and return veto1(msg), veto2(msg), …, vetoN(msg).
  Checkpoint: veto(msg) = veto1(msg) || veto2(msg) || … || vetoN(msg). If veto(msg) == true, ignore this msg.
  Check: Filter 1 … Filter N each run Check(msg) and return isSpam1(msg), isSpam2(msg), …, isSpamN(msg).
  Decision: isSpam(msg) = globalDecision(isSpam1(msg), isSpam2(msg), …, isSpamN(msg)).
  Post-Check: Filter 1 … Filter N each receive msg and isSpam(msg).]

Page 8: SP.a.M /\ TØ

Filtering Process

(2) Pre-Check

• Veto against further processing (configuration, sender whitelist)
• Gain information for other plugins (URL extractor)
• Checkpoint: veto(msg) = veto1(msg) || veto2(msg) || … || vetoN(msg). If veto(msg) == true, ignore this msg.

[Diagram: Spamato filtering pipeline, same figure as on Page 7]
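
The veto checkpoint of the Pre-Check phase can be sketched in a few lines. This is a minimal illustration in Python, not Spamato's actual (Java) API: the dict-based message, `sender_whitelist_veto`, and `pre_check` are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Iterable

Message = Dict[str, str]          # hypothetical message representation
Veto = Callable[[Message], bool]  # a pre-checker returns True to veto


def sender_whitelist_veto(whitelist: Iterable[str]) -> Veto:
    """Build a pre-checker that vetoes messages from whitelisted senders."""
    allowed = {addr.lower() for addr in whitelist}
    return lambda msg: msg.get("from", "").lower() in allowed


def pre_check(msg: Message, vetoes: Iterable[Veto]) -> bool:
    """veto(msg) = veto1(msg) || veto2(msg) || ... || vetoN(msg)."""
    return any(v(msg) for v in vetoes)


vetoes = [sender_whitelist_veto(["alice@example.org"])]
friend = {"from": "Alice@example.org", "body": "lunch?"}
stranger = {"from": "bob@example.net", "body": "cheap pills"}
```

Note that `any()` stops at the first veto. If pre-checkers also gather information for other plugins, as the URL extractor does, a real system might prefer to run all of them before combining the results.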

Page 9: SP.a.M /\ TØ

Filtering Process

(3) Check

• Each filter calculates the spam probability: Filter 1 … Filter N each run Check(msg) and return isSpam1(msg), isSpam2(msg), …, isSpamN(msg).

[Diagram: Spamato filtering pipeline, same figure as on Page 7]
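
A rough sketch of the Check phase, assuming each filter is a function that maps a message to a spam probability in [0, 1]. The toy `keyword_filter` and the `run_checks` helper are invented for illustration and are not part of Spamato.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]             # hypothetical message representation
Filter = Callable[[Message], float]  # returns a spam probability in [0, 1]


def keyword_filter(keywords: List[str]) -> Filter:
    """Toy rule-based filter: fraction of the given keywords found in the body."""
    def check(msg: Message) -> float:
        body = msg.get("body", "").lower()
        hits = sum(1 for k in keywords if k.lower() in body)
        return hits / len(keywords) if keywords else 0.0
    return check


def run_checks(msg: Message, filters: Dict[str, Filter]) -> Dict[str, float]:
    """Check phase: every filter scores the message independently."""
    return {name: f(msg) for name, f in filters.items()}


filters = {"rules": keyword_filter(["viagra", "free"]),
           "always_ham": lambda msg: 0.0}
scores = run_checks({"body": "FREE offer today"}, filters)
```

Keeping the per-filter results in a dict keyed by filter name leaves the combination step free to weight or ignore individual filters.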

Page 10: SP.a.M /\ TØ

Filtering Process

(4) Decision

• The overall spam probability is calculated and returned to the email client:
  isSpam(msg) = globalDecision(isSpam1(msg), isSpam2(msg), …, isSpamN(msg))

[Diagram: Spamato filtering pipeline, same figure as on Page 7]
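
The slides do not say how globalDecision combines the individual results. As one possible reading, the sketch below takes a weighted mean of the per-filter probabilities and compares it against a threshold; the weights, the threshold of 0.5, and the function name `global_decision` are all assumptions made for this example.

```python
from typing import Dict, Optional


def global_decision(scores: Dict[str, float],
                    weights: Optional[Dict[str, float]] = None,
                    threshold: float = 0.5) -> bool:
    """Combine per-filter spam probabilities into one verdict.

    Assumption: a weighted mean compared against a threshold; the slides
    do not specify the actual combination rule.
    """
    if not scores:
        return False  # no filter voted: treat the message as ham
    weights = weights or {}
    total = sum(weights.get(name, 1.0) for name in scores)
    if total == 0:
        return False  # all weights are zero: nothing to decide on
    mean = sum(p * weights.get(name, 1.0) for name, p in scores.items()) / total
    return mean >= threshold
```

With equal weights, scores of 0.9 and 0.3 average to 0.6 and the message is flagged. Tripling the weight of the second filter pulls the mean down to 0.45 and the message passes.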

Page 11: SP.a.M /\ TØ

Filtering Process

(5) Post-Check

• Learn from global decision
• Collect statistics
• Play sound
• Filter 1 … Filter N each receive msg and isSpam(msg).

[Diagram: Spamato filtering pipeline, same figure as on Page 7]
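
The Post-Check phase can be pictured as handing the final verdict to a list of handlers. The sketch below shows a statistics collector as one such handler; the `Statistics` class and the `post_check` function are hypothetical and only illustrate the idea.

```python
from collections import Counter
from typing import Callable, Dict, List

Message = Dict[str, str]                       # hypothetical message representation
PostChecker = Callable[[Message, bool], None]  # receives msg and isSpam(msg)


class Statistics:
    """Hypothetical post-checker that counts spam and ham verdicts."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def __call__(self, msg: Message, is_spam: bool) -> None:
        self.counts["spam" if is_spam else "ham"] += 1


def post_check(msg: Message, is_spam: bool, handlers: List[PostChecker]) -> None:
    """Post-Check phase: hand the final verdict to every registered handler."""
    for handler in handlers:
        handler(msg, is_spam)


stats = Statistics()
post_check({"body": "cheap pills"}, True, [stats])
post_check({"body": "lunch?"}, False, [stats])
post_check({"body": "win money"}, True, [stats])
```

A learning filter would plug in the same way, updating its model from the global verdict instead of incrementing a counter.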

Page 12: SP.a.M /\ TØ

Filters

• Bayesianato: Naïve Bayesian-based filter
• Ruleminator: Rule-based filter
• Razor (Ephemeral): Hash-based filter
  » Vipul’s Razor: http://razor.sourceforge.net
• URL-based filters:
  – Domainator: Search engine (“Google”) filter
  – Earlgrey: Our collaborative multi-domain filter
  – Razor (Whiplash): Collaborative single-domain filter

Page 13: SP.a.M /\ TØ

URL/URI/Domain Filtering

• About 70,000 spam emails investigated
  – ~76% with at least one domain, thereof…
    • ~20% with more than one distinct domain
    • ~2% with ten or more distinct domains

• Spammers obfuscate their messages for the (sole) purpose of misleading URL filters!

• How to handle “fake” (including ham) domains? How to find “spam” domains?
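
Counting distinct domains per message, as in the figures above, could look roughly like this. The regular expression and the `distinct_domains` helper are assumptions for illustration. The sketch compares full host names rather than registered domains and makes no attempt to undo the obfuscation that spammers apply.

```python
import re
from typing import Set
from urllib.parse import urlsplit

# Simplified pattern: http(s) URLs up to the next whitespace, quote or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def distinct_domains(body: str) -> Set[str]:
    """Return the set of lower-cased host names of all http(s) URLs in body."""
    domains = set()
    for url in URL_PATTERN.findall(body):
        host = urlsplit(url).hostname  # already lower-cased by urlsplit
        if host:
            domains.add(host)
    return domains


body = ("Buy now at http://pills.example.com/offer and "
        "http://PILLS.example.com/other, or read https://www.example.org/news.")
found = distinct_domains(body)
```

In this made-up message, the two differently cased links collapse into one host, so the message counts as having two distinct domains. The second one could just as well be a legitimate ("ham") domain added to mislead a URL filter.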

Page 14: SP.a.M /\ TØ

URL-Filters in Comparison

D = Domainator, E = Earlgrey, R/W = Razor/Whiplash

        D       E       R/W     NOT     ONLY
D       -       26.5%   1.1%    27.3%   0.6%
E       11.7%   -       2.5%    42.1%   2.0%
R/W     25.2%   41.4%   -       3.1%    15.6%

Reading the first row: 26.5% (1.1%) of all spam messages were identified by the Domainator, but not by the Earlgrey (Razor/Whiplash) filter. 27.3% of all spam messages were not identified by the Domainator, and 0.6% of all spam messages were identified solely by it.
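
One way to compute such a comparison row is with plain set operations over the messages each filter flagged. The data below is made up (ten messages numbered 0 to 9) and is not the corpus behind the table; `comparison_row` is a hypothetical helper.

```python
from typing import Dict, Set


def comparison_row(name: str, hits: Dict[str, Set[int]], total: int) -> Dict[str, float]:
    """Percentages for one filter: versus each other filter, NOT and ONLY."""
    mine = hits[name]
    others = {n: s for n, s in hits.items() if n != name}
    # Caught by this filter but missed by the other one.
    row = {n: 100.0 * len(mine - s) / total for n, s in others.items()}
    # Not caught by this filter at all.
    row["NOT"] = 100.0 * (total - len(mine)) / total
    # Caught by this filter and by no other filter.
    caught_by_others = set().union(*others.values())
    row["ONLY"] = 100.0 * len(mine - caught_by_others) / total
    return row


# Made-up data: which of 10 spam messages each filter identified.
hits = {"D": {0, 1, 2, 3, 4}, "E": {3, 4, 5, 6}, "R/W": {4, 6, 7}}
row_d = comparison_row("D", hits, total=10)
```

For this toy data the D row reads 30% (caught by D but not E), 40% (caught by D but not R/W), 50% NOT, and 30% ONLY.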

Page 15: SP.a.M /\ TØ

Conclusion & Future Work

• Spamato eases the implementation and deployment of spam filters and tools. It can be used with all email clients. It is open source.
• A multi-faceted (URL-) filtering approach is reasonable.
• TODO:
  – Integration of more filters and improved analysis tools
  – Decision module (dynamic weighting of filter results)
  – Trust system for collaborative filters

Page 16: SP.a.M /\ TØ

Thank you!
Questions?
Comments?

(Un)[email protected]@spamato.net
http://www.spamato.net
http://sf.net/projects/spamato