Challenges to Privacy in New Internet Applications: VoIP, IM, location-based services


Prof. Henning Schulzrinne
Computer Science
Columbia University, New York
ATIS Network Security Symposium and Workshop
Washington, DC
September 2004

Overview
- Email spam: a history of failed miracle cures
- New challenges emerging:
  - VoIP unsolicited calls
  - instant messaging
- Location and presence privacy
- Reputation management

Not just email
- Email is just the first large-scale, open communication medium
  - initially also “closed user groups” (DECmail, PROFS, UUnet, Fido, …)
- When does unsolicited bulk communication (UBC) occur?
  - Move from a single domain to a large number of independently operated domains
    - removes easy remote authentication
  - Published or guessable addresses
    - as old as unlisted numbers
    - conflict between usability and using addresses as communication keys
- Others emerging from closed user groups
  - instant messaging (IM)
  - VoIP and multimedia calls
  - presence queries

The universe of message senders
[Figure: message senders ordered by human involvement, from human to machine. Labels: human user (known and unknown), opt-in bulk communications, mailing lists (forwarder), robots (event notification), machine-to-machine (EDI).]

The problem is easy…
- … if you’re willing to make some minor assumptions:
  - single administrative domain
  - only previously-known senders (but how?)
  - global public key infrastructure (PKI)
  - only real human users, no lists

Communication challenges
- Joe-job
  - “The act of faking a spam so that it appears to be from an innocent third party, in order to damage their reputation and possibly to trick their provider into revoking their Internet access. Named after Joes.com, which was victimized in this way by a spammer some years ago.”
- Phishing
  - “The act of sending an e-mail to a user falsely claiming to be an established legitimate enterprise in an attempt to scam the user into surrendering private information that will be used for identity theft.”
- Spam, spim (unsolicited bulk communications)
- Nuisance communications

Tools available: countermeasures
- From address blacklisting
- IP sender blacklisting
- Content filtering (Bayesian filters)
[Figure: mail delivery path. Labels: mail sender, SMTP, MTA, DNS (SPF, SBL, …), marked with header, POP/IMAP, MUA, spam folder.]

Miracle cures

Method | What it does | Where it fails
--- | --- | ---
postage for email | sender pays receiver for reading mail | collection, socially unacceptable (job offer), mailing lists
Haiku (e.g., Habeas) | include copyrighted haiku in non-spam | enforcement
computation (e.g., Microsoft Penny Black) | sender solves computational puzzle | lists, bots
Turing test (challenge-response) | | automated senders
Graylisting | return temporary failure; UBC gives up | unreliable, delay
Address hiding & spoofing | prevents web crawlers from picking up addresses | existing addresses, single failure
Bayesian content analysis | detect spam/non-spam terms | pictures, word spoofing, poisoning, IM, VoIP, …
Bonding | third party (e.g., BSP) promises bond if domain spams | enforcement
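The "sender solves computational puzzle" entry among the miracle cures can be illustrated with a hashcash-style proof of work. This is only a sketch of the general idea, not the scheme Penny Black actually uses; the function names, the SHA-256 choice, and the `message:nonce` encoding are all assumptions made for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def puzzle_hash(message: str, nonce: int) -> bytes:
    # Hypothetical encoding: hash the message and the nonce together.
    return hashlib.sha256(f"{message}:{nonce}".encode()).digest()


def solve_puzzle(message: str, difficulty: int) -> int:
    """Sender side: brute-force a nonce (expensive by design)."""
    for nonce in count():
        if leading_zero_bits(puzzle_hash(message, nonce)) >= difficulty:
            return nonce


def verify_puzzle(message: str, nonce: int, difficulty: int) -> bool:
    """Receiver side: a single hash, so checking is cheap."""
    return leading_zero_bits(puzzle_hash(message, nonce)) >= difficulty


nonce = solve_puzzle("to:bob@example.com", difficulty=12)
print(verify_puzzle("to:bob@example.com", nonce, difficulty=12))
```

The asymmetry is the point: the sender needs on the order of 2^difficulty hashes per message, the receiver one. The table's failure cases still apply, since a mailing list pays the cost for every recipient while a bot army pays with someone else's CPU.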

The UBC arms race
[Figure: arms race between countermeasures and evasion techniques. Labels: IP blacklisting, open relays, From blacklisting, sender faking, bot armies, RBL, SPEWS, Bayesian filters, pictures, dictionary attacks, SPF, DMP, …]

“We need a new mail/IM protocol”
- True: SMTP not designed for today’s hostile Internet
  - no sender authentication
  - no easy policy inclusion
- False: A new mail protocol is going to fix UBE/UBC
- Hard problems are ecosystem, not protocol:
  - authentication: domains and individuals
    - PKI (S/MIME, PGP) has never scaled
    - current email certificates just certify ownership of email address
      - help with whitelist, but not with unknown users
      - too costly for true verification
  - reputation
  - accreditation

IETF MARID
- IETF working group for verifying sender
  - “It would be useful for those maintaining domains and networks to be able to specify that individual hosts or nodes are authorized to act as MTAs for messages sent from those domains or networks. This working group will develop a DNS-based mechanism for storing and distributing information associated with that authorization.”
- related to IRTF ASRG (Anti-spam Research Group)
- DNS extensions, “purported responsible address”

MARID processing
“Given an email message, and given an IP address from which it has been (or will be) received, is the SMTP client at that IP address authorized to send that email message?”
[Figure: flowchart. Steps: extract purported responsible address (PRA); extract purported responsible domain (PRD); SPF: IP legal for PRD? (Y/N); client SMTP validation (CSV): client authenticated, authorized and accredited? (Y).]

MARID: Client SMTP Validation (CSV)
- authentication: EHLO domain real?
- authorization: Host authorized to be MTA?
- accreditation: Domain reputation?
- Drafts: draft-ietf-marid-csv-intro, draft-ietf-marid-csv-csa, draft-ietf-marid-csv-dna
- Example: EHLO aol.com from 64.12.187.24
    A(aol.com)                    IN A 64.12.187.24
    SRV(_client._smtp.aol.com)    SRV weight=2
    PTR(aol.com)                  _vouch.smtp.isgood.com
    TXT(aol.com.isgood.com)       IN TXT MARID,1,A

PRA (Purported responsible address)
- “Allows one to determine who appears to have most recently caused an e-mail message to be delivered. It does this by inspecting the headers in the message.” (draft-ietf-marid-pra)
- uses Resent-Sender, Resent-From, Sender, From RFC 2822 headers
- draft-ietf-marid-submitter defines new MAIL parameter for SMTP

    S: 220 company.com.example ESMTP server ready
    C: EHLO almamater.edu.example
    S: 250-company.com.example
    S: 250-DSN
    S: 250-AUTH
    S: 250-SUBMITTER
    S: 250 SIZE
    C: MAIL FROM:<[email protected]> [email protected]
    S: 250 <[email protected]> sender ok
    C: RCPT TO:<[email protected]>

[Figure: message path. Labels: [email protected], almamater.edu, [email protected].]
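The header precedence used for the PRA (Resent-Sender, Resent-From, Sender, From) can be sketched as a simple lookup. This is a deliberate simplification: the actual draft has additional rules (for example about the position of Resent- headers and malformed messages) that are not modelled here, and the function names and sample addresses are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr

# Precedence order taken from the slide; the first usable header wins.
PRA_HEADER_ORDER = ("Resent-Sender", "Resent-From", "Sender", "From")


def purported_responsible_address(raw_message: str):
    """Return the first address found in precedence order, or None."""
    msg = message_from_string(raw_message)
    for name in PRA_HEADER_ORDER:
        value = msg.get(name)
        if value:
            _, address = parseaddr(value)
            if address:
                return address
    return None


def purported_responsible_domain(address: str) -> str:
    """The PRD is the domain part of the PRA."""
    return address.rpartition("@")[2].lower()


raw = (
    "From: Alice <alice@almamater.example>\r\n"
    "Sender: List Owner <owner@lists.example>\r\n"
    "Subject: hello\r\n"
    "\r\n"
    "body\r\n"
)
pra = purported_responsible_address(raw)
print(pra, purported_responsible_domain(pra))
```

Here the Sender header outranks From, so the forwarding list, not the original author, is the party whose domain would be checked.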

SPF, Sender-ID
- SPF (sender policy framework)
- Verifies that most recent sender (e.g., mailing list forwarder) is authorized for its domain
- Does not prevent spam, but enables white and black-listing
- Adds DNS TXT or SPF resource record (RR) for domain
    spf2.0/mfrom,pra +mx +ip4:192.1.2.0/28 -all
  - “mail from MX server for example.com and from IP addresses in 192.1.2.0/28 is ok; all others are bad”
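How a receiver might evaluate such a record against a connecting IP address can be sketched as follows. This is a minimal illustration, not a complete SPF or Sender-ID evaluator: it models only the `mx`, `ip4`, and `all` mechanisms, skips the DNS lookups entirely, and takes the MX addresses as a caller-supplied set (`mx_ips`, a hypothetical parameter). The address range is written with the `ip4` mechanism, which is how SPF syntax expresses a literal network.

```python
import ipaddress

RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record: str, client_ip: str, mx_ips=frozenset()) -> str:
    """Evaluate a simplified SPF-style record left to right."""
    ip = ipaddress.ip_address(client_ip)
    terms = record.split()
    # terms[0] is the version/scope tag, e.g. "spf2.0/mfrom,pra".
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        if term == "all":
            matched = True
        elif term == "mx":
            # Assumption: the caller already resolved the domain's MX hosts.
            matched = client_ip in mx_ips
        elif term.startswith("ip4:"):
            matched = ip in ipaddress.ip_network(term[4:], strict=False)
        else:
            matched = False  # mechanisms not modelled in this sketch
        if matched:
            return RESULTS[qualifier]
    return "neutral"


record = "spf2.0/mfrom,pra +mx +ip4:192.1.2.0/28 -all"
print(check_spf(record, "192.1.2.7"))
print(check_spf(record, "203.0.113.9"))
print(check_spf(record, "198.51.100.5", mx_ips={"198.51.100.5"}))
```

The first mechanism that matches decides the result, which is why the catch-all `-all` goes last.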

[Figure: stages of an SMTP connection. Labels: HELO or EHLO, MAIL FROM, From:, body delivery.]

Putting the tools together
- transitive trust model: intra-domain user, inter-domain domain/host-only authentication
- accreditation:
  - aol.com does not host spammers
  - bpm.com verifies user identities (not yet)
[Figure: message path between [email protected] and [email protected]. Labels: SMTP AUTH submission (password), bpm.com SMTP server, SMTP, SPF, CSV.]

What’s different about IM and VoIP?
- Higher nuisance factor
  - combine the worst of email and phone telemarketing
- Close to zero cost
  - call origination has no capacity limitation (unlike PSTN line limitation)
  - can be originated in volume from residential broadband, no T1 required
    - T1: 2.4 call attempts/second @ $1000/month + LD
    - 500 kb/s DSL: 9 call attempts/second @ $50/month
  - easy to get addresses: SIP address = email address or E.164 number
  - non-US origin: cheap labor, no DNC laws
- Privacy invasion
  - know user is actually there
- Nuisance calls
  - possibly no good way to trace
  - already a problem with Skype
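The T1 versus DSL figures above can be turned into a rough cost per call attempt. The sketch below only uses the rates and monthly prices from the slide; the 30-day month and continuous operation are assumptions, and long-distance (LD) charges for the T1 are left out.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month, line always busy


def cost_per_million_attempts(attempts_per_second: float,
                              monthly_cost_usd: float) -> float:
    """Monthly line cost spread over the attempts it can carry."""
    attempts_per_month = attempts_per_second * SECONDS_PER_MONTH
    return monthly_cost_usd / attempts_per_month * 1_000_000


t1 = cost_per_million_attempts(2.4, 1000)  # excludes LD charges
dsl = cost_per_million_attempts(9, 50)

print(f"T1:  ${t1:.2f} per million call attempts")
print(f"DSL: ${dsl:.2f} per million call attempts")
print(f"ratio: {t1 / dsl:.0f}x")
```

Under these assumptions the T1 works out to roughly $161 per million attempts and the DSL line to a little over $2, a 75-fold difference before long-distance charges are even counted.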

SIP spam
- Call spam
  - telemarketing
  - content filtering likely ineffective
- IM spam
  - SIP MESSAGE or message sessions
  - spam intent may not be obvious in first message
    - get attention first with “Hello”
    - short messages harder to analyze with content filters
  - but typically requires white-listing based on presence subscription
- Presence spam (request addition to watcher list)
  - mostly nuisance: user may need to manually deny request
Source: J. Rosenberg, C. Jennings, draft-rosenberg-sipping-spam, July 2004

SIP spam prevention
- All earlier mechanisms apply, with largely the same caveats
- Black lists
  - domain-level
  - within domain, only if domain practices sound user management
- White list
  - may use buddy list as white list
  - stronger user authentication
- Consent-based communication
  - needs to subscribe first
  - but may not be able to recognize address (“is [email protected] a spammer or some long-lost friend?”)

SIP spam prevention (continued)
- Use of MARID-like DNS domain verification possible
  - may not be needed, due to usage of TLS for interdomain communications
  - but doesn’t preclude rogue sub-domains
    - e.g., “is hgs10.columbia.edu allowed to route SIP calls for columbia.edu?”
- transitive trust principle:
  - trust that previous hop applied identity management principles
- longer term, use S/MIME certificates for user-level authentication, but doesn’t improve spam prevention much
  - not widely available now
  - if S/MIME certificates are cheap, spammers can mint new identities

SIP authentication
[Figure: SIP trapezoid. Labels: [email protected]: 128.59.16.1, registrar, outbound proxy, destination proxy (identified by SIP URI domain), Digest auth over TLS, TLS mutual host verification, insert crypto-signed identity assertion (AIB, sip-identity), voice traffic over (S)RTP.]

From domain to user policies
- Not all domains can be classified as “good” or “bad” as a whole
- Many different domain types:
  - Employer
  - ISP
  - Associations (IEEE, ACM, ATIS, …)
  - Personal domains
  - Mailbox providers
- Divide domains by their user policy:
  - Admission-controlled domains
    - most employers
  - Bonded domains
  - Membership domains
    - e.g., credit card
  - Open, rate-limited domains
  - Open domains
Source: Kumar Srivastava, Henning Schulzrinne, “Preventing Spam for SIP-based Instant Messaging and Sessions”, Columbia University Technical Report, September 2004.

Reputation and domain descriptions
- Need to define mechanism to obtain domain user verification policy
- Individual user reputation:
  - deposit positive or negative feedback information based on calls
  - depends on cooperation of domain
  - limit user feedback rate to avoid ballot-stuffing
- Fortunately, there seem to be few part-time spammers

Using social networks for spam control
[Figure: social graph with “is a friend of” edges; example edge labels: strength of knowledge = 0.3, trust in good behavior = 0.5.]
total trust = ∑ (strength * trust)
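The total-trust sum can be sketched in a few lines. The slide does not say what the sum ranges over; the sketch assumes it runs over the acquaintances who vouch for a sender, each contributing their strength of knowledge multiplied by their trust in the sender's good behavior. The function name and the second edge are hypothetical.

```python
def total_trust(edges) -> float:
    """Sum strength * trust over all (strength, trust) pairs."""
    return sum(strength * trust for strength, trust in edges)


# First pair uses the values shown in the figure; the second is made up.
edges = [
    (0.3, 0.5),
    (0.8, 0.9),
]
print(total_trust(edges))
```

With only the figure's single edge the result is 0.3 * 0.5 = 0.15; each further acquaintance who vouches for the sender adds to the total.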

Privacy: Context
- context = “the interrelated conditions in which something exists or occurs”
- anything known about the participants in the (potential) communication relationship
- both at caller and callee
  - time: CPL
  - capabilities: caller preferences
  - location: location-based call routing, location events
  - activity/availability: presence
  - sensor data (mood, bio): not yet, but similar in many aspects to location data

Architectures for (geo) information access
- Claim: all using protocols fall into one of these categories
- Presence or event notification
  - “circuit-switched” model
  - subscription: binary decision
- Messaging
  - email, SMS
  - basically, event notification without (explicit) subscription
  - but often out-of-band subscription (mailing list)
- Request-response
  - RPC, HTTP; also DNS, LDAP
  - typically, already has session-level access control (if any at all)
- Presence is superset of other two
- GEOPRIV: IETF working group looking generically at location services (privacy)
- SIMPLE and SIP: event notification, presence

GEOPRIV and SIMPLE architectures
[Figure: three parallel architectures.
 GEOPRIV: target, location server, location recipient, rule maker; publication interface, notification interface, rule interface.
 SIP presence: presentity, presence agent, watcher; PUBLISH, NOTIFY, SUBSCRIBE.
 SIP call: caller, callee; INVITE.]

GEOPRIV and SIMPLE policy rules
- There is no sharp geospatial boundary
- Discussed in both GEOPRIV (geospatial) and SIMPLE (SIP IM)
- Presence contains other sensitive data (activity, icons, …) and others may be added
- Example: future extensions to personal medical data
  - “only my cardiologist may see heart rate, but notify everybody in building if heart rate = 0”
- Thus, generic policies are necessary

Presence/Event notification
- Three places for policy enforcement
  - subscription: binary
    - only policy, no geo information
    - subscriber may provide filter; could reject based on filter (“sorry, you only get county-level information”)
    - greatly improves scaling since no event-level checks needed
  - notification: content filtering, suppression
    - only policy, no geo information
  - third-party notification
    - e.g., event aggregator
    - can convert models: gateway subscribes to event source, distributes by email
    - both policy and geo data

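Presence policy
[Figure: notification pipeline. A SUBSCRIBE from the subscriber (watcher) is checked against the subscription policy. For each watcher, an event then passes through: event generator policy, subscriber filter, rate limiter, “change to previous notification?”, and finally NOTIFY. XML rules are managed via XCAP.]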
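The stages named on the presence policy slide (subscription policy, event generator policy, subscriber filter, rate limiter, change check) can be sketched as a per-watcher pipeline. This is an illustrative model only: the `Watcher` class, its field names, and the dictionary-shaped events are assumptions, and real deployments express the policies as XML rules rather than Python sets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Watcher:
    uri: str
    allowed_fields: set      # event generator policy: what may be disclosed
    wanted_fields: set       # subscriber filter: what the watcher asked for
    min_interval: float      # rate limiter: seconds between notifications
    last_sent_at: float = float("-inf")
    last_sent: Optional[dict] = None


def accept_subscription(uri: str, allowed_uris: set) -> bool:
    """Subscription policy: a binary decision made once per SUBSCRIBE."""
    return uri in allowed_uris


def notification_for(watcher: Watcher, event: dict, now: float):
    """Return the NOTIFY body for this watcher, or None to suppress it."""
    visible = {k: v for k, v in event.items() if k in watcher.allowed_fields}
    filtered = {k: v for k, v in visible.items() if k in watcher.wanted_fields}
    if now - watcher.last_sent_at < watcher.min_interval:
        return None  # rate limited
    if filtered == watcher.last_sent:
        return None  # nothing changed from this watcher's point of view
    watcher.last_sent_at, watcher.last_sent = now, filtered
    return filtered


w = Watcher("sip:bob@example.com", {"status", "city"}, {"status"}, 10.0)
event = {"status": "busy", "city": "New York", "heart_rate": 72}
if accept_subscription(w.uri, {"sip:bob@example.com"}):
    print(notification_for(w, event, now=0.0))
```

Because the change check runs after filtering, an update to a field the watcher cannot see (here `heart_rate`) never triggers a notification.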

Policy relationships
[Figure: relationship between policy documents. Labels: common policy, geopriv-specific, presence-specific, RPID, CIPID, future.]

PIDF-LO (location object)
- Basic location object
  - civic and geospatial
  - typically, in conjunction with presence
- contains source and authority
- basic privacy rules:
  - retention period
  - redistribution allowed

    <?xml version="1.0" encoding="UTF-8"?>
    <presence xmlns="urn:ietf:params:xml:ns:pidf"
              xmlns:gp="urn:ietf:params:xml:ns:pidf:geopriv10"
              xmlns:gml="urn:opengis:specification:gml:schema-xsd:feature:v3.0"
              entity="pres:[email protected]">
      <tuple id="sg89ae">
        <status>
          <gp:geopriv>
            <gp:location-info>
              <gml:location>
                <gml:Point gml:id="point1" srsName="epsg:4326">
                  <gml:coordinates>37:46:30N 122:25:10W</gml:coordinates>
                </gml:Point>
              </gml:location>
            </gp:location-info>
            <gp:usage-rules>
              <gp:retransmission-allowed>no</gp:retransmission-allowed>
              <gp:retention-expiry>2003-06-23T04:57:29Z</gp:retention-expiry>
            </gp:usage-rules>
          </gp:geopriv>
        </status>
        <timestamp>2003-06-22T20:57:29Z</timestamp>
      </tuple>
    </presence>
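A recipient of such a location object is expected to honor its basic privacy rules. The sketch below reads the two usage rules with Python's standard XML parser; it assumes a trimmed-down document in which the `entity` value `pres:user@example.com` is a placeholder and the GML location is omitted, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {
    "pidf": "urn:ietf:params:xml:ns:pidf",
    "gp": "urn:ietf:params:xml:ns:pidf:geopriv10",
}

# Shortened sample; entity is a placeholder, location-info left out.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<presence xmlns="urn:ietf:params:xml:ns:pidf"
          xmlns:gp="urn:ietf:params:xml:ns:pidf:geopriv10"
          entity="pres:user@example.com">
  <tuple id="sg89ae">
    <status>
      <gp:geopriv>
        <gp:usage-rules>
          <gp:retransmission-allowed>no</gp:retransmission-allowed>
          <gp:retention-expiry>2003-06-23T04:57:29Z</gp:retention-expiry>
        </gp:usage-rules>
      </gp:geopriv>
    </status>
  </tuple>
</presence>"""


def usage_rules(pidf_lo: bytes) -> dict:
    """Extract retransmission and retention rules from a PIDF-LO document."""
    root = ET.fromstring(pidf_lo)
    rules = root.find(".//gp:usage-rules", NS)
    allowed = rules.findtext("gp:retransmission-allowed", default="no", namespaces=NS)
    expiry = rules.findtext("gp:retention-expiry", namespaces=NS)
    return {
        "retransmission_allowed": allowed.strip() == "yes",
        # fromisoformat() on older Pythons rejects a trailing "Z".
        "retention_expiry": datetime.fromisoformat(expiry.strip().replace("Z", "+00:00")),
    }


def may_retain(rules: dict, now: datetime) -> bool:
    """True while the retention period has not yet expired."""
    return now < rules["retention_expiry"]


rules = usage_rules(SAMPLE)
print(rules["retransmission_allowed"], rules["retention_expiry"].isoformat())
print(may_retain(rules, datetime.now(timezone.utc)))
```

Treating a missing `retransmission-allowed` element as "no" is a conservative choice made in this sketch, not something the slide specifies.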

Privacy rule sets
- Conditions such as…
  - identity of requestor
  - time-of-day
  - sphere
- Actions
  - e.g., allow subscription
- Transformation
  - e.g., reduce accuracy of geo data

    <rule id="f3g44r1">
      <conditions>
        <identity>
          <uri>[email protected]</uri>
        </identity>
        <validity>
          <from>2003-12-24T17:00:00+01:00</from>
          <to>2003-12-24T19:00:00+01:00</to>
        </validity>
      </conditions>
      <actions></actions>
    </rule>
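Evaluating the conditions of such a rule amounts to checking the requestor's identity and the validity window. The sketch below models that with plain Python objects rather than XML; the `Rule` class, the `"allow"` action, the default-deny fallback, and the requestor URI `user@example.com` are all assumptions, since the slide's rule leaves its actions empty.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Rule:
    rule_id: str
    identities: frozenset   # URIs the rule applies to
    valid_from: datetime
    valid_to: datetime
    action: str             # hypothetical, e.g. "allow"


def matching_action(rules, requestor: str, when: datetime, default: str = "deny") -> str:
    """Return the action of the first rule whose conditions all hold."""
    for rule in rules:
        if requestor in rule.identities and rule.valid_from <= when <= rule.valid_to:
            return rule.action
    return default


rule = Rule(
    rule_id="f3g44r1",
    identities=frozenset({"user@example.com"}),  # placeholder identity
    valid_from=datetime.fromisoformat("2003-12-24T17:00:00+01:00"),
    valid_to=datetime.fromisoformat("2003-12-24T19:00:00+01:00"),
    action="allow",
)

inside = datetime.fromisoformat("2003-12-24T18:00:00+01:00")
outside = datetime.fromisoformat("2003-12-24T20:00:00+01:00")
print(matching_action([rule], "user@example.com", inside))
print(matching_action([rule], "user@example.com", outside))
```

Using timezone-aware timestamps matters here: the validity window is given with a +01:00 offset, and aware `datetime` objects compare correctly against requests stamped in any other zone.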

Conclusion
- Protocol and technical means as a complement to legal actions
- Identity-based techniques more promising than content-based approaches
- New applications (VoIP, IM, presence) vulnerable to unsolicited communications
  - with possibly larger impact due to lower cost, legal barriers
  - content-based techniques fail altogether
- New applications do not lend themselves to current content-based spam prevention techniques
- Domain-based rather than person-based mechanisms appear promising
- Need policy languages for sharing private data
