RfC2822 for Mere Mortals

Post on 05-Dec-2014


description

This is a presentation I did years ago, but I heard that there are still people using it as a reference. So here it is, slightly cleaned up. If you are writing systems that process email addresses in some form or another, you might want to read this.

Transcript of RfC2822 for Mere Mortals

What's in an Email Address?

RFC2822 Em@il @ddresses for Mere Mortals

Schalk W. Cronjé @ysb33r

Why This Topic?

● Recurring bugs in software we build

● Lack of understanding at all levels

– Developers

– Testers

– Support People

● Assumptions made, without reading RFCs

● Understanding RFCs is not straightforward

– RTFM is difficult when TFM cannot be found

● We require a basic reference

Content

● Overview

● Local-part

● Domain-part

● Valid or not?

● The real world

Brave, brave RFC World

● RFC1034, RFC1035: Domain name specification.

● RFC2821 (supersedes RFC821): Restrictions on email addresses at protocol level.

● RFC2822 (supersedes RFC822): Specifies layout of email transmitted over the internet. Specifies format of email address.

● RFC2047: Encoding of 8-bit in RFC2822 header fields

● RFC3490: Encoding international domain names

● RFC1123: Requirements for internet hosts (partially updated by RFC2821)

Address Format

Modern format: local-part @ domain-part

Historic format (RFC821/RFC2821): source-route : local-part @ domain-part

RFC2822 Local Parts

● Unrestricted characters

0..9 a..z A..Z ! # $ % & ' * + - / = ? ^ _ ` | { } ~ .

● Quotable characters (quoted by " or \)

< [ ( : @ ; ) ] > , non-ws-ctrl

● Illegal characters

All 8-bit.

● Whitespace

– ws-ctrl illegal, only used for folding in headers

– space character is valid if quoted

[ RFC2821: 4.1.2; RFC2822: 3.2, 3.4 ]
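The character classes above can be turned into a quick check. This is only a sketch: the names `UNRESTRICTED` and `needs_quoting` are made up for illustration, and the function looks at individual characters only, so it does not catch misplaced dots.

```python
import string

# Characters the slide lists as usable without quoting.
UNRESTRICTED = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`|{}~.")


def needs_quoting(local_part: str) -> bool:
    """Return True if any character falls outside the unrestricted set."""
    if any(ord(ch) > 127 for ch in local_part):
        # 8-bit characters are illegal, quoted or not.
        raise ValueError("8-bit character in local part")
    return any(ch not in UNRESTRICTED for ch in local_part)


print(needs_quoting("schalk_cronje"))    # no quoting required
print(needs_quoting("schalk cronje"))    # the space must be quoted
```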

Local Payload

● Routing characters

– ! % have been used for local-routing in legacy systems, including UUCP and MHS.

– Can be used to bypass routing in mis-configured systems.

● Shell exploits

– | / ` $ have been used to attempt remote command execution
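One way to keep characters such as | / ` $ away from a shell is never to build a command line from an address at all. The sketch below assumes a hypothetical helper program that takes the address as its first argument; here a tiny Python one-liner stands in for it. `shlex.quote` is shown for the cases where a shell really cannot be avoided.

```python
import shlex
import subprocess
import sys


def run_with_address(address: str) -> str:
    """Pass the address as one argv entry; no shell ever parses it."""
    result = subprocess.run(
        # Stand-in for a real helper program: it just echoes its argument.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", address],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


def shell_safe(address: str) -> str:
    """Quote the address so a POSIX shell treats it as a single literal word."""
    return shlex.quote(address)


hostile = '"`echo haX0r | /usr/bin/passwd root --stdin`"@ntaba.biz'
print(run_with_address(hostile) == hostile)
```

The design point is that the address stays data from end to end: with a list of arguments there is no stage at which backticks or pipes get interpreted.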

Does Case Matter?

● Case is ignored in domain

ntaba.biz == NTABA.BIZ

● Strictly-speaking case matters in local-parts

schalk@ntaba.biz != ScHaLk@ntaba.biz

– Most MTAs ignore case

– RFC2821 discourages use of case as a distinguishing factor

● Case ignored in source-routes

[ RFC2821: 2.4 ]
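A comparison that follows these rules might look like the sketch below. The function name and the `fold_local_case` switch are illustrative assumptions; the switch exists because most MTAs ignore case in the local part even though the RFC does not.

```python
def same_mailbox(a: str, b: str, fold_local_case: bool = False) -> bool:
    """Compare two addresses: domain case-insensitively, local part per policy."""
    # Split on the last @, since a quoted local part may itself contain @.
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    if domain_a.lower() != domain_b.lower():
        return False
    if fold_local_case:
        return local_a.lower() == local_b.lower()
    return local_a == local_b


print(same_mailbox("schalk@ntaba.biz", "ScHaLk@ntaba.biz"))                        # strict reading
print(same_mailbox("schalk@ntaba.biz", "ScHaLk@ntaba.biz", fold_local_case=True))  # typical MTA
```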

Does Size Matter?

● RFC2821 places limitations on length of local-part and domain-part

– 64 characters for local-part

– 255 characters for domain-part

● This is normally not a problem for messages transmitted across the internet, but can be problematic for in-house applications or encoded email addresses such as X.400.

● Many MTAs will now ignore this length restriction as long as the overall SMTP protocol line length restriction is not exceeded.

[ RFC2821: 4.5.3.1 ]
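A length check along these lines is short enough to sketch. The `enforce` flag is an assumption added for illustration, reflecting that many MTAs do not apply the limits.

```python
MAX_LOCAL = 64     # RFC2821 limit for the local-part
MAX_DOMAIN = 255   # RFC2821 limit for the domain-part


def within_rfc2821_limits(address: str, enforce: bool = True) -> bool:
    """Check local-part and domain-part lengths; skip the check if not enforced."""
    if not enforce:
        return True
    local, _, domain = address.rpartition("@")
    return len(local) <= MAX_LOCAL and len(domain) <= MAX_DOMAIN


print(within_rfc2821_limits("a" * 65 + "@ntaba.biz"))                 # too long for RFC2821
print(within_rfc2821_limits("a" * 65 + "@ntaba.biz", enforce=False))  # lenient deployment
```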

Domain Parts

● Can be either an RFC1035 domain or an address literal

● Valid characters for domain names: a..z A..Z 0..9 -

● Subdomains separated by dot character.

● Subdomain may not start or end with dash.

● 255 characters max length.

● 63 characters max per subdomain.

● Cannot start or end in dot.

● Restriction on subdomains starting with a digit has been relaxed.
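The rules in this list translate fairly directly into code. The following is a sketch, not a complete validator; `is_valid_domain` is a made-up name, and the final check (top-level label must not be all digits) is my reading of RFC1123 section 2.1, which keeps host names distinguishable from dotted-decimal IP addresses.

```python
import re

# One subdomain: letters, digits and dash; 1 to 63 characters; no dash at either end.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_domain(domain: str) -> bool:
    """Apply the slide's rules for a domain name (address literals are not handled)."""
    if not domain or len(domain) > 255:
        return False
    labels = domain.split(".")
    # A leading, trailing or doubled dot produces an empty label, which never matches.
    if not all(LABEL.fullmatch(label) for label in labels):
        return False
    # Assumption from RFC1123 2.1: the top-level label is never purely numeric.
    return not labels[-1].isdigit()


for candidate in ("ntaba.biz", "1967.com", ".ntaba.biz", "lon_eng.ntaba.biz", "192.168.1.1"):
    print(candidate, is_valid_domain(candidate))
```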

Address Literals

● Workarounds for when host names cannot be resolved.

– @[protocol:host-address]

– IPv4: @[192.1.1.1]

– IPv6: @[IPv6:fe80::a00:20ff:fec2:2ef4]

● Protocol must be registered with IANA.

[ RFC2821: 4.1.3 ]
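Recognising the two literal forms shown above can be sketched with Python's standard `ipaddress` module. The function name is illustrative, and other registered protocol tags are deliberately not handled.

```python
import ipaddress


def parse_address_literal(domain_part: str):
    """Return an IPv4Address or IPv6Address for a bracketed literal, else None."""
    if not (domain_part.startswith("[") and domain_part.endswith("]")):
        return None
    body = domain_part[1:-1]
    try:
        if body.lower().startswith("ipv6:"):
            return ipaddress.IPv6Address(body[5:])
        return ipaddress.IPv4Address(body)
    except ValueError:
        # Not a well-formed IP address (or an unsupported protocol tag).
        return None


print(parse_address_literal("[192.1.1.1]"))
print(parse_address_literal("[IPv6:fe80::a00:20ff:fec2:2ef4]"))
print(parse_address_literal("192.168.1.1"))   # no brackets, so not a literal
```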

International Domain Names

● Domain names not representable in US-ASCII can be registered

● Such domain names cannot be handled by DNS or existing protocols

● RFC 3490 describes the encoding/decoding of such domain names from presentation to protocol:

exämple.com => xn--exmple-cua.com

● Potential for phishing
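Python's built-in `idna` codec implements the RFC 3490 conversion, so the presentation-to-protocol step can be tried directly. The two wrapper names below are illustrative only.

```python
def to_protocol_form(domain: str) -> str:
    """Encode a presentation-level domain into its ASCII (xn--) form."""
    return domain.encode("idna").decode("ascii")


def to_presentation_form(domain: str) -> str:
    """Decode an ASCII (xn--) domain back into its presentation form."""
    return domain.encode("ascii").decode("idna")


print(to_protocol_form("exämple.com"))             # xn--exmple-cua.com
print(to_presentation_form("xn--exmple-cua.com"))  # exämple.com
```

Note that the non-ASCII character is removed from the visible part of the label and folded into the suffix, which is part of why encoded names are hard to recognise by eye.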

Valid or not?

● Valid even under strict RFC2822 interpretation

● Most punctuation is valid in the local part, for example {$cha?k*cr%nje}@ntaba.biz

schalk_cronje@ntaba.biz

Valid or not?

● Yes, the domain part is an address-literal

● Acceptance of address-literals should be configurable

– They can be security risks

– RFC2821 prefers usage of MX-based deliveries.

schalk_cronje@[192.168.1.1]

Valid or not?

● No, it is not an address-literal nor a valid domain name.

● Some systems will attempt to deliver this by passing the 192.168.1.1 to the domain resolving subsystem, which in return will simply return the IP address.

– This violates RFC1123

– This is a potential security risk.

schalk_cronje@192.168.1.1

[ RFC1123: 2.1 ]

Valid or not?

● Not valid according to RFC1035

● Limitation lifted in RFC1123.

schalk_cronje@1967.com

[ RFC1123: 2.1 ]

Valid or not?

● Valid in RFC821 for compatibility with non-TCP/IP networks.

● Outlawed by RFC2821.

● Not supported by any modern MTA.

schalk_cronje@#192168

[ RFC821: 4.1.2; RFC2821: F.4 ]

Valid or not?

● No, domain-part may not start with a dot.

schalk_cronje@.ntaba.biz

[ RFC2822: 3.2.4 ]

Valid or not?

● No. Strictly speaking, RFC2822 states that the domain-part may not end with a dot.

● RFC1034 uses the trailing dot to indicate absolute domains (FQDN) in resource records.

● Most systems will accept, resolve and deliver this.

schalk_cronje@ntaba.biz.

[ RFC2822: 3.2.4; RFC1034: 3.1]

Valid or not?

● No, consecutive dots are not allowed in domain parts.

schalk_cronje@ntaba..biz.

[ RFC2822: 3.2.4; RFC1034: 3.1]

Valid or not?

● No.

– Local-parts may not start with a dot.

– Consecutive dots are not allowed in local parts.

● Pragmatically, many known MTAs don’t care

.schalk_cronje@ntaba.biz

schalk..cronje@ntaba.biz

[ RFC2822: 3.2.4]

Valid or not?

● No, _ is not valid in domain names

● Some DNS servers will support this.

● Some sites do use the _ for internal systems.

● It remains illegal for internet operations

schalk_cronje@lon_eng.ntaba.biz

[ RFC2821: 4.1.3 ]

Valid or not?

● No, @ cannot be used unquoted in local parts

● Valid when quoted:

"schalk_cronje@lon_eng"@ntaba.biz

schalk_cronje\@lon_eng@ntaba.biz

schalk_cronje@lon_eng@ntaba.biz

[ RFC2822: 3.2.5, 3.4 ]

Local-part Quoting

● Quoting should only be used where absolutely necessary

● Where a quoted form has an unquoted form...

– The two forms are equivalent

– The unquoted form should be used for transmission

● Quoting is performed by enclosing the local-part in quotes or preceding a character by backslash.

[ RFC2821: 4.1.2 ]
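The "use the unquoted form where one exists" rule can be sketched as a normalisation step. The helper names are made up, and the pattern for an unquoted local part is an assumption based on the unrestricted character list, with dots allowed only between other characters.

```python
import re

# Runs of unrestricted characters separated by single dots.
ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
DOT_ATOM = re.compile(rf"{ATOM}(?:\.{ATOM})*")


def unquote(local_part: str) -> str:
    """Strip surrounding double quotes and backslash escapes."""
    if len(local_part) >= 2 and local_part[0] == local_part[-1] == '"':
        local_part = local_part[1:-1]
    return re.sub(r"\\(.)", r"\1", local_part)


def preferred_form(local_part: str) -> str:
    """Return the unquoted form if one exists, otherwise a quoted string."""
    raw = unquote(local_part)
    if DOT_ATOM.fullmatch(raw):
        return raw
    # Quoting is needed: escape embedded quotes and backslashes.
    escaped = re.sub(r'(["\\])', r"\\\1", raw)
    return f'"{escaped}"'


print(preferred_form('"schalk_cronje"'))           # quotes were unnecessary
print(preferred_form("schalk_cronje\\@lon_eng"))   # the @ has to stay quoted
```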

Valid or not?

● No, this is an envelope for email addresses

● The following is valid:

"<schalk_cronje>"@ntaba.biz

<schalk_cronje@ntaba.biz>

Valid or not?

● No, the double quote is a quoting character.

schalk_O"cronje@ntaba.biz

Valid or not?

● Yes, apostrophe is valid in unquoted form

schalk_O'cronje@ntaba.biz

Valid or not?

● This is debatable

● Neither RFC2821 nor RFC2822 is completely clear whether the double quote is valid if escaped

● RFC2821: "Note that the backslash, "\", is a quote character, which is used to indicate that the next character is to be used literally"

"schalk_O\"cronje"@ntaba.biz

[ RFC2821: 4.1.2 ]

Valid or not?

● Not at RFC2821/RFC2822 levels - contains at least one 8-bit character

● Can be completely valid at the presentation level

– Email client can take care of translation between a user-readable form and a level suitable for transmission

● There is NO agreed standard for encoding non-US-ASCII in local parts

schalk_cronjé@ntaba.biz

My 8-bit's Worth

● Custom encoding is valid when both the sender and receiver know about the encoding

– Intermediate relays will simply pass it through

● UTF-7: schalk+AF8-cronj+AOk@ntaba.biz

● RFC2047 (adapted): =?UTF-8?Q?schalk_cronj=C3=A9?=@ntaba.biz

● Storing email addresses with 8-bit content in XML is problematic – requires encoding.
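As an illustration of the receiving side of such a private agreement, the UTF-7 form above can be turned back into its presentation form with Python's standard `utf-7` codec. This is a sketch; the function name is made up, and it relies on the domain part being plain ASCII without a + character, so decoding the whole address leaves the domain untouched.

```python
def decode_utf7_address(address: str) -> str:
    """Decode an address whose local part was UTF-7 encoded by prior agreement."""
    return address.encode("ascii").decode("utf-7")


print(decode_utf7_address("schalk+AF8-cronj+AOk@ntaba.biz"))  # schalk_cronjé@ntaba.biz
```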

The 8-bit Legacy

● RFC822 was written in a 7-bit world

– It can be misinterpreted as to 8-bit being legal.

● Some MTAs will actually transmit 8-bit characters in email addresses

● In-house systems might have a requirement for 8-bit

● An email system must be able to allow, block, quarantine or filter on 8-bit characters.

Valid or not?

● Valid even under strict RFC2822 interpretation

● Quoting allows for spaces and | to be used

● Imagine if this was passed to a shell script in a badly configured system!

"`echo haX0r | /usr/bin/passwd root --stdin`"@ntaba.biz

Valid or not?

● Valid even under strict RFC2822 interpretation

● Quoting allows for @ : and , to be used

"@lon-eng,@scm-eng:schalk_cronje"@ntaba.biz

Valid or not?

● Valid even under strict RFC2822 interpretation

● This is an example of a source-route.

● Usage is deprecated

● It is best to remove them before relaying.

@lon-eng,@scm-eng:schalk_cronje@ntaba.biz

[ RFC2821: 3.7, C, F.2 ]
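Removing a source-route before relaying can be sketched as below. The function name is illustrative. The sketch assumes the route hosts are plain host names; a route containing an IPv6 address literal, which has its own colons, would need a real parser.

```python
def strip_source_route(path: str) -> str:
    """Drop a leading '@host,@host:' route and keep only the mailbox."""
    # A genuine source-route starts with an unquoted @. A quoted local part
    # that merely looks like a route starts with a double quote instead.
    if path.startswith("@"):
        _route, sep, mailbox = path.partition(":")
        if sep:
            return mailbox
    return path


print(strip_source_route("@lon-eng,@scm-eng:schalk_cronje@ntaba.biz"))
print(strip_source_route('"@lon-eng,@scm-eng:schalk_cronje"@ntaba.biz'))  # left untouched
```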

Practical Validation

● Address validation cannot purely be performed against the RFC

● Context is very important

● Validation at user-level will differ from that at protocol-level.

RFC rule of thumb: Be as lenient as possible in what you accept, but as strict as possible in what you send out.

Validation Context

● Context places additional demands on validation algorithms

● Validation algorithms must be configurable

– Allows for specifics in user environments

– Allows for adaptability within various code subsystems
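One way to make validation configurable is to pass the rules in as data instead of hard-coding them. Everything named below (`ValidationPolicy`, its fields, `STRICT`, `IN_HOUSE`, `domain_acceptable`) is hypothetical and only covers a couple of domain-part rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationPolicy:
    allow_underscore_in_domain: bool = False
    allow_trailing_dot: bool = False


STRICT = ValidationPolicy()
IN_HOUSE = ValidationPolicy(allow_underscore_in_domain=True, allow_trailing_dot=True)

BASE_CHARS = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-")


def domain_acceptable(domain: str, policy: ValidationPolicy) -> bool:
    """Check the domain's characters and dots under the given policy."""
    if domain.endswith("."):
        if not policy.allow_trailing_dot:
            return False
        domain = domain[:-1]
    allowed = BASE_CHARS | ({"_"} if policy.allow_underscore_in_domain else set())
    # Every subdomain must be non-empty and use only allowed characters.
    return all(bool(label) and set(label) <= allowed for label in domain.split("."))


print(domain_acceptable("lon_eng.ntaba.biz", STRICT))
print(domain_acceptable("lon_eng.ntaba.biz", IN_HOUSE))
```

The same address gets a different verdict depending on where the check runs, which is the point of keeping the policy separate from the algorithm.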

Pattern Matching

● DOS patterns (* ?) are useful, but not good enough

● Regex is a better way to perform complex pattern matches

– Not all users understand regex

– It is therefore good to give users the option of an input notation, but use regex internally to perform the matching

The *? Problem

schalk*cronje@ntaba.biz

● The above is a valid email address

● Was the intention to filter for this exact address?

● Or was the intention to filter for addresses such as schalkRfcDudecronje@ntaba.biz

● Regex:

– schalk\*cronje@ntaba.biz (this exact address)

– schalk.*cronje@ntaba.biz (anything between schalk and cronje)
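The two possible intentions map to two different translations of the same user input into a regex. The sketch below uses made-up helper names and assumes the user interface lets the user say which interpretation is meant.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """DOS-style reading: * matches any run of characters, ? matches one."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def literal_to_regex(address: str) -> re.Pattern:
    """Exact-address reading: every character stands for itself."""
    return re.compile(re.escape(address))


user_input = "schalk*cronje@ntaba.biz"
candidate = "schalkRfcDudecronje@ntaba.biz"
print(bool(wildcard_to_regex(user_input).fullmatch(candidate)))  # wildcard reading matches
print(bool(literal_to_regex(user_input).fullmatch(candidate)))   # literal reading does not
```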

Lists of Addresses

● RFC2822 uses the comma for separating address lists in headers

● A common misconception is that it is easy to delimit addresses using ; or ,.

● Although it is possible, it is no trivial task to parse lists such as

schalk@ntaba.biz, "s,c,h,a,l,k"@ntaba.biz ,s\,\\cha\,lk@ntaba.biz , "sch\",alk"@ntaba.biz
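Splitting such a list needs at least a small state machine that tracks quotes and backslashes; a plain split on commas gives the wrong answer. The sketch below is an illustration only: the function name is made up, and it ignores comments, display names and groups, which a full RFC2822 header parser would also have to handle.

```python
def split_address_list(header_value: str) -> list:
    """Split on commas that are neither inside quotes nor backslash-escaped."""
    addresses, current = [], []
    in_quotes = False
    escaped = False
    for ch in header_value:
        if escaped:
            current.append(ch)       # character after a backslash is literal
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            addresses.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    addresses.append("".join(current).strip())
    return [a for a in addresses if a]


raw = r'schalk@ntaba.biz, "s,c,h,a,l,k"@ntaba.biz ,s\,\\cha\,lk@ntaba.biz , "sch\",alk"@ntaba.biz'
for address in split_address_list(raw):
    print(address)
```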

Real World Violations

● Use of _ in domain-part

● Domain part starts with dot

● Domain part ends in dot

● 4000 characters in local part

● 8-bit characters in local-part

What can we do?

● Developers should never make any assumptions as to what the customer might need or to what the customer's infrastructure might be

– Code to be as RFC-compliant as possible, but allow for configurability as and when needed.

– User interfaces should be context-sensitive.

● Testers should ensure that nobody makes such assumptions

Questions ?

Handling email addresses is an extraordinarily complex matter for something very simple.

Next time you enter an email address...

...you might not want to take it for granted