RfC2822 for Mere Mortals

What's in an Email Address? RFC2822 Em@il @ddresses for Mere Mortals Schalk W. Cronjé @ysb33r


This is a presentation I did years ago, but I heard that there are still people using it as a reference. So here it is, slightly cleaned up. If you are writing systems that process email addresses in some form or another, you might want to read this.

Transcript of RfC2822 for Mere Mortals

Page 1: RfC2822 for Mere Mortals

What's in an Email Address?

RFC2822 Em@il @ddresses for Mere Mortals

Schalk W. Cronjé @ysb33r

Page 2: RfC2822 for Mere Mortals

Why This Topic?

● Recurring bugs in software we build
● Lack of understanding at all levels
  – Developers
  – Testers
  – Support people
● Assumptions made without reading RFCs
● Understanding RFCs is not straightforward
  – RTFM is difficult when TFM cannot be found
● We require a basic reference

Page 3: RfC2822 for Mere Mortals

Content

● Overview
● Local-part
● Domain-part
● Valid or not?
● The real world

Page 4: RfC2822 for Mere Mortals

Brave, brave RFC World

● RFC1034, RFC1035: Domain name specification.
● RFC821, RFC2821: Restrictions on email addresses at protocol levels.
● RFC822, RFC2822: Specifies layout of email transmitted over internet. Specifies format of email address.
● RFC2047: Encoding of 8-bit in RFC2822 header fields
● RFC3490: Encoding international domain names
● RFC1123: Requirements for internet hosts (partially updated by RFC2821)

Page 5: RfC2822 for Mere Mortals

Address Format

Modern format: local-part @ domain-part

Historic format (RFC821/RFC2821): source-route : local-part @ domain-part

Page 6: RfC2822 for Mere Mortals

RFC2822 Local Parts

● Unrestricted characters

0..9 a..z A..Z ! # $ % & ' * + - / = ? ^ _ ` | { } ~ .

● Quotable characters (quoted using " or \)

< [ ( : @ ; ) ] > , non-ws-ctrl

● Illegal characters

All 8-bit.

● Whitespace
  – ws-ctrl illegal, only used for folding in headers
  – space character is valid if quoted

[ RFC2821: 4.1.2; RFC2822: 3.2, 3.4 ]
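As a rough sketch of the "unrestricted characters" rule, the check below tests whether a local-part can be used without quoting. It assumes the RFC2822 dot-atom form (dots may separate runs of characters but not lead, trail or repeat); the function name is made up for illustration.

```python
import re

# atext from RFC 2822 section 3.2.4: letters, digits and this punctuation set.
ATEXT = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"
# dot-atom: runs of atext separated by single dots.
DOT_ATOM = re.compile(rf"[{ATEXT}]+(?:\.[{ATEXT}]+)*")


def is_unquoted_local_part(local_part: str) -> bool:
    """Return True if local_part is valid without any quoting."""
    return DOT_ATOM.fullmatch(local_part) is not None
```

Anything this rejects is either illegal or needs quoting; the sketch does not try to tell those two cases apart.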

Page 7: RfC2822 for Mere Mortals

Local Payload

● Routing characters
  – ! % have been used for local routing in legacy systems, including UUCP and MHS.
  – Can be used to bypass routing in misconfigured systems.
● Shell exploits
  – | / ` $ have been used to attempt remote command execution

Page 8: RfC2822 for Mere Mortals

Does Case Matter?

● Case is ignored in domain

ntaba.biz == NTABA.BIZ

● Strictly speaking, case matters in local-parts

[email protected] != [email protected]

  – Most MTAs ignore case
  – RFC2821 discourages use of case as a distinguishing factor

● Case ignored in source-routes

[ RFC2821: 2.4 ]
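One way to reflect these case rules in a comparison is sketched below: the domain is always compared case-insensitively, while folding the local-part is left as an option. The function name, the flag and the sample addresses are illustrative assumptions, not taken from any RFC.

```python
def same_mailbox(a: str, b: str, fold_local_case: bool = False) -> bool:
    """Compare two addresses; the domain-part is always case-insensitive."""
    # Split on the last "@" so a quoted "@" in the local-part is left alone.
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    if domain_a.lower() != domain_b.lower():
        return False
    if fold_local_case:
        # Pragmatic mode: behave like most MTAs and ignore local-part case.
        return local_a.lower() == local_b.lower()
    # Strict mode: case in the local-part is significant.
    return local_a == local_b
```

For example, `same_mailbox("Schalk@ntaba.biz", "schalk@NTABA.BIZ")` is false in strict mode and true with `fold_local_case=True`.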

Page 9: RfC2822 for Mere Mortals

Does Size Matter?

● RFC2821 places limitations on length of local-part and domain-part
  – 64 characters for local-part
  – 255 characters for domain-part

● This is normally not a problem for messages transmitted across the internet, but can be problematic for in-house applications or encoded email addresses such as X.400.

● Many MTAs will now ignore this length restriction as long as the overall SMTP protocol line length restriction is not exceeded.

[ RFC2821: 4.5.3.1 ]
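A minimal sketch of a length check follows. The defaults are the RFC2821 limits quoted above; making them parameters is an assumption on my part, meant to cover in-house systems that need to relax them.

```python
def length_problems(address: str, max_local: int = 64, max_domain: int = 255) -> list[str]:
    """Return a list of human-readable length violations (empty if none)."""
    local, _, domain = address.rpartition("@")
    problems = []
    if len(local) > max_local:
        problems.append(f"local-part is {len(local)} characters, limit is {max_local}")
    if len(domain) > max_domain:
        problems.append(f"domain-part is {len(domain)} characters, limit is {max_domain}")
    return problems
```

Returning the problems rather than a bare boolean lets the caller decide whether a violation means reject, warn or log.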

Page 10: RfC2822 for Mere Mortals

Domain Parts

● Can either be an RFC1035 domain or an address literal
● Valid characters for domain names: a..z A..Z 0..9 -
● Subdomains separated by dot character.
● Subdomain may not start or end with dash.
● 255 characters max length.
● 63 characters max per subdomain.
● Cannot start or end in dot.
● Restriction on subdomains starting with a digit has been relaxed.
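The domain-name rules above can be sketched as a small syntax check. It is only a sketch: the function name is hypothetical, and the rejection of an all-numeric final label follows the observation in RFC1123 section 2.1 that a host name never has dotted-decimal form.

```python
import re

# One subdomain: letters, digits and dashes, not starting or ending with a dash.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_valid_domain(domain: str) -> bool:
    """Syntax-only check of a domain name (no DNS lookup is performed)."""
    if not domain or len(domain) > 255:
        return False
    labels = domain.split(".")
    # Leading, trailing or consecutive dots show up as empty labels.
    if not all(len(label) <= 63 and LABEL.fullmatch(label) for label in labels):
        return False
    # A purely numeric last label would make this look like an IP address.
    return not labels[-1].isdigit()
```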

Page 11: RfC2822 for Mere Mortals

Address Literals

● Workarounds for when host names cannot be resolved.
  – @[protocol:host-address]
  – IPv4: @[192.1.1.1]
  – IPv6: @[IPv6:fe80::a00:20ff:fec2:2ef4]

● Protocol must be registered with IANA.

[ RFC2821: 4.1.3 ]
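A small sketch of recognising the two literal forms shown above, using Python's standard `ipaddress` module. The function name is an assumption, and any protocol tag other than `IPv6:` is simply treated as unsupported here.

```python
import ipaddress


def parse_address_literal(domain_part: str):
    """Return an IPv4Address/IPv6Address for a [..] literal, else None."""
    if not (domain_part.startswith("[") and domain_part.endswith("]")):
        return None  # a bare 192.168.1.1 is not an address literal
    body = domain_part[1:-1]
    try:
        if body.lower().startswith("ipv6:"):
            return ipaddress.IPv6Address(body[len("ipv6:"):])
        return ipaddress.IPv4Address(body)
    except ValueError:
        # Raised by the ipaddress module for malformed addresses.
        return None
```

Whether a successfully parsed literal should then be accepted is a separate policy decision.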

Page 12: RfC2822 for Mere Mortals

International Domain Names

● Domain names not representable in US-ASCII can be registered

● Such domain names cannot be handled by DNS or existing protocols

● RFC 3490 describes the encoding/decoding of such domain names from presentation to protocol:

exämple.com => xn--exmple-cua.com

● Potential for phishing
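For experimenting with the presentation-to-protocol translation, Python's built-in `idna` codec implements the RFC 3490 conversion. The two helper names below are made up for this sketch.

```python
def to_ascii_domain(domain: str) -> str:
    """Presentation form -> protocol (ASCII) form, per the RFC 3490 codec."""
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(domain: str) -> str:
    """Protocol (ASCII) form -> presentation form."""
    return domain.encode("ascii").decode("idna")
```

Displaying the decoded form to users is exactly where the phishing risk comes from, so some systems may prefer to show the `xn--` form instead.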

Page 13: RfC2822 for Mere Mortals

Valid or not?

● Valid even under strict RFC2822 interpretation

● Most punctuation characters are valid in the local part, including: {$cha?k*cr%nje}@ntaba.biz

[email protected]

Page 14: RfC2822 for Mere Mortals

Valid or not?

● Yes, the domain part is an address-literal
● Acceptance of address-literals should be configurable
  – They can be security risks
  – RFC2821 prefers usage of MX-based deliveries.

schalk_cronje@[192.168.1.1]

Page 15: RfC2822 for Mere Mortals

Valid or not?

● No, it is not an address-literal nor a valid domain name.

● Some systems will attempt to deliver this by passing 192.168.1.1 to the domain resolving subsystem, which in turn will simply return the IP address.
  – This violates RFC1123
  – This is a potential security risk.

[email protected]

[ RFC1123: 2.1 ]

Page 16: RfC2822 for Mere Mortals

Valid or not?

● Not valid according to RFC1035
● Limitation lifted in RFC1123.

[email protected]

[ RFC1123: 2.1 ]

Page 17: RfC2822 for Mere Mortals

Valid or not?

● Valid in RFC821 for compatibility with non-TCP/IP networks.

● Outlawed by RFC2821.
● Not supported by any modern MTA.

schalk_cronje@#192168

[ RFC821: 4.1.2; RFC2821: F.4 ]

Page 18: RfC2822 for Mere Mortals

Valid or not?

● No, domain-part may not start with a dot.

[email protected]

[ RFC2822: 3.2.4 ]

Page 19: RfC2822 for Mere Mortals

Valid or not?

● No, strictly RFC2822 states that domain-part may not end with a dot.

● RFC1034 uses the dot-ending to indicate absolute domains (FQDN) in resource records.

● Most systems will accept, resolve and deliver this

[email protected].

[ RFC2822: 3.2.4; RFC1034: 3.1]

Page 20: RfC2822 for Mere Mortals

Valid or not?

● No, consecutive dots are not allowed in domain parts.

[email protected].

[ RFC2822: 3.2.4; RFC1034: 3.1]

Page 21: RfC2822 for Mere Mortals

Valid or not?

● No.
  – Local-parts may not start with a dot.
  – Consecutive dots are not allowed in local parts.

● Pragmatically, many known MTAs don’t care

[email protected]@ntaba.biz

[ RFC2822: 3.2.4]

Page 22: RfC2822 for Mere Mortals

Valid or not?

● No, _ is not valid in domain names
● Some DNS servers will support this.
● Some sites do use the _ for internal systems.
● It remains illegal for internet operations

schalk_cronje@lon_eng.ntaba.biz

[ RFC2821: 4.1.3 ]

Page 23: RfC2822 for Mere Mortals

Valid or not?

● No, @ cannot be used unquoted in local parts

“schalk_cronje@lon_eng”@ntaba.biz
schalk_cronje\@[email protected]

schalk_cronje@[email protected]

[ RFC2822: 3.2.5, 3.4 ]

Page 24: RfC2822 for Mere Mortals

Local-part Quoting

● Quoting should only be used where absolutely necessary

● Where a quoted form has an unquoted form...
  – The two forms are equivalent
  – The unquoted form should be used for transmission
● Quoting is performed by enclosing the local-part in quotes or preceding a character by a backslash.

[ RFC2821: 4.1.2 ]
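The "quote only where necessary" guidance might be sketched as follows: emit the local-part unquoted when it is a plain dot-atom, and fall back to a quoted string otherwise. The function name is hypothetical, and escaping only `\` and `"` inside the quotes is an assumption of this sketch.

```python
import re

ATEXT = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"
DOT_ATOM = re.compile(rf"[{ATEXT}]+(?:\.[{ATEXT}]+)*")


def quote_local_part(raw: str) -> str:
    """Return raw unquoted if possible, otherwise as a quoted string."""
    if DOT_ATOM.fullmatch(raw):
        return raw  # the unquoted form is preferred for transmission
    # Escape the two characters that are special inside a quoted string.
    escaped = raw.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```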

Page 25: RfC2822 for Mere Mortals

Valid or not?

● No, this is an envelope for email addresses
● The following is valid:

“<schalk_cronje>”@ntaba.biz

<[email protected]>

Page 26: RfC2822 for Mere Mortals

Valid or not?

● No, the double quote is a quoting character.

schalk_O”[email protected]

Page 27: RfC2822 for Mere Mortals

Valid or not?

● Yes, apostrophe is valid in unquoted form

schalk_O'[email protected]

Page 28: RfC2822 for Mere Mortals

Valid or not?

● This is debatable
● Neither RFC2821 nor RFC2822 is completely clear whether the double quote is valid if escaped

"Note that the backslash, "\", is a quote character, which is used to indicate that the next character is to be used literally"

“schalk_O\”cronje”@ntaba.biz

[ RFC2821: 4.1.2 ]

Page 29: RfC2822 for Mere Mortals

Valid or not?

● Not at RFC2821/RFC2822 levels - contains at least one 8-bit character

● Can be completely valid at the presentation level
  – Email client can take care of translation between a user-readable form and a level suitable for transmission

● There is NO agreed standard for encoding non-US-ASCII in local parts

schalk_cronjé@ntaba.biz

Page 30: RfC2822 for Mere Mortals

My 8-bit's Worth

● Custom encoding is valid when both the sender and receiver know about the encoding
  – Intermediate relays will simply pass it through

● UTF-7: [email protected]

● RFC2047 (adapted): [email protected]

● Storing email addresses with 8-bit content in XML is problematic – requires encoding.

Page 31: RfC2822 for Mere Mortals

The 8-bit Legacy

● RFC822 was written in a 7-bit world
  – It can be misinterpreted as to 8-bit being legal.

● Some MTAs will actually transmit 8-bit characters in email addresses

● In-house systems might have a requirement for 8-bit

● An email system must be able to allow, block, quarantine or filter on 8-bit characters.
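A tiny sketch of what a configurable 8-bit policy could look like. The `Action` names mirror the verbs in the last bullet; the enum, the function name and the default are all assumptions for illustration.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    QUARANTINE = "quarantine"


def eight_bit_action(address: str, policy: Action = Action.BLOCK) -> Action:
    """Return ALLOW for pure 7-bit addresses, otherwise the configured policy."""
    # str.isascii() is True only when every character is below 128.
    return Action.ALLOW if address.isascii() else policy
```

An in-house deployment that needs 8-bit addresses would pass `Action.ALLOW`; an internet-facing one might choose `Action.QUARANTINE`.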

Page 32: RfC2822 for Mere Mortals

Valid or not?

● Valid even under strict RFC2822 interpretation

● Quoting allows for spaces and | to be used
● Imagine if this was passed to a shell script in a badly configured system!

"`echo haX0r | /usr/bin/passwd root --stdin`"@ntaba.biz

Page 33: RfC2822 for Mere Mortals

Valid or not?

● Valid even under strict RFC2822 interpretation

● Quoting allows @ , : to be used

"@lon-eng,@scm-eng:schalk_cronje"@ntaba.biz

Page 34: RfC2822 for Mere Mortals

Valid or not?

● Valid even under strict RFC2822 interpretation

● This is an example of a source-route.
● Usage is deprecated
● It is best to remove them before relaying.

@lon-eng,@scm-eng:[email protected]

[ RFC2821: 3.7, C, F.2 ]
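Removing a source-route before relaying can be sketched in a few lines. The function name is hypothetical, and the sketch assumes the route hosts are plain names; a route containing an IPv6 address literal (which itself contains colons) is not handled.

```python
def strip_source_route(path: str) -> str:
    """Drop a leading '@host,@host:' source-route, if there is one."""
    # A source-route starts with an unquoted "@"; a quoted local-part
    # such as "@lon-eng,..."@ntaba.biz starts with a double quote instead.
    if path.startswith("@"):
        _route, colon, mailbox = path.partition(":")
        if colon:
            return mailbox
    return path
```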

Page 35: RfC2822 for Mere Mortals

Practical Validation

● Address validation cannot purely be performed against the RFC

● Context is very important
● Validation at user-level will differ from that at protocol-level.

RFC rule of thumb: Be as lenient as possible in what you accept, but as strict as possible in what you send out.

Page 36: RfC2822 for Mere Mortals

Validation Context

● Context places additional demands on validation algorithms

● Validation algorithms must be configurable
  – Allows for specifics in user environments
  – Allows for adaptability within various code subsystems
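As one possible shape for configurable validation, the sketch below puts two of the real-world deviations (underscore in a domain, trailing dot) behind policy switches. `DomainPolicy`, its fields and `domain_ok` are invented names; strict RFC behaviour is the default.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainPolicy:
    allow_underscore: bool = False    # some sites use _ for internal systems
    allow_trailing_dot: bool = False  # FQDN form that most systems accept


def domain_ok(domain: str, policy: DomainPolicy = DomainPolicy()) -> bool:
    """Check domain syntax, relaxed only where the policy says so."""
    if policy.allow_trailing_dot and domain.endswith("."):
        domain = domain[:-1]
    chars = "A-Za-z0-9_" if policy.allow_underscore else "A-Za-z0-9"
    label = re.compile(rf"[{chars}](?:[{chars}-]*[{chars}])?")
    # Empty labels (stray dots) never match, so they are always rejected.
    return all(len(part) <= 63 and label.fullmatch(part) is not None
               for part in domain.split("."))
```

Different subsystems can then share the same code while passing different policies, for example a strict one at the protocol level and a lenient one for user input.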

Page 37: RfC2822 for Mere Mortals

Pattern Matching

● DOS-patterns (*?) are useful, but not good enough

● Regex is a better way to perform complex pattern matches
  – Not all users understand regex
  – It is therefore good to give users the option of an input notation, but use regex internally to perform the matching

Page 38: RfC2822 for Mere Mortals

The *? Problem

schalk*[email protected]

● The above is a valid email address
● Was the intention to filter for this exact address?
● Or was the intention to filter for addresses such as [email protected]
● Regex:
  – schalk\*[email protected]
  – schalk.*[email protected]
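To make the two readings explicit, here is a sketch that turns user input into a regex either literally or with DOS-style wildcards. The function name, the `wildcards` flag and the sample pattern `schalk*@ntaba.biz` are illustrative assumptions.

```python
import re


def dos_pattern_to_regex(pattern: str, wildcards: bool = True) -> re.Pattern:
    """Compile a user pattern; * and ? are wildcards only if requested."""
    if not wildcards:
        # Exact-address reading: every character, including *, is literal.
        return re.compile(re.escape(pattern))
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any run of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))
```

With `fullmatch`, the wildcard form of `schalk*@ntaba.biz` matches `schalk_cronje@ntaba.biz`, while the literal form matches only the address that really contains the asterisk. The user interface has to ask which one was meant.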

Page 39: RfC2822 for Mere Mortals

Lists of Addresses

● RFC2822 uses the comma for separating address lists in headers

● A common misconception is that it is easy to delimit addresses using ; or ,.

● Although it is possible, it is no trivial task to parse lists such as

[email protected], “s,c,h,a,l,k”@ntaba.biz ,s\,\\cha\,[email protected] , “sch\”,alk”@ntaba.biz
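A naive `split(",")` breaks on lists like this one. The sketch below walks the string once and only splits on commas that are neither inside double quotes nor preceded by a backslash. It is deliberately minimal: the function name is made up, and RFC2822 comments in parentheses and group syntax are not handled.

```python
def split_address_list(header_value: str) -> list[str]:
    """Split on commas that are outside quotes and not backslash-escaped."""
    addresses, current = [], []
    in_quotes = False
    escaped = False
    for ch in header_value:
        if escaped:
            current.append(ch)      # character after a backslash is literal
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            addresses.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    addresses.append("".join(current).strip())
    return [address for address in addresses if address]
```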

Page 40: RfC2822 for Mere Mortals

Real World Violations

● Use of _ in domain-part
● Domain part starts with dot
● Domain part ends in dot
● 4000 characters in local part
● 8-bit characters in local-part

Page 41: RfC2822 for Mere Mortals

What can we do?

● Developers should never make any assumptions as to what the customer might need or what the customer's infrastructure might be
  – Code to be as RFC-compliant as possible, but allow for configurability as and when needed.
  – User interfaces should be context-sensitive.

● Testers should ensure that nobody makes such assumptions

Page 42: RfC2822 for Mere Mortals

Questions ?

Handling email addresses is an extraordinarily complex matter for something very simple.

Next time you enter an email address...

...you might not want to take it for granted