Dovecot Mail Storage

Posted on 23-Feb-2016

Dovecot Mail Storage

Timo Sirainen

Me: Timo Sirainen

• Born 1979 in Finland
• First C64 BASIC programs around 1988
• Open source coding since about 1998
  – Irssi IRC client 1999-2004, still widely used
• Worked as a programmer since 1999
• Went to university in 2006
• Dovecot project started in 2002
  – Working full time on it since about 2007
  – 2009: Rackspace, USA
  – 2010: SAPO, Portugal

Dovecot

• Open source IMAP/POP3 server
  – Only mail retrieval to clients, no mail sending
• First version released in 2002
• Mostly written by me
  – Except Sieve, by Stephan Bosch
• High performance is an important goal
  – Disk I/O is the typical bottleneck -> everything is optimized to reduce it

Talk Overview

• Traditional mailbox formats
• Dovecot indexes
• Dovecot mailbox formats
• Full text search indexes
• Future ideas

mbox

• One file per mailbox
• Metadata in headers that are filtered out before clients see them
  – X-UID, Status, X-Status, X-Keywords, etc.
• Deleting requires moving data around
  – Fragile: corruption if a crash happens in the middle
  – Slow when deleting old messages
• May become fragmented with constant appends
• But a non-fragmented file is fast to read
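
The header filtering above can be sketched as a simple name check. This is a minimal sketch, assuming headers have already been parsed into names; is_mbox_metadata_header is a hypothetical name, and the list holds only the headers named on the slide (the real list is longer).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/* Only the metadata headers named on the slide; the real list ("etc.") is longer. */
static const char *const mbox_metadata_headers[] = {
    "X-UID", "Status", "X-Status", "X-Keywords",
};

/* Returns true if a header carries the server's own metadata and must be
   hidden from IMAP clients. Header names compare case-insensitively. */
bool is_mbox_metadata_header(const char *name)
{
    size_t count = sizeof(mbox_metadata_headers) / sizeof(mbox_metadata_headers[0]);

    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcasecmp(name, mbox_metadata_headers[i]) == 0)
            return true;
    }
    return false;
}
```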

Maildir

• One file per message
  – Reading through all files can be slow
• Message flags in filename (name:2,<flags>)
  – Lots of renaming
  – Finding the current filename can be difficult
• Is Maildir lockless? Not really: Dovecot uses a write/sync lock
  – Otherwise files can temporarily be lost during renames
• Was the file really deleted or just renamed?
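
To show why flag changes mean renames, here is a sketch of how a new Maildir filename could be computed. It follows the Maildir "info" convention: ":2," followed by one letter per flag in ASCII order. The function names are hypothetical and do not come from Dovecot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns the flag letters after ":2,", or NULL if the name has none yet. */
static const char *maildir_flags(const char *name)
{
    const char *info = strstr(name, ":2,");
    return info != NULL ? info + 3 : NULL;
}

bool maildir_has_flag(const char *name, char flag)
{
    const char *flags = maildir_flags(name);
    assert(flag != '\0');
    return flags != NULL && strchr(flags, flag) != NULL;
}

/* Writes the file's new name, with `flag` added, into `out`. Each flag
   appears once and flags are kept in ASCII order. The caller must then
   rename() the file from `name` to `out`. */
bool maildir_set_flag(const char *name, char flag, char *out, size_t outsize)
{
    bool present[128] = { false };
    const char *flags = maildir_flags(name);
    size_t base_len = flags != NULL ? (size_t)(flags - 3 - name) : strlen(name);
    char new_flags[128];
    size_t n = 0;

    assert(flag > ' ' && flag < 127);
    for (const char *p = flags; p != NULL && *p != '\0'; p++) {
        if ((unsigned char)*p < 128)
            present[(unsigned char)*p] = true;
    }
    present[(unsigned char)flag] = true;

    /* Emit the flags in ASCII order. */
    for (int c = '!'; c < 127; c++) {
        if (present[c])
            new_flags[n++] = (char)c;
    }
    new_flags[n] = '\0';

    int len = snprintf(out, outsize, "%.*s:2,%s", (int)base_len, name, new_flags);
    return len >= 0 && (size_t)len < outsize;
}
```

Because the name encodes the flags, another process holding the old name can briefly fail to find the file after the rename(). That is the "finding the current filename" problem listed above.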

Dovecot Index Files

• Main index
  – List of messages
  – Message flags
  – Offsets to cache records
• Cache file
  – Message size, some headers, etc.
  – Keep only data that the client actually uses
    • Different clients want different data for different amounts of time
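
A main index record could be pictured like this: one fixed-size entry per message, with flags and an offset into the cache file. The layout and names below are a hypothetical illustration, not Dovecot's on-disk format. Because IMAP UIDs only grow, records stay sorted by UID and a lookup can use binary search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory layout of one main index record. */
struct index_record {
    uint32_t uid;          /* IMAP UID, ascending within a mailbox */
    uint8_t flags;         /* system flags as bits */
    uint32_t cache_offset; /* start of this message's cached fields, 0 = none */
};

/* Finds a message by UID; works because records are sorted by UID. */
const struct index_record *
index_lookup_uid(const struct index_record *recs, size_t count, uint32_t uid)
{
    size_t lo = 0, hi = count;

    assert(recs != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (recs[mid].uid < uid)
            lo = mid + 1;
        else if (recs[mid].uid > uid)
            hi = mid;
        else
            return &recs[mid];
    }
    return NULL;
}
```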

Dovecot Main Index

• In two files:
  – dovecot.index: Somewhat recent snapshot
  – dovecot.index.log: Recent changes
• All changes go through the log
• Readers read the snapshot into memory and apply the latest changes from the log
  – Once opened, only need to read log updates

• Very efficient with remote filesystems (NFS, cluster FSes)!

• Snapshot is updated “once in a while”
  – Tries to minimize disk I/O
  – Writes are usually more expensive than reads
• The log is also useful for finding “what changed” events for IMAP clients
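
The snapshot-plus-log model can be sketched in memory. A reader starts from the snapshot, which already reflects some prefix of the log, and then applies only the records it has not seen. The single "flags changed" record type and all names below are simplifying assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MSGS 16

/* One log record: "message number `seq` now has flags `flags`". A real
   transaction log also records appends, expunges, keyword changes, etc. */
struct log_record {
    uint32_t seq;
    uint8_t flags;
};

/* A reader's view: the snapshot it loaded into memory, plus how many log
   records are already reflected in it. */
struct index_view {
    uint8_t flags[MAX_MSGS];
    size_t log_applied;
};

/* Brings the view up to date by applying only the log records it has not
   seen yet; after the initial open, a refresh reads just the new records. */
void index_view_refresh(struct index_view *view,
                        const struct log_record *records, size_t record_count)
{
    assert(view->log_applied <= record_count);
    for (size_t i = view->log_applied; i < record_count; i++) {
        if (records[i].seq < MAX_MSGS)
            view->flags[records[i].seq] = records[i].flags;
    }
    view->log_applied = record_count;
}
```

In this model, a writer only appends to the log. Rewriting the snapshot "once in a while" just means saving the current view along with its log position, so that new readers have less log to replay.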

Dovecot Cache

• The main reason for Dovecot’s good performance
• Different IMAP clients want different data
  – Caching data that the client doesn’t use wastes disk space and disk I/O
• Flexible format, allows adding any number of fields
  – Per-field caching decisions: “no”, “temporary”, “permanent”
• Cached fields never change (IMAP guarantees this)
  – Data is added without locking -> duplicate data is possible
• Once in a while the file is recreated -> deleted and unwanted records are dropped
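
The slide does not say how a field moves between the three caching decisions. The sketch below invents one plausible policy purely for illustration. The transition rules, the idle-days threshold and the function names are all assumptions, not Dovecot's actual heuristics.

```c
#include <assert.h>
#include <stdbool.h>

enum cache_decision { CACHE_NO, CACHE_TEMPORARY, CACHE_PERMANENT };

/* Assumed policy: a field nobody asks for stays uncached; the first request
   starts caching it ("temporary"); a client that asks for the field on older
   messages is assumed to want it long term ("permanent"). */
enum cache_decision
cache_decision_update(enum cache_decision cur, bool requested, bool for_old_message)
{
    if (!requested)
        return cur;
    if (for_old_message)
        return CACHE_PERMANENT;
    return cur == CACHE_NO ? CACHE_TEMPORARY : cur;
}

/* When the cache file is recreated: keep permanent fields, and keep
   temporary ones only if some client used them recently (threshold assumed). */
bool cache_field_keep(enum cache_decision d, unsigned idle_days, unsigned max_idle_days)
{
    assert(max_idle_days > 0);
    return d == CACHE_PERMANENT ||
           (d == CACHE_TEMPORARY && idle_days < max_idle_days);
}
```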

Locking

• Lock waits are bad
  – Higher user-visible latency
  – Timeout failures during high load
• Dovecot v0.99 used traditional read/write index locks
  – Locking timeout problems
  – Redesigned in v1.0 to do lockless reads

Lockless reads: rename()

• For:
  – Small files
  – Rarely changing files
  – Files where a large part of the content changes at once
• Writer
  – Lock
  – If the file has changed, read it and update internal state
  – Write the updated data to a temp file
  – rename() over the original file
  – Unlock
• Reader
  – Just read the file.

[Diagram: temp file rename(), steps #1 and #2]
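
Here is the writer side of the rename() scheme as a POSIX sketch. The lock only serializes writers, and readers never touch it. The "<path>.lock" and "<path>.tmp" names and the use of an fcntl() lock on a separate file are assumptions of this sketch. replace_file is a hypothetical name.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

bool replace_file(const char *path, const void *data, size_t size)
{
    char lock_path[4096], tmp_path[4096];
    bool ok = false;

    assert(path != NULL && (data != NULL || size == 0));
    if ((size_t)snprintf(lock_path, sizeof(lock_path), "%s.lock", path) >= sizeof(lock_path) ||
        (size_t)snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", path) >= sizeof(tmp_path))
        return false;

    /* Lock: an exclusive POSIX record lock on a separate lock file. */
    int lock_fd = open(lock_path, O_RDWR | O_CREAT, 0600);
    if (lock_fd < 0)
        return false;
    struct flock fl = { 0 };
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;          /* l_start = l_len = 0: whole file */
    if (fcntl(lock_fd, F_SETLKW, &fl) < 0) {
        close(lock_fd);
        return false;
    }

    /* A real writer would re-read the file here if it changed since it was
       last read; this sketch just writes the caller's data. */
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd >= 0) {
        ok = write(fd, data, size) == (ssize_t)size && fsync(fd) == 0;
        ok = close(fd) == 0 && ok;
        /* rename() replaces the file atomically: a reader opens either the
           old or the new version, never a half-written one. */
        ok = ok && rename(tmp_path, path) == 0;
        if (!ok)
            unlink(tmp_path);
    }

    close(lock_fd);                  /* closing the descriptor releases the lock */
    return ok;
}
```

A reader simply open()s and read()s the path. If it opened the file before the rename(), it keeps reading the old, complete version through its descriptor.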

Lockless reads: Appends

• For append-only files with a “size” header in each written record
• Writer
  – Lock
  – Write data with size=0
  – Write size with each byte’s highest bit set to 1
  – Unlock
• Reader
  – Read one record at a time
  – Stop when seeing a size that isn’t fully written

[Diagram: record with Data and Size fields]

Size field encoding:

Bits    Content
0-6     Bits 0-6 of size
7       Always 1
8-14    Bits 7-13 of size
15      Always 1
etc.
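
The size encoding above can be written out in code. Each size byte carries 7 bits of the size, and its high bit is always 1. The writer first writes the record with the size bytes zeroed, then fills them in. A size byte whose high bit is still 0 therefore tells the reader that the record is not finished. The 4-byte field, the [size][data] layout and the function names are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed: 4-byte size field, 7 size bits per byte -> sizes up to 2^28-1. */
#define SIZE_FIELD_LEN 4

void encode_size(uint32_t size, uint8_t out[SIZE_FIELD_LEN])
{
    assert(size < (1u << (7 * SIZE_FIELD_LEN)));
    for (int i = 0; i < SIZE_FIELD_LEN; i++)
        out[i] = (uint8_t)(((size >> (7 * i)) & 0x7f) | 0x80);
}

/* Returns false while the size is (partly) unwritten: any byte without its
   high bit set still holds the zero written before the data. */
bool decode_size(const uint8_t in[SIZE_FIELD_LEN], uint32_t *size_r)
{
    uint32_t size = 0;
    for (int i = 0; i < SIZE_FIELD_LEN; i++) {
        if ((in[i] & 0x80) == 0)
            return false;
        size |= (uint32_t)(in[i] & 0x7f) << (7 * i);
    }
    *size_r = size;
    return true;
}

/* Reader: walk [size][data] records and stop at the first one whose size
   isn't complete yet. Returns the number of complete records. */
size_t count_complete_records(const uint8_t *buf, size_t len)
{
    size_t offset = 0, count = 0;
    uint32_t size;

    while (len - offset >= SIZE_FIELD_LEN &&
           decode_size(buf + offset, &size) &&
           len - offset - SIZE_FIELD_LEN >= size) {
        offset += SIZE_FIELD_LEN + size;
        count++;
    }
    return count;
}
```

The reader needs no lock. It either sees a complete size, which is written only after the data, or it sees a zero bit and stops.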

Lockless writes in future?

• open(path, O_APPEND) usually provides atomic writes
  – Except with NFS
  – write() may also return fewer bytes than intended? (signal, out of space)
  – read() during a write may see incomplete data?
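
A lock-free append along these lines might look like the following sketch. It issues one write() per record on an O_APPEND descriptor and treats a short write as a failure, because a retry could split the record. append_record is a hypothetical name. The sketch shows the caveats from the slide; it does not solve them.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* On a local filesystem, O_APPEND moves the offset to end of file before
   every write, so appenders usually don't overwrite each other; over NFS
   this isn't guaranteed. A short write is not retried: a second write()
   could land after another process's record and split this one in two. */
bool append_record(const char *path, const void *rec, size_t size)
{
    assert(rec != NULL && size > 0);
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0600);
    if (fd < 0)
        return false;

    ssize_t ret;
    do {
        /* EINTR before anything was written: nothing to undo, so retry. */
        ret = write(fd, rec, size);
    } while (ret < 0 && errno == EINTR);

    /* 0 < ret < size (e.g. out of space) leaves a partial record behind;
       readers still need a way to detect it, such as a size header. */
    bool ok = ret == (ssize_t)size;
    return close(fd) == 0 && ok;
}
```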

Single-dbox

• One file per message (u.<IMAP UID>)
• Files have an immutable metadata section
  – GUID, POP3 UIDL, received date, etc.
• Advantages over Maildir:
  – Filenames don’t change
  – No IMAP UID <-> filename mapping required
• Flags stored only in Dovecot index files
  – dovecot.index.backup is automatically created once in a while
  – When fixing corruption, tries very hard to preserve flags based on the (corrupted) index and backup files

Multi-dbox

• Multiple messages in a single file (m.<id>)
  – File format same as with single-dbox
• Multiple files in a single mailbox
  – Files are about 2 MB (configurable)
    • Larger files -> less fragmentation, but slower deletion
    • Preallocation
  – Can be rotated every n days (for incremental backups)
  – Delayed (ioniced) nightly deletions (“doveadm purge”)
• Crash or power loss can’t corrupt or lose data
• Tries very hard to preserve as much data as possible in case of (filesystem) corruption
  – Saves a backup of the original broken file
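
The file rotation described above comes down to one decision per saved message: append to the current m.<id> file, or start a new one. In this sketch, the struct fields, the rule that an empty file always accepts a message, and the function name are all assumptions. Only the roughly 2 MB size and the "every n days" rotation come from the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical settings: target file size and optional age-based rotation
   (rotate_days = 0 means never rotate by age). */
struct mdbox_settings {
    uint64_t rotate_size;
    unsigned rotate_days;
};

struct mdbox_file {
    uint64_t size;
    time_t created;
};

/* Returns true if the message should go to a new m.<id> file instead. */
bool mdbox_file_is_full(const struct mdbox_settings *set,
                        const struct mdbox_file *file,
                        uint64_t msg_size, time_t now)
{
    assert(set->rotate_size > 0);
    /* An empty file always accepts the message, even an oversized one. */
    if (file->size > 0 && file->size + msg_size > set->rotate_size)
        return true;
    /* Age-based rotation keeps old files unchanged for incremental backups. */
    if (set->rotate_days > 0 &&
        now - file->created >= (time_t)set->rotate_days * 24 * 60 * 60)
        return true;
    return false;
}
```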

Benchmarks

• Realistic IMAP benchmarks are difficult to do
• Results depend on clients and user behavior

Benchmarks

• Reading 10k messages via IMAP

SSD, OS X, HFS+     Uncached    Cached
mbox                2.9 s       1.6 s
Maildir             3.9 s       0.6 s
Single-dbox         3.9 s       0.6 s
Multi-dbox          1.5 s       0.4 s

HDD, Linux, ext4    Uncached    Cached
mbox                2.8 s       2.3 s
Maildir             8.0 s       0.9 s
Single-dbox         6.8 s       0.9 s
Multi-dbox          1.6 s       0.7 s

Benchmarks: # NFS ops

• Reading 10k messages via IMAP
• Above: uncached, below: cached

[Chart: number of NFS operations (Reads, Lookup, Access, Getattr) for mdbox, sdbox, Maildir and mbox; x axis 0 to 35000 operations]

Benchmarks: # NFS ops

• Random IMAP commands sent with: imaptest logout=5 msgs=1000 delete=10 expunge=10 secs=60 seed=1
• L+A+G = lookup + access + getattr

[Chart: number of NFS operations (Read, Write, Readdir, L+A+G, Other) for mbox, Maildir, sdbox and mdbox]

New dbox-only Features

Alternative Mail Storage

• Users rarely access their old mail
• Lower-performance storage is cheaper -> move old mail there
• dbox supports an “alternative path” setting: if a u.* or m.* file isn’t found in the primary path, it’s looked up in the alternative path
  – Files could even be moved with /bin/mv
• But easier/safer with “doveadm altmove”
  – This would be difficult with Maildir because its filenames change
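
The alternative path lookup can be sketched with single-dbox names. u.<UID> is built directly from the UID, so no mapping is stored, and the file is looked for in the primary directory first and then in the alternative one. The function names and return codes are hypothetical. m.* files would be found the same way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Builds "<dir>/u.<UID>"; the name never changes, so no mapping is stored. */
static bool sdbox_path(const char *dir, uint32_t uid, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%s/u.%u", dir, (unsigned)uid);
    return n >= 0 && (size_t)n < size;
}

/* Looks in the primary path first, then the alternative path (may be NULL).
   Returns 0 = not found, 1 = primary, 2 = alternative; `buf` gets the path. */
int sdbox_find(const char *primary, const char *alt, uint32_t uid,
               char *buf, size_t size)
{
    assert(primary != NULL);
    if (sdbox_path(primary, uid, buf, size) && access(buf, F_OK) == 0)
        return 1;
    if (alt != NULL && sdbox_path(alt, uid, buf, size) && access(buf, F_OK) == 0)
        return 2;
    return 0;
}
```

Because the lookup only depends on the file's name, moving u.* files between the two directories needs no index update. That is why a plain /bin/mv could work.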

Detached Mail Attachments

• MIME parts can be saved to external files
  – Only if they’re large enough (default: 128 kB)
  – Can also be filtered based on Content-Type and other headers
• Avoid an extra disk seek for downloading attachments that clients automatically display inline
• Supports saving base64-encoded MIME parts decoded (25% less disk space)
  – Only if re-encoding reproduces the original 100%
• dbox-only
  – Metadata contains pointers to external parts
• Saving is done via a simplified “filesystem API”
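
The saving rules above can be sketched as two small helpers. The first decides whether a part is detached. The 128 kB default comes from the slide, but the example Content-Type rule (keep text/* parts inline) is only an assumption about what a filter might look like. The second shows where the 25% saving comes from: every 4 base64 characters carry 3 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

#define ATTACHMENT_MIN_SIZE (128 * 1024)  /* default threshold from the slide */

/* Decides whether a MIME part is saved to an external file. The text/...
   rule is a hypothetical example of a Content-Type filter. */
bool should_detach(size_t part_size, const char *content_type)
{
    assert(content_type != NULL);
    if (part_size < ATTACHMENT_MIN_SIZE)
        return false;
    return strncasecmp(content_type, "text/", 5) != 0;
}

/* Decoded size of valid base64 data: 3 bytes per 4 characters, minus "="
   padding. CR/LF line breaks are skipped. */
size_t base64_decoded_size(const char *b64)
{
    size_t chars = 0, pad = 0;
    for (const char *p = b64; *p != '\0'; p++) {
        if (*p == '=')
            pad++;
        else if (*p != '\r' && *p != '\n')
            chars++;
    }
    return (chars + pad) / 4 * 3 - pad;
}
```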

Single Instance Storage

• Storage’s internal deduplication
  – Could be enabled only for attachment storage
• Dovecot’s SIS
  – FS API backend
  – Based on file hashes and hard links
• Hash is configurable (e.g. SHA256 + size)
  – Byte-by-byte verification after a hash match:
    a) Never, trust hash uniqueness (not implemented)
    b) Immediate comparison during saving
    c) Delayed (nightly) comparison and deduplication

Dovecot SIS

• Attachments saved to “HA/SH/HASH-GUID” under a global attachment dir (e.g. /var/attachments/)
  – GUID guarantees filename uniqueness
  – e.g. a file with hash “123456” is saved to 12/34/123456-GUID
  – “HA” and “SH” may be symlinks to other mounts
• SIS is done by hard linking HA/SH/hashes/HASH to HA/SH/HASH-GUID if it exists
  – Basically: “ln hashes/123456 123456-guid”
  – No attempts to create cross-mount hard links
• Safe to move/backup/restore attachment files
  – But hashes/HASH is auto-deleted only when its link count drops from 2 to 1. External changes may leak it.
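
The SIS file layout can be sketched with plain POSIX calls. The paths follow the HA/SH/HASH-GUID scheme, deduplication is a link() from hashes/HASH, and cleanup happens when the link count falls to 1. All function names are hypothetical, and the byte-by-byte comparison that should run before linking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Builds "<root>/HA/SH/HASH-GUID" and "<root>/HA/SH/hashes/HASH" from a hex
   hash string. Returns false if the hash is too short or a buffer too small. */
bool sis_paths(const char *root, const char *hash, const char *guid,
               char *file, char *hash_file, size_t size)
{
    assert(root != NULL && hash != NULL && guid != NULL);
    if (strlen(hash) < 4)
        return false;
    int n1 = snprintf(file, size, "%s/%.2s/%.2s/%s-%s",
                      root, hash, hash + 2, hash, guid);
    int n2 = snprintf(hash_file, size, "%s/%.2s/%.2s/hashes/%s",
                      root, hash, hash + 2, hash);
    return n1 >= 0 && (size_t)n1 < size && n2 >= 0 && (size_t)n2 < size;
}

/* Deduplicates by hard-linking an existing hashes/HASH as HASH-GUID.
   link() fails with ENOENT if no file with this hash exists yet, or with
   EXDEV across mounts; either way the caller keeps its own copy. */
bool sis_try_link(const char *hash_file, const char *file)
{
    return link(hash_file, file) == 0;
}

/* After an attachment file is deleted: remove hashes/HASH once it is the
   last remaining link, i.e. its link count has dropped from 2 to 1. */
void sis_cleanup_hash(const char *hash_file)
{
    struct stat st;
    if (stat(hash_file, &st) == 0 && st.st_nlink == 1)
        unlink(hash_file);
}
```

The cleanup rule explains the leak mentioned above. If a file is removed or copied outside Dovecot, the link count may never go from 2 to 1 through this path, and hashes/HASH is left behind.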

Full Text Search Indexes

• Dovecot has an abstract FTS API
• The IMAP protocol says search is about “substring matching” (e.g. “ello” matches “hello”)
  – Almost no FTS engines support this
  – Few people seem to care about this anymore
• Currently supported FTS backends:
  – Squat: Dovecot’s own indexer, supports substring matching
    • Currently index updating is too inefficient
  – Apache Solr
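
The mismatch between IMAP and typical FTS engines is easiest to see side by side. The first function does what IMAP SEARCH asks for: a case-insensitive substring match, limited here to ASCII. The second does what a word-based index can answer: whole-word equality. Both are simplified, hypothetical helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* IMAP-style: does `key` occur anywhere in `text`, ignoring ASCII case? */
bool imap_substring_match(const char *text, const char *key)
{
    size_t klen = strlen(key);

    assert(text != NULL);
    for (const char *p = text; ; p++) {
        size_t i = 0;
        while (i < klen && p[i] != '\0' &&
               tolower((unsigned char)p[i]) == tolower((unsigned char)key[i]))
            i++;
        if (i == klen)
            return true;
        if (*p == '\0')
            return false;
    }
}

/* Word-index style: is `key` equal to some whole word of `text`?
   Words are runs of ASCII letters and digits. */
bool word_match(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *p = text;

    while (*p != '\0') {
        while (*p != '\0' && !isalnum((unsigned char)*p))
            p++;
        const char *start = p;
        while (*p != '\0' && isalnum((unsigned char)*p))
            p++;
        if (klen > 0 && (size_t)(p - start) == klen &&
            strncasecmp(start, key, klen) == 0)
            return true;
    }
    return false;
}
```

A word index can only return a match for “ello” if it also stores every substring of every word. That is what makes substring-capable indexes such as Squat expensive to update.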

FTS: Solr

• Solr is a search engine server using Lucene
• Dovecot talks to Solr via HTTP
• Sharding via per-user fts_solr setting

Future

• FS API used for indexes and dbox
  – Support for key-value databases
  – Asynchronous disk I/O