dovecot

Authentication Mechanisms

Plaintext authentication

The simplest authentication mechanism is PLAIN. The client simply sends the password unencrypted to Dovecot. All clients support the PLAIN mechanism, but obviously there's the problem that anyone listening on the network can steal the password. For that reason (and some others) other mechanisms were implemented.

Today, however, many people use SSL/TLS, and there's no problem with sending an unencrypted password inside an SSL-secured connection. So if you're using SSL, you probably don't need to worry about anything other than the PLAIN mechanism.

Another plaintext mechanism is LOGIN. It's typically used only by SMTP servers to let Outlook clients perform SMTP authentication. Note that the LOGIN mechanism is not the same as IMAP's LOGIN command. The LOGIN command is internally handled using the PLAIN mechanism.
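To make concrete why PLAIN offers no protection on its own, here is a small Python sketch of the SASL PLAIN initial response (RFC 4616): authorization identity, username and password joined by NUL bytes and then base64-encoded. The function names and the example credentials are illustrative assumptions, not part of Dovecot.

```python
import base64


def plain_initial_response(username: str, password: str, authzid: str = "") -> str:
    """Build the base64 SASL PLAIN response: authzid NUL username NUL password."""
    raw = "\0".join([authzid, username, password]).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_plain_response(encoded: str) -> tuple:
    """Recover (authzid, username, password) from a captured response."""
    authzid, username, password = base64.b64decode(encoded).decode("utf-8").split("\0")
    return authzid, username, password


if __name__ == "__main__":
    token = plain_initial_response("jane", "secret")  # hypothetical credentials
    print(token)
    print(decode_plain_response(token))
```

Base64 is an encoding rather than encryption, so anyone who captures the token can decode it; the protection has to come from the SSL/TLS layer underneath.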

Non-plaintext authentication

Non-plaintext mechanisms have been designed to be safe to use even without SSL/TLS encryption. Because of how they have been designed, they require access to the plaintext password or their own special hashed version of it. This means that it's impossible to use non-plaintext mechanisms with commonly used DES or MD5 password hashes.

If you want to use more than one non-plaintext mechanism, the passwords must be stored as plaintext so that Dovecot is able to generate the required special hashes for all the different mechanisms. If you want to use only one non-plaintext mechanism, you can store the passwords using the mechanism's own password scheme.

With success/failure password databases (e.g. PAM) it's not possible to use non-plaintext mechanisms at all, because they only support verifying a known plaintext password.

Dovecot supports the following non-plaintext mechanisms:

CRAM-MD5 : Protects the password in transit against eavesdroppers. Somewhat good support in clients.

DIGEST-MD5 : Somewhat stronger cryptographically than CRAM-MD5, but clients rarely support it.

APOP: This is a POP3-specific authentication. Similar to CRAM-MD5, but requires storing password in plaintext.

NTLM : Mechanism created by Microsoft and supported by their clients.

o Optionally supported using Samba's winbind.

GSS-SPNEGO : Similar to NTLM.

GSSAPI : Kerberos v5 support.

RPA: Compuserve RPA authentication mechanism. Similar to DIGEST-MD5, but client support is rare.

ANONYMOUS: Support for logging in anonymously. This may be useful if you're intending to provide a publicly accessible IMAP archive.

OTP and SKEY: One time password mechanisms. Supported only by Dovecot v1.1 and later.

EXTERNAL: EXTERNAL SASL mechanism. Supported only by Dovecot v1.2 and later.
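As an illustration of why non-plaintext mechanisms need the plaintext password (or a mechanism-specific hash), the following Python sketch computes a CRAM-MD5 client response as specified in RFC 2195. The server has to compute the same keyed digest from its stored secret, which a DES or MD5 crypt hash cannot provide. The function name is an assumption made for this example.

```python
import hashlib
import hmac


def cram_md5_response(username: str, password: str, challenge: str) -> str:
    """Return the client response: username, a space, and the hex HMAC-MD5 digest.

    The password is the HMAC key and the server's challenge is the message.
    """
    digest = hmac.new(password.encode("utf-8"), challenge.encode("utf-8"), hashlib.md5)
    return f"{username} {digest.hexdigest()}"


if __name__ == "__main__":
    # Values taken from the worked example in RFC 2195.
    print(cram_md5_response(
        "tim",
        "tanstaaftanstaaf",
        "<1896.697170952@postoffice.reston.mci.net>",
    ))
```

Only the digest crosses the network, so an eavesdropper never sees the password itself, but the server cannot verify the digest without knowing the key.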

Configuration

By default only PLAIN mechanism is enabled. You can change this by modifying dovecot.conf:

auth default {
  mechanisms = plain login cram-md5
  # ..
}

SSL

SSL and TLS terms are often used in confusing ways:

SSL (Secure Sockets Layer) is the original protocol implementation. SSLv3 is still allowed by Dovecot, but it's rarely used. Some clients use SSL to mean that they're going to connect to the imaps (993), pop3s (995) or smtps (465) port, although they're still going to use TLSv1 protocol.

TLS (Transport Layer Security) replaced the SSL protocol. TLSv1 protocol is used practically always nowadays. Some clients use TLS to mean that they're going to use STARTTLS command after connecting to the standard imap (143), pop3 (110) or smtp port (25/587). Nothing would prevent using SSLv3 protocol after STARTTLS command.
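The two connection styles described above can be sketched from the client side with Python's standard imaplib module. This is only an illustration: the helper names are assumptions, the host argument is whatever server you test against, and nothing here is specific to Dovecot.

```python
import imaplib
import ssl


def connect_imaps(host: str) -> imaplib.IMAP4_SSL:
    """Dedicated imaps port (993): the TLS handshake happens immediately."""
    return imaplib.IMAP4_SSL(host, imaplib.IMAP4_SSL_PORT,
                             ssl_context=ssl.create_default_context())


def connect_starttls(host: str) -> imaplib.IMAP4:
    """Standard imap port (143): connect in plaintext, then upgrade with STARTTLS."""
    conn = imaplib.IMAP4(host, imaplib.IMAP4_PORT)
    conn.starttls(ssl_context=ssl.create_default_context())
    return conn
```

In both cases the session ends up encrypted before any credentials are sent; the difference is only whether encryption starts before or after the first protocol exchange.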

Using two separate ports for plaintext and SSL connections was thought to be wasteful, so STARTTLS was intended to deprecate the SSL ports (imaps, pop3s, smtps, etc). This never really happened, probably for the following reasons:

Some admins don't even know about STARTTLS.

Some admins want to require SSL/TLS, but don't realize that this is also possible with STARTTLS (Dovecot has disable_plaintext_auth=yes and ssl=required settings).

Some admins understand everything, but still prefer to allow only SSL ports. This could be because it makes it easier to ensure that no information is leaked, because the SSL/TLS handshake happens immediately.

Some clients unfortunately try to do plaintext authentication without STARTTLS, even when the IMAP server has told the client that it won't work.

Unfortunately there doesn't seem to be any clear and simple way to refer to these different meanings.

SSL term is much more widely understood than TLS, so Dovecot configuration and this documentation only talks about SSL when in fact it means both SSL/TLS.

Self-signed SSL certificates

Self-signed SSL certificates are the easiest way to get your SSL server working. However unless you take some action to prevent it, this is at the cost of security:

The first time the client connects to the server, it sees the certificate and asks the user whether to trust it. The user of course doesn't really bother verifying the certificate's fingerprint, so a man-in-the-middle attack can easily bypass all the SSL security, steal the user's password and so on.

If the client was lucky enough not to get attacked the first time it connected, the following connections will be secure as long as the client had permanently saved the certificate. Some clients do this, while others have to be manually configured to accept the certificate.

The only way to be fully secure is to import the SSL certificate to client's (or operating system's) list of trusted CA certificates prior to first connection. See SSL/CertificateClientImporting how to do it for different clients.

Self-signed certificate creation

Dovecot includes a script to build self-signed SSL certificates using OpenSSL. In the source distribution this exists in doc/mkcert.sh. Binary installations usually create the certificate automatically when installing Dovecot and don't include the script.

The SSL certificate's configuration is taken from doc/dovecot-openssl.cnf file. Modify the file before running mkcert.sh. Especially important field is the CN (Common Name) field, which should contain your server's host name. The clients will verify that the CN matches the connected host name, otherwise they'll say the certificate is invalid. It's also possible to use wildcards (eg. *.domain.com) in the host name. They should work with most clients.

By default the certificate is created to /etc/ssl/certs/dovecot.pem and the private key file is created to /etc/ssl/private/dovecot.pem. Also by default the certificate will expire in 365 days. If you wish to change any of these, modify the mkcert.sh script.

Certificate Authorities

The correct way to use SSL is to have each SSL certificate signed by a Certificate Authority (CA). The client has a list of trusted Certificate Authorities, so whenever it sees a new SSL certificate signed by a trusted CA, it will automatically trust the new certificate without asking the user any questions.

There are two ways to get a CA signed certificate: buy it, or create your own CA. The clients have a built-in list of trusted CAs, so buying from one of those CAs will have the advantage of the certificate working without any client configuration. If you create your own CA, you'll have to install the CA certificate to all the clients (see SSL/CertificateClientImporting).

There are several tools for managing your own CA. The simplest way is to use a CA management tool such as gnoMint or TinyCA. However, if you need to tailor the properties of the CA, you can always use OpenSSL directly, which is highly customizable but somewhat cumbersome.

Dovecot is a high performance, secure, and fully standards-compliant IMAP/POP3 server. It also boasts a much simpler configuration setup than other IMAP servers and has a broad variety of authentication mechanisms. It also supports SSL and TLS encryption.

Many distributions are now available with Dovecot included; it may not be the default IMAP/POP3 server, but it is usually a simple install command away.

Once you have installed Dovecot, the configuration file will most likely be /etc/dovecot.conf. Many of the defaults are likely sufficient and will require few changes unless you need specific locations for the mail spool, want to change the default authentication options, and so forth.

By default, Dovecot will only act as an IMAP server, but it can act as a POP3 server as well. To do this, edit /etc/dovecot.conf and look for the protocols section:

protocols = pop3

This would tell Dovecot to act as a pure POP3 server. If you want to provide the full gamut of POP3 and IMAP, with both the regular and SSL variants, use:

protocols = pop3 pop3s imap imaps

To use SSL, you will need to appropriately set the ssl_cert_file and ssl_key_file settings, and set ssl_disable to no. The simplest way to get these certificates is to use the mkcert.sh script that Dovecot comes with. On Mandriva Linux, this file is stored in /usr/share/doc/dovecot/. There is also a dovecot-openssl.cnf file that you will want to edit to set the SSL certificate options. Depending on where you wish to store the certificate and key file, you may want to edit mkcert.sh as well, or change the SSLDIR variable to override the location:

# cd /usr/share/doc/dovecot
# vim dovecot-openssl.cnf
# mkdir -p /etc/ssl/dovecot/{certs,private}
# SSLDIR=/etc/ssl/dovecot sh mkcert.sh
Generating a 1024 bit RSA private key
..++++++
..................++++++
writing new private key to '/etc/ssl/dovecot/private/dovecot.pem'
-----
subject= /C=CA/ST=Alberta/L=Edmonton/O=Foo Company/OU=IMAP server/CN=example.com/[email protected]
Fingerprint=9A:23:B8:B4:0E:16:06:11:B2:FE:4E:49:C8:A8:C2:87:D8:79:1B:82

Next, edit /etc/dovecot.conf again and set the following:

ssl_disable = no
ssl_cert_file = /etc/ssl/dovecot/certs/dovecot.pem
ssl_key_file = /etc/ssl/dovecot/private/dovecot.pem

Now restart dovecot and it will authenticate against the system for users, using PAM. Dovecot does support virtual users as well, which makes it quite versatile. More information on configuring Dovecot and all the other features it provides is available on the Dovecot wiki.

Dovecot LDA

The Dovecot LDA, called deliver, is a local delivery agent which takes mail from an MTA and delivers it to a user's mailbox, while keeping Dovecot index files up to date.

This page describes the common settings required to make deliver work. You should read it first, and then the MTA specific pages:

LDA/Postfix
LDA/Exim
LDA/Sendmail
LDA/Qmail
LDA/ZMailer

Main features of Dovecot LDA

Mailbox indexing during mail delivery, providing faster mailbox access later

Quota enforcing by a plugin

Sieve language support by a plugin

o Mail filtering

o Mail forwarding

o Vacation auto-reply

Common configuration

The configuration is done in the protocol lda section in dovecot.conf. The important settings are:

postmaster_address is used as the From: header address in bounce mails

hostname is used in generated Message-IDs and in Reporting-UA: header in bounce mails

sendmail_path is used to send mails. Note that the default is /usr/lib/sendmail, which doesn't necessarily work the same as /usr/sbin/sendmail.

auth_socket_path specifies the UNIX socket to dovecot-auth where deliver can lookup userdb information when -d parameter is used. See below how to configure Dovecot to create the socket.

Note that the dovecot.conf file must be world readable so that the deliver process can read it while running with user privileges.

Parameters

Parameters accepted by deliver:

-d <username>: Destination username. If given, the user information is looked up from dovecot-auth. Typically used with virtual users, but not necessarily with system users.

-a <address>: Destination address (e.g. user+ext@domain). Default is the same as username. (v1.1+ only)

-f <address>: Envelope sender address.

-c <path>: Alternative configuration file path.

-m <mailbox>: Destination mailbox (default is INBOX). If the mailbox doesn't exist, it's created (unless -n is used). If message couldn't be saved to the mailbox for any reason, it's delivered to INBOX instead.

o If Sieve plugin is used, this mailbox is used as the "keep" action's mailbox. It's also used if there is no Sieve script or if the script fails for some reason.

o v1.1: Deliveries to namespace prefix will result in saving the mail to INBOX instead. For example if you have "Mail/" namespace, this allows you to specify deliver -n -m Mail/$mailbox where mail is stored to Mail/$mailbox or to INBOX if $mailbox is empty.

o The mailbox name is specified the same as it's visible in IMAP client. For example if you've a Maildir with .box.sub/ directory and your namespace configuration is prefix=INBOX/, separator=/, the correct way to deliver mail there is to use -m INBOX/box/sub

-n: If the destination mailbox doesn't exist, don't create it. This affects both -m parameter and fileinto action in Sieve scripts. The fallback is to deliver mail to INBOX.

-s: Subscribe to mailboxes that are automatically created (via -m parameter or fileinto Sieve action). (v1.1.3+)

-e: If mail gets rejected, write the rejection reason to stderr and exit with EX_NOPERM. The default is to send a rejection mail ourself (v1.0.1+).

-k: Don't clear all environment at startup (v1.1+).

-p <path>: Path to the mail to be delivered instead of reading from stdin. If using maildir the file is hard linked to the destination if possible. This allows a single mail to be delivered to multiple users using hard links, but currently it also prevents deliver from updating cache file so it shouldn't be used unless really necessary. (v1.1+)
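As a rough illustration of how the parameters above combine, this Python sketch assembles a deliver command line and feeds a message on stdin. Only flags listed above are used; the helper names and their keyword arguments are assumptions made for the example, and the binary path is the one used elsewhere on this page, which may differ on your system.

```python
import subprocess

# Path as used elsewhere on this page; adjust for your installation.
DELIVER = "/usr/local/libexec/dovecot/deliver"


def build_deliver_argv(sender, username=None, mailbox=None,
                       create_missing=True, subscribe=False,
                       deliver_path=DELIVER):
    """Assemble the argument list for one delivery (hypothetical helper)."""
    argv = [deliver_path, "-f", sender]
    if username is not None:
        argv += ["-d", username]   # look the user up via dovecot-auth
    if mailbox is not None:
        argv += ["-m", mailbox]    # destination mailbox instead of INBOX
    if not create_missing:
        argv.append("-n")          # don't create a missing mailbox
    if subscribe:
        argv.append("-s")          # subscribe to auto-created mailboxes
    return argv


def run_deliver(argv, message: bytes) -> int:
    """Pass the message on stdin and return deliver's exit status."""
    return subprocess.run(argv, input=message, check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems with addresses and mailbox names that contain spaces or special characters.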

Return values

deliver will exit with one of the following values:

0 (EX_OK): Delivery was successful.

64 (EX_USAGE): Invalid parameter given.

67 (EX_NOUSER): The destination username was not found.

78 (EX_CONFIG): Failed to read configuration file, a missing configuration setting or deliver binary is setuid-root and world-executable. (v1.2+ no longer uses this.)

77 (EX_NOPERM): -e parameter was used and mail was rejected. Typically this happens when user is over quota and quota_full_tempfail=no.

75 (EX_TEMPFAIL): A temporary failure. This is returned for almost all failures. See the log file for details.
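A wrapper script that calls deliver might translate these exit statuses into an action, as in the Python sketch below. The numeric values and names come from the list above; the "delivered" / "permanent" / "retry" classification and the choice to retry on unknown codes are assumptions of this example, not documented behaviour of deliver or of any MTA.

```python
# Exit statuses from the list above, with an assumed handling policy.
DELIVER_EXIT_CODES = {
    0: ("EX_OK", "delivered"),
    64: ("EX_USAGE", "permanent"),
    67: ("EX_NOUSER", "permanent"),
    78: ("EX_CONFIG", "permanent"),
    77: ("EX_NOPERM", "permanent"),
    75: ("EX_TEMPFAIL", "retry"),
}


def classify_exit(status: int):
    """Return (name, action); unknown codes are treated as retryable (assumption)."""
    return DELIVER_EXIT_CODES.get(status, ("UNKNOWN", "retry"))


if __name__ == "__main__":
    for code in (0, 67, 75, 1):
        print(code, classify_exit(code))
```

Treating unknown statuses as temporary errs on the side of keeping mail queued rather than bouncing it.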

System users

You can use deliver with a few selected system users (ie. user is found from /etc/passwd / NSS) by calling deliver in the user's ~/.forward file:

| "/usr/local/libexec/dovecot/deliver"

This should work with any MTA which supports per-user .forward files. For qmail's per-user setup, see LDA/Qmail.

This method doesn't require the authentication socket explained below since it's executed as the user itself.

Virtual users

With a lookup

Give the destination username to deliver with -d parameter, for example:

deliver -f $FROM_ENVELOPE -d $DEST_USERNAME

You'll need to set up a master authentication socket for deliver so it knows where to find mailboxes for the users:

protocol lda {
  ..
  # UNIX socket path to master authentication server to find users.
  #auth_socket_path = /var/run/dovecot/auth-master
}

auth default {
  ..
  socket listen {
    # Note that we're setting a master socket. SMTP AUTH for Postfix and Exim uses client sockets.
    master {
      # Typically under base_dir/, if not the directory must be created.
      path = /var/run/dovecot/auth-master
      # Auth master socket can be used to look up userdb information for
      # given usernames. This probably isn't very sensitive information
      # for most systems, but still try to restrict the socket access if possible.
      mode = 0600
      user = vmail # User running deliver
      #group = mail # Or alternatively mode 0660 + deliver user in this group
    }
  }
  ..
}

The master socket can be used to do userdb lookups for given usernames. Typically the result will contain the user's UID, GID and home directory, but depending on your configuration it may return other information as well. So the information is similar to what can be found from eg. /etc/passwd for system users. This means that it's probably not a problem to use mode=0666 for the socket, but you should try to restrict it more just to be safe.

Without a lookup

If you have already looked up the user's home directory and you don't need a userdb lookup for any other reason either (such as overriding settings for specific users), you can run deliver similar to how it's run for system users:

HOME=/path/to/user/homedir deliver -f $FROM_ENVELOPE

This way you don't need to have a master listener socket. Note that you should verify the user's existence prior to running deliver, otherwise you'll end up having mail delivered to non-existing users as well.

You must have set the proper UID (and GID) before running deliver. It's not possible to run deliver as root without -d parameter.

Multiple UIDs

If you're using more than one UID for users, you're going to have problems running deliver, as most MTAs won't let you run deliver as root. There are two ways to work around this problem:

1. Make deliver setuid-root.

2. Use sudo to wrap the invocation of deliver.

Making deliver setuid-root:

Beware: it's insecure to make deliver setuid-root, especially if you have untrusted users in your system. Setuid-root deliver can be used to gain root privileges. You should take extra steps to make sure that untrusted users can't run it and potentially gain root privileges. You can do this by making sure only your MTA has execution access to it. For example:

# chgrp secmail /usr/local/libexec/dovecot/deliver
# chmod 04750 /usr/local/libexec/dovecot/deliver
# ls -l /usr/local/libexec/dovecot/deliver
-rwsr-x--- 1 root secmail 4023932 2009-01-15 16:23 deliver

Then start deliver as a user that belongs to secmail group. Note that you have to recreate these rights after each update of dovecot.

Using sudo:

Alternatively, you can use sudo to wrap the invocation of deliver. This has the advantage that updates will not clobber the setuid bit, but note that it is just as insecure being able to run deliver via sudo as setuid-root. Make sure you only give your MTA the ability to invoke deliver via sudo.

First configure sudo to allow 'dovelda' user to invoke deliver by adding the following to your /etc/sudoers:

Defaults:dovelda !syslog
dovelda ALL=NOPASSWD:/usr/local/libexec/dovecot/deliver

Then configure your MTA to invoke deliver as user 'dovelda' and via sudo:

/usr/bin/sudo /usr/local/libexec/dovecot/deliver

instead of just plain /usr/local/libexec/dovecot/deliver.

Problems with deliver

Namespaces are supported with v1.1 and later. With v1.0 and older versions mails can be delivered only to mailboxes specified by the mail_location setting.

If you are using prefetch userdb, keep in mind that deliver does not make a password query and thus will not work if -d parameter is used. The UserDatabase/Prefetch page explains how to fix this.

o See Checkpassword for how to make deliver work with checkpassword.

Logging

Normally Dovecot logs everything through its master process, which is running as root. Deliver doesn't, which means that you might need some special configuration for it to log anything at all.

If deliver fails to write to log files it exits with temporary failure.

If you have trouble finding where Dovecot logs by default, see Logging.

Note that Postfix's mailbox_size_limit setting applies to all files that are written to. So if you have a limit of 50 MB, deliver can't write to log files larger than 50 MB and you'll start getting temporary failures.

If you want deliver to keep using Dovecot's default log files:

If you're logging to syslog, make sure the syslog socket (usually /dev/log) has enough write permissions for deliver. For example set it world-read/writable: chmod a+rw /dev/log.

If you're logging to Dovecot's default log files again you'll need to give enough write permissions to the log files for deliver.

You can also specify different log files for deliver. This way you don't have to give any extra write permissions to other log files or the syslog socket. You can do this by overriding the log_path and info_log_path settings:

protocol lda {
  ..
  # remember to give proper permissions for these files as well
  log_path = /var/log/dovecot-deliver-errors.log
  info_log_path = /var/log/dovecot-deliver.log
}

For using syslog with deliver, set the paths empty:

protocol lda {
  ..
  log_path =
  info_log_path =
  # You can also override the default syslog_facility:
  #syslog_facility = mail
}

Plugins

Most of the Dovecot plugins work with deliver.

Virtual quota can be enforced using Quota plugin.

Sieve language support can be added with Sieve plugin.


Maildir

This format debuted with the qmail server in the mid-1990s. Each mailbox folder is a directory and each message a file. This improves efficiency because individual emails can be modified, deleted and added without affecting the mailbox or other emails, and makes it safer to use on networked file systems such as NFS.

Contents

1. Maildir
   1. Dovecot extensions
      1. IMAP UID mapping
      2. IMAP keywords
      3. Maildir filename extensions
   2. Maildir and filesystems
      1. General comparisons of Maildir on different filesystems
      2. Linux ext2 / ext3
      3. ReiserFS
      4. XFS
         1. Various tips
   3. Directory Structure
   4. Issues with the specification
      1. Locking
      2. Mail delivery
   5. Procmail Problems
   6. References

Dovecot extensions

Since the standard maildir specification doesn't provide everything needed to fully support the IMAP protocol, Dovecot had to create some of its own non-standard extensions. The extensions still keep the maildir standards compliant, so MUAs not supporting the extensions can still safely use it as a normal maildir.

IMAP UID mapping

IMAP requires each message to have a permanent unique ID number. Dovecot uses dovecot-uidlist file to keep UID <-> filename mapping. The file is basically in the same format as Courier IMAP's courierimapuiddb file, except for one difference (see below).

The file begins with a header:

1 1173189136 20221

Where 1 means the file format version number, 1173189136 is the IMAP UIDVALIDITY and 20221 is the UID that will be given to the next added message. The version number is always 1 currently. Dovecot used to have version number 2 also for a while, so if the number is ever increased it needs to become version 3.

After the header comes the list of UID <-> filename mappings:

123 1035478339.27041_118.foo.org
20220 1035478339.27041_118.foo.org:2,S

Because with maildir the filename changes every time the message's flags change, the filename listed in the file doesn't necessarily exist. With Courier IMAP the filenames contained only the maildir file's basename (ie. everything before ":2," string). Dovecot instead writes the file's last known full filename. Usually this allows opening the file without reading the directory's contents to find the file's current file name.
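The header and mapping lines described above can be read with a few lines of Python. This is a minimal sketch that assumes only the version 1 layout described here (a three-number header followed by "UID filename" lines); the class and function names are made up for the example and it is not Dovecot's own parser.

```python
from dataclasses import dataclass, field


@dataclass
class UidList:
    version: int
    uidvalidity: int
    next_uid: int
    entries: dict = field(default_factory=dict)  # UID -> last known filename


def parse_uidlist(text: str) -> UidList:
    """Parse dovecot-uidlist content in the version 1 layout described above."""
    lines = [line for line in text.splitlines() if line.strip()]
    version, uidvalidity, next_uid = (int(part) for part in lines[0].split())
    if version != 1:
        raise ValueError(f"unsupported uidlist version: {version}")
    result = UidList(version, uidvalidity, next_uid)
    for line in lines[1:]:
        uid, filename = line.split(" ", 1)
        result.entries[int(uid)] = filename
    return result


def maildir_basename(filename: str) -> str:
    """Strip the ':2,<flags>' suffix, leaving the part that never changes."""
    return filename.split(":2,", 1)[0]
```

Because the recorded filename may be stale after a flag change, a caller would try the stored name first and fall back to matching on maildir_basename() when the file is not found.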

The dovecot-uidlist file doesn't need to be locked for reading. When writing, a dovecot-uidlist.lock file needs to be created. The dovecot-uidlist file must never be modified directly; it can only be replaced with a rename() call.
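The lock-then-rename rule can be illustrated with a generic Python sketch. It shows one common way to follow such a rule (write the new content into the exclusively created .lock file, then rename it over the target); whether Dovecot does exactly this internally is not stated here, and the function name is an assumption.

```python
import contextlib
import os


def replace_file_atomically(path: str, new_content: str) -> None:
    """Replace path with new_content using a '<path>.lock' file and rename()."""
    lock_path = path + ".lock"
    # O_EXCL makes creation fail if another writer already holds the lock.
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as lock_file:
            lock_file.write(new_content)
            lock_file.flush()
            os.fsync(lock_file.fileno())
        # os.replace() is rename() on POSIX: readers see either the old file
        # or the complete new one, and the lock disappears in the same step.
        os.replace(lock_path, path)
    except BaseException:
        # Give up our own lock if writing or renaming failed.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(lock_path)
        raise
```

Readers never need the lock because the file they open is always a complete version, which matches the rule that the file is only ever replaced and never edited in place.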

dovecot-uidlist is updated lazily to optimize for disk I/O. If a message is expunged, it may not be removed from dovecot-uidlist until some time later. This means that if you create a new file using the same file name as one that already exists in dovecot-uidlist, Dovecot thinks you "unexpunged" a message by restoring it from backup. This causes a warning to be logged and the file to be renamed.

Note that messages must not be modified once they've been delivered. IMAP (and Dovecot) requires that messages are immutable. If you wish to modify them in any way, create a new message instead and expunge the old one.

IMAP keywords

All the non-standard message flags are called keywords in IMAP. Some clients use these automatically for marking spam (eg. $Junk, $NonJunk, $Spam, $NonSpam keywords). Thunderbird uses labels which map to keywords $Label1, $Label2, etc.

Dovecot stores keywords in the maildir filename's flags field using letters a..z. This means that only 26 keywords are possible to store in the maildir. If more are used, they're still stored in Dovecot's index files. The mapping from single letters to keyword names is stored in dovecot-keywords file. The file is in format:

0 $Junk
1 $NonJunk

0 means letter 'a' in the maildir filename, 1 means 'b' and so on. The file doesn't need to be locked for reading, but when writing, the dovecot-uidlist file must be locked. The file must not be modified directly; it can only be replaced with a rename() call.
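The index-to-letter mapping can be sketched in Python as follows. The sketch assumes the "index name" line layout described above and ignores indexes outside 0..25, since only the letters a..z can appear in a filename; the function names are illustrative and not part of Dovecot.

```python
import string


def parse_keywords(text: str) -> dict:
    """Map maildir flag letters ('a'..'z') to keyword names."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        index, name = line.split(None, 1)
        idx = int(index)
        if 0 <= idx < 26:  # 0 -> 'a', 1 -> 'b', ... 25 -> 'z'
            mapping[string.ascii_lowercase[idx]] = name.strip()
    return mapping


def keywords_in_filename(filename: str, mapping: dict) -> list:
    """Return the keywords encoded as lowercase letters in the flags field."""
    if ":2," not in filename:
        return []
    flags = filename.rsplit(":2,", 1)[1].split(",", 1)[0]
    return [mapping[letter] for letter in flags if letter in mapping]
```

Uppercase letters in the flags field are the standard maildir flags, so they are skipped here and only the lowercase keyword letters are translated.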

Maildir filename extensions

The standard filename definition is: "<base filename>:2,<flags>". Dovecot has extended the <flags> field to be "<flags>[,<non-standard fields>]". This means that if Dovecot sees a comma in the <flags> field while updating flags in the filename, it doesn't touch anything after the comma. However other maildir MUAs may mess them up, so it's still not such a good idea to do that. Basic <flags> are described here. The <non-standard fields> isn't used by Dovecot for anything currently.

Dovecot supports reading a few fields from the <base filename>:

,S=<size>: <size> contains the file size. Getting the size from the filename avoids doing a stat(), which may improve the performance. This is especially useful with Maildir++ quota.

,W=<vsize>: <vsize> contains the file's RFC822.SIZE, ie. the file size with linefeeds being CR+LF characters. If the message was stored with CR+LF linefeeds, <size> and <vsize> are the same. Setting this may give a small speedup because now Dovecot doesn't need to calculate the size itself.

A maildir filename with those fields would look something like: 1035478339.27041_118.foo.org,S=1000,W=1030:2,S
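As a small illustration, the Python function below pulls the S= and W= fields and the flags out of a filename like the one above without touching the file itself. It is a sketch of the naming scheme as described on this page; the function name and the returned dictionary keys are assumptions.

```python
def parse_maildir_filename(filename: str) -> dict:
    """Split a maildir filename into unique part, S=/W= sizes and flags."""
    base, _, flags_field = filename.partition(":2,")
    parts = base.split(",")
    info = {
        "unique": parts[0],
        "size": None,    # ,S=<size>: file size in bytes
        "vsize": None,   # ,W=<vsize>: size with CR+LF linefeeds
        # Anything after a comma in the flags field is a non-standard field.
        "flags": flags_field.split(",", 1)[0],
    }
    for part in parts[1:]:
        key, _, value = part.partition("=")
        if key == "S" and value.isdigit():
            info["size"] = int(value)
        elif key == "W" and value.isdigit():
            info["vsize"] = int(value)
    return info


if __name__ == "__main__":
    print(parse_maildir_filename("1035478339.27041_118.foo.org,S=1000,W=1030:2,S"))
```

Reading sizes this way needs only a directory listing, which is where the saving over a stat() call per message comes from.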

Maildir and filesystems

General comparisons of Maildir on different filesystems

http://www.thesmbexchange.com/eng/qmail_fs_benchmark.html

http://www.htiweb.inf.br/benchmark/fsbench.htm (including some graphs)

Linux ext2 / ext3

The main disadvantage is that searching can be slightly slower, and access to very large mailboxes (thousands of messages) can get slow with filesystems which don't have directory indexes.

Old versions of ext2 and ext3 on Linux don't support directory indexing (to speed up access), but newer versions of ext3 do, although you may have to manually enable it. You can check if the indexing is already enabled with tune2fs:

tune2fs -l /dev/hda3 | grep features

If you see dir_index, you're all set. If dir_index is missing, add it using:

umount /dev/hda3
tune2fs -O dir_index /dev/hda3
e2fsck -fD /dev/hda3
mount /dev/hda3

ReiserFS

ReiserFS was built to be fast with lots of small files, so it works well with maildir.

XFS

XFS performance seems to depend on many factors, including the system and the file system parameters.


There are early reports on the dovecot mailing list which suggest that XFS seems quite a lot slower than ext3 or ReiserFS: http://www.dovecot.org/list/dovecot/2007-January/018994.html

But then again others recommend XFS for the use with Maildir and dovecot: http://www.dovecot.org/list/dovecot/2006-May/013216.html

This 2007 Linux.conf.au talk about "Choosing and Tuning Linux File Systems" (slides as PDF: http://mirror.linux.org.au/pub/linux.conf.au/2007/video/talks/348.pdf) also recommends XFS for Maildir (alternatively ext3 with small blocks and a high inode-to-file ratio).

Someone else wrote here in the wiki: XFS on TSL 3.0.5 works almost twice as fast as our prior EXT3 installation of which is significant in size. ReiserFS is also a good option.

Comparisons which suggest XFS as being best choice:

o http://www.thesmbexchange.com/eng/qmail_fs_benchmark.html

o http://www.htiweb.inf.br/benchmark/fsbench.htm

Various tips

Mounting XFS with logbufs=8 option might increase the speed.

Create the XFS with options -b size=1024 -d su=16k,sw=3 -l logdev=<some_other_device> (Source: http://www.thesmbexchange.com/eng/qmail_fs_benchmark.html)

Use mkfs.xfs -f -l size=32768b,version=2 and mount.xfs -o noatime,logbufs=8,logbsize=131072 (Source: http://www.htiweb.inf.br/benchmark/fsbench.htm)

Directory Structure

Dovecot uses Maildir++ directory layout for organizing mailbox directories. This means that all the folders are directly inside ~/Maildir directory:

~/Maildir/new, ~/Maildir/cur and ~/Maildir/tmp directories contain the messages for INBOX. The tmp directory is used during delivery, new messages arrive in new, and read messages are moved to cur by the clients.

~/Maildir/.folder/ is a mailbox folder

~/Maildir/.folder.subfolder/ is a subfolder of a folder (ie. "folder/subfolder")

Most importantly this means that if your maildir folders exist in eg. ~/Maildir/folder and ~/Maildir/folder/subfolder, Dovecot won't see them unless you rename them to Maildir++ layout. v1.1 supports them by adding :LAYOUT=fs to mail_location.

subscriptions file contains IMAP's mailbox subscriptions. (Note difference with Mbox.)
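
The Maildir++ naming rule above can be sketched in a few lines of Python. The function name maildirpp_path is hypothetical, and the sketch assumes "/" as the IMAP hierarchy separator (as in the "folder/subfolder" example); a real server's separator depends on its namespace configuration.

```python
import os


def maildirpp_path(root, mailbox):
    """Map an IMAP mailbox name to its Maildir++ directory.

    Hypothetical helper; assumes "/" is the hierarchy separator.
    """
    # INBOX lives directly in the root (its new/, cur/ and tmp/ directories).
    if mailbox.upper() == "INBOX":
        return root
    # Every other folder is a dot-prefixed directory directly inside the
    # root, with hierarchy levels joined by dots instead of nested dirs.
    return os.path.join(root, "." + mailbox.replace("/", "."))


print(maildirpp_path("/home/user/Maildir", "folder/subfolder"))
```

This also shows why a nested ~/Maildir/folder/subfolder directory is invisible in this layout: the lookup only ever resolves to names directly under the root.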

Issues with the specification


Locking

Although maildir was designed to be lockless, Dovecot locks the maildir while doing modifications to it or while looking for new messages in it. This is required because otherwise Dovecot might temporarily see mails incorrectly deleted, which would cause trouble. Basically the problem is that if one process modifies the maildir (eg. a rename() to change a message's flag), another process in the middle of listing files at the same time could skip a file. The skipping happens because readdir() system call doesn't guarantee that all the files are returned if the directory is modified between the calls to it. This problem exists with all the commonly used filesystems.

Because Dovecot uses its own non-standard locking (dovecot-uidlist.lock dotlock file), other MUAs accessing the maildir don't support it. This means that if another MUA is updating messages' flags or expunging messages, Dovecot might temporarily lose track of a message. After the next sync, when it finds the message again, an error message may be written to the log and the message will receive a new UID.

Delivering mails to new/ directory doesn't have any problems, so there's no need for LDAs to support any type of locking.

Mail delivery

Qmail's "how a message is delivered" page suggests delivering the mail like this:

1. Create a unique filename (only "time.pid.host" here, later Maildir spec has been updated to allow more uniqueness identifiers)

2. Do stat(tmp/<filename>). If the stat() found a file, wait 2 seconds and go back to step 1.

3. Create and write the message to the tmp/<filename>.

4. link() it into new/ directory. Although not mentioned here, the link() could again fail if the mail existed in new/ dir. In that case you should probably go back to step 1.

All this trouble is rather pointless. Only the first step is what really guarantees that the mails won't get overwritten, the rest just sounds nice. Even though they might catch a problem once in a while, they give no guaranteed protection and will just as easily pass duplicate filenames through and overwrite existing mails.

Step 2 is pointless because there's a race condition between steps 2 and 3. PID/host combination by itself should already guarantee that it never finds such a file. If it does, something's broken and the stat() check won't help since another process might be doing the same thing at the same time, and you end up writing to the same file in tmp/, causing the mail to get corrupted.

In step 4 the link() would fail if an identical file already existed in the maildir, right? Wrong. The file may already have been moved to cur/ directory, and since it may contain any number of flags by then you can't check with a simple stat() anymore if it exists or not.

http://www.qmail.org/man/man5/maildir.html

Step 2 was pointed out to be useful if clock had moved backwards. However again this doesn't give any actual safety guarantees, because an identical base filename could already exist in cur/. Besides if the system was just rebooted, the file in tmp/ could probably be even overwritten safely (assuming it wasn't already link()ed to new/).

So really, all that's important in not getting mails overwritten in your maildir is the step 1: Always create filenames that are guaranteed to be unique. Forget about the 2 second waits and such that the Qmail's man page talks about.
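
A minimal Python sketch of such a delivery, relying only on a unique filename, could look like the following. The names unique_name and deliver are hypothetical, and the time.pid_counter.host pattern is just one assumed way of getting uniqueness, modelled on the example filename shown earlier.

```python
import itertools
import os
import socket
import time

# Per-process counter: keeps names unique even when one process
# delivers several mails within the same second.
_counter = itertools.count()


def unique_name():
    # "/" and ":" must not appear in a maildir filename, so they are
    # replaced with their octal escapes as the maildir spec suggests.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    return "%d.%d_%d.%s" % (int(time.time()), os.getpid(), next(_counter), host)


def deliver(maildir, message_bytes):
    """Write the message to tmp/ and link it into new/ (no waits, no stat)."""
    name = unique_name()
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    # O_EXCL makes the create fail loudly instead of overwriting a file
    # if the name turns out not to be unique after all.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(message_bytes)
        f.flush()
        os.fsync(f.fileno())
    os.link(tmp_path, new_path)
    os.unlink(tmp_path)
    return new_path
```

The sketch deliberately has no stat() check and no 2 second wait; as argued above, the uniqueness of the generated name is what actually protects existing mails.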

Procmail Problems

Maildir format is somewhat compatible with MH format. This is sometimes a problem when people configure their procmail to deliver mails to Maildir/new. This makes procmail create the messages in MH format, which basically means that the file is called msg.inode_number. While this appears to work at first, after expunging messages from the maildir the inodes are freed and will be reused later. This means that another file with the same name may arrive in the maildir, which makes Dovecot think that an expunged file reappeared in the mailbox, and an error is logged.

The proper way to configure procmail to deliver to a Maildir is to use Maildir/ as the destination.

Mbox Mailbox Format

Contents

1. Mbox Mailbox Format
   1. Locking
      1. Dotlock
      2. Deadlocks
   2. Directory Structure
   3. Dovecot's Metadata
   4. Dovecot's Speed Optimizations
   5. From Escaping
   6. Mbox Variants
   7. References

Usually UNIX systems are configured by default to deliver mails to /var/mail/username or /var/spool/mail/username mboxes. In IMAP world these files are called INBOX mailboxes. IMAP protocol supports multiple mailboxes however, so there needs to be a place for them as well. Typically they're stored in ~/mail/ or ~/Mail/ directories.

The mbox file contains all the messages of a single mailbox. Because of this, the mbox format is typically thought of as a slow format. However with Dovecot's indexing this isn't true. Only expunging messages from the beginning of a large mbox file is slow with Dovecot, most other operations should be fast. Also because all the mails are in a single file, searching is much faster than with maildir.

Modifications to mbox may require moving data around within the file, so interruptions (eg. power failures) can cause the mbox to break more or less badly. Although Dovecot tries to minimize the damage by moving the data in a way that data should never get lost (only duplicated), mboxes still aren't recommended to be used for important data.

Locking

Locking is a mess with mboxes. There are multiple different ways to lock a mbox, and software often uses incompatible locking. See MboxLocking (http://wiki.dovecot.org/MboxLocking) for how to check what locking methods some commonly used programs use.

There are at least four different ways to lock a mbox:

dotlock: mailboxname.lock file created by almost all software when writing to mboxes. This grants the writer an exclusive lock over the mbox, so it's usually not used while reading the mbox so that other processes can also read it at the same time. So while using a dotlock typically prevents actual mailbox corruption, it doesn't protect against read errors if mailbox is modified while a process is reading.

flock: flock() system call is quite commonly used for both read and write locking. The read lock allows multiple processes to obtain a read lock for the mbox, so it works well for reading as well. The one downside to it is that it doesn't work if mailboxes are stored in NFS.

fcntl: Very similar to flock, also commonly used by software. In some systems this fcntl() system call is compatible with flock(), but in other systems it's not, so you shouldn't rely on it. fcntl works with NFS if you're using lockd daemon in both NFS server and client.

lockf: POSIX lockf() locking. Because it allows creating only exclusive locks, it's somewhat useless so Dovecot doesn't support it. With Linux lockf() is internally compatible with fcntl() locks, but again you shouldn't rely on this.
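
To make the difference between these methods more concrete, here is a small Python sketch of a dotlock, a shared flock() read lock and an exclusive fcntl() write lock. The function names are hypothetical, and the dotlock shown is the simplest possible form (an exclusive create); real implementations typically add retries, stale lock detection and NFS-safe tricks that are omitted here.

```python
import fcntl
import os


def dotlock(mbox_path):
    """Create <mbox>.lock exclusively; raises FileExistsError if it exists."""
    fd = os.open(mbox_path + ".lock", os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.close(fd)


def dotunlock(mbox_path):
    os.unlink(mbox_path + ".lock")


def read_mbox_shared(mbox_path):
    """Read the mbox under a shared flock(); other readers are not blocked."""
    with open(mbox_path, "rb") as f:
        fcntl.flock(f, fcntl.LOCK_SH)
        try:
            return f.read()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def append_fcntl(mbox_path, data):
    """Append under an exclusive fcntl() lock.

    Note: Python's fcntl.lockf() is a wrapper around fcntl() record
    locking, not around the C library's lockf() function.
    """
    with open(mbox_path, "ab") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.write(data)
            f.flush()
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

Because a dotlock is only a file, a reader that does not check for it is never stopped by it, which is the read-error weakness described in the dotlock item above.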

Dotlock

Another problem with dotlocks is that if the mailboxes exist in /var/mail/, the user may not have write access to the directory, so the dotlock file can't be created. There are a couple of ways to work around this:

Give a mail group write access to the directory and then make sure that all software requiring access to the directory runs with the group's privileges. This may mean making the binary itself setgid-mail, or using a separate dotlock helper program which is setgid-mail. With Dovecot this can be done by setting mail_privileged_group = mail.

Set sticky bit to the directory (chmod +t /var/mail). This makes it somewhat safe to use, because users can't delete each other's mailboxes, but they can still create new files (the dotlock files). The downside to this is that users can create whatever files they wish in there, such as a mbox for a newly created user who hasn't yet received mail.

Deadlocks

If multiple lock methods are used, which is usually the case since dotlocks aren't typically used for read locking, the order in which the locking is done is important. Consider if two programs were running at the same time, both use dotlock and fcntl locking but in different order:

Program A: fcntl locks the mbox

Program B at the same time: dotlocks the mbox

Program A continues: tries to dotlock the mbox, but since it's already dotlocked by B, it starts waiting

Program B continues: tries to fcntl lock the mbox, but since it's already fcntl locked by A, it starts waiting

Now both of them are waiting for each other's locks. Finally after a couple of minutes they time out and fail the operation.
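
The usual way to avoid this kind of deadlock is for every program to take its locks in the same order. The following Python sketch illustrates the idea; the particular order in LOCK_ORDER and the function names are assumptions made for the example, and what matters is only that all programs agree on one order.

```python
# Hypothetical agreed order; any fixed order works as long as every
# program accessing the mbox uses the same one.
LOCK_ORDER = ("dotlock", "fcntl", "flock")


def ordered(methods):
    """Return the requested lock methods sorted into the agreed order."""
    return sorted(methods, key=LOCK_ORDER.index)


def acquire_all(methods, acquire, release):
    """Take each lock in the agreed order.

    `acquire` and `release` are caller-supplied callables taking a method
    name. If one acquire fails, locks already taken are released in
    reverse order before the error is re-raised.
    """
    taken = []
    try:
        for method in ordered(methods):
            acquire(method)
            taken.append(method)
    except Exception:
        for method in reversed(taken):
            release(method)
        raise
    return taken


# Programs A and B from the example now both dotlock first:
print(ordered(["fcntl", "dotlock"]))
print(ordered(["dotlock", "fcntl"]))
```

With a shared order, whichever program gets the first lock can always go on to take the second one, so the other program simply waits instead of both waiting for each other.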

Directory Structure

When listing mailboxes, Dovecot simply assumes that all files it sees are mboxes and all directories mean that they contain sub-mailboxes. There are two special cases however which aren't listed:

.subscriptions file contains IMAP's mailbox subscriptions.

.imap/ directory contains Dovecot's index files.

Because it's not possible to have a file which is also a directory, a mailbox can't both hold messages and have child mailboxes under it.

Dovecot's Metadata

Dovecot uses C-Client (ie. UW-IMAP, Pine) compatible headers in mbox messages to store metadata. These headers are:

X-IMAPbase: Contains UIDVALIDITY, last used UID and list of used keywords

X-IMAP: Same as X-IMAPbase but also specifies that the message is a "pseudo message"

X-UID: Message's allocated UID

Status: R (\Seen) and O (non-\Recent) flags

X-Status: A (\Answered), F (\Flagged), T (\Draft) and D (\Deleted) flags

X-Keywords: Message's keywords

Content-Length: Length of the message body in bytes

Whenever any of these headers exist, Dovecot treats them as its own private metadata. It does sanity checks for them, so the headers may also be modified or removed completely. None of these headers are sent to IMAP/POP3 clients when they read the mail. Preferably your LDA should strip all these headers before writing the mail to the mbox.

Only the first message contains the X-IMAP or X-IMAPbase header. The difference is that when all the messages are deleted from mbox file, a "pseudo message" is written to the mbox which contains X-IMAP header. This is the "DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA" message which you hate seeing when using non-C-client and non-Dovecot software. This is however important to prevent abuse, otherwise the first mail which is received could contain faked X-IMAPbase header which could cause trouble.

If a message contains an X-Keywords header, it contains a space-separated list of keywords for the mail. Since the same header can come from the mail's sender, only the keywords that are also listed in the X-IMAP header are used.

The UID for a new message is calculated from "last used UID" in X-IMAP header + 1. This is done always, so fake X-UID headers don't really matter. This is also why the pseudo message is important. Otherwise the UIDs could easily grow over 2^31, which some clients start treating as negative numbers, which then causes all kinds of problems. When 2^32 is exceeded, Dovecot will also start having some problems.

Content-Length is used as long as another valid mail starts after that many bytes. Because the byte count must be exact, it's quite unlikely that abusing it can cause messages to be skipped (or rather appended to the previous message's body).

Status and X-Status headers are trusted completely, so it's a pretty good idea to filter them in the LDA if possible.
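
The flag letters listed above map to IMAP flags roughly as in the following Python sketch. The function name imap_flags is hypothetical and the code is only an illustration of the mapping, not how Dovecot parses mbox headers.

```python
import email

# Letter-to-flag tables taken from the header list above.
STATUS_FLAGS = {"R": "\\Seen"}
XSTATUS_FLAGS = {
    "A": "\\Answered",
    "F": "\\Flagged",
    "T": "\\Draft",
    "D": "\\Deleted",
}


def imap_flags(message_text):
    """Return the set of IMAP flags implied by Status and X-Status."""
    msg = email.message_from_string(message_text)
    status = msg.get("Status", "")
    xstatus = msg.get("X-Status", "")
    flags = {STATUS_FLAGS[c] for c in status if c in STATUS_FLAGS}
    flags |= {XSTATUS_FLAGS[c] for c in xstatus if c in XSTATUS_FLAGS}
    # "O" marks the message as non-\Recent, so its absence means \Recent.
    if "O" not in status:
        flags.add("\\Recent")
    return flags


print(sorted(imap_flags("Status: RO\nX-Status: AF\n\nbody\n")))
```

Since the headers are trusted as they are, a message arriving with a forged "Status: RO" would show up as already seen, which is why filtering them in the LDA is suggested.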

Dovecot's Speed Optimizations

Updating messages' flags and keywords can be a slow operation since you may have to insert a new header (Status, X-Status, X-Keywords) or at least insert data in the header's value. Some mbox MUAs do this simply by rewriting all of the mbox after the inserted data. If the mbox is large, this can be very slow. Dovecot optimizes this by always leaving some space characters after some of its internal headers. It can use this space to move only the minimal amount of data necessary to get the new data inserted. Also if data is removed, it just grows these space areas.

mbox_lazy_writes setting works by adding and/or updating Dovecot's metadata headers only after closing the mailbox or when messages are expunged from the mailbox. C-Client works the same way. The upside of this is that it reduces writes because multiple flag updates to same message can be grouped, and sometimes the writes don't have to be done at all if the whole message is expunged. The downside is that other processes don't notice the changes immediately (but other Dovecot processes do notice because the changes are in index files).

mbox_dirty_syncs setting tries to avoid re-reading the mbox every time something changes. Whenever the mbox changes (ie. timestamp or size), Dovecot first checks if the mailbox's size changed. If it didn't, it most likely means that only message flags were changed, so Dovecot does a full mbox read to find the changes. If the mailbox shrank, it means that mails were expunged and again Dovecot does a full sync. Usually however the only thing besides Dovecot that modifies the mbox is the LDA, which appends new mails to the mbox. So if the mbox has grown, Dovecot first checks if the last known message is still where it was last time. If it is, Dovecot reads only the newly added messages and goes into a "dirty mode". As long as Dovecot is in dirty mode, it can't be certain that mails are where it expects them to be, so whenever accessing some mail, it first verifies that it really is the correct mail by finding its X-UID header. If the X-UID header is different, it falls back to a full sync to find the mail's correct position. The dirty mode goes away after a full sync. If mbox_lazy_writes was enabled and the mail didn't yet have X-UID header, Dovecot uses MD5 sum of a couple of headers to compare the mails.

mbox_very_dirty_syncs does the same as mbox_dirty_syncs, but the dirty state is kept also when opening the mailbox. Normally opening the mailbox does a full sync if it had been changed outside Dovecot.
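
The decision described for mbox_dirty_syncs can be summarized with a small Python sketch. The function names and the "full"/"dirty" return values are hypothetical and only restate the rules from the text; the real logic lives in Dovecot's C code and handles more cases.

```python
def choose_sync(old_size, new_size, last_message_intact):
    """Pick a sync type for an mbox whose timestamp or size changed.

    Returns "full" for a full mbox read, or "dirty" when only the newly
    appended part needs to be read.
    """
    if new_size == old_size:
        # Probably only flags changed; a full read is needed to find them.
        return "full"
    if new_size < old_size:
        # The file shrank, so mails were expunged.
        return "full"
    # The file grew: cheap path only if the last known message is still
    # where it was last time.
    return "dirty" if last_message_intact else "full"


def verify_in_dirty_mode(expected_uid, x_uid_found):
    """In dirty mode each access checks X-UID; a mismatch forces a full sync."""
    return "ok" if x_uid_found == expected_uid else "full"


print(choose_sync(1000, 1500, True))
print(choose_sync(1000, 900, True))
```

The cheap path is taken only in the common case where an LDA appended mail; every other kind of change still ends in a full sync.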

From Escaping

In mboxes a new mail always begins with a "From " line, commonly referred to as From_-line. To avoid confusion, lines beginning with "From " in message bodies are usually prefixed with '>' character while the message is being written to the mbox.

Dovecot doesn't currently do this escaping however. Instead it prevents this confusion by adding Content-Length headers so it knows later where the next message begins. Dovecot also doesn't remove the '>' characters before sending the data to clients. Both of these will probably be implemented later.

Mbox Variants

There are a few minor variants of this format:

mboxo is the name of the original mbox format, which originated with Unix System V. Messages are stored in a single file, with each message beginning with a line containing "From SENDER DATE". If "From " occurs at the beginning of a line anywhere in the email, it is escaped with a greater-than sign (>From).

mboxrd was named for Rahul Dhesi in June 1995, though several people came up with the same idea around the same time. An issue with the mboxo format was that if the text ">From" appeared in the body of an email (such as from a reply quote), it was not possible to distinguish this from the mailbox format's ">From". mboxrd fixes this by always quoting ">From" lines as well, so readers can just remove the first ">" character. This format is used by qmail.

mboxcl format originated with Unix System V Release 4 mail tools. It adds a Content-Length field which indicates the number of bytes in the message. This is used to determine message boundaries. It still quotes "From" as the original mboxo format does (and not as mboxrd does it).

mboxcl2 is like mboxcl but does away with the "From" quoting.

MMDF (Multi-channel Memorandum Distribution Facility mailbox format) originated with the MMDF daemon. The format surrounds each message with lines containing four control-A's. This eliminates the need to escape "From " lines.

Dovecot currently uses mboxcl2 format internally, but it's planned to move to a combination of mboxrd and mboxcl.
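
The difference between mboxo and mboxrd quoting is easiest to see in code. The following Python sketch uses hypothetical function names and operates on a message body given as a string; it illustrates the two quoting rules and is not code from any particular mail tool.

```python
import re


def quote_mboxo(body):
    # mboxo: escape only lines that begin with "From ".
    return re.sub(r"^From ", ">From ", body, flags=re.M)


def quote_mboxrd(body):
    # mboxrd: also escape lines that are already quoted
    # (">From ", ">>From ", ...), by adding one more ">".
    return re.sub(r"^(>*From )", r">\1", body, flags=re.M)


def unquote_mboxrd(body):
    # Reversible: strip exactly one ">" from lines matching ">+From ".
    return re.sub(r"^>(>*From )", r"\1", body, flags=re.M)


body = "From the start\n>From a reply quote\nplain text\n"
print(quote_mboxo(body))    # both lines now read ">From ...": ambiguous
print(quote_mboxrd(body))   # ">From ..." and ">>From ...": reversible
```

After mboxo quoting the two lines can no longer be told apart, whereas unquote_mboxrd(quote_mboxrd(body)) gives back the original body.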

LDA Indexing

Dovecot v1.0's deliver updates the main index file while the message is being saved. This is useful with mbox format, especially if mbox_very_dirty_syncs=no. With Maildir the benefits of this are pretty small.

Dovecot v1.1+ deliver also updates the cache file, which can be very useful with all mailbox formats. It means that when an IMAP client wants to fetch the message's metadata (e.g. some header fields) they're already found from the cache file and Dovecot doesn't have to open and parse the message file. There are some tradeoffs though:

LDA indexing wastes disk I/O because it has to open and update index files

LDA indexing saves disk I/O because it already has the message body in memory, so it doesn't need to read it from disk.

IMAP indexing wastes disk I/O because it has to open and read message files

IMAP indexing may save disk I/O because IMAP process always has index files opened, and many IMAP clients are configured to download all new message bodies anyway, so the second time message bodies are read they're already in memory

So it depends on IMAP client if it's faster to use LDA or IMAP time indexing. In any case the user experience is typically faster with LDA indexing, because the message list metadata can be returned faster when it's pre-indexed.

See IndexFiles for more information about what the index files contain.

Non-indexed mail delivery

Ignoring the benefits of cache file updates, the only thing left is the main index updates. As mentioned above, with Maildir format these benefits are very small. This also means that it's perfectly fine to deliver mails with a non-Dovecot MDA that doesn't update indexes. Dovecot can efficiently see and index such new mails without doing anything expensive like "rebuilding indexes".

Dovecot's index files

The basic idea behind Dovecot's index files is that they make reading the mailboxes a lot faster. The index files consist of the following files:

dovecot.index: Main index file

dovecot.index.cache: Cached mailbox data

dovecot.index.log: Transaction log file

dovecot.index.log.2: .log file is rotated to .log.2 file when it grows too large.

Each mailbox has its own separate index files. If the index files are disabled, the same structures are still kept in the memory, except cache file is disabled completely (because the client probably won't fetch the same data twice within a connection).

If index files are missing, Dovecot creates them automatically when the mailbox is opened. If at any point creating a file or growing a file gives "not enough disk space" error, the indexes are transparently moved to memory for the rest of the session.

See Design/Indexes for more technical information how the index files are handled.

Main index

The main index contains the following information for each message:

IMAP UID

Current flags and keywords

Pointer to cache file

mbox-only: mbox file offset

mbox-only: MD5 sum of some of the message headers, intended to help find the message when its X-UID: header hasn't yet been written

Other extensions in Dovecot v1.1+, such as mailbox sorting data

This is the same information that most other IMAP servers keep in memory while the mailbox is open, but Dovecot has the advantage of keeping the information permanently stored so it's easy to get it when opening the mailbox.

The index file's header also contains some summary information, such as how many messages exist, how many of them are unseen and how many are marked with \Deleted flag. Opening mailboxes and answering IMAP STATUS commands can usually be done simply by getting the required information from the index file's header. This is why these operations are extremely fast with Dovecot compared to other servers that don't use an equivalent index file.

http://wiki.dovecot.org/Design/Indexes

Mailbox synchronization

The main index's header also contains mailbox syncing state:

Maildir: cur/ and new/ directories' timestamps

mbox: mbox file's mtime and size

The index file is synchronized against mailbox only if the syncing information changes.
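
A check of this kind can be sketched in Python as follows. The function names are hypothetical and the saved state is represented as a plain tuple; Dovecot keeps the equivalent values in the index header in its own binary format.

```python
import os


def maildir_sync_state(maildir):
    """Modification times (in nanoseconds) of new/ and cur/."""
    return tuple(
        os.stat(os.path.join(maildir, d)).st_mtime_ns for d in ("new", "cur")
    )


def mbox_sync_state(path):
    """mtime and size of the mbox file."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def needs_sync(saved_state, current_state):
    # The mailbox is re-read only when the recorded state no longer
    # matches what is on disk.
    return saved_state != current_state
```

A typical use would be to call mbox_sync_state() when the mailbox is opened and compare the result against the value saved after the previous sync; if they are equal, the mailbox itself does not need to be scanned.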

Cache file

Cache file may contain the following information for messages:

Message headers (some, not all)

Sent date (parsed Date: header)

Received date (IMAP's INTERNALDATE field)

Physical and virtual message sizes

Message's parsed MIME structure, allowing to quickly read only a specific MIME part (IMAP's FETCH BODY[1.2.3] command)

IMAP's BODY and BODYSTRUCTURE fields

o If both are used, only BODYSTRUCTURE is saved, since BODY can be generated from it

IMAP's ENVELOPE isn't cached currently. Instead the headers used to build it are cached directly.

IMAP clients can work in many different ways. There are basically 2 types:

1. Online clients that ask for the same information multiple times (eg. webmails, Pine)

2. Offline clients that usually download first some of the interesting message headers and only after that the message bodies (possibly automatically, or possibly only when the user opens the mail). Most IMAP clients behave like this.

Cache file is extremely helpful with the type 1 clients. The first time that client requests message headers or some other metadata they're stored into the cache file. The second time they ask for the same information Dovecot can now get it quickly from the cache file instead of opening the message and parsing the headers.

For type 2 clients the cache file is helpful if the user has multiple clients or if the data was cached while the message was being saved (Dovecot v1.1+ can do this). Some of the information is helpful in any case, for example it's required to know the message's virtual size when downloading the message. Without the virtual size being in cache Dovecot first has to read the whole message to calculate it.

Only the mailbox metadata that client(s) have asked for earlier are stored into cache file. This allows Dovecot to be adaptive to different clients' needs and still not waste disk space (and cause extra disk I/O!) for fields that client never needs.

Dovecot can cache fields either permanently or temporarily. Temporarily cached fields are dropped from the cache file after about a week. Dovecot uses two rules to determine when data should be cached permanently instead of temporarily:

1. Client accessed messages in non-sequential order within this session. This most likely means it doesn't have a local cache.

2. Client accessed a message older than one week.

Design/Indexes/Cache (http://wiki.dovecot.org/Design/Indexes/Cache) explains the reasons for these rules.

Transaction log

All changes to the main index go through transaction log first. This has two advantages when the mailbox is accessed using multiple simultaneous connections:

1. It allows getting a list of changes quickly so that IMAP clients can be notified of the changes. An alternative would be to do a comparison of two index mappings, which is what most other IMAP servers do.

2. mmap_disable=yes implementation relies on the transaction log. Instead of re-reading the whole main index file after each change it's necessary to only read a few bytes from the transaction log.

In Dovecot v1.1+ the transaction log plays an even more important role. The main index file is updated only "once in a while" to reduce disk writes, so it is common to first read the main index and then apply new changes from the transaction log on top of that. With empty mailboxes (eg. download+delete POP3 users) it would even be possible to delete the whole main index and keep only the transaction log (although this isn't done currently).
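
The idea of reading an older main index and then applying newer changes from the transaction log can be illustrated with a small Python sketch. Everything about the data layout here is an assumption made for the example (a dict of UID to flag set, and log records as (sequence, operation, uid, flag) tuples); Dovecot's real files are binary and considerably more involved.

```python
def apply_log(index, log, synced_upto):
    """Apply log records newer than `synced_upto` on top of an index snapshot.

    index: {uid: set_of_flags}, the state stored in the main index.
    log:   list of (seq, op, uid, flag) records, oldest first.
    Returns the up-to-date view and the list of applied changes, which
    is also what would be used to notify IMAP clients.
    """
    view = {uid: set(flags) for uid, flags in index.items()}
    changes = []
    for seq, op, uid, flag in log:
        if seq <= synced_upto:
            continue  # already included in the main index
        if op == "append":
            view[uid] = set()
        elif op == "expunge":
            view.pop(uid, None)
        elif op == "add_flag":
            view.setdefault(uid, set()).add(flag)
        elif op == "remove_flag":
            view.get(uid, set()).discard(flag)
        changes.append((op, uid, flag))
    return view, changes
```

The list of applied changes comes for free while replaying, which corresponds to the first advantage above: no comparison of two full index mappings is needed to find out what changed.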

Cache file

Cache file is used for storing immutable data. It supports several different kinds of fields:

MAIL_CACHE_FIELD_FIXED_SIZE: The field size doesn't need to be stored in the cache file. It's always the same.

MAIL_CACHE_FIELD_BITMASK: A fixed size bitmask field. It's possible to add new bits by updating this field. All the added fields are ORed together.

MAIL_CACHE_FIELD_VARIABLE_SIZE: Variable sized binary data.

MAIL_CACHE_FIELD_STRING: Variable sized string.

MAIL_CACHE_FIELD_HEADER: Variable sized message header. The data begins with a 0-terminated uint32_t line_numbers[]. The line number exists only for each header, header continuation lines in multiline headers don't get listed. After the line numbers comes the list of headers, including the "header-name: " prefix for each line, LFs and the TABs or spaces for continued lines.

The last 3 variable sized fields are treated identically by the cache file code. Their main purpose is to make it easier for "dump cache file's contents" programs (src/util/idxview) to do their job.

Locking

Because cache file is typically used in potentially long-running operations, such as with IMAP command FETCH 1:* (BODY.PEEK[] ENVELOPE BODYSTRUCTURE) it's important that updating the cache file doesn't block out any other readers. Also because the readers are often also writers (if something isn't cached, it's added there), it's important that they don't block writers either.

Reading cache files requires no locking. Writing is done by first locking the file, reserving some space to write to, and immediately after that unlocking the file. This way the transaction can keep writing to the cache file as long as it wants to without blocking other writers. When the transaction is committed, the updated cache offsets are written to the transaction log which makes them visible to other processes.

This also means that it's possible for two processes to write the same cached fields twice to the cache file. Because the data written to the cache file are really just cached data, the fields' contents are identical. Having the data exist twice (or even more times) means wasting some disk space, but otherwise it isn't a problem. The duplicates are dropped the next time the file is compressed.
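
The "lock briefly, reserve space, unlock, then write at leisure" pattern can be sketched in Python like this. The function names are hypothetical, flock() is used only as a stand-in for whatever lock method is configured, and the reservation is modelled simply by growing the file; this shows the general technique rather than Dovecot's cache file code.

```python
import fcntl
import os


def reserve(fd, length):
    """Lock the file briefly, reserve `length` bytes at its end, unlock.

    Returns the offset of the reserved area.
    """
    fcntl.flock(fd, fcntl.LOCK_EX)
    try:
        offset = os.fstat(fd).st_size
        # Growing the file makes the next writer's reservation start
        # after ours, so the areas never overlap.
        os.ftruncate(fd, offset + length)
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
    return offset


def write_reserved(fd, offset, data):
    """Fill a previously reserved area without holding any lock."""
    os.pwrite(fd, data, offset)
```

Two writers can reserve their areas one after the other and then fill them in any order. Nothing points at the new data until the offsets are published, which in the text happens through the transaction log at commit time.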

Cache decisions

Dovecot tries to be smart about what it keeps in the cache file. If the client never fetches the cached data, it's just a waste of disk space and disk I/O.

The caching decisions are:

MAIL_CACHE_DECISION_NO: This field isn't cached currently.

MAIL_CACHE_DECISION_TEMP: This field is cached for new mails.

MAIL_CACHE_DECISION_YES: This field is cached for all mails.

Normally Dovecot changes the decisions based on what fields are fetched and for what messages. A specific decision can be forced by ORing it with MAIL_CACHE_DECISION_FORCED.

mail-cache-decisions.c file contains the rules how Dovecot changes the decisions. The following is copied from the file:

Users can be divided to three groups:

1. Most users will use only a single IMAP client which caches everything locally. For these users it's quite pointless to do any kind of caching as it only wastes disk space. That might also mean more disk I/O.

2. Some users use multiple IMAP clients which cache everything locally. These could benefit from caching until all clients have fetched the data. After that it's useless.

3. Some clients don't do permanent local caching at all. For example Pine and webmails. These clients would benefit from caching everything. Some locally caching clients might also access some data from server again, such as when searching messages. They could benefit from caching only these fields.

After thinking about these a while, I figured out that people who care about performance most will be using Dovecot optimized LDA anyway which updates the indexes/cache immediately. In that case even the first user group would benefit from caching the same way as second group. LDA reads the mail anyway, so it might as well extract some information about it and store them into cache.

So, group 1. and 2. could be optimally implemented by keeping things cached only for a while. I thought a week would be good. When cache file is compressed, everything older than week will be dropped.

But how to figure out if user is in group 3? One quite easy rule would be to see if client is accessing messages older than a week. But with only that rule we might have already dropped useful cached data. It's not very nice if we have to read and cache it twice.

Most locally caching clients always fetch new messages (all but body) when they see them. They fetch them in ascending order. Noncaching clients might fetch messages in pretty much any order, as they usually don't fetch everything they can, only what's visible in screen. Some will use server side sorting/threading which also makes messages to be fetched in random order. Second rule would then be that if a session doesn't fetch messages in ascending order, the fetched field type will be permanently cached.

So, we have three caching decisions:

1. Don't cache: Clients have never wanted the field
2. Cache temporarily: Clients want this only once
3. Cache permanently: Clients want this more than once

Different mailboxes have different decisions. Different fields have different decisions.

There are some problems, such as if a client accesses a message older than a week, we can't know if the user just started using a new client which is just filling its local cache for the first time. Or it might be a client the user just hasn't used for over a week. In these cases we shouldn't have marked the field to be permanently cached. User might also switch clients from non-caching to caching.

So we should re-evaluate our caching decisions from time to time. This is done by checking the above rules constantly and marking when was the last time the decision was right. If decision hasn't matched for two months, it's changed. I picked two months because people go to at least one month vacations where they might still be reading mails, but with different clients.

Dovecot's index files

Dovecot's index files consist of three different files:

Main index file (dovecot.index)
Transaction log (dovecot.index.log and dovecot.index.log.2)
Cache file (dovecot.index.cache)

See IndexFiles for more generic information about what they contain and why.

The index files can be accessed using mail-index.h API.

Locking

The index files are designed so that readers cannot block a writer, and write locks are always short enough not to cause other processes to wait too long. Dovecot v0.99's index files didn't do this, and it was common to get lock timeouts when using multiple connections to the same large mailbox.

The main index file is the only file which has read locks. They can however block the writer only for two seconds (and even this could be changed to not block at all). The writes are locked only for the duration of the mailbox synchronization.

Transaction logs don't require read locks. The writing is locked for the duration of the mailbox synchronization, and also for single transaction appends.

Cache files don't require read locks. They're locked for writing only for the duration of allocating space inside the file. The actual writing inside the allocated space is done without any locks being held.

In future these could be improved even further. For example there's no need to keep any index files locked while synchronizing, as long as the mailbox backend takes care of the locking issues. Also writing to the transaction log could work in a similar way to cache files: Lock, allocate space, unlock, write.

Lockless integers

Dovecot uses several different techniques to allow reading files without locking them. One of them uses fields in a "lockless integer" format. Initially these fields have the "unset" value. They can be set once to a wanted value in the range 0..2^28-1 (with 32bit fields), but after that they cannot be changed. It would be possible to set them back to "unset", but setting them a second time isn't safe anymore, so Dovecot never does this.

The lockless integers work by allocating one bit from each byte of the value to a "this value is set" flag. The reader then verifies that the flag is set in every byte of the value. If any byte lacks the flag, the value is still "unset". Dovecot uses the highest bit for this flag. So for example:

0x00000000: The value is unset
0xFFFF7FFF: The value is unset, because one of the bytes didn't have the highest bit set
0xFFFFFFFF: The value is 2^28-1
0x80808080: The value is 0
0x80808180: The value is 0x80

Dovecot contains mail_index_uint32_to_offset() and mail_index_offset_to_uint32() functions to translate values between integers and lockless integers. The "unset" value is returned as 0, so it's not possible to differentiate between "unset" and "set" 0 values.
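The encoding described above can be sketched as follows. This is a minimal illustration, not Dovecot's code: the names lockless_encode() and lockless_decode() are hypothetical, the field is treated as a plain host-order 32bit integer, and the decoder reports "unset" through its return value instead of returning 0. Dovecot's own mail_index_uint32_to_offset() and mail_index_offset_to_uint32() may differ in byte order and other details.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCKLESS_MAX ((1u << 28) - 1) /* 4 bytes x 7 usable bits */

/* Pack a 28bit value into 32 bits, 7 bits per byte, and set the
   highest bit of every byte as the "this value is set" flag. */
uint32_t lockless_encode(uint32_t value)
{
    assert(value <= LOCKLESS_MAX);
    return 0x80808080u |
        (value & 0x7fu) |
        (((value >> 7) & 0x7fu) << 8) |
        (((value >> 14) & 0x7fu) << 16) |
        (((value >> 21) & 0x7fu) << 24);
}

/* Returns false if any byte lacks the flag bit (value is "unset").
   Otherwise stores the decoded value in *value_r and returns true. */
bool lockless_decode(uint32_t field, uint32_t *value_r)
{
    if ((field & 0x80808080u) != 0x80808080u)
        return false;
    *value_r = (field & 0x7fu) |
        (((field >> 8) & 0x7fu) << 7) |
        (((field >> 16) & 0x7fu) << 14) |
        (((field >> 24) & 0x7fu) << 21);
    return true;
}
```

A reader that sees a half-written field finds at least one byte without the flag and treats the whole field as unset, which is why no lock is needed.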

Main index

The main index can be used to quickly look up messages' UIDs, flags, keywords and extension-specific data, such as cache file or mbox file offsets.

Reading, writing and locking

Reading dovecot.index file requires locking, unfortunately. Shared read locking is done using the standard index locking method specified in lock_method setting (lock_method parameter for mail_index_open()).

Writing to index files requires the transaction log to be exclusively locked first. This way the index locking only has to worry about existing read locks. The locking works by first trying to lock the index with the standard locking method, but if it can't acquire the lock in two seconds, it falls back to copying the index file to a temporary file, and when unlocking it'll rename() the temporary file over the dovecot.index file. Note that this is safe only because of the exclusive transaction log lock. This way the writers are never blocked by readers, who are allowed to keep the shared lock as long as they want.

Copy-locking is always used when doing anything that could corrupt the index file if it crashed in the middle of an operation. For example if the header or record size changes, or if messages are expunged. New messages can be appended however, because the message count in the header is updated last. Expunging the last messages would probably be safe also (because only the header needs updating), but it's not done currently.

The index file should never be directly modified. Everything should go through the transaction log, and the only time the index needs to be write-locked is when transactions are written to it.

Currently the index file is updated whenever the backend mailbox is synchronized. This isn't necessary, because an old index file can be updated using the transaction log. In future there could be some smarter decisions about when writing to the index isn't worth the extra disk writes.

Header

Fields that won't change without recreating the index:

major_version

If this doesn't match MAIL_INDEX_MAJOR_VERSION, don't try to read the index. Dovecot recreates the index file then.

minor_version

If this doesn't match MAIL_INDEX_MINOR_VERSION there are some backwards compatible changes in the index file (typically header fields). Try to preserve the headers and the minor version when updating the index file.

base_header_size

Extension headers begin after the base headers. This is normally the same as sizeof(struct mail_index_header).

header_size

Records begin after base and extension headers.

record_size

Size of each record and its extensions. Initially the same as sizeof(struct mail_index_record).

compat_flags

Currently there is just one compatibility flag: MAIL_INDEX_COMPAT_LITTLE_ENDIAN. Dovecot doesn't bother trying to read files of a different endianness; they're simply recreated.

indexid

Unique index file ID. This is used to make sure that the main index, transaction log and cache file are all part of the same index.

Header flags:

MAIL_INDEX_HDR_FLAG_CORRUPTED

Set whenever the index file is found to be corrupted. If the reader notices this flag, it shouldn't try to continue using the index.

MAIL_INDEX_HDR_FLAG_HAVE_DIRTY

This index has records with MAIL_INDEX_MAIL_FLAG_DIRTY flag set.

MAIL_INDEX_HDR_FLAG_FSCK

Call mail_index_fsck() as soon as possible. This flag isn't actually set anywhere currently.

Message UIDs and counters:

uid_validity

IMAP UIDVALIDITY field. Initially can be 0, but after it's set we don't currently try to even handle the case of UIDVALIDITY changing. It's done by marking the index file corrupted and recreating it. That's a bit ugly, but typically the UIDVALIDITY never changes.

next_uid

UID given to the next appended message. Only increases.

messages_count

Number of records in the index file.

recent_messages_count

Number of records with MAIL_RECENT flag set.

seen_messages_count

Number of records with MAIL_SEEN flag set.

deleted_messages_count

Number of records with MAIL_DELETED flag set.

first_recent_uid_lowwater

There are no UIDs lower than this with MAIL_RECENT flag set.

first_unseen_uid_lowwater

There are no UIDs lower than this without MAIL_SEEN flag set.

first_deleted_uid_lowwater

There are no UIDs lower than this with MAIL_DELETED flag set.

The lowwater fields are used to optimize searching messages with/without a specific flag.

Fields related to syncing:

log_file_seq

Log file the log_*_offset fields point to.

log_file_int_offset, log_file_ext_offset

All the internal/external transactions before this offset in the log file are synced to the index. External transactions are synced more often than internal, so log_file_int_offset <= log_file_ext_offset.

sync_size, sync_stamp

Used by the mailbox backends to store their synchronization information. Some day these should be removed and replaced with extension headers.

Then there are day fields:

day_stamp

UNIX timestamp to the beginning of the day when new records were last added to the index file.

day_first_uid[8]

These fields are updated when day_stamp < today. The [0..6] are first moved to [1..7], then [0] is set to the first appended UID. So they contain the first UID of the day for last 8 days when messages were appended.

The day_first_uid[] fields are used by cache file compression to decide when to drop MAIL_CACHE_DECISION_TEMP data.
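The day field update described above can be sketched roughly as follows. This is only an illustration under assumptions: struct day_fields and day_fields_append() are hypothetical names holding just the two fields discussed here, and the sketch shifts by one slot per update exactly as the text says, without modelling anything else the real header update may do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DAY_FIRST_UID_COUNT 8

/* Hypothetical stand-in containing only the day fields of the header. */
struct day_fields {
    uint32_t day_stamp;
    uint32_t day_first_uid[DAY_FIRST_UID_COUNT];
};

/* Called when a message with appended_uid is added. today_stamp is the
   UNIX timestamp of the beginning of the current day. */
void day_fields_append(struct day_fields *hdr, uint32_t today_stamp,
                       uint32_t appended_uid)
{
    assert(hdr != NULL);
    if (hdr->day_stamp >= today_stamp)
        return; /* already recorded a first UID for today */

    /* Move [0..6] to [1..7], dropping the oldest entry. */
    memmove(&hdr->day_first_uid[1], &hdr->day_first_uid[0],
            sizeof(hdr->day_first_uid[0]) * (DAY_FIRST_UID_COUNT - 1));
    hdr->day_first_uid[0] = appended_uid;
    hdr->day_stamp = today_stamp;
}
```

With this layout, a consumer such as cache compression could compare a message's UID against one of the older slots to decide whether the message is more than a few days old.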

Extension headers

After the base header comes a list of extensions and their headers. The first extension begins at offset mail_index_header.base_header_size. The second begins after the first one's data[], and so on. Extensions always begin 64bit aligned however, so you may need to skip a few padding bytes before each one. Keep reading extensions as long as the offset is smaller than mail_index_header.header_size.

struct mail_index_ext_header {
        uint32_t hdr_size; /* size of data[] */
        uint32_t reset_id;
        uint16_t record_offset;
        uint16_t record_size;
        uint16_t record_align;
        uint16_t name_size;
        /* unsigned char name[name_size] */
        /* unsigned char data[hdr_size] (starting 64bit aligned) */
};

reset_id, record offset, size and alignment are explained in Design/Indexes/TransactionLog's struct mail_transaction_ext_intro.
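Walking the extension list could look something like the sketch below. It is an interpretation of the layout described in this section, not Dovecot's parser: count_extensions() and align64() are hypothetical names, and the sketch assumes that the name follows the fixed-size header directly, that data[] starts at the next 64bit boundary, and that the next extension starts at the next 64bit boundary after data[].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct mail_index_ext_header {
    uint32_t hdr_size; /* size of data[] */
    uint32_t reset_id;
    uint16_t record_offset;
    uint16_t record_size;
    uint16_t record_align;
    uint16_t name_size;
};

/* Round n up to the next multiple of 8 bytes (64 bits). */
static size_t align64(size_t n)
{
    return (n + 7) & ~(size_t)7;
}

/* hdr points to the beginning of the index file's header area.
   Returns the number of extensions, or -1 if the list is truncated. */
int count_extensions(const unsigned char *hdr, size_t base_header_size,
                     size_t header_size)
{
    size_t offset = align64(base_header_size);
    int count = 0;

    assert(base_header_size <= header_size);
    while (offset < header_size) {
        struct mail_index_ext_header ext;

        if (header_size - offset < sizeof(ext))
            return -1;
        memcpy(&ext, hdr + offset, sizeof(ext));
        offset += sizeof(ext);

        /* skip name[], then padding up to the aligned data[] */
        if (header_size - offset < ext.name_size)
            return -1;
        offset = align64(offset + ext.name_size);

        /* skip data[], then padding up to the next extension */
        if (offset > header_size || header_size - offset < ext.hdr_size)
            return -1;
        offset = align64(offset + ext.hdr_size);
        count++;
    }
    return count;
}
```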

Records

There are hdr.messages_count records in the file. Each record contains at least two fields: Record UID and flags. The UID is always increasing for the records, so it's possible to find a record by its UID with binary search. The record size is specified by mail_index_header.record_size.

The flags are a combination of enum mail_flags and enum mail_index_mail_flags. There exists only one index flag currently: MAIL_INDEX_MAIL_FLAG_DIRTY. If a record has this flag set, it means that the mailbox syncing code should ignore the flag in the mailbox and use the flag in the index file instead. This is used for example with mbox and mbox_lazy_writes=yes. It also allows having modifiable flags for read-only mailboxes.

The rest of the data is stored in record extensions.
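Because the UIDs are always increasing, a lookup by UID can use binary search over fixed-size records. The following is a minimal sketch, assuming (as the text implies but does not spell out) that the UID is the first 32bit field of each record. find_record_by_uid() is a hypothetical helper, not part of the mail-index.h API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* records points to messages_count records of record_size bytes each.
   Returns the zero-based index of the record with the given UID,
   or -1 if no such record exists. */
long find_record_by_uid(const unsigned char *records,
                        uint32_t messages_count, uint32_t record_size,
                        uint32_t uid)
{
    uint32_t lo = 0, hi = messages_count;

    assert(record_size >= sizeof(uint32_t));
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        uint32_t rec_uid;

        /* memcpy avoids alignment assumptions about the buffer */
        memcpy(&rec_uid, records + (size_t)mid * record_size,
               sizeof(rec_uid));
        if (rec_uid == uid)
            return (long)mid;
        if (rec_uid < uid)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```

Stepping by record_size rather than by sizeof(struct mail_index_record) matters here, since extensions make the records larger than the base structure.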

Keywords

The keywords are stored in record extensions, but for better performance and lower disk space usage in transaction logs, they are quite tightly integrated to the index file code.

The list of keywords is stored in "keywords" extension header:

struct mail_index_keyword_header {
        uint32_t keywords_count;
        /* struct mail_index_keyword_header_rec[] */
        /* char name[][] */
};

struct mail_index_keyword_header_rec {
        uint32_t unused; /* for backwards compatibility */
        uint32_t name_offset; /* relative to beginning of name[] */
};

The unused field originally contained count field, but while writing this documentation I noticed it's not actually used anywhere. Apparently it was added there accidentally. It'll be removed in later versions.

So there are keywords_count keywords, each stored as a NUL-terminated string beginning at its name_offset.

Since crashing in the middle of updating the keywords list pretty much breaks the keywords, adding new keywords causes the index file to be always copied to a temporary file and be replaced.

The keywords in the records are stored in a "keywords" extension bitfield. So the nth bit in the bitfield points to the nth keyword listed in the header.

It's not currently possible to safely remove existing keywords.
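The relationship between the record bitfield and the header's keyword list can be sketched like this. The helper names are hypothetical, and the bit order inside each byte (least significant bit first) is an assumption made for the example; the text only says that the nth bit maps to the nth keyword.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the nth keyword is set in a record's keyword bitfield.
   Assumption: bit n lives in byte n/8, at bit position n%8. */
bool keyword_is_set(const unsigned char *bitfield, size_t bitfield_size,
                    unsigned int n)
{
    if (n / 8 >= bitfield_size)
        return false;
    return (bitfield[n / 8] & (1u << (n % 8))) != 0;
}

/* Returns the NUL-terminated name of the nth keyword. names points to
   the beginning of name[] and name_offsets[] holds each keyword's
   name_offset taken from the header records. */
const char *keyword_name(const char *names, const uint32_t *name_offsets,
                         uint32_t keywords_count, unsigned int n)
{
    assert(n < keywords_count);
    return names + name_offsets[n];
}
```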

Extensions

The extensions only specify their wanted size and alignment. The index file syncing code is free to assign any offset inside the record to them. The extensions may be reordered at any time.

Dovecot's current extension ordering code works pretty well, but it's not perfect. If the extension size isn't the same as its alignment, it may create larger records than necessary. This will be fixed later.

The record size is always divisible by the maximum alignment requirement. This isn't strictly necessary either, so it could be fixed later as well.

Transaction log

The transaction log is a bit similar to transaction logs in databases. All the updates to the main index files are first written to the transaction log, and only after that the main index file is updated. There are several advantages to this:

It provides atomic transactions: The transaction either succeeds, or it doesn't. For example if a transaction sets a flag to one message and removes it from another, it's guaranteed that both changes happen.

o When updating the changes to the main index file, the last thing that's done is to update the "transaction log position" in the header. So if Dovecot crashes after having updated only the first flag, the next time the mailbox is opened both of the changes are done all over again.

It allows another process to quickly see what changes have been made. For example IMAP needs to get a list of external changes after each command.

o This is also important when storing the index files in NFS or in a clustered filesystem. Instead of re-reading the whole index file after each external change, Dovecot can simply read the new changes from the transaction log and apply them to the in-memory copy of the main index. In-memory caching of the dovecot.index.cache file also relies on the transaction log telling which parts of the file have changed.

In future the transaction logs can be somewhat easily used to implement replication.

Internal vs. external

Transactions are either internal or external. The difference is that external transactions describe changes that were already made to the mailbox, while internal transactions are commands to do something to the mailbox. When beginning to synchronize a mailbox with index files, the index file is first updated with all the external changes, and the uncommitted internal transactions are applied on top of them.

When synchronizing the mailbox, the synchronization transaction writes only external transactions. Also if the index file is updated when saving new mails to the mailbox, the append transactions must be external. This is because the changes are already in the mailbox at the time the transaction is read.

Reading and writing

Reading transaction logs doesn't require any locking at all. Writing is exclusively locked using the index files' default lock method (as specified by the lock_method setting).

A new log is created by first creating a dovecot.index.log.newlock dotlock file. Once you have the dotlock, check again that the dovecot.index.log wasn't created (or recreated) by another process. If not, go ahead and write the log header to the dotlock file and finally rename() it to dovecot.index.log.

Currently there are no actual transaction boundaries in the log file. All the changes in a transaction are simply written as separate records to the file. Each record begins with a struct mail_transaction_header, which contains the record's size and type. The size is in lockless integer format.

The first transaction record is written with the size field being 0. Once the whole transaction has been written, the 0 is updated with the actual size. This way the transaction log readers won't see partial transactions because they stop at the size=0 if the transaction isn't fully written yet.

Note that because there are no transaction boundaries, there's a small race condition here with mmap()ed log files:

1. Process A: write() half of the transaction
2. Process B: mmap() the file.
3. Process A: write() the rest of the transaction, updating the size=0 also
4. Process B: parse the log file. It'll go past the original size=0 because the size had changed in the mmap, but it stops in the middle of the transaction because the mmap size doesn't contain the whole transaction

This probably isn't a big problem, because I've never seen this happen even with stress tests. Should be fixed at some point anyway.
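A reader following the rules above might look roughly like the sketch below. It makes several assumptions that the text does not confirm: struct log_record_header is a simplified stand-in for struct mail_transaction_header, the size field counts only the bytes after the header (as in the append example later in this document), fields are read in host byte order, and the file header is skipped before calling log_scan(). All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct mail_transaction_header. */
struct log_record_header {
    uint32_t size; /* lockless integer format */
    uint32_t type;
};

/* Decode a lockless integer. "Unset" is returned as 0. */
static uint32_t lockless_size_decode(uint32_t field)
{
    if ((field & 0x80808080u) != 0x80808080u)
        return 0;
    return (field & 0x7fu) |
        (((field >> 8) & 0x7fu) << 7) |
        (((field >> 16) & 0x7fu) << 14) |
        (((field >> 24) & 0x7fu) << 21);
}

/* Scan records in log[0..log_size). Returns the offset just past the
   last complete record and stores the record count in *records_r. */
size_t log_scan(const unsigned char *log, size_t log_size,
                unsigned int *records_r)
{
    size_t offset = 0;

    assert(records_r != NULL);
    *records_r = 0;
    while (log_size - offset >= sizeof(struct log_record_header)) {
        struct log_record_header hdr;
        uint32_t size;

        memcpy(&hdr, log + offset, sizeof(hdr));
        size = lockless_size_decode(hdr.size);
        if (size == 0)
            break; /* transaction not fully written yet */
        if (size > log_size - offset - sizeof(hdr))
            break; /* mapping is too short: stop before the record */
        offset += sizeof(hdr) + size;
        (*records_r)++;
    }
    return offset;
}
```

The second check covers the mmap() race from the numbered steps: when the mapped size does not contain the whole record, the scan stops before it, so the caller can map the file again and continue from the returned offset.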

Header

The transaction log's header never changes, except the indexid field may be overwritten with 0 if the log is found to be corrupted. The fields are:

major_version

If this doesn't match MAIL_TRANSACTION_LOG_MAJOR_VERSION, don't try to parse it. If Dovecot sees this, it'll recreate the log file.

minor_version

If this doesn't match MAIL_TRANSACTION_LOG_MINOR_VERSION, the log file contains some backwards compatible changes. Currently you can just ignore this field.

hdr_size

Size of the log file's header. Use this instead of sizeof(struct mail_transaction_log_header), so that it's possible to add new fields and still be backwards compatible.

indexid

This field must match the main index file's indexid field.

file_seq

The file's creation sequence. Must be increasing.

prev_file_seq, prev_file_offset

Contains the sequence and offset of where the last transaction log ended. When the transaction log is rotated and the reader's "sync position" still points to the previous log file, these fields allow it to easily check if there had been any more changes in the previous file.

create_stamp

UNIX timestamp when the file was created. Used in determining when to rotate the log file.

Record header

The transaction record header (struct mail_transaction_header) contains size and type fields. The size field is in lockless integer format. A single transaction record may contain multiple changes of the same type, although some types don't allow this. Because the size of the transaction record for each type is known (or can be determined from the type-specific record contents), the size field can be used to figure out how many changes need to be done. So for example a record can contain:

struct mail_transaction_header {
        type = MAIL_TRANSACTION_APPEND,
        size = sizeof(struct mail_index_record) * 2
}
struct mail_index_record { uid = 1, flags = 0 }
struct mail_index_record { uid = 2, flags = 0 }

UIDs

Many record types contain uint32_t uid1, uid2 fields. This means that the changes apply to all the messages in uid1..uid2 range. The messages don't really have to exist in the range, so for example if the first messages in the mailbox had UIDs 1, 100 and 1000, it would be possible to use uid1=1, uid2=1000 to describe changes made to these 3 messages. This also means that it's safe to write transactions describing changes to messages that were just expunged by another process (and already written to the log file before our changes).

Appends

As described above, the appends must be in external transactions. The append transaction's contents is simply the struct mail_index_record, so it contains only the message's UID and flags. The message contents aren't written to transaction log. Also if the message had any keywords when it was appended, they're in a separate transaction record.

Expunges

Because expunges actually destroy messages, they deserve some extra protection to make it less likely that the wrong messages get expunged accidentally, for example because of file corruption. The expunge transactions must have MAIL_TRANSACTION_EXPUNGE_PROT ORed to the transaction type field. If an expunge type is found without it, assume a corrupted transaction log.

Flag changes

The flag changes are described in:

struct mail_transaction_flag_update {
        uint32_t uid1, uid2;
        uint8_t add_flags;
        uint8_t remove_flags;
        uint16_t padding;
};

The padding is ignored completely. A single flag update structure can add new flags or remove existing flags. Replacing all the flags works by setting remove_flags = 0xFF and the add_flags containing the new flags.
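Applying such an update to in-memory records could be sketched as below. struct example_record and apply_flag_update() are hypothetical simplifications; the only behaviour taken from the text is that the update covers every existing record in the uid1..uid2 range, and that removals combined with additions can replace all flags. Applying remove_flags before add_flags is an assumption, chosen because it makes the 0xFF replacement case work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mail_transaction_flag_update {
    uint32_t uid1, uid2;
    uint8_t add_flags;
    uint8_t remove_flags;
    uint16_t padding;
};

/* Simplified record: only the UID and the flags. */
struct example_record {
    uint32_t uid;
    uint8_t flags;
};

/* Apply one update to all records whose UID is in uid1..uid2.
   UIDs in the range that don't exist are simply never matched.
   Returns the number of records that were in the range. */
unsigned int apply_flag_update(struct example_record *recs, size_t count,
                               const struct mail_transaction_flag_update *u)
{
    unsigned int matched = 0;

    assert(u->uid1 <= u->uid2);
    for (size_t i = 0; i < count; i++) {
        if (recs[i].uid < u->uid1 || recs[i].uid > u->uid2)
            continue;
        /* remove first, then add */
        recs[i].flags = (uint8_t)((recs[i].flags & ~u->remove_flags) |
                                  u->add_flags);
        matched++;
    }
    return matched;
}
```

A linear scan keeps the example short; with sorted UIDs the range start could be located with binary search instead.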

Keyword changes

Specific keywords can be added or removed one keyword at a time:

struct mail_transaction_keyword_update {
        uint8_t modify_type; /* enum modify_type : MODIFY_ADD / MODIFY_REMOVE */
        uint8_t padding;
        uint16_t name_size;
        /* unsigned char name[];
           array of { uint32_t uid1, uid2; } */
};

There is padding after name[] so that uid1 begins from a 32bit aligned offset.

If you want to replace all the keywords (eg. IMAP's STORE 1:* FLAGS (keyword) command), you'll first have to remove all of them with MAIL_TRANSACTION_KEYWORD_RESET and then add the new keywords.

Extensions

Extension records allow creating and updating extension-specific header and message record data. For example messages' offsets to cache file or mbox file are stored in extensions.

Whenever using an extension, you'll need to first write MAIL_TRANSACTION_EXT_INTRO record. This is a bit kludgy and hopefully will be replaced by something better in future. The intro contains:

struct mail_transaction_ext_intro {
        /* old extension: set ext_id. don't set name.
           new extension: ext_id = (uint32_t)-1. give name. */
        uint32_t ext_id;
        uint32_t reset_id;
        uint32_t hdr_size;
        uint16_t record_size;
        uint16_t record_align;
        uint16_t unused_padding;
        uint16_t name_size;
        /* unsigned char name[]; */
};

If the extension already exists in the index file (it can't be removed), you can use the ext_id field directly. Otherwise you'll need to give a name to the extension. It's always possible to just give the name if you don't know the existing extension ID, but this uses more space of course.

reset_id contains kind of a "transaction validity" field. It's updated with MAIL_TRANSACTION_EXT_RESET record, which also causes the extension records' contents to be zeroed. If an introduction's reset_id doesn't match the last EXT_RESET, it means that the extension changes are stale and they must be ignored. For example:

dovecot.index.cache file's file_seq header is used as a reset_id. Initially it's 1.
Process A: Begins a cache transaction, updating some fields in it.
Process B: Decides to compress the cache file, and issues a reset_id = 2 change.
Process A: Commits the transaction with reset_id = 1, but the cache file offsets point to the old file, so the changes must be ignored.
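The stale-update rule from this example can be modelled in a few lines. This is a toy model with hypothetical names (struct example_ext, example_ext_reset(), example_ext_update()), meant only to show the reset_id comparison. It does not reflect how Dovecot stores extension state internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_RECORDS 4

/* Toy extension state: a reset_id plus one 32bit value per record. */
struct example_ext {
    uint32_t reset_id;
    uint32_t record_data[EXAMPLE_RECORDS];
};

/* Models MAIL_TRANSACTION_EXT_RESET: set the new reset_id and zero
   the extension's record contents. */
void example_ext_reset(struct example_ext *ext, uint32_t new_reset_id)
{
    ext->reset_id = new_reset_id;
    memset(ext->record_data, 0, sizeof(ext->record_data));
}

/* Models a record update that follows an introduction carrying
   intro_reset_id. Returns false, changing nothing, if it is stale. */
bool example_ext_update(struct example_ext *ext, uint32_t intro_reset_id,
                        unsigned int idx, uint32_t data)
{
    assert(idx < EXAMPLE_RECORDS);
    if (intro_reset_id != ext->reset_id)
        return false;
    ext->record_data[idx] = data;
    return true;
}
```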

hdr_size specifies the number of bytes the extension wants to have in the index file's header. record_size specifies the number of bytes it wants to use for each record. The sizes may grow or shrink any time. record_align contains the required alignment for the field. For example if the extension contains a 32bit integer, you want it to be 32bit aligned so that the process won't crash on CPUs which require proper alignment. Then again if you want to access the field as 4 bytes, the alignment can be 1.

Extension record updates typically are message-specific, so the changes must be done for each message separately:

struct mail_transaction_ext_rec_update {
        uint32_t uid;
        /* unsigned char data[]; */
};

Cache file

Cache file is used for storing immutable data. It supports several different kinds of fields:

MAIL_CACHE_FIELD_FIXED_SIZE

The field size doesn't need to be stored in the cache file. It's always the same.

MAIL_CACHE_FIELD_BITMASK

A fixed size bitmask field. It's possible to add new bits by updating this field. All the added fields are ORed together.

MAIL_CACHE_FIELD_VARIABLE_SIZE

Variable sized binary data.

MAIL_CACHE_FIELD_STRING

Variable sized string.

MAIL_CACHE_FIELD_HEADER

Variable sized message header. The data begins with a 0-terminated uint32_t line_numbers[]. There is one line number per header; continuation lines of multiline headers aren't listed. After the line numbers comes the list of headers, including the "header-name: " prefix for each line, LFs and the TABs or spaces for continued lines.

The last 3 variable sized fields are treated identically by the cache file code. Their main purpose is to make it easier for "dump cache file's contents" programs (src/util/idxview) to do their job.

Locking

Because cache file is typically used in potentially long-running operations, such as with IMAP command FETCH 1:* (BODY.PEEK[] ENVELOPE BODYSTRUCTURE) it's important that updating the cache file doesn't block out any other readers. Also because the readers are often also writers (if something isn't cached, it's added there), it's important that they don't block writers either.

Reading cache files requires no locking. Writing is done by first locking the file, reserving some space to write to, and immediately after that unlocking the file. This way the transaction can keep writing to the cache file as long as it wants to without blocking other writers. When the transaction is committed, the updated cache offsets are written to the transaction log which makes them visible to other processes.

This also means that it's possible for two processes to write the same cached fields twice to the cache file. Because the data written to the cache file are really just cached data, the fields' contents are identical. Having the data exist twice (or even more times) means wasting some disk space, but otherwise it isn't a problem. The duplicates are dropped the next time the file is compressed.
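The "lock, reserve space, unlock, write" sequence can be sketched with POSIX calls as below. This is a simplified model and not Dovecot's implementation: the function names are hypothetical, the reservation is made by growing the file with ftruncate() (a real cache file would more likely track its used size in a header), and fcntl() locking is used regardless of what lock_method is configured.

```c
#define _XOPEN_SOURCE 700 /* for pwrite()/ftruncate() in strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Reserve size bytes at the end of the file. The write lock is held
   only while the file is being extended. Returns the offset of the
   reserved area, or -1 on error. */
off_t cache_reserve_space(int fd, off_t size)
{
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    struct stat st;
    off_t offset = -1;

    assert(size > 0);
    if (fcntl(fd, F_SETLKW, &fl) < 0) /* lock the whole file */
        return -1;
    if (fstat(fd, &st) == 0 && ftruncate(fd, st.st_size + size) == 0)
        offset = st.st_size;
    fl.l_type = F_UNLCK;
    (void)fcntl(fd, F_SETLK, &fl); /* unlock right away */
    return offset;
}

/* Write into a previously reserved area without holding any lock. */
int cache_write_reserved(int fd, off_t offset, const void *data,
                         size_t size)
{
    return pwrite(fd, data, size, offset) == (ssize_t)size ? 0 : -1;
}
```

Two writers that reserve space one after the other get disjoint areas, so each can fill its own area at its own pace. Nothing becomes visible to readers until the offsets are published through the transaction log.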

Cache decisions

Dovecot tries to be smart about what it keeps in the cache file. If the client never fetches the cached data, it's just a waste of disk space and disk I/O.

The caching decisions are:

MAIL_CACHE_DECISION_NO

This field isn't cached currently.

MAIL_CACHE_DECISION_TEMP

This field is cached for new mails.

MAIL_CACHE_DECISION_YES

This field is cached for all mails.

Normally Dovecot changes the decisions based on what fields are fetched and for what messages. A specific decision can be forced by ORing it with MAIL_CACHE_DECISION_FORCED.

mail-cache-decisions.c file contains the rules how Dovecot changes the decisions. The following is copied from the file:

Users can be divided to three groups:

1. Most users will use only a single IMAP client which caches everything locally. For these users it's quite pointless to do any kind of caching as it only wastes disk space. That might also mean more disk I/O.

2. Some users use multiple IMAP clients which cache everything locally. These could benefit from caching until all clients have fetched the data. After that it's useless.

3. Some clients don't do permanent local caching at all. For example Pine and webmails. These clients would benefit from caching everything. Some locally caching clients might also access some data from server again, such as when searching messages. They could benefit from caching only these fields.

After thinking about these a while, I figured out that people who care about performance most will be using Dovecot optimized LDA anyway which updates the indexes/cache immediately. In that case even the first user group would benefit from caching the same way as second group. LDA reads the mail anyway, so it might as well extract some information about it and store them into cache.

So, groups 1 and 2 could be optimally implemented by keeping things cached only for a while. I thought a week would be good. When the cache file is compressed, everything older than a week will be dropped.

But how to figure out if user is in group 3? One quite easy rule would be to see if client is accessing messages older than a week. But with only that rule we might have already dropped useful cached data. It's not very nice if we have to read and cache it twice.

Most locally caching clients always fetch new messages (all but body) when they see them. They fetch them in ascending order. Noncaching clients might fetch messages in pretty much any order, as they usually don't fetch everything they can, only what's visible in screen. Some will use server side sorting/threading which also makes messages to be fetched in random order. Second rule would then be that if a session doesn't fetch messages in ascending order, the fetched field type will be permanently cached.

So, we have three caching decisions:

1. Don't cache: Clients have never wanted the field
2. Cache temporarily: Clients want this only once
3. Cache permanently: Clients want this more than once

Different mailboxes have different decisions. Different fields have different decisions.

There are some problems, such as if a client accesses a message older than a week, we can't know if the user just started using a new client which is just filling its local cache for the first time. Or it might be a client the user just hasn't used for over a week. In these cases we shouldn't have marked the field to be permanently cached. User might also switch clients from non-caching to caching.

So we should re-evaluate our caching decisions from time to time. This is done by checking the above rules constantly and marking when was the last time the decision was right. If decision hasn't matched for two months, it's changed. I picked two months because people go to at least one month vacations where they might still be reading mails, but with different clients.
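The two promotion rules above can be condensed into a small sketch. This is one reading of the quoted comment, not a copy of mail-cache-decisions.c: the enum and function names are hypothetical, week_old_first_uid is assumed to come from the oldest day_first_uid[] slot, and the two-month re-evaluation is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

enum example_decision {
    EXAMPLE_DECISION_NO,   /* don't cache */
    EXAMPLE_DECISION_TEMP, /* cache temporarily */
    EXAMPLE_DECISION_YES   /* cache permanently */
};

struct example_field_state {
    enum example_decision decision;
    uint32_t last_fetched_uid; /* previous UID fetched in this session */
};

/* Called whenever a client fetches this field for the message uid.
   week_old_first_uid: messages with lower UIDs are over a week old. */
void example_field_fetched(struct example_field_state *f, uint32_t uid,
                           uint32_t week_old_first_uid)
{
    assert(uid != 0);
    if (f->decision == EXAMPLE_DECISION_NO) {
        /* first time anyone wanted the field: start caching it */
        f->decision = EXAMPLE_DECISION_TEMP;
    } else if (f->decision == EXAMPLE_DECISION_TEMP &&
               (uid < f->last_fetched_uid ||  /* not in ascending order */
                uid < week_old_first_uid)) {  /* older than a week */
        f->decision = EXAMPLE_DECISION_YES;
    }
    f->last_fetched_uid = uid;
}
```

Each field in each mailbox would carry its own state of this kind, which matches the note that different mailboxes and different fields have different decisions.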

Quota

Quota backend specifies the method how Dovecot keeps track of the current quota usage. They don't (usually) specify users' quota limits, that's done by returning extra fields from userdb. There are different quota backends that Dovecot can use:

fs : Filesystem quota.
dirsize : The simplest and slowest quota backend, but it works quite well with mboxes.
dict : Store quota usage in a dictionary (e.g. SQL).
maildir : Store quota usage in Maildir++ maildirsize files. This is the most commonly used quota for virtual users.

Enabling quota plugins

There are currently two quota related plugins:

quota: Implements the actual quota handling and includes also all the quota backends.
imap_quota: For reporting quota information via IMAP.

Usually you'd enable these by adding them to the mail_plugins settings in the config file:

protocol imap {
  mail_plugins = quota imap_quota
}
protocol pop3 {
  mail_plugins = quota
}
# In case you're using deliver:
protocol lda {
  mail_plugins = quota
}

Configuration

The configuration is done differently for v1.0 and v1.1:

v1.0 quota configuration
v1.1 quota configuration

Quota and Trash mailbox

Standard way to expunge messages with IMAP works by:

1. Marking message with \Deleted flag
2. Actually expunging the message using EXPUNGE command

Both of these commands can be successfully used while user's quota is full. However many clients use a "move-to-Trash" feature, which works by:

1. COPY the message to Trash mailbox
2. Mark the message with \Deleted
3. Expunge the message from the original mailbox.
4. (Maybe later expunge the message from Trash when "clean trash" feature is used)

If user is over quota (or just under it), the first COPY command will fail and user may get an unintuitive message about not being able to delete messages because user is over quota. The possible solutions for this are:

Disable move-to-trash feature from client

Dovecot v1.0 + Maildir++ quota: You can completely ignore Trash mailbox from quota calculation by appending :ignore=Trash to the quota line. Note that this would allow users to store messages infinitely to the mailbox.

Dovecot v1.1 or v1.0 quota rewrite: You can ignore Trash like with v1.0, but you can also give a separate quota rule giving Trash mailbox somewhat more quota (but not unlimited).

To make sure users don't start keeping messages permanently in Trash you can use a nightly cronjob or expire plugin (v1.1) to expunge old messages from Trash mailbox.
