
33. Retry configuration

The fifth part of the configuration file contains a list of retry rules which control how often Exim tries to deliver messages that cannot be delivered at the first attempt. If there are no retry rules, Exim gives up after the first failure. The -brt command line option can be used to test which retry rule will be used for a given address or domain.

The most common cause of retries is temporary failure to deliver to a remote host. Exim's retry processing in this case is applied on a per-host (strictly, per IP address) basis, not on a per-message basis. Thus, if one message has recently been delayed, a new message to the same host does not immediately get tried, but waits for the host's retry time to arrive. If the value of log_level is greater than 4, the message `retry time not reached' is written to the main log whenever a delivery is skipped for this reason. Section 48.2 contains more details of the handling of errors during remote deliveries.

Retry processing applies to directing and routing as well as to delivering, except as covered in the next paragraph. The retry rules do not distinguish between these three actions, so it is not possible, for example, to specify different behaviour for failures to route the domain snark.fict.book and failures to deliver to the host snark.fict.book. I didn't think anyone would ever need this added complication, so did not implement it. However, although they share the same retry rule, the actual retry times for routing, directing, and transporting a given domain are maintained independently.

When a delivery is not part of a queue run (typically an immediate delivery on receipt of a message), the directors are always run for local addresses, and local deliveries are always attempted, even if retry times are set for them. This makes for better behaviour if one particular message is causing problems (for example, causing quota overflow, or provoking an error in a filter file). If such a delivery suffers a temporary failure, the retry data gets updated as normal, and subsequent delivery attempts from queue runs occur only when the retry time for the local address is reached.

33.1 Retry rules

Each retry rule occupies one line and consists of three parts, separated by white space: a pattern, an error name, and a list of retry parameters. The rules are searched in order until one is found whose pattern matches the failing host or address.

The pattern may be a complete address (local_part@domain), a plain domain, a wildcarded domain (that is, starting with an asterisk), a domain lookup (as in a domain list), or a regular expression. The first form must be used with local domains only; in this case the local part may begin with an asterisk.

After a directing or local delivery failure, regular expressions and patterns containing local parts are normally matched against the complete address (local_part@domain). However, if there is no local part in a pattern that is not a regular expression, the local part of the address isn't used in the matching. Thus an entry such as

lookingglass.fict.book        *  F,24h,30m;

matches any address whose domain is lookingglass.fict.book, whether this is a local or a remote domain, whereas

alice@lookingglass.fict.book  *  F,24h,30m;

can be used only if lookingglass.fict.book is a local domain. It applies to temporary failures involving the local part alice, but not to any other local parts.

If a local delivery is being used to collect messages for onward transmission by some other means (for example, as batched SMTP), a temporary failure may not be dependent on the local part at all. Both the appendfile and pipe transports have an option called retry_use_local_part which can be set false in order to suppress the inclusion of local parts when matching retry patterns for those transport instances. When this option is set, patterns containing local parts are skipped, and regular expressions are matched against the domain only.

For remote domains, when looking for a retry rule after a routing attempt has failed (for example, after a DNS timeout), each line in the retry configuration is tested only against the domain in the address. However, when looking for a retry rule after a remote delivery attempt has failed (for example, a connection timeout), each line in the retry configuration is first tested against the remote host name, and then against the domain name in the address. For example, if the MX records for a.b.c.d are

a.b.c.d  MX  5  x.y.z
         MX  6  p.q.r
         MX  7  m.n.o

and the retry rules are

p.q.r    *      F,24h,30m;
a.b.c.d  *      F,4d,45m;

then failures to deliver to host p.q.r use the first rule to determine retry times, but for all the other hosts for the domain a.b.c.d, the second rule is used, and that rule would also be used if routing to a.b.c.d suffers a temporary failure.
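The host-then-domain search described above can be sketched as follows. This is an illustration only, not Exim's code: the names pattern_matches and find_rule are hypothetical, only plain and leading-asterisk patterns are handled (address patterns, lookups and regular expressions are omitted), and the error field is ignored.

```python
def pattern_matches(pattern, name):
    """Match a plain or leading-asterisk pattern against a host or domain name."""
    pattern, name = pattern.lower(), name.lower()
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    return pattern == name


def find_rule(rules, domain, host=None):
    """Return the first rule whose pattern matches the host or the domain.

    rules is a list of (pattern, error, parameters) tuples in file order.
    host is None after a routing failure, when only the domain is tested.
    """
    for rule in rules:
        pattern = rule[0]
        # Each line is tested against the host name first, then the domain.
        if host is not None and pattern_matches(pattern, host):
            return rule
        if pattern_matches(pattern, domain):
            return rule
    return None


rules = [
    ("p.q.r", "*", "F,24h,30m;"),
    ("a.b.c.d", "*", "F,4d,45m;"),
]

print(find_rule(rules, "a.b.c.d", host="p.q.r"))  # delivery failure at p.q.r
print(find_rule(rules, "a.b.c.d", host="x.y.z"))  # delivery failure at x.y.z
print(find_rule(rules, "a.b.c.d"))                # routing failure
```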

The second field in a retry rule is the name of a particular error, or an asterisk, which matches any error. The errors that can be tested for are:

refused_MX: connection refused from a host obtained from an MX record
refused_A: connection refused from a host not obtained from an MX record
refused: any connection refusal
timeout_connect: connection timed out
timeout_DNS: DNS lookup timed out
timeout: any timeout
quota: quota exceeded in local delivery
quota_<time>: quota exceeded in local delivery, and the mailbox has not been read for <time>.

The quota errors apply both to system-enforced quotas and to Exim's own quota mechanism in the appendfile transport. They also apply when a local delivery is deferred because a partition is full (the ENOSPC error).
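One way to picture how the error field is compared with an actual failure is the sketch below. The function names and the way a failure is represented (an error name plus, for quota failures, the number of seconds since the mailbox was last read) are assumptions made for illustration. The time parser assumes the usual digits-plus-unit form such as 5d or 1h30m.

```python
import re

UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_time(text):
    """Convert a time such as '5d' or '1h30m' to seconds."""
    parts = re.findall(r"(\d+)([smhdw])", text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError("bad time value: %r" % text)
    return sum(int(n) * UNITS[u] for n, u in parts)


def error_matches(rule_error, failure, unread_for=None):
    """Decide whether a rule's error field covers an actual failure.

    failure is one of refused_MX, refused_A, timeout_connect,
    timeout_DNS or quota. unread_for is the time in seconds since the
    mailbox was last read, if known (quota failures only).
    """
    if rule_error == "*" or rule_error == failure:
        return True
    if rule_error == "refused":
        return failure.startswith("refused_")
    if rule_error == "timeout":
        return failure.startswith("timeout_")
    if rule_error.startswith("quota_") and failure == "quota":
        limit = parse_time(rule_error[len("quota_"):])
        return unread_for is not None and unread_for >= limit
    return False


print(error_matches("refused", "refused_A"))
print(error_matches("quota_5d", "quota", unread_for=6 * 86400))
print(error_matches("quota_5d", "quota", unread_for=4 * 86400))
```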

The third field in a retry rule is a sequence of retry parameter sets, separated by semicolons. Each set consists of

<letter>,<cutoff time>,<arguments>

The letter identifies the algorithm for computing a new retry time; the cutoff time is the time beyond which this algorithm no longer applies, and the arguments vary the algorithm's action. The cutoff time is measured from the time that the first failure for the domain (combined with the local part if relevant) was detected, not from the time the message was received. The available algorithms are:

F: retry at fixed intervals. There is a single time parameter specifying the interval.
G: retry at geometrically increasing intervals. The first argument specifies a starting value for the interval, and the second a multiplier.
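As an illustration of the layout of the third field, the following sketch splits a parameter string into sets and converts the times to seconds. It is not Exim's parser; the function names and the tuple representation are hypothetical, and the time syntax is assumed to be digits followed by one of s, m, h, d or w.

```python
import re

UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_time(text):
    """Convert a time such as '15m' or '1h30m' to seconds."""
    parts = re.findall(r"(\d+)([smhdw])", text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError("bad time value: %r" % text)
    return sum(int(n) * UNITS[u] for n, u in parts)


def parse_retry_parameters(field):
    """Return a list of (letter, cutoff_seconds, arguments) tuples."""
    sets = []
    for chunk in field.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # a trailing semicolon leaves an empty chunk
        parts = [p.strip() for p in chunk.split(",")]
        if len(parts) < 3:
            raise ValueError("incomplete parameter set: %r" % chunk)
        letter, cutoff, args = parts[0], parse_time(parts[1]), parts[2:]
        if letter == "F" and len(args) == 1:
            # F: one argument, the fixed interval
            sets.append(("F", cutoff, (parse_time(args[0]),)))
        elif letter == "G" and len(args) == 2:
            # G: starting interval and multiplier
            sets.append(("G", cutoff, (parse_time(args[0]), float(args[1]))))
        else:
            raise ValueError("bad parameter set: %r" % chunk)
    return sets


print(parse_retry_parameters("F,2h,15m; G,16h,1h,1.5; F,5d,8h"))
```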

When computing the next retry time, the algorithm definitions are scanned in order until one whose cutoff time has not yet passed is reached. This is then used to compute a new retry time that is later than the current time. In the case of fixed interval retries, this simply means adding the interval to the current time. For geometrically increasing intervals, retry intervals are computed from the rule's parameters until one that is greater than the previous interval is found. The main configuration variable retry_interval_max limits the maximum interval between retries.
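The selection and calculation just described might be sketched like this. It is a simplified model under stated assumptions, not Exim's implementation: the function name is hypothetical, times are plain seconds, the final algorithm is assumed to remain in use once every cutoff has passed, and the cap stands in for retry_interval_max with an assumed value of 24 hours.

```python
def next_interval(param_sets, failing_for, previous_interval,
                  interval_max=24 * 3600):
    """Return the number of seconds to add to the current time.

    param_sets        list of (letter, cutoff, args) tuples, in rule order
    failing_for       seconds since the first failure was detected
    previous_interval the interval used last time (0 if none)
    interval_max      assumed stand-in for retry_interval_max
    """
    chosen = param_sets[-1]  # assumption: fall back to the final algorithm
    for candidate in param_sets:
        if failing_for < candidate[1]:  # cutoff not yet passed
            chosen = candidate
            break

    letter, _cutoff, args = chosen
    if letter == "F":
        interval = args[0]
    elif letter == "G":
        interval, factor = args
        if interval <= 0 or factor <= 1:
            raise ValueError("G needs a positive start and a factor above 1")
        # Step through the series until it exceeds the previous interval.
        while interval <= previous_interval:
            interval *= factor
    else:
        raise ValueError("unknown algorithm letter: %r" % letter)
    return min(interval, interval_max)


# F,2h,15m; G,16h,1h,1.5; F,5d,8h expressed in seconds
catch_all = [("F", 7200, (900,)), ("G", 57600, (3600, 1.5)),
             ("F", 432000, (28800,))]

print(next_interval(catch_all, failing_for=3600, previous_interval=0))
print(next_interval(catch_all, failing_for=3 * 3600, previous_interval=900))
print(next_interval(catch_all, failing_for=4 * 3600, previous_interval=3600))
```

Scanning the sets in order means that an earlier set with a later cutoff would hide the ones after it, so cutoffs are normally written in increasing order.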

A single remote domain may have a number of hosts associated with it, and each host may have more than one IP address. Retry algorithms are selected on the basis of the domain name, but are applied to each IP address independently. If, for example, a host has two IP addresses and one is broken, Exim will generate retry times for it and will not try to use it until its next retry time comes. Thus the good IP address is likely to be tried first most of the time.

Retry times are hints rather than promises. Exim does not make any attempt to run deliveries exactly at the computed times. Instead, a queue-running process starts delivery processes for delayed messages periodically, and these attempt new deliveries only for those addresses that have passed their next retry time. If a new message arrives for a deferred address, an immediate delivery attempt occurs only if the address has passed its retry time. In the absence of new messages, the minimum time between retries is the interval between queue-running processes. There is not much point in setting retry times of five minutes if your queue-runners happen only once an hour, unless there are a significant number of incoming messages (which might be the case on a system that is sending everything to a smart host, for example).
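The effect of the queue-run interval can be shown with a little arithmetic. The sketch below assumes evenly spaced queue runs and no new incoming messages; the function name and its parameters are hypothetical.

```python
import math


def first_attempt_time(retry_time, queue_run_start, queue_interval):
    """Return the time of the first queue run at or after retry_time.

    All values are in seconds. Queue runs are assumed to happen at
    queue_run_start, queue_run_start + queue_interval, and so on.
    """
    if retry_time <= queue_run_start:
        return queue_run_start
    runs = math.ceil((retry_time - queue_run_start) / queue_interval)
    return queue_run_start + runs * queue_interval


# A five-minute retry time with hourly queue runs starting at time 0:
print(first_attempt_time(300, 0, 3600))
```

In this model a retry time of five minutes is not acted on until the next hourly queue run, which is the point made in the paragraph above.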

The data in the retry hints database can be inspected by using the exim_dumpdb or exim_fixdb utility programs (see chapter 53). The latter utility can also be used to change the data. The exinext utility script can be used to find out what the next retry times are for the hosts associated with a particular mail domain, and also for local deliveries that have been deferred.

33.2 Retry rule examples

Here are some example retry rules suitable for use when wonderland.fict.book is a local domain:

alice@wonderland.fict.book quota_5d  F,7d,3h
wonderland.fict.book       quota_5d
wonderland.fict.book       *         F,1h,15m; G,2d,1h,2;
lookingglass.fict.book     *         F,24h,30m;
*                          refused_A F,2h,20m;
*                          *         F,2h,15m; G,16h,1h,1.5; F,5d,8h

The first rule sets up special handling for mail to alice@wonderland.fict.book when there is an over-quota error and the mailbox hasn't been read for at least 5 days. Retries continue every three hours for 7 days. The second rule handles over-quota errors for all other local parts at wonderland.fict.book; the absence of a local part has the same effect as supplying `*@'. As no retry algorithms are supplied, messages that fail are bounced immediately if the mailbox hasn't been read for at least 5 days.

The third rule handles all other errors at wonderland.fict.book; retries happen every 15 minutes for an hour, then with geometrically increasing intervals until two days have passed since a delivery first failed. The fourth rule controls retries for the domain lookingglass.fict.book, whether it is local or remote, and the remaining two rules handle all other domains, with special action for connection refusal from hosts that were not obtained from an MX record.

The final rule in a retry configuration should always have asterisks in the first two fields so as to provide a general catch-all for any addresses that do not have their own special handling. This example tries every 15 minutes for 2 hours, then with intervals starting at one hour and increasing by a factor of 1.5 up to 16 hours, then every 8 hours up to 5 days.

33.3 Timeout of retry data

Exim timestamps the data that it writes to its retry hints database. When it consults the data during a delivery it ignores any that is older than the value set in retry_data_expire (default 7 days). If, for example, a host hasn't been tried for 7 days, Exim will try to deliver to it as soon as a message arrives, and if that fails, it will calculate a retry time as if it were failing for the first time.

This improves the behaviour for messages routed to rarely-used hosts such as MX backups. If such a host was down at one time, and happens to be down again when Exim tries a month later, using the old retry data would imply that it had been down all the time, which is not a justified assumption.

If a host really is permanently dead, this behaviour causes a burst of retries every now and again, but only if messages routed to it are rare. If there is a message at least once every 7 days, the retry data never expires.
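The expiry test amounts to a comparison of timestamps. The sketch below is a minimal model: the record layout (a dictionary with a last_updated field) and the function name are assumptions, and only the 7-day default comes from the text above.

```python
RETRY_DATA_EXPIRE = 7 * 86400  # default value of retry_data_expire, in seconds


def usable_retry_record(record, now, expire=RETRY_DATA_EXPIRE):
    """Return the record if it is still fresh, otherwise None.

    A result of None means the host is treated as if it had never
    failed: delivery is tried at once, and a failure starts a new
    retry sequence.
    """
    if record is None:
        return None
    if now - record["last_updated"] > expire:
        return None  # too old, ignore it
    return record


now = 100 * 86400
stale = {"last_updated": now - 30 * 86400, "next_try": now + 3600}
fresh = {"last_updated": now - 2 * 86400, "next_try": now + 3600}

print(usable_retry_record(stale, now))
print(usable_retry_record(fresh, now))
```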

33.4 Long-term failures

Special processing happens when an address has been failing for so long that the cutoff time for the last algorithm has been reached. This is independent of how long any specific message has been failing; it is the length of continuous failure for the address that counts. When this is the case for a local delivery, or for all IP addresses associated with a remote delivery, a subsequent delivery failure causes Exim to give up on the address, and a delivery error message is generated. In order to cater for new messages that may use the failing address, a next retry time is still computed from the final algorithm, and is used as follows:

If the delivery is a local one, one delivery attempt is always made for any subsequent messages. If it fails, the address fails immediately. The post-cutoff retry time is not used.

If the delivery is remote, there are two possibilities, controlled by the delay_after_cutoff option of the smtp transport. The option is true by default and in that case:

Until the post-cutoff retry time for one of the IP addresses is reached, any attempt to deliver to the failing address is bounced immediately. After that time, one new delivery attempt is made to those IP addresses that are past their retry times, and if that still fails, the address is bounced and new retry times are computed.

In other words, Exim delays retrying an IP address after the final cutoff time until a new retry time is reached, and can therefore bounce an email address without ever trying a delivery when machines have been down for a long time. This ensures that few resources are wasted in repeatedly trying to deliver to a broken destination, but if it does recover, Exim will eventually notice.

If delay_after_cutoff is set false, Exim behaves differently. If all IP addresses are past their final cutoff time, Exim tries to deliver to those IP addresses that have not been tried since the message arrived. If there are none, or if they all fail, the address is bounced. In other words, it does not delay when a new message arrives, but tries the expired addresses immediately, unless they have been tried since the message arrived. If there is a continuous stream of messages for the failing domains, unsetting delay_after_cutoff means that there will be many more attempts to deliver to failing IP addresses than when delay_after_cutoff is true.
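The two behaviours for remote deliveries can be compared in a small sketch. It assumes that every IP address for the domain is already past its final cutoff, and it models only the choice of which addresses to try; the function name and the record layout are hypothetical.

```python
def addresses_to_try(ip_records, now, message_arrival,
                     delay_after_cutoff=True):
    """Return the IP addresses that get a delivery attempt.

    ip_records maps an IP address to a dict with next_try and last_try
    times. An empty result means the address is bounced without any
    attempt; if every attempt fails, the address is bounced as well.
    """
    if delay_after_cutoff:
        # Only addresses whose post-cutoff retry time has been reached.
        return [ip for ip, rec in ip_records.items()
                if rec["next_try"] <= now]
    # Otherwise: addresses not tried since this message arrived.
    return [ip for ip, rec in ip_records.items()
            if rec["last_try"] < message_arrival]


records = {
    "192.0.2.1": {"last_try": 1000, "next_try": 30000},
    "192.0.2.2": {"last_try": 5000, "next_try": 9000},
}

print(addresses_to_try(records, now=10000, message_arrival=4000))
print(addresses_to_try(records, now=10000, message_arrival=4000,
                       delay_after_cutoff=False))
```

With a steady stream of new messages, the second form returns untried addresses for almost every message, which is why it leads to many more attempts.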

33.5 Ultimate address timeout

An additional rule is needed to cope with cases where a host is intermittently available, or when a message has some attribute that prevents its delivery when others to the same address get through. In this situation, because some messages are successfully delivered, the `retry clock' for the address keeps getting restarted, and so a message could remain on the queue for ever. To prevent this, if a message has been on the queue for longer than the cutoff time of any applicable retry rule for a given address, a delivery is attempted for that address, even if it is not yet time, and if this delivery fails, the address is timed out. A new retry time is not computed in this case, so that other messages for the same address are considered immediately.
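The check can be modelled as below. This is a sketch under assumptions: the function name and return values are hypothetical, and "the cutoff time of any applicable retry rule" is read here as the longest cutoff among the parameter sets of the rule that applies to the address.

```python
def delivery_decision(message_age, cutoffs, retry_time_reached):
    """Decide what to do with one address of one queued message.

    message_age        seconds the message has been on the queue
    cutoffs            cutoff times, in seconds, of the applicable rule
    retry_time_reached whether the address is past its retry time

    Returns "final_attempt" when the message has outlived every cutoff:
    delivery is tried even if the retry time has not been reached, a
    failure times the address out, and no new retry time is computed.
    """
    if message_age > max(cutoffs):
        return "final_attempt"
    return "attempt" if retry_time_reached else "wait"


# Cutoffs of F,2h,15m; G,16h,1h,1.5; F,5d,8h in seconds
cutoffs = [7200, 57600, 432000]

print(delivery_decision(6 * 86400, cutoffs, retry_time_reached=False))
print(delivery_decision(3600, cutoffs, retry_time_reached=False))
```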
