Re: [exim] Exim-users Digest, Vol 65, Issue 2

Author: Sven Agnew
Date:  
To: exim-users
Subject: Re: [exim] Exim-users Digest, Vol 65, Issue 2
On 2009/10/02 13:00 +0200, Paul [pdw@???] wrote:

Hi,

> 1. Why doesn't it retry during that 8 hour period? Surely the
> successful send at 20:54 should reset the retry rules?

To be honest, I'm fairly new to Exim so I can't promise to be of much
assistance. Having said that, check out chapter 32 of the Exim4 spec
available at
http://www.exim.org/exim-html-current/doc/html/spec_html/ch32.html . It
describes the retry function in detail.

The bit I noticed near the top was

"Exim's retry processing in this case is applied on a per-host
(strictly, per IP address) basis, not on a per-message basis. Thus, if
one message has recently been delayed, delivery of a new message to the
same host is not immediately tried, but waits for the host's retry time
to arrive. If the retry_defer log selector is set, the message "retry
time not reached" is written to the main log whenever a delivery is
skipped for this reason. "
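
As far as I understand it, that selector is switched on in the main
section of the configuration file. A minimal sketch, assuming you have
no log_selector line yet (if you already have one, add +retry_defer to
it rather than adding a second line):

--snip--
# main configuration section
log_selector = +retry_defer
--snip--

With that set, each delivery skipped because the host's retry time has
not arrived should show up in the main log as "retry time not reached".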

Based on the log extracts provided, the retry configuration and your
description of the timeline of events, I would guess that the initial
failed message (due to timeout) was to an IP which remained unavailable
for an extended period of time. The successful delivery was to a
different (reachable) IP address, and so did not affect the retry values
for the address that failed. The "retry time not reached" messages are
for _new_ messages (you did say it was a high volume server) delivered
into the queue, for which routing lookups returned the "failed" IP address.

The above is merely a guess. The log snippets you provided seem to me to
be somewhat obfuscated. Please provide exact log extracts, along with
the output of exim -bP (which shows the runtime configuration values
for Exim). Without those I can't say much more.

However, this may all be moot because...

> 2. Does setting route_data to an A record with multiple IPs achieve
> the redundancy I'm looking for? As far as I can tell, exim makes no
> attempt to fall back on the second IP after the connection failure: it
> hadn't seen a connection failure on the other IP for around 3 hours
> prior to going into "won't send any mail" mode.

Short answer: I'm not sure.

--snip--
mail1:/usr/share/doc/exim4# host -t MX mythic-beasts.com
mythic-beasts.com mail is handled by 10 mx1.mythic-beasts.com.
mythic-beasts.com mail is handled by 10 mx2.mythic-beasts.com.
--snip--

The above indicates MX hosts with identical priority, yet different
host-names and IP addresses. If it's just redundancy you want, I would
ask why you don't simply have a primary and secondary MX with differing
priority values.

Also, why are you using just the one host-name in the router
configuration instead of adding both host-names to the route_data
value? See section 20.1 at
http://www.exim.org/exim-html-current/doc/html/spec_html/index.html#toc0194
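
To illustrate what I mean by listing both host-names, here is a rough
sketch of a manualroute router. I have not seen your router, so the
router name, the transport name, the local_domains list and the
example.net host-names are all placeholders of mine, not values from
your configuration:

--snip--
# routers section; all names below are placeholders
smarthost:
  driver = manualroute
  # assumes a domainlist called local_domains is defined
  domains = ! +local_domains
  # assumes an smtp transport called remote_smtp exists
  transport = remote_smtp
  # colon-separated host list; as I read the spec, the hosts are
  # tried in order until one of them accepts the message
  route_data = relay1.example.net:relay2.example.net
  # optional: try the hosts in random order instead
  hosts_randomize
--snip--

I have not tested this against your setup, so treat it as a starting
point to check against the spec rather than a known-good configuration.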

Not falling back on the "other" IP seems either like an artifact of some
kind of look-up caching or a result of using the manualroute router
without a route_list. Again, this is a guess.

> I'm separately trying to get to the bottom of why we're seeing the
> connection refusal in the first place, but I'd like to understand why
> our setup isn't as robust as I think it should be.
>
> many thanks,
>
> Paul

All of the above may or may not be of use - I am certainly no Exim
expert. My only hope is that it doesn't lead you down a rabbit-hole ;-)

Ciao,
Sven