

Regular expressions

Exim uses the PCRE regular expression library; this provides regular expression matching that is compatible with Perl 5. The syntax and semantics of these regular expressions are discussed in many Perl reference books, and also in Jeffrey Friedl's Mastering Regular Expressions (O'Reilly, ISBN 1-56592-257-3).

The PCRE distribution files, which are included in the directory `src/pcre' in the Exim distribution, contain a man page for PCRE which describes exactly what it supports, so no further description is included here. The PCRE functions are called from Exim using the default option settings, except when processing the `matches' action in an Exim filter, where PCRE_CASELESS is set to cause matching to be independent of the case of letters.

Testing regular expressions

A program called `pcretest' forms part of the PCRE distribution and is built with PCRE during the process of building Exim. It is primarily intended for testing PCRE itself, but it can also be used for experimenting with regular expressions. The binary can be found in the `pcre' sub-directory of the Exim build directory. There is documentation of various options in `src/pcre/README', but for simple testing, none are needed. This is the output of a sample run of `pcretest':


  re> /^([^@]+)@.+\.(ac|edu)\.(?!kr)[a-z]{2}$/
  data> x@y.ac.uk
 0: x@y.ac.uk
 1: x
 2: ac
  data> x@y.ac.kr
No match
  data> x@y.edu.com
No match
  data> x@y.edu.co
 0: x@y.edu.co
 1: x
 2: edu
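The same run can be reproduced outside Exim as a quick cross-check. This is only a sketch: it uses Python's `re' module as a stand-in for PCRE (Python also supports the `(?!kr)' negative lookahead used here, but it is not PCRE and differs in some details), and the helper name `try_match' is hypothetical.

```python
import re

# The pattern from the pcretest session; (?!kr) rejects a final "kr".
PATTERN = re.compile(r"^([^@]+)@.+\.(ac|edu)\.(?!kr)[a-z]{2}$")


def try_match(data):
    """Return (whole match, group 1, group 2), like pcretest's 0:, 1:, 2: lines, or None."""
    m = PATTERN.match(data)
    if m is None:
        return None
    return (m.group(0), m.group(1), m.group(2))


for data in ["x@y.ac.uk", "x@y.ac.kr", "x@y.edu.com", "x@y.edu.co"]:
    print(data, "->", try_match(data) or "No match")
```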

String expansions

Expanded strings are copied verbatim except when a dollar or backslash character is encountered. A dollar specifies the start of a portion of the string which is interpreted and replaced as described below.

An uninterpreted dollar can be included in the string by putting a backslash in front of it. If the string appears in quotes, two backslashes are required, because the quotes themselves cause interpretation of backslashes when the string is read in. A backslash can be used to prevent any character, including itself, from being treated specially in an expansion.
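The two layers of backslash processing can be illustrated with a short model. It is a simplified, hypothetical sketch rather than Exim's code: it assumes that when a quoted option value is read, a backslash followed by any character yields that character (real Exim also recognizes escapes such as backslash-n, which are ignored here), and it models only the escaping rules of expansion, reporting an unescaped dollar instead of expanding it. The function names are invented for illustration.

```python
def read_quoted(text):
    """Hypothetical model of reading a double-quoted option value: a backslash
    followed by any character yields that character. (Real Exim also converts
    escape sequences such as backslash-n; they are ignored here.)"""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])        # drop the backslash, keep the character
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def expand_escapes(text):
    """Hypothetical model of just the escaping rules of expansion: a backslash
    makes the next character literal; an unescaped '$' would start an
    expansion item, so this sketch raises instead of expanding it."""
    out, i = [], 0
    while i < len(text):
        c = text[i]
        if c == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        elif c == "$":
            raise ValueError("unescaped '$' at offset %d" % i)
        else:
            out.append(c)
            i += 1
    return "".join(out)


# The characters between the quotes in the configuration file (raw strings).
print(expand_escapes(read_quoted(r"costs \\$5")))   # costs $5
print(expand_escapes(r"a\\b"))                      # a\b (unquoted: one level only)
```

Written with a single backslash inside the quotes, the quote processing consumes it and the expansion then meets a bare dollar, which this model reports as an error.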

Testing string expansions

A program to test string expansions can be compiled by obeying the command


make test_expand

once Exim has been successfully compiled. This makes a binary called `test_expand' in the build directory. When run, it reads lines from the standard input, runs them through the string expansion code, and writes the results to the standard output. Since no message is being processed, variables such as `$local_part' have no value, but the program can be used for checking out file and database lookups, and the use of expansion operators such as `substr' and `hash'.

Expansion items

The following items are recognized in expanded strings. To improve readability, white space may be used inside an outer set of braces between sub-items that are keywords or sub-strings enclosed in braces.


$<variable name> or ${<variable name>}

Substitute the contents of the named variable; the latter form can be used to separate the name from subsequent alphanumeric characters. The names of the variables are given in section "Expansion variables" below. If the name of a non-existent variable is given, the expansion fails.


$header_<header name>: or $h_<header name>:

Substitute the contents of the named message header, for example


$header_reply-to:

This particular expansion is intended mainly for use in filter files. The header names follow the syntax of RFC 822, which states that they may contain any printing characters except space and colon. Consequently, curly brackets do not terminate header names. Upper-case and lower-case letters are synonymous in header names. If the following character is white space, the terminating colon may be omitted. The white space is included in the expanded string. If the message does not contain the given header, the expansion item is replaced by an empty string. (See the `def' condition in section "Expansion conditions" below for a means of testing for the existence of a header.) If there is more than one header with the same name, they are all concatenated to form the substitution string, with a newline character between each of them.
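A sketch of the header rules just described: case-insensitive names, an empty result for a missing header, and newline-separated concatenation of repeated headers. The list-of-pairs representation of the message header and the function name `header_value' are hypothetical, and header values are assumed to have already had the name and colon removed.

```python
def header_value(headers, name):
    """Model of $header_<name>: / $h_<name>: as described above.

    headers: (name, value) pairs in message order, a hypothetical
    representation with the header name and colon already removed.
    """
    wanted = name.lower()                     # names are case-insensitive
    values = [value for hname, value in headers if hname.lower() == wanted]
    return "\n".join(values)                  # "" when the header is absent


headers = [("From", "alice@example.com"),
           ("Received", "from a.example"),
           ("received", "from b.example")]
print(repr(header_value(headers, "RECEIVED")))   # 'from a.example\nfrom b.example'
```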


${<op>:<string>}

The string is first itself expanded, and then the operation specified by <op> is applied to it. A list of operators is given in section "Expansion operators" below. The string starts with the first character after the colon, which may be leading white space.


${if <condition> {<string1>}{<string2>}}

If <condition> is true, <string1> is expanded and replaces the whole item; otherwise <string2> is used. The second string need not be present; if it is not and the condition is not true, the item is replaced with nothing. Alternatively, the word `fail' may be present instead of the second string (without any curly brackets). In this case, the expansion fails if the condition is not true. The available conditions are described in section "Expansion conditions" below.


${lookup{<key>} <search type> {<file>} {<string1>} {<string2>}}


${lookup <search type> {<query>} {<string1>} {<string2>}}

These items specify data lookups in files and databases, as discussed in chapter "File and database lookups". The first form is used for single-key lookups, and the second is used for query-style lookups. The <key>, <file>, and <query> strings are expanded before use.

If the lookup succeeds, then <string1> is expanded and replaces the entire item. During its expansion, a variable called `$value' is available, containing the data returned by the lookup. If the lookup fails, <string2> is expanded and replaces the entire item. <string2> may be omitted, in which case the replacement is null.

For single-key lookups, the string `partial-' is permitted to precede the search type in order to do partial matching, and * or *@ may follow a search type to request default lookups if the key does not match (see sections "Single-key lookup types" in chapter "File and database lookups" and "Partial matching in lookups" in chapter "File and database lookups").

If a partial search is used, the variables `$1' and `$2' contain the wild and non-wild parts of the key during the expansion of the replacement text. They return to their previous values at the end of the lookup item.

Instead of {<string2>} the word `fail' can appear, and in this case, if the lookup fails, the entire string expansion fails in a way that can be detected by the caller. The consequences of this depend on the circumstances.

This example looks up the postmaster alias in the conventional alias file.


${lookup {postmaster} lsearch {/etc/aliases} {$value}}

This example uses NIS+ to look up the full name of the user corresponding to the local part of an address, failing the expansion if it is not found.


"${lookup nisplus {[name=$local_part],passwd.org_dir:gcos} \
  {$value}fail}"
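The control flow of a lookup item can be sketched as follows. This is an illustrative model, not Exim's implementation: a Python dict stands in for the file or database, a callable stands in for the unexpanded <string1>, and the names `lookup_item', `FAIL', and `ExpansionFailed' are hypothetical.

```python
class ExpansionFailed(Exception):
    """The whole string expansion fails (detectable by the caller)."""


FAIL = object()   # stands for the word `fail' written instead of {<string2>}


def lookup_item(table, key, string1, string2=""):
    """Model of ${lookup{key} <search type> {<file>} {string1} {string2}}.

    table: a dict standing in for the file or database (hypothetical).
    string1: callable that receives $value and returns the expanded string1.
    string2: replacement on failure ("" when omitted), or FAIL.
    """
    if key in table:
        return string1(table[key])     # $value is only available here
    if string2 is FAIL:
        raise ExpansionFailed("lookup for %r failed" % key)
    return string2


aliases = {"postmaster": "root"}
print(lookup_item(aliases, "postmaster", lambda value: value))   # root
```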

${lookup{<key:subkey>} <search type> {<file>} {<string1>} {<string2>}}

This searches for <key> in the file as described above for single-key lookups; if it succeeds, it extracts from the data a subfield which is identified by the <subkey>. The data related to the main key must be of the form:


<subkey1> = <value1>  <subkey2> = <value2> ...

where the equals signs are optional. If any of the values contain white space, they must be enclosed in double quotes, and any values that are enclosed in double quotes are subject to escape processing as described in section "String" in chapter "The Exim configuration file". For example, if a line in a linearly searched file contains


alice: uid=1984 gid=2001

then expanding the string


${lookup{alice:uid}lsearch{<file name>}{$value}}

yields the string `1984'. If the subkey is not found in the data, <string2>, if present, is expanded and replaces the entire item; otherwise the replacement is null.


${extract{<key>} {<string>}}

The key and the string are first expanded. Then the subfield identified by the key is extracted from the string, exactly as just described for `lookup' items with subkeys. If the key is not found in the string, the item is replaced by nothing.
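The subfield format described above can be parsed with a small tokenizer; the same parsing serves `lookup' subkeys and the two-argument `extract'. This is a hypothetical sketch: it handles optional equals signs and double-quoted values, but inside quotes it only lets a backslash make the next character literal rather than applying Exim's full escape processing, and returning the first matching subfield when a key repeats is an assumption of the sketch. The names `subfields' and `extract' are invented for illustration.

```python
def subfields(data):
    """Yield (key, value) pairs from text such as 'uid=1984 gid="20 01"'.

    Equals signs are optional. A double-quoted value may contain white space;
    inside quotes this sketch only lets a backslash make the next character
    literal (Exim's full escape processing is not modelled)."""
    i, n = 0, len(data)
    while True:
        while i < n and data[i].isspace():     # skip white space before a key
            i += 1
        if i >= n:
            return
        start = i
        while i < n and not data[i].isspace() and data[i] != "=":
            i += 1
        key = data[start:i]
        while i < n and data[i].isspace():
            i += 1
        if i < n and data[i] == "=":           # optional equals sign
            i += 1
            while i < n and data[i].isspace():
                i += 1
        if i < n and data[i] == '"':           # quoted value
            i += 1
            chars = []
            while i < n and data[i] != '"':
                if data[i] == "\\" and i + 1 < n:
                    i += 1                     # keep the escaped character
                chars.append(data[i])
                i += 1
            i += 1                             # skip the closing quote
            value = "".join(chars)
        else:                                  # unquoted value ends at white space
            start = i
            while i < n and not data[i].isspace():
                i += 1
            value = data[start:i]
        yield key, value


def extract(key, data):
    """${extract{key}{data}}: the first matching subfield's value, else ""."""
    for k, v in subfields(data):
        if k == key:
            return v
    return ""


print(extract("uid", "uid=1984 gid=2001"))    # 1984
```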


${extract{<number>} {<separators>} {<string>}}

This is distinguished from the above form of `extract' by having three rather than two arguments. It extracts from the string the field whose number is given as the first argument. The first field is numbered one. If the number is negative or greater than the number of fields in the string, the result is empty; if it is zero the entire string is returned. The fields in the string are separated by any one of the characters in the separator string. For example:


${extract{3}{:}{exim:x:42:99:& Mailer::/bin/bash}}

yields `42'. Two successive separators mean that the field between them is empty (for example, the sixth field above). If the first argument is not numeric, the expansion fails.
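A sketch of the numbered-field form of `extract', following the rules just given: fields are numbered from one, zero means the whole string, negative or too-large numbers give an empty result, and each separator character ends a field. The function name is hypothetical; a non-numeric first argument makes Python's `int()' raise `ValueError', standing in for the failed expansion.

```python
def extract_field(number_text, separators, string):
    """Model of ${extract{<number>}{<separators>}{<string>}}."""
    number = int(number_text)      # non-numeric text raises ValueError (expansion fails)
    if number == 0:
        return string              # zero: the entire string
    fields, current = [], []
    for ch in string:
        if ch in separators:       # any one of the separator characters ends a field
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    if number < 0 or number > len(fields):
        return ""                  # negative or too large: empty result
    return fields[number - 1]      # the first field is numbered one


print(extract_field("3", ":", "exim:x:42:99:& Mailer::/bin/bash"))   # 42
```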

Expansion operators

The following operations can be performed on portions of an expanded string:


${domain:<string>}

The string is interpreted as an RFC 822 address and the domain is extracted from it. If the string does not parse successfully, the result is empty.


${expand:<string>}

The `expand' operator causes a string to be expanded for a second time. For example,


${expand:${lookup{$domain}dbm{/some/file}{$value}}}

first looks up a string in a file while expanding the operand for `expand', and then re-expands what it has found.


${hash_<n>_<m>:<string>}

The two items <n> and <m> are numbers. If <n> is greater than or equal to the length of the string, the operator returns the string. Otherwise it computes a new string of length <n> by applying a hashing function to the string. The new string consists of characters taken from the first <m> characters of the string


abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQWRSTUVWXYZ0123456789

and if <m> is not present the value 26 is used, so that only lower case letters appear. These examples:


${hash_3:monty}
${hash_5:monty}
${hash_4_62:monty python}

yield


jmg
monty
fbWx

respectively. The abbreviation `h' can be used instead of `hash'.


${lc:<string>}

This forces the letters in the string into lower-case, for example:


${lc:$local_part}

${length_<number>:<string>}

The `length' operator can be used to extract the initial portion of a string. It is followed by an underscore and the number of characters required. For example


${length_50:$message_body}

The result of this operator is either the first <number> characters or the whole string, whichever is the shorter. The abbreviation `l' can be used instead of `length'.


${local_part:<string>}

The string is interpreted as an RFC 822 address and the local part is extracted from it. If the string does not parse successfully, the result is empty.


${quote:<string>}

The `quote' operator puts its argument into double quotes if it contains anything other than letters, digits, underscores, full stops (periods), and hyphens. Any occurrences of double quotes and backslashes are escaped with a backslash. For example,


${quote:ab*cd}

becomes


"ab*cd"

This is useful when the argument is a substitution from a variable or a message header.


${rxquote:<string>}

The `rxquote' operator inserts a backslash before any non-alphanumeric characters in its argument. This is useful when substituting the values of variables or headers inside regular expressions.
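Minimal sketches of `quote' and `rxquote' as described above. They are hypothetical reimplementations, not Exim's code, and they assume that "letters" and "digits" mean the ASCII characters only.

```python
import string

PLAIN = set(string.ascii_letters + string.digits + "_.-")
ALNUM = set(string.ascii_letters + string.digits)


def quote(text):
    """Model of ${quote:...}: returned unchanged if it contains only letters,
    digits, underscores, full stops and hyphens; otherwise wrapped in double
    quotes with any double quote or backslash escaped by a backslash."""
    if all(c in PLAIN for c in text):
        return text
    escaped = "".join("\\" + c if c in '"\\' else c for c in text)
    return '"' + escaped + '"'


def rxquote(text):
    """Model of ${rxquote:...}: a backslash before every non-alphanumeric character."""
    return "".join(c if c in ALNUM else "\\" + c for c in text)


print(quote("ab*cd"))        # "ab*cd"
print(rxquote("a.b+c"))      # a\.b\+c
```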


${substr_<start>_<length>:<string>}

The `substr' operator can be used to extract more general substrings than `length'. It is followed by an underscore and the starting offset, then a second underscore and the length required. For example


${substr_3_2:$local_part}

If the starting offset is greater than the string length the result is the null string; if the length plus starting offset is greater than the string length, the result is the right-hand part of the string, starting from the given offset. The first character in the string has offset zero. The abbreviation `s' can be used instead of `substr'.

The `substr' expansion operator can take negative offset values to count from the right-hand end of its operand. The last character is offset -1, the second-last is offset -2, and so on. Thus, for example,


${substr_-5_2:1234567}

yields `34'. If the absolute value of a negative offset is greater than the length of the string, the substring starts at the beginning of the string, and the length is reduced by the amount of overshoot. Thus, for example,


${substr_-5_2:12}

yields an empty string, but


${substr_-3_2:12}

yields `1'.

If the second number is omitted from `substr', the remainder of the string is taken if the offset was positive. If it was negative, all characters in the string preceding the offset point are taken. For example, an offset of -1 and no length yields all but the last character of the string.
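The offset rules for `substr', including the negative-offset cases, can be checked with a short model. This is a sketch written from the description above, not Exim's code; the function names are hypothetical, and `length' is included because `${length_<n>:...}' behaves like `substr' with an offset of zero.

```python
def substr(string, offset, length=None):
    """Model of ${substr_<offset>_<length>:<string>}; length=None means omitted."""
    size = len(string)
    if offset < 0:
        offset += size                     # negative offsets count from the right
        if offset < 0:
            # Start is before the beginning: begin at offset zero and reduce
            # the length by the overshoot (nothing precedes, if length omitted).
            length = 0 if length is None else max(length + offset, 0)
            offset = 0
        elif length is None:
            # No length: take everything before the offset point.
            offset, length = 0, offset
    else:
        if offset > size:
            return ""                      # offset beyond the end: null string
        if length is None:
            length = size - offset         # no length: the rest of the string
    return string[offset:offset + length]  # slicing stops at the end of the string


def length_op(n, string):
    """Model of ${length_<n>:<string>}: the first n characters, or the whole string."""
    return substr(string, 0, n)


print(substr("1234567", -5, 2))    # 34
print(repr(substr("12", -5, 2)))   # ''
print(substr("12", -3, 2))         # 1
```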

Expansion conditions

The following conditions are available for testing while expanding strings:


!<condition>

This negates the result of the condition.


def:<variable>

This condition is true if the named expansion variable does not contain the empty string, for example


${if def:sender_ident {from $sender_ident}}

Note that the variable name is given without a leading `$' character. If the variable does not exist, the expansion fails.


def:header_<header name>  or  def:h_<header name>

This condition is true if a message is being processed and the named header exists in the message. For example,


${if def:header_reply-to:{$h_reply-to:}{$h_from:}}

Note that no `$' appears before `header_' or `h_' in the condition, and that header names must be terminated by colons if white space does not follow.


exists {<file name>}

The substring is first expanded and then interpreted as an absolute path. The condition is true if the named file (or directory) exists. The existence test is done by calling the `stat()' function.


eq {<string1>}{<string2>}

The two substrings are first expanded. The condition is true if the two resulting strings are identical, including the case of letters.


match {<string1>}{<string2>}

The two substrings are first expanded. The second is then treated as a regular expression and applied to the first. Because of the pre-expansion, if the regular expression contains dollar or backslash characters, they must be escaped with backslashes. If the whole expansion string is in double quotes, further escaping of backslashes is also required.

The condition is true if the regular expression match succeeds. At the start of an "if" expansion the values of the numeric variable substitutions `$1' etc. are remembered. Obeying a "match" condition that succeeds causes them to be reset to the substrings of that condition and they will have these values during the expansion of the success string. At the end of the "if" expansion, the previous values are restored. After testing a combination of conditions using "or", the subsequent values of the numeric variables are those of the condition that succeeded.


or {{<cond1>}{<cond2>}...}

The sub-conditions are evaluated from left to right. The condition is true if any one of the sub-conditions is true. When a true sub-condition is found, the following ones are parsed but not evaluated. Thus if there are several `match' sub-conditions the values of the numeric variables are taken from the first one that succeeds.


and {{<cond1>}{<cond2>}...}

The sub-conditions are evaluated from left to right. The condition is true if all of the sub-conditions are true. When a false sub-condition is found, the following ones are parsed but not evaluated.
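The interaction between `if', `match', `or', and `and' and the numeric variables can be modelled as below. This is a hypothetical sketch: Python's `re.search' stands in for PCRE, lambdas stand in for sub-conditions that have been parsed but not yet evaluated, and treating a capture group that did not take part in the match as empty is an assumption. All class and function names are invented for illustration.

```python
import re


class NumericVars:
    """Current values of $0, $1, ... (hypothetical stand-in for Exim's state)."""

    def __init__(self, values=()):
        self.values = list(values)


def match(subject, regex, nv):
    """`match' condition: on success, reset $0, $1, ... from this match."""
    m = re.search(regex, subject)          # unanchored search
    if m is None:
        return False
    # Assumption: a group that took no part in the match becomes "".
    nv.values = [m.group(0)] + [g or "" for g in m.groups()]
    return True


def any_of(conditions):
    """`or': evaluate left to right, stopping at the first true one."""
    return any(cond() for cond in conditions)


def all_of(conditions):
    """`and': evaluate left to right, stopping at the first false one."""
    return all(cond() for cond in conditions)


def expand_if(condition, string1, string2, nv):
    """${if <condition> {string1}{string2}}, restoring $0, $1, ... afterwards."""
    saved = list(nv.values)
    try:
        return string1(nv) if condition() else string2(nv)
    finally:
        nv.values = saved


nv = NumericVars(["previous"])
result = expand_if(
    lambda: any_of([lambda: match("abc", r"x(.)", nv),
                    lambda: match("abc", r"a(.)", nv),    # first success
                    lambda: match("abc", r"(c)", nv)]),   # never evaluated
    lambda v: "captured " + v.values[1],
    lambda v: "no match",
    nv)
print(result)      # captured b
print(nv.values)   # ['previous']
```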

Expansion variables

The variable substitutions that are available for use in expansion strings are:

`$0', `$1', etc.: When a `match' expansion condition succeeds, these variables contain the captured substrings identified by the regular expression during subsequent processing of the success string of the containing "if" expansion item. They may also be set externally by some other matching process which precedes the expansion of the string. For example, the commands available in Exim filter files include an "if" command with its own regular expression matching condition.

`$caller_gid': The group id under which the process that called Exim was running. This is not the same as the group id of the originator of a message (see `$originator_gid'). If Exim re-execs itself, this variable in the new incarnation normally contains the Exim gid.

`$caller_uid': The user id under which the process that called Exim was running. This is not the same as the user id of the originator of a message (see `$originator_uid'). If Exim re-execs itself, this variable in the new incarnation normally contains the Exim uid.

`$compile_date': The date on which the Exim binary was compiled.

`$compile_number': The building process for Exim keeps a count of the number of times it has been compiled. This serves to distinguish different compilations of the same version of the program.

`$domain': When an address is being directed, routed, or delivered on its own, this variable contains the domain. In particular, it is set during user filtering, but not during system filtering, since a message may have many recipients and the system filter is called just once.

For remote addresses, the domain can change as routing proceeds, as a result of router actions. When a remote or local delivery is taking place, if all the addresses that are being handled simultaneously contain the same domain, it is placed in `$domain'. Otherwise this variable is empty. Transports should be restricted to handling only one domain at once if its value is required at transport time -- this is the default for local transports. For further details of the environment in which local transports are run, see chapter "Environment for running local transports".

Because configured address rewriting happens at the time a message is received, `$domain' normally contains the value after rewriting. However, when a rewrite item is actually being processed (see chapter "Address rewriting") `$domain' contains the domain portion of the address that is being rewritten; it can be used in the expansion of the replacement address, for example, to rewrite domains by file lookup.

When the `smtp_etrn_command' option is being expanded, `$domain' contains the complete argument of the ETRN command (see section "The ETRN command" in chapter "SMTP processing").

`$domain_data': When a director or a router has a setting of the `domains' generic option, and that involves a file lookup, the data associated with the key in the file is available during the running of the director or router as `$domain_data'. In all other situations, this variable expands to nothing.

`$errmsg_recipient': This is set to the recipient address of an error message while Exim is creating it. It is useful if a customized error message text file is in use (see chapter "Customizing error and warning messages").

`$home': A home directory may be set during a local delivery, either by the transport or by the director that handled the address. When this is the case, `$home' contains its value and may be used in any expanded options for the transport. The `forwardfile' director also makes use of `$home'. Full details are given in chapter "The forwardfile director". When interpreting a user's filter file, Exim is normally configured so that `$home' contains the user's home directory. When running a filter test via the `-bf' option, `$home' is set to the value of the environment variable HOME.

`$host': When a local transport is run as a result of routing a remote address, this variable is available to access the host name that the router defined. A router may set up many hosts; in this case `$host' refers to the first one. It is expected that this usage will be mainly via the domainlist router, setting up a single host for batched SMTP output, for example.

When used in a transport filter (see chapter "Generic options for transports") `$host' refers to the host involved in the current connection.

`$host_address': This variable is available only for use in transport filters (see chapter "Generic options for transports").

`$local_part': When an address is being directed, routed, or delivered on its own, this variable contains the local part. If a local part prefix or suffix has been recognized, it is not included in the value. When a number of addresses are being delivered in a batch by a local or remote transport, `$local_part' is not set.

If a single address is source-routed, that is, of the form


@a:c@d

then when its transport is running `$local_part' is set to `c@d' and `$domain' is set to `a'.

Because configured address rewriting happens at the time a message is received, `$local_part' normally contains the value after rewriting. However, when a rewrite item is actually being processed (see chapter "Address rewriting") `$local_part' contains the local part of the address that is being rewritten; it can be used in the expansion of the replacement address, for example, to rewrite local parts by file lookup.

`$local_part_data': When a director or a router has a setting of the `local_parts' generic option, and that involves a file lookup, the data associated with the key in the file is available during the running of the director or router as `$local_part_data'. In all other situations, this variable expands to nothing.

`$local_part_prefix': When an address is being directed or delivered locally, and a specific prefix for the local part was recognized, it is available in this variable. Otherwise it is empty.

`$local_part_suffix': When an address is being directed or delivered locally, and a specific suffix for the local part was recognized, it is available in this variable. Otherwise it is empty.

`$key': When a domain list is being searched, this variable contains the value of the key, so that it can be inserted into strings for query-style lookups. See chapter "File and database lookups" for details. In other circumstances this variable is empty.

`$message_body': This variable contains the initial portion of a message's body while it is being delivered, and is intended mainly for use in filter files. The maximum number of characters of the body that are used is set by the `message_body_visible' configuration option; the default is 500. Newlines are converted into spaces to make it easier to search for phrases that might be split over a line break.
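A one-line sketch of how `$message_body' relates to the body, using the default limit of 500 characters. The function name is hypothetical, and the body is assumed to use bare newlines as line terminators.

```python
def message_body_variable(body, message_body_visible=500):
    """Model of $message_body: at most message_body_visible characters of the
    body, with newlines converted to spaces so phrases split across lines
    can still be found."""
    return body[:message_body_visible].replace("\n", " ")


print(message_body_variable("Make money\nfast"))   # Make money fast
```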

`$message_headers': This variable contains a concatenation of all the header lines when a message is being processed. They are separated by newline characters.

`$message_id': When a message is being received or delivered, this variable contains the unique message id which is used by Exim to identify the message.

`$message_precedence': When a message is being delivered, the value of any `Precedence:' header is made available in this variable. If there is no such header, the value is the null string.

`$message_size': When a message is being received or delivered, this variable contains its size in bytes. The size includes those headers that were received with the message, but not those (such as `Envelope-to:') that are added to individual deliveries.

`$n0' -- `$n9': These variables are counters that can be incremented by means of the `add' command in filter files.

`$original_domain': When a top-level address is being processed for delivery, this contains the same value as `$domain'. However, if a `child' address (for example, generated by an alias, forward, or filter file) is being processed, this variable contains the domain of the original address. When more than one address is being delivered in a batch by a local or remote transport, `$original_domain' is not set.

Address rewriting happens as a message is received. Once it has happened, the previous form of the address is no longer accessible. It is the rewritten top-level address whose domain appears in this variable.

`$original_local_part': When a top-level address is being processed for delivery, this contains the same value as `$local_part'. However, if a `child' address (for example, generated by an alias, forward, or filter file) is being processed, this variable contains the local part of the original address. When more than one address is being delivered in a batch by a local or remote transport, `$original_local_part' is not set.

Address rewriting happens as a message is received. Once it has happened, the previous form of the address is no longer accessible. It is the rewritten top-level address whose local part appears in this variable.

`$originator_gid': The value of `$caller_gid' that was set when the message was received. For messages received via the command line, this is the gid of the sending user. For messages received by SMTP over TCP/IP, this is normally the gid of the Exim user.

`$originator_uid': The value of `$caller_uid' that was set when the message was received. For messages received via the command line, this is the uid of the sending user. For messages received by SMTP over TCP/IP, this is normally the uid of the Exim user.

`$pipe_addresses': This is not an expansion variable, but is mentioned here because the string `$pipe_addresses' is handled specially in the command specification for the `pipe' transport and in transport filters. It cannot be used in general expansion strings, and provokes an `unknown variable' error if encountered.

`$primary_hostname': The value set in the configuration file, or read by the `uname()' function.

`$qualify_domain': The value set for this option in the configuration file.

`$qualify_recipient': The value set for this option in the configuration file, or if not set, the value of `$qualify_domain'.

`$received_for': If there is only a single recipient address in an incoming message, then when the `Received:' header line is being built, this variable contains that address. Otherwise it is empty.

`$received_protocol': When a message is being processed, this variable contains the name of the protocol by which it was received.

`$recipients': This variable contains a list of envelope recipients for a message, but is recognized only in the system filter file, to prevent exposure of Bcc recipients to ordinary users. A comma and a space separate the addresses in the replacement text.

`$recipients_count': When a message is being processed, this variable contains the number of envelope recipients that came with the message. Duplicates are not excluded from the count.

`$reply_address': When a message is being processed, this variable contains the contents of the `Reply-to:' header if one exists, or otherwise the contents of the `From:' header.

`$return_path': When a message is being delivered, this variable contains the return path: the sender field that is sent as part of the envelope. In many cases, this has the same value as `$sender_address', but if, for example, an incoming message to a mailing list has been expanded by a director that specifies a particular address for delivery error messages, then `$return_path' contains the new error address, while `$sender_address' contains the original sender address that was received with the message.

`$return_size_limit': This contains the value set in the `return_size_limit' option, rounded up to a multiple of 1000. It is useful when a customized error message text file is in use (see chapter "Customizing error and warning messages").

`$route_option': A router may set up an arbitrary string to be passed to a transport via this variable. Currently, only the `queryprogram' router has the ability to do so.

`$self_hostname': The generic router option `self' can be set to the value `local'. This causes the address to be passed over to the directors, as if its domain were a local domain. While subsequently directing (and doing any local deliveries) `$self_hostname' is set to the name of the local host that the router encountered. In other circumstances its contents are null.

`$sender_address': When a message is being processed, this variable contains the sender's address that was received in the message's envelope. For delivery failure reports, the value of this variable is the empty string.

`$sender_address_domain': The domain portion of `$sender_address'.

`$sender_address_local_part': The local part portion of `$sender_address'.

`$sender_fullhost': When a message has been received from a remote host, this variable contains the host name and IP address in a single string, which always ends with the IP address in square brackets. The format of the rest of the string depends on whether the host issued a HELO or EHLO SMTP command, and whether the host name was verified by looking up its IP address. (Looking up the IP address can be forced by the `host_lookup_nets' option, independent of verification.) A plain host name at the start of the string is a verified host name; if this is not present, verification either failed or was not requested. A host name in parentheses is the argument of a HELO or EHLO command. This is omitted if it is identical to the verified host name or to the host's IP address in square brackets.

`$sender_helo_name': When a message has been received from a remote host that has issued a HELO or EHLO command, the first item in the argument of that command is placed in this variable. It is also set if HELO or EHLO is used when a message is received using SMTP locally via the `-bs' or `-bS' options.

`$sender_host_address': When a message has been received from a remote host, this variable contains the host's IP address.

`$sender_host_name': When a message has been received from a remote host, this variable contains the host's name as verified by looking up its IP address. If verification failed, or was not requested, this variable contains the empty string.

`$sender_ident': When a message has been received from a remote host, this variable contains the identification received in response to an RFC 1413 request. When a message has been received locally, this variable contains the login name of the user that called Exim.

`$sender_rcvhost': This is provided specifically for use in `Received:' headers. It starts with either the verified host name (as obtained from a reverse DNS lookup) or, if there is no verified host name, the IP address in square brackets. After that there may be text in parentheses. When the first item is a verified host name, the first thing in the parentheses is the IP address in square brackets. There may also be items of the form `helo=xxxx' if HELO or EHLO was used and its argument was not identical to the real host name or IP address, and `ident=xxxx' if an RFC 1413 ident string is available. If all three items are present in the parentheses, a newline and tab are inserted into the string, to improve the formatting of the `Received:' header.

`$sn0' -- `$sn9': These variables are copies of the values of the `$n0' -- `$n9' accumulators that were current at the end of the system filter file. This allows a system filter file to set values that can be tested in users' filter files. For example, a system filter could set a value indicating how likely it is that a message is junk mail.

`$spool_directory': The name of Exim's spool directory.

`$tod_bsdinbox': The time of day and date, in the format required for BSD-style mailbox files, for example: Tue Oct 17 17:14:09 1995.

`$tod_full': A full version of the time and date, for example: Mon, 16 Oct 1995 09:51:40 +0100. The timezone is always given as a numerical offset from GMT.

`$tod_log': The time and date in the format used for writing Exim's log files, for example: 1995-10-12 15:32:29.
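The three time-of-day formats can be approximated with the `strftime()' patterns in the sketch below. The patterns are inferred from the examples above rather than taken from Exim's source; in particular, a zero-padded day of the month is assumed, and the English day and month names assume the default C locale. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def tod_bsdinbox(t):
    # BSD mailbox style; zero-padded day of the month is an assumption
    return t.strftime("%a %b %d %H:%M:%S %Y")


def tod_full(t):
    # %z needs a timezone-aware datetime; it gives a numeric offset such as +0100
    return t.strftime("%a, %d %b %Y %H:%M:%S %z")


def tod_log(t):
    return t.strftime("%Y-%m-%d %H:%M:%S")


t = datetime(1995, 10, 16, 9, 51, 40, tzinfo=timezone(timedelta(hours=1)))
print(tod_bsdinbox(t))   # Mon Oct 16 09:51:40 1995
print(tod_full(t))       # Mon, 16 Oct 1995 09:51:40 +0100
print(tod_log(t))        # 1995-10-16 09:51:40
```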

`$value': This variable contains the result of an expansion lookup operation, as described above. If used in other circumstances, its contents are null.

`$version_number': The version number of Exim.

Expansion string examples

Typical settings for defining a local mailbox to the `appendfile' transport are


file = /var/spool/mail/${local_part}
file = ${home}/inbox

The default setting for the `Received:' header is as follows:


received_header_text = "Received: \
    ${if def:sender_rcvhost {from ${sender_rcvhost}\n\t}\
    {${if def:sender_ident {from ${sender_ident} }}\
    ${if def:sender_helo_name {(helo=${sender_helo_name})\n\t}}}}\
    by ${primary_hostname} \
    ${if def:received_protocol {with ${received_protocol}}} \
    (Exim ${version_number} #${compile_number})\n\t\
    ${if def:received_for {for $received_for\n\t}}\
    id ${message_id}"

