Previous Next Contents (Exim 4.10 Specification)

9. File and database lookups

Exim can be configured to look up data in files or databases as it processes messages. Two different kinds of syntax are used:

A string that is to be expanded may contain explicit lookup requests, which can cause parts of the string to be replaced by data which is looked up. String expansions are described in chapter 11.
Lists of domains, hosts, and email addresses can contain lookup requests as a way of avoiding excessively long linear lists. In this case, the data that is returned by the lookup is often (but not always) discarded; whether the lookup succeeds or fails is what really counts. These kinds of list are described in chapter 10.

This chapter describes the different lookup types that are available, and which can be used in either of the above circumstances. Two different styles of data lookup are implemented:

The single-key style requires the specification of a file in which to look, and a single key to search for. The lookup type determines how the file is searched.
The query style accepts a generalized database query.

The code for each lookup type is in a separate source file which is included in the binary of Exim only if the corresponding compile-time option is set. The default settings in src/EDITME are:

  LOOKUP_DBM=yes
  LOOKUP_LSEARCH=yes

which means that only linear searching and DBM lookups are included by default. For some types of lookup, you need to install appropriate libraries and header files before building Exim.

9.1. Single-key lookup types

The following single-key lookup types are implemented:

cdb: The given file is searched as a Constant DataBase file, using the key string without the terminating binary zero. The cdb format is designed for indexed files that are read frequently and never updated, except by total re-creation. As such, it is particularly suitable for large files containing aliases or other indexed data referenced by an MTA. Information about cdb can be found in several places:

  http://www.pobox.com/~djb/cdb.html
  ftp://ftp.corpit.ru/pub/tinycdb/
  http://packages.debian.org/stable/utils/freecdb.html

A cdb distribution is not needed in order to build Exim with cdb support, because the code for reading cdb files is included directly in Exim itself. However, no means of building or testing cdb files is provided with Exim, so you need to obtain a cdb distribution in order to do this.
dbm: Calls to DBM library functions are used to extract data from the given DBM file by looking up the record with the given key. The terminating binary zero is included in the key that is passed to the DBM library. See section 4.3 for a discussion of DBM libraries. For all versions of Berkeley DB, Exim uses the DB_HASH style of database when building DBM files using the exim_dbmbuild utility. However, when using Berkeley DB versions 3 or 4, it opens existing databases for reading with the DB_UNKNOWN option. This enables it to handle any of the types of database that the library supports, and can be useful for accessing DBM files created by other applications. (For earlier DB versions, DB_HASH is always used.)
dbmnz: This is the same as dbm, except that the terminating binary zero is not included in the key that is passed to the DBM library. You may need this if you want to look up data in files that are created by or shared with some other application that does not use terminating zeros. For example, you need to use dbmnz rather than dbm if you want to authenticate incoming SMTP calls using the passwords from Courier's /etc/userdbshadow.dat file. Exim's utility program for creating DBM files (exim_dbmbuild) includes the zeros by default, but has an option to omit them (see section 45.7).
dsearch: The given file must be a directory, which is searched for a file whose name is the key. The key may not contain any forward slash characters. The result of a successful lookup is the name of the file. An example of how this lookup can be used to support virtual domains is given in section 41.6.
lsearch: The given file is a text file which is searched linearly for a line beginning with the key, terminated by a colon or white space or the end of the line. The first occurrence that is found in the file is used. White space between the key and the colon is permitted. The remainder of the line, with leading and trailing white space removed, is the data. This can be continued onto subsequent lines by starting them with any amount of white space, but only a single space character is included in the data at such a junction. If the data begins with a colon, the key must be terminated by a colon, for example:

baduser: :fail:

Empty lines and lines beginning with # are ignored, even if they occur in the middle of an item. This is the traditional textual format of alias files. Note that the keys in an lsearch file are literal strings. There is no wildcarding of any kind.
nis: The given file is the name of a NIS map, and a NIS lookup is done with the given key, excluding the terminating binary zero. There is a variant called nis0 which does include the terminating binary zero in the key. This is reportedly needed for Sun-style alias files. Exim does not recognize NIS aliases; the full map names must be used.
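
The lsearch rules described above can be modelled in a few lines of Python. This is only an illustrative sketch, not Exim's own code: the function names are invented here, a line consisting only of white space is assumed to count as empty, and case handling of keys is not modelled.

```python
import re

def parse_lsearch(text):
    """Parse lsearch-style text into [key, data] pairs, in file order."""
    entries = []
    for line in text.splitlines():
        # Empty lines and lines beginning with # are ignored,
        # even in the middle of an item.
        if line.strip() == "" or line.startswith("#"):
            continue
        if line[0].isspace():
            # Continuation line: joined to the data with a single space.
            if entries:
                entries[-1][1] = (entries[-1][1] + " " + line.strip()).strip()
            continue
        # The key ends at a colon or white space; white space before the
        # colon is permitted, and the colon itself is optional.
        m = re.match(r"([^:\s]*)\s*:?(.*)$", line)
        entries.append([m.group(1), m.group(2).strip()])
    return entries

def lsearch(text, key):
    """Return the data of the first entry whose key matches, else None."""
    for entry_key, data in parse_lsearch(text):
        if entry_key == key:
            return data
    return None
```

For instance, given a file containing the line "baduser: :fail:", looking up baduser in this sketch yields ":fail:", because the first colon terminates the key.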

9.2. Query-style lookup types

The following query-style lookup types are implemented:

dnsdb: This does a DNS search for a record whose domain name is the supplied query. The resulting data is the contents of the record. See section 9.8 below.
ldap: This does an LDAP lookup using a query in the form of a URL, and returns attributes from a single entry. There is a variant called ldapm which permits values from multiple entries to be returned. A third variant called ldapdn returns the Distinguished Name of a single entry instead of any attribute values. See section 9.9 below.
mysql: The format of the query is an SQL statement that is passed to a MySQL database. See section 9.14 below.
nisplus: This does a NIS+ lookup using a query that can specify the name of the field to be returned. See section 9.13 below.
oracle: The format of the query is an SQL statement that is passed to an Oracle database. See section 9.14 below.
pgsql: The format of the query is an SQL statement that is passed to a PostgreSQL database. See section 9.14 below.
testdb: This is a lookup type which is for use in debugging Exim. It is not likely to be useful in normal operation.
whoson: Whoson (http://whoson.sourceforge.net) is a proposed Internet protocol that allows Internet server programs to check whether a particular (dynamically allocated) IP address is currently allocated to a known (trusted) user and, optionally, to obtain the identity of the said user. In Exim, this can be used to implement ``POP before SMTP'' checking using ACL statements such as
```
  require condition = \
    ${lookup whoson {$sender_host_address}{yes}{no}}
```
The query consists of a single IP address. The value returned is the name of the authenticated user.

9.3. Temporary errors in lookups

Lookup functions can return temporary error codes if the lookup cannot be completed. For example, a NIS or LDAP database might be unavailable. For this reason, it is not advisable to use a lookup that might do this for critical options such as a list of local domains.

When a lookup cannot be completed in a router or transport, delivery of the message (to the relevant address) is deferred, as for any other temporary error. In other circumstances Exim may assume the lookup has failed, or may give up altogether.

9.4. Default values in single-key lookups

In this context, a ``default value'' is a value specified by the administrator that is to be used if a lookup fails.

If ``*'' is added to a single-key lookup type (for example, lsearch*) and the initial lookup fails, the key ``*'' is looked up in the file to provide a default value. See also the section on partial matching below.

Alternatively, if ``*@'' is added to a single-key lookup type (for example dbm*@) then, if the initial lookup fails and the key contains an @ character, a second lookup is done with everything before the last @ replaced by *. This makes it possible to provide per-domain defaults in alias files that include the domains in the keys. If the second lookup fails (or doesn't take place because there is no @ in the key), ``*'' is looked up.
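
The order in which keys are tried can be illustrated with a short Python sketch. The function name and the way the suffix is passed are assumptions made for this illustration; the sketch only lists the keys and does not perform any lookup.

```python
def default_keys(key, suffix=""):
    """List the keys tried, in order, for a lookup type with the given
    default suffix: "" (none), "*", or "*@"."""
    keys = [key]
    if suffix == "*@" and "@" in key:
        # Everything before the last @ is replaced by *.
        keys.append("*" + key[key.rindex("@"):])
    if suffix in ("*", "*@"):
        # The ultimate default is the key "*" on its own.
        keys.append("*")
    return keys
```

With the suffix "*@", the key jim@fict.example leads to lookups for jim@fict.example, *@fict.example, and finally *.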

9.5. Partial matching in single-key lookups

The normal operation of a single-key lookup is to search the file for an exact match with the given key. However, in a number of situations where domains are being looked up, it is useful to be able to do partial matching. In this case, information in the file that has a key starting with ``*.'' is matched by any domain that ends with the components that follow the full stop. For example, if a key in a DBM file is

*.dates.fict.example

then when partial matching is enabled this is matched by (amongst others) 2001.dates.fict.example and 1984.dates.fict.example. It is also matched by dates.fict.example, if that does not appear as a separate key in the file.

Partial matching is implemented by doing a series of separate lookups using keys constructed by modifying the original subject key. This means that it can be used with any of the single-key lookup types, provided that the special partial-matching keys beginning with ``*.'' are included in the data file. Keys in the file that do not begin with ``*.'' are matched only by unmodified subject keys when partial matching is in use.

Partial matching is requested by adding the string ``partial-'' to the front of the name of a single-key lookup type, for example, partial-dbm. When this is done, the subject key is first looked up unmodified; if that fails, ``*.'' is added at the start of the subject key, and it is looked up again. If that fails, further lookups are tried with dot-separated components removed from the start of the subject key, one-by-one, and ``*.'' added on the front of what remains.

By default, the modified keys must contain at least two non-* components. This minimum can be adjusted by including a number before the hyphen in the search type. For example, partial3-lsearch specifies a minimum of three non-* components in the modified keys. Omitting the number is equivalent to ``partial2-''. If the subject key is 2250.dates.fict.example then the following keys are looked up when the minimum number of non-* components is two:

  2250.dates.fict.example
  *.2250.dates.fict.example
  *.dates.fict.example
  *.fict.example

As soon as one key in the sequence is successfully looked up, the lookup finishes. If ``partial0-'' is used, the original key gets shortened right down to the null string, and the final lookup is for ``*'' on its own.
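
The sequence of keys can be generated mechanically. The following Python sketch is a model of the description above rather than Exim's implementation; the function name and the minimum parameter (standing for the number in ``partialN-'') are invented for the illustration.

```python
def partial_keys(key, minimum=2):
    """List the keys tried, in order, for a partial-matching lookup."""
    keys = [key]                      # the unmodified subject key comes first
    parts = key.split(".")
    for i in range(len(parts) + 1):
        rest = parts[i:]              # components left after removing i of them
        if len(rest) < minimum:
            break                     # too few non-* components remain
        # With a minimum of zero the key shrinks to nothing,
        # and the final lookup is for "*" on its own.
        keys.append("*." + ".".join(rest) if rest else "*")
    return keys
```

Calling partial_keys("2250.dates.fict.example") reproduces the four keys listed above; a real lookup would stop at the first key in the list that is found.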

If the search type ends in ``*'' or ``*@'' (see section 9.4 above), the search for an ultimate default that this implies happens after all partial lookups have failed. If ``partial0-'' is specified, adding ``*'' to the search type has no effect, because the ``*'' key is already included in the sequence of partial lookups.

The use of ``*'' in lookup partial matching differs from its use as a wildcard in domain lists and the like. Partial matching works only in terms of dot-separated components; a key such as *fict.example in a database file is useless, because the asterisk in a partial matching subject key is always followed by a dot.

9.6. Lookup caching

Exim caches the most recent lookup result on a per-file basis for single-key lookup types, and keeps the relevant files open. In some types of configuration this can lead to many files being kept open for messages with many recipients. To avoid hitting the operating system limit on the number of simultaneously open files, Exim closes the least recently used file when it needs to open more files than its own internal limit, which can be changed via the lookup_open_max option.

For query-style lookups, a single data cache per lookup type is kept. The files are closed and the caches flushed at strategic points during delivery - for example, after all routing is complete.

9.7. Quoting lookup data

When data from an incoming message is included in a query-style lookup, there is the possibility of special characters in the data messing up the syntax of the query. For example, a NIS+ query that contains

  [name=$local_part]

will be broken if the local part happens to contain a closing square bracket. For NIS+, data can be enclosed in double quotes like this:

  [name="$local_part"]

but this still leaves the problem of a double quote in the data. The rule for NIS+ is that double quotes must be doubled. Other lookup types have different rules, and to cope with the differing requirements, an expansion operator of the following form is provided:

${quote_<lookup-type>:<string>}

For example, the safest way to write the NIS+ query is

  [name="${quote_nisplus:$local_part}"]

See chapter 11 for full coverage of string expansions. The quote operator can be used for all lookup types, but has no effect for single-key lookups, since no quoting is ever needed in their key strings.
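
The NIS+ quoting rule is simple enough to show directly. This Python sketch mirrors the rule stated above (double quotes are doubled); the function names are invented for the illustration and are not part of Exim.

```python
def quote_nisplus(value):
    """Double any double-quote characters, as the NIS+ rule requires."""
    return value.replace('"', '""')

def nisplus_name_query(local_part):
    """Build the [name="..."] part of a query from untrusted data."""
    return '[name="' + quote_nisplus(local_part) + '"]'
```

A local part containing a closing square bracket or a double quote then stays inside the quoted value instead of breaking the query syntax.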

9.8. More about dnsdb

The dnsdb lookup type uses the DNS as its database. A query consists of a record type and a domain name, separated by an equals sign. For example, an expansion string could contain:

  ${lookup dnsdb{mx=a.b.example}{$value}fail}

The supported record types are A, CNAME, MX, NS, PTR, and TXT, and, when Exim is compiled with IPv6 support, AAAA (and A6 if that is also configured). If no type is given, TXT is assumed. When the type is PTR, the address should be given as normal; it gets converted to the necessary inverted format internally. For example:

  ${lookup dnsdb{ptr=192.168.4.5}{$value}fail}

For MX records, both the preference value and the host name are returned, separated by a space. If multiple records are found (or, for A6 lookups, if a single record leads to multiple addresses), the data is returned as a concatenation, separated by newlines. The order, of course, depends on the DNS resolver.
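
As an illustration of the inverted format used for PTR queries, the following Python sketch derives the domain name that is conventionally queried for an IP address. It uses the standard DNS reverse-mapping convention from Python's ipaddress module; that this matches Exim's internal conversion exactly is an assumption, and the function name is invented.

```python
import ipaddress

def ptr_query_name(address):
    """Return the reverse-mapping domain name for an IP address."""
    return ipaddress.ip_address(address).reverse_pointer
```

For the address in the example above, 192.168.4.5, this gives 5.4.168.192.in-addr.arpa.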

9.9. More about LDAP

The original LDAP implementation came from the University of Michigan; this has become ``Open LDAP'', and there are now two different releases. Another implementation comes from Netscape, and Solaris 7 and subsequent releases contain inbuilt LDAP support. Unfortunately, though these are all compatible at the lookup function level, their error handling is different. For this reason it is necessary to set a compile-time variable when building Exim with LDAP, to indicate which LDAP library is in use. One of the following should appear in your Local/Makefile:

  LDAP_LIB_TYPE=UMICHIGAN
  LDAP_LIB_TYPE=OPENLDAP1
  LDAP_LIB_TYPE=OPENLDAP2
  LDAP_LIB_TYPE=NETSCAPE
  LDAP_LIB_TYPE=SOLARIS

If LDAP_LIB_TYPE is not set, Exim assumes OPENLDAP1, which has the same interface as the University of Michigan version.

There are three LDAP lookup types, which behave slightly differently in the way they handle the results of a query:

ldap requires the result to contain just one entry; if there are more, it gives an error.
ldapdn also requires the result to contain just one entry, but it is the Distinguished Name that is returned rather than any attribute values.
ldapm permits the result to contain more than one entry; the attributes from all of them are returned.

For ldap and ldapm, if a query finds only entries with no attributes, Exim behaves as if the entry did not exist, and the lookup fails. The format of the data returned by a successful lookup is described in the next section. First we explain how LDAP queries are coded.

9.10. Format of LDAP queries

An LDAP query takes the form of a URL as defined in RFC 2255. For example, in the configuration of a redirect router one might have this setting:

  data = ${lookup ldap \
    {ldap:///cn=$local_part,o=University%20of%20Cambridge,\
    c=UK?mailbox?base?}}

The URL may begin with ldap or, if your LDAP library supports secure (encrypted) LDAP connections, with ldaps. The second of these ensures that an encrypted TLS connection is used.

Two levels of quoting are required in LDAP queries, the first for LDAP itself and the second because the LDAP query is represented as a URL. The quote_ldap expansion operator implements the following rules:

For LDAP quoting, the characters #,+"\<>;*() have to be preceded by a backslash. (In fact, only some of these need to be quoted in Distinguished Names, and others in LDAP filters, but it does no harm to have a single quoting rule for all of them.)
For URL quoting, all characters except alphanumerics and !$'()*+-._ are replaced by %xx where xx is the hexadecimal character code. Note that backslash has to be quoted in a URL, so characters that are escaped for LDAP end up preceded by %5C in the final encoding.
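
The two quoting levels can be sketched in Python as follows. This models the two rules above and is not Exim's code; the names are invented, and treating the input as UTF-8 bytes for the %xx step is an assumption of the sketch.

```python
LDAP_SPECIAL = '#,+"\\<>;*()'      # characters that get a preceding backslash
URL_SAFE = set("!$'()*+-._")       # non-alphanumerics left as they are in the URL

def quote_ldap(value):
    """Apply LDAP quoting first, then URL quoting, to a string."""
    # Level 1: LDAP quoting.
    escaped = "".join("\\" + ch if ch in LDAP_SPECIAL else ch for ch in value)
    # Level 2: URL quoting, byte by byte.
    out = []
    for byte in escaped.encode("utf-8"):
        ch = chr(byte)
        if byte < 128 and (ch.isalnum() or ch in URL_SAFE):
            out.append(ch)
        else:
            out.append("%%%02X" % byte)
    return "".join(out)
```

In this sketch the string a(b) becomes a%5C(b%5C): each parenthesis gains a backslash at the LDAP level, and the backslash is then encoded as %5C at the URL level.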

The example above does not specify an LDAP server. A server can be specified in a query by starting it with

ldap://<hostname>:<port>/...

If the port (and preceding colon) are omitted, the standard LDAP port (389) is used. When no server is specified in a query, a list of default servers is taken from the ldap_default_servers configuration option. This supplies a colon-separated list of servers which are tried in turn until one successfully handles a query, or there is a serious error. Successful handling either returns the requested data, or indicates that it does not exist. Serious errors are syntax errors, or the return of multiple values when only a single value is expected. Errors which cause the next server to be tried are connection failures, bind failures, and timeouts.

For each server name in the list, a port number can be given. The standard way of specifying a host and port is to use a colon separator (RFC 1738). Because ldap_default_servers is a colon-separated list, such colons have to be doubled. For example:

  ldap_default_servers = ldap1.example.com::145:ldap2.example.com

If ldap_default_servers is unset, a URL with no server name is passed to the LDAP library as it stands, and the library's default (normally the local host) is used.
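
The colon-doubling convention for ldap_default_servers can be illustrated with a small Python sketch that splits such a list into host and port pairs. The function names are invented, applying port 389 to list entries without a port is an assumption carried over from the URL rule, and IPv6 addresses are not handled.

```python
def split_colon_list(value):
    """Split a colon-separated list in which '::' stands for a literal colon."""
    items, current, i = [], "", 0
    while i < len(value):
        if value[i] == ":":
            if value[i + 1:i + 2] == ":":
                current += ":"        # doubled colon: keep one, stay in the item
                i += 2
                continue
            items.append(current.strip())   # single colon: end of the item
            current = ""
        else:
            current += value[i]
        i += 1
    items.append(current.strip())
    return [item for item in items if item]

def parse_ldap_servers(value, default_port=389):
    """Return (host, port) pairs from an ldap_default_servers-style string."""
    servers = []
    for item in split_colon_list(value):
        host, sep, port = item.partition(":")
        servers.append((host, int(port) if sep else default_port))
    return servers
```

Applied to the setting shown above, the sketch yields ldap1.example.com on port 145 followed by ldap2.example.com on the default port.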

9.11. LDAP authentication and control information

The LDAP URL syntax provides no way of passing authentication and other control information to the server. To make this possible, the URL in an LDAP query may be preceded by any number of ``<name>=<value>'' settings, separated by spaces. If a value contains spaces it must be enclosed in double quotes, and when double quotes are used, backslash is interpreted in the usual way inside them. The following names are recognized:

  USER set the DN, for authenticating the LDAP bind
  PASS set the password, likewise
  SIZE set the limit for the number of entries returned
  TIME set the maximum waiting time for a query

The values may be given in any order. The default is no time limit, and no limit on the number of entries returned. Here is an example of an LDAP query in an Exim lookup which uses some of these values. This is a single line, folded for ease of reading:

  ${lookup ldap
    {user="cn=manager,o=University of Cambridge,c=UK" pass=secret
    ldap:///o=University%20of%20Cambridge,c=UK?sn?sub?(cn=foo)}
    {$value}fail}

The encoding of spaces as %20 is required only by URL syntax, and should not be done for any of the auxiliary data. Exim configuration settings that include lookups which contain password information should be preceded by ``hide'' to prevent non-admin users from using the -bP option to see their values.

The LDAP authentication mechanism can be used to check passwords as part of SMTP authentication. See the ldapauth expansion string condition in chapter 11.

9.12. Format of data returned by LDAP

The ldapdn lookup type returns the Distinguished Name from a single entry as a sequence of values, for example

  cn=manager, o=University of Cambridge, c=UK

The ldap lookup type generates an error if more than one entry matches the search filter, whereas ldapm permits this case, and inserts a newline in the result between the data from different entries. It is possible for multiple values to be returned for both ldap and ldapm, but in the former case you know that whatever values are returned all came from a single entry in the directory.

In the common case where you specify a single attribute in your LDAP query, the result is not quoted, and does not contain the attribute name. If the attribute has multiple values, they are separated by commas.

If you specify multiple attributes, the result contains space-separated, quoted strings, each preceded by the attribute name and an equals sign. Within the quotes, the quote character, backslash, and newline are escaped with backslashes, and commas are used to separate multiple values for the attribute. Apart from the escaping, the string within quotes takes the same form as the output when a single attribute is requested. Specifying no attributes is the same as specifying all of an entry's attributes.

Here are some examples of the output format. The first line of each pair is an LDAP query, and the second is the data that is returned. The attribute called attr1 has two values, whereas attr2 has only one value:

  ldap:///o=base?attr1?sub?(uid=fred)
  value1.1, value1.2
  
  ldap:///o=base?attr2?sub?(uid=fred)
  value two
  
  ldap:///o=base?attr1,attr2?sub?(uid=fred)
  attr1="value1.1, value1.2" attr2="value two"
  
  ldap:///o=base??sub?(uid=fred)
  objectClass="top" attr1="value1.1, value1.2" attr2="value two"

The extract operator in string expansions can be used to pick out individual fields from data that consists of key=value pairs. You can make use of Exim's -be option to run expansion tests and thereby check the results of LDAP lookups.
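
To make the multi-attribute format concrete, here is a Python sketch that splits such a result into a dictionary. It is an illustration of the format only, not a substitute for the extract operator; the function name is invented, and the sketch assumes that an escaped newline appears as backslash followed by n.

```python
def parse_pairs(data):
    """Parse name="value" (or name=value) pairs separated by spaces."""
    result, i, n = {}, 0, len(data)
    while i < n:
        while i < n and data[i].isspace():
            i += 1                              # skip separating white space
        if i >= n:
            break
        eq = data.index("=", i)                 # ValueError if the data is malformed
        name = data[i:eq]
        i = eq + 1
        value = []
        if i < n and data[i] == '"':
            i += 1
            while i < n and data[i] != '"':
                if data[i] == "\\" and i + 1 < n:
                    i += 1                      # take the escaped character
                    value.append("\n" if data[i] == "n" else data[i])
                else:
                    value.append(data[i])
                i += 1
            i += 1                              # skip the closing quote
        else:
            while i < n and not data[i].isspace():
                value.append(data[i])
                i += 1
        result[name] = "".join(value)
    return result
```

Applied to the third example above, the sketch returns attr1 with the value "value1.1, value1.2" and attr2 with the value "value two"; multiple values of one attribute remain a single comma-separated string.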

9.13. More about NIS+

NIS+ queries consist of a NIS+ indexed name followed by an optional colon and field name. If this is given, the result of a successful query is the contents of the named field; otherwise the result consists of a concatenation of field-name=field-value pairs, separated by spaces. Empty values and values containing spaces are quoted. For example, the query

  [name=mg1456],passwd.org_dir

might return the string

  name=mg1456 passwd="" uid=999 gid=999 gcos="Martin Guerre"
  home=/home/mg1456 shell=/bin/bash shadow=""

(split over two lines here to fit on the page), whereas

  [name=mg1456],passwd.org_dir:gcos

would just return

  Martin Guerre

with no quotes. A NIS+ lookup fails if NIS+ returns more than one table entry for the given indexed key. The effect of the quote_nisplus expansion operator is to double any quote characters within the text.

9.14. More about MySQL, PostgreSQL, and Oracle

If any MySQL, PostgreSQL, or Oracle lookups are used, the mysql_servers, pgsql_servers, or oracle_servers option (as appropriate) must be set to a colon-separated list of server information. Each item in the list is a slash-separated list of four items: host name, database name, user name, and password. In the case of Oracle, the host name field is used for the ``service name'', and the database name field is not used and should be empty. For example:

  hide oracle_servers = oracle.plc.example//ph10/abcdwxyz

Because password data is sensitive, you should always precede the setting with ``hide'', to prevent non-admin users from obtaining the setting via the -bP option. Here is an example where two MySQL servers are listed:

  hide mysql_servers = localhost/users/root/secret:\
                       otherhost/users/root/othersecret

For MySQL and PostgreSQL, a host may be specified as <name>:<port> but because this is a colon-separated list, the colon has to be doubled.

For MySQL, an empty host name, or the use of ``localhost'', causes a connection to the server on the local host by means of a Unix domain socket. An alternate socket can be specified in parentheses. The full syntax of each item in mysql_servers is:

<hostname>::<port>(<socket name>)/<database>/<user>/<password>

Any of the three sub-parts of the first field can be omitted. For normal use on the local host it can be left blank or set to just ``localhost'', as in the example above.

Also for MySQL, no database need be supplied - but if it is absent here, it must be given in the queries.
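
The item syntax can be illustrated with a Python sketch that breaks one mysql_servers item into its fields. It assumes the item has already been taken out of the colon-separated list, so the port colon is single, and it splits the fields from the right so that a socket name may contain slashes. Both points, the function name, and the socket path used below are assumptions of this sketch.

```python
import re

def parse_mysql_server(item):
    """Split <hostname>:<port>(<socket name>)/<database>/<user>/<password>."""
    # Assumption: the last three slashes separate database, user and password.
    first, database, user, password = item.rsplit("/", 3)
    m = re.fullmatch(r"([^:(]*)(?::(\d+))?(?:\((.*)\))?", first)
    if m is None:
        raise ValueError("unrecognized host field: " + first)
    host, port, socket_name = m.groups()
    return {
        "host": host or None,                 # empty host: local connection
        "port": int(port) if port else None,
        "socket": socket_name,
        "database": database,                 # may be empty for MySQL
        "user": user,
        "password": password,
    }
```

For example, the hypothetical item localhost(/tmp/mysql.sock)/users/root/secret gives the host localhost, no port, the socket /tmp/mysql.sock, and the database, user, and password users, root, and secret.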

For each query, these parameter groups are tried in order until a connection and a query succeed. Queries for these databases are SQL statements, so an example might be

  ${lookup mysql{select mailbox from users where id='ph10'}{$value}fail}

If the result of the query contains more than one field, the data for each field in the row is returned, preceded by its name, so the result of

  ${lookup pgsql{select home,name from users where id='ph10'}{$value}}

might be

  home=/home/ph10 name="Philip Hazel"

Values containing spaces and empty values are double quoted, with embedded quotes escaped by a backslash.

If the result of the query contains just one field, the value is passed back verbatim, without a field name, for example:

  Philip Hazel

If the result of the query yields more than one row, it is all concatenated, with a newline between the data for each row.

The quote_mysql, quote_pgsql, and quote_oracle expansion operators convert newline, tab, carriage return, and backspace to \n, \t, \r, and \b respectively, and the characters single-quote, double-quote, and backslash itself are escaped with backslashes. The quote_pgsql expansion operator, in addition, escapes the percent and underscore characters. This cannot be done for MySQL because these escapes are not recognized in contexts where these characters are not special.
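
The quoting rules just described can be modelled as follows. This Python sketch follows the description in the preceding paragraph and is not Exim's code; the helper quote_sql and its extra parameter are invented for the illustration.

```python
_CONTROL = {"\n": "\\n", "\t": "\\t", "\r": "\\r", "\b": "\\b"}

def quote_sql(value, extra=""):
    """Escape a string for inclusion in an SQL statement."""
    out = []
    for ch in value:
        if ch in _CONTROL:
            out.append(_CONTROL[ch])          # newline, tab, CR, backspace
        elif ch in "'\"\\" or ch in extra:
            out.append("\\" + ch)             # quotes, backslash, extras
        else:
            out.append(ch)
    return "".join(out)

def quote_mysql(value):
    return quote_sql(value)

def quote_pgsql(value):
    # In addition, percent and underscore are escaped.
    return quote_sql(value, extra="%_")
```

In this model the value O'Brien becomes O\'Brien for both operators, while a percent sign is escaped only by quote_pgsql.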
