46. Adding a local scan function to Exim

In these days of email worms, viruses, and ever-increasing spam, some sites want to apply a lot of checking to messages before accepting them.

The content scanning extension (chapter 45) has facilities for passing messages to external virus and spam scanning software. You can also do a certain amount in Exim itself through string expansions and the condition condition in the ACL that runs after the SMTP DATA command or the ACL for non-SMTP messages (see chapter 44), but this has its limitations.

To allow for further customization to a site’s own requirements, there is the possibility of linking Exim with a private message scanning function, written in C. If you want to run code that is written in something other than C, you can of course use a little C stub to call it.

The local scan function is run once for every incoming message, at the point when Exim is just about to accept the message. It can therefore be used to control non-SMTP messages from local processes as well as messages arriving via SMTP.

Exim applies a timeout to calls of the local scan function, and there is an option called local_scan_timeout for setting it. The default is 5 minutes. Zero means “no timeout”. Exim also sets up signal handlers for SIGSEGV, SIGILL, SIGFPE, and SIGBUS before calling the local scan function, so that the most common types of crash are caught. If the timeout is exceeded or one of those signals is caught, the incoming message is rejected with a temporary error if it is an SMTP message. For a non-SMTP message, the message is dropped and Exim ends with a non-zero code. The incident is logged on the main and reject logs.

1. Building Exim to use a local scan function

To make use of the local scan function feature, you must tell Exim where your function is before building Exim, by setting both HAVE_LOCAL_SCAN and LOCAL_SCAN_SOURCE in your Local/Makefile. A recommended place to put it is in the Local directory, so you might set

HAVE_LOCAL_SCAN=yes
LOCAL_SCAN_SOURCE=Local/local_scan.c

for example. The function must be called local_scan(); the source file(s) for it should first #define LOCAL_SCAN and then #include "local_scan.h". It is called by Exim after it has received a message, when the success return code is about to be sent. This is after all the ACLs have been run. The return code from your function controls whether the message is actually accepted or not. There is a commented template function (that just accepts the message) in the file src/local_scan.c.

If you want to make use of Exim’s runtime configuration file to set options for your local_scan() function, you must also set

LOCAL_SCAN_HAS_OPTIONS=yes

in Local/Makefile (see section 46.3 below).

2. API for local_scan()

You must include these lines near the start of your code:

#define LOCAL_SCAN
#include "local_scan.h"

This header file defines a number of variables and other values, and the prototype for the function itself. Exim is coded to use unsigned char values almost exclusively, and one of the things this header defines is a shorthand for unsigned char called uschar. It also makes available the following macro definitions, to simplify casting character strings and pointers to character strings:

#define CS   (char *)
#define CCS  (const char *)
#define CSS  (char **)
#define US   (unsigned char *)
#define CUS  (const unsigned char *)
#define USS  (unsigned char **)

The function prototype for local_scan() is:

extern int local_scan(int fd, uschar **return_text);

The arguments are as follows:

fd is a file descriptor for the file that contains the body of the message (the -D file). The file is open for reading and writing, but updating it is not recommended. Warning: You must not close this file descriptor.

The descriptor is positioned at character 26 of the file, which is the first character of the body itself, because the first 26 characters (19 characters before Exim 4.97) are the message id followed by -D and a newline. If you rewind the file, you should use the macro SPOOL_DATA_START_OFFSET to reset to the start of the data, just in case this changes in some future version.

return_text is an address which you can use to return a pointer to a text string at the end of the function. The value it points to on entry is NULL.

The function must return an int value which is one of the following macros:

LOCAL_SCAN_ACCEPT: The message is accepted. If you pass back a string of text, it is saved with the message, and made available in the variable $local_scan_data. No newlines are permitted (if there are any, they are turned into spaces) and the maximum length of text is 1000 characters.
LOCAL_SCAN_ACCEPT_FREEZE: This behaves as LOCAL_SCAN_ACCEPT, except that the accepted message is queued without immediate delivery, and is frozen.
LOCAL_SCAN_ACCEPT_QUEUE: This behaves as LOCAL_SCAN_ACCEPT, except that the accepted message is queued without immediate delivery.
LOCAL_SCAN_REJECT: The message is rejected; the returned text is used as an error message which is passed back to the sender and which is also logged. Newlines are permitted – they cause a multiline response for SMTP rejections, but are converted to \n in log lines. If no message is given, “Administrative prohibition” is used.
LOCAL_SCAN_TEMPREJECT: The message is temporarily rejected; the returned text is used as an error message as for LOCAL_SCAN_REJECT. If no message is given, “Temporary local problem” is used.
LOCAL_SCAN_REJECT_NOLOGHDR: This behaves as LOCAL_SCAN_REJECT, except that the header of the rejected message is not written to the reject log. It has the effect of unsetting the rejected_header log selector for just this rejection. If rejected_header is already unset (see the discussion of the log_selector option in section 53.15), this code is the same as LOCAL_SCAN_REJECT.
LOCAL_SCAN_TEMPREJECT_NOLOGHDR: This code is a variation of LOCAL_SCAN_TEMPREJECT in the same way that LOCAL_SCAN_REJECT_NOLOGHDR is a variation of LOCAL_SCAN_REJECT.

If the message is not being received by interactive SMTP, rejections are reported by writing to stderr or by sending an email, as configured by the -oe command line options.
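A minimal sketch of a local_scan() that uses these return codes might look like the following. So that it compiles outside the Exim source tree, it declares stand-ins for uschar, US, and the return codes (the numeric values are placeholders, not Exim's); in a real Local/local_scan.c those stand-ins would be replaced by the #define LOCAL_SCAN and #include "local_scan.h" pair. The MAX_BODY_BYTES limit and the message texts are hypothetical.

```c
#include <unistd.h>

/* Stand-ins so the sketch compiles outside Exim. In a real local_scan.c,
   use "#define LOCAL_SCAN" and "#include "local_scan.h"" instead.
   The enum values below are placeholders, not Exim's actual values. */
typedef unsigned char uschar;
#define US (unsigned char *)
enum { LOCAL_SCAN_ACCEPT, LOCAL_SCAN_REJECT, LOCAL_SCAN_TEMPREJECT };

#define MAX_BODY_BYTES 1024   /* hypothetical site limit */

int local_scan(int fd, uschar **return_text)
{
  char buf[4096];
  long total = 0;
  ssize_t n;

  /* fd is already positioned at the first byte of the body. It is only
     read here, never closed. */
  while ((n = read(fd, buf, sizeof(buf))) > 0)
    total += n;

  if (n < 0)
  {
    *return_text = US"could not read message body";
    return LOCAL_SCAN_TEMPREJECT;
  }

  if (total > MAX_BODY_BYTES)
  {
    *return_text = US"message body too large";
    return LOCAL_SCAN_REJECT;
  }

  /* return_text is left as NULL, so nothing is stored in $local_scan_data. */
  return LOCAL_SCAN_ACCEPT;
}
```

The sketch leaves the descriptor at end of file. If later code needed to read the body again, it would have to seek back to SPOOL_DATA_START_OFFSET as described above.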

3. Configuration options for local_scan()

It is possible to have option settings in the main configuration file that set values in static variables in the local_scan() module. If you want to do this, you must have the line

LOCAL_SCAN_HAS_OPTIONS=yes

in your Local/Makefile when you build Exim. (This line is in OS/Makefile-Default, commented out). Then, in the local_scan() source file, you must define static variables to hold the option values, and a table to define them.

The table must be a vector called local_scan_options, of type optionlist. Each entry is a triplet, consisting of a name, an option type, and a pointer to the variable that holds the value. The entries must appear in alphabetical order. Following local_scan_options you must also define a variable called local_scan_options_count that contains the number of entries in the table. Here is a short example, showing two kinds of option:

static int my_integer_option = 42;
static uschar *my_string_option = US"a default string";

optionlist local_scan_options[] = {
  { "my_integer", opt_int,       &my_integer_option },
  { "my_string",  opt_stringptr, &my_string_option }
};

int local_scan_options_count =
  sizeof(local_scan_options)/sizeof(optionlist);

The values of the variables can now be changed from Exim’s runtime configuration file by including a local scan section as in this example:

begin local_scan
my_integer = 99
my_string = some string of text...

The available types of option data are as follows:

opt_bool: This specifies a boolean (true/false) option. The address should point to a variable of type BOOL, which will be set to TRUE or FALSE, which are macros that are defined as “1” and “0”, respectively. If you want to detect whether such a variable has been set at all, you can initialize it to TRUE_UNSET. (BOOL variables are integers underneath, so can hold more than two values.)
opt_fixed: This specifies a fixed point number, such as is used for load averages. The address should point to a variable of type int. The value is stored multiplied by 1000, so, for example, 1.4142 is truncated and stored as 1414.
opt_int: This specifies an integer; the address should point to a variable of type int. The value may be specified in any of the integer formats accepted by Exim.
opt_mkint: This is the same as opt_int, except that when such a value is output in a -bP listing, if it is an exact number of kilobytes or megabytes, it is printed with the suffix K or M.
opt_octint: This also specifies an integer, but the value is always interpreted as an octal integer, whether or not it starts with the digit zero, and it is always output in octal.
opt_stringptr: This specifies a string value; the address must be a pointer to a variable that points to a string (for example, of type uschar *).
opt_time: This specifies a time interval value. The address must point to a variable of type int. The value that is placed there is a number of seconds.
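The scaling used by opt_fixed can be illustrated with a small stand-alone sketch. The function name to_fixed is hypothetical, and this is not Exim's own parser; it only shows the multiply-by-1000-and-truncate arithmetic described above.

```c
#include <ctype.h>

/* Hypothetical helper: convert a non-negative decimal string to the
   "times 1000, truncated" integer form described for opt_fixed. */
int to_fixed(const char *s)
{
  int value = 0;
  int scale = 100;   /* weight of the next fractional digit */

  /* Integer part. */
  while (isdigit((unsigned char)*s))
    value = value * 10 + (*s++ - '0');
  value *= 1000;

  /* At most three fractional digits contribute; the rest are dropped. */
  if (*s == '.')
    for (s++; isdigit((unsigned char)*s) && scale > 0; s++, scale /= 10)
      value += (*s - '0') * scale;

  return value;
}
```

With this sketch, to_fixed("1.4142") yields 1414, matching the example in the opt_fixed description.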

If the -bP command line option is followed by local_scan, Exim prints out the values of all the local_scan() options.
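Because the option table must be in alphabetical order, a small self-check can be useful while developing. The following sketch shows a four-entry table and a hypothetical helper, options_are_sorted(), that verifies the ordering. The optionlist layout, the opt_ values, and the option names are stand-ins chosen for illustration so that the code compiles outside Exim; the real definitions come from local_scan.h and may differ.

```c
#include <string.h>

/* Stand-ins so the sketch compiles outside Exim; placeholder values. */
typedef unsigned char uschar;
typedef int BOOL;
#define US (unsigned char *)
#define FALSE 0
enum { opt_bool, opt_int, opt_stringptr, opt_time };
typedef struct { const char *name; int type; void *value; } optionlist;

/* Hypothetical options, with their defaults. */
static BOOL    scan_enabled = FALSE;
static int     scan_limit   = 42;
static uschar *scan_tag     = US"a default string";
static int     scan_wait    = 30;          /* seconds */

optionlist local_scan_options[] = {
  { "scan_enabled", opt_bool,      &scan_enabled },
  { "scan_limit",   opt_int,       &scan_limit },
  { "scan_tag",     opt_stringptr, &scan_tag },
  { "scan_wait",    opt_time,      &scan_wait }
};

int local_scan_options_count =
  sizeof(local_scan_options)/sizeof(optionlist);

/* Hypothetical development aid: returns 1 if the names are in strictly
   ascending strcmp() order, 0 otherwise. */
int options_are_sorted(const optionlist *table, int count)
{
  for (int i = 1; i < count; i++)
    if (strcmp(table[i - 1].name, table[i].name) >= 0)
      return 0;
  return 1;
}
```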

4. Available Exim variables

The header local_scan.h gives you access to a number of C variables. These are the only ones that are guaranteed to be maintained from release to release. Note, however, that you can obtain the value of any Exim expansion variable, including $recipients, by calling expand_string(). The exported C variables are as follows:

int body_linecount

This variable contains the number of lines in the message’s body. It is not valid if the spool_wireformat option is used.

int body_zerocount

This variable contains the number of binary zero bytes in the message’s body. It is not valid if the spool_wireformat option is used.

unsigned int debug_selector

This variable is set to zero when no debugging is taking place. Otherwise, it is a bitmap of debugging selectors. Two bits are identified for use in local_scan(); they are defined as macros:

The D_v bit is set when -v was present on the command line. This is a testing option that is not privileged – any caller may set it. All the other selector bits can be set only by admin users.
The D_local_scan bit is provided for use by local_scan(); it is set by the +local_scan debug selector. It is not included in the default set of debugging bits.

Thus, to write to the debugging output only when +local_scan has been selected, you should use code like this:

if ((debug_selector & D_local_scan) != 0)
  debug_printf("xxx", ...);

uschar *expand_string_message

After a failing call to expand_string() (returned value NULL), the variable expand_string_message contains the error message, zero-terminated.

header_line *header_list

A pointer to a chain of header lines. The header_line structure is discussed below.

header_line *header_last

A pointer to the last of the header lines.

const uschar *headers_charset

The value of the headers_charset configuration option.

BOOL host_checking

This variable is TRUE during a host checking session that is initiated by the -bh command line option.

uschar *interface_address

The IP address of the interface that received the message, as a string. This is NULL for locally submitted messages.

int interface_port

The port on which this message was received. When testing with the -bh command line option, the value of this variable is -1 unless a port has been specified via the -oMi option.

uschar *message_id

This variable contains Exim’s message id for the incoming message (the value of $message_exim_id) as a zero-terminated string.

uschar *received_protocol

The name of the protocol by which the message was received.

int recipients_count

The number of accepted recipients.

recipient_item *recipients_list

The list of accepted recipients, held in a vector of length recipients_count. The recipient_item structure is discussed below. You can add additional recipients by calling receive_add_recipient() (see below). You can delete recipients by removing them from the vector and adjusting the value in recipients_count. In particular, by setting recipients_count to zero you remove all recipients. If you then return the value LOCAL_SCAN_ACCEPT, the message is accepted, but immediately blackholed. To replace the recipients, you can set recipients_count to zero and then call receive_add_recipient() as often as needed.

uschar *sender_address

The envelope sender address. For bounce messages this is the empty string.

uschar *sender_host_address

The IP address of the sending host, as a string. This is NULL for locally-submitted messages.

uschar *sender_host_authenticated

The name of the authentication mechanism that was used, or NULL if the message was not received over an authenticated SMTP connection.

uschar *sender_host_name

The name of the sending host, if known.

int sender_host_port

The port on the sending host.

BOOL smtp_input

This variable is TRUE for all SMTP input, including BSMTP.

BOOL smtp_batched_input

This variable is TRUE for BSMTP input.

int store_pool

The contents of this variable control which pool of memory is used for new requests. See section 46.8 for details.

5. Structure of header lines

The header_line structure contains the members listed below. You can add additional header lines by calling the header_add() function (see below). You can cause header lines to be ignored (deleted) by setting their type to *.

struct header_line *next: A pointer to the next header line, or NULL for the last line.
int type: A code identifying certain headers that Exim recognizes. The codes are printing characters, and are documented in chapter 57 of this manual. Notice in particular that any header line whose type is * is not transmitted with the message. This flagging is used for header lines that have been rewritten, or are to be removed (for example, Envelope-sender: header lines). Effectively, * means “deleted”.
int slen: The number of characters in the header line, including the terminating and any internal newlines.
uschar *text: A pointer to the text of the header. It always ends with a newline, followed by a zero byte. Internal newlines are preserved.
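Walking the header chain while skipping deleted entries can be sketched as follows. The struct is a stand-in that mirrors the four members listed above so that the code compiles outside Exim, and live_header_bytes() is a hypothetical helper, not part of the Exim API.

```c
#include <stddef.h>

/* Stand-in mirroring the members described above; the real definition
   comes from local_scan.h. */
typedef unsigned char uschar;
typedef struct header_line {
  struct header_line *next;
  int                 type;
  int                 slen;
  uschar             *text;
} header_line;

/* Hypothetical helper: total length of all header lines that are not
   marked as deleted (type '*'). */
int live_header_bytes(const header_line *list)
{
  int total = 0;
  for (const header_line *h = list; h != NULL; h = h->next)
    if (h->type != '*')
      total += h->slen;
  return total;
}
```

Inside local_scan() the chain would be reached through header_list, for example live_header_bytes(header_list).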

6. Structure of recipient items

The recipient_item structure contains these members:

uschar *address: This is a pointer to the recipient address as it was received.
int pno: This is used in later Exim processing when top level addresses are created by the one_time option. It is not relevant at the time local_scan() is run and must always contain -1 at this stage.
uschar *errors_to: If this value is not NULL, bounce messages caused by failing to deliver to the recipient are sent to the address it contains. In other words, it overrides the envelope sender for this one recipient. (Compare the errors_to generic router option.) If a local_scan() function sets an errors_to field to an unqualified address, Exim qualifies it using the domain from qualify_recipient. When local_scan() is called, the errors_to field is NULL for all recipients.
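Deleting recipients by compacting the vector and adjusting recipients_count, as described for recipients_list in section 4, can be sketched like this. The struct and the two globals are stand-ins that mirror the documented members so that the code compiles outside Exim; drop_recipients_in_domain() is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Stand-ins; in a real local_scan.c these come from local_scan.h. */
typedef unsigned char uschar;
typedef struct recipient_item {
  uschar *address;
  int     pno;
  uschar *errors_to;
} recipient_item;

recipient_item *recipients_list;
int recipients_count;

/* Hypothetical helper: remove every recipient whose domain matches
   "domain" caselessly. Returns the number of recipients removed. */
int drop_recipients_in_domain(const char *domain)
{
  int kept = 0;

  for (int i = 0; i < recipients_count; i++)
  {
    const char *at = strrchr((const char *)recipients_list[i].address, '@');
    if (at != NULL && strcasecmp(at + 1, domain) == 0)
      continue;                               /* drop this entry */
    recipients_list[kept++] = recipients_list[i];
  }

  int removed = recipients_count - kept;
  recipients_count = kept;                    /* shrink the vector */
  return removed;
}
```

If every recipient is removed this way and the function then returns LOCAL_SCAN_ACCEPT, the message is accepted but blackholed, as noted in section 4.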

7. Available Exim functions

The header local_scan.h gives you access to a number of Exim functions. These are the only ones that are guaranteed to be maintained from release to release:

pid_t child_open(uschar **argv, uschar **envp, int newumask, int *infdptr, int *outfdptr, BOOL make_leader)

This function creates a child process that runs the command specified by argv. The environment for the process is specified by envp, which can be NULL if no environment variables are to be passed. A new umask is supplied for the process in newumask.

Pipes to the standard input and output of the new process are set up and returned to the caller via the infdptr and outfdptr arguments. The standard error is cloned to the standard output. If there are any file descriptors “in the way” in the new process, they are closed. If the final argument is TRUE, the new process is made into a process group leader.

The function returns the pid of the new process, or -1 if things go wrong.

int child_close(pid_t pid, int timeout)

This function waits for a child process to terminate, or for a timeout (in seconds) to expire. A timeout value of zero means wait as long as it takes. The return value is as follows:

>= 0: The process terminated by a normal exit and the value is the process ending status.
< 0 and > -256: The process was terminated by a signal and the value is the negation of the signal number.
-256: The process timed out.
-257: There was some other error in wait(); errno is still set.
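The return value ranges of child_close() listed above can be decoded with a helper along these lines. The enum and the function name are hypothetical; only the numeric ranges are taken from the description.

```c
#include <stddef.h>

enum child_outcome {
  CHILD_EXITED,      /* normal exit; detail is the ending status */
  CHILD_SIGNALLED,   /* killed by a signal; detail is the signal number */
  CHILD_TIMED_OUT,   /* child_close() gave up waiting */
  CHILD_WAIT_ERROR   /* some other error in wait() */
};

/* Hypothetical helper: classify a child_close() return value. If detail
   is not NULL it receives the exit status or signal number, else 0. */
enum child_outcome classify_child_close(int rc, int *detail)
{
  if (detail != NULL) *detail = 0;

  if (rc >= 0)
  {
    if (detail != NULL) *detail = rc;
    return CHILD_EXITED;
  }
  if (rc > -256)
  {
    if (detail != NULL) *detail = -rc;   /* value is the negated signal */
    return CHILD_SIGNALLED;
  }
  if (rc == -256)
    return CHILD_TIMED_OUT;

  /* -257 is the documented error value; this sketch treats anything
     lower the same way. */
  return CHILD_WAIT_ERROR;
}
```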

pid_t child_open_exim(int *fd)

This function provides you with a means of submitting a new message to Exim. (Of course, you can also call /usr/sbin/sendmail yourself if you want, but this packages it all up for you.) The function creates a pipe, forks a subprocess that is running

exim -t -oem -oi -f <>

and returns to you (via the int * argument) a file descriptor for the pipe that is connected to the standard input. The yield of the function is the PID of the subprocess. You can then write a message to the file descriptor, with recipients in To:, Cc:, and/or Bcc: header lines.

When you have finished, call child_close() to wait for the process to finish and to collect its ending status. A timeout value of zero is usually fine in this circumstance. Unless you have made a mistake with the recipient addresses, you should get a return code of zero.
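The message written to the descriptor is ordinary header and body text. As a sketch, a helper that writes a short notice to any file descriptor might look like this; write_notice() and its parameters are hypothetical, and the code uses only POSIX calls so that it can be tried without Exim.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write a small message, with its recipient in a
   To: header line, to fd. Returns 0 on success, -1 on failure. */
int write_notice(int fd, const char *to, const char *subject,
                 const char *body)
{
  char msg[1024];
  int len = snprintf(msg, sizeof(msg),
                     "To: %s\nSubject: %s\n\n%s\n", to, subject, body);

  if (len < 0 || (size_t)len >= sizeof(msg))
    return -1;                         /* formatting error or truncation */

  /* write() may accept fewer bytes than requested, so loop. */
  const char *p = msg;
  while (len > 0)
  {
    ssize_t n = write(fd, p, (size_t)len);
    if (n < 0)
      return -1;
    p += n;
    len -= (int)n;
  }
  return 0;
}
```

In local_scan(), fd would be the descriptor handed back by child_open_exim(). After writing, the caller would presumably close that descriptor so that the subprocess sees end of input, and then call child_close() with the returned PID.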

pid_t child_open_exim2(int *fd, uschar *sender, uschar *sender_authentication)

This function is a more sophisticated version of child_open_exim(). The command that it runs is:

exim -t -oem -oi -f sender -oMas sender_authentication

The third argument may be NULL, in which case the -oMas option is omitted.

void debug_printf(char *, ...)

This is Exim’s debugging function, with arguments as for printf(). The output is written to the standard error stream. If no debugging is selected, calls to debug_printf() have no effect. Normally, you should make calls conditional on the local_scan debug selector by coding like this:

if ((debug_selector & D_local_scan) != 0)
  debug_printf("xxx", ...);

uschar *expand_string(uschar *string)

This is an interface to Exim’s string expansion code. The return value is the expanded string, or NULL if there was an expansion failure. The C variable expand_string_message contains an error message after an expansion failure. If expansion does not change the string, the return value is the pointer to the input string. Otherwise, the return value points to a new block of memory that was obtained by a call to store_get(). See section 46.8 below for a discussion of memory handling.

void header_add(int type, char *format, ...)

This function allows you to add an additional header line at the end of the existing ones. The first argument is the type, and should normally be a space character. The second argument is a format string and any number of substitution arguments as for sprintf(). You may include internal newlines if you want, and you must ensure that the string ends with a newline.

void header_add_at_position(BOOL after, uschar *name, BOOL topnot, int type, char *format, ...)

This function adds a new header line at a specified point in the header chain. The header itself is specified as for header_add().

If name is NULL, the new header is added at the end of the chain if after is true, or at the start if after is false. If name is not NULL, the header lines are searched for the first non-deleted header that matches the name. If one is found, the new header is added before it if after is false. If after is true, the new header is added after the found header and any adjacent subsequent ones with the same name (even if marked “deleted”). If no matching non-deleted header is found, the topnot option controls where the header is added. If it is true, addition is at the top; otherwise at the bottom. Thus, to add a header after all the Received: headers, or at the top if there are no Received: headers, you could use

header_add_at_position(TRUE, US"Received", TRUE,
  ' ', "X-xxx: ...");

Normally, there is always at least one non-deleted Received: header, but there may not be if received_header_text expands to an empty string.

void header_remove(int occurrence, uschar *name)

This function removes header lines. If occurrence is zero or negative, all occurrences of the header are removed. If occurrence is greater than zero, that particular instance of the header is removed. If no header(s) can be found that match the specification, the function does nothing.

BOOL header_testname(header_line *hdr, uschar *name, int length, BOOL notdel)

This function tests whether the given header has the given name. It is not just a string comparison, because white space is permitted between the name and the colon. If the notdel argument is true, a false return is forced for all “deleted” headers; otherwise they are not treated specially. For example:

if (header_testname(h, US"X-Spam", 6, TRUE)) ...

uschar *lss_b64encode(uschar *cleartext, int length)

This function base64-encodes a string, which is passed by address and length. The text may contain bytes of any value, including zero. The result is passed back in dynamic memory that is obtained by calling store_get(). It is zero-terminated.

int lss_b64decode(uschar *codetext, uschar **cleartext)

This function decodes a base64-encoded string. Its arguments are a zero-terminated base64-encoded string and the address of a variable that is set to point to the result, which is in dynamic memory. The length of the decoded string is the yield of the function. If the input is invalid base64 data, the yield is -1. A zero byte is added to the end of the output string to make it easy to interpret as a C string (assuming it contains no zeros of its own). The added zero byte is not included in the returned count.

int lss_match_domain(uschar *domain, uschar *list)

This function checks for a match in a domain list. Domains are always matched caselessly. The return value is one of the following:

OK: match succeeded
FAIL: match failed
DEFER: match deferred

DEFER is usually caused by some kind of lookup defer, such as the inability to contact a database.

int lss_match_local_part(uschar *localpart, uschar *list, BOOL caseless)

This function checks for a match in a local part list. The third argument controls case-sensitivity. The return values are as for lss_match_domain().

int lss_match_address(uschar *address, uschar *list, BOOL caseless)

This function checks for a match in an address list. The third argument controls the case-sensitivity of the local part match. The domain is always matched caselessly. The return values are as for lss_match_domain().

int lss_match_host(uschar *host_name, uschar *host_address, uschar *list)

This function checks for a match in a host list. The most common usage is expected to be

lss_match_host(sender_host_name, sender_host_address, ...)

An empty address field matches an empty item in the host list. If the host name is NULL, the name corresponding to $sender_host_address is automatically looked up if a host name is required to match an item in the list. The return values are as for lss_match_domain(), but in addition, lss_match_host() returns ERROR in the case when it had to look up a host name, but the lookup failed.
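One way to turn the result of a list match into a local_scan() return code is sketched below, for a hypothetical policy in which the list names hosts that should be refused. The enum values are placeholders so that the code compiles outside Exim; the real OK, DEFER, FAIL, ERROR, and LOCAL_SCAN_ values come from local_scan.h. The function name is hypothetical.

```c
/* Stand-ins with placeholder values; not Exim's actual numbers. */
enum { OK, DEFER, FAIL, ERROR };
enum { LOCAL_SCAN_ACCEPT, LOCAL_SCAN_REJECT, LOCAL_SCAN_TEMPREJECT };

/* Hypothetical policy: the list is a blocklist, so a match means reject.
   Anything that is neither a clear match nor a clear non-match becomes a
   temporary rejection, so that the sender retries later. */
int verdict_for_blocklist_match(int match_result)
{
  switch (match_result)
  {
    case OK:   return LOCAL_SCAN_REJECT;       /* host is on the list */
    case FAIL: return LOCAL_SCAN_ACCEPT;       /* host is not on the list */
    case DEFER:                                /* lookup deferred */
    case ERROR:                                /* host name lookup failed */
    default:   return LOCAL_SCAN_TEMPREJECT;
  }
}
```

The argument would typically be the value returned by lss_match_host(sender_host_name, sender_host_address, ...) with a site-specific list.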

void log_write(unsigned int selector, int which, char *format, ...)

This function writes to Exim’s log files. The first argument should be zero (it is concerned with log_selector). The second argument can be LOG_MAIN or LOG_REJECT or LOG_PANIC or the inclusive “or” of any combination of them. It specifies to which log or logs the message is written. The remaining arguments are a format and relevant insertion arguments. The string should not contain any newlines, not even at the end.

void receive_add_recipient(uschar *address, int pno)

This function adds an additional recipient to the message. The first argument is the recipient address. If it is unqualified (has no domain), it is qualified with the qualify_recipient domain. The second argument must always be -1.

This function does not allow you to specify a private errors_to address (as described with the structure of recipient_item above), because it pre-dates the addition of that field to the structure. However, it is easy to add such a value afterwards. For example:

 receive_add_recipient(US"monitor@mydom.example", -1);
 recipients_list[recipients_count-1].errors_to =
   US"postmaster@mydom.example";

BOOL receive_remove_recipient(uschar *recipient)

This is a convenience function to remove a named recipient from the list of recipients. It returns true if a recipient was removed, and false if no matching recipient could be found. The argument must be a complete email address.

uschar *rfc2047_decode(uschar *string, BOOL lencheck, uschar *target, int zeroval, int *lenptr, uschar **error)

This function decodes strings that are encoded according to RFC 2047. Typically these are the contents of header lines. First, each “encoded word” is decoded from the Q or B encoding into a byte-string. Then, if provided with the name of a charset encoding, and if the iconv() function is available, an attempt is made to translate the result to the named character set. If this fails, the binary string is returned with an error message.

The first argument is the string to be decoded. If lencheck is TRUE, the maximum MIME word length is enforced. The third argument is the target encoding, or NULL if no translation is wanted.

If a binary zero is encountered in the decoded string, it is replaced by the contents of the zeroval argument. For use with Exim headers, the value must not be 0 because header lines are handled as zero-terminated strings.

The function returns the result of processing the string, zero-terminated; if lenptr is not NULL, the length of the result is set in the variable to which it points. When zeroval is 0, lenptr should not be NULL.

If an error is encountered, the function returns NULL and uses the error argument to return an error message. The variable pointed to by error is set to NULL if there is no error; it may be set non-NULL even when the function returns a non-NULL value if decoding was successful, but there was a problem with translation.

int smtp_fflush(void)

This function is used in conjunction with smtp_printf(), as described below.

void smtp_printf(char *,BOOL, ...)

The arguments of this function are almost like printf(); it writes to the SMTP output stream. You should use this function only when there is an SMTP output stream, that is, when the incoming message is being received via interactive SMTP. This is the case when smtp_input is TRUE and smtp_batched_input is FALSE. If you want to test for an incoming message from another host (as opposed to a local process that used the -bs command line option), you can test the value of sender_host_address, which is non-NULL when a remote host is involved.

If an SMTP TLS connection is established, smtp_printf() uses the TLS output function, so it can be used for all forms of SMTP connection.

The second argument is used to request that the data be buffered (when TRUE) or flushed (along with any previously buffered, when FALSE). This is advisory only, but likely to save on system-calls and packets sent when a sequence of calls to the function are made.

The argument was added in Exim version 4.90 - changing the API/ABI. Nobody noticed until 4.93 was imminent, at which point the ABI version number was incremented.

Strings that are written by smtp_printf() from within local_scan() must start with an appropriate response code: 550 if you are going to return LOCAL_SCAN_REJECT, 451 if you are going to return LOCAL_SCAN_TEMPREJECT, and 250 otherwise. Because you are writing the initial lines of a multi-line response, the code must be followed by a hyphen to indicate that the line is not the final response line. You must also ensure that the lines you write terminate with CRLF. For example:

smtp_printf("550-this is some extra info\r\n", TRUE);
return LOCAL_SCAN_REJECT;

Note that you can also create multi-line responses by including newlines in the data returned via the return_text argument. The added value of using smtp_printf() is that, for instance, you could introduce delays between multiple output lines.

The smtp_printf() function does not return any error indication, because it does not guarantee a flush of pending output, and therefore does not test the state of the stream. (In the main code of Exim, flushing and error detection is done when Exim is ready for the next SMTP input command.) If you want to flush the output and check for an error (for example, the dropping of a TCP/IP connection), you can call smtp_fflush(), which has no arguments. It flushes the output stream, and returns a non-zero value if there is an error.
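The formatting rules for lines written with smtp_printf() (a response code matching the eventual return value, a hyphen, and a CRLF terminator) can be captured in two small helpers. Both function names are hypothetical, and the LOCAL_SCAN_ values are placeholders so that the sketch compiles outside Exim.

```c
#include <stdio.h>

/* Stand-ins with placeholder values; the real ones are in local_scan.h. */
enum {
  LOCAL_SCAN_ACCEPT, LOCAL_SCAN_ACCEPT_FREEZE, LOCAL_SCAN_ACCEPT_QUEUE,
  LOCAL_SCAN_REJECT, LOCAL_SCAN_REJECT_NOLOGHDR,
  LOCAL_SCAN_TEMPREJECT, LOCAL_SCAN_TEMPREJECT_NOLOGHDR
};

/* Hypothetical helper: the SMTP code that extra lines must carry, given
   the value local_scan() is going to return. */
int response_code_for(int scan_result)
{
  switch (scan_result)
  {
    case LOCAL_SCAN_REJECT:
    case LOCAL_SCAN_REJECT_NOLOGHDR:     return 550;
    case LOCAL_SCAN_TEMPREJECT:
    case LOCAL_SCAN_TEMPREJECT_NOLOGHDR: return 451;
    default:                             return 250;
  }
}

/* Hypothetical helper: build one continuation line "NNN-text\r\n" in buf.
   Returns the line length, or -1 if it does not fit. */
int format_continuation_line(char *buf, size_t size, int code,
                             const char *text)
{
  int n = snprintf(buf, size, "%03d-%s\r\n", code, text);
  return (n >= 0 && (size_t)n < size) ? n : -1;
}
```

A line built this way would be passed on as data rather than as a format string, for example smtp_printf("%s", TRUE, buf), so that any percent characters in the text are not interpreted.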

void *store_get(int,BOOL)

This function accesses Exim’s internal store (memory) manager. It gets a new chunk of memory whose size is given by the first argument. The second argument should be given as TRUE if the memory will be used for data possibly coming from an attacker (e.g. the message content), FALSE if it is locally sourced. Exim bombs out if it ever runs out of memory. See the next section for a discussion of memory handling.

void *store_get_perm(int,BOOL)

This function is like store_get(), but it always gets memory from the permanent pool. See the next section for a discussion of memory handling.

uschar *string_copy(uschar *string)

See below.

uschar *string_copyn(uschar *string, int length)

See below.

uschar *string_sprintf(char *format, ...)

These three functions create strings using Exim’s dynamic memory facilities. The first makes a copy of an entire string. The second copies up to a maximum number of characters, indicated by the second argument. The third uses a format and insertion arguments to create a new string. In each case, the result is a pointer to a new string in the current memory pool. See the next section for more discussion.

8. More about Exim’s memory handling

No function is provided for freeing memory, because that is never needed. The dynamic memory that Exim uses when receiving a message is automatically recycled if another message is received by the same process (this applies only to incoming SMTP connections – other input methods can supply only one message at a time). After receiving the last message, a reception process terminates.

Because it is recycled, the normal dynamic memory cannot be used for holding data that must be preserved over a number of incoming messages on the same SMTP connection. However, Exim in fact uses two pools of dynamic memory; the second one is not recycled, and can be used for this purpose.

If you want to allocate memory that remains available for subsequent messages in the same SMTP connection, you should set

store_pool = POOL_PERM

before calling the function that does the allocation. There is no need to restore the value if you do not need to; however, if you do want to revert to the normal pool, you can either restore the previous value of store_pool or set it explicitly to POOL_MAIN.

The pool setting applies to all functions that get dynamic memory, including expand_string(), store_get(), and the string_xxx() functions. There is also a convenience function called store_get_perm() that gets a block of memory from the permanent pool while preserving the value of store_pool.
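The save, switch, and restore pattern for store_pool can be sketched as follows, using a counter that has to survive across several messages on one SMTP connection. Everything Exim-specific here is a stand-in so that the code compiles outside Exim: the POOL_ values are placeholders, and the store_get() stub simply calls malloc() and ignores the pool. The note_message() function is hypothetical.

```c
#include <stdlib.h>

/* Stand-ins; in a real local_scan.c these come from local_scan.h. */
typedef int BOOL;
#define FALSE 0
enum { POOL_MAIN, POOL_PERM };          /* placeholder values */
int store_pool = POOL_MAIN;

/* Stub for Exim's allocator: ignores store_pool and uses malloc(). */
static void *store_get(int size, BOOL tainted)
{
  (void)tainted;
  void *p = malloc((size_t)size);
  if (p == NULL) abort();
  return p;
}

/* Counter that must not be recycled between messages. */
static int *messages_seen = NULL;

/* Hypothetical helper: count the messages seen by this process. */
int note_message(void)
{
  if (messages_seen == NULL)
  {
    int saved_pool = store_pool;        /* remember the caller's pool */
    store_pool = POOL_PERM;             /* allocate from the permanent pool */
    messages_seen = store_get((int)sizeof(int), FALSE);
    store_pool = saved_pool;            /* revert to the previous pool */
    *messages_seen = 0;
  }
  return ++(*messages_seen);
}
```

With the real API, the three lines around the allocation could be replaced by a single call to store_get_perm(), which preserves store_pool itself.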
