[exim] some exim processes hang 100% CPU

Author: Heiko Lehmann
Date:  
To: exim-users
Subject: [exim] some exim processes hang 100% CPU

Hello all!

I have configured a dynamic blacklist backed by PostgreSQL.

problem:
- some exim processes hang and use 100% CPU.

details:
- only very few processes are affected. [2]
- strace [3] shows far too many polls per second,
  presumably on the connection to the SQL server
- netstat [4] shows the connections:
  - sql-server: the connection does not exist
  - mta: connection state: CLOSE_WAIT
    (for several hours, see [5])


summary:
- exim polls on a half-closed connection.

suggestion:
- reduce this polling (usleep)
- add a timeout to the poll, like:
    if (pollcount > pollmax) { printlog("Error: sql-connect dead"); exit(1); }
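
The suggestion above could be sketched roughly as follows. This is only an illustration in Python, not exim or libpq code: `read_reply`, `timeout_ms` and `max_idle_polls` are made-up names, and a local socket pair stands in for the SQL connection. The sketch uses a finite poll timeout with a retry limit, and it also treats a zero-length `recv()` as end-of-file, which is what the strace in [3] shows being returned over and over.

```python
import select
import socket


def read_reply(sock, timeout_ms=5000, max_idle_polls=3):
    """Read from sock until the peer closes; give up after repeated timeouts.

    Hypothetical helper for illustration, not exim or libpq code.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN | select.POLLERR)
    chunks = []
    idle_polls = 0
    while True:
        events = poller.poll(timeout_ms)  # finite timeout instead of -1
        if not events:
            idle_polls += 1
            if idle_polls > max_idle_polls:
                raise TimeoutError("sql connection dead: no data from peer")
            continue
        idle_polls = 0
        data = sock.recv(4096)
        if data == b"":
            # A zero-length read means the peer closed its side; the local
            # socket is now half-closed, so polling again is pointless.
            break
        chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    ours, peer = socket.socketpair()  # stand-in for the SQL connection
    peer.sendall(b"row")
    peer.close()
    print(read_reply(ours))
    ours.close()
```

With the end-of-file check in place the loop ends as soon as the peer goes away; the timeout only matters when the peer stays connected but silent.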



regards heiko

------------------------------------------------------------------

- [1] sysinfo
- exim 4.50
- debian


- [2] top
top - 10:52:49 up 43 days, 21:20,  2 users,  load average: 3.75, 2.53, 1.26
Tasks: 359 total,   6 running, 353 sleeping,   0 stopped,   0 zombie
Cpu(s): 54.8% us, 43.2% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  2.0% si
Mem:    393356k total,   389892k used,     3464k free,     5008k buffers
Swap:    65528k total,     3240k used,    62288k free,   112164k cached


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  14875 Debian-e  25   0  9476 2244 1740 R 32.9  0.6   2:47.91 exim4
  14273 Debian-e  25   0  9476 2248 1744 R 32.2  0.6   2:48.49 exim4
  26421 root      25   0 29860  25m 2468 R 17.9  6.6   5:19.82 spamd
  28736 root      25   0 30860  26m 2448 R 15.9  6.9   0:55.85 spamd
  18269 root      16   0  2264 1260  848 R  0.3  0.3   0:00.02 top
  18274 Debian-e  15   0  9460  912  472 S  0.3  0.2   0:00.01 exim4





- [3] strace
smtp # strace -p14273
poll([{fd=10, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recv(10, "", 1, 0)                      = 0
poll([{fd=10, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recv(10, "", 1, 0)                      = 0
poll([{fd=10, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recv(10, "", 1, 0)                      = 0
poll([{fd=10, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recv(10, "", 1, 0)                      = 0
poll([{fd=10, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recv(10, "", 1, 0)                      = 0
poll([{fd=10, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recv(10, "", 1, 0)                      = 0
poll([{fd=10, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recv(10, "", 1, 0)                      = 0
  - approx. 100/s
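
The pattern in this trace can be reproduced outside exim. The sketch below is an assumption-laden illustration: it uses a local socket pair rather than a TCP connection to PostgreSQL, and `count_eof_wakeups` is a made-up name. It shows that once the peer has closed, every `poll()` reports the socket as readable and every `recv()` returns zero bytes, so a loop that ignores the zero-length read never blocks.

```python
import select
import socket


def count_eof_wakeups(rounds=5):
    """Count how many of `rounds` poll() calls wake up on a closed peer."""
    ours, peer = socket.socketpair()
    peer.close()  # the peer goes away; our end stays open (half-closed)
    poller = select.poll()
    poller.register(ours, select.POLLIN | select.POLLERR)
    wakeups = 0
    for _ in range(rounds):
        events = poller.poll(1000)  # returns at once, it never waits
        if events and ours.recv(1) == b"":
            wakeups += 1  # readable, but the read is empty: end-of-file
    ours.close()
    return wakeups


if __name__ == "__main__":
    print(count_eof_wakeups())
```

On a local socket pair the reported events also include POLLHUP, whereas the trace above shows only POLLIN on the TCP socket; the empty read is the same in both cases.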



- [4] netstat
  (how do I map an exim PID to its exim message?)
smtp # netstat -anep | grep 14273
tcp        1      0 x.x.x.56:25        81.213.219.21:34507     CLOSE_WAIT 103        52561311   14273/exim4
tcp        0      0 x.x.x.56:37907     x.x.x.73:5432      CLOSE_WAIT 103        52563694   14273/exim4


sql # netstat -anep | grep 37907
sql #

  - on the MTA the connection state is CLOSE_WAIT
    on the SQL server the connection does not exist
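
To spot such processes without checking each PID by hand, the netstat output can be filtered for CLOSE_WAIT and grouped by owning process. A minimal sketch, assuming unwrapped Linux `netstat -anep` lines with the column order proto, recv-q, send-q, local, foreign, state, user, inode, pid/program; `close_wait_by_pid` is a made-up helper name.

```python
from collections import defaultdict


def close_wait_by_pid(netstat_output):
    """Map 'PID/program' to the (local, remote) pairs it holds in CLOSE_WAIT.

    Assumes one connection per line, as printed by `netstat -anep` on Linux.
    """
    stuck = defaultdict(list)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UNIX-domain sockets and truncated lines.
        if len(fields) < 9 or not fields[0].startswith("tcp"):
            continue
        local, remote, state, owner = fields[3], fields[4], fields[5], fields[8]
        if state == "CLOSE_WAIT":
            stuck[owner].append((local, remote))
    return dict(stuck)


if __name__ == "__main__":
    sample = (
        "tcp 1 0 x.x.x.56:25 81.213.219.21:34507 CLOSE_WAIT 103 52561311 14273/exim4\n"
        "tcp 0 0 x.x.x.56:37907 x.x.x.73:5432 CLOSE_WAIT 103 52563694 14273/exim4\n"
    )
    for owner, conns in close_wait_by_pid(sample).items():
        print(owner, conns)
```

A process that holds a CLOSE_WAIT socket to port 5432 for a long time is a candidate for the hang described here.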


- [5] exim_mainlog
smtp # grep 81.213.219.21 /var/log/exim4/mainlog | tail | sed "s/foo/mydomain/"
2007-10-01 08:48:19 H=(dsl.static8121321921.ttnet.net.tr) [81.213.219.21] F=<harkaitz@???> rejected RCPT <achanta@???>: User account unknown
2007-10-01 08:48:49 H=(dsl.static8121321921.ttnet.net.tr) [81.213.219.21] F=<harkaitz@???> rejected RCPT <achenes@???>: User account unknown
2007-10-01 08:49:20 H=(dsl.static8121321921.ttnet.net.tr) [81.213.219.21] F=<harkaitz@???> rejected RCPT <bolling@???>: User account unknown
2007-10-01 12:00:14 SMTP connection from (dsl.static8121321921.ttnet.net.tr) [81.213.219.21] closed after SIGTERM