Nasty qmail-remote hangs forever bug

I am very happy with my current mail setup, but a nasty bug popped up out of nowhere and I can’t trace it. Some qmail-remote processes, in circumstances I have yet to determine, hang forever while eating all the available CPU. qmail-remote is the piece of qmail that takes care of message delivery to recipients at a remote host.

When this happens, the stuck process ignores the timeoutremote control file and stays active forever. The truss command (the FreeBSD equivalent of strace) shows no system call activity, and there appears to be no related network activity either, so the process seems to be spinning in user space… it looks like some kind of race condition.

For now I can’t really address the issue, due to both a lack of time and a lack of deep understanding of C and of debugging with GDB, so I just mitigate the problem with a cron job that runs a bash script to detect and kill the offending processes.

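#!/usr/local/bin/bash

# Kill qmail-remote processes that have run too long while burning CPU.

# limit in seconds
LIMIT_TIME=120
# limit in percent CPU
LIMIT_CPU=35

# %cpu is listed before command so that the command's arguments cannot
# shift the columns; '[q]mail-remote' keeps grep from matching itself.
ps -axo pid,etime,%cpu,command | grep '[q]mail-remote' |
while read -r pid etime cpu _; do

  # etime is [[dd-]hh:]mm:ss; peel off the optional day count first
  days=0
  case $etime in
    *-*) days=${etime%%-*}; etime=${etime#*-} ;;
  esac
  IFS=: read -r -a time_parts <<< "$etime"

  # 10# forces base 10 so that fields like "08" are not read as octal
  if [ ${#time_parts[@]} -lt 3 ]; then
    elapsed=$(( 10#${time_parts[0]} * 60 + 10#${time_parts[1]} ))
  else
    elapsed=$(( 10#${time_parts[0]} * 3600 + 10#${time_parts[1]} * 60 + 10#${time_parts[2]} ))
  fi
  elapsed=$(( elapsed + 10#$days * 86400 ))

  # drop the decimals and add one, i.e. round %cpu up to an integer
  cpu=$(( ${cpu%%[.,]*} + 1 ))

  if [ "$elapsed" -gt "$LIMIT_TIME" ] && [ "$cpu" -gt "$LIMIT_CPU" ]; then
    kill -s 9 "$pid"
  fi

done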
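
The fiddly part of the script is turning the etime column into seconds. As a minimal sketch, assuming ps prints etime as [[dd-]hh:]mm:ss, the conversion can be isolated in a function and tried on sample values before trusting it with kill. The name etime_to_seconds is made up for this example and is not part of the original script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a ps etime value ([[dd-]hh:]mm:ss) to seconds.
etime_to_seconds() {
  local etime=$1 days=0 t
  # Strip an optional leading "dd-" day count
  if [[ $etime == *-* ]]; then
    days=${etime%%-*}
    etime=${etime#*-}
  fi
  # Split the remaining clock part on ":" into an array
  IFS=: read -r -a t <<< "$etime"
  # 10# forces base 10 so fields such as "08" are not parsed as octal
  if (( ${#t[@]} == 2 )); then
    echo $(( 10#$days * 86400 + 10#${t[0]} * 60 + 10#${t[1]} ))
  else
    echo $(( 10#$days * 86400 + 10#${t[0]} * 3600 + 10#${t[1]} * 60 + 10#${t[2]} ))
  fi
}

etime_to_seconds 02:05        # mm:ss form
etime_to_seconds 01:02:05     # hh:mm:ss form
etime_to_seconds 1-00:00:08   # dd-hh:mm:ss form
```

Covering the day prefix matters little with a 120 second limit, but it keeps the arithmetic from failing on a process that has been stuck since yesterday.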

But I’m not really happy with this “solution”, and I will keep pursuing a real understanding of this problem and a proper fix for it.

Some interesting links about other people with the same problem:

http://permalink.gmane.org/gmane.os.freebsd.stable/82760
http://copilotco.com/mail-archives/qmail.2002/msg08733.html

UPDATE and SOLUTION

All credit where credit is due.
To replicate this, first catch a hanging qmail-remote with top. Then filter the offending qmail-remote pid through ps to get its full argument list:

ps -wwaux | grep pid_number
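
Grepping for the pid also matches any other line that happens to contain the same digits. A possible alternative, sketched here with a made-up helper name (full_args), is to ask ps for that one pid only. It assumes a ps that accepts -p and the command keyword, which both FreeBSD and Linux procps do.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the full, untruncated command line of one pid.
full_args() {
  # -ww lifts the output width limit, -p selects only this pid,
  # and "command=" prints the arguments with no header line
  ps -ww -o command= -p "$1"
}

full_args "$$"   # demo: show this shell's own command line
```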

You should get something like ‘qmail-remote mailserver from@email to@email’. With this information you can invoke qmail-remote from the command line under truss, watch it in top, and get a nice qmail-remote hang…

truss /var/qmail/bin/qmail-remote mailserver my@email nonexisten@email < ./test

test is just a file with some bogus input to serve as the email being sent, since qmail-remote expects a message on stdin. For the record, most of the hangs happen when talking to the Symantec Email Gateway software.
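
For completeness, here is one way to produce such a file. This is only a sketch: the headers and body are arbitrary, and the addresses are the same placeholders used in the command above.

```bash
#!/usr/bin/env bash
# Write a small bogus message to ./test (placeholder addresses, arbitrary content).
printf '%s\n' \
  'From: my@email' \
  'To: nonexisten@email' \
  'Subject: qmail-remote hang test' \
  '' \
  'bogus body' > ./test

# Then feed it to qmail-remote as shown above (left commented out here):
# truss /var/qmail/bin/qmail-remote mailserver my@email nonexisten@email < ./test
```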

Now, with an updated ports tree, just recompile qmail; you can follow my guide (it is all still valid). Just run make, though; there is no need to run make install. Then move the new qmail-remote to /var/qmail/bin/ and set the right permissions (711) and ownership (root:qmail).
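
After moving the binary it is worth confirming the mode and ownership before repeating the test. A small sketch follows; show_perms is a made-up helper name, and it simply parses the output of ls -l.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "mode owner:group" for a file, parsed from ls -l.
show_perms() {
  # substr keeps the 10 mode characters and drops any trailing ACL marker
  ls -ld "$1" | awk '{ print substr($1, 1, 10), $3 ":" $4 }'
}

# Demo on a scratch file. On the mail server you would instead run
#   show_perms /var/qmail/bin/qmail-remote
# and look for "-rwx--x--x root:qmail".
tmp=$(mktemp)
chmod 711 "$tmp"
show_perms "$tmp"
rm -f "$tmp"
```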

And voilà, if you repeat the test procedure, you will find that qmail-remote is not hanging anymore 🙂

3 thoughts on “Nasty qmail-remote hangs forever bug”

  1. IT’S FIXED AT LAST!!

    http://svnweb.freebsd.org/ports/head/mail/qmail-tls/Makefile?view=log

    Modified Tue Oct 16 13:35:28 2012 UTC (5 weeks, 1 day ago) by garga

    – Update TLS patch to v2, what address an issue that qmail-remote loops on
    malformed server response

    I’ve just upgraded and tested with that fuckin’ symantec mx.
    My guess is that the malformed header here is the SIZE:
    250-ENHANCEDSTATUSCODES
    250-SIZE 10485760
    250-8BITMIME
    250 PIPELINING

    It has an extra space.
