First, I understand this is not a Toaster issue, but a vpopmail issue. I am posting here in hopes someone has run into this and has some insight.
Since upgrading to vpopmail 5.4.6_1, I get the following email about once per day (always exactly the same):
Subject: Cron <root@mail> /usr/local/vpopmail/bin/clearopensmtpvmysql: error creating table 'relay': MySQL server has gone awayvmysql: sql error[8]: MySQL server has gone away
Repeated runs (100+) of clearopensmtp do not reproduce this message.
A quick review of the vauth.c/vmysql.c shows that error[8] occurs in the vupdate_rules function, namely when the "select ip_addr from relay" SQL statement fails.
I can conclude the issue is not with the select statement - if it were, the problem would not be as sporadic. Also, repeated (100+) runs of this query with the mysql client do not generate this error. As this is an extremely low-volume mail server, there are typically only 1-5 records in the relay table at any given time.
I am fairly certain I see what is occurring - vpopmail maintains persistent connections to the MySQL server - by default, MySQL closes inactive connections after 8 hours. I believe this may be the cause for this behavior. Unfortunately I cannot increase/disable the wait_time to test this theory, as I use the MySQL server for other purposes as well. What really baffles me is why this was not an issue prior to 5.4.6_1. I suppose I could browse the CVS repository and see what changed recently, but I'm lazy.
Obviously the message is not critical, it is simply a minor nuisance, which I can live with if need-be.
Has anyone else encountered this? Any insight would be much appreciated.
(If I'm posting these to the wrong forum, I apologize - please move them)
This problem has now gotten MUCH worse. I seem to recall having this issue with a previous version of vpopmail, but it was fixed!
Two subsequent delivery attempts to a local address failed with chkusr saying the address was unknown. The mail log shows the error in the subject twice. The delivery attempts were 33 seconds apart.
Reviewing vauth.c/vmysql.c I can see this error is occurring in valias_select, with the following SQL statement:
select valias_line from valias where alias = '%s' and domain = '%s'
The obvious side-effect is that deliveries are now being intermittently rejected with a permanent error. At the present time, unless I can find a fix, the only work-around I can think of is to rebuild qmail-smtpd without the chkusr patch.
Suggestions? Anyone?
tail your mysql logs
tail the mysql error log
view your mysql status using mytop
verify your aren't exceeding your connection limits
if so, raise them in my.cnf or reduce your usage
LogicX, thank you for your reply.
All the MySQL log shows is as follows:
050316 16:53:47 mysqld started
050316 16:53:48 InnoDB: Started; log sequence number 0 368950522
/usr/local/libexec/mysqld: ready for connections.
Version: '4.1.10a-log' socket: '/tmp/mysql.sock' port: 3306 FreeBSD port: mysql-server-4.1.10a
max_connections is is 500, of which typically 20-40 are in use.
mysql> show processlist;+--------+-------------+-----------+----------+-------------+---------+----------------------------------------------------------------+------------------+| Id | User | Host | db | Command | Time | State | Info |+--------+-------------+-----------+----------+-------------+---------+----------------------------------------------------------------+------------------+| 87330 | vpopmail | localhost | vpopmail | Sleep | 343 | | NULL || 87331 | vpopmail | localhost | vpopmail | Sleep | 252 | | NULL || 89599 | replication | XXX:1211 | NULL | Binlog Dump | 1142384 | Has sent all binlog to slave; waiting for binlog to be updated | NULL || 109754 | vpopmail | localhost | vpopmail | Sleep | 327 | | NULL || 109756 | vpopmail | localhost | vpopmail | Sleep | 967 | | NULL || 109757 | vpopmail | localhost | vpopmail | Sleep | 371 | | NULL || 109758 | vpopmail | localhost | vpopmail | Sleep | 353 | | NULL || 109760 | vpopmail | localhost | vpopmail | Sleep | 393 | | NULL || 109761 | vpopmail | localhost | vpopmail | Sleep | 666 | | NULL || 109766 | vpopmail | localhost | vpopmail | Sleep | 325 | | NULL || 109767 | vpopmail | localhost | vpopmail | Sleep | 666 | | NULL || 181973 | wimgam | XXX:50977 | wimg | Sleep | 15933 | | NULL || 227191 | wimgam | XXX:2938 | wimg | Sleep | 3980 | | NULL || 236871 | wimgam | XXX:2645 | wimg | Sleep | 998 | | NULL || 237070 | wimgam | XXX:1088 | wimg | Sleep | 1220 | | NULL || 237644 | wimgam | XXX:2218 | wimg | Sleep | 4709 | | NULL || 238088 | wimgam | XXX:2254 | wimg | Sleep | 10 | | NULL || 238175 | wimgam | XXX:1029 | wimg | Sleep | 267 | | NULL || 238816 | wimgam | XXX:4441 | wimg | Sleep | 1753 | | NULL || 239411 | xerion | XXX:4539 | NULL | Sleep | 75 | | NULL || 241553 | syslog | localhost | syslog | Sleep | 1148 | | NULL || 241718 | xerion | XXX:1675 | NULL | Sleep | 1 | | NULL || 241727 | xerion | localhost | dummy | Query | 0 | NULL | show processlist |+--------+-------------+-----------+----------+-------------+---------+----------------------------------------------------------------+------------------+23 rows in set (0.00 sec)
Anything else I should check?
yes, turn on mysql logging
log=/path/to/file.log (in my.cnf)
and watch as the process connects, see what errors occur.
also use mysqladmin/mysqlcheck and verify that the database(s) are fine -- do a check, do an analyze/optimize
do various levels of checks
I enjoy having this in my my.cnf:
#this does an auto-check on tables during startup and repairs them
myisam-recover=BACKUP,FORCE
YMMV
The vpopmail database tables are all in MyISAM format.
mysql> check table dir_control, lastauth, relay, valias, vpopmail;+----------------------+-------+----------+----------+| Table | Op | Msg_type | Msg_text |+----------------------+-------+----------+----------+| vpopmail.dir_control | check | status | OK || vpopmail.lastauth | check | status | OK || vpopmail.relay | check | status | OK || vpopmail.valias | check | status | OK || vpopmail.vpopmail | check | status | OK |+----------------------+-------+----------+----------+5 rows in set (0.09 sec)
I have enabled mysql logging. An interesting side-effect of restarting MySQL is that I no longer see idle connections from vpopmail. I will monitor the server and if the problem arises again, I will try to isolate the log events from that query. Sample log entry:
050406 18:31:49 2 Connect vpopmail@localhost on 2 Init DB vpopmail 2 Query select pw_name, pw_passwd, pw_uid, pw_gid, pw_gecos, pw_dir, pw_shell , pw_clear_passwd from vpopmail where pw_name = "fiery" and pw_domain = "domain.net" 2 Query replace into lastauth set user='fiery', domain='domain.net', remote_ip='192.168.2.75', timestamp=1112837509 2 Query replace into relay ( ip_addr, timestamp ) values ( '192.168.2.75', 1112837509 ) 2 Quit
Thanks for your time. I'll keep you posted.
Okay I'm not sure if this provides any insight, but I find it curious.
Notice in my previous post that vpopmail connected on MySQL thead #2 and immediately disconnected when it was done. Compare that to below, where thread #248 was never closed, and several queries have been issued on it. The querylog shows this behavior on several threads -- not just #248.
050406 19:29:48 248 Connect vpopmail@localhost on vpopmail 248 Query select pw_name, pw_passwd, pw_uid, pw_gid, pw_gecos, pw_dir, pw_shell , pw_clear_passwd from vpopmail where pw_name = "user1" and pw_domain = "domain.net" 248 Query replace into lastauth set user='user1', domain='domain.net', remote_ip='imap', timestamp=1112840988050406 19:52:07 248 Query select pw_name, pw_passwd, pw_uid, pw_gid, pw_gecos, pw_dir, pw_shell , pw_clear_passwd from vpopmail where pw_name = "user2" and pw_domain = "domain.net" 248 Query replace into lastauth set user='user2', domain='domain.net', remote_ip='imap', timestamp=1112842327050406 20:39:57 248 Query select pw_name, pw_passwd, pw_uid, pw_gid, pw_gecos, pw_dir, pw_shell , pw_clear_passwd from vpopmail where pw_name = "user1" and pw_domain = "domain.net" 248 Query replace into lastauth set user='user1', domain='domain.net', remote_ip='imap', timestamp=1112845197050406 20:43:19 248 Query select pw_name, pw_passwd, pw_uid, pw_gid, pw_gecos, pw_dir, pw_shell , pw_clear_passwd from vpopmail where pw_name = "user3" and pw_domain = "domain.net" 248 Query replace into lastauth set user='user3', domain='domain.net', remote_ip='imap', timestamp=1112845399050406 21:56:14 248 Query select pw_name, pw_passwd, pw_uid, pw_gid, pw_gecos, pw_dir, pw_shell , pw_clear_passwd from vpopmail where pw_name = "user4" and pw_domain = "domain.net" 248 Query replace into lastauth set user='user4', domain='domain.net', remote_ip='imap', timestamp=1112849774050406 22:00:15 248 Query select pw_name, pw_passwd, pw_uid, pw_gid, pw_gecos, pw_dir, pw_shell , pw_clear_passwd from vpopmail where pw_name = "user4" and pw_domain = "domain.net" 248 Query replace into lastauth set user='user4', domain='domain.net', remote_ip='imap', timestamp=1112850015 248 Query select pw_name, pw_passwd, pw_uid, pw_gid, pw_gecos, pw_dir, pw_shell , pw_clear_passwd from vpopmail where pw_name = "postmaster" and pw_domain = "domain.net" 248 Query replace into lastauth set user='postmaster', domain='domain.net', remote_ip='imap', timestamp=1112850357
Again, show processlist is showing many idle connections from vpopmail. After restarting MySQL, I checked my mail via POP3, IMAP and sent myself mail from gmail. After each process, vpopmail closed its connection. I am at a loss as to why it has stopped doing that:
mysql> show processlist;+------+-------------+-----------+----------+----------------+-------+----------------------------------------------------------------+------------------+| Id | User | Host | db | Command | Time | State | Info |+------+-------------+-----------+----------+----------------+-------+----------------------------------------------------------------+------------------+| 5 | replication | XXX:1055 | NULL | Binlog Dump | 11379 | Has sent all binlog to slave; waiting for binlog to be updated | NULL || 28 | vpopmail | localhost | vpopmail | Sleep | 259 | | NULL || 162 | vpopmail | localhost | vpopmail | Sleep | 259 | | NULL || 248 | vpopmail | localhost | vpopmail | Sleep | 259 | | NULL || 327 | vpopmail | localhost | vpopmail | Sleep | 259 | | NULL || 406 | vpopmail | localhost | vpopmail | Sleep | 236 | | NULL || 427 | vpopmail | localhost | vpopmail | Sleep | 12 | | NULL || 428 | vpopmail | localhost | vpopmail | Sleep | 259 | | NULL || 429 | vpopmail | localhost | vpopmail | Sleep | 259 | | NULL || 431 | vpopmail | localhost | vpopmail | Sleep | 259 | | NULL || 432 | vpopmail | localhost | vpopmail | Sleep | 259 | | NULL || 494 | syslog | localhost | syslog | Sleep | 171 | | NULL || 1506 | xerion | XXX:2603 | NULL | Sleep | 199 | | NULL || 1515 | DELAYED | | syslog | Delayed insert | 171 | Waiting for INSERT | || 1539 | xerion | localhost | dummy | Query | 0 | NULL | show processlist |+------+-------------+-----------+----------+----------------+-------+----------------------------------------------------------------+------------------+15 rows in set (0.00 sec)
Does anyone else see vpopmail maintaining persistent connections to MySQL on their systems?
Okay one more tidbit of info... restarting courier-authdaemon clears the vpopmail connections, so obivously it's Courier-IMAP maintaining the persistent MySQL connections. Perhaps I'm looking in the wrong direction on this one - certainly Courier has nothing to do with qmail-smtp/chkusr, and I have yet to see a failure authenticating to the server while checking mail.
This problem just occured again - unfortunately the querylog does not show any activity:
2005-04-07 13:47:09.664656500 vmysql: error creating table 'valias': MySQL server has gone away2005-04-07 13:47:09.664692500 vmysql: sql error[j]: MySQL server has gone away050407 13:46:53 7789 Connect syslog@localhost on 7789 Init DB syslog 7789 Query SELECT host, date, time, message FROM syslog ORDER BY date DESC,time DESC LIMIT 0, 25 7789 Quit 7790 Connect vpopmail@localhost on 7790 Init DB vpopmail 7790 Query select pw_name, pw_passwd, pw_uid, pw_gid, pw_gecos, pw_dir, pw_shell , pw_clear_passwd from vpopmail where pw_name = "fiery" and pw_domain = "domain.net" 7790 Query replace into lastauth set user='fiery', domain='domain.net', remote_ip='192.168.2.75', timestamp=1112906813 7790 Query replace into relay ( ip_addr, timestamp ) values ( '192.168.2.75', 1112906813 ) 7790 Quit050407 13:47:04 7791 Connect vpopmail@localhost on 7791 Init DB vpopmail 7791 Query select pw_name, pw_passwd, pw_uid, pw_gid, pw_gecos, pw_dir, pw_shell , pw_clear_passwd from vpopmail where pw_name = "fiery" and pw_domain = "domain.net" 7791 Query replace into lastauth set user='fiery', domain='domain.net', remote_ip='192.168.2.75', timestamp=1112906824050407 13:47:14 7791 Query replace into relay ( ip_addr, timestamp ) values ( '192.168.2.75', 1112906834 ) 7791 Quit050407 13:47:24 7793 Connect vpopmail@localhost on 7793 Init DB vpopmail 7793 Query select pw_name, pw_passwd, pw_uid, pw_gid, pw_gecos, pw_dir, pw_shell , pw_clear_passwd from vpopmail where pw_name = "fiery" and pw_domain = "domain.net" 7793 Query replace into lastauth set user='fiery', domain='domain.net', remote_ip='192.168.2.75', timestamp=1112906844 7793 Query replace into relay ( ip_addr, timestamp ) values ( '192.168.2.75', 1112906844 ) 7793 Quit
Anything else to try/check before recompiling qmail-smtpd?
why not recompile vpopmail?
be sure your toaster file is updated so the latest version gets installed