Bug #11897 connections remaining in "sleep" state longer than wait_timeout
Submitted: 13 Jul 2005 3:15 Modified: 30 Jun 2006 12:37
Reporter: sean finney Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.1 (possibly eariler) OS:Linux (debian gnu/linux)
Assigned to: CPU Architecture:Any

[13 Jul 2005 3:15] sean finney
Description:
originally reported in the debian bts:

http://bugs.debian.org/318011

original bug submission text:

I am connecting to the server with Apache + PHP, using "persistent connections". Using
this feature, PHP does not disconnect from the server after every request
served, but keeps the connections for future requests. When not used, these
connections are in a "sleep" state, as can be seen with "mysqladmin
processlist".

Usually, the mysql server will kill such processes if their idle time (the
"Time" value in the processlist) exceeds the value of wait_timeout.

On my machine, wait_timeout is set to 15 seconds, as can be confirmed with
"mysqladmin variables". As expected, connections disappear from the
processlist after this time.

Now from time to time, mysqld stops shutting down these idle processes. They
keep hanging around in the processlist, and after a short period of time,
the available max_connections are used up. Clients trying to connect will
get a "too many connections" error message. At that point, the processlist
will show connections with "Time" values of up to several hundred seconds.
[Probably that would further increase if I would not have to restart
mysqld.]

Maybe there's a problem with the way PHP handles/reuses the connections, but
that is not the point here - mysqld should shut down these connections no
matter how the "client" plans to re-use them.

The problem occurs only under load, currently the server is handling about
40 queries / sec. If everything works well, there are not more than a dozen
(or two) persistent connections at the same time, with a wait_timeout as
above. After restarting the server, it may take from minutes to about one or
two hours for the problem to occur again.

I cannot make out anything special that triggers this. I investigated the
mysql query log but found nothing special. Even when the problem occurs, new
connections and queries are served correctly until the max_connections count
has been reached.

I just moved my databases from a woody system to this machine, running sarge
(upgraded from woody, if that matters). The problem occured right from the
start, not related to some speficic upgrade or the like.

The machine is a dual XEON with hyperthreading enabled.

How to repeat:
i suppose use a php app with persistant connections and put your server under load
[13 Jul 2005 3:17] sean finney
note that i set the severity to serious.  the rationale was that it caused a dos for the server when the non-timed-out connections eventually prevent others from being established.  feel free to adjust.
[18 Jul 2005 16:03] Christopher Stach II
We are having the same problem with MySQL 4.1.11 on a dual Xeon Dell PE2850 RedHat ES 3.0 (Linux xxx 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 12 23:35:44 EST 2005 i686 i686 i386 GNU/Linux).  The more current kernel release has the same problem.  We still can't pin down what the situation is when MySQL will stop killing connections, but it's pretty often.  Neither interactive_timeout nor wait_timeout seem to matter for these connections.  The timers surpass both.
[18 Jul 2005 16:08] Matthias Pigulla
[I reported the initial bug to the Debian BTS.]

We're also running the dual Xeon PE2850 from Dell here, kernel is a custom-built 2.6.11.8. So it does not seem to be 2.6-kernel or Debian-related...
[29 Jul 2005 16:58] Chung Lau
I am having the same problem runing RedHat AS4 IA64, MySQL 4.1.10a-log

show variables like 'wait_timeout':
| wait_timeout  | 180   |

/etc/my.cnf:
[mysqld]
...
wait_timeout=180
...

show processlist:
| 125835 | xxxxx   | 192.168.x.x:4223 | inventory    | Sleep          |  3878 |                    | NULL             |
| 125838 | xxxxx   | 192.168.x.x:4225 | inventory    | Sleep          |   569 |                    | NULL             |
| 125842 | xxxxx   | 192.168.x.x:4270 | inventory    | Sleep          |  2479 |                    | NULL             |
| 125850 | xxxxx   | 192.168.x.x:4301 | inventory    | Sleep          |  3736 |                    | NULL             |
| 125854 | xxxxx   | 192.168.x.x:4372 | inventory    | Sleep          | 76344 |                    | NULL             |
| 125882 | xxxxx   | 192.168.x.x:4858 | inventory    | Sleep          |  3060 |                    | NULL             |
| 125901 | xxxxx   | 192.168.x.x:1230 | inventory    | Sleep          |  3009 |                    | NULL             |
| 125914 | xxxxx   | 192.168.x.x:1405 | inventory    | Sleep          |   788 |                    | NULL             |
| 125914 | xxxxx   | 192.168.x.x:1405 | inventory    | Sleep          |   81  |                    | NULL             |
| 125919 | xxxxx   | 192.168.x.x:1485 | inventory    | Sleep          | 70456 |                    | NULL             |
[29 Jul 2005 16:59] Matthias Pigulla
@Chung: Is that a DELL PE2850 or similar hardware?
[3 Aug 2005 16:50] Aleksey Kishkin
We have another bug with similar description: http://bugs.mysql.com/bug.php?id=11985 . And as condition of that behaviour  author spot NPTL. What thread library you use for mysql?
[3 Aug 2005 17:00] Christopher Stach II
I believe that all of the RedHats mentioned here have NPTL backported into them; however, I'm running the static binaries from MySQL.
[3 Aug 2005 17:05] Chung Lau
I'm running MySQL on Dell PowerEdge 3250 (Dual Itaniums) and mysql was installed via the RPM from RHN... so whatever thread library they used to compile.
[4 Aug 2005 9:34] Matthias Pigulla
I asked Sean Finney, the maintainer of the Debian package. Quoting:

i believe it uses nptl.

just to make sure, i built a local copy from the svn tree, and this is what shows up in config.log:

configure:25433: checking "Linux threads"
configure:25437: result: "starting"
configure:25440: checking "getconf GNU_LIBPTHREAD_VERSION"
configure:25444: result: "NPTL"
configure:25542: result: "no need to check headers"
[10 Aug 2005 12:53] Mathias Ortmann
It really looks like MySQL Server should not be used on (some?) SMP machines...

My platform:

Dual Xeon 3.2 with HyperThreading enabled, RedHat ES3, kernel 2.4.21-27.ELsmp
MySQL 4.1.13-max from the precompiled binaries, ndbcluster

max_connections (set to 2500) would be reached within seconds after starting up mysqld. After shutting down all clients, the mysqld threads would terminate very slowly (at a rate of 1-5 per second).

This problem rendered the MySQL server effectively useless.

My solution: I switched to the uniprocessor kernel. Result: Stable operation with about 400 clients connected simultaneously.

I would really like to be able to use HyperThreading and that expensive second Xeon, though. Is there a fix or workaround other than crippling my server?
[30 Aug 2005 4:13] Darrel Clute
I can confirm that this is occurring with 4.0.23 provided by cPanel on RH9 running on dual xeons with hyperthreading enabled.
[2 Sep 2005 13:43] Andreas Nolte
Hello,

we also ran into this problem. The servers all run on SLES9 on HP DL380G3/G4 servers with dual processors and hyperthreading on. Would switching hyperthreading off help ?

Last working version seems to 4.1.10a - but even this one not on evey maschine. Also tested versions were 4.1.12  and 4.1.14 - same behavior.

But: I believe, the problem is somewhere different.

When I set wait_timeout in the my.cnf, then it shouls ab in @@wait_timeout like it should but seams to not be used.  @wait_timeout is still the default. When I now to a set wait_timeout=10 or s.th. for the session, then the timeout works!

Is it possible, that the global variables are not correctly used ?

cheers

Andreas
[3 Sep 2005 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[4 Sep 2005 8:59] sean finney
i'm confused... why did you just suspend this bug.  you asked for feedback, and recieved feedback not just from the OP in the debian BTS, but a large number of "me too" reports from other people affected by this bug using different versions and distributions.
[5 Sep 2005 9:10] Andreas Nolte
Thread library in my case is pthread:

ldd /usr/sbin/mysqld
        linux-gate.so.1 =>  (0xffffe000)
        librt.so.1 => /lib/tls/librt.so.1 (0x40022000)
        libz.so.1 => /lib/libz.so.1 (0x40029000)
        libdl.so.2 => /lib/libdl.so.2 (0x4003a000)
        libpthread.so.0 => /lib/tls/libpthread.so.0 (0x4003d000)
        libcrypt.so.1 => /lib/libcrypt.so.1 (0x4004d000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x4007f000)
        libm.so.6 => /lib/tls/libm.so.6 (0x40094000)
        libc.so.6 => /lib/tls/libc.so.6 (0x400b6000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

Version:

Name        : glibc                        Relocations: (not relocatable)
Version     : 2.3.3                             Vendor: SuSE Linux AG, Nuernberg, Germany
Release     : 98.47                         Build Date: Sun Jun  5 14:09:14 2005
Install date: Thu Aug  4 12:49:27 2005      Build Host: nelson.suse.de
Group       : System/Libraries              Source RPM: glibc-2.3.3-98.47.src.rpm
Size        : 6771454                          License: GPL, LGPL
Signature   : DSA/SHA1, Sun Jun  5 17:43:27 2005, Key ID a84edae89c800aca
Packager    : http://www.suse.de/feedback
URL         : http://www.gnu.org/software/libc/libc.html
Summary     : The standard shared libraries (from the GNU C Library)
Description :
The GNU C Library provides the most important standard libraries used
by nearly all programs: the stndard C library, the standard math
library and the POSIX thread library. Without these libraries, the
system is not functional.
Distribution: SUSE SLES-9 (i686)

The Package for mysqld is: MySQL-server-standard-4.1.14-0.sles9
[6 Sep 2005 13:40] Andreas Nolte
@sean finney - you must be the case owner, correct ? Can you please tell which thread library you are using and then set the case to open again so s.o. picks it up ? I do not really want to setup a redundant bug .....
[7 Sep 2005 13:51] sean finney
this information has been provided already.  look back to around august 4th, and there's a post from the OP in the debian BTS who forwarded my response that i believe the debian version of mysql server uses NPTL.

and i believe the status was reset to open with my last reply, so afaict we're waiting to hear back from someone.
[23 Sep 2005 0:42] MySQL Verification Team
I tested this issue with current 4.1 source and was unable to repeat the
behavior reported, we would like to solve this, but need a repeatable
test case first.
[23 Sep 2005 7:53] Matthias Pigulla
Did you notice that most folks reported they got this bug on dual Xeon machines with hyperthreading enabled? Can you perform tests on such specific hardware?
[23 Sep 2005 8:53] sean finney
hi miguel,

what hardware configuration did you use to test for this bug?  it seems that everyone who
is having this problem is using at least smp+hyperthreading, maybe xeon+smp+ht.  not sure
whether nptl is playing an issue as well.
[26 Sep 2005 13:30] Andreas Nolte
Hello,

we have this problem with Dual Xeon machines with and without Hyperthreading. Suprisingly, the problem still persists when SMP is completely disabled !

But maybe we do not look at the right problem:

- when I set wait_timeout in the session, everything is fine
- when I set wait_timeout in my.cnf, it does not

Now I saw this:

mysql> show global variables like "wait%";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 300   |
+---------------+-------+
1 row in set (0.00 sec)

mysql> select @@wait_timeout;
+----------------+
| @@wait_timeout |
+----------------+
|          28800 |
+----------------+
1 row in set (0.00 sec)

should these values not match ?

cheers

Andreas
[1 Dec 2005 19:52] Ondrej KudlĂ­k
I have same problem but with non smp kernel and uniprocessor machine.

$ mysqladmin variables | grep wait_timeout
| wait_timeout                    | 120
$ mysqladmin processlist
+--------+----------------+-----------------+----------------+---------+--------+-------+------------------+
| Id     | User           | Host            | db             | Command | Time   | State | Info             |
+--------+----------------+-----------------+----------------+---------+--------+-------+------------------+
| 605    | user1 | localhost:33777 | db1 | Sleep   | 30353  |       |                  |
| 620    | user1 | localhost:33158 | db1 | Sleep   | 24758  |       |                  |
| 3212   | user1 | localhost:46482 | db1 | Sleep   | 32390  |       |                  |
| 3331   | user2           | localhost       | db2           | Sleep   | 63896  |       |                  |

mysql> show global variables like "wait%";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 120   |
+---------------+-------+
1 row in set (0.00 sec)

mysql> select @@wait_timeout;
+----------------+
| @@wait_timeout |
+----------------+
|          28800 |
+----------------+
1 row in set (0.01 sec)

My machine is  AMD Athlon(tm) XP 1800+ with kernel Linux 2.6.11.12-grsec
[1 Dec 2005 21:10] Irinel Bucur
Hi there,

I saw the problem do not continue after I reinstalled the system and I used only RPMs files from right distribution (i mean if I use red hat es 3, I do not used rpms from fedora 3)...
BUT, it is not enough... I recommand you to read how to optimize your machine (I mean to configure correct mysql)
Then, set a resonable wait-timeout to 20-25 sec. for me it is working excelent and I have no processes in the list... also, try to increment the number of connections... be aware with some config. variables as it is per thread.... (I mean open files for example.. it should be incremented to a resonable value as if not, you'll get more processes in the queue which I think causes the issue of non answering)

the next point is to configure correctly the software which you use to connect to mysql... then, I think the last point is to use last version of mysql... I am using 5.0.16 and I have no issues and also, it have some more speed... 

BUT, I saw for example another issue with mysql on another machine... mysql is not able to read from socket file at 1.03AM and it is happening daily... I was not able to find the reason but, if someone know just let me know..

thanks
[13 Jan 2006 10:30] Struan Bartlett
I experienced the same problem running MySQL 4.1.11-Debian_4sarge1-log on a dual-processor Dell PE2850 running kernel 2.6.13.  Reverting the kernel to 2.4.29 resolves the problem.

If someone at MySQL would like to see the problem with their own eyes, it may be possible for me to arrange for me to reboot our server into 2.6.13 and for them to log in.
[28 Jan 2006 6:09] Aleksey Kishkin
bug #15679 marked as duplicate of this bug report
[28 Jan 2006 6:11] Aleksey Kishkin
bug #3978 marked as duplicate of this bugreport
[29 May 2006 18:11] Valeriy Kravchuk
All reporters (Struan Bartlett in particular):

Are you able to repeat the behaviour described here with newer versions of MySQL server? 4.1.19, for example?
[29 Jun 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[30 Jun 2006 9:34] Matthias Pigulla
What about Andreas' remark (Sep 26, 2005) that @@wait_timeout and the system variable mismatch? 

Independent from the hardware (on different machines), I can observe the following with 4.1.15:

mysql> show variables like 'wait%'; 
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 28800 |
+---------------+-------+
1 row in set (0.00 sec)

mysql> show global variables like 'wait%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 15    |
+---------------+-------+
1 row in set (0.00 sec)

With 4.1.20 -- using the same config -- both values match.

Unfortunately, I cannot try 4.1.20 under exactly the same conditions as I reported the initial bug with because the setup is on a production machine and I cannot upgrade. 

However, when connecting to the server (with a PHP script, for example) and simply idling for more than 15 seconds (in my case), under 4.1.15 the process will show up in "mysqladmin processlist" even after more than 15 seconds. When connecting against a 4.1.20 server, it will disappear after that time.

So, does the inconsistency between the SESSION and GLOBAL wait_timeout variable explain what's going on? Has there been an (intentional) fix between 4.1.15 and 4.1.20?

As a workaround for those using older versions, is there a way of reducing the session variable value as well by using a central configuration option?
[30 Jun 2006 10:05] Matthias Pigulla
Oops -- forget parts of my last comment. I assume that when connecting to the server throuh the mysql command line client, it will use an interactive session and the SESSION wait_timeout value will be set to the interactive_timeout value, so the SESSION wait_timeout value is the GLOBAL interactive_timeout and not the GLOBAL wait_timeout.

Yet 4.1.20 seems to fix the problem:

<?php
$c = mysql_connect(':/path/to/socket', 'root', '');
$r = mysql_query('SHOW VARIABLES LIKE "%timeout%"');
while ($row = mysql_fetch_array($r)) print "$row[0] $row[1]\n";
sleep(30);
?>

Against 4.1.15 this shows (among other timeouts)
interactive_timeout 28800
wait_timeout 15
... but the connection is still in a "sleep" state after more than 15 seconds.

Against 4.1.20 the output is the same, but the connection will disappear from the "mysqladmin processlist" after wait_timeout seconds.
[30 Jun 2006 12:37] Valeriy Kravchuk
So, the bug is not repeatable in 4.1.20.
[21 Jul 2006 9:39] Alexandre Lissy
I'm having this issue on some Dell PE2850 (Dual Xeon 3.00Ghz), using FreeBSD 5.3-RELEASE.

It's MySQL 4.1.20, wait_timeout is correctly set, but all connections are not killed after they sleep too many.
(e.g. wait_timeout is 5, and some still sleep 1800 secs.
[21 Jul 2006 11:08] Valeriy Kravchuk
Please, try to repeat with FreeBSD 6.x. We have no FreeBSD 5.x operational around to check, anyway, and it was known for many issues with threads.
[21 Jul 2006 23:02] Fanto Jose
Due to the latest kernel PRCTL Core Dump Vulnerability, I have updated kernel to 2.6.17 and in all servers mysql sleep seems to be high. 

Linux xxx.xxx.xxx 2.6.17.6 #1 SMP Wed Jul 19 03:04:03 EEST 2006 i686 i686 i386 GNU/Linux with 4 x Intel(R) Xeon(TM) CPU 3.20GHz

I also upgraded mysql to 4.1.20 still no luck. Any fix.
[22 Jul 2006 14:04] Alexandre Lissy
We cannot switch to FreeBSD 6, it's a production server.
[22 Jul 2006 14:07] Alexandre Lissy
Right, we're lucky, we're going to install a new MySQL dedicated server with FreeBSD 6.1, to test for the bug.
[22 Jul 2006 14:13] Alexandre Lissy
Ok, sorry for the noise, they updated MySQL to 5.0, and no more connection is more aged than wait_timeout.

Thanks !
[2 Oct 2006 10:13] Christian Hammers
I can reproduce this problem with 4.1.20 on a Quad Dualcore Xeon without Hyperthreading with kernel 2.6.17 and NPTL.

Here I can't even set wait_timeout :)

mysql> SET GLOBAL wait_timeout := 600;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW VARIABLES LIKE "wait_timeout";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 28800 |
+---------------+-------+
1 row in set (0.00 sec)

mysql> SELECT @@wait_timeout;
+----------------+
| @@wait_timeout |
+----------------+
|          28800 |
+----------------+
1 row in set (0.00 sec)
[3 Oct 2006 9:01] Valeriy Kravchuk
Christian,

You are not checking properly, it seems. Look:

mysql> set global wait_timeout := 600;
Query OK, 0 rows affected (0.00 sec)

mysql> show global variables like 'wait%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 600   |
+---------------+-------+
1 row in set (0.01 sec)

mysql> select @@wait_timeout;
+----------------+
| @@wait_timeout |
+----------------+
|          28800 |
+----------------+
1 row in set (0.00 sec)

As explicitely documented in http://dev.mysql.com/doc/refman/5.0/en/set-option.html:

"When you refer to a system variable in an expression as @@var_name (that is, when you do not specify @@global. or @@session.), MySQL returns the session value if it exists and the global value otherwise."
[21 Nov 2006 23:24] James Day
This was seen normally at Wikipedia on 4.0. It's not usually a problem but occasionally one of the hanging threads will be because a client has failed to close the connection and that client may be holding locks. I had our (Wikipedia) monitoring program kill the longest-running of the sleeping threads over the limit and that was always effective in closing the rest. Non-persistent connections from PHP. Dual Opteron DB servers using Fedora Core 2 and 3.

WORKAROUND: create a connection from a client and kill that connection based on a unique user name for the connection's login, both in a cron job.

Ask Domas if its still happening...
[21 Nov 2006 23:37] James Day
See bug #9678 fixed in 4.0.28, 4.1.22, 5.0.25 and 5.1.12 and please report whether the problem still exists with those versions of MySQL or later.
[3 Oct 2007 19:00] Carlos Perez
I don't see it as a bug, i solve this by chanching in my.ini the variables
wait_timeout=600
as well for the interactive session and it works!!
this kill the sleep threats and there are no longer to much connections
[4 Oct 2007 4:22] James Day
Carlos, there were cases where the timeouts didn't work. This has probably been fixed by the fix for bug #9678.