Bug #16995 idle connections not being killed due to timeout when NPTL is used
Submitted: 1 Feb 2006 5:44
Modified: 30 Jan 2007 18:12
Reporter: Matthew Lord
Status: Closed
Category: MySQL Server
Severity: S2 (Serious)
Version: 4.0
OS: Linux (Linux 2.6.x)
Assigned to: Konstantin Osipov
CPU Architecture: Any

[1 Feb 2006 5:44] Matthew Lord
Description:
A signal is not being handled correctly when mysqld runs with NPTL. Idle connections are not
killed by the server when the interactive/wait timeout is reached. When LinuxThreads is used instead (by running export LD_ASSUME_KERNEL=2.4.1 first), the connections are killed properly.

When using NPTL, if you manually kill one of the idle connections, they are all cleaned up.

How to repeat:
./mysqld --no-defaults --basedir=.. --datadir=../data --skip-grant-tables &
./mysql -A
set global wait_timeout=10;
set global interactive_timeout=10;

./mysql -A <ctrl-z>
./mysql -A <ctrl-z>

At this point the server is running with NPTL, and you will see that the idle connections are
not killed when the timeout expires. If you kill one of them, the others are killed as well.
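
One way to confirm from a separate session that connections are outliving the timeout is to filter the output of SHOW PROCESSLIST. The helper below is only a sketch: the function name overdue_sleepers is made up for illustration, and it assumes the tab-separated batch output of the mysql client (-B), in which the fifth column is Command and the sixth is Time in seconds.

```shell
#!/bin/sh
# Hypothetical helper, not part of MySQL.
# Reads tab-separated SHOW PROCESSLIST rows on stdin and prints the Id of
# every connection that has been in the Sleep state for more than $1 seconds.
# The header row is skipped automatically because its fifth field is
# "Command", not "Sleep".
overdue_sleepers() {
    awk -F '\t' -v limit="$1" \
        '$5 == "Sleep" && ($6 + 0) > (limit + 0) { print $1 }'
}

# Example invocation (assumes the client binary from the steps above):
#   ./mysql -B -e 'SHOW PROCESSLIST' | overdue_sleepers 10
```

With wait_timeout and interactive_timeout set to 10, an unaffected server is expected to print nothing here once the suspended clients have been idle for a while, whereas an affected server should list the Ids of the connections that were never killed.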

Now kill mysqld and run export LD_ASSUME_KERNEL=2.4.1 before going
through the exact same steps. With LinuxThreads (LT) the idle connections are killed as they should be.
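
To check which threading implementation is selected in a given environment before starting mysqld, glibc's getconf GNU_LIBPTHREAD_VERSION can be queried. This is a minimal sketch, assuming a glibc-based system: the function names are hypothetical, the version strings in the comments are only examples of the format, and the result describes what dynamically linked programs started from this shell would get.

```shell
#!/bin/sh
# Hypothetical helpers, not part of MySQL.

# Classify a GNU_LIBPTHREAD_VERSION string such as "NPTL 2.3.6" or
# "linuxthreads-0.10" as nptl, linuxthreads, or unknown.
classify_threadlib() {
    case "$1" in
        NPTL*)         echo nptl ;;
        linuxthreads*) echo linuxthreads ;;
        *)             echo unknown ;;
    esac
}

# Report the thread library in effect for the current environment.
# If getconf fails or does not know the variable, the result is "unknown".
current_threadlib() {
    classify_threadlib "$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null)"
}

# Example invocation:
#   current_threadlib
#   ( export LD_ASSUME_KERNEL=2.4.1; current_threadlib )
```

On a system that ships both libraries, the second call would be expected to report linuxthreads. On distributions that no longer ship LinuxThreads it is likely to print unknown, because getconf itself may fail to load under LD_ASSUME_KERNEL=2.4.1.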

Suggested fix:
This is one of at least two cases where mysqld and NPTL don't seem to play well together. With Linux distributions starting to drop LT altogether, we need to get these straightened out. The other
big problem is connections just "hanging" under very high load.
[24 Mar 2006 19:58] Konstantin Osipov
In which version shall we fix this bug?
[29 Mar 2006 13:13] Konstantin Osipov
I wasn't able to verify it on my machine:
kostja@dragonfly:~/work/mysql-4.1-root/sql> ./mysqld --log --interactive_timeout=3;
060329 17:10:54 [Warning] setrlimit could not change the size of core files to 'infinity';  We may not be able to generate a core file on signals
./mysqld: ready for connections.
Version: '4.1.19-valgrind-max-debug-log'  socket: '/opt/local/var/mysql/mysql.sock'  port: 3307  Source distribution

tail -f /opt/local/var/mysql/dragonfly.log:
./mysqld, Version: 4.1.19-valgrind-max-debug-log. started with:
Tcp port: 3307  Unix socket: /opt/local/var/mysql/mysql.sock
Time                 Id Command    Argument
060329 17:10:59	      1 Connect     kostja@localhost on test
		      1 Query       show databases
		      1 Query       show tables
		      1 Query       select 1
060329 17:11:01	      1 Query       select 1
060329 17:11:05	      2 Connect     kostja@localhost on test
		      2 Query       show databases
		      2 Query       show tables
		      2 Query       select 1
060329 17:11:09	      3 Connect     kostja@localhost on test
		      3 Query       show databases
		      3 Query       show tables
		      3 Query       select 1

Console log:

mysql> select 1;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    1
Current database: test

+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

mysql> select 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

mysql> select 1;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    2
Current database: test

+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.01 sec)

mysql> select 1;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    3
Current database: test

+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

mysql> 

It might just as well be a bug in the NPTL itself. I'm using SuSE 10, with one of the latest kernels:
kostja@dragonfly:~> cat /etc/issue

Welcome to SUSE LINUX 10.0 (i586) - Kernel \r (\l).

kostja@dragonfly:~> uname -a
Linux dragonfly 2.6.13-15.8-default #1 Tue Feb 7 11:07:24 UTC 2006 i686 i686 i386 GNU/Linux
[29 Mar 2006 13:22] Konstantin Osipov
I was able to verify the bug
[29 Mar 2006 13:25] Konstantin Osipov
I was able to verify the bug using 4.0 version of the server. This may be a build-related issue, as 4.0 servers are compiled statically. The bug is not present in 4.1 or 5.0.
Shall we investigate and fix this in 4.0?
[29 Mar 2006 16:08] Konstantin Osipov
Matt, as I stated in my report, 4.1 and 5.0 servers are not affected by this bug, whereas 4.0 server very well demonstrates the described behaviour. Please provide a reproducible test case for 4.1.
[29 Mar 2006 21:04] Brian Aker
Upgrade to a later version of the server. We do not currently support NPTL threads with servers of 4.1 or below.
[3 Aug 2006 15:10] Konstantin Osipov
Was able to repeat it with 4.0.28-bk (BUILD/compile-pentium-valgrind-max)
[22 Jan 2007 0:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/18511

ChangeSet@1.2206, 2007-01-22 02:32:07+02:00, jani@a88-113-38-195.elisa-laajakaista.fi +8 -0
  Fix for configure to detect library correctly.
  Fix to check library in use during runtime.
  Fix for Bug#16995, "idle connections not being killed due to timeout when NPTL is used".
[30 Jan 2007 18:10] Jani Tolonen
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html