Bug #16995 idle connections not being killed due to timeout when NPTL is used
Submitted: 1 Feb 2006 6:44 Modified: 30 Jan 2007 19:12
Reporter: Matthew Lord
Status: Closed
Category: Server Severity: S2 (Serious)
Version: 4.0 OS: Linux (Linux 2.6.x)
Assigned to: Konstantin Osipov Target Version:

[1 Feb 2006 6:44] Matthew Lord
Description:
A signal is not being handled correctly when using mysqld with NPTL. Idle connections
are not killed by the server when the interactive/wait timeout is reached. After running
export LD_ASSUME_KERNEL=2.4.1 so that LinuxThreads is used, the connections are properly
killed.

When using NPTL, if you manually kill one of the idle connections, they are all
cleaned up.

How to repeat:
./mysqld --no-defaults --basedir=.. --datadir=../data --skip-grant-tables &
./mysql -A
set global wait_timeout=10;
set global interactive_timeout=10;

./mysql -A <ctrl-z>
./mysql -A <ctrl-z>

Now NPTL is being used and you will see that the idle connections are not being
killed.  If you kill one of them then the others are also killed.
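
One way to confirm that idle connections have outlived the timeout is to filter the process list for sleeping threads. The following is only a sketch: the function name stale_ids is made up for illustration, and it assumes the tab-separated batch output of SHOW PROCESSLIST with the columns Id, User, Host, db, Command, Time, State, Info.

```shell
# stale_ids TIMEOUT
# Reads tab-separated SHOW PROCESSLIST output (with header row) on stdin and
# prints the Id of every connection that is in the "Sleep" state and has been
# idle for more than TIMEOUT seconds.
stale_ids() {
  awk -F'\t' -v t="$1" '
    NR > 1 && $5 == "Sleep" && ($6 + 0) > (t + 0) { print $1 }
  '
}
```

For instance, with wait_timeout set to 10 as above, ./mysql -B -e 'SHOW PROCESSLIST' | stale_ids 10 would list the connections that should already have been killed. An empty result means no idle connection has exceeded the timeout.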

Now kill mysqld and run export LD_ASSUME_KERNEL=2.4.1 prior to running
through the exact same steps. With LinuxThreads (LT), the idle connections are killed as
they should be.
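
To confirm which thread library a given shell environment selects, glibc's getconf can report it. This is a minimal sketch, assuming a glibc system where getconf accepts GNU_LIBPTHREAD_VERSION. The helper name classify_threadlib is hypothetical, and the strings matched in the case patterns reflect the usual "NPTL <version>" and "linuxthreads-<version>" forms rather than output captured from this report.

```shell
# Map a GNU_LIBPTHREAD_VERSION string to a short label.
classify_threadlib() {
  case "$1" in
    NPTL*)         echo nptl ;;
    linuxthreads*) echo linuxthreads ;;
    *)             echo unknown ;;
  esac
}

# Ask glibc which implementation this environment uses.
# The result is empty if getconf is missing or does not know the variable.
ver=$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null || true)
echo "thread library: $(classify_threadlib "$ver") (${ver:-not reported})"
```

Running it once in a plain shell and once after export LD_ASSUME_KERNEL=2.4.1 should show whether the override took effect, on systems that still ship LinuxThreads.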

Suggested fix:
This is one of at least two cases where mysqld and NPTL don't seem to play well together.
With Linux distributions starting to drop LT altogether, we need to get these straightened
out. The other big problem is connections just "hanging" under very high load.
[24 Mar 2006 20:58] Konstantin Osipov
In which version shall we fix this bug?
[29 Mar 2006 15:13] Konstantin Osipov
I wasn't able to verify it on my machine:
kostja@dragonfly:~/work/mysql-4.1-root/sql> ./mysqld --log --interactive_timeout=3;
060329 17:10:54 [Warning] setrlimit could not change the size of core files to
'infinity';  We may not be able to generate a core file on signals
./mysqld: ready for connections.
Version: '4.1.19-valgrind-max-debug-log'  socket: '/opt/local/var/mysql/mysql.sock' 
port: 3307  Source distribution

tail -f /opt/local/var/mysql/dragonfly.log:
./mysqld, Version: 4.1.19-valgrind-max-debug-log. started with:
Tcp port: 3307  Unix socket: /opt/local/var/mysql/mysql.sock
Time                 Id Command    Argument
060329 17:10:59	      1 Connect     kostja@localhost on test
		      1 Query       show databases
		      1 Query       show tables
		      1 Query       select 1
060329 17:11:01	      1 Query       select 1
060329 17:11:05	      2 Connect     kostja@localhost on test
		      2 Query       show databases
		      2 Query       show tables
		      2 Query       select 1
060329 17:11:09	      3 Connect     kostja@localhost on test
		      3 Query       show databases
		      3 Query       show tables
		      3 Query       select 1

Console log:

mysql> select 1;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    1
Current database: test

+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

mysql> select 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

mysql> select 1;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    2
Current database: test

+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.01 sec)

mysql> select 1;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    3
Current database: test

+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

mysql> 

It might just as well be a bug in NPTL itself. I'm using SuSE 10 with one of the
latest kernels:
kostja@dragonfly:~> cat /etc/issue

Welcome to SUSE LINUX 10.0 (i586) - Kernel \r (\l).

kostja@dragonfly:~> uname -a
Linux dragonfly 2.6.13-15.8-default #1 Tue Feb 7 11:07:24 UTC 2006 i686 i686 i386
GNU/Linux
[29 Mar 2006 15:22] Konstantin Osipov
I was able to verify the bug
[29 Mar 2006 15:25] Konstantin Osipov
I was able to verify the bug using the 4.0 version of the server. This may be a
build-related issue, as 4.0 servers are compiled statically. The bug is not present in 4.1
or 5.0. Shall we investigate and fix this in 4.0?
[29 Mar 2006 18:08] Konstantin Osipov
Matt, as I stated in my report, 4.1 and 5.0 servers are not affected by this bug, whereas
the 4.0 server clearly demonstrates the described behaviour. Please provide a reproducible
test case for 4.1.
[29 Mar 2006 23:04] Brian Aker
Upgrade to a later version of the server. We do not currently support NPTL threads with
servers of 4.1 or below.
[3 Aug 2006 17:10] Konstantin Osipov
Was able to repeat it with 4.0.28-bk (BUILD/compile-pentium-valgrind-max)
[22 Jan 2007 1:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/18511

ChangeSet@1.2206, 2007-01-22 02:32:07+02:00, jani@a88-113-38-195.elisa-laajakaista.fi +8 -0
  Fix for configure to detect library correctly.
  Fix to check library in use during runtime.
  Fix for Bug#16995, "idle connections not being killed due to timeout when NPTL is used".
[30 Jan 2007 19:10] Jani Tolonen
Thank you for your bug report. A fix for this issue has been committed to our source
repository for that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available
version, including the bug fix. More information about accessing the source trees is
available at

    http://dev.mysql.com/doc/en/installing-source.html