Bug #19524 MySQL deadlocks
Submitted: 4 May 2006 1:30
Modified: 13 Jul 2006 15:25
Reporter: Alan Kasindorf
Status: No Feedback
Category: MySQL Server
Severity: S2 (Serious)
Version: 5.0.21-max
OS: Linux (Debian x86_64)
Assigned to:
CPU Architecture: Any

[4 May 2006 1:30] Alan Kasindorf
Description:
After an upgrade from 4.0/4.1 to 5.0 (tables were dumped, not copied), when we send INSERTs to MyISAM tables, the thread handling the INSERT sometimes deadlocks forever instead of completing.

I have not been able to verify whether this condition happens on SELECTs or UPDATEs as well...

So, it appears that this mostly happens when an INSERT hits a table that has not been opened yet: the thread hangs. The MyISAM lock stays held and any other queries stack up behind it. The thread is in state "Updating" the whole time, which has been thousands of seconds.

Upon trying to issue a kill to that thread, it will change state to "Killed" but will not die. Trying to shut down MySQL gives a warning like:
060503 18:02:08 [Warning] /usr/local/mysql/bin/mysqld: Forcing close of thread 61488  user: 'anihq'
 - but MySQL never stops and must be killed with a -9.

We're running MySQL in 64-bit mode on a 64-bit OS on a dual-core Opteron setup.

How to repeat:
Any of our MyISAM tables across multiple test databases are having the same issue. So a table that is at least:

 CREATE TABLE `configlog` (
  `reason` varchar(255) NOT NULL default '',
  `timestamp` int(11) NOT NULL default '0',
  `user_id` mediumint(8) NOT NULL default '0',
  KEY `timestamp` (`timestamp`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1

Then any INSERT into this table, mostly if this is the first access to the table since the server has started, has a chance of deadlocking. This is also very common if the mysql_recover option is set and the table was not closed properly.
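
A minimal sketch of how the steps above could be scripted, assuming the `configlog` table already exists and the server has just been restarted. The function name `gen_inserts` and the row contents are illustrative assumptions, not part of the original report, and nothing here guarantees the hang will reproduce.

```shell
#!/bin/sh
# Hypothetical helper: print COUNT INSERT statements for the configlog
# table defined in this report, one statement per line.
gen_inserts() {
    count=$1
    now=$(date +%s)   # current Unix time for the int(11) `timestamp` column
    i=1
    while [ "$i" -le "$count" ]; do
        printf "INSERT INTO \`configlog\` (\`reason\`, \`timestamp\`, \`user_id\`) VALUES ('repro row %d', %d, %d);\n" \
            "$i" "$now" "$i"
        i=$((i + 1))
    done
}
```

The output could be piped to the client right after a server restart, for example `gen_inserts 100 | mysql testdb`, where `testdb` is a placeholder database name.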
[5 May 2006 21:31] Mark Leith
Hi,

Unfortunately we do not currently have a Debian x86_64 machine to test against, and we also need a complete test case to run against this. Is it always just the *first* thread that hangs, or can *any* thread hang at random?

We have seen certain cases where we seem to have a problem when using the NPTL threading library (rogue hung threads), where switching to LinuxThreads has fixed the issue. 

You can check which threading library is in use with:

getconf GNU_LIBPTHREAD_VERSION

You can try to force it to LinuxThreads with:

export LD_ASSUME_KERNEL=2.4.0

You will need to restart MySQL for this to take effect (and should add it to your start up script, near the start).
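
A minimal sketch of a check that could sit near the top of a start-up script, combining the two commands above. The function names are illustrative assumptions. Because a later comment in this report shows `getconf` failing to load `libc.so.6` once `LD_ASSUME_KERNEL=2.4.0` is exported, the sketch first tests whether a dynamically linked program still starts under that setting, and it does not export the variable itself.

```shell
#!/bin/sh
# Report the threading library glibc says it is using.
# Prints "unknown" when getconf is missing or lacks the variable.
threading_library() {
    getconf GNU_LIBPTHREAD_VERSION 2>/dev/null || echo "unknown"
}

# Return 0 only if an external program still starts with
# LD_ASSUME_KERNEL set to the given version. `env` runs the external
# `true` binary, not the shell builtin, so the dynamic loader is exercised.
can_assume_kernel() {
    env LD_ASSUME_KERNEL="$1" true >/dev/null 2>&1
}

echo "threading library: $(threading_library)"
if can_assume_kernel 2.4.0; then
    echo "LD_ASSUME_KERNEL=2.4.0 is usable here"
else
    echo "LD_ASSUME_KERNEL=2.4.0 breaks program loading here; leave it unset"
fi
```

Only if the check succeeds would the script go on to `export LD_ASSUME_KERNEL=2.4.0` before launching the server.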

Also check the notes here:

http://hashmysql.org/index.php?title=Opteron_HOWTO

And there was one possible glibc bug:

https://launchpad.net/distros/ubuntu/+source/glibc/+bug/18012

So also please let us know your glibc version. If you could also come up with a more definitive test case that would be great.

Look forward to hearing from you.

Best regards

Mark
[5 May 2006 21:54] Alan Kasindorf
It is using NPTL 0.40, which is an incredibly old version of that.

Next week I will upgrade glibc and try to reproduce the bug with both the old and the new glibc, then update the bug report with further information.

Thanks!
[12 May 2006 11:33] Valeriy Kravchuk
Please reopen this bug report when you have the results of your tests.
[22 May 2006 17:57] Alan Kasindorf
glibc was the latest in Debian sarge, which ran NPTL 0.4.0 (libc6 2.3.2.ds1-22).

I upgraded glibc to the latest from Debian etch, version 2.3.6-3.

After doing this the MyISAM deadlock went away. I have not been able to reproduce a deadlock using MyISAM tables. However, I can still deadlock a *busy* server by running:

FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
UNLOCK TABLES;

- all within a couple of seconds. It looks like *any* queries which entered "Waiting for readlock" mode during that short time will *never* exit that mode. The server exhibits the same issues as before and has to be force restarted. New queries that come in after the unlocking appear to work, though. All queries which have been deadlocked are running against InnoDB tables. No errors appear in the error log.
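
A minimal sketch of how the stuck threads described above could be picked out of the process list. It assumes tab-separated rows in the `SHOW PROCESSLIST` column order (Id, User, Host, db, Command, Time, State, Info); the function name `stuck_threads` is an illustrative assumption. The exact State text may differ from the wording used in this comment, so the pattern is matched as a substring.

```shell
#!/bin/sh
# Print the Id of every thread whose State matches PATTERN and whose
# Time is at least MIN_SECONDS. Reads tab-separated processlist rows
# on stdin: Id, User, Host, db, Command, Time, State, Info.
stuck_threads() {
    pattern=$1
    min_seconds=$2
    awk -F '\t' -v pat="$pattern" -v min="$min_seconds" \
        '$7 ~ pat && ($6 + 0) >= min { print $1 }'
}
```

One possible use is `mysql -B -N -e 'SHOW FULL PROCESSLIST' | stuck_threads readlock 60`, which lists threads that have been in a readlock-related state for at least a minute.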

When I do the same on a slave system which is just running replication, it will not lock up.

Running: export LD_ASSUME_KERNEL=2.4.0
Then: getconf GNU_LIBPTHREAD_VERSION

displays an error:
getconf: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory.
Think I'm being an idiot again... If I can reproduce the crash on a slave, I'll run gdb on it and provide a trace. Keep in mind that, for all intents and purposes, the parts of the system that MySQL should be touching are as new as, or newer than, Debian testing (custom vanilla kernel 2.6.16.6 and the latest Etch glibc). I'm unsure what other old crappy part of my OS could be causing thread hangups like this.
[24 May 2006 14:34] Mark Leith
Hi Alan, 

OK, it looks like you may have hit another bug that we have seen with the NPTL threading library, which I believe had a fix committed today. Namely Bug #20048, found and fixed by Monty very recently:

http://bugs.mysql.com/bug.php?id=20048

Regards 

Mark
[13 Jun 2006 15:25] Valeriy Kravchuk
As bug #20048 is fixed in 5.0.23, please either try to build from current sources or wait for 5.0.23 to be officially released, and reopen this report if the deadlocks described still occur.
[13 Jul 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".