MySQL Bugs: #19524: MySQL deadlocks

Bug #19524	MySQL deadlocks
Submitted:	4 May 2006 1:30	Modified:	13 Jul 2006 15:25
Reporter:	Alan Kasindorf
Status:	No Feedback
Category:	MySQL Server	Severity:	S2 (Serious)
Version:	5.0.21-max	OS:	Linux (Debian x86_64)
Assigned to:		CPU Architecture:	Any

Description:
After an upgrade from 4.0/4.1 to 5.0 (tables were dumped, not copied), INSERTs sent to MyISAM tables sometimes never complete: the thread handling the INSERT deadlocks forever.

I have not been able to verify whether this condition happens on SELECTs or UPDATEs as well...

It appears to happen mostly when an INSERT hits a table that has not been opened yet: the thread hangs, the MyISAM lock stays held, and any other queries stack up behind it. The thread stays in state "Updating" the whole time, which has been thousands of seconds.

Upon trying to issue a kill to that thread, it will change state to "Killed" but will not die. Trying to shut down MySQL gives a warning like:
060503 18:02:08 [Warning] /usr/local/mysql/bin/mysqld: Forcing close of thread 61488  user: 'anihq'
 - but MySQL never stops and must be killed with a -9.
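A minimal sketch of how the hung threads described above could be picked out of the process list. It assumes the tab-separated output of `mysql -B -e 'SHOW FULL PROCESSLIST'` with the columns Id, User, Host, db, Command, Time, State, Info; the function name `stuck_threads` and the 600-second default are hypothetical, not something from the report.

```shell
# Print Id, Command, Time and State for threads that have been "Updating"
# or "Killed" for at least a given number of seconds.
# Reads tab-separated SHOW PROCESSLIST output on stdin.
stuck_threads() {
    min_seconds="${1:-600}"   # hypothetical default threshold
    awk -F '\t' -v min="$min_seconds" '
        BEGIN { OFS = "\t" }
        $1 == "Id" { next }                       # skip the header row
        ($7 == "Updating" || $5 == "Killed" || $7 == "Killed") && ($6 + 0) >= min {
            print $1, $5, $6, $7
        }'
}

# Possible usage (credentials assumed to come from ~/.my.cnf):
#   mysql -B -e 'SHOW FULL PROCESSLIST' | stuck_threads 1000
```

"Killed" is checked in both the Command and State columns because the report calls it a state, while the process list may show it under Command.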

We're running MySQL in 64-bit mode on a 64-bit OS on a dual-core Opteron setup.

How to repeat:
All of our MyISAM tables across multiple test databases show the same issue, so a table as simple as this one is enough:

 CREATE TABLE `configlog` (
  `reason` varchar(255) NOT NULL default '',
  `timestamp` int(11) NOT NULL default '0',
  `user_id` mediumint(8) NOT NULL default '0',
  KEY `timestamp` (`timestamp`)
 ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Then any INSERT into this table has a chance of deadlocking, mostly if it is the first access to the table since the server started. This is also very common if the myisam-recover option is set and the table was not closed properly.

Hi,

Unfortunately we do not currently have a Debian x86_64 machine to test against, and we also need a complete test case to run. Is it always just the *first* thread that hangs, or can it randomly be *any* thread?

We have seen certain cases where we seem to have a problem when using the NPTL threading library (rogue hung threads), where switching to LinuxThreads has fixed the issue. 

You can check which threading library is in use with:

getconf GNU_LIBPTHREAD_VERSION

You can try to force it to LinuxThreads with:

export LD_ASSUME_KERNEL=2.4.0

You will need to restart MySQL for this to take effect (and should add it to your startup script, near the start).
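As a rough sketch of the check and the startup-script change described above: the helper below classifies the string printed by `getconf GNU_LIBPTHREAD_VERSION`. The function name `threadlib_kind` is hypothetical, and the `mysqld_safe` path in the comment is only assumed from the `/usr/local/mysql/bin/mysqld` path in the log line quoted in the report.

```shell
# Classify the string printed by `getconf GNU_LIBPTHREAD_VERSION`,
# e.g. "NPTL 2.3.6" or "linuxthreads-0.10".
threadlib_kind() {
    case "$1" in
        NPTL*)         echo nptl ;;
        linuxthreads*) echo linuxthreads ;;
        *)             echo unknown ;;
    esac
}

# Report the library in use; an empty result means getconf did not report one.
current=$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null || true)
echo "threading library: ${current:-not reported} ($(threadlib_kind "$current"))"

# In a startup script the override would go before the server is launched:
#   LD_ASSUME_KERNEL=2.4.0
#   export LD_ASSUME_KERNEL
#   /usr/local/mysql/bin/mysqld_safe &    # assumed path, adjust to the install
```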

Also check the notes here:

http://hashmysql.org/index.php?title=Opteron_HOWTO

And there was one possible glibc bug:

https://launchpad.net/distros/ubuntu/+source/glibc/+bug/18012

So also please let us know your glibc version. If you could also come up with a more definitive test case that would be great.

Look forward to hearing from you.

Best regards

Mark

It is using NPTL 0.40, which is an incredibly old version.

Next week I will try an upgraded version of glibc and try to reproduce the bug in cases of old glibc, and new glibc, then I will update the bug report with further information.

Thanks!

Please reopen this bug report when you have the results of your tests.

glibc was the latest in Debian sarge, which ran NPTL 0.4.0 (libc6 2.3.2.ds1-22).

I upgraded glibc to the latest from Debian etch, version 2.3.6-3.

After doing this the MyISAM deadlock went away: I have not been able to reproduce a deadlock using MyISAM tables. However, I can still deadlock a *busy* server by running:

FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
UNLOCK TABLES;

- all within a couple of seconds. It looks like *any* queries which entered "Waiting for readlock" mode during that short time will *never* exit that mode. The server exhibits the same issues as before and has to be force restarted. New queries that come in after the unlocking appear to work, though. All queries which have been deadlocked are running against InnoDB tables. No errors appear in the error log.

When I do the same on a slave system which is just running replication, it will not lock up.
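The lock sequence above could be turned into a repeatable test along these lines. This is only a sketch: the function names are hypothetical, credentials are assumed to come from `~/.my.cnf`, and the state pattern is deliberately loose because the exact state text shown by the server is not confirmed in this report. All three statements are sent to one client session, since the global read lock belongs to the session that took it.

```shell
# Emit the FLUSH / SHOW MASTER STATUS / UNLOCK sequence N times.
lock_cycle_sql() {
    cycles="${1:-1}"
    i=0
    while [ "$i" -lt "$cycles" ]; do
        printf '%s\n' 'FLUSH TABLES WITH READ LOCK;' 'SHOW MASTER STATUS;' 'UNLOCK TABLES;'
        i=$((i + 1))
    done
}

# Count threads whose State column still reports a readlock wait.
# Reads tab-separated SHOW PROCESSLIST output on stdin.
count_waiting_for_readlock() {
    awk -F '\t' '$7 ~ /^Waiting for .*readlock/ { n++ } END { print n + 0 }'
}

# Possible usage on a busy *test* server:
#   lock_cycle_sql 1 | mysql
#   sleep 5
#   mysql -B -e 'SHOW FULL PROCESSLIST' | count_waiting_for_readlock
```

A count that stays above zero well after the `UNLOCK TABLES` would match the hang described in the report.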

Running: export LD_ASSUME_KERNEL=2.4.0
Then: getconf GNU_LIBPTHREAD_VERSION

displays an error:
getconf: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory.
Think I'm being an idiot again... If I can reproduce the crash on a slave I'll run gdb on it and provide a trace. Keep in mind that, for all intents and purposes, everything on the system that MySQL should be touching is as new as, or newer than, Debian testing (custom vanilla kernel 2.6.16.6 and the latest Etch glibc). I'm unsure what other old crappy part of my OS could be causing thread hangups like this.
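One possible reading of the getconf error above, not confirmed anywhere in this thread: after `export`, every later command in that shell is loaded under the assumed kernel version, and the loader finds no libc build that matches it. The sketch below scopes the variable to a single command so the calling shell is left alone; the function name `probe_with_assumed_kernel` is hypothetical.

```shell
# Run one command with LD_ASSUME_KERNEL set for that command only.
# Prints the command's output, or "unavailable" if it could not be run.
probe_with_assumed_kernel() {
    assumed="$1"
    shift
    # env sets the variable for the child process only; the calling shell
    # keeps its own environment unchanged.
    env LD_ASSUME_KERNEL="$assumed" "$@" 2>/dev/null || echo "unavailable"
}

# Possible usage:
#   probe_with_assumed_kernel 2.4.0 getconf GNU_LIBPTHREAD_VERSION
```

If this prints "unavailable" while a plain `getconf GNU_LIBPTHREAD_VERSION` works, the override probably cannot be used on that system.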

Hi Alan, 

OK, it looks like you may have hit another bug that we have seen with the NPTL threading library, for which I believe a fix was committed today. Namely Bug #20048, found and fixed by Monty very recently:

http://bugs.mysql.com/bug.php?id=20048

Regards 

Mark

As bug #20048 is fixed in 5.0.23, please either try to build from current sources or wait for 5.0.23 to be officially released, and reopen this report if the deadlocks described still occur.
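A small sketch for checking whether a server reports a version at or above 5.0.23. The function name `has_bug20048_fix` is hypothetical. It treats any later version as containing the fix, which is an assumption for release series other than 5.0.

```shell
# Succeed if a MySQL version string is at least 5.0.23.
# Suffixes such as "-max" or "-log" are ignored.
has_bug20048_fix() {
    printf '%s\n' "$1" | awk -F '[.-]' '{
        v = $1 * 1000000 + $2 * 1000 + $3   # major, minor, patch as one number
        exit !(v >= 5000023)
    }'
}

# Possible usage (credentials assumed to come from ~/.my.cnf):
#   if has_bug20048_fix "$(mysql -N -B -e 'SELECT VERSION()')"; then
#       echo "server version is 5.0.23 or later"
#   fi
```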

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".