Bug #12684 Total hang of MySQL slave server on table locks, maybe other conditions as well.
Submitted: 19 Aug 2005 18:45
Modified: 25 Oct 2005 11:11
Reporter: Robin Powell
Status: No Feedback
Category: MySQL Server
Severity: S1 (Critical)
Version: 4.0.25
OS: Linux (Debian AMD64 3.1 Stable)
Assigned to:
CPU Architecture: Any

[19 Aug 2005 18:45] Robin Powell
Description:
We have a single master and a single slave, both running MySQL
4.0.25 (specifically,
mysql-standard-4.0.25-unknown-linux-gnu-x86_64-glibc23 ).  On a
fairly regular basis (every few days, at least), the slave hangs,
and will not respond to new connections, although old ones appear to
continue to function.

A "pkill -9 mysqld" is required to shut the server down at this
point.

I have collected full strace data and logs.  The logs show nothing
at all when the hang happens, except that the connection to the
master is still working, and updates received from it are still
occurring.  strace data is available at
http://teddyb.org/~rlpowell/media/regular/mysql_bug/

The strace logs are before running the script, after running the
script, and after killing the database.

-Robin

How to repeat:
In some cases, we have no idea what causes this problem.  However,
most of the time the following script will cause it:

-----------

#!/bin/sh

echo "Running mysql."

/usr/pkg/mysql/bin/mysql -v --disable-pager --batch -u root -h localhost mysql <<EOF

USE mysql;

LOCK TABLES user WRITE, db WRITE, tables_priv WRITE;

DELETE from db;

GRANT ALL ON visions_furl.* TO
'visions_fadmin'@'sv-furlweb2i.looksmart.com';
GRANT ALL ON cofe.* TO
'visions_fadmin'@'sv-furlweb2i.looksmart.com';

GRANT ALL ON visions_furl.* TO
'visions_fadmin'@'sv-furlweb1i.looksmart.com';
GRANT ALL ON cofe.* TO
'visions_fadmin'@'sv-furlweb1i.looksmart.com';

GRANT ALL ON visions_furl.* TO
'visions_fadmin'@'sv-newfurlweb1i.looksmart.com';
GRANT ALL ON cofe.* TO
'visions_fadmin'@'sv-newfurlweb1i.looksmart.com';

UNLOCK TABLES;

EOF

echo "Done refreshing perms."

-----------

It might require two or three runs of the script, but 9 times out of
10, sooner or later that script will cause a total hang of the
system (the remaining 1 time out of 10, the server seems immune to
the effect; restarting the server "fixes" that "problem").
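Since the hang may take two or three runs to appear, the repetition can be automated. The following is only a minimal sketch, assuming GNU `timeout` is installed; the helper name `run_until_hang` and the file name `refresh_perms.sh` (the script above, saved to disk) are hypothetical and not part of the original report. A run that does not finish within the time limit is treated as a suspected hang.

```shell
#!/bin/sh
# Run a command repeatedly until one run fails to finish in time.
# usage: run_until_hang MAX_RUNS TIMEOUT_SECS COMMAND [ARGS...]
# Returns 1 and reports the run number on a suspected hang,
# returns 0 if every run finished within the limit.
run_until_hang() {
    max_runs=$1
    limit=$2
    shift 2
    i=1
    while [ "$i" -le "$max_runs" ]; do
        timeout "$limit" "$@" >/dev/null 2>&1
        status=$?
        # GNU timeout exits with status 124 when the time limit is hit.
        if [ "$status" -eq 124 ]; then
            echo "hang suspected on run $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no hang in $max_runs runs"
    return 0
}
```

For example, `run_until_hang 10 60 ./refresh_perms.sh` would try up to ten runs with a 60-second limit each; both numbers are arbitrary choices.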

Suggested fix:
Not a clue.  No workaround has been found.  In the case of this
particular script, removing the lock/unlock and the delete stops the
script from triggering the hang, but hangs still occur at other
times, and we don't know what causes them when this script is not
involved.
[19 Aug 2005 18:51] Robin Powell
Just to clarify: this script repeatably causes the bug, *however*,
even if this script never runs, the slaves (and, occasionally, the
master) still hang on their own from time to time.  It seems to be
exactly the same type of hang as that caused by this script, so I'm
hoping fixing one will fix the other.

-Robin
[20 Aug 2005 11:59] MySQL Verification Team
Unfortunately, we have witnessed what you describe, but only with Debian-made builds.

Please try our own build and see if you can repeat the problem.
[20 Aug 2005 12:00] MySQL Verification Team
One additional comment.

Please use our static binary, if available for AMD64.
[20 Aug 2005 13:27] Robin Powell
I *am* using your build.  4.0.25 Standard for AMD64.  Specifically, I'm using:

mysql-standard-4.0.25-unknown-linux-gnu-x86_64-glibc23

-Robin
[20 Aug 2005 13:31] MySQL Verification Team
Thanks.

The next suspect on the list is the TLS set of libraries.

Do you have a /lib64/tls/ directory?

If you do, stop the MySQL server, temporarily rename /lib64/tls/ to /lib64/tls_unused/, and restart the server.

See if it helps.
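The check for a TLS library directory could be sketched as below. This is an illustration only: the helper name `find_tls_dirs` is hypothetical, and it assumes a `find` that supports `-L` and `-maxdepth` (GNU findutils does).

```shell
#!/bin/sh
# List every directory named "tls" under the given library roots.
# -L makes find follow a root that is itself a symlink.
find_tls_dirs() {
    for root in "$@"; do
        if [ -d "$root" ]; then
            find -L "$root" -maxdepth 2 -type d -name tls 2>/dev/null
        fi
    done
    return 0
}
```

Calling `find_tls_dirs /lib64 /lib` prints nothing when no such directory exists. If one is listed, the rename described above (with the server stopped) would be `mv /lib64/tls /lib64/tls_unused`.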
[20 Aug 2005 13:41] Robin Powell
I do not:

sv-furldb2i:/usr/pkg# cd /lib64/
sv-furldb2i:/lib64# find . -name '*tls*'
sv-furldb2i:/lib64#

In fact:

sv-furldb2i:/# ls -l /lib64
lrwxrwxrwx  1 root root 3 Aug  8 22:00 /lib64 -> lib

Which seems a teensy bit odd to me, but then the system seems to be running fine otherwise.

I *do*, however, have:

/emul/ia32-linux/lib/tls

Moving that as you describe didn't work; I thought it had for a minute, but then I restarted the server and made it hang on the first try.

-Robin
[20 Aug 2005 13:46] MySQL Verification Team
Check whether the binary is dynamically linked.

If it is, run this before starting mysqld directly from the shell:

export LD_ASSUME_KERNEL=2.4.1
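One way to check the linkage is sketched below, assuming glibc's `ldd`, which exits non-zero for a static binary or a non-ELF file. The helper name `linkage_of` is hypothetical. Because `ldd` may invoke the dynamic loader on the file, it should only be used on binaries you trust.

```shell
#!/bin/sh
# Report whether an executable is dynamically linked.
# Relies on ldd's exit status: zero means it could list shared
# library dependencies, non-zero means static or not an ELF file.
linkage_of() {
    if ldd "$1" >/dev/null 2>&1; then
        echo "dynamic"
    else
        echo "static or not an ELF executable"
    fi
}
```

Run it before exporting `LD_ASSUME_KERNEL`, for example as `linkage_of ./bin/mysqld`.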
[20 Aug 2005 19:25] Robin Powell
sv-furldb2i:/usr/pkg/mysql#  ./bin/mysqld_safe --user=lksm --log-warnings --log-isam --log-long-format
Starting mysqld daemon with databases from /prod/furl/data/mysql
date: error while loading shared libraries: librt.so.1: cannot open shared object file: No such file or directory
rm: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory
STOPPING server from pid file /prod/furl/data/mysql/sv-furldb2i.pid
date: error while loading shared libraries: librt.so.1: cannot open shared object file: No such file or directory
tee: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory
tee: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory

Setting the variable only on the lines in which mysqld is called gives this in the log file:

nohup: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory
050820 12:24:14  mysqld ended
[21 Aug 2005 14:41] Robin Powell
Just to be extra clear:

sv-furldb2i:/usr/pkg/mysql# export  LD_ASSUME_KERNEL=2.4.1
sv-furldb2i:/usr/pkg/mysql# ./bin/mysqld
./bin/mysqld: error while loading shared libraries: librt.so.1: cannot open shared object file: No such file or directory

In fact, even /bin/sh can't run with that variable set, and ldconfig seg faults.

-Robin
[22 Aug 2005 12:07] MySQL Verification Team
LD_ASSUME_KERNEL should not change the dependencies that much.

Run ldd before this export and after it, and see which libraries are missing.

Also check whether we have a static binary for AMD64.
[22 Aug 2005 12:23] Robin Powell
I can't run ldd with that variable set, because it's a shell script and my shell crashes!

sv-furldb2i:/usr/pkg/mysql# ldd ./bin/mysqld
        librt.so.1 => /lib/librt.so.1 (0x0000002a9566c000)
        libdl.so.2 => /lib/libdl.so.2 (0x0000002a95774000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x0000002a95877000)
        libz.so.1 => /usr/lib/libz.so.1 (0x0000002a9598b000)
        libcrypt.so.1 => /lib/libcrypt.so.1 (0x0000002a95aa0000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x0000002a95bd3000)
        libm.so.6 => /lib/libm.so.6 (0x0000002a95ce9000)
        libc.so.6 => /lib/libc.so.6 (0x0000002a95e70000)
        /lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x0000002a95556000)
sv-furldb2i:/usr/pkg/mysql# export  LD_ASSUME_KERNEL=2.4.1
sv-furldb2i:/usr/pkg/mysql# ldd ./bin/mysqld
/bin/bash: error while loading shared libraries: libdl.so.2: cannot open shared object file: No such file or directory

You do not have a statically linked version.
[22 Aug 2005 12:28] Robin Powell
Used a strategic echo to find out what ldd was actually running:

sv-furldb2i:/usr/pkg/mysql# LD_TRACE_LOADED_OBJECTS=1 LD_WARN= LD_BIND_NOW= LD_LIBRARY_VERSION= LD_VERBOSE= ./bin/mysqld
        librt.so.1 => /lib/librt.so.1 (0x0000002a9566c000)
        libdl.so.2 => /lib/libdl.so.2 (0x0000002a95774000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x0000002a95877000)
        libz.so.1 => /usr/lib/libz.so.1 (0x0000002a9598b000)
        libcrypt.so.1 => /lib/libcrypt.so.1 (0x0000002a95aa0000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x0000002a95bd3000)
        libm.so.6 => /lib/libm.so.6 (0x0000002a95ce9000)
        libc.so.6 => /lib/libc.so.6 (0x0000002a95e70000)
        /lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x0000002a95556000)
sv-furldb2i:/usr/pkg/mysql# export  LD_ASSUME_KERNEL=2.4.1
sv-furldb2i:/usr/pkg/mysql# LD_TRACE_LOADED_OBJECTS=1 LD_WARN= LD_BIND_NOW= LD_LIBRARY_VERSION= LD_VERBOSE= ./bin/mysqld
        librt.so.1 => not found
        libdl.so.2 => not found
        libpthread.so.0 => not found
        libz.so.1 => /usr/lib/libz.so.1 (0x0000002a9566c000)
        libcrypt.so.1 => not found
        libnsl.so.1 => not found
        libm.so.6 => not found
        libc.so.6 => not found
        libc.so.6 => not found

Please note that there has never been a 2.4.x version of AMD64, so that probably explains what's going on here.
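Because exporting the variable breaks the interactive shell and every later command, it can instead be set for a single command only. The following is a minimal sketch; the wrapper name `with_assumed_kernel` is hypothetical. `env` is started by the unaffected shell and only the target command sees the variable.

```shell
#!/bin/sh
# Run one command with LD_ASSUME_KERNEL set, without exporting the
# variable into the calling shell.
# usage: with_assumed_kernel VERSION COMMAND [ARGS...]
with_assumed_kernel() {
    version=$1
    shift
    env LD_ASSUME_KERNEL="$version" "$@"
}
```

The trace above could then be repeated as `with_assumed_kernel 2.4.1 env LD_TRACE_LOADED_OBJECTS=1 ./bin/mysqld` while the shell itself stays usable.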
[22 Aug 2005 22:08] Robin Powell
So, since you keep asking about shared libraries issues, I'm guessing that you have a particular cause in mind.  Can you point me to more information about what's going on?  Is there any chance this bug is fixed in 4.1 ?

-Robin
[24 Aug 2005 21:24] Robin Powell
This seems to go away if I compile MySQL myself, but we'd really rather not run with a self-built binary in production.  Is there anything else we can do to try to figure out what's causing this?

-Robin
[25 Aug 2005 18:13] Robin Powell
Just FYI, the Debian AMD64 port's version of mysql 4.0.24 (mysql-server package) seems to not have this bug.

-Robin
[25 Sep 2005 11:11] Valeriy Kravchuk
So, it worked in 4.0.24 from Debian, but does not work in 4.0.25 from MySQL... Have you tried the newer 4.0.26 version of our binaries (http://dev.mysql.com/get/Downloads/MySQL-4.0/mysql-standard-4.0.26-unknown-linux-gnu-x86_6...)?

I noted the following in http://dev.mysql.com/doc/mysql/en/news-4-0-26.html:

"When two threads compete for the same table, a deadlock could occur if one thread has also a lock on another table through LOCK TABLES and the thread is attempting to remove the table in some manner and the other thread want locks on both tables. (Bug #10600)". 

It may not be related at all; I just wanted to remind everybody about this issue.
[25 Oct 2005 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".