Bug #25232 MySQL hangs in FUTEX_WAIT
Submitted: 21 Dec 2006 13:24 Modified: 16 Jan 2007 8:44
Reporter: Carsten Schabacker Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.0.30 OS:Linux (Linux AMD64 - SLES9-SP3 2.6.17)
Assigned to: CPU Architecture:Any
Tags: FUTEX_WAIT, sysbench

[21 Dec 2006 13:24] Carsten Schabacker
Description:
Hello

I did some performance tests on a 4 processor AMD (2 Dual-Core 275) with "sysbench"-OLTP Benchmark.

I am testing a table with 100 Mio rows.

When i run the test with 1 thead, everything looks fine, machine is running and after some time i get the benchmark-results

Then i started the same test with 2 parallel threads. Now the machine seems to be idle (top, vmstat shows no activiy) - only update says "load 2.0".

I wanted to shutdown mysql - but it fails.

Strace shows following output:
# strace  -p 21671
Process 21671 attached - interrupt to quit
futex(0xa8cca4, FUTEX_WAIT, 6, NULL

I can repeat the effect on a second server (same hw+os).

How to repeat:

Use sysbench (0.4.8) - perpare a table with 100 Mio Rows and run the "complex" OLTP -Benchmark.
[21 Dec 2006 13:34] Carsten Schabacker
Output of GDB:

(gdb) sli0000816:~ # gdb mysqld 21671
GNU gdb 6.4
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

Attaching to program: /usr/sbin/mysqld, process 21671
0x00002b7b386cdcfd in ?? ()
(gdb) where
#0  0x00002b7b386cdcfd in ?? ()
#1  0x0000000200000000 in ?? ()
#2  0x0000000000a8cca0 in tab_uni_gbk6 ()
#3  0x0000000000a8d1e0 in tab_uni_gbk6 ()
#4  0x0000000000000002 in ?? ()
#5  0x00002b7b386cdbb0 in ?? ()
#6  0x00007fff7281a260 in ?? ()
#7  0x0000000000a8d1e0 in tab_uni_gbk6 ()
#8  0x0000000000000000 in ?? ()
[21 Dec 2006 15:29] Carsten Schabacker
even after a reboot, an removing the database (rm -fr) an creating a new on (mysql_init_db) the same effect apears...

Maybe the Filesystem (XFS) is corrupt?
[21 Dec 2006 15:38] Carsten Schabacker
xfs_check does not show any errors...
[25 Dec 2006 19:54] Sveta Smirnova
Thank you for the report.

Please provide full command line for sysbench.

Also please provide output of sysbench --debug
[8 Jan 2007 10:57] Carsten Schabacker
Sysbench commandline:

# sysbench --db-driver=mysql  --num-threads=2 --test=oltp --mysql-user=root --max-requests=20000 --oltp-table-size=10000000 --mysql-table-engine=innodb run

Test with --debug is currently running...

thanks
   Carsten
[8 Jan 2007 12:26] Carsten Schabacker
after I run sysbench today (with --debug) i saw the following effect:

sysbench is gone (no process) - output has stoped 

then i tried to shutdown mysql, after mysql catches the signal mysql hangs in FUTEX_WAIT ...

i guess sysbench crashed and did not released a mutex?

# strace -p 11798
Process 11798 attached - interrupt to quit
select(14, [11 13], NULL, NULL, NULL)   = ? ERESTARTNOHAND (To be restarted)
--- SIGUSR1 (User defined signal 1) @ 0 (0) ---
rt_sigaction(SIGUSR1, {0x7ee7c0, [], SA_RESTORER, 0x2b5714bd7730}, NULL, 8) = 0
rt_sigreturn(0xa)                       = -1 EINTR (Interrupted system call)
futex(0xa8cca4, 0x4 /* FUTEX_??? */, 1) = 1
futex(0xa8cca4, FUTEX_WAIT, 5, NULL

DEBUG: mysql_stmt_num_rows(0x12225c0) = 1
DEBUG: mysql_stmt_fetch(0x12225c0) = 0
DEBUG: mysql_stmt_fetch(0x12225c0) = 100
DEBUG: mysql_stmt_free_result(0x12225c0)
DEBUG: mysql_stmt_execute(0x9a7760) = 0
DEBUG: mysql_stmt_store_result(0x9a7760) = 0
DEBUG: mysql_stmt_num_rows(0x9a7760) = 1
DEBUG: mysql_stmt_fetch(0x9a7760) = 0
DEBUG: mysql_stmt_fetch(0x9a7760) = 100
DEBUG: mysql_stms

yes the output ends here...
[9 Jan 2007 16:44] Sveta Smirnova
Thank you for additional information.

I could not repeat crash on 64-bit machine. On Intel Mac sysbench has been crashed, but not MySQL server.
[9 Jan 2007 16:54] Carsten Schabacker
Thanks for your testing.

But I can not shutdown mysql in such cases, I will try to dig deeper into it....

carsten
[10 Jan 2007 14:58] Carsten Schabacker
Problemdescription:

It seem mysql is not fully hanging, only the innoDB relevant part.  At the time, mysql is hanging, you can connect via mysql and query mysql.user for example but nothing in innoDB.

When i try to select data from a table in innoDB i see in strace of mysqld again the futex_wait:

[pid 17169] <... read resumed> "$\0\0\0", 4) = 4
[pid 17169] read(52, "\3select * from sbtest where k < "..., 36) = 36
[pid 17169] rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [HUP INT QUIT PIPE ALRM TERM TSTP], 8) = 0
[pid 17169] rt_sigprocmask(SIG_SETMASK, [HUP INT QUIT PIPE ALRM TERM TSTP], NULL, 8) = 0
[pid 17169] fcntl(52, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 17169] sched_setscheduler(17169, SCHED_OTHER, { 6 }) = -1 EINVAL (Invalid argument)
[pid 17169] sched_yield()               = 0
[pid 17169] futex(0x17aeddc, FUTEX_WAIT, 89, NULL <unfinished ...>
[pid 15345] <... select resumed> )      = 0 (Timeout)

But i get no response, CPU utilization is zero, disk-IO is zero ...
[10 Jan 2007 16:42] Carsten Schabacker
Logfile from mysql-Server with InnoDB Monitor Output

Attachment: mysqlserver.err.gz (application/x-gzip, text), 6.30 KiB.

[10 Jan 2007 16:42] Carsten Schabacker
I added the mysql Logfile with output from innodb-Monitor (innotop running in parallel).

Maybe the Logfile can help?
[11 Jan 2007 9:34] Sveta Smirnova
Unfortunately I can not repeat the crash.

On Mac it is not repeatable too after I've upgraded MacOS
[11 Jan 2007 10:22] Carsten Schabacker
I supose the problem is plattform specific (Linux, 64bit).
[11 Jan 2007 16:29] Valeriy Kravchuk
I tried to repeat as described on Dual AMD Opteron 244 with latest 5.0.34-BK. I had started sysbench as follows (after prepare):

mysqldev@rh-x86-64:~/valeriy/sysbench> bin/sysbench --db-driver=mysql  --num-th
reads=2 --test=oltp --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3333
--max-requests=20000 --oltp-table-size=10000000 --mysql-table-engine=innodb --m
ysql-db=test run
sysbench v0.4.8:  multi-threaded system evaluation benchmark

WARNING: Preparing of "BEGIN" is unsupported, using emulation
(last message repeated 1 times)
Running the test with following options:
Number of threads: 2

Doing OLTP test.
Running mixed OLTP test
Using Special distribution (12 iterations,  1 pct of values are returned in 75 p
ct cases)
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Maximum number of requests for OLTP test is limited to 20000
Threads started!

and it looks like(!) hang. In another shell I get:

mysqldev@rh-x86-64:~> strace -p 4537
Process 4537 attached - interrupt to quit
futex(0x40a009f0, FUTEX_WAIT, 4541, NULL <unfinished ...>
Process 4537 detached
mysqldev@rh-x86-64:~> top | grep sysbench
 4537 mysqldev  16   0 45780 2016 1336 S  2.0  0.2   0:04.45 sysbench
 4537 mysqldev  16   0 45780 2016 1336 S  1.0  0.2   0:04.48 sysbench
 4537 mysqldev  16   0 45780 2016 1336 S  1.3  0.2   0:04.52 sysbench
 4537 mysqldev  16   0 45780 2016 1336 S  1.7  0.2   0:04.57 sysbench
 4537 mysqldev  16   0 45780 2016 1336 S  1.0  0.2   0:04.60 sysbench
...

So, yes, that FUTEX_WAIT is for sysbench, but NOT server:

mysqldev@rh-x86-64:~/valeriy/dbs/5.0> ps -ef | grep valeriy | grep mysqld
mysqldev 29803 29770 11 14:56 pts/6    00:14:24 /data0/users/mysqldev/valeriy/db
s/5.0/libexec/mysqld --basedir=/data0/users/mysqldev/valeriy/dbs/5.0 --datadir=/
data0/users/mysqldev/valeriy/dbs/5.0/var --pid-file=/data0/users/mysqldev/valeri
y/dbs/5.0/mysqld.pid --skip-external-locking --port=3333 --socket=/data0/users/m
ysqldev/valeriy/dbs/5.0/mysql.sock
mysqldev  5522  5358  0 17:03 pts/8    00:00:00 grep valeriy
mysqldev@rh-x86-64:~/valeriy/dbs/5.0> strace -p 29803
Process 29803 attached - interrupt to quit
select(13, [10 12], NULL, NULL, NULL <unfinished ...>
Process 29803 detached

And yes, simple statement like:

select count(*) from sbtest;

"hanged" in the following state:

mysql> show processlist;
+----+------+-----------------+------+---------+------+--------------+----------
-----------------------+
| Id | User | Host            | db   | Command | Time | State        | Info
                       |
+----+------+-----------------+------+---------+------+--------------+----------
-----------------------+
| 11 | root | localhost:56479 | test | Query   |  214 | Sending data | select co
unt(*) from sbtest     |
| 13 | root | localhost:57181 | test | Execute |    1 | statistics   | SELECT c
from sbtest where id=? |
| 14 | root | localhost:57182 | test | Execute |    1 | statistics   | SELECT c
from sbtest where id=? |
| 15 | root | localhost:57321 | test | Query   |    0 | NULL         | show proc
esslist                |
+----+------+-----------------+------+---------+------+--------------+----------
-----------------------+
4 rows in set (0.00 sec)

From INNODB STATUS:

------------
TRANSACTIONS
------------
Trx id counter 0 64283
Purge done for trx's n:o < 0 59500 undo n:o < 0 0
History list length 2392
Total number of lock structs in row lock hash table 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 29803, OS thread id 1158990176
MySQL thread id 15, query id 652046 localhost 127.0.0.1 root
show innodb status
---TRANSACTION 0 64281, ACTIVE 0 sec, process no 29803, OS thread id 1158457696,
 thread declared inside InnoDB 424
mysql tables in use 1, locked 0
MySQL thread id 14, query id 652045 localhost 127.0.0.1 root Sorting result
SELECT c from sbtest where id between ? and ? order by c
Trx read view will not see trx with id >= 0 64282, sees < 0 59498
---TRANSACTION 0 64280, COMMITTED IN MEMORY, process no 29803, OS thread id 1158
723936 committing
, undo log entries 5
MySQL thread id 13, query id 652036 localhost 127.0.0.1 root
COMMIT
---TRANSACTION 0 59498, ACTIVE 327 sec, process no 29803, OS thread id 115819145
6 fetching rows, thread declared inside InnoDB 257
mysql tables in use 1, locked 0
MySQL thread id 11, query id 601790 localhost 127.0.0.1 root Sending data
select count(*) from sbtest
Trx read view will not see trx with id >= 0 59499, sees < 0 59491

and I've got result, but only after almost 9 mins:

...
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (8 min 54.90 sec)

...

and, finally, sysbench jusr returned me results:

Done.

OLTP test statistics:
    queries performed:
        read:                            280000
        write:                           100000
        other:                           40000
        total:                           420000
    transactions:                        20000  (8.33 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 380000 (158.18 per sec.)
    other operations:                    40000  (16.65 per sec.)

Test execution summary:
    total time:                          2402.3218s
    total number of events:              20000
    total time taken by event execution: 4804.1625
    per-request statistics:
         min:                            0.0154s
         avg:                            0.2402s
         max:                            2.4859s
         approx.  95 percentile:         1.1309s

Threads fairness:
    events (avg/stddev):           10000.0000/111.00
    execution time (avg/stddev):   2402.0813/0.00

With 1 thread I've got:

mysqldev@rh-x86-64:~/valeriy/sysbench> bin/sysbench --db-driver=mysql  --num-th
reads=1 --test=oltp --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3333
--max-requests=20000 --oltp-table-size=10000000 --mysql-table-engine=innodb --m
ysql-db=test run
sysbench v0.4.8:  multi-threaded system evaluation benchmark

WARNING: Preparing of "BEGIN" is unsupported, using emulation
Running the test with following options:
Number of threads: 1

Doing OLTP test.
Running mixed OLTP test
Using Special distribution (12 iterations,  1 pct of values are returned in 75 p
ct cases)
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Maximum number of requests for OLTP test is limited to 20000
Threads started!
Done.

OLTP test statistics:
    queries performed:
        read:                            280000
        write:                           100000
        other:                           40000
        total:                           420000
    transactions:                        20000  (6.95 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 380000 (131.96 per sec.)
    other operations:                    40000  (13.89 per sec.)

Test execution summary:
    total time:                          2879.5532s
    total number of events:              20000
    total time taken by event execution: 2879.1130
    per-request statistics:
         min:                            0.0143s
         avg:                            0.1440s
         max:                            1.6703s
         approx.  95 percentile:         1.0397s

Threads fairness:
    events (avg/stddev):           20000.0000/0.00
    execution time (avg/stddev):   2879.1130/0.00

This is not SLES9:

mysqldev@rh-x86-64:~/valeriy/sysbench> uname -a
Linux rh-x86-64.mysql.com 2.6.9-34.0.2.EL #1 Fri Jun 30 10:22:45 EDT 2006 x86_64
 x86_64 x86_64 GNU/Linux

but it is the closest system to yours I can test on.

Summary:

Bug is not repeatable with current 5.0.34-BK. With 2 threads it works, and even faster. Although, other actions on the same MySQL server almost(!) blocked.
[12 Jan 2007 11:33] Carsten Schabacker
I did 

sysbench --db-driver=mysql  --num-threads=2 --test=oltp --mysql-user=root --max-requests=20000 --oltp-table-size=20000 --mysql-table-engine=innodb run

after about 4 minutes (see following screenshot from kload) i can see futex_wait in sysbench _and_ mysql (as you said):

# ps auxuw|grep sysbench
root     11721  0.0  0.3   8068  1396 pts/2    S+   12:33   0:00 /bin/sh ./sysbench_sar.sh
root     12872  0.0  0.3   8068  1448 pts/1    S+   12:45   0:00 /bin/sh ./sysbench2_run.sh
root     18642  0.0  0.1   8068   852 pts/1    S+   13:57   0:00 /bin/sh ./sysbench2_run.sh
root     18660  1.6  0.3  19128  1612 pts/1    Sl+  13:57   0:06 /root/CSC/sysbench-0.4.8/sysbench/sysbench --db-driver mysql --num-threads 2 --test oltp --mysql-user root --max-requests 20000 --oltp-table-size 10000000 --mysql-table-engine innodb run
root     19202  0.0  0.1   2880   692 pts/6    S+   14:04   0:00 grep sysbench
# strace -p 18660
Process 18660 attached - interrupt to quit
futex(0x402009f0, FUTEX_WAIT, 18663, NULL

# ps auxw|grep mysqld
root     18590  0.0  0.3   8068  1520 pts/1    S+   13:57   0:00 /bin/sh /usr/local/bin/mysqld_safe --datadir=/data-slow/mysql --pid-file=/data-slow/mysql/sli0000816.pid
mysql    18623  9.5 73.0 530280 323792 pts/1   Sl+  13:57   0:43 /usr/local/libexec/mysqld --basedir=/usr/local --datadir=/data-slow/mysql --user=mysql --pid-file=/data-slow/mysql/sli0000816.pid --skip-external-locking --open-files-limit=8192 --port=3306 --socket=/data-slow/mysql/mysql.sock
root     19242  0.0  0.1   2880   692 pts/7    S+   14:04   0:00 grep mysqld
# strace -f -p 18623
Process 18677 attached with 12 threads - interrupt to quit
[pid 18623] select(13, [10 12], NULL, NULL, NULL <unfinished ...>
[pid 18624] futex(0xf915cc, FUTEX_WAIT, 1, NULL <unfinished ...>
[pid 18625] futex(0xf9166c, FUTEX_WAIT, 65, NULL <unfinished ...>
[pid 18626] futex(0xf9170c, FUTEX_WAIT, 913, NULL <unfinished ...>
[pid 18627] futex(0xf917ac, FUTEX_WAIT, 313, NULL <unfinished ...>
[pid 18629] select(0, NULL, NULL, NULL, {0, 580000} <unfinished ...>
[pid 18630] futex(0x174601c, FUTEX_WAIT, 250, NULL <unfinished ...>
[pid 18631] futex(0xf9184c, FUTEX_WAIT, 3, NULL <unfinished ...>
[pid 18677] futex(0x174601c, FUTEX_WAIT, 250, NULL <unfinished ...>
[pid 18632] rt_sigtimedwait([HUP QUIT ALRM TERM TSTP],  <unfinished ...>
[pid 18629] <... select resumed> )      = 0 (Timeout)
[pid 18629] select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
[pid 18629] select(0, NULL, NULL, NULL, {1, 0}

Anyway, I mysql 5.0.33 is now released - i will try to update to linux 2.6.19.2 and mysql-5.0.33 and see what will happen. :-)

Thanks so far.

Carsten

(Kload-Screenshot will follow)
[12 Jan 2007 11:34] Carsten Schabacker
KLoad-Screenshot

Attachment: Screenshot-kload.png (image/png, text), 41.67 KiB.

[12 Jan 2007 16:05] Carsten Schabacker
After updating to Linux Kernel 2.6.19.2 (same mysql, same sysbench, same machine) everything works fine.  Sysbench and MySQL is running very well.
[16 Jan 2007 8:44] Carsten Schabacker
no more problems