Bug #25232 | MySQL hangs in FUTEX_WAIT | ||
---|---|---|---|
Submitted: | 21 Dec 2006 13:24 | Modified: | 16 Jan 2007 8:44 |
Reporter: | Carsten Schabacker | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S2 (Serious) |
Version: | 5.0.30 | OS: | Linux (Linux AMD64 - SLES9-SP3 2.6.17) |
Assigned to: | CPU Architecture: | Any | |
Tags: | FUTEX_WAIT, sysbench |
[21 Dec 2006 13:24]
Carsten Schabacker
[21 Dec 2006 13:34]
Carsten Schabacker
Output of GDB: (gdb) sli0000816:~ # gdb mysqld 21671 GNU gdb 6.4 Copyright 2005 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "x86_64-suse-linux"...Using host libthread_db library "/lib64/tls/libthread_db.so.1". Attaching to program: /usr/sbin/mysqld, process 21671 0x00002b7b386cdcfd in ?? () (gdb) where #0 0x00002b7b386cdcfd in ?? () #1 0x0000000200000000 in ?? () #2 0x0000000000a8cca0 in tab_uni_gbk6 () #3 0x0000000000a8d1e0 in tab_uni_gbk6 () #4 0x0000000000000002 in ?? () #5 0x00002b7b386cdbb0 in ?? () #6 0x00007fff7281a260 in ?? () #7 0x0000000000a8d1e0 in tab_uni_gbk6 () #8 0x0000000000000000 in ?? ()
[21 Dec 2006 15:29]
Carsten Schabacker
even after a reboot, an removing the database (rm -fr) an creating a new on (mysql_init_db) the same effect apears... Maybe the Filesystem (XFS) is corrupt?
[21 Dec 2006 15:38]
Carsten Schabacker
xfs_check does not show any errors...
[25 Dec 2006 19:54]
Sveta Smirnova
Thank you for the report. Please provide full command line for sysbench. Also please provide output of sysbench --debug
[8 Jan 2007 10:57]
Carsten Schabacker
Sysbench commandline: # sysbench --db-driver=mysql --num-threads=2 --test=oltp --mysql-user=root --max-requests=20000 --oltp-table-size=10000000 --mysql-table-engine=innodb run Test with --debug is currently running... thanks Carsten
[8 Jan 2007 12:26]
Carsten Schabacker
after I run sysbench today (with --debug) i saw the following effect: sysbench is gone (no process) - output has stoped then i tried to shutdown mysql, after mysql catches the signal mysql hangs in FUTEX_WAIT ... i guess sysbench crashed and did not released a mutex? # strace -p 11798 Process 11798 attached - interrupt to quit select(14, [11 13], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted) --- SIGUSR1 (User defined signal 1) @ 0 (0) --- rt_sigaction(SIGUSR1, {0x7ee7c0, [], SA_RESTORER, 0x2b5714bd7730}, NULL, 8) = 0 rt_sigreturn(0xa) = -1 EINTR (Interrupted system call) futex(0xa8cca4, 0x4 /* FUTEX_??? */, 1) = 1 futex(0xa8cca4, FUTEX_WAIT, 5, NULL DEBUG: mysql_stmt_num_rows(0x12225c0) = 1 DEBUG: mysql_stmt_fetch(0x12225c0) = 0 DEBUG: mysql_stmt_fetch(0x12225c0) = 100 DEBUG: mysql_stmt_free_result(0x12225c0) DEBUG: mysql_stmt_execute(0x9a7760) = 0 DEBUG: mysql_stmt_store_result(0x9a7760) = 0 DEBUG: mysql_stmt_num_rows(0x9a7760) = 1 DEBUG: mysql_stmt_fetch(0x9a7760) = 0 DEBUG: mysql_stmt_fetch(0x9a7760) = 100 DEBUG: mysql_stms yes the output ends here...
[9 Jan 2007 16:44]
Sveta Smirnova
Thank you for additional information. I could not repeat crash on 64-bit machine. On Intel Mac sysbench has been crashed, but not MySQL server.
[9 Jan 2007 16:54]
Carsten Schabacker
Thanks for your testing. But I can not shutdown mysql in such cases, I will try to dig deeper into it.... carsten
[10 Jan 2007 14:58]
Carsten Schabacker
Problemdescription: It seem mysql is not fully hanging, only the innoDB relevant part. At the time, mysql is hanging, you can connect via mysql and query mysql.user for example but nothing in innoDB. When i try to select data from a table in innoDB i see in strace of mysqld again the futex_wait: [pid 17169] <... read resumed> "$\0\0\0", 4) = 4 [pid 17169] read(52, "\3select * from sbtest where k < "..., 36) = 36 [pid 17169] rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [HUP INT QUIT PIPE ALRM TERM TSTP], 8) = 0 [pid 17169] rt_sigprocmask(SIG_SETMASK, [HUP INT QUIT PIPE ALRM TERM TSTP], NULL, 8) = 0 [pid 17169] fcntl(52, F_SETFL, O_RDWR|O_NONBLOCK) = 0 [pid 17169] sched_setscheduler(17169, SCHED_OTHER, { 6 }) = -1 EINVAL (Invalid argument) [pid 17169] sched_yield() = 0 [pid 17169] futex(0x17aeddc, FUTEX_WAIT, 89, NULL <unfinished ...> [pid 15345] <... select resumed> ) = 0 (Timeout) But i get no response, CPU utilization is zero, disk-IO is zero ...
[10 Jan 2007 16:42]
Carsten Schabacker
Logfile from mysql-Server with InnoDB Monitor Output
Attachment: mysqlserver.err.gz (application/x-gzip, text), 6.30 KiB.
[10 Jan 2007 16:42]
Carsten Schabacker
I added the mysql Logfile with output from innodb-Monitor (innotop running in parallel). Maybe the Logfile can help?
[11 Jan 2007 9:34]
Sveta Smirnova
Unfortunately I can not repeat the crash. On Mac it is not repeatable too after I've upgraded MacOS
[11 Jan 2007 10:22]
Carsten Schabacker
I supose the problem is plattform specific (Linux, 64bit).
[11 Jan 2007 16:29]
Valeriy Kravchuk
I tried to repeat as described on Dual AMD Opteron 244 with latest 5.0.34-BK. I had started sysbench as follows (after prepare): mysqldev@rh-x86-64:~/valeriy/sysbench> bin/sysbench --db-driver=mysql --num-th reads=2 --test=oltp --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3333 --max-requests=20000 --oltp-table-size=10000000 --mysql-table-engine=innodb --m ysql-db=test run sysbench v0.4.8: multi-threaded system evaluation benchmark WARNING: Preparing of "BEGIN" is unsupported, using emulation (last message repeated 1 times) Running the test with following options: Number of threads: 2 Doing OLTP test. Running mixed OLTP test Using Special distribution (12 iterations, 1 pct of values are returned in 75 p ct cases) Using "BEGIN" for starting transactions Using auto_inc on the id column Maximum number of requests for OLTP test is limited to 20000 Threads started! and it looks like(!) hang. In another shell I get: mysqldev@rh-x86-64:~> strace -p 4537 Process 4537 attached - interrupt to quit futex(0x40a009f0, FUTEX_WAIT, 4541, NULL <unfinished ...> Process 4537 detached mysqldev@rh-x86-64:~> top | grep sysbench 4537 mysqldev 16 0 45780 2016 1336 S 2.0 0.2 0:04.45 sysbench 4537 mysqldev 16 0 45780 2016 1336 S 1.0 0.2 0:04.48 sysbench 4537 mysqldev 16 0 45780 2016 1336 S 1.3 0.2 0:04.52 sysbench 4537 mysqldev 16 0 45780 2016 1336 S 1.7 0.2 0:04.57 sysbench 4537 mysqldev 16 0 45780 2016 1336 S 1.0 0.2 0:04.60 sysbench ... So, yes, that FUTEX_WAIT is for sysbench, but NOT server: mysqldev@rh-x86-64:~/valeriy/dbs/5.0> ps -ef | grep valeriy | grep mysqld mysqldev 29803 29770 11 14:56 pts/6 00:14:24 /data0/users/mysqldev/valeriy/db s/5.0/libexec/mysqld --basedir=/data0/users/mysqldev/valeriy/dbs/5.0 --datadir=/ data0/users/mysqldev/valeriy/dbs/5.0/var --pid-file=/data0/users/mysqldev/valeri y/dbs/5.0/mysqld.pid --skip-external-locking --port=3333 --socket=/data0/users/m ysqldev/valeriy/dbs/5.0/mysql.sock mysqldev 5522 5358 0 17:03 pts/8 00:00:00 grep valeriy mysqldev@rh-x86-64:~/valeriy/dbs/5.0> strace -p 29803 Process 29803 attached - interrupt to quit select(13, [10 12], NULL, NULL, NULL <unfinished ...> Process 29803 detached And yes, simple statement like: select count(*) from sbtest; "hanged" in the following state: mysql> show processlist; +----+------+-----------------+------+---------+------+--------------+---------- -----------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+------+-----------------+------+---------+------+--------------+---------- -----------------------+ | 11 | root | localhost:56479 | test | Query | 214 | Sending data | select co unt(*) from sbtest | | 13 | root | localhost:57181 | test | Execute | 1 | statistics | SELECT c from sbtest where id=? | | 14 | root | localhost:57182 | test | Execute | 1 | statistics | SELECT c from sbtest where id=? | | 15 | root | localhost:57321 | test | Query | 0 | NULL | show proc esslist | +----+------+-----------------+------+---------+------+--------------+---------- -----------------------+ 4 rows in set (0.00 sec) From INNODB STATUS: ------------ TRANSACTIONS ------------ Trx id counter 0 64283 Purge done for trx's n:o < 0 59500 undo n:o < 0 0 History list length 2392 Total number of lock structs in row lock hash table 0 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 0 0, not started, process no 29803, OS thread id 1158990176 MySQL thread id 15, query id 652046 localhost 127.0.0.1 root show innodb status ---TRANSACTION 0 64281, ACTIVE 0 sec, process no 29803, OS thread id 1158457696, thread declared inside InnoDB 424 mysql tables in use 1, locked 0 MySQL thread id 14, query id 652045 localhost 127.0.0.1 root Sorting result SELECT c from sbtest where id between ? and ? order by c Trx read view will not see trx with id >= 0 64282, sees < 0 59498 ---TRANSACTION 0 64280, COMMITTED IN MEMORY, process no 29803, OS thread id 1158 723936 committing , undo log entries 5 MySQL thread id 13, query id 652036 localhost 127.0.0.1 root COMMIT ---TRANSACTION 0 59498, ACTIVE 327 sec, process no 29803, OS thread id 115819145 6 fetching rows, thread declared inside InnoDB 257 mysql tables in use 1, locked 0 MySQL thread id 11, query id 601790 localhost 127.0.0.1 root Sending data select count(*) from sbtest Trx read view will not see trx with id >= 0 59499, sees < 0 59491 and I've got result, but only after almost 9 mins: ... +----------+ | count(*) | +----------+ | 10000000 | +----------+ 1 row in set (8 min 54.90 sec) ... and, finally, sysbench jusr returned me results: Done. OLTP test statistics: queries performed: read: 280000 write: 100000 other: 40000 total: 420000 transactions: 20000 (8.33 per sec.) deadlocks: 0 (0.00 per sec.) read/write requests: 380000 (158.18 per sec.) other operations: 40000 (16.65 per sec.) Test execution summary: total time: 2402.3218s total number of events: 20000 total time taken by event execution: 4804.1625 per-request statistics: min: 0.0154s avg: 0.2402s max: 2.4859s approx. 95 percentile: 1.1309s Threads fairness: events (avg/stddev): 10000.0000/111.00 execution time (avg/stddev): 2402.0813/0.00 With 1 thread I've got: mysqldev@rh-x86-64:~/valeriy/sysbench> bin/sysbench --db-driver=mysql --num-th reads=1 --test=oltp --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3333 --max-requests=20000 --oltp-table-size=10000000 --mysql-table-engine=innodb --m ysql-db=test run sysbench v0.4.8: multi-threaded system evaluation benchmark WARNING: Preparing of "BEGIN" is unsupported, using emulation Running the test with following options: Number of threads: 1 Doing OLTP test. Running mixed OLTP test Using Special distribution (12 iterations, 1 pct of values are returned in 75 p ct cases) Using "BEGIN" for starting transactions Using auto_inc on the id column Maximum number of requests for OLTP test is limited to 20000 Threads started! Done. OLTP test statistics: queries performed: read: 280000 write: 100000 other: 40000 total: 420000 transactions: 20000 (6.95 per sec.) deadlocks: 0 (0.00 per sec.) read/write requests: 380000 (131.96 per sec.) other operations: 40000 (13.89 per sec.) Test execution summary: total time: 2879.5532s total number of events: 20000 total time taken by event execution: 2879.1130 per-request statistics: min: 0.0143s avg: 0.1440s max: 1.6703s approx. 95 percentile: 1.0397s Threads fairness: events (avg/stddev): 20000.0000/0.00 execution time (avg/stddev): 2879.1130/0.00 This is not SLES9: mysqldev@rh-x86-64:~/valeriy/sysbench> uname -a Linux rh-x86-64.mysql.com 2.6.9-34.0.2.EL #1 Fri Jun 30 10:22:45 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux but it is the closest system to yours I can test on. Summary: Bug is not repeatable with current 5.0.34-BK. With 2 threads it works, and even faster. Although, other actions on the same MySQL server almost(!) blocked.
[12 Jan 2007 11:33]
Carsten Schabacker
I did sysbench --db-driver=mysql --num-threads=2 --test=oltp --mysql-user=root --max-requests=20000 --oltp-table-size=20000 --mysql-table-engine=innodb run after about 4 minutes (see following screenshot from kload) i can see futex_wait in sysbench _and_ mysql (as you said): # ps auxuw|grep sysbench root 11721 0.0 0.3 8068 1396 pts/2 S+ 12:33 0:00 /bin/sh ./sysbench_sar.sh root 12872 0.0 0.3 8068 1448 pts/1 S+ 12:45 0:00 /bin/sh ./sysbench2_run.sh root 18642 0.0 0.1 8068 852 pts/1 S+ 13:57 0:00 /bin/sh ./sysbench2_run.sh root 18660 1.6 0.3 19128 1612 pts/1 Sl+ 13:57 0:06 /root/CSC/sysbench-0.4.8/sysbench/sysbench --db-driver mysql --num-threads 2 --test oltp --mysql-user root --max-requests 20000 --oltp-table-size 10000000 --mysql-table-engine innodb run root 19202 0.0 0.1 2880 692 pts/6 S+ 14:04 0:00 grep sysbench # strace -p 18660 Process 18660 attached - interrupt to quit futex(0x402009f0, FUTEX_WAIT, 18663, NULL # ps auxw|grep mysqld root 18590 0.0 0.3 8068 1520 pts/1 S+ 13:57 0:00 /bin/sh /usr/local/bin/mysqld_safe --datadir=/data-slow/mysql --pid-file=/data-slow/mysql/sli0000816.pid mysql 18623 9.5 73.0 530280 323792 pts/1 Sl+ 13:57 0:43 /usr/local/libexec/mysqld --basedir=/usr/local --datadir=/data-slow/mysql --user=mysql --pid-file=/data-slow/mysql/sli0000816.pid --skip-external-locking --open-files-limit=8192 --port=3306 --socket=/data-slow/mysql/mysql.sock root 19242 0.0 0.1 2880 692 pts/7 S+ 14:04 0:00 grep mysqld # strace -f -p 18623 Process 18677 attached with 12 threads - interrupt to quit [pid 18623] select(13, [10 12], NULL, NULL, NULL <unfinished ...> [pid 18624] futex(0xf915cc, FUTEX_WAIT, 1, NULL <unfinished ...> [pid 18625] futex(0xf9166c, FUTEX_WAIT, 65, NULL <unfinished ...> [pid 18626] futex(0xf9170c, FUTEX_WAIT, 913, NULL <unfinished ...> [pid 18627] futex(0xf917ac, FUTEX_WAIT, 313, NULL <unfinished ...> [pid 18629] select(0, NULL, NULL, NULL, {0, 580000} <unfinished ...> [pid 18630] futex(0x174601c, FUTEX_WAIT, 250, NULL <unfinished ...> [pid 18631] futex(0xf9184c, FUTEX_WAIT, 3, NULL <unfinished ...> [pid 18677] futex(0x174601c, FUTEX_WAIT, 250, NULL <unfinished ...> [pid 18632] rt_sigtimedwait([HUP QUIT ALRM TERM TSTP], <unfinished ...> [pid 18629] <... select resumed> ) = 0 (Timeout) [pid 18629] select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout) [pid 18629] select(0, NULL, NULL, NULL, {1, 0} Anyway, I mysql 5.0.33 is now released - i will try to update to linux 2.6.19.2 and mysql-5.0.33 and see what will happen. :-) Thanks so far. Carsten (Kload-Screenshot will follow)
[12 Jan 2007 11:34]
Carsten Schabacker
KLoad-Screenshot
Attachment: Screenshot-kload.png (image/png, text), 41.67 KiB.
[12 Jan 2007 16:05]
Carsten Schabacker
After updating to Linux Kernel 2.6.19.2 (same mysql, same sysbench, same machine) everything works fine. Sysbench and MySQL is running very well.
[16 Jan 2007 8:44]
Carsten Schabacker
no more problems