Bug #40892 maria: Livelock in sysbench OLTP_RW test
Submitted: 20 Nov 2008 16:10 Modified: 8 Jan 2009 10:38
Reporter: Alexey Stroganov Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Maria storage engine Severity:S1 (Critical)
Version:6.0.8-release OS:Any
Assigned to: Sergei Golubchik CPU Architecture:Any

[20 Nov 2008 16:10] Alexey Stroganov
Description:
The deadlock happened when running OLTP_RW sysbench test with 128 threads:

All threads are in 'Table lock' state:

 mysqladmin proc -uroot
+-----+------+-----------------+--------+---------+------+--------------+---------------------------------------------------
| Id  | User | Host            | db     | Command | Time | State        | Info
+-----+------+-----------------+--------+---------+------+--------------+---------------------------------------------------
| 5   | root | localhost:37264 | sbtest | Execute | 1448 | Table lock   | INSERT INTO sbtest values(?,0,' ','aaaaaaaaaafffff
| 6   | root | localhost:37265 | sbtest | Execute | 1448 | Table lock   | INSERT INTO sbtest values(?,0,' ','aaaaaaaaaafffff
| 7   | root | localhost:37266 | sbtest | Execute | 1448 | Table lock   | INSERT INTO sbtest values(?,0,' ','aaaaaaaaaafffff
| 8   | root | localhost:37267 | sbtest | Execute | 1448 | Table lock   | INSERT INTO sbtest values(?,0,' ','aaaaaaaaaafffff
| 9   | root | localhost:37268 | sbtest | Execute | 1448 | Table lock   | INSERT INTO sbtest values(?,0,' ','aaaaaaaaaafffff
| 10  | root | localhost:37269 | sbtest | Execute | 1448 | Table lock   | INSERT INTO sbtest values(?,0,' ','aaaaaaaaaafffff
| 11  | root | localhost:37270 | sbtest | Execute | 1448 | Table lock   | SELECT DISTINCT c from sbtest where id between ? a
...

Full output will be attached.

backtraces:

All threads are blocked on pthread_cond_wait and one thread on _lf_pinbox_real_free:
Thread 40 (process 18525):
#0  0x0000000000a90b87 in _lf_pinbox_real_free (pins=0x3435600) at lf_alloc-pin.c:392
#1  0x0000000000a90cf1 in _lf_pinbox_put_pins (pins=0x3435600) at lf_alloc-pin.c:224
#2  0x0000000000a110cd in trnman_end_trn (trn=0x40c19a30, commit=<value optimized out>) at ../../include/lf.h:175
#3  0x0000000000a7da07 in ma_commit (trn=0x11e67a8) at ma_commit.c:37
#4  0x0000000000a23fce in ha_maria::external_lock (this=0x2aab4d90c768, thd=0x1fd4230, lock_type=2) at ha_maria.cc:2273
#5  0x000000000073c1aa in handler::ha_external_lock (this=0x11e67a8, thd=0x0, lock_type=-1) at handler.cc:5298
#6  0x0000000000641f08 in mysql_unlock_tables (thd=0x1fd4230, sql_lock=0x2d5a9e0) at lock.cc:821
#7  0x0000000000634845 in select_send::send_eof (this=0x2ed9bd0) at sql_class.cc:1662
#8  0x00000000006b407b in do_select (join=0x1fdf9e8, fields=0x1fe1400, table=0x0, procedure=<value optimized out>) at sql_se
#9  0x00000000006c6e62 in JOIN::exec (this=0x1fdf9e8) at sql_select.cc:2829

Full output for all threads will be attached.

How to repeat:
Instructions and test case will be provided later.
[20 Nov 2008 16:12] Alexey Stroganov
Scacktraces  for issue

Attachment: deadlock.out.gz (application/gzip, text), 17.12 KiB.

[20 Nov 2008 16:13] Alexey Stroganov
Output of mysqladmin proc

Attachment: mysqladmin.proc.out.gz (application/gzip, text), 1.21 KiB.

[20 Nov 2008 16:17] Alexey Stroganov
Just observed the similar deadlock and with 256 threads. Again all threads stuck 
 in pthread_cond_wait and one thread in _lf_pinbox_real_free.
[25 Nov 2008 14:34] Guilhem Bichot
Looking at deadlock.out.gz they are all in wait_for_lock() of thr_lock, except two threads, of stack traces:

Thread 49 (process 18516):
#0  _lf_pinbox_real_free (pins=0x3435600) at lf_alloc-pin.c:380
#1  0x0000000000a8f8c8 in lf_hash_delete (hash=0x11e6780, pins=0x3435600, key=0x2aab4db4cf98, keylen=8) at lf_hash.c:226
#2  0x0000000000a1105e in trnman_end_trn (trn=<value optimized out>, commit=<value optimized out>) at trnman.c:465
#3  0x0000000000a7da07 in ma_commit (trn=0x40aee5e8) at ma_commit.c:37
#4  0x0000000000a23fce in ha_maria::external_lock (this=0x2aab4dd29738, thd=0x1f51fd0, lock_type=2) at ha_maria.cc:2273
#5  0x000000000073c1aa in handler::ha_external_lock (this=0x40aee5e8, thd=0x8, lock_type=1) at handler.cc:5298
#6  0x0000000000641f08 in mysql_unlock_tables (thd=0x1f51fd0, sql_lock=0x2aab4e7012a0) at lock.cc:821
#7  0x0000000000634845 in select_send::send_eof (this=0x2d92030) at sql_class.cc:1662
#8  0x00000000006b407b in do_select (join=0x1f5d788, fields=0x1f5f1a0, table=0x0, procedure=<value optimized out>) at sql_select.cc:13497
#9  0x00000000006c6e62 in JOIN::exec (this=0x1f5d788) at sql_select.cc:2829
#10 0x00000000006c2cce in mysql_select (thd=0x1f51fd0, rref_pointer_array=0x2d90790, tables=<value optimized out>, wild_num=<value optimized out>, fields=<value optimized out>, conds=0
x2d92620, og_num=1, order=0x2d91fd8, group=0x0, having=0x0, proc_param=0x0, select_options=1, result=0x2d92030, unit=0x2d90148, select_lex=0x2d905a8) at sql_select.cc:3019
#11 0x00000000006c875c in handle_select (thd=0x1f51fd0, lex=0x2d900a8, result=0x2d92030, setup_tables_done_option=0) at sql_select.cc:301
#12 0x0000000000656bb7 in execute_sqlcom_select (thd=0x1f51fd0, all_tables=0x2d91290) at sql_parse.cc:4646
#13 0x000000000065a9de in mysql_execute_command (thd=0x1f51fd0) at sql_parse.cc:2066
#14 0x00000000006d4029 in Prepared_statement::execute (this=0x2d8fd30, expanded_query=<value optimized out>, open_cursor=false) at sql_prepare.cc:3578
#15 0x00000000006d6ffc in Prepared_statement::execute_loop (this=0x2d8fd30, expanded_query=0x40af2470, open_cursor=false, packet=<value optimized out>, packet_end=<value optimized out>
) at sql_prepare.cc:3245
#16 0x00000000006d76a5 in mysql_stmt_execute (thd=0x1f51fd0, packet_arg=0x1f55441 "", packet_length=0) at sql_prepare.cc:2467
#17 0x000000000065febe in dispatch_command (command=COM_STMT_EXECUTE, thd=0x1f51fd0, packet=0x1f55441 "", packet_length=19) at sql_parse.cc:960
#18 0x0000000000652a72 in handle_one_connection (arg=<value optimized out>) at sql_connect.cc:1156
#19 0x00002af9b058c193 in start_thread () from /lib64/libpthread.so.0
#20 0x00002af9b0e0445d in clone () from /lib64/libc.so.6
#21 0x0000000000000000 in ?? ()
Current language:  auto; currently c

============

Thread 40 (process 18525):
#0  0x0000000000a90b87 in _lf_pinbox_real_free (pins=0x3435600) at lf_alloc-pin.c:392
#1  0x0000000000a90cf1 in _lf_pinbox_put_pins (pins=0x3435600) at lf_alloc-pin.c:224
#2  0x0000000000a110cd in trnman_end_trn (trn=0x40c19a30, commit=<value optimized out>) at ../../include/lf.h:175
#3  0x0000000000a7da07 in ma_commit (trn=0x11e67a8) at ma_commit.c:37
#4  0x0000000000a23fce in ha_maria::external_lock (this=0x2aab4d90c768, thd=0x1fd4230, lock_type=2) at ha_maria.cc:2273
#5  0x000000000073c1aa in handler::ha_external_lock (this=0x11e67a8, thd=0x0, lock_type=-1) at handler.cc:5298
#6  0x0000000000641f08 in mysql_unlock_tables (thd=0x1fd4230, sql_lock=0x2d5a9e0) at lock.cc:821
#7  0x0000000000634845 in select_send::send_eof (this=0x2ed9bd0) at sql_class.cc:1662
#8  0x00000000006b407b in do_select (join=0x1fdf9e8, fields=0x1fe1400, table=0x0, procedure=<value optimized out>) at sql_select.cc:13497
#9  0x00000000006c6e62 in JOIN::exec (this=0x1fdf9e8) at sql_select.cc:2829
#10 0x00000000006c2cce in mysql_select (thd=0x1fd4230, rref_pointer_array=0x2ed8330, tables=<value optimized out>, wild_num=<value optimized out>, fields=<value optimized out>, conds=0
x2eda1c0, og_num=1, order=0x2ed9b78, group=0x0, having=0x0, proc_param=0x0, select_options=1, result=0x2ed9bd0, unit=0x2ed7ce8, select_lex=0x2ed8148) at sql_select.cc:3019
#11 0x00000000006c875c in handle_select (thd=0x1fd4230, lex=0x2ed7c48, result=0x2ed9bd0, setup_tables_done_option=0) at sql_select.cc:301
#12 0x0000000000656bb7 in execute_sqlcom_select (thd=0x1fd4230, all_tables=0x2ed8e30) at sql_parse.cc:4646
#13 0x000000000065a9de in mysql_execute_command (thd=0x1fd4230) at sql_parse.cc:2066
#14 0x00000000006d4029 in Prepared_statement::execute (this=0x2ed78d0, expanded_query=<value optimized out>, open_cursor=false) at sql_prepare.cc:3578
#15 0x00000000006d6ffc in Prepared_statement::execute_loop (this=0x2ed78d0, expanded_query=0x40c1b470, open_cursor=false, packet=<value optimized out>, packet_end=<value optimized out>
) at sql_prepare.cc:3245
#16 0x00000000006d76a5 in mysql_stmt_execute (thd=0x1fd4230, packet_arg=0x1fd76a1 "", packet_length=1303507072) at sql_prepare.cc:2467
#17 0x000000000065febe in dispatch_command (command=COM_STMT_EXECUTE, thd=0x1fd4230, packet=0x1fd76a1 "", packet_length=19) at sql_parse.cc:960
#18 0x0000000000652a72 in handle_one_connection (arg=<value optimized out>) at sql_connect.cc:1156
#19 0x00002af9b058c193 in start_thread () from /lib64/libpthread.so.0
#20 0x00002af9b0e0445d in clone () from /lib64/libc.so.6
#21 0x0000000000000000 in ?? ()

and which are probably:

| 86  | root | localhost:37345 | sbtest | Execute | 1448 | Sending data | SELECT DISTINCT c from sbtest where id between ? and ? order by c    
| 95  | root | localhost:37354 | sbtest | Execute | 1448 | Sending data | SELECT DISTINCT c from sbtest where id between ? and ? order by c  

The wait_for_lock threads are SELECT/INSERT/DELETE/UPDATE.
[26 Nov 2008 18:50] Guilhem Bichot
Given that the two threads which block the system are very probably the lf* ones, and the lines at which they are, in the stack traces, are instructions which are not mutex lock or timed waits, but instructions which take a short time (assignments, tests), it is a livelock
http://en.wikipedia.org/wiki/Deadlock#Livelock
not a deadlock. They probably loop checking some state which never changes.
[15 Dec 2008 10:08] Bugs System
Pushed into 6.0.9-alpha  (revid:serg@mysql.com-20081205105151-tqxe1q4g3wqz0mk7) (version source revid:guilhem@mysql.com-20081213204800-0nubni3t4ihn4hv9) (pib:5)
[8 Jan 2009 10:38] MC Brown
Test case failure. No documentation needed.