Bug #80436 Failed !lock->recursive assert in sync0rw.cc
Submitted: 19 Feb 2016 2:34 Modified: 11 Aug 2017 10:43
Reporter: Daniel Axtens (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Locking Severity:S2 (Serious)
Version:5.6 OS:Linux
Assigned to: CPU Architecture:Any

[19 Feb 2016 2:34] Daniel Axtens
Description:
On a heavily loaded PowerPC 64 bit system, a crash was observed with an assertion failing at storage/innobase/sync0rw.cc:552: ut_a(!lock->recursive);

How to repeat:
The crash is intermittently reproducible with sysbench's oltp test.

A backtrace of all threads is attached. We have access to a core dump if that's helpful.

Suggested fix:
This has been fixed in MariaDB in https://github.com/MariaDB/server/commit/e13459a11eb5938b54b88c7a1529491df6dd3b49 . A similar commit for MySQL is suggested.
[19 Feb 2016 2:39] Daniel Axtens
Backtrace of all threads

Attachment: thread-bt.txt (text/plain), 211.41 KiB.

[19 Feb 2016 2:42] Daniel Axtens
Here's a gdb session showing the immediate vicinity of the crash.

GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64le-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/mysql/bin/mysqld...done.
[New LWP 26981]
...
[New LWP 26798]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
Core was generated by `bin/mysqld --no-defaults --pid-file=/tmp/slave.pid --port 3320 --datadir=/dev/s'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00003fff82f3fcb0 in __pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:61
(gdb) up
#1  0x0000000010645de0 in my_write_core (sig=<optimized out>) at /home/ubuntu/mysql-server/mysys/stacktrace.c:422
422	  pthread_kill(pthread_self(), sig);
(gdb) up
#2  0x00000000102f5f0c in handle_fatal_signal (sig=<optimized out>) at /home/ubuntu/mysql-server/sql/signal_handler.cc:230
230	    my_write_core(sig);
(gdb) up
#3  <signal handler called>
(gdb) up
#4  0x00003fff82b30a88 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
(gdb) up
#5  0x00003fff82b3686c in __GI_abort () at abort.c:89
(gdb) up
#6  0x000000001018dcb8 in rw_lock_x_lock_low (line=<optimized out>, file_name=<optimized out>, pass=<optimized out>, lock=<optimized out>) at /home/ubuntu/mysql-server/storage/innobase/sync/sync0rw.cc:552
552			ut_a(!lock->recursive);
(gdb) list
547	
548			/* lock->recursive also tells us if the writer_thread
549			field is stale or active. As we are going to write
550			our own thread id in that field it must be that the
551			current writer_thread value is not active. */
552			ut_a(!lock->recursive);
553	
554			/* Decrement occurred: we are writer or next-writer. */
555			rw_lock_set_writer_id_and_recursion_flag(
556				lock, pass ? FALSE : TRUE);
(gdb) up
#7  0x0000000010804ce0 in rw_lock_x_lock_low (line=<optimized out>, file_name=<optimized out>, pass=<optimized out>, lock=0x1003d10e808) at /home/ubuntu/mysql-server/storage/innobase/sync/sync0rw.cc:578
578					--lock->lock_word;
(gdb) list
573			    && os_thread_eq(lock->writer_thread, thread_id)) {
574				/* Relock */
575				if (lock->lock_word == 0) {
576					lock->lock_word -= X_LOCK_DECR;
577				} else {
578					--lock->lock_word;
579				}
580	
581			} else {
582				/* Another thread locked before us */
(gdb) print *lock
$1 = {lock_word = 0, waiters = 1, recursive = 1, writer_thread = 70356347187584, event = 0x1003d10e890, wait_ex_event = 0x1003d10e920, list = {prev = 0x1003d10fe38, next = 0x1003d0fc910}, pfs_psi = 0x0, count_os_wait = 2, 
  cfile_name = 0x10abf9c0 "/home/ubuntu/mysql-server/storage/innobase/btr/btr0sea.cc", last_s_file_name = 0x10abf9c0 "/home/ubuntu/mysql-server/storage/innobase/btr/btr0sea.cc", last_x_file_name = 0x10abf9c0 "/home/ubuntu/mysql-server/storage/innobase/btr/btr0sea.cc", 
  writer_is_wait_ex = 0, cline = 173, last_s_line = 907, last_x_line = 1654}
(gdb) quit
[19 Feb 2016 2:47] Daniel Axtens
Update category to Locking, not Logging.
[19 Sep 2016 3:06] Daniel Black
Patch against 5.7

Attachment: xx.patch (text/x-patch), 2.66 KiB.

[19 Sep 2016 3:08] Daniel Black
patch/bug also applies to 8.0
[15 Sep 2017 6:24] Daniel Black
Given https://github.com/mysql/mysql-server/blob/5.7/storage/innobase/sync/sync0rw.cc#L94

So lock->recursive is the same as isvalid(lock->writer_thread), and the implement supports the comment, so why have it as a variable? Removing this and using writer_thread instead would fix this bug.

also:

lock->writer_thread is compared against the current thread (determined by a library or syscall) with one exception, storage/innobase/sync/sync0arr.cc, where it is part of the the output of SHOW ENGINE INNODB STATUS.

For a non-GA release, would it make sense to have rw_lock_x_lock_func_nowait_inline take a mtr_t* argument an use that for the lock->writer_thread (perhaps renamed) to indicating that we are checking a lock we own? This would possible be a first step in supporting parallel queries by reducing the reliance on os threads an indicator of self trampling.