MySQL Bugs: #16333: Assertion failure in lock.cc in the sysbench test

Bug #16333	Assertion failure in lock.cc in the sysbench test
Submitted:	10 Jan 2006 11:17	Modified:	16 Feb 2006 9:32
Reporter:	Ingo Strüwing	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server	Severity:	S1 (Critical)
Version:	5.0.19	OS:	Linux (Linux SMP)
Assigned to:	Magnus Blåudd	CPU Architecture:	Any

Description:
Osku Salerma <osku.salerma@oracle.com> writes:

I can reproduce the problem reliably running sysbench as follows:

./sysbench --test=oltp --mysql-table-type=innodb --oltp-table-size=1000000 --mysql-db=test prepare

./sysbench --num-threads=8 --max-requests=100000 --test=oltp --mysql-table-type=innodb --oltp-table-size=1000000 --mysql-db=test run

It crashes ~18 minutes into the run. The machine has 2 x Xeon 3.2 GHz
(hyper-threaded, 4 logical CPUs) CPUs and 6 GB of RAM. However, I can not
reproduce the crash using a large buffer pool and log file size, only with
the default settings (only non-standard settings are innodb_file_per_table
and skip-networking).

How to repeat:
The following makes it easier to repeat:

Change the source so: add the assertion
  safe_mutex_assert_not_owner(&LOCK_open);
at the end of sql/sql_parse.cc:dispatch_command() right after the line
  thd->proc_info="cleaning up";

And add the same assertion at the beginning of 
sql/sql_parse.cc:mysql_execute_command() right after
  DBUG_ENTER("mysql_execute_command");

The result should be:

Osku: "I added both of the above and now it crashes very soon after starting sysbench."

Suggested fix:
In mysys/thr_mutex.c:safe_mutex_unlock() add
  mp->thread= 0;
at the end of the function right before
  pthread_mutex_unlock(&mp->global);

The result should be:

Osku: "I've now run the sysbench tests twice with the "mp->thread = 0" fix
applied, once with 8 threads and once with 16, and both went fine."

Background:

In mysys/thr_mutex.c:safe_mutex_unlock() the 'thread' element is not
cleared. In mysys/thr_mutex.c:safe_mutex_lock() 'count' is incremented
before 'thread' is set. So there is a short moment where it seems like
the old thread has the mutex. The macro 'safe_mutex_assert_not_owner()'
is vulnerable to this moment of wrong mutex assignment.

Another possible fix would be to make the macro into a function
and grab mp->global while doing the check. But I think it is less efficient
then the first suggestion, which seems to be more correct anyway.

BTW, this is an issue in -debug servers only (safe_mutex).

Increased priority, severity and showstopper state since noone cared for verification of this bug since a full month.

Osku Salerma <osku.salerma@oracle.com> writes:

I thought you said you were going to take care of this? This is important since new 5.1 versions of InnoDB will not work reliably without the fix.

Verified just as described (with code patches added) on quadxeon:

vkravchuk@quadxeon sysbench $ bin/sysbench --test=oltp --mysql-socket=/tmp/vk.sock --mysql-table-engine=innodb --num-threads=8 --max-requests=100000 --oltp-table-size=1000000 --mysql-db=test run
sysbench v0.4.4-sp:  multi-threaded system evaluation benchmark

No DB drivers specified, using mysql
WARNING: Preparing of "BEGIN" is unsupported, using emulation
(last message repeated 7 times)
Running the test with following options:
Number of threads: 8

Doing OLTP test.
Running mixed OLTP test
Using Special distribution (12 iterations,  1 pct of values are returned in 75 p
ct cases)
Using "BEGIN" for starting transactions
Maximum number of requests for OLTP test is limited to 100000
Threads started!
ALERT: failed to execute mysql_stmt_execute(): Err2013 Lost connection to MySQL
server during query
FATAL: database error, exiting...
ALERT: failed to execute mysql_stmt_execute(): Err2013 Lost connection to MySQL
server during query
FATAL: database error, exiting...
ALERT: failed to execute mysql_stmt_execute(): Err2013 Lost connection to MySQL
server during query
ALERT: failed to execute MySQL query: `BEGIN`:
ALERT: failed to execute mysql_stmt_execute(): Err2013 Lost connection to MySQL
server during query
FATAL: database error, exiting...
(last message repeated 1 times)
ALERT: Error 2013 Lost connection to MySQL server during query
ALERT: failed to execute mysql_stmt_execute(): Err2013 Lost connection to MySQL
server during query
FATAL: database error, exiting...
(last message repeated 1 times)
ALERT: failed to execute mysql_stmt_execute(): Err2013 Lost connection to MySQL
server during query
FATAL: database error, exiting...
(last message repeated 2 times)
Done.

vkravchuk@quadxeon sysbench $
Number of processes running now: 0
060212 16:03:05  mysqld restarted

vkravchuk@quadxeon sysbench $ cd ~/dbs/5.0-quad/
vkravchuk@quadxeon 5.0-quad $ tail -50 var/quadxeon.mysql.com.err
Cannot determine thread, fp=0xb4c30548, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x815756c
0xb7569d66
0xb7329e01
0xb756758b
0xb7329ba4
0xb732b10b
0xb7322e99
0x816bc57
0x816aa5b
0x8169d66
0xb7564e21
0xb73df08a
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow
instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x8d68ca0 = COMMIT
thd->thread_id=8
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.

Number of processes running now: 0
060212 16:03:05  mysqld restarted
060212 16:03:07  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
060212 16:03:07  InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 0 529348492.
InnoDB: Doing recovery: scanned up to log sequence number 0 530020807
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 5 row operations to undo
InnoDB: Trx id counter is 0 3072
060212 16:03:07  InnoDB: Starting an apply batch of log records to the database.
..
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 7
3 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99

InnoDB: Apply batch completed
InnoDB: Last MySQL binlog file position 0 0, file name
InnoDB: Starting in background the rollback of uncommitted transactions
060212 16:03:10  InnoDB: Rolling back trx with id 0 2730, 5 rows to undo
060212 16:03:10  InnoDB: Started; log sequence number 0 530020807

InnoDB: Rolling back of trx id 0 2730 completed
060212 16:03:10  InnoDB: Rollback of non-prepared transactions completed
060212 16:03:10 [Note] /users/vkravchuk/dbs/5.0-quad/libexec/mysqld: ready for connections.
Version: '5.0.18-debug'  socket: '/tmp/vk.sock'  port: 0  Source distribution

So, yes, there is a crash. Works OK on 5.0.19 on a single-processor system.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/2581

Pushed to 5.0.19 and 5.1.7