Bug #55223 assert in Protocol::end_statement during CREATE DATABASE
Submitted: 13 Jul 2010 17:00 Modified: 12 Aug 2010 19:58
Reporter: Matthias Leich Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.5.6-m3-debug-log OS:Any
Assigned to: Jon Olav Hauglid CPU Architecture:Any
Tags: assert
Triage: Triaged: D1 (Critical)

[13 Jul 2010 17:00] Matthias Leich
Description:
The assert happens in protocol.cc:540 :
void Protocol::end_statement()
{
...
  default:
    DBUG_ASSERT(0);   <-------
    error= send_ok(thd->server_status, 0, 0, 0, NULL);
    break;

RQG test with concurrent DDL/DML workload
- 32 threads
- current heavy simplified conf/runtime/WL5004_* grammars

Result on 5.5.6-m3-debug-log
(mysql-trunk-runtime revno: 3078 2010-07-13)
--------------------------------------------
#0  0x000000382860bd02 in pthread_kill () from /lib64/libpthread.so.0
#1  0x0000000000b56b5d in my_write_core (sig=6) at stacktrace.c:326
#2  0x0000000000699e13 in handle_segfault (sig=6) at mysqld.cc:2789
#3  <signal handler called>
#4  0x0000003827a30265 in raise () from /lib64/libc.so.6
#5  0x0000003827a31d10 in abort () from /lib64/libc.so.6
#6  0x0000003827a296e6 in __assert_fail () from /lib64/libc.so.6
#7  0x000000000068a06e in Protocol::end_statement (this=0xce23d10) at protocol.cc:540
#8  0x00000000006b549e in dispatch_command (command=COM_QUERY, thd=0xce23860, packet=0xce26561 "CREATE DATABASE testdb_N CHARACTER SET = utf8", packet_length=45) at sql_parse.cc:1507
#9  0x00000000006b581c in do_command (thd=0xce23860) at sql_parse.cc:807
#10 0x00000000006a3e25 in do_handle_one_connection (thd_arg=0xce23860) at sql_connect.cc:1195
#11 0x00000000006a3eeb in handle_one_connection (arg=0xce23860) at sql_connect.cc:1134
#12 0x000000382860673d in start_thread () from /lib64/libpthread.so.0
#13 0x0000003827ad3d1d in clone () from /lib64/libc.so.6

How to repeat:
My command line:
----------------
nice -20 runall.pl \
--gendata=conf/runtime/WL5004_data.zz \
--duration=200 \
--queries=30000 \
--reporter=Deadlock,Backtrace,Shutdown \
--threads=32 \
--basedir=<mysql install directory> \
--mysqld=--lock-wait-timeout=1 \
--mysqld=--innodb-lock-wait-timeout=2 \
--mysqld=--log-output=none \
--mysqld=--secure-file-priv=/tmp/ \
--mysqld=--loose-innodb-lock-wait-timeout=2 \
--grammar=1.yy \
--vardir=<I recommend a directory on a tmpfs>
--seed=131

gendata and seed values are not important.

Content of 1.yy:
----------------
query:
        CREATE DATABASE testdb_N CHARACTER SET = utf8 |
        COMMIT ; SELECT MIN(id) INTO @kill_id FROM test . executors ; COMMIT ; KILL QUERY @kill_id;

query_init:
        CREATE TABLE IF NOT EXISTS test . executors (id BIGINT, PRIMARY KEY(id)) ENGINE = MEMORY ; INSERT HIGH_PRIORITY IGNORE INTO test.executors SET id = CONNECTION_ID() ; COMMIT;

You might need more than one attempt but the probability
was on my box > 1 / 3.
It is also unsure if all statements above are needed for
replaying the problem.
[14 Jul 2010 17:03] Jon Olav Hauglid
Simplified grammar:

query:
        CREATE DATABASE testdb_N |
        SELECT MIN(id) INTO @kill_id FROM executors; KILL QUERY @kill_id;

query_init:
        CREATE TABLE executors (id INT); INSERT INTO executors SET id = CONNECTION_ID();

For the connection doing CREATE DATABASE:
  (gdb) p thd->killed
  $1 = THD::KILL_QUERY
  (gdb) p thd->mysys_var->abort
  $2 = 1

So it looks like doing KILL QUERY on the connection doing CREATE DATABASE at the wrong moment causes this assert to be triggered. I've verified that it's  possible to trigger this assert for other statements in a similar way.
[20 Jul 2010 11:26] Jon Olav Hauglid
More details:

CREATE DATABASE is aborted by KILL QUERY while CREATE DATABASE is waiting on a metadata lock (introduced by the 5.5 patch for bug#54360).
CREATE DATABASE exits with mysys_var->abort=1 but without thd->killed set.

Since CREATE DATABASE fails, my_ok() won't be called, and since thd->killed
is not set, thd->send_kill_message() is not called either. This triggers the assert.

THD::awake used by KILL QUERY sets thd->killed before mysys_var->abort,
but this must have been reordered by the compiler.
[20 Jul 2010 13:41] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/113953

3081 Jon Olav Hauglid	2010-07-20
      Bug #55223 assert in Protocol::end_statement during CREATE DATABASE
      
      The problem was that a statement could cause an assert if it was aborted by
      KILL QUERY while it waited on a metadata lock. This assert checks that a
      statement either sends OK or an error to the client. If the bug was triggered
      on release builds, it caused OK to be sent to the client instead of
      ER_QUERY_INTERRUPTED.
      
      The root cause of the problem is that there are two separate ways to tell if a
      statement is killed: thd->killed and mysys_var->abort. KILL QUERY causes both
      to be set, thd->killed before mysys_var->abort. The latter was used to check
      if a wait for a metadata lock should be aborted. If waiting is aborted, no OK
      message is sent and thd->killed is checked to see if ER_QUERY_INTERRUPTED
      should be sent to the client. The assert could be triggered if the compiler
      reordered the KILL QUERY code so that mysys_var->abort was set before
      thd->killed, and the aborted statement reached the thd->killed check before
      thd->killed was set by KILL QUERY.
      
      This patch fixes the problem by changing the metadata lock waiting code to
      check thd->killed.
      
      No test case added as reproducing the assert is dependent on very exact timing
      of two (or more) threads and compiler reording of instructions. The patch has
      been checked using RQG and the grammar posted on the bug report.
[21 Jul 2010 15:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/114049

3081 Jon Olav Hauglid	2010-07-21
      Bug #55223 assert in Protocol::end_statement during CREATE DATABASE
      
      The problem was that a statement could cause an assert if it was aborted by
      KILL QUERY while it waited on a metadata lock. This assert checks that a
      statement either sends OK or an error to the client. If the bug was triggered
      on release builds, it caused OK to be sent to the client instead of
      ER_QUERY_INTERRUPTED.
      
      The root cause of the problem is that there are two separate ways to tell if a
      statement is killed: thd->killed and mysys_var->abort. KILL QUERY causes both
      to be set, thd->killed before mysys_var->abort. The latter was used to check
      if a wait for a metadata lock should be aborted. If waiting is aborted, no OK
      message is sent and thd->killed is checked to see if ER_QUERY_INTERRUPTED
      should be sent to the client. The assert could be triggered if the compiler
      reordered the KILL QUERY code so that mysys_var->abort was set before
      thd->killed, and the aborted statement reached the thd->killed check before
      thd->killed was set by KILL QUERY.
      
      This patch fixes the problem by changing the metadata lock waiting code to
      check thd->killed.
      
      No test case added as reproducing the assert is dependent on very exact timing
      of two (or more) threads and compiler reording of instructions. The patch has
      been checked using RQG and the grammar posted on the bug report.
[22 Jul 2010 8:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/114113

3081 Jon Olav Hauglid	2010-07-22
      Bug #55223 assert in Protocol::end_statement during CREATE DATABASE
      
      The problem was that a statement could cause an assert if it was aborted by
      KILL QUERY while it waited on a metadata lock. This assert checks that a
      statement either sends OK or an error to the client. If the bug was triggered
      on release builds, it caused OK to be sent to the client instead of
      ER_QUERY_INTERRUPTED.
      
      The root cause of the problem was that there are two separate ways to tell if a
      statement is killed: thd->killed and mysys_var->abort. KILL QUERY causes both
      to be set, thd->killed before mysys_var->abort. Also, both values are reset
      at the end of statement execution. This means that it is possible for
      KILL QUERY to first set thd->killed, then have the killed statement reset
      both thd->killed and mysys_var->abort and finally have KILL QUERY set
      mysys_var->abort. This means that the connection with the killed statement
      will start executing the next statement with the two values out of sync - i.e.
      thd->killed not set but mysys_var->abort set.
      
      Since mysys_var->abort is used to check if a wait for a metadata lock should
      be aborted, the next statement would immediately abort any such waiting.
      When waiting is aborted, no OK message is sent and thd->killed is checked to
      see if ER_QUERY_INTERRUPTED should be sent to the client. But since
      the->killed had been reset, neither OK nor an error message was sent to the
      client. This then triggered the assert.
      
      This patch fixes the problem by changing the metadata lock waiting code to
      check thd->killed.
      
      No test case added as reproducing the assert is dependent on very exact timing
      of two (or more) threads. The patch has been checked using RQG and the grammar
      posted on the bug report.
[22 Jul 2010 8:36] Jon Olav Hauglid
Pushed to mysql-trunk-runtime (Ver 5.5.6).
[4 Aug 2010 7:53] Bugs System
Pushed into mysql-trunk 5.5.6-m3 (revid:alik@sun.com-20100731131027-1n61gseejyxsqk5d) (version source revid:marko.makela@oracle.com-20100621094008-o9fa153s3f09merw) (merge vers: 5.1.49) (pib:18)
[4 Aug 2010 8:04] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:marko.makela@oracle.com-20100621094008-o9fa153s3f09merw) (merge vers: 5.1.49) (pib:18)
[4 Aug 2010 8:20] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804081533-c1d3rbipo9e8rt1s) (version source revid:marko.makela@oracle.com-20100621094008-o9fa153s3f09merw) (merge vers: 5.1.49) (pib:18)
[4 Aug 2010 9:01] Bugs System
Pushed into mysql-next-mr (revid:alik@ibmvm-20100804081630-ntapn8bf9pko9vj3) (version source revid:marko.makela@oracle.com-20100621094008-o9fa153s3f09merw) (pib:20)
[12 Aug 2010 19:58] Paul Dubois
Noted in 5.5.6 changelog.

A statement that was aborted by KILL QUERY while it waited on a
metadata lock could raise an assertion in debug builds, or send OK to
the client instead of ER_QUERY_INTERRUPTED in regular builds.