Bug #38933 Falcon crash in Transaction::hasRecords during concurrent DDL
Submitted: 21 Aug 2008 9:56 Modified: 30 Sep 2008 14:13
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S1 (Critical)
Version:6.0 OS:Any
Assigned to: Vladislav Vaintroub CPU Architecture:Any

[21 Aug 2008 9:56] Philip Stoev
Description:
When executing the falcon_online_alter Random query generation test, mysqld crashed as follows:

#0  0x00be8402 in __kernel_vsyscall ()
#1  0x0011a067 in pthread_kill () from /lib/libpthread.so.0
#2  0x087d5ea2 in my_write_core (sig=11) at stacktrace.c:307
#3  0x082a8f68 in handle_segfault (sig=11) at mysqld.cc:2657
#4  <signal handler called>
#5  0x08541876 in Transaction::hasRecords (this=0xb7248078, table=0xb7464d40) at Transaction.cpp:942
#6  0x0854772e in TransactionManager::hasUncommittedRecords (this=0xb70e1e68, table=0xb7464d40, transaction=0x0) at TransactionManager.cpp:172
#7  0x08562bf7 in Database::hasUncommittedRecords (this=0xb72db640, table=0xb7464d40, transaction=0x0) at Database.cpp:2282
#8  0x08564b8d in Database::dropTable (this=0xb72db640, table=0xb7464d40, transaction=0x0) at Database.cpp:1401
#9  0x085f30fc in Nfs::Statement::dropTable (this=0xb7267be8, syntax=0xb72c60b8) at Statement.cpp:1144
#10 0x085f69a7 in Nfs::Statement::executeDDL (this=0xb7267be8) at Statement.cpp:832
#11 0x085f7954 in Nfs::Statement::execute (this=0xb7267be8, sqlString=0xa7f82524 "drop table TEST.\"#SQL-5DF6_7\"", isQuery=false) at Statement.cpp:2701
#12 0x085f79d0 in Nfs::Statement::execute (this=0xb7267be8, sqlString=0xa7f82524 "drop table TEST.\"#SQL-5DF6_7\"") at Statement.cpp:161
#13 0x08521486 in StorageDatabase::deleteTable (this=0xb70da130, storageConnection=0xb725ba50, tableShare=0xb7260fa0) at StorageDatabase.cpp:543
#14 0x0852a060 in StorageTableShare::deleteTable (this=0xb7260fa0, storageConnection=0xb725ba50) at StorageTableShare.cpp:178
#15 0x08527e84 in StorageTable::deleteTable (this=0xb44bece0) at StorageTable.cpp:88
#16 0x08514502 in StorageInterface::delete_table (this=0xa7260680, tableName=0xa722764c "./test/#sql-5df6_7") at ha_falcon.cpp:1004
#17 0x083cf00e in handler::ha_delete_table (this=0xa7260680, name=0xa722764c "./test/#sql-5df6_7") at handler.cc:3196
#18 0x082fdbe2 in rm_temporary_table (base=0xb2fd770, path=0xa722764c "./test/#sql-5df6_7", frm_only=false) at sql_base.cc:4635
#19 0x082ff8c9 in close_temporary (table=0xa7226d60, free_share=true, delete_table=true) at sql_base.cc:1968
#20 0x082ffa7a in close_temporary_table (thd=0xb8a0098, table=0xa7226d60, free_share=true, delete_table=true) at sql_base.cc:1940
#21 0x083f6047 in mysql_alter_table (thd=0xb8a0098, new_db=0xbc863c8 "test", new_name=0xbc861a0 "CCC", create_info=0xa7f83698, table_list=0xbc861c8,
    alter_info=0xa7f83b7c, order_num=0, order=0x0, ignore=false) at sql_table.cc:7129
#22 0x082bbd33 in mysql_execute_command (thd=0xb8a0098) at sql_parse.cc:2705
#23 0x082c330b in mysql_parse (thd=0xb8a0098,
    inBuf=0xbc85e60 "ALTER TABLE CCC ADD KEY i ( datetime_key , time_nokey ) , ADD UNIQUE p ( time_nokey , int_nokey )", length=97,
    found_semicolon=0xa7f84280) at sql_parse.cc:5841
#24 0x082c3d14 in dispatch_command (command=COM_QUERY, thd=0xb8a0098,
    packet=0xb9e4441 "ALTER TABLE CCC ADD KEY i ( datetime_key , time_nokey ) , ADD UNIQUE p ( time_nokey , int_nokey )", packet_length=97)
    at sql_parse.cc:1120
#25 0x082c4fd5 in do_command (thd=0xb8a0098) at sql_parse.cc:807
#26 0x082b1911 in handle_one_connection (arg=0xb8a0098) at sql_connect.cc:1153
#27 0x0011545b in start_thread () from /lib/libpthread.so.0
#28 0x0026fc4e in clone () from /lib/libc.so.6

937     {
938             // This lock is to avoid race with writeComplete
939             Sync sync(&syncIndexes, "Transaction::hasRecords");
940             sync.lock(Exclusive);
941             for (RecordVersion *rec = firstRecord; rec; rec = rec->nextInTrans)
942   >>>>>>>>>>>>      if (rec->format->table == table) <<<<<<<<<< HERE
943                             return true;
944
945             return false;
946     }

(gdb) print rec
$2 = (class RecordVersion *) 0xeeeeeeee

Note that the backtrace appears to show that this happens during the removal of a temporary table created while performing a non-online ALTER.

How to repeat:
This failure was observed while testing the new PushBuild. If it happens again, more information will be provided.
[21 Aug 2008 10:12] Vladislav Vaintroub
Philip, could you please publish the full stack trace of all threads?
thanks
[21 Aug 2008 10:31] Philip Stoev
Stack threads for bug 38933

Attachment: bug38933.stacks.txt (text/plain), 34.90 KiB.

[21 Aug 2008 13:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/52206

2792 Vladislav Vaintroub	2008-08-21
      Bug #38933 - crash if rollback and concurrent DDL operations
      
      Problem : Transaction record list is modified and entries of
      the list are freed during Transaction::rollback.If there is 
      a parallel execution of DDL statement, DDL  will traverse the 
      list and can crash accessing freed memory
      
      Fix: protect record list on Transaction::rollback. The syncObject
      for this purpose has always been syncIndexes and it is used here
      once again. syncIndexes usage is possibly a hack. TODO: have 
      dedicated syncObject for protecting the list.
[22 Aug 2008 20:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/52356

2798 Vladislav Vaintroub	2008-08-22
      Bug#38933 - Falcon crash in Transaction::hasRecords during concurrent DDL
      
      The problem is that Transaction::hasRecords traverses  transaction's record
      list while a concurrent operation (rollback, writeComplete,addRecord) may 
      concurrently modify it.
      
      This patch removed whacko-reuse of the Transaction::syncIndexes to protect
      the record list and introduces new lock Transaction::syncRecords for exactly 
      this purpose.
      
      (hopefully) all places where the list is read or modified are now protected 
      by this lock. When the list is freed, firstRecord  is set to NULL to prevent
      crash in parallel Transaction::hasRecords() can crash.
[25 Aug 2008 18:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/52482

2798 Vladislav Vaintroub	2008-08-25
      Bug#38933 - Falcon crash in Transaction::hasRecords during concurrent DDL
            
      The problem is that Transaction::hasRecords traverses  transaction's record
      list while a concurrent operation (rollback, writeComplete,addRecord) may 
      concurrently modify it.
           
      This patch removed whacko-reuse of the Transaction::syncIndexes to protect
      the record list and introduces new lock Transaction::syncRecords for exactly 
      this purpose.
            
      Hopefully all places where the list is read or modified are now protected 
      by this lock,except Transaction destructor, where there should be nobody 
      accessing the list concurrently. When the list is freed, firstRecord  is 
      set to NULL to prevent crash in parallel Transaction::hasRecords().
[25 Aug 2008 18:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/52484

2798 Vladislav Vaintroub	2008-08-25
      Bug#38933 - Falcon crash in Transaction::hasRecords during concurrent DDL
            
      The problem is that Transaction::hasRecords traverses  transaction's record
      list while a concurrent operation (rollback, writeComplete,addRecord) may 
      concurrently modify it.
           
      This patch removed whacko-reuse of the Transaction::syncIndexes to protect
      the record list and introduces new lock Transaction::syncRecords for exactly 
      this purpose.
            
      Hopefully all places where the list is read or modified are now protected 
      by this lock,except Transaction destructor, where there should be nobody 
      accessing the list concurrently. When the list is freed, firstRecord  is 
      set to NULL to prevent crash in parallel Transaction::hasRecords().
[28 Aug 2008 4:29] Bugs System
Pushed into 6.0.7-alpha  (revid:vvaintroub@mysql.com-20080825182459-owxl767xkqe4jkf9) (version source revid:vvaintroub@mysql.com-20080827094944-sh582y3m7duhzrpj) (pib:3)
[13 Sep 2008 21:58] Bugs System
Pushed into 6.0.6-alpha  (revid:vvaintroub@mysql.com-20080825182459-owxl767xkqe4jkf9) (version source revid:hakan@mysql.com-20080716105246-eg0utbybp122n2w9) (pib:3)
[30 Sep 2008 14:13] Jon Stephens
Documented as follows in the 6.0.6 changelog:

        Trying to execute a DDL statement while a transaction was being rolled
        back could cause the server to crash.