Bug #49603 Assertion in mi_create on TRUNCATE + BACKUP/RESTORE
Submitted: 10 Dec 2009 20:29 Modified: 2 Mar 2010 1:36
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Backup Severity:S2 (Serious)
Version:6.0-backup OS:Any
Assigned to: Rafal Somla CPU Architecture:Any

[10 Dec 2009 20:29] Philip Stoev
Description:
When executing RESTORE concurrently with DDL, mysqld asserted as follows:

mysqld: mi_create.c:651: mi_create: Assertion `(options & 16) || !mi_log_tables_physical || !my_hash_search(mi_log_tables_physical, filename, strlen(filename))' failed.

#6  0x000000315a42bec9 in __assert_fail () from /lib64/libc.so.6
#7  0x00000000009bc8a0 in mi_create (name=0x7f9e29950b40 "./testdb_N/t1_base7_N", keys=2, keydefs=0x1fc1898, columns=4, recinfo=0x1fc1718, uniques=0,
    uniquedefs=0x0, ci=0x7f9e29950d40, flags=0) at mi_create.c:649
#8  0x00000000009cf331 in ha_myisam::create (this=0x2381ba0, name=0x7f9e29951f80 "./testdb_N/t1_base7_N", table_arg=0x7f9e29950e90,
    ha_create_info=0x7f9e29952190) at ha_myisam.cc:1886
#9  0x00000000007d68ad in handler::ha_create (this=0x2381ba0, name=0x7f9e29951f80 "./testdb_N/t1_base7_N", form=0x7f9e29950e90, info=0x7f9e29952190)
    at handler.cc:3421
#10 0x00000000007db481 in ha_create_table (thd=0x7f9e242aa978, path=0x7f9e29951f80 "./testdb_N/t1_base7_N", db=0x24a2360 "testdb_N",
    table_name=0x24a2370 "t1_base7_N", create_info=0x7f9e29952190, update_create_info=true) at handler.cc:3628
#11 0x00000000007557f1 in mysql_truncate (thd=0x7f9e242aa978, table_list=0x24a23b8, dont_send_ok=false) at sql_delete.cc:1260
#12 0x0000000000688228 in mysql_execute_command (thd=0x7f9e242aa978) at sql_parse.cc:3373
#13 0x000000000068d71b in mysql_parse (thd=0x7f9e242aa978, inBuf=0x24a22b0 "TRUNCATE  testdb_N . t1_base7_N", length=31, found_semicolon=0x7f9e29953f00)
    at sql_parse.cc:5985
#14 0x000000000068e2ac in dispatch_command (command=COM_QUERY, thd=0x7f9e242aa978, packet=0x7f9e24221489 "TRUNCATE  testdb_N . t1_base7_N ",
    packet_length=32) at sql_parse.cc:1078
#15 0x000000000068f81e in do_command (thd=0x7f9e242aa978) at sql_parse.cc:760
#16 0x000000000067c644 in handle_one_connection (arg=0x7f9e242aa978) at sql_connect.cc:1164
#17 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#18 0x000000315a4e627d in clone () from /lib64/libc.so.6

644       /*
645         TRUNCATE TABLE does not work with physical logging. If we changed TRUNCATE
646         to always use mi_delete_all_rows() (remove HTON_CAN_RECREATE from MyISAM)
647         this would solve the problem.
648      */
649       DBUG_ASSERT((options & HA_OPTION_TMP_TABLE) || !mi_log_tables_physical ||
650                   !my_hash_search(mi_log_tables_physical, filename,
651                                   strlen(filename)));
652
653       if ((file= my_create_with_symlink(linkname_ptr, filename, 0, create_mode,

How to repeat:
If this is repeatable, a test case will be provided. In the meantime, the core and the binary will be made available.
[10 Dec 2009 20:38] Philip Stoev
c:\docs\mysql\bugs\bug49603.threads

Attachment: bug49603.threads.txt (text/plain), 39.22 KiB.

[10 Dec 2009 20:40] Philip Stoev
core and binary:

http://mysql-systemqa.s3.amazonaws.com/var-bug49603.zip

code:

revision-id: rafal.somla@sun.com-20091210124049-98jyu3nog0vmhbvg
date: 2009-12-10 13:40:49 +0100
build-date: 2009-12-10 22:30:51 +0200
revno: 2907
branch-nick: mysql-6.0-backup

The thread stacks have been uploaded separately. Note that the crash is on TRUNCATE and the other concurrently executing thread is a BACKUP. This is odd, because the log file lists the last operation before the crash as a RESTORE:

091210 17:33:22 [Note] Restore: Starting restore process
091210 17:33:22 [ERROR] Restore: Can't execute this command because another BACKUP/RESTORE operation is in progress
091210 17:33:22 [Warning] Restore: Operation aborted
mysqld: mi_create.c:651: mi_create: Assertion `(options & 16) || !mi_log_tables_physical || !my_hash_search(mi_log_tables_physical, filename, strlen(filename))' failed.
[11 Dec 2009 8:22] Rafal Somla
Looking at the threads I can see these non-idle ones:

T32: BACKUP: polling myisam driver
T35: myisam locking thread: open_and_lock_tables

T1:  TRUNCATE TABLE: creating table
T2:  FLUSH TABLE: general_log_write
T3:  ?: close_thread_tables
T5:  FLUSH TABLE: reload_acl_and_cache
T10: DELETE FROM
T14: CREATE TABLE: openning tables
T16: CREATE TRIGGER: locking table
T27: DEALLOCATE PREPARE st1
T36: LOCK TABLES

What looks wrong is that there are some DDL statements running in parallel with BACKUP - they should be disabled by the Backup Metadata Lock (sql_parse.cc:2067). Is BML broken?
[11 Dec 2009 12:53] Rafal Somla
REFINED PROBLEM DESCRIPTION
---------------------------
In short: second BACKUP/RESTORE operation can remove BML lock which was established by another such operation which has not finished yet.

Suppose that BACKUP operation is running. At the beginning it calls obs::bml_get() (see Backup_restore_ctx::prepare()) which sets BML_instance::xlock_set to thd value. When BML_instance::xlock_set is not NULL, then other DDLs are blocked as expected.

Now imagine that another BACKUP/RESTORE operation is started. At the beginning, in Backup_restore_ctx::prepare() (line 670), it checks if another BACKUP/RESTORE operation is running. Since this is the case, preparations fail and operation aborts with error. But then Backup_restore_ctx destructor is called, which calls Backup_restore_ctx::close() which calls obs::bml_release().

The obs::bml_relese() function sets BML_instance::xlock_set to NULL, even though it was set by another thread which is holding exclusive BML lock. This is the problem.

PROPOSED SOLUTION
-----------------
Fix obs::bml_release so that it releases the BML only if called from the thread which has acquired it. This can be done by modifying Backup_sx_lock::release_exclusive_lock() (in bml.cc) so that it checks if xlock_set equals to the current thread before releasing it.

Alternative solution: have a flag in Backup_restore_ctx which indicates if BML has been acquired or not. Release it only if it was acquired.
[15 Dec 2009 11:44] Rafal Somla
Stealing this bug from Thava who will work on other ones.
[16 Dec 2009 14:49] Rafal Somla
I can repeat the problem with the following test case:
--------------------------------------------------------------------------------
call mtr.add_suppression("Can't execute this command because another BACKUP/RESTORE operation is in progress");

--connect(bup1,localhost,root,,)
--connect(bup2,localhost,root,,)

  --connection bup1
  SET DEBUG_SYNC= 'after_backup_restore_prepare SIGNAL bml_taken WAIT_FOR bup_cont';
  send BACKUP DATABASE test TO 'test.bkp';

    --connection bup2
    --echo # Waiting for BACKUP.
    SET DEBUG_SYNC= 'now WAIT_FOR bml_taken';
    --echo # Another BACKUP should remove the BML
    --error 1651
    BACKUP DATABASE test TO 'test.bkp';

    --echo # Now run a DDL
    CREATE DATABASE db1;

    --echo # Check that BACKUP operation is in progress.
    SHOW PROCESSLIST;

    --echo # Check that db1 exists, i.e. DDL was not blocked.
    SHOW DATABASES LIKE 'db%';

    --echo # Signal BACKUP to finish.
    SET DEBUG_SYNC= 'now SIGNAL bup_cont';
  --connection bup1
  reap;

--connection default
--echo # Cleanup
DROP DATABASE db1;
SET DEBUG_SYNC= 'reset';
--exit
--------------------------------------------------------------------------------

Note that DDL in connection bup2 gets executed despite the fact that BACKUP has taken a BML lock.

One question is why our tests do not catch this error.
[17 Dec 2009 15:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/94787

2918 Rafal Somla	2009-12-17
      BUG#49603 - Assertion in mi_create on TRUNCATE + BACKUP/RESTORE
      
      Problem: A BML lock which has been activated by a BACKUP/RESTORE operation,
      can be deactivated by a concurrently issued BACKUP/RESTORE even when 
      the latter detects that another BACKUP/RESTORE is running and refuses to work.
      
      Solution: Have the obs::bml_release() function accept a THD parameter and 
      release the BML lock only if called by the owner of that lock. In fact, 
      this is done on the level of Backup_sx_lock implementation.
      
      Backup_bml_block test is updated to catch the issue.
     @ mysql-test/suite/backup/t/backup_bml_block.test
        After trying to run concurrent BACKUP/RESTORE operation, issue a DDL 
        statement and check that it is still blocked. Before this fix it will not be 
        because BML lock is removed by concurrent BACKUP/RESTORE.
     @ sql/backup/data_backup.cc
        Pass thd to bcb_release() call.
     @ sql/backup/kernel.cc
        Pass thd to bml_release() call.
     @ sql/bml.cc
        In Backup_sx_lock::release_exclusive_lock() add THD parameter and release the
        lock only if called by its owner.
     @ sql/bml.h
        Add THD parameter to release_exclusive_lock() method.
     @ sql/si_objects.cc
        Account for new THD parameter to release_exclusive_lock(thd).
     @ sql/si_objects.h
        Add THD parameter to X_release() functions.
[18 Dec 2009 11:49] Jørgen Løland
Patch approved, pending changes requested on reply to commit email.
[19 Dec 2009 15:05] Rafal Somla
Changing to "patch approved" since both reviewers have approved.
[19 Dec 2009 15:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/95106

2922 Rafal Somla	2009-12-19
      BUG#49603 - Assertion in mi_create on TRUNCATE + BACKUP/RESTORE
      
      Problem: A BML lock which has been activated by a BACKUP/RESTORE operation,
      can be deactivated by a concurrently issued BACKUP/RESTORE even when 
      the latter detects that another BACKUP/RESTORE is running and refuses to work.
      
      Solution: Have the obs::bml_release() function accept a THD parameter and 
      release the BML lock only if called by the owner of that lock. In fact, 
      this is done on the level of Backup_sx_lock implementation.
      
      Backup_bml_block test is updated to catch the issue.
     @ mysql-test/suite/backup/t/backup_bml_block.test
        After trying to run concurrent BACKUP/RESTORE operation, issue a DDL 
        statement and check that it is still blocked. Before this fix it will not be 
        because BML lock is removed by concurrent BACKUP/RESTORE.
     @ sql/backup/data_backup.cc
        Pass thd to bcb_release() call.
     @ sql/backup/kernel.cc
        Pass thd to bml_release() call.
     @ sql/bml.cc
        In Backup_sx_lock::release_exclusive_lock() add THD parameter and release the
        lock only if called by its owner.
     @ sql/bml.h
        Add THD parameter to release_exclusive_lock() method.
     @ sql/si_objects.cc
        Account for new THD parameter to release_exclusive_lock(thd).
     @ sql/si_objects.h
        Add THD parameter to {bcb,bml}_release() functions.
[20 Dec 2009 16:52] Rafal Somla
Pushed to mysql-6.0-backup tree.
revid:rafal.somla@sun.com-20091219152303-6b2wt5xca0d7bf00
[20 Feb 2010 9:18] Bugs System
Pushed into 6.0.14-alpha (revid:ingo.struewing@sun.com-20100218152520-s4v1ld76bif06eqn) (version source revid:ingo.struewing@sun.com-20100119103538-wtp5alpz4p2jayl5) (merge vers: 6.0.14-alpha) (pib:16)
[2 Mar 2010 1:36] Paul DuBois
Noted in 6.0.14 changelog.

The backup metadata lock put in place by one BACKUP DATABASE or
RESTORE operation could incorrectly be deactivated by another such
operation issued concurrently.