| Bug #49603 | Assertion in mi_create on TRUNCATE + BACKUP/RESTORE | ||
|---|---|---|---|
| Submitted: | 10 Dec 2009 20:29 | Modified: | 2 Mar 2010 1:36 |
| Reporter: | Philip Stoev | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Backup | Severity: | S2 (Serious) |
| Version: | 6.0-backup | OS: | Any |
| Assigned to: | Rafal Somla | CPU Architecture: | Any |
[10 Dec 2009 20:38]
Philip Stoev
c:\docs\mysql\bugs\bug49603.threads
Attachment: bug49603.threads.txt (text/plain), 39.22 KiB.
[10 Dec 2009 20:40]
Philip Stoev
core and binary: http://mysql-systemqa.s3.amazonaws.com/var-bug49603.zip code: revision-id: rafal.somla@sun.com-20091210124049-98jyu3nog0vmhbvg date: 2009-12-10 13:40:49 +0100 build-date: 2009-12-10 22:30:51 +0200 revno: 2907 branch-nick: mysql-6.0-backup The thread stacks have been uploaded separately. Note that the crash is on TRUNCATE and the other concurrently executing thread is a BACKUP. This is odd, because the log file lists the last operation before the crash as a RESTORE: 091210 17:33:22 [Note] Restore: Starting restore process 091210 17:33:22 [ERROR] Restore: Can't execute this command because another BACKUP/RESTORE operation is in progress 091210 17:33:22 [Warning] Restore: Operation aborted mysqld: mi_create.c:651: mi_create: Assertion `(options & 16) || !mi_log_tables_physical || !my_hash_search(mi_log_tables_physical, filename, strlen(filename))' failed.
[11 Dec 2009 8:22]
Rafal Somla
Looking at the threads I can see these non-idle ones: T32: BACKUP: polling myisam driver T35: myisam locking thread: open_and_lock_tables T1: TRUNCATE TABLE: creating table T2: FLUSH TABLE: general_log_write T3: ?: close_thread_tables T5: FLUSH TABLE: reload_acl_and_cache T10: DELETE FROM T14: CREATE TABLE: openning tables T16: CREATE TRIGGER: locking table T27: DEALLOCATE PREPARE st1 T36: LOCK TABLES What looks wrong is that there are some DDL statements running in parallel with BACKUP - they should be disabled by the Backup Metadata Lock (sql_parse.cc:2067). Is BML broken?
[11 Dec 2009 12:53]
Rafal Somla
REFINED PROBLEM DESCRIPTION --------------------------- In short: second BACKUP/RESTORE operation can remove BML lock which was established by another such operation which has not finished yet. Suppose that BACKUP operation is running. At the beginning it calls obs::bml_get() (see Backup_restore_ctx::prepare()) which sets BML_instance::xlock_set to thd value. When BML_instance::xlock_set is not NULL, then other DDLs are blocked as expected. Now imagine that another BACKUP/RESTORE operation is started. At the beginning, in Backup_restore_ctx::prepare() (line 670), it checks if another BACKUP/RESTORE operation is running. Since this is the case, preparations fail and operation aborts with error. But then Backup_restore_ctx destructor is called, which calls Backup_restore_ctx::close() which calls obs::bml_release(). The obs::bml_relese() function sets BML_instance::xlock_set to NULL, even though it was set by another thread which is holding exclusive BML lock. This is the problem. PROPOSED SOLUTION ----------------- Fix obs::bml_release so that it releases the BML only if called from the thread which has acquired it. This can be done by modifying Backup_sx_lock::release_exclusive_lock() (in bml.cc) so that it checks if xlock_set equals to the current thread before releasing it. Alternative solution: have a flag in Backup_restore_ctx which indicates if BML has been acquired or not. Release it only if it was acquired.
[15 Dec 2009 11:44]
Rafal Somla
Stealing this bug from Thava who will work on other ones.
[16 Dec 2009 14:49]
Rafal Somla
I can repeat the problem with the following test case:
--------------------------------------------------------------------------------
call mtr.add_suppression("Can't execute this command because another BACKUP/RESTORE operation is in progress");
--connect(bup1,localhost,root,,)
--connect(bup2,localhost,root,,)
--connection bup1
SET DEBUG_SYNC= 'after_backup_restore_prepare SIGNAL bml_taken WAIT_FOR bup_cont';
send BACKUP DATABASE test TO 'test.bkp';
--connection bup2
--echo # Waiting for BACKUP.
SET DEBUG_SYNC= 'now WAIT_FOR bml_taken';
--echo # Another BACKUP should remove the BML
--error 1651
BACKUP DATABASE test TO 'test.bkp';
--echo # Now run a DDL
CREATE DATABASE db1;
--echo # Check that BACKUP operation is in progress.
SHOW PROCESSLIST;
--echo # Check that db1 exists, i.e. DDL was not blocked.
SHOW DATABASES LIKE 'db%';
--echo # Signal BACKUP to finish.
SET DEBUG_SYNC= 'now SIGNAL bup_cont';
--connection bup1
reap;
--connection default
--echo # Cleanup
DROP DATABASE db1;
SET DEBUG_SYNC= 'reset';
--exit
--------------------------------------------------------------------------------
Note that DDL in connection bup2 gets executed despite the fact that BACKUP has taken a BML lock.
One question is why our tests do not catch this error.
[17 Dec 2009 15:37]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/94787 2918 Rafal Somla 2009-12-17 BUG#49603 - Assertion in mi_create on TRUNCATE + BACKUP/RESTORE Problem: A BML lock which has been activated by a BACKUP/RESTORE operation, can be deactivated by a concurrently issued BACKUP/RESTORE even when the latter detects that another BACKUP/RESTORE is running and refuses to work. Solution: Have the obs::bml_release() function accept a THD parameter and release the BML lock only if called by the owner of that lock. In fact, this is done on the level of Backup_sx_lock implementation. Backup_bml_block test is updated to catch the issue. @ mysql-test/suite/backup/t/backup_bml_block.test After trying to run concurrent BACKUP/RESTORE operation, issue a DDL statement and check that it is still blocked. Before this fix it will not be because BML lock is removed by concurrent BACKUP/RESTORE. @ sql/backup/data_backup.cc Pass thd to bcb_release() call. @ sql/backup/kernel.cc Pass thd to bml_release() call. @ sql/bml.cc In Backup_sx_lock::release_exclusive_lock() add THD parameter and release the lock only if called by its owner. @ sql/bml.h Add THD parameter to release_exclusive_lock() method. @ sql/si_objects.cc Account for new THD parameter to release_exclusive_lock(thd). @ sql/si_objects.h Add THD parameter to X_release() functions.
[18 Dec 2009 11:49]
Jørgen Løland
Patch approved, pending changes requested on reply to commit email.
[19 Dec 2009 15:05]
Rafal Somla
Changing to "patch approved" since both reviewers have approved.
[19 Dec 2009 15:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/95106 2922 Rafal Somla 2009-12-19 BUG#49603 - Assertion in mi_create on TRUNCATE + BACKUP/RESTORE Problem: A BML lock which has been activated by a BACKUP/RESTORE operation, can be deactivated by a concurrently issued BACKUP/RESTORE even when the latter detects that another BACKUP/RESTORE is running and refuses to work. Solution: Have the obs::bml_release() function accept a THD parameter and release the BML lock only if called by the owner of that lock. In fact, this is done on the level of Backup_sx_lock implementation. Backup_bml_block test is updated to catch the issue. @ mysql-test/suite/backup/t/backup_bml_block.test After trying to run concurrent BACKUP/RESTORE operation, issue a DDL statement and check that it is still blocked. Before this fix it will not be because BML lock is removed by concurrent BACKUP/RESTORE. @ sql/backup/data_backup.cc Pass thd to bcb_release() call. @ sql/backup/kernel.cc Pass thd to bml_release() call. @ sql/bml.cc In Backup_sx_lock::release_exclusive_lock() add THD parameter and release the lock only if called by its owner. @ sql/bml.h Add THD parameter to release_exclusive_lock() method. @ sql/si_objects.cc Account for new THD parameter to release_exclusive_lock(thd). @ sql/si_objects.h Add THD parameter to {bcb,bml}_release() functions.
[20 Dec 2009 16:52]
Rafal Somla
Pushed to mysql-6.0-backup tree. revid:rafal.somla@sun.com-20091219152303-6b2wt5xca0d7bf00
[20 Feb 2010 9:18]
Bugs System
Pushed into 6.0.14-alpha (revid:ingo.struewing@sun.com-20100218152520-s4v1ld76bif06eqn) (version source revid:ingo.struewing@sun.com-20100119103538-wtp5alpz4p2jayl5) (merge vers: 6.0.14-alpha) (pib:16)
[2 Mar 2010 1:36]
Paul DuBois
Noted in 6.0.14 changelog. The backup metadata lock put in place by one BACKUP DATABASE or RESTORE operation could incorrectly be deactivated by another such operation issued concurrently.

Description: When executing RESTORE concurrently with DDL, mysqld asserted as follows: mysqld: mi_create.c:651: mi_create: Assertion `(options & 16) || !mi_log_tables_physical || !my_hash_search(mi_log_tables_physical, filename, strlen(filename))' failed. #6 0x000000315a42bec9 in __assert_fail () from /lib64/libc.so.6 #7 0x00000000009bc8a0 in mi_create (name=0x7f9e29950b40 "./testdb_N/t1_base7_N", keys=2, keydefs=0x1fc1898, columns=4, recinfo=0x1fc1718, uniques=0, uniquedefs=0x0, ci=0x7f9e29950d40, flags=0) at mi_create.c:649 #8 0x00000000009cf331 in ha_myisam::create (this=0x2381ba0, name=0x7f9e29951f80 "./testdb_N/t1_base7_N", table_arg=0x7f9e29950e90, ha_create_info=0x7f9e29952190) at ha_myisam.cc:1886 #9 0x00000000007d68ad in handler::ha_create (this=0x2381ba0, name=0x7f9e29951f80 "./testdb_N/t1_base7_N", form=0x7f9e29950e90, info=0x7f9e29952190) at handler.cc:3421 #10 0x00000000007db481 in ha_create_table (thd=0x7f9e242aa978, path=0x7f9e29951f80 "./testdb_N/t1_base7_N", db=0x24a2360 "testdb_N", table_name=0x24a2370 "t1_base7_N", create_info=0x7f9e29952190, update_create_info=true) at handler.cc:3628 #11 0x00000000007557f1 in mysql_truncate (thd=0x7f9e242aa978, table_list=0x24a23b8, dont_send_ok=false) at sql_delete.cc:1260 #12 0x0000000000688228 in mysql_execute_command (thd=0x7f9e242aa978) at sql_parse.cc:3373 #13 0x000000000068d71b in mysql_parse (thd=0x7f9e242aa978, inBuf=0x24a22b0 "TRUNCATE testdb_N . t1_base7_N", length=31, found_semicolon=0x7f9e29953f00) at sql_parse.cc:5985 #14 0x000000000068e2ac in dispatch_command (command=COM_QUERY, thd=0x7f9e242aa978, packet=0x7f9e24221489 "TRUNCATE testdb_N . t1_base7_N ", packet_length=32) at sql_parse.cc:1078 #15 0x000000000068f81e in do_command (thd=0x7f9e242aa978) at sql_parse.cc:760 #16 0x000000000067c644 in handle_one_connection (arg=0x7f9e242aa978) at sql_connect.cc:1164 #17 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0 #18 0x000000315a4e627d in clone () from /lib64/libc.so.6 644 /* 645 TRUNCATE TABLE does not work with physical logging. If we changed TRUNCATE 646 to always use mi_delete_all_rows() (remove HTON_CAN_RECREATE from MyISAM) 647 this would solve the problem. 648 */ 649 DBUG_ASSERT((options & HA_OPTION_TMP_TABLE) || !mi_log_tables_physical || 650 !my_hash_search(mi_log_tables_physical, filename, 651 strlen(filename))); 652 653 if ((file= my_create_with_symlink(linkname_ptr, filename, 0, create_mode, How to repeat: If this is repeatable, a test case will be provided. In the meantime, the core and the binary will be made available.