Bug #49603 | Assertion in mi_create on TRUNCATE + BACKUP/RESTORE | ||
---|---|---|---|
Submitted: | 10 Dec 2009 20:29 | Modified: | 2 Mar 2010 1:36 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Backup | Severity: | S2 (Serious) |
Version: | 6.0-backup | OS: | Any |
Assigned to: | Rafal Somla | CPU Architecture: | Any |
[10 Dec 2009 20:29]
Philip Stoev
[10 Dec 2009 20:38]
Philip Stoev
c:\docs\mysql\bugs\bug49603.threads
Attachment: bug49603.threads.txt (text/plain), 39.22 KiB.
[10 Dec 2009 20:40]
Philip Stoev
core and binary: http://mysql-systemqa.s3.amazonaws.com/var-bug49603.zip code: revision-id: rafal.somla@sun.com-20091210124049-98jyu3nog0vmhbvg date: 2009-12-10 13:40:49 +0100 build-date: 2009-12-10 22:30:51 +0200 revno: 2907 branch-nick: mysql-6.0-backup The thread stacks have been uploaded separately. Note that the crash is on TRUNCATE and the other concurrently executing thread is a BACKUP. This is odd, because the log file lists the last operation before the crash as a RESTORE: 091210 17:33:22 [Note] Restore: Starting restore process 091210 17:33:22 [ERROR] Restore: Can't execute this command because another BACKUP/RESTORE operation is in progress 091210 17:33:22 [Warning] Restore: Operation aborted mysqld: mi_create.c:651: mi_create: Assertion `(options & 16) || !mi_log_tables_physical || !my_hash_search(mi_log_tables_physical, filename, strlen(filename))' failed.
[11 Dec 2009 8:22]
Rafal Somla
Looking at the threads I can see these non-idle ones: T32: BACKUP: polling myisam driver T35: myisam locking thread: open_and_lock_tables T1: TRUNCATE TABLE: creating table T2: FLUSH TABLE: general_log_write T3: ?: close_thread_tables T5: FLUSH TABLE: reload_acl_and_cache T10: DELETE FROM T14: CREATE TABLE: openning tables T16: CREATE TRIGGER: locking table T27: DEALLOCATE PREPARE st1 T36: LOCK TABLES What looks wrong is that there are some DDL statements running in parallel with BACKUP - they should be disabled by the Backup Metadata Lock (sql_parse.cc:2067). Is BML broken?
[11 Dec 2009 12:53]
Rafal Somla
REFINED PROBLEM DESCRIPTION --------------------------- In short: second BACKUP/RESTORE operation can remove BML lock which was established by another such operation which has not finished yet. Suppose that BACKUP operation is running. At the beginning it calls obs::bml_get() (see Backup_restore_ctx::prepare()) which sets BML_instance::xlock_set to thd value. When BML_instance::xlock_set is not NULL, then other DDLs are blocked as expected. Now imagine that another BACKUP/RESTORE operation is started. At the beginning, in Backup_restore_ctx::prepare() (line 670), it checks if another BACKUP/RESTORE operation is running. Since this is the case, preparations fail and operation aborts with error. But then Backup_restore_ctx destructor is called, which calls Backup_restore_ctx::close() which calls obs::bml_release(). The obs::bml_relese() function sets BML_instance::xlock_set to NULL, even though it was set by another thread which is holding exclusive BML lock. This is the problem. PROPOSED SOLUTION ----------------- Fix obs::bml_release so that it releases the BML only if called from the thread which has acquired it. This can be done by modifying Backup_sx_lock::release_exclusive_lock() (in bml.cc) so that it checks if xlock_set equals to the current thread before releasing it. Alternative solution: have a flag in Backup_restore_ctx which indicates if BML has been acquired or not. Release it only if it was acquired.
[15 Dec 2009 11:44]
Rafal Somla
Stealing this bug from Thava who will work on other ones.
[16 Dec 2009 14:49]
Rafal Somla
I can repeat the problem with the following test case: -------------------------------------------------------------------------------- call mtr.add_suppression("Can't execute this command because another BACKUP/RESTORE operation is in progress"); --connect(bup1,localhost,root,,) --connect(bup2,localhost,root,,) --connection bup1 SET DEBUG_SYNC= 'after_backup_restore_prepare SIGNAL bml_taken WAIT_FOR bup_cont'; send BACKUP DATABASE test TO 'test.bkp'; --connection bup2 --echo # Waiting for BACKUP. SET DEBUG_SYNC= 'now WAIT_FOR bml_taken'; --echo # Another BACKUP should remove the BML --error 1651 BACKUP DATABASE test TO 'test.bkp'; --echo # Now run a DDL CREATE DATABASE db1; --echo # Check that BACKUP operation is in progress. SHOW PROCESSLIST; --echo # Check that db1 exists, i.e. DDL was not blocked. SHOW DATABASES LIKE 'db%'; --echo # Signal BACKUP to finish. SET DEBUG_SYNC= 'now SIGNAL bup_cont'; --connection bup1 reap; --connection default --echo # Cleanup DROP DATABASE db1; SET DEBUG_SYNC= 'reset'; --exit -------------------------------------------------------------------------------- Note that DDL in connection bup2 gets executed despite the fact that BACKUP has taken a BML lock. One question is why our tests do not catch this error.
[17 Dec 2009 15:37]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/94787 2918 Rafal Somla 2009-12-17 BUG#49603 - Assertion in mi_create on TRUNCATE + BACKUP/RESTORE Problem: A BML lock which has been activated by a BACKUP/RESTORE operation, can be deactivated by a concurrently issued BACKUP/RESTORE even when the latter detects that another BACKUP/RESTORE is running and refuses to work. Solution: Have the obs::bml_release() function accept a THD parameter and release the BML lock only if called by the owner of that lock. In fact, this is done on the level of Backup_sx_lock implementation. Backup_bml_block test is updated to catch the issue. @ mysql-test/suite/backup/t/backup_bml_block.test After trying to run concurrent BACKUP/RESTORE operation, issue a DDL statement and check that it is still blocked. Before this fix it will not be because BML lock is removed by concurrent BACKUP/RESTORE. @ sql/backup/data_backup.cc Pass thd to bcb_release() call. @ sql/backup/kernel.cc Pass thd to bml_release() call. @ sql/bml.cc In Backup_sx_lock::release_exclusive_lock() add THD parameter and release the lock only if called by its owner. @ sql/bml.h Add THD parameter to release_exclusive_lock() method. @ sql/si_objects.cc Account for new THD parameter to release_exclusive_lock(thd). @ sql/si_objects.h Add THD parameter to X_release() functions.
[18 Dec 2009 11:49]
Jørgen Løland
Patch approved, pending changes requested on reply to commit email.
[19 Dec 2009 15:05]
Rafal Somla
Changing to "patch approved" since both reviewers have approved.
[19 Dec 2009 15:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/95106 2922 Rafal Somla 2009-12-19 BUG#49603 - Assertion in mi_create on TRUNCATE + BACKUP/RESTORE Problem: A BML lock which has been activated by a BACKUP/RESTORE operation, can be deactivated by a concurrently issued BACKUP/RESTORE even when the latter detects that another BACKUP/RESTORE is running and refuses to work. Solution: Have the obs::bml_release() function accept a THD parameter and release the BML lock only if called by the owner of that lock. In fact, this is done on the level of Backup_sx_lock implementation. Backup_bml_block test is updated to catch the issue. @ mysql-test/suite/backup/t/backup_bml_block.test After trying to run concurrent BACKUP/RESTORE operation, issue a DDL statement and check that it is still blocked. Before this fix it will not be because BML lock is removed by concurrent BACKUP/RESTORE. @ sql/backup/data_backup.cc Pass thd to bcb_release() call. @ sql/backup/kernel.cc Pass thd to bml_release() call. @ sql/bml.cc In Backup_sx_lock::release_exclusive_lock() add THD parameter and release the lock only if called by its owner. @ sql/bml.h Add THD parameter to release_exclusive_lock() method. @ sql/si_objects.cc Account for new THD parameter to release_exclusive_lock(thd). @ sql/si_objects.h Add THD parameter to {bcb,bml}_release() functions.
[20 Dec 2009 16:52]
Rafal Somla
Pushed to mysql-6.0-backup tree. revid:rafal.somla@sun.com-20091219152303-6b2wt5xca0d7bf00
[20 Feb 2010 9:18]
Bugs System
Pushed into 6.0.14-alpha (revid:ingo.struewing@sun.com-20100218152520-s4v1ld76bif06eqn) (version source revid:ingo.struewing@sun.com-20100119103538-wtp5alpz4p2jayl5) (merge vers: 6.0.14-alpha) (pib:16)
[2 Mar 2010 1:36]
Paul DuBois
Noted in 6.0.14 changelog. The backup metadata lock put in place by one BACKUP DATABASE or RESTORE operation could incorrectly be deactivated by another such operation issued concurrently.