Bug #49399 | BACKUP/RESTORE thread fails to wake up on concurrent BACKUP/RESTORE | ||
---|---|---|---|
Submitted: | 3 Dec 2009 13:53 | Modified: | 30 Dec 2009 13:44 |
Reporter: | Philip Stoev | | |
Status: | Duplicate | | |
Category: | MySQL Server: Backup | Severity: | S2 (Serious) |
Version: | 6.0-backup | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[3 Dec 2009 13:53]
Philip Stoev
[3 Dec 2009 13:54]
Philip Stoev
grammar for bug 49399
Attachment: bug49399.yy (application/octet-stream, text), 69.74 KiB.
[3 Dec 2009 13:57]
Philip Stoev
To reproduce with the RQG:

  $ perl runall.pl \
      --grammar=conf/49399.yy \
      --basedir=/build/bzr/mysql-6.0-backup \
      --queries=100K \
      --mysqld=--mysql-backup \
      --gendata=conf/WL5004_data.zz \
      --threads=10

Please disregard all output from the test itself. Shortly after takeoff, you will observe "Backup_sx_lock: waiting for release of all Shared locks" appearing in the processlist. At that point, hit Ctrl+C on the RQG to terminate all connections except the offending one.

The grammar contains a huge variety of DDL and DML and has not been simplified, because the simplification tool cannot handle single-connection hangs like this bug. Apologies for that, and please let me know if a simplified test case is indeed required to fix this bug.
[3 Dec 2009 14:15]
Rafal Somla
Possible problem (wild guess): bml_get() relies on the fact that each call to BML_instance->get_shared_lock() is matched with a call to BML_instance->release_shared_lock(). If a DDL statement which takes the shared lock is interrupted and because of that does not release it, then we would get such a hang.

Suggested solution: represent the shared lock as an object instance whose destructor releases the lock automatically. The destructor is always called, regardless of the way the statement was interrupted. The usage could look as follows:

  /* beginning of DDL execution */
  BML_ticket bml_ticket;

  if (!bml_ticket.is_valid())
  {
    /* Oops, we could not obtain shared BML lock... */
  }

  /* do other stuff */

When bml_ticket is created, the shared BML lock is taken. When it is destroyed, the lock is released.
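To make the suggested ticket object concrete, here is a minimal standalone sketch. It is not the server's code: Mock_BML is a hypothetical stand-in that only counts shared locks, the constructor taking a reference is an assumption (the usage above shows a default-constructed ticket), and run_ddl() is an invented helper. Only the names get_shared_lock(), release_shared_lock(), BML_ticket and is_valid() come from the comment above.

```cpp
#include <cassert>

// Hypothetical stand-in for the BML instance: it only counts shared locks.
class Mock_BML {
public:
  bool get_shared_lock() { ++shared_; return true; }
  void release_shared_lock() { --shared_; }
  int shared_count() const { return shared_; }
private:
  int shared_ = 0;
};

// Takes the shared lock on construction, releases it on destruction.
class BML_ticket {
public:
  explicit BML_ticket(Mock_BML &bml)
      : bml_(bml), valid_(bml.get_shared_lock()) {}
  ~BML_ticket() {
    if (valid_) bml_.release_shared_lock();
  }
  // Copying would release the lock twice, so forbid it.
  BML_ticket(const BML_ticket &) = delete;
  BML_ticket &operator=(const BML_ticket &) = delete;

  bool is_valid() const { return valid_; }
private:
  Mock_BML &bml_;
  bool valid_;
};

// Invented helper: a DDL statement that may bail out part-way through.
// Note that no exit path calls release_shared_lock() explicitly.
bool run_ddl(Mock_BML &bml, bool interrupted) {
  BML_ticket ticket(bml);
  if (!ticket.is_valid()) return false;  // could not obtain the shared lock
  if (interrupted) return false;         // early exit, e.g. statement killed
  /* do other stuff */
  return true;
}

// Number of shared locks still held after an interrupted DDL.
int shared_locks_after_interrupted_ddl() {
  Mock_BML bml;
  run_ddl(bml, true);
  return bml.shared_count();
}
```

In this sketch the early return in run_ddl() cannot leak the lock, because the destructor runs on every path out of the function. Whether the real hang is caused by such a leak is exactly the guess being tested in this report.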
[3 Dec 2009 14:40]
Chuck Bell
May be related to BUG#49398.
[3 Dec 2009 15:35]
Rafal Somla
Idea for a test which could verify my hypothesis. Run a DDL statement and stop it after it takes the shared BML lock (using a sync point). Then run BACKUP and wait until it blocks in bml_get(). Then kill the connection running the DDL and try to reap the BACKUP command. All this can be done from a test script. If BACKUP hangs, we have a problem.
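The scenario can also be modelled outside the server as a toy simulation with two threads. This is only a sketch of the hypothesis, not of the real BML: Toy_BML, wait_for_no_shared_locks() and backup_completes_after_ddl_killed() are invented names, the "kill" is a flag rather than a real KILL, and a timeout stands in for the hang so that the program always terminates.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Toy lock: the backup side waits until no shared locks remain.
class Toy_BML {
public:
  void get_shared_lock() {
    std::lock_guard<std::mutex> g(m_);
    ++shared_;
  }
  void release_shared_lock() {
    {
      std::lock_guard<std::mutex> g(m_);
      --shared_;
    }
    cv_.notify_all();
  }
  // Returns false if shared locks are still held at the timeout (a "hang").
  bool wait_for_no_shared_locks(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(m_);
    return cv_.wait_for(lk, timeout, [this] { return shared_ == 0; });
  }
private:
  std::mutex m_;
  std::condition_variable cv_;
  int shared_ = 0;
};

// release_on_kill == false models the suspected bug: the killed DDL
// exits without releasing its shared lock.
bool backup_completes_after_ddl_killed(bool release_on_kill) {
  Toy_BML bml;
  std::mutex m;
  std::condition_variable cv;
  bool lock_taken = false;
  bool killed = false;

  std::thread ddl([&] {
    bml.get_shared_lock();
    std::unique_lock<std::mutex> lk(m);
    lock_taken = true;
    cv.notify_all();
    cv.wait(lk, [&] { return killed; });  // "sync point": parked until killed
    lk.unlock();
    if (release_on_kill) bml.release_shared_lock();
  });

  {
    // Wait until the DDL thread holds the shared lock.
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return lock_taken; });
  }

  bool done = false;
  std::thread backup([&] {
    done = bml.wait_for_no_shared_locks(std::chrono::milliseconds(500));
  });

  {
    // "Kill" the DDL connection.
    std::lock_guard<std::mutex> g(m);
    killed = true;
  }
  cv.notify_all();

  ddl.join();
  backup.join();  // "reap" the BACKUP command
  return done;
}
```

With release_on_kill set to true the backup thread finishes; with false it times out, which is the analogue of the BACKUP statement never returning.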
[10 Dec 2009 8:26]
Ritheesh Vedire
The test as described in Rafal's hypothesis. No hang was observed.
Attachment: bug49399.test (application/octet-stream, text), 1.16 KiB.
[10 Dec 2009 8:28]
Ritheesh Vedire
The result file. BACKUP clearly executed even after a different thread, which had acquired a shared lock, was killed.
Attachment: bug49399.result (application/octet-stream, text), 549 bytes.
[14 Dec 2009 14:22]
Philip Stoev
To reproduce this bug, please apply the following diff to the RQG:

=== modified file 'lib/GenTest/Executor/MySQL.pm'
--- lib/GenTest/Executor/MySQL.pm 2009-12-04 11:40:28 +0000
+++ lib/GenTest/Executor/MySQL.pm 2009-12-14 14:20:45 +0000
@@ -399,7 +399,7 @@
  ER_BACKUP_SEND_DATA1() => STATUS_BACKUP_FAILURE,
  ER_BACKUP_SEND_DATA2() => STATUS_BACKUP_FAILURE,
- ER_BACKUP_PROGRESS_TABLES() => STATUS_BACKUP_FAILURE,
+# ER_BACKUP_PROGRESS_TABLES() => STATUS_BACKUP_FAILURE,
  ER_BACKUP_RUNNING() => STATUS_SEMANTIC_ERROR,
  ER_CANT_CREATE_THREAD() => STATUS_ENVIRONMENT_FAILURE,

This makes the ER_BACKUP_PROGRESS_TABLES error non-fatal, so that the test can continue and the situation described in the bug report can form. The RQG itself will not detect the problem; please manually issue SHOW PROCESSLIST in order to view the problematic BACKUP or RESTORE operation.
[30 Dec 2009 13:44]
Ritheesh Vedire
The fix for BUG#49603 fixes this too, so setting this bug as a duplicate.