Bug #39813 Backup: possible deadlock when default and myisam drivers are used
Submitted: 2 Oct 2008 14:29 Modified: 24 Nov 2008 13:01
Reporter: Rafal Somla
Status: Can't repeat
Category: MySQL Server: Backup Severity: S3 (Non-critical)
Version: 6.0-backup OS: Any
Assigned to: Jørgen Løland CPU Architecture: Any

[2 Oct 2008 14:29] Rafal Somla
SYNOPSIS: A deadlock is possible if BACKUP runs in parallel with a DML statement and both the default and the native myisam backup drivers are used.

Note: This is similar to BUG#39602, but the situation described here is not related to the commit blocker - the deadlock happens before the commit blocker is activated.

There are two backup drivers which take exclusive locks on tables on which they operate: the default driver and the native MyISAM driver.

The locking is done in the prepare phase of a backup operation, which starts by calling the prelock() methods of the drivers. In prelock(), each driver spawns a separate locking thread which executes lock_tables(). The kernel then waits for both drivers to be ready for the lock() call. Each driver waits for its locking thread to acquire the locks; once that happens, the driver informs the kernel that it is ready and the backup operation can continue.

Because the order in which the two drivers acquire their locks is non-deterministic, a deadlock is possible. Consider a situation where the native myisam driver is handling table t1 and the default driver is handling table t2.

 T0 = thread executing BACKUP command
     T1 = locking thread for default driver
         T2 = locking thread for myisam driver
             T3 = other connection thread

 start BACKUP command 
 finish initial phase
 enter synchronization phase
 call prelock() for default driver - this spawns T1
 call prelock() for myisam driver - this spawns T2
 now T0 waits for both drivers to be ready for lock()

     get exclusive lock on t2

             start DML which updates both t1 and t2
             get exclusive lock on t1
             try to get exclusive lock on t2
             HANGS waiting for t2 lock (taken by T1)

         try to get exclusive lock on t1
         HANGS waiting for t1 lock (taken by T3)

 LOOPS waiting for myisam driver to be ready for lock()

Thus the backup thread (T0) waits for the myisam driver, whose locking thread (T2) waits for the t1 lock held by T3, which in turn waits for the t2 lock. The t2 lock is held by the default driver (T1), which is waiting for the backup thread to continue the operation. The cycle is closed: DEADLOCK.
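
The circular wait can be checked mechanically. The following is only an illustrative Python sketch, not server code: the `WAITS_FOR` table and the `find_cycle` helper are hypothetical names, and the edges are transcribed by hand from the T0..T3 schedule above.

```python
# Wait-for graph for the interleaving above: each thread maps to the
# thread it is blocked on. Thread names follow the T0..T3 legend.
WAITS_FOR = {
    "T0": "T2",  # BACKUP thread loops until the myisam driver is ready
    "T2": "T3",  # myisam locking thread wants t1, held by the DML
    "T3": "T1",  # DML wants t2, held by the default driver's locking thread
    "T1": "T0",  # default driver keeps t2 until BACKUP continues
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from `start`; return the cycle or None."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended at a thread that is not blocked
    return path[path.index(node):] + [node]


cycle = find_cycle(WAITS_FOR, "T0")
print(" -> ".join(cycle))  # T0 -> T2 -> T3 -> T1 -> T0
```

Removing any single edge (for example, if T3 did not need t2) makes `find_cycle` return None, which matches the observation that all four threads are needed for the hang.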

How to repeat:
It is very difficult to force the above interleaving of the threads involved, so for now this report is based only on theoretical analysis.
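
Outside the server, the interleaving can at least be modelled. The sketch below is a standalone Python model, not a test case for the backup code: all names are hypothetical, events are used to force the order "T1 locks t2, then T3 locks t1", and lock timeouts stand in for the hangs so that the script terminates and reports which threads would be stuck.

```python
import threading


def run_forced_interleaving(timeout=0.1):
    """Model of the T0..T3 schedule; returns the threads that got stuck."""
    t1 = threading.Lock()  # table t1, handled by the myisam driver
    t2 = threading.Lock()  # table t2, handled by the default driver
    default_has_t2 = threading.Event()
    dml_has_t1 = threading.Event()
    dml_tried_t2 = threading.Event()
    myisam_ready = threading.Event()
    finish = threading.Event()  # harness only: lets the model shut down
    stuck = set()
    stuck_guard = threading.Lock()

    def note(name):
        with stuck_guard:
            stuck.add(name)

    def default_locker():  # T1
        with t2:
            default_has_t2.set()
            finish.wait()  # keeps t2 until the BACKUP thread continues

    def dml():  # T3
        default_has_t2.wait()  # forced order: T1 locks t2 first
        with t1:
            dml_has_t1.set()
            if t2.acquire(timeout=timeout):
                t2.release()
            else:
                note("T3")  # in the server this would hang on t2
            dml_tried_t2.set()
            finish.wait()  # a hung DML never releases t1

    def myisam_locker():  # T2
        dml_has_t1.wait()  # forced order: T3 locks t1 first
        if t1.acquire(timeout=timeout):
            myisam_ready.set()
            t1.release()
        else:
            note("T2")  # in the server this would hang on t1

    workers = [threading.Thread(target=f)
               for f in (default_locker, myisam_locker, dml)]
    for w in workers:
        w.start()

    # T0: wait for the myisam driver to report that it is ready.
    if not myisam_ready.wait(timeout):
        note("T0")

    workers[1].join()    # T2 has finished its timed attempt
    dml_tried_t2.wait()  # T3 has finished its timed attempt
    finish.set()         # only now release the model's locks
    for w in workers:
        w.join()
    return sorted(stuck)


print(run_forced_interleaving())  # ['T0', 'T2', 'T3']
```

T1 does not appear in the result because it is not blocked on a lock; it simply holds t2 while waiting for T0, which is what closes the cycle.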

Suggested fix:
Do not lock tables inside backup drivers. Instead, move locking to the backup kernel. The kernel will lock the tables of both the default and the myisam driver at the same time. However, after that we would have to unlock the myisam tables first and later, separately, unlock the default driver tables. I don't know if such partial unlocking is supported by the server.

Note: Since locking for the myisam driver must be done after the initial phase, in the above solution the tables of the default driver will also be locked at that time. We must then assume that the default driver starts working only after the initial phase has finished. This is the case in the current code, but it is not enforced by the protocol.
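
A minimal sketch of the idea, in Python rather than server code: a single lock manager grants a whole set of tables in one step and allows a subset to be released later. `TableLockManager` and its methods are hypothetical, and the sketch assumes that the concurrent DML requests its tables through the same manager; whether the server's locking code supports such partial unlocking is exactly the open question above.

```python
import threading


class TableLockManager:
    """Hypothetical kernel-side lock table: grants a whole set or nothing."""

    def __init__(self):
        self._held = {}  # table name -> owner
        self._cond = threading.Condition()

    def lock_all(self, owner, tables, timeout=None):
        """Wait until every table is free, then take them in one step."""
        tables = set(tables)
        with self._cond:
            ok = self._cond.wait_for(
                lambda: tables.isdisjoint(self._held), timeout)
            if ok:
                for table in tables:
                    self._held[table] = owner
            return ok

    def unlock(self, owner, tables):
        """Release a subset of the owner's tables (partial unlock)."""
        with self._cond:
            for table in tables:
                if self._held.get(table) == owner:
                    del self._held[table]
            self._cond.notify_all()

    def held_by(self, owner):
        with self._cond:
            return sorted(t for t, o in self._held.items() if o == owner)


mgr = TableLockManager()
mgr.lock_all("backup", ["t1", "t2"])  # kernel locks both drivers' tables
mgr.unlock("backup", ["t1"])          # myisam tables are released first
print(mgr.held_by("backup"))          # ['t2']
mgr.unlock("backup", ["t2"])          # default driver tables follow later
```

Because a requester never holds one table while waiting for another, the "T3 holds t1 and waits for t2" step of the schedule cannot occur in this model.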

[21 Nov 2008 10:23] Jørgen Løland
Since we temporarily solved the deadlock problem described in BUG#39602 by moving the commit blocker before drv->prepare, this deadlock can no longer happen. 

This deadlock problem will be reintroduced when WL#4610 is implemented, but that WL is currently scheduled for 6.x. I have added a comment about this to the WL.

[24 Nov 2008 13:01] Jørgen Løland
Closing as "Can't repeat" since the deadlock cannot occur with the current commit blocker. The refined commit blocker worklog (see above) tracks this issue as well.