Description:
SYNOPSIS: A deadlock is possible if BACKUP runs in parallel with a DML statement and both the default and the native MyISAM backup drivers are used.
Note: This is similar to BUG#39602, but the situation described here is not related to the commit blocker: the deadlock happens before the commit blocker is activated.
There are two backup drivers which take exclusive locks on tables on which they operate: the default driver and the native MyISAM driver.
The locking is done in the prepare phase of a backup operation, which starts with calling the prelock() methods of the drivers. In prelock(), both drivers spawn separate locking threads which execute lock_tables(). The kernel then waits for both drivers to be ready for the lock() call. Each driver waits for its locking thread to acquire the locks, and once that happens it informs the kernel that it is ready and the backup operation can continue.
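The prepare-phase handshake described above can be sketched roughly as follows. This is a minimal Python model, not the server's code: the `Driver` class, the `prepare()` function and the `timeout` parameter are hypothetical names introduced for illustration, and `threading.Lock` merely stands in for a table lock. The timeout is an assumption added so that the sketch cannot hang.

```python
import threading

class Driver:
    """Hypothetical stand-in for a backup driver; not the server's real class."""

    def __init__(self, name, table_locks):
        self.name = name
        self.table_locks = table_locks   # locks for the tables this driver handles
        self.ready = threading.Event()   # set once all locks are held

    def prelock(self):
        # Spawn a separate locking thread and return immediately.
        self.thread = threading.Thread(target=self._lock_tables)
        self.thread.start()

    def _lock_tables(self):
        for lock in self.table_locks:
            lock.acquire()               # may block if another thread holds it
        self.ready.set()                 # tell the kernel the driver is ready

    def unlock(self):
        for lock in self.table_locks:
            lock.release()

def prepare(drivers, timeout=5.0):
    """Kernel side: call prelock() on every driver, then wait for all of them."""
    for d in drivers:
        d.prelock()
    return all(d.ready.wait(timeout) for d in drivers)

# Uncontended case: no other connection touches t1 or t2.
t1, t2 = threading.Lock(), threading.Lock()
drivers = [Driver("default", [t2]), Driver("myisam", [t1])]
ok = prepare(drivers)
for d in drivers:
    d.unlock()
```

With no concurrent DML both locking threads succeed and `prepare()` returns True. The point of the sketch is that each driver acquires its locks independently, so nothing orders the two acquisitions relative to each other or to other connections.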
Because the order in which the two drivers acquire their locks is non-deterministic, a deadlock is possible. Consider the following situation, where the native MyISAM driver handles table t1 and the default driver handles table t2.
T0 = thread executing BACKUP command
T1 = locking thread for default driver
T2 = locking thread for myisam driver
T3 = other connection thread
T0: start BACKUP command
T0: finish initial phase
T0: enter synchronization phase
T0: call prelock() for default driver - this spawns T1
T0: call prelock() for myisam driver - this spawns T2
T0: now T0 waits for both drivers to be ready for lock()
T1: get exclusive lock on t2
T3: start DML which updates both t1 and t2
T3: get exclusive lock on t1
T3: try to get exclusive lock on t2
T3: HANGS waiting for t2 lock (taken by T1)
T2: try to get exclusive lock on t1
T2: HANGS waiting for t1 lock (taken by T3)
T0: LOOPS waiting for myisam driver to be ready for lock()
Thus the backup thread waits for the MyISAM driver, which waits for the t1 lock held by T3, which in turn waits for the t2 lock. The t2 lock is held by the default driver, which is waiting for the backup thread to continue the operation: DEADLOCK.
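The circular wait can be made explicit with a small wait-for graph. The following Python sketch only encodes the dependencies stated in this report; the `WAITS_FOR` mapping and the `find_cycle()` helper are illustrative names, not part of the server.

```python
# Wait-for graph for the interleaving described in this report.
# Each thread maps to the thread it is waiting for.
WAITS_FOR = {
    "T0": "T2",  # BACKUP thread waits for the myisam driver to be ready
    "T2": "T3",  # myisam locking thread wants t1, held by the DML connection
    "T3": "T1",  # DML connection wants t2, held by the default locking thread
    "T1": "T0",  # default driver holds t2 until the BACKUP thread continues
}

def find_cycle(waits_for, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None                      # chain ended: nobody is stuck in a loop
    return path[path.index(node):] + [node]

cycle = find_cycle(WAITS_FOR, "T0")
print(" -> ".join(cycle))
```

Every thread in the graph has an outgoing edge and the chain returns to T0, so none of the four threads can make progress.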
How to repeat:
It is very difficult to force the above interleaving of the threads involved, so for now this report is based only on theoretical analysis.
Suggested fix:
Do not lock tables inside backup drivers. Instead, move locking to the backup kernel. The kernel will lock tables for both the default and the MyISAM driver at the same time. However, after that we would have to unlock the MyISAM tables first and later, separately, unlock the default driver's tables. I don't know if such partial unlocking is supported by the server.
Note: Since locking for the MyISAM driver must be done after the initial phase, in the above solution the tables of the default driver will also be locked at that time. We must then assume that the default driver starts working only after the initial phase has finished. This is the case in the current code, but it is not enforced by the protocol.
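The shape of the suggested fix might look like the following Python sketch. It is only a model under stated assumptions: `BackupKernel`, `lock_all()` and `unlock_driver()` are hypothetical names, `threading.Lock` stands in for a table lock, and taking the locks in sorted table order is an assumption of this sketch (it presumes other connections use the same order). It says nothing about whether the server supports such partial unlocking.

```python
import threading

class BackupKernel:
    """Hypothetical kernel that owns all table locks; not the server's real API."""

    def __init__(self, tables):
        self.locks = {name: threading.Lock() for name in tables}
        self.held = {}   # driver name -> tables locked on its behalf

    def lock_all(self, tables_by_driver):
        # A single caller takes every lock in one pass, in sorted table order,
        # instead of one locking thread per driver.
        owner = {t: d for d, ts in tables_by_driver.items() for t in ts}
        for table in sorted(owner):
            self.locks[table].acquire()
            self.held.setdefault(owner[table], []).append(table)

    def unlock_driver(self, driver):
        # Partial unlock: release only the tables held for one driver.
        for table in self.held.pop(driver, []):
            self.locks[table].release()

kernel = BackupKernel(["t1", "t2"])
kernel.lock_all({"default": ["t2"], "myisam": ["t1"]})

kernel.unlock_driver("myisam")    # MyISAM tables are released first
state_after_myisam = {t: l.locked() for t, l in kernel.locks.items()}

kernel.unlock_driver("default")   # default driver tables are released later
```

In this model the two groups of tables are locked together but released independently, which is the property the fix depends on.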