Bug #97352 MySQL hangs with DROP UNDO TABLESPACE and ALTER INSTANCE ROTATE MASTER KEY
Submitted: 24 Oct 2019 6:18
Modified: 13 Dec 2019 19:34
Reporter: Satya Bodapati (OCA)
Status: Closed
Category: MySQL Server: InnoDB storage engine
Severity: S1 (Critical)
Version: 8.0.17
OS: Any
Assigned to:
CPU Architecture: Any

[24 Oct 2019 6:18] Satya Bodapati
Description:
There is a deadlock between ALTER INSTANCE ROTATE INNODB MASTER KEY and DROP UNDO TABLESPACE undo_001.

Thread 55: Holds master_key_id mutex and waits for MDL on undo_001
Thread 37: Holds MDL on undo_001 and waits for master_key_id mutex

Each thread waits for a resource the other holds, so we get a nice deadlock (hang)!
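
The lock-order inversion described above can be illustrated outside the server. This is a minimal sketch, not MySQL code: both the master_key_id mutex and the MDL on undo_001 are modeled as plain Python locks (an assumption, since a real MDL is not a simple mutex), the function and worker names are hypothetical, and a timeout replaces the real indefinite wait so that the script terminates.

```python
import threading


def simulate_lock_inversion(timeout=0.2):
    """Return, per worker, whether its second lock acquisition succeeded."""
    master_key_mutex = threading.Lock()  # stands in for the master_key_id mutex
    undo_mdl = threading.Lock()          # stands in for the MDL on undo_001
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:
            # Wait until the other worker also holds its first lock.
            both_hold_first.wait()
            # In the server this wait never ends; here it times out.
            got = second.acquire(timeout=timeout)
            results[name] = got
            # Keep holding the first lock until the other attempt is over.
            both_tried_second.wait()
            if got:
                second.release()

    threads = [
        # Like thread 55: holds the mutex, wants the MDL.
        threading.Thread(target=worker,
                         args=("rotate_master_key", master_key_mutex, undo_mdl)),
        # Like thread 37: holds the MDL, wants the mutex.
        threading.Thread(target=worker,
                         args=("drop_undo_tablespace", undo_mdl, master_key_mutex)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


outcome = simulate_lock_inversion()
```

Neither worker obtains its second lock, which is the hang seen in the stack trace, only bounded by the timeout.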

How to repeat:
See the attached mtr test case.

Suggested fix:
Apart from fixing this bug, there should be a way to detect circular dependencies (a deadlock graph) across server locks and InnoDB mutexes and latches.

This is the first case I have seen of a deadlock involving MDLs (server objects) and InnoDB mutexes (SE).
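
To make the idea of a deadlock graph concrete, here is a minimal sketch of a lock-order graph with cycle detection. It is a generic illustration, not an existing MySQL or InnoDB facility: the class name, method names, and the lock labels are all hypothetical. Each edge records that some thread requested one lock while holding another, and a cycle in the graph means a deadlock is possible.

```python
from collections import defaultdict


class LockOrderGraph:
    """Directed graph: an edge A -> B means B was requested while A was held."""

    def __init__(self):
        self.edges = defaultdict(set)

    def record(self, held, wanted):
        """Record that some thread requested `wanted` while holding `held`."""
        self.edges[held].add(wanted)

    def find_cycle(self):
        """Return one cycle as a list of lock names, or None if acyclic."""
        WHITE, GREY, BLACK = 0, 1, 2
        color = defaultdict(int)  # every node starts WHITE (unvisited)
        path = []

        def visit(node):
            color[node] = GREY
            path.append(node)
            for nxt in sorted(self.edges.get(node, ())):
                if color[nxt] == GREY:
                    # Back edge: nxt is already on the current path.
                    return path[path.index(nxt):] + [nxt]
                if color[nxt] == WHITE:
                    found = visit(nxt)
                    if found:
                        return found
            path.pop()
            color[node] = BLACK
            return None

        for node in sorted(self.edges):
            if color[node] == WHITE:
                found = visit(node)
                if found:
                    return found
        return None


graph = LockOrderGraph()
# Thread 55: holds the master_key_id mutex, waits for the MDL on undo_001.
graph.record("master_key_id mutex", "MDL undo_001")
# Thread 37: holds the MDL on undo_001, waits for the master_key_id mutex.
graph.record("MDL undo_001", "master_key_id mutex")
cycle = graph.find_cycle()
```

For the two threads in this report, `cycle` contains both locks, so the inversion is flagged from the recorded ordering alone, without the hang having to occur.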
[24 Oct 2019 6:20] Satya Bodapati
stack trace

Attachment: stack_trace.txt (text/plain), 9.20 KiB.

[24 Oct 2019 6:21] Satya Bodapati
mtr test

Attachment: satya.test (application/octet-stream, text), 714 bytes.

[24 Oct 2019 6:22] Satya Bodapati
mtr testfile

Attachment: satya-master.opt (application/octet-stream, text), 151 bytes.

[24 Oct 2019 6:26] Satya Bodapati
debug_sync points

Attachment: debug_sync_undo.diff (text/x-patch), 1.15 KiB.

[24 Oct 2019 6:27] Satya Bodapati
Please apply the patch, use a debug build (for debug_sync), and then run the mtr test case.
[24 Oct 2019 8:55] MySQL Verification Team
Hello Satya,

Thank you for the report and test case.

regards,
Umesh
[4 Nov 2019 10:11] Marc Alff
About this comment in the bug report:

"
Apart from fixing this bug, there should be a way to detect circular dependencies (deadlock graph) across server and innodb mutexes latches.
"

Agreed and already done.
Please check the following doxygen documentation:
https://dev.mysql.com/doc/dev/mysql-server/latest/PAGE_LOCK_ORDER.html
[4 Nov 2019 11:31] Satya Bodapati
@marc alff

That's great to hear!

If this test is run in "Lock Order mode", does it detect the latch order violation?
[13 Dec 2019 19:34] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 8.0.18 release, and here's the changelog entry:

An exclusive backup lock is now used to block tablespace truncate
operations during master key rotation. Previously, metadata locks on undo
tablespace names were used to synchronize the operations. This patch also
addresses a deadlock that could occur between master key rotation and drop
undo tablespace operations.