Bug #101860 | Deadlock happened under heavy write and delete workload | ||
---|---|---|---|
Submitted: | 3 Dec 2020 18:15 | Modified: | 8 Feb 2021 13:51 |
Reporter: | Zongzhi Chen (OCA) | Email Updates: | |
Status: | Unsupported | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 5.6/5.7 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[3 Dec 2020 18:15]
Zongzhi Chen
[3 Dec 2020 18:16]
Zongzhi Chen
the origin pstack
Attachment: pstack.txt (text/plain), 150.16 KiB.
[3 Dec 2020 18:17]
Zongzhi Chen
pt-pmp pstack
Attachment: pt-pstack.txt (text/plain), 10.55 KiB.
[4 Dec 2020 13:13]
MySQL Verification Team
Hi Mr. zongzhi, Thank you for your bug report. I have analysed the stacktraces carefully and it seems to me that you are correct. However, I am not sure whether this bug can be fixed in versions that are older than 8.0. Can you please let me know whether you succeeded in observing this deadlock in 8.0 at all? Many thanks in advance.
[6 Dec 2020 18:44]
Zongzhi Chen
No, this deadlock won't happen in MySQL 8.0, since mtr_commit() won't call buf_pool_get_oldest_modification() there. I think we can move that logic outside the scope of log_sys.mutex to avoid this deadlock in MySQL 5.6/5.7. This is the code that I suggest:

    diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc
    index bcfba19d5b9..23110d646c7 100644
    --- a/storage/innobase/mtr/mtr0mtr.cc
    +++ b/storage/innobase/mtr/mtr0mtr.cc
    @@ -963,11 +963,26 @@ mtr_t::Command::execute()
                     log_flush_order_mutex_enter();
             }
     
    +        lsn_t lsn;
    +        lsn = log->lsn;
    +        lsn_t max_modified_age_sync = log->max_modified_age_sync;
    +        lsn_t max_checkpoint_age_async = log->max_checkpoint_age_async;
             /* It is now safe to release the log mutex because the
             flush_order mutex will ensure that we are the first one
             to insert into the flush list. */
             log_mutex_exit();
     
    +        // It's time to check whether need to make checkpoint of flush
    +        oldest_lsn = buf_pool_get_oldest_modification();
    +
    +        if (!oldest_lsn
    +            || lsn - oldest_lsn > max_modified_age_sync;
    +            || checkpoint_age > max_checkpoint_age_async) {
    +
    +                // need to change check_flush_or_checkpoint to atomic
    +                log->check_flush_or_checkpoint = true;
    +        }
    +
             m_impl->m_mtr->m_commit_lsn = m_end_lsn;
     
             release_blocks();
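The idea in the suggested patch (copy the needed values while holding the log mutex, release it, and only then touch the flush list) can be sketched in isolation. This is a minimal illustration, not InnoDB code: `LogState`, `FlushList`, `get_oldest_modification()` and `commit_check()` are hypothetical stand-ins, and the checkpoint-age condition from the patch is left out for brevity.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

using lsn_t = std::uint64_t;

// Hypothetical stand-in for log_sys (not the real InnoDB structure).
struct LogState {
    std::mutex mutex;                  // plays the role of log_sys.mutex
    lsn_t lsn = 0;
    lsn_t max_modified_age_sync = 0;
    // Atomic, because it is now written without holding the log mutex.
    std::atomic<bool> check_flush_or_checkpoint{false};
};

// Hypothetical stand-in for the buffer pool flush list.
struct FlushList {
    std::mutex mutex;                  // plays the role of flush_list.mutex
    lsn_t oldest_modification = 0;     // 0 means the list is empty
};

// Reads the oldest modification LSN under the flush list mutex.
lsn_t get_oldest_modification(FlushList& fl) {
    std::lock_guard<std::mutex> guard(fl.mutex);
    return fl.oldest_modification;
}

// Snapshot under the log mutex, release it, then consult the flush list.
// The two mutexes are never held at the same time on this path.
void commit_check(LogState& log, FlushList& fl) {
    lsn_t lsn;
    lsn_t max_age;
    {
        std::lock_guard<std::mutex> guard(log.mutex);
        lsn = log.lsn;
        max_age = log.max_modified_age_sync;
    }  // log mutex released here

    const lsn_t oldest = get_oldest_modification(fl);
    if (oldest == 0 || lsn - oldest > max_age) {
        log.check_flush_or_checkpoint.store(true);
    }
}
```

Because the flush list mutex is only acquired after the log mutex has been dropped, this path cannot take part in a log-mutex/flush-list-mutex ordering cycle. The cost is that the check runs on a snapshot that may be slightly stale.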
[8 Dec 2020 12:52]
MySQL Verification Team
Hi Mr. zongzhi, Our Development department has taken a look at the code and concluded that this is not a problem with versions older than 8.0, since that parallelism was not present in the previous versions. Not a bug.
[9 Dec 2020 7:36]
Zongzhi Chen
Can you ask the Development department to check the analysis I published in this issue? There is indeed a deadlock in our case.
[9 Dec 2020 14:00]
MySQL Verification Team
Hi Mr. zongzhi, Our Development department has analysed the entire report and did not find a problem with the locks that you are writing about.
[14 Dec 2020 0:15]
Zongzhi Chen
Hello, I am still confused about the stack. Can your Development department explain why there is a deadlock in the stack? When the deadlock happened, thread 74 held log_sys.mutex and was waiting for flush_list.mutex, while thread 81 held flush_list.mutex and was waiting for log_sys.mutex. This really is a deadlock.
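The hold/wait relationship described in this comment can be checked mechanically as a cycle in a wait-for graph. The sketch below is a simplified illustration: `ThreadState` and `has_deadlock()` are hypothetical names, each thread is assumed to hold at most one mutex, and the input is typed in by hand rather than parsed from a pstack.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of one thread taken from a stack dump.
struct ThreadState {
    int id;
    std::string holds;      // mutex the thread owns ("" if none)
    std::string waits_for;  // mutex the thread is blocked on ("" if none)
};

// Returns true if the wait-for graph contains a cycle: starting from some
// thread and repeatedly moving to the owner of the mutex it waits for
// eventually revisits a thread.
bool has_deadlock(const std::vector<ThreadState>& threads) {
    std::map<std::string, int> owner;    // mutex name -> owning thread id
    std::map<int, std::string> waiting;  // thread id -> mutex it waits for
    for (const auto& t : threads) {
        if (!t.holds.empty()) owner[t.holds] = t.id;
        if (!t.waits_for.empty()) waiting[t.id] = t.waits_for;
    }
    for (const auto& t : threads) {
        std::set<int> seen;
        int cur = t.id;
        while (true) {
            if (!seen.insert(cur).second) return true;  // revisited: cycle
            auto w = waiting.find(cur);
            if (w == waiting.end()) break;              // cur is not blocked
            auto o = owner.find(w->second);
            if (o == owner.end()) break;                // mutex is free
            cur = o->second;
        }
    }
    return false;
}
```

Feeding it thread 74 (holds log_sys.mutex, waits for flush_list.mutex) and thread 81 (holds flush_list.mutex, waits for log_sys.mutex) reports a cycle. If thread 81 is not blocked on log_sys.mutex, no cycle is reported.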
[14 Dec 2020 13:59]
MySQL Verification Team
Hi, There is no parallelism in that code in versions 5.6 and 5.7. Hence, what we would like to see is a fully repeatable test case that leads to the deadlock that you describe ...
[6 Feb 2021 18:39]
Zongzhi Chen
It's my mistake. The user runs the Percona build of MySQL, version 5.6.27-75.0. In the upstream version, flushing a page does not hold the flush list mutex.
[8 Feb 2021 13:51]
MySQL Verification Team
Hi, Thank you for your feedback. Since it cannot be reproduced with our server, we are setting the status to the correct value.