Bug #96374 | binlog rotation deadlock when innodb concurrency limit is set | ||
---|---|---|---|
Submitted: | 30 Jul 2019 6:53 | Modified: | 31 May 2021 16:29 |
Reporter: | jia liu | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 8.0.16 8.0.17 5.7.25 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
Tags: | binlog rotaion, innodb-commit-concurrency, innodb-thread-concurrency |
[30 Jul 2019 6:53]
jia liu
[30 Jul 2019 6:54]
jia liu
gdb outputs
Attachment: gdb.txt (text/plain), 1.63 MiB.
[30 Jul 2019 6:54]
jia liu
show engine innodb status
Attachment: show engine innodb status.txt (text/plain), 70.15 KiB.
[30 Jul 2019 6:55]
jia liu
show processlist
Attachment: show porcesslist.txt (text/plain), 135.61 KiB.
[12 Aug 2019 5:15]
MySQL Verification Team
Hi, I've been running this for weeks and have not been able to reproduce the problem. What environment are you using to reproduce this? Thanks, Bogdan
[13 Aug 2019 11:45]
jia liu
this is my comment, it is too long to post
Attachment: analyze.txt (text/plain), 24.03 KiB.
[14 Aug 2019 9:08]
jia liu
I am sorry for my misleading suspicion: this bug has nothing to do with master-info-repository=TABLE, only with innodb-commit-concurrency / innodb-thread-concurrency.
[15 Aug 2019 3:33]
jia liu
After more investigation, we found that the deadlock is more likely to trigger when an UPDATE matches 0 rows. (INSERT IGNORE or a DELETE that affects 0 rows may also trigger it, but we have not tested those yet.) It also affects 5.7.25, and not only 5.7.25. I suspect it affects all versions in which gtid_executed is an InnoDB table. I still suggest fixing this problem by "bypassing InnoDB concurrency limits for system internal operations", rather than only patching the 0-row UPDATE case.
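To make the suggestion above concrete, here is a toy Python model of the kind of deadlock being described. It is only a sketch under assumptions and is not MySQL's actual code: the names `tickets`, `rotation_lock`, and `rotation_completes` are hypothetical, and the exact order in which the real server takes its locks is assumed, not taken from this report. The semaphore stands in for the InnoDB concurrency limit, and the lock stands in for whatever commits must wait on while a binlog rotation is in progress.

```python
import threading


def rotation_completes(slots: int, bypass_for_internal: bool,
                       timeout: float = 0.5) -> bool:
    """Toy model: return True if the 'rotation' can finish, False if it is stuck.

    Assumption (not from the bug report): user threads hold a concurrency
    ticket while waiting for the rotation to finish, and the rotation itself
    needs a ticket for an internal write such as updating gtid_executed.
    """
    tickets = threading.BoundedSemaphore(slots)   # stand-in for the concurrency limit
    rotation_lock = threading.Lock()              # stand-in for "rotation in progress"
    all_slots_taken = threading.Barrier(slots + 1)

    def worker() -> None:
        tickets.acquire()            # user thread enters the engine
        all_slots_taken.wait()       # signal that this ticket is now held
        with rotation_lock:          # commit has to wait for the rotation
            pass
        tickets.release()

    rotation_lock.acquire()          # rotation starts
    threads = [threading.Thread(target=worker) for _ in range(slots)]
    for t in threads:
        t.start()
    all_slots_taken.wait()           # every ticket is held by a blocked worker

    if bypass_for_internal:
        completed = True             # internal work skips the ticket queue
    else:
        # Internal work queues for a ticket like any user thread.
        completed = tickets.acquire(timeout=timeout)
        if completed:
            tickets.release()

    rotation_lock.release()          # clean up so the worker threads can exit
    for t in threads:
        t.join()
    return completed


if __name__ == "__main__":
    print("internal ops obey the limit :", rotation_completes(4, False))
    print("internal ops bypass the limit:", rotation_completes(4, True))
```

In this model the rotation only gets stuck when its internal work has to queue for a ticket; letting internal operations bypass the limit removes the cycle regardless of what kind of statement the user threads were running.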
[21 Aug 2019 13:58]
Sven Petai
We have also run into this deadlock twice in production (the first time we didn't have debug tools in place to narrow it down). Our normal write load is several thousand commits per second, 24/7, and it takes 3-4 months of that load to trigger the deadlock, so reproducing it reliably in a reasonable amount of time requires careful tuning of both the mysqld config and the test script.

To reproduce the deadlock reliably on a mysql-community-8.0.17 server, the following non-default settings seem to be important:

    innodb_thread_concurrency = 4   # low value in order to increase probability of the deadlock
    max_binlog_size=2M              # since deadlock involves binlog rotation we set binlog file size to a low value to increase rotation frequency
    binlog_format=mixed             # haven't been able to reproduce it with the default ROW yet
    gtid_mode = ON                  # haven't been able to replicate it without GTID
    enforce_gtid_consistency = ON

For the test client I used sysbench 1.1.0-174f3aa with the bundled oltp_update_non_index.lua test, with a small change to ./src/lua/oltp_common.lua to throw in some updates that do not match any rows, as Jia suggested. I'm not sure yet whether these zero-row updates are essential for triggering the bug or just increase the probability, but with some of them thrown in it hangs within a couple of seconds, whereas without them I haven't been able to reproduce it in an hour.

Replace the get_id() function in sysbench's oltp_common.lua with something like this:

    local function get_id()
       local id = sysbench.rand.default(1, sysbench.opt.table_size)
       if (id % 10) == 0 then
          return id+10000000
       else
          return id
       end
    end

After that, populate the test table:

    sysbench ./src/lua/oltp_update_non_index.lua --time=00 --threads=230 --mysql-user=sbtest --mysql-password=sbtest --mysql-db=test --tables=1 --table-size=1000000 prepare

and run the test:

    sysbench ./src/lua/oltp_update_non_index.lua --time=00 --threads=230 --mysql-user=sbtest --mysql-password=sbtest --mysql-db=test --tables=1 --table-size=1000000 run

You should see a hang within a couple of seconds.
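For anyone wondering how many zero-row updates the modified get_id() produces, here is a rough Python port of the same logic. This is a sketch and not what sysbench executes: it assumes the random source is uniform and inclusive on both ends (the distribution sysbench actually uses depends on its options), and `miss_fraction` is a hypothetical helper added only for this illustration.

```python
import random

TABLE_SIZE = 1_000_000   # matches --table-size in the commands above
OFFSET = 10_000_000      # pushes the id far outside the populated range


def get_id(rng: random.Random, table_size: int = TABLE_SIZE) -> int:
    """Python port of the modified Lua get_id(): ids divisible by 10 are
    shifted past the end of the table, so the UPDATE matches no rows."""
    row_id = rng.randint(1, table_size)   # assumed uniform, inclusive bounds
    if row_id % 10 == 0:
        return row_id + OFFSET
    return row_id


def miss_fraction(samples: int = 100_000, seed: int = 0) -> float:
    """Estimate the share of generated ids that fall outside the table."""
    rng = random.Random(seed)
    misses = sum(get_id(rng) > TABLE_SIZE for _ in range(samples))
    return misses / samples


if __name__ == "__main__":
    print(f"share of zero-row updates: {miss_fraction():.3f}")
```

Under the uniform assumption, roughly one update in ten targets a row that does not exist, which is the "some zero-row updates thrown in" mix described above.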
[21 Aug 2019 16:00]
MySQL Verification Team
Hi, thanks for the VM; reproduced on the first try. All the best, Bogdan
[31 May 2021 16:29]
Daniel Price
Posted by developer:
Fixed as of the upcoming 5.7.35 release, and here's the proposed changelog entry from the documentation team:
A binary log rotation deadlock occurred on a system using statement-based replication where there was a high number of concurrent update operations and a low innodb_thread_concurrency setting.