Bug #94881 | slave replication lock wait timeout because of supremum record | ||
---|---|---|---|
Submitted: | 3 Apr 2019 4:37 | Modified: | 11 Apr 2019 1:24 |
Reporter: | Zhenghu Wen (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 5.7.20 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | lock,supremum |
[3 Apr 2019 4:37]
Zhenghu Wen
[3 Apr 2019 13:38]
MySQL Verification Team
Hi, Thank you for your bug report. I truly do not see what is a bug here. You should know that INSERT command is setting insert intention and gap locks before it physically inserts the row. Hence, knowing that, the behaviour is expected. However, the difference in behaviour on whether the gap is within the same InnoDB page or on different page is not well documented. Hence, I recommend that this gets documented properly in our Reference Manual. Regarding getting lock wait timeouts, this should not happen if the transactions are well designed and do not wait for any processing, user input, backup procedures and similar behaviour. It can also happen when the server is overloaded. So, from that point of view, it is not a bug. We only need your opinion on whether it should be documented better in our Reference Manual.
[4 Apr 2019 1:16]
Zhenghu Wen
Hi Sinisa The problem is that It will break down replication. I think if two transcations could commit sucessfully in master, they should commit in slave too, while in this situation, they could not.
[4 Apr 2019 12:56]
MySQL Verification Team
Hi Wen, I already wrote that this behaviour, when it comes to the insert intention and gap locks working on the same or different pages, is a behaviour by design. Also, it is impossible to use replication to order slave to put next rows(s) in a new page. That design does NOT break your replication. What breaks your replication is that lock wait timeouts on the slave. I wrote already what can cause long waiting times on lock. You can also increase the timeout. Hence, would you like us to verify it as a documentation request. The other option is to set it as "Not a Bug".
[10 Apr 2019 7:27]
Zhenghu Wen
Hi Sinisa Maybe I have not explained clearly. It is not about insert intention or gap locks. It is unique index checking. It could not be Fixed by increasing innodb_lock_wait_timeout. binlog order in file: binlog1 (last committed is 1):xa prepare (acquired lock r1) -> binlog2 (last committed is 2): waiting for lock r1 -> binlog3 (last committed is 3): xa commit for binlog1. binlog2 will never acquire lock r1 only if binlog3 executed, bug binlog3 should waiting for binlog2 executed. so binlog2 will always wait timeout no matter how large innodb_lock_wait_timeout you set.
[10 Apr 2019 12:54]
MySQL Verification Team
Hi Zhenghu, I must admit that your last reply only adds to confusion and does not clear out any misunderstanding. Hence, I have to ask several quesitons. First of all, does it happen only with XA transactions, or with normal transactions as well ??? Second, how many database servers are directly involved in your XA transactions. When I write "directly involved", I am referring to masters only. Next, what is binlog1, binlog2 .... binlogN ??? Each slave can have several relay logs, but only one binary log. Next, what do all those binary logs have to do with difference in behaviour, when it comes to having all relevant records in one page, or in several pages ??? Unique index checking does use some locking , but very few and those are short in duration. When you write about lock `r1` or any other, can you specify EXACTLY which locks do you have in mind ??? You wrote also "bug binlog3" ..... is that a typo ???? Shouldn't it be "but binlog3" ????
[10 Apr 2019 13:34]
Zhenghu Wen
Hi Sinisa I am sorry, my english is poor. First of all, does it happen only with XA transactions, or with normal transactions as well ??? ------only with XA transaction. Second, how many database servers are directly involved in your XA transactions. When I write "directly involved", I am referring to masters only. -------one database server. Next, what is binlog1, binlog2 .... binlogN ??? Each slave can have several relay logs, but only one binary log. -------binlog means a transaction (log events in a transaction, with the same gtid). binlog1~3 are all in a binary log file. Next, what do all those binary logs have to do with difference in behaviour, when it comes to having all relevant records in one page, or in several pages ??? ------ Unique index checking does use some locking , but very few and those are short in duration. When you write about lock `r1` or any other, can you specify EXACTLY which locks do you have in mind ??? ------------------------------------------------- node4336-information_schema>select * from information_schema.INNODB_LOCK_WAITS; +-------------------+--------------------------+-----------------+--------------------------+ | requesting_trx_id | requested_lock_id | blocking_trx_id | blocking_lock_id | +-------------------+--------------------------+-----------------+--------------------------+ | 1265010783 | 1265010783:30:1372653:41 | 1264997114 | 1264997114:30:1372653:41 | +-------------------+--------------------------+-----------------+--------------------------+ 1 row in set, 1 warning (0.00 sec) node4336-information_schema>select * from information_schema.INNODB_LOCKS; +--------------------------+-------------+-----------+-----------+------------------------------------------+-----------------+------------+-----------+----------+------------------+ | lock_id | lock_trx_id | lock_mode | lock_type | lock_table | lock_index | lock_space | lock_page | lock_rec | lock_data | +--------------------------+-------------+-----------+-----------+------------------------------------------+-----------------+------------+-----------+----------+------------------+ | 1265010783:30:1372653:41 | 1265010783 | S | RECORD | `assembler3`.`Music_ArtistUserSubscribe` | idx_user_artist | 30 | 1372653 | 41 | 1521413434, 9808 | | 1264997114:30:1372653:41 | 1264997114 | X | RECORD | `assembler3`.`Music_ArtistUserSubscribe` | idx_user_artist | 30 | 1372653 | 41 | 1521413434, 9808 | +--------------------------+-------------+-----------+-----------+------------------------------------------+-----------------+------------+-----------+----------+------------------+ CREATE TABLE `Music_ArtistUserSubscribe` ( `Id` bigint(20) NOT NULL COMMENT 'Id', `ArtistId` bigint(20) DEFAULT '0', `UserId` bigint(20) DEFAULT '0', PRIMARY KEY (`Id`), UNIQUE KEY `idx_user_artist` (`UserId`,`ArtistId`), KEY `idx_aid_id` (`ArtistId`,`Id`) ) ENGINE=InnoDB r1 means "1521413434, 9808". it is first unqiue index/record in leaf page 2. and locked by transaction of binlog1 (xa prepared). transaction of binlog2 wants to insert a unique index/record "1521413434, 8103" (there is no other records between "1521413434, 8103" and "1521413434, 9808") . when insert "1521413434, 8103", it will first found supremum in leaf page 1 (followed by leaf page 2). and will try to lock "1521413434, 9808" for unique index check. -------------------------- You wrote also "bug binlog3" ..... is that a typo ???? Shouldn't it be "but binlog3" ???? ----------sorry, is typo. "but binlog3"
[10 Apr 2019 13:49]
MySQL Verification Team
Hi, First of all, running XA transaction on a single server makes no sense. It just adds to the CPU overhead. Do keep in mind that our normal transactions all, by default, have two-phase commit. So, do you get the same behaviour with non-XA transactions as well, or don't you ??? Next, you are using 5.7.20. Please, try to reproduce this behaviour with 5.7.25, so that we would not finish chasing old, fixed bugs. Last, but not least, if you have the same problems with 5.7.25, we do require a fully repeatable test case. Let us know whether you know what such a test case should contain.
[11 Apr 2019 1:24]
Zhenghu Wen
Thank you for your reply!
[23 Apr 2019 12:34]
MySQL Verification Team
Hi, You are truly welcome.