Bug #94881 slave replication lock wait timeout because of supremum record
Submitted: 3 Apr 4:37 Modified: 11 Apr 1:24
Reporter: Zhenghu Wen (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.7.20 OS:Any
Assigned to: CPU Architecture:Any
Tags: lock,supremum

[3 Apr 4:37] Zhenghu Wen
Description:
1. a table has a unique index(idx_a) and two record with idx_a value x,z。
x is the last record of page1, z is first record of page2,(page1,page2 is leaf page, and page2 next to page1)

2. z has been locked by trx1;

3. if we insert a record y (x < y < z) in trx2, this insert will report lock wait timeout. 

from the code we could see that: 
a. row_ins_sec_index_entry_low will check if y is exist, by using 
https://github.com/mysql/mysql-server/blob/5.7/storage/innobase/row/row0ins.cc#L3003
(gdb) bt
#0  btr_cur_search_to_nth_level (index=0x1a690d08, level=0, tuple=0x1a12c1b8, mode=PAGE_CUR_LE, latch_mode=514, cursor=0x7f18c367c900, 
    has_search_latch=0,
#1  0x000000000119ca36 in row_ins_sec_index_entry_low (flags=flags@entry=0, mode=mode@entry=2, index=index@entry=0x1a690d08, 

and it will get supremum record of page1.

b. if found supremum (and idx_a is unique index), it seem the judgment condition will satisfied
https://github.com/mysql/mysql-server/blob/5.7/storage/innobase/row/row0ins.cc#L3030 
---------------
	if (dict_index_is_unique(index)
	    && (cursor.low_match >= n_unique || cursor.up_match >= n_unique)) {
---------------

and goto unqiue check function row_ins_scan_sec_index_for_duplicate
https://github.com/mysql/mysql-server/blob/5.7/storage/innobase/row/row0ins.cc#L2038

it will try to get s lock of z, and timeout。

c. if in master , x and z is in the same page (will not go into row_ins_scan_sec_index_for_duplicate), then in slave is the condition below (x is the last record of page1, z is first record of page2), replication will break down.

How to repeat:
it is hard to repeat in test environment, as it depends on the record/index entry position in physical page.
[3 Apr 13:38] Sinisa Milivojevic
Hi,

Thank you for your bug report.

I truly do not see what is a bug here. You should know that INSERT command is setting insert intention and gap locks before it physically inserts the row. Hence, knowing that, the behaviour is expected.

However, the difference in behaviour on whether the gap is within the same InnoDB page or on different page is not well documented. Hence, I recommend that this gets documented properly in our Reference Manual.

Regarding getting lock wait timeouts, this should not happen if the transactions are well designed and do not wait for any processing, user input, backup procedures and similar behaviour. It can also happen when the server is overloaded. So, from that point of view, it is not a bug.

We only need your opinion on whether it should be documented better in our Reference Manual.
[4 Apr 1:16] Zhenghu Wen
Hi Sinisa
  
   The problem is that It will break down replication. 
   
   I think if two transcations could commit sucessfully in master, they should commit in slave too, while in this situation, they could not.
[4 Apr 12:56] Sinisa Milivojevic
Hi Wen,

I already wrote that this behaviour, when it comes to the insert intention and gap locks working on the same or different pages, is a behaviour by design. Also, it is impossible to use replication to order slave to put next rows(s) in a new page.

That design does NOT break your replication. What breaks your replication is that lock wait timeouts on the slave. I wrote already what can cause long waiting times on lock. You can also increase the timeout.

Hence, would you like us to verify it as a documentation request. The other option is to set it as "Not a Bug".
[10 Apr 7:27] Zhenghu Wen
Hi Sinisa
   
   Maybe I have not explained clearly. It is not about insert
intention or gap locks. It is unique index checking.

   It could not be Fixed by increasing innodb_lock_wait_timeout. 

binlog order in file: binlog1 (last committed is 1):xa prepare (acquired lock r1) -> binlog2 (last committed is 2): waiting for lock r1 -> binlog3 (last committed is 3): xa commit for binlog1.

binlog2 will never acquire lock r1 only if binlog3 executed, bug binlog3 should waiting for binlog2 executed. so binlog2 will always wait timeout no matter how large innodb_lock_wait_timeout you set.
[10 Apr 12:54] Sinisa Milivojevic
Hi Zhenghu,

I must admit that your last reply only adds to confusion and does not clear out any misunderstanding. Hence, I have to ask several quesitons.

First of all, does it happen only with XA transactions, or with normal transactions as well ???

Second, how many database servers are directly involved in your XA transactions. When I write "directly involved", I am referring to masters only.

Next, what is binlog1, binlog2 .... binlogN ??? Each slave  can have several relay logs, but only one binary log. 

Next, what do all those binary logs have to do with difference in behaviour, when it comes to having all relevant records in one page, or in several pages ???

Unique index checking does use some locking , but very few and those are short in duration.

When you write about lock `r1` or any other, can you specify EXACTLY which locks do you have in mind ???

You wrote also "bug binlog3" ..... is that a typo ???? Shouldn't it be "but binlog3" ????
[10 Apr 13:34] Zhenghu Wen
Hi Sinisa

I am sorry, my english is poor.

First of all, does it happen only with XA transactions, or with normal transactions as well ???
 ------only with XA transaction.

Second, how many database servers are directly involved in your XA transactions. When I write "directly involved", I am referring to masters only.
-------one database server.

Next, what is binlog1, binlog2 .... binlogN ??? Each slave  can have several relay logs, but only one binary log. 
-------binlog means a transaction (log events in a transaction, with the same gtid). binlog1~3 are all in a binary log file.

Next, what do all those binary logs have to do with difference in behaviour, when it comes to having all relevant records in one page, or in several pages ???
------

Unique index checking does use some locking , but very few and those are short in duration.

When you write about lock `r1` or any other, can you specify EXACTLY which locks do you have in mind ???
-------------------------------------------------
node4336-information_schema>select * from information_schema.INNODB_LOCK_WAITS;
+-------------------+--------------------------+-----------------+--------------------------+
| requesting_trx_id | requested_lock_id        | blocking_trx_id | blocking_lock_id         |
+-------------------+--------------------------+-----------------+--------------------------+
| 1265010783        | 1265010783:30:1372653:41 | 1264997114      | 1264997114:30:1372653:41 |
+-------------------+--------------------------+-----------------+--------------------------+
1 row in set, 1 warning (0.00 sec)

node4336-information_schema>select * from information_schema.INNODB_LOCKS;
+--------------------------+-------------+-----------+-----------+------------------------------------------+-----------------+------------+-----------+----------+------------------+
| lock_id                  | lock_trx_id | lock_mode | lock_type | lock_table                               | lock_index      | lock_space | lock_page | lock_rec | lock_data        |
+--------------------------+-------------+-----------+-----------+------------------------------------------+-----------------+------------+-----------+----------+------------------+
| 1265010783:30:1372653:41 | 1265010783  | S         | RECORD    | `assembler3`.`Music_ArtistUserSubscribe` | idx_user_artist |         30 |   1372653 |       41 | 1521413434, 9808 |
| 1264997114:30:1372653:41 | 1264997114  | X         | RECORD    | `assembler3`.`Music_ArtistUserSubscribe` | idx_user_artist |         30 |   1372653 |       41 | 1521413434, 9808 |
+--------------------------+-------------+-----------+-----------+------------------------------------------+-----------------+------------+-----------+----------+------------------+

CREATE TABLE `Music_ArtistUserSubscribe` (
  `Id` bigint(20) NOT NULL COMMENT 'Id',
  `ArtistId` bigint(20) DEFAULT '0',
  `UserId` bigint(20) DEFAULT '0',
  PRIMARY KEY (`Id`),
  UNIQUE KEY `idx_user_artist` (`UserId`,`ArtistId`),
  KEY `idx_aid_id` (`ArtistId`,`Id`)
) ENGINE=InnoDB

r1 means "1521413434, 9808".  it is first unqiue index/record in leaf page 2. and locked by transaction of binlog1 (xa prepared).  transaction of binlog2 wants to insert a unique index/record "1521413434, 8103" (there is no other records between "1521413434, 8103" and "1521413434, 9808") .

when insert "1521413434, 8103", it will first found supremum in leaf page 1 (followed by leaf page 2). and will try to lock "1521413434, 9808" for unique index check.

--------------------------
You wrote also "bug binlog3" ..... is that a typo ???? Shouldn't it be "but binlog3" ????
----------sorry, is typo. "but binlog3"
[10 Apr 13:49] Sinisa Milivojevic
Hi,

First of all, running XA transaction on a single server makes no sense. It just adds to the CPU overhead. Do keep in mind that our normal transactions all, by default, have two-phase commit.

So, do you get the same behaviour with non-XA transactions as well, or don't you ???

Next, you are using 5.7.20. Please, try to reproduce this behaviour with 5.7.25, so that we would not finish chasing old, fixed bugs.

Last, but not least, if you have the same problems with 5.7.25, we do require  a fully repeatable test case. Let us know whether you know what such a test case should contain.
[11 Apr 1:24] Zhenghu Wen
Thank you for your reply!
[23 Apr 12:34] Sinisa Milivojevic
Hi,

You are truly welcome.