Bug #116291 innodb crash recovery optimize for MLOG_REC_INSERT
Submitted: 3 Oct 12:11
Modified: 4 Oct 9:16
Reporter: Zihao Wang
Status: Verified
Category: MySQL Server: InnoDB storage engine
Severity: S5 (Performance)
Version: 8.0
OS: Any
Assigned to:
CPU Architecture: Any
Tags: REDO Log

[3 Oct 12:11] Zihao Wang
Description:
When writing a redo log of type MLOG_REC_INSERT, InnoDB will try to find the overlap between the cursor record and the insert record, thereby reducing the amount of redo log written.

Before 8.0.29, MySQL tried to find the overlap whenever the extra_size of cursor_rec was equal to the extra_size of insert_rec.

However, after WL #13899, MySQL additionally checks whether cursor_rec.version and insert_rec.version are consistent. It only tries to find the overlap when the two versions are inconsistent, which is wrong logic and does not match the comment above the code.

How to repeat:
You can just go and read the code: https://github.com/mysql/mysql-server/blob/596f0d238489a9cf9f43ce1ff905984f58d227b6/storag...

You can also use gdb to observe this. The SQL is:

mysql> create table test_mismatch(a int, b int, c int, primary key(a, b));
Query OK, 0 rows affected (0.05 sec)

mysql> insert into test_mismatch values (1,1,1);
Query OK, 1 row affected (0.01 sec)

mysql> insert into test_mismatch values (1,2,1);
Query OK, 1 row affected (0.00 sec)

In any version before 8.0.29, the length of the record data in MLOG_REC_INSERT (you can calculate it as end_seg_len >> 1) is 31 for the first insert statement and 18 for the second. The 13-byte difference consists of the 6-byte header, the 4 bytes of a, and the first 3 bytes of b.

In versions after 8.0.29, because of the wrong logic, both lengths are 31.

Suggested fix:
Change the inequality operator in the version comparison to an equality operator.
[3 Oct 12:26] MySQL Verification Team
Hi Mr. Wang,

Thank you very much for your bug report.

However, this is a forum for reports with fully repeatable test cases. Each test case should consist of a set of SQL statements that always leads to the performance problem that you are reporting.

You have a test case, but the one part missing is how to get the data that will prove your report after the insertion.

We also accept in-depth code analysis that proves the point.

Hence, we are waiting on your feedback.
[3 Oct 12:28] MySQL Verification Team
Hi Mr. Wang,

We would especially like to know how we can measure this part of your report:

"

In any version before 8.0.29, the length of the record data in MLOG_REC_INSERT (you can calculate it as end_seg_len >> 1) is 31 for the first insert statement and 18 for the second. The 13-byte difference consists of the 6-byte header, the 4 bytes of a, and the first 3 bytes of b.

In versions after 8.0.29, because of the wrong logic, both lengths are 31.

"

Last but not least, can you please explain why it is wrong logic when both lengths are 31?

Many thanks in advance.
[3 Oct 15:55] Zihao Wang
Let me answer the questions one by one.

First, how to repeat. Since MySQL and InnoDB do not provide redo log parsing tools, the only way to see the composition of a redo log record is to use gdb.

I've provided some SQL statements above, and you can debug in this way:

1. mysql> set global innodb_checkpoint_disabled=on;

2. execute the SQL I provided

3. kill mysqld

4.1 in the newest code, set a breakpoint on this line: https://github.com/mysql/mysql-server/blob/596f0d238489a9cf9f43ce1ff905984f58d227b6/storag...

4.2 in MySQL 5.7, set a breakpoint on this line: https://github.com/mysql/mysql-server/blob/f7680e98b6bbe3500399fbad465d08a6b75d7a5c/storag...

4.3 read the code below the breakpoint and you will find that the value of end_seg_len after 'end_seg_len >>= 1;' is the length of the insert record data in this redo log record

5. use gdb to debug the crash recovery process, and you will see:

* In any version before 8.0.29, the value of 'end_seg_len >> 1' in the first MLOG_REC_INSERT is 31, and the value of 'end_seg_len >> 1' in the second MLOG_REC_INSERT is 18;

* In any version after 8.0.29, the value of 'end_seg_len >> 1' in both MLOG_REC_INSERT records is 31.
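
To make the 'end_seg_len >> 1' step concrete, here is a minimal C++ sketch of how the value seen in gdb can be split. The struct and function names are hypothetical and are not InnoDB identifiers. That the upper bits hold the length comes from the steps above; the meaning given to the low bit (extra fields follow in the log record) is my reading of the parsing code and should be treated as an assumption. The raw values 62 and 37 are only illustrations that decode to 31 and 18.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type, not an InnoDB structure.
struct EndSegLen {
  std::uint32_t data_len;   // end_seg_len >> 1: bytes of record data in the log
  bool has_extra_fields;    // low bit; its meaning here is an assumption
};

// Split a raw end_seg_len value into the length and the low flag bit.
inline EndSegLen decode_end_seg_len(std::uint32_t end_seg_len) {
  return EndSegLen{end_seg_len >> 1, (end_seg_len & 1u) != 0};
}

// Inverse of decode_end_seg_len, useful for building example values.
inline std::uint32_t encode_end_seg_len(std::uint32_t data_len,
                                        bool has_extra_fields) {
  assert(data_len <= 0x7FFFFFFFu);  // must still fit after the left shift
  return (data_len << 1) | (has_extra_fields ? 1u : 0u);
}
```

In gdb the same number can be read directly with 'print end_seg_len >> 1' before the shift statement runs.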

===============================

Second, let me explain in an easy-to-understand way why it is wrong logic for both lengths to be 31.

In InnoDB, when writing a redo log record of type MLOG_REC_INSERT, the storage engine has to record the inserted data. Only in this way can InnoDB ensure that the insert operation can be redone properly.

Since InnoDB redoes an insert relative to cursor_rec (insert_rec is inserted next to cursor_rec), it applies an optimization in MLOG_REC_INSERT: when 'cursor_rec.extra_size == insert_rec.extra_size', InnoDB tries to find anything in common between the data of cursor_rec and insert_rec. If there is an overlap, MLOG_REC_INSERT does not need to write it into the redo log, which saves I/O and disk space.
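
The idea of the optimization can be sketched in a few lines of C++. This is a simplified illustration, not the InnoDB implementation: the type alias and function names are hypothetical, records are modeled as plain byte vectors, and the special handling of record header bytes in the real code is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;  // hypothetical record representation

// Number of leading bytes that are identical in both records.
inline std::size_t common_prefix_len(const Bytes &cursor_rec,
                                     const Bytes &insert_rec) {
  const std::size_t n = std::min(cursor_rec.size(), insert_rec.size());
  const auto last = cursor_rec.begin() + static_cast<std::ptrdiff_t>(n);
  const auto diff = std::mismatch(cursor_rec.begin(), last, insert_rec.begin());
  return static_cast<std::size_t>(diff.first - cursor_rec.begin());
}

// Bytes of insert_rec that still have to be written to the redo log
// once the shared prefix is skipped.
inline std::size_t bytes_to_log(const Bytes &cursor_rec,
                                const Bytes &insert_rec) {
  return insert_rec.size() - common_prefix_len(cursor_rec, insert_rec);
}
```

During recovery the skipped prefix can be copied from cursor_rec, which is why only the differing tail needs to be logged.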

WL #13899 introduced row versions for instant DDL. When the row_version of cursor_rec and insert_rec is not equal, the probability that the two records have something in common is very low. So the developer of WL #13899 wrote the code below:

"
  /* If versions are different, then don't compare the records */
  if (cur_version != ins_version && cur_extra_size == extra_size) {
  https://github.com/mysql/mysql-server/blob/596f0d238489a9cf9f43ce1ff905984f58d227b6/storag...
"

The developer wrote 'cur_version != ins_version' instead of 'cur_version == ins_version' by mistake.

This mistake causes no error, but it makes the optimization I just described ineffective. With this mistake, InnoDB tries to find the common part only after the user runs an instant DDL and then inserts a record whose extra_size is the same as that of cursor_rec.
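
The difference between the two conditions can be written out as a small C++ sketch. RecInfo and both function names are hypothetical stand-ins for the values used in the quoted condition, and the version value 0 in the usage note is only an illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for the per-record values in the quoted condition.
struct RecInfo {
  unsigned version;        // row version introduced for instant DDL
  std::size_t extra_size;  // size of the record header
};

// Condition as quoted from the current code: records are compared
// only when the versions DIFFER.
inline bool should_compare_current(const RecInfo &cur, const RecInfo &ins) {
  return cur.version != ins.version && cur.extra_size == ins.extra_size;
}

// Condition with the suggested fix: records are compared only when
// the versions are EQUAL, as the code comment describes.
inline bool should_compare_fixed(const RecInfo &cur, const RecInfo &ins) {
  return cur.version == ins.version && cur.extra_size == ins.extra_size;
}
```

For two records with the same version and a 6-byte header, such as RecInfo{0, 6} and RecInfo{0, 6}, should_compare_current returns false and should_compare_fixed returns true, which is the case hit by the test table.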

===============================

Third, let me explain the calculation I did before.

1. Why is the length of the record data 31?

Because the row_format is dynamic by default, and in table 'test_mismatch' all columns are fixed-length, the length of the record header is 6 bytes. The body of the record consists of 'a', 'b', 'trx-id', 'roll-ptr', and 'c', so the length of the record body is 4 + 4 + 6 + 7 + 4 = 25. The total length is 6 + 25 = 31.

2. Why is the length of the record data for the second insert 18 in versions before 8.0.29?

Because before 8.0.29, InnoDB only checked the extra_size of cursor_rec and insert_rec. Both records have a 6-byte header, so they meet the condition and InnoDB tries to find the overlap in their record data.

Let's look at their record data. Inside InnoDB it is: 'a' + 'b' + 'trx-id' + 'roll-ptr' + 'c'. The 'a' of both inserts is '1', which gives 4 equal bytes; the 'b' of the two inserts is '1' and '2', which are different values but still share their first 3 (high-order) bytes.

Now it's clear: with the correct logic, InnoDB can save 13 bytes of I/O and disk space in this situation (31 - 13 = 18). Sadly, MySQL currently cannot apply this optimization. The only thing you need to do is change 'cur_version != ins_version' to 'cur_version == ins_version'.
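
The arithmetic above can be checked with a short C++ sketch. The field sizes are the ones stated in this comment. The constant and function names are hypothetical, and the byte encoding used for the integers (4 bytes, big-endian, sign bit flipped) is an assumption about the storage format, so only the byte counts should be relied on here. As in the calculation above, the 6-byte header is counted as not logged.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Field sizes in bytes, as given in the report.
constexpr std::size_t kHeader = 6, kA = 4, kB = 4, kTrxId = 6, kRollPtr = 7,
                      kC = 4;
constexpr std::size_t kRecLen = kHeader + kA + kB + kTrxId + kRollPtr + kC;

// Assumed storage format for a signed INT: big-endian with the sign bit flipped.
inline std::array<std::uint8_t, 4> store_int(std::int32_t v) {
  const std::uint32_t u = static_cast<std::uint32_t>(v) ^ 0x80000000u;
  return {static_cast<std::uint8_t>(u >> 24), static_cast<std::uint8_t>(u >> 16),
          static_cast<std::uint8_t>(u >> 8), static_cast<std::uint8_t>(u)};
}

// Leading bytes shared by the stored (a, b) keys of two rows.
inline std::size_t shared_key_bytes(std::int32_t a1, std::int32_t b1,
                                    std::int32_t a2, std::int32_t b2) {
  const auto x1 = store_int(a1), y1 = store_int(b1);
  const auto x2 = store_int(a2), y2 = store_int(b2);
  std::size_t n = 0;
  while (n < kA && x1[n] == x2[n]) ++n;
  if (n < kA) return n;  // mismatch already inside column a
  std::size_t m = 0;
  while (m < kB && y1[m] == y2[m]) ++m;
  return n + m;
}
```

For the rows (1,1,1) and (1,2,1), shared_key_bytes(1, 1, 1, 2) is 7, so the logged length is kRecLen - (kHeader + 7) = 31 - 13 = 18.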
[4 Oct 9:16] MySQL Verification Team
Thank you, Mr. Wang,

We have checked your logic and we agree with you.

This is now a verified bug for version 8.0 and all higher versions.

Thanks again.