Bug #111647 Blob partial update in undo record may lead potential crash
Submitted: 4 Jul 2023 2:41 Modified: 6 Jul 2023 2:36
Reporter: andrew zhao Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: JSON Severity:S3 (Non-critical)
Version:8.0.33 OS:Linux
Assigned to: CPU Architecture:x86

[4 Jul 2023 2:41] andrew zhao
Description:
The blob partial update may record in undo record, if it is a small change. We found that the binary diff server calculated may have the length 0. And this binary diff may send it to innodb and finally record in undo record.
During row_search_mvcc process, reading this record with length 0 may lead to a potential crash. It is verified to be crash in 8.0.25. However, due to some malloc mechanism, it is okay in 8.0.33.

How to repeat:
Follow the sql below it is easy to reproduce the case that binary diff is calculated to be 0:

1,
create table t (a int primary key, b json);
insert into t values (1, '[ "abc", "def" ]');
 
the memory of this record will be like :
       0x02 - type: small JSON array
        0x02 - number of elements (low byte)
        0x00 - number of elements (high byte)
        0x12 - number of bytes (low byte)
        0x00 - number of bytes (high byte)
        0x0C - type of element 0 (string)
        0x0A - offset of element 0 (low byte)
        0x00 - offset of element 0 (high byte)
        0x0C - type of element 1 (string)
        0x0E - offset of element 1 (low byte)
        0x00 - offset of element 1 (high byte)
        0x03 - length of element 0
        'a'
        'b'  - content of element 0
        'c'
        0x03 - length of element 1
        'd'
        'e'  - content of element 1
        'f'

2, UPDATE t SET b = JSON_REMOVE(b, '$[1]') where a = 1;
After this update the memory layout may like this:
                    0x02 - type: small JSON array
 CHANGED 0x01 - number of elements (low byte)
 CHANGED 0x00 - number of elements (high byte)
                    0x12 - number of bytes (low byte)
                    0x00 - number of bytes (high byte)
                    0x0C - type of element 0 (string)
                    0x0A - offset of element 0 (low byte)
                    0x00 - offset of element 0 (high byte)
                    0x0C - type of element 1 (string)
                    0x0E - offset of element 1 (low byte)
                    0x00 - offset of element 1 (high byte)
                    0x03 - length of element 0
                    'a'
                    'b'  - content of element 0
                    'c'
 [Free]         0x03 - length of element 1
 [Free]         'd'
 [Free]         'e'  - content of element 1
 [Free]         'f'

Suggested fix:
In Value::remove_in_shadow  maybe need to check whether this binary diff length is zero. If it is zero, don't  invoke add_binary_diff.
[4 Jul 2023 2:51] andrew zhao
During the update sql executing, two parts of binary diff is calculated. First binary diff is array meta. It is showed in the memory layout and marked as CHANGED. This binary diff's length is 2 bytes. Another binary diff is array data copy. However, this json type is array and we need to remove the last one element in the array. According to the current code logic, we don't need any array data copy. The remove data is marked as freed and the binary diff length is 0.
[4 Jul 2023 3:15] andrew zhao
The potential crash happen if the  select sql goes the row_search_mvcc process. 
 In trx_undo_read_blob_update func, lob_diff.m_length is the parsed undo partial
 update length. Because server calculated blob binary diff and record it into undo record,  this could be zero.

lob_undo_data.copy_old_data may have different behavior on 8.0.25 and 8/0/33 when handling malloc array size is 0 condition.
In 8.0.25, it returns nullptr and pass this nullptr to undo_ptr. That leads to a crash when execute
mach_read_next_compressed(&undo_ptr);
In 8.0.33, it returns array meta instead of nullptr. 

    /* Copy the data only if the lob_undo is not null. */
    if (lob_seq != nullptr) {
      undo_ptr = lob_undo_data.copy_old_data(undo_ptr, lob_diff.m_length);
    } else {
      undo_ptr += lob_diff.m_length;
    }
...

    /* Read the number of LOB index entries modified. */
    ulint n_entry = mach_read_next_compressed(&undo_ptr);

In some other day, if the return value of malloc array with length 0 changed to be nullptr. This crash may happen.

The reproduce step of crash in 8.0.25 is:

client1
create table t (a int primary key, b json);
insert into t values (1, '[ "abc", "def" ]');
update t set b=JSON_SET(b, '$[0]', REPEAT('w', 10000));

client2
set SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
begin;
select * from t where a =1\G

client1
UPDATE t SET b = JSON_REMOVE(b, '$[1]') where a = 1;

client2
select * from t where a =1\G

The stack is the same as https://bugs.mysql.com/bug.php?id=107953
[4 Jul 2023 12:29] andrew zhao
To CORRECT the memory lay out after exec UPDATE t SET b = JSON_REMOVE(b, '$[1]') where a = 1; command.
Memory layout should be:
0x02 - type: small JSON array
 CHANGED 0x01 - number of elements (low byte)
 CHANGED 0x00 - number of elements (high byte) 
         0x12 - number of bytes (low byte)
         0x00 - number of bytes (high byte)
         0x0C - type of element 0 (string)
         0x0A - offset of element 0 (low byte)
         0x00 - offset of element 0 (high byte)
 [Free]  0x0C - type of element 1 (string)
 [Free]  0x0E - offset of element 1 (low byte)
 [Free]  0x00 - offset of element 1 (high byte)
         0x03 - length of element 0
         'a'
         'b'  - content of element 0
         'c'
         0x03 - length of element 1
         'd'
         'e'  - content of element 1
         'f'
Two binary diff is calculated. First part is number of elements. In this case this binary diff is calculated to be 2 bytes(marked as CHANGED). The second binary diff is array meta binary diff. In normal case, to ensure the array meta is continuous with no hollow, some binary meta need to be copied. However, removed element is the last one of this array. No binary meta data need to be copied. (marked as FREE)
[4 Jul 2023 12:38] MySQL Verification Team
Hi Mr. zhao,

Thank you for your bug report.

However, 8.0.24 is a bug fix release, just like 8.0.33 is a bug fix release. That means that bugs reported for 8.0.24 are fixed in later release. So, 8.0.33 bugs will be fixed in 8.0.34 or some later release.

Hence, it is expected behaviour that 8.0.33 has no problem. We repeated your test case and everything worked quite fine with 8.0.33.

Can't repeat.
[5 Jul 2023 2:37] andrew zhao
To simplify what I want to say. There are two parts of code in 8.0.33(current newest version) is not reasonable. JSON_REMOVE may produce a zero length binary diff. It is unnecessary and could cause a potential crash. Code in trx_undo_read_blob_update listed above could parse this zero length diff and could crash which depends on the different behavior of malloc array with 0 length.  Different release acts different behavior. (Details lists above). If this behavior changes again, crash may happen.
[5 Jul 2023 12:56] MySQL Verification Team
HI Mr. zhao,

Yes, malloc(0) is a hard error.

However, we test it in so many places that it is now very unlikely that it could happen.

Also, many system malloc libraries test it as well.

We can not verify a report based on "it might happen" .......
[6 Jul 2023 2:36] andrew zhao
Thank you for your response.
[6 Jul 2023 12:14] MySQL Verification Team
You are welcome .....