| Bug #101828 | Hash scan incorrectly applied the log resulting in HA_ERR_END_OF_FILE | ||
|---|---|---|---|
| Submitted: | 2 Dec 2020 2:37 | Modified: | 23 Apr 2:30 |
| Reporter: | Xiaodong Huang (OCA) | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
| Version: | 8.0.42 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | Contribution, regression | ||
[2 Dec 2020 2:40]
Xiaodong Huang
Simple fix from TXSQL Team of Tencent (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: fix.patch (application/octet-stream, text), 466 bytes.
[2 Dec 2020 2:57]
Xiaodong Huang
The error reported in 8.0.22 is HA_ERR_END_OF_FILE
[2 Dec 2020 3:19]
Xiaodong Huang
In testcase: CREATE TABLE t1 ( a bigint unsigned not null, b bigint unsigned not null ) ENGINE=InnoDB DEFAULT CHARSET=utf8; Since 8.0 no longer uses "utf8", it generates warnings and the fix is: CREATE TABLE t1 ( a bigint unsigned not null, b bigint unsigned not null ) ENGINE=InnoDB;
[2 Dec 2020 4:28]
MySQL Verification Team
Hello Xiaodong, Thank you for the report and contribution. Verified as described. regards, Umesh
[25 Jun 2024 7:48]
Xiaodong Huang
In recent years, I have received a lot of feedback about this problem. Below, I will explain the reasons for the problem by combining the testcase above. Additionally, this problem can still be reproduced in version 8.0.37.
Background: In tables without a primary key, the hash scan algorithm is widely used when applying row-format binlog. We provide an example in this bug report to illustrate this problem:
Example:
```sql
SET @@global.slave_rows_search_algorithms= 'HASH_SCAN,INDEX_SCAN,TABLE_SCAN';
CREATE TABLE t1 (
a bigint unsigned not null,
b bigint unsigned not null
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- insert into t1 values(r[1]),(r[2]);
insert into t1 values(0xa8e8ee744ced7ca8, 0x6850119e455ee4ed),(0x135cd25c170db910, 0x6916c5057592c796);
-- update t1 set r[2] = r'[2]
update t1 set a=1 where a=0x135cd25c170db910 and b=0x6916c5057592c796;
```
Problem: In this update statement, table t1 {r[1],r[2]}, In here crc32(r[1])=crc32(r[2]), When an update is applied to r[2], it will be applied to r[1] instead of r[2] on the slave. This leads to inconsistent values of t1 on the master and slave. After executing update statement on the master, the table t1 becomes {r[1], r'[2]}, but on the slave, the table t1 becomes {r'[2], r[2]}.
Analysis:
Start analyzing the execution process of the above update statement on the slave.
1. When slave executes the update event related to the update satement, it will create a hash table for this event, the hash table is as following:
hash_value | hash_records[0] | hash records[1]
| before image | after image |
crc32((0x135cd25c170db910, 0x6916c5057592c796)) --> [(0x135cd25c170db910, 0x6916c5057592c796), (0x1, 0x6916c5057592c796)] --> nullptr
2. We then begin to scan table t1 {r[1], r[2]}. The process starts by reading r[1] , comparing the key, and applying it using following code segment. we using step1~step12 to illustrate the executing of update row event on this problem.
2.1. In step1~step7. we don't find a record to apply current event related to r[2], but table->record[0] is r[2].
2.2. In step8~step12, the while condition mistakenly believes that we have updated r[1] to r[2]. so directly update r[1] to r'[2].
```c++
do {
// step1. start, table->record[0] = r[1]
error = next_record_scan(i == 0);
i++;
switch (error) {
case 0: {
// step2. get first record of crc32((0x135cd25c170db910, 0x6916c5057592c796))
entry = m_hash.get(table, &this->m_local_cols);
do {
// step3. table->record[1] = table->record[0] = r[1]
// step8. table->record[1] = table->record[0] = r[2]
store_record(table, record[1]);
// iterator the hash records
while (entry) {
m_curr_row = entry->positions->bi_start;
m_curr_row_end = entry->positions->bi_ends;
prepare_record(table, &this->m_local_cols, false);
// step4, step9. table->record[0] = befor image of hash_records[0] = r[2]
if ((error = unpack_current_row(rli, &m_cols, false /*is not AI*/)))
goto close_table;
// step5, step10. judge if table->record[0] == table->record[1]
if (record_compare(table, &this->m_local_cols))
// step6. entry = hash_records[2] = nullptr
m_hash.next(&entry);
else
// step11. found a row operation log in event
break; // we found a match
}
// step6 entry is nullptr and skip apply
if (entry) {
// step12. apply this event [r[2], r'[2]] to r[1]
}
// step7. table->record[0] = r[2], So while condition is satisfied
} while (this->get_general_type_code() ==
binary_log::UPDATE_ROWS_EVENT &&
!is_pk_present && (entry = m_hash.get(table, &m_local_cols)));
} break;
}
} while (((idempotent_errors < m_hash.size()) && !m_hash.is_empty()) &&
(!error || (error == HA_ERR_RECORD_DELETED)));
```
Fix:
If the entry is nullptr, it mean that we can't find any more row operation logs (a update or delete in rows event) in current event on current record. So we add "entry != nullptr" to the while condition in step7 to move next record.
[23 Apr 2:21]
Jinghua Lin
seems this issue was fixed in 8.0.42? https://github.com/mysql/mysql-server/commit/dea401079e42c2cb87eda70db5dcf1ee6e57640a
[23 Apr 2:30]
Xiaodong Huang
In version 8.0.42, this bug is expected to have been resolved.

Description: Assume that a table t1 without a primary key and unique index has {r[1],r[2], r[3],...r[x],...,r[y],... }, which satisfies When crc32(r[x])=crc32(r[y]), the update to r[y] will be applied to r[x] on the slave. This leads to inconsistent values of r[x] and r[y] on the master and slave. Then on the master, executing delete /update r[x] or r[y] will cause HA_ERR_END_OF_FILE(1032) error. How to repeat: Execute the following testcase: --source include/master-slave.inc --source include/have_binlog_format_row.inc --echo # init =================================================================== connection master; --let $rows_alg_tmp=`select @@global.slave_rows_search_algorithms` SET @@global.slave_rows_search_algorithms= 'HASH_SCAN,INDEX_SCAN,TABLE_SCAN'; CREATE TABLE t1 ( a bigint unsigned not null, b bigint unsigned not null ) ENGINE=InnoDB DEFAULT CHARSET=utf8; # The two row have the same crc value --echo # crc32(null_bytes,row[1].a,row[1].b)=crc32(null_bytes,row[2].a,row[2].b) --echo # crc32(0xffa87ced4c74eee8a8ede45e459e115068)= crc32(0xff10b90d175cd25c1396c7927505c51669) insert into t1 values(0xa8e8ee744ced7ca8, 0x6850119e455ee4ed),(0x135cd25c170db910, 0x6916c5057592c796); --sync_slave_with_master connection slave; SET @@global.slave_rows_search_algorithms= 'HASH_SCAN,INDEX_SCAN,TABLE_SCAN'; select * from t1; --echo # phrase 1 =================================================================== --echo # The log of the second record is applied to the first record connection master; update t1 set a=1 where a=0x135cd25c170db910 and b=0x6916c5057592c796; select * from t1; --echo # slave======================================== --sync_slave_with_master connection slave; select * from t1; --echo # phrase 2 =================================================================== --echo # HA_ERR_KEY_NOT_FOUND error! connection master; update t1 set a=2 where a=1; select * from t1; --echo # slave======================================== --sync_slave_with_master connection slave; select * from t1; --echo # cleanup =================================================================== connection master; --eval SET @@global.slave_rows_search_algorithms='$rows_alg_tmp' drop table t1; --sync_slave_with_master connection slave; --eval SET @@global.slave_rows_search_algorithms='$rows_alg_tmp' --source include/rpl_end.inc Produces the following result, But it should be able to execute successfully. Slave_IO_State Waiting for master to send event Master_Host 127.0.0.1 Master_User root Master_Port 13000 Connect_Retry 1 Master_Log_File master-bin.000001 Read_Master_Log_Pos 1361 Relay_Log_File slave-relay-bin.000002 Relay_Log_Pos 1265 Relay_Master_Log_File master-bin.000001 Slave_IO_Running Yes Slave_SQL_Running No Replicate_Do_DB Replicate_Ignore_DB Replicate_Do_Table Replicate_Ignore_Table Replicate_Wild_Do_Table Replicate_Wild_Ignore_Table Last_Errno 1032 Last_Error Could not execute Update_rows event on table test.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master-bin.000001, end_log_pos 1330 Skip_Counter 0 Exec_Master_Log_Pos 1048 Relay_Log_Space 1787 Until_Condition None Until_Log_File Until_Log_Pos 0 Master_SSL_Allowed No Master_SSL_CA_File Master_SSL_CA_Path Master_SSL_Cert Master_SSL_Cipher Master_SSL_Key Seconds_Behind_Master Master_SSL_Verify_Server_Cert No Last_IO_Errno 0 Last_IO_Error Last_SQL_Errno 1032 Last_SQL_Error Could not execute Update_rows event on table test.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master-bin.000001, end_log_pos 1330 Replicate_Ignore_Server_Ids Master_Server_Id 1 Master_UUID dc38226c-3400-11eb-a34e-fa163eca90dd Master_Info_File mysql.slave_master_info SQL_Delay 0 SQL_Remaining_Delay Slave_SQL_Running_State Master_Retry_Count 10 Master_Bind Last_IO_Error_Timestamp Last_SQL_Error_Timestamp 201201 21:13:07 Master_SSL_Crl Master_SSL_Crlpath Retrieved_Gtid_Set Executed_Gtid_Set Auto_Position 0 Replicate_Rewrite_DB Channel_Name Master_TLS_Version Master_public_key_path Get_master_public_key 0 Suggested fix: Fix from Tencent TXSQL Team. patch: