Bug #115741 | Slave stops with HA_ERR_KEY_NOT_FOUND with HASH_SCAN and index used to search fo | ||
---|---|---|---|
Submitted: | 1 Aug 2024 11:13 | Modified: | 7 Aug 2024 15:15 |
Reporter: | Jan Lindström | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 8.0 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[1 Aug 2024 11:13]
Jan Lindström
[2 Aug 2024 15:46]
Jean-François Gagné
This looks like a duplicate of Bug#105802.
[5 Aug 2024 12:33]
MySQL Verification Team
Hi Jan, thanks for the report. As JFG said this does look like duplicate of #105802 and I cannot reproduce this on latest 8.0. What version did you reproduce this on? Thanks
[6 Aug 2024 9:57]
Jan Lindström
he issue reproduces on mysql-8.0.39 tag debug built from the official repository. Here's the refined reproducer for this issue: {noformat} --source include/have_binlog_format_row.inc --source include/set_privilege_checks_user_as_system_user.inc --source include/master-slave.inc --source include/rpl_connection_slave.inc --source include/rpl_connection_master.inc # Create a table, with no primary key and an index. CREATE TABLE t1 (i INT, INDEX t1_i(i)); # Create a stored function so that only one Update_rows_log_event is generated. --delimiter | CREATE FUNCTION f1 () RETURNS INT BEGIN UPDATE t1 SET i = 0 WHERE i = 1; UPDATE t1 SET i = 2 WHERE i = 0; UPDATE t1 SET i = 3 WHERE i = 2; UPDATE t1 SET i = 4 WHERE i = 3; UPDATE t1 SET i = 5 WHERE i = 4; RETURN 0; END| --delimiter ; --source include/sync_slave_sql_with_master.inc --source include/rpl_connection_master.inc INSERT INTO t1 VALUES (1); SELECT f1(); --source include/sync_slave_sql_with_master.inc --let $assert_text= 'There is only one row in table t1' --let $assert_cond= [SELECT COUNT(*) FROM t1] = 1 --source include/assert.inc --source include/rpl_connection_master.inc # Verify that there is no difference between tables of master and slave. --let diff_tables=master:test.t1, slave:test.t1 --source include/diff_tables.inc # Cleanup --source include/rpl_connection_master.inc DROP FUNCTION f1; DROP TABLE t1; --source include/rpl_connection_slave.inc --source include/rpl_end.inc {noformat} We also analyzed #105802 ticket. Here's the reproducer for #105802 we used and that is described in the comment of the corresponding ticket: {noformat} --source include/have_binlog_format_row.inc --source include/set_privilege_checks_user_as_system_user.inc --source include/master-slave.inc --source include/rpl_connection_slave.inc --source include/rpl_connection_master.inc # Create a table, with no primary key. CREATE TABLE t1 (i INT); # Create a stored function so that only one Update_rows_log_event is generated. --delimiter | CREATE FUNCTION f1 () RETURNS INT BEGIN UPDATE t1 SET i = 1 WHERE i = 4; UPDATE t1 SET i = 2 WHERE i = 1; UPDATE t1 SET i = 3 WHERE i = 2; UPDATE t1 SET i = 2 WHERE i = 3; UPDATE t1 SET i = 1 WHERE i = 2; UPDATE t1 SET i = 4 WHERE i = 1; RETURN 0; END| --delimiter ; --source include/sync_slave_sql_with_master.inc --source include/rpl_connection_master.inc INSERT INTO t1 VALUES (1), (2), (3), (4); SELECT f1(); --source include/sync_slave_sql_with_master.inc --let $assert_text= 'There are only four rows in table t1' --let $assert_cond= [SELECT COUNT(*) FROM t1] = 4 --source include/assert.inc --source include/rpl_connection_master.inc # Verify that there is no difference between tables of master and slave. --let diff_tables=master:test.t1, slave:test.t1 --source include/diff_tables.inc # Cleanup --source include/rpl_connection_master.inc DROP FUNCTION f1; DROP TABLE t1; --source include/rpl_connection_slave.inc --source include/rpl_end.inc {noformat} Both reproducers fail, but they fail for two different reasons. The problem with #105802 is that intermediate updates to the same row might not be applied (skipped), and updates that left result in the error that not all entries are applied from the log event. The current issue is different and it's only reproduced when there's an index. The problem is that one of the intermediate updates to the same row has a key that sorts before the original key value of the row and then it's searched first in the index, generating key not found error. Two issues look similar because they have the same root cause: multiple updates to the same row within a single update rows log event. But they are different, please, try the refined tests once again.
[6 Aug 2024 10:44]
MySQL Verification Team
Hi Jan, I'll retest with your new test case on debug version of .39. Thanks.
[7 Aug 2024 15:15]
MySQL Verification Team
Verified as described
[22 Aug 2024 6:55]
huahua xu
Hi, all: It may be helpful to solving the problem: https://bugs.mysql.com/bug.php?id=106918
[1 Jan 16:34]
Varun Nagaraju
When a replica's slave_rows_search_algorithms is set to HASH_SCAN, it prepares a unique key list for all the rows in a particular Row_event. The same unique key list will later be used to retrieve all tuples associated to each key in the list from storage engine. In the case of multiple updates targeted at the same row like how it is shown in the testcase, it may happen that this unique key list filled with entries which don't exist yet in the table. This is a problem when there is an intermediate update which changes the value of the index column to a lesser value than the original entry and that changed value is used in another update. It is an issue because the unique key list is a std::set which internally sorts it's entries. When this sorting happens, the first entry of the list could potentially be a value which doesn't exist in the table and the when it is searched in next_record_scan() method, it fails returning HA_ERR_KEY_NOT_FOUND error. So, Instead of using std::set to store the distinct keys, a combination of unordered_set and a list is used to preserve the original order of updates and avoid duplicates at the same time which prevents the side effects of sorting.
[1 Jan 16:34]
Varun Nagaraju
adding the patch as contribution (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: 0001-PS-9391-Fixes-replication-break-error-because-of-HAS.patch (text/x-patch), 12.74 KiB.
[21 Jan 12:46]
Varun Nagaraju
Contributing a refactored patch based on v8.0.40 (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: 0001-PS-9391-Fixes-replication-break-error-because-of-HAS.patch (text/x-patch), 13.52 KiB.