Bug #105802 Slave stops with HA_ERR_KEY_NOT_FOUND (INDEX_SCAN,HASH_SCAN)
Submitted: 5 Dec 2021 12:43    Modified: 21 Dec 2021 1:27
Reporter: zhijun long
Status: Verified               Impact on me: None
Category: MySQL Server: Replication    Severity: S2 (Serious)
Version: 5.7.*; 8.0.*          OS: Any
Assigned to:                   CPU Architecture: Any

[5 Dec 2021 12:43] zhijun long
Description:
Slave stops with HA_ERR_KEY_NOT_FOUND when slave_rows_search_algorithms is set to INDEX_SCAN,HASH_SCAN, which is the default setting in version 8.0.

**** SHOW SLAVE STATUS on slave ****
SHOW SLAVE STATUS;
Slave_IO_State	Waiting for master to send event
Master_Host	127.0.0.1
Master_User	root
Master_Port	13000
Connect_Retry	1
Master_Log_File	master-bin.000001
Read_Master_Log_Pos	1264
Relay_Log_File	slave-relay-bin.000002
Relay_Log_Pos	985
Relay_Master_Log_File	master-bin.000001
Slave_IO_Running	Yes
Slave_SQL_Running	No
Replicate_Do_DB	
Replicate_Ignore_DB	
Replicate_Do_Table	
Replicate_Ignore_Table	
Replicate_Wild_Do_Table	
Replicate_Wild_Ignore_Table	
Last_Errno	1032
Last_Error	Could not execute Update_rows event on table uniticket.t1; Can't find record in 't1', Error_code: 1032; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log master-bin.000001, end_log_pos 1233

How to repeat:
MTR test:

--source include/master-slave.inc
--connection master
create database uniticket;
use uniticket;

CREATE TABLE `t1` (
  `a` varchar(60) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `b` varchar(60) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `c` varchar(60) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `d` varchar(60) COLLATE utf8mb4_general_ci DEFAULT NULL,
  UNIQUE KEY `k1` (`a`,`b`,`c`)
) ENGINE=InnoDB;

INSERT INTO `t1` (`d`,`a`,`c`,`b`) VALUES('2021-12-04 16:42:53','A','B','A')  ,('2021-12-04 16:42:54','A','B','A')  ,('2021-12-04 16:42:53','A','B','A')  ,('2021-12-04 16:42:55','A','B','A') ON DUPLICATE KEY UPDATE `d`=VALUES(`d`),`a`=VALUES(`a`),`c`=VALUES(`c`),`b`=VALUES(`b`);

--source include/sync_slave_sql_with_master.inc
--connection master
DROP TABLE t1;
--source include/rpl_end.inc
[21 Dec 2021 1:31] MySQL Verification Team
Thanks for the test case. Verified.
[29 Mar 2022 22:40] Venkatesh Duggirala
This is a serious issue (given that the slave_rows_search_algorithms default is INDEX_SCAN,HASH_SCAN in 8.0) and should be an S2 issue (not S3).
[8 Nov 2023 16:06] Aaditya Dubey
Hi Team,

Any updates on the fix?
[13 Dec 2023 10:08] Varun Nagaraju
This is a unique corner case in which multiple updates target the same row in a single rows log event and the slave_rows_search_algorithms system variable is set to HASH_SCAN (by default it is INDEX_SCAN,HASH_SCAN).
So, the test case can be simplified as:

--source include/master-slave.inc
--connection slave
set @prior_slave_rows_search_algorithms=@@global.slave_rows_search_algorithms;
set @@global.slave_rows_search_algorithms= 'HASH_SCAN';
--connection master
create database uniticket;
use uniticket;
CREATE TABLE `t1` (
  `a` varchar(60) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `b` varchar(60) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `c` varchar(60) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `d` varchar(60) COLLATE utf8mb4_general_ci DEFAULT NULL,
  UNIQUE KEY `k1` (`a`,`b`,`c`)
) ENGINE=InnoDB;
INSERT INTO 
  `t1` (`a`,`b`,`c`,`d`) 
VALUES
   ('A','A','B','2021-12-04 16:42:53'),
   ('A','A','B','2021-12-04 16:42:54'),
   ('A','A','B','2021-12-04 16:42:53'),
   ('A','A','B','2021-12-04 16:42:55')
ON DUPLICATE KEY UPDATE 
   `a`=VALUES(`a`),
   `b`=VALUES(`b`),
   `c`=VALUES(`c`),
   `d`=VALUES(`d`);

--echo # t1 contents on master
SELECT * FROM uniticket.t1;

--source include/sync_slave_sql_with_master.inc
--connection slave
--echo # t1 contents on slave
SELECT * FROM uniticket.t1;

--connection master
drop database uniticket;
--connection slave
set @@global.slave_rows_search_algorithms= @prior_slave_rows_search_algorithms;
--source include/rpl_end.inc

Hash scan generally works in the following way:
Whenever HASH_SCAN is included in slave_rows_search_algorithms, the hash table is first populated with the information from the write sets in the rows log event, with
keys : Hash_slave_rows::make_hash_key(TABLE *table, MY_BITMAP *cols), dependent on the table object and the bitmap of columns from the master.
values : a smart pointer to the positions of the before image of a write set.
Only after this hash table is generated from the write sets are the changes applied, in Rows_log_event::do_scan_and_update(Relay_log_info const *rli).
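As a rough illustration of that key/value layout, here is a simplified sketch with hypothetical types (not the server's actual definitions); the field names only mirror the entry->preamble->hash_value and entry->positions->bi_start/bi_ends accesses used in the debug patch further below:

#include <cstdint>
#include <memory>
#include <unordered_map>

// Simplified sketch of one hash entry, not the server's actual definitions.
struct Row_positions {
  const unsigned char *bi_start;  // where the before image starts in the event buffer
  const unsigned char *bi_ends;   // where it ends (the after image, if any, follows)
};

// One entry per write set, keyed by a hash computed from the before-image
// columns; write sets with identical before images share the same key.
using Slave_rows_hash =
    std::unordered_multimap<std::uint32_t, std::unique_ptr<Row_positions>>;

Both phases are driven by Rows_log_event::do_hash_scan_and_update():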

int Rows_log_event::do_hash_scan_and_update(Relay_log_info const *rli) {
  DBUG_TRACE;
  assert(m_table && m_table->in_use != nullptr);
  // HASHING PART
  /* unpack the BI (and AI, if it exists) and add it to the hash map. */
  if (int error = this->do_hash_row(rli)) return error;
  /* We have not yet hashed all rows in the buffer. Do not proceed to the SCAN
   * part. */
  if (m_curr_row_end < m_rows_end) return 0;
  DBUG_PRINT("info", ("Hash was populated with %d records!", m_hash.size()));
  assert(m_curr_row_end == m_rows_end);
  // SCANNING & UPDATE PART
  return this->do_scan_and_update(rli);
}

do_scan_and_update(rli) performs a single table scan. For each row in the scan, the hash table is looked up and the table record from the storage engine is compared against the before image stored in the matching hash entry.
If there is a match, that particular hash entry is deleted and the write set to which the before image belongs is applied. This process is repeated until there are no more entries in the hash.
(Error scenario: if the table scan completes and there are still entries left in the hash, this indicates an inconsistency between source and replica and the apply fails.)
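
To make those two phases concrete, here is a minimal, self-contained toy model of the HASH_SCAN apply, using plain integers instead of row images and std::unordered_multimap directly instead of Hash_slave_rows (all names below are illustrative, not the server's API):

#include <cstdio>
#include <unordered_map>
#include <vector>

// Toy write set: before image (BI) and after image (AI), both plain ints.
struct WriteSet { int bi, ai; };

int main() {
  std::vector<int> table = {10, 20};                   // rows on the replica
  std::vector<WriteSet> event = {{10, 11}, {20, 21}};  // updates in one rows event

  // HASHING PART: key derived from the before image, value = the write set.
  std::unordered_multimap<int, WriteSet> hash;
  for (const WriteSet &ws : event) hash.emplace(ws.bi, ws);

  // SCANNING & UPDATE PART: a single table scan; for every row, probe the
  // hash, compare the row against the stored BI, apply the AI and erase
  // exactly that one entry.
  for (int &row : table) {
    if (hash.empty()) break;
    auto range = hash.equal_range(row);
    for (auto it = range.first; it != range.second; ++it) {
      if (it->second.bi == row) {  // redundant here, mirrors the collision check
        row = it->second.ai;       // apply the after image
        hash.erase(it);            // erase this single entry, not all with the key
        break;
      }
    }
  }

  // Error scenario: the scan is done but entries remain in the hash =>
  // source/replica inconsistency (HA_ERR_KEY_NOT_FOUND in the real code).
  if (!hash.empty())
    std::printf("leftover entries: source/replica inconsistency\n");
  else
    std::printf("applied cleanly: rows are now %d and %d\n", table[0], table[1]);
  return 0;
}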

Coming to the corner case of multiple updates targeting the same row in a single rows log event, some work has already been done in that area, for example:

    https://github.com/mysql/mysql-server/commit/dfd8d7d89bfb7cbde7c0e45827bcb611dcb8f36e
    https://github.com/mysql/mysql-server/commit/af5bfba0e28cd6839ce4e902ca89b023244efca0
    https://github.com/mysql/mysql-server/commit/1d2ad39cf2e7d978ebdf162f6b31cd3a13cfcb6c

But they do not cover this exact case, where there are multiple entries in the hash with the same key (which means the before images of more than one write set match the content of the table record).
The relay log will contain 1 insert and 3 update write sets, as expected.

/*!*/;
### INSERT INTO `uniticket`.`t1`
### SET
###   @1='A'
###   @2='A'
###   @3='B'
###   @4='2021-12-04 16:42:53'
### UPDATE `uniticket`.`t1`
### WHERE
###   @1='A'
###   @2='A'
###   @3='B'
###   @4='2021-12-04 16:42:53'
### SET
###   @1='A'
###   @2='A'
###   @3='B'
###   @4='2021-12-04 16:42:54'
### UPDATE `uniticket`.`t1`
### WHERE
###   @1='A'
###   @2='A'
###   @3='B'
###   @4='2021-12-04 16:42:54'
### SET
###   @1='A'
###   @2='A'
###   @3='B'
###   @4='2021-12-04 16:42:53'
### UPDATE `uniticket`.`t1`
### WHERE
###   @1='A'
###   @2='A'
###   @3='B'
###   @4='2021-12-04 16:42:53'
### SET
###   @1='A'
###   @2='A'
###   @3='B'
###   @4='2021-12-04 16:42:55'

The internal implementation of the hash being used here is an unordered_multimap.

    The delete method makes sure to delete a single entry at a time instead of the default behaviour of deleting all the entries with the given key.
    But the get method of the hash class Hash_slave_rows does not guarantee that entries are fetched in the order they were inserted.

So, when the table scan starts, the table's record for the first (and only) row in the table matches the before image of the entry fetched from the hash. But since the hash table has more than one entry with the same before image, and therefore the same key, the wrong entry (corresponding to a different write set) can be fetched from the hash table. It is then deleted from the hash table and its changes are applied, which causes the remaining updates in the hash to NOT apply, because the before images of the remaining entries no longer match any row on the slave table.
This is the reason a failure with HA_ERR_KEY_NOT_FOUND is seen.
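
This order dependence can be reproduced outside the server with the same kind of toy model (again illustrative names and plain integers, not the real row format): the hash holds the three updates from the relay log dump above, two of them sharing a key because their before images are identical, and only one of the two possible consumption orders applies cleanly.

#include <cstdio>
#include <unordered_map>

// Same toy model: updates keyed by their before image (BI).  The replica
// table holds a single row with value 53, standing in for the row whose
// column d is '...16:42:53' in the relay log dump above.
static bool apply_all(bool take_largest_ai) {
  std::unordered_multimap<int, int> hash;  // key = BI, value = AI
  hash.emplace(53, 54);                    // update #1: 53 -> 54
  hash.emplace(54, 53);                    // update #2: 54 -> 53
  hash.emplace(53, 55);                    // update #3: 53 -> 55 (same BI/key as #1)
  int row = 53;

  while (!hash.empty()) {
    auto range = hash.equal_range(row);
    if (range.first == range.second)       // the row matches no remaining BI:
      return false;                        // entries are left over, the apply fails
    // Several entries share this key; which one the multimap hands out first
    // is unspecified, so force one choice or the other explicitly.
    auto pick = range.first;
    for (auto it = range.first; it != range.second; ++it)
      if (take_largest_ai ? it->second > pick->second : it->second < pick->second)
        pick = it;
    row = pick->second;                    // apply the after image
    hash.erase(pick);                      // erase exactly this one entry
  }
  return true;                             // every write set was applied
}

int main() {
  // Consuming update #1 first replays the source order and succeeds; consuming
  // update #3 first strands updates #1 and #2 in the hash, which is the
  // HA_ERR_KEY_NOT_FOUND failure seen on the replica.
  std::printf("pick 53->54 first: %s\n", apply_all(false) ? "ok" : "leftover entries");
  std::printf("pick 53->55 first: %s\n", apply_all(true) ? "ok" : "leftover entries");
  return 0;
}

Running this, the first call applies all three updates while the second reports leftover entries, mirroring the replica error above.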

The above behaviour can be verified using the following debug patch

diff --git a/sql/log_event.cc b/sql/log_event.cc
index 4a0322dd131..95f12bd628c 100644
--- a/sql/log_event.cc
+++ b/sql/log_event.cc
@@ -9337,6 +9337,7 @@ int Rows_log_event::do_hash_row(Relay_log_info const *rli) {
     key based on record[0] contents (including BLOB fields).
    */
   m_hash.put(m_table, &this->m_local_cols, entry);
+  rli->report(INFORMATION_LEVEL, 0, "debug: Added record to hash with key= %u values: bi_start = %p bi_ends = %p", entry->preamble->hash_value, entry->positions->bi_start, entry->positions->bi_ends);
 
   if (m_key_index < MAX_KEY) add_key_to_distinct_keyset();
 
@@ -9441,7 +9442,7 @@ int Rows_log_event::do_scan_and_update(Relay_log_info const *rli) {
         */
         do {
           store_record(table, record[1]);
-
+          rli->report(INFORMATION_LEVEL, 0, "debug: Fetched entry key = %u values: bi_start = %p bi_ends = %p", entry->preamble->hash_value, entry->positions->bi_start, entry->positions->bi_ends);
           /**
              If there are collisions we need to be sure that this is
              indeed the record we want.  Loop through all records for
[18 Dec 2023 13:39] Varun Nagaraju
A possible way to implement a solution for this issue is to transform/optimize the write-set information into less ambiguous write sets before storing them in the hash table, so that they can be applied cleanly without any conflicts/errors.
For example, consider the test case mysql-test/suite/rpl/t/rpl_row_multi_update_of_same_row.test, which was added as part of dfd8d7d89bfb7cbde7c0e45827bcb611dcb8f36e:

CREATE TABLE t1 (i INT); INSERT INTO `t1` VALUES (1), (2), (3), (4);

and suppose the stored procedure issues the following updates in a single rows log event:

UPDATE t1 SET i = 1 WHERE i = 4;
UPDATE t1 SET i = 2 WHERE i = 1;
UPDATE t1 SET i = 3 WHERE i = 2;
UPDATE t1 SET i = 2 WHERE i = 3;
UPDATE t1 SET i = 1 WHERE i = 2;
UPDATE t1 SET i = 4 WHERE i = 1;

The write sets in the rows log event would be something like this, and so would the hash table:

Write set (BI AI)
4 1
1 2
2 3
3 2
2 1
1 4

Here, the write sets can be optimized.
We start from the first element:
4 1 <- here
1 2
2 3
3 2
2 1
1 4

Then we move to the second element (1 2) and extract its Before Image (1 in this case)
4 1
1 2 <- here  (BI = 1)
2 3
3 2
2 1
1 4

and see that before <here> there was already an element whose AI is identical to the BI at <here> (1 in this case), which means that previous element (4 1) can be replaced with (4 2).
After that replacement we move to the next element:
4 2
1 2
2 3 <- here  (BI = 2)
3 2
2 1
1 4

If we continue the procedure, (4 2) will become (4 3) and (1 2) will become (1 3):
4 3
1 3
2 3
3 2 <- here  (BI = 3)
2 1
1 4

Then
4 2
1 2
2 2
3 2
2 1 <- here  (BI = 2)
1 4

(notice here that there is an element with identical BI and AI, which is OK for an intermediate step)
After that
4 1
1 1
2 1
3 1
2 1
1 4 <- here  (BI = 1)

And finally
4 4
1 4
2 4
3 4
2 4
1 4
done <-here 

We need to do a few more steps here
1. Eliminate elements with identical BI and AI -  (4 4) in this case
   1 4
   2 4
   3 4
   2 4
   1 4

2. Eliminate elements with identical (BI AI) pairs, keeping only one of each. Notice that after this transformation there should be no pairs with identical BI but different AI.
   1 4
   2 4
   3 4
and this is the final result.
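
A rough, self-contained sketch of this transformation on plain integer pairs (illustrative only, not the server's data structures), which reproduces the walkthrough above:

#include <cstdio>
#include <cstddef>
#include <vector>

struct WriteSet { int bi, ai; };  // before image, after image

int main() {
  // The write sets from the walkthrough above, in event order.
  std::vector<WriteSet> ws = {{4, 1}, {1, 2}, {2, 3}, {3, 2}, {2, 1}, {1, 4}};

  // Walk the list front to back.  For the element at <here>, every earlier
  // element whose AI equals this element's BI has that AI replaced by this
  // element's AI, i.e. the earlier write set is chained through the later update.
  for (std::size_t here = 1; here < ws.size(); ++here)
    for (std::size_t j = 0; j < here; ++j)
      if (ws[j].ai == ws[here].bi) ws[j].ai = ws[here].ai;

  // Step 1: drop elements whose BI and AI ended up identical (no-ops).
  // Step 2: drop duplicate (BI AI) pairs, keeping the earliest occurrence.
  std::vector<WriteSet> out;
  for (const WriteSet &w : ws) {
    if (w.bi == w.ai) continue;
    bool seen = false;
    for (const WriteSet &o : out)
      if (o.bi == w.bi && o.ai == w.ai) { seen = true; break; }
    if (!seen) out.push_back(w);
  }

  // Prints the final result from the walkthrough: 1 4 / 2 4 / 3 4.
  for (const WriteSet &w : out) std::printf("%d %d\n", w.bi, w.ai);
  return 0;
}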

One more thing to do while transforming the write sets is to mark the elements in the write set with an additional flag (original / modified). At the beginning everything is original. On each step, if we optimize an element, we mark it as modified. This is useful for cases like the following.

If we take the example from this bug report and simplify it a little bit, as follows

CREATE TABLE t1 (i INT); INSERT INTO `t1` VALUES (1);

and suppose the stored procedure issues the following updates in a single rows log event:

UPDATE t1 SET i = 2 WHERE i = 1;
UPDATE t1 SET i = 1 WHERE i = 2;
UPDATE t1 SET i = 3 WHERE i = 1;

The write sets in the rows log event would be something like this, and so would the hash table:
Write set (BI AI)
1 2
2 1
1 3

Here, before inserting entries into the hash table, the same transformation is applied:
1 2 (original) <-
2 1 (original)
1 3 (original)

1 1 (modified)
2 1 (original) <-
1 3 (original)

1 3 (modified)
2 3 (modified)
1 3 (original) <-

While removing duplicates, we now need to check the original / modified flag as well.
Basically, for all the records with an identical (BI AI) image, we need to keep only the earliest one and mark it as original only if all the other identical elements were also original. Otherwise (if at least one of them was modified), the remaining element needs to be marked modified.

In our example we remove the third element, 1 3 (original), and leave the first one, 1 3 (modified), marking it modified (which it already is).
1 3 (modified)
2 3 (modified)

Now, when applying this optimized write set, it should be considered OK (it should not trigger an error) to encounter an element whose BI does not correspond to any row on the replica, as long as that element is marked modified.
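
Putting the pieces together, here is a rough sketch of the flag-aware variant on the simplified example above (again plain integer pairs and illustrative names, not the actual server code): duplicates collapse to the earliest element, the survivor stays original only if every duplicate was original, and the apply step skips, rather than fails on, a modified element whose BI matches no row.

#include <cstdio>
#include <cstddef>
#include <vector>

struct WriteSet { int bi, ai; bool modified; };  // before image, after image, flag

int main() {
  // The simplified example above: the table holds a single row with value 1,
  // and the event carries 1 -> 2, 2 -> 1, 1 -> 3 (all original to begin with).
  std::vector<WriteSet> ws = {{1, 2, false}, {2, 1, false}, {1, 3, false}};

  // Chain earlier AIs through later updates; every element we rewrite
  // becomes "modified".
  for (std::size_t here = 1; here < ws.size(); ++here)
    for (std::size_t j = 0; j < here; ++j)
      if (ws[j].ai == ws[here].bi) { ws[j].ai = ws[here].ai; ws[j].modified = true; }

  // Collapse no-ops and duplicates: keep the earliest (BI AI) pair; the
  // survivor stays original only if every identical element was original.
  std::vector<WriteSet> out;
  for (const WriteSet &w : ws) {
    if (w.bi == w.ai) continue;
    bool seen = false;
    for (WriteSet &o : out)
      if (o.bi == w.bi && o.ai == w.ai) {
        o.modified = o.modified || w.modified;
        seen = true;
        break;
      }
    if (!seen) out.push_back(w);
  }

  // Apply against the single-row "table": a modified element whose BI matches
  // no row is skipped instead of raising HA_ERR_KEY_NOT_FOUND.
  int row = 1;
  for (const WriteSet &w : out) {
    if (row == w.bi)
      row = w.ai;
    else if (!w.modified) {
      std::printf("error: no row matches original BI %d\n", w.bi);
      return 1;
    }
  }

  for (const WriteSet &w : out)
    std::printf("%d %d (%s)\n", w.bi, w.ai, w.modified ? "modified" : "original");
  std::printf("row on the replica = %d\n", row);  // 3, same as on the source
  return 0;
}

In a real implementation the flag would have to travel with each hash entry, so that do_scan_and_update can tell an acceptable miss (modified) from a genuine source/replica inconsistency (original).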