Bug #94827 Error 1032 could be more informative (handler error HA_ERR_KEY_NOT_FOUND)
Submitted: 29 Mar 2019 7:47 Modified: 29 Mar 2019 9:19
Reporter: Simon Mudd (OCA) Email Updates:
Status: Verified Impact on me:
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7.25 OS:CentOS (7)
Assigned to: CPU Architecture:x86
Tags: Error 1032, replication

[29 Mar 2019 7:47] Simon Mudd
During an RBR failure I see something like this:

               Last_SQL_Error: Could not execute Update_rows event on table some_db.some_table; Can't find record in 'some_table', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000012 end_log_pos 999999

This really is not helpful as it provides no context as to the actual problem.

It would be most convenient if this error message were slightly smarter and showed the WHERE clause (or condition) that failed as this makes tracing the upstream (on the master) row more quickly.

If using minimal RBR the "where condition" will be the primary key value (or a unique key value) and even if using full RBR providing the full row details would be good (or at least part of it).

Failing that there's no tooling to quickly identify the failed update so that this can be fixed y retrieving the data from the master and making a correction.

Such tooling to help with RBR failure would be most useful.

How to repeat:
See above.

If such a situation happens it's often much easier to "fix" the problem than rebuild the whole server. There is documentation which can be used to figure out a procedure to do this but there is no tooling in place to make the diagnostics of the broken row easier nor to fix that broken row so that things can continue.

Suggested fix:
* improve the error message to provide more context on the "missing row", or the change that is being attempted but failed. When using SBR this information was available as a truncated INSERT, UPDATE or DELETE statement was shown.
* provide tooling for capturing the broken row and recovering the state (or at least to avoid us all having to write our own individual tooling to do this)
[29 Mar 2019 7:50] Simon Mudd
This isn't the first time I've seen it but it does mean I have to trawl the relay logs to find the row update that failed (no tooling), and then find the details of that row on the master (no tooling), and then insert (without writing to binlogs) the upstream data into the slave, and then let replication continue (no tooling).
[29 Mar 2019 9:19] MySQL Verification Team
Hello Simon,

Thank you for the report and feedback.
Verifying this as a reasonable feature request to improve the error message in the reported scenario.

[7 Oct 2020 10:46] Hendrik Woltersdorf
I get the same messages, when we replicate a table without primary key from ndbcluster to inndodb. Inserts and Update work, but cause extreme cpu load and replication lag, when some replication backlog has accumulated.
Version: 5.6.47-ndb-7.4.27-cluster-gpl
[7 Oct 2020 12:04] MySQL Verification Team
Hendrik, you describe https://bugs.mysql.com/bug.php?id=53375
Summary: row_format=ROW or MIXED, every table should have a PRIMARY KEY.