Bug #86078 Bad Write Set tracking with UNIQUE KEY on a DELETE followed by an INSERT.
Submitted: 25 Apr 2017 19:14 Modified: 12 Apr 7:51
Reporter: Jean-François Gagné Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:8.0.1 OS:Any
Assigned to: CPU Architecture:Any

[25 Apr 2017 19:14] Jean-François Gagné
Description:
Hi,

I found an example of bad Write Set tracking on a UNIQUE KEY with DELETE followed by an INSERT.  With table my_table below, if we have transaction t1 that deletes a row followed by t2 inserting a new row on different primary key (auto_increment) but that have the same unique key, I get below binlog extract.

CREATE TABLE `my_table` (
  `a` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `b` int(10) unsigned,
  `c` int(10) unsigned,
  `d` varchar(80),
  `e` varchar(10),
  `f` timestamp,
  `g` timestamp,
  `h` binary(16),
  PRIMARY KEY (`a`),
  UNIQUE KEY `k1` (`b`,`c`,`d`),
  KEY `k2` (`h`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

#170415 23:49:04 ... last_committed=113398 sequence_number=113432 ...
#170415 23:49:04 ... last_committed=113398 sequence_number=113433 ...

Transaction with sequence_number=113433 (t2 which is an INSERT) should have last_committed=113432 (t1 which is the DELETE) but its last_committed is something else instead (113398).  This leads both transactions to be executed in parallel on slaves, where the INSERT might fail if it runs just before the DELETE.  When this happens, I get the following in SHOW SLAVE STATUS and in performance_schema.replication_applier_status_by_worker:

Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 9 failed executing transaction 'ANONYMOUS' at master log binlog.001124, end_log_pos 49846970. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.

LAST_ERROR_MESSAGE: Worker 9 failed executing transaction 'ANONYMOUS' at master log binlog.001124, end_log_pos 49846970; Could not execute Write_rows event on table xxx.my_table; Duplicate entry 'i-j-str' for key 'k1', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.001124, end_log_pos 49846970

I had this case of bad interval generation many times and I am able to reproduce in production, but I am not able to reproduce on a test instance yet.  The clue I have so far is that the (wrong) last_committed of the INSERT (t2) is pointing to a DDL.

Thanks for looking into that,

JFG

How to repeat:
I am able to reproduce in production but not on a test instance.
[8 Jun 2017 6:48] Umesh Shastry
Hello Jean-François Gagné,

Thank you for the bug report.
Could you please provide exact reproducible test case, and conf file etc to reproduce this issue at our end? Thank you!

regards,
Umesh
[9 Jun 2017 8:56] Jean-François Gagné
Hi Umesh,

During my tests in April, I was able to reproduce the problem easily on my dataset and on my workload.  As written in [1], it happened 7 times on 5M transactions in one of my 7 test environments.

[1]: https://www.slideshare.net/JeanFranoisGagn/mysql-parallel-replication-logicalclock-all-the...

It also happened 3 times in 10 "test runs" on SSDs and 3 times on 15 "test runs" on magnetic disks ([2]).

[2]: https://www.slideshare.net/JeanFranoisGagn/mysql-parallel-replication-logicalclock-all-the...

However, I was not able to reproduce this on a "test" environment: sorry.

I will try to reproduce...

Best regards,

JFG
[15 Jun 2017 4:39] Umesh Shastry
Thank you Jean, verifying after confirming from Vitor to keep track of this internally.

regards,
Umesh
[2 Aug 2017 10:14] Margaret Fisher
Posted by developer:
 
Changelog entry added for MySQL 8.0.3:

When write sets are used for parallelization by a replication slave (as specified by the        binlog_transaction_dependency_tracking system variable), the case and accent sensitivity of the database are now taken into account when generating the write set information. Previously, duplicate keys could be incorrectly identified as different, causing transactions to have incorrect dependencies and so potentially be executed in the wrong order.
[4 Sep 2017 13:42] Nuno Carvalho
This bug was reopen, issues were found on the solution that required to withdraw it.
[30 Oct 2017 17:12] Margaret Fisher
Posted by developer:
 
Reinstated earlier changelog entry as MySQL 8.0.4 - closing.
[11 Jan 7:33] Jean-François Gagné
Related: Bug#89141.
[12 Jan 15:58] Jean-François Gagné
Related: Bug#89194.
[10 Apr 7:35] Margaret Fisher
Posted by developer:
 
Changelog entry marked for 5.7.22.
[12 Apr 7:51] Jean-François Gagné
See Bug#88120 for the 5.7 details.