MySQL Bugs: #89094: Data inconsistency on master after upgrading to 5.7.19

Bug #89094	Data inconsistency on master after upgrading to 5.7.19
Submitted:	2 Jan 2018 14:06	Modified:	3 Jan 2018 8:07
Reporter:	František Petružálek	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: DML	Severity:	S2 (Serious)
Version:	5.7.19	OS:	Debian
Assigned to:		CPU Architecture:	Any

Description:
Short version:
After upgrading MySQL server from 5.6.* to 5.7.19 (in whole replication cluster) some rows on master server are not deleted, but on slaves they are deleted.

Long version:
There are three tables "alert_url", "alert_item" and "alert_changes":

CREATE TABLE `alert_url` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  ...
  PRIMARY KEY (`id`),
  ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8

CREATE TABLE `alert_item` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  ...
  PRIMARY KEY (`id`),
  ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

CREATE TABLE `alert_changes` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `alert_item_id` int(10) unsigned NOT NULL,
  ...
  PRIMARY KEY (`id`),
  KEY `fk_alert_changes_alert_item1` (`alert_item_id`),
  ...
  CONSTRAINT `fk_alert_changes_alert_item1` FOREIGN KEY (`alert_item_id`) REFERENCES `alert_item` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

When some records are deleted in "alert_url" table, foreign key arrange to delete appropriate records in "aler_item" and "alert_change" tables, but on master server some records in "alert_changes" table are not deleted.

For example. Original delete statement on "alert_url" table in binary log in statement format looks like:

# at 82160214
#171207 21:25:20 server id 100  end_log_pos 82160442 CRC32 0xc17c926f 	Query	thread_id=249148048	exec_time=0	error_code=0
SET TIMESTAMP=1512678320/*!*/;
DELETE FROM alert_url WHERE id IN ('1581706','1581707','1581708','1581726','1581729','1581730','1581753','1581757','1581758','1581761','1581763','1581765')

Only records of one alert_url.id record is affected. For this example it is alert_url.id = 1581753.
Delete statement on "alert_item" table in binary log is logged in row format and looks like:

# at 82092025
#171207 21:25:19 server id 100  end_log_pos 82092366 CRC32 0x8785cea2 	Delete_rows: table id 13681122 flags: STMT_END_F
### DELETE FROM `datacore`.`alert_item`
### WHERE
###   @1=20358178
###   @2=1581753
...
### UPDATE `datacore`.`alert_url`
### WHERE
###   @1=1581753
...
### SET
###   @1=1581753
...
### DELETE FROM `datacore`.`alert_item`
### WHERE
###   @1=20358179
###   @2=1581753
...
### DELETE FROM `datacore`.`alert_item`
### WHERE
###   @1=20358180
###   @2=1581753
...

Undeleted record of alert_changes table is relative to alert_item.id = 20358178.
Interesting is that the delete statement of alert_item record is the first in group before update which is performed by after delete trigger.
This trigger is defined as follows:

             Trigger: ad_alert_item
               Event: DELETE
               Table: alert_item
           Statement: BEGIN
	UPDATE alert_url set `last_update` = NOW(), `points_sended` = NULL WHERE `id` = OLD.alert_url_id;
END
              Timing: AFTER
             Created: NULL
            sql_mode: NO_ENGINE_SUBSTITUTION
             Definer: *****@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: utf8_general_ci

Other interesting thing is that this trigger has defined with NO_ENGINE_SUBSTITUTION but other triggers on these tables has empty sql_mode.

Last thing we think might affect it is binlog_format set to MIXED. The documentation mentions that a mixed format can cause similar problems. We are not sure that this is exactly the situation because it started after the MySQL server upgrade. This bug appeared after upgrading MySQL server from 5.6.* to 5.7.19.

How to repeat:
We don't know how to repeat this.

Suggested fix:
Properly working replication, what's done on master should be done on slaves.

Thank you for the bug report. You have not provide a repeatable test case so we are not able to check it, just the description of the issue isn't enough for. Comment here when you will able to provide a test case including the my.cnf files.