Description:
Short version:
After upgrading MySQL server from 5.6.* to 5.7.19 (in whole replication cluster) some rows on master server are not deleted, but on slaves they are deleted.
Long version:
There are three tables "alert_url", "alert_item" and "alert_changes":
CREATE TABLE `alert_url` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
...
PRIMARY KEY (`id`),
...
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8
CREATE TABLE `alert_item` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
...
PRIMARY KEY (`id`),
...
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
CREATE TABLE `alert_changes` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`alert_item_id` int(10) unsigned NOT NULL,
...
PRIMARY KEY (`id`),
KEY `fk_alert_changes_alert_item1` (`alert_item_id`),
...
CONSTRAINT `fk_alert_changes_alert_item1` FOREIGN KEY (`alert_item_id`) REFERENCES `alert_item` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
When some records are deleted in "alert_url" table, foreign key arrange to delete appropriate records in "aler_item" and "alert_change" tables, but on master server some records in "alert_changes" table are not deleted.
For example. Original delete statement on "alert_url" table in binary log in statement format looks like:
# at 82160214
#171207 21:25:20 server id 100 end_log_pos 82160442 CRC32 0xc17c926f Query thread_id=249148048 exec_time=0 error_code=0
SET TIMESTAMP=1512678320/*!*/;
DELETE FROM alert_url WHERE id IN ('1581706','1581707','1581708','1581726','1581729','1581730','1581753','1581757','1581758','1581761','1581763','1581765')
Only records of one alert_url.id record is affected. For this example it is alert_url.id = 1581753.
Delete statement on "alert_item" table in binary log is logged in row format and looks like:
# at 82092025
#171207 21:25:19 server id 100 end_log_pos 82092366 CRC32 0x8785cea2 Delete_rows: table id 13681122 flags: STMT_END_F
### DELETE FROM `datacore`.`alert_item`
### WHERE
### @1=20358178
### @2=1581753
...
### UPDATE `datacore`.`alert_url`
### WHERE
### @1=1581753
...
### SET
### @1=1581753
...
### DELETE FROM `datacore`.`alert_item`
### WHERE
### @1=20358179
### @2=1581753
...
### DELETE FROM `datacore`.`alert_item`
### WHERE
### @1=20358180
### @2=1581753
...
Undeleted record of alert_changes table is relative to alert_item.id = 20358178.
Interesting is that the delete statement of alert_item record is the first in group before update which is performed by after delete trigger.
This trigger is defined as follows:
Trigger: ad_alert_item
Event: DELETE
Table: alert_item
Statement: BEGIN
UPDATE alert_url set `last_update` = NOW(), `points_sended` = NULL WHERE `id` = OLD.alert_url_id;
END
Timing: AFTER
Created: NULL
sql_mode: NO_ENGINE_SUBSTITUTION
Definer: *****@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
Database Collation: utf8_general_ci
Other interesting thing is that this trigger has defined with NO_ENGINE_SUBSTITUTION but other triggers on these tables has empty sql_mode.
Last thing we think might affect it is binlog_format set to MIXED. The documentation mentions that a mixed format can cause similar problems. We are not sure that this is exactly the situation because it started after the MySQL server upgrade. This bug appeared after upgrading MySQL server from 5.6.* to 5.7.19.
How to repeat:
We don't know how to repeat this.
Suggested fix:
Properly working replication, what's done on master should be done on slaves.