Description:
consider two tables, a and b, where b references a via a foreign key constraint. a user attempts a multi-table delete on both, referencing a parent and child row.
-- WITHOUT cascade option in fk, it fails with the following error
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`ops_ia`.`b`, CONSTRAINT `asdf` FOREIGN KEY (`a_id`) REFERENCES `a` (`id`))
-- this ideally would not error and the multi table delete would work as it's deleting the child row so not actually violating anything, but it's not that critical. the fix is tied into the more serious case, below:
-- WITH cascade option, it's catastrophic, takes all slaves down
Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Delete_rows event on table ops_ia.b; Can't find record in 'b', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log spare-002-bin.000032, end_log_pos 408791771
-- this is probably because it deletes the record from 'a' with the first row event, which cascades down to 'b', then the second row event can't find it. slaves shouldn't be processing cascades, though
-- mysqlbinlog output, what the slave sees:
#190411 23:24:26 server id 184354198 end_log_pos 408791523 CRC32 0x490c3efc GTID last_committed=84998 sequence_number=84999 rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ff0d4d97-2bf9-11e9-a7a6-008cfa1eefb4:2694913'/*!*/;
#190411 23:24:26 server id 184354198 end_log_pos 408791599 CRC32 0x4fb55be0 Query thread_id=229293 exec_time=0 error_code=0
SET TIMESTAMP=1555025066/*!*/;
SET @@session.pseudo_thread_id=229293/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1342701568/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
#190411 23:24:26 server id 184354198 end_log_pos 408791645 CRC32 0x18192d09 Table_map: `ops_ia`.`a` mapped to number 3917
#190411 23:24:26 server id 184354198 end_log_pos 408791691 CRC32 0xff6192ab Table_map: `ops_ia`.`b` mapped to number 3919
#190411 23:24:26 server id 184354198 end_log_pos 408791731 CRC32 0xaa406089 Delete_rows: table id 3917
#190411 23:24:26 server id 184354198 end_log_pos 408791771 CRC32 0xb8647900 Delete_rows: table id 3919 flags: STMT_END_F
BINLOG '
qsyvXBOWBf0KLgAAAF2qXRgAAE0PAAAAAAMABm9wc19pYQABYQABAwAACS0ZGA==
qsyvXBOWBf0KLgAAAIuqXRgAAE8PAAAAAAMABm9wc19pYQABYgABAwAAq5Jh/w==
qsyvXCCWBf0KKAAAALOqXRgAAE0PAAAAAAAAAgABAf4BAAAAiWBAqg==
qsyvXCCWBf0KKAAAANuqXRgAAE8PAAAAAAEAAgABAf4BAAAAAHlkuA==
'/*!*/;
### DELETE FROM `ops_ia`.`a`
### WHERE
### @1=1
### DELETE FROM `ops_ia`.`b`
### WHERE
### @1=1
#190411 23:24:26 server id 184354198 end_log_pos 408791802 CRC32 0xaea87584 Xid = 37138747
COMMIT/*!*/;
How to repeat:
-- ON MASTER:
CREATE TABLE `a` (
`id` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `b` (
`a_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`a_id`),
CONSTRAINT `asdf` FOREIGN KEY (`a_id`) REFERENCES `a` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
insert into a set id=1;
insert into b set a_id=1;
delete a,b
from a
join b on b.a_id=a.id
where b.a_id=1;
-- check slaves, they will be stopped on error
ALTERNATE case [still with cascade, without no binlog is written because error on master], if I run:
delete from a where id=1;
then it logs only the 'a' row, and the slave processes the cascade.
#190411 23:34:24 server id 184354198 end_log_pos 408793209 CRC32 0xdf9d3bda GTID last_committed=85005 sequence_number=85006 rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ff0d4d97-2bf9-11e9-a7a6-008cfa1eefb4:2694920'/*!*/;
#190411 23:34:24 server id 184354198 end_log_pos 408793285 CRC32 0xa3f787bd Query thread_id=229315 exec_time=0 error_code=0
SET TIMESTAMP=1555025664/*!*/;
SET @@session.pseudo_thread_id=229315/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1342701568/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
#190411 23:34:24 server id 184354198 end_log_pos 408793331 CRC32 0xc5b28ff1 Table_map: `ops_ia`.`a` mapped to number 3917
#190411 23:34:24 server id 184354198 end_log_pos 408793371 CRC32 0x3f15ead7 Delete_rows: table id 3917 flags: STMT_END_F
BINLOG '
AM+vXBOWBf0KLgAAAPOwXRgAAE0PAAAAAAMABm9wc19pYQABYQABAwAA8Y+yxQ==
AM+vXCCWBf0KKAAAABuxXRgAAE0PAAAAAAEAAgABAf4BAAAA1+oVPw==
'/*!*/;
### DELETE FROM `ops_ia`.`a`
### WHERE
### @1=1
#190411 23:34:24 server id 184354198 end_log_pos 408793402 CRC32 0xc33d10f6 Xid = 37139216
COMMIT/*!*/;
Suggested fix:
Two fixes:
1. process cascades on the master and log the results as row events, vs processing them on slaves. this ensures all changes are deterministic (for example - if slaves have different constraints than master). this would be the same way triggers are currently handled (and everything else in the RBR world).
2. make multi-table deletes aware of constraints, basically so the no-cascade-fk option described would succeed, that would have the effect of making the cascade option succeed as well
Doing only #2 should solve this specific issue, but leaves a large risk of other issues, because nondeterministic replication is bad juju.