Bug #85605 Corrupted index for ON DELETE CASCADE with self-referential ON DELETE SET NULL
Submitted: 24 Mar 2017 9:56 Modified: 24 Mar 2017 10:29
Reporter: Vladimir Bolshutkin Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.7, 8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: cascade, corrupt, foreign key

[24 Mar 2017 9:56] Vladimir Bolshutkin
Description:
Since version 5.7 MySQL cannot handle a case, when one table has both ON DELETE CASCADE and self-referential ON DELETE SET NULL.

Consider the following structure

create table A (
  `id` int(10) unsigned NOT NULL,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

create table B (
  `id` int(10) unsigned NOT NULL,
  `a_id` int(10) unsigned NOT NULL,
  `parent_id` int(10) unsigned NULL DEFAULT NULL,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  CONSTRAINT `b_a_fk` FOREIGN KEY (`a_id`) REFERENCES `A` (`id`) ON DELETE CASCADE,
  CONSTRAINT `b_b_fk` FOREIGN KEY (`parent_id`) REFERENCES `B` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB;

insert into A (id, name) values (1, 'test_a');
insert into B (id, a_id, name) values (1, 1, 'test_b_root');
insert into B (id, a_id, parent_id, name) values (2, 1, 1, 'test_b_root');

delete from A where id = 1;

Record b#2 is to be deleted by cascade actions, and is to be updated because its parent is deleted by a cascade action, the table becomes in a corrupted state. Seems, MySQL removes a record from index, but leaves it in the table with parent_id set to NULL. 

check table B;

Shows, that the index is corrupted.

Workaround is to setup trigger to perform cascade deletion manually.

The issue reproduced on Windows 5.7.12 and Linux 5.7.17. I also checked multiple MySQL versions via docker, images up to 5.6 handled the case without problems.

How to repeat:
1. Connect to MySQL 5.7 or higher (no matter standalone or docker) and execute the following script

create database if not exists testbug;
use testbug;

DROP table if exists A;
create table A (
  `id` int(10) unsigned NOT NULL,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

DROP table if exists B;
create table B (
  `id` int(10) unsigned NOT NULL,
  `a_id` int(10) unsigned NOT NULL,
  `parent_id` int(10) unsigned NULL DEFAULT NULL,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  CONSTRAINT `b_a_fk` FOREIGN KEY (`a_id`) REFERENCES `A` (`id`) ON DELETE CASCADE,
  CONSTRAINT `b_b_fk` FOREIGN KEY (`parent_id`) REFERENCES `B` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB;

insert into A (id, name) values (1, 'test_a');
insert into B (id, a_id, name) values (1, 1, 'test_b_root');
insert into B (id, a_id, parent_id, name) values (2, 1, 1, 'test_b_root');

delete from A where id = 1;

check table B;

2. Notice the table is corrupted
[24 Mar 2017 10:29] MySQL Verification Team
Thank you for the bug report and test case.
[29 Mar 2017 5:50] MySQL Verification Team
Adding a note for searching purposes.   This asserts a debug build!!

Version: '5.7.19-debug'  socket: ''  port: 3306  (Built on 2017/03/03)
InnoDB: Assertion failure in thread 4668 in file row0upd.cc line 2592
InnoDB: Failing assertion: !rec_get_deleted_flag(btr_cur_get_rec(btr_cur), dict_table_is_comp(index->table))

mysqld-debug.exe!my_sigabrt_handler()[my_thr_init.c:449]
ucrtbased.dll!raise()
ucrtbased.dll!abort()
mysqld-debug.exe!ut_dbg_assertion_failed()[ut0dbg.cc:68]
mysqld-debug.exe!row_upd_clust_rec()[row0upd.cc:2591]
mysqld-debug.exe!row_upd_clust_step()[row0upd.cc:2947]
mysqld-debug.exe!row_upd()[row0upd.cc:3012]
mysqld-debug.exe!row_upd_step()[row0upd.cc:3158]
mysqld-debug.exe!row_update_for_mysql_using_upd_graph()[row0mysql.cc:2530]
mysqld-debug.exe!row_update_for_mysql()[row0mysql.cc:2728]
mysqld-debug.exe!ha_innobase::delete_row()[ha_innodb.cc:8312]
mysqld-debug.exe!handler::ha_delete_row()[handler.cc:8050]
mysqld-debug.exe!Sql_cmd_delete::mysql_delete()[sql_delete.cc:468]
mysqld-debug.exe!Sql_cmd_delete::execute()[sql_delete.cc:1382]
mysqld-debug.exe!mysql_execute_command()[sql_parse.cc:3617]
mysqld-debug.exe!mysql_parse()[sql_parse.cc:5612]
mysqld-debug.exe!dispatch_command()[sql_parse.cc:1463]
mysqld-debug.exe!do_command()[sql_parse.cc:999]
mysqld-debug.exe!handle_connection()[connection_handler_per_thread.cc:300]
mysqld-debug.exe!pfs_spawn_thread()[pfs.cc:2190]
mysqld-debug.exe!win_thread_start()[my_thread.c:37]
[26 Jun 2017 18:15] Raman Blyshchyk
Faced absolutely same issue on MySQL v5.7.18, OS Ubuntu 16.10
[30 Aug 2017 10:31] Ralph Mauritz Løvgren
Any chance that this will be fixed in a near upcoming bug fix release? This is a major problem for us and we really would like to stick to 5.7 after we upgraded our production system to it this spring...
[25 Jun 2019 15:23] Tiberiu-Ionut Stan
We hit this bug, and we we're doing everything inside transactions (I know it problably doesn't matter).

When selecting or joining by a specific column in the table (other than the primary key), the deleted row is still returned by MySQL.

This "cascaded" into our network switches that we configured based on configuration from a table. 

***Additional bug information****
Dumping and reimporting the database, also dumped the "orphaned" rows.
The behaviour of these rows of now showing up on a simple select was preserved.
So dumping the "deleted" rows is also a bug in itself, and problably not an index bug.

The only way to fix the "corrupted" indexes was to select after the value in the nullable column (which was not null after deletion), then delete the rows using the primary key value as where search condition.

***Gravity***

This bug created havoc in our datacenters.

Debugging this was a real pain.
To arrive at this page, we had to actually realise how the poltergeist worked through painful debugging which made me question my reality for hours.

I think this is a serious issue with MySQL, which, at least for the company I work for, also had important security ramifications.

***To help future readers***
To properly delete the rows, we had to do this:

set group_concat_max_len = 99999999999;

SELECT 
    group_concat(network_id)
FROM networks
inner join instance_interfaces on instance_interfaces.server_interface_id = server_interfaces.server_interface_id 
left join servers on server_interfaces.server_id = servers.server_id
where 
	instance_id not in (select instance_id from instances)
;

delete from instance_interfaces where instance_interface_id in (226809,226810,226811,226812,226557,226558,226559,226560,226661,226662,226665,226666,226669,226670,226677,226678,226685,226686,226649,226650)

Where instance_interfaces.server_interface_id is a nullable FK.
[25 Jun 2019 15:24] Tiberiu-Ionut Stan
*The behaviour of these rows of **not** showing up on a simple select was preserved.
[25 Jun 2019 15:28] Tiberiu-Ionut Stan
*Sorry, these are the correct queries:

set group_concat_max_len = 99999999999;

SELECT 
    group_concat(instance_interface_id)
FROM server_interfaces
inner join instance_interfaces on instance_interfaces.server_interface_id = server_interfaces.server_interface_id 
left join servers on server_interfaces.server_id = servers.server_id
where 
     instance_id not in (select instance_id from instances)
;

delete from instance_interfaces where instance_interface_id in (226809,226810,226811,226812,226557,226558,226559,226560,226661,226662,226665,226666,226669,226670,226677,226678,226685,226686,226649,226650)

Where instance_interfaces.server_interface_id is a nullable FK.