MySQL Bugs: #107916: Contribution by Tencent: restart fail if --innodb-force-recovery=1 after killed

Bug #107916	Contribution by Tencent: restart fail if --innodb-force-recovery=1 after killed
Submitted:	20 Jul 2022 6:32	Modified:	27 Jul 2022 12:33
Reporter:	qingxin sun (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5.7	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
when mysql do lots of trx commits, clone_sys thread insert the gtids into mysql.gtid_executed table.
if the server is killed before clone sys thread inserting, and new gtids is write into binlog, the server will restart fail if innodb_force_recovery is 1, err log looks like:

2022-07-19T12:18:12.897149Z 0 [ERROR] [MY-012803] [InnoDB] innodb_force_recovery is on. We do not allow database mod
ifications by the user. Shut down mysqld and edit my.cnf to set innodb_force_recovery=0
mysqld: Operation not allowed when innodb_force_recovery > 0.
2022-07-19T12:18:12.898247Z 0 [Note] [MY-011980] [InnoDB] GTID compression after recovery. 
2022-07-19T12:18:12.899262Z 0 [ERROR] [MY-011980] [InnoDB] Error persisting GTIDs to table
2022-07-19T12:18:12.912966Z 0 [Note] [MY-012922] [InnoDB] Waiting for purge to start
2022-07-19T12:18:12.913979Z 0 [Note] [MY-011825] [InnoDB] NCDB: purge coordinator will be blocked for 10 seconds
2022-07-19T12:18:12.964098Z 0 [Note] [MY-011825] [InnoDB] ncdb finish start innodb, max space id: 1
2022-07-19T12:18:12.979679Z 0 [ERROR] [MY-012803] [InnoDB] innodb_force_recovery is on. We do not allow database mod
ifications by the user. Shut down mysqld and edit my.cnf to set innodb_force_recovery=0
mysqld: Operation not allowed when innodb_force_recovery > 0.
2022-07-19T12:18:12.981819Z 0 [ERROR] [MY-010119] [Server] Aborting
2022-07-19T12:18:12.982056Z 0 [Note] [MY-011245] [Server] Plugin mysqlx reported: 'Scheduler "work" stopped.'
2022-07-19T12:18:12.982132Z 0 [Note] [MY-011245] [Server] Plugin mysqlx reported: 'Scheduler "network" stopped.'
2022-07-19T12:18:12.983302Z 0 [Note] [MY-012330] [InnoDB] FTS optimize thread exiting.
2022-07-19T12:18:12.983540Z 0 [Note] [MY-011825] [InnoDB] dict stats thread finish wait for cost module cache finish
2022-07-19T12:18:13.828904Z 0 [Note] [MY-011825] [InnoDB] NCDB: master server thread for Master[IP:localhost, ID:100
001, TOKEN:1606526704] exit.
2022-07-19T12:18:13.915054Z 0 [Note] [MY-011825] [InnoDB] NCDB: purge coordinator released for system shutting down
2022-07-19T12:18:13.916387Z 0 [Note] [MY-010120] [Server] Binlog end

Both ERROR report [ERROR] [MY-012803] [InnoDB] is caused by Gtid_table_persistor::write_row function under --innodb-force-recovery=1

mysqld will write the newest gtids in binlog into mysql.gtid_executed table when restart at gtid_state->save(&gtids_in_binlog_not_in_table)
but the inserting is not allowd when --innodb-force-recovery=1, stopped by 
row_insert_for_mysql_using_ins_graph at
 "srv_force_recovery &&
             !(srv_force_recovery < SRV_FORCE_NO_UNDO_LOG_SCAN &&
               dict_sys_t::is_dd_table_id(prebuilt->table->id)) "

How to repeat:
delay clone sys thread inserting by adding os_thread_sleep in  Clone_persist_gtid::flush_gtids

MTR:
diff --git a/mysql-test/r/srv_force_recovery.result b/mysql-test/r/srv_force_recovery.result
new file mode 100644
index 000000000..5935fe135
--- /dev/null
+++ b/mysql-test/r/srv_force_recovery.result
@@ -0,0 +1,11 @@
+use test;
+create table test(id int);
+insert into test values (1);
+insert into test values (1);
+# Kill and restart:--innodb-force-recovery=1
+select * from test.test;
+id
+1
+1
+# Kill and restart:--innodb-force-recovery=0
+drop table test.test;
diff --git a/mysql-test/t/srv_force_recovery-master.opt b/mysql-test/t/srv_force_recovery-master.opt
new file mode 100644
index 000000000..5d266e329
--- /dev/null
+++ b/mysql-test/t/srv_force_recovery-master.opt
@@ -0,0 +1,8 @@
+--tencent-root-cnf=$MYSQL_TEST_DIR/tmy.conf
+--admin-port=13008
+--admin-address=*
+--cdb_admin_port_allow_for_normal_user=off
+--log-bin=mysql-bin
+--binlog-format=row
+--gtid-mode=ON
+--enforce-gtid-consistency
diff --git a/mysql-test/t/srv_force_recovery.test b/mysql-test/t/srv_force_recovery.test
new file mode 100644
index 000000000..a7a2d9e4c
--- /dev/null
+++ b/mysql-test/t/srv_force_recovery.test
@@ -0,0 +1,22 @@
+#
+# test srv_force_recovery == 1 restart
+#
+
+--source include/have_debug.inc
+
+use test;
+create table test(id int);
+
+insert into test values (1);
+insert into test values (1);
+
+sleep 1;
+
+--let $restart_parameters=restart:--innodb-force-recovery=1
+--source include/kill_and_restart_mysqld.inc
+
+select * from test.test;
+
+--let $restart_parameters=restart:--innodb-force-recovery=0
+--source include/kill_and_restart_mysqld.inc
+drop table test.test;
\ No newline at end of file
diff --git a/storage/innobase/clone/clone0repl.cc b/storage/innobase/clone/clone0repl.cc
index d75b6e3f2..3b15ee62f 100644
--- a/storage/innobase/clone/clone0repl.cc
+++ b/storage/innobase/clone/clone0repl.cc
@@ -503,6 +503,7 @@ void Clone_persist_gtid::flush_gtids(THD *thd) {
     auto flush_list_number = switch_active_list();
     /* Exit trx mutex during write to table. */
     unlock();
+    os_thread_sleep(10000000); //delay insert
     err = write_to_table(flush_list_number, table_gtid_set, sid_map);
     m_flush_in_progress.store(false);
     /* Compress always after recovery, if GTIDs are added. */

Suggested fix:
skip gtids flush if innodb_force_recovery is not 0

Hi Mr. sun,

Thank you for your bug report.

However, your text is not entirely clear.

Please, elaborate much more on the need for skipping some functions and the need to introduce such a huge sleep in the clone source code.

Thank you in advance.

Hi Mr. sun,

We made lot's of analysis and concluded that fixing the bug would be feasible.

Verified as reported.