Bug #108522 A failed clone cleanup is incomplete
Submitted: 16 Sep 2022 11:36
Modified: 16 Sep 2022 12:00
Reporter: Laurynas Biveinis (OCA)
Status: Verified
Category: MySQL Server: Clone Plugin
Severity: S3 (Non-critical)
Version: 8.0.30
OS: Any
Assigned to:
CPU Architecture: Any
Tags: Clone

[16 Sep 2022 11:36] Laurynas Biveinis
Description:
The clone docs [1] say:

"If a failure occurs while cloning data, the cloning operation is rolled back and all cloned data removed."

However, if the remote donor crashes in the middle of the data copy, the client stops the operation without removing the clone data directory.

[1]: https://dev.mysql.com/doc/refman/8.0/en/clone-plugin-failure-handling.html
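
To make the expected behaviour concrete, here is a minimal sketch of the contract the documentation describes: a copy into a fresh directory that removes that directory when the copy fails. This is an illustration only and not the clone plugin's implementation. All names in it (clone_into, crashing_copy, complete_copy, DonorCrashed, demo) are hypothetical, and the donor crash is simulated by raising an exception after a partial t1.ibd has been written.

```python
import shutil
import tempfile
from pathlib import Path


class DonorCrashed(Exception):
    """Hypothetical stand-in for the ER_CLONE_DONOR failure."""


def clone_into(target: Path, copy_data) -> None:
    """Copy data into a fresh directory; remove the directory if the copy fails."""
    # Like the clone in this report, refuse to reuse an existing directory.
    target.mkdir(parents=True, exist_ok=False)
    try:
        copy_data(target)
    except BaseException:
        # Roll back: remove everything copied so far, then re-raise.
        shutil.rmtree(target, ignore_errors=True)
        raise


def crashing_copy(target: Path) -> None:
    # Simulate the donor dying after part of the data has arrived.
    (target / "t1.ibd").write_bytes(b"partial page data")
    raise DonorCrashed("donor went away during copy")


def complete_copy(target: Path) -> None:
    # Simulate a copy that runs to completion.
    (target / "t1.ibd").write_bytes(b"full page data")


def demo() -> bool:
    """Return True if the failed attempt left the target directory behind."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "data_new"
        try:
            clone_into(target, crashing_copy)
        except DonorCrashed:
            pass
        leftover = target.exists()
        # The retry can only create the directory if the first attempt cleaned up.
        clone_into(target, complete_copy)
        return leftover
```

With the cleanup in place, demo() reports no leftover directory and the retry succeeds. Without the rmtree call, the retry would fail at mkdir because the directory already exists, which is analogous to the "database exists" error reported here.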

How to repeat:
Use MTR with a debug build, which is needed for crash injection. Add a crash point on the donor:

diff --git a/storage/innobase/clone/clone0clone.cc b/storage/innobase/clone/clone0clone.cc
index 97f1b2d33b8..1a34b317466 100644
--- a/storage/innobase/clone/clone0clone.cc
+++ b/storage/innobase/clone/clone0clone.cc
@@ -1821,6 +1821,9 @@ int Clone_Task_Manager::change_state(Clone_Task *task,
         << "Clone Apply State Change : Number of tasks = " << m_num_tasks;
   }
 
+  if (new_state == CLONE_SNAPSHOT_REDO_COPY) {
+    DBUG_EXECUTE_IF("clone_donor_page_copy_end_crash", DBUG_SUICIDE(););
+  }
   err = m_clone_snapshot->change_state(state_desc, m_next_state,
                                        task->m_current_buffer,
                                        task->m_buffer_alloc_len, cbk);

MTR test .cnf file:

[mysqld.1]
server_id=1

[mysqld.2]
server_id=2

[ENV]
SERVER_PORT_1 = @mysqld.1.port
SERVER_SOCK_1 = @mysqld.1.socket

SERVER_PORT_2 = @mysqld.2.port
SERVER_SOCK_2 = @mysqld.2.socket

MTR test .test file:

--source include/have_debug.inc

--let $CLONE_DATADIR = $MYSQL_TMP_DIR/data_new

--echo Donor:

CREATE TABLE t1(col1 INT PRIMARY KEY, col2 CHAR(64));

INSERT INTO t1 VALUES (10, 'clone row 1');

--replace_result $CLONE_PLUGIN CLONE_PLUGIN
--eval INSTALL PLUGIN clone SONAME '$CLONE_PLUGIN'

--echo Client:
--connect(clone_conn_1,127.0.0.1,root,,test,$SERVER_PORT_2)
--replace_result $CLONE_PLUGIN CLONE_PLUGIN
--eval INSTALL PLUGIN clone SONAME '$CLONE_PLUGIN'

--echo Donor:
--connection default

SET GLOBAL clone_donor_timeout_after_network_failure=0;

--eval SET GLOBAL DEBUG="+d,clone_donor_page_copy_end_crash"

--source include/expect_crash.inc

--connect(donor_conn_2,127.0.0.1,root,,test,$SERVER_PORT_1)

--connection clone_conn_1

--echo Client:

--let $HOST = 127.0.0.1
--let $PORT = $SERVER_PORT_1
--let $USER = root
--let remote_clone = 1
--let clone_remote_err=ER_CLONE_DONOR
--source ../mysql-test/suite/clone/include/clone_command.inc

--connection donor_conn_2
--disable_reconnect
--source include/wait_until_disconnected.inc
--source include/start_mysqld.inc

--connection default
--enable_reconnect
--source include/wait_until_connected_again.inc

--disconnect donor_conn_2
--connection clone_conn_1

# THIS SHOULD NOT BE REQUIRED:
# --force-rmdir $CLONE_DATADIR

--echo Donor:
--connection default

SET GLOBAL clone_donor_timeout_after_network_failure=0;

--connection clone_conn_1

--echo Client:
--let clone_remote_err=
--source ../mysql-test/suite/clone/include/clone_command.inc

--connection default

DROP TABLE t1;
UNINSTALL PLUGIN clone;

--force-rmdir $CLONE_DATADIR

The test fails on the second clone attempt, which should succeed, with:
mysqltest: At line 138: Query 'CLONE INSTANCE FROM $USER@$HOST:$PORT IDENTIFIED BY '' $remote_dir_clause' failed.
ERROR 1007 (HY000): Can't create database '/path/to/var/tmp/data_new'; database exists

The data_new directory still exists, and it contains t1.ibd.

If the commented-out --force-rmdir line is uncommented, then the test passes.
[16 Sep 2022 12:00] MySQL Verification Team
Hello Laurynas,

Thank you for the report and feedback!

regards,
Umesh