Bug #57335 abnormal termination of the backup process could leave the global lock active
Submitted: 8 Oct 2010 6:35 Modified: 28 Jan 2011 18:42
Reporter: Victor Kirkebo Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Enterprise Backup Severity:S3 (Non-critical)
Version:3.5.2 OS:Any
Assigned to: Satya B CPU Architecture:Any

[8 Oct 2010 6:35] Victor Kirkebo
Description:
If you start a backup and the backup process terminates abnormally it might fail to release the global lock it acquired earlier when it issued "FLUSH TABLES WITH READ LOCK".

An example:
I started a backup on a running system and several hours later I came back and realized that the backup had failed. Specifically, mysqlbackup must have failed at the start of the backup of non-InnoDB tables since ibbackup had already produced the file ibbackup_suspended, the "FLUSH TABLES.." command had been issued but the backup image did not yet contain any non-InnoDB files.
From the processlist I could see that the global lock had not been released as it would have been had mysqlbackup terminated normally.

mysql> select * from backup_progress;
+-------------------+-------------+------------+---------------+---------------------+-------------------------------------------------------------------+
| backup_id         | tool_name   | error_code | error_message | current_time        | current_state                                                     |
+-------------------+-------------+------------+---------------+---------------------+-------------------------------------------------------------------+
| 12864972008011550 | mysqlbackup |          0 | NO_ERROR      | 2010-10-08 02:20:03 | Started mysqlbackup.                                              |
| 12864972008011550 | mysqlbackup |          0 | NO_ERROR      | 2010-10-08 02:20:03 | mysqlbackup starting ibbackup.                                    |
| 12864972008011550 | mysqlbackup |          0 | NO_ERROR      | 2010-10-08 02:20:03 | mysqlbackup pausing ibbackup.                                     |
| 12864972008011550 | mysqlbackup |          0 | NO_ERROR      | 2010-10-08 02:20:13 | mysqlbackup locking tables and copying .frm + other engines data. |
+-------------------+-------------+------------+---------------+---------------------+-------------------------------------------------------------------+

mysql> show processlist;
+------+-----------------+-----------------+------------------+-------------+-------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| Id   | User            | Host            | db               | Command     | Time  | State                                                                 | Info                                                                                                 |
+------+-----------------+-----------------+------------------+-------------+-------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
|    4 | root            | localhost:49235 | NULL             | Binlog Dump | 20765 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL                                                                                                 |
|    9 | root            | localhost:49242 | NULL             | Sleep       |     3 |                                                                       | NULL                                                                                                 |
|   12 | event_scheduler | localhost       | NULL             | Daemon      | 20027 | Waiting for release of readlock                                       | NULL                                                                                                 |
| 3180 | root            | localhost:60672 | mysql            | Sleep       | 20150 |                                                                       | NULL                                                                                                 |
| 3213 | root            | localhost:60705 | mysql            | Query       | 20149 | Waiting for global metadata lock                                      | FLUSH TABLES WITH READ LOCK                                                                          |
| 3272 | systuser        | localhost:60759 | celosia_features | Query       | 20137 | Waiting for release of readlock                                       | CREATE OR REPLACE VIEW v_celosia_ddl AS
SELECT t1.id as t1_id, t2.id as t2_id, t1.user_num as t1_use |
| 3283 | root            | localhost:60771 | systest1         | Query       | 20137 | Waiting for release of readlock                                       | insert into tb0_eng1 (f1,f2,f3,f4 ) values
(int_rand(@f1_nums), concat('I:', NAME_CONST('str_pr',_la |
| 3289 | systuser        | localhost:60777 | test             | Query       | 20137 | Waiting for release of readlock                                       | CREATE TEMPORARY TABLE betony_features.load_data
AS SELECT user_num, name FROM betony_features.t_pa |
| 3290 | root            | localhost:60778 | systest1         | Query       | 20137 | Waiting for release of readlock                                       | insert into tb5_eng1 (f1,f2,f3,f4 ) values
(int_rand(@f1_nums), concat('I:', NAME_CONST('str_pr',_la |
| 3292 | systuser        | localhost:60780 | systest2         | Query       | 20137 | Waiting for release of readlock                                       | insert into tbstr0_eng1 (f1) values (concat('insert before implicit commit by create event (', @tbst |
| 3293 | root            | localhost:60781 | systest1         | Query       | 20136 | Waiting for release of readlock                                       | delete from table_logs.tb1_logs where i1 <  NAME_CONST('del_row_id',109421)                          |
| 3295 | systuser        | localhost:60784 | systest1         | Query       | 20120 | Waiting for release of readlock                                       | CREATE TEMPORARY TABLE systest1.t1_tmp AS
SELECT ROUND(RAND() * @max_val + 0.5) AS i1,
@connection_i |
| 4032 | root            | localhost:57540 | mysql            | Query       |     0 | NULL                                                                  | show processlist                                                                                     |
+------+-----------------+-----------------+------------------+-------------+-------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
13 rows in set (0.00 sec)

How to repeat:
Either:
 Run a backup
 Try to kill the backup process while in progress
or
 Read the source code and verify that "UNLOCK TABLES" may fail to be executed if mysqlbackup terminates abnormally

Suggested fix:
Can we add some signal handling for abnormal program termination where "UNLOCK TABLES" is called if necessary?
[11 Oct 2010 10:49] Susanne Ebrecht
Is it reproducable?
[11 Oct 2010 11:17] Victor Kirkebo
I order to reproduce this:
1) I started a database and with a client session I locked one table.
2) Then I started a backup and with a client I checked the processlist until I saw that the command FLUSH TABLES WITH READ LOCK appeared in the state "Waiting for global metadata lock" (5.5) or "Waiting for global read lock" (5.1). The backup will wait in this state because I locked one table in 1).
3) I killed the backup processes (mysqlbackup and ibbackup).
4) I checked the processlist and saw that FLUSH TABLES WITH READ LOCK was still waiting to get the global read lock even if the backup process had been killed.
5) I also saw other clients waiting now waiting in the state "Waiting for release of readlock" because of "FLUSH TABLES WITH READ LOCK".

In order to fix the problem I simply had to unlock the table that I locked in 1) above. However, I think it would be much better if the whole "FLUSH TABLES WITH READ LOCK" went away by itself when the backup process was killed but maybe this is not possible to get done from a client such as mysqlbackup.
[12 Oct 2010 6:46] Satya B
when Innobackup perl is killed, which in turn kills the mysql client. mysql client has signal handler which sends KILL QUERY before exiting.

With Innobackup C port (mysqlbackup), we use mysql c api not the mysql client binary. So the handling of signals should be done manually.
[12 Oct 2010 12:13] Satya B
this is what mysql client does:

mysql> FLUSH TABLES WITH READ LOCK;
^CCtrl-C -- sending "KILL QUERY 10" to server ...
Ctrl-C -- query aborted.
ERROR 1317 (70100): Query execution was interrupted
mysql>
[15 Oct 2010 11:26] Satya B
revno: 247
revision-id: satya.bn@sun.com-20101015112113-zw6up1nnhaq8hf46
[15 Oct 2010 11:29] Satya B
Notes to Doc: Please Document this.

If kill -9 used to kill the mysqlbackup process, the user has manually check for the 'FLUSH TABLES WITH READ LOCK' query and should do 'UNLOCK TABLES' using the mysql client.
[15 Oct 2010 11:35] Satya B
Sorry, UNLOCK TABLES will not have any effect, since it is not from the same connection which has issued 'FTWRL'. 

KILL QUERY should be used to kill the 'FTWRL' query.

Please rephrase the above statements a bit :)
[28 Jan 2011 18:42] John Russell
Added to 3.5.2 manual:

If the mysqlbackup process is interrupted, such as by a Unix kill -9
command, a FLUSH TABLES WITH READ LOCK operation might remain
running. In this case, use the KILL QUERY command from the mysql
command line to kill the FLUSH TABLES WITH READ LOCK statement. This
issue is more likely to occur if the FLUSH TABLES operation is
stalled by a long-running query or transaction. Refer to <mysqlbackup introduction section> for guidelines about backup timing and performance.