Bug #53297 innobackup timeout while waiting to get readlock
Submitted: 29 Apr 2010 20:57 Modified: 15 Feb 2011 14:55
Reporter: Victor Kirkebo
Status: Unsupported
Category: MySQL Enterprise Backup Severity: S3 (Non-critical)
Version: 3.1 OS: Any
Assigned to: CPU Architecture: Any

[29 Apr 2010 20:57] Victor Kirkebo
innobackup issues "FLUSH TABLES WITH READ LOCK" to lock MyISAM data before copying it. If the database is under load, innobackup might time out while waiting to get the global read lock.
This behavior is not new but it might affect the ability to perform a hot backup on a running system.

How to repeat:
1) Download the attached 'bug_test.tar.gz' file and untar it in the mysql-test 
2) If running in a source tree, edit the 'run_wait' script and change the value
3) Download ibbackup<License id> and innobackup<License id> from http://www.innodb.com/products/hot-backup/order/order/ and http://www.innodb.com/download/innobackup
   Make sure that these apps are renamed to ibbackup and innobackup.
   Copy the apps to CLIENT_PATH (see above)
4) Run the ./run_wait script (it will take about 20 minutes to complete)
5) You will observe output similar to the following:
   Thu Apr 29 21:53:40 CEST 2010 - Creating the Flight_Stat_test database and schema
   Thu Apr 29 21:53:41 CEST 2010 - Altering the tables
   Thu Apr 29 21:53:42 CEST 2010 - Loading the data
   Thu Apr 29 21:53:45 CEST 2010 - Adding procedures triggers  and IUDS tables
   Thu Apr 29 21:53:56 CEST 2010 - Started IUDS clients (in the background)
   Thu Apr 29 21:54:26 CEST 2010 - Running a backup

6) Shortly after the 'Running a backup' message appears, connect with a mysql client,
   run 'show processlist', and observe that the backup keeps waiting for as long as
   the other clients are running.
   Alternatively run the following from the mysql client:
   | ID   | USER | HOST            | DB    | COMMAND | TIME | STATE                   | INFO                        |
   | 1019 | root | localhost:55282 | mysql | Query   |  903 | Waiting to get readlock | FLUSH TABLES WITH READ LOCK |

7) Observe that the backup under 4) fails with an error message similar to this:
   innobackup: Error: Connection to mysql child process (pid=2484) timedout. (Time limit of 900 seconds exceeded. You may adjust time limit by editing the value of parameter "$mysql_response_timeout" in this script.) while waiting for reply to MySQL request: 'FLUSH TABLES WITH READ LOCK;' at /export/home/mysql-advanced-5.1.46-solaris10-x86_64/bin/innobackup line 465.
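
The check in step 6 can also be scripted. The following is a minimal sketch, not part of the attached test scripts: the function name find_readlock_waiters is hypothetical, and it assumes the pipe-delimited column layout shown in step 6 (ID, USER, HOST, DB, COMMAND, TIME, STATE, INFO). It prints the id and wait time in seconds of every thread in the "Waiting to get readlock" state.

```shell
# Hypothetical helper: print "<id> <seconds>" for each thread that is stuck
# in the "Waiting to get readlock" state.
# Input (stdin): pipe-delimited process list rows laid out as in step 6.
find_readlock_waiters() {
    awk -F'|' '
        {
            # Field 1 is whatever precedes the first "|", so the columns are
            # $2=ID, $7=TIME, $8=STATE.
            state = $8
            gsub(/^ +| +$/, "", state)      # trim surrounding spaces
            if (state == "Waiting to get readlock") {
                id = $2
                secs = $7
                gsub(/ /, "", id)
                gsub(/ /, "", secs)
                print id, secs
            }
        }'
}
```

Assuming the mysql client's --table option and the connection settings of the test setup, it could be fed with something like: $CLIENT_PATH/mysql --user=root --table -e "show processlist" | find_readlock_waiters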

Suggested fix:
Since this is expected to be an 'online' backup, and systems are expected to be under load for extended periods, it should be possible to perform a backup while other activity is running.
[29 Apr 2010 20:58] Victor Kirkebo
scripts and data for reproducing the bug

Attachment: bug_test.tar.gz (application/x-gzip, text), 444.91 KiB.

[30 Apr 2010 6:59] Sveta Smirnova
Seems to be a Solaris issue, as it is not repeatable on Linux.
[30 Apr 2010 10:35] Sveta Smirnova
Thank you for the report.

I cannot repeat the described behavior with mysql-advanced-5.1.46-solaris10-x86_64 and InnoDB Hot Backup from the 3.0 branch of the SVN repository. Could you please check whether the version from SVN works in your environment and, if not, provide the additional details needed to repeat the bug?
[5 May 2010 8:06] Victor Kirkebo
The issue is that innobackup tries to take the server offline with FLUSH TABLES WITH READ LOCK, and this might fail if you have a heavy transaction load or some very long transactions. This is the case for any innobackup version and any MySQL server. I don't know how hard it is to make innobackup work as an online backup tool, but it should at least be online if you are only interested in backing up InnoDB tables, since this can already be achieved with the ibbackup application.

"How to repeat" with a long running dummy transaction:
In the run_wait script used in "How to repeat" add these lines of code where the stress load is being run:

echo "$(date) - Create a long transaction (in the background)"
$CLIENT_PATH/mysql --user=root --port=10740 --protocol=tcp -e"create database tst; use tst; create table longtx(i int); select count(*)+sleep(1200) from longtx;" &
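
A statement like the one above shows up in the process list as a long-running query, so one could look for such threads before starting a backup. The sketch below is only an illustration under stated assumptions: the function name long_running_queries and the threshold argument are hypothetical, and the input is assumed to be "show processlist" output in the mysql client's table layout (Id, User, Host, db, Command, Time, State, Info).

```shell
# Hypothetical pre-check: print the ids of threads whose Command is "Query"
# and whose Time exceeds the given number of seconds.
# Usage: <process list rows on stdin> | long_running_queries SECONDS
long_running_queries() {
    awk -F'|' -v limit="$1" '
        {
            id = $2
            cmd = $6
            secs = $7
            gsub(/ /, "", id)
            gsub(/ /, "", cmd)
            gsub(/ /, "", secs)
            # Header and border rows are skipped because their Time field
            # is not purely numeric.
            if (cmd == "Query" && secs ~ /^[0-9]+$/ && secs + 0 > limit + 0) {
                print id
            }
        }'
}
```

For example, piping the process list through "long_running_queries 600" would list the threads that have been running a statement for more than ten minutes. Whether such a thread would actually block FLUSH TABLES WITH READ LOCK depends on what it is doing, so this is a hint rather than a guarantee.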
[5 May 2010 9:07] Sveta Smirnova
Thank you for the feedback.

Verified as described.
[15 Feb 2011 14:55] Sanjay Manwani
Thank you for taking the time to report a problem. Unfortunately, you are not using a current version of the product you reported a problem with, so the problem might already be fixed. Please download a new version from http://www.mysql.com/downloads/

If you are able to reproduce the bug with one of the latest versions, please change the version on this bug report to the version you tested and change the status back to "Open". Again, thank you for your continued support of MySQL.