Bug #39152 Intermittent debug_sync time-outs in backup tests
Submitted: 1 Sep 2008 9:12 Modified: 21 Sep 2008 17:14
Reporter: Øystein Grøvlen
Status: Closed
Category: MySQL Server: Backup Severity: S3 (Non-critical)
Version: 6.0 OS: Any
Assigned to: Øystein Grøvlen CPU Architecture: Any

[1 Sep 2008 9:12] Øystein Grøvlen
Description:
backup_progress tests using debug_sync show intermittent timeouts in pushbuild.  Debug sync also regularly times out in other backup tests like backup_ddl_blocker and backup_commit_blocker (ref bug#38213).

If you look at the test output below, restore is reported to be running when it is supposed to be halted in the starting state.  It seems to me that restore manages to pass sync points without stopping.

--- /export/home/pushbuild/pb/bzr_mysql-6.0-backup-merge/14/mysql-6.0.7-alpha-pb14/mysql-test/r/backup_progress.result Mon Aug 11 22:31:14 2008
+++ /export/home/pushbuild/pb/bzr_mysql-6.0-backup-merge/14/mysql-6.0.7-alpha-pb14/mysql-test/r/backup_progress.reject Tue Aug 12 22:10:57 2008
@@ -128,6 +128,8 @@
 RESTORE FROM 'backup_progress_orig.bak';
 con1: Wait for the restore to be started.
 SET DEBUG_SYNC= 'now WAIT_FOR started';
+Warnings:
+Warning    1720    debug sync point wait timed out
 con1: Display progress
 select * from backup_progress.t1_res;
 id
@@ -135,13 +137,15 @@
 INSERT INTO backup_progress.t1_res (id) VALUES (@bup_id);
 SELECT backup_state FROM mysql.online_backup AS ob JOIN backup_progress.t1_res as t1 ON ob.backup_id = t1.id;
 backup_state
-starting
+running
 con1: Let restore step to running state.
 SET DEBUG_SYNC= 'now SIGNAL do_run WAIT_FOR running';
+Warnings:
+Warning    1720    debug sync point wait timed out
 con1: Display progress
 SELECT backup_state FROM mysql.online_backup AS ob JOIN backup_progress.t1_res as t1 ON ob.backup_id = t1.id;
 backup_state
-running
+complete
 con1: Let restore do its job and finish.
 SET DEBUG_SYNC= 'now SIGNAL finish';
 con2: Finish restore command

How to repeat:
Run backup_progress repeatedly until it fails.  With a loop in a shell script, it took 15 minutes (about 30 runs) before it failed the first time.  Next time, it took several hours before I got a time-out.
[5 Sep 2008 9:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53308

2693 Oystein Grovlen	2008-09-05
      Bug#39152 Intermittent debug_sync time-outs in backup tests
      Timeouts have occurred due to race conditions in debug_sync.
      Make sure all accesses to debug_sync_global.ds_signal are protected
      by a mutex.
[5 Sep 2008 9:12] Øystein Grøvlen
When repeatedly running backup_progress, I got a timeout failure within some hours.

With the proposed fix, I have been able to run for days without getting a timeout.
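
To illustrate the idea behind the fix (every access to the shared signal goes through one mutex), here is a minimal C++ sketch. It is not the actual debug_sync code: SignalSlot, its member functions and demo_handshake are hypothetical names made up for this example, and the standard library primitives stand in for the server's own mutex and condition variable. The point it tries to show is that when the signal is written, read and cleared only under the mutex, a waiter cannot miss a signal that arrives between its check and its wait.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for a global signal slot such as
// debug_sync_global.ds_signal; not the actual server code.
class SignalSlot {
public:
  // Store the signal and wake all waiters. The write happens under the
  // mutex, so a waiter cannot miss it between its check and its wait.
  void signal(const std::string &name) {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      signal_ = name;
    }
    cond_.notify_all();
  }

  // Block until the slot holds `name` or the timeout expires.
  // Returns true if the signal was seen, false on timeout.
  bool wait_for(const std::string &name, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    // The predicate is evaluated with the mutex held.
    return cond_.wait_for(lock, timeout, [&] { return signal_ == name; });
  }

  // Clear the slot, also under the mutex.
  void reset() {
    std::lock_guard<std::mutex> guard(mutex_);
    signal_.clear();
  }

private:
  std::mutex mutex_;
  std::condition_variable cond_;
  std::string signal_;
};

// Demo: one thread emits "started" while the caller waits for it,
// loosely mirroring SET DEBUG_SYNC= 'now WAIT_FOR started' in the test.
bool demo_handshake() {
  SignalSlot slot;
  std::thread signaller([&slot] { slot.signal("started"); });
  bool seen = slot.wait_for("started", std::chrono::seconds(5));
  signaller.join();
  return seen;
}
```

In this sketch, demo_handshake() should return true whether the signaller runs before or after the waiter takes the mutex, since the predicate is checked under the lock before sleeping. An unprotected read or reset of the slot is the sort of access that could reintroduce a lost signal and hence a timeout.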
[5 Sep 2008 15:14] Rafal Somla
Good to push.
[8 Sep 2008 9:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53491

2694 Oystein Grovlen	2008-09-08
      Bug#39152 Intermittent debug_sync time-outs in backup tests
      Timeouts have occurred due to race conditions in debug_sync.
      Make sure all accesses to debug_sync_global.ds_signal are protected
      by a mutex.
[8 Sep 2008 9:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53492

2694 Oystein Grovlen	2008-09-08
      Bug#39152 Intermittent debug_sync time-outs in backup tests
      Timeouts have occurred due to race conditions in debug_sync.
      Make sure all accesses to debug_sync_global.ds_signal are protected
      by a mutex.
[8 Sep 2008 9:30] Øystein Grøvlen
Patch has been committed to mysql-6.0-backup tree.
[20 Sep 2008 12:08] Øystein Grøvlen
Pushed into main tree for 6.0.8.
[21 Sep 2008 17:14] Paul DuBois
Changes related to test cases. No changelog entry needed.