Bug #39749 main.backup_timeout fails sporadically on OS X
Submitted: 30 Sep 2008 6:37
Modified: 2 Feb 2009 17:23
Reporter: Alexander Nozdrin
Status: Closed
Category: Tests: Server
Severity: S3 (Non-critical)
Version: 6.0-TRUNK
OS: MacOS (powermacg5)
Assigned to: Ingo Strüwing
CPU Architecture: Any
Tags: pushbuild, sporadic, test failure, timeout

[30 Sep 2008 6:37] Alexander Nozdrin
Description:
main.backup_timeout fails sporadically on OS X:
mysqltest: At line 141: query 'reap' failed: 1726: The backup wait timeout has expired for query 'CREATE TABLE bup_ddl_blocker.t3 (col_a CHAR(40)) ENGINE=MEMORY'.

The result from queries just before the failure was:
< snip >
SET DEBUG_SYNC= 'now WAIT_FOR bup_blocked';
Set ddl timeout to 0 seconds
SET backup_wait_timeout = 0;
SHOW VARIABLES LIKE 'backup_wait%';
Variable_name	Value
backup_wait_timeout	0
con2: Try a ddl operation and it should expire
CREATE TABLE bup_ddl_blocker.t3 (col_a CHAR(40)) ENGINE=MEMORY;
ERROR HY000: The backup wait timeout has expired for query 'CREATE TABLE bup_ddl_blocker.t3 (col_a CHAR(40)) ENGINE=MEMORY'.
SET backup_wait_timeout = 100;
SHOW VARIABLES LIKE 'backup_wait%';
Variable_name	Value
backup_wait_timeout	0
con3: Try a ddl operation and it should not expire
CREATE TABLE bup_ddl_blocker.t3 (col_a CHAR(40)) ENGINE=MEMORY;
release the lock.
con5: Resume all.
SET DEBUG_SYNC= 'now SIGNAL timeout_done';
backup_id
#

How to repeat:
XREF: http://tinyurl.com/4ppp9y

Report after a failure in 6.0-falcon tree on Mon Sep 29 06:57:33:
http://tinyurl.com/3wc28x
[4 Dec 2008 15:13] Ingo Strüwing
The system variable "backup_wait_timeout" is a mystery to me.

- Its documentation in the reference manual is plain wrong.
  It is not BACKUP or RESTORE which time out on DDLs, but
  it is DDLs, which time out when BACKUP or RESTORE is running.

  However, it's a legitimate question: Why not both ways?
  The reason for the variable, as described in Bug #33414
  (Backup: DDL hangs indefinitely if ongoing backup),
  is to give the DBA an error message instead of letting his
  DDL hang for hours. Why wouldn't that apply to BACKUP and
  RESTORE too?

- The manual and the initial description say that a value of
  zero means "no timeout". On May 1 it was suddenly redefined
  as "timeout immediately".

  I found some email thread with a suggestion like this, but
  no agreement about it. There are some mentions of discussions
  with Peter, but they are not referenced anywhere. Digging in
  the mail archives, I found it. Anyway, this is poorly
  documented.

  The test case has a comment: "test timeout for a session with a
  timeout, and a session with no timeout (backup_wait_timeout = 0)",
  which suggests that zero means "no timeout". This is pretty
  confusing.

- I am sceptical about the default value. Especially as it cannot
  be set globally. It means that every DDL times out after 50
  seconds when BACKUP or RESTORE is running. If a user wants to
  change it, he needs to set it for every connection separately.

- When constructing a THD, we use a default value of 50, while
  on SET backup_wait_timeout=DEFAULT, we use
  BACKUP_WAIT_TIMEOUT_DEFAULT.

- The variable type is LONG and the maximum value is defined
  as LONG_MAX/1000 == 2147483, which is correct only on systems
  with 32-bit long. The factor of 1000 is neither explained nor
  intuitive, and I don't understand it. Why not limit it to
  LONG_MAX, or even better, implicitly to ULONG_MAX?

- The use of ulong_value and ulonglong_value looks suspicious.
  Indeed we have Bug#40808 (The backup_wait_timeout variable is not
  working on powermac platform), which probably fails because of
  this inconsistency. And since backup_timeout.test fails on the
  same platform because backup_wait_timeout still has a value of
  zero after the assignment of 100, it probably has the same cause.

- I do not understand why a session-only value is copied into
  thd->sys_var_tmp (in sys_var_backup_wait_timeout::value_ptr()).
  It looks like wasted cycles. Not a real issue, but still odd.
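
For illustration, the arithmetic behind the LONG_MAX/1000 remark above can be checked with a small Python sketch. It only models the width of a C long; the helper names long_max and max_timeout are made up for this example and do not appear in the server source.

```python
def long_max(bits: int) -> int:
    """Largest value of a signed two's-complement integer with the given width."""
    return 2 ** (bits - 1) - 1


def max_timeout(bits: int) -> int:
    """LONG_MAX / 1000 with C-style integer division (hypothetical helper)."""
    return long_max(bits) // 1000


print(max_timeout(32))  # 2147483, the documented maximum
print(max_timeout(64))  # 9223372036854775, what a 64-bit long would give
```

So the documented maximum of 2147483 matches LONG_MAX/1000 only where long is 32 bits wide.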
[4 Dec 2008 19:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60649

2737 Ingo Struewing	2008-12-04
      Bug#39749 - main.backup_timeout fails sporadically on OS X
      Bug#40808 - The backup_wait_timeout variable is not working
                  on powermac platform
      
      backup_timeout.test failed on POWER processors. Due to word ordering
      within a long long variable, the code was not portable between
      different processor types.
      
      Fixed by reworking the implementation of sys_var_backup_wait_timeout.
      
      Re-enabled backup_timeout.test.
      
      Included are unrelated fixes to get rid of compiler warnings.
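
A minimal sketch of the kind of word-ordering problem the commit message describes, assuming a 64-bit unsigned long long and a 32-bit unsigned long. It is not the code from set_var.cc; the function name is hypothetical and Python's struct module stands in for the C memory layout.

```python
import struct


def read_ulong_from_ulonglong(value: int, byte_order: str) -> int:
    """Store value as a 64-bit unsigned integer, then read the first
    four bytes of that storage as a 32-bit unsigned integer.

    byte_order is "<" for little-endian or ">" for big-endian.
    """
    storage = struct.pack(byte_order + "Q", value)  # 8 bytes
    (word_at_low_address,) = struct.unpack(byte_order + "L", storage[:4])
    return word_at_low_address


print(read_ulong_from_ulonglong(100, "<"))  # little-endian: 100
print(read_ulong_from_ulonglong(100, ">"))  # big-endian: 0
```

On a big-endian layout the low address holds the high-order word, so the 32-bit read returns 0. That would be consistent with the test output, where SHOW VARIABLES reports 0 after SET backup_wait_timeout = 100.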
[5 Dec 2008 20:41] Ingo Strüwing
Sorry, the patch doesn't work on sparc. Working on a new one.
[5 Dec 2008 21:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60781

2737 Ingo Struewing	2008-12-05
      Bug#39749 - main.backup_timeout fails sporadically on OS X
      Bug#40808 - The backup_wait_timeout variable is not working
                on powermac platform
      
      backup_timeout.test failed on POWER processors. Due to word ordering
      within a long long variable, the code was not portable between
      different processor types.
      
      Fixed by reworking the implementation of sys_var_backup_wait_timeout.
      
      Re-enabled backup_timeout.test.
      
      Included are unrelated fixes to get rid of compiler warnings.
[8 Dec 2008 20:47] Chuck Bell
Patch approved pending fix of compiler warning on Windows.

set_var.cc(3073) : warning C4244: '=' : conversion from 'ulonglong' to 'ulong', possible loss of data
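
For context on the warning: with the Microsoft compiler, unsigned long is 32 bits wide, so assigning a 64-bit value to it silently keeps only the low 32 bits. The sketch below models that truncation and one possible alternative, clamping. The thread does not say how the warning was actually resolved, and both function names are hypothetical.

```python
ULONG_MAX_32 = 2**32 - 1  # maximum of a 32-bit unsigned long


def truncate_to_ulong32(value: int) -> int:
    """What an implicit 64-bit to 32-bit unsigned conversion does: keep the low 32 bits."""
    return value & ULONG_MAX_32


def clamp_to_ulong32(value: int) -> int:
    """One possible alternative: saturate at the 32-bit maximum instead of wrapping."""
    return min(value, ULONG_MAX_32)


too_big = 2**32 + 100
print(truncate_to_ulong32(too_big))  # 100
print(clamp_to_ulong32(too_big))     # 4294967295
```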
[15 Dec 2008 10:00] Jørgen Løland
Good to push
[16 Dec 2008 11:51] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/61753

2740 Ingo Struewing	2008-12-16
      Bug#39749 - main.backup_timeout fails sporadically on OS X
      Bug#40808 - The backup_wait_timeout variable is not working
              on powermac platform
      
      backup_timeout.test failed on POWER processors. Due to word ordering
      within a long long variable, the code was not portable between
      different processor types.
      
      Fixed by reworking the implementation of sys_var_backup_wait_timeout.
      
      Re-enabled backup_timeout.test.
      
      Included are unrelated fixes to get rid of compiler warnings.
[16 Dec 2008 11:56] Ingo Strüwing
Patch queued to 6.0-backup.
[2 Feb 2009 16:07] Bugs System
Pushed into 6.0.10-alpha (revid:sergefp@mysql.com-20090202090240-dlkxhmc1asrar5rl) (version source revid:sergefp@mysql.com-20090129100938-qvke7a9krg24l8pl) (merge vers: 6.0.10-alpha) (pib:6)
[2 Feb 2009 17:23] Paul DuBois
Noted in 6.0.10 changelog.

The implementation of the backup_wait_timeout system variable was 
machine dependent and did not work correctly on big-endian machines.

Also revised the documentation for backup_wait_timeout to say:
The number of seconds DDL statements wait for a BACKUP DATABASE or
RESTORE operation before aborting with an error. The default value is
50. A value of 0 means "immediate timeout."
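
The revised semantics can be illustrated with a small Python sketch. It assumes a DDL statement waits on some "backup finished" signal for at most backup_wait_timeout seconds. It is a model only, not the server's implementation, and the function name is hypothetical.

```python
import threading


def ddl_may_proceed(backup_finished: threading.Event,
                    backup_wait_timeout: float) -> bool:
    """Wait up to backup_wait_timeout seconds for the backup to finish.

    Returns True if the DDL may run, False if the wait timed out
    (the case the server reports as error 1726).
    """
    return backup_finished.wait(timeout=backup_wait_timeout)


backup_finished = threading.Event()

# Backup still running and timeout 0: the DDL gives up immediately.
print(ddl_may_proceed(backup_finished, 0))  # False

# Once the backup has finished, even a timeout of 0 lets the DDL through.
backup_finished.set()
print(ddl_may_proceed(backup_finished, 0))  # True
```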