Bug #30379 TC timeout check isn't very random
Submitted: 13 Aug 2007 1:55
Modified: 6 Nov 2007 8:45
Reporter: Stewart Smith
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version:
OS: Any
Assigned to: Stewart Smith
CPU Architecture: Any

[13 Aug 2007 1:55] Stewart Smith
Description:
Better randomise time before retry in timeout check (DBTC)

timeOutLoopStartLab() checks whether any transactions have been delayed
for so long that we are forced to perform some action (e.g. abort,
resend, etc.).
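To make the role of the extra time-out concrete, here is a simplified, hypothetical model of such a scan in C++. The `TxRecord` layout, the tick units and the function name are assumptions for illustration, not DBTC's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a TC time-out scan. The record layout,
// tick units and names are assumptions, not DBTC's real data structures.
struct TxRecord {
  std::uint32_t start_tick;   // tick at which the transaction began waiting
  bool active;                // only active transactions are checked
  std::uint32_t extra_ticks;  // per-transaction "random" extra time-out
};

// Returns the indices of active transactions that, at tick `now`, have waited
// longer than the base time-out plus their own extra time-out.
std::vector<std::size_t> timed_out(const std::vector<TxRecord>& txs,
                                   std::uint32_t now,
                                   std::uint32_t base_ticks) {
  std::vector<std::size_t> expired;
  for (std::size_t i = 0; i < txs.size(); ++i) {
    const TxRecord& tx = txs[i];
    // Unsigned subtraction keeps working if the tick counter wraps around.
    if (tx.active && now - tx.start_tick > base_ticks + tx.extra_ticks) {
      expired.push_back(i);  // the caller would then abort, resend, etc.
    }
  }
  return expired;
}
```

If two deadlocked transactions started waiting on the same tick and got the same `extra_ticks`, they expire in the same scan; only a difference in the extra time-out lets one be aborted before the other.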

It is *MEANT* to (according to the comment):
> To avoid aborting both transactions in a deadlock detected by time-out
> we insert a random extra time-out of upto 630 ms by using the lowest
> six bits of the api connect reference.
> We spread it out from 0 to 630 ms if base time-out is larger than 3 sec,
> we spread it out from 0 to 70 ms if base time-out is smaller than 300 msec,
> and otherwise we spread it out 310 ms.
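A minimal C++ sketch of the scheme the comment describes follows. It assumes time-outs are counted in 10 ms ticks and infers the mask widths (3, 5 and 6 bits) from the quoted 70, 310 and 630 ms ranges; the function name and parameters are hypothetical, not copied from DbtcMain.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the scheme described in the DBTC comment: derive a "random"
// extra time-out from the low bits of the API connect reference.
// ASSUMPTIONS: results are in 10 ms ticks, and the mask widths are inferred
// from the 70/310/630 ms ranges in the comment. Names are hypothetical.
std::uint32_t extra_timeout_ticks(std::uint32_t base_timeout_ms,
                                  std::uint32_t api_connect_ref) {
  std::uint32_t mask;
  if (base_timeout_ms > 3000) {
    mask = 0x3F;  // six bits: 0..63 ticks, i.e. 0..630 ms
  } else if (base_timeout_ms < 300) {
    mask = 0x07;  // three bits: 0..7 ticks, i.e. 0..70 ms
  } else {
    mask = 0x1F;  // five bits: 0..31 ticks, i.e. 0..310 ms
  }
  return api_connect_ref & mask;
}
```

For example, with a 1.2 s base time-out a reference ending in 0x43 gives `0x43 & 0x1F == 3`, i.e. 30 ms of extra time-out. The "randomness" is only as good as the variation in those low bits.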

The comment (as all do) lies.

The API connect reference is not very random, so its low bits yield
highly predictable "random" numbers. This can lead to both transactions
in a deadlock being aborted instead of just one.
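The report does not describe how the API connect reference is laid out. Assuming, purely for illustration, that it packs a block number above a node id (the hypothetical `make_ref` below), masking its low bits returns the same value for every transaction coming from one node:

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// ASSUMPTION for illustration only: a reference packs a block number in the
// high 16 bits and a node id in the low 16 bits. Under that layout the low
// six bits are just the low bits of the node id.
std::uint32_t make_ref(std::uint32_t block, std::uint32_t node) {
  return (block << 16) | node;
}

// Collects the distinct "random" extras produced by masking each reference.
std::set<std::uint32_t> distinct_extras(const std::vector<std::uint32_t>& refs,
                                        std::uint32_t mask) {
  std::set<std::uint32_t> extras;
  for (std::uint32_t ref : refs) {
    extras.insert(ref & mask);
  }
  return extras;
}
```

Three references that differ only in their (hypothetical) block number all collapse to a single extra value, so transactions from the same node would get identical "random" delays.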

Before: 
timeout value: 123 3
timeout value: 122 2
timeout value: 122 2
timeout value: 122 2
timeout value: 123 3

After:
timeout value: 127 7
timeout value: 126 6
timeout value: 129 9
timeout value: 139 19
timeout value: 137 17
timeout value: 151 31
timeout value: 130 10
timeout value: 132 12
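A quick check of the numbers above: in both runs the first value on each line is always the second value plus 120, and the second value ranges over 2..3 before the fix but 6..31 after it. Reading 120 as the base time-out and the second value as the random extra is an interpretation; the report does not label the columns. The code below only verifies the arithmetic on the printed samples.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// (first value, second value) of one "timeout value:" line.
using Sample = std::pair<int, int>;

// True if first - second is the same for every sample; the common
// difference is written to `offset`.
bool constant_offset(const std::vector<Sample>& samples, int& offset) {
  if (samples.empty()) return false;
  offset = samples.front().first - samples.front().second;
  for (const Sample& s : samples) {
    if (s.first - s.second != offset) return false;
  }
  return true;
}

// Largest minus smallest second value; `samples` must be non-empty.
int spread(const std::vector<Sample>& samples) {
  auto by_second = [](const Sample& a, const Sample& b) {
    return a.second < b.second;
  };
  auto mm = std::minmax_element(samples.begin(), samples.end(), by_second);
  return mm.second->second - mm.first->second;
}
```

Applied to the two runs, the offset is 120 in both, while the spread of the second value grows from 1 before the fix to 25 after it.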

How to repeat:
do foo

Suggested fix:
see patch
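The contents of tc_timeout.patch are not reproduced in this report. As one possible approach, and explicitly not the committed change, the extra delay can be drawn from a seeded pseudo-random generator so that it no longer depends on the reference bits. This sketch uses the standard `<random>` header; the class name is hypothetical, and whether DBTC could use `<random>` is not addressed here.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical alternative, NOT the code in tc_timeout.patch: draw the extra
// time-out from a seeded pseudo-random generator instead of masking bits of
// the API connect reference.
class ExtraTimeoutPicker {
 public:
  explicit ExtraTimeoutPicker(std::uint32_t seed) : rng_(seed) {}

  // Returns a uniformly distributed extra time-out in [0, max_ticks].
  std::uint32_t pick(std::uint32_t max_ticks) {
    std::uniform_int_distribution<std::uint32_t> dist(0, max_ticks);
    return dist(rng_);
  }

 private:
  std::mt19937 rng_;
};
```

Seeding each TC instance differently (how to choose the seed is left open here) keeps two nodes from drawing identical sequences of extra delays.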
[13 Aug 2007 1:56] Stewart Smith
make TC timeout randomisation more random

Attachment: tc_timeout.patch (text/x-patch), 5.19 KiB.

[25 Sep 2007 10:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/34543

ChangeSet@1.2482, 2007-09-25 12:01:23+02:00, stewart@flamingspork.com +4 -0
  [PATCH] BUG#30379 Better randomise time before retry in timeout check (DBTC)
  
  Index: ndb-work/ndb/src/kernel/blocks/dbtc/DbtcMain.cpp
  ===================================================================
[25 Sep 2007 10:03] Stewart Smith
Okayed by Jonas, pushed to 5.0-ndb
[10 Oct 2007 9:24] Jon Stephens
Documented bugfix in mysql-5.1.22-ndb-6.2.7 changelog as follows:

            Transaction timeouts were not handled well in some
            circumstances, leading to an excessive number of transactions
            being aborted unnecessarily.

Left in Patch Queued status pending further merges.
[15 Oct 2007 17:54] Jon Stephens
Also documented in mysql-5.1.22-ndb-6.3.4 changelog; left Patch Queued status.
[5 Nov 2007 13:56] Bugs System
Pushed into 5.1.23-rc
[5 Nov 2007 13:58] Bugs System
Pushed into 5.0.52
[6 Nov 2007 8:45] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Now documented in 5.0.52 and 5.1.23 changelogs. Closed.