Bug #28804 Unknown error returned on a query using joins.
Submitted: 31 May 2007 10:20 Modified: 15 Sep 2007 11:05
Reporter: Geert Vanderkelen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:5.0.40 OS:Any
Assigned to: Stewart Smith CPU Architecture:Any
Tags: bfsm_2007_06_21, join, ndb, uknown

[31 May 2007 10:20] Geert Vanderkelen
Description:
A query using joins between tables having lots of data is returning Uknown Error after a while. In my case it takes about 350 seconds, but on another data set it is reported to take more than 450 seconds. So, apparently not being a timeout.

Tables are using BLOB fields (TEXT) but converting them to VARCHAR doesn't change anything.

Cluster itself is doing fine, no errors. Reducing the queries so the result set is smaller make the query work 'OK'.

How to repeat:
Repeatable, but provided privately.

Suggested fix:
At least say what's wrong, if at all possible.
[31 May 2007 10:23] Geert Vanderkelen
Verified using 5.0.34 (on which it was originally reported) and 5.0.40.
[31 May 2007 10:25] Geert Vanderkelen
Configuration used with 2 data nodes:

[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=4G
IndexMemory=300M

MaxNoOfOrderedIndexes = 256
MaxNoOfUniqueHashIndexes = 256

MaxNoOfConcurrentOperations = 300000
MaxNoOfConcurrentIndexOperations = 32000
NoOfFragmentLogFiles = 32
TimeBetweenLocalCheckpoints = 18
[20 Jun 2007 3:11] Stewart Smith
have reproduced.

spends maybe 5 minutes doing something, then does nothing for another 5 (or so), then unknown error.

now working on finding what "something" is and why we stop doing "something"
[20 Jun 2007 7:00] Stewart Smith
timeout during ha_ndbcluster::unique_index_read error being lost (not returned upwards), and next time there is an error detected, we translate 0 (no error) to 1 unknown error.

So really two bugs:

1) loss of correct error
2) why we're timing out.

The missing error is 4012,
$ perror --ndb 4012
NDB error code 4012: Request ndbd time-out, maybe due to high load or communication problems: Unknown result: Unknown result error

Which is certainly interesting... as I'm running on a single machine (so no communications problems) and not too high load...

The timeout that is being hit is set by:

  ndb_mgm_configuration_iterator iter(* props, CFG_SECTION_NODE);
...
  Uint32 timeout = 120000;
  iter.first();
  for (iter.first(); iter.valid(); iter.next())
  {
    Uint32 tmp1 = 0, tmp2 = 0;
    iter.get(CFG_DB_TRANSACTION_CHECK_INTERVAL, &tmp1);
    iter.get(CFG_DB_TRANSACTION_DEADLOCK_TIMEOUT, &tmp2);
    tmp1 += tmp2;
    if (tmp1 > timeout)
      timeout = tmp1;
  }

(where timeout is milliseconds).

So it's either 120,000 (2 minutes) OR the maximum TimeBetweenInactiveTransactionAbortCheck+TransactionDeadlockDetectionTimeout across all nodes.

The default TimeBetweenInactiveTransactionAbortCheck is 1000
The default TransactionDeadlockDetectionTimeout is 1200

So the default here would be 2200... which is LESS than 120,000, so 2minutes is selected.

But where this timout is called, we're using 3 * timeout... so really 6 minutes.

Which is consistent with what I've been seeing.
[21 Jun 2007 16:52] Stewart Smith
So further investigation into this has lead to some interesting points... but it ended up having to solve 2) first to really get a grip on 1). But the good news here is, I've been making good progress.

There is this resource in NDB known as "TransactionBufferMemory" and, by default, this is set to 1MB in the config file. This memory is used for a tiny amount of time during unique index lookups for storing some key information in. Only under intense parallel query load should this ever need increasing.

However.. my current theory is that we have a bug where in the case of a unique index lookup does not find a match, we keep this memory until the end of the transaction, and not just until the end of that operation.

Increasing TransactionBufferMemory does help alleviate the problem, but (in my test) even raising it to 10 or 100MB didn't allow the query to finish (although it ran for a while longer). So using this as a work around is not practical.

I currently have a work-in-progress patch here that does seem to free the memory at the right place and the query has been running for at least 18+ times longer than it has before for me (it's still running).

So I'm:
a) rather hopeful that this is the correct fix
b) 4 tracks into The Beatles white and it's still running (where previously it would have stopped well within the first track).
c) tomorrow, see the result of the query and start shaping the patch into something sensible and discuss further with other members of the development team.
[29 Jun 2007 17:17] Stewart Smith
patch 01 in series

Attachment: improve_timeout_waiting_for_reply_handling.patch (text/x-patch), 2.35 KiB.

[29 Jun 2007 17:17] Stewart Smith
patch 02 in series

Attachment: fix_ha_ndbcluster_error_on_index_read.patch (text/x-patch), 1005 bytes.

[29 Jun 2007 17:17] Stewart Smith
patch 03 in series

Attachment: release_indexop_on_op_complete_not_txn.patch (text/x-patch), 896 bytes.

[29 Jun 2007 17:18] Stewart Smith
patch 04 (of 4) in series

Attachment: send_reply_on_failed_saveINDXATTRINFO.patch (text/x-patch), 1.21 KiB.

[16 Jul 2007 5:40] Stewart Smith
New patches posted to commits:

http://lists.mysql.com/commits/30943

fixes based on review
[13 Aug 2007 7:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/32429

ChangeSet@1.2473, 2007-08-13 09:22:42+02:00, jonas@perch.ndb.mysql.com +8 -0
  ndb - bug#28804
    Handle out of transaction buffer in TC for INDX lookups
[14 Aug 2007 3:47] Stewart Smith
I'm okay with the extra patch that Jonas added.

Autotest run seems good (better than the last time we ran a 5.0 tree)
[24 Aug 2007 5:45] Jonas Oreland
pushed to 50-ndb
[24 Aug 2007 5:46] Stewart Smith
jonas has pushed to 5.0-ndb
[7 Sep 2007 19:04] Jon Stephens
Documented in mysql-5.1.22-ndb-6.2.5 changelog as

              A query using joins between several large tables and
              requiring unique index lookups failed to complete,
              eventually returning <errortext>Uknown Error</errortext>
              after a very long period of time. This occurred due to
              inadequate handling of instances where the Transaction
              Coordinator ran out of
              <literal>TransactionBufferMemory</literal>, when the
              cluster should have returned NDB error code 4012
              (<errortext>Request ndbd time-out</errortext>). (Bug #28804)

Left in PQ status pending pushes to mainline trees.
[14 Sep 2007 16:25] Bugs System
Pushed into 5.0.50
[14 Sep 2007 16:26] Bugs System
Pushed into 5.1.23-beta
[15 Sep 2007 11:05] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Also documented fix in 5.0.50 and 5.1.23 changelogs.