Bug #28804 Unknown error returned on a query using joins.
Submitted: 31 May 2007 12:20 Modified: 15 Sep 2007 13:05
Reporter: Geert Vanderkelen
Status: Closed
Category:Server: Cluster Severity:S2 (Serious)
Version:5.0.40 OS:Any
Assigned to: Bugs System Target Version:5.1.23
Tags: bfsm_2007_06_21, uknown, join, ndb

[31 May 2007 12:20] Geert Vanderkelen
Description:
A query using joins between tables having lots of data is returning Uknown Error after a
while. In my case it takes about 350 seconds, but on another data set it is reported to
take more than 450 seconds. So, apparently not being a timeout.

Tables are using BLOB fields (TEXT) but converting them to VARCHAR doesn't change
anything.

Cluster itself is doing fine, no errors. Reducing the queries so the result set is
smaller make the query work 'OK'.

How to repeat:
Repeatable, but provided privately.

Suggested fix:
At least say what's wrong, if at all possible.
[31 May 2007 12:23] Geert Vanderkelen
Verified using 5.0.34 (on which it was originally reported) and 5.0.40.
[31 May 2007 12:25] Geert Vanderkelen
Configuration used with 2 data nodes:

[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=4G
IndexMemory=300M

MaxNoOfOrderedIndexes = 256
MaxNoOfUniqueHashIndexes = 256

MaxNoOfConcurrentOperations = 300000
MaxNoOfConcurrentIndexOperations = 32000
NoOfFragmentLogFiles = 32
TimeBetweenLocalCheckpoints = 18
[20 Jun 2007 5:11] Stewart Smith
have reproduced.

spends maybe 5 minutes doing something, then does nothing for another 5 (or so), then
unknown error.

now working on finding what "something" is and why we stop doing "something"
[20 Jun 2007 9:00] Stewart Smith
timeout during ha_ndbcluster::unique_index_read error being lost (not returned upwards),
and next time there is an error detected, we translate 0 (no error) to 1 unknown error.

So really two bugs:

1) loss of correct error
2) why we're timing out.

The missing error is 4012,
$ perror --ndb 4012
NDB error code 4012: Request ndbd time-out, maybe due to high load or communication
problems: Unknown result: Unknown result error

Which is certainly interesting... as I'm running on a single machine (so no
communications problems) and not too high load...

The timeout that is being hit is set by:

  ndb_mgm_configuration_iterator iter(* props, CFG_SECTION_NODE);
...
  Uint32 timeout = 120000;
  iter.first();
  for (iter.first(); iter.valid(); iter.next())
  {
    Uint32 tmp1 = 0, tmp2 = 0;
    iter.get(CFG_DB_TRANSACTION_CHECK_INTERVAL, &tmp1);
    iter.get(CFG_DB_TRANSACTION_DEADLOCK_TIMEOUT, &tmp2);
    tmp1 += tmp2;
    if (tmp1 > timeout)
      timeout = tmp1;
  }

(where timeout is milliseconds).

So it's either 120,000 (2 minutes) OR the maximum
TimeBetweenInactiveTransactionAbortCheck+TransactionDeadlockDetectionTimeout across all
nodes.

The default TimeBetweenInactiveTransactionAbortCheck is 1000
The default TransactionDeadlockDetectionTimeout is 1200

So the default here would be 2200... which is LESS than 120,000, so 2minutes is
selected.

But where this timout is called, we're using 3 * timeout... so really 6 minutes.

Which is consistent with what I've been seeing.
[21 Jun 2007 18:52] Stewart Smith
So further investigation into this has lead to some interesting points... but it ended up
having to solve 2) first to really get a grip on 1). But the good news here is, I've been
making good progress.

There is this resource in NDB known as "TransactionBufferMemory" and, by default, this is
set to 1MB in the config file. This memory is used for a tiny amount of time during unique
index lookups for storing some key information in. Only under intense parallel query load
should this ever need increasing.

However.. my current theory is that we have a bug where in the case of a unique index
lookup does not find a match, we keep this memory until the end of the transaction, and
not just until the end of that operation.

Increasing TransactionBufferMemory does help alleviate the problem, but (in my test) even
raising it to 10 or 100MB didn't allow the query to finish (although it ran for a while
longer). So using this as a work around is not practical.

I currently have a work-in-progress patch here that does seem to free the memory at the
right place and the query has been running for at least 18+ times longer than it has
before for me (it's still running).

So I'm:
a) rather hopeful that this is the correct fix
b) 4 tracks into The Beatles white and it's still running (where previously it would have
stopped well within the first track).
c) tomorrow, see the result of the query and start shaping the patch into something
sensible and discuss further with other members of the development team.
[29 Jun 2007 19:17] Stewart Smith
patch 01 in series

Attachment: improve_timeout_waiting_for_reply_handling.patch (text/x-patch), 2.35 KiB.

[29 Jun 2007 19:17] Stewart Smith
patch 02 in series

Attachment: fix_ha_ndbcluster_error_on_index_read.patch (text/x-patch), 1005 bytes.

[29 Jun 2007 19:17] Stewart Smith
patch 03 in series

Attachment: release_indexop_on_op_complete_not_txn.patch (text/x-patch), 896 bytes.

[29 Jun 2007 19:18] Stewart Smith
patch 04 (of 4) in series

Attachment: send_reply_on_failed_saveINDXATTRINFO.patch (text/x-patch), 1.21 KiB.

[16 Jul 2007 7:40] Stewart Smith
New patches posted to commits:

http://lists.mysql.com/commits/30943

fixes based on review
[13 Aug 2007 9:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/32429

ChangeSet@1.2473, 2007-08-13 09:22:42+02:00, jonas@perch.ndb.mysql.com +8 -0
  ndb - bug#28804
    Handle out of transaction buffer in TC for INDX lookups
[14 Aug 2007 5:47] Stewart Smith
I'm okay with the extra patch that Jonas added.

Autotest run seems good (better than the last time we ran a 5.0 tree)
[24 Aug 2007 7:45] Jonas Oreland
pushed to 50-ndb
[24 Aug 2007 7:46] Stewart Smith
jonas has pushed to 5.0-ndb
[7 Sep 2007 21:04] Jon Stephens
Documented in mysql-5.1.22-ndb-6.2.5 changelog as

              A query using joins between several large tables and
              requiring unique index lookups failed to complete,
              eventually returning <errortext>Uknown Error</errortext>
              after a very long period of time. This occurred due to
              inadequate handling of instances where the Transaction
              Coordinator ran out of
              <literal>TransactionBufferMemory</literal>, when the
              cluster should have returned NDB error code 4012
              (<errortext>Request ndbd time-out</errortext>). (Bug #28804)

Left in PQ status pending pushes to mainline trees.
[14 Sep 2007 18:25] Bugs System
Pushed into 5.0.50
[14 Sep 2007 18:26] Bugs System
Pushed into 5.1.23-beta
[15 Sep 2007 13:05] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of
that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available
version, including the bug fix. More information about accessing the source trees is
available at

    http://dev.mysql.com/doc/en/installing-source.html

Also documented fix in 5.0.50 and 5.1.23 changelogs.