Bug #54945 Ndbrequire during scan due to other node failure
Submitted: 1 Jul 2010 21:25
Modified: 31 Aug 2010 12:16
Reporter: Andrew Hutchings
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.1-telco-6.3
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any
Tags: 6.3.29

[1 Jul 2010 21:25] Andrew Hutchings
Description:
Time: Tuesday 29 June 2010 - 15:38:57
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dblqh/DblqhMain.cpp
Error object: DBLQH (Line: 7937) 0x0000000a
Program: ndbd
Pid: 5226
Trace: /local/cudb/mysql/ndbd/data/ndb_3_trace.log.1
Version: mysql-5.1.39 ndb-6.3.29-GA
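
For triage, the fields of an error log entry like the one above can be pulled out mechanically. A minimal Python sketch, assuming the "Key: value" layout shown here; the function names are hypothetical and not part of any NDB tool:

```python
import re
from typing import Dict, Optional, Tuple

# A few lines copied from the error log entry in this report.
SAMPLE = """\
Time: Tuesday 29 June 2010 - 15:38:57
Status: Temporary error, restart node
Error: 2341
Error data: dblqh/DblqhMain.cpp
Error object: DBLQH (Line: 7937) 0x0000000a
Program: ndbd
"""


def parse_error_entry(text: str) -> Dict[str, str]:
    """Split each 'Key: value' line at the first ': ' (hypothetical helper)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that have no 'Key: value' shape
            fields[key.strip()] = value.strip()
    return fields


def failing_location(fields: Dict[str, str]) -> Optional[Tuple[str, str, int]]:
    """Return (source file, kernel block, line number), or None if absent."""
    match = re.match(r"(\w+) \(Line: (\d+)\)", fields.get("Error object", ""))
    if match is None:
        return None
    return fields.get("Error data", ""), match.group(1), int(match.group(2))


if __name__ == "__main__":
    print(failing_location(parse_error_entry(SAMPLE)))
```

Comparing the extracted file, block and line number between two reports is a quick first check of whether they hit the same ndbrequire, although line numbers shift between versions.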

This appears to have happened during a scan operation when the other data node failed.

How to repeat:
.
[5 Jul 2010 8:36] Alexey Asemov
I am seeing the same failure. How to repeat: do a rolling restart; while one of the nodes is restarting, the whole cluster fails and a system restart is forced.
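
A rolling restart of this kind can be sketched as follows. This is a hedged illustration, not the reporter's actual procedure: the node IDs are hypothetical, `wait_until_started` is a placeholder the operator must supply, and `ndb_mgm` is assumed to be on PATH and able to reach the management server with its default connect string.

```python
import subprocess
from typing import Callable, List, Sequence


def restart_command(node_id: int) -> List[str]:
    """Build the ndb_mgm invocation that restarts one data node."""
    return ["ndb_mgm", "-e", f"{node_id} RESTART"]


def rolling_restart(
    node_ids: Sequence[int],
    wait_until_started: Callable[[int], None],
    run: Callable[[List[str]], object] = subprocess.check_call,
) -> None:
    """Restart the data nodes one at a time.

    wait_until_started is a hypothetical callback that must block until
    the given node has rejoined the cluster; restarting the next node
    before that point would take the cluster down deliberately.
    """
    for node_id in node_ids:
        run(restart_command(node_id))
        wait_until_started(node_id)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    rolling_restart(
        [3, 4],  # hypothetical data node IDs
        wait_until_started=lambda node_id: None,
        run=lambda cmd: print(" ".join(cmd)),
    )
```

With this bug present, the failure would show up as the surviving node stopping while another node is restarting, so a real run should also watch the cluster log.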
[5 Jul 2010 8:37] Alexey Asemov
Forgot to mention: this happens with 7.1.4b
[5 Jul 2010 8:41] Andrew Hutchings
Hello Alexey,

Can you please upload the ndb_error_reporter output for this, so that we can:
a) verify that it is this code path failing
b) create a repeatable test case (we have only observed this once so far and have not been able to repeat it yet)
[6 Aug 2010 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[31 Aug 2010 8:12] Jonas Oreland
Patch (for 7.1) which adds printouts when this happens

Attachment: bug54945.printout.patch (application/octet-stream, text), 558 bytes.

[31 Aug 2010 8:13] Jonas Oreland
Hello Alexey,

I added a patch that adds a printout if this happens.
Can you apply it, reproduce the failure, and upload the ndb_error_reporter output?

Please,

/Jonas
[31 Aug 2010 10:44] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/117194

3271 Jonas Oreland	2010-08-31
      ndb - bug#54945 - fix 1) incorrect early abort of scan 2) incorrect handling of WAIT_AI_SCAN during node-failure handling
[31 Aug 2010 11:00] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.47-ndb-6.3.38 (revid:jonas@mysql.com-20100831104036-qyyalbwdkzkb2ymh) (version source revid:jonas@mysql.com-20100831104036-qyyalbwdkzkb2ymh) (merge vers: 5.1.47-ndb-6.3.38) (pib:21)
[31 Aug 2010 11:00] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.47-ndb-7.0.19 (revid:jonas@mysql.com-20100831105501-e010qej2ae05zqio) (version source revid:jonas@mysql.com-20100831105501-e010qej2ae05zqio) (merge vers: 5.1.47-ndb-7.0.19) (pib:21)
[31 Aug 2010 11:01] Jonas Oreland
pushed to 6.3.38, 7.0.19 and 7.1.8
[31 Aug 2010 12:16] Jon Stephens
Documented bugfix in the NDB-6.3.38, 7.0.19, and 7.1.8 changelogs, as follows:

        The failure of a data node during some scans could cause other
        data nodes to fail.

Closed.