Bug #39159 Timeout while server connects to cluster ("ndb_not_readonly.inc")
Submitted: 1 Sep 2008 15:08
Modified: 19 Sep 2009 19:32
Reporter: Joerg Bruehe
Status: No Feedback
Category: Tests: Cluster
Severity: S7 (Test Cases)
Version: Cluster 6.3.17
OS: Linux (RPM, SuSE 9, x86_64)
Assigned to:
CPU Architecture: Any

[1 Sep 2008 15:08] Joerg Bruehe
Description:
Found while analysing the test failures of the 6.3.17 build:

We have 91 cases of tests aborting with this message:

=====
<<testname>> [ fail ]

mysqltest: In included file "./include/ndb_not_readonly.inc": At line 25: Failed while waiting for mysqld to come out of readonly mode

Stopping All Servers
=====

90 of them occurred on the SuSE9-x86_64-RPM platform,
1 on the RedHat3-IA64-RPM platform.

It happened in all 3 configurations ("cluster-gpl", "cluster-com", and "cluster-com-pro")
and in various runs ("ndb", "ndb+rpl_ndb+ps", "funcs1+ps", "partitions").

There are other failures related to this include file which I will report separately.

-----

Only some of these reports were followed by additional information:

Warnings from just before the error:
Error 1146 Table 'mysql.ndb_schema' doesn't exist
(Tests "ndb_storedproc_03", "ndb_restore_compat", "rpl_ndb_ddl", "rpl_ndb_delete_nowhere", "rpl_ndb_sp003", "rpl_ndb_trig004")

Warnings from just before the error:
Error 1296 Got error 4009 'Cluster Failure' from NDB
Error 1296 Got error 157 'Unknown error code' from NDBCLUSTER
(Test "ndb_binlog_format")

The result from queries just before the failure was:
select * from mysql.ndb_apply_status;
More results from queries before failure can be found in /PATH/mysql-test/var/log/rpl_ndb_apply_status.log
(Test "rpl_ndb_apply_status")

The result from queries just before the failure was:
select * from mysql.ndb_apply_status;
More results from queries before failure can be found in /PATH/mysql-test/var/log/rpl_ndb_apply_status.log
Warnings from just before the error:
Error 1146 Table 'mysql.ndb_schema' doesn't exist
(Also test "rpl_ndb_apply_status", other config/run)

How to repeat:
This happened in a release build and its test runs.

Suggested fix:
Not really sure.

AIUI, this happens when 60 seconds (600 turns of a loop with a 0.1 s sleep) are not sufficient for the server to report successful execution of a command requiring access to NDB.
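
To make the timing in the paragraph above concrete, here is a minimal sketch of such a bounded polling loop. It is not the content of "ndb_not_readonly.inc": the function name, the `probe` callback, and the injectable `sleep` parameter are hypothetical and only illustrate the 600 x 0.1 s budget.

```python
import time


def wait_until_writable(probe, max_attempts=600, sleep_s=0.1, sleep=time.sleep):
    """Poll probe() until it returns True or the attempts run out.

    probe is a hypothetical stand-in for "run a command that needs NDB
    and report whether it succeeded". Returns the number of attempts
    used; raises TimeoutError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        if probe():
            return attempt
        # Sleep only after a failed attempt, as in a simple retry loop.
        sleep(sleep_s)
    raise TimeoutError(
        f"still read-only after {max_attempts} attempts "
        f"(~{max_attempts * sleep_s:.0f} s of sleep)"
    )
```

In a loop of this shape the 60 seconds cover only the sleeps. The time each probe takes is added on top, so the wall-clock time before the failure is reported is at least 60 seconds.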

It would surprise me if this machine, and just this one, were so loaded that it could not set up that connection in time.
[19 Aug 2009 19:32] Sveta Smirnova
Joerg,

thank you for the report. Please provide information on how to repeat this. A link to the pushbuild log would probably be enough.
[19 Sep 2009 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".