MySQL Bugs: #11217: Mysqld not connected to cluster error message missleading (4009)

Bug #11217	Mysqld not connected to cluster error message missleading (4009)
Submitted:	9 Jun 2005 18:42	Modified:	23 Oct 2008 4:24
Reporter:	Jonathan Miller
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	4.1	OS:	Linux (Linux)
Assigned to:	Martin Skold	CPU Architecture:	Any

Description:
Having just restarted the cluster while leaving the mysqld process running, I wanted to see what kind of error message I would get if I tried to create a table using the NDB engine.

The error message inside the mysql client was:
mysql> create table t1 (c1 int, PRIMARY KEY(c1))ENGINE=NDB;
ERROR 1005 (HY000): Can't create table './test/t1.frm' (errno: 4009)

Not bad, but being a good MySQLer I wanted to know what a 4009 was:

./perror --ndb 4009
OS error code 4009:  Cluster Failure: Unknown result: Unknown result error

This error message is not what I would expect. I would expect an error stating that there is no connection to NDBCLUSTER, or that the send to NDB failed.

This would lead me to believe that my cluster had failed, but that is not the case.
Once the mysqld process is restarted, all is fine.
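
The lookup described above (read the errno out of the client error, then pass it to perror --ndb) can be sketched as follows. This is only an illustration: the helper names extract_errno and perror_command are hypothetical, and the sketch assumes perror is on the PATH; only the message text and the --ndb option come from this report.

```python
import re

# Matches the "(errno: N)" suffix seen in the ERROR 1005 message above.
ERRNO_PATTERN = re.compile(r"\(errno:\s*(\d+)\)")


def extract_errno(client_message):
    """Return the errno embedded in a client error line, or None if absent."""
    match = ERRNO_PATTERN.search(client_message)
    return int(match.group(1)) if match else None


def perror_command(errno, perror_path="perror"):
    """Build the perror invocation used in this report for an NDB error code.

    perror_path is an assumption; the report runs ./perror and bin/perror.
    """
    return [perror_path, "--ndb", str(errno)]


message = "ERROR 1005 (HY000): Can't create table './test/t1.frm' (errno: 4009)"
code = extract_errno(message)
command = perror_command(code)
```

The resulting list could be handed to subprocess.run on a host where perror is installed; the sketch stops short of running it so that it does not depend on a MySQL installation.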

How to repeat:
Restart the cluster while leaving the mysqld process running.
Log in to one of the mysqld processes.
create table t1 (c1 int, PRIMARY KEY(c1))ENGINE=NDB;
bin/perror --ndb 4009

Suggested fix:
Return an error stating that there is no connection to NDBCLUSTER, or that the send to NDB failed.

Assigning to Martin for him to decide what to do about it.

Using only the NDB API, there is no way of determining why the cluster
did not reply. If it is being restarted, the management server (ndb_mgmd)
knows about it, so by using the management client interface (already linked into
mysqld together with the NDB API), one could check with the management server
whether a restart is in progress and return a different error code in that case.
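
The decision described in this comment could look roughly like the sketch below. It is a model of the logic only, not the mysqld implementation and not the management client interface: NodeState, RESTART_IN_PROGRESS and classify_no_reply are hypothetical names, and the node states are assumed to have been fetched from the management server by some other means. Only the 4009 code is taken from this report.

```python
from enum import Enum


class NodeState(Enum):
    """Hypothetical data node states, standing in for what ndb_mgmd reports."""
    STARTED = "started"
    STARTING = "starting"
    RESTARTING = "restarting"
    NO_CONTACT = "no_contact"


# Error code from this report: what the user currently sees.
CLUSTER_FAILURE = 4009
# Placeholder for the "different error code" proposed above; no real
# code number is assumed here.
RESTART_IN_PROGRESS = "restart-in-progress"


def classify_no_reply(data_node_states):
    """Pick an error to report after the cluster failed to reply.

    If any data node is starting or restarting, report that a restart is
    in progress; otherwise fall back to the generic cluster failure.
    """
    restarting = (NodeState.STARTING, NodeState.RESTARTING)
    if any(state in restarting for state in data_node_states):
        return RESTART_IN_PROGRESS
    return CLUSTER_FAILURE
```

With this shape, a restart would surface as its own condition, while a cluster that is simply unreachable would still produce 4009.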

I'm wondering: can't the message returned by perror be a little more descriptive? It could at least give a hint that something is wrong with the communication between the SQL node and the cluster.

Now reports: ERROR 157 (HY000): Could not connect to storage engine
Not sure whether there is already a changelog entry for this, so
setting to "Documenting" for now ...

Any idea when the change took place?

Fixed in
Bug #18676 Missleading error message when trying to create table when cluster is down
Pushed into 5.1.19-beta

Tagged Bug#18676 changelog entry to indicate that fix resolved this bug also.

(According to the developer notes for that bug, the user-facing change took place in 5.0.40/5.1.18. Although further internal changes were made in 5.1.41/5.1.19, the changelog entry is tagged for the releases where the user-visible fix took place.)