MySQL Bugs: #17297: Fix error messages

Bug #17297	Fix error messages
Submitted:	10 Feb 2006 11:31	Modified:	13 Jul 2006 23:11
Reporter:	Jonathan Miller
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	5.1.7	OS:	Linux (Linux)
Assigned to:	Tomas Ulin	CPU Architecture:	Any

Description:
Ref: http://bugs.mysql.com/bug.php?id=13965
Patch: http://lists.mysql.com/commits/2176

Please change " error: %d. Most likely change of configution",
to  " error: %d. Likely invalid change of configuration",

How to repeat:
N/A

Suggested fix:
see above

Note: This bug report will be reused as needed for error messages that need to be changed, so that the bug system does not become littered with low-level bug reports that only change a message.

Cluster Network Failure:
I ran a test with 2 DN and 1 MGM on 3 hosts. I shut down the port on one of the DN hosts as an HA test. The cluster stayed up as expected, but the error message below from the failed DN does not make sense for what happened.

Current byte-offset of file-pointer is: 568
Time: Saturday 11 February 2006 - 14:56:59
Status: Temporary error, restart node
Message: Arbitrator shutdown, please investigate error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 3826) 0x0000000a
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 3749
Trace: /space/run/ndb_2_trace.log.1
Version: Version 5.1.7 (beta)
***EOM***

A better message would be:
Message: Lost connection to other data nodes, unable to contact Arbitrator, shutting down

From bug 15878:

Description:
Failing to restart a node too many times gives a poor error message:

Message: Array index out of range (Internal error, programming error or missing error message, please report a bug)
Error: 2304
Error data: DbdihMain.cpp
Error object: DBDIH (Line: 8287) 0x0000000a

How to repeat:
Have a node restart fail 8 times, e.g. in start phase 5.

Suggested fix:
	    /*--------------------------------------------------------------
	     * SINCE IT WAS NOT ALIVE AT THE TIME OF THE SYSTEM CRASH THIS IS 
	     * A COMPLETELY NEW REPLICA. WE WILL SET THE CREATE GCI TO BE THE 
	     * NEXT GCI TO BE EXECUTED.                                       
	     *--------_----------------------------------------------------- */
	    const Uint32 nextCrashed = noCrashedReplicas + 1;
	    replicaPtr.p->noCrashedReplicas = nextCrashed;
	    arrGuard(nextCrashed, 8);
	    replicaPtr.p->createGci[nextCrashed] = newestRestorableGCI + 1;
	    ndbrequire(newestRestorableGCI + 1 != 0xF1F1F1F1);
	    replicaPtr.p->replicaLastGci[nextCrashed] = (Uint32)-1;

Fix arrGuard to give a correct error message.
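
One way the guard could report the real cause is sketched below. This is only an illustration under assumptions: the helper name guardCrashedReplicaSlot, the constant MAX_CRASHED_REPLICAS, and the message text are hypothetical and are not taken from the NDB sources or from the committed patch. Only the limit of 8 comes from the arrGuard(nextCrashed, 8) call quoted above.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical constant mirroring the literal 8 in arrGuard(nextCrashed, 8).
constexpr std::uint32_t MAX_CRASHED_REPLICAS = 8;

// Hypothetical helper: rejects the same indexes a bounds check of size 8
// would reject (index >= 8), but names the likely cause instead of
// reporting a generic "Array index out of range".
inline void guardCrashedReplicaSlot(std::uint32_t nextCrashed)
{
  if (nextCrashed >= MAX_CRASHED_REPLICAS) {
    throw std::runtime_error(
      "Too many failed node restarts: crashed replica index " +
      std::to_string(nextCrashed) + " is outside the limit of " +
      std::to_string(MAX_CRASHED_REPLICAS));
  }
}
```

In this sketch, guardCrashedReplicaSlot(7) returns normally, while guardCrashedReplicaSlot(8) throws with a message that mentions the restart failures rather than the array.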

Upping the priority as we are getting close to GA and I don't want this to be overlooked before we ship. With that stated:
(Cluster Disk Data)
Trying to create a table in a tablespace that does not exist:

ERROR 1005 (HY000): Can't create table 'dbt2.GL' (errno: 155)
mysql> show warnings;
+-------+------+-------------------------------------------+
| Level | Code | Message                                   |
+-------+------+-------------------------------------------+
| Error | 1005 | Can't create table 'dbt2.GL' (errno: 155) |
+-------+------+-------------------------------------------+
1 row in set (0.01 sec)

perror --ndb 155
NDB error code 155: No message slogan found (please report a bug if you get this error code): Unknown: Unknown

perror --ndb 1005
NDB error code 1005: No message slogan found (please report a bug if you get this error code): Unknown: Unknown

Here is one that I keep hitting; it can also be seen in bug 16796:

failed: 1296: Got error 311 'Unknown error code' from NDBCLUSTER

../extra/perror --ndb 311
NDB error code 311: No message slogan found (please report a bug if you get this error code): Unknown: Unknown

/extra/perror  311
Illegal error code: 311

../extra/perror  1296
Illegal error code: 1296

../extra/perror  --ndb 1296
NDB error code 1296: No message slogan found (please report a bug if you get this error code): Unknown: Unknown

According to Jonas:
And you'll get 311, (undefined partition)
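
The lookups above fail because the code has no entry in the message table. A minimal sketch of such a table with a fallback follows. It assumes a hypothetical describeNdbError helper and a hypothetical NdbErrorSlogan entry type; only the text for 311 ("undefined partition", per Jonas) and the fallback slogan come from this report. It does not claim to show how perror is implemented.

```cpp
#include <string>

// Hypothetical table entry: an NDB error code and its message slogan.
struct NdbErrorSlogan {
  int code;
  const char* slogan;
};

// Only 311 is taken from this report; a real table would have many entries.
static const NdbErrorSlogan kSlogans[] = {
  {311, "Undefined partition"},
};

// Hypothetical lookup: returns the slogan for a known code, otherwise the
// same fallback text that perror printed in the output above.
inline std::string describeNdbError(int code)
{
  for (const NdbErrorSlogan& entry : kSlogans) {
    if (entry.code == code) {
      return entry.slogan;
    }
  }
  return "No message slogan found (please report a bug if you get this error code)";
}
```

With an entry like this in place, a lookup for 311 would return a usable message, while unlisted codes such as 155 would still fall through to the fallback.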

If you still get "out of send buffer memory" with that much send buffer memory,
your cluster is not well dimensioned.

We don't have overload protection. It's a big task.

maybe the error message should be "Signal lost, out of send buffer memory, please increase
> SendBufferMemory (Resource configuration error) or check for very high load on machines"
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

/Jonas

Another one:

An attempt to system restart a 5.1 node on a 5.0 file system got caught here:

Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: dbdih/DbdihMain.cpp
Error object: DBDIH (Line: 13216) 0x0000000a
Program: storage/ndb/src/kernel/ndbd

The code reads:

    case Sysfile::NS_TakeOver:
      jam();
      sngNodeptr.p->nodeGroup = Sysfile::getNodeGroup(sngNodeptr.i,
                                                      SYSFILE->nodeGroups);
      NGPtr.i = sngNodeptr.p->nodeGroup;
-->  ptrCheckGuard(NGPtr, MAX_NDB_NODES, nodeGroupRecord);
      NGPtr.p->nodesInGroup[NGPtr.p->nodeCount] = sngNodeptr.i;
      NGPtr.p->nodeCount++;
      break;

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/7934

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/7938

Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

We can (and should) still let people know that several error messages were fixed, and where to look to see what they were.

Documented in 5.1.12 changelog. Closed.

Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug)

This error seems to have an error message, so I am not sure why it says "Internal error, programming error or missing error message, please report a bug".

Opening new report per Tomas