Description:
START BACKUP crashes ndbd and causes loss of cluster services, with the following diagnostic:
Time: Sunday 13 April 2008 - 13:30:45
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: backup/Backup.cpp
Error object: BACKUP (Line: 3348) 0x0000000a
Program: /opt/mysql/libexec/ndbd
Pid: 8911
Trace: /var/lib/mysql/data/ndb_2_trace.log.8
Version: mysql-5.1.24 ndb-6.3.13-RC
***EOM***
This occurs whenever the total number of tables+indexes exceeds 4096.
Related to this, the listObjects API call displays incorrect tableId values for any tableId > 4095, and ndb_show_tables similarly displays incorrect output, because all of them rely on a common bitfield named ListTablesData.
We can work around the ndb_show_tables bug in our production systems. The backup issue, however, is critical to us because it will result in a complete cluster crash if our schema grows beyond 4K tables. Further, this limit is far short of the documented maximum of 20,320 tables in NDB 5.1.
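To illustrate why ids above 4095 come back wrong, here is a minimal sketch of a 12-bit field at bit 0 of a 32-bit word. The helpers getField12Demo-style names below (fieldMask, getField, withField, roundTripTableId12) are hypothetical stand-ins, not the real BitmaskImpl code, and the sketch assumes the field simply keeps the low 12 bits of whatever is stored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for BitmaskImpl::getField/setField on one 32-bit
// word. The real NDB helpers are not reproduced here.
inline std::uint32_t fieldMask(unsigned len) {
  assert(len > 0 && len < 32);
  return (1u << len) - 1u;
}

// Read `len` bits starting at bit `pos`.
inline std::uint32_t getField(std::uint32_t data, unsigned pos, unsigned len) {
  return (data >> pos) & fieldMask(len);
}

// Return `data` with `len` bits at `pos` replaced by the low bits of `val`.
// Assumption: bits of `val` that do not fit are silently dropped.
inline std::uint32_t withField(std::uint32_t data, unsigned pos, unsigned len,
                               std::uint32_t val) {
  assert(pos + len <= 32);
  const std::uint32_t mask = fieldMask(len) << pos;
  return (data & ~mask) | ((val << pos) & mask);
}

// Store an id in the original 12-bit tableId field and read it back.
inline std::uint32_t roundTripTableId12(std::uint32_t tableId) {
  return getField(withField(0u, 0, 12, tableId), 0, 12);
}
```

Under these assumptions, roundTripTableId12(4095) returns 4095, while 4096 comes back as 0 and 5000 as 904, which matches the wrapped tableId values seen in ndb_show_tables.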
How to repeat:
Create ~1,500 instances of a table with two indexes similar to:
create table junk (
id integer auto_increment not null primary key,
time datetime,
index junk_time (time)
);
Then:
a) Run ndb_show_tables and verify the tableId column wraps at 4095, and
b) Attempt START BACKUP from the management console and observe ndbd failure.
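The table-creation step can be scripted. The sketch below generates the statements; the junk_NNNN naming scheme and the explicit engine=ndbcluster clause are assumptions added for illustration and are not part of the original report. Going by the report's figures, about 1,500 tables with two indexes each should push the total object count past 4096.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build one CREATE TABLE statement modelled on the "junk" table above.
// The numbered table name and the engine clause are assumptions.
std::string makeCreateTable(int n) {
  assert(n > 0);
  char name[32];
  std::snprintf(name, sizeof(name), "junk_%04d", n);
  return std::string("create table ") + name + " (\n"
         "  id integer auto_increment not null primary key,\n"
         "  time datetime,\n"
         "  index " + name + "_time (time)\n"
         ") engine=ndbcluster;\n";
}

// Write `count` statements to stdout, e.g. to pipe into the mysql client.
void printSchema(int count) {
  for (int i = 1; i <= count; ++i) {
    std::fputs(makeCreateTable(i).c_str(), stdout);
  }
}
```

Calling printSchema(1500) from main and piping the output into the mysql client against a test database would create the schema needed for steps a) and b).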
Suggested fix:
The ListTablesData bitfield allocates 12 bits for tableId and 8 bits for tableType. Twelve bits were sufficient for 5.0, which had a hard limit of 1600 tables, but are insufficient for 5.1, which can grow to 20,320 tables.
However, the maximum tableType is currently 23, so it seems you could shave 3 bits off tableType and give them to tableId.
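The arithmetic behind this proposal can be checked at compile time. The sketch below uses only figures from this report (the 20,320-table limit, a maximum tableType of 23, and tableStore starting at bit 20); the constant names and the packProposed helper are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Largest value representable in an unsigned field of `bits` bits.
constexpr std::uint32_t maxFieldValue(unsigned bits) {
  return (1u << bits) - 1u;
}

// Figures taken from the report.
constexpr std::uint32_t kMaxTables = 20320;   // documented 5.1 table limit
constexpr std::uint32_t kMaxTableType = 23;   // highest tableType in use
constexpr unsigned kTableStorePos = 20;       // tableStore begins at bit 20

// Proposed layout: tableId in bits 0..14, tableType in bits 15..19.
constexpr unsigned kTableIdBits = 15;
constexpr unsigned kTableTypeBits = 5;

static_assert(maxFieldValue(12) < kMaxTables,
              "the old 12-bit tableId cannot cover the 5.1 limit");
static_assert(maxFieldValue(kTableIdBits) >= kMaxTables,
              "15 bits cover the documented table limit");
static_assert(maxFieldValue(kTableTypeBits) >= kMaxTableType,
              "5 bits still cover every tableType in use");
static_assert(kTableIdBits + kTableTypeBits == kTableStorePos,
              "the widened fields must not overlap tableStore");

// Pack id and type with the proposed widths; asserts guard against overflow.
inline std::uint32_t packProposed(std::uint32_t tableId,
                                  std::uint32_t tableType) {
  assert(tableId <= maxFieldValue(kTableIdBits));
  assert(tableType <= maxFieldValue(kTableTypeBits));
  return tableId | (tableType << kTableIdBits);
}
```

The design point is that the two fields together still occupy exactly bits 0 through 19, so tableStore and anything above it keep their positions. Note that 5 bits leave room for tableType values only up to 31.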
The following patch effectively solves our problem.
diff -ur mysql-5.1.24-ndb-6.3.13-telco/storage/ndb/include/kernel/signaldata/ListTables.hpp mysql-5.1.24-ndb-6.3.13-telco-eprize/storage/ndb/include/kernel/signaldata/ListTables.hpp
--- mysql-5.1.24-ndb-6.3.13-telco/storage/ndb/include/kernel/signaldata/ListTables.hpp 2008-04-10 13:27:21.000000000 -0400
+++ mysql-5.1.24-ndb-6.3.13-telco-eprize/storage/ndb/include/kernel/signaldata/ListTables.hpp 2008-04-13 21:01:07.000000000 -0400
@@ -26,16 +26,16 @@
class ListTablesData {
public:
static Uint32 getTableId(Uint32 data) {
- return BitmaskImpl::getField(1, &data, 0, 12);
+ return BitmaskImpl::getField(1, &data, 0, 15);
}
static void setTableId(Uint32& data, Uint32 val) {
- BitmaskImpl::setField(1, &data, 0, 12, val);
+ BitmaskImpl::setField(1, &data, 0, 15, val);
}
static Uint32 getTableType(Uint32 data) {
- return BitmaskImpl::getField(1, &data, 12, 8);
+ return BitmaskImpl::getField(1, &data, 15, 5);
}
static void setTableType(Uint32& data, Uint32 val) {
- BitmaskImpl::setField(1, &data, 12, 8, val);
+ BitmaskImpl::setField(1, &data, 15, 5, val);
}
static Uint32 getTableStore(Uint32 data) {
return BitmaskImpl::getField(1, &data, 20, 3);