Bug #86657 ndb_restore 781: Invalid schema transaction key from NDB API
Submitted: 12 Jun 2017 12:47 Modified: 14 Jun 2017 8:01
Reporter: Christian Tölle Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:mysql-5.7.18 ndb-7.5.6 OS:Ubuntu
Assigned to: CPU Architecture:Any
Tags: Backup, ndb_restore

[12 Jun 2017 12:47] Christian Tölle
Description:
when i want to replay a backup from version

mysql-5.6.14 ndb-7.3.3

to version

mysql-5.7.18 ndb-7.5.6

i got, while restoring metadata, on a table the error:

"Create table `xyz` failed: 781: Invalid schema transaction key from NDB API"

it is not the first table. 

How to repeat:
BACKUP DATA in mysql-5.6.14 ndb-7.3.3

and restore it with ndb_restore to

mysql-5.7.18 ndb-7.5.6
[13 Jun 2017 7:42] Christian Tölle
more info:

if i try to do it again, (after i create the database wich has temp. imported and drop it again) it stops on the same table with

"Create table `pl_new_2/def/plan_datei_version` failed: 4009: Cluster Failure"

after that, both datanodes are down!

file: ndb_3_out.log

2017-06-13 09:16:58 [ndbd] INFO     -- /export/home/pb2/build/sb_0-22875044-1490679582.39/release/mysql-cluster-gpl-7.5.6/storage/ndb/src/kernel/blocks/dbdih/DbdihMain.cpp
2017-06-13 09:16:58 [ndbd] INFO     -- DBDIH (Line: 24247) 0x00000002 Check false failed
2017-06-13 09:16:58 [ndbd] INFO     -- Error handler shutting down system
2017-06-13 09:16:58 [ndbd] INFO     -- Error handler shutdown completed - exiting
2017-06-13 09:16:58 [ndbd] ALERT    -- Node 3: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

exactly the same for node 4.

file: ndb_3_error.log

Time: Tuesday 13 June 2017 - 09:16:58
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DbdihMain.cpp
Error object: DBDIH (Line: 24247) 0x00000002 Check false failed
Program: ndbd
Pid: 15108
Version: mysql-5.7.18 ndb-7.5.6
Trace file name: ndb_3_trace.log.2
Trace file path: /var/lib/mysql/ndb_3_trace.log.2 [t1..t1]
***EOM***

also exactly the same for node 4.

the tail of the trace_log:

--------------- Signal ----------------
r.bn: 249 "DBTUP", r.proc: 3, r.sigId: 1397043 gsn: 433 "TUPFRAGREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 3, s.sigId: 1397042 length: 12 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00f70003 H'00000000 H'000000ca H'00000001 H'00000000 H'00000000
 H'00000027 H'00000000 H'ffffff00 H'00000000 H'00000001
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 1397042 gsn: 95 "ACCFRAGCONF" prio: 1
s.bn: 248 "DBACC", s.proc: 3, s.sigId: 1397041 length: 7 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'ffffff00 H'00000001 H'ffffff00 H'0000004a H'ffffff00 H'00000006
--------------- Signal ----------------
r.bn: 248 "DBACC", r.proc: 3, r.sigId: 1397041 gsn: 97 "ACCFRAGREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 3, s.sigId: 1397040 length: 12 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00f70003 H'000000ca H'00000000 H'00000001 H'00000001 H'00000050
 H'0000004e H'00000006 H'00000000 H'00000000 H'00000001
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 1397040 gsn: 313 "LQHFRAGREQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 3, s.sigId: 1397039 length: 23 trace: 0 #sec: 0 fragInf: 0
 senderData: 50332680 senderRef: fa0003 tableId: 202 fragmentId: 1 localKeyLength: 1 maxLoadFactor: 80 minLoadFactor: 78
 kValue: 6 lh3DistrBits: 0 lh3PageBits: 0
 keyLength: 1
 maxRowsLow/High: 0/0  minRowsLow/High: 39/0
 nextLCP: 0
[13 Jun 2017 7:47] Christian Tölle
sorry for message flood, but the command i used is:

ndb_restore -m -b 4888 -n 3
[13 Jun 2017 11:01] Christian Tölle
error report

Attachment: ndb_error_report_20170613125830.tar.bz2 (application/octet-stream, text), 249.00 KiB.

[13 Jun 2017 14:40] Christian Tölle
output before failure with --print-meta

Successfully printed table: Version: 67108865
Fragment type: HashMapPartition
K Value: 6
Min load factor: 78
Max load factor: 80
Temporary table: no
Number of attributes: 16
Number of primary keys: 1
Length of frm data: 1081
Max Rows: 0
Row Checksum: 1
Row GCI: 1
SingleUserMode: 0
ForceVarPart: 1
PartitionCount: 0
FragmentCount: 2
PartitionBalance: FOR_RP_BY_LDM
ExtraRowGciBits: 0
ExtraRowAuthorBits: 0
TableStatus: Retrieved
Table options:

Create table `pl_new_2/def/plan_datei_version` failed: 4009: Cluster Failure
Restore: Failed to restore table: `pl_new_2/def/plan_datei_version` ... Exiting
[13 Jun 2017 14:53] Christian Tölle
a show create table ... show's me that:

) ENGINE=ndbcluster AUTO_INCREMENT=413839 DEFAULT CHARSET=latin1'
/*!50100 PARTITION BY KEY ()
PARTITIONS 2 */

the "/*!50100 PARTITION BY KEY () PARTITIONS 2 */" message let me think, that it is a problem with partioning.
Other Tables dont have this partioning schema and there it works correct.

I will try to remove that in production database this night and then create a new backup and try to restore it again.

Hope, that removing the partioning schema from the table dont hurt the live data!!

PS: Importing to mysql-5.6.29 ndb-7.4.11(instead of mysql-5.7.18 ndb-7.5.6) works without problems!
[13 Jun 2017 15:07] Bogdan Kecman
Hi,

I can't reproduce this with the info you provided.

Can you, please,

- try restoring metadata with mysql client and not with ndb_restore -m (this is btw suggested method for restoring metadata, the ndb_restore -m can have issues between versions that have incompatible changes)
- if the restore fails with mysql client too, can you please restore one table at the time to see which table fails and then send us the create statement for this table
- if the restore does not fail with mysql client, is there a chance you can either find which table causes the problem or send us the complete create statement for all tables so that I can go trough them one by one 'cause all the tests we performed did not crash

thanks
Bogdan
[13 Jun 2017 20:27] Mikael Ronström
Very much looks like a bug, and yes, it is related to the fact that you used number of partitions = 2.
So a work around can be remove the partitioning specific information on this table to get the restore
working. I'll see if we can figure out what the problem is. I have already now a good hunch.
[13 Jun 2017 20:41] Mikael Ronström
I think the safest way to fix this is to use mysqldump to save the CREATE TABLE statements for the meta data you need to restore.
Next you edit the CREATE TABLE for this specific problem table and remove the specific information about number of partitions. You could also use the new feature PARTITION_BALANCE=RP_BY_NODE that will
use one fragment per node (which presumably is 2 fragments if you have a 2 node cluster).

Next you restore the meta data from this mysqldump file and restore the data from the ndb backup.
I am pretty sure this is also the recommended manner to perform restores in MySQL Cluster.
[14 Jun 2017 7:56] Christian Tölle
yes, it is working now!

1) i used ndb_restore with --exclude-tables=xxx on the failing one.
2) create the failing table over the mysql client without the partition schema
3) use ndb_restore to restore the data

thx!
[14 Jun 2017 8:01] Christian Tölle
the create table statement of the failing table.

CREATE TABLE `plan_datei_version` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `projekt_id` int(11) NOT NULL COMMENT 'Zu welchem Projekt gehört der plan_datei_version',
 `plan_datei_id` int(11) NOT NULL COMMENT 'Zu welcher plan_datei gehört die plan_datei_version',
 `datei_erweiterung` varchar(10) NOT NULL COMMENT 'die Dateierweiterung (z.B. pdf, xls, etc.)',
 `person_in_projekt_id` int(11) NOT NULL COMMENT 'Der Ersteller',
 `zeit` int(11) NOT NULL COMMENT 'Unixtime des Erstellzeitpunkt',
 `datei_hochgeladen` int(11) NOT NULL DEFAULT '0' COMMENT 'während der Upload läuft steht hier 0. Nach erfolgreichem Upload eine 1.',
 `projekt_verzeichnis_datei_id` int(11) NOT NULL DEFAULT '0' COMMENT 'Die zugehörige projekt_verzeichnis_datei',
 `projekt_verzeichnis_datei_revision` int(11) NOT NULL DEFAULT '0' COMMENT 'Die Revison der zugehörige projekt_verzeichnis_datei',
 `orginal_name` varchar(150) NOT NULL DEFAULT '' COMMENT 'der Orginalname beim CSVUpload',
 `plan_datei_upload_id` int(11) NOT NULL COMMENT 'die Upload id',
 `gesperrt` tinyint(1) NOT NULL DEFAULT '0' COMMENT '1 = die Version ist gesperrt, 0 = nicht gesperrt',
 `sortname` varchar(30) NOT NULL DEFAULT '' COMMENT 'deprecated',
 `under_controlling` tinyint(1) NOT NULL DEFAULT '0' COMMENT 'unterliegt die Datei noch dem Controlling?',
 `qr_var_projekt_verzeichnis_datei_id` int(11) DEFAULT NULL,
 `qr_var_hash` varchar(64) DEFAULT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `qr_var_hash` (`qr_var_hash`),
 KEY `plan_datei_id` (`plan_datei_id`),
 KEY `projekt_id` (`plan_datei_id`,`datei_erweiterung`,`gesperrt`),
 KEY `plan_datei_id_2` (`plan_datei_id`,`gesperrt`),
 KEY `projekt_id_2` (`projekt_id`,`gesperrt`),
 KEY `gesperrt` (`gesperrt`,`sortname`),
 KEY `projekt_verzeichnis_datei_id` (`projekt_verzeichnis_datei_id`),
 KEY `projekt_id_4` (`projekt_id`),
 KEY `projekt_id_5` (`projekt_id`,`plan_datei_upload_id`),
 KEY `projekt_id_6` (`projekt_id`,`zeit`),
 KEY `projekt_id_3` (`projekt_id`,`gesperrt`,`sortname`,`plan_datei_id`)
) ENGINE=ndbcluster AUTO_INCREMENT=413839 DEFAULT CHARSET=latin1 COMMENT='Alle versch. Dateitypen einer Revison eines Plans'
/*!50100 PARTITION BY KEY ()
PARTITIONS 2 */