Bug #39540 | ndb_restore crash while restoring log from different endian | ||
---|---|---|---|
Submitted: | 19 Sep 2008 16:31 | Modified: | 13 Apr 2009 16:16 |
Reporter: | Joerg Bruehe | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | Cluster 6.3.17 | OS: | Solaris (Sparc only) |
Assigned to: | Magnus Blåudd | CPU Architecture: | Any |
[19 Sep 2008 16:31]
Joerg Bruehe
[15 Dec 2008 11:30]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/61659

2787 Leonard Zhou 2008-12-15
BUG#39540 Correc 'ndb_restore' tool
[9 Feb 2009 18:49]
Joerg Bruehe
Same crash in the build of cluster 6.4.2, same platforms.
[9 Feb 2009 20:11]
Joerg Bruehe
... and I suspect this might be related to the same basic problem, even though it shows up on 64-bit hosts only (build + test of 6.4.2):

=====
ndb.ndb_restore_undolog [ fail ]

Bus Error - core dumped
mysqltest: At line 441: command "$NDB_RESTORE --no-defaults -b 1 -n 2 -r $MYSQL_TEST_DIR/std_data/ndb_backup51_undolog_le >> $NDB_TOOLS_OUTPUT" failed

The result from queries just before the failure was:
USE test;
DROP TABLE IF EXISTS t_num,t_datetime,t_string_1,t_string_2,t_gis,t_string_3,t_string_4,t_string_5;
exec of '/PATH/bin/ndb_restore --no-defaults -b 1 -n 2 -r /PATH/mysql-test/std_data/ndb_backup51_undolog_le >> /PATH/mysql-test/var/log/ndb_testrun.log' failed, error: 35328, status: 138, errno: 29

More results from queries before failure can be found in /PATH/mysql-test/var/log/ndb_restore_undolog.log

Warnings from just before the error:
Note 1051 Unknown table 't_num'
Note 1051 Unknown table 't_datetime'
Note 1051 Unknown table 't_string_1'
Note 1051 Unknown table 't_string_2'
Note 1051 Unknown table 't_gis'
Note 1051 Unknown table 't_string_3'
Note 1051 Unknown table 't_string_4'
=====
[1 Apr 2009 13:59]
Jonas Oreland
Magnus, why not take it one step further and push the "null" check down into Twiddle? Also, can you explain in more detail what the problem is? Why does it only fail on SPARC? Does Twiddle(0) work on Linux (or x86), etc.?
[2 Apr 2009 8:29]
Magnus Blåudd
mysqldev@sol10-sparc-a:~/magnus/mysql-5.1.32-ndb-7.0.5-pb558/mysql-test> ../libtool --mode=execute dbx ../storage/ndb/tools/ndb_restore core
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in your .dbxrc
Reading ndb_restore
core file header read successfully
Reading ld.so.1
Reading libmtmalloc.so.1
Reading libpthread.so.1
Reading libthread.so.1
Reading librt.so.1
Reading libgen.so.1
Reading libsocket.so.1
Reading libnsl.so.1
Reading libm.so.2
Reading libCstd.so.1
Reading libCrun.so.1
Reading libc.so.1
Reading libaio.so.1
Reading libmd.so.1
Reading libc_psr.so.1
t@1 (l@1) terminated by signal SEGV (no mapping at the fault address)
Current function is BackupFile::Twiddle
   84       attr_data->u_int32_value[i] = Twiddle32(attr_data->u_int32_value[i]);
(dbx) p attr_data
attr_data = 0x10043f320
(dbx) p *attr_data
*attr_data = {
    null = true
    size = 5365856U
    int8_value = (nil)
    u_int8_value = (nil)
    int16_value = (nil)
    u_int16_value = (nil)
    int32_value = (nil)
    u_int32_value = (nil)
    int64_value = (nil)
    u_int64_value = (nil)
    string_value = (nil)
    void_value = (nil)
}
(dbx) up
Current function is RestoreLogIterator::getNextLogEntry
 1805       Twiddle(attr->Desc, &(attr->Data));
(dbx) p sz
sz = 0
(dbx) p attr
attr = 0x10043f318
(dbx) p * attr
*attr = {
    Desc = 0x10041b138
    Data = {
        null = true
        size = 5365856U
        int8_value = (nil)
        u_int8_value = (nil)
        int16_value = (nil)
        u_int16_value = (nil)
        int32_value = (nil)
        u_int32_value = (nil)
        int64_value = (nil)
        u_int64_value = (nil)
        string_value = (nil)
        void_value = (nil)
    }
}
(dbx) p attr->Desc
attr->Desc = 0x10041b138
(dbx) p *attr->Desc
*attr->Desc = {
    size = 32U
    arraySize = 2U
    attrId = 9U
    m_column = 0x10049b418
    m_nullBitIndex = 0
    convertFunc = (nil)
    parameter = (nil)
}
[2 Apr 2009 14:30]
Magnus Blåudd
The crash occurs in ndb_restore when it tries to restore the log part of the checked-in backup from mysql-test/std_data/ndb_backup_packed.

The problem does not show up on little-endian machines because that backup was taken on a little-endian machine: the 'Twiddle' function detects this by checking "m_hostByteOrder" and returns immediately without doing anything. When running on a big-endian machine, 'Twiddle' tries to swap the byte order of the data in the "attr_data" union, which has previously been set to NULL through the "void_value" pointer.

I suggest that we don't call 'Twiddle' when the data is NULL and there is nothing to do, and that we add asserts in 'Twiddle' to detect the problem on any platform.
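A minimal sketch of the suggested fix, not the actual patch. The types `AttrDesc` and `AttrData` and the functions `twiddle32`, `twiddle` and `twiddle_if_present` are hypothetical, simplified stand-ins for the structures visible in the dbx output; the real code uses a union of typed pointers and handles more element widths than 32 bits. The `swap_needed` flag is an assumption standing in for the byte-order check against "m_hostByteOrder".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins for the structures seen in the dbx
// output; the real attribute data is a union of typed pointers.
struct AttrDesc {
    uint32_t size;       // element width in bits
    uint32_t arraySize;  // number of elements
};

struct AttrData {
    bool      null;           // true when the column value is NULL
    uint32_t* u_int32_value;  // nullptr when null is true
};

// Reverse the byte order of one 32-bit word.
inline uint32_t twiddle32(uint32_t x) {
    return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >> 8)  | ((x & 0xFF000000u) >> 24);
}

// Swap every 32-bit element in place. The asserts run before the
// byte-order check, so a NULL value fails loudly on every platform,
// not only on hosts where a swap is actually needed.
inline void twiddle(const AttrDesc& desc, AttrData& data, bool swap_needed) {
    assert(!data.null);
    assert(data.u_int32_value != nullptr);
    assert(desc.size == 32);   // this sketch handles 32-bit elements only
    if (!swap_needed) return;  // file and host byte order already match
    for (uint32_t i = 0; i < desc.arraySize; ++i)
        data.u_int32_value[i] = twiddle32(data.u_int32_value[i]);
}

// Call-site guard: a NULL value carries no bytes to swap, so skip it.
// Returns true if twiddle() was called.
inline bool twiddle_if_present(const AttrDesc& desc, AttrData& data,
                               bool swap_needed) {
    if (data.null) return false;
    twiddle(desc, data, swap_needed);
    return true;
}
```

With this shape, the log iterator would call `twiddle_if_present` for each attribute. The NULL attribute from the core dump (null = true, all pointers nil) is skipped, while a non-NULL attribute with arraySize = 2 has both words swapped.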
[2 Apr 2009 15:08]
Jonas Oreland
super magnus, ok to push (what ever you do) since explanation is great
[2 Apr 2009 18:20]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/71232
[2 Apr 2009 18:44]
Bugs System
Pushed into 5.1.32-ndb-6.3.24 (revid:magnus.blaudd@sun.com-20090402182917-iiiqfx7uf2g89y2a) (version source revid:magnus.blaudd@sun.com-20090402182917-iiiqfx7uf2g89y2a) (merge vers: 5.1.32-ndb-6.3.24) (pib:6)
[2 Apr 2009 19:05]
Bugs System
Pushed into 5.1.32-ndb-7.0.5 (revid:magnus.blaudd@sun.com-20090402185252-ohj110mutolxd2x4) (version source revid:magnus.blaudd@sun.com-20090402185252-ohj110mutolxd2x4) (merge vers: 5.1.32-ndb-7.0.5) (pib:6)
[13 Apr 2009 16:16]
Jon Stephens
Documented bugfix in the NDB-6.3.24 and 7.0.5 changelogs as follows:

ndb_restore crashed when trying to restore a backup made to a MySQL Cluster running on a platform having different endianness from that on which the original backup was taken.