| Bug #74829 | ndb.ndb_restore_conv_more fails on POWER8 | ||
|---|---|---|---|
| Submitted: | 13 Nov 2014 3:47 | Modified: | 18 Jan 2016 20:59 |
| Reporter: | Stewart Smith | Email Updates: | |
| Status: | No Feedback | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
| Version: | 7.3.7 | OS: | Linux |
| Assigned to: | Assigned Account | CPU Architecture: | Any |
| Tags: | ndb, POWER8, PowerPC | ||
[2 Feb 2015 7:32]
Stewart Smith
This test passes with 7.3.8 and on *LITTLE* endian ppc64. Unless there is a plan to support NDB on any big-endian platform (are any current platforms big endian?), it may be better to either close this or just add a configure check so NDB no longer builds on big-endian platforms.
[5 Mar 2015 11:29]
Mikael Ronström
Pretty sure SPARC is still big endian.
[18 Dec 2015 20:59]
Gustaf Thorslund
Hi Stewart,
Using 7.3.7 I've verified this bug almost as described. The system used was qemu-system-ppc64 running Debian Jessie. Because of the slow emulated environment I also had to increase some timeout variables, both for ndbd and for mtr. I also had to copy storage/ndb/src/kernel/vm/mt-asm.h from 7.4.9 to get it to compile at all.
```
$ ./mysql-test/mtr ndb.ndb_restore_conv_more |& tee ndb.ndb_restore_conv_more.log
.
.
ndb.ndb_restore_conv_more                [ fail ]
Test ended at 2015-12-18 16:29:44
CURRENT_TEST: ndb.ndb_restore_conv_more
/home/gustaf/src/mysql-cluster-gpl-7.3.7/storage/ndb/tools/ndb_lib_move_data.cpp:1048: require((code != 0)) failed
```

The failing `require` is in `Ndb_move_data::set_error_code`:

```cpp
1043 void
1044 Ndb_move_data::set_error_code(int code, const char* fmt, ...)
1045 {
1046   va_list ap;
1047   va_start(ap, fmt);
1048   require(code != 0);
1049   Error& e = m_error;
1050   e.code = code;
1051   my_vsnprintf(e.message, sizeof(e.message), fmt, ap);
1052   va_end(ap);
1053 }
```
```
$ tail mysql-test/var/log/ndb.ndb_restore_conv_more/tmp/ndb_testrun.log
Processing data in table: test/def/NDB$BLOB_18_3(20) fragment 1
_____________________________________________________
Processing data in table: test/def/NDB$BLOB_18_3(20) fragment 3
_____________________________________________________
Processing data in table: test/def/NDB$BLOB_18_3(20) fragment 5
_____________________________________________________
Processing data in table: test/def/NDB$BLOB_18_3(20) fragment 7
Opening file '/home/gustaf/src/mysql-cluster-gpl-7.3.7/build/mysql-test/var/mysql_cluster.1//BACKUP/BACKUP-8/BACKUP-8.2.log'
File size 1024 bytes
Restored 590 tuples and 0 log entries
```
```
$ cat /proc/cpuinfo
processor  : 0
cpu        : POWER7 (raw), altivec supported
clock      : 1000.000000MHz
revision   : 2.3 (pvr 003f 0203)
timebase   : 512000000
platform   : pSeries
model      : IBM pSeries (emulated by qemu)
machine    : CHRP IBM pSeries (emulated by qemu)
```
If I instead try the current version, 7.4.9, I cannot reproduce the bug:

```
ndb.ndb_restore_conv_more                [ pass ]  7769970
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 7769.970 of 8369 seconds executing testcases

Completed: All 1 tests were successful.

real    139m29.313s
user    46m59.335s
sys     91m31.128s
```
Could you try the current GA too?
/Gustaf
[19 Jan 2016 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".

Description:

On big-endian ppc64, the ndb.ndb_restore_conv_more test fails with ndb_restore aborting.

Tail of var/log/ndb.ndb_restore_conv_more/tmp/ndb_testrun.log:

```
Processing data in table: test/def/NDB$BLOB_14_14(24) fragment 6
Opening file '/home/stewart/mysql-cluster-gpl-7.3.7/build/mysql-test/var/mysql_cluster.1//BACKUP/BACKUP-11/BACKUP-11.1.log'
File size 1024 bytes
Restored 155 tuples and 0 log entries
bh1->readData -8
/home/stewart/mysql-cluster-gpl-7.3.7/storage/ndb/tools/ndb_lib_move_data.cpp:1054: require((code != 0)) failed
```

The -8 there is from me modifying the error codes to find out exactly where it fails. It asserts on ndb_lib_move_data.cpp:728, which is this:

```cpp
CHK2(bh1->readData(data, bytes) == 0, (bh1->getNdbError()));
```

which means the blob is not null but we failed to read some data. Adding in my marvellous debug code to find out where, I can see we're failing in NdbBlob::readDataPrivate, reading the last block:

```cpp
if (len > 0) {
  // partial last block
  DBUG_PRINT("info", ("partial last block pos=%llu len=%u", pos, len));
  assert((pos - theInlineSize) % thePartSize == 0 && len < thePartSize);
  Uint32 part = getPartNumber(pos);
  Uint16 sz = 0;
  if (readPart(thePartBuf.data, part, sz) == -1)
    DBUG_RETURN(-7);
  // need result now
  if (executePendingBlobReads() == -1)
    DBUG_RETURN(-8);
```

(I replaced return -1 with distinct numbers to trace it), so for some reason executePendingBlobReads() is failing. Adding in:

```cpp
fprintf(stderr, "part %u partsz %u pos %u sz %u len %u\n", part, thePartSize, pos, sz, len);
```

there, I get this just before the assert:

```
part 2 partsz 2000 pos 4256 sz 0 len 744
```

For the record, gdb "p *bh1" gives (note that theError.code is 0 even though the read failed, which is what trips require((code != 0)) in set_error_code):

```
$2 = {theBlobVersion = 2, theFixedDataFlag = false, theHeadSize = 16, theVarsizeBytes = 2,
  theState = NdbBlob::Active, theNdbRecordFlag = true, theEventBlobVersion = -1,
  theBtColumnNo = {-1, -1, 1, 2, 3}, theNdb = 0x10037403b10, theNdbCon = 0x100374b5d10,
  theNdbOp = 0x1003746e0a0, theEventOp = 0x0, theBlobEventOp = 0x0,
  theBlobEventPkRecAttr = 0x0, theBlobEventDistRecAttr = 0x0, theBlobEventPartRecAttr = 0x0,
  theBlobEventPkidRecAttr = 0x0, theBlobEventDataRecAttr = 0x0, theTable = 0x1003744d640,
  theAccessTable = 0x1003744d640, theBlobTable = 0x100374622f0, theColumn = 0x1003744d030,
  theFillChar = 255 '\377', theInlineSize = 256, thePartSize = 2000, theStripeSize = 0,
  theGetFlag = false, theGetBuf = 0x0, theSetFlag = false, theSetValueInPreExecFlag = false,
  theSetBuf = 0x0, theGetSetBytes = 0, thePendingBlobOps = 1 '\001', theActiveHook = 0x0,
  theActiveHookArg = 0x0, theKeyBuf = {data = 0x1003744bdc0 "\003", size = 4, maxsize = 8},
  theAccessKeyBuf = {data = 0x1003745dc60 "", size = 4, maxsize = 8},
  thePackKeyBuf = {data = 0x1003744d8a0 "\003", size = 4, maxsize = 8},
  theHeadInlineBuf = {data = 0x1003746eab0 "\016\001", size = 272, maxsize = 272},
  theHeadInlineCopyBuf = {data = 0x1003746f3b0 "", size = 272, maxsize = 272},
  thePartBuf = {data = 0x1003746ebd0 "", size = 2000, maxsize = 2000}, thePartLen = 0,
  theBlobEventDataBuf = {data = 0x0, size = 0, maxsize = 0},
  theBlobEventDistValue = 16842752, theBlobEventPartValue = 927304144,
  theBlobEventPkidValue = 4294967295,
  theHead = {varsize = 270, reserved = 0, pkid = 0, length = 5000, headsize = 16},
  theInlineData = 0x1003746eac0 " -,.; -,.; -,.; -,.; -,.; -,.; -,.; -,.;"...,
  theHeadInlineRecAttr = 0x1003746f4d0, theHeadInlineReadOp = 0x0,
  theHeadInlineUpdateFlag = false, userDefinedPartitioning = false,
  thePartitionId = 4294967295, thePartitionIdRecAttr = 0x0, theNullFlag = 0,
  theLength = 5000, thePos = 0,
  theError = {status = NdbError::Success, classification = NdbError::NoError, code = 0,
    mysql_code = 0, message = 0x10235d30 "No error", details = 0x0},
  theNext = 0x1003746f530}
```

How to repeat:
Run ndb.ndb_restore_conv_more on POWER8 (maybe others? I haven't tried).

Suggested fix:
Find the bug and fix it :)