Bug #45282 NDBAPI - Duplicate read of column results in Api failure
Submitted: 3 Jun 2009 6:08 Modified: 6 Oct 2009 12:00
Reporter: John David Duncan
Status: Closed
Category: MySQL Cluster: NDB API Severity: S3 (Non-critical)
Version: mysql-5.1-telco-6.2 OS: Any
Assigned to: Frazer Clement CPU Architecture: Any

[3 Jun 2009 6:08] John David Duncan
Description:
I see an issue with BLOB handling in 7.0.5 that is not present in 6.3.22, resulting in a seg fault. 

When I do a primary key lookup on a table with a blob, I get the problem here:

  if(i->flag.has_blob) {
    /* Execute NoCommit */
    if(i->tx->execute(NdbTransaction::NoCommit)) {

(gdb) bt
#0  0x903bfe42 in __kill ()
#1  0x903bfe34 in kill$UNIX2003 ()
#2  0x9043223a in raise ()
#3  0x9043e679 in abort ()
#4  0x00259787 in NdbReceiver::receive_packed_recattr ()
#5  0x002599e5 in NdbReceiver::execTRANSID_AI ()
#6  0x00244734 in Ndb::handleReceivedSignal ()
#7  0x0026d3b0 in TransporterFacade::deliver_signal ()
#8  0x00298975 in TransporterRegistry::unpack ()
#9  0x0029cf90 in TransporterRegistry::performReceive ()
#10 0x00244c95 in Ndb::poll_trans ()
#11 0x00244e3d in Ndb::sendPollNdb ()
#12 0x0026445e in NdbTransaction::executeNoBlobs ()
#13 0x0026478d in NdbTransaction::execute ()

How to repeat:
Perhaps the easiest way to reproduce this is to install mod_ndb from mod-ndb.googlecode.com and run the test "typ801"  ( ./configure ; make ; make start ; cd Tests ; . lib.test.sh ; t.sql ; t.run typ1 ; t.run typ801 ).  But I will also try to describe it here.

The table has this structure:

CREATE TABLE `typ8` (
  `id` int(11) NOT NULL,
  `doc` text,
  PRIMARY KEY (`id`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1;

The operation is to retrieve the blob where id = 1.
[5 Jun 2009 16:13] John David Duncan
I tried to reproduce this using ndbapi-examples/ndbapi_blob but couldn't.  If you would like me to create a simple stand-alone test case I can work on that.
[5 Jun 2009 16:17] Jonas Oreland
like you
[6 Jun 2009 6:09] John David Duncan
Here is a more detailed backtrace.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0xaf0afb90 (LWP 19950)]
0xb7f8f402 in __kernel_vsyscall ()
(gdb) 
(gdb) bt full
#0  0xb7f8f402 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb7ddf085 in raise () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2  0xb7de0a01 in abort () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#3  0xb75963b2 in NdbReceiver::receive_packed_recattr (this=0x8238e4c, 
    recAttr=0xaf0adb84, bmlen=<value optimized out>, aDataPtr=0xb64ef02c, 
    aLength=71) at NdbReceiver.cpp:391
	currRecAttr = (NdbRecAttr *) 0x824fcb0
	src = (const Uint8 *) 0xb64ef030 "\001"
	bitPos = 0
#4  0xb7596d06 in NdbReceiver::execTRANSID_AI (this=0x8238e4c, 
    aDataPtr=0xb64ef02c, aLength=70) at NdbReceiver.cpp:748
	tmp = (NdbRecAttr *) 0x824fcb0
	len = <value optimized out>
	attrId = <value optimized out>
	attrSize = 4
	exp = 71
	tmp = <value optimized out>
	currRecAttr = (NdbRecAttr *) 0x824fcb0
	save_pos = 0
	ndbrecord_part_done = true
#5  0xb7573eca in Ndb::handleReceivedSignal (this=0x8235720, 
    aSignal=0xaf0adc90, ptr=0xaf0addd4) at Ndbif.cpp:403
	com = <value optimized out>
	tOp = <value optimized out>
	tIndexOp = <value optimized out>
	tCon = (NdbTransaction *) 0x8238490
	tReturnCode = 0
	tDataPtr = (const Uint32 *) 0xb64ef018
	tWaitState = 0
	tFirstData = <value optimized out>
	tLen = 3
	tFirstDataPtr = (void *) 0x8238e4c
	t_waiter = <value optimized out>
#6  0xb75747fb in Ndb::executeMessage (NdbObject=0x8235720, aSignal=0xaf0adc90, 
    ptr=0xaf0addd4) at Ndbif.cpp:173
No locals.
#7  0xb7564eb9 in TransporterFacade::deliver_signal (this=0x81dfec8, 
    header=0xaf0addb8, prio=1 '\001', theData=0xb64ef018, ptr=0xaf0addd4)
    at TransporterFacade.cpp:398
	tmpSignal = {<SignalHeader> = {theVerId_signalNumber = 5, 
    theReceiversBlockNumber = 32782, theSendersBlockRef = 16318467, 
    theLength = 3, theSendersSignalId = 4294967295, theSignalId = 4294967295, 
    theTrace = 1, m_noOfSections = 1 '\001', m_fragmentInfo = 0 '\0'}, 
  static MaxSignalWords = <optimized out>, theData = {3058589712, 24, 0, 0, 0, 
    0, 0, 0, 0, 37, 32782, 16056322, 2, 4294967295, 4294967295, 1, 3084784094, 
    0, 0, 0, 0, 0, 1, 2936725004, 2936725100}, theNextSignal = 0x0, 
  theRealData = 0xb64ef018}
	oe = {m_object = 0x6, m_executeFunction = 0x4dee}
	tRecBlockNo = <value optimized out>
#8  0xb760ffb2 in TransporterRegistry::unpack (this=0x81eee20, 
    readPtr=0xb64ef008, sizeOfData=316, remoteNodeId=3, state=NoHalt)
    at Packer.cpp:113
	messageLenBytes = 316
	signalData = (Uint32 *) 0xb64ef018
	sectionPtr = (Uint32 *) 0xb64ef024
	sectionData = <value optimized out>
	signalHeader = {theVerId_signalNumber = 5, 
  theReceiversBlockNumber = 32782, theSendersBlockRef = 16318467, 
  theLength = 3, theSendersSignalId = 4294967295, theSignalId = 4294967295, 
  theTrace = 1, m_noOfSections = 1 '\001', m_fragmentInfo = 0 '\0'}
	ptr = {{sz = 71, p = 0xb64ef028}, {sz = 3076565879, p = 0x0}, {sz = 0, 
    p = 0x13c}}
	usedData = 0
	loop_count = 1
#9  0xb760c621 in TransporterRegistry::get_tcp_data (this=0x81eee20, 
    t=0x81f2bb0) at TransporterRegistry.cpp:1076
	ptr = (Uint32 *) 0xb64ef008
	sz = 316
	szUsed = <value optimized out>
#10 0xb760c6ed in TransporterRegistry::performReceive (this=0x81eee20)
    at TransporterRegistry.cpp:1111
	num_socket_events = 2
	i = <value optimized out>
	id = 3
	hasReceived = <value optimized out>
#11 0xb7564002 in TransporterFacade::external_poll (this=0x81dfec8, 
    wait_time=10) at TransporterFacade.cpp:611
No locals.
#12 0xb75640a7 in PollGuard::wait_for_input (this=0xaf0aef5c, wait_time=10)
    at TransporterFacade.cpp:1956
	t_poll_owner = <value optimized out>
#13 0xb7572af7 in Ndb::waitCompletedTransactions (this=0x8235720, 
    aMilliSecondsToWait=360000, noOfEventsToWaitFor=1, poll_guard=0xaf0aef5c)
    at Ndbif.cpp:1289
	maxTime = 43381723
	maxsleep = 10
#14 0xb7573192 in Ndb::poll_trans (this=0x8235720, aMillisecondNumber=360000, 
    minNoOfEventsToWakeup=1, pg=0xaf0aef5c) at Ndbif.cpp:1353
	tConArray = {0xb75636b8, 0x81eddb0, 0x0, 0x82399c0, 0xb779f8bc, 
  0xb779f8bc, 0x0, 0xaf0adf68, 0xb7563694, 0xaf0adf8c, 0x1d4c0, 0x0, 
  0xb779f8bc, 0xb779f8bc, 0x0, 0xaf0adfa8, 0xb757296e, 0xaf0adf8c, 0x1d4c0, 
  0x2, 0x3, 0x0, 0xaf0adf8c, 0x81dfec8, 0x81dfec8, 0x8235a8c, 0x800e, 
  0xffffff00, 0xb779f8bc, 0x2, 0x8235720, 0xaf0adff8, 0xb756c62b, 0x8235720, 
  0x82399c0, 0x3, 0x82399c0, 0x0, 0xaf0adfe8, 0xaf0adff8, 0xb75ba0cc, 
  0x80ebde8, 0x8238490, 0x82399c0, 0xb75ba1b6, 0x80ebde8, 0x80ebe48, 0x0, 
  0xb779f8bc, 0x2, 0x8235978, 0xaf0ae028, 0xb757d074, 0x8235720, 0x2, 0x0, 0x0, 
  0x82357b0, 0x0, 0x1, 0xb779f8bc, 0xaf0ae058, 0x0, 0xaf0af068, 0xb757d6a7, 
  0x8238e40, 0xaf0ae058, 0x1, 0x1, 0x0, 0x1, 0x0, 0x1, 0x0, 0x4, 0x1, 0x0, 
  0x8235720, 0x0 <repeats 22 times>, 0x4a550000, 0x66205349, 0x5720726f, 
  0x6f646e69, 0x4a207377, 0x6e617061, 0x657365, 0x0 <repeats 174 times>, 
  0xb7e1be24, 0x0, 0x0, 0xb7efeff4, 0xaf0aea3c, 0xaf0ae3cc, 0xb7e16ae7, 
  0xaf0aea3c, 0xaf0aeadc, 0xaf0aeb1c, 0x0, 0xaf0aeb1c, 0xb7efeff4, 0x1, 0xd, 
  0xaf0ae3f8, 0xb7e1bf83, 0xaf0aea3c, 0x31, 0x0, 0x8244a25, 0x8244a25, 
  0x8244a3e, 0xb7efeff4, 0xb7661380, 0x0, 0xaf0aea18, 0xb7df27cf, 0xaf0aea3c, 
  0xb7661380, 0x0, 0x0, 0x0, 0x0, 0x0, 0xffffffd4, 0xffffffd4, 0xffffffd4, 0x0, 
  0x0, 0xb7df3fff, 0x0, 0x0, 0xb7efeff4, 0xaf0aeadc, 0xaf0ae98c, 0xb7e16ae7, 
  0xaf0aeadc, 0xaf0aeb7c, 0xaf0aebbc, 0x0, 0xaf0aebbc, 0xb7efeff4, 0xaf0aead0, 
  0x4, 0xaf0ae498, 0xb7e1bf83, 0xaf0aeadc, 0x38, 0x0, 0x824f6bc, 0x824f6bc, 
  0x0, 0xffffffff, 0x1b, 0xb7661380, 0xb766137b, 0x0, 0xaf0ae98c, 0x1, 0x2, 
  0x0, 0x0, 0x0, 0x0, 0xffffffd4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 
  0x0, 0x20, 0x0, 0x0, 0x0, 0x73000000, 0x0, 0x0, 0x0, 0x0...}
	tNoCompletedTransactions = <value optimized out>
#15 0xb75732ef in Ndb::sendPollNdb (this=0x8235720, aMillisecondNumber=360000, 
    minNoOfEventsToWakeup=1, forceSend=0) at Ndbif.cpp:1338
	pg = {m_tp = 0x81dfec8, m_waiter = 0x8235a8c, m_block_no = 32782, 
  m_locked = true}
#16 0xb7578fee in NdbTransaction::executeNoBlobs (this=0x8238490, 
    aTypeOfExec=NdbTransaction::NoCommit, 
    abortOption=NdbOperation::DefaultAbortOption, forceSend=0)
    at NdbTransaction.cpp:535
	noOfComp = <value optimized out>
	tNdb = (Ndb *) 0x8235720
	timeout = <value optimized out>
#17 0xb757930b in NdbTransaction::execute (this=0x8238490, 
    aTypeOfExec=NdbTransaction::NoCommit, 
    abortOption=NdbOperation::DefaultAbortOption, forceSend=0)
    at NdbTransaction.cpp:421
	firstSavedOp = (class NdbOperation *) 0x0
	lastSavedOp = (class NdbOperation *) 0x0
	tPrepOp = (class NdbOperation *) 0x0
	tExecType = NdbTransaction::NoCommit
	tCompletedFirstOp = (class NdbOperation *) 0x0
	tCompletedLastOp = (class NdbOperation *) 0x0
	ret = 0
#18 0xb77b5ea3 in ExecuteAll ()
   from /home/jdd/src/mod_ndb-1.1-beta-r555/mod_ndb.so
No locals.
#19 0xb77b4c56 in Query () from /home/jdd/src/mod_ndb-1.1-beta-r555/mod_ndb.so
No locals.
[6 Jun 2009 6:13] John David Duncan
(gdb) thread apply all bt

Thread 23 (Thread 0xaf0afb90 (LWP 19950)):
#0  0xb7f8f402 in __kernel_vsyscall ()
#1  0xb7ddf085 in raise () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7de0a01 in abort () from /lib/tls/i686/cmov/libc.so.6
#3  0xb75963b2 in NdbReceiver::receive_packed_recattr (this=0x8238e4c, 
    recAttr=0xaf0adb84, bmlen=<value optimized out>, aDataPtr=0xb64ef02c, 
    aLength=71) at NdbReceiver.cpp:391
#4  0xb7596d06 in NdbReceiver::execTRANSID_AI (this=0x8238e4c, 
    aDataPtr=0xb64ef02c, aLength=70) at NdbReceiver.cpp:748
#5  0xb7573eca in Ndb::handleReceivedSignal (this=0x8235720, 
    aSignal=0xaf0adc90, ptr=0xaf0addd4) at Ndbif.cpp:403
#6  0xb75747fb in Ndb::executeMessage (NdbObject=0x8235720, aSignal=0xaf0adc90, 
    ptr=0xaf0addd4) at Ndbif.cpp:173
#7  0xb7564eb9 in TransporterFacade::deliver_signal (this=0x81dfec8, 
    header=0xaf0addb8, prio=1 '\001', theData=0xb64ef018, ptr=0xaf0addd4)
    at TransporterFacade.cpp:398
#8  0xb760ffb2 in TransporterRegistry::unpack (this=0x81eee20, 
    readPtr=0xb64ef008, sizeOfData=316, remoteNodeId=3, state=NoHalt)
    at Packer.cpp:113
#9  0xb760c621 in TransporterRegistry::get_tcp_data (this=0x81eee20, 
    t=0x81f2bb0) at TransporterRegistry.cpp:1076
#10 0xb760c6ed in TransporterRegistry::performReceive (this=0x81eee20)
    at TransporterRegistry.cpp:1111
#11 0xb7564002 in TransporterFacade::external_poll (this=0x81dfec8, 
    wait_time=10) at TransporterFacade.cpp:611
#12 0xb75640a7 in PollGuard::wait_for_input (this=0xaf0aef5c, wait_time=10)
    at TransporterFacade.cpp:1956
#13 0xb7572af7 in Ndb::waitCompletedTransactions (this=0x8235720, 
    aMilliSecondsToWait=360000, noOfEventsToWaitFor=1, poll_guard=0xaf0aef5c)
    at Ndbif.cpp:1289
#14 0xb7573192 in Ndb::poll_trans (this=0x8235720, aMillisecondNumber=360000, 
    minNoOfEventsToWakeup=1, pg=0xaf0aef5c) at Ndbif.cpp:1353
#15 0xb75732ef in Ndb::sendPollNdb (this=0x8235720, aMillisecondNumber=360000, 
    minNoOfEventsToWakeup=1, forceSend=0) at Ndbif.cpp:1338
#16 0xb7578fee in NdbTransaction::executeNoBlobs (this=0x8238490, 
    aTypeOfExec=NdbTransaction::NoCommit, 
    abortOption=NdbOperation::DefaultAbortOption, forceSend=0)
    at NdbTransaction.cpp:535
#17 0xb757930b in NdbTransaction::execute (this=0x8238490, 
    aTypeOfExec=NdbTransaction::NoCommit, 
    abortOption=NdbOperation::DefaultAbortOption, forceSend=0)
    at NdbTransaction.cpp:421
#18 0xb77b5ea3 in ExecuteAll ()
   from /home/jdd/src/mod_ndb-1.1-beta-r555/mod_ndb.so
#19 0xb77b4c56 in Query () from /home/jdd/src/mod_ndb-1.1-beta-r555/mod_ndb.so
#20 0xb77ba2e3 in ndb_handler ()
   from /home/jdd/src/mod_ndb-1.1-beta-r555/mod_ndb.so
#21 0x08079639 in ap_run_handler ()
#22 0x0807ca47 in ap_invoke_handler ()
#23 0x08089d60 in ap_process_request ()
#24 0x0808706b in ?? ()
#25 0x08080c39 in ap_run_process_connection ()
#26 0x0808ed5f in ?? ()
#27 0xb7f39a76 in ?? () from /usr/lib/libapr-1.so.0
#28 0xb7f084fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#29 0xb7e8ae5e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 20 (Thread 0xb08b2b90 (LWP 19947)):
#0  0xb7f8f402 in __kernel_vsyscall ()
#1  0xb7f0caa5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/tls/i686/cmov/libpthread.so.0
#2  0xb7f2e8ca in apr_thread_cond_wait () from /usr/lib/libapr-1.so.0
#3  0x08090ee3 in ap_queue_pop ()
#4  0x0808ebe8 in ?? ()
#5  0xb7f39a76 in ?? () from /usr/lib/libapr-1.so.0
#6  0xb7f084fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#7  0xb7e8ae5e in clone () from /lib/tls/i686/cmov/libc.so.6

******* several other threads also in ap_queue_pop() just like thread 20 **********

Thread 7 (Thread 0xb60c5b90 (LWP 19934)):
#0  0xb7f8f402 in __kernel_vsyscall ()
#1  0xb7e83881 in select () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7643453 in SocketServer::doAccept (this=0x81edbb8)
    at SocketServer.cpp:188
#3  0xb764362a in SocketServer::doRun (this=0x81edbb8) at SocketServer.cpp:266
#4  0xb764366d in socketServerThread_C (_ss=0x81edbb8) at SocketServer.cpp:224
#5  0xb76398f0 in ndb_thread_wrapper (_ss=0x81eeaf0) at NdbThread.c:144
#6  0xb7f084fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#7  0xb7e8ae5e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 6 (Thread 0xb60cdb90 (LWP 19933)):
#0  0xb7f8f402 in __kernel_vsyscall ()
#1  0xb7e83881 in select () from /lib/tls/i686/cmov/libc.so.6
#2  0xb75fdd5a in my_sleep (m_seconds=0) at my_sleep.c:31
#3  0xb756974c in ClusterMgr::threadMain (this=0x81f7d18)
    at ../../../../storage/ndb/include/portlib/NdbSleep.h:28
#4  0xb756999d in runClusterMgr_C (me=0x81f7d18) at ClusterMgr.cpp:47
#5  0xb76398f0 in ndb_thread_wrapper (_ss=0x81f3220) at NdbThread.c:144
#6  0xb7f084fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#7  0xb7e8ae5e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 5 (Thread 0xb60d5b90 (LWP 19932)):
#0  0xb7f8f402 in __kernel_vsyscall ()
#1  0xb7e83881 in select () from /lib/tls/i686/cmov/libc.so.6
#2  0xb75fdd5a in my_sleep (m_seconds=0) at my_sleep.c:31
#3  0xb760c1b9 in TransporterRegistry::start_clients_thread (this=0x81eee20)
    at ../../../../../storage/ndb/include/portlib/NdbSleep.h:28
#4  0xb760c54d in run_start_clients_C (me=0x81eee20)
    at TransporterRegistry.cpp:1315
#5  0xb76398f0 in ndb_thread_wrapper (_ss=0x81f30d0) at NdbThread.c:144
#6  0xb7f084fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#7  0xb7e8ae5e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 4 (Thread 0xb60ddb90 (LWP 19931)):
#0  0xb7f8f402 in __kernel_vsyscall ()
#1  0xb7e83881 in select () from /lib/tls/i686/cmov/libc.so.6
#2  0xb75fdd5a in my_sleep (m_seconds=0) at my_sleep.c:31
#3  0xb7564866 in TransporterFacade::threadMainSend (this=0x81dfec8)
    at ../../../../storage/ndb/include/portlib/NdbSleep.h:28
#4  0xb756490d in runSendRequest_C (me=0x81dfec8) at TransporterFacade.cpp:526
#5  0xb76398f0 in ndb_thread_wrapper (_ss=0x81f2f80) at NdbThread.c:144
#6  0xb7f084fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#7  0xb7e8ae5e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 3 (Thread 0xb716db90 (LWP 19930)):
#0  0xb7f8f402 in __kernel_vsyscall ()
#1  0xb7e83881 in select () from /lib/tls/i686/cmov/libc.so.6
#2  0xb75fdd5a in my_sleep (m_seconds=0) at my_sleep.c:31
#3  0xb75643d3 in TransporterFacade::threadMainReceive (this=0x81dfec8)
    at ../../../../storage/ndb/include/portlib/NdbSleep.h:28
#4  0xb756475d in runReceiveResponse_C (me=0x81dfec8)
    at TransporterFacade.cpp:561
#5  0xb76398f0 in ndb_thread_wrapper (_ss=0x81eeac8) at NdbThread.c:144
#6  0xb7f084fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#7  0xb7e8ae5e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 1 (Thread 0xb77d0700 (LWP 19926)):
#0  0xb7f8f402 in __kernel_vsyscall ()
#1  0xb7f10b1a in do_sigwait () from /lib/tls/i686/cmov/libpthread.so.0
#2  0xb7f10bbf in sigwait () from /lib/tls/i686/cmov/libpthread.so.0
#3  0xb7f3a23c in apr_signal_thread () from /usr/lib/libapr-1.so.0
#4  0x0808f059 in ?? ()
#5  0x0808f250 in ?? ()
#6  0x0808f337 in ?? ()
#7  0x0809037a in ap_mpm_run ()
#8  0x08066f58 in main ()
#0  0xb7f8f402 in __kernel_vsyscall ()
(gdb)
[6 Jun 2009 6:17] John David Duncan
I have a simple program that attempts to repeat the exact sequence of NDB API calls -- but so far I have not been able to reproduce the bug with this program.  (Even though in the real application the bug is always reproducible, and on multiple platforms).
[9 Jun 2009 22:47] John David Duncan
The abort() happens here, in NdbReceiver.cpp, in receive_packed_recattr(). 
In 7.0.6 this is at line 339.

  for (Uint32 i = 0, attrId = 0; i<32*bmlen; i++, attrId++)
  {
    if (BitmaskImpl::get(bmlen, aDataPtr, i))
    {
      const NdbColumnImpl & col = 
	NdbColumnImpl::getImpl(* currRecAttr->getColumn());
      if (unlikely(attrId != (Uint32)col.m_attrId))
        goto err;

When it fails, attrId = 0, but col.m_attrId = 1.
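
To see why the check fires, here is a minimal standalone sketch of the bitmask walk quoted above. It is not the NDB source: `firstMismatch` and its parameters are hypothetical names, a single 32-bit word stands in for the bitmask, and a plain vector of attribute ids stands in for the operation's NdbRecAttr list. Treating an exhausted list as a mismatch is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the attrId check, not the NDB implementation.
// bitmask:  bit i is set when attribute i is present in the received data.
// expected: attribute ids of the receiver's NdbRecAttr list, in list order.
// Returns -1 when every attribute in the bitmask matches the next list
// entry; otherwise returns the attrId at which the walk fails.
int firstMismatch(std::uint32_t bitmask,
                  const std::vector<std::uint32_t>& expected)
{
  // One 32-bit word can describe at most 32 attributes.
  assert(expected.size() <= 32);

  std::size_t next = 0;  // position in the expected list
  for (std::uint32_t attrId = 0; attrId < 32; ++attrId)
  {
    if ((bitmask & (1u << attrId)) == 0)
      continue;  // attribute not present in the data
    if (next >= expected.size() || expected[next] != attrId)
      return static_cast<int>(attrId);  // corresponds to "goto err"
    ++next;
  }
  return -1;
}
```

Assuming a bitmask with bits 0 and 1 set and a list whose first entry is column 1 (values picked to match the attrId = 0, m_attrId = 1 observation), `firstMismatch(0x3, {1, 1})` returns 0, while `firstMismatch(0x2, {1})` returns -1.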
[10 Jun 2009 5:19] Jonas Oreland
A reproducible test case is needed...

Anyway, the last time that abort fired it was due to a bug
in the *customer* application, which inserted corrupt data (it shared a buffer between threads without locks); this was bug#44132.

So maybe you can try with 7.0.6 and see if you get an error back when inserting data...

/Jonas
[11 Jun 2009 18:45] John David Duncan
Here is what is happening with this bug. 

In 6.3.x and prior it is OK to do:
        NdbOperation::getValue("BLOB_Column");
        NdbOperation::getBlobHandle("BLOB_Column");
        NdbTransaction::execute(NoCommit);

In 7.0.x, it is OK to call getBlobHandle() then execute(), but if you call getValue() first then you get this abort.

I will be able to supply a little test program soon, but that much is the essence of it.
[12 Jun 2009 3:47] John David Duncan
code to reproduce issue

Attachment: bug-45282.tar.gz (application/x-gzip, text), 5.13 KiB.

[13 Jun 2009 16:01] John David Duncan
The blob API is a bit confusing.  Some things are not clear from the docs and examples:  At what point, and after which API calls, does the NdbBlob state progress from Prepared to Active to Closed?  What API calls are allowed / not allowed / required in each blob state?  What is the difference between NdbBlob::getValue() and readData(), and why do they *both* take a pointer to a result buffer?  So, application developers might end up doing "whatever works."

Having said that -- I do not need the issue in this bug to be addressed; I see how to rewrite my application code so it will work.
[5 Oct 2009 14:45] Frazer Clement
What appears to be happening is:

1) Your with-bug code reads the same column twice, once with getValue, and once implicitly within getBlobHandle()

2) There is a bug in the implementation of a function called 'repack_read()' which determines whether 'packed read' functions can be used.  It does not take into account the possibility of multiple reads of the same column in a single operation.  It uses the READ_ALL mechanism when it is not wanted, resulting in the crash.
Specifically, it observes that the requested attrIds are not descending (they are both 1), and that the number of requested columns is equal to the number of columns in the table (2), so it substitutes READ_ALL.  When the result arrives, the Operation's NdbRecAttr list does not contain an NdbRecAttr for attribute 0, and we fail.
repack_read() should check that attrIds are strictly ascending, with no 'duplicates' .  We could recode the unpacking mechanism to deal with 'duplicates' if this became a significant use case, but it's simpler to fix it to 'just work'.  

3) You did not observe this bug in 6.3.22 because there was some code in getBlobHandle() which, as a side-effect, caused repack_read() not to attempt to repack.  This code was altered later and the side-effect disappeared, causing this bug to manifest.
Specifically, getBlobHandle() used to *always* read the partitionId of the Blob head row.  Now it only does this for user-defined partitioned tables.  Reading a pseudo-column 'disabled' the repack_read() mechanism.
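
The decision described in point 2 can be sketched in a few lines. This is a standalone model under stated assumptions, not the code in NdbOperationExec.cpp or the committed patch: the function names `substitutesReadAllOld` and `substitutesReadAllStrict` are hypothetical, and the requested columns are reduced to a plain list of attribute ids.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the rule as described above: READ_ALL is
// substituted when the requested attrIds never descend and the number of
// requested columns equals the number of columns in the table.
bool substitutesReadAllOld(const std::vector<std::uint32_t>& attrIds,
                           std::size_t tableColumns)
{
  assert(tableColumns > 0);  // a table has at least one column
  for (std::size_t i = 1; i < attrIds.size(); ++i)
    if (attrIds[i] < attrIds[i - 1])
      return false;  // descending pair: no repacking
  return attrIds.size() == tableColumns;
}

// Hypothetical model of the suggested rule: attrIds must be strictly
// ascending, so a duplicate read of one column disables the substitution.
bool substitutesReadAllStrict(const std::vector<std::uint32_t>& attrIds,
                              std::size_t tableColumns)
{
  assert(tableColumns > 0);
  for (std::size_t i = 1; i < attrIds.size(); ++i)
    if (attrIds[i] <= attrIds[i - 1])
      return false;  // duplicate or descending pair: no repacking
  return attrIds.size() == tableColumns;
}
```

For the two-column table in this report, a request list of {1, 1} passes the older rule (two requests, two columns, nothing descending) but is rejected by the strict rule, while a genuine full read of {0, 1} passes both.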

Suggest :
 1) Rename bug to NdbApi - duplicate read of column results in failure
 2) Reduce version to 6.2+
[5 Oct 2009 23:14] Frazer Clement
Renaming bug from "NDBAPI - BLOB issue in 7.0.5".

Changing version to mysql-5.1-telco-6.2
[5 Oct 2009 23:15] Frazer Clement
Proposed patch + testcase

Attachment: bug45282.patch (text/x-patch), 10.48 KiB.

[6 Oct 2009 10:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85856

3012 Frazer Clement	2009-10-06
      Bug#45282 NDBAPI - Duplicate read of column results in Api failure
      modified:
        storage/ndb/src/ndbapi/NdbOperationExec.cpp
        storage/ndb/test/include/HugoCalculator.hpp
        storage/ndb/test/ndbapi/testNdbApi.cpp
        storage/ndb/test/run-test/daily-basic-tests.txt
        storage/ndb/test/src/HugoCalculator.cpp
[6 Oct 2009 11:35] Bugs System
Pushed into 5.1.39-ndb-7.1.0 (revid:frazer@mysql.com-20091006113218-v9ekkvyafoe0xa5o) (version source revid:frazer@mysql.com-20091006113218-v9ekkvyafoe0xa5o) (merge vers: 5.1.39-ndb-7.1.0) (pib:11)
[6 Oct 2009 12:00] Jon Stephens
Documented bugfix in the NDB-7.1.0 changelog as follows:

        A duplicate read of a column caused NDB API applications to 
        crash.

Closed.
[6 Oct 2009 13:01] Jon Stephens
Changelog entry re-tagged as applying to NDB-6.2.19/6.3.28/7.0.9 per Frazer email.