Bug #39879 NDBAPI : Wrong error reported for SendBuffer overload
Submitted: 6 Oct 2008 13:57 Modified: 12 Nov 2008 14:16
Reporter: Frazer Clement
Status: Closed
Category:Server: NDBAPI Severity:S3 (Non-critical)
Version:5.1-telco-6.2.15 OS:Any
Assigned to: Frazer Clement Target Version:
Triage: Needs Triage: D3 (Medium)

[6 Oct 2008 13:57] Frazer Clement
Description:
Bug#39867 occurs when messages generated by Blob part operations are discarded within the
Ndb kernel, and the handler's ActiveHook callback does not correctly indicate the error to
the upper layers.

Modifying the ActiveHook to notify the upper layers of an error from the NdbTransaction
shows that the error on the transaction is 1297, "Time-out in NDB, probably caused by
deadlock".

This message indicates that TC has timed out waiting to hear back from the API.

Looking at debug trace output from the MySQLD, it appears that the MySQLD is still
waiting to hear the results of all of its submitted operations from TC when it receives a
TCROLLBACKREP, probably carrying the timeout error code.

This is confusing for users, as it indicates that some locking issue may be at fault,
when in reality it is a buffer configuration problem.

The system should indicate the true source of the problem in this case.

How to repeat:
Run the example program from Bug#39867 against mysql-5.1-telco-6.2.15 with the default
SendBuffer size.

Modify ha_ndbcluster.cc to use the error code from the transaction when readData() fails
and the Blob object has no error (see below).

When a SendBuffer overload occurs, a timeout is reported as the transaction failure
reason.

=== modified file 'sql/ha_ndbcluster.cc'
--- sql/ha_ndbcluster.cc        2008-02-13 13:42:22 +0000
+++ sql/ha_ndbcluster.cc        2008-10-06 09:07:30 +0000
@@ -806,6 +806,13 @@ int g_get_ndb_blobs_value(NdbBlob *ndb_b
     ha->m_blobs_buffer_size= ha->m_blob_total_size;
   }

+  if (unlikely(ha->m_thd_ndb == NULL))
+  {
+    DBUG_ASSERT(FALSE);
+    DBUG_RETURN(-1);
+  }
+  NdbTransaction* trans= ha->m_thd_ndb->trans;
+
   /*
     Now read all blob data.
     If we know the destination mysqld row, we also set the blob null bit and
@@ -836,7 +843,9 @@ int g_get_ndb_blobs_value(NdbBlob *ndb_b
       uchar *buf= ha->m_blobs_buffer + offset;
       uint32 len= ha->m_blobs_buffer_size - offset;
       if (ndb_blob->readData(buf, len) != 0)
-          ERR_RETURN(ndb_blob->getNdbError());
+        ERR_RETURN((ndb_blob->getNdbError().code == 0)?
+                   trans->getNdbError():
+                   ndb_blob->getNdbError());
       DBUG_PRINT("info", ("[%u] offset: %u  buf: 0x%lx  len=%u",
                           i, offset, (long) buf, len));
       DBUG_ASSERT(len == len64);
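
For readers without the source tree to hand, the error selection made by the patch above
can be sketched in isolation. This is only an illustrative sketch: SketchError, SketchBlob,
SketchTransaction and pick_error are hypothetical stand-ins invented for this example, not
the real NdbError, NdbBlob or NdbTransaction classes, and the sketch assumes (as the patch
does) that an error code of 0 means no error was recorded on the object.

```cpp
#include <string>

// Hypothetical stand-ins for illustration only. These are NOT the real
// NdbError / NdbBlob / NdbTransaction classes from the NDB API.
struct SketchError {
  int code;             // assumed convention: 0 means "no error recorded"
  std::string message;
};

struct SketchBlob {
  SketchError error;
  const SketchError& getError() const { return error; }
};

struct SketchTransaction {
  SketchError error;
  const SketchError& getError() const { return error; }
};

// Prefer the blob's own error. If the blob recorded none (code == 0),
// fall back to the error held on the owning transaction, mirroring the
// ternary added to g_get_ndb_blobs_value() in the patch.
inline const SketchError& pick_error(const SketchBlob& blob,
                                     const SketchTransaction& trans) {
  return blob.getError().code != 0 ? blob.getError() : trans.getError();
}
```

With a blob whose error code is 0 and a transaction carrying 1297, pick_error() returns
the transaction's error, which is how the timeout becomes visible to the handler in the
test described here.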

Suggested fix:
Determine why the API is still waiting to hear from the kernel (not all TCKEYREFs
sent/received?), then fix the cause.
[6 Oct 2008 13:57] Frazer Clement
MySQLD trace of blob error handling

Attachment: blob_bug_trace.txt (text/plain), 39.82 KiB.

[3 Nov 2008 17:20] Frazer Clement
Bug#39867 contains a proposed patch to fix this bug.
[12 Nov 2008 14:16] Jon Stephens
See Bug#39867 for changelog entry for this fix.