Bug #53580 abort() in multi-threaded index rebuild on node restart
Submitted: 11 May 2010 17:40 Modified: 25 May 2010 9:45
Reporter: Hartmut Holzgraefe
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: mysql-5.1-telco-6.2
OS: Linux
Assigned to: Jonas Oreland
CPU Architecture: Any
Tags: mysql-cluster-6.3.32

[11 May 2010 17:40] Hartmut Holzgraefe
Description:
The crash happens in Dbtux::mt_buildIndexFragment_wrapper(), at line 49 of
storage/ndb/src/kernel/blocks/dbtux/DbtuxBuild.cpp:

 48     if (!(UintPtr(ptr) - UintPtr(req->mem_buffer) <= req->buffer_size))
 49       abort();
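
The sketch below shows what this kind of check guards against. It assumes, and the report does not confirm this, that the wrapper lays out several fixed-size scratch areas back to back in req->mem_buffer and advances ptr past each one. The names within_buffer and carve_end are hypothetical.

Because the subtraction is unsigned, the single comparison catches two cases. It rejects a ptr that has run past the end of the buffer. It also rejects a ptr below the start, because that difference wraps around to a very large value. If the buffer was obtained with fewer pages than the layout needs, buffer_size is smaller than the laid-out size and the check fails.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same shape as the check on line 49: with unsigned arithmetic,
// (ptr - base) <= size rejects ptr past the end AND ptr below base,
// because the latter wraps around to a very large value.
bool within_buffer(std::uintptr_t ptr, std::uintptr_t base, std::size_t size) {
  return ptr - base <= size;
}

// Hypothetical layout helper (an assumption, not the NDB code): place
// scratch areas of the given byte sizes consecutively from `base`, record
// each start address, and return the address just past the last area,
// i.e. the value `ptr` would hold when the check runs.
std::uintptr_t carve_end(std::uintptr_t base,
                         const std::vector<std::size_t>& areas,
                         std::vector<std::uintptr_t>& starts) {
  std::uintptr_t ptr = base;
  for (std::size_t bytes : areas) {
    starts.push_back(ptr);  // start address of this area
    ptr += bytes;           // advance past it
  }
  return ptr;
}
```

For example, three areas of 4096, 4096 and 8192 bytes end exactly 16384 bytes after base. That layout passes against a 16384-byte buffer but fails against an 8192-byte one.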

How to repeat:
...

Suggested fix:
?
[11 May 2010 17:45] Hartmut Holzgraefe
This first occurred after a power failure.
[24 May 2010 7:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/108990

3099 Jonas Oreland	2010-05-24
      ndb - bug#53580 - fix bug that caused alloc(#requested, #min) to sometimes allocate less than #min, causing later problems
[24 May 2010 8:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/108992

3202 Jonas Oreland	2010-05-24
      ndb - bug#53580 - part II (>= 6.3) - ndbrequire that we got what we asked for during mtoib
[24 May 2010 8:07] Bugs System
Pushed into 5.1.44-ndb-6.3.34 (revid:jonas@mysql.com-20100524080154-74syl9t60ohrfl9j) (version source revid:jonas@mysql.com-20100524080154-74syl9t60ohrfl9j) (merge vers: 5.1.44-ndb-6.3.34) (pib:16)
[24 May 2010 8:07] Bugs System
Pushed into 5.1.44-ndb-7.0.15 (revid:jonas@mysql.com-20100524080447-jl195st9spefjway) (version source revid:jonas@mysql.com-20100524080447-jl195st9spefjway) (merge vers: 5.1.44-ndb-7.0.15) (pib:16)
[24 May 2010 8:10] Jonas Oreland
DOCS: A bug in the internal buddy allocator could cause "alloc(#wanted, #min)", which should try to allocate #wanted pages but is allowed to allocate anywhere between #min and #wanted, to allocate fewer than #min pages, causing problems during multi-threaded ordered index builds.

Note: this could also, theoretically (though it is unlikely), cause
problems in other areas of the code.

pushed to 6.2.19, 6.3.34, 7.0.15 and 7.1.4
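
The alloc(#wanted, #min) contract described above can be illustrated with a toy power-of-two buddy allocator. This is a hypothetical sketch, not NDB's allocator: the class ToyBuddy, its page numbering and its free lists are assumptions made for illustration.

The point it shows is the lower bound on the fallback. When no block of the preferred size is free, the search may drop to smaller block sizes, but only while the block still covers #min pages. The patch description suggests the bug lived in this kind of fallback, which could hand back fewer than #min pages.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical toy allocator for illustration; NOT the NDB implementation.
struct Allocation {
  uint32_t first_page;  // index of the first page in the block
  uint32_t pages;       // number of pages handed out (a power of two)
};

class ToyBuddy {
 public:
  explicit ToyBuddy(unsigned max_order) : free_(max_order + 1) {}

  // Register a free block of 2^order pages starting at `page`.
  void add_free(uint32_t page, unsigned order) {
    assert(order < free_.size() && "order out of range");
    free_[order].push_back(page);
  }

  // Contract: return a block with min <= pages <= wanted, or nothing.
  std::optional<Allocation> alloc(uint32_t wanted, uint32_t min) {
    if (min == 0 || min > wanted) return std::nullopt;  // empty/inverted range
    unsigned hi = floor_log2(wanted);  // largest order not exceeding wanted
    unsigned lo = ceil_log2(min);      // smallest order that still covers min
    if (lo > hi) return std::nullopt;  // no power of two lies in [min, wanted]
    // Prefer the largest fitting order, then fall back, but never below lo.
    for (unsigned k = hi + 1; k-- > lo;) {
      if (auto page = take(k)) return Allocation{*page, 1u << k};
    }
    return std::nullopt;  // refusing is correct; a short block is not
  }

 private:
  // Pop a block of order k, splitting a larger free block if needed.
  std::optional<uint32_t> take(unsigned k) {
    for (unsigned j = k; j < free_.size(); ++j) {
      if (free_[j].empty()) continue;
      uint32_t page = free_[j].back();
      free_[j].pop_back();
      while (j > k) {  // keep the lower half, free the upper buddy
        --j;
        free_[j].push_back(page + (1u << j));
      }
      return page;
    }
    return std::nullopt;
  }

  static unsigned floor_log2(uint32_t x) {
    unsigned r = 0;
    while (x >>= 1) ++r;
    return r;
  }
  static unsigned ceil_log2(uint32_t x) {
    unsigned r = floor_log2(x);
    return (1u << r) < x ? r + 1 : r;
  }

  std::vector<std::vector<uint32_t>> free_;  // free_[k]: blocks of 2^k pages
};
```

For example, suppose only a 2-page block is free. Then alloc(8, 4) returns nothing rather than that 2-page block. A caller-side check that the result covers #min, in the spirit of the part II commit, would catch a short allocation if one were ever returned anyway.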
[25 May 2010 9:45] Jon Stephens
Documented bugfix in the NDB-6.2.19, 6.3.34, 7.0.15, and 7.1.4 changelogs, as follows:

      An internal buffer allocator used by NDB has the form 
      'alloc(*wanted*, *minimum*)' and attempts to allocate *wanted* 
      pages, but is permitted to allocate a smaller number of pages 
      between *wanted* and *minimum*. However, this allocator could 
      sometimes allocate fewer than *minimum* pages, causing problems 
      with multi-threaded builds of ordered indexes.

Closed.
[4 Jun 2010 10:07] Ricky Chan
Would this explain a rare bug I have seen?

Namely (an arbitrary example):

A unique hash index called index1 exists on a nullable VARCHAR column (say abc). (As it is hash-only, with no B-tree, NULL is fine.)

A row exists where abc = 'hello world'.

An update made via mysqld resets abc to NULL (SET abc = NULL WHERE abc = 'hello world').

Now, if you try to set abc = 'hello world' on another row, you get a "duplicate key exists" error for ''.

I wrote some code using the C++ NDB API and found the following.

I called readTuple on index1, looking for abc = 'hello world'.

Firstly:

 * A row is found!! This should not be the case.
 * On reading the row, the value of abc is NULL!!

So the data has changed, but the index has not been updated.

Rebuilding the indexes fixes this, but it is a concern for applications that use this type of index.

I am using mysql cluster 7.0.13.

If this fix would indeed address it, when is a Generally Available (GA) release due? I see that 7.1.3 (the latest GA) doesn't have it yet.

Many Thanks.

Ricky

P.S. If I can re-create this in the lab, I'll post how to reproduce it. It's a strange one: the data gets updated to NULL, but the index entry for it still exists, causing a duplicate key error where there is none.