Bug #47859 Node restart can fail with many "big" tables
Submitted: 6 Oct 2009 12:04    Modified: 6 Oct 2009 12:46
Reporter: Jonas Oreland        Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine    Severity: S3 (Non-critical)
Version: mysql-5.1-telco-7.0   OS: Any
Assigned to: Jonas Oreland     CPU Architecture: Any

[6 Oct 2009 12:04] Jonas Oreland
Description:
An optimization in 7.0 makes DICT copy several tables at a time when syncing the
dictionary to a starting node (in pre-7.0 versions this is done one table at a time).

However, if the tables are big enough, the internal buffer for storing them
can fill up, causing a node crash.

How to repeat:
create ~100 tables with 128 columns each
restart node
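
As a rough illustration only (not part of the original report), the wide tables
could be created with something like the following sketch using the MySQL C
client API; the host, credentials, database, and table names are placeholder
assumptions:

  // Hypothetical repro helper: creates 100 NDB tables with 128 columns each.
  // Afterwards, restart a data node (e.g. via ndb_mgm "<node-id> RESTART").
  #include <mysql.h>
  #include <cstdio>
  #include <string>

  int main() {
      MYSQL *conn = mysql_init(nullptr);
      // Placeholder connection parameters; adjust to the test setup.
      if (!mysql_real_connect(conn, "127.0.0.1", "root", "", "test",
                              3306, nullptr, 0)) {
          std::fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
          return 1;
      }
      for (int t = 0; t < 100; ++t) {
          std::string sql = "CREATE TABLE t" + std::to_string(t) +
                            " (pk INT PRIMARY KEY";
          for (int c = 0; c < 127; ++c)      // pk + 127 more columns = 128
              sql += ", c" + std::to_string(c) + " INT";
          sql += ") ENGINE=NDBCLUSTER";
          if (mysql_query(conn, sql.c_str())) {
              std::fprintf(stderr, "CREATE failed: %s\n", mysql_error(conn));
              return 1;
          }
      }
      mysql_close(conn);
      return 0;
  }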

Suggested fix:
Check that the starting node can receive a table before deciding to copy one
more in a batch.
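
A minimal sketch of that idea; the names (CopyState, remainingBufferWords,
maybeCopyNextTable) are invented for illustration and are not the actual
DBDICT internals:

  // Hypothetical sketch of the batching loop during node restart (NR).
  // Real DBDICT code differs; this only illustrates the buffer check.
  struct CopyState {
      unsigned remainingBufferWords;   // free space left in the receive buffer
  };

  // Returns true if the next table was queued for copying, false if it must
  // wait for the current batch to drain first.
  bool maybeCopyNextTable(CopyState &state,
                          unsigned tableSizeWords,
                          void (*copyTable)(unsigned tableSizeWords)) {
      // Check that the starting node can receive this table before adding it
      // to the current batch; otherwise the internal buffer could overflow.
      if (tableSizeWords > state.remainingBufferWords)
          return false;                // defer until buffer space is freed
      state.remainingBufferWords -= tableSizeWords;
      copyTable(tableSizeWords);
      return true;
  }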
[6 Oct 2009 12:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85874

3079 Jonas Oreland	2009-10-06
      ndb - bug#47859 - make sure enough buffer is available before copying another table during NR
[6 Oct 2009 12:22] Jonas Oreland
pushed to 7.0.9 and 7.1
[6 Oct 2009 12:46] Jon Stephens
Documented fix in the NDB-7.0.9 changelog as follows:

        An optimization in MySQL Cluster NDB 7.0 causes the
        DBDICT kernel block to copy several tables at
        a time when synchronizing the data dictionary to a newly-started
        node; previously, this was done one table at a time. However,
        when NDB tables were sufficiently large and
        numerous, the internal buffer for storing them could fill up,
        causing a data node crash.

        In testing, it was found that having 100 NDB
        tables with 128 columns each was enough to trigger this issue.

Closed.