MySQL Bugs: #59751: Fragment calculation when using MAX_ROWS with standby ndbd nodes is incorrect

Bug #59751	Fragment calculation when using MAX_ROWS with standby ndbd nodes is incorrect
Submitted:	26 Jan 2011 17:36	Modified:	1 Feb 2011 17:06
Reporter:	Daniel Smythe	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	7.1.9	OS:	Any
Assigned to:	Jonas Oreland	CPU Architecture:	Any
Tags:	cluster, error, fragment, MAX_ROWS

Description:
When attempting to create a table on a cluster with many 'standby' ndbd nodes ( using nodegroup=65536 ) and using MAX_ROWS, mysqld incorrectly calculates the number of fragments to use, which results in a failed table creation. This in turn 'ghosts' the table, preventing any further attempts to create it even if the table structure is simplified and/or partitioned correctly. We will submit another bug describing the 'ghost' table issue.

How to repeat:
1. Setup a 24 node cluster, 22 of them using nodegroup=65536.
2. Start two data nodes using --nowait-nodes.
3. Start mysqld ( must be clean start and/or restarted ).
4. Execute SQL:

mysql> CREATE TABLE test ( id INT ) ENGINE=ndb MAX_ROWS=10000000;
ERROR 1005 (HY000): Can't create table 'test.test' (errno: 140)

mysql> show errors;
+-------+------+----------------------------------------------+
| Level | Code | Message                                      |
+-------+------+----------------------------------------------+
| Error | 1296 | Got error 1224 'Too many fragments' from NDB |
| Error | 1005 | Can't create table 'test.test' (errno: 140)  |
+-------+------+----------------------------------------------+
2 rows in set (0.00 sec)

Using the following patch, we can see where the number of fragments is incorrectly adjusted to the number of nodes:

/*patch*/
--- mysql-cluster-gpl-7.1.9/sql/ha_ndbcluster.cc 2010-11-05 00:12:51.000000000 -0700
+++ mysql-cluster-gpl-7.1.9-ds/sql/ha_ndbcluster.cc 2011-01-25 17:38:41.999940922 -0700
@@ -7751,6 +7751,12 @@
}

uint reported_frags= no_fragments;
+ push_warning_printf(current_thd,
+ MYSQL_ERROR::WARN_LEVEL_WARN, ER_UNKNOWN_ERROR,
+ "Initial reported_frags: %d, Initial no_fragments %d, Initial no_nodes: %d",
+ reported_frags,
+ no_fragments,
+ no_nodes);
if (adjusted_frag_count(no_fragments, no_nodes, reported_frags))
{
push_warning(current_thd,
@@ -7758,6 +7764,12 @@
"Ndb might have problems storing the max amount "
"of rows specified");
}
+ push_warning_printf(current_thd,
+ MYSQL_ERROR::WARN_LEVEL_WARN, ER_UNKNOWN_ERROR,
+ "Adjusted reported_frags: %d, Adjusted no_fragments %d, Adjusted no_nodes: %d",
+ reported_frags,
+ no_fragments,
+ no_nodes);
tab.setFragmentCount(reported_frags);
tab.setDefaultNoPartitionsFlag(false);
tab.setFragmentData(0, 0);
/*patch*/

This shows the following:

mysql> show warnings;
+---------+------+--------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+--------------------------------------------------------------------------+
| Warning | 1105 | Initial reported_frags: 1, Initial no_fragments 1, Initial no_nodes: 24 |
| Warning | 1105 | Adjusted reported_frags: 24, Adjusted no_fragments 1, Adjusted no_nodes: 24 |
| Error | 1296 | Got error 1224 'Too many fragments' from NDB |
| Error | 1005 | Can't create table 'test.test' (errno: 140) |
+---------+------+--------------------------------------------------------------------------+
4 rows in set (0.00 sec)

The error is generated because, during table creation in sql/ha_ndbcluster.cc:7754, no_fragments, no_nodes, and reported_frags are passed into adjusted_frag_count() which modifies the reported_frags value ( by reference ) and sets it equal to no_nodes.

adjusted_frag_count() is defined in sql/ha_ndbcluster.cc:7394, and is set to no_nodes on 7403.

So from this you can see that even though 22 of the nodes are 'standby', mysqld is still factoring them into the calculation for number of fragments when using MAX_ROWS.

Suggested fix:
mysqld should not consider 'standby' nodes in fragment calculation for MAX_ROWS.

It might be safe to use # of ready nodes, since schema operations will not succeed if nodes are starting/stopping or the cluster is not fully started.

Please see Bug #59756 regarding 'ghost' table issue.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/129847

4152 Jonas Oreland	2011-01-28
      ndb - bug#59751 - only "use" nodes with nodegroup when computing frag-count

Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.22 (revid:jonas@mysql.com-20110128112755-8iiuck6ewd3krtgo) (version source revid:jonas@mysql.com-20110128112627-c02onpmhv1kcy0lb) (merge vers: 5.1.51-ndb-7.0.22) (pib:24)

pushed to 7.0.22 and 7.1.11

Documented fix in the NDB-7.0.22 and 7.1.11 changelogs as follows:

        When attempting to create a table on a cluster with many
        'standby' data nodes (setting Nodegroup=65536 in config.ini for
        the nodes that should wait and starting the nodes that should
        start immediately with the --nowait-nodes option) and using the
        CREATE TABLE statement's MAX_ROWS option, mysqld incorrectly
        calculated the number of fragments to use. This caused CREATE
        TABLE to fail.

        Note: The CREATE TABLE failure caused by this issue in turn
        prevented any further attempts to create the table, even if the
        table structure was simplified or changed in such a way that the
        attempt should have succeeded. This "ghosting" issue is handled
        in BUG#59756.

Closed.