Bug #80845 Primary replica not evenly spread for non-standard tables
Submitted: 24 Mar 2016 10:27 Modified: 27 Jun 2016 13:46
Reporter: Mikael Ronström
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: 7.5.2 OS: Any
Assigned to: CPU Architecture: Any

[24 Mar 2016 10:27] Mikael Ronström
Description:
When creating tables using specific partitioning, the new FRAGMENT_COUNT_TYPEs, or
user-defined partitioning, we attempt to spread the table fragments evenly across
node groups and LDMs. However, we only attempt to spread the primary replicas
within a single table, not across tables, and even inside a single table we are
not doing a very good job of it.

How to repeat:
Run sysbench in 7.5.2 using 8 partitions, 2 nodes, 2 tables. The tables
will have replicas on all nodes, but only half of the LDMs will have
primary replicas, so load on the LDMs becomes very uneven.

Suggested fix:
Introduce a new two-dimensional array holding the next primary replica to use for a
specific node group and LDM. It will be initialised at startup with all zeros. For tables
that use ONE_PER_LDM_PER_NODE we will use a temporary array to ensure that those
tables always have the same table distribution.
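
A minimal sketch of the suggested fix, not the actual NDB code. The function names, the rule that fragment f lives in LDM f % num_ldms, and the advance-by-one round-robin step are all assumptions made for illustration; the report only states that a two-dimensional array indexed by node group and LDM holds the next primary replica to use, and that ONE_PER_LDM_PER_NODE tables use a temporary array.

```python
from collections import Counter

def make_next_primary(num_node_groups, num_ldms):
    """2-D array [node_group][ldm] of the next replica index to make primary."""
    return [[0] * num_ldms for _ in range(num_node_groups)]

def pick_primary(next_primary, node_group, ldm, num_replicas):
    """Return the replica index to use as primary, then advance that slot.

    The round-robin advance is an assumption, not taken from the report.
    """
    replica = next_primary[node_group][ldm]
    next_primary[node_group][ldm] = (replica + 1) % num_replicas
    return replica

def place_table(next_primary, num_fragments, num_ldms, num_replicas, node_group=0):
    """Assumed layout: fragment f is handled by LDM f % num_ldms.

    Returns one (ldm, primary_replica_index) pair per fragment.
    """
    placement = []
    for frag in range(num_fragments):
        ldm = frag % num_ldms
        placement.append((ldm, pick_primary(next_primary, node_group, ldm, num_replicas)))
    return placement

# Shared array: state carries over from one table to the next.
shared = make_next_primary(num_node_groups=1, num_ldms=4)
table = place_table(shared, num_fragments=8, num_ldms=4, num_replicas=2)
per_slot = Counter(table)  # primaries per (ldm, replica) combination

# Temporary array for ONE_PER_LDM_PER_NODE-style tables: starting from a fresh
# array each time gives every such table the same distribution.
fixed_a = place_table(make_next_primary(1, 4), 4, 4, 2)
fixed_b = place_table(make_next_primary(1, 4), 4, 4, 2)
```

With 8 fragments, 4 LDMs and 2 replicas, every (LDM, replica) combination in this sketch receives exactly one primary, which is the even load the report asks for.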
[14 Jun 2016 10:06] Mauritz Sundell
Posted by developer:
 
Introduce a define for max nodegroup (47) or max nodegroups (48).
Approved otherwise.
[27 Jun 2016 13:46] Jon Stephens
Documented fix as follows in the NDB 7.5.4 changelog:

    Primary replicas of partitioned tables were not distributed
    evenly among node groups and local data managers.

    As part of the fix for this issue, the maximum number of node
    groups supported for a single MySQL Cluster, which was
    previously not determined, is now set at 48.

Closed.
[1 Jul 2016 13:34] Mikael Ronström
Posted by developer:
 
A fix for this bug has been pushed (using the wrong bug number 23601841).
The array is no longer initialised with all zeroes; instead it is initialised such that
the node index is the log part id modulo the number of replicas.

This has the effect that we spread the primary replica differently for different LDMs.
This will give more balance between nodes while retaining balance between LDMs for those
tables that have a small number of partitions.
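
The effect of the new initialisation can be illustrated with a small sketch. It assumes one log part per LDM, one fragment per LDM for the small table, and a round-robin advance of each slot; none of these details are confirmed by the comment above, and the function names are hypothetical.

```python
def init_next_primary(num_node_groups, num_log_parts, num_replicas):
    """Initialise each slot to log part id modulo number of replicas."""
    return [
        [log_part % num_replicas for log_part in range(num_log_parts)]
        for _ in range(num_node_groups)
    ]

def primaries_for_table(next_primary, num_fragments, num_replicas, node_group=0):
    """Pick a primary replica index per fragment (assumed: fragment f uses slot f % slots)."""
    row = next_primary[node_group]
    chosen = []
    for frag in range(num_fragments):
        slot = frag % len(row)
        chosen.append(row[slot])
        row[slot] = (row[slot] + 1) % num_replicas  # assumed round-robin step
    return chosen

# A small table: 4 fragments on 4 log parts, 2 replicas, 1 node group.
zero_init = [[0, 0, 0, 0]]               # original suggestion: all zeros
modulo_init = init_next_primary(1, 4, 2)  # pushed fix: log part id % replicas

zero_result = primaries_for_table(zero_init, 4, 2)
modulo_result = primaries_for_table(modulo_init, 4, 2)
```

Under these assumptions the all-zero start puts every primary of the small table on replica index 0, while the modulo start alternates between the two replicas, so both nodes carry primaries and each LDM still has exactly one.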
[8 Jul 2016 9:14] Jon Stephens
This appears to be an improvement on the previous fix, but is in the same version, and does not seem to require any change in the existing changelog entry.