Bug #84374 Primary key is the not the default sharding key anymore
Submitted: 30 Dec 2016 10:10 Modified: 4 Jan 2017 6:34
Reporter: Abdel-Mawla Gharieb Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:7.3.15, 7.4.13 & 7.5.4 OS:Any
Assigned to: CPU Architecture:Any
Tags: partitioning, primary key

[30 Dec 2016 10:10] Abdel-Mawla Gharieb
Description:
In MySQL Cluster 7.2, the default sharding key is the PK while if the table has no PK, a hidden column is being used as a sharding key instead.
Starting from MySQL Cluster 7.3, looks like the hidden column is always considered as the sharding key even if there is a PK in that table!

How to repeat:
Testing in a cluster of 4 datanodes as below:

In MySQL Cluster 7.2.26:

mysql> CREATE TABLE shard_check_pk (id int(11) primary key) engine=ndbcluster;
Query OK, 0 rows affected (1.17 sec)

mysql>  explain partitions select * from shard_check_pk;
+----+-------------+-------------+-------------+------+---------------+------+---------+------+------+-------+
| id | select_type | table       | partitions  | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+-------------+-------------+------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | shard_check_pk | p0,p1,p2,p3 | ALL  | NULL          | NULL | NULL    | NULL |    2 |       |
+----+-------------+-------------+-------------+------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)

If PK is the sharding key, then a PK value lookup will be served from only one partition:

mysql> INSERT INTO shard_check_pk values (1);
Query OK, 1 row affected (0.00 sec)

mysql>  explain partitions select * from shard_check_pk where id=1;
+----+-------------+-------------+------------+--------+---------------+---------+---------+-------+------+-------+
| id | select_type | table       | partitions | type   | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------------+------------+--------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | shard_check_pk | p3         | eq_ref | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+-------------+------------+--------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

According to the explain plan above, PK is the default sharding key.

When creating a table without PK, a hidden column is being used as a sharding key instead:

mysql> CREATE TABLE shard_check_no_pk (id int(11)) engine=ndbcluster;
Query OK, 0 rows affected (0.19 sec)

mysql> explain partitions select * from shard_check_no_pk;
+----+-------------+-------+-------------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | partitions  | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+-------+-------------+------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | shard_check_no_pk    | p0,p1,p2,p3 | ALL  | NULL          | NULL | NULL    | NULL |    2 |       |
+----+-------------+-------+-------------+------+---------------+------+---------+------+------+-------+
1 row in set (0.01 sec)

mysql> INSERT INTO shard_check_no_pk values (1);
Query OK, 1 row affected (0.00 sec)

mysql> explain partitions select * from shard_check_no_pk where id=1;
+----+-------------+-------+-------------+------+---------------+------+---------+------+------+-----------------------------------+
| id | select_type | table | partitions  | type | possible_keys | key  | key_len | ref  | rows | Extra                             |
+----+-------------+-------+-------------+------+---------------+------+---------+------+------+-----------------------------------+
|  1 | SIMPLE      | shard_check_no_pk    | p0,p1,p2,p3 | ALL  | NULL          | NULL | NULL    | NULL |    2 | Using where with pushed condition |
+----+-------------+-------+-------------+------+---------------+------+---------+------+------+-----------------------------------+
1 row in set (0.00 sec)

The hidden column - as in the previous example - is the default sharding key in MySQL Cluster 7.3, 7.4 and 7.5 even if the table has a PK as below:

In MySQL Cluster 7.3.15, 7.4.13 & 7.5.4:

mysql> CREATE TABLE shard_check_pk (id int(11) primary key) engine=ndbcluster;
Query OK, 0 rows affected (0.38 sec)

mysql> explain select * from shard_check_pk;
+----+-------------+-------------+-------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table       | partitions  | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+-------------+-------------+------+---------------+------+---------+------+------+----------+-------+
|  1 | SIMPLE      | shard_check_pk | p0,p1,p2,p3 | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | NULL  |
+----+-------------+-------------+-------------+------+---------------+------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.01 sec)

mysql> INSERT INTO shard_check_pk values (1);
Query OK, 1 row affected (0.01 sec)

mysql> EXPLAIN SELECT * FROM shard_check_pk WHERE id=1;
+----+-------------+-------------+-------------+--------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table       | partitions  | type   | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------------+-------------+--------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | shard_check_pk | p0,p1,p2,p3 | eq_ref | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | NULL  |
+----+-------------+-------------+-------------+--------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

To override this behavior we have to explicitly specify the shard key:

mysql> ALTER TABLE shard_check_pk PARTITION BY KEY (`id`);
Query OK, 1 row affected (1.10 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> EXPLAIN SELECT * FROM shard_check_pk WHERE id=1;
+----+-------------+-------------+------------+--------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table       | partitions | type   | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------------+------------+--------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | shard_check_pk | p3         | eq_ref | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | NULL  |
+----+-------------+-------------+------------+--------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

Suggested fix:
Same as in MySQL Cluster 7.2, the primary key should be used as the default sharding key.
[4 Jan 2017 6:34] MySQL Verification Team
Hi,

Thanks for your report, verified as described.

all best
Bogdan
[3 Feb 2017 13:11] MySQL Verification Team
Just a short update, the bug here is in EXPLAIN that display all partitions as search criteria. The sharding key is as always PK if exists and "internal" if PK does not exist. This EXPLAIN bug is being fixed, should be out soon.

all best
Bogdan
[3 Feb 2017 13:14] Mauritz Sundell
Posted by developer:
 
The default sharding key is still the primary key!

To see that 7.5.4 does not behave as hidden pk case in 7.2.26, look also in the other columns of explain output. The partitions column should be ignored for NDB tables.

Still, you found a bug. The explain output should not depend on how partitioning is specified as long as it is the same.

For NDB the partition pruning is actually not done by the optimizer in MySQL server but within NDB itself.

To inspect the table definition used within NDB one can use ‘ndb_desc -d <database-name> shard_check_pk’ it should reveal that there are no extra hidden columns, and also that the sharding key called distribution key in NDB is the same as the primary key.