Bug #72428 Partition by KEY() results in uneven data distribution
Submitted: 23 Apr 2014 0:12 Modified: 9 Jun 2014 15:17
Reporter: Justin Swanhart Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Partitions Severity:S2 (Serious)
Version:5.6.16,5.5.37 OS:Any
Assigned to: CPU Architecture:Any
Tags: partitioning

[23 Apr 2014 0:12] Justin Swanhart
Description:
I have the following table, from the Star Schema Benchmark:
CREATE TABLE IF NOT EXISTS lineorder
(
 LO_OrderKey bigint not null,
 LO_LineNumber tinyint not null,
 LO_CustKey int not null,
 LO_PartKey int not null,
 LO_SuppKey int not null,
 LO_OrderDateKey int not null,
 LO_OrderPriority varchar(15),
 LO_ShipPriority char(1),
 LO_Quantity tinyint,
 LO_ExtendedPrice decimal,
 LO_OrdTotalPrice decimal,
 LO_Discount decimal,
 LO_Revenue decimal,
 LO_SupplyCost decimal,
 LO_Tax tinyint,
 LO_CommitDateKey int not null,
 LO_ShipMode varchar(10),
 primary key(LO_OrderDateKey,LO_PartKey,LO_SuppKey,LO_Custkey,LO_OrderKey,LO_LineNumber)
) PARTITION BY KEY() PARTITIONS 8;

I loaded the table with 8 concurrent LOAD DATA INFILE commands.

The majority of the rows ended up in one partition (p4):
*************************** 157. row ***************************
    table_name: lineorder
partition_name: p0
    table_rows: 0
avg_row_length: 0
*************************** 158. row ***************************
    table_name: lineorder
partition_name: p1
    table_rows: 0
avg_row_length: 0
*************************** 159. row ***************************
    table_name: lineorder
partition_name: p2
    table_rows: 0
avg_row_length: 0
*************************** 160. row ***************************
    table_name: lineorder
partition_name: p3
    table_rows: 0
avg_row_length: 0
*************************** 161. row ***************************
    table_name: lineorder
partition_name: p4
    table_rows: 58639997
avg_row_length: 155
*************************** 162. row ***************************
    table_name: lineorder
partition_name: p5
    table_rows: 0
avg_row_length: 0
*************************** 163. row ***************************
    table_name: lineorder
partition_name: p6
    table_rows: 0
avg_row_length: 0
*************************** 164. row ***************************
    table_name: lineorder
partition_name: p7
    table_rows: 0
avg_row_length: 0

I use a select count(*) using the PARTITION clause to verify that indeed, partition 4 contained many more rows than any other partition (most have none).

How to repeat:
Create the table:
CREATE TABLE IF NOT EXISTS lineorder
(
 LO_OrderKey bigint not null,
 LO_LineNumber tinyint not null,
 LO_CustKey int not null,
 LO_PartKey int not null,
 LO_SuppKey int not null,
 LO_OrderDateKey int not null,
 LO_OrderPriority varchar(15),
 LO_ShipPriority char(1),
 LO_Quantity tinyint,
 LO_ExtendedPrice decimal,
 LO_OrdTotalPrice decimal,
 LO_Discount decimal,
 LO_Revenue decimal,
 LO_SupplyCost decimal,
 LO_Tax tinyint,
 LO_CommitDateKey int not null,
 LO_ShipMode varchar(10),
 primary key(LO_OrderDateKey,LO_PartKey,LO_SuppKey,LO_Custkey,LO_OrderKey,LO_LineNumber)
) PARTITION BY KEY() PARTITIONS 8;

Generate a ssb scale factor 10 lineorder table
  git clone git@github.com:greenlion/ssb-dbgen.git
  cd ssb-dbgen
  make
  ./ssb-dbgen -s 10 -T l
OR
  use s3cmd (https://github.com/s3tools/s3cmd)
  s3cmd get s3://SQtestbucket/lineorder.tbl (this is a 6GB file)

split the lineorder table into 8 parts

use 8 load data infile in parallel, each on 1 part (not sure this is necessary to reproduce)

Use information_schema.partitions to confirm uneven distribution

Suggested fix:
Unknown.
[23 Apr 2014 2:26] Justin Swanhart
There are 2406 LO_OrderDateKey values (and it is an integer) so that should have definitely resulted in more than one partition being utilized.
[24 Apr 2014 20:40] Justin Swanhart
Here is an easy to reproduce simple test case from 5.5.37:

mysql> create table t1(c1 int, c2 date) partition by key(c2) partitions 8; Query OK, 0 rows affected (0.01 sec)

mysql> insert into t1 values (1, '2014-04-22');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values (1, '2014-04-23');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values (1, '2014-04-24');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values (1, '2014-04-25');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values (1, '2014-04-26');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values (1, '2014-04-27');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values (1, '2014-04-28');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values (1, '2014-04-29');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values (1, '2014-04-30');
Query OK, 1 row affected (0.00 sec)

mysql> select partition_name, table_rows from information_schema.partitions where table_name='t1'\G
*************************** 1. row ***************************
partition_name: p0
    table_rows: 3
*************************** 2. row ***************************
partition_name: p1
    table_rows: 0
*************************** 3. row ***************************
partition_name: p2
    table_rows: 0
*************************** 4. row ***************************
partition_name: p3
    table_rows: 0
*************************** 5. row ***************************
partition_name: p4
    table_rows: 2
*************************** 6. row ***************************
partition_name: p5
    table_rows: 0
*************************** 7. row ***************************
partition_name: p6
    table_rows: 4
*************************** 8. row ***************************
partition_name: p7
    table_rows: 0
8 rows in set (0.00 sec)
[28 Apr 2014 13:02] MySQL Verification Team
Hello Justin,

Thank you for the bug report and simplified test case.
Verified as described.

Thanks,
Umesh
[28 Apr 2014 13:04] MySQL Verification Team
/// Latest builds of 5.5/5.6

mysql> use test
Database changed
mysql> create table t1(c1 int, c2 date) partition by key(c2) partitions 8; 
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t1 values (1, '2014-04-22');
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 values (1, '2014-04-23');
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 values (1, '2014-04-24');
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 values (1, '2014-04-25');
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 values (1, '2014-04-27');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values (1, '2014-04-28');
Query OK, 1 row affected (0.01 sec)

mysql>  insert into t1 values (1, '2014-04-29');
Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 values (1, '2014-04-30');
Query OK, 1 row affected (0.00 sec)

mysql> select partition_name, table_rows from information_schema.partitions where table_name='t1'\G
*************************** 1. row ***************************
partition_name: p0
    table_rows: 2
*************************** 2. row ***************************
partition_name: p1
    table_rows: 0
*************************** 3. row ***************************
partition_name: p2
    table_rows: 0
*************************** 4. row ***************************
partition_name: p3
    table_rows: 0
*************************** 5. row ***************************
partition_name: p4
    table_rows: 2
*************************** 6. row ***************************
partition_name: p5
    table_rows: 0
*************************** 7. row ***************************
partition_name: p6
    table_rows: 4
*************************** 8. row ***************************
partition_name: p7
    table_rows: 0
8 rows in set (0.00 sec)

mysql> select version();
+------------------+
| version()        |
+------------------+
| 5.5.38-debug-log |
+------------------+
1 row in set (0.00 sec)

// 5.6.18

mysql> select version();
+---------------------------------------+
| version()                             |
+---------------------------------------+
| 5.6.18-enterprise-commercial-advanced |
+---------------------------------------+
1 row in set (0.00 sec)

mysql> create table t1(c1 int, c2 date) partition by key(c2) partitions 8;
Query OK, 0 rows affected (2.31 sec)

mysql> insert into t1 values (1, '2014-04-22');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t1 values (1, '2014-04-23');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t1 values (1, '2014-04-24');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t1 values (1, '2014-04-25');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t1 values (1, '2014-04-26');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t1 values (1, '2014-04-27');
Query OK, 1 row affected (0.04 sec)

mysql> insert into t1 values (1, '2014-04-28');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t1 values (1, '2014-04-29');
Query OK, 1 row affected (0.04 sec)

mysql> insert into t1 values (1, '2014-04-30');
Query OK, 1 row affected (0.02 sec)

mysql> select partition_name, table_rows from information_schema.partitions where table_name='t1'\G
*************************** 1. row ***************************
partition_name: p0
    table_rows: 3
*************************** 2. row ***************************
partition_name: p1
    table_rows: 0
*************************** 3. row ***************************
partition_name: p2
    table_rows: 0
*************************** 4. row ***************************
partition_name: p3
    table_rows: 0
*************************** 5. row ***************************
partition_name: p4
    table_rows: 2
*************************** 6. row ***************************
partition_name: p5
    table_rows: 0
*************************** 7. row ***************************
partition_name: p6
    table_rows: 4
*************************** 8. row ***************************
partition_name: p7
    table_rows: 0
8 rows in set (0.00 sec)
[9 Jun 2014 15:17] Mattias Jonsson
This is a duplicate of bug#65112.

Notice that the behavior is due to the KEY hashing algorithm, which will not be changed since it will need current KEY partitioned tables to be rebuilt. But what can be done is to implement another KEY hashing algorithm that has a better distribution (such as MD5, which is used by NDB).
[29 Jun 2021 18:25] Shreya Gupta
I am also facing the same bug. Is there any possible solution for it?