Bug #101310 partition by key distribute unbalance for utf8
Submitted: 26 Oct 2020 4:39 Modified: 26 Oct 2020 5:54
Reporter: phoenix Zhang (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Partitions Severity:S3 (Non-critical)
Version:8.0.21, 8.0.22 OS:Any
Assigned to: CPU Architecture:Any
Tags: partition

[26 Oct 2020 4:39] phoenix Zhang
Description:
When create partition table with charset utf8, the data distribute is not good

How to repeat:
First, use an python script to generate data:

$ cat gen_data.py 
file = open("/tmp/a.txt","w")

for i in range(1,10000):
  file.write(str(i))
  file.write('|')
  file.write(str(i))
  file.write('\n')
file.close()
phoenix@phoenix-Latitude-5491:~/test$ python gen_data.py 

Then, connect to mysql, create 2 tables, and load data:
mysql> create table t1 (c1 varchar(64) primary key, c2 int) engine=innodb charset utf8 partition by key(c1) partitions 4;
Query OK, 0 rows affected, 1 warning (0.17 sec)

mysql> create table t2 (c1 varchar(64) primary key, c2 int) engine=innodb charset utf8mb4 partition by key(c1) partitions 4;
Query OK, 0 rows affected (0.22 sec)

mysql> load data infile '/tmp/a.txt' into table t1 fields terminated by '|';
Query OK, 9999 rows affected (3.50 sec)
Records: 9999  Deleted: 0  Skipped: 0  Warnings: 0

mysql> load data infile '/tmp/a.txt' into table t2 fields terminated by '|';
Query OK, 9999 rows affected (3.85 sec)
Records: 9999  Deleted: 0  Skipped: 0  Warnings: 0

Then, check data distribute, it quite bad for utf8:
mysql> select count(*) from t1 partition(p0);
+----------+
| count(*) |
+----------+
|     3376 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from t1 partition(p1);
+----------+
| count(*) |
+----------+
|      312 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from t1 partition(p2);
+----------+
| count(*) |
+----------+
|     5999 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from t1 partition(p3);
+----------+
| count(*) |
+----------+
|      312 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from t2 partition(p0);
+----------+
| count(*) |
+----------+
|     2500 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from t2 partition(p1);
+----------+
| count(*) |
+----------+
|     2500 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from t2 partition(p2);
+----------+
| count(*) |
+----------+
|     2500 |
+----------+
1 row in set (0.02 sec)

mysql> select count(*) from t2 partition(p3);
+----------+
| count(*) |
+----------+
|     2499 |
+----------+
1 row in set (0.00 sec)
[26 Oct 2020 5:54] MySQL Verification Team
Hello phoenix Zhang!

Thank you for the report and test case.
Verified as described with 8.0.22 build.

regards,
Umes