Description:
When create partition table with charset utf8, the data distribute is not good
How to repeat:
First, use an python script to generate data:
$ cat gen_data.py
file = open("/tmp/a.txt","w")
for i in range(1,10000):
file.write(str(i))
file.write('|')
file.write(str(i))
file.write('\n')
file.close()
phoenix@phoenix-Latitude-5491:~/test$ python gen_data.py
Then, connect to mysql, create 2 tables, and load data:
mysql> create table t1 (c1 varchar(64) primary key, c2 int) engine=innodb charset utf8 partition by key(c1) partitions 4;
Query OK, 0 rows affected, 1 warning (0.17 sec)
mysql> create table t2 (c1 varchar(64) primary key, c2 int) engine=innodb charset utf8mb4 partition by key(c1) partitions 4;
Query OK, 0 rows affected (0.22 sec)
mysql> load data infile '/tmp/a.txt' into table t1 fields terminated by '|';
Query OK, 9999 rows affected (3.50 sec)
Records: 9999 Deleted: 0 Skipped: 0 Warnings: 0
mysql> load data infile '/tmp/a.txt' into table t2 fields terminated by '|';
Query OK, 9999 rows affected (3.85 sec)
Records: 9999 Deleted: 0 Skipped: 0 Warnings: 0
Then, check data distribute, it quite bad for utf8:
mysql> select count(*) from t1 partition(p0);
+----------+
| count(*) |
+----------+
| 3376 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from t1 partition(p1);
+----------+
| count(*) |
+----------+
| 312 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from t1 partition(p2);
+----------+
| count(*) |
+----------+
| 5999 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from t1 partition(p3);
+----------+
| count(*) |
+----------+
| 312 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from t2 partition(p0);
+----------+
| count(*) |
+----------+
| 2500 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from t2 partition(p1);
+----------+
| count(*) |
+----------+
| 2500 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from t2 partition(p2);
+----------+
| count(*) |
+----------+
| 2500 |
+----------+
1 row in set (0.02 sec)
mysql> select count(*) from t2 partition(p3);
+----------+
| count(*) |
+----------+
| 2499 |
+----------+
1 row in set (0.00 sec)
Description: When create partition table with charset utf8, the data distribute is not good How to repeat: First, use an python script to generate data: $ cat gen_data.py file = open("/tmp/a.txt","w") for i in range(1,10000): file.write(str(i)) file.write('|') file.write(str(i)) file.write('\n') file.close() phoenix@phoenix-Latitude-5491:~/test$ python gen_data.py Then, connect to mysql, create 2 tables, and load data: mysql> create table t1 (c1 varchar(64) primary key, c2 int) engine=innodb charset utf8 partition by key(c1) partitions 4; Query OK, 0 rows affected, 1 warning (0.17 sec) mysql> create table t2 (c1 varchar(64) primary key, c2 int) engine=innodb charset utf8mb4 partition by key(c1) partitions 4; Query OK, 0 rows affected (0.22 sec) mysql> load data infile '/tmp/a.txt' into table t1 fields terminated by '|'; Query OK, 9999 rows affected (3.50 sec) Records: 9999 Deleted: 0 Skipped: 0 Warnings: 0 mysql> load data infile '/tmp/a.txt' into table t2 fields terminated by '|'; Query OK, 9999 rows affected (3.85 sec) Records: 9999 Deleted: 0 Skipped: 0 Warnings: 0 Then, check data distribute, it quite bad for utf8: mysql> select count(*) from t1 partition(p0); +----------+ | count(*) | +----------+ | 3376 | +----------+ 1 row in set (0.01 sec) mysql> select count(*) from t1 partition(p1); +----------+ | count(*) | +----------+ | 312 | +----------+ 1 row in set (0.01 sec) mysql> select count(*) from t1 partition(p2); +----------+ | count(*) | +----------+ | 5999 | +----------+ 1 row in set (0.01 sec) mysql> select count(*) from t1 partition(p3); +----------+ | count(*) | +----------+ | 312 | +----------+ 1 row in set (0.01 sec) mysql> select count(*) from t2 partition(p0); +----------+ | count(*) | +----------+ | 2500 | +----------+ 1 row in set (0.01 sec) mysql> select count(*) from t2 partition(p1); +----------+ | count(*) | +----------+ | 2500 | +----------+ 1 row in set (0.01 sec) mysql> select count(*) from t2 partition(p2); +----------+ | count(*) | +----------+ | 2500 | +----------+ 1 row in set (0.02 sec) mysql> select count(*) from t2 partition(p3); +----------+ | count(*) | +----------+ | 2499 | +----------+ 1 row in set (0.00 sec)