Bug #115434 | collation: utf8mb4_general_ci can result in serious data skew | ||
---|---|---|---|
Submitted: | 26 Jun 2024 8:35 | Modified: | 1 Jul 2024 8:35 |
Reporter: | Chaofan Wang | Email Updates: | |
Status: | Won't fix | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S4 (Feature request) |
Version: | 8.0 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
Tags: | utf8mb4 charset collation |
[26 Jun 2024 8:35]
Chaofan Wang
[26 Jun 2024 9:38]
MySQL Verification Team
Hi Mr. Wang,

Thank you for your bug report. We managed to repeat it with the latest 8.0 and 8.4:

TABLE_SCHEMA | TABLE_NAME | PARTITION_NAME | TABLE_ROWS |
---|---|---|---|
sc | users | p0 | 185 |
sc | users | p1 | 0 |
sc | users | p2 | 385 |
sc | users | p3 | 1 |
sc | users | p4 | 151 |
sc | users | p5 | 3 |
sc | users | p6 | 274 |
sc | users | p7 | 1 |

However, this is not a bug, but a feature request. A new feature would be a better distribution of the values among the partitions.

Verified as a feature request for version 8.0 and higher.

Thank you for pointing us to the problem in the code.
[26 Jun 2024 18:49]
Bernt Marius Johnsen
I suggest you use utf8mb4_0900_ai_ci which works much better (in several ways). Using your repro but with utf8mb4_0900_ai_ci I get:

mysql> show create table users\G
*************************** 1. row ***************************
       Table: users
Create Table: CREATE TABLE `users` (
  `username` varchar(255) NOT NULL,
  `id` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY KEY (username)
PARTITIONS 8 */
1 row in set (0,01 sec)

mysql> SELECT table_schema, table_name, partition_name, table_rows FROM information_schema.partitions WHERE table_name = 'users' AND table_schema = 'test';
+--------------+------------+----------------+------------+
| TABLE_SCHEMA | TABLE_NAME | PARTITION_NAME | TABLE_ROWS |
+--------------+------------+----------------+------------+
| test         | users      | p0             |        132 |
| test         | users      | p1             |        113 |
| test         | users      | p2             |        117 |
| test         | users      | p3             |        115 |
| test         | users      | p4             |        130 |
| test         | users      | p5             |        126 |
| test         | users      | p6             |        152 |
| test         | users      | p7             |        115 |
+--------------+------------+----------------+------------+
8 rows in set (0,00 sec)
[27 Jun 2024 10:13]
MySQL Verification Team
Thank you, Bernt.
[1 Jul 2024 8:35]
Roy Lyseng
Since we cannot change existing collations, and there is a reasonable workaround (upgrading to a more recent collation), we are closing this report as not feasible to fix.
[1 Jul 2024 11:20]
MySQL Verification Team
Thank you, Roy.
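
For readers who want to try the workaround discussed in this thread on an existing table, the following is a minimal sketch, not a statement from the MySQL team. It assumes the `users` table and `username` partitioning column from the repro, that the table lives in the current default schema, and that a full table rebuild is acceptable. Note that utf8mb4_0900_ai_ci compares strings differently from utf8mb4_general_ci (for example, it is a NO PAD collation, while utf8mb4_general_ci is PAD SPACE), so unique keys and sort order may be affected.

```sql
-- Sketch only: `users` is the table name from the repro in this thread.
-- Converting the collation rebuilds the table, so rows are re-hashed
-- into the KEY partitions under the new collation.
ALTER TABLE users
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

-- TABLE_ROWS is only an estimate for InnoDB; refresh statistics first.
ANALYZE TABLE users;

-- Per-partition row estimates, in partition order.
SELECT partition_name, table_rows
FROM information_schema.partitions
WHERE table_schema = DATABASE()
  AND table_name = 'users'
ORDER BY partition_ordinal_position;

-- Exact count for a single partition, if the estimate is in doubt.
SELECT COUNT(*) FROM users PARTITION (p0);
```

It is probably worth running this on a copy of the data first and comparing the per-partition counts before and after the conversion.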