Bug #109548 | Poor performance when using HASH field to check unique | ||
---|---|---|---|
Submitted: | 9 Jan 2023 7:55 | Modified: | 20 Dec 2023 12:33 |
Reporter: | Hope Lee (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: DML | Severity: | S5 (Performance) |
Version: | 8.0.13,8.0.31 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[9 Jan 2023 7:55]
Hope Lee
[9 Jan 2023 8:54]
MySQL Verification Team
Hello Hope Lee, Thank you for the report and feedback. Please ensure to re-send the patch via "contribution" tab. Otherwise we would not be able to accept it. Thank you! regards, Umesh
[9 Jan 2023 8:55]
MySQL Verification Team
- 8.0.31 with default config rm -rf 109548/ bin/mysqld --no-defaults --initialize-insecure --basedir=$PWD --datadir=$PWD/109548 --log-error-verbosity=3 bin/mysqld_safe --no-defaults --mysqld-version='' --basedir=$PWD --datadir=$PWD/109548 --core-file --socket=/tmp/mysql.sock --port=3306 --log-error=$PWD/109548/log.err --mysqlx-port=33330 --mysqlx-socket=/tmp/mysql_x_ushastry.sock --log-error-verbosity=3 --secure-file-priv="" --local-infile=1 2>&1 & bin/mysql -uroot -S /tmp/mysql.sock Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 7 Server version: 8.0.31 MySQL Community Server - GPL Copyright (c) 2000, 2022, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> create database test; Query OK, 1 row affected (0.01 sec) mysql> use test Database changed mysql> CREATE TABLE product ( -> value int NOT NULL AUTO_INCREMENT, -> code varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL, -> name varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, -> comment varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, -> platform varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, -> KEY idx_key (value) -> ); Query OK, 0 rows affected (0.02 sec) mysql> delimiter // mysql> CREATE PROCEDURE BatchInsert(IN row_count int) -> BEGIN -> SET @n = 1; -> REPEAT -> SET @str = (CONCAT('test', CAST(@n AS CHAR))); -> INSERT INTO product(code, name) VALUES(@str, @str); -> SET @n = @n + 1; -> UNTIL @n > row_count -> END REPEAT; -> END; -> // Query OK, 0 rows affected (0.01 sec) mysql> delimiter ; mysql> CALL BatchInsert(20000); Query OK, 0 rows affected (51.29 sec) mysql> CALL BatchInsert(20000); Query OK, 0 rows affected (50.49 sec)
[9 Jan 2023 8:56]
MySQL Verification Team
- 5.7.40 performance much better compared to 8.0 bin/mysql -uroot -S /tmp/mysql.sock Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 Server version: 5.7.40 MySQL Community Server (GPL) Copyright (c) 2000, 2022, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> create database test; Query OK, 1 row affected (0.00 sec) mysql> use test Database changed mysql> CREATE TABLE product ( -> value int NOT NULL AUTO_INCREMENT, -> code varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL, -> name varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, -> comment varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, -> platform varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, -> KEY idx_key (value) -> ); Query OK, 0 rows affected (0.02 sec) mysql> delimiter // mysql> CREATE PROCEDURE BatchInsert(IN row_count int) -> BEGIN -> SET @n = 1; -> REPEAT -> SET @str = (CONCAT('test', CAST(@n AS CHAR))); -> INSERT INTO product(code, name) VALUES(@str, @str); -> SET @n = @n + 1; -> UNTIL @n > row_count -> END REPEAT; -> END; -> // Query OK, 0 rows affected (0.00 sec) mysql> delimiter ; mysql> CALL BatchInsert(20000); Query OK, 0 rows affected (23.62 sec)
[20 Oct 2023 10:09]
Dag Wanvik
Posted by developer: The below documents the effects of the proposed patch. Performance test run without the patch. As can be seen, the performance with the repro data runs much slower than the normal case. # # Bug#34959356: Poor performance when using HASH field to check unique # CREATE TABLE product ( value INT NOT NULL AUTO_INCREMENT, code VARCHAR(255), name VARCHAR(255), comment VARCHAR(255), platform VARCHAR(50), KEY idx_key (value) ); CREATE PROCEDURE BatchInsertShowIssue(IN row_count int) BEGIN START TRANSACTION; SET @n = 1; REPEAT SET @str = (CONCAT('test', CAST(@n AS CHAR))); INSERT INTO product(code, name) VALUES(@str, @str); SET @n = @n + 1; UNTIL @n > row_count END REPEAT; COMMIT; END; // CALL BatchInsertShowIssue(20000); EXPLAIN ANALYZE SELECT COUNT(*) FROM ( SELECT ANY_VALUE(value) FROM product GROUP BY code, name, comment, platform ) derived; EXPLAIN -> Aggregate: count(0) (cost=8505..8505 rows=1) (actual time=29836..29836 rows=1 loops=1) -> Table scan on derived (cost=6253..6505 rows=20000) (actual time=29833..29835 rows=20000 loops=1) -> Materialize (cost=6253..6253 rows=20000) (actual time=29833..29833 rows=20000 loops=1) -> Table scan on <temporary> (cost=4000..4253 rows=20000) (actual time=29827..29830 rows=20000 loops=1) -> Temporary table with deduplication (cost=4000..4000 rows=20000) (actual time=29827..29827 rows=20000 loops=1) -> Table scan on product (cost=2000 rows=20000) (actual time=0.0296..32.7 rows=20000 loops=1) TRUNCATE product; CREATE PROCEDURE BatchInsertShowNormal(IN row_count int) BEGIN START TRANSACTION; SET @n = 1; REPEAT SET @str1 = (CONCAT('test', CAST(@n AS CHAR))); SET @str2 = (CONCAT('tset', CAST(@n AS CHAR))); INSERT INTO product(code, name) VALUES(@str1, @str2); SET @n = @n + 1; UNTIL @n > row_count END REPEAT; COMMIT; END; // CALL BatchInsertShowNormal(20000); EXPLAIN ANALYZE SELECT COUNT(*) FROM ( SELECT ANY_VALUE(value) FROM product GROUP BY code, name, comment, platform ) derived; EXPLAIN -> Aggregate: count(0) (cost=8505..8505 rows=1) (actual time=42.1..42.1 rows=1 loops=1) -> Table scan on derived (cost=6253..6505 rows=20000) (actual time=38.4..41.1 rows=20000 loops=1) -> Materialize (cost=6253..6253 rows=20000) (actual time=38.3..38.3 rows=20000 loops=1) -> Table scan on <temporary> (cost=4000..4253 rows=20000) (actual time=33..36.3 rows=20000 loops=1) -> Temporary table with deduplication (cost=4000..4000 rows=20000) (actual time=33..33 rows=20000 loops=1) -> Table scan on product (cost=2000 rows=20000) (actual time=0.0253..14.3 rows=20000 loops=1) DROP PROCEDURE BatchInsertShowIssue; DROP PROCEDURE BatchInsertShowNormal; DROP TABLE product; Performance test run with the patch. As can be seen, the performance with the repro data runs as fast as the normal case, which ran fast even without the patch. # # Bug#34959356: Poor performance when using HASH field to check unique # CREATE TABLE product ( value INT NOT NULL AUTO_INCREMENT, code VARCHAR(255), name VARCHAR(255), comment VARCHAR(255), platform VARCHAR(50), KEY idx_key (value) ); CREATE PROCEDURE BatchInsertShowIssue(IN row_count int) BEGIN START TRANSACTION; SET @n = 1; REPEAT SET @str = (CONCAT('test', CAST(@n AS CHAR))); INSERT INTO product(code, name) VALUES(@str, @str); SET @n = @n + 1; UNTIL @n > row_count END REPEAT; COMMIT; END; // CALL BatchInsertShowIssue(20000); EXPLAIN ANALYZE SELECT COUNT(*) FROM ( SELECT ANY_VALUE(value) FROM product GROUP BY code, name, comment, platform ) derived; EXPLAIN -> Aggregate: count(0) (cost=8505..8505 rows=1) (actual time=42.1..42.1 rows=1 loops=1) -> Table scan on derived (cost=6253..6505 rows=20000) (actual time=38.4..41.1 rows=20000 loops=1) -> Materialize (cost=6253..6253 rows=20000) (actual time=38.4..38.4 rows=20000 loops=1) -> Table scan on <temporary> (cost=4000..4253 rows=20000) (actual time=33.1..36.4 rows=20000 loops=1) -> Temporary table with deduplication (cost=4000..4000 rows=20000) (actual time=33.1..33.1 rows=20000 loops=1) -> Table scan on product (cost=2000 rows=20000) (actual time=0.0232..14.3 rows=20000 loops=1) TRUNCATE product; CREATE PROCEDURE BatchInsertShowNormal(IN row_count int) BEGIN START TRANSACTION; SET @n = 1; REPEAT SET @str1 = (CONCAT('test', CAST(@n AS CHAR))); SET @str2 = (CONCAT('tset', CAST(@n AS CHAR))); INSERT INTO product(code, name) VALUES(@str1, @str2); SET @n = @n + 1; UNTIL @n > row_count END REPEAT; COMMIT; END; // CALL BatchInsertShowNormal(20000); EXPLAIN ANALYZE SELECT COUNT(*) FROM ( SELECT ANY_VALUE(value) FROM product GROUP BY code, name, comment, platform ) derived; EXPLAIN -> Aggregate: count(0) (cost=8505..8505 rows=1) (actual time=41.7..41.7 rows=1 loops=1) -> Table scan on derived (cost=6253..6505 rows=20000) (actual time=38..40.7 rows=20000 loops=1) -> Materialize (cost=6253..6253 rows=20000) (actual time=38..38 rows=20000 loops=1) -> Table scan on <temporary> (cost=4000..4253 rows=20000) (actual time=32.7..36 rows=20000 loops=1) -> Temporary table with deduplication (cost=4000..4000 rows=20000) (actual time=32.7..32.7 rows=20000 loops=1) -> Table scan on product (cost=2000 rows=20000) (actual time=0.0346..14.4 rows=20000 loops=1) DROP PROCEDURE BatchInsertShowIssue; DROP PROCEDURE BatchInsertShowNormal; DROP TABLE product;
[20 Dec 2023 12:33]
Jon Stephens
Documented fix as follows in the MySQL 8.0.36 and 8.3.0 changelogs: The hashing algorithm employed yielded poor performance when using a HASH field to check for uniqueness. Closed.