Bug #116492 | The utf8mb4 character set has worse performance | ||
---|---|---|---|
Submitted: | 28 Oct 2024 11:47 | Modified: | 6 Nov 2024 13:45 |
Reporter: | Xizhe Zhang (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S5 (Performance) |
Version: | 8.0, 8.0.40 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[28 Oct 2024 11:47]
Xizhe Zhang
[31 Oct 2024 14:50]
MySQL Verification Team
Hello Xizhe Zhang, Thank you for the report and test case. I quickly attempted to reproduce in 8.0.40 release build with default configuration but not seeing the reported issue. Please share exact version you have used, if it is a source build then cmake options used for the build along with configuration file if it is not on default settings. -- BugNumber=116492 rm -rf $BugNumber/ bin/mysqld --no-defaults --initialize-insecure --basedir=$PWD --datadir=$PWD/$BugNumber --log-error-verbosity=3 bin/mysqld_safe --no-defaults --mysqld-version='' --basedir=$PWD --datadir=$PWD/$BugNumber --core-file --socket=/tmp/mysql.sock --port=3306 --log-error=$PWD/$BugNumber/log.err --mysqlx-port=33330 --mysqlx-socket=/tmp/mysql_x_ushastry.sock --log-error-verbosity=3 --secure-file-priv="" --local-infile=1 2>&1 & cat 116492.pl #!/usr/bin/perl for(my $i=1; $i <= 1000000; $i++) { my $txt = 'abc' x 15; print "$txt\n"; } perl 116492.pl > 116492.txt head 116492.txt abcabcabcabcabcabcabcabcabcabcabcabcabcabcabc abcabcabcabcabcabcabcabcabcabcabcabcabcabcabc abcabcabcabcabcabcabcabcabcabcabcabcabcabcabc abcabcabcabcabcabcabcabcabcabcabcabcabcabcabc abcabcabcabcabcabcabcabcabcabcabcabcabcabcabc abcabcabcabcabcabcabcabcabcabcabcabcabcabcabc abcabcabcabcabcabcabcabcabcabcabcabcabcabcabc abcabcabcabcabcabcabcabcabcabcabcabcabcabcabc abcabcabcabcabcabcabcabcabcabcabcabcabcabcabc abcabcabcabcabcabcabcabcabcabcabcabcabcabcabc wc -l 116492.txt 1000000 116492.txt bin/mysql -uroot -S/tmp/mysql.sock Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 7 Server version: 8.0.40 MySQL Community Server - GPL Copyright (c) 2000, 2024, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> create database test; Query OK, 1 row affected (0.00 sec) mysql> use test; Database changed mysql> mysql> CREATE TABLE `t_mb3` ( -> `a` varchar(50) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci DEFAULT NULL -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3; Query OK, 0 rows affected, 3 warnings (0.02 sec) mysql> show warnings; +---------+------+---------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +---------+------+---------------------------------------------------------------------------------------------------------------------------------------------------+ | Warning | 1287 | 'utf8mb3' is deprecated and will be removed in a future release. Please use utf8mb4 instead | | Warning | 3778 | 'utf8mb3_general_ci' is a collation of the deprecated character set UTF8MB3. Please consider using UTF8MB4 with an appropriate collation instead. | | Warning | 1287 | 'utf8mb3' is deprecated and will be removed in a future release. Please use utf8mb4 instead | +---------+------+---------------------------------------------------------------------------------------------------------------------------------------------------+ 3 rows in set (0.00 sec) mysql> mysql> CREATE TABLE `t_mb4` ( -> `a` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4; Query OK, 0 rows affected (0.01 sec) mysql> show warnings; Empty set (0.00 sec) Database changed mysql> load data local infile '/export/home/tmp/ushastry/mysql-8.0.40/116492.txt' into table t_mb3 fields terminated by ','; Query OK, 1000000 rows affected (4.09 sec) Records: 1000000 Deleted: 0 Skipped: 0 Warnings: 0 mysql> load data local infile '/export/home/tmp/ushastry/mysql-8.0.40/116492.txt' into table t_mb4 fields terminated by ','; Query OK, 1000000 rows affected (5.00 sec) Records: 1000000 Deleted: 0 Skipped: 0 Warnings: 0 -- ran some 10+ times and it is same mysql> select count(*) from t_mb3 where a=repeat('abc',15); +----------+ | count(*) | +----------+ | 1000000 | +----------+ 1 row in set (0.69 sec) mysql> select count(*) from t_mb4 where a=repeat('abc',15); +----------+ | count(*) | +----------+ | 1000000 | +----------+ 1 row in set (0.64 sec) regards, Umesh
[1 Nov 2024 16:31]
Xizhe Zhang
Thank you for your prompt reply. I checked my cmake options and I think this problem should not be related to the cmake options. From the code's logic, it can be seen that utf8mb4_0900_ai_ci has more branches, so the executable binary file has more instructions, which is within expectations. I currently have two machines, one is the machine where the bug was first reported, using Intel Xeon Platinum 8163 CPU, base frequency 2.7GHz. My other machine uses Intel Xeon Platinum 8369B CPU, base frequency 3.5GHz. The second CPU is better and has a higher base frequency. I measured the results on this machine, and utf8mb4 is 20% worse instead of the previous 25%: mysql> select count(*) from t_mb3 where a=repeat('abc',15); +----------+ | count(*) | +----------+ | 1000000 | +----------+ 1 row in set (0.59 sec) mysql> select count(*) from t_mb4 where a=repeat('abc',15); +----------+ | count(*) | +----------+ | 1000000 | +----------+ 1 row in set (0.70 sec) So I think the difference in execution time is related to the Generation of the CPU or the base frequency. The better the CPU, the less obvious the performance regression caused by the number of instructions. But even so, I see that your results show that utf8mb3 performs better than utf8mb4, but the difference is smaller. I also used perf to check the hot spots during the statement execution, It can be clearly seen that utf8mb4 has higher CPU consumption: # utf8mb3 + 28.44% mysqld mysqld [.] my_strnncollsp_utf8mb3 + 10.53% mysqld mysqld [.] buf::Block_hint::buffer_fix_block_if_still_valid + 6.97% mysqld mysqld [.] row_search_mvcc + 3.28% mysqld mysqld [.] memo_slot_release # utf8mb4 + 46.09% mysqld mysqld [.] my_strnncoll_uca<uca_scanner_900<Mb_wc_utf8mb4, 1>, 1, Mb_wc_utf8mb4> + 8.10% mysqld mysqld [.] buf::Block_hint::buffer_fix_block_if_still_valid + 5.24% mysqld mysqld [.] row_search_mvcc + 2.69% mysqld mysqld [.] memo_slot_release
[6 Nov 2024 8:24]
MySQL Verification Team
Thank you for the details, let me re-run this and get back to you if anything further needed. regards, Umesh
[6 Nov 2024 13:45]
MySQL Verification Team
Thank you once again. Verified as described. regards, Umesh