Bug #83319 Performance regression when changing character set to utf8mb4
Submitted: 10 Oct 2016 12:44
Modified: 14 Nov 2016 18:09
Reporter: Steinar Gunderson
Status: Closed
Category: MySQL Server: Charsets
Severity: S3 (Non-critical)
Version: 8.0.1
OS: Any
Assigned to:
CPU Architecture: Any

[10 Oct 2016 12:44] Steinar Gunderson
When changing the default character set from latin1 to utf8mb4 (which also changes the default collation), sysbench throughput drops by about 90% (approx. 10000 -> 1000 TPS on a regular desktop machine).
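
As a quick check of the quoted figures, the relative drop can be computed from the two throughput numbers. This is only a sketch: the helper name tps_drop_percent is made up for illustration, and the inputs are the approximate values from the report, not new measurements.

```shell
# Hypothetical helper: relative throughput drop, in percent, between two runs.
# $1 = TPS before the change, $2 = TPS after the change.
tps_drop_percent() {
    before=$1
    after=$2
    # Integer arithmetic is enough for round figures like these.
    echo $(( (before - after) * 100 / before ))
}

# Approximate figures quoted in the report: 10000 TPS before, 1000 TPS after.
tps_drop_percent 10000 1000   # prints 90
```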

How to repeat:
(These instructions are based on investigations from Didrik)

You can change the default character set in mysql-test/include/default_mysqld.cnf and then start a server using mtr --start.

+init_connect='SET collation_connection = utf8mb4_0900_ai_ci'
+init_connect='SET NAMES utf8'
+# default-character-set=utf8
+# character-set-server=utf8
+# collation-server=utf8_general_ci

Script for running sysbench follows:

sysbargs="--mysql-table-engine=innodb --test=oltp --oltp-test-mode=complex
--oltp-read-only=on --oltp-auto-inc=off --mysql-user=root
--oltp-table-size=100000 --oltp-dist-type=uniform --oltp-skip-trx=on"

echo "CREATE DATABASE sbtest;" | client/mysql -u root -S $sock1
sysbench prepare --test=oltp --mysql-table-engine=innodb --oltp-num-tables=16 \
  --oltp-table-size=100000 --myisam-max-rows=100000 --mysql-user=root

# --oltp-simple-ranges=0
# --oltp-distinct-ranges=0

sysbench run --num-threads=16 --max-time=$maxtime --max-requests=0 $sysbargs \
  --oltp-point-selects=0 --oltp-simple-ranges=0 --oltp-sum-ranges=0 \
  --oltp-range-size=100 --oltp-simple-ranges=0 --mysql-socket=$sock1

Suggested fix:
TBD. It is certainly collation-related, but most likely, we need fixes in multiple places.
[14 Nov 2016 18:09] Paul Dubois
Posted by developer:
Noted in 8.0.1 changelog.

Performance of UCA 9.0.0-based collations (for example,
utf8mb4_0900_ai_ci) was improved.
[29 Nov 2016 15:45] Steinar Gunderson
Posted by developer:
Didrik pointed out that we should also document the change of the max_length_for_sort_data default from 1024 to 4096.
[31 Mar 2017 13:27] Paul Dubois
Posted by developer:
This change was reverted due to other 8.0.1 work that changed Unicode 9.0.0 collations from PAD SPACE to NO PAD. Consequently, these collations treat space like any other character.
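
The PAD SPACE versus NO PAD difference can be probed by comparing a string with and without a trailing space under an explicit collation. The sketch below only prints the probe queries; the helper name pad_probe_sql and the column alias are illustrative, and utf8mb4_general_ci is used here as an assumed example of a PAD SPACE collation.

```shell
# Hypothetical helper: print a query that compares 'a' with 'a ' (trailing
# space) under the collation given as $1. The _utf8mb4 introducers keep the
# literals in utf8mb4 regardless of the connection character set.
pad_probe_sql() {
    printf "SELECT _utf8mb4'a' COLLATE %s = _utf8mb4'a ' AS equal_with_trailing_space;\n" "$1"
}

# A NO PAD collation treats the trailing space as significant, so the
# comparison is expected to report 0; a PAD SPACE collation ignores it.
pad_probe_sql utf8mb4_0900_ai_ci
pad_probe_sql utf8mb4_general_ci
```

The printed statements can be piped to the client in the same way as the CREATE DATABASE line in the original report, for example pad_probe_sql utf8mb4_0900_ai_ci | client/mysql -u root -S $sock1, assuming $sock1 holds the socket path of a running server.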
[3 Apr 2017 10:03] Paul Dubois
Posted by developer:
Not all of this bug fix was reverted, so this part of the changelog entry still applies:

Performance of UCA 9.0.0-based collations (for example,
utf8mb4_0900_ai_ci) was improved. These collations are now faster
than any other UCA collations.
[5 Jul 2017 15:37] Paul Dubois
Posted by developer:
Addition to changelog entry:

Additionally, the max_length_for_sort_data system variable default
value was increased from 1024 to 4096.
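
To see which default a given server build uses, the variable's value can be compared against the two figures from the changelog entry. The following is a minimal sketch; the helper name describe_sort_data_length is hypothetical, and the commented client invocation assumes a running server whose socket path is in $sock1, as in the script from the report.

```shell
# Hypothetical helper: label a max_length_for_sort_data value against the
# two defaults named in the changelog entry (1024 before, 4096 after).
describe_sort_data_length() {
    case "$1" in
        1024) echo "old default (1024)" ;;
        4096) echo "new default (4096)" ;;
        *)    echo "non-default value ($1)" ;;
    esac
}

# With a running server, the value could be read through the client, e.g.:
#   describe_sort_data_length "$(client/mysql -u root -S $sock1 -N -e 'SELECT @@global.max_length_for_sort_data')"
describe_sort_data_length 4096
```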