Bug #92133 | DICT_SYS mutex contention causes complete stall when running with 40 mill tables | ||
---|---|---|---|
Submitted: | 22 Aug 2018 17:43 | Modified: | 27 Oct 2018 15:02 |
Reporter: | Alexander Rubin | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Data Dictionary | Severity: | S5 (Performance) |
Version: | 8.0.12 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[22 Aug 2018 17:43]
Alexander Rubin
[29 Aug 2018 13:46]
MySQL Verification Team
Verified but I changed it to S5 (performance). Thanks for the report Bogdan
[1 Oct 2018 21:52]
Charles Thompson
Any update on this? This is a very concerning issue for my company as we have millions of tables.
[24 Oct 2018 19:39]
Alexander Rubin
Scripts to reproduce: 1. Create 20K+ tables first #/bin/bash function do_db { db_exist=$(mysql -A -s -Nbe "select 1 from information_schema.schemata where schema_name = '$db'") if [ "$db_exist" == "1" ]; then echo "Already exists $db"; return 0; fi; mysql -vvv -e "create database $db"; mysql -vvv $db -e "CREATE TABLESPACE $db ADD DATAFILE '$db.ibd' engine=InnoDB;" for i in {1..100} do table="CREATE TABLE sbtest$i ( id int(10) unsigned NOT NULL AUTO_INCREMENT, k int(10) unsigned NOT NULL DEFAULT '0', c varchar(120) NOT NULL DEFAULT '', pad varchar(60) NOT NULL DEFAULT '', PRIMARY KEY (id), KEY k_1 (k) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 tablespace $db;" mysql $db -e "$table" done } c=0 for m in {1..4000000} do for i in {1..40} do let c=$c+1 echo $c db="sbtest_$c" do_db & done wait #if [ $c > 4000000 ]; then exit; fi done 2. Run this script: ##### shel script, sysbench_mysql.sh ##### function run_sb() { conn=" --db-driver=mysql --mysql-socket=/var/run/mysqld/mysqld.sock --mysql-db=mysql --mysql-user=sbtest --mysql-password=abc " sysbench $conn --oltp_db_count=$db_count --oltp_tables_count=$table_count --oltp-table-size=10000 --report-interval=1 --num-threads=$num_threads --max-requests=0 --max-time=$max_time ./select_custom.lua run | tee -a sysbench_2.txt } let db_count=70000 table_count=1000 max_time=10000 num_threads=32 run_sb #### select_custom.lua #### pathtest = "/usr/share/sysbench/tests/include/oltp_legacy/" if pathtest then dofile(pathtest .. "common.lua") else require("common") end function thread_init(thread_id) set_vars() end function event() local table_name local i local c_val local k_val local pad_val oltp_db_count = tonumber(oltp_db_count) or 1 -- local oltp_db_count = 4 table_name = "sbtest_" .. sb_rand(100, oltp_db_count)..".sbtest".. sb_rand(1, oltp_tables_count) k_val = sb_rand(1, oltp_table_size) c_val = sb_rand_str([[ ###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]]) pad_val = sb_rand_str([[ ###########-###########-###########-###########-###########]]) rs = db_query("SELECT id FROM " .. table_name .." LIMIT 1") end
[24 Oct 2018 19:40]
Alexander Rubin
Scripts to reproduce: 1. Create 20K+ tables first #/bin/bash function do_db { db_exist=$(mysql -A -s -Nbe "select 1 from information_schema.schemata where schema_name = '$db'") if [ "$db_exist" == "1" ]; then echo "Already exists $db"; return 0; fi; mysql -vvv -e "create database $db"; mysql -vvv $db -e "CREATE TABLESPACE $db ADD DATAFILE '$db.ibd' engine=InnoDB;" for i in {1..100} do table="CREATE TABLE sbtest$i ( id int(10) unsigned NOT NULL AUTO_INCREMENT, k int(10) unsigned NOT NULL DEFAULT '0', c varchar(120) NOT NULL DEFAULT '', pad varchar(60) NOT NULL DEFAULT '', PRIMARY KEY (id), KEY k_1 (k) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 tablespace $db;" mysql $db -e "$table" done } c=0 for m in {1..4000000} do for i in {1..40} do let c=$c+1 echo $c db="sbtest_$c" do_db & done wait #if [ $c > 4000000 ]; then exit; fi done 2. Run this script: ##### shel script, sysbench_mysql.sh ##### function run_sb() { conn=" --db-driver=mysql --mysql-socket=/var/run/mysqld/mysqld.sock --mysql-db=mysql --mysql-user=sbtest --mysql-password=abc " sysbench $conn --oltp_db_count=$db_count --oltp_tables_count=$table_count --oltp-table-size=10000 --report-interval=1 --num-threads=$num_threads --max-requests=0 --max-time=$max_time ./select_custom.lua run | tee -a sysbench_2.txt } let db_count=70000 table_count=1000 max_time=10000 num_threads=32 run_sb #### select_custom.lua #### pathtest = "/usr/share/sysbench/tests/include/oltp_legacy/" if pathtest then dofile(pathtest .. "common.lua") else require("common") end function thread_init(thread_id) set_vars() end function event() local table_name local i local c_val local k_val local pad_val oltp_db_count = tonumber(oltp_db_count) or 1 -- local oltp_db_count = 4 table_name = "sbtest_" .. sb_rand(100, oltp_db_count)..".sbtest".. sb_rand(1, oltp_tables_count) k_val = sb_rand(1, oltp_table_size) c_val = sb_rand_str([[ ###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]]) pad_val = sb_rand_str([[ ###########-###########-###########-###########-###########]]) rs = db_query("SELECT id FROM " .. table_name .." LIMIT 1") end
[27 Oct 2018 15:02]
Alexander Rubin
perf top with mysql-debug: 24.41% mysqld-debug [.] _Z23dict_table_add_to_cacheP12dict_table_tmP16mem_block_info_t 15.13% mysqld-debug [.] _ZNK15CheckInFreeListclEPK10buf_page_t 12.59% mysqld-debug [.] _Z16ut_list_validateI12ut_list_baseI10buf_page_tMS1_12ut_list_nodeIS1_EE15CheckInFreeListEvRKT_RT0_
[17 Aug 2022 8:36]
Rajesh Chiniwar
is there any updated on the bug, looks like issue is still present in latest versions of mysql
[18 Nov 2022 15:26]
Robert Nix
I encountered this bug in a production percona cluster running 8.0.25-15. After some investigation using the innodb status output, I came across this bug and corresponding blog post. With that I was able to create a local testcase that reproduced the issue fairly reliably, and then I did some profiling to figure out where the mutex holder thread was spending time. I left some more details on percona's github at the time: https://github.com/percona/percona-server/pull/4825 But, since this issue is present in upstream MySQL, I'm bringing it here. The caveat I have is that I have not yet tried reproducing this with latest MySQL, but the code I identified as causing the performance regression is still present in tip. 'LatchCounter uses a std::vector for storing registered counters, which leads to n^2 behavior during dict_make_room_in_cache; this can hold the dict_sys->mutex for multiple seconds at a time in the case where a very large number of innodb tables (200,000+) have been opened since the last cleaning round.' The 'n^2 behavior' is during deregistering a LatchCounter associated with a table, which happens when freeing a table entry: storage/innobase/include/sync0types.h:700: m_mutex.enter(); m_counters.erase(std::remove(m_counters.begin(), m_counters.end(), count), m_counters.end()); m_mutex.exit(); My 'fix' at the time was to simply use unordered_set instead; this prevented the stalls in testing, but it seemed to have some performance impact. My measurements weren't scientific enough for me to say whether that impact was significant.
[24 Nov 2022 14:33]
Robert Nix
Use unordered_set for LatchCounter storage (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: 0001-Use-unordered_set-for-LatchCounter-storage.patch (text/x-patch), 2.28 KiB.
[24 Nov 2022 15:40]
Robert Nix
After further performance testing, I have concluded that the impact of using vector instead of unordered_set for LatchCounter storage is not significant. The change to unordered_set reduces the stall duration from ~20s to ~2s in my test case: https://robert-nix.github.io/mysql-loadgen/stall-measurement.html This change does have a visible effect on program behavior: the order of mutex instances from the same source location in the output of SHOW ENGINE INNODB MUTEX is no longer the order of mutex creation and is instead 'randomized' by the hashmap iteration. I did try an alternative structure which preserves order and has similar memory overhead to the hashmap, a boost::intrusive::list, but I think the added complexity isn't worth the order preservation in this case. From my observations, in practice, most of these table mutexes have no spins/waits and so won't show up in the output, and the order of mutex creation is not a useful signal since no other metadata is printed alongside the latch status. I have also done some profiling during the new 2 seconds of stall, and I can't see any other 'easy' ways to reduce the stall duration.