You have been subscribed for email updates to this bug report.
Bug #92133 DICT_SYS mutex contention causes complete stall when running with 40 mill tables
Submitted: 22 Aug 2018 17:43 Modified: 27 Oct 2018 15:02
Reporter: Alexander Rubin Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Data Dictionary Severity:S5 (Performance)
Version:8.0.12 OS:Any
Assigned to: CPU Architecture:Any

[22 Aug 2018 17:43] Alexander Rubin
Description:
When running with millions of tables (and shared tablespaces) I can see the complete stall waiting on DICT_SYS mutex

Sysbench load:

[ 453s ] thds: 32 tps: 1203.96 qps: 1203.96 (r/w/o: 1203.96/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 454s ] thds: 32 tps: 1202.32 qps: 1202.32 (r/w/o: 1202.32/0.00/0.00) lat (ms,95%): 42.61 err/s: 0.00 reconn/s: 0.00
[ 455s ] thds: 32 tps: 1196.74 qps: 1196.74 (r/w/o: 1196.74/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 456s ] thds: 32 tps: 1197.18 qps: 1197.18 (r/w/o: 1197.18/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 457s ] thds: 32 tps: 887.11 qps: 887.11 (r/w/o: 887.11/0.00/0.00) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
[ 458s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 459s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 460s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 461s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 462s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 463s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 464s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 465s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 466s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 467s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 468s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 469s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 470s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 471s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 472s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 473s ] thds: 32 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 474s ] thds: 32 tps: 403.96 qps: 403.96 (r/w/o: 403.96/0.00/0.00) lat (ms,95%): 16819.24 err/s: 0.00 reconn/s: 0.00
[ 475s ] thds: 32 tps: 1196.00 qps: 1196.00 (r/w/o: 1196.00/0.00/0.00) lat (ms,95%): 41.85 err/s: 0.00 reconn/s: 0.00
[ 476s ] thds: 32 tps: 1208.96 qps: 1208.96 (r/w/o: 1208.96/0.00/0.00) lat (ms,95%): 41.85 err/s: 0.00 reconn/s: 0.00
[ 477s ] thds: 32 tps: 1192.06 qps: 1192.06 (r/w/o: 1192.06/0.00/0.00) lat (ms,95%): 41.85 err/s: 0.00 reconn/s: 0.00
[ 478s ] thds: 32 tps: 1173.89 qps: 1173.89 (r/w/o: 1173.89/0.00/0.00) lat (ms,95%): 43.39 err/s: 0.00 reconn/s: 0.00
[ 479s ] thds: 32 tps: 1207.07 qps: 1207.07 (r/w/o: 1207.07/0.00/0.00) lat (ms,95%): 41.85 err/s: 0.00 reconn/s: 0.00

Show engine innodb status shows:

----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 2195106
--Thread 140451928655616 has waited at fts0opt.cc line 2703 for 13.00 seconds the semaphore:
Mutex at 0x7fbeb83e6bd8, Mutex DICT_SYS created dict0dict.cc:1019, lock var 1

--Thread 140451898394368 has waited at ha_innodb.cc line 6080 for 13.00 seconds the semaphore:
Mutex at 0x7fbeb83e6bd8, Mutex DICT_SYS created dict0dict.cc:1019, lock var 1

--Thread 140457041856256 has waited at ha_innodb.cc line 6080 for 13.00 seconds the semaphore:
Mutex at 0x7fbeb83e6bd8, Mutex DICT_SYS created dict0dict.cc:1019, lock var 1

--Thread 140451897509632 has waited at dict0dd.ic line 273 for 13.00 seconds the semaphore:
Mutex at 0x7fbeb83e6bd8, Mutex DICT_SYS created dict0dict.cc:1019, lock var 1

--Thread 140451898689280 has waited at ha_innodb.cc line 5814 for 13.00 seconds the semaphore:
Mutex at 0x7fbeb83e6bd8, Mutex DICT_SYS created dict0dict.cc:1019, lock var 1

--Thread 140456571414272 has waited at ha_innodb.cc line 6080 for 13.00 seconds the semaphore:
Mutex at 0x7fbeb83e6bd8, Mutex DICT_SYS created dict0dict.cc:1019, lock var 1

--Thread 140456571119360 has waited at ha_innodb.cc line 6080 for 13.00 seconds the semaphore:
Mutex at 0x7fbeb83e6bd8, Mutex DICT_SYS created dict0dict.cc:1019, lock var 1

--Thread 140457043920640 has waited at ha_innodb.cc line 6080 for 13.00 seconds the semaphore:
Mutex at 0x7fbeb83e6bd8, Mutex DICT_SYS created dict0dict.cc:1019, lock var 1

--Thread 140451896329984 has waited at ha_innodb.cc line 6080 for 13.00 seconds the semaphore:
Mutex at 0x7fbeb83e6bd8, Mutex DICT_SYS created dict0dict.cc:1019, lock var 1

--Thread 140451898099456 has waited at ha_innodb.cc line 6080 for 13.00 seconds the semaphore:
Mutex at 0x7fbeb83e6bd8, Mutex DICT_SYS created dict0dict.cc:1019, lock var 1

--Thread 140457040971520 has waited at ha_innodb.cc line 6080 for 13.00 seconds the semaphore:
Mutex at 0x7fbeb83e6bd8, Mutex DICT_SYS created dict0dict.cc:1019, lock var 1

--Thread 140451897214720 has waited at ha_innodb.cc line 6080 for 13.00 seconds the semaphore:
Mutex at 0x7fbeb83e6bd8, Mutex DICT_SYS created dict0dict.cc:1019, lock var 1

--Thread 140456572004096 has waited at ha_innodb.cc line 6080 for 13.00 seconds the semaphore:
Mutex at 0x7fbeb83e6bd8, Mutex DICT_SYS created dict0dict.cc:1019, lock var 1

--Thread 140457043035904 has waited at ha_innodb.cc line 6080 for 13.00 seconds the semaphore:
Mutex at 0x7fbeb83e6bd8, Mutex DICT_SYS created dict0dict.cc:1019, lock var 1

--Thread 140451899279104 has waited at ha_innodb.cc line 5814 for 13.00 seconds the semaphore:
Mutex at 0x7fbeb83e6bd8, Mutex DICT_SYS created dict0dict.cc:1019, lock var 1

How to repeat:
Create 1 million to 40 million tables with MySQL 8.0.12.
Script to create 40M tables in parallel:
============================
#/bin/bash
function do_db {
        db_exist=$(mysql -A -s -Nbe "select 1 from information_schema.schemata where schema_name = '$db'")
        if [ "$db_exist" == "1" ]; then echo "Already exists $db"; return 0; fi; 
        mysql -vvv -e "create database $db";
        mysql -vvv $db -e "CREATE TABLESPACE $db ADD DATAFILE '$db.ibd' engine=InnoDB;"
        for i in {1..100}
        do
                table="CREATE TABLE sbtest$i ( id int(10) unsigned NOT NULL AUTO_INCREMENT, k int(10) unsigned NOT NULL DEFAULT '0', c varchar(120) NOT NULL DEFAULT '', pad varchar(60) NOT NULL DEFAULT '', PRIMARY KEY (id), KEY k_1 (k) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 tablespace $db;"
                mysql $db -e "$table"
        done
}
c=0
for m in {1..4000000}
do
        for i in {1..40}
        do
                let c=$c+1
                echo $c
                db="sbtest_$c"
                do_db &
        done
        wait
        #if [ $c > 4000000 ]; then exit; fi
done
============================

Run sysbench:
============================
function run_sb() {
conn=" --db-driver=mysql --mysql-socket=/var/lib/mysql-data/mysql.sock  --mysql-db=sbtest_1 --mysql-user=sbtest --mysql-password=abc "
sysbench $conn --oltp_db_count=$db_count --oltp_tables_count=$table_count --oltp-table-size=10000 --report-interval=1 --num-threads=$num_threads --max-requests=0 --max-time=$max_time ./select_custom.lua run | tee -a sysbench_2.txt

}

let db_count=400000
table_count=100
max_time=10000
num_threads=32
run_sb 
============================
[29 Aug 2018 13:46] Bogdan Kecman
Verified but I changed it to S5 (performance).

Thanks for the report
Bogdan
[1 Oct 2018 21:52] Charles Thompson
Any update on this? This is a very concerning issue for my company as we have millions of tables.
[24 Oct 2018 19:39] Alexander Rubin
Scripts to reproduce:

1. Create 20K+ tables first

#/bin/bash
function do_db {
        db_exist=$(mysql -A -s -Nbe "select 1 from information_schema.schemata where schema_name = '$db'")
        if [ "$db_exist" == "1" ]; then echo "Already exists $db"; return 0; fi; 
        mysql -vvv -e "create database $db";
        mysql -vvv $db -e "CREATE TABLESPACE $db ADD DATAFILE '$db.ibd' engine=InnoDB;"
        for i in {1..100}
        do
                table="CREATE TABLE sbtest$i ( id int(10) unsigned NOT NULL AUTO_INCREMENT, k int(10) unsigned NOT NULL DEFAULT '0', c varchar(120) NOT NULL DEFAULT '', pad varchar(60) NOT NULL DEFAULT '', PRIMARY KEY (id), KEY k_1 (k) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 tablespace $db;"
                mysql $db -e "$table"
        done
}
c=0
for m in {1..4000000}
do
        for i in {1..40}
        do
                let c=$c+1
                echo $c
                db="sbtest_$c"
                do_db &
        done
        wait
        #if [ $c > 4000000 ]; then exit; fi
done

2. Run this script:

##### shel script, sysbench_mysql.sh
#####

function run_sb() {
        conn=" --db-driver=mysql --mysql-socket=/var/run/mysqld/mysqld.sock  --mysql-db=mysql --mysql-user=sbtest --mysql-password=abc "
        sysbench $conn --oltp_db_count=$db_count --oltp_tables_count=$table_count --oltp-table-size=10000 --report-interval=1 --num-threads=$num_threads --max-requests=0 --max-time=$max_time ./select_custom.lua run | tee -a sysbench_2.txt
}
let db_count=70000
table_count=1000
max_time=10000
num_threads=32
run_sb

#### select_custom.lua
####

pathtest = "/usr/share/sysbench/tests/include/oltp_legacy/"
if pathtest then
   dofile(pathtest .. "common.lua")
else
   require("common")
end
function thread_init(thread_id)
   set_vars()
end
function event()
   local table_name
   local i
   local c_val
   local k_val
   local pad_val
   oltp_db_count = tonumber(oltp_db_count) or 1
   -- local oltp_db_count = 4
   table_name = "sbtest_" .. sb_rand(100, oltp_db_count)..".sbtest".. sb_rand(1, oltp_tables_count)
   k_val = sb_rand(1, oltp_table_size)
   c_val = sb_rand_str([[
###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]])
   pad_val = sb_rand_str([[
###########-###########-###########-###########-###########]])
      rs = db_query("SELECT id FROM " .. table_name .." LIMIT 1")
end
[24 Oct 2018 19:40] Alexander Rubin
Scripts to reproduce:

1. Create 20K+ tables first

#/bin/bash
function do_db {
        db_exist=$(mysql -A -s -Nbe "select 1 from information_schema.schemata where schema_name = '$db'")
        if [ "$db_exist" == "1" ]; then echo "Already exists $db"; return 0; fi; 
        mysql -vvv -e "create database $db";
        mysql -vvv $db -e "CREATE TABLESPACE $db ADD DATAFILE '$db.ibd' engine=InnoDB;"
        for i in {1..100}
        do
                table="CREATE TABLE sbtest$i ( id int(10) unsigned NOT NULL AUTO_INCREMENT, k int(10) unsigned NOT NULL DEFAULT '0', c varchar(120) NOT NULL DEFAULT '', pad varchar(60) NOT NULL DEFAULT '', PRIMARY KEY (id), KEY k_1 (k) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 tablespace $db;"
                mysql $db -e "$table"
        done
}
c=0
for m in {1..4000000}
do
        for i in {1..40}
        do
                let c=$c+1
                echo $c
                db="sbtest_$c"
                do_db &
        done
        wait
        #if [ $c > 4000000 ]; then exit; fi
done

2. Run this script:

##### shel script, sysbench_mysql.sh
#####

function run_sb() {
        conn=" --db-driver=mysql --mysql-socket=/var/run/mysqld/mysqld.sock  --mysql-db=mysql --mysql-user=sbtest --mysql-password=abc "
        sysbench $conn --oltp_db_count=$db_count --oltp_tables_count=$table_count --oltp-table-size=10000 --report-interval=1 --num-threads=$num_threads --max-requests=0 --max-time=$max_time ./select_custom.lua run | tee -a sysbench_2.txt
}
let db_count=70000
table_count=1000
max_time=10000
num_threads=32
run_sb

#### select_custom.lua
####

pathtest = "/usr/share/sysbench/tests/include/oltp_legacy/"
if pathtest then
   dofile(pathtest .. "common.lua")
else
   require("common")
end
function thread_init(thread_id)
   set_vars()
end
function event()
   local table_name
   local i
   local c_val
   local k_val
   local pad_val
   oltp_db_count = tonumber(oltp_db_count) or 1
   -- local oltp_db_count = 4
   table_name = "sbtest_" .. sb_rand(100, oltp_db_count)..".sbtest".. sb_rand(1, oltp_tables_count)
   k_val = sb_rand(1, oltp_table_size)
   c_val = sb_rand_str([[
###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]])
   pad_val = sb_rand_str([[
###########-###########-###########-###########-###########]])
      rs = db_query("SELECT id FROM " .. table_name .." LIMIT 1")
end
[27 Oct 2018 15:02] Alexander Rubin
perf top with mysql-debug:

  24.41%  mysqld-debug         [.] _Z23dict_table_add_to_cacheP12dict_table_tmP16mem_block_info_t                                                                                                          
  15.13%  mysqld-debug         [.] _ZNK15CheckInFreeListclEPK10buf_page_t                                                                                                                                  
  12.59%  mysqld-debug         [.] _Z16ut_list_validateI12ut_list_baseI10buf_page_tMS1_12ut_list_nodeIS1_EE15CheckInFreeListEvRKT_RT0_