Bug #89101 MySQL server hang when gtid_mode=on and innodb_thread_concurrency>0
Submitted: 4 Jan 2018 1:52 Modified: 18 Jan 2018 13:02
Reporter: Seunguck Lee Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DML Severity:S2 (Serious)
Version:5.7.20 OS:Any
Assigned to: CPU Architecture:Any
Tags: GTID, hang, thread_concurrency

[4 Jan 2018 1:52] Seunguck Lee
Description:
MySQL server hang when gtid_mode=on and innodb_thread_concurrency=16.

At first everything is okay but mysql server hang suddenly in 10~20 seconds.
After server hang, I can not run SELECT query also.

mysql> show processlist;
+------+-----------+-------------------+-----------+---------+-------+-------------+-------------------------------------------------------+
| Id   | User      | Host              | db        | Command | Time  | State       | Info                                                  |
+------+-----------+-------------------+-----------+---------+-------+-------------+-------------------------------------------------------+
|    5 | mysqlslap | xx.xx.xx.xx:34500 | mysqlslap | Query   |  2514 | updating    | update mysqlslap.mysqlslap set value='A' where id=@id |
|    6 | mysqlslap | xx.xx.xx.xx:34502 | mysqlslap | Query   |  2514 | query end   | update mysqlslap.mysqlslap set value='A' where id=@id |
|    7 | mysqlslap | xx.xx.xx.xx:34504 | mysqlslap | Query   |  2514 | updating    | update mysqlslap.mysqlslap set value='A' where id=@id |
|    8 | mysqlslap | xx.xx.xx.xx:34506 | mysqlslap | Query   |  2514 | query end   | update mysqlslap.mysqlslap set value='A' where id=@id |
|    9 | mysqlslap | xx.xx.xx.xx:34508 | mysqlslap | Query   |  2514 | query end   | update mysqlslap.mysqlslap set value='A' where id=@id |
|   10 | mysqlslap | xx.xx.xx.xx:34510 | mysqlslap | Query   |  2514 | updating    | update mysqlslap.mysqlslap set value='A' where id=@id |
|   11 | mysqlslap | xx.xx.xx.xx:34512 | mysqlslap | Query   |  2514 | updating    | update mysqlslap.mysqlslap set value='A' where id=@id |
|   12 | mysqlslap | xx.xx.xx.xx:34514 | mysqlslap | Query   |  2514 | query end   | update mysqlslap.mysqlslap set value='A' where id=@id |
|   13 | mysqlslap | xx.xx.xx.xx:34516 | mysqlslap | Query   |  2514 | updating    | update mysqlslap.mysqlslap set value='A' where id=@id |
|   14 | mysqlslap | xx.xx.xx.xx:34518 | mysqlslap | Query   |  2514 | updating    | update mysqlslap.mysqlslap set value='A' where id=@id |
...

mysql> select event_name, sum(timer_wait)/1000000000 from performance_schema.events_waits_current group by event_name order by sum(timer_wait) desc limit 10;
+----------------------------------------------+----------------------------+
| event_name                                   | sum(timer_wait)/1000000000 |
+----------------------------------------------+----------------------------+
| wait/io/table/sql/handler                    |             742480189.0890 |
| wait/synch/cond/sql/MYSQL_BIN_LOG::COND_done |             563307295.4995 |
| wait/synch/mutex/sql/MYSQL_BIN_LOG::LOCK_log |                871394.3280 |
| idle                                         |                710250.7020 |
| wait/synch/cond/sql/COND_compress_gtid_table |                446463.0986 |
| wait/synch/cond/sql/COND_server_started      |                    11.7853 |
| wait/synch/mutex/innodb/recalc_pool_mutex    |                     0.0006 |
| wait/synch/mutex/innodb/flush_list_mutex     |                     0.0003 |
| wait/synch/mutex/innodb/log_sys_mutex        |                     0.0003 |
| wait/synch/mutex/innodb/lock_wait_mutex      |                     0.0002 |
+----------------------------------------------+----------------------------+

How to repeat:
mysql> create database mysqlslap;
mysql> create table mysqlslap(id bigint not null auto_increment primary key, value varchar(10));
mysql> insert into mysqlslap values (NULL, ''); -- Repeat 4 Million

mysql> set global gtid_mode=on;
mysql> set global innodb_thread_concurrency=16;

# ./mysqlslap -hlocalhost -umysqlslap --password=xxx --delimiter=";" \
  --concurrency=3000 --number-of-queries=1000000000 --iterations=1 \
  --query="set @id=floor(rand()*4000000);update mysqlslap.mysqlslap set value='A' where id=@id; "

Actually, I changed mysqlslap to sleep several milli-seconds between query execution.
This is only for reducing server query frequency. I am sure this change is not affected or cause this issue.

Suggested fix:
Fix deadlock on COND_done and LOCK_done.
[5 Jan 2018 15:47] Przemyslaw Malkowski
Easy to reproduce with the following settings:

server_id = 111
log_bin = binlog
max_connections                     = 4000

binlog_format                       = STATEMENT
enforce_gtid_consistency            = ON
gtid_mode                           = ON
innodb_thread_concurrency           = 16
[12 Jan 2018 17:57] MySQL Verification Team
Hi!

Four million rows ????

Have you tried to repeat the behaviour with the smaller number of rows. I ask that because this is, most likely, expected behaviour with STATEMENT mode. That mode will be deprecated , anyway .....
[15 Jan 2018 14:50] Seunguck Lee
Hi...

"4-Million rows" is just test data. I mean inserting 4 million rows before run test case.

Regards,
[15 Jan 2018 15:50] Przemyslaw Malkowski
mtr test case

Attachment: suite.opt (application/octet-stream, text), 230 bytes.

[15 Jan 2018 15:51] Przemyslaw Malkowski
mtr test

Attachment: mtr_test_89101.test (application/octet-stream, text), 740 bytes.

[15 Jan 2018 15:51] Przemyslaw Malkowski
Hello Sinisa,

Please note that this issue is not about server slowness, but about permanent, complete dead lock, where once the condition is entered, you won't be able to do a simple select from table at all.
 
This is an example state, long after actually mysqlslap process is gone:
mysql [localhost] {msandbox} (mysqlslap) > select * from information_schema.processlist order by time desc limit 10;
+------+----------+-----------+-----------+---------+------+----------+-------------------------------------------------------+
| ID   | USER     | HOST      | DB        | COMMAND | TIME | STATE    | INFO                                                  |
+------+----------+-----------+-----------+---------+------+----------+-------------------------------------------------------+
| 5639 | msandbox | localhost | mysqlslap | Query   | 5841 | updating | update mysqlslap.mysqlslap set value='A' where id=@id |
| 3025 | msandbox | localhost | mysqlslap | Query   | 5840 | updating | update mysqlslap.mysqlslap set value='A' where id=@id |
| 3299 | msandbox | localhost | mysqlslap | Query   | 5840 | updating | update mysqlslap.mysqlslap set value='A' where id=@id |
| 3303 | msandbox | localhost | mysqlslap | Query   | 5840 | updating | update mysqlslap.mysqlslap set value='A' where id=@id |
| 3326 | msandbox | localhost | mysqlslap | Query   | 5840 | updating | update mysqlslap.mysqlslap set value='A' where id=@id |
| 3364 | msandbox | localhost | mysqlslap | Query   | 5840 | updating | update mysqlslap.mysqlslap set value='A' where id=@id |
| 3383 | msandbox | localhost | mysqlslap | Query   | 5840 | updating | update mysqlslap.mysqlslap set value='A' where id=@id |
| 3441 | msandbox | localhost | mysqlslap | Query   | 5840 | updating | update mysqlslap.mysqlslap set value='A' where id=@id |
| 3957 | msandbox | localhost | mysqlslap | Query   | 5840 | updating | update mysqlslap.mysqlslap set value='A' where id=@id |
| 3785 | msandbox | localhost | mysqlslap | Query   | 5840 | updating | update mysqlslap.mysqlslap set value='A' where id=@id |
+------+----------+-----------+-----------+---------+------+----------+-------------------------------------------------------+
10 rows in set (0.02 sec)

Please see the mtr test case, which I hope should allow to reproduce it easily.
As the hung condition is entered after a while, sometimes 2 minutes, sometimes 10 minutes, please let the test run for a while and then watch the TIME column for the mysqlslap updates, once it goes >30 seconds, you are likely hitting the issue already.
[18 Jan 2018 13:02] MySQL Verification Team
Hi!

After several attempts , I have managed to verify your test case.

So, this bug is verified.