Bug #98493 Server stalls and block connections with binlog_gtid_simple_recovery=1
Submitted: 5 Feb 2020 18:00 Modified: 20 Feb 2020 19:01
Reporter: Bernardo Perez Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.7.26 OS:Linux
Assigned to: MySQL Verification Team CPU Architecture:Any

[5 Feb 2020 18:00] Bernardo Perez
Description:
Hello,

we could observe that in 5.7.26 the server would block connections and the performance will degrade to be nearly non operative when the following conditions happened:

binlog_gtid_simple_recovery=1 
large amount of binlogs to purge (600GB)
purge binary log command come into the server

The issue presented itself right after a reboot. We initially thought it was related to https://bugs.mysql.com/bug.php?id=91548 but this bug is fixed for 5.7.25 as we had also some queries against system tables.

Once the purge command would complete, the connections would be established (if not timed out by the client) and the queries will complete until a new purge binary logs will come into the server

Disabling the parameter removes the problem immediately. 

How to repeat:
- Generate a large amount of binlogs.
- Set binlog_gtid_simple_recovery=1 
- run PURGE BINARY LOGS TO 
- Pile up a bunch of show slave status and queries to the system tables like the ones below.

mysql> select * from information_schema.processlist;
+----+----------+--------------------+------+---------+------+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID | USER     | HOST               | DB   | COMMAND | TIME | STATE          | INFO                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+----+----------+--------------------+------+---------+------+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 17 | admin | localhost          | NULL | Query   |    0 | executing      | select * from information_schema.processlist                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| 25 | pmm      | 10.32.81.205:49549 | NULL | Query   |  423 | starting       | SHOW SLAVE STATUS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| 26 | pmm      | 10.32.81.205:49551 | NULL | Query   |  418 | starting       | SHOW SLAVE STATUS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|  8 | admin | localhost          | NULL | Query   |  350 | starting       | SELECT @@GLOBAL.performance_schema                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
|  7 | admin | localhost:53144    | NULL | Query   |  435 | starting       | PURGE BINARY LOGS TO 'mysql-bin-changelog.042856'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| 24 | pmm      | 10.32.81.205:49548 | NULL | Query   |  433 | starting       | SHOW SLAVE STATUS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| 27 | pmm      | 10.32.81.205:49554 | NULL | Query   |  428 | starting       | SHOW SLAVE STATUS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| 29 | pmm      | 10.32.81.205:49562 | NULL | Query   |  415 | Sending data   | SHOW GLOBAL VARIABLES                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| 30 | pmm      | 10.32.81.205:49563 | NULL | Query   |  415 | Opening tables | SELECT t.table_schema, t.table_name, column_name, `auto_increment`,
		  pow(2, case data_type
		    when 'tinyint'   then 7
		    when 'smallint'  then 15
		    when 'mediumint' then 23
		    when 'int'       then 31
		    when 'bigint'    then 63
		    end+(column_type like '% unsigned'))-1 as max_int
		  FROM information_schema.columns c
		  STRAIGHT_JOIN information_schema.tables t
		    ON BINARY t.table_schema = c.table_schema AND BINARY t.table_name = c.table_name
		  WHERE c.extra = 'auto_increment' AND t.auto_increment IS NOT NULL |
+----+----------+--------------------+------+---------+------+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.00 sec)
[7 Feb 2020 18:19] Bernardo Perez
Apologies, I just realized that I did explained something wrong for the repro of the bug.

The parameter binlog_gtid_simple_recovery has to be set to 0.

binlog_gtid_simple_recovery=0 is key to reproduce the issue.
[13 Feb 2020 0:07] MySQL Verification Team
Hi Bernardo,

I am having issues reproducing this with 5.7.29 both with binlog_gtid_simple_recovery=1  and binlog_gtid_simple_recovery=0.

Can you give me more details on how to reproduce and maybe your complete config file.

Thanks
[13 Feb 2020 14:32] Bernardo Perez
Hello Bogdan, of course happy to help.

I've uploaded the configuration file.

Also to make emphasis as I created the bug with the wrong description, the parameter binlog_gtid_simple_recovery=0 is key to reproduce the issue.

Regards,
[13 Feb 2020 14:41] Bernardo Perez
A bit more detailed description of how this happened. 

We saw this in 5.7.26.
We just found the issue after a very large amount of binary logs where piling up (600GB). 
This was the master and had 5 slaves.
At some point the master crashed and it was never able to come back online allowing new connections. The process was up but everything was extremely slow.

At the same time that the "purge binary log to" command (try to purge a good set of them) we had the queries against the system tables that you can see on the creation description and it felt that the more queries the worse things where getting.

SHOW SLAVE STATUS
SHOW SLAVE STATUS
SELECT @@GLOBAL.performance_schema  
PURGE BINARY LOGS TO 'mysql-bin-changelog.042856' 
SHOW SLAVE STATUS                                                                                                                                                                                                                                                         
SHOW SLAVE STATUS                                                                                                                                                                                                                                                         
SHOW GLOBAL VARIABLES                                                                                                                                                                                                                                                      
SELECT t.table_schema, t.table_name, column_name, `auto_increment`,
		  pow(2, case data_type
		    when 'tinyint'   then 7
		    when 'smallint'  then 15
		    when 'mediumint' then 23
		    when 'int'       then 31
		    when 'bigint'    then 63
		    end+(column_type like '% unsigned'))-1 as max_int
		  FROM information_schema.columns c
		  STRAIGHT_JOIN information_schema.tables t
		    ON BINARY t.table_schema = c.table_schema AND BINARY t.table_name = c.table_name
		  WHERE c.extra = 'auto_increment' AND t.auto_increment IS NOT NULL 

The database would come to normal operation if we would change binlog_gtid_simple_recovery from 0 to 1.

When set to 1 it was performing correctly and problem was gone.
[13 Feb 2020 14:42] Bernardo Perez
Forgot to mention that the replicas had different lags so we could restore from different times.
[13 Feb 2020 15:37] MySQL Verification Team
Hi Bernardo,

I understand how it happened to you but I'm trying to reproduce the same scenario and I'm not managing to do so. It always works ok for me.