Bug #98493 | Server stalls and block connections with binlog_gtid_simple_recovery=1 | ||
---|---|---|---|
Submitted: | 5 Feb 2020 18:00 | Modified: | 20 Feb 2020 19:01 |
Reporter: | Bernardo Perez | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server | Severity: | S2 (Serious) |
Version: | 5.7.26 | OS: | Linux |
Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
[5 Feb 2020 18:00]
Bernardo Perez
[7 Feb 2020 18:19]
Bernardo Perez
Apologies, I just realized that I did explained something wrong for the repro of the bug. The parameter binlog_gtid_simple_recovery has to be set to 0. binlog_gtid_simple_recovery=0 is key to reproduce the issue.
[13 Feb 2020 0:07]
MySQL Verification Team
Hi Bernardo, I am having issues reproducing this with 5.7.29 both with binlog_gtid_simple_recovery=1 and binlog_gtid_simple_recovery=0. Can you give me more details on how to reproduce and maybe your complete config file. Thanks
[13 Feb 2020 14:32]
Bernardo Perez
Hello Bogdan, of course happy to help. I've uploaded the configuration file. Also to make emphasis as I created the bug with the wrong description, the parameter binlog_gtid_simple_recovery=0 is key to reproduce the issue. Regards,
[13 Feb 2020 14:41]
Bernardo Perez
A bit more detailed description of how this happened. We saw this in 5.7.26. We just found the issue after a very large amount of binary logs where piling up (600GB). This was the master and had 5 slaves. At some point the master crashed and it was never able to come back online allowing new connections. The process was up but everything was extremely slow. At the same time that the "purge binary log to" command (try to purge a good set of them) we had the queries against the system tables that you can see on the creation description and it felt that the more queries the worse things where getting. SHOW SLAVE STATUS SHOW SLAVE STATUS SELECT @@GLOBAL.performance_schema PURGE BINARY LOGS TO 'mysql-bin-changelog.042856' SHOW SLAVE STATUS SHOW SLAVE STATUS SHOW GLOBAL VARIABLES SELECT t.table_schema, t.table_name, column_name, `auto_increment`, pow(2, case data_type when 'tinyint' then 7 when 'smallint' then 15 when 'mediumint' then 23 when 'int' then 31 when 'bigint' then 63 end+(column_type like '% unsigned'))-1 as max_int FROM information_schema.columns c STRAIGHT_JOIN information_schema.tables t ON BINARY t.table_schema = c.table_schema AND BINARY t.table_name = c.table_name WHERE c.extra = 'auto_increment' AND t.auto_increment IS NOT NULL The database would come to normal operation if we would change binlog_gtid_simple_recovery from 0 to 1. When set to 1 it was performing correctly and problem was gone.
[13 Feb 2020 14:42]
Bernardo Perez
Forgot to mention that the replicas had different lags so we could restore from different times.
[13 Feb 2020 15:37]
MySQL Verification Team
Hi Bernardo, I understand how it happened to you but I'm trying to reproduce the same scenario and I'm not managing to do so. It always works ok for me.