Bug #90137 executed_gtid_set is empty after mysql crash
Submitted: 20 Mar 2018 10:40 Modified: 23 May 2018 1:24
Reporter: Prasad N Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.7.18 OS:CentOS (2.6.32-696.23.1.el6.x86_64 )
Assigned to: Bogdan Kecman CPU Architecture:x86
Tags: crash, execute_gtid_set, GTID

[20 Mar 2018 10:40] Prasad N
Description:
I did a kill-9 on mysql server, but when it came back up again - i found that it had lost track of its executed_gtid_set and it was empty.
I noticed it indirectly as my replication had failed with an error:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replica'

Config and status details below:
mysql> show master status \G;
*************************** 1. row ***************************
             File: mysql-bin.000005
         Position: 310
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 9eed3f26-2bf7-11e8-93d9-000d3afd167e:1
1 row in set (0.00 sec)
+++PS: GTID of 1 above is due to single transaction i executed manually after the server had lost all its GTID_executed info. Looks like it has restarted from 1

+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql-bin.000001 | 352921108 |
| mysql-bin.000002 |       123 |
| mysql-bin.000003 |       177 |
| mysql-bin.000004 |       177 |
| mysql-bin.000005 |       310 |
+------------------+-----------+
5 rows in set (0.00 sec)

mysql> show variables like 'gtid%';
+----------------------------------+-----------+
| Variable_name                    | Value     |
+----------------------------------+-----------+
| gtid_executed_compression_period | 1000      |
| gtid_mode                        | ON        |
| gtid_next                        | AUTOMATIC |
| gtid_owned                       |           |
| gtid_purged                      |           |
+----------------------------------+-----------+
5 rows in set (0.00 sec)

>>> my.cnf
----------
[mysqld]

port=3306
skip_name_resolve=1
bind_address=0.0.0.0
user=mysql
pid_file=/var/run/mysqld/mysqld.pid
socket=/var/lib/mysql/mysql.sock
server_id=19
log_bin=mysql-bin
expire_logs_days=21
sync_binlog=1
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit=1
max_connect_errors=1000000
max_allowed_packet=16777216
max_heap_table_size=33554432
max_connections=500
thread_cache_size=50
open_files_limit=65535
table_open_cache=2048
table_definition_cache=2048
relay_log=relay-log-slave
gtid_mode=ON
enforce_gtid_consistency=ON
binlog_format=MIXED
log_slave_updates=true
slave_net_timeout=60
master_info_repository=TABLE
relay_log_info_repository=TABLE
sync_master_info=10000
sync_relay_log=10000
relay_log_recovery=1
slave_parallel_workers=10
slave_preserve_commit_order=1
slave_parallel_type=LOGICAL_CLOCK
rpl_semi_sync_master_timeout=300000
datadir=/mysql_data/mysql
general_log=ON
general_log_file=/var/log/mysql/general.log
log_error=/var/log/mysql/mysqld.log
default_storage_engine=innodb
innodb_flush_method=O_DIRECT
innodb_file_per_table=ON
innodb_log_file_size=134217728
innodb_buffer_pool_size=1983905792
innodb_adaptive_hash_index=ON
innodb_lock_wait_timeout=50
log_queries_not_using_indexes=OFF
log_slow_admin_statements=OFF
log_throttle_queries_not_using_indexes=0
long_query_time=10
slow_query_log=ON
slow_query_log_file=/var/log/mysql//var/log/mysql/mysql-slowquery.log
symbolic_links=0
interactive_timeout=28800
div_precision_increment=4
sql_mode="ONLY_FULL_GROUP_BY,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
event_scheduler=OFF

mysqlbinlog output [ last few lines on mysql-bin.00001 file]
SET @@SESSION.GTID_NEXT= '9eed3f26-2bf7-11e8-93d9-000d3afd167e:36071'/*!*/;
# at 10485354
#180320  4:57:08 server id 19  end_log_pos 10485435 CRC32 0x78007681    Query   thread_id=520   exec_time=0     error_code=0
SET TIMESTAMP=1521521828/*!*/;
BEGIN
/*!*/;
# at 10485435
#180320  4:57:08 server id 19  end_log_pos 10485549 CRC32 0xb6590563    Query   thread_id=520   exec_time=0     error_code=0
SET TIMESTAMP=1521521828/*!*/;
INSERT INTO persons VALUES('9bd8',359)
/*!*/;
# at 10485549
#180320  4:57:08 server id 19  end_log_pos 10485580 CRC32 0xdf644fa8    Xid = 38698
COMMIT/*!*/;
# at 10485580
#180320  4:57:08 server id 19  end_log_pos 10485645 CRC32 0x0fa43ffd    GTID    last_committed=36017    sequence_number=36072
SET @@SESSION.GTID_NEXT= '9eed3f26-2bf7-11e8-93d9-000d3afd167e:36072'/*!*/;

How to repeat:
replica set with master and 2 slave , setup in semi-sync mode.
writes on the maser happening from 100 threads once every 100 ms.
kill -9 master mysql sesrver

I ran into this issue.

Suggested fix:
server should come back up and maintain the gtid_executed info.
[20 Mar 2018 10:49] Prasad N
mysqld.log

Attachment: mysqld.log (application/octet-stream, text), 68.06 KiB.

[20 Mar 2018 21:52] Bogdan Kecman
Hi,
Can you please share your config files on both master and slave?
I run a sample test and was unable to reproduce the problem so I'd like to try with your config to increase likelihood of reproducability.

thanks
Bogdan
[21 Mar 2018 8:28] Prasad N
Hi - Config files are same on both master and salve.
You may need to crash mysql few time before we can run into this issue.

Thanks
Prasad
[22 Mar 2018 21:45] Bogdan Kecman
Hi I spent day and a half and cannot reproduce this issue. If you had some crashes before maybe your data is corrupted? I did crashed mysql several times trying to reproduce this but had no luck. Please rebuild your tables to be sure your data is not currupted and try to reproduce then

all best
Bogdan
[23 Mar 2018 4:51] Prasad N
Hi - did you try with continuous loading on the mysql server ?
My data seems to be ok based on the random checks I did.
Will it help if I zip and share binary logs from my server ? Will it help to analyze further or debug ?

Thanks!
[23 Mar 2018 8:39] Prasad N
Probably it has got to do something with the behavior mentioned in the bug fix:https://bugs.mysql.com/bug.php?id=69097
Specifically:
[14 Aug 2014 16:32] David Moss
The following has been added to the 5.7.5 documentation with commit 39619:
simplified_binlog_gtid_recovery
By default, when MySQL recovers from a crash it iterates through binary log files searching for GTID events starting from the oldest file, which can take a long time if you have a large number of binary log files. By enabling this option, the newest binary log file is instead searched for GTID events. If a Previous_gtids_log_event and Gtid_log_event are found, gtid_executed and gtid_purged are initialized using these events as usual during recovery. If no GTID events are found, a second scan then searches the oldest binary log file for GTID events. If neither of these files contain the GTID events, no more binary log files are searched, and gtid_executed and gtid_purged are set to the empty string.
[23 Mar 2018 8:44] Prasad N
In my system, mysql-bin.000001 is where most of the transactions are logged. Looks like it got rotated to mysql-bin.000002 after the crash. There have been subsequent rotates as well as wee see below. Interesting point however is the size difference between the files mysql-bin.000002 and those that follow it.
Is it a case that mysql did not create the new binlog file correctly after the crash ?

-rw-r----- 1 mysql mysql 352921108 Mar 20 06:14 mysql-bin.000001
-rw-r----- 1 mysql mysql       123 Mar 20 06:15 mysql-bin.000002
-rw-r----- 1 mysql mysql       177 Mar 20 06:40 mysql-bin.000003
-rw-r----- 1 mysql mysql       177 Mar 20 06:45 mysql-bin.000004
[23 Mar 2018 8:49] Prasad N
I have pasted the output of 2nd and 3rd binlog files for comparison. Please investigate.

[root@vmadf0264117 mysql]# mysqlbinlog mysql-bin.000002
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#180320  6:14:02 server id 19  end_log_pos 123 CRC32 0x35bc7b93         Start: binlog v 4, server v 5.7.18-log created 180320  6:14:02 at startup
ROLLBACK/*!*/;
BINLOG '
qqawWg8TAAAAdwAAAHsAAAAAAAQANS43LjE4LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAACqprBaEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA
AZN7vDU=
'/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@vmadf0264117 mysql]#

[root@vmadf0264117 mysql]# mysqlbinlog mysql-bin.000003
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#180320  6:15:31 server id 19  end_log_pos 123 CRC32 0x2129daba         Start: binlog v 4, server v 5.7.18-log created 180320  6:15:31 at startup
ROLLBACK/*!*/;
BINLOG '
A6ewWg8TAAAAdwAAAHsAAAAAAAQANS43LjE4LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAADp7BaEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA
AbraKSE=
'/*!*/;
# at 123
#180320  6:15:31 server id 19  end_log_pos 154 CRC32 0x560f7b18         Previous-GTIDs
# [empty]
# at 154
#180320  6:40:05 server id 19  end_log_pos 177 CRC32 0x7c916cbe         Stop
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@vmadf0264117 mysql]#
[23 Mar 2018 11:05] Bogdan Kecman
Hi,

> Hi - did you try with continuous loading on the mysql server ?

yes

> My data seems to be ok based on the random checks I did.

if data is ok then having previous crashes should not have any effect on anything as when you start mysqld everything is "new" except the data, data is old, so unless there's something wrong with data previous state is not really relevant. when I say data, I mean anything in data-files, having bin able to do a select does not mean all data files are ok.

> Will it help if I zip and share binary logs from my server ? 
> Will it help to analyze further or debug ?

you can send and I will test but I can't promise anything, it might for sure help, especially if it's related to this gtid search you mentioned.

thanks
Bogdan
[4 Apr 2018 10:26] Prasad N
I have uploaded my mysql data directory through sftp.
file name is : mysql-bug-data-90137

Please check and let me know.

Thanks
Prasad
[21 May 2018 22:51] Bogdan Kecman
Hi,

This took a while but I can't reproduce this.

Are you still hitting this issue?

all best
Bogdan
[23 May 2018 1:24] Prasad N
yes we still saw this issue recently as well.
this time the sequence was that mysql was crashed and started. it was taking a while to recover from the crash and there was a start attempted from another thread.
Attaching the log snippet.

Kindly investigate further.
[23 May 2018 1:27] Prasad N
mysqld logs having this issue

Attachment: mysqld.log (application/octet-stream, text), 9.50 KiB.