| Bug #90137 | executed_gtid_set is empty after mysql crash | ||
|---|---|---|---|
| Submitted: | 20 Mar 2018 10:40 | Modified: | 23 May 2018 1:24 |
| Reporter: | Prasad N | Email Updates: | |
| Status: | Can't repeat | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
| Version: | 5.7.18 | OS: | CentOS (2.6.32-696.23.1.el6.x86_64 ) |
| Assigned to: | MySQL Verification Team | CPU Architecture: | x86 |
| Tags: | crash, execute_gtid_set, GTID | ||
[20 Mar 2018 10:49]
Prasad N
mysqld.log
Attachment: mysqld.log (application/octet-stream, text), 68.06 KiB.
[20 Mar 2018 21:52]
MySQL Verification Team
Hi, Can you please share your config files on both master and slave? I run a sample test and was unable to reproduce the problem so I'd like to try with your config to increase likelihood of reproducability. thanks Bogdan
[21 Mar 2018 8:28]
Prasad N
Hi - Config files are same on both master and salve. You may need to crash mysql few time before we can run into this issue. Thanks Prasad
[22 Mar 2018 21:45]
MySQL Verification Team
Hi I spent day and a half and cannot reproduce this issue. If you had some crashes before maybe your data is corrupted? I did crashed mysql several times trying to reproduce this but had no luck. Please rebuild your tables to be sure your data is not currupted and try to reproduce then all best Bogdan
[23 Mar 2018 4:51]
Prasad N
Hi - did you try with continuous loading on the mysql server ? My data seems to be ok based on the random checks I did. Will it help if I zip and share binary logs from my server ? Will it help to analyze further or debug ? Thanks!
[23 Mar 2018 8:39]
Prasad N
Probably it has got to do something with the behavior mentioned in the bug fix:https://bugs.mysql.com/bug.php?id=69097 Specifically: [14 Aug 2014 16:32] David Moss The following has been added to the 5.7.5 documentation with commit 39619: simplified_binlog_gtid_recovery By default, when MySQL recovers from a crash it iterates through binary log files searching for GTID events starting from the oldest file, which can take a long time if you have a large number of binary log files. By enabling this option, the newest binary log file is instead searched for GTID events. If a Previous_gtids_log_event and Gtid_log_event are found, gtid_executed and gtid_purged are initialized using these events as usual during recovery. If no GTID events are found, a second scan then searches the oldest binary log file for GTID events. If neither of these files contain the GTID events, no more binary log files are searched, and gtid_executed and gtid_purged are set to the empty string.
[23 Mar 2018 8:44]
Prasad N
In my system, mysql-bin.000001 is where most of the transactions are logged. Looks like it got rotated to mysql-bin.000002 after the crash. There have been subsequent rotates as well as wee see below. Interesting point however is the size difference between the files mysql-bin.000002 and those that follow it. Is it a case that mysql did not create the new binlog file correctly after the crash ? -rw-r----- 1 mysql mysql 352921108 Mar 20 06:14 mysql-bin.000001 -rw-r----- 1 mysql mysql 123 Mar 20 06:15 mysql-bin.000002 -rw-r----- 1 mysql mysql 177 Mar 20 06:40 mysql-bin.000003 -rw-r----- 1 mysql mysql 177 Mar 20 06:45 mysql-bin.000004
[23 Mar 2018 8:49]
Prasad N
I have pasted the output of 2nd and 3rd binlog files for comparison. Please investigate. [root@vmadf0264117 mysql]# mysqlbinlog mysql-bin.000002 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 4 #180320 6:14:02 server id 19 end_log_pos 123 CRC32 0x35bc7b93 Start: binlog v 4, server v 5.7.18-log created 180320 6:14:02 at startup ROLLBACK/*!*/; BINLOG ' qqawWg8TAAAAdwAAAHsAAAAAAAQANS43LjE4LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAACqprBaEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA AZN7vDU= '/*!*/; SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/; DELIMITER ; # End of log file /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/; [root@vmadf0264117 mysql]# [root@vmadf0264117 mysql]# mysqlbinlog mysql-bin.000003 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 4 #180320 6:15:31 server id 19 end_log_pos 123 CRC32 0x2129daba Start: binlog v 4, server v 5.7.18-log created 180320 6:15:31 at startup ROLLBACK/*!*/; BINLOG ' A6ewWg8TAAAAdwAAAHsAAAAAAAQANS43LjE4LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAADp7BaEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA AbraKSE= '/*!*/; # at 123 #180320 6:15:31 server id 19 end_log_pos 154 CRC32 0x560f7b18 Previous-GTIDs # [empty] # at 154 #180320 6:40:05 server id 19 end_log_pos 177 CRC32 0x7c916cbe Stop SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/; DELIMITER ; # End of log file /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/; [root@vmadf0264117 mysql]#
[23 Mar 2018 11:05]
MySQL Verification Team
Hi, > Hi - did you try with continuous loading on the mysql server ? yes > My data seems to be ok based on the random checks I did. if data is ok then having previous crashes should not have any effect on anything as when you start mysqld everything is "new" except the data, data is old, so unless there's something wrong with data previous state is not really relevant. when I say data, I mean anything in data-files, having bin able to do a select does not mean all data files are ok. > Will it help if I zip and share binary logs from my server ? > Will it help to analyze further or debug ? you can send and I will test but I can't promise anything, it might for sure help, especially if it's related to this gtid search you mentioned. thanks Bogdan
[4 Apr 2018 10:26]
Prasad N
I have uploaded my mysql data directory through sftp. file name is : mysql-bug-data-90137 Please check and let me know. Thanks Prasad
[21 May 2018 22:51]
MySQL Verification Team
Hi, This took a while but I can't reproduce this. Are you still hitting this issue? all best Bogdan
[23 May 2018 1:24]
Prasad N
yes we still saw this issue recently as well. this time the sequence was that mysql was crashed and started. it was taking a while to recover from the crash and there was a start attempted from another thread. Attaching the log snippet. Kindly investigate further.
[23 May 2018 1:27]
Prasad N
mysqld logs having this issue
Attachment: mysqld.log (application/octet-stream, text), 9.50 KiB.

Description: I did a kill-9 on mysql server, but when it came back up again - i found that it had lost track of its executed_gtid_set and it was empty. I noticed it indirectly as my replication had failed with an error: Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replica' Config and status details below: mysql> show master status \G; *************************** 1. row *************************** File: mysql-bin.000005 Position: 310 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: 9eed3f26-2bf7-11e8-93d9-000d3afd167e:1 1 row in set (0.00 sec) +++PS: GTID of 1 above is due to single transaction i executed manually after the server had lost all its GTID_executed info. Looks like it has restarted from 1 +------------------+-----------+ | Log_name | File_size | +------------------+-----------+ | mysql-bin.000001 | 352921108 | | mysql-bin.000002 | 123 | | mysql-bin.000003 | 177 | | mysql-bin.000004 | 177 | | mysql-bin.000005 | 310 | +------------------+-----------+ 5 rows in set (0.00 sec) mysql> show variables like 'gtid%'; +----------------------------------+-----------+ | Variable_name | Value | +----------------------------------+-----------+ | gtid_executed_compression_period | 1000 | | gtid_mode | ON | | gtid_next | AUTOMATIC | | gtid_owned | | | gtid_purged | | +----------------------------------+-----------+ 5 rows in set (0.00 sec) >>> my.cnf ---------- [mysqld] port=3306 skip_name_resolve=1 bind_address=0.0.0.0 user=mysql pid_file=/var/run/mysqld/mysqld.pid socket=/var/lib/mysql/mysql.sock server_id=19 log_bin=mysql-bin expire_logs_days=21 sync_binlog=1 innodb_log_files_in_group=2 innodb_flush_log_at_trx_commit=1 max_connect_errors=1000000 max_allowed_packet=16777216 max_heap_table_size=33554432 max_connections=500 thread_cache_size=50 open_files_limit=65535 table_open_cache=2048 table_definition_cache=2048 relay_log=relay-log-slave gtid_mode=ON enforce_gtid_consistency=ON binlog_format=MIXED log_slave_updates=true slave_net_timeout=60 master_info_repository=TABLE relay_log_info_repository=TABLE sync_master_info=10000 sync_relay_log=10000 relay_log_recovery=1 slave_parallel_workers=10 slave_preserve_commit_order=1 slave_parallel_type=LOGICAL_CLOCK rpl_semi_sync_master_timeout=300000 datadir=/mysql_data/mysql general_log=ON general_log_file=/var/log/mysql/general.log log_error=/var/log/mysql/mysqld.log default_storage_engine=innodb innodb_flush_method=O_DIRECT innodb_file_per_table=ON innodb_log_file_size=134217728 innodb_buffer_pool_size=1983905792 innodb_adaptive_hash_index=ON innodb_lock_wait_timeout=50 log_queries_not_using_indexes=OFF log_slow_admin_statements=OFF log_throttle_queries_not_using_indexes=0 long_query_time=10 slow_query_log=ON slow_query_log_file=/var/log/mysql//var/log/mysql/mysql-slowquery.log symbolic_links=0 interactive_timeout=28800 div_precision_increment=4 sql_mode="ONLY_FULL_GROUP_BY,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION" event_scheduler=OFF mysqlbinlog output [ last few lines on mysql-bin.00001 file] SET @@SESSION.GTID_NEXT= '9eed3f26-2bf7-11e8-93d9-000d3afd167e:36071'/*!*/; # at 10485354 #180320 4:57:08 server id 19 end_log_pos 10485435 CRC32 0x78007681 Query thread_id=520 exec_time=0 error_code=0 SET TIMESTAMP=1521521828/*!*/; BEGIN /*!*/; # at 10485435 #180320 4:57:08 server id 19 end_log_pos 10485549 CRC32 0xb6590563 Query thread_id=520 exec_time=0 error_code=0 SET TIMESTAMP=1521521828/*!*/; INSERT INTO persons VALUES('9bd8',359) /*!*/; # at 10485549 #180320 4:57:08 server id 19 end_log_pos 10485580 CRC32 0xdf644fa8 Xid = 38698 COMMIT/*!*/; # at 10485580 #180320 4:57:08 server id 19 end_log_pos 10485645 CRC32 0x0fa43ffd GTID last_committed=36017 sequence_number=36072 SET @@SESSION.GTID_NEXT= '9eed3f26-2bf7-11e8-93d9-000d3afd167e:36072'/*!*/; How to repeat: replica set with master and 2 slave , setup in semi-sync mode. writes on the maser happening from 100 threads once every 100 ms. kill -9 master mysql sesrver I ran into this issue. Suggested fix: server should come back up and maintain the gtid_executed info.