Bug #71037 can not start mysql when no Previous_gtids_log_event in the binlog
Submitted: 29 Nov 2013 8:07 Modified: 27 May 2014 8:52
Reporter: li pickup (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.6.14 OS:Linux (rhel 6.2)
Assigned to: Venkatesh Duggirala CPU Architecture:Any
Tags: crash, GTID, Previous_gtids_log_event

[29 Nov 2013 8:07] li pickup
Description:
MySQL switch the binlog and does not write Previous_gtids_log_event in binlog when something got wrong internel。
The slave will not go further and just switch relaylog forever.
And We restart MySQL manually, it Refuses to start because there is no Previous_gtids_log_event in binlog when in gtid-mode=on and enforce-gtid-consistency=true.

How to repeat:
can not find ways to repeat it.

Suggested fix:
May be shutdown whole MySQL in such critical error.

or change the source code: sql/binlog.cc:3092 of MYSQL_BIN_LOG::open_binlog

/*
    We need to revisit this code and improve it.
    See further comments in the mysqld.
    /Alfranio
  */
  if (current_thd && gtid_mode > 0)
  {
    if (need_sid_lock)
      global_sid_lock->wrlock();
    else
      global_sid_lock->assert_some_wrlock();
    Previous_gtids_log_event prev_gtids_ev(previous_gtid_set);
    if (need_sid_lock)
      global_sid_lock->unlock();
    prev_gtids_ev.checksum_alg= s.checksum_alg;
    if (prev_gtids_ev.write(&log_file))
      goto err;
    bytes_written+= prev_gtids_ev.data_written;
  }

not care about the "current_thd" , just write the Previous_gtids_log_event in the binlog
[29 Nov 2013 8:08] li pickup
error message and binlog info

Attachment: error_message_and_binlog_info (application/octet-stream, text), 10.79 KiB.

[29 Nov 2013 8:15] li pickup
change category
[29 Nov 2013 15:16] MySQL Verification Team
Li,

We do require a repeatable test case in order to verify the bug. This is required because only when we repeat the problem we can find it's cause and without proper cause there can not be a fix.
[29 Nov 2013 17:37] Sveta Smirnova
In addition to the Sinisa's request I could not repeat described behavior after I manually imitated skipping Previous-GTIDs event:

[sveta@delly mysql-5.6]$ ./client/mysqlbinlog mysql-test/var/mysqld.1/data/master-bin.000001 | grep -i gtid
#131129 19:32:17 server id 1  end_log_pos 151 CRC32 0x2f9eaa3b 	Previous-GTIDs
#131129 19:32:18 server id 1  end_log_pos 199 CRC32 0x9d334c46 	GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '3061cbee-591c-11e3-87ee-a088b4923078:1'/*!*/;
#131129 19:32:18 server id 1  end_log_pos 344 CRC32 0x801a8886 	GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '3061cbee-591c-11e3-87ee-a088b4923078:2'/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
[sveta@delly mysql-5.6]$ ./client/mysqlbinlog mysql-test/var/mysqld.1/data/master-bin.000002 | grep -i gtid
#131129 19:32:18 server id 1  end_log_pos 168 CRC32 0x880ab327 	GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '3061cbee-591c-11e3-87ee-a088b4923078:3'/*!*/;

show slave status;
...
Relay_Log_File	slave-relay-bin.000004
Relay_Log_Pos	643
Relay_Master_Log_File	master-bin.000002
...
sleep(10);
show slave status;
...
Relay_Log_File	slave-relay-bin.000004
Relay_Log_Pos	643
Relay_Master_Log_File	master-bin.000002

So question is what was this kind of corruption which lead to such behavior? Looks like we not only need repeatable test case of replicating unnecessary relay log rotation, but a test case which leads to binary log corruption.
[16 Dec 2013 4:45] li pickup
It occasionally occured in my enviorment. 
I tried tens times but can not repeat it.
I will still try to repeat it.
[16 Dec 2013 14:28] MySQL Verification Team
Thank you for your feedback, Li.

Also, thank you for your patch, but we can not apply it until we have a repeatable test case.

We will be patiently waiting on your feedback.
[18 Dec 2013 15:59] Omri Bahumi
I think I found what triggers it on my installation. Can you try sending a SIGHUP to your mysqld?
[18 Dec 2013 16:27] MySQL Verification Team
Sending SIGHUP to master or to slave ???

After sighup is sent, MySQL server should be shutdown  and restarted ???

Also, do you get same effect if issuing : FLUSH LOGS command ???

Last, but not least, does MySQL server refuse to restart always after issuing SIGHUP ???
[18 Dec 2013 16:49] Omri Bahumi
> Sending SIGHUP to master or to slave ???
I'm experiencing this problem regardless to having a slave.

> After sighup is sent, MySQL server should be shutdown  and restarted ???
When I send a SIGHUP, the logs are rotated and the new log file that doesn't contain "Previous-GTIDs" is created.
If you want to see the error, stop and start mysql afterwards.

> Also, do you get same effect if issuing : FLUSH LOGS command ???
No. FLUSH LOGS works fine.

> Last, but not least, does MySQL server refuse to restart always after issuing SIGHUP ???
I'm 100% positive about that, I'll have to perform some more tests.
[18 Dec 2013 22:07] Omri Bahumi
Small correction for my last comment: I meant I'm NOT 100% positive
[19 Dec 2013 17:37] MySQL Verification Team
Hi!

I have spent a lot of time on testing your SIGHUP idea, with all options as you have presented here. I tested both 5.6.12 and 5.6.14.

I ran 3 parallel, concurrent client processes, each running a script with some very large DDL and DML statements. While running all these processes, I applied several times SIGHUP. Each SIGHUP produced a number of reports, as it is expected of it. 

I then stopped the server and re-started it. It worked like a charm, without any problems and without even a hick-up !!! I then repeated entire procedure two more times. No problems.

Please, do understand that we only accept 100 % fully repeatable test cases. We will not try to repeat a test case again, unless you are 100 percent certain and convinced that it is fully repeatable , with the options, version and environment.

We simply do not have that many resources available.
[25 Mar 2014 10:50] Evgeny Makarov
I was able to reproduce the issue with version 5.6.16 on debian.
Sending SIGHUP signal with kill -1 makes binlogs rotate and new binlog misses Pervious_gtids section (while previous logs have it; it's easy to check with mysqlbginlog). If any transaction is written to the binlog then on next restart you get an error: 
"The first global transaction identifier was read, but no other information regarding identifiers existing on the previous log files was found."
and the server refuses to start.

sql/mysqld.cc:3219
    case SIGHUP:
      if (!abort_loop)
      {
        int not_used;
  mysql_print_status();   // Print some debug info
  reload_acl_and_cache((THD*) 0,
           (REFRESH_LOG | REFRESH_TABLES | REFRESH_FAST |
            REFRESH_GRANT |
            REFRESH_THREADS | REFRESH_HOSTS),
           (TABLE_LIST*) 0, &not_used); // Flush logs
      }
  ......

reload_acl_and_cache is called and eventually this code is executed in sql/binlog.cc:3135
  if (current_thd && gtid_mode > 0)
  {
    if (need_sid_lock)
      global_sid_lock->wrlock();
    else
      global_sid_lock->assert_some_wrlock();
    Previous_gtids_log_event prev_gtids_ev(previous_gtid_set);
    if (need_sid_lock)
      global_sid_lock->unlock();
    prev_gtids_ev.checksum_alg= s.checksum_alg;
    if (prev_gtids_ev.write(&log_file))
      goto err;
    bytes_written+= prev_gtids_ev.data_written;
  }
.....

Looks like current_thd is not defined here at this stage.

This issue really affects us since SIGHUP signal is sent after every mysql version upgrade on the first normal start. Mysql server starts normally, transactions are written to logically corrupted binlog, and on next restart the server fails.
[25 Mar 2014 11:40] Venkatesh Duggirala
Hello, 

Thank you for the bug report and feedback.
Verified as described.

Thanks & Regards,
Venkatesh.
[10 Apr 2014 3:56] li pickup
Thanks very much for verify this bug, 
I am look forward to fix the bug in recent mysql release.
[27 May 2014 8:52] Erlend Dahl
[23 May 2014 5:19] Jon Stephens 

Documented fix as follows in the 5.6.19 and 5.7.5 changelogs:

        When the binary log was rotated due to receipt of a SIGHUP
        signal, the new binary log did not contain the
        Previous_gtid_event required for subsequent processing of that
        binary log's GTID events. Now when SIGHUP is received, steps are
        taken to insure that the server writes the necessary
        Previous_gtid_event to the new log before writing any GTID
        events to the new log.

Fixed under the heading of Bug#17026898 PREVIOUS GTID EVENT IS NOT WRITTEN WHEN BINLOG IS ROTATED VIA SIGHUP