Bug #86643 Unexpected GTID fatal error after slave server restarted
Submitted: 9 Jun 2017 23:39 Modified: 17 Jul 2017 19:02
Reporter: monty solomon Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S2 (Serious)
Version:5.7.18 OS:CentOS
Assigned to: MySQL Verification Team CPU Architecture:Any

[9 Jun 2017 23:39] monty solomon
Description:
Slave replication fails with error after restart

                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

How to repeat:
Take a dump of the master

mysqldump --master-data=1 --single-transaction --all-databases --triggers --routines --events -r dump.sql

Copy the dump to the slave, load the dump, update and start replication

mysql -e "stop slave; reset master; reset slave all"

mysql < dump.sql

mysql -e "CHANGE MASTER TO MASTER_HOST='----master-host-name----', MASTER_AUTO_POSITION = 1, MASTER_CONNECT_RETRY=15, MASTER_RETRY_COUNT=0, MASTER_HEARTBEAT_PERIOD=1 , MASTER_USER='----replication-user----', MASTER_PASSWORD='----redacted----'"
mysql -e "start slave"
mysql -e "SET sql_log_bin = 0; SET GLOBAL sync_binlog=0; SET GLOBAL innodb_flush_log_at_trx_commit=2"

Restart the mysql service and observe that the slave fails

                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
[9 Jun 2017 23:40] monty solomon
Master status from master

mysql> show master status\G
*************************** 1. row ***************************
             File: bin.000499
         Position: 445847854
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 06970d75-3105-11e7-9e18-1228337aa004:1-2368691277,
0e6d000d-16e1-11e6-a80c-0a48c05654c7:1-11939041,
12459d26-f870-11e5-a18b-12544ded6a95:1-5628574,
2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144095557,
85ebe027-a5b6-11e5-861c-129ccaa4cbf7:1-4991016,
c7258e28-4c06-11e7-b189-0a26f8645d98:1-299609495,
d2ecb5d0-3a7e-11e6-a6bb-0a0ad9f3138d:1-71377440,
d7fbf9fd-81fd-11e6-9f00-1244bf4cd67b:1-975482865
1 row in set (0.00 sec)
[9 Jun 2017 23:40] monty solomon
Master and slave status from slave

mysql> show master status\G show slave status\G
*************************** 1. row ***************************
             File: bin.000002
         Position: 154
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 06970d75-3105-11e7-9e18-1228337aa004:1-2368691277,
0e6d000d-16e1-11e6-a80c-0a48c05654c7:1-11939041,
12459d26-f870-11e5-a18b-12544ded6a95:1-5628574,
2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144095557,
85ebe027-a5b6-11e5-861c-129ccaa4cbf7:1-4991016,
c7258e28-4c06-11e7-b189-0a26f8645d98:1-298987149,
d2ecb5d0-3a7e-11e6-a6bb-0a0ad9f3138d:1-71377440,
d7fbf9fd-81fd-11e6-9f00-1244bf4cd67b:1-975482865
1 row in set (0.00 sec)

*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: master-host-name
                  Master_User: replication-user
                  Master_Port: 3306
                Connect_Retry: 15
              Master_Log_File: bin.000499
          Read_Master_Log_Pos: 556592155
               Relay_Log_File: relay.000018
                Relay_Log_Pos: 448309371
        Relay_Master_Log_File: bin.000498
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 448309210
              Relay_Log_Space: 1630372811
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 420
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 293473754
                  Master_UUID: c7258e28-4c06-11e7-b189-0a26f8645d98
             Master_Info_File: /opt/mysql/dbdata1/monty-p0/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: System lock
           Master_Retry_Count: 0
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: c7258e28-4c06-11e7-b189-0a26f8645d98:295407398-299677441
            Executed_Gtid_Set: 06970d75-3105-11e7-9e18-1228337aa004:1-2368691277,
0e6d000d-16e1-11e6-a80c-0a48c05654c7:1-11939041,
12459d26-f870-11e5-a18b-12544ded6a95:1-5628574,
2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144095557,
85ebe027-a5b6-11e5-861c-129ccaa4cbf7:1-4991016,
c7258e28-4c06-11e7-b189-0a26f8645d98:1-298987149,
d2ecb5d0-3a7e-11e6-a6bb-0a0ad9f3138d:1-71377440,
d7fbf9fd-81fd-11e6-9f00-1244bf4cd67b:1-975482865
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)
[9 Jun 2017 23:42] monty solomon
The master still contains every binary log file

# ls -ltr | grep bin.000001
-rw-r----- 1 mysql mysql   23118812 Jun  8 08:01 bin.000001
[14 Jun 2017 3:16] MySQL Verification Team
Hi Monty

What does show binary logs on master retuns?
https://dev.mysql.com/doc/refman/5.7/en/show-binary-logs.html

Can you get us the full master and slave config's? I did a basic test (dump, restore, start replication) and it works as intended so need to see what is different with your setup.

all best
Bogdan
[19 Jun 2017 23:15] monty solomon
mysql> show binary logs;
+------------+-----------+
| Log_name   | File_size |
+------------+-----------+
| bin.001409 | 866185164 |
| bin.001410 | 672440224 |
| bin.001411 | 800033989 |
| bin.001412 | 617813323 |
| bin.001413 | 860691737 |
| bin.001414 | 596352151 |
| bin.001415 | 880646336 |
| bin.001416 | 765658722 |
| bin.001417 | 692453488 |
| bin.001418 | 742909521 |
| bin.001419 | 561598638 |
| bin.001420 | 905638958 |
| bin.001421 | 641816628 |
| bin.001422 | 756218758 |
| bin.001423 | 753291340 |
| bin.001424 | 782835227 |
| bin.001425 | 704879739 |
| bin.001426 | 675027282 |
| bin.001427 | 732744385 |
| bin.001428 | 918445716 |
| bin.001429 | 459555713 |
| bin.001430 | 807563132 |
| bin.001431 | 646384915 |
| bin.001432 | 914395885 |
| bin.001433 | 642919908 |
| bin.001434 | 750569824 |
| bin.001435 | 882150040 |
| bin.001436 | 718876656 |
| bin.001437 | 629795080 |
| bin.001438 | 716788365 |
| bin.001439 | 743691025 |
| bin.001440 | 742143254 |
| bin.001441 | 632827976 |
| bin.001442 | 816254951 |
| bin.001443 | 871902755 |
| bin.001444 | 642311172 |
| bin.001445 | 765164526 |
| bin.001446 | 802535740 |
| bin.001447 | 564123988 |
| bin.001448 | 769210267 |
| bin.001449 | 858489057 |
| bin.001450 | 646722541 |
| bin.001451 | 868073512 |
| bin.001452 | 579533571 |
| bin.001453 | 707953109 |
| bin.001454 | 952043309 |
| bin.001455 | 560039859 |
| bin.001456 | 789984004 |
| bin.001457 | 798889518 |
| bin.001458 | 658612977 |
| bin.001459 | 853577853 |
| bin.001460 | 607896611 |
| bin.001461 | 893601103 |
| bin.001462 | 801863151 |
| bin.001463 | 585858075 |
| bin.001464 | 866033786 |
| bin.001465 | 757115735 |
| bin.001466 | 636977186 |
| bin.001467 | 869798182 |
| bin.001468 | 711143505 |
| bin.001469 | 723821828 |
| bin.001470 | 695065178 |
| bin.001471 | 698826488 |
| bin.001472 | 892714438 |
| bin.001473 | 621548155 |
| bin.001474 | 797614053 |
| bin.001475 | 775442830 |
| bin.001476 | 820733067 |
| bin.001477 | 654908280 |
| bin.001478 | 922093926 |
| bin.001479 | 571247483 |
| bin.001480 | 813531278 |
| bin.001481 | 842770141 |
| bin.001482 | 828745017 |
| bin.001483 | 584958960 |
| bin.001484 | 832204014 |
| bin.001485 | 682241104 |
| bin.001486 | 920797393 |
| bin.001487 | 655620763 |
| bin.001488 | 840549349 |
| bin.001489 | 858010039 |
| bin.001490 | 739869895 |
| bin.001491 | 725410692 |
| bin.001492 | 891803492 |
| bin.001493 | 722156112 |
| bin.001494 | 918193010 |
| bin.001495 | 787420631 |
| bin.001496 | 721585091 |
| bin.001497 | 833596581 |
| bin.001498 | 722773077 |
| bin.001499 | 905686860 |
| bin.001500 | 726545947 |
| bin.001501 | 694116210 |
| bin.001502 | 863262562 |
| bin.001503 | 851660829 |
| bin.001504 | 778240809 |
| bin.001505 | 529867689 |
| bin.001506 | 325561474 |
+------------+-----------+
98 rows in set (0.01 sec)
[19 Jun 2017 23:16] monty solomon
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: sweet-hyena
                  Master_User: SUSR_Repl
                  Master_Port: 3306
                Connect_Retry: 15
              Master_Log_File: bin.001506
          Read_Master_Log_Pos: 360219678
               Relay_Log_File: relay.000003
                Relay_Log_Pos: 4
        Relay_Master_Log_File: bin.001506
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 360219678
              Relay_Log_Space: 154967290
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 293473754
                  Master_UUID: c7258e28-4c06-11e7-b189-0a26f8645d98
             Master_Info_File: /opt/mysql/dbdata1/monty-p0/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 0
                  Master_Bind: 
      Last_IO_Error_Timestamp: 170619 23:15:03
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 06970d75-3105-11e7-9e18-1228337aa004:1-2368691277,
0e6d000d-16e1-11e6-a80c-0a48c05654c7:1-11939041,
12459d26-f870-11e5-a18b-12544ded6a95:1-5628574,
2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144061263:144062729-144095557,
85ebe027-a5b6-11e5-861c-129ccaa4cbf7:1-4991016,
c7258e28-4c06-11e7-b189-0a26f8645d98:1-1086581339:1086886492-1087122149,
d2ecb5d0-3a7e-11e6-a6bb-0a0ad9f3138d:1-71377440,
d7fbf9fd-81fd-11e6-9f00-1244bf4cd67b:1-975482865
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)
[19 Jun 2017 23:18] monty solomon
mysql> show master status\G
*************************** 1. row ***************************
             File: bin.001506
         Position: 461972797
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 06970d75-3105-11e7-9e18-1228337aa004:1-2368691277,
0e6d000d-16e1-11e6-a80c-0a48c05654c7:1-11939041,
12459d26-f870-11e5-a18b-12544ded6a95:1-5628574,
2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144095557,
85ebe027-a5b6-11e5-861c-129ccaa4cbf7:1-4991016,
c7258e28-4c06-11e7-b189-0a26f8645d98:1-1087282811,
d2ecb5d0-3a7e-11e6-a6bb-0a0ad9f3138d:1-71377440,
d7fbf9fd-81fd-11e6-9f00-1244bf4cd67b:1-975482865
1 row in set (0.01 sec)
[20 Jun 2017 13:03] monty solomon
Bogdan,

The dump, restore, start replication steps work. I see the failures after waiting for replication to catch up and restarting the slave.
[20 Jun 2017 14:50] MySQL Verification Team
Hi,

MySQL show that it's aware of binlogs from 001409 up, the first 1408 binlogs are not visible. You might want to check the index file too.

Anyhow the GTID is not in the 1409+ binlogs so you get the error.

Questions are 
- the dump you restored, what was the position for it?
- when you do new dump/restore/restart replication do you have the same problem again

I see you write that "I see the failures after waiting for replication to catch up and restarting the slave"
Two questions about this 
 - your slave start to lag behind master, it just lags or after a while you get the same error (purged master no gtid there)
 - when you restart slave, every time you restart slave you get this message, sometimes when you restart slave you get this message?

Thanks
Bogdan
[20 Jun 2017 18:36] monty solomon
You wrote "the GTID is not in the 1409+ binlogs so you get the error" but the slave starts up fine after the restore. The slave then fails to start after the restart. The master contains the same binlogs for the restore and the subsequent restart.

The problem occurs each time I dump/copy/restore/start/restart.

The slave is lagged when it is started after the restore due to the time involved in dumping, copying and restoring.

The slave fails to start after each restart.
[20 Jun 2017 18:38] monty solomon
Before restart of slave from master

mysql> show binary logs;
+------------+-----------+
| Log_name   | File_size |
+------------+-----------+
| bin.001449 | 858489057 |
| bin.001450 | 646722541 |
| bin.001451 | 868073512 |
| bin.001452 | 579533571 |
| bin.001453 | 707953109 |
| bin.001454 | 952043309 |
| bin.001455 | 560039859 |
| bin.001456 | 789984004 |
| bin.001457 | 798889518 |
| bin.001458 | 658612977 |
| bin.001459 | 853577853 |
| bin.001460 | 607896611 |
| bin.001461 | 893601103 |
| bin.001462 | 801863151 |
| bin.001463 | 585858075 |
| bin.001464 | 866033786 |
| bin.001465 | 757115735 |
| bin.001466 | 636977186 |
| bin.001467 | 869798182 |
| bin.001468 | 711143505 |
| bin.001469 | 723821828 |
| bin.001470 | 695065178 |
| bin.001471 | 698826488 |
| bin.001472 | 892714438 |
| bin.001473 | 621548155 |
| bin.001474 | 797614053 |
| bin.001475 | 775442830 |
| bin.001476 | 820733067 |
| bin.001477 | 654908280 |
| bin.001478 | 922093926 |
| bin.001479 | 571247483 |
| bin.001480 | 813531278 |
| bin.001481 | 842770141 |
| bin.001482 | 828745017 |
| bin.001483 | 584958960 |
| bin.001484 | 832204014 |
| bin.001485 | 682241104 |
| bin.001486 | 920797393 |
| bin.001487 | 655620763 |
| bin.001488 | 840549349 |
| bin.001489 | 858010039 |
| bin.001490 | 739869895 |
| bin.001491 | 725410692 |
| bin.001492 | 891803492 |
| bin.001493 | 722156112 |
| bin.001494 | 918193010 |
| bin.001495 | 787420631 |
| bin.001496 | 721585091 |
| bin.001497 | 833596581 |
| bin.001498 | 722773077 |
| bin.001499 | 905686860 |
| bin.001500 | 726545947 |
| bin.001501 | 694116210 |
| bin.001502 | 863262562 |
| bin.001503 | 851660829 |
| bin.001504 | 778240809 |
| bin.001505 | 529867689 |
| bin.001506 | 900128196 |
| bin.001507 | 836630339 |
| bin.001508 | 722971149 |
| bin.001509 | 741062277 |
| bin.001510 | 676870787 |
| bin.001511 | 864376674 |
| bin.001512 | 696774475 |
| bin.001513 | 616741866 |
| bin.001514 | 964794183 |
| bin.001515 | 724258434 |
| bin.001516 | 607967818 |
| bin.001517 | 798397144 |
| bin.001518 | 806954895 |
| bin.001519 | 582259965 |
| bin.001520 | 842652150 |
| bin.001521 | 652709771 |
| bin.001522 | 809082618 |
| bin.001523 | 917841504 |
| bin.001524 | 573877118 |
| bin.001525 | 866988016 |
| bin.001526 | 821718367 |
| bin.001527 | 614207494 |
| bin.001528 | 780882512 |
| bin.001529 | 703511298 |
| bin.001530 | 692079507 |
| bin.001531 | 704558034 |
| bin.001532 | 995235586 |
| bin.001533 | 790915035 |
| bin.001534 | 603543383 |
| bin.001535 | 910522011 |
| bin.001536 | 802087075 |
| bin.001537 | 784538727 |
| bin.001538 | 790307296 |
| bin.001539 | 908405534 |
| bin.001540 | 707779527 |
| bin.001541 | 710023933 |
| bin.001542 | 855195388 |
| bin.001543 | 864449688 |
| bin.001544 | 623929851 |
| bin.001545 |  93161332 |
+------------+-----------+
97 rows in set (0.00 sec)

mysql> show master status\G
*************************** 1. row ***************************
             File: bin.001545
         Position: 100615381
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 06970d75-3105-11e7-9e18-1228337aa004:1-2368691277,
0e6d000d-16e1-11e6-a80c-0a48c05654c7:1-11939041,
12459d26-f870-11e5-a18b-12544ded6a95:1-5628574,
2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144095557,
85ebe027-a5b6-11e5-861c-129ccaa4cbf7:1-4991016,
c7258e28-4c06-11e7-b189-0a26f8645d98:1-1131534987,
d2ecb5d0-3a7e-11e6-a6bb-0a0ad9f3138d:1-71377440,
d7fbf9fd-81fd-11e6-9f00-1244bf4cd67b:1-975482865
1 row in set (0.00 sec)
[20 Jun 2017 18:39] monty solomon
After restore on slave and caught up to master before restart

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: sweet-hyena
                  Master_User: SUSR_Repl
                  Master_Port: 3306
                Connect_Retry: 15
              Master_Log_File: bin.001545
          Read_Master_Log_Pos: 102472316
               Relay_Log_File: relay.000005
                Relay_Log_Pos: 35527041
        Relay_Master_Log_File: bin.001545
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 102472316
              Relay_Log_Space: 102472833
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 293473754
                  Master_UUID: c7258e28-4c06-11e7-b189-0a26f8645d98
             Master_Info_File: /opt/mysql/dbdata1/monty-p0/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 0
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: c7258e28-4c06-11e7-b189-0a26f8645d98:1131372192-1131537429
            Executed_Gtid_Set: 06970d75-3105-11e7-9e18-1228337aa004:1-2368691277,
0e6d000d-16e1-11e6-a80c-0a48c05654c7:1-11939041,
12459d26-f870-11e5-a18b-12544ded6a95:1-5628574,
2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144095557,
85ebe027-a5b6-11e5-861c-129ccaa4cbf7:1-4991016,
c7258e28-4c06-11e7-b189-0a26f8645d98:1-1131537429,
d2ecb5d0-3a7e-11e6-a6bb-0a0ad9f3138d:1-71377440,
d7fbf9fd-81fd-11e6-9f00-1244bf4cd67b:1-975482865
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)
[20 Jun 2017 18:40] monty solomon
Slave failure after restart

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: sweet-hyena
                  Master_User: SUSR_Repl
                  Master_Port: 3306
                Connect_Retry: 15
              Master_Log_File: bin.001545
          Read_Master_Log_Pos: 106059826
               Relay_Log_File: relay.000006
                Relay_Log_Pos: 4
        Relay_Master_Log_File: bin.001545
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 106059826
              Relay_Log_Space: 106061939
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 293473754
                  Master_UUID: c7258e28-4c06-11e7-b189-0a26f8645d98
             Master_Info_File: /opt/mysql/dbdata1/monty-p0/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 0
                  Master_Bind: 
      Last_IO_Error_Timestamp: 170620 18:35:30
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 06970d75-3105-11e7-9e18-1228337aa004:1-2368691277,
0e6d000d-16e1-11e6-a80c-0a48c05654c7:1-11939041,
12459d26-f870-11e5-a18b-12544ded6a95:1-5628574,
2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144061263:144062729-144095557,
85ebe027-a5b6-11e5-861c-129ccaa4cbf7:1-4991016,
c7258e28-4c06-11e7-b189-0a26f8645d98:1-1130502378:1131372192-1131542539,
d2ecb5d0-3a7e-11e6-a6bb-0a0ad9f3138d:1-71377440,
d7fbf9fd-81fd-11e6-9f00-1244bf4cd67b:1-975482865
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)
[20 Jun 2017 18:42] monty solomon
Master binary logs after slave restart

mysql> show binary logs;
+------------+-----------+
| Log_name   | File_size |
+------------+-----------+
| bin.001449 | 858489057 |
| bin.001450 | 646722541 |
| bin.001451 | 868073512 |
| bin.001452 | 579533571 |
| bin.001453 | 707953109 |
| bin.001454 | 952043309 |
| bin.001455 | 560039859 |
| bin.001456 | 789984004 |
| bin.001457 | 798889518 |
| bin.001458 | 658612977 |
| bin.001459 | 853577853 |
| bin.001460 | 607896611 |
| bin.001461 | 893601103 |
| bin.001462 | 801863151 |
| bin.001463 | 585858075 |
| bin.001464 | 866033786 |
| bin.001465 | 757115735 |
| bin.001466 | 636977186 |
| bin.001467 | 869798182 |
| bin.001468 | 711143505 |
| bin.001469 | 723821828 |
| bin.001470 | 695065178 |
| bin.001471 | 698826488 |
| bin.001472 | 892714438 |
| bin.001473 | 621548155 |
| bin.001474 | 797614053 |
| bin.001475 | 775442830 |
| bin.001476 | 820733067 |
| bin.001477 | 654908280 |
| bin.001478 | 922093926 |
| bin.001479 | 571247483 |
| bin.001480 | 813531278 |
| bin.001481 | 842770141 |
| bin.001482 | 828745017 |
| bin.001483 | 584958960 |
| bin.001484 | 832204014 |
| bin.001485 | 682241104 |
| bin.001486 | 920797393 |
| bin.001487 | 655620763 |
| bin.001488 | 840549349 |
| bin.001489 | 858010039 |
| bin.001490 | 739869895 |
| bin.001491 | 725410692 |
| bin.001492 | 891803492 |
| bin.001493 | 722156112 |
| bin.001494 | 918193010 |
| bin.001495 | 787420631 |
| bin.001496 | 721585091 |
| bin.001497 | 833596581 |
| bin.001498 | 722773077 |
| bin.001499 | 905686860 |
| bin.001500 | 726545947 |
| bin.001501 | 694116210 |
| bin.001502 | 863262562 |
| bin.001503 | 851660829 |
| bin.001504 | 778240809 |
| bin.001505 | 529867689 |
| bin.001506 | 900128196 |
| bin.001507 | 836630339 |
| bin.001508 | 722971149 |
| bin.001509 | 741062277 |
| bin.001510 | 676870787 |
| bin.001511 | 864376674 |
| bin.001512 | 696774475 |
| bin.001513 | 616741866 |
| bin.001514 | 964794183 |
| bin.001515 | 724258434 |
| bin.001516 | 607967818 |
| bin.001517 | 798397144 |
| bin.001518 | 806954895 |
| bin.001519 | 582259965 |
| bin.001520 | 842652150 |
| bin.001521 | 652709771 |
| bin.001522 | 809082618 |
| bin.001523 | 917841504 |
| bin.001524 | 573877118 |
| bin.001525 | 866988016 |
| bin.001526 | 821718367 |
| bin.001527 | 614207494 |
| bin.001528 | 780882512 |
| bin.001529 | 703511298 |
| bin.001530 | 692079507 |
| bin.001531 | 704558034 |
| bin.001532 | 995235586 |
| bin.001533 | 790915035 |
| bin.001534 | 603543383 |
| bin.001535 | 910522011 |
| bin.001536 | 802087075 |
| bin.001537 | 784538727 |
| bin.001538 | 790307296 |
| bin.001539 | 908405534 |
| bin.001540 | 707779527 |
| bin.001541 | 710023933 |
| bin.001542 | 855195388 |
| bin.001543 | 864449688 |
| bin.001544 | 623929851 |
| bin.001545 | 289524179 |
+------------+-----------+
97 rows in set (0.00 sec)
[20 Jun 2017 19:02] MySQL Verification Team
Thanks for the updated info.

Reading this looks like your slave fails *every time* you restart it. 

I'll update you after I'm done with some additional analysis of the data you provided

thanks
Bogdan
[20 Jun 2017 22:17] monty solomon
Not sure if it is related but a restored slave stopped after a while with a different error

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: sweet-hyena
                  Master_User: SUSR_Repl
                  Master_Port: 3306
                Connect_Retry: 15
              Master_Log_File: bin.001552
          Read_Master_Log_Pos: 293005999
               Relay_Log_File: relay.000007
                Relay_Log_Pos: 585984001
        Relay_Master_Log_File: bin.001546
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1032
                   Last_Error: Could not execute Delete_rows event on table access_tokens; Can't find record in 'access_tokens', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log bin.001546, end_log_pos 649153085
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 649152698
              Relay_Log_Space: 4866855833
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1032
               Last_SQL_Error: Could not execute Delete_rows event on table access_tokens; Can't find record in 'access_tokens', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log bin.001546, end_log_pos 649153085
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 293473754
                  Master_UUID: c7258e28-4c06-11e7-b189-0a26f8645d98
             Master_Info_File: /opt/mysql/dbdata1/monty-p0/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 0
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 170620 19:29:47
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: c7258e28-4c06-11e7-b189-0a26f8645d98:1131372192-1140057229
            Executed_Gtid_Set: 06970d75-3105-11e7-9e18-1228337aa004:1-2368691277,
0e6d000d-16e1-11e6-a80c-0a48c05654c7:1-11939041,
12459d26-f870-11e5-a18b-12544ded6a95:1-5628574,
2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144095557,
85ebe027-a5b6-11e5-861c-129ccaa4cbf7:1-4991016,
c7258e28-4c06-11e7-b189-0a26f8645d98:1-1133710865,
d2ecb5d0-3a7e-11e6-a6bb-0a0ad9f3138d:1-71377440,
d7fbf9fd-81fd-11e6-9f00-1244bf4cd67b:1-975482865
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)
[20 Jun 2017 23:27] monty solomon
That failure is similar to the ones experienced by the workers in #86641
[28 Jun 2017 19:00] MySQL Verification Team
I have to check, since I can't reproduce this in my test env. same thing I asked you in the other replication issue you have:
 - Do you have a primary key on every table that you are replicating?

All best
Bogdan
[29 Jun 2017 4:53] monty solomon
Yes, each table uses InnoDB and each table has a primary key.
[29 Jun 2017 6:42] MySQL Verification Team
Hi,

I think this is due to Bug #18145032 fixed in 5.6.19, you can see the release notes here:
https://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-19.html

When gtid_mode=ON, and a transaction is filtered out on the slave, the GTID of the transaction is still logged on the slave as an “empty” transaction (consisting of a GTID followed immediately by BEGIN and then COMMIT). This is necessary to prevent the transaction from being retransmitted the next time the slave reconnects or is involved in a failover. The current fix addresses two issues relating to such “empty” transactions:

    No empty transaction was generated for CREATE TEMPORARY TABLE or DROP TEMPORARY TABLE statements.

    If the slave used a database filter (--replicate-do-db or --replicate-ignore-db option), no empty transaction was generated.

(Bug #71376, Bug #18095502, Bug #18145032)
[29 Jun 2017 6:50] MySQL Verification Team
In your config I don't see you are using any filtering for replication but parallel replication (multithreaded slave) can have similar effect.

What to do with error 1236.. This error can arise if there are transactions on the master that were not replicated, even if they are not expected to be replicated.  An example of such a transaction is the creation of a temporary table on the master when row-based replication is used (see Replication and Temporary Tables ).  These transactions on the master generate GTID's, but they are not replicated to the slave, and no dummy GTID's are created on the slave, so the slave's Executed_Gtid_Set and gtid_purged list will have gaps for the missing transactions, and will appear something like this:

<uuid>:1-1098:1100-1149:1151-1300:1302-1500, etc.

rather than one large set like: 

<uuid>:1-1500

When a restart of the slave is attempted using MASTER_AUTO_POSITION=1, the sync will fail with the 1236 error listed above. 
This issue is described in bug 18145032, which is fixed in MySQL 5.6.19 ( Release Notes ). 

To recover from this scenario, it is required to reset the global variable gtid_purged on the slave to include the entire set of transactions from the master through the last transaction actually executed on the slave.

Here are the steps to recover:

From the slave:

      
    mysql> STOP SLAVE;
     
      
    mysql> SHOW SLAVE STATUS\G
     
      
    mysql> RESET MASTER; 
      
      
    mysql> SET GLOBAL gtid_purged='<uuid>:1-<last GTID from Executed_Gtid_Set>'
       
      
    mysql> START SLAVE;
      

 The slave should then start replicating from the the next GTID after the last GTID set in gtid_purged above.
[29 Jun 2017 6:53] MySQL Verification Team
Hi,

This "gap issue with filtered .." is, as I already wrote, fixed in 5.6.19 (and 5.7.5-m15) so what you are experiencing is similar but not exactly the same.

I believe it is caused by multithreaded slave.

If you reconfigure slave to work in single thread, does both this issue and #86641 go away for you?

Thanks
Bogdan Kecman
[29 Jun 2017 8:31] MySQL Verification Team
Some further investigation after consulting dev team, can you collect information about master's GTID_PURGED and put here, should help out with debugging.

When the slave reports it doesn't have a GTID to the master, and the master had applied the GTID (it is on GTID_EXECUTED) but has no way to recover it anymore (i.e. the binary log file was purged or is missing in the index file, so the GTID is present in GTID_PURGED set), the slave I/O thread will fail as reported. 

This looks like a Bug #24590891 (fixed in 5.7.19)

all best
Bogdan
[29 Jun 2017 19:49] monty solomon
You wrote that this was fixed in 5.6.19. We are running 5.7.18. Is there a regression for the bug?

I think this is due to Bug #18145032 fixed in 5.6.19, you can see the release notes here:
https://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-19.html
[29 Jun 2017 19:55] monty solomon
I don't see any gaps in the Executed Gtid set on the master or on the slaves

master

mysql> show master status\G
*************************** 1. row ***************************
             File: bin.001980
         Position: 502294532
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 06970d75-3105-11e7-9e18-1228337aa004:1-2368691277,
0e6d000d-16e1-11e6-a80c-0a48c05654c7:1-11939041,
12459d26-f870-11e5-a18b-12544ded6a95:1-5628574,
2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144095557,
85ebe027-a5b6-11e5-861c-129ccaa4cbf7:1-4991016,
c7258e28-4c06-11e7-b189-0a26f8645d98:1-1634578681,
d2ecb5d0-3a7e-11e6-a6bb-0a0ad9f3138d:1-71377440,
d7fbf9fd-81fd-11e6-9f00-1244bf4cd67b:1-975482865
1 row in set (0.00 sec)

slave

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: sweet-hyena
                  Master_User: SUSR_Repl
                  Master_Port: 3306
                Connect_Retry: 15
              Master_Log_File: bin.001980
          Read_Master_Log_Pos: 502950686
               Relay_Log_File: relay.002476
                Relay_Log_Pos: 495996923
        Relay_Master_Log_File: bin.001980
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 502950686
              Relay_Log_Space: 502951203
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 293473754
                  Master_UUID: c7258e28-4c06-11e7-b189-0a26f8645d98
             Master_Info_File: /opt/mysql/dbdata1/monty-p0/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 0
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: c7258e28-4c06-11e7-b189-0a26f8645d98:691372439-1634579799
            Executed_Gtid_Set: 06970d75-3105-11e7-9e18-1228337aa004:1-2368691277,
0e6d000d-16e1-11e6-a80c-0a48c05654c7:1-11939041,
12459d26-f870-11e5-a18b-12544ded6a95:1-5628574,
2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144095557,
85ebe027-a5b6-11e5-861c-129ccaa4cbf7:1-4991016,
c7258e28-4c06-11e7-b189-0a26f8645d98:1-1634579799,
d2ecb5d0-3a7e-11e6-a6bb-0a0ad9f3138d:1-71377440,
d7fbf9fd-81fd-11e6-9f00-1244bf4cd67b:1-975482865
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)
[29 Jun 2017 20:01] monty solomon
I restored a slave again from a fresh dump and it stopped after a while with another error 1032.

I don't see any gaps here.
 

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: sweet-hyena
                  Master_User: SUSR_Repl
                  Master_Port: 3306
                Connect_Retry: 15
              Master_Log_File: bin.001980
          Read_Master_Log_Pos: 466954440
               Relay_Log_File: relay.000008
                Relay_Log_Pos: 134115281
        Relay_Master_Log_File: bin.001953
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1032
                   Last_Error: Could not execute Delete_rows event on table access_tokens; Can't find record in 'access_tokens', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log bin.001953, end_log_pos 138482317
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 138481930
              Relay_Log_Space: 22354366362
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1032
               Last_SQL_Error: Could not execute Delete_rows event on table access_tokens; Can't find record in 'access_tokens', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log bin.001953, end_log_pos 138482317
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 293473754
                  Master_UUID: c7258e28-4c06-11e7-b189-0a26f8645d98
             Master_Info_File: /opt/mysql/dbdata1/monty-p0/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 0
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 170629 06:09:04
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: c7258e28-4c06-11e7-b189-0a26f8645d98:1599545570-1634523122
            Executed_Gtid_Set: 06970d75-3105-11e7-9e18-1228337aa004:1-2368691277,
0e6d000d-16e1-11e6-a80c-0a48c05654c7:1-11939041,
12459d26-f870-11e5-a18b-12544ded6a95:1-5628574,
2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144095557,
85ebe027-a5b6-11e5-861c-129ccaa4cbf7:1-4991016,
c7258e28-4c06-11e7-b189-0a26f8645d98:1-1601803277,
d2ecb5d0-3a7e-11e6-a6bb-0a0ad9f3138d:1-71377440,
d7fbf9fd-81fd-11e6-9f00-1244bf4cd67b:1-975482865
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)
[29 Jun 2017 20:06] monty solomon
You asked for the GTID_PURGED from the master but I already provided it in the output of show master status. Here is the current output of each of those commands.

mysql> show master status\G  select @@gtid_purged\G
*************************** 1. row ***************************
             File: bin.001980
         Position: 916880222
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 06970d75-3105-11e7-9e18-1228337aa004:1-2368691277,
0e6d000d-16e1-11e6-a80c-0a48c05654c7:1-11939041,
12459d26-f870-11e5-a18b-12544ded6a95:1-5628574,
2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144095557,
85ebe027-a5b6-11e5-861c-129ccaa4cbf7:1-4991016,
c7258e28-4c06-11e7-b189-0a26f8645d98:1-1635192514,
d2ecb5d0-3a7e-11e6-a6bb-0a0ad9f3138d:1-71377440,
d7fbf9fd-81fd-11e6-9f00-1244bf4cd67b:1-975482865
1 row in set (0.00 sec)

*************************** 1. row ***************************
@@gtid_purged: 06970d75-3105-11e7-9e18-1228337aa004:1-2368691277,
0e6d000d-16e1-11e6-a80c-0a48c05654c7:1-11939041,
12459d26-f870-11e5-a18b-12544ded6a95:1-5628574,
2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144095557,
85ebe027-a5b6-11e5-861c-129ccaa4cbf7:1-4991016,
c7258e28-4c06-11e7-b189-0a26f8645d98:1-1518787191,
d2ecb5d0-3a7e-11e6-a6bb-0a0ad9f3138d:1-71377440,
d7fbf9fd-81fd-11e6-9f00-1244bf4cd67b:1-975482865
1 row in set (0.00 sec)

mysql>
[29 Jun 2017 20:11] monty solomon
You mentioned Bug #18145032, Bug #18095502, and Bug #24590891 but I an unable to read those bugs.

Is there any information you can share about them?

Thanks.
[29 Jun 2017 20:14] monty solomon
We are running without parallel replication and are still experiencing the issues
[29 Jun 2017 21:40] MySQL Verification Team
Hi,

> You wrote that this was fixed in 5.6.19. We are running 5.7.18. Is there a regression for the bug?

I wrote it looks like that bug but there's another one that I mention later on that is causing your problem as that in 5.6.19 is linked to replication filtering (that you do not use), this is similar but not same and is Bug #24590891 (fixed in 5.7.19). 5.7.19 is unfortunately not yet out so you have to wait for it to come out.

gaps, here:
> 2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144061263:144062729-144095557,
...
> c7258e28-4c06-11e7-b189-0a26f8645d98:1-1086581339:1086886492-1087122149,
...
> 2bdcda39-47c3-11e7-a5f6-0ac0ba22019e:1-144061263:144062729-144095557,
...
> c7258e28-4c06-11e7-b189-0a26f8645d98:1-1130502378:1131372192-1131542539,
...

The multi threaded slave is not directly the reason for the problem, it just make it more visible.

Related fix for this bug is also Bug #82848 (you did recreate slave using mysqldump, right)

Now, when you recreate the slave, do you copy only your database or you copy mysql database too (mysql.gtid_executed table especially). Can you recreate slave without this table and see if you still hit the bug?

all best
Bogdan
[30 Jun 2017 9:11] monty solomon
What set of commands can be run on a slave to determine if a restart will cause it to fail to replicate?
[30 Jun 2017 9:12] monty solomon
And what set of commands can be run on the slave to fix it so that it won't have a problem after a restart?
[30 Jun 2017 11:23] MySQL Verification Team
Hi,
I am running a test for this for past 48 hours, I reproduced the other bug (86641) but I'm yet to see the 1236. 

For the past 15 hours the test is working with slaves made by dumping master and manually deleting mysql.gtid_executed table from the dump file, and I'm not seeing 1236 nor 1032 error.

I'll stop the test in few hours and restart it with slave created with mysql.gtid_executed to see what will happen.

all best
Bogdan

p.s. how to make slave that will work, if my and dev team's idea that this is related to #82848 then you just need to make slave without this one table and it should work. My test so far show that's true but it's not conclusive yet
[30 Jun 2017 18:40] monty solomon
Other than restarting a slave to see if it will fail to replicate, are there any techniques to determine if the slave is in the fragile state?

Thanks.
[3 Jul 2017 7:18] MySQL Verification Team
Hi,

I'm still having issues reproducing this.

> Other than restarting a slave to see if it will fail to replicate,
> are there any techniques to determine if the slave is in the fragile state?

The "restart" should not affect the slave.

Slave should never be in "fragile state".

As for monitoring slave there are tools that come with MySQL Enterprise edition that can help a lot, subscribing to MySQL Support gets you eligible to then plus our support team can help you with many other things with regards to configuration and monitoring.

all best
Bogdan
[4 Jul 2017 18:00] monty solomon
By "fragile" I mean the state caused by the bug that will cause the slave I/O thread to fail due to a missing GTID when the slave is restarted. In that case, the slave is otherwise operating normally but if it is restarted for any reason it will fail to replicate.

I was asking if there are any steps to determine if a slave is in a state where restarting it will cause it to fail to replicate due to missing GTID.
[17 Jul 2017 19:02] MySQL Verification Team
Hi,

I still can't reproduce the issue other then as duplicate of #81692

Here are some updates about some aspects of the problem...

1. to avoid bug you have to create slave without mysql.gtid_executed

NOTE that you cannot just drop the table after the fact as your slave already diverged from the master (some transactions might have been applied twice)!!! so it is important to create slave without this table (delete it manually from dump or add to list of ignored tables when you create a dump)

2. Master status "executed_gtid_Set" looks identical to gtid_purged, is that ok/expected/same thing

This is expected on slave after applying the dump. This means that the slave had applied all GTIDs (from dump) but it doesn't have them on its binary logs.

3. What set of commands can be run on a slave to determine if a restart will cause it to fail to replicate?

On the slave, after restoring the dump, mysql.gtid_executed table should not be empty. It actually should have the same content as GTID_PURGED/GTID_EXECUTED.

When it is empty (or missing some GTIDs with respect to GTID_PURGED/GTID_EXECUTED), after restarting the slave it will compute "GTID_EXECUTED" from local binary logs (empty) plus mysql.gtid_executed table.

Let me give you an example:

Supposing all servers in MySQL 5.7.18 version.

Master's GTID_EXECUTED=aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-2000.

On its binary logs it has aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1001-2000. So GTID_PURGED=aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-1000.

The content of the current binary log file on master is not yet on the mysql.gtid_executed table (it is updated upon binary log rotation on the master). So, it might have (example) aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-1990 (the last 10 transactions are not yet reflected on mysql.gtid_executed table).

Transaction aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1991 is an INSERT in a table with primary key.

Now, the DBA takes a dump of the master.

Transactions aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:2001-2100 are applied on the master after the dump.

The DBA then apply the dump on an empty (to be slave) server.

The dump set GTID_PURGED as aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-2000, populates all contents up to aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-2000, but (because of BUG#82848) it also populate mysql.gtid_executed table with aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-1990.

When connecting to a master, the slave will tell it "I have populated all contents up to aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-2000." Master will start replicating on aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-2001.

Slave syncs with master, and is restarted. It computes GTID_PURGED from mysql.gtid_executed table + local binary logs: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-1990 (mysqld.gtid_executed table) + aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:2001-2100 (local binary logs). Slave tell the master when connecting to replicate: "I have applied this: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-1990,2001-2100". Master will start replicating on aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1991.

As aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1991 is an INSERT on a table with primary key that was already applied, the slave will fail to apply this transaction again because of a violation on the primary key.

4. And what set of commands can be run on the slave to fix it so that it won't have a problem after a restart?

Until 5.7.19 be released, just delete the mysql.gtid_executed table statements on the dump file before applying it on slave.

I hope this helps
All best
Bogdan