Bug #79831 Unexpected error message on crash-safe slave with max_relay_log_size set
Submitted: 4 Jan 2016 10:12 Modified: 4 Jan 2016 11:16
Reporter: Valeriy Kravchuk Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S3 (Non-critical)
Version:5.6.28 OS:Any
Assigned to: CPU Architecture:Any
Tags: 2PC, max_relay_log_size, replication, transaction

[4 Jan 2016 10:12] Valeriy Kravchuk
Description:
The following error message was noted by me and others on 5.6.x slaves:

"2016-01-04 11:24:50 7196 [ERROR] Transaction not registered for MySQL 2PC, but transaction is active"

Replication continues to work as expected:

slave1 [localhost] {msandbox} (test) > show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 22293
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 232002931
               Relay_Log_File: mysql_sandbox22294-relay-bin.000226
                Relay_Log_Pos: 564693
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 232002931
              Relay_Log_Space: 133149790
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 2b1577b7-b2c3-11e5-9b34-f4b7e2133dbd
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
1 row in set (0,03 sec)

It seems this happen only with row-based replication and on slave where both master and relay log info is stored in tables:

master_info_repository=TABLE
relay_log_info_repository=TABLE
max_relay_log_size=1048576

Setting max_relay_log_size explicitly helps to reproduce the problem faster/easier, but I feel it's related to switching from one relay log to the other in the process of writing a binlog event, and then updating the InnoDB tables for crash-safe slave accordingly.

How to repeat:
Set up classical replication sandbox:

wget -c http://cdn.mysql.com//Downloads/MySQL-5.6/mysql-5.6.28-linux-glibc2.5-x86_64.tar.gz
make_replication_sandbox mysql-5.6.28-linux-glibc2.5-x86_64.tar.gz
cd sandboxes/rsandbox_mysql-5_6_28/

Add the following line to my.sanbox.cnf on master:

binlog_format=row

Add the following lines to my.sandbox.cnf on slave:

master_info_repository=TABLE
relay_log_info_repository=TABLE
max_relay_log_size=1048576

and restart sandboxes:

./restart_all

Now, on master execute the following:

create table t1(id int auto_increment primary key, c1 varchar(1000)) engine=InnoDB;
insert into t1(c1) values(repeat('a',1000));
insert\ into t1(c1) select repeat('b',1000) from t1;
...

Repeat last statement until you get 16K rows inserted (and some 32K total). Then:

create table t2 select * from t1;

Then on slave node1 (wait for replication to catch up if needed):

select count(*) from t2;
flush logs;

Then back on master node:

insert into t1(c1)select repeat('b',1000) from t1;
update t1 set c1=repeat('a',1000);

Like these:

master [localhost] {msandbox} (test) > insert into t1(c1) select repeat('b',1000) from t1;
Query OK, 32768 rows affected (4,08 sec)
Records: 32768  Duplicates: 0  Warnings: 0

master [localhost] {msandbox} (test) > update t1 set c1=repeat('a',1000);
Query OK, 65535 rows affected (28,35 sec)
Rows matched: 65536  Changed: 65535  Warnings: 0

master [localhost] {msandbox} (test) > exit
Bye

Now, wait for some time for slave to catch up and check the error log on node1:

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/master$ cd ../node1/
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ tail data/msandbox.err
2016-01-04 11:17:59 7196 [Note]   - '127.0.0.1' resolves to '127.0.0.1';
2016-01-04 11:17:59 7196 [Note] Server socket created on IP: '127.0.0.1'.
2016-01-04 11:18:00 7196 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysql_sandbox22294-relay-bin' to avoid this problem.
2016-01-04 11:18:00 7196 [Note] Event Scheduler: Loaded 0 events
2016-01-04 11:18:00 7196 [Note] /home/openxs/5.6.28/bin/mysqld: ready for connections.
Version: '5.6.28-log'  socket: '/tmp/mysql_sandbox22294.sock'  port: 22294  MySQL Community Server (GPL)
2016-01-04 11:18:00 7196 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2016-01-04 11:18:00 7196 [Note] Slave I/O thread: connected to master 'rsandbox@127.0.0.1:22293',replication started in log 'mysql-bin.000001' at position 2696
2016-01-04 11:18:00 7196 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000001' at position 2696, relay log './mysql_sandbox22294-relay-bin.000002' position: 2859
2016-01-04 11:24:50 7196 [ERROR] Transaction not registered for MySQL 2PC, but transaction is active

We have the error, let's check if slave is OK:

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ ./my sql test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.6.28-log MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

slave1 [localhost] {msandbox} (test) > show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 22293
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 232002931
               Relay_Log_File: mysql_sandbox22294-relay-bin.000226
                Relay_Log_Pos: 564693
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 232002931
              Relay_Log_Space: 133149790
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 2b1577b7-b2c3-11e5-9b34-f4b7e2133dbd
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
1 row in set (0,03 sec)

So, node1 is in sync, but we see weird and unexpected error message in the error log. On other slave, where information is stored in files and there is no explicit limit for the relay log file size, we see no error:

openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ tail ../node2/data/msandbox.err
2016-01-04 11:18:01 7462 [Note]   - '127.0.0.1' resolves to '127.0.0.1';
2016-01-04 11:18:01 7462 [Note] Server socket created on IP: '127.0.0.1'.
2016-01-04 11:18:01 7462 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysql_sandbox22295-relay-bin' to avoid this problem.
2016-01-04 11:18:01 7462 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2016-01-04 11:18:01 7462 [Note] Slave I/O thread: connected to master 'rsandbox@127.0.0.1:22293',replication started in log 'mysql-bin.000001' at position 2696
2016-01-04 11:18:01 7462 [Note] Event Scheduler: Loaded 0 events
2016-01-04 11:18:01 7462 [Note] /home/openxs/5.6.28/bin/mysqld: ready for connections.
Version: '5.6.28-log'  socket: '/tmp/mysql_sandbox22295.sock'  port: 22295  MySQL Community Server (GPL)
2016-01-04 11:18:01 7462 [Warning] Slave SQL: If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
2016-01-04 11:18:01 7462 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000001' at position 2696, relay log './mysql_sandbox22295-relay-bin.000002' position: 2859
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node1$ cd ../node2
openxs@ao756:~/sandboxes/rsandbox_mysql-5_6_28/node2$ ./my sql test             Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.6.28-log MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

slave2 [localhost] {msandbox} (test) > select count(*) from t1;
+----------+
| count(*) |
+----------+
|    65536 |
+----------+
1 row in set (0,29 sec)

Suggested fix:
Check the code that produce the error message (line numbers is from current 5.7):

openxs@ao756:~/git/mysql-server$ grep -rn 'Transaction not registered for MySQL 2PC' *
storage/innobase/handler/ha_innodb.cc:3981:             sql_print_error("Transaction not registered for MySQL 2PC,"
storage/innobase/handler/ha_innodb.cc:16316:            sql_print_error("Transaction not registered for MySQL 2PC,"

...
   3974         /* Transaction is deregistered only in a commit or a rollback. If
   3975         it is deregistered we know there cannot be resources to be freed
   3976         and we could return immediately.  For the time being, we play safe
   3977         and do the cleanup though there should be nothing to clean up. */
   3978
   3979         if (!trx_is_registered_for_2pc(trx) && trx_is_started(trx)) {
   3980
   3981                 sql_print_error("Transaction not registered for MySQL 2PC,"
   3982                                 " but transaction is active");
   3983         }

...

and make sure it takes into account the case when we have to update transactional tables where replication information is stored. (IMHO: Changes to these tables are NOT written to the binary log by design and maybe that's why they may be NOT registered for 2PC.)

In the meantime explain somewhere in the manual that this error message can be ignored (if this is really true, as it seems from practice).
[4 Jan 2016 11:16] Umesh Shastry
Hello Valeriy,

Thank you for the report.
This is duplicate of internally reported bug BUG 20787532 - TRANSACTION NOT REGISTERED FOR MYSQL 2PC, BUT TRANSACTION IS ACTIVE ON SLAVE.

Thanks,
Umesh
[4 Jan 2016 11:17] Umesh Shastry
// mtr(simiplified mtr test case based on internal bug)
 cat t/rpl_bugXXX-master.opt 
--log-bin --binlog-format=ROW --character-set-server=utf8

--
cat t/rpl_bugXXX-slave.opt 
--relay_log_info_repository=TABLE --max_relay_log_size=1048576 --binlog-format=ROW

--
cat t/rpl_bugXXX.test

--source include/master-slave.inc

--connection master

CREATE TABLE tt (mycol varchar(1024) DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=latin1;

insert into tt values (repeat('A',1024)),(repeat('A',1024)),(repeat('A',1024)),(repeat('A',1024));
insert into tt(mycol) select repeat('A',1024)  from tt k1, tt k2, tt k3, tt k4,tt k5,tt k6, tt k7, tt k8, tt k9,tt k0,tt ka, tt kb, tt kc, tt kd limit 100000;

CREATE TABLE ttt (mycol varchar(1024) DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf8 AS SELECT * from tt limit 90000;

sync_slave_with_master;

--connection slave

query_vertical show slave status;

[umshastr@hod03]/export/umesh/server/binaries/mysql-advanced-5.6.28/mysql-test: ./mtr rpl_bugXXX.test
Logging: ./mtr  rpl_bugXXX.test
2016-01-04 12:11:41 0 [Note] /export/umesh/server/binaries/mysql-advanced-5.6.28/bin/mysqld (mysqld 5.6.28-enterprise-commercial-advanced) starting as process 31962 ...
2016-01-04 12:11:42 31962 [Note] Plugin 'FEDERATED' is disabled.
2016-01-04 12:11:42 31962 [Note] Binlog end
2016-01-04 12:11:42 31962 [Note] Shutting down plugin 'CSV'
2016-01-04 12:11:42 31962 [Note] Shutting down plugin 'MyISAM'
MySQL Version 5.6.28
Checking supported features...
 - SSL connections supported
Collecting tests...
Checking leftover processes...
Removing old var directory...
Creating var directory '/export/umesh/server/binaries/mysql-advanced-5.6.28/mysql-test/var'...
Installing system database...
.
.
main.rpl_bugXXX                          [ fail ]
        Test ended at 2016-01-04 12:12:00

CURRENT_TEST: main.rpl_bugXXX
mysqltest: At line 22: "DONE"

// On Slave - extract reported error

[umshastr@hod03]/export/umesh/server/binaries/mysql-advanced-5.6.28/mysql-test: cat var/log/mysqld.2.err|grep "2PC"
2016-01-04 14:11:59 32008 [ERROR] Transaction not registered for MySQL 2PC, but transaction is active
[4 Jan 2016 11:28] Umesh Shastry
// With same mtr, not seeing this issue with 5.7.11 build