Bug #38584 Wrong Exec_Master_Log_Pos from 'SHOW SLAVE STATUS'
Submitted: 5 Aug 2008 21:28 Modified: 23 Jan 2009 12:40
Reporter: Michael Jiang Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.0.44, 5.0, 5.1, 6.0 bzr OS:Any
Assigned to: Assigned Account CPU Architecture:Any

[5 Aug 2008 21:28] Michael Jiang
Description:
We were trying to setup a secondary slave from a running slave. We stop slave on the running slave first, then make a cold copy and resume the secondary slave from the position we got after stop slave. But it failed on the secondary slave with following errors.

080804 23:35:57 [Note] Slave SQL thread initialized, starting replication in log 'binary-logs.001875' at position 11929, relay log '/var/lib/mysql/mysqld-relay-bin_udb5101.000001' position: 4
080804 23:35:57 [Note] Slave I/O thread: connected to master 'tfbrepl@10.8.169.101:3306',  replication started in log 'binary-logs.001875' at position 11929
080804 23:35:57 [ERROR] Error reading packet from server: log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master ( server_errno=1236)
080804 23:35:57 [ERROR] Got fatal error 1236: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master' from master when reading data from binary log
080804 23:35:57 [Note] Slave I/O thread exiting, read up to log 'binary-logs.001875', position 11929

Here is from the 'SHOW SLAVE STATUS' after we stopped the running slave.
Slave_IO_Running: No
Slave_SQL_Running: No
Relay_Master_Log_File: binary-logs.001875
Exec_Master_Log_Pos: 11929

I checked binlog with mysqlbinlog --start-position=11929. that position does not exist. And from the time we run the stop slave, it could not be that early position. Then I found out it is the position in a XA transaction. 

# at 1009487904
#080804 22:33:34 server id 108169101  end_log_pos 11929         Query   thread_id=1251984804    exec_time=0     error_code=0

Why does 'stop slave' cut off in the middle of a XA transaction? Should it wait until the XA finish?

How to repeat:
Running long extended insert statements in batch on master, then 'stop slave' on slave to see where it stops.
[6 Aug 2008 5:29] Sveta Smirnova
Thank you for the report.

Your error log contains next text:

080804 23:35:57 [ERROR] Error reading packet from server: log event entry exceeded
max_allowed_packet; Increase max_allowed_packet on master ( server_errno=1236)
080804 23:35:57 [ERROR] Got fatal error 1236: 'log event entry exceeded
max_allowed_packet; Increase max_allowed_packet on master' from master when reading data from binary log

So slave stopped because error and this is expected. Position exists in the part of the binary log provided, so this is correct too. Why do you care about place of transaction where slave is stopped? Can you check if transaction was not rolled back? If so please provide part of binary log which shows full transaction.
[6 Aug 2008 6:43] Michael Jiang
I was trying to setup a new slave from an existing slave. What I did was 'stop slave', get where it stopped, shutdown mysql and make a cold copy. copy db files over to the new slave, start mysql and run 'change master' from where it stopped. I am not sure if that XA transaction rollback or not. I did not see it in both slaves' log.
 
Following are logs from the slave I ran 'stop slave'.

080804 22:33:34 [Note] Slave I/O thread killed while reading event
080804 22:33:34 [Note] Slave I/O thread exiting, read up to log 'binary-logs.001875', position 1009499832
080804 22:33:34 [Note] Slave SQL thread exiting, replication stopped in log 'binary-logs.001875' at position 11929
080804 22:33:36 [Note] /usr/libexec/mysqld: Normal shutdown

080804 22:33:36  InnoDB: Starting shutdown...
080804 22:41:19  InnoDB: Shutdown completed; log sequence number 772 2135135178
080804 22:41:19 [Note] /usr/libexec/mysqld: Shutdown complete

080804 22:41:20  mysqld ended

080804 22:41:32  mysqld started
080804 22:41:43  InnoDB: Started; log sequence number 772 2135135178
080804 22:41:44 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.44-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
080804 22:41:44 [Note] Slave SQL thread initialized, starting replication in log 'binary-logs.001875' at position 11929, relay log '/var/lib/mysql/mysqld-relay-bin.003204' position: 1009499944
080804 22:41:44 [Note] Slave I/O thread: connected to master 'repl@10.8.169.101:3306',  replication started in log 'binary-logs.001875' at position 1009499832

I could not post the real data from binlog here. replaced with dummy data. 
# at 1009487904
#080804 22:33:34 server id 108169101  end_log_pos 11929         Query   thread_id=1251984804    exec_time=0     error_code=0
use db_123/*!*/;
SET TIMESTAMP=1217914414/*!*/;
REPLACE INTO table_a (col1, ... col10) VALUES (..), ... (...);
# at 1009499805
#080804 22:33:34 server id 108169101  end_log_pos 1009499832    Xid = 6596942401
COMMIT/*!*/;
[6 Aug 2008 8:30] Susanne Ebrecht
How did you make the cold copy? Do you copy the data directory from the master or from the first slave?
[6 Aug 2008 9:40] Sveta Smirnova
Thank you for the feedback.

Yes, transaction doesn't roll back in case if slave thread fails in its middle.

Verified using following test case:

$cat  rpl_bug38584.test
--source include/have_innodb.inc
--source include/master-slave.inc

CREATE TABLE t1 (a longtext) ENGINE = INNODB;

begin;

insert into t1 values('a');

set @str=repeat('b',5000000);

insert into t1 values(@str);

insert into t1 values('c');

commit;

select count(*) from t1;

sleep 1;

connection slave;

--vertical_results
show slave status;

select count(*) from t1;

$cat  rpl_bug38584-master.opt
--max-allowed-packet=50000000
[13 Nov 2008 11:56] Mats Kindahl
The code for handling this case is significantly different in 5.0 and 5.1. Is this test case described above verified against 5.0 or 5.1?
[13 Nov 2008 18:34] Sveta Smirnova
Yes, I tested against 5.1 and 6.0 as well.
[18 Nov 2008 9:05] Sveta Smirnova
Michael,

thank you for the feedback.

Seems my test case is not correct: I forgot to add slave option file. So bug is needed to be reproduced in other way.

Please send us master and relay log files: we think there is a chance relay log contains wrong value for Exec_Master_log_Pos. Also, please, send new slave configuration and error log files.
[18 Nov 2008 19:00] Michael Jiang
Is this one duplicated of following bug
http://bugs.mysql.com/bug.php?id=31190
[19 Nov 2008 6:34] Zhenxing He
Hi Michael

Is table_a a non-transactional table in your case?
[19 Nov 2008 6:50] Michael Jiang
We use InnoDB tables only. Not sure if it is the same case. Mats may know better.
[19 Nov 2008 8:18] Zhenxing He
before fix of BUG#26395, autocommit statements for InnoDB will be logged with out a BEGIN statement, like this:

STATEMENT (Query_log_event, master log position is relative)
COMMIT (Xid_log_event, master log position is absolute)

and because there is no BEGIN before the query, so the query log event will not be treated as in a transaction, and its position (relative to the begin of the transaction) will be used to update master_group_log_pos.

Here is an example I got with mysql-5.0.44:

# at 98
#081119 16:14:04 server id 1  end_log_pos 198   Query   thread_id=2     exec_time=0     error_code=0
use test/*!*/;
SET TIMESTAMP=1227082444/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=1, @@session.unique_checks=1 OPTION_BEGIN=0/*!*/;
SET @@session.sql_mode=0/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=8/*!*/;
create table t1 (a int) engine=InnoDB/*!*/;
# at 198
#081119 16:14:04 server id 1  end_log_pos 88    Query   thread_id=2     exec_time=0     error_code=0
SET TIMESTAMP=1227082444/*!*/;
INSERT INTO t1 VALUES (1)/*!*/;
# at 286
#081119 16:14:04 server id 1  end_log_pos 313   Xid = 10
COMMIT/*!*/;

Hi, Michael
Could you please check if this is the same with your problem? Do you see a BEGIN before the REPLACE statement in your binlog or not?
[19 Nov 2008 8:45] Michael Jiang
I did not see BEGIN in the binlog. And the end_log_pos is relative position for the insert statement. I think it is the same bug.

# at 70630
#081021 23:57:32 server id 101631117  end_log_pos 70657         Xid = 46
COMMIT/*!*/;
# at 70657
#081021 23:57:33 server id 101631117  end_log_pos 33984         Query   thread_id=14    exec_time=0     error_code=0
SET TIMESTAMP=1224658653/*!*/;
INSERT INTO tbl_3 (a, b) values (0, 'abcadfadf adf adfadfasdf'),(1, 'abcadfadf adf adfadfasdf'),....
, 'abcadfadf adf adfadfasdf'),(998, 'abcadfadf adf adfadfasdf'),(999, 'abcadfadf adf adfadfasdf')/*!*/;
# at 104641
#081021 23:57:33 server id 101631117  end_log_pos 104668        Xid = 47
COMMIT/*!*/;
# at 104668
#081021 23:57:34 server id 101631117  end_log_pos 33984         Query   thread_id=14    exec_time=0     error_code=0
SET TIMESTAMP=1224658654/*!*/;
INSERT INTO tbl_4 (a, b) values (0, 'abcadfadf adf adfadfasdf'),(1,
[19 Nov 2008 18:47] Michael Jiang
I tried 'START TRANSACTION' before INSERT. In binlog, I saw the 'BEGIN' now, but the INSERT statement's end_log_pos is still a relative position. Is that right?

# at 362
#081119  1:06:26 server id 101631117  end_log_pos 430   Query   thread_id=2     exec_time=0     error_code=0
SET TIMESTAMP=1227085586/*!*/;
BEGIN/*!*/;
# at 430
#081119  1:06:23 server id 101631117  end_log_pos 105   Query   thread_id=2     exec_time=0     error_code=0
SET TIMESTAMP=1227085583/*!*/;
insert tbl_1 (a, b) values (500005, 'abc')/*!*/;
# at 535
#081119  1:06:26 server id 101631117  end_log_pos 562   Xid = 28
COMMIT/*!*/;
[21 Nov 2008 1:24] Zhenxing He
Hi Michael

The relative offset in side a BEGIN ... COMMIT transaction is not a problem. the slave can handle this properly.
[23 Jan 2009 12:40] Mats Kindahl
Bug is a duplicate of BUG#26395, which was fixed in 5.0.56.