Bug #57548 The slave will be abnormal when rows event causes the ER_NET_PACKET_TOO_LARGE
Submitted: 19 Oct 2010 4:53 Modified: 30 Nov 2010 3:10
Reporter: Daogang Qu Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.5+ OS:Any
Assigned to: CPU Architecture:Any

[19 Oct 2010 4:53] Daogang Qu
Description:
In RBR, The slave server will hang there and can't exit when the
second or following rows event causes the ER_NET_PACKET_TOO_LARGE. 

How to repeat:
# rpl_packet.test
--source include/master-slave.inc
--source include/have_binlog_format_row.inc

# Create table.
--connection master
CREATE TABLE t1 (a INT, b TEXT) ENGINE=MyISAM;
CREATE TABLE t2 (a TEXT) ENGINE=MyISAM;
delimiter //;
CREATE FUNCTION f1() RETURNS INTEGER
BEGIN
   INSERT INTO t2(a) values(REPEAT('a', 1));
   RETURN 1;
END//
delimiter ;//
--sync_slave_with_master

# Create big row event.
--connection master
INSERT INTO t1(a,b) VALUES (f1(), REPEAT('a', 2048));

# Slave IO thread should stop with error when trying to read the big event.
--connection slave
call mtr.add_suppression("Slave I/O: Got a packet bigger than 'max_allowed_packet' bytes, Error_code: 1153");
--echo # 1153 = ER_NET_PACKET_TOO_LARGE
--let $slave_io_errno= 1153
#--let $show_slave_io_error= 1
--source include/wait_for_slave_io_error.inc

--echo # Now Read_Master_Log_Pos is as following:
--let $status_items= Read_Master_Log_Pos
--source include/show_slave_status.inc

--echo # Relay log contains the Table_map_log_event that ends at Read_Master_Log_Pos:
--let $MYSQLD_DATADIR= `SELECT @@datadir`
--echo Last event in relay log:
--let $relay_log_file= query_get_value(SHOW SLAVE STATUS, Relay_Log_File, 1)
exec $MYSQL_BINLOG $MYSQLD_DATADIR/$relay_log_file
# | grep 'end_log_pos'
 | tail -n 8;

--connection master
DROP TABLE t1,t2;
DROP FUNCTION f1;

# slave is stopped
connection slave;

DROP TABLE t1,t2;
DROP FUNCTION f1;

# rpl_packet-slave.test
--max_allowed_packet=1024 --net_buffer_length=1024

# rpl_packet-master.test
--max_allowed_packet=4096 --net_buffer_length=4096
[19 Oct 2010 11:03] Valeriy Kravchuk
This is what I've got with your test case on 56.1.53 from bzr:

openxs@ubuntu:/home2/openxs/dbs/5.1/mysql-test$ ./mtr --mysqld=--binlog-format="row" bug57548
Logging: ./mtr  --mysqld=--binlog-format=row bug57548
101019 13:53:13 [Note] Plugin 'FEDERATED' is disabled.
MySQL Version 5.1.53
Using binlog format 'row'
Checking supported features...
 - skipping ndbcluster, mysqld not compiled with ndbcluster
 - SSL connections supported
 - binaries are debug compiled
Collecting tests...
vardir: /home2/openxs/dbs/5.1/mysql-test/var
Checking leftover processes...
Removing old var directory...
Creating var directory '/home2/openxs/dbs/5.1/mysql-test/var'...
Installing system database...
Using server port 38460

==============================================================================

TEST                                      RESULT   TIME (ms)
------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
main.bug57548                            [ fail ]
        Test ended at 2010-10-19 13:58:40

CURRENT_TEST: main.bug57548
mysqltest: In included file "./include/wait_for_slave_param.inc": At line 81: 

The result from queries just before the failure was:
< snip >
4	root	localhost:45194	test	Sleep	322		NULL
7	root	localhost:56228	NULL	Binlog Dump	81	Has sent all binlog to slave; waiting for binlog to be updated	NULL

**** SHOW BINLOG EVENTS on master ****
SHOW BINLOG EVENTS IN 'master-bin.000001';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
master-bin.000001	4	Format_desc	1	106	Server ver: 5.1.53-debug-log, Binlog ver: 4
master-bin.000001	106	Query	1	214	use `test`; CREATE TABLE t1 (a INT, b TEXT) ENGINE=MyISAM
master-bin.000001	214	Query	1	315	use `test`; CREATE TABLE t2 (a TEXT) ENGINE=MyISAM
master-bin.000001	315	Query	1	521	use `test`; CREATE DEFINER=`root`@`localhost` FUNCTION `f1`() RETURNS int(11)
BEGIN
INSERT INTO t2(a) values(REPEAT('a', 1));
RETURN 1;
END
master-bin.000001	521	Query	1	589	BEGIN
master-bin.000001	589	Table_map	1	632	table_id: 23 (test.t1)
master-bin.000001	632	Table_map	1	674	table_id: 24 (test.t2)
master-bin.000001	674	Write_rows	1	707	table_id: 24
master-bin.000001	707	Write_rows	1	2791	table_id: 23 flags: STMT_END_F
master-bin.000001	2791	Query	1	2860	COMMIT

More results from queries before failure can be found in /home2/openxs/dbs/5.1/mysql-test/var/log/bug57548.log

 - saving '/home2/openxs/dbs/5.1/mysql-test/var/log/main.bug57548/' to '/home2/openxs/dbs/5.1/mysql-test/var/log/main.bug57548/'
------------------------------------------------------------
The servers were restarted 0 times
Spent 0.000 of 329 seconds executing testcases

Completed: Failed 1/1 tests, 0.00% were successful.

Failing test(s): main.bug57548

The log files in var/log may give you some hint of what went wrong.

If you want to report this error, please read first the documentation
at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html

mysql-test-run: *** ERROR: there were failing test cases

Can you specify (for our Triage team) what other versions are affected and what exactly the problem is?
[21 Oct 2010 2:47] Daogang Qu
I just say the slave I/O thread can't be stopped normally, so that the slave
server is abnormal, the created tables and functions can not be dropped
normally, when the second or following rows event causes the ER_NET_PACKET_TOO_LARGE error to multi-rows events statement. But every
thing works fine if the first rows event causes the ER_NET_PACKET_TOO_LARGE
error.
You can test the case by just updating one line of the above test as following:

Change:
INSERT INTO t1(a,b) VALUES (f1(), REPEAT('a', 2048));
To:
INSERT INTO t1(a,b) VALUES (1, REPEAT('a', 2048));
[22 Oct 2010 7:52] Daogang Qu
Replied.
[19 Nov 2010 20:19] Sveta Smirnova
Thank you for the feedback.

I get same error as Valeriy if change

INSERT INTO t1(a,b) VALUES (f1(), REPEAT('a', 2048));
To:
INSERT INTO t1(a,b) VALUES (1, REPEAT('a', 2048));

or even if remove this query.
[30 Nov 2010 3:10] Daogang Qu
You must forget to set the following two files:
# rpl_packet-slave.test
--max_allowed_packet=1024 --net_buffer_length=1024

# rpl_packet-master.test
--max_allowed_packet=4096 --net_buffer_length=4096

I find out that it's not a bug. Because the problem described
is expected. The slave SQL thread will wait the second or
the following rows event to replicate by slave I/O thread
until exit with a timeout error when the rows event causes
the slave I/O thread stop with the ER_NET_PACKET_TOO_LARGE
error.