Bug #30594 rpl.rpl_skip_error is nondeterministic
Submitted: 23 Aug 2007 13:46 Modified: 27 Nov 2007 18:48
Reporter: Jonathan Miller Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Tests Severity:S3 (Non-critical)
Version:mysql-5.1-target-5.1.22 OS:Any
Assigned to: Andrei Elkin CPU Architecture:Any

[23 Aug 2007 13:46] Jonathan Miller
Description:
Connect_Retry needed to be masked with #

rpl.rpl_skip_error             [ fail ]

Errors are (from e:/var-n_mix-120/log/mysqltest-time) :
mysqltest: Result length mismatch
(the last lines may be the most important ones)
Below are the diffs between actual and expected results:
-------------------------------------------------------
*** C:/cygwin/home/pushbuild/pb3/pb/mysql-5.1-target-5.1.22/2/mysql-5.1.22-beta-pb2/mysql-test/suite/rpl/r/rpl_skip_error.result	Tue Aug 21 22:27:44 2007
--- C:/cygwin/home/pushbuild/pb3/pb/mysql-5.1-target-5.1.22/2/mysql-5.1.22-beta-pb2/mysql-test/suite/rpl/r/rpl_skip_error.reject	Tue Aug 21 23:10:31 2007
***************
*** 29,36 ****
  a
  1
  2
- 3
  show slave status;
  Slave_IO_State	Master_Host	Master_User	Master_Port	Connect_Retry	Master_Log_File	Read_Master_Log_Pos	Relay_Log_File	Relay_Log_Pos	Relay_Master_Log_File	Slave_IO_Running	Slave_SQL_Running	Replicate_Do_DB	Replicate_Ignore_DB	Replicate_Do_Table	Replicate_Ignore_Table	Replicate_Wild_Do_Table	Replicate_Wild_Ignore_Table	Last_Errno	Last_Error	Skip_Counter	Exec_Master_Log_Pos	Relay_Log_Space	Until_Condition	Until_Log_File	Until_Log_Pos	Master_SSL_Allowed	Master_SSL_CA_File	Master_SSL_CA_Path	Master_SSL_Cert	Master_SSL_Cipher	Master_SSL_Key	Seconds_Behind_Master	Master_SSL_Verify_Server_Cert	Last_IO_Errno	Last_IO_Error	Last_SQL_Errno	Last_SQL_Error
! #	127.0.0.1	root	MASTER_PORT	1	master-bin.000001	786	#	#	master-bin.000001	Yes	Yes							0		0	786	#	None		0	No						#	No	0		0	
  drop table t1;
--- 29,35 ----
  a
  1
  2
  show slave status;
  Slave_IO_State	Master_Host	Master_User	Master_Port	Connect_Retry	Master_Log_File	Read_Master_Log_Pos	Relay_Log_File	Relay_Log_Pos	Relay_Master_Log_File	Slave_IO_Running	Slave_SQL_Running	Replicate_Do_DB	Replicate_Ignore_DB	Replicate_Do_Table	Replicate_Ignore_Table	Replicate_Wild_Do_Table	Replicate_Wild_Ignore_Table	Last_Errno	Last_Error	Skip_Counter	Exec_Master_Log_Pos	Relay_Log_Space	Until_Condition	Until_Log_File	Until_Log_Pos	Master_SSL_Allowed	Master_SSL_CA_File	Master_SSL_CA_Path	Master_SSL_Cert	Master_SSL_Cipher	Master_SSL_Key	Seconds_Behind_Master	Master_SSL_Verify_Server_Cert	Last_IO_Errno	Last_IO_Error	Last_SQL_Errno	Last_SQL_Error
! #	127.0.0.1	root	MASTER_PORT	1	master-bin.000001	851	#	#	master-bin.000001	Yes	Yes							0		0	851	#	None		0	No						#	No	0		0	
  drop table t1;

How to repeat:
N/A

Suggested fix:
See above
[24 Aug 2007 20:31] Serge Kozlov
I think it is not Connect_Retry but Read_Master_Log_Pos. 
Also seems not all statements were replicated to slave, probably would be better to add SHOW BINLOG EVENTS on master side.
[1 Oct 2007 9:10] Andrei Elkin
There is a difference between row and mixed/stmt format results.
Two rows in SBR is the correct for mixed/stmt whereas three rows
is also correct for RBR.
The three in RBR 

*** 29,36 ****
  a
  1
  2
- 3
  show slave status;

are due to an implicit reentrant reading from binlog rule
which means that a Write_rows_log_event overwrites a slave's row
if the one has the same primary key *silently*.

Notice, that --slave-skip-errors option deals with mixed/stmt format replication
events, aka queries :) Therefore, the test can be relaxed not to run with
row format.

I suggest 
1. to require
-- source include/have_binlog_format_mixed_or_statement.inc
2. re-record the results which will have only 2 rows in that select of interest.
[22 Oct 2007 18:45] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/36075

ChangeSet@1.2578, 2007-10-22 21:45:21+03:00, aelkin@koti.dsl.inet.fi +3 -0
  Bug #30594  	rpl.rpl_skip_error is nondeterministic 
  
  Non-determinism in the tests was due to results of SBR are different from those gained
  with row binlog format.
  
  Because tests effectively verify skipping only ER_DUP_ENTRY it is explicitly required
  to run the test on in mixed and stmt binlog format.
  ER_DUP_ENTRY is automatically ignored when happened in RBR because of implicit rule
  favoring reentrant reading from binlog rule
  which means that a Write_rows_log_event overwrites a slave's row
  if the one has the same primary key.
  
  If future we might have skipping error due to applying of row-based events.
  The comments added saying a new file would be needed for that: rpl_row_skip_error or smth.
[27 Nov 2007 10:50] Bugs System
Pushed into 5.1.23-rc
[27 Nov 2007 10:52] Bugs System
Pushed into 6.0.4-alpha
[27 Nov 2007 18:48] Paul DuBois
Test case change. No changelog entry needed.