Bug #56912 | binlog_killed fails sporadically on mysql-next-mr-bugfixing codebase. | ||
---|---|---|---|
Submitted: | 21 Sep 2010 23:24 | Modified: | 15 Nov 2010 19:36 |
Reporter: | Luis Soares | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | Tests: Replication | Severity: | S3 (Non-critical) |
Version: | mysql-next-mr-bugfixing, 5.6.99 | OS: | Any |
Assigned to: | Luis Soares | CPU Architecture: | Any |
[21 Sep 2010 23:24]
Luis Soares
[26 Sep 2010 23:03]
Luis Soares
Analysis ======== - Test description Roughly, the failing test asserts that a killed session either inserts rows in a table and writes the event to the binary log or it does neither of the two. Techincally, the assertion is done by: 1. selecting the row count from the table and storing the value in $row; 2. dumping the contents of the binary log from position 175 into a file and then parsing the file searching for the 'error_code' field and for the insert statement. If there is a match one is return for either, otherwise zero is returned. The values are stored in $error_code and $insert_binlogged variables. 3. subtracting to $row the values of $error_code and $insert_binlogged: set @result= $rows- $error_code - $insert_binlogged; If no rows were inserted on the table, then nothing must have been binlogged: set @result= 0 - 0 - 0 = 0 On the other hand, if the two rows were inserted on the table, then set @result= 2 - 1 - 1 = 0 The final result must be zero either way. - Problem Description The test fails with: insert into t2 values (null, null), (null, get_lock("a", 10)); select @result /* must be zero either way */; @result -0 +2 This suggests that there is something wrong with killing and binlogging on-going statements. However, from mysqlbinlog output we also have this: CURRENT_TEST: binlog.binlog_killed ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 1835008, event_type: -109 ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 1140850688, event_type: 116 ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 1140850688, event_type: 116 This shows that something wrong happened while dumping the contents of the binary log (see step #2 of the assertion procedure above). Looking at the contents of the binary log, when the test fails, we can find: /*!*/; # at 177 #100924 18:31:48 server id 1 end_log_pos 205 Intvar SET INSERT_ID=1/*!*/; # at 205 #100924 18:31:48 server id 1 end_log_pos 329 Query thread_id=4 exec_time=0 error_code=0 use test/*!*/; SET TIMESTAMP=1285349508/*!*/; insert into t2 values (null, null), (null, get_lock("a", 10)) Now... From the test case, we find that the dump is performed from position 175, which is wrong and should be 177. - Some observations 1. The test fails sporadically because the dump only fails when the kill command is slow enough to allow the insert to succeed. In that case, the events will be written to the binary log and the dump will fail. OTOH, if the kill succeeds before the connection is able to execute the insert statement, then the dump succeeds and consequently the assertion succeeds. 2. The format description event grew two bytes recently because it has been added two new event types to the post_header_len, which do shift the positions by two bytes. The new events were added in revision dao-gang.qu@sun.com-20100707064850-sja6v94mepw470al : - + + /* + In some situations, it is necessary to send over ignorable + data to the slave: data that a slave can handle in case there + is code for handling it, but which can be ignored if it is not + recognized. + */ + IGNORABLE_LOG_EVENT= 28, + ROWS_QUERY_LOG_EVENT= 29, + and then the post_header_len grew two bytes - checking the FD event constructor we have: number_of_event_types= LOG_EVENT_TYPES; /* we'll catch my_malloc() error in is_valid() */ post_header_len=(uint8*) my_malloc(number_of_event_types*sizeof(uint8), MYF(0)); Conclusion ========== 1. The test fails sporadically because the assertion checks for the atomicity, either: (a) we insert into the table and write the corresponding event to the binary log, or (b) we don't insert and don't write. When (a) happens, it fails. 2. The FD event has grown 2 bytes recently, and that explains why the assertion (a), because the parsing of the binary log is done based on the start position of the insert event. 3. The immediate fix would be to change the mysqlbinlog parameter from --start-position=175 to --start-position=177.
[26 Sep 2010 23:07]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/119129 3298 Luis Soares 2010-09-27 BUG#56912: binlog_killed fails sporadically on mysql-next-mr-bugfixing codebase. Two new event types were added recently. This made the FD event to grow 2 bytes, thence the event positions in the binary log shifted two bytes as well. Given that sometimes the assertion in the binlog_killed test relies on absolute binary log positions, it will try to read events from a current obsolete position (175, while the new one is 177). We fix this by making the test to read the current position from SHOW MASTER STATUS into $start_pos and then issuing mysqlbinlog with --start-position=$start_pos instead of a hardcoded absolute position. This will also make the dump to read the BEGIN event that precedes the actual INSERT event (whereas before it would not), but this should not affect the assertion correctness.
[30 Sep 2010 11:52]
Luis Soares
Queued in mysql-next-mr-bugfixing: http://pb2.norway.sun.com/web.py?template=push_details&push=1612070
[2 Oct 2010 18:13]
Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101002181053-6iotvl26uurcoryp) (version source revid:alexander.nozdrin@oracle.com-20101002180917-h0n62akupm3z20nt) (pib:21)
[13 Nov 2010 16:11]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[15 Nov 2010 19:36]
Jon Stephens
Affects testing only; nothing to document for end users. Closed.