| Bug #56912 | binlog_killed fails sporadically on mysql-next-mr-bugfixing codebase. | ||
|---|---|---|---|
| Submitted: | 21 Sep 2010 23:24 | Modified: | 15 Nov 2010 19:36 |
| Reporter: | Luis Soares | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | Tests: Replication | Severity: | S3 (Non-critical) |
| Version: | mysql-next-mr-bugfixing, 5.6.99 | OS: | Any |
| Assigned to: | Luis Soares | CPU Architecture: | Any |
[26 Sep 2010 23:03]
Luis Soares
Analysis
========
- Test description
Roughly, the failing test asserts that a killed session
either inserts rows in a table and writes the event to the
binary log or it does neither of the two.
Techincally, the assertion is done by:
1. selecting the row count from the table and storing the
value in $row;
2. dumping the contents of the binary log from position 175
into a file and then parsing the file searching for the
'error_code' field and for the insert statement. If
there is a match one is return for either, otherwise
zero is returned. The values are stored in $error_code
and $insert_binlogged variables.
3. subtracting to $row the values of $error_code and
$insert_binlogged:
set @result= $rows- $error_code - $insert_binlogged;
If no rows were inserted on the table, then nothing must
have been binlogged:
set @result= 0 - 0 - 0 = 0
On the other hand, if the two rows were inserted on the
table, then
set @result= 2 - 1 - 1 = 0
The final result must be zero either way.
- Problem Description
The test fails with:
insert into t2 values (null, null), (null, get_lock("a", 10));
select @result /* must be zero either way */;
@result
-0
+2
This suggests that there is something wrong with killing and
binlogging on-going statements. However, from mysqlbinlog
output we also have this:
CURRENT_TEST: binlog.binlog_killed
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 1835008, event_type:
-109
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 1140850688,
event_type: 116
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 1140850688,
event_type: 116
This shows that something wrong happened while dumping the
contents of the binary log (see step #2 of the assertion
procedure above).
Looking at the contents of the binary log, when the test
fails, we can find:
/*!*/;
# at 177
#100924 18:31:48 server id 1 end_log_pos 205 Intvar
SET INSERT_ID=1/*!*/;
# at 205
#100924 18:31:48 server id 1 end_log_pos 329 Query thread_id=4 exec_time=0 error_code=0
use test/*!*/;
SET TIMESTAMP=1285349508/*!*/;
insert into t2 values (null, null), (null, get_lock("a", 10))
Now... From the test case, we find that the dump is
performed from position 175, which is wrong and should be 177.
- Some observations
1. The test fails sporadically because the dump only fails
when the kill command is slow enough to allow the insert
to succeed. In that case, the events will be written to
the binary log and the dump will fail. OTOH, if the kill
succeeds before the connection is able to execute the
insert statement, then the dump succeeds and consequently
the assertion succeeds.
2. The format description event grew two bytes recently
because it has been added two new event types to the
post_header_len, which do shift the positions by two
bytes. The new events were added in revision
dao-gang.qu@sun.com-20100707064850-sja6v94mepw470al :
-
+
+ /*
+ In some situations, it is necessary to send over ignorable
+ data to the slave: data that a slave can handle in case there
+ is code for handling it, but which can be ignored if it is not
+ recognized.
+ */
+ IGNORABLE_LOG_EVENT= 28,
+ ROWS_QUERY_LOG_EVENT= 29,
+
and then the post_header_len grew two bytes - checking
the FD event constructor we have:
number_of_event_types= LOG_EVENT_TYPES;
/* we'll catch my_malloc() error in is_valid() */
post_header_len=(uint8*) my_malloc(number_of_event_types*sizeof(uint8),
MYF(0));
Conclusion
==========
1. The test fails sporadically because the assertion checks for
the atomicity, either: (a) we insert into the table and
write the corresponding event to the binary log, or (b) we
don't insert and don't write. When (a) happens, it fails.
2. The FD event has grown 2 bytes recently, and that explains
why the assertion (a), because the parsing of the binary log
is done based on the start position of the insert event.
3. The immediate fix would be to change the mysqlbinlog
parameter from --start-position=175 to --start-position=177.
[26 Sep 2010 23:07]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/119129 3298 Luis Soares 2010-09-27 BUG#56912: binlog_killed fails sporadically on mysql-next-mr-bugfixing codebase. Two new event types were added recently. This made the FD event to grow 2 bytes, thence the event positions in the binary log shifted two bytes as well. Given that sometimes the assertion in the binlog_killed test relies on absolute binary log positions, it will try to read events from a current obsolete position (175, while the new one is 177). We fix this by making the test to read the current position from SHOW MASTER STATUS into $start_pos and then issuing mysqlbinlog with --start-position=$start_pos instead of a hardcoded absolute position. This will also make the dump to read the BEGIN event that precedes the actual INSERT event (whereas before it would not), but this should not affect the assertion correctness.
[30 Sep 2010 11:52]
Luis Soares
Queued in mysql-next-mr-bugfixing: http://pb2.norway.sun.com/web.py?template=push_details&push=1612070
[2 Oct 2010 18:13]
Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101002181053-6iotvl26uurcoryp) (version source revid:alexander.nozdrin@oracle.com-20101002180917-h0n62akupm3z20nt) (pib:21)
[13 Nov 2010 16:11]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[15 Nov 2010 19:36]
Jon Stephens
Affects testing only; nothing to document for end users. Closed.

Description: binlog_killed test fails sporadically on next-mr-bugfixing codebase recently. I have noticed it mostly on mysql-next-mr-bufixing tree. binlog.binlog_killed 'stmt' [ fail ] Test ended at 2010-09-17 15:58:36 CURRENT_TEST: binlog.binlog_killed ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 1835008, event_type: -109 ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 1140850688, event_type: 116 ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 1140850688, event_type: 116 --- /export/home3/pb2/build/sb_2-None-1284723696.84/mysql-next-mr-bugfixing-gcov/mysql-test/suite/binlog/r/binlog_killed.result 2010-09-17 14:43:29.000000000 +0300 +++ /export/home3/pb2/build/sb_2-None-1284723696.84/mysql-next-mr-bugfixing-gcov/mysql-test/suite/binlog/r/binlog_killed.reject 2010-09-17 16:58:36.000000000 +0300 @@ -9,7 +9,7 @@ insert into t2 values (null, null), (null, get_lock("a", 10)); select @result /* must be zero either way */; @result -0 +2 select RELEASE_LOCK("a"); RELEASE_LOCK("a") 1 mysqltest: Result content mismatch How to repeat: Look at PB2: http://tinyurl.com/2477lot List of failures: http://tinyurl.com/26fgnfh Suggested fix: .