Bug #54999 mtr global suppression hides SQL thread execution unexpected errors
Submitted: 5 Jul 2010 9:54
Modified: 14 Mar 2011 17:48
Reporter: Andrei Elkin
Status: Closed
Category: Tools: MTR / mysql-test-run
Severity: S3 (Non-critical)
Version: 5.1+
OS: Any
Assigned to: Bjørn Munch
CPU Architecture: Any
Tags: global suppresions, mtr

[5 Jul 2010 9:54] Andrei Elkin
Description:
mtr_warnings.sql contains

"Slave SQL:.*(Error_code: \[\[:digit:\]\]+|Query:.*)"

which is too coarse a filter: it lets *any* slave SQL thread error that
might happen pass, and mtr still reports the test as passed at the end.

This "feature" made possible, for instance, Bug #54988, where after
execution mysqld.2.err contains three key errors. Two of them are expected
and explicitly ignored by mtr.add in the test, but the last one is not:

100705 12:29:47 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 127231, event_type: 0
100705 12:29:47 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error

----- the last one ---------------------V

100705 12:29:47 [ERROR] Slave SQL: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 1594

Oddly, and wrongly, the test run still passes.
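
To illustrate how broad the pattern is, here is a minimal Python sketch, not the actual mtr check. It assumes the escaped brackets in mtr_warnings.sql stand for the POSIX class [[:digit:]], which is written as [0-9] here because Python's re module has no POSIX classes. The names GLOBAL_PATTERN and LOG_LINES are made up for the example, and the third log line is abbreviated from the one quoted above.

```python
import re

# Assumption: Python rendering of the global pattern from mtr_warnings.sql,
# with [[:digit:]] rewritten as [0-9].
GLOBAL_PATTERN = re.compile(r"Slave SQL:.*(Error_code: [0-9]+|Query:.*)")

# The three error lines from mysqld.2.err quoted in this report
# (the middle of the third message is omitted for brevity).
LOG_LINES = [
    "100705 12:29:47 [ERROR] Error in Log_event::read_log_event(): "
    "'Found invalid event in binary log', data_len: 127231, event_type: 0",
    "100705 12:29:47 [ERROR] Error reading relay log event: "
    "slave SQL thread aborted because of I/O error",
    "100705 12:29:47 [ERROR] Slave SQL: Relay log read failure: "
    "Could not parse relay log event entry. Error_code: 1594",
]

# True where the global pattern alone would hide the line.
matches = [bool(GLOBAL_PATTERN.search(line)) for line in LOG_LINES]

for line, hit in zip(LOG_LINES, matches):
    label = "matched by global pattern" if hit else "not matched"
    print(f"{label:26} {line[:60]}")
```

Only the third line matches, and it matches purely because it starts with "Slave SQL:" and ends with an error code. Any other slave SQL thread error of that shape would be hidden in the same way.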

How to repeat:
1. In mtr_warnings.sql, find the line:

"Slave SQL:.*(Error_code: \[\[:digit:\]\]+|Query:.*)"

It exists in 5.1+ trees.

2. Run rpl_row_event_max_size.test, which exists in next-mr, and check that
   the last error in mysqld.2.err is the same as the one in the description.
   The error is not expected by the test and is therefore not explicitly ignored.

Suggested fix:
Such a loose pattern as
"Slave SQL:.*(Error_code: \[\[:digit:\]\]+|Query:.*)"
should not be among the globally ignored warnings. It should instead be
added per test (mtr.add in a test that needs it).
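
The effect of moving suppressions into the test can be sketched as follows. This is a rough Python model under stated assumptions, not mtr's implementation: unexpected_errors is a hypothetical helper, the two per-test patterns are guesses at what the test might suppress (the report does not show them), and [0-9] stands in for [[:digit:]].

```python
import re

def unexpected_errors(log_lines, suppressions):
    """Return the log lines that no suppression pattern matches."""
    compiled = [re.compile(p) for p in suppressions]
    return [line for line in log_lines
            if not any(c.search(line) for c in compiled)]

# Error lines from mysqld.2.err as quoted in the description
# (the middle of the third message is omitted for brevity).
log = [
    "100705 12:29:47 [ERROR] Error in Log_event::read_log_event(): "
    "'Found invalid event in binary log', data_len: 127231, event_type: 0",
    "100705 12:29:47 [ERROR] Error reading relay log event: "
    "slave SQL thread aborted because of I/O error",
    "100705 12:29:47 [ERROR] Slave SQL: Relay log read failure: "
    "Could not parse relay log event entry. Error_code: 1594",
]

# Hypothetical per-test suppressions for the two errors the test expects.
per_test = [
    r"Found invalid event in binary log",
    r"slave SQL thread aborted because of I/O error",
]

# Assumption: Python rendering of the loose global pattern.
loose_global = r"Slave SQL:.*(Error_code: [0-9]+|Query:.*)"

# With the global pattern in place, nothing is reported.
hidden = unexpected_errors(log, per_test + [loose_global])

# With only the per-test patterns, the unexpected error surfaces.
remaining = unexpected_errors(log, per_test)

print(len(hidden), len(remaining))
```

With the global pattern removed, the relay log read failure is the one line left over, which is the error the test never declared as expected.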
[5 Jul 2010 10:00] Andrei Elkin
I would suggest experimenting with removing *all* other global suppressions that deal with replication (master & slave errors). That will reveal the tests that actually require a pattern.
Only patterns that are needed after invoking common macros such as source include/master_slave*.inc (and a few more) should be in the global suppressions file.
[30 Sep 2010 12:11] Bjørn Munch
Some errors I see in 5.1 when removing the global suppression

Attachment: error.txt (text/plain), 3.44 KiB.

[30 Sep 2010 12:15] Bjørn Munch
I removed the global suppression mentioned and ran tests on 5.1. After adding a number of test specific suppressions for obvious cases, I'm left with a number of errors which I'm not sure can be suppressed. They may actually represent bugs. I would need help from someone more qualified to interpret them.

See the previously attached file, where I first mention the test name and then the error message. This was on Solaris 10 x86; I haven't tested on other platforms.
[30 Dec 2010 11:24] Bjørn Munch
New list of failing tests

Attachment: error.txt (text/plain), 1.03 KiB.

[30 Dec 2010 11:25] Bjørn Munch
I redid the experiment on current 5.1 and now have a shorter list of still failing tests with server warnings I'm not sure can be suppressed. I still need someone to look at it.
[14 Mar 2011 17:48] Paul DuBois
Changes to test suite. No changelog entry needed.

CHANGESET - http://lists.mysql.com/commits/131914