Bug #55683 rqg_mdl_stability reports diff when same query is executed with short delay
Submitted: 2 Aug 2010 12:29 Modified: 14 Jan 2011 15:50
Reporter: John Embretsen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Tests Severity:S3 (Non-critical)
Version:5.5.6-m3-Celosia OS:Any
Assigned to: Jon Olav Hauglid CPU Architecture:Any
Tags: rqg_pb2

[2 Aug 2010 12:29] John Embretsen
Description:
The Random Query Generator test "rqg_mdl_stability" fails intermittently in Pushbuild with error code 32 (STATUS_LENGTH_MISMATCH), giving a difference in results between two executions of the same query.

Example:

# 2010-07-29T21:29:39 Query: SELECT /* QUERY_ID: 6909184 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ 8 AS `col_int_key` FROM  table_2 WHERE `col_int_key` < 5; returns different result when executed after a delay of 0.01 seconds.
# 2010-07-29T21:29:39 --- C:\DOCUME~1\qauser\LOCALS~1\Temp\1\/randgen-4316-1280431779-server0.dump	2010-07-29 21:29:39.792361100 +0200
# 2010-07-29T21:29:39 +++ C:\DOCUME~1\qauser\LOCALS~1\Temp\1\/randgen-4316-1280431779-server1.dump	2010-07-29 21:29:39.792361100 +0200
# 2010-07-29T21:29:39 @@ -1,10 +0,0 @@
# 2010-07-29T21:29:39 -8
# 2010-07-29T21:29:39 -8
# 2010-07-29T21:29:39 -8
# 2010-07-29T21:29:39 -8
# 2010-07-29T21:29:39 -8
# 2010-07-29T21:29:39 -8
# 2010-07-29T21:29:39 -8
# 2010-07-29T21:29:39 -8
# 2010-07-29T21:29:39 -8
# 2010-07-29T21:29:39 -8

In the test's description at http://forge.mysql.com/wiki/RandomQueryGeneratorTests#Metadata_Stability_Test the rationale for this check is:

"The SELECT queries in the grammar are such that they are expected to return consistent results, which is tested using the /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ pragma. Also, the SelectStability  Validator will issue the same query repeatedly within the context of the transaction, in order to determine that transaction consistency, that is, repeatable read, is not violated by any concurrent DDL statements."

The test fails mostly on Windows (regardless of server start mechanism used), however the test also fails relatively often on unix if using the non-MTR server start mechanism (i.e. runall-new.pl) of the RQG.

The test has also failed on unix with classic (MTR) server start mechanism, but this happens much less frequently.

Failures started occurring some time after 2010-06-18, when unfortunately all errors were masked for some period due to MTR bug#54835.

How to repeat:
Test failures are currently seen against branches mysql-trunk, mysql-next-mr, mysql-trunk-runtime.

The RQG can be obtained by doing:
 bzr branch lp:randgen

To reproduce as in Pushbuild with MTR server start mechanism:
(most likely to hit this issue on Windows)

perl pb2gentest.pl /path/to/code /path/to/vardir branch-name rqg_mdl_stability

To reproduce as in Pushbuild with non-MTR server start mechanism:

perl pb2gentest-new.pl /path/to/code /path/to/vardir branch-name rqg_mdl_stability

This corresponds to the use of runall.pl or runall-new.pl scripts respectively, with the following options:

--grammar=conf/runtime/metadata_stability.yy \ 
--gendata=conf/runtime/metadata_stability.zz \ 
--validator=SelectStability,QueryProperties \ 
--engine=Innodb \ 
--mysqld=--innodb \ 
--mysqld=--default-storage-engine=Innodb \ 
--mysqld=--transaction-isolation=SERIALIZABLE \ 
--mysqld=--innodb-flush-log-at-trx-commit=2 \ 
--mysqld=--innodb-lock-wait-timeout=1 \ 
--mysqld=--loose-lock-wait-timeout=1 \ 
--mysqld=--log-output=file \ 
--mysqld=--loose-skip-safemalloc \ 
--queries=1M \ 
--duration=600 \ 
--reporters=Deadlock,ErrorLog,Backtrace \ 
--basedir=/path/to/code \ 
--vardir=/path/to/vardir \ 
--testname=rqg_mdl_stability

Look for diffs in the test output and non-zero exit code.
[10 Aug 2010 6:40] John Embretsen
rqg_mdl_stability Pushbuild log from windows, MTR, mysql-trunk-runtime, Jul 29

Attachment: bug55683-failure-pushbuild-windowslog.txt (text/plain), 87.35 KiB.

[10 Aug 2010 6:40] John Embretsen
rqg_mdl_stability Pushbuild log from windows, non-MTR, mysql-trunk-runtime, Jul 19

Attachment: bug55683-failure-pushbuild-windowslog-noMTR.txt (text/plain), 95.52 KiB.

[10 Aug 2010 6:41] John Embretsen
rqg_mdl_stability Pushbuild log from solaris, MTR, mysql-trunk-runtime, Jul 28

Attachment: bug55683-failure-pushbuild-sol10log.txt (text/plain), 47.05 KiB.

[10 Aug 2010 6:42] John Embretsen
rqg_mdl_stability Pushbuild log from linux, non-MTR, mysql-trunk-bugfixing, Jul 30

Attachment: bug55683-failure-pushbuild-linuxlog-noMTR.txt (text/plain), 64.06 KiB.

[10 Aug 2010 6:54] John Embretsen
See attached log files for more actual query diffs.
[10 Aug 2010 13:03] John Embretsen
For what it's worth, by modifying the mdl_stability.yy grammar to print CONNECTION_ID as part of diffing queries, we can use the printed ID to extract queries from the connection's transaction in the query log.

Note that if runall-new.pl is used to run the test as described in "How to repeat", the following options must be added to the command line in order to enable the query log to be printed to mysql.err:

--mysqld=--log-output=file
--mysqld=--general-log

Grammar modification:

=== modified file 'conf/runtime/metadata_stability.yy'
--- conf/runtime/metadata_stability.yy  2010-06-11 13:47:48 +0000
+++ conf/runtime/metadata_stability.yy  2010-08-10 11:41:16 +0000
@@ -28,7 +28,7 @@
        START TRANSACTION | COMMIT | ROLLBACK ;
 
 select:
-       SELECT /* QUERY_ID: _mediumint_unsigned */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ select_item AS `col_int_key`
+       SELECT /* QUERY_ID: _mediumint_unsigned */ CONNECTION_ID(), /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ select_item AS `col_int_key`
        FROM join_list
        where;

Case: Diff on query executed by connection ID 8. From RQG output:

# 2010-08-10T13:44:24 Query: SELECT /* QUERY_ID: 11372544 */ CONNECTION_ID(), /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ 8 AS `col_int_key` FROM  table_3 WHERE `col_int_key` < 6; returns different result when executed after a delay of 0.01 seconds.
# 2010-08-10T13:44:24 --- /tmp//randgen24720-1281440664-server0.dump    2010-08-10 13:44:24.748691888 +0200
# 2010-08-10T13:44:24 +++ /tmp//randgen24720-1281440664-server1.dump    2010-08-10 13:44:24.748773306 +0200
# 2010-08-10T13:44:24 @@ -0,0 +1,10 @@
# 2010-08-10T13:44:24 +8        8
# 2010-08-10T13:44:24 +8        8
# 2010-08-10T13:44:24 +8        8
# 2010-08-10T13:44:24 +8        8
# 2010-08-10T13:44:24 +8        8
# 2010-08-10T13:44:24 +8        8
# 2010-08-10T13:44:24 +8        8
# 2010-08-10T13:44:24 +8        8
# 2010-08-10T13:44:24 +8        8
# 2010-08-10T13:44:24 +8        8

(full output log will be attached as bug55683_case1_rqg.txt)

Probable transaction (based on query log), AUTOCOMMIT=OFF:

8 Query     COMMIT
8 Query     SELECT /* QUERY_ID: 11372544 */ CONNECTION_ID(), /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ 8 AS `col_int_key` FROM  table_3 WHERE `col_int_key` < 6
8 Query     SELECT /* QUERY_ID: 11372544 */ CONNECTION_ID(), /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ 8 AS `col_int_key` FROM  table_3 WHERE `col_int_key` < 6
8 Query     SELECT /* QUERY_ID: 11372544 */ CONNECTION_ID(), /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ 8 AS `col_int_key` FROM  table_3 WHERE `col_int_key` < 6
8 Query     COMMIT

These SELECTs are the same query executed at different intervals (t=0, t=0.01 and t=0.1). The diff occurs after a delay of 0.01 seconds after the first query in the transaction.

(all queries logged for connection ID 8 will be attached in compressed format as bug55683_case1_tid8.tar.gz)

Other cases may have different looking transactions.
[10 Aug 2010 13:05] John Embretsen
Full RQG output from case described in previous comment.

Attachment: bug55683_case1_rqg.txt (text/plain), 69.51 KiB.

[10 Aug 2010 13:06] John Embretsen
Compilation of all queries logged for TID 8 in query log relating to the case described in previous comment.

Attachment: bug55683_case1_tid8.tar.gz (application/x-gzip, text), 115.48 KiB.

[30 Aug 2010 14:11] John Embretsen
Unassigning myself and setting status to Verified because:

 a) I do not have time right now to investigate further (simplifying into a mysql-test test is not trivial).
 b) The test keeps failing in Pushbuild.
 c) Generally, a SELECT query executed multiple times within the same transaction with isolation level SERIALIZABLE should always return the same results, and in this case it does not.
 c) The issue is repeatable with the RQG, although it is sporadic. With --threads=5 the issue is seen at least once in 20 or so runs, and it is seen more often with more threads.
[14 Jan 2011 15:50] Jon Olav Hauglid
This bug turned out to be a test bug. It has been fixed in the RQG repository and pushbuild has been updated with the fix. The test is now passing.

Since no server patch was pushed, I'm just closing the bug. No changelog needed.
[1 Feb 2011 15:29] John Embretsen
See Bug#59850 for a very similar test failure in rqg_mdl_stability.