Bug #37848 Test relies on invalid timing/speed assumptions
Submitted: 3 Jul 2008 18:49 Modified: 11 Jan 2009 6:39
Reporter: Joerg Bruehe Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Tests Severity:S3 (Non-critical)
Version:5.1.26-rc OS:Linux (RPM builds)
Assigned to: Assigned Account CPU Architecture:Any
Tags: sporadic

[3 Jul 2008 18:49] Joerg Bruehe
Description:
Detected in the builds of 5.1.26-rc, and only with low probability.

Detected only in the RPM builds,
maybe their build environment (chroot) affects execution speed.

An "insert" from one connection
is (sometimes) not shown in one (or two) selects from another connection.

The symptoms:
::::::::::::::
main.query_cache               [ fail ]

--- /PATH/mysql-test/r/query_cache.result
+++ /PATH/mysql-test/r/query_cache.reject
@@ -1641,7 +1641,6 @@
 1      0
 2      0
 3      0
-4      0
 reset query cache;
 select *, (select count(*) from t2) from t1;
 a      (select count(*) from t2)

mysqltest: Result content mismatch
::::::::::::::

::::::::::::::
main.query_cache               [ fail ]

--- /PATH/mysql-test/r/query_cache.result
+++ /PATH/mysql-test/r/query_cache.reject
@@ -1641,14 +1641,12 @@
 1      0
 2      0
 3      0
-4      0
 reset query cache;
 select *, (select count(*) from t2) from t1;
 a      (select count(*) from t2)
 1      0
 2      0
 3      0
-4      0
 drop table t1,t2;
 #
 # Bug#25132 disabled query cache: Qcache_free_blocks = 1

mysqltest: Result content mismatch
::::::::::::::

Here is a snippet from the test file:
1205 insert into t1 values (1),(2),(3);
1206 connection user2;
1207 --echo Locking table T2 with a write lock.
1208 lock table t2 write;
1209
1210 connection user1;
1211 --echo Select blocked by write lock.
1212 --send select *, (select count(*) from t2) from t1;
1213 --echo Sleeing is ok, because selecting should be done very fast.
1214 sleep 5;
1215
1216 connection user3;
1217 --echo Inserting into table T1.
1218 insert into t1 values (4);
1219
1220 connection user2;
1221 --echo Unlocking the tables.
1222 unlock tables;
1223
1224 connection user1;
1225 --echo Collecting result from previously blocked select.
1226 #
1227 # Since the lock ordering rule in thr_multi_lock depends on
1228 # pointer values, from execution to execution we might have
1229 # different lock order, and therefore, sometimes lock t1 and block
1230 # on t2, and sometimes block on t2 right away. In the second case,
1231 # the following insert succeeds, and only then this select can
1232 # proceed, and we actually test nothing, as the very first select
1233 # returns 4 rows right away.
1234 # It's fine to have a test case that covers the problematic area
1235 # at least once in a while.
1236 # We, however, need to disable the result log here to make the
1237 # test repeatable.
1238 --disable_result_log
1239 --reap
1240 --enable_result_log
1241 --echo Next select should contain 4 rows, as the insert is long finished.
1242 select *, (select count(*) from t2) from t1;
1243 reset query cache;
1244 select *, (select count(*) from t2) from t1;

So it seems the assumption of line 1241 does not always hold.

How to repeat:
Found by the test suite,
but only in the RPM build environment.
[2 Dec 2008 17:23] Matthias Leich
The problem reported here is completely covered by the "younger"
Bug#40179 Test main.query_cache failing randomly on Pushbuild,
          test weakness
   (Patch pending)
Bug#41098: Query Cache returns wrong result with concurrent insert
   (verified, better testcase for replay attempts)
Both "younger" bug reports contain an analysis of the reasons
which lead to the described bad effect.
Therefore I propose to declare this report to be a duplicate
and to focus on Bug#41098 instead.