Bug #40179 Test main.query_cache failing randomly on Pushbuild, test weakness
Submitted: 20 Oct 2008 15:19 Modified: 21 Jan 2009 16:19
Reporter: Patrick Crews Email Updates:
Status: Closed Impact on me:
None 
Category:Tests: Server Severity:S3 (Non-critical)
Version:5.1 OS:Any
Assigned to: Matthias Leich
Tags: pushbuild, random failure, test failure
Triage: Triaged: D3 (Medium)

[20 Oct 2008 15:19] Patrick Crews
Description:
Test main.query_cache is failing randomly on Pushbuild - seems to be limited to Red Hat builds:

38081: main.query_cache               [ fail ]
38082: 
38083: --- /export/home/my/tmp-200810101108-5.1.29-rc-25410/rpm/BUILD/mysql-5.1.29-rc/mysql-test/r/query_cache.result	2008-10-12 01:19:43.000000000 +0300
38084: +++ /export/home/my/tmp-200810101108-5.1.29-rc-25410/rpm/BUILD/mysql-5.1.29-rc/mysql-test/r/query_cache.reject	2008-10-12 15:16:09.000000000 +0300
38085: @@ -1641,7 +1641,6 @@
38086:  1	0
38087:  2	0
38088:  3	0
38089: -4	0
38090:  reset query cache;
38091:  select *, (select count(*) from t2) from t1;
38092:  a	(select count(*) from t2)
38093: 
38094: mysqltest: Result content mismatch

How to repeat:
Run the test from the mysql-test directory:

./mtr query_cache

Suggested fix:
Correct the test to eliminate any random results / OS variable shifts in actual results.
[4 Nov 2008 13:54] Patrick Crews
See Bug #39249	Maria: query cache returns out-of-date results
This isn't limited to Maria.  I couldn't get the bug to repeat on Mac OS, but it occurred constantly on a Windows XP 32bit VM
[10 Nov 2008 21:58] Patrick Crews
Only able to repeat on my Windows XP 32bit VM (single CPU, 512MB RAM)  Test works perfectly every time on Mac OS X.

Adding a query against t1 after the INSERT..(4) fixes the test failure, but I feel like this is gaming the test to account for dodgy server behavior.

Changing the query from SELECT *, (SELECT COUNT(*) FROM t2) FROM t1 to SELECT * FROM t1 also fixes the problem.

Running a table status check immediately after the INSERT (4) statement show that t1 has had another row inserted, but for some reason the query cache doesn't drop the query results.

No difference in master.log files between a successful run and a test failure (some test failures like this have shown differences in processing sequence)
[12 Nov 2008 16:51] Matthias Leich
Results of discussion via email and phone:
------------------------------------------
We are unfortunately unable to distinct if the
result set difference is caused by
Scenario 1)
  a regression
  Bug#28249 Query Cache returns wrong result with concurrent
            insert /  certain lock
  The select getting the unexpected result happened after
  the insert was completed. It should see four rows.
or
Scenario 2)
  a weakness within the test which does not take properties
  of the MyISAM feature "concurrent_inserts" into account
  The select getting the unexpected result happened before
  the insert was completed. It should see three rows.
  
Concurrent inserts feature:
  http://dev.mysql.com/doc/refman/6.0/en/concurrent-inserts.html
  http://bugs.mysql.com/bug.php?id=37521
  Basically:
  user1: send INSERT ... into t1
  user2: select ... from t1
         - do not see the row inserted above for some time
           (in case of heavy load up to some seconds)
           because the processing of the insert above is not
           completed
         - see the row inserted above when the processing of
           the insert is completed

We must prevent Scenario 2. This could be done by:
--------------------------------------------------
# Since the lock ordering rule in thr_multi_lock depends on
# ... <The big comment>
--disable_result_log
--reap
--enable_result_log
== Begin of code to be added ==
a) Poll till SELECT COUNT(*) = 4 FROM t1 but not longer than
   ~ 6 till 10 seconds
b) just sleep ~6 till 10 seconds
c) connection user3;
   SELECT COUNT(*) FROM t1;
   connection user1;
   # Experience: This select and all following selects run
   #             by other connections will see the inserted row
== End of code to be added ==
--echo Next select should contain 4 rows, as the 4'th row must be now
visible

- Kristofer does not trust c) 100% and prefers a)
  + would like to be a reviewer
- Matthias prefers c) but accepts also a)
[24 Nov 2008 8:45] Oleksandr Byelkin
The cause of the bug is that query cache expects changes for non transactional tables become visible just after insert (so insert invalidate the cache). But in reality MyISAM makes changes visible on unlock tables. So we have in the cache outdated data.

I 2 ways to soplve the problem:
1 make special QC type of tables for MyISAM which invalidates inserts on unlock
2 invalidate all non-transactional engines on unlock also (if they was writye locked or even better if they were changed).

1. of course is more efficient, but also it means more chenges in the code.

P.S.: Maria inherit this bug as BUG#39249
[25 Nov 2008 16:32] Matthias Leich
I am able to reproduce the effect described within
this bug report with query caching switched off.
This means there exists the possibility that
this test fails because it does not take effects
which are clear outside of any query caching 
(IMHO "concurrent_inserts") into account.
I will add my testscript used within these experiments
soon.
[26 Nov 2008 17:00] Matthias Leich
Final outcome of the experiments:
---------------------------------
1. QC off + script which ignores the possible effects
   caused by "concurrent_inserts"
   --> high likelihood of random failures
2. QC off + script which takes effects of "concurrent_inserts"
   into account
   --> I was unable to get any failure.
3. QC on + script which takes effects of "concurrent_inserts"
   into account
   --> high likelihood of random failures
       = We have a Query cache related server bug.
Solution:
1. Modify the bug title so that it points to the weakness
   around "concurrent_inserts" within the script
   --> fix of current bug = fix of the script
2. Report a new bug about the Query Cache problem
[27 Nov 2008 19:40] Matthias Leich
Archive with the scripts/results of experiments

Attachment: qc.tgz (application/x-compressed-tar, text), 4.01 KiB.

[27 Nov 2008 19:58] Matthias Leich
Details about qc.tgz:
query_cache.txt - Summary of the experiments
t/ml_qc3.test - Reveals that the test can suffer
     from "concurrent_inserts" effects if the
     load on the testing box is high
t/ml_qc4.test - Fixes the issues caused by 
     "concurrent_inserts"
t/ml_qc5.test - Show that we suffer from a
     query cache bug like
     Bug#28249 Query Cache returns wrong result 
         with concurrent insert / certain lock
t/ml_qc6.test and t/ml_qc7.test - Show that
     problem revealed by t/ml_qc5.test is a
     "concurrent_inserts"/MyISAM only problem
[27 Nov 2008 21:09] Matthias Leich
The test has a weakness around "concurrent_inserts",
but it must be mentioned that the "sleep 5" within
the current test makes it very unlikely that this
weakness causes test failures.
The majority of the random main.query_cache failures
is most probably caused by a query cache bug to be reported.
[28 Nov 2008 14:40] Matthias Leich
The query cache bug will be tracked in
Bug#41098 Query Cache returns wrong result with concurrent insert
[28 Nov 2008 15:45] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60193

2713 Matthias Leich	2008-11-28
      - Fix for Bug#40179 Test main.query_cache failing randomly on Pushbuild, test weakness
      - disable the test suffering from Bug#41098 Query Cache returns wrong result with
                                                  concurrent insert
      - additional improvements
      
      Details:
      - Move the test for the historic bug
        Bug 28249 Query Cache returns wrong result with concurrent insert / certain lock
        into its own testscript query_cache_28249.test.
      - query_cache.test:
        - replace error numbers with error names
        - remove trailing spaces, replace tabs with spaces
        - reset of @@global.log_bin_trust_function_creators to its original value
          at the end of the test
      - query_cache_28249.test:
        - replace sleep 5 with poll routine
        - avoid random differences caused by concurrent_inserts effects
        - improved comments and formatting
[1 Dec 2008 19:31] Patrick Crews
Ok to push.
[16 Dec 2008 13:23] Sven Sandberg
xref: http://tinyurl.com/6fy5qa
[19 Dec 2008 20:41] Matthias Leich
Pushed to
mysql-5.1-bugteam
mysql-6.0-bugteam
[15 Jan 2009 6:32] Bugs System
Pushed into 5.1.31 (revid:joro@sun.com-20090115053147-tx1oapthnzgvs1ro) (version source revid:azundris@mysql.com-20081230114838-cn52tu180wcrvh0h) (merge vers: 5.1.31) (pib:6)
[16 Jan 2009 3:44] Paul Dubois
Test case changes. No changelog entry needed.

Setting report to NDI pending push into 6.0.x.
[19 Jan 2009 11:30] Bugs System
Pushed into 5.1.31-ndb-6.2.17 (revid:tomas.ulin@sun.com-20090119095303-uwwvxiibtr38djii) (version source revid:tomas.ulin@sun.com-20090115073240-1wanl85vlvw2she1) (merge vers: 5.1.31-ndb-6.2.17) (pib:6)
[19 Jan 2009 13:07] Bugs System
Pushed into 5.1.31-ndb-6.3.21 (revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (version source revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (merge vers: 5.1.31-ndb-6.3.21) (pib:6)
[19 Jan 2009 15:54] Jon Stephens
Setting status back to NDI pending merge to 6.0 tree.
[19 Jan 2009 16:13] Bugs System
Pushed into 5.1.31-ndb-6.4.1 (revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (version source revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (merge vers: 5.1.31-ndb-6.4.1) (pib:6)
[20 Jan 2009 18:59] Bugs System
Pushed into 6.0.10-alpha (revid:joro@sun.com-20090119171328-2hemf2ndc1dxl0et) (version source revid:azundris@mysql.com-20081230114916-c290n83z25wkt6e4) (merge vers: 6.0.9-alpha) (pib:6)
[21 Jan 2009 16:19] Paul Dubois
Test case changes. No changelog entry needed.