Bug #39249 Maria: query cache returns out-of-date results
Submitted: 4 Sep 2008 17:25 Modified: 1 Apr 11:39
Reporter: Guilhem Bichot
Status: Verified
Category:Server: Maria Severity:S3 (Non-critical)
Version:5.1-maria, 6.0-bk OS:Linux
Assigned to: Target Version:6.0-beta
Triage: Triaged: D2 (Serious) / R1 (None/Negligible) / E2 (Low)

[4 Sep 2008 17:25] Guilhem Bichot
Description:
This problem happens when running:
 ./mtr --mem query_cache --mysqld=--default-storage-engine=maria:
@@ -1641,7 +1641,6 @@
 1	0
 2	0
 3	0
-4	0
 reset query cache;
 select *, (select count(*) from t2) from t1;
 a	(select count(*) from t2)

so don't forget to run this test with the above mtr options before closing the bug.
Note that the relevant test part starts with 
Bug#28249 Query Cache returns wrong result with concurrent insert/ certain lock
so maybe we wrongly merged the fix of BUG#28249 from MyISAM to Maria??

How to repeat:
Here is a testcase extracted from query_cache.test:
--echo Bug#28249 Query Cache returns wrong result with concurrent insert/ certain lock
connect (user1,localhost,root,,test,,);
connect (user2,localhost,root,,test,,);
connect (user3,localhost,root,,test,,);

connection user1;

set GLOBAL query_cache_type=1;
set GLOBAL query_cache_limit=10000;
set GLOBAL query_cache_min_res_unit=0;
set GLOBAL query_cache_size= 100000;

flush tables;
--disable_warnings
drop table if exists t1, t2;
--enable_warnings
create table t1 (a int);
create table t2 (a int);
insert into t1 values (1),(2),(3);
connection user2;
--echo Locking table T2 with a write lock.
lock table t2 write;

connection user1;
--echo Select blocked by write lock.
--send select *, (select count(*) from t2) from t1;
--echo Sleeing is ok, because selecting should be done very fast.
sleep 5;

connection user3;
--echo Inserting into table T1.
insert into t1 values (4);

connection user2;
--echo Unlocking the tables.
unlock tables;

connection user1;
--echo Collecting result from previously blocked select.
# 
# Since the lock ordering rule in thr_multi_lock depends on
# pointer values, from execution to execution we might have
# different lock order, and therefore, sometimes lock t1 and block
# on t2, and sometimes block on t2 right away. In the second case,
# the following insert succeeds, and only then this select can
# proceed, and we actually test nothing, as the very first select
# returns 4 rows right away.
# It's fine to have a test case that covers the problematic area
# at least once in a while.
# We, however, need to disable the result log here to make the
# test repeatable.
--disable_result_log
--reap
--enable_result_log
--echo Next select should contain 4 rows, as the insert is long finished.
sleep 5;
select *, (select count(*) from t2) from t1;
select *, (select count(*) from t2) from t1;
select *, (select count(*) from t2) from t1;
reset query cache;
select *, (select count(*) from t2) from t1;

I see that the 3 SELECTs before "reset query cache" return (1,0),(2,0),(3,0), which is
wrong: they should additionally show (4,0) like the last SELECT does.
Indeed: note that these 3 SELECTs run 5 seconds after the INSERT returned ok (INSERT
returned ok => it committed => it unlocked table => 5 seconds after that, its effects
should certainly be visible to "user1").
Note that "user1" is not using LOCK TABLES, so it's not a question of "user1" normally
seeing table as it was at LOCK TABLES.
Query cache probably keeps too old data.
[13 Oct 2008 15:06] Georgi Kodinov
I've seen this while testing in 6.0-bugteam : 
main.ps_not_windows            [ pass ]             21
main.query_cache               [ fail ]

--- /home/kgeorge/mysql/bzr/merge-6.0-bugteam/mysql-test/r/query_cache.result  2008-10-13
13:27:43.000000000 +0300
+++ /home/kgeorge/mysql/bzr/merge-6.0-bugteam/mysql-test/r/query_cache.reject  2008-10-13
16:04:57.000000000 +0300
@@ -1585,14 +1585,12 @@
 1      0
 2      0
 3      0
-4      0
 reset query cache;
 select *, (select count(*) from t2) from t1;
 a      (select count(*) from t2)
 1      0
 2      0
 3      0
-4      0
 drop table t1,t2;
 #
 # Bug#25132 disabled query cache: Qcache_free_blocks = 1

mysqltest: Result content mismatch

Stopping All Servers
Restoring snapshot of databases
Saving core.18412
Saving core.18440
Saving core.18531
Saving core.18558
Resuming Tests

main.query_cache_debug         [ skipped ]   Test needs 'big-test' opt
[15 Oct 2008 11:24] Kristofer Pettersson
This is what I think is the situation:

Under open_and_lock_tables, in execute_sqlcom_select the query cache is asked to attach a
result set writer intercepting the current statement. Before this is allowed a call to
ha_myisam::register_query_cache_table is issued to assert that the current statement can
be cached.

In this test case either a concurrent insert or a regular insert has happened depending
on the order locks were acquired on t1, t2. In the case of concurrent insert,
'register_query_cache_table' should return false to indicate that the statement shouldn't
be cached. In the second case, insert operation should have been visible to the select
operation waiting on the write lock and the result should have been the same.

Could you verify that the code in ha_myisam::register_query_cache_table is correct for
getting a consistent snap shot on the state of myisam which can be used in this context?

If the error on the other hand happens on a regular insert with a select hanging on lock
tables, then maybe the qc api has a more serious flaw, but I don't think this is the
case.

Do you agree?
[21 Nov 2008 19:38] Guilhem Bichot
In BUG#40179, query_cache.test fails in default configuration (i.e. using MyISAM, not
Maria), with exactly the same output (same missing line at line 1641):

8085: @@ -1641,7 +1641,6 @@
38086:  1	0
38087:  2	0
38088:  3	0
38089: -4	0
38090:  reset query cache;
38091:  select *, (select count(*) from t2) from t1;
38092:  a	(select count(*) from t2)

as in the Description of the present BUG#39249.
So I'm changing category to Server:QueryCache, marking it as duplicate of BUG#40179 and
de-assigning it from Sanja.
[1 Apr 11:39] Guilhem Bichot
http://bugs.mysql.com/bug.php?id=41098 and
http://bugs.mysql.com/bug.php?id=40179
have been fixed, but only for MyISAM. The bugfix revision
davi.arnaut@sun.com-20090219210935-9vilvcisyyieffxl has been merged into 5.1-maria but
not propagated to storage/maria as it was not trivial to do (in Maria there are several
ma*get_status() functions unlike MyISAM).
So this bug still exists (at least I can repeat the problem of the reduced test case). We
should propagate Davi's fix, and see if it fixes all testcases (including
row_format=page|dynamic).