Bug #54819 sys_vars.slow_query_log_func fails sporadically on Solaris 10
Submitted: 26 Jun 2010 7:22 Modified: 15 Oct 2012 14:40
Reporter: Alexander Nozdrin Email Updates:
Status: Can't repeat Impact on me:
None 
Category:Tests: Server Severity:S3 (Non-critical)
Version:M3 (Celosia), M4 (Dahlia) OS:Solaris
Assigned to: Assigned Account CPU Architecture:Any
Tags: experimental, pb2, sporadic, test failure

[26 Jun 2010 7:22] Alexander Nozdrin
Description:
sys_vars.slow_query_log_func started to fail sporadically on Solaris 10
on 2010-06-15 in daily-trunk-merge.

Symptoms:
-------------------------------------------------------------------
@@ -73,7 +73,7 @@
 SELECT t1.c1-self.c1 > 1 FROM t1, t1 as self WHERE t1.c0=self.c0+1 ORDER BY t1.c0;
 t1.c1-self.c1 > 1
 1
-1
+0
 1
 DROP TRIGGER tf_before;
 DROP FUNCTION f_slow_now;
-------------------------------------------------------------------

The test case has been declared experimental due to this bug.

How to repeat:
XRef: http://tinyurl.com/2u6h4d4
[26 Jun 2010 14:31] Sveta Smirnova
Thank you for the report.

Verified as described using pushbuild logs. Also fails in mysql-next-mr-rpl-merge, mysql-next-mr-wl4895, daily-next-mr-bugfixing
[30 Jun 2010 0:18] Omer Barnir
triage:  - assuming a timing issue in the test to ensuring the logs having a difference of >1 - 3132 SR55TEST
[2 Jul 2010 8:22] Anitha Gopi
Kristofer,
As discussed assigning to you since this was added by you to test the fix for http://bugs.mysql.com/bug.php?id=53191
[2 Jul 2010 8:38] Kristofer Pettersson
Ok, I think this is what happens:

The three rows of 1 indicate that the time difference between each of the 4 rows inserted into t1 is lower than 1 second. The statements inserting the rows are:
a. INSERT INTO t1 (c1,c2) VALUES (now(), f_slow_now())  // 1st row
b. INSERT INTO t1 (c1,c2) VALUES (now(), now())  // 2nd and 3rd row, because of trigger
c. CALL p1();  // 4th row

Since the failing time interval happens between a and b it seems clear that I just forgot to insert a Sleep(2) instruction there. And the reason it succeeded on my system is because it took a second (!) to create the trigger and SP but on the test system it might happen faster.
[2 Jul 2010 10:29] Kristofer Pettersson
Fail to reproduce the failure on ubuntu. Will attempt to try on Solaris 10 as soon as I can get a workable VM from IT.
[2 Jul 2010 12:39] Kristofer Pettersson
Anitha is kind enough to point out that my initial analysis wasn't all that accurate. Let's try again. ;)

a. INSERT INTO t1 (c1,c2) VALUES (now(), f_slow_now())  // 1st row

Trigger created here.  

b. INSERT INTO t1 (c1,c2) VALUES (now(), now())  // 2nd row
 

c. CALL p1(); 
   expands into:
    INSERT INTO t1 (c1,c2) values (now(),now()); // 3rd row

    INSERT INTO t1 (c1,c2) values (now(),now()); // 4th row

The difference between the rows are row2-row1, row3-row2, row4-row3 and should have resolved to: (1,1,1) if the time difference between each row was more than one second. We got (1,0,1) instead, indicating that row3-row2 difference is maybe calculated with a bad timing.

There is no sleep after statement b. That shouldn't matter, because as Anitha point out, the trigger is active for the INSERT statement inside the stored procedure p1() and should have caused a SLEEP(2) just before the INSERT of the 3rd row, making the difference between row3 and row2 more than 1 second. In fact there is even one SLEEP too many inside the SP which could safely be removed.

So why did this fail on Solaris 10? I don't know yet. I can't see anything obviously wrong with the test case just now.

If it doesn't fail on 5.1 then the risk is that the test case is good and that we're seeing a regression for 5.5 here.

In 5.1 this behavior is verified from a debug trace:

query: INSERT INTO t1 (c1,c2) VALUES (now(), f_slow_now())
       execute: Instruction 0
       >sp_instr_stmt::execute
           proc_info: item_func.cc:3670  User sleep
       <sp_instr_stmt::execute
       execute: Instruction 1
query: CREATE TRIGGER tf_before BEFORE INSERT ON t1
query: INSERT INTO t1 (c1,c2) VALUES (now(), now())
   >sp_head::execute_trigger
     execute: Instruction 0
         execute: Instruction 0
         >sp_instr_stmt::execute
             proc_info: item_func.cc:3670  User sleep
         <sp_instr_stmt::execute
         execute: Instruction 1
   <sp_head::execute_trigger
query: CALL p1()
    execute: Instruction 0
    >sp_instr_stmt::execute
        >sp_head::execute_trigger
          execute: Instruction 0
              execute: Instruction 0
              >sp_instr_stmt::execute
                  proc_info: item_func.cc:3670  User sleep
              <sp_instr_stmt::execute
              execute: Instruction 1
        <sp_head::execute_trigger
    <sp_instr_stmt::execute

    execute: Instruction 1
    >sp_instr_stmt::execute
        >sp_head::execute_trigger
          execute: Instruction 0
              execute: Instruction 0
              >sp_instr_stmt::execute
                  proc_info: item_func.cc:3670  User sleep
              <sp_instr_stmt::execute
              execute: Instruction 1
        <sp_head::execute_trigger
    <sp_instr_stmt::execute

query: SELECT c1-c2 FROM t1
query: SELECT t1.c1-self.c1 > 1 FROM t1, t1 as self WHERE t1.c0=self.c0+1 ORDER BY t1.c0
[4 Feb 2011 10:37] Roel Van de Paar
Could bug #42054 be related (thanks Shane!)?
[8 Feb 2011 12:07] Matthias Leich
Hi Roel,

the test has the potential to harm succeeding tests because
it is not ensured that the disconnect of con2 is finished
before this test ends.
Please fix this.
Current:
   SELECT count(*) FROM mysql.slow_log WHERE lock_time > 10;
   disconnect con2;
   DROP TABLE t1;
Repaired:
   SELECT count(*) FROM mysql.slow_log WHERE lock_time > 10;
   connection con2;
   disconnect con2;
   --source include/wait_until_disconnected.inc
   connection default;
   DROP TABLE t1;
This fix itself will not prevent the sporadic failures reported
within this bug.

Thank you

Matthias
[11 Feb 2011 5:09] Roel Van de Paar
Bug reproduced on Solaris 10. Many threads (i.e. busy machine) will result in inconsistencies in the test case outcome. Will research further and likely open new bug with test case.