Bug #54819 sys_vars.slow_query_log_func fails sporadically on Solaris 10
Submitted: 26 Jun 2010 7:22   Modified: 15 Oct 2012 14:40
Reporter: Alexander Nozdrin
Status: Can't repeat
Category: Tests: Server   Severity: S3 (Non-critical)
Version: M3 (Celosia), M4 (Dahlia)   OS: Solaris
Assigned to: Assigned Account   CPU Architecture: Any
Tags: experimental, pb2, sporadic, test failure
Triage: Triaged: D3 (Medium)

[26 Jun 2010 7:22] Alexander Nozdrin
Description:
sys_vars.slow_query_log_func started to fail sporadically on Solaris 10
on 2010-06-15 in daily-trunk-merge.

Symptoms:
-------------------------------------------------------------------
@@ -73,7 +73,7 @@
 SELECT t1.c1-self.c1 > 1 FROM t1, t1 as self WHERE t1.c0=self.c0+1 ORDER BY t1.c0;
 t1.c1-self.c1 > 1
 1
-1
+0
 1
 DROP TRIGGER tf_before;
 DROP FUNCTION f_slow_now;
-------------------------------------------------------------------

The test case has been declared experimental due to this bug.

How to repeat:
XRef: http://tinyurl.com/2u6h4d4
[26 Jun 2010 14:31] Sveta Smirnova
Thank you for the report.

Verified as described using pushbuild logs. Also fails in mysql-next-mr-rpl-merge, mysql-next-mr-wl4895, daily-next-mr-bugfixing
[30 Jun 2010 0:18] Omer Barnir
triage: assuming a timing issue in the test when ensuring the logs have a difference of >1 - 3132 SR55TEST
[2 Jul 2010 8:22] Anitha Gopi
Kristofer,
As discussed assigning to you since this was added by you to test the fix for http://bugs.mysql.com/bug.php?id=53191
[2 Jul 2010 8:38] Kristofer Pettersson
Ok, I think this is what happens:

The three rows of 1 indicate that the time difference between each pair of the 4 rows inserted into t1 is greater than 1 second. The statements inserting the rows are:
a. INSERT INTO t1 (c1,c2) VALUES (now(), f_slow_now())  // 1st row
b. INSERT INTO t1 (c1,c2) VALUES (now(), now())  // 2nd and 3rd row, because of trigger
c. CALL p1();  // 4th row

Since the failing time interval happens between a and b, it seems clear that I simply forgot to insert a SLEEP(2) instruction there. The reason it succeeded on my system is that it took a second (!) to create the trigger and SP, while on the test system this might happen faster.
[2 Jul 2010 10:29] Kristofer Pettersson
Failed to reproduce the failure on Ubuntu. Will try on Solaris 10 as soon as I can get a workable VM from IT.
[2 Jul 2010 12:39] Kristofer Pettersson
Anitha is kind enough to point out that my initial analysis wasn't all that accurate. Let's try again. ;)

a. INSERT INTO t1 (c1,c2) VALUES (now(), f_slow_now())  // 1st row

Trigger created here.  

b. INSERT INTO t1 (c1,c2) VALUES (now(), now())  // 2nd row
 

c. CALL p1(); 
   expands into:
    INSERT INTO t1 (c1,c2) values (now(),now()); // 3rd row

    INSERT INTO t1 (c1,c2) values (now(),now()); // 4th row

The differences between the rows are row2-row1, row3-row2, and row4-row3, and should have resolved to (1,1,1) if the time difference between each pair of rows was more than one second. We got (1,0,1) instead, indicating that the row3-row2 difference may have been computed with bad timing.
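As an illustration (with hypothetical timestamps, not taken from the actual failure), the self-join comparison from the test would produce the observed (1,0,1) pattern if rows 2 and 3 were inserted within the same second:

```sql
-- Hypothetical timestamps reproducing the (1,0,1) pattern:
--   row1 10:00:00, row2 10:00:02, row3 10:00:02, row4 10:00:04
-- The self-join pairs each row with its predecessor via c0:
--   row2-row1 = 2 -> 1
--   row3-row2 = 0 -> 0   (the unexpected 0 in the diff above)
--   row4-row3 = 2 -> 1
SELECT t1.c1 - self.c1 > 1
FROM t1, t1 AS self
WHERE t1.c0 = self.c0 + 1
ORDER BY t1.c0;
```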

There is no sleep after statement b. That shouldn't matter because, as Anitha points out, the trigger is active for the INSERT statements inside the stored procedure p1() and should have caused a SLEEP(2) just before the INSERT of the 3rd row, making the difference between row3 and row2 more than 1 second. In fact, there is even one SLEEP too many inside the SP, which could safely be removed.
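The intended mechanism can be sketched as follows. This is a hypothetical reconstruction of the test objects based on the statements quoted in this thread, not the actual test file (table layout and trigger body are assumptions; client delimiter handling is omitted):

```sql
-- Assumed table layout: c0 orders the rows, c1/c2 hold the timestamps.
CREATE TABLE t1 (c0 INT AUTO_INCREMENT PRIMARY KEY, c1 DATETIME, c2 DATETIME);

-- The BEFORE INSERT trigger sleeps, so every subsequent INSERT into t1,
-- including those issued from inside p1(), is delayed by at least 2 seconds.
CREATE TRIGGER tf_before BEFORE INSERT ON t1
FOR EACH ROW SET @x = SLEEP(2);

CREATE PROCEDURE p1()
BEGIN
  INSERT INTO t1 (c1,c2) VALUES (now(), now());  -- 3rd row, trigger fires first
  INSERT INTO t1 (c1,c2) VALUES (now(), now());  -- 4th row, trigger fires first
END;
```

If the trigger fires as sketched, the row3-row2 gap should always exceed 2 seconds, which is why the (1,0,1) result on Solaris 10 is surprising.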

So why did this fail on Solaris 10? I don't know yet. I can't see anything obviously wrong with the test case just now.

If it doesn't fail on 5.1 then the risk is that the test case is good and that we're seeing a regression for 5.5 here.

In 5.1 this behavior is verified from a debug trace:

query: INSERT INTO t1 (c1,c2) VALUES (now(), f_slow_now())
       execute: Instruction 0
       >sp_instr_stmt::execute
           proc_info: item_func.cc:3670  User sleep
       <sp_instr_stmt::execute
       execute: Instruction 1
query: CREATE TRIGGER tf_before BEFORE INSERT ON t1
query: INSERT INTO t1 (c1,c2) VALUES (now(), now())
   >sp_head::execute_trigger
     execute: Instruction 0
         execute: Instruction 0
         >sp_instr_stmt::execute
             proc_info: item_func.cc:3670  User sleep
         <sp_instr_stmt::execute
         execute: Instruction 1
   <sp_head::execute_trigger
query: CALL p1()
    execute: Instruction 0
    >sp_instr_stmt::execute
        >sp_head::execute_trigger
          execute: Instruction 0
              execute: Instruction 0
              >sp_instr_stmt::execute
                  proc_info: item_func.cc:3670  User sleep
              <sp_instr_stmt::execute
              execute: Instruction 1
        <sp_head::execute_trigger
    <sp_instr_stmt::execute

    execute: Instruction 1
    >sp_instr_stmt::execute
        >sp_head::execute_trigger
          execute: Instruction 0
              execute: Instruction 0
              >sp_instr_stmt::execute
                  proc_info: item_func.cc:3670  User sleep
              <sp_instr_stmt::execute
              execute: Instruction 1
        <sp_head::execute_trigger
    <sp_instr_stmt::execute

query: SELECT c1-c2 FROM t1
query: SELECT t1.c1-self.c1 > 1 FROM t1, t1 as self WHERE t1.c0=self.c0+1 ORDER BY t1.c0
[4 Feb 2011 10:37] Roel Van de Paar
Could bug #42054 be related (thanks Shane!)?
[8 Feb 2011 12:07] Matthias Leich
Hi Roel,

the test has the potential to harm succeeding tests because
it does not ensure that the disconnect of con2 has finished
before the test ends.
Please fix this.
Current:
   SELECT count(*) FROM mysql.slow_log WHERE lock_time > 10;
   disconnect con2;
   DROP TABLE t1;
Repaired:
   SELECT count(*) FROM mysql.slow_log WHERE lock_time > 10;
   connection con2;
   disconnect con2;
   --source include/wait_until_disconnected.inc
   connection default;
   DROP TABLE t1;
This fix itself will not prevent the sporadic failures reported
within this bug.

Thank you

Matthias
[11 Feb 2011 5:09] Roel Van de Paar
Bug reproduced on Solaris 10. Many threads (i.e. a busy machine) result in inconsistencies in the test case outcome. Will research further and likely open a new bug with a test case.