Bug #54819 | sys_vars.slow_query_log_func fails sporadically on Solaris 10 | ||
---|---|---|---|
Submitted: | 26 Jun 2010 7:22 | Modified: | 15 Oct 2012 14:40 |
Reporter: | Alexander Nozdrin | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | Tests: Server | Severity: | S3 (Non-critical) |
Version: | M3 (Celosia), M4 (Dahlia) | OS: | Solaris |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | experimental, pb2, sporadic, test failure |
[26 Jun 2010 7:22]
Alexander Nozdrin
[26 Jun 2010 14:31]
Sveta Smirnova
Thank you for the report. Verified as described using pushbuild logs. Also fails in mysql-next-mr-rpl-merge, mysql-next-mr-wl4895, daily-next-mr-bugfixing
[30 Jun 2010 0:18]
Omer Barnir
triage: - assuming a timing issue in the test to ensuring the logs having a difference of >1 - 3132 SR55TEST
[2 Jul 2010 8:22]
Anitha Gopi
Kristofer, As discussed assigning to you since this was added by you to test the fix for http://bugs.mysql.com/bug.php?id=53191
[2 Jul 2010 8:38]
Kristofer Pettersson
Ok, I think this is what happens: The three rows of 1 indicate that the time difference between each of the 4 rows inserted into t1 is lower than 1 second. The statements inserting the rows are: a. INSERT INTO t1 (c1,c2) VALUES (now(), f_slow_now()) // 1st row b. INSERT INTO t1 (c1,c2) VALUES (now(), now()) // 2nd and 3rd row, because of trigger c. CALL p1(); // 4th row Since the failing time interval happens between a and b it seems clear that I just forgot to insert a Sleep(2) instruction there. And the reason it succeeded on my system is because it took a second (!) to create the trigger and SP but on the test system it might happen faster.
[2 Jul 2010 10:29]
Kristofer Pettersson
Fail to reproduce the failure on ubuntu. Will attempt to try on Solaris 10 as soon as I can get a workable VM from IT.
[2 Jul 2010 12:39]
Kristofer Pettersson
Anitha is kind enough to point out that my initial analysis wasn't all that accurate. Let's try again. ;) a. INSERT INTO t1 (c1,c2) VALUES (now(), f_slow_now()) // 1st row Trigger created here. b. INSERT INTO t1 (c1,c2) VALUES (now(), now()) // 2nd row c. CALL p1(); expands into: INSERT INTO t1 (c1,c2) values (now(),now()); // 3rd row INSERT INTO t1 (c1,c2) values (now(),now()); // 4th row The difference between the rows are row2-row1, row3-row2, row4-row3 and should have resolved to: (1,1,1) if the time difference between each row was more than one second. We got (1,0,1) instead, indicating that row3-row2 difference is maybe calculated with a bad timing. There is no sleep after statement b. That shouldn't matter, because as Anitha point out, the trigger is active for the INSERT statement inside the stored procedure p1() and should have caused a SLEEP(2) just before the INSERT of the 3rd row, making the difference between row3 and row2 more than 1 second. In fact there is even one SLEEP too many inside the SP which could safely be removed. So why did this fail on Solaris 10? I don't know yet. I can't see anything obviously wrong with the test case just now. If it doesn't fail on 5.1 then the risk is that the test case is good and that we're seeing a regression for 5.5 here. In 5.1 this behavior is verified from a debug trace: query: INSERT INTO t1 (c1,c2) VALUES (now(), f_slow_now()) execute: Instruction 0 >sp_instr_stmt::execute proc_info: item_func.cc:3670 User sleep <sp_instr_stmt::execute execute: Instruction 1 query: CREATE TRIGGER tf_before BEFORE INSERT ON t1 query: INSERT INTO t1 (c1,c2) VALUES (now(), now()) >sp_head::execute_trigger execute: Instruction 0 execute: Instruction 0 >sp_instr_stmt::execute proc_info: item_func.cc:3670 User sleep <sp_instr_stmt::execute execute: Instruction 1 <sp_head::execute_trigger query: CALL p1() execute: Instruction 0 >sp_instr_stmt::execute >sp_head::execute_trigger execute: Instruction 0 execute: Instruction 0 >sp_instr_stmt::execute proc_info: item_func.cc:3670 User sleep <sp_instr_stmt::execute execute: Instruction 1 <sp_head::execute_trigger <sp_instr_stmt::execute execute: Instruction 1 >sp_instr_stmt::execute >sp_head::execute_trigger execute: Instruction 0 execute: Instruction 0 >sp_instr_stmt::execute proc_info: item_func.cc:3670 User sleep <sp_instr_stmt::execute execute: Instruction 1 <sp_head::execute_trigger <sp_instr_stmt::execute query: SELECT c1-c2 FROM t1 query: SELECT t1.c1-self.c1 > 1 FROM t1, t1 as self WHERE t1.c0=self.c0+1 ORDER BY t1.c0
[4 Feb 2011 10:37]
Roel Van de Paar
Could bug #42054 be related (thanks Shane!)?
[8 Feb 2011 12:07]
Matthias Leich
Hi Roel, the test has the potential to harm succeeding tests because it is not ensured that the disconnect of con2 is finished before this test ends. Please fix this. Current: SELECT count(*) FROM mysql.slow_log WHERE lock_time > 10; disconnect con2; DROP TABLE t1; Repaired: SELECT count(*) FROM mysql.slow_log WHERE lock_time > 10; connection con2; disconnect con2; --source include/wait_until_disconnected.inc connection default; DROP TABLE t1; This fix itself will not prevent the sporadic failures reported within this bug. Thank you Matthias
[11 Feb 2011 5:09]
Roel Van de Paar
Bug reproduced on Solaris 10. Many threads (i.e. busy machine) will result in inconsistencies in the test case outcome. Will research further and likely open new bug with test case.