Bug #41004 | events_scheduling times out sporadically on pushbuild | ||
---|---|---|---|
Submitted: | 25 Nov 2008 10:57 | Modified: | 8 Sep 2009 9:14 |
Reporter: | Sven Sandberg | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server | Severity: | S2 (Serious) |
Version: | 5.1, 6.0 | OS: | Any |
Assigned to: | Georgi Kodinov | CPU Architecture: | Any |
Tags: | events_scheduling, pushbuild, sporadic, test failure |
[25 Nov 2008 10:57]
Sven Sandberg
[8 Dec 2008 18:54]
Matthias Leich
Some estimation --------------- There are 9 places where include/wait_condition.inc is sourced. Worst case is that we need in every case nearly $wait_timeout seconds (default of 30 seconds is never overwritten). ==> 270 seconds + 30 seconds for execution of the other stuff ----- 300 seconds which is far away of the usual 900 seconds testcase timeout. So it looks like the problem is outside of the existing wait routines.
[8 Dec 2008 21:41]
Matthias Leich
analyze-timeout-mysqld.1.err shows a process with info like "%SET GLOBAL event_scheduler=0". This command is only in line 14 of the script. Replay script: -------------- --disable_query_log let $num = 10000; while ($num) { SET GLOBAL event_scheduler=1; SET GLOBAL event_scheduler=0; dec $num; } --enable_query_log --echo # End Expected results: ----------------- # End How to replay: -------------- ./mysql-test-run.pl --mem \ `perl -e 'for (1 .. 100) { print "<script> "}'` A parallel compile increases the likelihood to get the problem. You do not need to wait till the testcase timeout of 900 seconds has passed. If one test has not passed after ~ 10 seconds and several executions of wc -l var/log/master.log do not show an increasing amount of line than you have replayed the problem. This is a bug within the events scheduler. IMHO this bug is - most probably not very critical for customers 1. It is not extreme likely because it requires some load on the testing box. 2. It is rather unlikely that somebody runs switch event_scheduler on / switch event_scheduler off without nearly no timespan between. 3. The bug itself does not produce additional heavy problems like for example excessive CPU load. 4. Of course the events_scheduling is somehow out of order till next server shutdown/restart 5. I tried from another connection CREATE TABLE, INSERT and they worked Output of "SELECT * FROM information_schema.processlist" +-------+-----------------+-----------+------+---------+------+---------------------------+----------------------------------------------+ | ID | USER | HOST | DB | COMMAND | TIME | STATE | INFO | +-------+-----------------+-----------+------+---------+------+---------------------------+----------------------------------------------+ | 18991 | root | localhost | NULL | Query | 0 | executing | SELECT * FROM information_schema.processlist | | 18986 | event_scheduler | localhost | NULL | Killed | 624 | Waiting on empty queue | NULL | | 10002 | root | localhost | test | Query | 624 | Waiting scheduler to stop | SET GLOBAL event_scheduler=0 | +-------+-----------------+-----------+------+---------+------+---------------------------+----------------------------------------------+ This fits very good to the reported effect. - is very annoying when running our tests ~ 157 times testcase timeout when running this test during the last 15 months - there are other events tests which also fail often with testcase timeout although the reason might be sometimes a different one (unfortunate poll routines etc.)
[8 Sep 2009 9:14]
Georgi Kodinov
Tried the latest 5.1-bugteam codebase. I was able to run the test case from Matthias 100 times without a timeout (through the perl line provided). And since the last pb occurrence was on 5 May 2009 I assume the problem is no longer repeatable in 5.1.