Bug #54729 sleep() capped to 5 seconds when executed in the sql thread or in an event
Submitted: 23 Jun 2010 9:51
Modified: 4 Aug 2010 16:52
Reporter: Sven Sandberg
Status: Closed
Category: MySQL Server: General
Severity: S2 (Serious)
Version: next-mr, trunk
OS: Any
Assigned to: Sven Sandberg
CPU Architecture: Any
Tags: Event, regression, replication, sleep

[23 Jun 2010 9:51] Sven Sandberg
Description:
When the SQL function SLEEP() is executed in the slave SQL thread or from an event (as in CREATE EVENT, not a binlog event), the timeout is capped to 5 seconds.

For example, if the user does INSERT INTO t1 VALUES (SLEEP(30)), then it sleeps 30 seconds when invoked from a normal client, but when replicated to a slave, the slave SQL thread sleeps only 5 seconds. Also, if an event is created as follows:

CREATE EVENT e ON SCHEDULE EVERY 30 SECOND do insert into t1 values (sleep(15));

then the event only sleeps 5 seconds.

How to repeat:
--source include/have_binlog_format_statement.inc
--source include/master-slave.inc

CREATE TABLE t1 (a VARCHAR(100));

--delimiter |
CREATE FUNCTION f() RETURNS INT
BEGIN
  INSERT INTO t1 VALUES (SYSDATE());
  INSERT INTO t1 SELECT SLEEP(13);
  INSERT INTO t1 VALUES (SYSDATE());
  RETURN 1;
END|
--delimiter ;

--echo # OK: timestamps differ by 13 seconds
SELECT (f());
SELECT * FROM t1;

--echo # BUG: timestamps differ by only 5 seconds
--sync_slave_with_master
SELECT * FROM t1;

--echo # BUG: timestamps differ by only 5 seconds
--connection master
DELETE FROM t1;
SET GLOBAL EVENT_SCHEDULER = 1;
CREATE EVENT e ON SCHEDULE EVERY 15 SECOND DO SELECT f();
--sleep 20
DROP EVENT e;
SELECT * FROM t1;

Suggested fix:
This bug was introduced in the fix of BUG#10374, in the function interruptible_wait() in item_func.cc.

The function interruptible_wait(), called from item_func_sleep::val_int(), splits the sleep into 5-second units. After each unit, it checks whether thd->is_connected() is true; if not, it stops sleeping. The purpose is to avoid spending system resources on a sleep after the client has disconnected.

However, thd->is_connected() returns false for the slave SQL thread and for the event worker thread, because they don't connect to the server the same way as client threads do.
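
To make the mechanism concrete, here is a minimal sketch of the chunked wait described above. It is a simplified model, not the server code: simulated_interruptible_wait() and its parameters are hypothetical names, and it counts simulated seconds instead of blocking. It only illustrates why a thread whose is_connected() check always fails stops after the first 5-second unit.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Simplified model of the chunked wait. The name and signature are
// hypothetical; the real interruptible_wait() in item_func.cc differs.
// Returns the number of (simulated) seconds actually slept.
int simulated_interruptible_wait(int requested_seconds,
                                 const std::function<bool()> &is_connected,
                                 int unit_seconds = 5)
{
  assert(requested_seconds >= 0 && unit_seconds > 0);
  int slept = 0;
  while (slept < requested_seconds)
  {
    // Sleep for one unit, or for whatever remains if that is shorter.
    slept += std::min(unit_seconds, requested_seconds - slept);
    // After each unit, stop if the connection check fails.
    if (!is_connected())
      break;
  }
  return slept;
}
```

With a check that always succeeds, a 30-second request runs to completion. With a check that always fails, as for the slave SQL thread and the event worker thread before the fix, the same request returns after 5 simulated seconds. A request shorter than one unit is unaffected.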

The fix is to make thd->is_connected() return true for the slave SQL thread and for the event thread:

=== modified file 'sql/sql_class.h'
--- sql/sql_class.h	2010-06-19 07:50:33 +0000
+++ sql/sql_class.h	2010-06-22 15:34:18 +0000
@@ -2478,7 +2478,14 @@
   /** Return FALSE if connection to client is broken. */
   bool is_connected()
   {
-    return vio_ok() ? vio_is_connected(net.vio) : FALSE;
+    /*
+      The slave SQL thread and the event worker thread are connected
+      but not using vio. So this function always returns true for
+      them.
+    */
+    return system_thread == SYSTEM_THREAD_SLAVE_SQL ||
+      system_thread == SYSTEM_THREAD_EVENT_WORKER ||
+      (vio_ok() ? vio_is_connected(net.vio) : FALSE);
   }
 #else
   inline bool vio_ok() const { return TRUE; }
[23 Jun 2010 12:27] Davi Arnaut
Looks like the check was wrong for any system thread, wasn't it? A more sensible fix would be:

-    return vio_ok() ? vio_is_connected(net.vio) : FALSE;
+    return system_thread || (vio_ok() ? vio_is_connected(net.vio) : FALSE);
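
For comparison, the three variants of the check discussed so far can be modelled side by side. This is only a sketch under stated assumptions: FakeThd, thread_type and the has_vio/vio_connected flags are hypothetical stand-ins for THD, its thread-type enum, vio_ok() and vio_is_connected(net.vio), and the enum values are assumed. The `system_thread || ...` form relies on the non-system value being zero.

```cpp
#include <cassert>

// Hypothetical subset of the thread-type enum; values are assumed.
enum thread_type
{
  NON_SYSTEM_THREAD = 0,
  SYSTEM_THREAD_SLAVE_IO = 2,
  SYSTEM_THREAD_SLAVE_SQL = 4,
  SYSTEM_THREAD_EVENT_WORKER = 32
};

// The generalized check treats any non-zero value as a system thread.
static_assert(NON_SYSTEM_THREAD == 0, "non-system threads must map to 0");

// Hypothetical stand-in for THD, reduced to what the check needs.
struct FakeThd
{
  thread_type system_thread;
  bool has_vio;        // stands in for vio_ok()
  bool vio_connected;  // stands in for vio_is_connected(net.vio)

  bool vio_check() const { return has_vio ? vio_connected : false; }

  // Check before the fix: threads without a vio are never "connected".
  bool is_connected_original() const { return vio_check(); }

  // First proposal: special-case two thread types.
  bool is_connected_first_patch() const
  {
    return system_thread == SYSTEM_THREAD_SLAVE_SQL ||
           system_thread == SYSTEM_THREAD_EVENT_WORKER ||
           vio_check();
  }

  // Generalized proposal: every system thread counts as connected.
  bool is_connected_generalized() const
  {
    return system_thread || vio_check();
  }
};

// Quick self-check of the model: a slave SQL thread has no vio.
inline void check_model()
{
  const FakeThd slave_sql = {SYSTEM_THREAD_SLAVE_SQL, false, false};
  assert(!slave_sql.is_connected_original());
  assert(slave_sql.is_connected_first_patch());
  assert(slave_sql.is_connected_generalized());
}
```

In this model the two proposals differ only for system threads other than the two named ones, such as the slave IO thread: the first patch still reports it as disconnected, while the generalized check does not. Ordinary client threads behave the same under all three variants.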
[29 Jun 2010 20:03] MySQL Verification Team
Thank you for the bug report.
[8 Jul 2010 16:01] Omer Barnir
triage: minor but setting SR55RC because a regression
[12 Jul 2010 13:24] Sven Sandberg
My post-commit hooks don't work, but I have committed this:

 3107 Sven Sandberg	2010-07-12
      BUG#54729: sleep() capped to 5 seconds when executed in the sql thread or in an event
      
      Symptom:
      When the sql function SLEEP() was executed in the slave SQL thread or from an event (as in
      CREATE EVENT, not binlog event), then the timeout was capped to 5 seconds.
      
      Background:
      This bug was introduced in the fix of BUG#10374, in the function interruptible_wait() in
      item_func.cc.
      The function interruptible_wait(), called from item_func_sleep::val_int(), splits the
      sleep into 5 seconds units. After each unit, it checks if thd->is_connected() is true: if
      not, it stops sleeping. The purpose is to not use system resources to sleep when a client
      disconnects.
      However, thd->is_connected() returns false for the slave SQL thread and for the event
      worker thread, because they don't connect to the server the same way as client threads
      do.
      
      Fix:
      Make thd->is_connected() return true for all system threads.
     @ sql/sql_class.h
        Made THD::is_connected() return true for all system threads.

=== modified file 'sql/sql_class.h'
--- sql/sql_class.h	2010-07-08 21:20:08 +0000
+++ sql/sql_class.h	2010-07-12 13:17:51 +0000
@@ -2459,7 +2459,12 @@
   /** Return FALSE if connection to client is broken. */
   bool is_connected()
   {
-    return vio_ok() ? vio_is_connected(net.vio) : FALSE;
+    /*
+      All system threads (e.g., the slave IO thread) are connected but
+      not using vio. So this function always returns true for all
+      system threads.
+    */
+    return system_thread || (vio_ok() ? vio_is_connected(net.vio) : FALSE);
   }
 #else
   inline bool vio_ok() const { return TRUE; }
[13 Jul 2010 10:00] Sven Sandberg
pushed to next-mr-bugfixing and trunk-bugfixing
[23 Jul 2010 12:21] Bugs System
Pushed into mysql-trunk 5.5.6-m3 (revid:alik@sun.com-20100723121820-jryu2fuw3pc53q9w) (version source revid:vasil.dimov@oracle.com-20100531152341-x2d4hma644icamh1) (merge vers: 5.5.5-m3) (pib:18)
[23 Jul 2010 12:29] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100723121929-90e9zemk3jkr2ocy) (version source revid:vasil.dimov@oracle.com-20100531152341-x2d4hma644icamh1) (pib:18)
[27 Jul 2010 1:02] Paul DuBois
Noted in 5.5.6 changelog.

In a slave SQL thread or Event Scheduler thread, the SLEEP() function
could not sleep more than five seconds.
[4 Aug 2010 8:02] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:vasil.dimov@oracle.com-20100531152341-x2d4hma644icamh1) (merge vers: 5.5.5-m3) (pib:18)
[4 Aug 2010 8:18] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804081533-c1d3rbipo9e8rt1s) (version source revid:vasil.dimov@oracle.com-20100531152341-x2d4hma644icamh1) (merge vers: 5.5.5-m3) (pib:18)
[4 Aug 2010 16:52] Paul DuBois
Not present in any released 5.6.x version.