Bug #33019 Online backup process can consume too much processor time
Submitted: 5 Dec 2007 21:27
Modified: 20 Feb 2010 19:29
Reporter: Chuck Bell
Status: Closed
Category: MySQL Server: Backup
Severity: S3 (Non-critical)
Version: 6.0
OS: Any
Assigned to: Rafal Somla
CPU Architecture: Any

[5 Dec 2007 21:27] Chuck Bell
Description:
The backup kernel can cause a starvation situation in the loops during data reads. This was taken from WL#4060:

      - Starvation (insert waits if nothing happens) [2d]
        Benefit: Will not take 90% of processor power when
        drivers are idle

How to repeat:
Run online backup and monitor processor utilization.

Suggested fix:
Use a means to permit task switching during intensive reads. See MyISAM backup driver for examples.
[3 Dec 2009 13:53] Rafal Somla
Suggested solution.

In the data polling loop (backup::Scheduler::step() in data_backup.cc), keep a timer and a counter of get_data() calls (actually, Pump_iterator::pump() calls). If there are more calls per time unit than a certain threshold value, insert a wait.

For example, say the time unit is 1 sec and we don't want more than 100 calls per second. If we reach 100 calls after, say, 0.3 sec, then wait 0.7 sec and start counting again.
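
The counting scheme above can be sketched as follows. This is only an illustration under stated assumptions, not code from data_backup.cc: the class name CallRateLimiter, its members, and the choice to pass elapsed time in as a value (so the logic can be checked without a real clock) are all hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of the MySQL source): limits how many
// polling calls may happen within one time window.
class CallRateLimiter {
public:
  using Ms = std::chrono::milliseconds;

  // max_calls: calls allowed per window; window: length of the time unit;
  // start: time (since an arbitrary origin) at which counting begins.
  CallRateLimiter(unsigned max_calls, Ms window, Ms start)
    : m_max_calls(max_calls), m_window(window),
      m_window_start(start), m_calls(0)
  {
    assert(max_calls > 0);
  }

  // Record one call made at time `now`. Returns how long the caller
  // should wait before the next call (zero if no wait is needed).
  Ms on_call(Ms now)
  {
    if (now - m_window_start >= m_window) {
      // The window expired without exhausting the budget: start a new one.
      m_window_start = now;
      m_calls = 0;
    }
    if (++m_calls < m_max_calls)
      return Ms(0);
    // Budget used up: wait out the rest of the window, then count again.
    Ms remaining = m_window - (now - m_window_start);
    m_window_start = now + remaining;
    m_calls = 0;
    return remaining;
  }

private:
  unsigned m_max_calls;
  Ms m_window;
  Ms m_window_start;
  unsigned m_calls;
};
```

With a limit of 100 calls per 1000 ms, the 100th call arriving at 300 ms yields a 700 ms wait, matching the example above. A caller would pass the returned duration to a sleep primitive such as std::this_thread::sleep_for.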
[14 Dec 2009 14:30] Rafal Somla
Assigning this one to myself, because I am assigned to BUG#49337, which is a duplicate of this one.
[15 Dec 2009 10:15] Rafal Somla
BUG#49337 is a duplicate of this one.

Triage: please note that BUG#49337 is tagged "SRGAQUAL,SRFEATURE", so this one should also be tagged with these tags.
[15 Dec 2009 10:16] Rafal Somla
Triage: also, the triage and priority values are different for BUG#49337.
[15 Dec 2009 10:22] Rafal Somla
I can reproduce the symptoms with the following test script:

-----------------------------------------------------------------------------
SET DEBUG_SYNC= 'RESET';
DROP DATABASE IF EXISTS db1;

CREATE DATABASE db1;

connect (con_bup,localhost,root,,);
connect (con_dml,localhost,root,,);

--connection con_bup

CREATE TABLE db1.t1 (a int) ENGINE= myisam;

  --connection con_dml

  SET DEBUG_SYNC= 'after_insert_locked_tables SIGNAL insert_started
                   WAIT_FOR complete_insert';
  send INSERT INTO db1.t1 VALUES (1);

--connection con_bup

SET DEBUG_SYNC= 'now WAIT_FOR insert_started';
--echo # Should hang here
BACKUP DATABASE db1 TO "db1.bak";

--exit
-----------------------------------------------------------------------------

When this test is run, BACKUP hangs and I can observe 100% CPU load.

Note: the test must be run with the --mysql-backup server option.
[15 Dec 2009 10:25] Rafal Somla
PROPOSED SOLUTION
-----------------
In the data polling loop (Scheduler::step()), keep a counter that is
incremented each time a backup driver returns an empty buffer and reset to 0
for each non-empty buffer. If this counter reaches a certain threshold (100),
a fixed-duration wait is inserted.
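
A minimal sketch of this idle counter, for illustration only. The class name IdleThrottle and its members are hypothetical and do not come from data_backup.cc. Restarting the count after each wait is also an assumption; the proposal does not say whether the counter is reset at that point.

```cpp
#include <cassert>

// Hypothetical helper (names are not from data_backup.cc): decides when
// the polling loop should pause because drivers keep returning no data.
class IdleThrottle {
public:
  explicit IdleThrottle(unsigned threshold)
    : m_threshold(threshold), m_idle_count(0)
  {
    assert(threshold > 0);
  }

  // Call once per polling iteration. `got_data` tells whether the driver
  // returned a non-empty buffer. Returns true if the caller should wait.
  bool should_wait(bool got_data)
  {
    if (got_data) {
      m_idle_count = 0;       // any data resets the idle streak
      return false;
    }
    if (++m_idle_count < m_threshold)
      return false;           // still below the threshold: keep polling
    m_idle_count = 0;         // assumption: start counting again after a wait
    return true;
  }

private:
  unsigned m_threshold;
  unsigned m_idle_count;
};
```

In a polling loop the caller would sleep for the fixed duration whenever should_wait() returns true. Leaving the counter at the threshold instead of resetting it would make every further idle iteration wait until data arrives, which is an equally plausible reading of the proposal.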
[15 Dec 2009 11:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/94116

2909 Rafal Somla	2009-12-15
      Bug #33019 - Online backup process can consume too much processor time
      
      The problem is that BACKUP can enter a tight data polling loop which
      consumes about 100% of CPU time if drv->get_data() calls are idle and
      produce no data.
      
      It is solved by counting consecutive idle iterations of the data 
      polling loop and inserting waits if this number exceeds a defined 
      threshold.
[15 Dec 2009 11:59] Rafal Somla
Changing 2nd reviewer to Thava, who was assigned to the duplicate BUG#49337.
[15 Dec 2009 23:18] Chuck Bell
See commit reply for suggestions.
[17 Dec 2009 13:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/94743

2919 Rafal Somla	2009-12-17
      Bug #33019 - Online backup process can consume too much processor time
      
      The problem is that BACKUP can enter a tight data polling loop which
      consumes about 100% of CPU time if drv->get_data() calls are idle and
      produce no data.
      
      It is solved by counting consecutive idle iterations of the data 
      polling loop and inserting waits if this number exceeds a defined 
      threshold.
      
      The parameters controlling insertion of the wait can be set by the
      following environment variables:
      
      MYSQL_BACKUP_IDLE_COUNT - how many consecutive idle iterations trigger
                                a wait,
      MYSQL_BACKUP_IDLE_SLEEP - duration of the wait in ms.
      
      The default values are 10 iterations and 50 ms.
     @ sql/backup/data_backup.cc
        - Move Scheduler constructor outside class declaration, because it becomes
          long.
        - Add members controlling insertion of sleeps in the data reading loop.
        - In Scheduler's constructor, initialize these members from environment
          variables or using default values.
        - Update Scheduler::step() to insert waits and maintain idle iterations
          counter.
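
Reading such parameters from the environment with fallback defaults might look like the sketch below. It is an assumption-laden illustration, not the committed code: the function names parse_or_default and env_or_default are hypothetical, and only the two variable names and the defaults (10 and 50) come from the commit message above. The later revisions of the patch describe constants instead of environment variables.

```cpp
#include <cctype>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parse a non-negative decimal number, falling back
// to `fallback` when the text is missing or malformed.
unsigned long parse_or_default(const char *text, unsigned long fallback)
{
  // Require a leading digit so that signs and whitespace are rejected.
  if (text == nullptr || !std::isdigit(static_cast<unsigned char>(*text)))
    return fallback;
  char *end = nullptr;
  errno = 0;
  unsigned long value = std::strtoul(text, &end, 10);
  if (errno != 0 || *end != '\0')
    return fallback;          // overflow or trailing garbage
  return value;
}

// Hypothetical helper: look up an environment variable and parse it.
unsigned long env_or_default(const char *name, unsigned long fallback)
{
  return parse_or_default(std::getenv(name), fallback);
}

// Illustrative use with the names and defaults from the commit message:
//   unsigned long idle_count = env_or_default("MYSQL_BACKUP_IDLE_COUNT", 10);
//   unsigned long idle_sleep = env_or_default("MYSQL_BACKUP_IDLE_SLEEP", 50);
```

Falling back silently on malformed input keeps the backup running with the defaults; whether the original patch validated its input this way is not stated in the report.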
[18 Dec 2009 7:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/94888

2919 Rafal Somla	2009-12-18
      Bug #33019 - Online backup process can consume too much processor time
      
      The problem is that BACKUP can enter a tight data polling loop which
      consumes about 100% of CPU time if drv->get_data() calls are idle and
      produce no data.
      
      It is solved by counting consecutive idle iterations of the data 
      polling loop and inserting waits if this number exceeds a defined 
      threshold.
     @ sql/backup/data_backup.cc
        - Define the idle count threshold and idle wait constants.
        - Add Scheduler::m_idle_count member and initialize it with 0.
        - Update Scheduler::step() to insert waits and maintain idle iterations
          counter.
[21 Dec 2009 7:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/95176

2923 Rafal Somla	2009-12-21
      Bug #33019 - Online backup process can consume too much processor time
      
      The problem is that BACKUP can enter a tight data polling loop which
      consumes about 100% of CPU time if drv->get_data() calls are idle and
      produce no data.
      
      It is solved by counting consecutive idle iterations of the data 
      polling loop and inserting waits if this number exceeds a defined 
      threshold.
     @ sql/backup/data_backup.cc
        - Define the idle count threshold and idle wait constants.
        - Add Scheduler::m_idle_count member and initialize it with 0.
        - Update Scheduler::step() to insert waits and maintain idle iterations
          counter.
[21 Dec 2009 7:40] Rafal Somla
Pushed to mysql-6.0-backup tree.
revid:rafal.somla@sun.com-20091221073802-bb049clqu70rleme
[20 Feb 2010 9:16] Bugs System
Pushed into 6.0.14-alpha (revid:ingo.struewing@sun.com-20100218152520-s4v1ld76bif06eqn) (version source revid:ingo.struewing@sun.com-20100119103538-wtp5alpz4p2jayl5) (merge vers: 6.0.14-alpha) (pib:16)
[20 Feb 2010 19:34] Paul DuBois
Ignore previous comment.
[20 Feb 2010 19:37] Paul DuBois
Noted in 6.0.14 changelog.

BACKUP DATABASE could enter a tight polling loop that used almost all
processor time.