Bug #49532 Excessive CPU usage during LCP of DD
Submitted: 8 Dec 2009 12:53 Modified: 9 Dec 2009 14:53
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Disk Data Severity: S3 (Non-critical)
Version: mysql-5.1-telco-6.2 OS: Any
Assigned to: Jonas Oreland CPU Architecture: Any

[8 Dec 2009 12:53] Jonas Oreland
Description:
During a checkpoint of the disk-page-buffer cache, if a sufficiently
large buffer is used, excessive CPU is consumed, as the code will "spin"
on a page until the number of outstanding IOs is back under the
permitted limit.

I.e., pgman can only have X outstanding IOs, and when
checkpointing, each dirty page needs to be flushed.

When it found a page to flush while at that limit, the code would "spin"
(using continueb) until other IO had completed.

The patch changes this so that in this scenario the code instead
relinquishes control, and the LCP thread is woken up when an IO has
completed.
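
To illustrate the difference between the two behaviours, here is a minimal
single-threaded simulation. It is not the pgman code: the function name
simulate_lcp, its parameters and the tick-based scheduling are all
assumptions made for this sketch. It only counts how often the LCP step
executes when it either reschedules itself while the IO limit is reached
(the old behaviour) or sleeps until an IO completion wakes it (the patched
behaviour).

```python
def simulate_lcp(dirty_pages, max_outstanding_io, io_latency, busy_loop):
    """Toy model of flushing dirty pages under an outstanding-IO limit.

    Hypothetical sketch, not NDB code. Time advances in ticks; the LCP
    step runs at most once per tick and issues at most one IO per run.
    Returns (lcp_runs, wasted_runs), where a wasted run is one that
    found the IO limit reached and could not issue anything.
    """
    remaining = dirty_pages
    in_flight = []      # completion tick of each outstanding IO
    tick = 0
    lcp_runs = 0
    wasted_runs = 0
    lcp_ready = True    # True when the LCP step is scheduled to run

    while remaining > 0 or in_flight:
        # Retire IOs whose completion time has been reached.
        before = len(in_flight)
        in_flight = [t for t in in_flight if t > tick]
        if len(in_flight) < before:
            # An IO completed: wake the LCP step if it was sleeping.
            lcp_ready = True

        if lcp_ready and remaining > 0:
            lcp_runs += 1
            if len(in_flight) < max_outstanding_io:
                remaining -= 1
                in_flight.append(tick + io_latency)
            else:
                wasted_runs += 1
                # Old behaviour: reschedule immediately (busy loop).
                # Patched behaviour: sleep until an IO completes.
                lcp_ready = busy_loop
        tick += 1

    return lcp_runs, wasted_runs


if __name__ == "__main__":
    print("busy loop:", simulate_lcp(4, 2, 10, busy_loop=True))
    print("wake on IO:", simulate_lcp(4, 2, 10, busy_loop=False))
```

In this model the busy-loop variant executes the LCP step on every tick
while the limit is reached, so its wasted runs grow with IO latency, whereas
the wake-on-completion variant has one wasted run each time it hits the limit.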

How to repeat:
Run a write-intensive workload on a fairly large disk-page-buffer cache.
Observe almost 100% (times #lqh-threads) CPU usage during the DD checkpoint.

Suggested fix:
See description
[8 Dec 2009 12:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/93179

3186 Jonas Oreland	2009-12-08
      ndb - bug#49532 - don't busy loop in pgman during LCP if max-io-limit is reached
[8 Dec 2009 14:21] Jonas Oreland
pushed to 6.3.29 and 7.0.10
[8 Dec 2009 14:25] Bugs System
Pushed into 5.1.39-ndb-7.0.10 (revid:jonas@mysql.com-20091208141844-5shueznybtmo2tql) (version source revid:jonas@mysql.com-20091208134403-6sj860qv428m06oh) (merge vers: 5.1.39-ndb-7.0.10) (pib:13)
[8 Dec 2009 15:02] Bugs System
Pushed into 5.1.39-ndb-7.1.0 (revid:jonas@mysql.com-20091208143724-9l8bip30mvk3uq1m) (version source revid:jonas@mysql.com-20091208143724-9l8bip30mvk3uq1m) (merge vers: 5.1.39-ndb-7.1.0) (pib:13)
[9 Dec 2009 14:53] Jon Stephens
Documented bugfix in the NDB-6.3.29 and 7.0.10 changelog as follows:

        When running a write-intensive workload with a very large disk
        page buffer cache, CPU usage approached 100% during a local
        checkpoint of a cluster containing Disk Data tables.

Closed.