Bug #58693 Disconnecting mysqld can overflow short-time-queue
Submitted: 3 Dec 2010 11:40
Modified: 13 Dec 2010 3:54
Reporter: Jonas Oreland
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version:
OS: Any
Assigned to: Jonas Oreland
CPU Architecture: Any

[3 Dec 2010 11:40] Jonas Oreland
Description:
When a mysqld performing replication is forcefully disconnected,
SUMA iterates through all the subscriptions that it had activated
and disables them. If that mysqld was the last user of a specific subscription,
that subscription is also deactivated in TUP (where the actual trigger is located).

This process had no flow control, and if many subscriptions (more than about 512)
were being deactivated, it could overflow the short-time queue
in DbtupProxy, which kept a queue of 3 outstanding requests and queued the others on the short-time queue.

How to repeat:
1) run ndbmtd
2) run mysqld with --log-bin
3) create 512 tables
4) kill -8 mysqld

Suggested fix:
Add flow control to this process (no more than 9 outstanding drop trigger requests)
and extend DbtupProxy's internal queue (to 21).
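
To illustrate the idea (this is not the actual NDB code), here is a toy Python model of the failure and of the suggested fix. The names `Proxy`, `BoundedQueue`, `drop_all_unthrottled` and `drop_all_throttled` are hypothetical, and the short-time queue capacity of 512 is an assumption chosen to match the threshold mentioned above. Only the figures 3, 9 and 21 come from this report.

```python
from collections import deque


class BoundedQueue:
    """Fixed-capacity FIFO standing in for the short-time queue (capacity is assumed)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def push(self, item):
        if len(self.items) >= self.capacity:
            raise OverflowError("short-time queue overflow")
        self.items.append(item)

    def pop(self):
        return self.items.popleft()

    def __len__(self):
        return len(self.items)


class Proxy:
    """Toy proxy: handles `max_outstanding` requests at once, parks the rest."""

    def __init__(self, max_outstanding, queue_capacity):
        self.max_outstanding = max_outstanding
        self.waiting = BoundedQueue(queue_capacity)
        self.outstanding = 0
        self.completed = 0
        self.peak_waiting = 0

    def request(self, req_id):
        if self.outstanding < self.max_outstanding:
            self.outstanding += 1
        else:
            self.waiting.push(req_id)  # may raise OverflowError
            self.peak_waiting = max(self.peak_waiting, len(self.waiting))

    def complete_one(self):
        """Finish one outstanding request, then start a parked one if any."""
        assert self.outstanding > 0
        self.outstanding -= 1
        self.completed += 1
        if len(self.waiting):
            self.waiting.pop()
            self.outstanding += 1


def drop_all_unthrottled(proxy, n):
    """No flow control: the sender fires all n requests before any completes."""
    for req_id in range(n):
        proxy.request(req_id)
    while proxy.outstanding:
        proxy.complete_one()
    return proxy.completed


def drop_all_throttled(proxy, n, window):
    """Flow control: the sender keeps at most `window` requests in flight."""
    next_id = 0
    in_flight = 0
    while next_id < n or in_flight:
        while next_id < n and in_flight < window:
            proxy.request(next_id)
            next_id += 1
            in_flight += 1
        proxy.complete_one()  # one reply arrives, freeing a slot in the window
        in_flight -= 1
    return proxy.completed
```

In this model, an unthrottled sender issuing 600 requests against a proxy with 3 slots overflows the assumed 512-entry queue, while a sender limited to 9 in flight against a proxy with 21 slots never parks a request at all.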
[3 Dec 2010 11:42] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.21 (revid:jonas@mysql.com-20101203114051-i90ny6ghz8p96hua) (version source revid:jonas@mysql.com-20101203114051-i90ny6ghz8p96hua) (merge vers: 5.1.51-ndb-7.0.21) (pib:23)
[3 Dec 2010 11:43] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/125927

4046 Jonas Oreland	2010-12-03
      ndb - bug#58693 - prevent overflow during API_FAIL_REQ in SUMA
[3 Dec 2010 11:45] Jonas Oreland
pushed to 7.0.21 and 7.1.10
[3 Dec 2010 18:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/126001

3751 Frazer Clement	2010-12-03
      Apply fix for bug # 58693 to special branch
[13 Dec 2010 3:54] Jon Stephens
Documented as follows in the NDB-7.0.21 and 7.1.10 changelogs:

        When a mysqld performing replication of a MySQL Cluster that
        uses ndbmtd is forcibly disconnected (thus causing an
        API_FAIL_REQ signal to be sent), the SUMA kernel block iterates
        through all active subscriptions and disables them. If a given
        subscription has no more active users, then this subscription is
        also deactivated in the DBTUP kernel block (where the actual
        trigger is located).

        This process had no flow control, and when there were many
        subscriptions being deactivated (more than 512), this could
        cause an overflow in the short-time queue found in the
        DbtupProxy class.

        The fix for this problem includes implementing proper flow
        control for this deactivation process and increasing the size of
        the short-time queue in DbtupProxy.

Closed.