MySQL Bugs: #79188: Distribution of schema operations stops or timeouts

Bug #79188	Distribution of schema operations stops or timeouts
Submitted:	9 Nov 2015 14:49	Modified:	23 Nov 2015 14:12
Reporter:	Ole John Aske
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	7.4.8	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
When MySQL Cluster is configured with multiple mysqld servers, each schema
operation is distributed between the mysqlds. The mysqld initiating the schema
change then waits (in ndbcluster_log_schema_op()) until all other mysqlds have
acknowledged that they have 'seen' the change.

Sometimes mysqld times out while waiting for distribution to complete.

... [ERROR] NDB <operation type> distributing <table name> timed out. Ignoring...
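
The wait described above can be pictured with a minimal Python sketch. This is
not the mysqld code (the real logic lives in ndbcluster_log_schema_op());
the function name, its parameters, and the polling approach are all
hypothetical and only illustrate "wait until every other node has
acknowledged, or give up after a timeout".

```python
import time

def wait_for_schema_acks(expected_nodes, poll_acks, timeout_s=1.0, interval_s=0.01):
    """Wait until every node in expected_nodes has acknowledged.

    poll_acks is a hypothetical callable returning the nodes that have
    acknowledged since the last call. Returns the set of nodes that never
    acknowledged; an empty set means distribution completed.
    """
    pending = set(expected_nodes)
    deadline = time.monotonic() + timeout_s
    while pending:
        pending -= set(poll_acks())
        if not pending:
            break
        if time.monotonic() >= deadline:
            # Corresponds to the "timed out. Ignoring..." case in the log.
            break
        time.sleep(interval_s)
    return pending
```

A non-empty return value corresponds to the timeout reported in the error
log: the initiating node is still waiting for nodes that will never answer.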

How to repeat:
Run CREATE TABLE and DROP TABLE in a loop, waiting for each operation to be distributed.

An MTR test program will be provided as part of the fix.

Documented fix in the NDB 7.2.23, 7.3.12, and 7.4.9 changelogs, as follows:

    When executing a schema operation such as CREATE TABLE while
    running a MySQL Cluster with multiple SQL nodes, it was possible
    for the SQL node on which the operation was performed to time
    out while waiting for an acknowledgement from the others. This
    could occur when different SQL nodes had different settings for
    --ndb-log-updated-only, --ndb-log-update-as-write, or other
    options affecting binary logging.

    This happened due to the fact that, in order to distribute
    schema changes between them, all SQL nodes subscribe to changes
    in the ndb_schema system table, and that all SQL nodes are made
    aware of each other's subscriptions by subscribing to
    TE_SUBSCRIBE and TE_UNSUBSCRIBE events. The names of events to
    subscribe to are constructed from the table names, adding REPL$
    or REPLF$ as a prefix. REPLF$ is used when full binary logging
    is specified for the table. The issue described previously arose
    because different values for the options mentioned could lead to
    different events being subscribed to by different SQL nodes,
    meaning that not all SQL nodes were necessarily aware of each
    other, so that the code that handled waiting for schema
    distribution to complete did not work as designed.

    To fix this issue, MySQL Cluster now treats the ndb_schema table
    as a special case and enforces full binary logging at all times 
    for this table, independent of any settings for mysqld binary 
    logging options.
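
The mismatch and the fix can be illustrated with a small Python model. It is a
sketch under stated assumptions, not the NDB implementation: NDB places the
REPL$ or REPLF$ marker in front of the name, but the exact "db/table" layout
used here, the node names, and the reduction of the binary logging options to
a single "full logging" boolean per SQL node are simplifications made for
illustration.

```python
from collections import defaultdict

# The schema distribution table, mysql.ndb_schema.
SCHEMA_TABLE = ("mysql", "ndb_schema")

def event_name(db, table, full_logging):
    # REPLF$ marks full binary logging, REPL$ the default.
    # The "db/table" layout is an assumption for this sketch.
    prefix = "REPLF$" if full_logging else "REPL$"
    return f"{prefix}{db}/{table}"

def schema_event(node_full_logging, enforce_full):
    # With the fix, ndb_schema always uses full logging, whatever the
    # node's own binary logging options say.
    db, table = SCHEMA_TABLE
    return event_name(db, table, True if enforce_full else node_full_logging)

def visible_peers(nodes, enforce_full):
    """Map each SQL node to the peers subscribed to the same event.

    nodes maps a (hypothetical) node name to its full-logging setting.
    """
    subscribers = defaultdict(set)
    for node, full in nodes.items():
        subscribers[schema_event(full, enforce_full)].add(node)
    return {
        node: subscribers[schema_event(full, enforce_full)] - {node}
        for node, full in nodes.items()
    }

# One node configured for full logging, two with the default.
nodes = {"sql1": True, "sql2": False, "sql3": False}
before = visible_peers(nodes, enforce_full=False)
after = visible_peers(nodes, enforce_full=True)
```

In this model, `before` shows sql1 with no visible peers while sql2 and sql3
see only each other, which is the situation in which the wait for schema
distribution cannot work as designed. In `after`, every node subscribes to the
same REPLF$ event and sees both of the others.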

Closed.