Bug #68589 "Using GTIDs for Failover and Scaleout" seems flawed and misleading
Submitted: 6 Mar 2013 23:02 Modified: 2 Apr 2013 11:53
Reporter: Kolbe Kegel Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.6.10 OS:Any
Assigned to: Jon Stephens CPU Architecture:Any

[6 Mar 2013 23:02] Kolbe Kegel
The "Using GTIDs for Failover and Scaleout" section of the manual, available at http://dev.mysql.com/doc/refman/5.6/en/replication-gtids-failover.html , contains some very confusing and perhaps even incorrect information.

First, there is much discussion on that page about "a master that has the entire execution history". The idea that any realistic replication setup has some machine in it with the *entire execution history* of all data in binary logs is so impractical that it barely merits inclusion on this page, and certainly doesn't belong at the very top as the first suggestion.

Second, in the "Copying data and transactions to the slave" section, it is suggested that mysqldump can be used to load a "snapshot" of the "Data Set" into the new machine, but along with that is the caveat that "You should have --gtid-mode=OFF while importing the dump". This is very odd advice, but it is also simply *not possible* to load a dump from a machine with --gtid-mode=ON into a machine with --gtid-mode=OFF. Attempting to do so gives this error:

  ERROR 1839 (HY000) at line 24: GTID_PURGED can only be set when GTID_MODE = ON.

The only practical suggestion on this page of how to set up a new machine in a GTID replication topology is the last one, "Excluding transactions with gtid_purged". The advice in this section is fine, but it is needlessly verbose and complicated. It is wholly unnecessary to "record the value of gtid_executed on the server from which the snapshot was taken". It is not necessary to "set gtid_purged on the slave directly". Both of those things are handled by mysqldump, since mysqldump on a server with --gtid-mode=ON causes a "SET @@GLOBAL.GTID_PURGED" statement to be written to its output, with "the value of gtid_executed on the server from which the snapshot was taken" and that, when executed on the new machine via the MySQL client, has the effect of "set[ting] gtid_purged on the slave directly".

The "Excluding transactions with gtid_purged" section ends with "this method creates a server that is functionally a snapshot, but in time is able to become a master as its binary log history converges with that of the replication master or group". This seems again to assume that no binary logs are ever purged from any server in the cluster, which is of course a completely impractical approach. In fact, this new server is 100% capable, immediately, of acting as a master for any other new node in the cluster loaded from the same snapshot, or for any node in the cluster. Over time, yes, it can function as a master for *existing* nodes in the topology, as long as those nodes are caught up at least with the GTID_PURGED value set by the mysqldump output, but its inability to do so immediately is no different from the restriction on using *any* lagging slave as a master.

How to repeat:
1) Try to identify any production replication topology in which keeps binary logs spanning the "entire execution history" are kept permanently.

2) Attempt to follow the instruction to "have --gtid-mode=OFF while importing the dump" to see that this is not a possibility.

Suggested fix:
This entire section should be rewritten to place emphasis on using mysqldump to set up a new node, relying on the GTID_PURGED value in mysqldump output to set up the new slave.

Of course, the suggestion to "have --gtid-mode=OFF while importing the dump" should be changed to "have --gtid-mode=ON while importing the dump", or this should be rewritten entirely to be more clear.

Other approaches could be retained, and explained, for informational purposes, but the emphasis on "a master that has the entire execution history" should be eliminated and replaced with some comments conceding that such a master is unlikely to exist in any well-maintained production replication topology.
[7 Mar 2013 10:30] Jon Stephens
Waiting for response from developers.
[2 Apr 2013 11:53] Jon Stephens
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.
[2 Apr 2013 11:55] Jon Stephens
Fixed in mysqldoc rev 34807:

  gtid_mode should be OFF when importing dump

  Note mysqldump --set-gtid-purged option

  Note that backup/restore as shown is supported in 5.6.9+

  Add refs to relevant bug reports/fixes

Fixes Docs BUG#68589