MySQL Bugs: #70784: Adjust documentation and possibly defaults for sync_{master,relay_log}

Bug #70784	Adjust documentation and possibly defaults for sync_{master,relay_log}_info
Submitted:	31 Oct 2013 12:33	Modified:	5 Feb 2014 12:33
Reporter:	Simon Mudd (OCA)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Documentation	Severity:	S3 (Non-critical)
Version:	5.6.14	OS:	Any
Assigned to:	Jon Stephens	CPU Architecture:	Any
Tags:	binlogs, documentation, GTID, replication, safe

Description:
I had a crash on a server today (not mysqld) containing a mysql slave. On recovery the replication position was not consistent with the state of the database so there were several duplicate key errors. This required the server to be recloned.

Looking at the logs I see:

Version: '5.6.14-log' socket: 'mysql.sock' port: 3306 MySQL Community Server (GPL)
2013-10-31 11:59:42 3305 [ERROR] Slave SQL: Error 'Duplicate entry '70781501-2014-03-14' for key 'PRIMARY'' on query. Default database: 'db'. Query: 'INSERT INTO some_table_name (some_id,date) VALUES (70781501,'2014-03-14')
...

Cause: the replication position is behind the state of the [recovered] database.
Looking at the docs: http://dev.mysql.com/doc/refman/5.6/en/replication-options-slave.html it seems the cause of this is the default options (even when using a setting like):

master_info_repository = TABLE
relay_log_info_repository = TABLE

are not crash safe. These default settings are the ones that concern me:

sync_relay_log = 10000
sync_relay_log_info = 10000
sync_master_info = 10000

While sync_relay_log = 1 may well generate i/o which isn't absolutely needed as the data should be on the master and can be "recovered", the other 2 entries if using "TABLE" entries being set to 1 should not generate much more I/O.

How to repeat:
see above.

Suggested fix:
I think that for InnoDB setups (pretty much the default for 5.6 and later):

1. the documentation should be clearer about the "potential cost" of leaving the values unchanged. A server crash may leave you with a database which recovers but is out of sync with the "replication positions stored in the repository locations". The new options in 5.6 were supposed to be there to prevent this.

2. The defaults for sync_relay_log_info and sync_master_info SHOULD BE 1 (at least if the repository settings use TABLE)

3. MySQL 5.7 should have the default setting for the repository settings set to TABLE not FILE.

These changes should ensure that recovery of a busy system after a crash should be complete and not require manual work to fix things (at least if sync_relay_log = 1), and those who for performance reasons want to adjust these values to "less safe" defaults are likely to understand the consequences of doing that.

Thank you for the bug report.

Related http://bugs.mysql.com/bug.php?id=69135

Those who want high reliability slaves should normally use GTIDs and the relatively high reliability and low work solution that they enable, as described at http://dev.mysql.com/doc/refman/5.6/en/replication-gtids.html where we tell people what we expect them to do to prevent the described problem:

"When using GTIDs, each transaction can be identified and tracked as it is committed on the originating server and applied by any slaves; this means that it is not necessary when using GTIDs to refer to log files or positions within those files when starting a new slave or failing over to a new master, which greatly simplifies these tasks. Because GTID-based replication is completely transaction-based, it is simple to determine whether masters and slaves are consistent; as long as all transactions committed on a master are also committed on a slave, consistency between the two is guaranteed."

Simon, are your log info tables InnoDB as they should be in 5.6.14 or do you happen to have the older MyISAM ones in use on this box (was changed from MyISAM to InnoDB in 5.6.6)?

Is there some reason why you weren't just using GTIDs pretty much automatically for all new replication setups? That's what we hope people will do, so if you've a reason not to it would be good to know about it so we might do something about that.

Moving on beyond that to the settings, at http://dev.mysql.com/doc/refman/5.6/en/slave-logs.html we say that:

"In order for replication to be crash-safe when using tables for logging status and relay information, the tables must use a transactional storage engine, such as InnoDB."

If that alone is not sufficient to make them crash-safe then we should also at that point say that the required sync_* settings should be set to 1 also.

We probably won't change this in 5.6 in the program but I am thinking about whether to ask for some changes to defaults and if we did that and if the sync values really do have to be set to 1 to be robust with table format set then we should probably test the overhead of that and consider making the table setting change the sync value (if sync value isn't also set explicitly). If we were to make defaults changes it is the sort of thing we might do in one small batch a year or so after GA.

James Day, MySQL Senior Principal Support Engineer, Oracle

Hi James.

Using GTIDs is currently not possible on many of our servers because we are in the process of upgrading them from 5.5 to 5.6, so while not all the replication chain (including the master) is upgraded to 5.6 GTIDs are not an option.

Second, even when we have upgraded all servers to 5.6 I really doubt we will use GTID on existing chains.

This is for 2 reasons (and I've filed a bug about this):
1. you require binlogs to be written on both masters and slaves. That is not an option as it requires more disk space on the slaves and generates much higher i/o rates. So completely useless. We do not expect to have to promote most slaves to masters so there's no need for this. This is known by your team and it is imo a large show stopper.
2. The current GTID setup (considering someone who upgrades a replication chain) requires you to enable it on all servers at the same time AND to restart MySQL. That is completely unacceptable for a replication chain that is running 24x7.

So both these reasons mean that GTID is "useless" on some chains in the environments I work on.

My thoughts on this are:
(1). You must allow the option to _not_ write binlogs and store this information somewhere else (at least on "unpromotable slaves"). Personally I don't see why you can't store this information in a InnODB table like you do with the mysql.*repository tables, but perhaps I may be missing something. That or write only the GTID information to the binlogs. Personally I don't think the binlogs is the right place for this data but that's another discussion.

(2) You MUST provide an upgrade path from a working 5.5 environment with a master and slaves, such that the slave can be upgraded to 5.6 and finally the master. You must also ensure that the downtime while doing this is negligible.

In the environment I work in usually a new master is built and added to the replication chain, thus giving us master55 ---> master56 ---> slave56, and then we switch a "vip" from the 55 master to the 56 master. this keeps downtime to the master to a couple of seconds and is acceptable. Then once this is done the appropriate GTID setup can be enabled on the slave (maybe in a mode where it will receive the GTIDS but mainly ignore them), and finally on the master, (so GTID traffic is going down the chain but not being used), and finally you enable the slaves to pay attention to this.

This is the only workflow that I can see that enables me to keep my servers up pretty much 100%, and does not affect the whole replication chain and allows me to move over to using GTID on a replication chain which is live and busy and can not be stopped. The minor disruptions of changing configuration options on a server, or perhaps stopping and starting replication are less of an issue if each configuration change is going to take a couple of seconds. The master downtime MUST always be kept to an absolute minimum.

This contrasts heavily to current documentation which basically requires me to shutdown all servers at once in order to enable GTID, leaving me with a cold replication chain which would perform badly even ignoring the fact that stopping the whole chain is extremely disruptive.

New setups are different but again the real show stopper is again: I do not want to write binlogs on all systems. The I/O load this generates is unacceptable and on busy systems the extra disk space requirements are not acceptable.

So on tiny, small setups this would work, yes. However, all these tiny setups grow so the underlying problem still exists.

You may be aware that I spoke to some people at Oracle recently and relayed these concerns then. This response to you makes that need more explicit and I hope you understand the reasoning. I want to confirm that GTID is good and I am pleased that it has been added but I think the "setup" requirements of people who can not afford to take downtime were probably not considered.

If you plan on generating new features by all means solicit feedback from those of us who may be more affected by these changes, and more interested in them, and perhaps that will enable the final design of these features to better fit our needs and cover cases which you may not have considered.

As a final comment I am currently discussing with my colleagues which settings are now most appropriate to make a 5.6 slave (not using GTID) crash-safe, but as performant as possible. From what I remember the MySQL documentation does not really mention this. I'm sure other RDBMS users would _expect_ their database to withstand crashes including and systems in a replication chain, and that of course is what we hope for with MySQL. MySQL 5.5's writing to master.info and relay-log.info generated too much extra I/O to make is possible to do this on many systems (the sync_master_info = 1, sync_relay_log_info = 1) so we have accepted that will not happen but I had partly missed the fact that the current 5.6 settings are equally unsafe and trying to fix that by adjusting MySQL's configuration accordingly. Having the documentation tell me how to do that will help many who may bump into a crashed slave and find it unexpectedly out of sync, and changing the defaults in such a way would ensure that the problem would not occur in the first place.

Hence this bug report.

On a typical 5.6.14 box I have:

# grep repository /etc/my.cnf
master_info_repository    = TABLE
relay_log_info_repository = TABLE
# ls -l /path/to/mysql/*info*
-rw-rw---- 1 mysql mysql 10719 Nov  1 12:14 /path/to/mysql/slave_master_info.frm
-rw-rw---- 1 mysql mysql 98304 Nov  8 08:30 /path/to/mysql/slave_master_info.ibd
-rw-rw---- 1 mysql mysql  9334 Nov  1 12:14 /path/to/mysql/slave_relay_log_info.frm
-rw-rw---- 1 mysql mysql 98304 Nov  7 15:25 /path/to/mysql/slave_relay_log_info.ibd
-rw-rw---- 1 mysql mysql  9230 Nov  1 12:14 /path/to/mysql/slave_worker_info.frm
-rw-rw---- 1 mysql mysql 98304 Nov  1 12:14 /path/to/mysql/slave_worker_info.ibd
#

So my assumption here is that to have a slave (only InnoDB tables) _recover_ completely from a crash AND have a completely consistent replication position then we need to set on the slave:

master_info_repository    = TABLE
relay_log_info_repository = TABLE
sync_relay_log_info = 1
innodb_flush_log_at_trx_commit = 1

Other settings like the following would be ideal but may be unnecessary:

sync_relay_log = 1
sync_master_info = 1

What they may mean is that the relay logs get corrupted (5.6 has binlog checksums to detect this), or the I/O thread will find the relay log larger than it expects, and that may stop replication.  If it does, an appropriate CHANGE MASTER TO ... using the information stored for the SQL thread in mysql.slave_relay_log_info should throw away the relay logs and pull the data afresh from the master.

This also assumes, and I don't think the MySQL documentation says this happens explicitly, that the implicit or explicit commit by the SQL thread contains the change to the mysql.slave_relay_log_info table.  If that is not done atomically then I'd consider it a bug, as it allows you to write to the ib log files at the same time as you write the data needed for the SQL thread's commit, so it should be free and thus avoid any extra I/O.

see also: http://forums.mysql.com/read.php?144,600534

Most likely this is a consequence of some bad recovery.

When the option is TABLE , then a number for the sync_relay_log_info option is ignored. See:

http://dev.mysql.com/doc/refman/5.6/en/replication-options-slave.html#sysvar_sync_relay_lo...

Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.

Fixed in mysqldoc rev 37424. Closed.

Jonathan, could you please indicate in this bug report what has been changed. Saying you've fixed the documentation and providing a commit as a reference is not helpful and forces me to download the source and do a diff.

I had a quick look at: http://dev.mysql.com/doc/refman/5.6/en/replication-options-slave.html again and the documentation has a couple of references to "this being safer", but I see no clear section (this page is probably not the page to do that) on how to safely setup replication to be crash safe.

Perhaps this is now documented somewhere else but it's hard to know where if you do not actually mention this.

James, in relation to safety: GTIDs are not the full solution at least not in MySQL 5.6 with it's current implementation which has several shortcomings which prevent it just being switched on.
I know that is being looked at which is really good, but many other issues are still in the air for me.

So while GTID is not the default MySQL behaviour and if a slave is needed it would be most helpful to have a section of the documentation which clearly describes the options and settings which are needed to achieve this desired "replication crash safety": that is the server crashes, recovers and is able to continue replicating despite this.

If the docs now describe this please point me to them.

The exact docs change was two notes added at:

http://dev.mysql.com/doc/refman/5.6/en/replication-options-slave.html

"The setting of this variable also has a direct bearing on the effect had by the setting of the sync_master_info system variable; see that variable's description for further information."

And

"The setting of this variable also has a direct bearing on the effect had by the setting of the sync_relay_log_info system variable; see that variable's descrption for further information."