MySQL Bugs: #68281: sync_master_info=1 kills slave MTS concurrency

Bug #68281	sync_master_info=1 kills slave MTS concurrency
Submitted:	5 Feb 2013 23:19
Modified:	7 Mar 2013 15:01
Reporter:	Yoshinori Matsunobu (OCA)
Status:	Not a Bug
Category:	MySQL Server: Documentation
Severity:	S5 (Performance)
Version:	5.6.10
OS:	Any
Assigned to:	Jon Stephens
CPU Architecture:	Any

Description:
When sync_master_info=1 is set, the slave has no concurrency because the LOG_lock mutex is held while flushing to the master info repository. As a result, MTS is not effective.

-----
  mysql_mutex_lock(log_lock);

  int err=  (mi->rli->flush_current_log() ||
             mi->flush_info(force));

  mysql_mutex_unlock(log_lock);
-----

How to repeat:
Enable MTS, set sync_master_info=1, and run concurrent inserts on the master.

Suggested fix:
sync_relay_log_info=1 is much more important for slave crash recovery, and hopefully it does not kill concurrency. Making sync_master_info=1 fast is more of a nice-to-have feature.

Please explain what you mean by "running concurrent insert" on the master. Concurrent with what?

What I did was:
- Create 100 databases (db1 .. db100) and create an InnoDB table on each database
- Run 100 clients in parallel. Client $i connects to database $i, inserts rows into InnoDB
- Set slave_parallel_workers=100, sync_master_info=1, master-info-repository=TABLE, relay-log-info-repository=TABLE in slave my.cnf

This is similar to https://blogs.oracle.com/MySQL/entry/benchmarking_mysql_replication_with_multi
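
The setup described in this comment can be sketched as a small script that generates the SQL and the slave settings. This is only an illustration: the table name `t1`, its columns, and the helper names are assumptions, since the report says only that each database has one InnoDB table. The four slave settings are copied from the comment.

```python
# Hypothetical helpers that render the test setup as SQL and my.cnf text.
# The table name `t1` and its columns are assumptions, not from the report.


def setup_statements(num_dbs=100):
    """Return the SQL that creates db1..dbN, each with one InnoDB table."""
    stmts = []
    for i in range(1, num_dbs + 1):
        stmts.append(f"CREATE DATABASE IF NOT EXISTS db{i}")
        stmts.append(
            f"CREATE TABLE IF NOT EXISTS db{i}.t1 "
            "(id INT AUTO_INCREMENT PRIMARY KEY, payload VARCHAR(64)) "
            "ENGINE=InnoDB"
        )
    return stmts


def insert_statement(client_id):
    """Client i writes only to database i, as in the report."""
    return (
        f"INSERT INTO db{client_id}.t1 (payload) "
        f"VALUES ('row from client {client_id}')"
    )


# Slave-side settings quoted in the report.
SLAVE_SETTINGS = {
    "slave_parallel_workers": "100",
    "sync_master_info": "1",
    "master-info-repository": "TABLE",
    "relay-log-info-repository": "TABLE",
}


def slave_cnf():
    """Render the slave settings as a [mysqld] section of my.cnf."""
    lines = ["[mysqld]"] + [f"{k}={v}" for k, v in SLAVE_SETTINGS.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    print("\n".join(s + ";" for s in setup_statements(3)))
    print(insert_statement(1) + ";")
    print(slave_cnf())
```

Each client touches a different database because MTS in 5.6 distributes work to workers per database, so this layout gives the slave the most opportunity to apply events in parallel.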

Yoshinori,

In analyzing this bug, I noticed that your setting is somewhat strange. Why do you need sync_master_info=1 when using MTS and GTIDs? GTIDs have auto-positioning, so master.info positions are a bit less relevant...

In simple terms, this is an unnecessary setting. 

This does not mean that this is not a bug. It is, but your answer may influence its priority.

Hi Sinisa,

Yes, I understand sync_master_info=1 is not needed to make the slave crash safe (either relay-log-recovery=1 + sync-relay-log=1, or GTID, is fine). We won't set it to 1 in production.

But sync_master_info=1 is presented in some official documents (e.g. http://www.mysql.com/why-mysql/white-papers/mysql-replication-tutorial/), so I think either this should be fixed or a clearer description (that sync_master_info=1 is not mandatory) is needed.

Yoshinori,

Yes, we are aware of all consequences of these settings. We are still investigating the problem.

So far, we have discovered that --sync_master_info=1 can affect MTS worker scheduling because it makes the SQL thread spend more time reading the relay log.

However, we are unable to see a total loss of MTS concurrency when this variable is set to 1. We are only able to see a slowdown in SQL threads, for the reasons described above.

We have yet to run some tests, but this will most likely end up as "Not a Bug" or as a new task for our Documentation team.

Ultimately, this is not a bug, according to the results of our extended tests.

We first ran a test with 100 worker threads and noticed a slowdown in MTS thread execution time, but replication still continued, only at a slower pace.

After that, we ran a test from our suite that measures the total time for executing all events from the master's binary log.

Below are some results, shown as total execution time in seconds.

The table uses these symbols:

s=0,1 (sync-master-info setting).
w=0,4 (w - number of workers)

The test also measures the "dirty" applying time of binlogged events (that is, the time includes transporting the events from the master's binlog to the slave).

The test passes, and its elapsed times are shown in the following matrix:

      | w=0 | w=4
------+-----+-----
s=0   |  18 |  14
s=1   |  26 |  17

So, in single-threaded mode, sync_master_info=1 slows the run down from 18 to 26 seconds, and in MTS mode with 4 (four) worker threads, from 14 to 17 seconds. This is the time required for a slave to read and execute all entries from the master's binary log.

So, MTS concurrency is not killed, contrary to what the bug report states, but it is slowed down by the need to sync some data to disk. This slowdown is definitely expected, and it is smaller than the slowdown in single SQL thread replication. Hence, MTS is not fully ineffective when sync_master_info = 1.
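
The relative figures behind this conclusion can be worked out from the matrix. The following is a small sketch: the timings are copied from the table, and the helper names are hypothetical.

```python
# Total times in seconds from the matrix: (s, w) -> seconds,
# where s is the sync-master-info setting and w is the number of workers.
TIMES = {(0, 0): 18, (0, 4): 14, (1, 0): 26, (1, 4): 17}


def sync_overhead(workers):
    """Fraction of extra time added by sync_master_info=1 at this worker count."""
    base = TIMES[(0, workers)]
    synced = TIMES[(1, workers)]
    return (synced - base) / base


def mts_speedup(sync):
    """How many times faster 4 workers are than single-threaded, for a given s."""
    return TIMES[(sync, 0)] / TIMES[(sync, 4)]


if __name__ == "__main__":
    for w in (0, 4):
        print(f"w={w}: sync_master_info=1 adds {sync_overhead(w):.0%}")
    for s in (0, 1):
        print(f"s={s}: 4 workers are {mts_speedup(s):.2f}x faster")
```

With these numbers, sync_master_info=1 adds about 44% to the single-threaded run and about 21% to the run with 4 workers, and 4 workers remain faster than a single thread in both cases (about 1.29x with s=0 and 1.53x with s=1).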

Luckily, 5.6 has a solution for the slowdown problem: simply switch to GTIDs instead of using the old options, such as:

--sync-master-info
--relay-log-recovery
--sync-relay-log
--sync-relay-log-info

We will check whether the documentation needs an addition on this issue.

Changed to "Documentation" bug, with the agreement of Documentation team ...

This doesn't appear to be a bug in the official MySQL documentation. The references provided by the submitter aren't in the official MySQL documentation. Blogs are not official MySQL documentation, and the Docs Team have no control over their content. 

In addition, the submitter's original contention was shown not to be true, so there's likely no error or omission to correct in the Manual, and none was indicated.

Closed.