Bug #68281 sync_master_info=1 kills slave MTS concurrency
Submitted: 5 Feb 2013 23:19 Modified: 7 Mar 2013 15:01
Reporter: Yoshinori Matsunobu (OCA)
Status: Not a Bug
Category: MySQL Server: Documentation Severity: S5 (Performance)
Version: 5.6.10 OS: Any
Assigned to: Jon Stephens CPU Architecture: Any

[5 Feb 2013 23:19] Yoshinori Matsunobu
Description:
When sync_master_info=1 is set, the slave has no concurrency, since the LOG_lock mutex is held while flushing to the master info repository. So MTS is not effective.

-----
  mysql_mutex_lock(log_lock);

  int err=  (mi->rli->flush_current_log() ||
             mi->flush_info(force));

  mysql_mutex_unlock(log_lock);
-----

How to repeat:
Enable MTS, set sync_master_info=1, and run concurrent insert on master.

Suggested fix:
sync_relay_log_info=1 is much more important for slave crash recovery, and hopefully this does not kill concurrency. Making sync_master_info=1 fast is a nice-to-have feature.
[6 Feb 2013 18:56] MySQL Verification Team
Please explain what you mean by "running concurrent insert" on master. Concurrent with what?
[6 Feb 2013 19:20] Yoshinori Matsunobu
What I did was:
- Create 100 databases (db1 .. db100) and create an InnoDB table on each database
- Run 100 clients in parallel. Client $i connects to database $i, inserts rows into InnoDB
- Set slave_parallel_workers=100, sync_master_info=1, master-info-repository=TABLE, relay-log-info-repository=TABLE in slave my.cnf

This is similar to https://blogs.oracle.com/MySQL/entry/benchmarking_mysql_replication_with_multi
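
The setup described above can be sketched as a small script that generates the schema statements and the slave option fragment. This is only an illustration under stated assumptions: the table name `t1`, its columns, and the helper names `schema_statements` and `slave_cnf` are hypothetical and do not come from the report; only the option names and values are taken from the list above.

```python
# Sketch of the repro setup. The table name, its columns, and the helper
# names are hypothetical; the option names and values come from the report.

def schema_statements(n_databases=100):
    """Return CREATE statements for db1..dbN, one InnoDB table per database."""
    stmts = []
    for i in range(1, n_databases + 1):
        stmts.append(f"CREATE DATABASE IF NOT EXISTS db{i};")
        stmts.append(
            f"CREATE TABLE IF NOT EXISTS db{i}.t1 "
            "(id INT AUTO_INCREMENT PRIMARY KEY, v INT) ENGINE=InnoDB;"
        )
    return stmts


def slave_cnf(workers=100):
    """Return the [mysqld] fragment with the slave options from the report."""
    return "\n".join([
        "[mysqld]",
        f"slave_parallel_workers={workers}",
        "sync_master_info=1",
        "master-info-repository=TABLE",
        "relay-log-info-repository=TABLE",
    ])


if __name__ == "__main__":
    print("\n".join(schema_statements()))
    print(slave_cnf())
```

The generated statements would be run on the master and the fragment merged into the slave's my.cnf; the 100 parallel insert clients are not covered by this sketch.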
[7 Feb 2013 16:10] MySQL Verification Team
Yoshinori,

In analyzing this bug, I noticed that your setting is somewhat strange. Why do you need
sync_master_info=1 when using MTS and GTIDs? GTIDs have auto-positioning, so master.info positions are a bit less relevant...

In simple terms, this is an unnecessary setting. 

This does not mean that this is not a bug. It is, but your answer may influence its priority.
[26 Feb 2013 5:06] Yoshinori Matsunobu
Hi Sinisa,

Yes, I understand sync_master_info=1 is not needed to make the slave crash safe (either relay-log-recovery=1 + sync-relay-log=1, or GTID is fine). We won't set it to 1 in production.

But sync_master_info=1 is introduced in some official documents (e.g. http://www.mysql.com/why-mysql/white-papers/mysql-replication-tutorial/), so I think this should be fixed, or a clearer description (that sync_master_info=1 is not mandatory) is needed.
[27 Feb 2013 13:46] MySQL Verification Team
Yoshinori,

Yes, we are aware of all consequences of these settings. We are still investigating the problem.
[27 Feb 2013 14:45] MySQL Verification Team
So far, we have discovered that --sync_master_info=1 can affect MTS worker scheduling, because it makes the SQL thread spend more time reading the relay log.

However, we are unable to see a total loss of MTS concurrency when this variable is set to 1. We are only able to see a slowdown in SQL threads, for the reasons described above.

We have yet to run some tests, but this will most likely end up as "Not a Bug" or as a new task for our Documentation team.
[28 Feb 2013 14:44] MySQL Verification Team
Ultimately, this is not a bug, according to the results of our extended tests.

We first ran a test with 100 worker threads and noticed a slowdown in MTS thread execution time, but replication still continued, only at a slower pace.

After that, we ran a test from our suite, which measures the total time for executing all events from the master's binary log.

Below are some results, shown as total execution time in seconds.

The following symbols are used in the table:

s = 0 or 1 (the sync-master-info setting)
w = 0 or 4 (the number of workers)

The test also measures the "dirty" applying time of bin-logged events (that is, the time includes transporting the events from the master binlog to the slave).

The test passed, with the timings shown in the following matrix:

s \ w |  0   4
------+--------
0     | 18  14
1     | 26  17

So, in single-thread mode, you have a slowdown from 18 to 26 seconds, and in MTS mode, with 4 (four) worker threads, a slowdown from 14 to 17 seconds. This is the time required for a slave to read and execute all entries from the master's binary log.
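
The relative cost of sync_master_info=1 can be worked out from the matrix above. The snippet below is only a small arithmetic sketch: the `timings` dictionary and the `slowdown_pct` helper are illustrative names, and the numbers are the four values reported in the matrix.

```python
# Timings in seconds from the matrix above, keyed by (sync_master_info, workers).
timings = {(0, 0): 18, (0, 4): 14, (1, 0): 26, (1, 4): 17}


def slowdown_pct(workers):
    """Percent increase in total time when sync_master_info goes from 0 to 1."""
    base = timings[(0, workers)]
    synced = timings[(1, workers)]
    return 100.0 * (synced - base) / base


for w in (0, 4):
    print(f"workers={w}: +{slowdown_pct(w):.1f}%")
# prints:
# workers=0: +44.4%
# workers=4: +21.4%
```

On these figures the relative penalty with 4 workers is roughly half of the single-thread penalty.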

So, MTS concurrency is not killed, as stated in the bug report, but it is slowed down by the need to sync some data to disk. This slowdown is definitely expected, and it is smaller than the slowdown in single SQL thread replication. Hence, MTS is not fully ineffective when sync_master_info = 1.

Luckily, 5.6 has a solution for the slowdown problem. One simply has to switch to GTIDs instead of using the old options, such as:

--sync-master-info
--relay-log-recovery
--sync-relay-log
--sync-relay-log-info

We shall check whether the documentation needs an addition on this issue.
[4 Mar 2013 17:18] MySQL Verification Team
Changed to "Documentation" bug, with the agreement of Documentation team ...
[7 Mar 2013 15:01] Jon Stephens
This doesn't appear to be a bug in the official MySQL documentation. The references provided by the submitter aren't in the official MySQL documentation. Blogs are not official MySQL documentation, and the Docs Team have no control over their content. 

In addition, the submitter's original contention was shown not to be true, so there's likely no error or omission to correct in the Manual, and none was indicated.

Closed.