MySQL Bugs: #84590: mysql relaylog transfers very slowly (200k/sec), mysqld 50% load, cause unknown

Bug #84590	mysql relaylog transfers very slowly (200k/sec), mysqld 50% load, cause unknown
Submitted:	20 Jan 2017 20:20	Modified:	20 Mar 2017 14:52
Reporter:	Oregano Jim
Status:	No Feedback
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	5.6.35 and 5.6.28	OS:	Ubuntu
Assigned to:		CPU Architecture:	Any
Tags:	replication

Description:
After several days of testing, I can't see what is causing this.

There are two slave servers, and a master.

Master is running Ubuntu 14.04, and MySQL community edition 5.6.28.

Slaves are running Ubuntu 16.04 with the same MySQL version (5.6.28), and I've tried 5.6.35 too.

Nothing seems amiss. However, even with the SQL thread off, the IO thread copies log files from the master exceptionally slowly: 200 kbytes/sec on a good day.

Steps taken to eliminate network as the cause:

- used mysqlbinlog to download the file directly; speed was fine. The hope was to eliminate anything having to do with the router and port 3306

- switched to an alternate NIC for replication, which uses a different router; speed was still slow

- used scp to test speed, and it was OK. Then set up an ssh tunnel and replicated through that, hoping to eliminate even more potential edge cases; replication was still slow

- mysqld is generally pegged at 50% load with only the IO thread running, and these are modern, multi-core servers

- swap is not an issue: 0 swap is used, swappiness=0, and mysql is using hugepages. The server has 100G+ of memory, with 7G reported free in free/top

- IO is not an issue: top 'wa' reports almost no wait (sometimes 5% on one core) when both the SQL thread and the IO thread are running

- verified IO with iostat, not to mention that normal scp/rsync/mysqlbinlog download files instantly

- no queries are being run on these slaves right now -- it's just replication

- strace didn't show anything that popped up as a 'big deal'
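
The transfer rate quoted above can be cross-checked by sampling the total size of the relay log files on a slave over time. The following is only a rough sketch, not something taken from this report: the function names and the file pattern `*-relay-bin.[0-9]*` are assumptions based on the default relay log naming, and the result is only meaningful while the SQL thread is stopped, so that relay logs are not being purged between samples.

```python
import glob
import os
import time


def relay_log_bytes(directory, pattern="*-relay-bin.[0-9]*"):
    """Total size in bytes of files matching the (assumed) relay log pattern."""
    paths = glob.glob(os.path.join(directory, pattern))
    return sum(os.path.getsize(p) for p in paths if os.path.isfile(p))


def transfer_rate(samples):
    """Average bytes per second between the first and last (seconds, bytes) sample."""
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (b1 - b0) / (t1 - t0)


def sample_relay_logs(directory, interval=5.0, count=3):
    """Take `count` size samples of the relay logs, `interval` seconds apart."""
    samples = []
    for i in range(count):
        samples.append((time.monotonic(), relay_log_bytes(directory)))
        if i < count - 1:
            time.sleep(interval)
    return samples
```

For example, `transfer_rate(sample_relay_logs("/var/lib/mysql"))` would report the growth rate in bytes per second, assuming the relay logs live in that (hypothetical) data directory.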

I'm not sure where to go from here.

I believe this is a mysqld bug.

How to repeat:
This bug is repeatable with two servers, with different hardware configurations.

Hi!

What you report looks more like an OS problem, a configuration problem, or a hardware problem.

It is expected behavior that copying a file is much, much faster than continuously retrieving events from the master, transaction by transaction. Next, you have 7% of free space on your disk, so you are running into fragmentation problems. Last, but not least, a properly designed network with 1 Gbit Ethernet and very fast switches is a must in a configuration like this.
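
The difference between a bulk copy and event-by-event retrieval can be illustrated with a toy cost model. This is only a sketch under stated assumptions: the function name, the fixed per-event overhead, and every number used with it are hypothetical, not measurements from this report or from MySQL itself.

```python
def effective_throughput(payload_bytes, per_transfer_overhead_s, bandwidth_bytes_per_s):
    """Bytes per second achieved when each transfer pays a fixed overhead.

    Toy model: time per transfer = fixed overhead + payload / raw bandwidth.
    All parameters are hypothetical illustration values.
    """
    time_per_transfer = per_transfer_overhead_s + payload_bytes / bandwidth_bytes_per_s
    return payload_bytes / time_per_transfer


GIGABIT = 125_000_000  # 1 Gbit/s expressed in bytes per second

# Many small events, each paying an assumed 1 ms of fixed overhead.
small_events = effective_throughput(200, 0.001, GIGABIT)

# One large file transferred in a single pass with the same fixed overhead.
bulk_copy = effective_throughput(100_000_000, 0.001, GIGABIT)

print(f"small events: {small_events:,.0f} B/s, bulk copy: {bulk_copy:,.0f} B/s")
```

In this model the fixed cost dominates when events are small, so the effective rate falls far below the raw link speed, while a single large transfer stays close to it.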

Your report does not seem to contain a repeatable test case, so you do not seem to be reporting a bug. Do note that this is not a forum for free support, but only a vehicle for reporting bugs by providing full test cases that can be repeated on any network and hardware configuration. A bug can also be OS-related, but then we still require a repeatable test case.

Hi!

Your report still does not contain anything resembling a repeatable test case. This, however, does not mean that we have not done our own internal measurements for the purposes of improvement, quality control, research and commercial support.

MySQL will use almost all of the network bandwidth available between the dump thread (master) and the IO threads (slaves). The applier is the one that may have more trouble keeping up. This problem is exacerbated by the option of syncing relay logs and, further, if the SQL threads are running, by logging slave updates. In most cases the slave's storage of the events is the bottleneck; in other, rarer cases it is the network. The latter is due to the fact that several network reads are required for a single event, in order to read the event header, transaction headers, etc. Furthermore, it depends on other settings in the configuration, such as semi-sync.
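
The settings named above can be reviewed on a slave by looking at a few server variables. The sketch below assumes the values have already been collected into a dictionary, for example from the output of `SHOW GLOBAL VARIABLES`; the function name and the wording of the notes are hypothetical, while `sync_relay_log`, `log_slave_updates` and `rpl_semi_sync_slave_enabled` are existing MySQL 5.6 variables (the last one only when the semi-sync plugin is installed).

```python
def replication_storage_notes(variables):
    """Return notes on settings that can slow down storing events on a slave.

    `variables` maps variable names to string values, as they would appear
    in SHOW GLOBAL VARIABLES output. Missing variables are treated as off.
    """
    notes = []

    # sync_relay_log=N syncs the relay log to disk after every N events;
    # 0 leaves flushing to the operating system.
    sync_every = int(variables.get("sync_relay_log", "0"))
    if sync_every > 0:
        notes.append(
            f"sync_relay_log={sync_every}: relay log is synced to disk "
            f"every {sync_every} event(s)"
        )

    # With log_slave_updates on, the SQL thread also writes applied
    # events to the slave's own binary log.
    if variables.get("log_slave_updates", "OFF").upper() == "ON":
        notes.append("log_slave_updates=ON: applied events are written to the binary log")

    # Semi-synchronous replication makes the slave acknowledge received events.
    if variables.get("rpl_semi_sync_slave_enabled", "OFF").upper() == "ON":
        notes.append("rpl_semi_sync_slave_enabled=ON: slave acknowledges events to the master")

    return notes
```

Calling `replication_storage_notes({"sync_relay_log": "1", "log_slave_updates": "ON"})` would return two notes, one for each of the two settings present.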

All this being said, I must repeat that you have not provided us with a repeatable test case, which is the first condition required in order to treat this report as a bug.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".