Bug #71130 5.6 SQL thread is much slower than 5.1/5.5
Submitted: 11 Dec 2013 20:43 Modified: 17 Dec 2013 17:29
Reporter: Yoshinori Matsunobu (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S5 (Performance)
Version:5.6.15 OS:Any
Assigned to: CPU Architecture:Any

[11 Dec 2013 20:43] Yoshinori Matsunobu
Description:
If all of the following conditions are met, SQL thread in 5.6 is significantly slower than 5.1/5.6.
* Almost all data fit in memory
* Very heavily inserted/updated and committed (7000~15000 commits/s)
* 5.6 Multi-threaded slave can not be used (due to application side restrictions etc)

Performance drop becomes much worse if:
* Crash safe slave is enabled (relay_log_info_repository=table and sync_relay_log_info=1)
* log-slave-update is set on the slave (for GTID etc)

Here are my micro-benchmark numbers (See "how to repeat" for benchmark conditions).

version: commits/s
5.1.69: 14520/s (no crash safe slave)
5.5.33: 12062/s (no crash safe slave)
fb5.1:  12505/s (crash safe slave)
5.6.15 TABLE: 6103/s  (crash safe slave)
5.6.15p FILE: 10264/s  (no crash safe slave)
5.6.15p TABLE: 7528/s (crash safe slave)

* TABLE means relay_log_info_repository=table
* 5.6.15p has a patch for bug#70342, a patch for skipping load_db_opt_by_name(), and a patch for not opening/fetching/closing slave_relay_log_info table every transaction

If log-slave-updates is not set:
5.1.69: 18692/s (no crash safe slave)
5.5.33: 15593/s (no crash safe slave)
fb5.1:  17056/s (crash safe slave)
5.6.15 TABLE: 7890/s  (crash safe slave)
5.6.15p FILE: 15244/s  (no crash safe slave)
5.6.15p TABLE: 9993/s (crash safe slave)

There are a few observations.
- 17% drop from 5.1.69 to 5.5.33. I have not profiled much in 5.5, but I suspect it's mostly caused by Metadata Locking.
- From 5.5.33 to 5.6.15p(FILE), 15% drop if log-slave-updates was enabled, 2.5% drop if log-slave-updates is disabled. Writing to binlog (from single threaded SQL thread) in 5.6 takes more time than 5.1/5.5.
- relay_log_info_repository=table adds significant overhead. Compared to fb5.1(crash safe slave), performance drop was 51.2% (14520/s->6103/s). slave_relay_log_info table is opened/fetched/updated/closed per every transaction commit. By keeping the table opened and skipping fetching row (updating only), performance improved to 7528/s (with log-slave-updates) and 9993/s (without log-slave-updates), but these are still much lower than fb5.1 (12505/s and 17056/s).

How to repeat:
my.cnf
loose_performance-schema=0
loose_binlog_checksum=0
log-bin
innodb-flush-log-at-trx-commit=0
log-slave-updates
sync-binlog=0
loose_relay_log_info_repository=table
loose_relay_log_recovery=1
loose_sync_relay_log_info=1
loose_sync_master_info=0
innodb_file_per_table
default-storage-engine=innodb
query_cache_type=0
query_cache_size=0

Setup master and slave. It is fine to run both of them on the same machine.

# Let the slave to receive all relay logs, so that SQL thread won't conflict with I/O thread
mysql -h $SLAVE -e "stop slave sql_thread"
mysqlslap -h $MASTER --auto-generate-sql --number-of-queries=500000 --concurrency=50 --auto-generate-sql-load-type=write

# Start SQL thread
mysql -h $SLAVE -e "start slave sql_thread"

# Then check how many transactions are processed on the slave
mysqladmin extended-status -i 10 -r -h $SLAVE | grep Com_insert

Suggested fix:
- Make relay_log_info_repository=table faster
- Writing to binlog in 5.6 has to be as fast as 5.1/5.6
- SQL thread took some time on load_db_opt_by_name() (~5% on FILE based benchmark). This function can be skipped many times. Consider adding a parameter to skip it.
- I think ~17% performance drops from 5.1 to 5.5 mostly came from Metadata Locking. Verify that and improve if it's possible
[17 Dec 2013 17:29] MySQL Verification Team
Hello Yoshinori,

Thank you for the bug report and test case.
Verified as described.

Thanks,
Umesh
[8 Jan 2014 2:10] Luis Soares
Hi Yoshinori,

Thank you for filing this bug and providing very good
feedback. We are analyzing both the issues you have 
raised and the profiling/micro-benchmarking inputs 
you have posted on the bug report.

Regards,
Luís