Bug #67805 Very slow replication with MySQL-5.5.28 compare to MySQL-5.5.24
Submitted: 4 Dec 2012 18:57 Modified: 23 May 2013 18:19
Reporter: Jean-Christophe Petit Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:MySQL-5.5.28 OS:Linux (Redhat-6.1)
Assigned to: CPU Architecture:Any
Tags: MySQL-5.5.28, replication, slow performance

[4 Dec 2012 18:57] Jean-Christophe Petit
Description:
Hello,

we move from 5.5.24 to 5.5.28 and noticed a huge slow down in replication time when doing intensive INSERTs/DELETEs
It is at least 10 times slower with 5.5.28 than with 5.5.24
Details:
we have 4 slaves: 2 with 5.5.24 and 2 with 5.5.28 (we only upgrade a few at a time into production to see how things go)
We then do a lot of INSERTs/DELETEs and the 5.5.24 were as fast as the Master (5.5.24)
The replication of the 5.5.28 lagged behind.
After an hour like this, we decided to stop one of the 5.5.28 slaves and downgraded it to 5.5.24. Then this slave kept up with the Master in a matter of minutes while the last remaining 5.5.28 was still very far behind: more than 10000 seconds behind. One hour after, 5.5.28 is still very far behind..

It could be also a INSERTs/DELETEs problem but we are not able to do a local test for the moment (ex: dump restore)

Thank you

How to repeat:
Setup:
1 Master 5.5.24
2 slaves: one with 5.5.24 and the other with 5.5.28
Do a huge dump restore
[4 Dec 2012 19:47] Sveta Smirnova
Thank you for the report.

Which binary log format do you use? Which storage engine do you use?
[4 Dec 2012 20:16] Jean-Christophe Petit
bin log format is STATEMENT
Storage engine is innodb
replication is asynchronous
File System is XFS

Have a good day
[4 Dec 2012 20:42] Valeriy Kravchuk
Please, send my.cnf file content and the output of:

show global status;

from your problematic 5.5.28 slave.
[22 Dec 2012 16:01] MySQL Verification Team
but is it replication specific or the server in general.  I mean, if you make the 5.5.28 a master, does the 5.5.28 master become slower than the 5.5.24 masters ?
[2 Jan 2013 9:26] Jean-Christophe Petit
It is specific to the Slave with version 5.5.28
Other slaves with 5.5.24 fine.
Master is 5.5.24 and slave 5.5.28

Thank you
[2 Jan 2013 10:13] MySQL Verification Team
Haven't had any luck repeating this yet with certainty.  I'm using blackhole tables and benchmarking a 200M statement binlog.  5.5.24 and 5.5.28 slaves play it within 3 seconds of each other in a > 1 minute test run, so they are comparable.

On a side note, for windows slaves I have managed to identify two things to make replication *much* faster (on any mysql version):

1.  put "symbolic-links=0" in my.cnf. And do not use .sym files anywhere.
2.  make sure there is a "db.opt" file in all databases, else a failed open takes a lot of time for each statement.
[7 Jan 2013 11:44] Jean-Christophe Petit
It happened with an other customer.
This time, Mysql-server was 5.1.67 and slave was 5.5.28
The replication was extremely slow the same as our first report.
Switching the slave to 5.5.24 resolved the issue.

This customer was attempting a migration from 5.1.X to 5.5.X

Thank you
[23 May 2013 17:40] MySQL Verification Team
I see no performance difference between my 5.5.24/5.5.28 slaves. Except 5.5.28 tends to crash when I add user variables into my test.

We need to know what kind of queries are run to hit the problem. Could it be some optimizer change, therefore deletes take longer if they have complex where clauses?

Is autoincrement variables and user variables used?  
Do you have some benchmark/testcase, or profiling data that shows the problem?
[23 May 2013 17:54] Jean-Christophe Petit
Hopefully, I get a response after 4 months..
The problem seems to be fixed with 5.5.31
[23 May 2013 18:19] MySQL Verification Team
That's good news indeed.
[25 May 2013 4:19] Jervin R
I can still repeat this on 5.5.31, on my tests I have a 5.1.67 master, 5.1.69 and 5.5.31 slaves. The same lagging behavior can be observed on 5.5.24 and 5.5.30. Default configurations on all except server-id, log-bin on master.

1. STOP SLAVE on all slaves
2. Run mysqlslap on the master 

mysqlslap --auto-generate-sql --auto-generate-sql-load-type=write --commit=1 --concurrency=8 --create-schema=slap_test --detach=100 --iterations=5 --number-char-cols=5 --number-int-cols=5 --number-of-queries=50000 -umsandbox -pmsandbox

3. START SLAVE on the slaves, watch the 5.5.31 slave lag behind while 5.1.69 complete in a few seconds.

My test below is both slaves are on the same SSD drive, even if I set innodb_flush_log_at_trx_commit = 2 it will only cut the lag down some 50%. With innodb_flush_log_at_trx_commit = 1, the lag went up to 170 seconds and then a steep drop after about 3mins.

[revin@forge msb]$ for t in {1..360}; do echo $(date +%Y-%m-%d_%H_%M_%S) && ( for s in 5169 5531; do echo "$s $(sb $s -e 'show slave status \G'|grep -Ei 'read_master_log_pos|exec_master_log_pos|seconds'|xargs)"; done ) && sleep 2; done 
2013-05-25_00_12_18
5169 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 238544568 Seconds_Behind_Master: 18
5531 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 200156247 Seconds_Behind_Master: 19
2013-05-25_00_12_20
5169 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 299611065 Seconds_Behind_Master: 18
5531 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 203850830 Seconds_Behind_Master: 21
2013-05-25_00_12_22
5169 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 360706440 Seconds_Behind_Master: 18
5531 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 207445179 Seconds_Behind_Master: 23
2013-05-25_00_12_24
5169 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 395422698 Seconds_Behind_Master: 0
5531 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 210512293 Seconds_Behind_Master: 24
...
2013-05-25_00_14_51
5169 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 395422698 Seconds_Behind_Master: 0
5531 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 387776125 Seconds_Behind_Master: 166
2013-05-25_00_14_53
5169 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 395422698 Seconds_Behind_Master: 0
5531 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 390456496 Seconds_Behind_Master: 168
2013-05-25_00_14_55
5169 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 395422698 Seconds_Behind_Master: 0
5531 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 393220532 Seconds_Behind_Master: 170
2013-05-25_00_14_57
5169 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 395422698 Seconds_Behind_Master: 0
5531 Read_Master_Log_Pos: 395422698 Exec_Master_Log_Pos: 395422698 Seconds_Behind_Master: 0

pt-pmp shows most of the time on AIO:

[revin@forge ~]$ pt-pmp -i 10 -s 1 -p 5604
Sat May 25 00:17:48 EDT 2013
    100 libaio::??(libaio.so.1),os_aio_linux_collect(os0file.c:4539),os_aio_linux_handle(os0file.c:4539),fil_aio_wait(fil0fil.c:4620),io_handler_thread(srv0start.c:482),start_thread(libpthread.so.0),clone(libc.so.6)
     10 sigwait(libpthread.so.0),signal_hand(mysqld.cc:2700),start_thread(libpthread.so.0),clone(libc.so.6)
     10 select(libc.so.6),os_thread_sleep(os0thread.c:259),srv_master_thread(srv0srv.c:2871),start_thread(libpthread.so.0),clone(libc.so.6)
     10 read(libpthread.so.0),vio_read_buff(viosocket.c:95),my_real_read(net_serv.cc:831),my_net_read(net_serv.cc:1028),cli_safe_read(client.c:735),read_event(slave.cc:2226),handle_slave_io(slave.cc:2226),start_thread(libpthread.so.0),clone(libc.so.6)
     10 pthread_cond_timedwait,os_cond_wait_timed(os0sync.c:171),os_event_wait_time_low(os0sync.c:171),srv_monitor_thread(srv0srv.c:2183),start_thread(libpthread.so.0),clone(libc.so.6)
     10 pthread_cond_timedwait,os_cond_wait_timed(os0sync.c:171),os_event_wait_time_low(os0sync.c:171),srv_lock_timeout_thread(srv0srv.c:2333),start_thread(libpthread.so.0),clone(libc.so.6)
     10 pthread_cond_timedwait,os_cond_wait_timed(os0sync.c:171),os_event_wait_time_low(os0sync.c:171),srv_error_monitor_thread(srv0srv.c:2516),start_thread(libpthread.so.0),clone(libc.so.6)
     10 poll(libc.so.6),handle_connections_sockets(mysqld.cc:5179),mysqld_main(mysqld.cc:4625),__libc_start_main(libc.so.6),_start
      7 fsync(libpthread.so.0),os_file_fsync(os0file.c:2073),os_file_flush_func(os0file.c:2073),pfs_os_file_flush_func(os0file.ic:381),fil_flush(os0file.ic:381),log_write_up_to(log0log.c:1514),trx_commit_complete_for_mysql(trx0trx.c:1724),innobase_commit(ha_innodb.cc:2818),ha_commit_one_phase(handler.cc:1393),ha_commit_trans(handler.cc:1327),trans_commit_stmt(transaction.cc:299),mysql_execute_command(sql_parse.cc:4485),mysql_parse(sql_parse.cc:5627),Query_log_event::do_apply_event(log_event.cc:3472),apply_event(log_event.h:1165),apply_event_and_update_pos(log_event.h:1165),exec_relay_log_event(slave.cc:2552),handle_slave_sql(slave.cc:3377),start_thread(libpthread.so.0),clone(libc.so.6)
      1 memcpy(libc.so.6),Log_event::read_log_event(log_event.cc:1188),next_event(slave.cc:4614),exec_relay_log_event(slave.cc:4614),handle_slave_sql(slave.cc:3377),start_thread(libpthread.so.0),clone(libc.so.6)
      1 Log_event::do_shall_skip(libc.so.6),shall_skip(log_event.h:1190),apply_event_and_update_pos(log_event.h:1190),exec_relay_log_event(slave.cc:2552),handle_slave_sql(slave.cc:3377),start_thread(libpthread.so.0),clone(libc.so.6)
      1 lex_one_token(libc.so.6),MYSQLlex(sql_lex.cc:889),MYSQLparse(sql_yacc.cc:16806),parse_sql(sql_parse.cc:7365),mysql_parse(sql_parse.cc:5582),Query_log_event::do_apply_event(log_event.cc:3472),apply_event(log_event.h:1165),apply_event_and_update_pos(log_event.h:1165),exec_relay_log_event(slave.cc:2552),handle_slave_sql(slave.cc:3377),start_thread(libpthread.so.0),clone(libc.so.6)
[25 May 2013 4:45] Jervin R
Repeatable on 5.5.13, 5.5.24, 5.6.11
[25 May 2013 4:52] Jervin R
I also tested with both slaves 5.1.68 and 5.1.69 just to make sure its not IO, and both are able to catchup without problems:

[revin@forge msb]$ for t in {1..360}; do echo $(date +%Y-%m-%d_%H_%M_%S) && ( for s in 5169 5168; do echo "$s $(sb $s -e 'show slave status \G'|grep -Ei 'read_master_log_pos|exec_master_log_pos|seconds'|xargs)"; done ) && sleep 2; done 
2013-05-25_00_51_20
5169 Read_Master_Log_Pos: 197711402 Exec_Master_Log_Pos: 35565992 Seconds_Behind_Master: 11
5168 Read_Master_Log_Pos: 197711402 Exec_Master_Log_Pos: 30511490 Seconds_Behind_Master: 11
2013-05-25_00_51_22
5169 Read_Master_Log_Pos: 197711402 Exec_Master_Log_Pos: 97201230 Seconds_Behind_Master: 11
5168 Read_Master_Log_Pos: 197711402 Exec_Master_Log_Pos: 84266626 Seconds_Behind_Master: 12
2013-05-25_00_51_24
5169 Read_Master_Log_Pos: 197711402 Exec_Master_Log_Pos: 158789902 Seconds_Behind_Master: 11
5168 Read_Master_Log_Pos: 197711402 Exec_Master_Log_Pos: 138285706 Seconds_Behind_Master: 12
2013-05-25_00_51_26
5169 Read_Master_Log_Pos: 197711402 Exec_Master_Log_Pos: 197711402 Seconds_Behind_Master: 0
5168 Read_Master_Log_Pos: 197711402 Exec_Master_Log_Pos: 192343463 Seconds_Behind_Master: 13
2013-05-25_00_51_29
5169 Read_Master_Log_Pos: 197711402 Exec_Master_Log_Pos: 197711402 Seconds_Behind_Master: 0
5168 Read_Master_Log_Pos: 197711402 Exec_Master_Log_Pos: 197711402 Seconds_Behind_Master: 0
^C
[revin@forge msb]$
[1 Jun 2013 3:36] Jervin R
Ignore my last 2 comments, apparently, mysqlslap does not have its table definitions explicitly set, so it was MyISAM on 5.1 and InnoDB on 5.5 causing major replication differences. Running the same tests with default-storage-engine=InnoDB yields the same 5.5 numbers to 5.1 slaves.