Description:
We run a master/master setup in which all reads and writes go to one master (the primary),
while the second server (the secondary) is used for backups and failover.
On 2 of our 31 master/master pairs, when we issue "stop slave", both slave threads stop
immediately, but CPU iowait doubles or triples and stays elevated until "slave start"
is issued. No other threads are connected (except the replication connection from the
primary master) or performing work. The same issue occurs if the DB is brought up with
--skip-slave-start. Once the slave threads are restarted, I/O goes back to normal.
CentOS 5, running MySQL 5.0.68 Enterprise
Box 1: 1 dual core CPU, 8 GB RAM,
I have reduced the global innodb_max_dirty_pages_pct and gotten dirty buffers down to 0,
and the I/O still runs hot.
Example iostat -x during this period:
Device:  rrqm/s  wrqm/s    r/s     w/s  rsec/s   wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00    0.00   0.00    0.00    0.00     0.00      0.00      0.00   0.00   0.00   0.00
sdb        0.00    8.50  16.00  101.50  832.00  6644.00     63.63      2.04  17.52   8.44  99.15
dm-0       0.00    0.00   0.00    0.00    0.00     0.00      0.00      0.00   0.00   0.00   0.00
dm-1       0.00    0.00   0.00    0.00    0.00     0.00      0.00      0.00   0.00   0.00   0.00
dm-2       0.00    0.00   0.00    0.00    0.00     0.00      0.00      0.00   0.00   0.00   0.00
dm-3       0.00    0.00   0.00    0.00    0.00     0.00      0.00      0.00   0.00   0.00   0.00
dm-4       0.00    0.00   0.00    0.00    0.00     0.00      0.00      0.00   0.00   0.00   0.00
dm-5       0.00    0.00  16.00  110.50  832.00  6868.00     60.87      2.04  16.28   7.84  99.15
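To put the sdb row in more familiar units, here is a minimal sketch that converts the sector rates to byte rates. It assumes the usual iostat convention of 512-byte sectors; the function name and constant are only illustrative.

```python
# Convert the sdb figures from the iostat -x sample into byte rates.
SECTOR_BYTES = 512  # assumption: iostat reports 512-byte sectors


def throughput(rsec_per_s, wsec_per_s, reads_per_s, writes_per_s):
    """Return (read bytes/s, write bytes/s, average request size in sectors)."""
    read_bps = rsec_per_s * SECTOR_BYTES
    write_bps = wsec_per_s * SECTOR_BYTES
    avg_req_sectors = (rsec_per_s + wsec_per_s) / (reads_per_s + writes_per_s)
    return read_bps, write_bps, avg_req_sectors


# Values copied from the sdb row: rsec/s, wsec/s, r/s, w/s
read_bps, write_bps, avg_req = throughput(832.00, 6644.00, 16.00, 101.50)

print(f"read  {read_bps / 1024:.0f} KiB/s")
print(f"write {write_bps / 1024:.0f} KiB/s")
print(f"average request {avg_req:.2f} sectors")
```

Under that assumption the device is writing a little over 3 MiB/s, and the recomputed average request size agrees with the avgrq-sz column (63.63).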
Using SHOW INNODB STATUS, I see that the log buffer still seems to be flushing and the
log sequence number is still incrementing. Here are two samples, taken ~60 seconds apart
while the slave is stopped.
---
LOG
---
Log sequence number 97 2015658440
Log flushed up to 97 2015646953
Last checkpoint at 97 1926146586
0 pending log writes, 0 pending chkp writes
205885465 log i/o's done, 2.33 log i/o's/second
---
LOG
---
Log sequence number 97 2017042788
Log flushed up to 97 2017017580
Last checkpoint at 97 1927895762
0 pending log writes, 0 pending chkp writes
205885643 log i/o's done, 2.21 log i/o's/second
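The difference between the two samples can be worked out as in the sketch below. It assumes that the two numbers printed after each label are the high and low 32-bit words of a 64-bit log sequence number (which is how this version appears to print it) and that the samples are exactly 60 seconds apart, whereas the interval above is only approximate. The helper and variable names are illustrative.

```python
def lsn(high, low):
    """Combine the two printed 32-bit words into one 64-bit LSN (assumed format)."""
    return (high << 32) | low


# Values copied from the two LOG sections above.
sample_1 = {
    "lsn": lsn(97, 2015658440),
    "flushed": lsn(97, 2015646953),
    "checkpoint": lsn(97, 1926146586),
    "log_ios": 205885465,
}
sample_2 = {
    "lsn": lsn(97, 2017042788),
    "flushed": lsn(97, 2017017580),
    "checkpoint": lsn(97, 1927895762),
    "log_ios": 205885643,
}
INTERVAL_S = 60  # assumption: the report says "~60 seconds"

# Per-counter change between the two samples.
deltas = {key: sample_2[key] - sample_1[key] for key in sample_1}
redo_bytes_per_s = deltas["lsn"] / INTERVAL_S

for key, value in deltas.items():
    print(f"{key:>10}: +{value}")
print(f"redo generated: about {redo_bytes_per_s / 1024:.1f} KiB/s")
```

On these numbers the log sequence number advances by roughly 1.4 MB per minute, which is small next to the 6644 wsec/s that iostat reports for sdb, so redo log writes alone do not obviously account for the write load.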
Pertinent my.cnf parameters (consistent across other boxes on other clusters that do not
have these problems)
+---------------------------------+------------------------+
| innodb_additional_mem_pool_size | 33554432               |
| innodb_autoextend_increment     | 8                      |
| innodb_buffer_pool_awe_mem_mb   | 0                      |
| innodb_buffer_pool_size         | 5368709120             |
| innodb_checksums                | ON                     |
| innodb_commit_concurrency       | 0                      |
| innodb_concurrency_tickets      | 500                    |
| innodb_data_file_path           | ibdata1:10M:autoextend |
| innodb_data_home_dir            | /var/lib/mysql         |
| innodb_adaptive_hash_index      | ON                     |
| innodb_doublewrite              | ON                     |
| innodb_fast_shutdown            | 1                      |
| innodb_file_io_threads          | 4                      |
| innodb_file_per_table           | ON                     |
| innodb_flush_log_at_trx_commit  | 2                      |
| innodb_flush_method             | O_DIRECT               |
| innodb_force_recovery           | 0                      |
| innodb_lock_wait_timeout        | 120                    |
| innodb_locks_unsafe_for_binlog  | OFF                    |
| innodb_log_arch_dir             |                        |
| innodb_log_archive              | OFF                    |
| innodb_log_buffer_size          | 8388608                |
| innodb_log_file_size            | 1363148800             |
| innodb_log_files_in_group       | 3                      |
| innodb_log_group_home_dir       | ./                     |
| innodb_max_dirty_pages_pct      | 25                     |
| innodb_max_purge_lag            | 0                      |
| innodb_mirrored_log_groups      | 1                      |
| innodb_open_files               | 300                    |
| innodb_rollback_on_timeout      | OFF                    |
| innodb_support_xa               | ON                     |
| innodb_sync_spin_loops          | 20                     |
| innodb_table_locks              | ON                     |
| innodb_thread_concurrency       | 0                      |
| innodb_thread_sleep_delay       | 10000                  |
+---------------------------------+------------------------+
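For readers who do not want to convert the byte values by hand, this small sketch restates the main sizes in MiB and compares the first sample's checkpoint age with the total redo log capacity. It treats the dirty-page ceiling as a simple percentage of the buffer pool size, which is only an approximation, and it assumes both LSN samples share the same high word (97) so that the low words can be subtracted directly.

```python
MIB = 1024 ** 2

# Values copied from the variable listing above.
innodb_buffer_pool_size = 5368709120
innodb_log_file_size = 1363148800
innodb_log_files_in_group = 3
innodb_log_buffer_size = 8388608
innodb_max_dirty_pages_pct = 25

# Total redo log capacity across the log group.
total_redo_bytes = innodb_log_file_size * innodb_log_files_in_group

# Rough ceiling on dirty pages (approximation: percentage of the pool size).
dirty_ceiling_bytes = innodb_buffer_pool_size * innodb_max_dirty_pages_pct // 100

# Checkpoint age from the first LOG sample: sequence number minus last checkpoint.
checkpoint_age_bytes = 2015658440 - 1926146586

print(f"buffer pool:     {innodb_buffer_pool_size // MIB} MiB")
print(f"redo capacity:   {total_redo_bytes // MIB} MiB")
print(f"log buffer:      {innodb_log_buffer_size // MIB} MiB")
print(f"dirty ceiling:   {dirty_ceiling_bytes // MIB} MiB")
print(f"checkpoint age:  {checkpoint_age_bytes / MIB:.1f} MiB "
      f"({100 * checkpoint_age_bytes / total_redo_bytes:.1f}% of redo capacity)")
```

If those assumptions hold, the checkpoint age is only a few percent of the configured redo space, so the logs do not look close to forcing an aggressive checkpoint flush.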
So, why does I/O go through the roof on these two servers when the slave is stopped and no
other connections are active?
Laine
How to repeat:
It only happens on two of our 31 secondary servers, but it happens every time we stop the
slave and continues until the slave is started again.