Description:
We run a master/master setup in which all reads and writes go to one master (the primary); the second server (the secondary) is used for backups and failover.
On 2 of our 31 master/master pairs, when we issue STOP SLAVE on the secondary, both slave threads stop immediately, but CPU iowait doubles or triples and stays elevated until START SLAVE is issued. No other threads are connected (except the replication connection from the primary master) or performing work. The same issue occurs if the DB is brought up with skip-slave-start. Once the slave threads are restarted, I/O goes back to normal.
CentOS 5, running MySQL 5.0.68 Enterprise
Box 1: 1 dual core CPU, 8 GB RAM,
I have reduced the global innodb_max_dirty_pages_pct and gotten dirty buffers down to 0, and the I/O still runs hot.
Example iostat -x during this period:
Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdb          0.00     8.50    16.00   101.50   832.00  6644.00    63.63     2.04    17.52     8.44    99.15
dm-0         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
dm-1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
dm-2         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
dm-3         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
dm-4         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
dm-5         0.00     0.00    16.00   110.50   832.00  6868.00    60.87     2.04    16.28     7.84    99.15
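For scale, the sdb row can be converted from sectors to bytes. This is only a rough sketch: it assumes the usual 512-byte sector unit that iostat uses for rsec/s and wsec/s, and the variable names are mine.

```python
# Convert the sdb row of the iostat sample into bytes per second.
# Assumption: iostat counts rsec/s and wsec/s in 512-byte sectors.
SECTOR_BYTES = 512

# Values copied from the sdb row of the iostat output.
reads_per_s = 16.00
writes_per_s = 101.50
rsec_per_s = 832.00
wsec_per_s = 6644.00

read_bytes_per_s = rsec_per_s * SECTOR_BYTES
write_bytes_per_s = wsec_per_s * SECTOR_BYTES

# Cross-check against the avgrq-sz column: sectors moved per request.
avg_request_sectors = (rsec_per_s + wsec_per_s) / (reads_per_s + writes_per_s)

print(f"read:  {read_bytes_per_s / 1024:.0f} KiB/s")
print(f"write: {write_bytes_per_s / 1024:.0f} KiB/s")
print(f"average request: {avg_request_sectors:.2f} sectors")
```

So the device is writing a little over 3 MiB/s in requests of about 32 KB while nothing is supposed to be running.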
Using SHOW INNODB STATUS, I see that the log buffer still seems to be flushing and the log sequence number is still incrementing. Here are two samples, taken ~60 seconds apart while the slave is stopped.
---
LOG
---
Log sequence number 97 2015658440
Log flushed up to 97 2015646953
Last checkpoint at 97 1926146586
0 pending log writes, 0 pending chkp writes
205885465 log i/o's done, 2.33 log i/o's/second
---
LOG
---
Log sequence number 97 2017042788
Log flushed up to 97 2017017580
Last checkpoint at 97 1927895762
0 pending log writes, 0 pending chkp writes
205885643 log i/o's done, 2.21 log i/o's/second
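To put numbers on the two samples, here is a small sketch of the arithmetic. It assumes that the two numbers printed for each log position are the high and low 32-bit words of a 64-bit byte offset, and that the samples are exactly 60 seconds apart (the interval was only approximate). The helper name lsn() is mine.

```python
def lsn(high, low):
    """Combine the two printed 32-bit words into one 64-bit log position."""
    return (high << 32) + low

INTERVAL_S = 60  # assumption: the samples were taken roughly 60 s apart

first = {
    "lsn": lsn(97, 2015658440),
    "checkpoint": lsn(97, 1926146586),
    "log_ios": 205885465,
}
second = {
    "lsn": lsn(97, 2017042788),
    "checkpoint": lsn(97, 1927895762),
    "log_ios": 205885643,
}

redo_bytes = second["lsn"] - first["lsn"]                       # redo generated
checkpoint_advance = second["checkpoint"] - first["checkpoint"]  # checkpoint progress
checkpoint_age = second["lsn"] - second["checkpoint"]            # unflushed redo span
log_ios = second["log_ios"] - first["log_ios"]

redo_rate = redo_bytes / INTERVAL_S

# Device write rate from the iostat sample (sdb, 6644 wsec/s, 512-byte sectors).
device_write_rate = 6644.00 * 512

print(f"redo generated:     {redo_bytes} bytes ({redo_rate / 1024:.1f} KiB/s)")
print(f"checkpoint advance: {checkpoint_advance} bytes")
print(f"checkpoint age:     {checkpoint_age / 1024**2:.1f} MiB")
print(f"log i/o's:          {log_ios}")
print(f"device writes / redo rate: {device_write_rate / redo_rate:.0f}x")
```

Under those assumptions, about 1.4 MB of redo was generated in the minute, and the disk was writing on the order of a hundred times more than that, so the log writes alone do not seem to explain the I/O.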
Pertinent my.cnf parameters (identical on the boxes in other clusters that do not have this problem):
+---------------------------------+------------------------+
| innodb_additional_mem_pool_size | 33554432 |
| innodb_autoextend_increment | 8 |
| innodb_buffer_pool_awe_mem_mb | 0 |
| innodb_buffer_pool_size | 5368709120 |
| innodb_checksums | ON |
| innodb_commit_concurrency | 0 |
| innodb_concurrency_tickets | 500 |
| innodb_data_file_path | ibdata1:10M:autoextend |
| innodb_data_home_dir | /var/lib/mysql |
| innodb_adaptive_hash_index | ON |
| innodb_doublewrite | ON |
| innodb_fast_shutdown | 1 |
| innodb_file_io_threads | 4 |
| innodb_file_per_table | ON |
| innodb_flush_log_at_trx_commit | 2 |
| innodb_flush_method | O_DIRECT |
| innodb_force_recovery | 0 |
| innodb_lock_wait_timeout | 120 |
| innodb_locks_unsafe_for_binlog | OFF |
| innodb_log_arch_dir | |
| innodb_log_archive | OFF |
| innodb_log_buffer_size | 8388608 |
| innodb_log_file_size | 1363148800 |
| innodb_log_files_in_group | 3 |
| innodb_log_group_home_dir | ./ |
| innodb_max_dirty_pages_pct | 25 |
| innodb_max_purge_lag | 0 |
| innodb_mirrored_log_groups | 1 |
| innodb_open_files | 300 |
| innodb_rollback_on_timeout | OFF |
| innodb_support_xa | ON |
| innodb_sync_spin_loops | 20 |
| innodb_table_locks | ON |
| innodb_thread_concurrency | 0 |
| innodb_thread_sleep_delay | 10000 |
+---------------------------------+------------------------+
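For reference, the sizes implied by these settings can be worked out as below. This is a sketch only; the checkpoint age is taken from the second SHOW INNODB STATUS sample, assuming the printed log positions are byte offsets.

```python
MIB = 1024 ** 2

# Values copied from the variable listing.
innodb_buffer_pool_size = 5368709120
innodb_log_file_size = 1363148800
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 25

# Total redo log space across the log group.
redo_capacity = innodb_log_file_size * innodb_log_files_in_group

# Upper bound on dirty pages implied by innodb_max_dirty_pages_pct.
dirty_page_cap = innodb_buffer_pool_size * innodb_max_dirty_pages_pct // 100

# Checkpoint age from the second status sample:
# (log sequence number) - (last checkpoint), low words only since the high words match.
checkpoint_age = 2017042788 - 1927895762

print(f"buffer pool:    {innodb_buffer_pool_size // MIB} MiB")
print(f"redo capacity:  {redo_capacity // MIB} MiB")
print(f"dirty page cap: {dirty_page_cap // MIB} MiB")
print(f"checkpoint age: {100 * checkpoint_age / redo_capacity:.1f}% of redo capacity")
```

If that reading is right, the checkpoint age is only a couple of percent of the available log space, so the log filling up does not look like the trigger for the flushing.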
So why does I/O go through the roof on these two servers when the slave is stopped and there are no other connections?
Laine
How to repeat:
It only happens on two of our 31 secondary servers, but it happens every time we stop the slave, and it continues until the slave is started.