Bug #100723 | slow MTS recovery in startup | ||
---|---|---|---|
Submitted: | 3 Sep 2020 7:30 | Modified: | 7 Sep 2020 9:03 |
Reporter: | Qilu Wei | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S5 (Performance) |
Version: | 8.0 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[3 Sep 2020 7:30]
Qilu Wei
[4 Sep 2020 13:26]
MySQL Verification Team
Hi, > slow when there are lots of relay log to read. I'm not sure I understand what you consider to be a bug here. Yes, it is slow when it needs to read huge log. We are working on making everything faster and more optimized but I don't see how is this a bug? thanks
[5 Sep 2020 2:23]
Qilu Wei
ERROR LOGGING causes the slow startup, NOT READING LOG. Please pay attention to the difference of the situation of startup and after-startup. It is NOT slow to read huge log because the "start slave" reading the SAME huge log took very little time. And I have PROVED that the huge ERROR LOGGING in mts recovery made it slow by changing the code and restarting it: The code I changed is: (Line 6083, rpl_slave.cc) // LogErr(INFORMATION_LEVEL, ER_RPL_MTS_GROUP_RECOVERY_RELAY_LOG_INFO, // rli->get_group_master_log_name(), ev->common_header->log_pos); Without this part of error logging and reading the same huge log, the mts recovery of startup became as fast as "start slave". And please be aware, error logging in startup and after-startup have different procedure.
[5 Sep 2020 4:18]
Qilu Wei
Amend previous comment: Without this part of error logging and reading the same huge log, the mts recovery of startup became as fast as "start slave". to: Reading the same huge log when there is no huge error logging, the mts recovery in startup became as fast as "start slave".
[5 Sep 2020 9:02]
MySQL Verification Team
I've once found error logging to be very slow, but it was on Windows, can you take a look if this is similar ? https://bugs.mysql.com/bug.php?id=91487
[7 Sep 2020 6:48]
Qilu Wei
The affect of Bug #91487 is similar to this case but the cause is not the same. Bug #91487 is the result of writing 1 byte at a time to disk. As you proved in Bug #91487, on Linux it makes a single write() per error log line. Using strace, I found there are much time used in futex in startup. The following is strace in startup. There are futex between the writes and the time interval between two write is about 0.02s: 23545 10:27:32.287594 write(2, "2020-09-07T10:27:32.262444+08:00 0 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name mysql-bin.000001, event_master_log_pos 9487925.\n", 172) = 172 <0.000030> 23590 10:27:32.289766 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <0.010084> 23590 10:27:32.289839 futex(0x16d72c00, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000039> 23590 10:27:32.289943 futex(0x16d72c2c, FUTEX_WAIT_BITSET_PRIVATE, 20135, {3775589, 478319277}, ffffffff) = -1 ETIMEDOUT (Connection timed out) <0.010123> 23590 10:27:32.300140 futex(0x16d72c00, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000020> 23590 10:27:32.300222 futex(0x16d72c2c, FUTEX_WAIT_BITSET_PRIVATE, 20137, {3775589, 488595702}, ffffffff) = -1 ETIMEDOUT (Connection timed out) <0.010113> 23590 10:27:32.310422 futex(0x16d72c00, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000020> 23590 10:27:32.310503 futex(0x16d72c2c, FUTEX_WAIT_BITSET_PRIVATE, 20139, {3775589, 498878847}, ffffffff <unfinished ...> 23545 10:27:32.315253 write(2, "2020-09-07T10:27:32.287756+08:00 0 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name mysql-bin.000001, event_master_log_pos 9488754.\n", 172) = 172 <0.000029> 23590 10:27:32.320652 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <0.010124> 23590 10:27:32.320743 futex(0x16d72c00, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000023> 23590 10:27:32.320816 futex(0x16d72c2c, FUTEX_WAIT_BITSET_PRIVATE, 20141, {3775589, 509190687}, ffffffff) = -1 ETIMEDOUT (Connection timed out) <0.010082> 23590 10:27:32.330971 futex(0x16d72c00, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000020> 23590 10:27:32.331052 futex(0x16d72c2c, FUTEX_WAIT_BITSET_PRIVATE, 20143, {3775589, 519426527}, ffffffff <unfinished ...> 23545 10:27:32.340663 write(2, "2020-09-07T10:27:32.315383+08:00 0 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name mysql-bin.000001, event_master_log_pos 9489583.\n", 172) = 172 <0.000040> 23590 10:27:32.341161 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <0.010084> The following is strace in "start slave". There is no futex between the writes and the time interval between two write is about 0.0001s: 23858 10:34:14.188853 write(2, "2020-09-07T10:34:14.188823+08:00 42 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name mysql-bin.000001, event_master_log_pos 9356762.\n", 173) = 173 <0.000028> 23858 10:34:14.188945 write(2, "2020-09-07T10:34:14.188922+08:00 42 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name mysql-bin.000001, event_master_log_pos 9356846.\n", 173) = 173 <0.000022> 23858 10:34:14.189020 write(2, "2020-09-07T10:34:14.188999+08:00 42 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name mysql-bin.000001, event_master_log_pos 9356912.\n", 173) = 173 <0.000022> 23858 10:34:14.189093 write(2, "2020-09-07T10:34:14.189072+08:00 42 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name mysql-bin.000001, event_master_log_pos 9356943.\n", 173) = 173 <0.000022> 23858 10:34:14.189200 write(2, "2020-09-07T10:34:14.189178+08:00 42 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name mysql-bin.000001, event_master_log_pos 9357772.\n", 173) = 173 <0.000023> 23858 10:34:14.189291 write(2, "2020-09-07T10:34:14.189269+08:00 42 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name mysql-bin.000001, event_master_log_pos 9358601.\n", 173) = 173 <0.000024> Before startup, error logging using log_sink_buffer() takes more locks than after-startup. It is probably the reason why the startup is slow when huge amounts of log lines getting written to the error log.
[7 Sep 2020 9:03]
MySQL Verification Team
Hi, I still believe this is not a bug but since I can reproduce that > The code I changed is: > (Line 6083, rpl_slave.cc) > // LogErr(INFORMATION_LEVEL, ER_RPL_MTS_GROUP_RECOVERY_RELAY_LOG_INFO, > // rli->get_group_master_log_name(), ev->common_header->log_pos); does take a lot of time I'm verifying the bug but I'm dropping severity down to S5-Performance thanks for the report and the analysis Bogdan