Bug #102346 MTS Recovery Group, unnecessary recovery
Submitted: 22 Jan 2021 13:02 Modified: 12 Feb 2021 15:55
Reporter: hongyu dong (OCA)
Status: Verified
Category: MySQL Server: Replication
Severity: S5 (Performance)
Version: 5.7.32
OS: Any
Assigned to:
CPU Architecture: Any

[22 Jan 2021 13:02] hongyu dong
Description:
Hi,

In the mts_recovery_groups() function:
LOG_POS_COORD w_last = {const_cast<char*>(worker->get_group_master_log_name()),
                        worker->get_group_master_log_pos() };
if (mts_event_coord_cmp(&w_last, &cp)> 0)
{
  /*
    Inserts information into a dynamic array for further processing.
    The jobs/workers are ordered by the last checkpoint positions
    workers have seen.
  */
  job_worker.worker = worker;
  job_worker.checkpoint_log_pos = worker->checkpoint_master_log_pos;
  job_worker.checkpoint_log_name = worker->checkpoint_master_log_name;

  above_lwm_jobs.push_back(job_worker);
}

If w_last > cp, the worker is added to above_lwm_jobs. The relay log is then scanned starting from the lwm (the last checkpoint), and recovery_group_cnt is incremented as groups are scanned:

int ret = 0;
LOG_POS_COORD ev_coord= {(char *) rli->get_group_master_log_name(),
                            ev->common_header->log_pos };
flag_group_seen_begin = false;
recovery_group_cnt++;
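
The check against the checkpoint described above can be pictured with a small standalone sketch. It assumes that a coordinate is a (master log file name, position) pair ordered first by file name and then by position. Coord, coord_cmp and workers_above_lwm are hypothetical names used only for illustration; they are not the server's definitions.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for LOG_POS_COORD: a master log file name plus an offset.
struct Coord {
  const char *file_name;
  unsigned long long pos;
};

// Assumed ordering: compare file names first, then positions.
// Returns -1, 0 or 1.
int coord_cmp(const Coord &a, const Coord &b) {
  int f = std::strcmp(a.file_name, b.file_name);
  if (f != 0) return f < 0 ? -1 : 1;
  if (a.pos != b.pos) return a.pos < b.pos ? -1 : 1;
  return 0;
}

// Returns the indexes of the workers whose last executed group lies
// beyond the checkpoint cp, i.e. the workers that would be collected
// in above_lwm_jobs.
std::vector<std::size_t> workers_above_lwm(const std::vector<Coord> &w_last,
                                           const Coord &cp) {
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < w_last.size(); ++i) {
    if (coord_cmp(w_last[i], cp) > 0) out.push_back(i);
  }
  return out;
}
```

A worker whose last group is at or before the checkpoint is left out, which matches the > 0 test in the snippet quoted from the server.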

This scan is meant to handle the following situation:
+---+---+-------------+---+---+---+---+---+
| a | c | a           | b | b |   |   |   |
+---+---+-------------+---+---+---+---+---+
| 0 | 1 | 2(applying) | 3 | 4 | 5 | 6 | 7 |
+---+---+-------------+---+---+---+---+---+
  |                         |
  |                         |
  |                         |
  |                         |
 lwm                     worker

Because group 2 has not completed, there is a gap, and fill_mts_gaps_and_recover() is needed to recover it.

However, it is also possible that every group has completed and only the checkpoint has not been advanced in time. That situation looks like this:
+---+---+---+---+---+---+---+---+
| a | c | a | b | b |   |   |   |
+---+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+---+---+---+---+---+---+---+---+
  |               | 
  |               |
  |               |
  |               |
 lwm            worker

In this case there is no gap, so the subsequent fill_mts_gaps_and_recover() call is not needed.

How to repeat:
none

Suggested fix:
Add a check of the groups bitmap: if every bit is set (all scanned groups have been executed), set recovery_group_cnt = 0.
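
A minimal sketch of that check, assuming the bitmap can be modelled as a std::vector<bool> with one entry per scanned group. RecoveryState, all_groups_done and skip_recovery_if_no_gap are hypothetical names for illustration and do not exist in the server; the real state lives in Relay_log_info.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified recovery state; not the server's Relay_log_info.
struct RecoveryState {
  unsigned long recovery_group_cnt;  // groups counted while scanning from the lwm
  unsigned long long group_pos;      // position from which the applier resumes
};

// True when each of the first cnt groups is marked as executed.
bool all_groups_done(const std::vector<bool> &groups, std::size_t cnt) {
  if (cnt > groups.size()) return false;
  for (std::size_t i = 0; i < cnt; ++i) {
    if (!groups[i]) return false;
  }
  return true;
}

// If no group is missing, skip gap filling and resume after the
// furthest position any worker has reached.
void skip_recovery_if_no_gap(RecoveryState &st, const std::vector<bool> &groups,
                             unsigned long long max_w_last_pos) {
  if (st.recovery_group_cnt > 0 &&
      all_groups_done(groups, st.recovery_group_cnt)) {
    st.recovery_group_cnt = 0;
    st.group_pos = max_w_last_pos;
  }
}
```

With the bitmap {1, 1, 0, 1, 1} the state is left untouched, so gap recovery still runs. With {1, 1, 1, 1, 1} the count is reset to 0 and the position moves to the furthest worker position.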
[27 Jan 2021 3:27] hongyu dong
update version
[2 Feb 2021 3:52] MySQL Verification Team
Hi,

Thanks for the report.

best regards
[12 Feb 2021 15:55] hongyu dong
diff --git a/sql/rpl_slave.cc b/sql/rpl_slave.cc
index 53f43ae..6a6d29e 100644
--- a/sql/rpl_slave.cc
+++ b/sql/rpl_slave.cc
@@ -6431,6 +6431,7 @@ bool mts_recovery_groups(Relay_log_info *rli)
     rli->get_group_master_log_pos()
   };

+  LOG_POS_COORD max_w_last = cp;
   Format_description_log_event fdle(BINLOG_VERSION), *p_fdle= &fdle;
   DBUG_ASSERT(p_fdle->is_valid());

@@ -6477,6 +6478,11 @@ bool mts_recovery_groups(Relay_log_info *rli)
       job_worker.checkpoint_log_name= worker->checkpoint_master_log_name;

       above_lwm_jobs.push_back(job_worker);
+
+      if (mts_event_coord_cmp(&w_last, &max_w_last) > 0){
+        max_w_last.file_name = w_last.file_name;
+        max_w_last.pos = w_last.pos;
+      }
     }
     else
     {
@@ -6680,6 +6686,27 @@ bool mts_recovery_groups(Relay_log_info *rli)
 err:
   is_error= true;
 end:
+
+  if (rli->mts_recovery_group_cnt > 0) {
+    uint i;
+    for (i = 0; i < rli->mts_recovery_group_cnt; ++i) {
+      if (bitmap_is_set(groups, i)) {
+        continue;
+      }
+      break;
+    }
+    if (i == rli->mts_recovery_group_cnt) {
+
+      rli->mts_recovery_group_cnt = 0;
+      rli->set_group_master_log_name(max_w_last.file_name);
+      rli->set_group_master_log_pos(max_w_last.pos);
+      sql_print_information("Slave: MTS group recovery is unnecessary, "
+                            "group_master_log_name %s, "
+                            "group_master_log_pos %llu.",
+                            rli->get_group_master_log_name(),
+                            rli->get_group_master_log_pos());
+    }
+  }

   for (Slave_job_group *jg= above_lwm_jobs.begin();
        jg != above_lwm_jobs.end(); ++jg)
[12 Feb 2021 15:55] hongyu dong
patch file

Attachment: 102346_patch (application/octet-stream, text), 1.65 KiB.

[26 Feb 2021 15:59] OCA Admin
Contribution submitted via Github - FIX Bug #102346 MTS Recovery Group, unnecessary recovery 
(*) Contribution by hongyu dong (Github donghy-coredumped, mysql-server/pull/326#issuecomment-786574640): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_577071005.txt (text/plain), 1.96 KiB.