Bug #105938 COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE count abnormal
Submitted: 20 Dec 2021 10:18 Modified: 29 Aug 2022 14:46
Reporter: peng gao Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S3 (Non-critical)
Version:8.0.27 OS:Any
Assigned to: CPU Architecture:Any

[20 Dec 2021 10:18] peng gao
Description:
Hi all:

  recently,we use 8.0.27 MGR,when we start SECONDARY node,COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE is not 0,
  and then switch SECONDARY to primary COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE still not 0.sometimes is 1, sometimes larger than 1.

mysql> select a.member_id,a.member_host,a.member_role,b.COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE from performance_schema.replication_group_members a,        performance_schema.replication_group_member_stats b where a.MEMBER_ID=b.member_id;
+--------------------------------------+-------------+-------------+--------------------------------------------+
| member_id                            | member_host | member_role | COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE |
+--------------------------------------+-------------+-------------+--------------------------------------------+
| 6f450880-5e0c-11ec-9998-000c29e4ec4a | mgr10       | SECONDARY   |                                          1 |
| 97a04402-5dae-11ec-9190-000c29a8c42a | mgr11       | PRIMARY     |                                          0 |
+--------------------------------------+-------------+-------------+--------------------------------------------+
2 rows in set (0.00 sec)

mysql> show master status ;
+------------------+----------+--------------+------------------+----------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                            |
+------------------+----------+--------------+------------------+----------------------------------------------+
| mysql-bin.000007 |     3132 |              |                  | eb2ec3f5-5daf-11ec-95bc-000c29a8c42a:1-36273 |
+------------------+----------+--------------+------------------+----------------------------------------------+
1 row in set (0.00 sec)

mysql> select group_replication_set_as_primary('6f450880-5e0c-11ec-9998-000c29e4ec4a');
+--------------------------------------------------------------------------+
| group_replication_set_as_primary('6f450880-5e0c-11ec-9998-000c29e4ec4a') |
+--------------------------------------------------------------------------+
| Primary server switched to: 6f450880-5e0c-11ec-9998-000c29e4ec4a         |
+--------------------------------------------------------------------------+
1 row in set (5.55 sec)

mysql> select a.member_id,a.member_host,a.member_role,b.COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE from performance_schema.replication_group_members a,        performance_schema.replication_group_member_stats b where a.MEMBER_ID=b.member_id;
+--------------------------------------+-------------+-------------+--------------------------------------------+
| member_id                            | member_host | member_role | COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE |
+--------------------------------------+-------------+-------------+--------------------------------------------+
| 6f450880-5e0c-11ec-9998-000c29e4ec4a | mgr10       | PRIMARY     |                                          1 |
| 97a04402-5dae-11ec-9190-000c29a8c42a | mgr11       | SECONDARY   |                                          0 |
+--------------------------------------+-------------+-------------+--------------------------------------------+

We find commit cb094ae3612e65c7eda4702d3471e5cf00db9f1a
BUG#33067441: COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE IS NOT UPDATED DURING RECOVERY
modify and new add recovery's COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE in recovery stage. 

1.Applier_handler::handle_event add recovery satge trx increment.
2.group_replication_trans_before_commit add recovery stage trx decrement. 

We watch applier_module->get_pipeline_stats_member_collector()->m_transactions_waiting_apply memory address,
find value increment 1 when applier thread revoke Applier_handler::handle_event ,but not decrement 1.
Hardware watchpoint 13: *(0x7fff1c234580)

Old value = 0
New value = 1

But we find sometimes revoke Query_log_event::do_apply_event to commit view change transation,
trans_commit function cache_mngr is nullptr then return sucess,not invoke group_replication_trans_before_commit
to decrement this value,so value still 1. step like follow:

7936      DBUG_PRINT("info",
(gdb) 
7938      binlog_cache_mngr *cache_mngr = thd_get_cache_mngr(thd);
(gdb) 
7939      Transaction_ctx *trn_ctx = thd->get_transaction();
(gdb) 
(gdb) p  cache_mngr
$20 = (binlog_cache_mngr *) 0x0
7940      my_xid xid = trn_ctx->xid_state()->get_xid()->get_my_xid();
(gdb) 
7941      bool stmt_stuff_logged = false;
(gdb) 
7942      bool trx_stuff_logged = false;
(gdb) 
7943      bool skip_commit = is_loggable_xa_prepare(thd);
(gdb) 
7944      bool is_atomic_ddl = false;
(gdb) 
7946      DBUG_PRINT("enter", ("thd: 0x%llx, all: %s, xid: %llu, cache_mngr: 0x%llx",
(gdb) 
7954      if (cache_mngr == nullptr) {
(gdb) 
7955        if (!skip_commit && ha_commit_low(thd, all)) return RESULT_ABORTED;
(gdb) 
7956        return RESULT_SUCCESS;
(gdb) 
7935      DBUG_TRACE;
(gdb) 
8188    }

How to repeat:
sometimes start group_replication in SECONDARY node.
[20 Dec 2021 10:22] peng gao
Is there any other impact if this values is not 0 ?such as choose primary .

thanks all!
[30 Dec 2021 6:48] MySQL Verification Team
Thanks for the report
[30 Dec 2021 7:04] peng gao
Is there any other impact if this values is not 0 ?such as choose primary .

thanks all!
[22 Aug 2022 10:09] xichao li
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE >0 problem analysis

Attachment: COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE >0 problem analysis.txt (text/plain), 10.12 KiB.

[22 Aug 2022 10:14] xichao li
Also, the issue is fixed in mysql 8.0.30。

Replication: The COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE column in the Performance Schema table replication_group_member_stats could persistently show transactions related to view change events (View_change_log_event) that had already been applied. These events are queued in the Group Replication applier channel but applied in the Group Replication recovery channel, causing a race condition that could result in the counter decrement being lost. The increment of the count now takes place at a more suitable point, and the counter for COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE is also now set to zero when the applier is not busy. (Bug #33602354, Bug #33674059)

https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-30.html