Description:
When I use performance_schema.replication_group_member_stats to monitor COUNT_TRANSACTIONS_ROWS_VALIDATING, I see very different values across members, as follows:
node3>select COUNT_TRANSACTIONS_ROWS_VALIDATING from replication_group_member_stats;
+------------------------------------+
| COUNT_TRANSACTIONS_ROWS_VALIDATING |
+------------------------------------+
| 83 |
| 247 |
| 315708 |
+------------------------------------+
The value 315708 is the local member's COUNT_TRANSACTIONS_ROWS_VALIDATING. To compare, I then queried the row for the same member (node1) from node3 and from node1 itself:
node3>show variables like "%server_uuid%";
+---------------+--------------------------------------+
| Variable_name | Value |
+---------------+--------------------------------------+
| server_uuid | 6f3e0827-2e70-11e8-8330-c81f66e48c6e |
+---------------+--------------------------------------+
1 row in set (0.00 sec)
node3>select * from replication_group_member_stats where MEMBER_ID = '5b314bb9-2e70-11e8-b53f-c81f66e48c6e'\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
VIEW_ID: 15217921358048072:919
MEMBER_ID: 5b314bb9-2e70-11e8-b53f-c81f66e48c6e
COUNT_TRANSACTIONS_IN_QUEUE: 0
COUNT_TRANSACTIONS_CHECKED: 949389
COUNT_CONFLICTS_DETECTED: 76
COUNT_TRANSACTIONS_ROWS_VALIDATING: 73
TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-176037690:176638008-176721915
LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:176357180
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
COUNT_TRANSACTIONS_REMOTE_APPLIED: 334411
COUNT_TRANSACTIONS_LOCAL_PROPOSED: 615107
COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 78
1 row in set (0.00 sec)
node1>show variables like "%server_uuid%";
+---------------+--------------------------------------+
| Variable_name | Value |
+---------------+--------------------------------------+
| server_uuid | 5b314bb9-2e70-11e8-b53f-c81f66e48c6e |
+---------------+--------------------------------------+
1 row in set (0.00 sec)
node1>select * from replication_group_member_stats where MEMBER_ID = '5b314bb9-2e70-11e8-b53f-c81f66e48c6e'\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
VIEW_ID: 15217921358048072:919
MEMBER_ID: 5b314bb9-2e70-11e8-b53f-c81f66e48c6e
COUNT_TRANSACTIONS_IN_QUEUE: 0
COUNT_TRANSACTIONS_CHECKED: 901148
COUNT_CONFLICTS_DETECTED: 332
COUNT_TRANSACTIONS_ROWS_VALIDATING: 633190
TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-176037690:176638008-176721915
LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:176313978
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
COUNT_TRANSACTIONS_REMOTE_APPLIED: 334411
COUNT_TRANSACTIONS_LOCAL_PROPOSED: 566741
COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 78
1 row in set (0.00 sec)
Then I added some debug output before sending and after receiving the message:
2018-03-26T13:58:11.818600+08:00 9 [Note] Plugin group_replication reported: 'encode_payload ROWS_VALIDATING 168300, CERTIFIED 61671'
2018-03-26T13:58:11.819216+08:00 0 [Note] Plugin group_replication reported: 'decode_payload ROWS_VALIDATING 108, CERTIFIED 61671'
168300 = 101001000101101100 (binary)
108 = 01101100 (binary), which is exactly the low 8 bits of 168300
It seems the COUNT_TRANSACTIONS_ROWS_VALIDATING value decoded from Pipeline_stats_member_message is wrong: only the lowest byte of the value is kept.
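The relationship between the two logged values can be checked with a few lines of C++. This is only an illustrative sketch: `low_byte` and `check_logged_values` are hypothetical helpers, not part of the plugin, and the sketch assumes the receiver keeps just the lowest 8 bits of the counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the lowest 8 bits of a 64-bit counter.
// This models what a receiver sees if it reads a single byte of the value.
constexpr std::uint64_t low_byte(std::uint64_t value) {
  return value & 0xFFu;
}

// Checks the pair of values from the encode_payload / decode_payload
// log lines: 168300 was sent, 108 was received.
inline void check_logged_values() {
  // 168300 = 657 * 256 + 108, so the lowest byte is 108.
  assert((168300u >> 8) == 657u);
  assert(low_byte(168300u) == 108u);
}
```

Calling `check_logged_values()` passes, which is consistent with the decoded value being a one-byte truncation of the encoded one.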
How to repeat:
1. Deploy an MGR cluster with 3 nodes.
2. Run sysbench oltp for a while.
3. Run select * from performance_schema.replication_group_member_stats on each node.
4. Compare COUNT_TRANSACTIONS_ROWS_VALIDATING across the nodes.
Suggested fix:
The cause may be in Pipeline_stats_member_message::decode_payload:
void
Pipeline_stats_member_message::decode_payload(const unsigned char *buffer,
                                              const unsigned char *end)
.....
    case PIT_TRANSACTIONS_ROWS_VALIDATING:
      if (slider + payload_item_length <= end)
      {
        uint64 transactions_rows_validating_aux= *slider;
        slider += payload_item_length;
        m_transactions_rows_validating=
          (int64)transactions_rows_validating_aux;
      }
      break;
.....
The line
uint64 transactions_rows_validating_aux= *slider;
dereferences an unsigned char pointer, so it reads only one byte of the item. It should be:
uint64 transactions_rows_validating_aux= sint8korr(slider);
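The difference between the two lines can be reproduced outside the server. The sketch below is not the plugin code: `store_le_uint64`, `decode_first_byte_only`, `decode_le_uint64` and `check_round_trip` are hypothetical stand-ins, and it assumes the payload item is an 8-byte little-endian integer, which is what an 8-byte read such as sint8korr (or its unsigned counterpart uint8korr) would expect.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the sender side: write `value` into `out`
// as 8 bytes, least significant byte first (assumed wire format).
inline void store_le_uint64(unsigned char *out, std::uint64_t value) {
  for (int i = 0; i < 8; ++i)
    out[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
}

// Models the reported line: dereferencing an unsigned char pointer
// yields a single byte, which is then widened to 64 bits.
inline std::uint64_t decode_first_byte_only(const unsigned char *slider) {
  return *slider;
}

// Models the suggested fix: assemble all 8 bytes into one integer.
inline std::uint64_t decode_le_uint64(const unsigned char *slider) {
  std::uint64_t value = 0;
  for (int i = 0; i < 8; ++i)
    value |= static_cast<std::uint64_t>(slider[i]) << (8 * i);
  return value;
}

// Encodes `value`, then checks that the full decode returns it unchanged
// while the one-byte decode returns only its lowest 8 bits.
inline void check_round_trip(std::uint64_t value) {
  unsigned char buffer[8];
  store_le_uint64(buffer, value);
  assert(decode_le_uint64(buffer) == value);
  assert(decode_first_byte_only(buffer) == (value & 0xFF));
}
```

With the value from the debug log, `check_round_trip(168300)` passes: the one-byte decode yields 108 and the 8-byte decode yields 168300, matching the decode_payload and encode_payload lines respectively.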