Bug #108421 Events remain in SLAVESIDE_DISABLED after a node is promoted to PRIMARY in GR
Submitted: 7 Sep 2022 21:52 Modified: 16 Oct 5:59
Reporter: Marcos Albe (OCA) Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S2 (Serious)
Version:8.0.29 OS:Any
Assigned to: CPU Architecture:Any

[7 Sep 2022 21:52] Marcos Albe
Description:
Hello!

On a group replication cluster, if I enable event_scheduler on every node, then create a recurring event, then kill the primary node, when one of the remaining secondaries takes over, the event no longer runs there as it's not ENABLED but rather SLAVESIDE_DISABLED:

dbdeployer deploy replication --topology=group /opt/mysql/8.0.29 --port=28758 --single-primary
 
[COR-SER] [marcos.albe@tp-support03 group_sp_msb_8_0_29]$ ./check_nodes
# Node 1 # select * from performance_schema.replication_group_members
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00044659-1111-1111-1111-111111111111 | 127.0.0.1   |       44659 | ONLINE       | PRIMARY     | 8.0.29         | XCom                       |
| group_replication_applier | 00044660-2222-2222-2222-222222222222 | 127.0.0.1   |       44660 | ONLINE       | SECONDARY   | 8.0.29         | XCom                       |
| group_replication_applier | 00044661-3333-3333-3333-333333333333 | 127.0.0.1   |       44661 | ONLINE       | SECONDARY   | 8.0.29         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+

[COR-SER] [marcos.albe@tp-support03 group_sp_msb_8_0_29]$ ./use_all -e "SET GLOBAL event_scheduler=ON; SELECT @@global.event_scheduler;"
# server: 1
@@global.event_scheduler
ON
# server: 2
@@global.event_scheduler
ON
# server: 3
@@global.event_scheduler
ON

~$ cat /tmp/event.sql
USE test;
DROP TABLE totals;
DROP EVENT e_totals;
CREATE TABLE test.totals (v TIMESTAMP(6) NOT NULL PRIMARY KEY, exec_host char(64) NOT NULL);
CREATE EVENT e_totals ON SCHEDULE every 5 second DO INSERT INTO test.totals VALUES (NOW(), @@server_uuid);
SET GLOBAL event_scheduler=ON;

~$ cat /tmp/check.sql
SELECT * FROM test.totals;
SELECT EVENT_NAME,STATUS FROM information_schema.events\G 
SELECT MEMBER_ID, MEMBER_STATE, MEMBER_ROLE FROM performance_schema.replication_group_members\G

On the original primary:
[COR-SER] [marcos.albe@tp-support03 group_sp_msb_8_0_29]$ ./n1 < /tmp/check.sql
v       exec_host
2022-09-07 21:37:49.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:37:54.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:37:59.000000      00044659-1111-1111-1111-111111111111

*************************** 1. row ***************************
          EVENT_NAME: e_totals
              STATUS: ENABLED

*************************** 1. row ***************************
   MEMBER_ID: 00044659-1111-1111-1111-111111111111
MEMBER_STATE: ONLINE
 MEMBER_ROLE: PRIMARY
*************************** 2. row ***************************
   MEMBER_ID: 00044660-2222-2222-2222-222222222222
MEMBER_STATE: ONLINE
 MEMBER_ROLE: SECONDARY
*************************** 3. row ***************************
   MEMBER_ID: 00044661-3333-3333-3333-333333333333
MEMBER_STATE: ONLINE
 MEMBER_ROLE: SECONDARY

Before failover, on secondary:

[COR-SER] [marcos.albe@tp-support03 group_sp_msb_8_0_29]$ ./n2 < /tmp/check.sql
v       exec_host
2022-09-07 21:37:49.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:37:54.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:37:59.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:38:04.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:38:09.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:38:14.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:38:19.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:38:24.000000      00044659-1111-1111-1111-111111111111

*************************** 1. row ***************************
          EVENT_NAME: e_totals
              STATUS: SLAVESIDE_DISABLED

*************************** 1. row ***************************
   MEMBER_ID: 00044659-1111-1111-1111-111111111111
MEMBER_STATE: ONLINE
 MEMBER_ROLE: PRIMARY
*************************** 2. row ***************************
   MEMBER_ID: 00044660-2222-2222-2222-222222222222
MEMBER_STATE: ONLINE
 MEMBER_ROLE: SECONDARY
*************************** 3. row ***************************
   MEMBER_ID: 00044661-3333-3333-3333-333333333333
MEMBER_STATE: ONLINE
 MEMBER_ROLE: SECONDARY

Kill original primary
[COR-SER] [marcos.albe@tp-support03 group_sp_msb_8_0_29]$ cat node1/data/*pid
2653599
[COR-SER] [marcos.albe@tp-support03 group_sp_msb_8_0_29]$ kill -9 2653599

Check again on n2 (new primary):
[COR-SER] [marcos.albe@tp-support03 group_sp_msb_8_0_29]$ ./n2 < /tmp/check.sql
v       exec_host
2022-09-07 21:38:54.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:38:59.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:39:04.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:39:09.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:39:14.000000      00044659-1111-1111-1111-111111111111

*************************** 1. row ***************************
          EVENT_NAME: e_totals
              STATUS: SLAVESIDE_DISABLED
  
*************************** 1. row ***************************
   MEMBER_ID: 00044660-2222-2222-2222-222222222222
MEMBER_STATE: ONLINE
 MEMBER_ROLE: PRIMARY
*************************** 2. row ***************************
   MEMBER_ID: 00044661-3333-3333-3333-333333333333
MEMBER_STATE: ONLINE
 MEMBER_ROLE: SECONDARY

Wait a bit and check again to make sure nothing changed:
[COR-SER] [marcos.albe@tp-support03 group_sp_msb_8_0_29]$ sleep 10;
[COR-SER] [marcos.albe@tp-support03 group_sp_msb_8_0_29]$ ./n2 < /tmp/check.sql
v       exec_host
2022-09-07 21:38:54.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:38:59.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:39:04.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:39:09.000000      00044659-1111-1111-1111-111111111111
2022-09-07 21:39:14.000000      00044659-1111-1111-1111-111111111111

*************************** 1. row ***************************
          EVENT_NAME: e_totals
              STATUS: SLAVESIDE_DISABLED

*************************** 1. row ***************************
   MEMBER_ID: 00044660-2222-2222-2222-222222222222
MEMBER_STATE: ONLINE
 MEMBER_ROLE: PRIMARY
*************************** 2. row ***************************
   MEMBER_ID: 00044661-3333-3333-3333-333333333333
MEMBER_STATE: ONLINE
 MEMBER_ROLE: SECONDARY

You can see they are still disabled.

How to repeat:
See above for step by step

Suggested fix:
The natural thing would for the event's status to match with member's role, so that if member is the PRIMARY the event would be ENABLED and if member transitions to secondary the events would be DISABLED.  

Perhaps having an "auto_status" boolean indicating whether the event's status should be managed by GR would add extra coolness points :D
[8 Sep 2022 9:54] MySQL Verification Team
Hello Marcos,

Thank you for the report and test case.

Thanks,
Umesh
[16 Oct 5:56] Justin Jose
Posted by developer:
 
Hi,

The observed behavior is consistent with the documented functionality and does not appear to be a bug.

As per the documentation (https://dev.mysql.com/doc/refman/8.0/en/replication-features-invoked.html):
"The feature implementation resides on the replica in a renewable state, allowing the replica to become the source without loss of event processing if the source fails.
When promoting a replica with such events to a source, each event must be manually enabled using ALTER EVENT event_name ENABLE, where event_name refers to the specific event."

However, we believe this could be considered as a potential feature request, which is currently in the backlog.

Workaround: In the interim, we recommend that customers follow the documented steps and manually enable events on the newly elected primary.