Bug #116245 innodb cluster / GR error after import w/ has_generated_invisible_primary_key=0
Submitted: 26 Sep 2024 15:23 Modified: 30 Oct 2024 15:17
Reporter: Keith Hollman
Status: No Feedback
Category: MySQL Server: Group Replication    Severity: S2 (Serious)
Version: 8.0.36    OS: Red Hat (4.18.0-553.8.1.el8_10.x86_64)
Assigned to: MySQL Verification Team CPU Architecture:x86

[26 Sep 2024 15:23] Keith Hollman
Description:
 Ver 8.0.36 for Linux on x86_64 (MySQL Community Server - GPL)

After importing Drupal data into a single schema, the 2 replica nodes log the following errors:
2024-09-18T08:22:36.452339Z 15 [ERROR] [MY-010584] [Repl] Replica SQL for channel 'group_replication_applier': Worker 1 failed executing transaction 'a4144720-d7c2-11ee-bd73-0050569c0136:48171347'; Error executing row event: 'Table 'mindcompanion.cache_container' doesn't exist', Error_code: MY-001146
2024-09-18T08:22:36.461471Z 14 [Warning] [MY-010584] [Repl] Replica SQL for channel 'group_replication_applier': ... The replica coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: MY-001756
2024-09-18T08:22:36.471998Z 14 [ERROR] [MY-011451] [Repl] Plugin group_replication reported: 'The applier thread execution was aborted. Unable to process more transactions, this member will now leave the group.'
2024-09-18T08:22:36.474936Z 12 [ERROR] [MY-011452] [Repl] Plugin group_replication reported: 'Fatal error during execution on the Applier process of Group Replication. The server will now leave the group.'
2024-09-18T08:22:36.486026Z 12 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
2024-09-18T08:22:36.488305Z 12 [System] [MY-011565] [Repl] Plugin group_replication reported: 'Setting super_read_only=ON.'
2024-09-18T08:22:40.303061Z 0 [System] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'

Then we get:
2024-09-18T18:00:01.838541Z 3763 [ERROR] [MY-011601] [Repl] Plugin group_replication reported: 'Transaction cannot be executed while Group Replication is on ERROR state. Check for errors and restart the plugin'

This server is configured with:
 SET PERSIST_ONLY disabled_storage_engines='MyISAM', PERSIST sql_generate_invisible_primary_key=ON;
 SELECT @@sql_generate_invisible_primary_key, @@show_gipk_in_create_table_and_information_schema;
+--------------------------------------+----------------------------------------------------+
| @@sql_generate_invisible_primary_key | @@show_gipk_in_create_table_and_information_schema |
+--------------------------------------+----------------------------------------------------+
|                                    1 |                                                  1 |
+--------------------------------------+----------------------------------------------------+
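For context: on a server where the GIPK was actually generated, SHOW CREATE TABLE (with show_gipk_in_create_table_and_information_schema=ON) exposes a `my_row_id` column. A minimal sketch of what to look for, using a stand-in for real server output (the DDL and file path below are illustrative, not taken from the reporter's cluster):

```shell
# Stand-in for `SHOW CREATE TABLE mindcompanion.cache_container\G` output
# on a server where the GIPK was actually generated.
cat <<'EOF' > /tmp/show_create.txt
CREATE TABLE `cache_container` (
  `my_row_id` bigint unsigned NOT NULL AUTO_INCREMENT /*!80023 INVISIBLE */,
  `cid` varchar(255) NOT NULL,
  PRIMARY KEY (`my_row_id`)
) ENGINE=InnoDB
EOF

# A table with has_generated_invisible_primary_key=1 mentions `my_row_id`
# (column definition plus PRIMARY KEY clause); the problem table on the
# reporter's cluster presumably would not.
grep -c 'my_row_id' /tmp/show_create.txt
```

If the count is 0 for the live table, the server created it without a GIPK despite sql_generate_invisible_primary_key=ON.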

However, in the binlogs I see the following for the aforementioned cache_container table:

/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
# original_commit_timestamp=1726647756439240 (2024-09-18 08:22:36.439240 UTC)
# immediate_commit_timestamp=1726647756450119 (2024-09-18 08:22:36.450119 UTC)
/*!80001 SET @@session.original_commit_timestamp=1726647756439240*//*!*/;
/*!80014 SET @@session.original_server_version=80036*//*!*/;
/*!80014 SET @@session.immediate_server_version=80036*//*!*/;
SET @@SESSION.GTID_NEXT= 'a4144720-d7c2-11ee-bd73-0050569c0136:48171347'/*!*/;
# at 939181689
#240918  8:22:36 server id 192692467  end_log_pos 939181778 CRC32 0xadd9f4cf    Query   thread_id=75704930      exec_time=0     error_code=0
SET TIMESTAMP=1726647756/*!*/;
SET @@session.pseudo_thread_id=75704930/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1306787887/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=2/*!*/;
/*!\C utf8mb4 *//*!*/;
SET @@session.character_set_client=255,@@session.collation_connection=255,@@session.collation_server=255/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
/*!80011 SET @@session.default_collation_for_utf8mb4=255*//*!*/;
BEGIN
/*!*/;
# at 939181778
#240918  8:22:36 server id 192692467  end_log_pos 939181868 CRC32 0xc05b27dd    Table_map: `mindcompanion`.`cache_container` mapped to number 272260068
# has_generated_invisible_primary_key=0
# at 939181868
#240918  8:22:36 server id 192692467  end_log_pos 939843345 CRC32 0x507195fd    Write_rows: table id 272260068 flags: STMT_END_F
# at 939843345
#240918  8:22:36 server id 192692467  end_log_pos 939843376 CRC32 0xc4098a54    Xid = 7654877912
COMMIT/*!*/;
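The has_generated_invisible_primary_key marker is written per Table_map event, so a decoded binlog can be scanned to list every table replicated with the flag at 0. A minimal sketch, assuming the mysqlbinlog output has been saved to a text file (the sample lines and file path below stand in for the real dump):

```shell
# Stand-in for `mysqlbinlog --base64-output=decode-rows -vv` output;
# in practice this file would be the decoded binlog itself.
cat > /tmp/binlog_decoded.txt <<'EOF'
#240918  8:22:36 server id 192692467  end_log_pos 939181868 CRC32 0xc05b27dd    Table_map: `mindcompanion`.`cache_container` mapped to number 272260068
# has_generated_invisible_primary_key=0
EOF

# List the Table_map lines immediately preceding a GIPK=0 marker,
# i.e. every table replicated without a generated invisible PK.
grep -B1 '^# has_generated_invisible_primary_key=0' /tmp/binlog_decoded.txt |
  grep -o 'Table_map: `[^`]*`\.`[^`]*`'
```

Running this over the full decoded binlog would show whether cache_container is the only affected table or one of many.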

Now the table is clearly being treated as has_generated_invisible_primary_key=0; however, I suspect that other factors might also be influencing the expulsion of the 2 nodes (we're also getting

How to repeat:
Not sure, but I suppose re-running the data import (from MariaDB 10.32) would trigger it again.

Suggested fix:
Ensure that, even though the binlogs state has_generated_invisible_primary_key=0, the session and server still create the table with sql_generate_invisible_primary_key enabled.
[27 Sep 2024 16:54] MySQL Verification Team
Hi,

Apologies, but I am not able to reproduce this; I am not 100% sure what you are doing here. You had Drupal running on MariaDB, dumped the data from MariaDB, and tried to load it using MySQL 8.0.36? Where does replication come in?
[30 Sep 2024 12:56] Keith Hollman
Hi,
 Apologies. I'm importing into a 3-node InnoDB Cluster with sql_generate_invisible_primary_key=ON to cover all missing PKs. The import itself goes OK, but it seems that after the table is created and the data imported, the table is marked with has_generated_invisible_primary_key=0; hence GR fails across the other 2 nodes, the nodes fall out of sync, and the cluster fails.

Thanks
K.
[30 Sep 2024 15:17] MySQL Verification Team
Hi,

I tried with some manually created data and failed to reproduce the problem.

Check whether the dump you created from that unsupported SQL server contains any settings changes in comments. Maybe export the structure and the data separately: first import the structure to create everything, then import the data itself, after checking that there is nothing in the file that will interfere with invisible primary keys (the DDL dump will be small and easy to check and fix manually).
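The suggested check of the structure dump can be scripted. A minimal sketch, assuming the DDL dump is a plain .sql file (the sample content and file path below are illustrative, not from the actual dump):

```shell
# Stand-in DDL dump; a real dump taken from the MariaDB source
# would be used here instead.
cat > /tmp/schema.sql <<'EOF'
SET sql_generate_invisible_primary_key = OFF;
CREATE TABLE `cache_container` (
  `cid` varchar(255) NOT NULL
);
EOF

# Flag any statement in the dump that toggles the GIPK setting and
# would defeat the cluster-wide sql_generate_invisible_primary_key=ON.
grep -in 'sql_generate_invisible_primary_key' /tmp/schema.sql
```

Any hit here should be removed (or flipped to ON) before importing the structure into the cluster.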
[1 Nov 2024 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".