Bug #116535 Plugin instructed the server to rollback the current transaction
Submitted: 4 Nov 2024 4:56 Modified: 29 Dec 2024 12:42
Reporter: Krishnadas K P
Status: No Feedback
Category: MySQL Server: Group Replication Severity: S3 (Non-critical)
Version: 8.0.39-0ubuntu0.24.04.2 OS: Ubuntu (24.04)
Assigned to: MySQL Verification Team CPU Architecture: x86
Tags: group replication, rollback transaction

[4 Nov 2024 4:56] Krishnadas K P
Description:
I have a 3-node MySQL cluster with Group Replication enabled, running MySQL 8.0.39-0ubuntu0.24.04.2. Replication is set up in multi-primary mode. However, my application connects to only one node and does all of its reads and writes on that node. The other two nodes only replicate data and are ready to serve writes whenever I point my application at them. The cluster works most of the time, but under heavy load a number of transactions fail with the message "Plugin instructed the server to rollback the current transaction". From what I have read, this can happen when writes on different nodes conflict. In my setup, however, writes go to ONLY one node, so there should be no conflicting write from another node.

When the group is switched to single-primary mode and the same set of transactions is applied, the error does not recur. If the transactions were inherently conflicting, moving to single-primary mode should not have fixed this.

How to repeat:
Set up a multi-primary MySQL Group Replication cluster and apply a large load (300+ rps). A small set of transactions gets rolled back.
[8 Nov 2024 15:45] MySQL Verification Team
Hi,

I did not manage to reproduce this, but even if I had, I am not 100% sure it would be a bug: in a multi-primary setup, some transactions running in parallel can be problematic, so those would be rolled back. MySQL does not know that you are using only one server for writes, so this rule is enabled in any multi-primary setup. I'm double-checking this with the GR team, but in the meantime it would be great if you could help me reproduce it. At the moment, pushing a large number of transactions to a multi-primary GR cluster does not reproduce it for me, so it must take a specific mixture of transactions.
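
For readers unfamiliar with the rule mentioned above, here is a minimal sketch of first-committer-wins conflict detection. It is a simplified model, not Group Replication's actual implementation: the `Certifier` class, the integer versions (standing in for GTID sets), and the row keys are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Certifier:
    """Toy first-committer-wins certifier (illustrative model only)."""
    version: int = 0                                  # last certified transaction
    last_writer: dict = field(default_factory=dict)   # row key -> version that last wrote it

    def certify(self, snapshot_version: int, write_set: set) -> bool:
        # Reject if any row was rewritten after this transaction took its snapshot.
        for row in write_set:
            if self.last_writer.get(row, 0) > snapshot_version:
                return False  # in this model, the client would see the rollback error
        self.version += 1
        for row in write_set:
            self.last_writer[row] = self.version
        return True


certifier = Certifier()
snapshot = certifier.version                                   # both transactions start from version 0
first = certifier.certify(snapshot, {"accounts:1"})
second = certifier.certify(snapshot, {"accounts:1"})           # same row, stale snapshot
third = certifier.certify(certifier.version, {"accounts:1"})   # fresh snapshot
print(first, second, third)  # True False True
```

In this model the outcome depends only on write sets and snapshot versions, not on which server a transaction came from. Whether that is what happens in the reported case is not established in this thread.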

Thanks
[11 Nov 2024 5:22] Krishnadas K P
Thank you for looking into this.

I understand this is a hard case to reproduce. I can reproduce it consistently with my performance-testing script, which tests my app; this MySQL cluster is the app's database component.

As already mentioned, I am not writing to multiple nodes, so cross-node conflicts cannot be causing the rollbacks; I have to assume the conflicts are between queries running on the same node. My concern is this: if the queries do conflict, why do the conflicts disappear when I run group_replication_switch_to_single_primary_mode() and promptly return when I run group_replication_switch_to_multi_primary_mode()?

What checks does single-primary mode lack that multi-primary mode has, which might be letting these queries through? Either single-primary mode is allowing queries that should have been rolled back, or multi-primary mode is rolling back queries that should have been allowed.

In addition, I cannot get any diagnostics from the logs on what is causing the conflict or rollback. I have enabled the general log, and all I can see is that some queries are rolled back, with no other information. https://bugs.mysql.com/84730 asked for more observability in such cases but does not seem to have been implemented.
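
One partial source of diagnostics is the `performance_schema.replication_group_member_stats` table, whose `COUNT_CONFLICTS_DETECTED` column counts transactions that failed conflict detection. The sketch below compares two snapshots of that table taken before and after a load run. The helper name `conflict_delta` and the sample numbers are hypothetical, and the counters do not show which rows or statements conflicted.

```python
# Counters exposed by performance_schema.replication_group_member_stats.
STATS_QUERY = (
    "SELECT MEMBER_ID, COUNT_TRANSACTIONS_CHECKED, COUNT_CONFLICTS_DETECTED "
    "FROM performance_schema.replication_group_member_stats"
)


def conflict_delta(before, after):
    """Return per-member growth of the counters between two snapshots.

    Each snapshot is a list of (member_id, checked, conflicts) rows, as
    returned by STATS_QUERY through any DB-API cursor.
    """
    start = {member: (checked, conflicts) for member, checked, conflicts in before}
    delta = {}
    for member, checked, conflicts in after:
        base_checked, base_conflicts = start.get(member, (0, 0))
        delta[member] = {
            "checked": checked - base_checked,
            "conflicts": conflicts - base_conflicts,
        }
    return delta


# Hypothetical snapshots taken before and after a load-test run.
before = [("node-a", 1000, 0), ("node-b", 1000, 0)]
after = [("node-a", 4000, 12), ("node-b", 4000, 12)]
report = conflict_delta(before, after)
print(report["node-a"])  # {'checked': 3000, 'conflicts': 12}
```

If the conflict counter grows during a run in which rollbacks occur, that would suggest the rollbacks come from conflict detection rather than from some other plugin check.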

Please let me know if you need logs etc. for looking into this.

Once again, thank you for looking into this.
[12 Nov 2024 15:32] MySQL Verification Team
Hi,

Yes, the FR from Bug#84730 would let us easily find what's wrong.

If you have an idea of how I can reproduce this more easily, please share :)

I assume you cannot share your testing procedure as it is part of your app?

I am working with GR team to try to find out how to proceed.

Kind regards
[14 Nov 2024 5:17] Krishnadas K P
Thank you.

My tests mostly make API requests to the app rather than run SQL queries directly, so I might have a bit of trouble replicating that exactly. However, I will try to find a way to reproduce this consistently with scripts.

Would it be helpful to share logs from single primary mode (without any rollbacks) and multi primary mode (with rollbacks) ?
[20 Nov 2024 9:00] Krishnadas K P
I have an update on this. I ran similar load tests against a Galera cluster and did not see the transaction rollback issue. It seems to be isolated to the multi-primary Group Replication configuration.
[29 Nov 2024 12:42] MySQL Verification Team
> Would it be helpful to share logs from single primary mode (without any rollbacks) and multi primary mode (with rollbacks) ?

Logs might help, but the best-case scenario would be a script that does something similar to what your app does and reproduces the problem, as I am failing to reproduce this myself.
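
As a starting point for such a script, here is a minimal, driver-agnostic sketch. It runs a list of SQL statements as one transaction per iteration from several threads and counts how many commits fail with error 3101 (ER_TRANSACTION_ROLLBACK_DURING_COMMIT, the error in this report's title). The function name `run_load`, its parameters, and the way the error code is read from the exception are assumptions. The statements themselves have to come from the application's real workload, which is not shown in this thread.

```python
import threading

ROLLBACK_ERRNO = 3101  # ER_TRANSACTION_ROLLBACK_DURING_COMMIT


def run_load(connect, statements, workers=8, iterations=100):
    """Run `statements` as one transaction per iteration from several threads.

    `connect` is any callable returning a DB-API connection (an assumption:
    for example, a functools.partial around a MySQL driver's connect()).
    Returns (committed, rolled_back_by_plugin, other_errors).
    """
    counts = {"ok": 0, "plugin_rollback": 0, "other": 0}
    lock = threading.Lock()

    def worker():
        conn = connect()
        try:
            for _ in range(iterations):
                try:
                    cur = conn.cursor()
                    for sql in statements:
                        cur.execute(sql)
                    conn.commit()
                    outcome = "ok"
                except Exception as exc:
                    # Drivers differ in how they expose the error code:
                    # try an `errno` attribute first, then args[0].
                    code = getattr(exc, "errno", None) or (exc.args[0] if exc.args else None)
                    outcome = "plugin_rollback" if code == ROLLBACK_ERRNO else "other"
                    conn.rollback()
                with lock:
                    counts[outcome] += 1
        finally:
            conn.close()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts["ok"], counts["plugin_rollback"], counts["other"]
```

Running the same statement list once in multi-primary mode and once in single-primary mode, and comparing the second number in the result, would give a self-contained comparison that does not depend on the application.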
[30 Dec 2024 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".