MySQL Bugs: #112015: MySQL Operator / MySQL InnoDB Cluster - Node lagging behind

Bug #112015	MySQL Operator / MySQL InnoDB Cluster - Node lagging behind - election process
Submitted:	9 Aug 2023 14:01	Modified:	21 Aug 2023 23:59
Reporter:	Carlos Abrantes
Status:	Not a Bug
Category:	MySQL Server: Group Replication	Severity:	S4 (Feature request)
Version:		OS:	Any
Assigned to:	MySQL Verification Team	CPU Architecture:	Any

Description:
Hi MySQL,

This is a question/clarification/Feature Request (let me know if there is a better place/flow for it).

After deploying InnoDB Cluster with MySQL version 8.0.33 via MySQL Operator, I was trying to understand whether a node that is lagging could be elected as Primary.

{
    "clusterName": "mysql",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysql-2.mysql-instances.mysql.svc.cluster.local:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysql-0.mysql-instances.mysql.svc.cluster.local:3306": {
                "address": "mysql-0.mysql-instances.mysql.svc.cluster.local:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "00:05:34.192010",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.33"
            },
            "mysql-1.mysql-instances.mysql.svc.cluster.local:3306": {
                "address": "mysql-1.mysql-instances.mysql.svc.cluster.local:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.33"
            },
            "mysql-2.mysql-instances.mysql.svc.cluster.local:3306": {
                "address": "mysql-2.mysql-instances.mysql.svc.cluster.local:3306",
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.33"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "mysql-2.mysql-instances.mysql.svc.cluster.local:3306"
}

The documentation I found does not state that lagging behind will prevent a node from being used as primary, something that can be critical in many use cases. Can you please confirm whether my understanding is correct?

To me it makes sense to at least be able to configure whether a lagging node should be considered in Primary election or not: basically, a threshold after which the node would not participate in Primary election.

I was not able to find any reference to such behaviour. Can you please clarify whether it exists and where I can read about it, or, if it does not, consider this a Feature Request?
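The threshold idea described above can be approximated outside the server by reading the status output. The following is a minimal sketch, not an existing MySQL feature: it only inspects the `replicationLag` field of a trimmed copy of the status shown above, and the function names `parse_lag` and `lagging_members` are hypothetical. It assumes the field is either an `HH:MM:SS.ffffff` string or a non-time value such as `applier_queue_applied`. A non-time value is treated as "no lag reported", which is not by itself proof that the member is up to date.

```python
import json
from datetime import timedelta


def parse_lag(value):
    """Parse a replicationLag string like "00:05:34.192010" into a timedelta.

    Returns None for anything that is not an HH:MM:SS[.ffffff] string,
    for example "applier_queue_applied".
    """
    if not isinstance(value, str):
        return None
    parts = value.split(":")
    if len(parts) != 3:
        return None
    try:
        return timedelta(hours=int(parts[0]), minutes=int(parts[1]),
                         seconds=float(parts[2]))
    except ValueError:
        return None


def lagging_members(status, threshold):
    """Return {address: lag} for members whose reported lag exceeds threshold."""
    topology = status["defaultReplicaSet"]["topology"]
    lagging = {}
    for address, member in topology.items():
        lag = parse_lag(member.get("replicationLag"))
        if lag is not None and lag > threshold:
            lagging[address] = lag
    return lagging


# Trimmed copy of the status output above (only the fields used here).
STATUS_JSON = """
{
    "defaultReplicaSet": {
        "topology": {
            "mysql-0.mysql-instances.mysql.svc.cluster.local:3306": {
                "memberRole": "SECONDARY",
                "replicationLag": "00:05:34.192010"
            },
            "mysql-1.mysql-instances.mysql.svc.cluster.local:3306": {
                "memberRole": "SECONDARY",
                "replicationLag": "applier_queue_applied"
            },
            "mysql-2.mysql-instances.mysql.svc.cluster.local:3306": {
                "memberRole": "PRIMARY",
                "replicationLag": "applier_queue_applied"
            }
        }
    }
}
"""

status = json.loads(STATUS_JSON)
# Hypothetical threshold of one minute, chosen only for illustration.
for address, lag in lagging_members(status, timedelta(minutes=1)).items():
    print(f"{address} lags by {lag}")
```

A check like this can only raise an alert; it does not change which member the group elects.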

Version:
community-operator:8.0.33-2.0.10
community-server:8.0.33
community-router:8.0.33

apiVersion: v2
appVersion: 8.0.33
description: MySQL InnoDB Cluster Helm Chart for deploying MySQL InnoDB Cluster in Kubernetes
icon: https://labs.mysql.com/common/themes/sakila/favicon.ico
name: mysql-innodbcluster
type: application
version: 2.0.10

Thanks,

How to repeat:
After deploying, I used fallocate to create a big file to fill up the MySQL FS.
This made the MySQL node lag behind.

Hi Mr. Abrantes,

Thank you for your bug report. 

However, this is not a bug. This is a question about managing the InnoDB Cluster.

We're sorry, but the bug system is not the appropriate forum for asking for help on using MySQL products. Your problem is not the result of a bug.

For details on getting support for MySQL products see http://www.mysql.com/support/.
You can also check our free forums at http://forums.mysql.com/.

There are also other sites for free discussions, like https://stackoverflow.com, https://dba.stackexchange.com, https://serverfault.com, etc.

Not a bug.

Thank you for your interest in MySQL.

Hi,

Yes, what I described was not a bug; I was trying to get correct information and, if needed, file a Feature Request (I tried Slack without success, and I will try the other channels).

Nevertheless, and related to this (let me know if I should open a new ticket), I can see that this replicationLag does not always seem to work.

I did the same as yesterday and filled the MySQL FS, but now it is not reporting a lag of 5m or 10m as before. Instead it states:

            "mysql-0.mysql-instances.mysql.svc.cluster.local:3306": {
                "address": "mysql-0.mysql-instances.mysql.svc.cluster.local:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "applier_queue_applied",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.33"
            },

 MySQL  mysql-0.mysql-instances.mysql.svc.cluster.local:3306 ssl  live  SQL > select count(1) from table;
+----------+
| count(1) |
+----------+
|      204 |
+----------+

 MySQL  mysql-1.mysql-instances.mysql.svc.cluster.local:3306 ssl  live  SQL > select count(1) from CPEManager_CPEs;
+----------+
| count(1) |
+----------+
|      404 |
+----------+

So how can it state that everything is applied ("replicationLag": "applier_queue_applied") when I can see in the data that mysql-0 is stalled, lagging behind and unable to write data?

Thanks,

Update:

Only after I cleaned the FS was I able to see:

            "mysql-0.mysql-instances.mysql.svc.cluster.local:3306": {
                "address": "mysql-0.mysql-instances.mysql.svc.cluster.local:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": "01:20:23.311549",
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.33"
            },

"replicationLag": "01:20:23.311549",

This sounds like a limitation/bug. Can you please explain how this replicationLag works? Yesterday, with the FS full, replicationLag was showing a time. Today, in exactly the same scenario, it was always showing "replicationLag": "applier_queue_applied" (as if there were nothing to apply, even though I saw that the data was old), until I freed up space, at which point it started to show "replicationLag": "01:20:23.311549".

Thanks,

Hi Mr. Abrantes,

Not having space available is a great explanation for what you observe.

The lag is actually the difference between the time the binary log was written and the time the statement was executed.

Not a bug.
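The lag definition given above can be sketched as a simple subtraction. This is only an illustration of that sentence, not the code MySQL Shell uses; the function name `replication_lag` and the timestamps are hypothetical. The sketch also assumes that when no executed timestamp is available (for example, because nothing could be executed), there is nothing to subtract, so it returns `None` instead of a duration.

```python
from datetime import datetime, timedelta
from typing import Optional


def replication_lag(written_at: datetime,
                    executed_at: Optional[datetime]) -> Optional[timedelta]:
    """Return executed_at - written_at, or None if nothing was executed.

    written_at:  when the transaction was written to the binary log.
    executed_at: when the statement was executed on the secondary.
    """
    if executed_at is None:
        # Assumption of this sketch: without an execution time there is
        # no measurable lag, which is different from "no lag".
        return None
    return executed_at - written_at


# Hypothetical timestamps chosen to reproduce the value reported above.
written = datetime(2023, 8, 10, 10, 0, 0)
executed = datetime(2023, 8, 10, 11, 20, 23, 311549)
print(replication_lag(written, executed))
print(replication_lag(written, None))
```

Under this reading, a time-based lag value describes transactions that are being executed; it cannot by itself describe a member that has stopped executing.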

Hi MySQL team,

Understood, it is expected behaviour, which means that in this case there is no way with MySQL mechanics to tell whether the cluster is healthy or has problems, correct?

I also noticed that a node that is lagging behind can be used as primary, which in many use cases should be avoided. Should this be opened as a bug or a Feature Request?

Thanks,

You are truly welcome.

Before closing, can you please answer my questions if possible?

In this case (full FS) there is no way with MySQL mechanics to tell whether the cluster is healthy or has problems, correct?

I also noticed that a node that is lagging behind can be elected as primary, which in many use cases should be avoided/forbidden. I will log a separate ticket, but should it be opened as a bug or a Feature Request?

Thanks,

Hi Mr. Abrantes,

We shall get back to you with the answers.

Hi,

> let me know if there is a better place/flow for it)

The proper place would be the MySQL Support system or, if you do not want to get a support subscription, the MySQL forums. As a colleague previously wrote, the bugs system is not to be used as a support channel. We need to keep it focused on bugs only.

> I was not able to find any reference to such behaviour, can you please clarify if it exists, where can i read about it, or if not consider  this a Feature Request?

Yes. There are some enterprise tools that can help (for example, MySQL Enterprise Monitor).

> After deploying i used fallocate to create a big file to get the MySQL FS full.

Anything related to a full FS is not considered a bug. Monitoring free space is much simpler than fixing the behavior a full disk can cause. There are many hard issues you can encounter if your disk gets full, including data loss. Have a proper system administrator monitor your disk usage, or use MySQL as a service on Oracle Cloud so that we monitor it for you.

> which means that in this case there is no way with MySQL mechanics to understand if cluster is healthy or with problems, correct?

We assume there is free disk space. If there isn't, nothing that follows makes sense, so to start with you need to make sure you never run out of disk space.

All best

Hi MySQL,

Thanks for all the answers. Got it, and I agree that a full FS is something that should never happen.

One question remains: when a node is lagging behind, it can still be elected Primary. Is this by design, or can/should I open a Feature Request for it?

Thanks,

Hi,

Feel free to create a FR for that, but I must say I am not sure what you expect to happen. If the previous primary is dead, the lagging one will quickly catch up in most cases. Do you propose that the cluster fail rather than elect the lagging one?

You do not need to create a new bug/FR. Just explain exactly what you would expect to happen, and in exactly which case, and we will reconsider. So far, though, I do not see how a failing cluster would be a better decision than electing a new primary that lags a bit.

Hi,

Sorry for the late response, vacation got in the way.

"Feel free to create a FR for that but I must say I am not sure what you expect to happen. If the previous primary is dead, lagging one will quickly catch up in most cases, you propose cluster fails rather than elect lagging one?"

Well, this got me thinking. Question: if a node is elected primary, will it start answering client statements in parallel with the catch-up process, or only once it has finished?

I am trying to understand whether a value that a client updates on the newly elected PRIMARY can then be overwritten by a previous value that is in the relay logs but not yet processed. That would mean that the last value in the DB is not the correct one. I do not know whether the use case I described is understandable.

Thanks,

Hi Mr. Abrantes,

Your question requires quite an elaborate answer.

First of all, your application(s) should be well written. So, in the case that the source is not available, the application should return an error. It should then wait or abort completely, depending on how your application is written and what function it performs. Simply put, when the source is dead, applications that rely on Group Replication should wait for the cluster to become available.

In the case of the hardest errors, like the one that you describe, you should switch one of the replicas to be the new source manually. You must wait until all entries from the relay logs are executed, and only then can you promote the replica to source. This is because, in the case of hard errors (like the fatal error of running out of disk space), automatic failover is not feasible.

This can also be considered a theoretical discourse. When you get a hard error, like a hardware error or disk overflow, it is quite probable that you will lose some data, maybe an entire transaction, but that also depends on your application.

In the light of the above, your proposition for the feature request does not seem feasible.

We hope that we were precise in our answer.

Yes, I understand, and I 100% agree that avoiding a full FS is something basic that should be ensured/monitored.

But speaking of other use cases in which a node can lag, such as poor disk performance: if the Primary node goes down and the Secondary that was lagging is elected Primary, then the situation I just described may happen, correct?

But that, I believe, is linked to group_replication_consistency: if it is set to EVENTUAL (the most performant), it can happen, as client statements will run concurrently with the processing of the backlog, correct?

But if BEFORE_ON_PRIMARY_FAILOVER is used, then that situation should be avoided, as per the documentation (https://dev.mysql.com/doc/refman/8.0/en/group-replication-system-variables.html#sysvar_gro...):

BEFORE_ON_PRIMARY_FAILOVER

New RO or RW transactions with a newly elected primary that is applying backlog from the old primary are held (not applied) until any backlog has been applied. This ensures that when a primary failover happens, intentionally or not, clients always see the latest value on the primary. This guarantees consistency, but means that clients must be able to handle the delay in the event that a backlog is being applied. Usually this delay should be minimal, but does depend on the size of the backlog.
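As a rough illustration of how this setting might be applied, the sketch below builds the `SET` statement for `group_replication_consistency`. The helper name `set_consistency_sql` is hypothetical and the sketch only produces SQL text; it does not connect to a server, and how the MySQL Operator or Router deployment should carry this setting is outside what this thread covers.

```python
# Documented values of the group_replication_consistency system variable.
CONSISTENCY_LEVELS = (
    "EVENTUAL",
    "BEFORE_ON_PRIMARY_FAILOVER",
    "BEFORE",
    "AFTER",
    "BEFORE_AND_AFTER",
)

# Scopes accepted by this sketch for the SET @@scope.variable syntax.
SCOPES = ("SESSION", "GLOBAL", "PERSIST")


def set_consistency_sql(level, scope="SESSION"):
    """Build the SQL statement that sets group_replication_consistency.

    Raises ValueError for an unknown level or scope, so that a typo is
    caught before any statement is sent to a server.
    """
    level = level.upper()
    scope = scope.upper()
    if level not in CONSISTENCY_LEVELS:
        raise ValueError(f"unknown consistency level: {level}")
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    return f"SET @@{scope}.group_replication_consistency = '{level}';"


print(set_consistency_sql("BEFORE_ON_PRIMARY_FAILOVER", scope="PERSIST"))
```

The statement would be run on the group members with whatever client is already in use; validating the value first is just a convenience of this sketch.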

And I assume the performance should be similar, as we still only wait for the "ack" that the statement is "saved" in the relay queue of the secondary; only the failover may take more time.

If my assumption is correct, please confirm that and feel free to close the ticket (otherwise I may remember more things to ask :) ).

Hi,

Most of your assumptions are correct, so this ticket is closed now.