| Bug #101609 | MySQL server process ram/swap usage keeps increasing on a innodb cluster | ||
|---|---|---|---|
| Submitted: | 13 Nov 2020 20:45 | Modified: | 25 Jan 2021 8:38 | 
| Reporter: | k n | Email Updates: | |
| Status: | No Feedback | Impact on me: | |
| Category: | MySQL Server | Severity: | S2 (Serious) | 
| Version: | 8.0.22 | OS: | Linux (Debian 10) | 
| Assigned to: | MySQL Verification Team | CPU Architecture: | x86 (VPS) | 
   [13 Nov 2020 20:45]
   k n        
  
 
   [13 Nov 2020 20:50]
   k n        
  This could be related: https://bugs.mysql.com/bug.php?id=95422
   [15 Nov 2020 12:08]
   k n        
  Another thing to add is that this does not happen if the 3 nodes are in a group replication (start group_replication;) and not joined in a cluster.
   [17 Nov 2020 8:56]
   MySQL Verification Team        
  Hi, I'm having issues reproducing. Can you share full config files and how they are linked to each other. Why do you think this is related to Bug #95422 thanks Bogdan
   [18 Nov 2020 6:48]
   k n        
  The InnoDB cluster is made up of 3 master nodes
Here is all the related config
my.cnf (all 3 nodes in the cluster use similar config)
-------
[mysqld]
super_read_only = ON
relay_log=debian-s-1vcpu-1gb-sfo2-01-relay-bin
# General replication settings
gtid_mode = ON
enforce_gtid_consistency = ON
master_info_repository = TABLE
relay_log_info_repository = TABLE
binlog_checksum = NONE
log_slave_updates = ON
log_bin = binlog
binlog_format = ROW
binlog_row_image=minimal
expire_logs_days=7
transaction_write_set_extraction = XXHASH64
loose-group_replication_bootstrap_group = OFF
loose-group_replication_start_on_boot = OFF
loose-group_replication_ssl_mode = REQUIRED
loose-group_replication_recovery_use_ssl = 1
loose-group_replication_unreachable_majority_timeout=1800
loose-group-replication-autorejoin-tries=9999
loose-group-replication-transaction-size-limit=0
max_connections=20
innodb_buffer_pool_size=512M
innodb_log_buffer_size=16M
innodb_log_file_size=64M
# Shared replication group configuration
loose-group_replication_group_name = "b00d81f2-8d00-4be4-90c1-290c8c1eb0c4"
loose-group_replication_ip_whitelist = "dbus1, dbeu1, dbtest1"
loose-group_replication_group_seeds = "dbus1:33061, dbeu1:33061, dbtest1:33061"
# Single or Multi-primary mode? Uncomment these two lines
# for multi-primary mode, where any host can accept writes
loose-group_replication_single_primary_mode = OFF
loose-group_replication_enforce_update_everywhere_checks = ON
# Host specific replication configuration
server_id = 1
bind-address = "dbus1"
report_host = "dbus1"
loose-group_replication_local_address = "dbus1:33061"
innodb cluster info:
----------------------
var c = dba.getCluster("prodcluster");
c.status()
{
    "clusterName": "prodCluster",
    "defaultReplicaSet": {
        "name": "default",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "dbeu1:3306": {
                "address": "dbeu1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            },
            "dbtest1:3306": {
                "address": "dbtest1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            },
            "dbus1:3306": {
                "address": "dbus1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            }
        },
        "topologyMode": "Multi-Primary"
    },
    "groupInformationSourceMember": "dbus1:3306"
}
 
   [19 Nov 2020 0:41]
   k n        
  Further information: The machine it is running on is a Linux/Debian10 with 2G ram and 5G swap. MySQL server manages to use up all the 5G swap. The memory increase is much faster on the cluster node that is sending the cluster metadata. When that mysql server restarts because ram runs out, the master cluster node (one sending metadata) is switched to the next node and repeats.
   [19 Nov 2020 5:44]
   MySQL Verification Team        
  Hi, I think I already saw this but I'm having reproducing right now. What kind of traffic are you running? Can you temporary remove all the queries querying information_schema, mysql_innodb_cluster_metadata and similar meta tables and see if you still get the ram usage increase. When you stop and disconnect all the clients, does the RAM return to system? Thanks Bogdan
   [20 Nov 2020 1:18]
   k n        
  Here's some variable values from the master cluster node: | Uptime | 670007 | Bytes_received | 4026884931 | Bytes_sent | 6693954498 The only way I know to release all the memory used by mysql server is to restart the service and this takes longer than usual (when not joined to cluster/group rep).
   [20 Nov 2020 12:50]
   k n        
  To be clear, issuing the command "stop group_replication" only releases some the swap but majority of swap is only returned when mysqld process is restarted.
   [20 Nov 2020 13:13]
   MySQL Verification Team        
  Hi, I'm still having issues reproducing this. I will let the test run over the weekend so we'll see. > This could be related: > https://bugs.mysql.com/bug.php?id=95422 You did not answer why you believe this might be related? > Can you temporary remove all the queries querying information_schema, > mysql_innodb_cluster_metadata and similar meta tables and see if you still get the ram usage increase. Is this something you can try? Is this a fresh 8.0.22 installation or this was upgrade from some earlier version? If this is upgrade, did you have this problem before upgrade to 8.0.22? Thanks Bogdan
   [21 Nov 2020 8:03]
   k n        
  > This could be related: > https://bugs.mysql.com/bug.php?id=95422 >> You did not answer why you believe this might be related? I don't think its this directly related to this feature but the memory leak does not happen when only replication is turned on (start group_replication) and not joined a cluster. The memory increase starts when nodes join a cluster. >> Can you temporary remove all the queries querying information_schema, >> mysql_innodb_cluster_metadata and similar meta tables and see if you still get the ram usage increase. >> Is this something you can try? I'm not exactly sure what you mean by "remove all the queries". Wouldn't "stop group replication" stop all cluster meta queries? >> Is this a fresh 8.0.22 installation or this was upgrade from some earlier version? If this is upgrade, did you have this problem before upgrade to 8.0.22? I was using 8.0.18 and it was happening and I upgraded to 8.0.22 hoping it would fix but it is still happening If you need more information let me know
   [21 Nov 2020 9:16]
   MySQL Verification Team        
  Hi, WRT queries, I meant that you do not query metadata yourself, what innodbcluster does for itself is not important. Good to know, so this is not some regression in 8.0.22 but you actually had the same issue with 8.0.18. I'm having issue reproducing this and if it was "common" we'd be seeing more reports about this. Need to figure out what is unique about your system that makes this bug pop up. thanks Bogdan
   [14 Dec 2020 22:25]
   Jiří Tužil        
  Hello, we are observing this (or similar) bug for a few months as well. Our setup: - 3-node InnoDB cluster (master-slave) - ESXi VMs: 4 CPU, 4 GB RAM - Debian 10.5 - MySQL 8.0.22 No other software installed. Cluster is set up using the simplest tutorial found. Database datafile folder size is about 50 MB. Cluster metadata is about 1.6 GB after cca two months. We are connecting to the cluster from PHP/PDO through MySQL-router 8.0.21 running in docker container. After several days or weeks, every node consumes all available memory and oom killer kills mysqld. We have started with 512 MB RAM/node (because on our other older cluster test setup with MySQL 8.0.11 it works well enough), increased to 1 GB, then 2 GB, to current 4 GB on each node. mysqld always eats all available memory after some time and gets killed. Before updating to 8.0.22 we were on (I think) 8.0.17 and it behaved the same. 8.0.11 on our other setup does not do that. We have not found a solution or workaround.
   [14 Dec 2020 22:39]
   Jiří Tužil        
  I don't think the https://bugs.mysql.com/bug.php?id=95422 is related. Although MySQL log on our slave nodes is full of "Invalid replication timestamps: ..." messages, this as well happens on our other 8.0.11 cluster setup, which does not exhibit this bug.
   [25 Dec 2020 8:38]
   MySQL Verification Team        
  Hi Jiří, I'm not reproducing this. You are saying you have the same issue with 8.0.18 but don't with 8.0.11, right? Can you try to run debug version of 8.0.22? Thanks Bogdan
   [27 Dec 2020 11:42]
   Jiří Tužil        
  Hello Bogdan, before I upgrade to 8.0.22 debug, I can offer you ssh access to the test setup we have here if you want, so you can take a look what actually is consuming the memory.
   [11 Jan 2021 13:54]
   MySQL Verification Team        
  Hi, We are not allowed to ssh to your box. Can't reproduce the issue no matter what we try. If you can, maybe rebuilding MySQL WITH_ASAN and try to get some report. thanks
   [26 Jan 2021 1:00]
   Bugs System        
  No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".

