MySQL Bugs: #117003: Memory usage increases and reaches the limit, causing the pod to restart

Bug #117003	Memory usage increases and reaches the limit, causing the pod to restart
Submitted:	17 Dec 2024 19:25	Modified:	13 Feb 20:33
Reporter:	serhii hladchenko	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Operator	Severity:	S3 (Non-critical)
Version:	8.4.3	OS:	Ubuntu (ubuntu-2204-jammy-v20230919)
Assigned to:		CPU Architecture:	x86
Tags:	mysql-operator memory usage restart

Description:
We are using the MySQL Operator on a k3s cluster. We regularly encounter pod restarts with the cause OOMKilled. I deployed a new mysql-innodbcluster pod with default values and no load, but I noticed that the pod restarts once a day.
Memory usage increases and reaches the limit, causing the pod to restart.
Events:
  Type     Reason             Age                 From     Message
  ----     ------             ----                ----     -------
  Warning  Unhealthy          13m (x21 over 19m)  kubelet  Liveness probe failed: command "/livenessprobe.sh" timed out
  Normal   Killing            13m                 kubelet  Container mysql failed liveness probe, will be restarted
  Warning  FailedPreStopHook  12m                 kubelet  Exec lifecycle hook ([sh -c sleep 60 && mysqladmin -ulocalroot shutdown]) for Container "mysql" in Pod "mysql-0_default(f5ac257c-cc85-4f53-945d-dd0adc74e556)" failed - error: command 'sh -c sleep 60 && mysqladmin -ulocalroot shutdown' exited with 137: , message: ""
  Normal   Pulled             12m (x2 over 26h)   kubelet  Container image "container-registry.oracle.com/mysql/community-server:8.4.3" already present on machine
  Normal   Created            12m (x2 over 26h)   kubelet  Created container mysql
  Normal   Started            12m (x2 over 26h)   kubelet  Started container mysql

How to repeat:
Helm chart:
mysql-operator-2.2.2
mysql-innodbcluster-2.2.2
Deploy mysql-operator with values file values.yaml
credentials:
  root:
    user: root
    password: "xxxxx"
    host: "%"
serverVersion: 8.4.3
tls:
  useSelfSigned: true
serverInstances: 1
router:
  instances: 1
podSpec:
  containers:
  - name: mysql
    resources:
      requests:
        memory: "768Mi"
      limits:
        memory: "1Gi"

datadirVolumeClaimTemplate:
  storageClassName: standard
  resources:
    requests:
      storage: 10Gi
serverConfig:
  mycnf: |
    [mysqld]
    default-time-zone = 'Europe/Kiev'

Hi Serhii,

I have issues reproducing this with 8.4.4. I do not see anything between 8.4.3 an 8.4.4 that could affect this but can you please try if you can reproduce this with 8.4.4 too ?

Hi, We have just updated the MySQL server 8.4.4. Keep an eye on it. I will text in 2-3 days

I have installed community-server 8.4.4. The pod starts with about 800 MB of memory usage, reaches the 1.5 GB limit within two days, and reboots. It seems that this issue has not been handled in the new version.
    State:          Running
      Started:      Thu, 30 Jan 2025 01:23:04 +0200
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Tue, 28 Jan 2025 11:30:18 +0200
      Finished:     Thu, 30 Jan 2025 01:22:55 +0200

Our configuration:
Kubernetes cluster is k3s
k3s-master1   Ready    control-plane,master   128d   v1.29.9+k3s1
Pod Images
container-registry.oracle.com/mysql/community-operator:9.1.0-2.2.2
container-registry.oracle.com/mysql/community-router:8.4.4
container-registry.oracle.com/mysql/community-server:8.4.4

Additionally, I installed MySQL v8.4.4 today in my sandbox without any applications connected to it. In this configuration, MySQL 8.4.3 consumes memory. I will monitor version 8.4.4.

This does not look like a bug. How many connections do you have on your MySQL.

The MySQL server has 40 connections. I have all MySQL dashboards in Kibana. It is a development environment, and we don't have a high load. Memory usage is increasing every hour, and when it reaches the pod limit, the pod crashes. 
Screenshots are here https://drive.google.com/drive/folders/17ax-4hUh2XSlKmwHfKb2m4EIkWgletpO?usp=sharing

Please check SYS schema to monitor RAM usage to see why/where the RAM is used up. 1G is pretty small limit.

https://dev.mysql.com/doc/refman/8.4/en/monitor-mysql-memory-use.html

Also use https://www.mysqlcalculator.com/ to calculate what amount of memory your MySQL server can allocate according to your config.

I set limit 1500 Mb. I have loaded some screenshots from MySQL Workbench. 
https://drive.google.com/file/d/1bgncYBrcbebjZyaU6Ce1NESiCUkQSMAN/view?usp=drive_link

We have a small amount of connection about 13-15. Max_connection is 40. We run very load instance Mysql in Google Cloud SQL Service that consumes less 1Gb.

and We run our application on a bare metal server with huge memory, and Mysql usage memory is increase too. And It is reaching the limit of 4Gb. Seems our MySQL server can reach any limit.

Hi,

MySQL can easily use hundreds of gigabytes of RAM that is not a problem. It all depends on how you configure it.

As I wrote earlier, please use https://www.mysqlcalculator.com/ to calculate your MySQL memory usage.

Or let me know this:
select @@key_buffer_size + @@tmp_table_size + @@innodb_buffer_pool_size + @@innodb_log_buffer_size;

select @@sort_buffer_size + @@read_buffer_size + @@read_rnd_buffer_size + @@join_buffer_size + @@thread_stack + @@binlog_cache_size ;

Thanks

+---------------------------------------------------------------------------------------------+
| @@key_buffer_size + @@tmp_table_size + @@innodb_buffer_pool_size + @@innodb_log_buffer_size |
+---------------------------------------------------------------------------------------------+
|                                                                                   629145600 |
+---------------------------------------------------------------------------------------------+

+------------------------------------------------------------------------------------------------------------------------------+
| @@sort_buffer_size + @@read_buffer_size + @@read_rnd_buffer_size + @@join_buffer_size + @@thread_stack + @@binlog_cache_size |
+------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                      1998848 |
+------------------------------------------------------------------------------------------------------------------------------+

Hi,

This is showing some 700MB of use, not more. Not sure where is it leaking memory.

I cannot reproduce this, for me this works normally.

You will have to monitor memory usage as described in https://dev.mysql.com/doc/refman/8.4/en/monitor-mysql-memory-use.html so we can try to find what is going on.

logging through time and especially before it crashes something like this:

SELECT SUBSTRING_INDEX(event_name,'/',2) AS
       code_area, FORMAT_BYTES(SUM(current_alloc))
       AS current_alloc
       FROM sys.x$memory_global_by_current_bytes
       GROUP BY SUBSTRING_INDEX(event_name,'/',2)
       ORDER BY SUM(current_alloc) DESC;

This should show where the memory is going.

We have deployed MySQL Server 8.4.4 on an empty K3s cluster without any load. It started using 509 MB and reached 515 MB in 3 days.

We have also deployed MySQL Server on a bare-metal server with a large amount of memory and application load. It reached 2 GB in 3.5 days.

Additionally, we have deployed MySQL 8.4.4 in a development environment with a pod limit of 1.5 GB. Since we don’t have much memory in development, it hit the limit and crashed due to an OOM kill in 2 days.

->        ORDER BY SUM(current_alloc) DESC;
+---------------------------+---------------+
| code_area                 | current_alloc |
+---------------------------+---------------+
| memory/innodb             | 673.72 MiB    |
| memory/performance_schema | 235.76 MiB    |
| memory/group_rpl          | 125.64 MiB    |
| memory/sql                | 28.17 MiB     |
| memory/temptable          | 12.00 MiB     |
| memory/mysys              | 9.48 MiB      |
| memory/mysqld_openssl     | 1.41 MiB      |
| memory/refcache           | 37.95 KiB     |
| memory/mysqlx             | 23.70 KiB     |
| memory/vio                | 2.25 KiB      |
| memory/myisam             |  728 bytes    |
| memory/csv                |  120 bytes    |
| memory/blackhole          |  120 bytes    |
+---------------------------+---------------+

You said ~700 MB, but the kubectl command shows more.sgladc@C07ZF2EMJYW0:~$ kubectl top pod mysql-0 --containers
POD       NAME      CPU(cores)   MEMORY(bytes)   
mysql-0   mysql     47m          963Mi           
mysql-0   sidecar   0m           130Mi

Eight hours have passed. The pod consumes more than 100 MB of memory, and memory/group_rpl has increased.
+---------------------------------------------------------------------------------------------+
| @@key_buffer_size + @@tmp_table_size + @@innodb_buffer_pool_size + @@innodb_log_buffer_size |
+---------------------------------------------------------------------------------------------+
|                                                                                   629145600 |
+---------------------------------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------------------------------+
| @@sort_buffer_size + @@read_buffer_size + @@read_rnd_buffer_size + @@join_buffer_size + @@thread_stack + @@binlog_cache_size |
+------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                      1998848 |
+------------------------------------------------------------------------------------------------------------------------------+

+---------------------------+---------------+
| code_area                 | current_alloc |
+---------------------------+---------------+
| memory/innodb             | 683.86 MiB    |
| memory/performance_schema | 253.89 MiB    |
| memory/group_rpl          | 235.00 MiB    |
| memory/sql                | 37.04 MiB     |
| memory/mysys              | 9.42 MiB      |
| memory/temptable          | 7.00 MiB      |
| memory/mysqld_openssl     | 1.04 MiB      |
| memory/refcache           | 36.30 KiB     |
| memory/mysqlx             | 23.70 KiB     |
| memory/vio                | 1.16 KiB      |
| memory/myisam             |  728 bytes    |
| memory/csv                |  120 bytes    |
| memory/blackhole          |  120 bytes    |
+---------------------------+---------------+
sgladc@C07ZF2EMJYW0:~$ kubectl top pod mysql-0 --containers
POD       NAME      CPU(cores)   MEMORY(bytes)   
mysql-0   mysql     40m          1127Mi          
mysql-0   sidecar   0m           145Mi

Total Memory Used in Group Replication
 MySQL  localhost:3306 ssl  SQL > SELECT * FROM (
                               ->                 SELECT
                               ->                   (CASE
                               ->                     WHEN EVENT_NAME LIKE 'memory/group_rpl/%'
                               ->                     THEN 'memory/group_rpl/memory_gr'
                               ->                     ELSE 'memory_gr_rest'
                               ->                     END) AS EVENT_NAME, 
                               ->                     SUM(COUNT_ALLOC), SUM(COUNT_FREE),
                               ->                     SUM(SUM_NUMBER_OF_BYTES_ALLOC),
                               ->                     SUM(SUM_NUMBER_OF_BYTES_FREE), SUM(LOW_COUNT_USED),
                               ->                     SUM(CURRENT_COUNT_USED), SUM(HIGH_COUNT_USED),
                               ->                     SUM(LOW_NUMBER_OF_BYTES_USED), SUM(CURRENT_NUMBER_OF_BYTES_USED),
                               ->                     SUM(HIGH_NUMBER_OF_BYTES_USED)
                               ->                  FROM performance_schema.memory_summary_global_by_event_name
                               ->                  GROUP BY (CASE
                               ->                               WHEN EVENT_NAME LIKE 'memory/group_rpl/%'
                               ->                               THEN 'memory/group_rpl/memory_gr'
                               ->                               ELSE 'memory_gr_rest'
                               ->                             END)
                               ->               ) f
                               -> WHERE f.EVENT_NAME != 'memory_gr_rest'\G
*************************** 1. row ***************************
                       EVENT_NAME: memory/group_rpl/memory_gr
                 SUM(COUNT_ALLOC): 957109
                  SUM(COUNT_FREE): 858876
   SUM(SUM_NUMBER_OF_BYTES_ALLOC): 1094666442
    SUM(SUM_NUMBER_OF_BYTES_FREE): 845341985
              SUM(LOW_COUNT_USED): 0
          SUM(CURRENT_COUNT_USED): 98233
             SUM(HIGH_COUNT_USED): 98423
    SUM(LOW_NUMBER_OF_BYTES_USED): 0
SUM(CURRENT_NUMBER_OF_BYTES_USED): 249324457
   SUM(HIGH_NUMBER_OF_BYTES_USED): 249403622
1 row in set (0.0198 sec)

Could you explain how to identify the memory consumer? I found the following query examples in the documentation:
Memory Used to Capture Transactions
Memory Used to Broadcast Transactions
Total Memory Used in Group Replication
Memory Used in Certification
Memory Used in Certification
Memory Used in Replication Pipeline
Memory Used in Consistency
Memory Used in Delivery Message Service
Memory Used to Broadcast and Receive Transactions

We have executed a SELECT query on the production server. The application is running but without any load. I see that memory/group_rpl is significantly high.
code_area,current_alloc
memory/group_rpl,"1023.99 MiB"
memory/innodb,"280.93 MiB"
memory/performance_schema,"235.83 MiB"
memory/sql,"32.02 MiB"
memory/mysys,"9.43 MiB"
memory/temptable,"4.00 MiB"
memory/mysqld_openssl,"869.23 KiB"
memory/refcache,"41.24 KiB"
memory/mysqlx,"23.70 KiB"
memory/vio,"1.16 KiB"
memory/myisam," 728 bytes"
memory/csv," 120 bytes"
memory/blackhole," 120 bytes"

We have only one server without any replication in both dev and prod. Can I disable the entire replication group, or do I need to investigate the issue further?
memory/group_rpl/write_set_encoded                                | YES     |
| memory/group_rpl/certification_data                               | YES     |
| memory/group_rpl/certification_data_gc                            | YES     |
| memory/group_rpl/certification_info                               | YES     |
| memory/group_rpl/transaction_data                                 | YES     |
| memory/group_rpl/sql_service_command_data                         | YES     |
| memory/group_rpl/mysql_thread_queued_task                         | YES     |
| memory/group_rpl/message_service_queue                            | YES     |
| memory/group_rpl/message_service_received_message                 | YES     |
| memory/group_rpl/group_member_info                                | YES     |
| memory/group_rpl/consistent_members_that_must_prepare_transaction | YES     |
| memory/group_rpl/consistent_transactions                          | YES     |
| memory/group_rpl/consistent_transactions_prepared                 | YES     |
| memory/group_rpl/consistent_transactions_waiting                  | YES     |
| memory/group_rpl/consistent_transactions_delayed_view_change      | YES     |
| memory/group_rpl/GCS_XCom::xcom_cache                             | YES     |
| memory/group_rpl/Gcs_message_data::m_buffer                       | YES     |

It is a development environment. memory/group_rpl has increased by 200 MB in 12 hours.
+---------------------------+---------------+
| code_area                 | current_alloc |
+---------------------------+---------------+
| memory/innodb             | 689.24 MiB    |
| memory/group_rpl          | 434.45 MiB    |
| memory/performance_schema | 254.58 MiB    |
| memory/sql                | 43.02 MiB     |
| memory/mysys              | 9.46 MiB      |
| memory/temptable          | 7.00 MiB      |
| memory/mysqld_openssl     | 1019.70 KiB   |
| memory/refcache           | 37.95 KiB     |
| memory/mysqlx             | 23.70 KiB     |
| memory/vio                | 2.25 KiB      |
| memory/myisam             |  728 bytes    |
| memory/csv                |  120 bytes    |
| memory/blackhole          |  120 bytes    |
+---------------------------+---------------+

Could you explain how to identify the memory consumer within memory/group_rpl? If I am using only one server, is it possible to disable Group Replication Instrumentation?

Hi,

I am not sure if this is a bug.

You are running mysql-innodbcluster-2.2.2 image but there is no innodb cluster configured. 

I will verify this as I think it should not behave like this even if you do use mysql-innodbcluster-2.2.2 but if you are not using group replication / innodb cluster you should use regular mysql:9.2

Hi, yes I am using mysql-innodbcluster-2.2.2.tgz.
Here is values.yaml
credentials:
  root:
    user: 
    password: 
    host: "%"
serverVersion: 8.4.4
tls:
  useSelfSigned: true
serverInstances: 1
router:
  instances: 1
podSpec:
  containers:
  - name: mysql
    resources:
      requests:
        memory: "1Gi"
      limits:
        memory: "1536Mi"
datadirVolumeClaimTemplate:
  storageClassName: standard
  resources:
    requests:
      storage: 10Gi
backupProfiles:
- name: dump-instance
  dumpInstance:
    dumpOptions:
      excludeTables:
      - outofschool.statisticreportdata
    storage:
      persistentVolumeClaim:
        claimName: mysql-backup-pvc
backupSchedules:
- name: backup-schedule
  schedule: "0 2 * * *"
  deleteBackupData: true
  backupProfileName: dump-instance
  enabled: true
serverConfig:
  mycnf: |
    [mysqld]
    default-time-zone = 'Europe/Kiev'
    innodb_buffer_pool_size=512M
    innodb_buffer_pool_instances=1
    innodb_log_file_size=50M
    max_connections=150
    innodb_log_buffer_size=64M
    max_allowed_packet=256M

Can I disable the replication variable group_replication_start_on_boot=OFF if I don't use it?

Are you checking something? Any results? I asked if we can disable replication.