Bug #107991 Clone_persist_gtid causes memory leak
Submitted: 27 Jul 2022 2:02
Modified: 23 Oct 2023 8:31
Reporter: Baolin Huang
Status: Verified
Category: MySQL Server: Replication
Severity: S2 (Serious)
Version: 8.0.25
OS: Any
Assigned to:
CPU Architecture: Any
Tags: Clone_persist_gtid

[27 Jul 2022 2:02] Baolin Huang
Description:
We found slow memory leaks after instances were running for a long time.

The memory allocated by our instance over an 8-hour period, captured with jemalloc profiling, is shown in the attached w_diff.pdf file.

Code analysis:
In MySQL 8.0, the Clone_persist_gtid thread periodically (every 100 ms) saves GTIDs to the gtid_executed table.

On each pass, the Clone_persist_gtid thread allocates memory from thd->mem_root in lock_tables().

The call path is:
Gtid_table_persistor::save
  -> Gtid_table_access_context::init
    -> System_table_access::open_table
      -> open_n_lock_single_table
        -> open_and_lock_tables
          -> lock_tables

sql_base.cc
```
bool lock_tables(THD *thd, TABLE_LIST *tables, uint count, uint flags) {
  ...
  if (!thd->locked_tables_mode) {
    ...
    /* Allocated from thd->mem_root on every call. */
    if (!(ptr = start = (TABLE **)thd->alloc(sizeof(TABLE *) * count)))
      return true;
    ...
```

Because this THD is never closed while the server is running, memory allocated from its mem_root is never freed.
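
The pattern can be illustrated outside the server with a minimal sketch. `ToyArena` is a hypothetical stand-in for an arena allocator in the style of mem_root, not MySQL's actual implementation, and `toy_lock_tables` and `leaked_after` are illustrative names that mimic only the single `thd->alloc(sizeof(TABLE *) * count)` call shown above:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an arena allocator: individual allocations are
// never freed; memory is only released when the whole arena is cleared.
class ToyArena {
 public:
  void *alloc(std::size_t bytes) {
    blocks_.emplace_back(new unsigned char[bytes]);
    used_ += bytes;
    return blocks_.back().get();
  }
  void clear() {
    blocks_.clear();
    used_ = 0;
  }
  std::size_t used() const { return used_; }

 private:
  std::vector<std::unique_ptr<unsigned char[]>> blocks_;
  std::size_t used_ = 0;
};

// Mimics the allocation in lock_tables(): an array of `count` pointers.
// Returns true on failure, following the convention of the original code.
bool toy_lock_tables(ToyArena &arena, unsigned count) {
  return arena.alloc(sizeof(void *) * count) == nullptr;
}

// Simulates `iterations` wake-ups of a long-lived background thread whose
// arena is never cleared, and returns the bytes still held afterwards.
std::size_t leaked_after(unsigned iterations) {
  ToyArena arena;  // lives as long as the simulated thread
  for (unsigned i = 0; i < iterations; ++i) toy_lock_tables(arena, 1);
  return arena.used();
}
```

In this sketch the bytes held grow linearly with the number of iterations. If the thread wakes every 100 ms, that is 864,000 iterations per day, so even a pointer-sized allocation per wake-up accumulates steadily.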

How to repeat:
1. Reduce the value of s_time_threshold_ms so that Clone_persist_gtid runs more frequently:

--- a/storage/innobase/include/clone0repl.h
+++ b/storage/innobase/include/clone0repl.h
@@ -324,7 +324,7 @@ class Clone_persist_gtid {
  private:
   /** Time threshold to trigger persisting GTID. Insert GTID once per 1k
   transactions or every 100 millisecond. */
-  const static uint32_t s_time_threshold_ms = 100;
+  const static uint32_t s_time_threshold_ms = 1;
 
   /** Threshold for the count for compressing GTID. */
   const static uint32_t s_compression_threshold = 50;

2. Create a stored procedure that continuously executes transactions.
```
CREATE TABLE t (
  id int NOT NULL AUTO_INCREMENT,
  col1 int unsigned NOT NULL,
  col2 tinyint(1) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

delimiter |;
CREATE PROCEDURE insert_1()
BEGIN
    ins: WHILE 1 DO
       insert into t(col1,col2) values(rand()*1000,111);
       select sleep(0.01);
    END WHILE;
END |

delimiter ;|

call insert_1();
```

3. Use jemalloc profiling to print memory allocations after a day.

Suggested fix:
Release this memory, which is only used temporarily.
[27 Jul 2022 2:02] Baolin Huang
jemalloc profiling over 8 hours

Attachment: w_diff.pdf (application/pdf, text), 20.04 KiB.

[2 Aug 2022 5:57] MySQL Verification Team
Hello Baolin Huang,

Thank you for the report and test case.

regards,
Umesh
[28 Oct 2022 1:26] Baolin Huang
Not sure if there are plans to fix this problem.

I think it can be fixed like this:

Author: baolin.hbl <baolin.hbl@alibaba-inc.com>
Date:   Fri Oct 28 09:18:46 2022 +0800

    [Bugfix] [Aone#44846897] Clone_persist_gtid causes memory leak
    

diff --git a/storage/innobase/clone/clone0repl.cc b/storage/innobase/clone/clone0repl.cc
index 1d22a6814f4..db3488521e7 100644
--- a/storage/innobase/clone/clone0repl.cc
+++ b/storage/innobase/clone/clone0repl.cc
@@ -607,6 +607,7 @@ void Clone_persist_gtid::periodic_write() {
     os_event_reset(m_event);
     /* Write accumulated GTIDs to disk table */
     flush_gtids(thd);
+    thd->mem_root->ClearForReuse();
   }
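
The effect of clearing the arena after each flush can be sketched in isolation. Everything below is illustrative: `SketchArena`, `clear_for_reuse`, `sketch_flush`, and `peak_usage` are hypothetical names modeling the idea of the patch, not MySQL's mem_root or `flush_gtids`:

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical arena: memory is released only when the arena is cleared.
class SketchArena {
 public:
  void *alloc(std::size_t bytes) {
    blocks_.emplace_back(new unsigned char[bytes]);
    used_ += bytes;
    return blocks_.back().get();
  }
  // Illustrative analogue of clearing the arena so it can be reused.
  void clear_for_reuse() {
    blocks_.clear();
    used_ = 0;
  }
  std::size_t used() const { return used_; }

 private:
  std::vector<std::unique_ptr<unsigned char[]>> blocks_;
  std::size_t used_ = 0;
};

// Hypothetical stand-in for one flush: allocates pointer-sized scratch memory.
void sketch_flush(SketchArena &arena) { arena.alloc(sizeof(void *)); }

// Runs the periodic loop and returns the peak number of bytes held.
std::size_t peak_usage(unsigned iterations, bool clear_each_time) {
  SketchArena arena;
  std::size_t peak = 0;
  for (unsigned i = 0; i < iterations; ++i) {
    sketch_flush(arena);
    peak = std::max(peak, arena.used());
    if (clear_each_time) arena.clear_for_reuse();  // mirrors the added line
  }
  return peak;
}
```

With clearing enabled, the peak in this sketch stays at the size of a single flush's scratch memory regardless of how many iterations run; without it, the peak grows with the iteration count.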
[12 Jan 2023 3:35] Baolin Huang
Clear mem_root after each flush_gtids

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug_107991_clone_persist_gtid_memleak.txt (text/plain), 993 bytes.

[23 Oct 2023 8:31] Baolin Huang
Changed category to Replication.