Bug #96188 memory leak detected from MDL locking in read_histograms
Submitted: 12 Jul 2019 13:12 Modified: 22 Aug 2019 13:00
Reporter: andy zhang
Status: Unsupported
Category: MySQL Server Severity: S3 (Non-critical)
Version: 8.0 OS: Any
Assigned to: CPU Architecture: Any

[12 Jul 2019 13:12] andy zhang
Description:
In our MTR regression runs with ASAN enabled, there are several memory-leak complaints, such as the sample stacks below:

These failures are not stable, and I have failed to reproduce them manually by running a smaller set of test cases.

What they have in common is that the MDL locks are acquired by read_histograms. Those locks have STATEMENT duration and rely on the callers to release them, and I suspect that in some error scenarios the callers fail to do so. I would appreciate your help in identifying those scenarios.

I think it is safe for read_histograms to use EXPLICIT-duration MDL locks instead and to release them inside the method once the histogram copy has been obtained, so that the function becomes self-contained from the MDL locking perspective.

Please see my suggested patch in the "Suggested fix" section.
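
For illustration, here is a minimal sketch (not the actual server code) of the EXPLICIT-duration pattern the patch proposes. It uses only the MDL calls already shown in the patch and the stacks below, and it assumes `thd` and `mdl_key` have been set up as in the patch; the key construction and histogram copying are abbreviated:

    // Sketch only: acquire a shared-read MDL lock with EXPLICIT duration and
    // release it inside the same function, so no caller has to drop it at
    // statement end (which is what STATEMENT duration relies on).
    MDL_request request;
    MDL_REQUEST_INIT_BY_KEY(&request, &mdl_key, MDL_SHARED_READ, MDL_EXPLICIT);

    if (thd->mdl_context.acquire_lock(&request,
                                      thd->variables.lock_wait_timeout))
      return true;  // acquisition failed; there is no ticket to release

    /* ... look up the histogram and copy it into share->mem_root ... */

    // EXPLICIT tickets are not released automatically, so drop the lock here
    // before returning.
    thd->mdl_context.release_lock(request.ticket);
    return false;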

==75722==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 672 byte(s) in 1 object(s) allocated from:
    #0 0x7f046fd288a0 in malloc (/lib64/libasan.so.4+0xde8a0)
    #1 0x4e50033 in my_raw_malloc /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/mysys/my_malloc.cc:199
    #2 0x4e50189 in my_malloc(unsigned int, unsigned long, int) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/mysys/my_malloc.cc:81
    #3 0x4e19cd8 in lf_alloc_new(LF_PINS*) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/mysys/lf_alloc-pin.cc:448
    #4 0x4e1bef1 in lf_hash_insert(LF_HASH*, LF_PINS*, void const*) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/mysys/lf_hash.cc:497
    #5 0x33d42a2 in MDL_map::find_or_insert(LF_PINS*, MDL_key const*, bool*) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/mdl.cc:1242
    #6 0x33cf274 in MDL_context::try_acquire_lock_impl(MDL_request*, MDL_ticket**) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/mdl.cc:3035
    #7 0x33d0a04 in MDL_context::acquire_lock(MDL_request*, unsigned long) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/mdl.cc:3551
    #8 0x33d2757 in MDL_context::acquire_locks(I_P_List<MDL_request, I_P_List_adapter<MDL_request, &MDL_request::next_in_list, &MDL_request::prev_in_list>, I_P_List_counter, I_P_List_no_push_back<MDL_request> >*, unsigned long) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/mdl.cc:3843
    #9 0x29909e3 in read_histograms /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_base.cc:580
    #10 0x2992852 in get_table_share(THD*, char const*, char const*, char const*, unsigned long, bool, bool) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_base.cc:792
    #11 0x2992e3b in get_table_share_with_discover /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_base.cc:863
    #12 0x2994efe in open_table(THD*, TABLE_LIST*, Open_table_context*) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_base.cc:3186
    #13 0x29972db in open_and_process_table /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_base.cc:5028
    #14 0x299bf09 in open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_base.cc:5671
    #15 0x299d53d in open_tables_for_query(THD*, TABLE_LIST*, unsigned int) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_base.cc:6548
    #16 0x2cc6646 in mysql_create_view(THD*, TABLE_LIST*, enum_view_create_mode) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_view.cc:479
    #17 0x2ac77df in mysql_execute_command(THD*, bool) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_parse.cc:4525
    #18 0x2ac9f66 in mysql_parse(THD*, Parser_state*, bool) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_parse.cc:5358
    #19 0x2acc82e in dispatch_command(THD*, COM_DATA const*, enum_server_command) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_parse.cc:1780
    #20 0x2acf29e in do_command(THD*) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_parse.cc:1274
    #21 0x2dcb4fa in handle_connection /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/conn_handler/connection_handler_per_thread.cc:316
    #22 0x5a0cc22 in pfs_spawn_thread /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/storage/perfschema/pfs.cc:2836
    #23 0x7f046fa35e24 in start_thread (/lib64/libpthread.so.0+0x7e24)

Or

Direct leak of 672 byte(s) in 1 object(s) allocated from:
    #0 0x7fbf6d0fb8a0 in malloc (/lib64/libasan.so.4+0xde8a0)
    #1 0x4e4012f in my_raw_malloc /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/mysys/my_malloc.cc:199
    #2 0x4e40285 in my_malloc(unsigned int, unsigned long, int) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/mysys/my_malloc.cc:81
    #3 0x4e0990c in lf_alloc_new(LF_PINS*) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/mysys/lf_alloc-pin.cc:448
    #4 0x4e0bba5 in lf_hash_insert(LF_HASH*, LF_PINS*, void*) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/mysys/lf_hash.cc:502
    #5 0x33c3e70 in MDL_map::find_or_insert(LF_PINS*, MDL_key const*, bool*) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/mdl.cc:1241
    #6 0x33bf59a in MDL_context::try_acquire_lock_impl(MDL_request*, MDL_ticket**) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/mdl.cc:2967
    #7 0x33c07d4 in MDL_context::acquire_lock(MDL_request*, unsigned long) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/mdl.cc:3506
    #8 0x33c23e9 in MDL_context::acquire_locks(I_P_List<MDL_request, I_P_List_adapter<MDL_request, &MDL_request::next_in_list, &MDL_request::prev_in_list>, I_P_List_counter, I_P_List_no_push_back<MDL_request> >*, unsigned long) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/mdl.cc:3794
    #9 0x299ca01 in read_histograms /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_base.cc:580
    #10 0x299e870 in get_table_share(THD*, char const*, char const*, char const*, unsigned long, bool, bool) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_base.cc:792
    #11 0x299ee59 in get_table_share_with_discover /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_base.cc:863
    #12 0x29a0f1c in open_table(THD*, TABLE_LIST*, Open_table_context*) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_base.cc:3191
    #13 0x29a32f9 in open_and_process_table /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_base.cc:5033
    #14 0x29a7f1d in open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_base.cc:5676
    #15 0x29a9551 in open_tables_for_query(THD*, TABLE_LIST*, unsigned int) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_base.cc:6553
    #16 0x2b94ec3 in Sql_cmd_dml::prepare(THD*) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_select.cc:386
    #17 0x2bb281f in Sql_cmd_dml::execute(THD*) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_select.cc:535
    #18 0x2ad51b4 in mysql_execute_command(THD*, bool) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_parse.cc:4644
    #19 0x2ad77b5 in mysql_parse(THD*, Parser_state*, bool) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_parse.cc:5396
    #20 0x2ada07d in dispatch_command(THD*, COM_DATA const*, enum_server_command) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_parse.cc:1794
    #21 0x2adcaf2 in do_command(THD*) /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/sql_parse.cc:1288
    #22 0x2dda6c2 in handle_connection /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/sql/conn_handler/connection_handler_per_thread.cc:316
    #23 0x5a061f8 in pfs_spawn_thread /u01/jenkins-slave/workspace/PolarDB80_PQ_ASAN/storage/perfschema/pfs.cc:2836
    #24 0x7fbf6ce08e24 in start_thread (/lib64/libpthread.so.0+0x7e24)

How to repeat:
Run the MTR regression suite in a batch with ASAN enabled; the failures are intermittent.

Suggested fix:
static bool read_histograms(THD *thd, TABLE_SHARE *share,
                                           column->name(), &mdl_key);
 
     MDL_request *request = new (thd->mem_root) MDL_request;
-    MDL_REQUEST_INIT_BY_KEY(request, &mdl_key, MDL_SHARED_READ, MDL_STATEMENT);
+    MDL_REQUEST_INIT_BY_KEY(request, &mdl_key, MDL_SHARED_READ, MDL_EXPLICIT);
     mdl_requests.push_front(request);
   }
 
   if (thd->mdl_context.acquire_locks(&mdl_requests,
                                      thd->variables.lock_wait_timeout))
-    return true; /* purecov: deadcode */
-
-  for (const auto column : table_def->columns()) {
-    if (column->is_se_hidden()) continue;
+    error_hit = true; /* purecov: deadcode */
+  else {
+    for (const auto column : table_def->columns()) {
+      if (column->is_se_hidden()) continue;
+
+      const histograms::Histogram *histogram = nullptr;
+      if (histograms::find_histogram(thd, schema->name().c_str(),
+                                     table_def->name().c_str(),
+                                     column->name().c_str(), &histogram)) {
+        // Any error is reported by the dictionary subsystem.
+        error_hit = true; /* purecov: deadcode */
+        break;
+      }
 
-    const histograms::Histogram *histogram = nullptr;
-    if (histograms::find_histogram(thd, schema->name().c_str(),
-                                   table_def->name().c_str(),
-                                   column->name().c_str(), &histogram)) {
-      // Any error is reported by the dictionary subsystem.
-      return true; /* purecov: deadcode */
+      if (histogram != nullptr) {
+        /*
+          Make a clone of the histogram so it survives together with the
+          TABLE_SHARE in case the original histogram is thrown out of the
+          dictionary cache.
+        */
+        const histograms::Histogram *histogram_copy =
+            histogram->clone(&share->mem_root);
+        share->m_histograms->emplace(column->ordinal_position() - 1,
+                                     histogram_copy);
+      }
     }
+  }
 
-    if (histogram != nullptr) {
-      /*
-        Make a clone of the histogram so it survives together with the
-        TABLE_SHARE in case the original histogram is thrown out of the
-        dictionary cache.
-      */
-      const histograms::Histogram *histogram_copy =
-          histogram->clone(&share->mem_root);
-      share->m_histograms->emplace(column->ordinal_position() - 1,
-                                   histogram_copy);
-    }
+  // Release the MDL locks that were acquired above.
+  MDL_request_list::Iterator it(mdl_requests);
+
+  MDL_request *mdl_request = nullptr;
+  while ((mdl_request = it++) != nullptr && mdl_request->ticket != nullptr) {
+    thd->mdl_context.release_lock(mdl_request->ticket);
   }
 
-  return false;
+  return error_hit;
 }
[12 Jul 2019 13:45] MySQL Verification Team
Hi Mr. Zhang,

Thank you for your bug report.

Also, thank you for your patch. However, we cannot accept it without a fully reproducible test case or, if you prefer, a full scenario.

After we receive a test case from you, we shall run it with memory testing turned on. If our own testing shows leaks, we will be happy to verify this bug.

Thanks in advance.
[15 Jul 2019 7:14] andy zhang
Sinisa,

Thanks for your quick response.

I admit it's not so easy to make a handy reproducible scenario, for me or for you, at present.

May I ask you to treat my patch as an optimization instead, that is, as a way to reduce the MDL locking duration for histogram reading?
[15 Jul 2019 12:47] MySQL Verification Team
Hi Mr. Zhang,

I will leave this bug for the developers to decide whether it makes sense or not. However, we cannot accept patches from the community without the OCA agreement.

In order to submit contributions you must first sign the Oracle Contribution Agreement (OCA).
For additional information please check http://www.oracle.com/technetwork/community/oca-486395.html.
If you have any questions, please contact the MySQL community team at https://dev.mysql.com/community/
[16 Aug 2019 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[23 Nov 2023 2:06] Fan Lyu
We hit the same leak stack on 8.0.28, and I believe it has not been fixed yet, since there is no related release note in later versions.

It's not easy to reproduce in our daily pipeline.

Direct leak of 688 byte(s) in 1 object(s) allocated from:
    #0 0x7f97d084be46 in __interceptor_malloc (/ci/workspace/hwsql/debug/binary/bin/../lib/libasan.so.6+0x109e46)
    #1 0x55c2c5e2e850 in my_raw_malloc /ci/workspace/hwsql/mysys/my_malloc.cc:206
    #2 0x55c2c5e2e96f in my_malloc(unsigned int, unsigned long, int) /ci/workspace/hwsql/mysys/my_malloc.cc:82
    #3 0x55c2c5df3e78 in lf_alloc_new(LF_PINS*) /ci/workspace/hwsql/mysys/lf_alloc-pin.cc:449
    #4 0x55c2c5df5f13 in lf_hash_insert(LF_HASH*, LF_PINS*, void const*) /ci/workspace/hwsql/mysys/lf_hash.cc:491
    #5 0x55c2c415084e in MDL_map::find_or_insert(LF_PINS*, MDL_key const*, bool*) /ci/workspace/hwsql/sql/mdl.cc:1344
    #6 0x55c2c414c6bf in MDL_context::try_acquire_lock_impl(MDL_request*, MDL_ticket**) /ci/workspace/hwsql/sql/mdl.cc:3016
    #7 0x55c2c414d3bc in MDL_context::acquire_lock(MDL_request*, unsigned long) /ci/workspace/hwsql/sql/mdl.cc:3487
    #8 0x55c2c414eaa6 in MDL_context::acquire_locks(I_P_List<MDL_request, I_P_List_adapter<MDL_request, &MDL_request::next_in_list, &MDL_request::prev_in_list>, I_P_List_counter, I_P_List_no_push_back<MDL_request> >*, unsigned long) /ci/workspace/hwsql/sql/mdl.cc:3755
    #9 0x55c2c34bb53a in read_histograms /ci/workspace/hwsql/sql/sql_base.cc:587
    #10 0x55c2c34bd0a3 in get_table_share(THD*, char const*, char const*, char const*, unsigned long, bool, bool) /ci/workspace/hwsql/sql/sql_base.cc:817
    #11 0x55c2c34c0b4c in get_table_share_with_discover /ci/workspace/hwsql/sql/sql_base.cc:888
    #12 0x55c2c34cbbd8 in open_table(THD*, TABLE_LIST*, Open_table_context*) /ci/workspace/hwsql/sql/sql_base.cc:3221
    #13 0x55c2c34ceed4 in open_and_process_table /ci/workspace/hwsql/sql/sql_base.cc:5088
    #14 0x55c2c34cfc79 in open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) /ci/workspace/hwsql/sql/sql_base.cc:5910
    #15 0x55c2c34d17b2 in open_and_lock_tables(THD*, TABLE_LIST*, unsigned int, Prelocking_strategy*) /ci/workspace/hwsql/sql/sql_base.cc:6641
    #16 0x55c2c33dab40 in open_and_lock_tables(THD*, TABLE_LIST*, unsigned int) (/ci/workspace/hwsql/debug/binary/bin/mysqld+0x3d6ab40)
    #17 0x55c2c3b19c41 in check_engine_type_for_acl_table(THD*, bool) /ci/workspace/hwsql/sql/auth/sql_auth_cache.cc:2018
    #18 0x55c2c3b1b3a1 in acl_init(bool) /ci/workspace/hwsql/sql/auth/sql_auth_cache.cc:1777
    #19 0x55c2c33b89ce in mysqld_main(int, char**) /ci/workspace/hwsql/sql/mysqld.cc:7862
    #20 0x55c2c339022d in main /ci/workspace/hwsql/sql/main.cc:25
    #21 0x7f97cf483504 in __libc_start_main (/lib64/libc.so.6+0x22504)
[23 Nov 2023 10:54] MySQL Verification Team
Hi,

This is a forum for bug reports with repeatable test cases. 

It can be run in some memory checking software, but it still has to be a test case.

Also, make sure that you are using a binary from our download site at dev.mysql.com.
[19 Nov 2024 3:16] alex xing
I also encountered this problem with MySQL 8.0.26.