Bug #104432 Stuck on mysql_cond_wait in get_table_share for non-existent tables
Submitted: 28 Jul 2021 3:03 Modified: 25 Aug 2021 2:28
Reporter: Fangxin Flou (OCA)
Status: Verified
Category: MySQL Server: DML
Severity: S3 (Non-critical)
Version: 8.0.22, 8.0.26
OS: Any
CPU Architecture: Any

[28 Jul 2021 3:03] Fangxin Flou
Description:
I started 200 client sessions that all access a non-existent table and found that they run very slowly.

For a non-existent table the share is never opened successfully. While one session is trying to open it, m_open_in_progress stays true, so every other session that finds the share in the cache waits on COND_open instead of returning the error:

    share = it->second.get();
    if (!share->m_open_in_progress)  /* share->error should be checked here to avoid waiting on COND_open */
      return process_found_table_share(thd, share, open_view);

    DEBUG_SYNC(thd, "get_share_before_COND_open_wait");
    thd_wait_begin(thd, THD_WAIT_TABLE_LOCK);
    mysql_cond_wait(&COND_open, &LOCK_open);
    thd_wait_end(thd);

How to repeat:
Start 200 client sessions that concurrently query a non-existent table.

Suggested fix:
Check share->error before waiting on COND_open. Replace

    if (!share->m_open_in_progress)
      return process_found_table_share(thd, share, open_view);

by 

    if (share->error || !share->m_open_in_progress)
      return process_found_table_share(thd, share, open_view);
[28 Jul 2021 4:55] Fangxin Flou
Suggested fix code:

diff --git a/sql/sql_base.cc b/sql/sql_base.cc
index 2bf1ae61e11..40c2c25e8d3 100644
--- a/sql/sql_base.cc
+++ b/sql/sql_base.cc
@@ -704,7 +704,7 @@ TABLE_SHARE *get_table_share(THD *thd, const char *db, const char *table_name,
       continue;
     }
     share = it->second.get();
-    if (!share->m_open_in_progress)
+    if (share->error || !share->m_open_in_progress)
       return process_found_table_share(thd, share, open_view);

     DEBUG_SYNC(thd, "get_share_before_COND_open_wait");
@@ -844,6 +844,8 @@ TABLE_SHARE *get_table_share(THD *thd, const char *db, const char *table_name,
   */
   if (open_table_err) {
     share->error = true;  // Allow waiters to detect the error
+    mysql_mutex_unlock(&LOCK_open);
+    mysql_mutex_lock(&LOCK_open);
     share->decrement_ref_count();
     table_def_cache->erase(to_string(share->table_cache_key));
 #if defined(ENABLED_DEBUG_SYNC)
[28 Jul 2021 5:12] Fangxin Flou
Suggested fix code.

Attachment: bugfix_104432.log (application/octet-stream, text), 1.04 KiB.

[28 Jul 2021 5:13] Fangxin Flou
With the above fix, the QPS of the failing queries improved from less than 1,000 to 100k.
[28 Jul 2021 10:37] MySQL Verification Team
Hello Fangxin Flou,

Thank you for the report and contribution.
Please re-send the patch via the "Contribution" tab; otherwise we will not be able to accept it.

Thanks,
Umesh
[28 Jul 2021 12:42] Fangxin Flou
Suggested fix code.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bugfix_104432.log (application/octet-stream, text), 1.04 KiB.

[23 Aug 2021 13:56] Ståle Deraas
Posted by developer:
 
Hi Fangxin Flou,

Thank you for your contribution! However, we think we need to study alternative approaches to the problem that avoid introducing sleeps in the code. You are also welcome to contribute another patch for this issue that does not use sleeps.
[25 Aug 2021 2:28] Fangxin Flou
Yes, a sleep does not look like a good approach.

I am still looking for a better solution.