Bug #35714 several rpl tests fail with --thread-handling=pool-of-threads
Submitted: 31 Mar 2008 18:57 Modified: 27 May 2008 8:54
Reporter: Gleb Shchepa
Status: Duplicate
Category:MySQL Server Severity:S3 (Non-critical)
Version:6.0 OS:Any (64bit)
Assigned to: Gleb Shchepa CPU Architecture:Any
Triage: D3 (Medium)

[31 Mar 2008 18:57] Gleb Shchepa
Description:
For example:

rpl.rpl_truncate_2myisam 'stmt' [ fail ]

mysqltest: In included file "./extra/rpl_tests/rpl_truncate_helper.test": At line 21: query 'INSERT INTO t1 VALUES (1,1), (2,2)' failed: 2013: Lost connection to MySQL server during query
...

etc.

There is a race condition between the thread that calls THD::awake() and the thread that calls thd_scheduler::thread_detach():

The THD::awake() function does:

(gdb) t 20
[Switching to thread 20 (Thread 0x417d1960 (LWP 25073))]#0  0x00000000006cc1e0 in THD::awake (this=0x15a5438, 
    state_to_set=THD::KILL_QUERY) at sql_class.cc:1041
1041        DBUG_ASSERT(mysys_var);
(gdb) l 1016
1011
1012          close_active_vio();
1013        }
1014    #endif    
1015      }
1016      if (mysys_var)
1017      {
1018        pthread_mutex_lock(&mysys_var->mutex);
1019        if (!system_thread)         // Don't abort locks
1020          mysys_var->abort=1;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

and then this thread tries to read the contents of mysys_var:

(gdb) p &mysys_var
$30 = (st_my_thread_var **) 0x15a63d0

After that, another thread (#22) does:

(gdb) thread 22
[Switching to thread 22 (Thread 0x41853960 (LWP 25075))]#0  thd_scheduler::thread_detach (this=0x15a6d98) at scheduler.cc:217
217         thread_attached= FALSE;
(gdb) list
213       if (thread_attached)
214       {
215         THD* thd = (THD*)list.data;
216         thd->mysys_var= NULL;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
217         thread_attached= FALSE;
218     #ifndef DBUG_OFF
219         swap_dbug_explain();
220     #endif
221       }
(gdb) p &thd->mysys_var
$29 = (st_my_thread_var **) 0x15a63d0

Then thread #20 tries to access the contents of mysys_var, which is now a NULL pointer:

1040        if (mysys_var->current_cond && mysys_var->current_mutex)
                ^^^^^^^^^^^^^^^^^^^^^^^    ^^^^^^^^^^^^^^^^^^^^^^^^
1041        {
1042          pthread_mutex_lock(mysys_var->current_mutex);
                                 ^^^^^^^^^^^^^^^^^^^^^^^^
1043          pthread_cond_broadcast(mysys_var->current_cond);
                                     ^^^^^^^^^^^^^^^^^^^^^^^
1044          pthread_mutex_unlock(mysys_var->current_mutex);
                                   ^^^^^^^^^^^^^^^^^^^^^^^^
1045        }
1046        pthread_mutex_unlock(&mysys_var->mutex);
                                  ^^^^^^^^^^^^^^^^
1047      }

This raises SIGSEGV at one of those places.
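
The interleaving above can be replayed deterministically in a small standalone sketch. This is not server code: ThreadVar, Session, Outcome, awake_rereading() and detach() are hypothetical stand-ins for st_my_thread_var, THD, THD::awake() and thd_scheduler::thread_detach(), and the `between` callback only marks the point where the other thread could be scheduled.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for st_my_thread_var and THD; not the server's types.
struct ThreadVar { int abort = 0; };
struct Session { ThreadVar* mysys_var = nullptr; };

enum class Outcome { NothingToDo, Signalled, WouldCrash };

// Same shape as the unpatched code: the member is tested once and then read
// again for each later use. `between` runs where another thread could be
// scheduled between the check and the use.
Outcome awake_rereading(Session& s, const std::function<void()>& between) {
  if (!s.mysys_var) return Outcome::NothingToDo;  // the check
  between();                                      // thread_detach() may run here
  if (!s.mysys_var) return Outcome::WouldCrash;   // real code dereferences NULL here
  s.mysys_var->abort = 1;                         // the use
  return Outcome::Signalled;
}

// Same step as the assignment shown in the thread_detach() listing.
void detach(Session& s) { s.mysys_var = nullptr; }

// Replays both orders: no interference, then detach between check and use.
void replay_interleaving() {
  ThreadVar tv;
  Session s;
  s.mysys_var = &tv;
  assert(awake_rereading(s, [] {}) == Outcome::Signalled);
  s.mysys_var = &tv;
  assert(awake_rereading(s, [&] { detach(s); }) == Outcome::WouldCrash);
}
```

The sketch returns WouldCrash instead of dereferencing, so it only shows that the check and the use can see different values of the same field.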

How to repeat:
./mysql-test-run.pl --mysqld=--thread-handling=pool-of-threads rpl.rpl_truncate_2myisam

Suggested fix:
Just save mysys_var in a local pointer:

===== sql_class.cc 1.374 vs edited =====
--- 1.374/sql/sql_class.cc      2008-03-31 20:03:50 +02:00
+++ edited/sql_class.cc 2008-03-31 20:03:46 +02:00
@@ -1013,11 +1013,12 @@ void THD::awake(THD::killed_state state_
     }
 #endif    
   }
-  if (mysys_var)
+  st_my_thread_var* save_mysys_var= mysys_var;
+  if (save_mysys_var)
   {
-    pthread_mutex_lock(&mysys_var->mutex);
+    pthread_mutex_lock(&save_mysys_var->mutex);
     if (!system_thread)                // Don't abort locks
-      mysys_var->abort=1;
+      save_mysys_var->abort=1;
     /*
       This broadcast could be up in the air if the victim thread
       exits the cond in the time between read and broadcast, but that is
@@ -1037,13 +1038,13 @@ void THD::awake(THD::killed_state state_
       It's true that we have set its thd->killed but it may not
       see it immediately and so may have time to reach the cond_wait().
     */
-    if (mysys_var->current_cond && mysys_var->current_mutex)
+    if (save_mysys_var->current_cond && save_mysys_var->current_mutex)
     {
-      pthread_mutex_lock(mysys_var->current_mutex);
-      pthread_cond_broadcast(mysys_var->current_cond);
-      pthread_mutex_unlock(mysys_var->current_mutex);
+      pthread_mutex_lock(save_mysys_var->current_mutex);
+      pthread_cond_broadcast(save_mysys_var->current_cond);
+      pthread_mutex_unlock(save_mysys_var->current_mutex);
     }
-    pthread_mutex_unlock(&mysys_var->mutex);
+    pthread_mutex_unlock(&save_mysys_var->mutex);
   }
   DBUG_VOID_RETURN;
 }
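
The idea of the patch, reduced to a standalone sketch. ThreadVar, Session and awake_with_snapshot() are hypothetical names, not the server's. The use of std::atomic and std::mutex is an assumption made for the sketch (the patch itself keeps a plain pointer and pthread mutexes), and the `between` hook only marks where a concurrent detach could run.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct ThreadVar {                 // hypothetical stand-in for st_my_thread_var
  std::mutex mutex;
  int abort = 0;
};

struct Session {                   // hypothetical stand-in for THD
  std::atomic<ThreadVar*> mysys_var{nullptr};  // atomic is an assumption of this sketch
  bool system_thread = false;
};

// Reads the shared field once and works only with the local copy afterwards,
// so a later reset of the field cannot turn the pointer into NULL mid-way.
// Returns true if the victim was signalled.
template <class Hook>
bool awake_with_snapshot(Session& s, Hook between) {
  ThreadVar* save_mysys_var = s.mysys_var.load();  // single read
  if (!save_mysys_var) return false;
  between();                                       // a concurrent detach may run here
  std::lock_guard<std::mutex> guard(save_mysys_var->mutex);
  if (!s.system_thread) save_mysys_var->abort = 1; // don't abort locks of system threads
  return true;
}
```

Note that a local copy only protects against the field being reset to NULL. It does not keep the pointed-to object alive, so this is safe only as long as the structure outlives the call.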
[23 May 2008 19:48] Andrei Elkin
Bug #36929 offers a patch for the current problem.
[27 May 2008 8:54] Andrei Elkin
After talking to Gleb and getting positive feedback on the patch for Bug #36929, the latter is set as the parent of the current bug.