Bug #38522 5 seconds delay when closing application using embedded server
Submitted: 1 Aug 2008 19:01 Modified: 8 Dec 2008 23:43
Reporter: Jonathan Mcdougall Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Embedded Library ( libmysqld ) Severity:S2 (Serious)
Version:5.1.26-rc OS:Microsoft Windows
Assigned to: Vladislav Vaintroub CPU Architecture:Any
Triage: D3 (Medium)

[1 Aug 2008 19:01] Jonathan Mcdougall
Description:
Note: I am using Visual C++ 2005 SP1.

Since the embedded server seems to be bundled with no current release, I
downloaded the source for 5.1.26-rc and compiled it with WITH_EMBEDDED_SERVER.
My application hangs for 5 seconds inside mysql_library_end(). After spending
way too much time on this, here is what I found.

The delay is in pthread_cond_timedwait() (my_wincond.c:75). The call stack is as
follow:

libmysqld.dll!pthread_cond_timedwait()
libmysqld.dll!my_thread_global_end()
libmysqld.dll!my_end(int infoflag=4)
libmysqld.dll!mysql_server_end()

mysql_server_end() is called from my code when the application is closed.

pthread_cond_timedwait() waits on THR_COND_threads which is signaled in
my_thread_end() for each thread. The delay therefore happens when one or more
threads does not call my_thread_end(), which is the case here.

That baffled me at first because my application is currently single-threaded
(although compiled against the multi-threaded runtime library). I put a
breakpoint on my_thr_init.c:296, inside my_thread_init():

  pthread_mutex_lock(&THR_LOCK_threads);
  tmp->id= ++thread_id;
  ++THR_thread_count;                        // <- breakpoint
  pthread_mutex_unlock(&THR_LOCK_threads);

And sure enough, this is called twice. The first call comes from my application:

libmysqld.dll!my_thread_init()
libmysqld.dll!my_thread_global_init()
libmysqld.dll!my_init()
libmysqld.dll!mysql_server_init()

mysql_server_init() is called from my code when the application starts.

The second call comes from nowhere as it seems:

libmysqld.dll!my_thread_init()
libmysqld.dll!pthread_start()
libmysqld.dll!_callthreadstart()
libmysqld.dll!_threadstart()
kernel32.dll!7c80b683()

Under this, the debugger reports the usual "Frames below may be incorrect and/or
missing, no symbols loaded for kernel32.dll" and the trace ends there. I put a
breakpoint at the beginning _threadstart() hoping that the main thread would not
have had time to move too far from the creation point, which proved a good idea.
The thread creation comes from init_embedded_server() in lib_sql.cc:535 :

  if (flush_time && flush_time != ~(ulong) 0L)
  {
    pthread_t hThread;
    if (pthread_create(&hThread,&connection_attrib,handle_manager,0))
      sql_print_error("Warning: Can't create thread to manage maintenance");
  }

I then started looking at handle_manager() just to find that it was stuck
waiting in a condition in sql_manager.cc:91 :

  while (!manager_status && (!error || error == EINTR) && !abort_loop)
    error= pthread_cond_timedwait(&COND_manager, &LOCK_manager, &abstime);

The 'abstime' wait time here is 1800000 milliseconds, or 30 minutes. Sure
enough, it gets initialized from flush_time, which is initialized from
FLUSH_TIME that looks like:

  #define FLUSH_TIME 1800    /**< Flush every half hour */

in mysql_priv.h:385.

Two things are needed for this thread to gracefully (and quickly) exit: the
condition needs to be signaled and, since the pthread_cond_timedwait() call is
in an infinite loop, abort_loop needs to be set to 1 because of
sql_manager.cc:107 :

  if (abort_loop)
    break;

This is where it gets interesting: the only place COND_manager gets signaled is
in close_connections() (mysqld.cc:792) :

  if (manager_thread_in_use)
  {
    DBUG_PRINT("quit", ("killing manager thread: 0x%lx",
                        (ulong)manager_thread));
    (void) pthread_cond_signal(&COND_manager);
  }

However, the whole function is wrapped in #ifndef EMBEDDED_LIBRARY, meaning it
is never called. A quick global search reveals that it is called from:

free_used_memory()      mysql_test.c:930
kill_server()           mysqld.cc:1102

However, the content of kill_server() is also wrapped in #ifndef
EMBEDDED_LIBRARY. So, as far as I understand, that condition is doomed never to
be signaled.

Concerning 'abort_loop', it is set in several places:

kill_mysql()     mysqld.cc:1011  (wrapped in #ifndef EMBEDDED_LIBRARY)
kill_mysql()     mysqld.cc:1011  (wrapped in #ifndef SIGNALS_DONT_BREAK_READ,
                                  which is only defined in config-netware.h)
kill_server()    mysqld.cc:1082  (wrapped in #ifndef EMBEDDED_LIBRARY)
signal_hand()    mysqld.cc:2728  (wrapped in #if defined (HANDLE_SIGNALS) which
                                  is only defined in files from cmd-line-utils)
win_main()       mysqld.cc:4298  (wrapped in #ifndef EMBEDDED_LIBRARY)

So there are really two problems: the condition is never signaled, but even if
it was, the loop would never terminate. Therefore, the line 

  my_thread_end();

at the end of handle_manager() in sql_manager.cc:129 is never executed. This
forces mysql_server_end() to wait on this thread for 5 seconds before
abandoning it.

How to repeat:
Note: I am using Visual C++ 2005 SP1.

1) Compile 5.1.26-rc with WITH_EMBEDDED_SERVER.
2) Compile the following code and link it against libmysqld.lib

     # include <windows.h>
     # include <mysql.h>

     static char *server_options[] = {
        "mysql_test",
        "--datadir=c:/somewhere/data",
        "--language=c:/somewhere/english", 0};
    
     static char *server_groups[] = { "libmysqld_server", NULL };

     int num_elements = (sizeof(server_options) / sizeof(char *)) - 1;

     int main()
     {
        mysql_library_init(num_elements, server_options, server_groups);
  
        MYSQL* m = mysql_init(0);
        mysql_close(m);

        mysql_library_end();
     }

3) Copy libmysqld.dll that was built in 1) in the same folder as the executable.
4) Run. Note that there is a delay of 5 seconds before the application closes.

Suggested fix:
My first guess would be to signal COND_manager and set abort_loop somewhere,
perhaps in end_embedded_server(). But I may not understand the implications of
this.
[1 Aug 2008 23:02] Vladislav Vaintroub
This is unrelated to the bug itself, but the usual "Frames below might be incorrect.." can be cured if you setup symbol path to use MS public symbol server like here http://msdn.microsoft.com/en-us/library/b8ttk8zy(VS.80).aspx
[8 Aug 2008 12:55] Susanne Ebrecht
Bug #38653 is set as duplicate of this bug here.
[19 Aug 2008 15:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51978

2666 Vladislav Vaintroub	2008-08-19
      his patch fixes
      Bug#37226 Explicit call of my_thread_init() on Windows for every new thread.
      Bug#33031 app linked to libmysql.lib crash if run as service in vista under 
      localsystem
      Bug#38522 5 seconds delay when closing application using embedded server
      
      There are some problems using DllMain hook functions on Windows that 
      automatically do global and per-thread initialization for a DLL in MySQL context
      
      1)per-thread initialization(DLL_THREAD_ATTACH)
      MySQL internally counts number of active threads that and causes a delay in in 
      my_end() if not all threads are exited. But,there are threads that can be 
      started either by Windows internally (often in TCP/IP scenarios) or by user 
      himself - those threads are not necessarily using libmysql.dll functionality, 
      but nonetheless the contribute to the count of open threads.
      
      2)process-initialization (DLL_PROCESS_ATTACH)
      my_init() calls WSAStartup that itself loads DLLs and can lead to a deadlock in 
      Windows loader.
      
      
      Fix is to remove dll initialization code from libmysql.dll in general case. I
      still leave an environment variable LIBMYSQL_DLLINIT, which if set to any value 
      will cause the old behavior (DLL init hooks will be called). This env.variable 
      exists only to prevent breakage of existing Windows-only applications that 
      don't do mysql_thread_init() and work ok today. Use of LIBMYSQL_DLLINIT is 
      discouraged and it will be removed in 6.0
[19 Aug 2008 23:20] Jonathan Mcdougall
Unless I am mistaken (if I am, I apologize), this patch doesn't fix the problem: the delay is still there.

I fail to see how modifying libmysql/dll.c would fix a problem in libmysqld. In fact, dll.c does not seem to be linked with libmysqld at all. Are you sure your fix applied to the embedded server?
[20 Aug 2008 17:59] Vladislav Vaintroub
Jonathan, you're not mistaken, this does not fix your problem. The real cause of the wait is as you rightly point out is the hanging handle_manager thread.
Sorry for confusion.
[22 Aug 2008 3:55] Jonathan Mcdougall
Here is a patch that "works on my machine". I found that some debug statements
were crashing the embedded server on shutdown because they were eventually
trying to call my_thread_init(). I had to remove them also. This might be
indicating a more fundamental bug with your debug tools since they seem to be
depending on stuff that's not necessarily available while shutting down.

--- mysql-5.1.26-rc/libmysqld/lib_sql.cc Mon Jun 30 18:35:11 2008
+++ mysql-5.1.26-rc-modified/libmysqld/lib_sql.cc Thu Aug 21 23:23:10 2008
@@ -556,6 +556,12 @@
 
 void end_embedded_server()
 {
+  if (manager_thread_in_use)
+  {
+    abort_loop = 1;
+    pthread_cond_signal(&COND_manager);
+  }
+
   my_free((char*) copy_arguments_ptr, MYF(MY_ALLOW_ZERO_PTR));
   copy_arguments_ptr=0;
   clean_up(0);

   
--- mysql-5.1.26-rc/libmysql/libmysql.c Mon Jun 30 18:35:11 2008
+++ mysql-5.1.26-rc-modified/libmysql/libmysql.c Thu Aug 21 23:40:49 2008
@@ -209,7 +209,7 @@
   {
     my_end(MY_DONT_FREE_DBUG);
     /* Remove TRACING, if enabled by mysql_debug() */
-    DBUG_POP();
+    //DBUG_POP();
   }
   else
   {

   
--- mysql-5.1.26-rc/sql/sql_manager.cc Mon Jun 30 18:36:21 2008
+++ mysql-5.1.26-rc-modified/sql/sql_manager.cc Thu Aug 21 23:38:33 2008
@@ -127,5 +127,7 @@
   }
   manager_thread_in_use = 0;
   my_thread_end();
-  DBUG_RETURN(NULL);
+
+  return NULL;
+  //DBUG_RETURN(NULL);
 }
[22 Aug 2008 12:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/52311

2716 Vladislav Vaintroub	2008-08-22
      Bug#38522: 5 seconds delay when closing application using embedded server
      
      The problem here is that embedded server starts handle_thread manager 
      thread  on mysql_library_init() does not stop it on mysql_library_end().
      At shutdown, my_thread_global_end() waits for thread count to become 0,
      but since we did not stop the thread it will give up after 5 seconds.
      
      Solution is to move shutdown for handle_manager thread from kill_server()
      (mysqld specific) to clean_up() that is used by both embedded and mysqld.
      
      This patch also contains some refactorings - to avoid duplicate code,
      start_handle_manager() and stop_handle_manager() functions are introduced.
      Unused variables are eliminated. handle_manager does not rely on global
      variable abort_loop anymore to stop (abort_loop is not set for embedded).
[23 Aug 2008 2:03] Jonathan Mcdougall
There is a race condition in handle_manager() and my_end() that is hard for me
to debug. I get an access violation regularily (something like 80% of
the time) in the manager thread with this stack trace:

ntdll.dll!_RtlpWaitForCriticalSection@4()  + 0x5b bytes	
ntdll.dll!_RtlEnterCriticalSection@4()  + 0x46 bytes	
libmysqld.dll!my_thread_init()  Line 294 + 0xd bytes C
libmysqld.dll!_my_thread_var()  Line 378 C
libmysqld.dll!code_state()  Line 341 + 0x5 bytes C
libmysqld.dll!_db_return_(
    unsigned int _line_=135, const char * * _sfunc_, const char * * _sfile_,
    unsigned int * _slevel_)  Line 1042 + 0xb bytes C
libmysqld.dll!handle_manager(void * arg=0x00000000)  Line 135 + 0x16 bytes C++
libmysqld.dll!pthread_start(void * param=0x01677a80)  Line 85 + 0x9 bytes C
libmysqld.dll!_callthreadstart()  Line 293 + 0xf bytes C
libmysqld.dll!_threadstart(void * ptd=0x016781b0) Line 277 C

The main thread has this:

ntdll.dll!_KiFastSystemCallRet@0() 	
ntdll.dll!_ZwYieldExecution@0()  + 0xc bytes	
kernel32.dll!_SwitchToThread@0()  + 0x6 bytes	
ws2_32.dll!DPROCESS::~DPROCESS()  + 0x6a bytes	
ws2_32.dll!DPROCESS::`scalar deleting destructor'()  + 0xd bytes	
ws2_32.dll!_WSACleanup@0()  + 0x48a0 bytes	
libmysqld.dll!my_end(int infoflag=4)  Line 230 + 0x8 bytes C
libmysqld.dll!mysql_server_end()  Line 210 + 0x7 bytes C

The rest of the time, the delay is still there. Breaking (if by change there is
no crash) in a timely fashion is difficult because the manager thread seems to
die very quickly, while the main thread is still waiting on THR_COND_threads. In
this case, THR_thread_count is still 1, even if only the main thread is running.

But again (surprise surprise), these two patches:

--- mysql-5.1.26-rc/libmysql/libmysql.c Mon Jun 30 18:35:11 2008
+++ mysql-5.1.26-rc-modified/libmysql/libmysql.c Thu Aug 21 23:40:49 2008
@@ -209,7 +209,7 @@
   {
     my_end(MY_DONT_FREE_DBUG);
     /* Remove TRACING, if enabled by mysql_debug() */
-    DBUG_POP();
+    //DBUG_POP();
   }
   else
   {

   
--- mysql-5.1.26-rc/sql/sql_manager.cc Mon Jun 30 18:36:21 2008
+++ mysql-5.1.26-rc-modified/sql/sql_manager.cc Thu Aug 21 23:38:33 2008
@@ -127,5 +127,7 @@
   }
   manager_thread_in_use = 0;
   my_thread_end();
-  DBUG_RETURN(NULL);
+
+  return NULL;
+  //DBUG_RETURN(NULL);
 }

fix the problem. As I said in a previous comment, it looks like the debug tools
are dependent on some thread specific data and call my_thread_init() when it
looks like the thread has not been initialized yet (but has in fact already been
terminated).

However, because this happens *after* the my_thread_end() has been called
(sql_manager.cc:130),  my_thread_end() is still not called for each
my_thread_init().

What's more, it looks like the server crashes if the main thread has gone far
enough so that critical structures are destroyed. If the manager thread has the
time to complete, it has called my_thread_init() and still forces the main
thread to complete.

In conclusion, in debug mode, depending on the weather outside, you either
crash in DBUG_RETURN() or wait 5 seconds in my_thread_global_end(). Your patch
merely changed the cause of the delay from an unsignaled condition in
handle_manager() to a buggy debug statement and introduced a frequent crash.

 
However, I think I understand what's happening. Building libmysqld in release
mode instead of debug works, obviously because the debug statements are not
compiled in.

Therefore, I suspect you are testing these patches in release mode only. It
might be a good idea to test them in all the supported configurations, since
the documentation clearly states that "From the work directory, open the
generated mysql.sln file with Visual Studio and select the proper configuration
using the Configuration menu. The menu provides Debug, Release, RelwithDebInfo,
MinRelInfo options. Then select Solution > Build to build the solution."

Either test your patches in all configurations or change the documentation if
some configurations are not maintained anymore.
[24 Aug 2008 1:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/52388

2718 Vladislav Vaintroub	2008-08-23
      Bug#38522: 5 seconds delay when closing application using embedded server
            
      The problem here is that embedded server starts handle_thread manager 
      thread  on mysql_library_init() does not stop it on mysql_library_end().
      At shutdown, my_thread_global_end() waits for thread count to become 0,
      but since we did not stop the thread it will give up after 5 seconds.
       
      Solution is to move shutdown for handle_manager thread from kill_server()
      (mysqld specific) to clean_up() that is used by both embedded and mysqld.
      
      This patch also contains some refactorings - to avoid duplicate code,
      start_handle_manager() and stop_handle_manager() functions are introduced.
      Unused variables are eliminated. handle_manager does not rely on global
      variable abort_loop anymore to stop (abort_loop is not set for embedded).
      
      Note: Specifically on Windows and when using DBUG version of libmysqld, 
      the complete solution requires removing obsolete code my_thread_init() 
      from my_thread_var(). This has a side effect that a DBUG statement 
      after my_thread_end() can cause thread counter to be incremented, and 
      embedded will hang for some seconds. Or worse, my_thread_init() will 
      crash if critical sections have been deleted by the global cleanup 
      routine that runs in a different thread.
[24 Aug 2008 1:29] Vladislav Vaintroub
Jonathan,
thanks a lot for feedback and for following up, and for debugging and providing patches. I agree DBUG is broken on Windows in this case and it has to work. Yet I doubt we would officially support DBUG binaries - they are just to slow to run in any production environment. 

I submitted a new patch that addresses broken DBUG as well. Normally, we do not pack solutions to multiple  problems into a single patch and create a separate bug for each problem. Your case is however is arguable and complete solution to hanging embedded that works also in Debug configuration requires  fixes somewhere else.
[24 Aug 2008 21:05] Jonathan Mcdougall
> I agree DBUG is broken on Windows in this case and it has to
> work. Yet I doubt we would officially support DBUG binaries
> - they are just to slow to run in any production environment. 

In this case, remove the configurations from the generated Visual
C++ solutions: one day or another they'll start crashing and
you'll get dozens of bug reports about them.

> I submitted a new patch that addresses broken DBUG as well. 

I can confirm that changelist 52388 fixes the bug I reported
(38522): there is no delay when closing the application anymore.
 
I tested the patch on Windows XP using Visual C++ 2005 SP1, see
my initial test case. I compiled libmysqld both in Debug and
Release, you should however make sure RelWithDebInfo and MinSizeRel
also work correctly.

Good job.
[3 Dec 2008 21:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60551

2709 Vladislav Vaintroub	2008-12-03
      Bug#38522: 5 seconds delay when closing application using embedded server
                  
      The problem here is that embedded server starts handle_thread manager 
      thread  on mysql_library_init() does not stop it on mysql_library_end().
      At shutdown, my_thread_global_end() waits for thread count to become 0,
      but since we did not stop the thread it will give up after 5 seconds.
             
      Solution is to move shutdown for handle_manager thread from kill_server()
      (mysqld specific) to clean_up() that is used by both embedded and mysqld.
            
      This patch also contains some refactorings - to avoid duplicate code,
      start_handle_manager() and stop_handle_manager() functions are introduced.
      Unused variables are eliminated. handle_manager does not rely on global
      variable abort_loop anymore to stop (abort_loop is not set for embedded).
            
      Note: Specifically on Windows and when using DBUG version of libmysqld, 
      the complete solution requires removing obsolete code my_thread_init() 
      from my_thread_var(). This has a side effect that a DBUG statement 
      after my_thread_end() can cause thread counter to be incremented, and 
      embedded will hang for some seconds. Or worse, my_thread_init() will 
      crash if critical sections have been deleted by the global cleanup 
      routine that runs in a different thread.
[4 Dec 2008 0:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60555

2772 Vladislav Vaintroub	2008-12-04 [merge]
      merge bug#38522
[4 Dec 2008 18:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60647

2727 Vladislav Vaintroub	2008-12-04
      Bug#38522: 5 seconds delay when closing application using embedded server
      
                        
      The problem here is that embedded server starts handle_thread manager 
      thread  on mysql_library_init() does not stop it on mysql_library_end().
      At shutdown, my_thread_global_end() waits for thread count to become 0,
      but since we did not stop the thread it will give up after 5 seconds.
                   
      Solution is to move shutdown for handle_manager thread from kill_server()
      (mysqld specific) to clean_up() that is used by both embedded and mysqld.
                  
      This patch also contains some refactorings - to avoid duplicate code,
      start_handle_manager() and stop_handle_manager() functions are introduced.
      Unused variables are eliminated. handle_manager does not rely on global
      variable abort_loop anymore to stop (abort_loop is not set for embedded).
                  
      Note: Specifically on Windows and when using DBUG version of libmysqld, 
      the complete solution requires removing obsolete code my_thread_init() 
      from my_thread_var(). This has a side effect that a DBUG statement 
      after my_thread_end() can cause thread counter to be incremented, and 
      embedded will hang for some seconds. Or worse, my_thread_init() will 
      crash if critical sections have been deleted by the global cleanup 
      routine that runs in a different thread. 
      
      This patch also fixes and revert prior changes for Bug#38293 
      "Libmysqld crash in mysql_library_init if language file missing".
      
      Root cause of the crash observed in Bug#38293  was bug in my_thread_init() 
      described above
[5 Dec 2008 1:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60676

2772 Vladislav Vaintroub	2008-12-05 [merge]
       - merge BUG#38522 from 5.1
      - revert  BUG#41103
[8 Dec 2008 10:20] Bugs System
Pushed into 5.1.31  (revid:vvaintroub@mysql.com-20081204184153-1pkl90ezjlonu7s3) (version source revid:vvaintroub@mysql.com-20081204184153-1pkl90ezjlonu7s3) (pib:5)
[8 Dec 2008 11:31] Bugs System
Pushed into 6.0.9-alpha  (revid:vvaintroub@mysql.com-20081205010505-lsp4y3930slu2mie) (version source revid:vvaintroub@mysql.com-20081205010505-lsp4y3930slu2mie) (pib:5)
[8 Dec 2008 23:43] Paul Dubois
Noted in 5.1.31, 6.0.9 changelogs.

On Windows, a five-second delay occurred at shutdown of applications
that used the embedded server.
[19 Jan 2009 11:24] Bugs System
Pushed into 5.1.31-ndb-6.2.17 (revid:tomas.ulin@sun.com-20090119095303-uwwvxiibtr38djii) (version source revid:tomas.ulin@sun.com-20090108105244-8opp3i85jw0uj5ib) (merge vers: 5.1.31-ndb-6.2.17) (pib:6)
[19 Jan 2009 13:02] Bugs System
Pushed into 5.1.31-ndb-6.3.21 (revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (version source revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (merge vers: 5.1.31-ndb-6.3.21) (pib:6)
[19 Jan 2009 16:08] Bugs System
Pushed into 5.1.31-ndb-6.4.1 (revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (version source revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (merge vers: 5.1.31-ndb-6.4.1) (pib:6)