MySQL Bugs: #24507: rpl_log.test crash slave

Bug #24507	rpl_log.test crash slave
Submitted:	22 Nov 2006 14:17	Modified:	19 Jan 2007 3:39
Reporter:	Lars Thalmann	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.0	OS:
Assigned to:	Rafal Somla	CPU Architecture:	Any

Description:
rpl_log                        [ fail ]

Errors are (from /dev/shm/var-ps_stm-6/log/mysqltest-time) :
mysqltest: At line 16: query 'reset master' failed: 2013: Lost connection to MySQL server during query

MASTER.ERR LOG
--------------
CURRENT_TEST: rpl_log
061115 22:09:17 [Note] /data/pushbuild/pb/mysql-5.0/890/mysql-5.0-clone/sql/mysqld: Normal shutdown

SLAVE.ERR LOG
-------------
CURRENT_TEST: rpl_log
061115 22:09:16 [Note] /data/pushbuild/pb/mysql-5.0/890/mysql-5.0-clone/sql/mysqld: ready for connections.
Version: '5.0.32-debug-log'  socket: '/dev/shm/pbtmp-ps_stm-6/slave.sock'  port: 10062  Source distribution
061115 22:09:17 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log '/dev/shm/var-ps_stm-6/log/slave-relay-bin.000001' position: 4
061115 22:09:17 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:10060',  replication started in log 'FIRST' at position 4
061115 22:09:17 [Note] Slave I/O thread killed while reading event
061115 22:09:17 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 98
061115 22:09:17 [Note] Error reading relay log event: slave SQL thread was killed

(And then a core file.)

How to repeat:
Pushbuild

Suggested fix:
Perhaps:
1. Make a better error message giving more information about what 
   the error actually was in reading
2. If error, make sure not to core  :)

Here is the stacktrace from the core:

    (gdb) bt
    #0  0x005227a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
    #1  0x007898bb in pthread_kill () from /lib/tls/libpthread.so.0
    #2  0x084a1775 in write_core (sig=11) at stacktrace.c:245
    #3  0x0826c995 in handle_segfault (sig=11) at mysqld.cc:2115
    #4  <signal handler called>
    #5  0x00000000 in ?? ()
    #6  0x0078d2aa in _Unwind_ForcedUnwind () from /lib/tls/libpthread.so.0
    #7  0x0078af81 in __pthread_unwind () from /lib/tls/libpthread.so.0
    #8  0x00786f00 in pthread_exit () from /lib/tls/libpthread.so.0
    #9  0x084885d7 in handle_slave_io (arg=0x9f6c858) at slave.cc:3769
    #10 0x00786341 in start_thread () from /lib/tls/libpthread.so.0
    #11 0x006066fe in clone () from /lib/tls/libc.so.6

I have a plausible theory about what is going on.

The crash is in this piece of code from glibc (as found by Google code
search), where pthread_cancel_init() is inlined in _Unwind_Reason_Code():

    _Unwind_Reason_Code
    _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop,
			  void *stop_argument)
    {
      if (__builtin_expect (libgcc_s_forcedunwind == NULL, 0))
	pthread_cancel_init ();
      return libgcc_s_forcedunwind (exc, stop, stop_argument);
    }

    void
    pthread_cancel_init (void)
    {
      void *resume, *personality, *forcedunwind, *getcfa;
      void *handle;

      if (__builtin_expect (libgcc_s_getcfa != NULL, 1))
	return;

      handle = __libc_dlopen ("libgcc_s.so.1");

      if (handle == NULL
	  || (resume = __libc_dlsym (handle, "_Unwind_Resume")) == NULL
	  || (personality = __libc_dlsym (handle, "__gcc_personality_v0")) == NULL
	  || (forcedunwind = __libc_dlsym (handle, "_Unwind_ForcedUnwind"))
	     == NULL
	  || (getcfa = __libc_dlsym (handle, "_Unwind_GetCFA")) == NULL
    #ifdef ARCH_CANCEL_INIT
	  || ARCH_CANCEL_INIT (handle)
    #endif
	  )
	__libc_fatal ("libgcc_s.so.1 must be installed for pthread_cancel to work\n");

      libgcc_s_resume = resume;
      libgcc_s_personality = personality;
      libgcc_s_forcedunwind = forcedunwind;
      libgcc_s_getcfa = getcfa;
    }

Note that there is actually a race in this code:

   - Thread A finds libgcc_s_forcedunwind==NULL and enters
     pthread_cancel_init(). A context switch then occurs before thread A has
     the time to check the libgcc_s_getcfa variable.

   - Thread B finds libgcc_s_forcedunwind==NULL and enters
     pthread_cancel_init(). It finds libgcc_s_getcfa==NULL, and goes to set
     libgcc_s_getcfa = getcfa.

   - Thread A is later re-scheduled, and now finds libgcc_s_getcfa!=NULL so
     returns immediately from pthread_cancel_init(). It then proceeds to
     execute the call (*libgcc_s_forcedunwind)() using the _old_ previously
     loaded value in %edx, which is still NULL. Hence a segfault.

So the problem is that the libgcc_s_getcfa variable is checked and modified
without any kind of synchronization.

I actually found some evidence in the core file that this race is exactly what
happened. Here are the registers at the point of crash:

    (gdb) info reg
    eax            0x78ade0 7908832
    ecx            0xb6fa6dd0       -1225101872
    edx            0x0      0
    ebx            0x78fff4 7929844
    esp            0xb6fa6328       0xb6fa6328
    ebp            0xb6fa6348       0xb6fa6348
    esi            0xb6fa6480       -1225104256
    edi            0xb6fa6dd0       -1225101872
    eip            0x78d2aa 0x78d2aa

And here is the disassembly, with some comments.

    0x0078d270 <_Unwind_ForcedUnwind+0>:    push   %ebp
    0x0078d271 <_Unwind_ForcedUnwind+1>:    mov    %esp,%ebp
    0x0078d273 <_Unwind_ForcedUnwind+3>:    sub    $0x20,%esp
    0x0078d276 <_Unwind_ForcedUnwind+6>:    mov    %ebx,0xfffffff4(%ebp)
    0x0078d279 <_Unwind_ForcedUnwind+9>:    call   0x7852da <__i686.get_pc_thunk.bx>
    0x0078d27e <_Unwind_ForcedUnwind+14>:   add    $0x2d76,%ebx
    0x0078d284 <_Unwind_ForcedUnwind+20>:   mov    %esi,0xfffffff8(%ebp)
    0x0078d287 <_Unwind_ForcedUnwind+23>:   mov    0x21ac(%ebx),%edx

libgcc_s_forcedunwind is now loaded in %edx.

    0x0078d28d <_Unwind_ForcedUnwind+29>:   mov    %edi,0xfffffffc(%ebp)
    0x0078d290 <_Unwind_ForcedUnwind+32>:   test   %edx,%edx
    0x0078d292 <_Unwind_ForcedUnwind+34>:   je     0x78d2b7 <_Unwind_ForcedUnwind+71>

From the register dump, %edx is 0, so this jump is taken.

    0x0078d294 <_Unwind_ForcedUnwind+36>:   mov    0x10(%ebp),%esi
    0x0078d297 <_Unwind_ForcedUnwind+39>:   mov    0x8(%ebp),%edi
    0x0078d29a <_Unwind_ForcedUnwind+42>:   mov    0xc(%ebp),%eax
    0x0078d29d <_Unwind_ForcedUnwind+45>:   mov    %esi,0x8(%esp)
    0x0078d2a1 <_Unwind_ForcedUnwind+49>:   mov    %edi,(%esp)
    0x0078d2a4 <_Unwind_ForcedUnwind+52>:   mov    %eax,0x4(%esp)
    0x0078d2a8 <_Unwind_ForcedUnwind+56>:   call   *%edx

This is where it crashes since %edx (libgcc_s_forcedunwind) is NULL.

    0x0078d2aa <_Unwind_ForcedUnwind+58>:   mov    0xfffffff4(%ebp),%ebx
    0x0078d2ad <_Unwind_ForcedUnwind+61>:   mov    0xfffffff8(%ebp),%esi
    0x0078d2b0 <_Unwind_ForcedUnwind+64>:   mov    0xfffffffc(%ebp),%edi
    0x0078d2b3 <_Unwind_ForcedUnwind+67>:   mov    %ebp,%esp
    0x0078d2b5 <_Unwind_ForcedUnwind+69>:   pop    %ebp
    0x0078d2b6 <_Unwind_ForcedUnwind+70>:   ret    

This is where the code jump to from above when it finds libgcc_s_forcedunwind
to be NULL (it is the inlined pthread_cancel_init() code).

    0x0078d2b7 <_Unwind_ForcedUnwind+71>:   mov    0x21b0(%ebx),%eax
    0x0078d2bd <_Unwind_ForcedUnwind+77>:   test   %eax,%eax
    0x0078d2bf <_Unwind_ForcedUnwind+79>:   jne    0x78d294 <_Unwind_ForcedUnwind+36>

And here it returns immediately, since it loads libgcc_s_getcfa into %eax, and
finds it non-NULL (the register dump shows %eax is 0x78ade0) -> crash.

Some more dumps to show this:

    (gdb) x $ebx+0x21ac
    0x7921a0 <libgcc_s_forcedunwind>:       0x009340e4
    (gdb) x $ebx+0x21b0
    0x7921a4 <libgcc_s_getcfa>:     0x00932b98
    (gdb) x 0x009340e4
    0x9340e4 <_Unwind_ForcedUnwind>:        0x57e58955
    (gdb) x 0x00932b98
    0x932b98 <_Unwind_GetCFA>:      0x8be58955

So the variable libgcc_s_forcedunwind is actually non-NULL at the time of
crash (set by the other thread in the race). But the compiled code naturally
uses the previously loaded value in %edx, having no reason to believe that it
might have changed since it was last loaded. Hence the crash.

<rest of disassembly omitted>

As to how to fix it, I'm not sure ... it looks very much like a bug in
glibc. We cannot really be expected to protect pthread_exit() with a mutex
:-). I think we should file a glibc bug report actually...

If we were to try to fix it, it is BTW going to be real fun trying to
reproduce that race, especially given that it can only occur on the _first_
occurence of pthread_exit() in the process ...

Apparently, the problem is known to glibc developers and already fixed. See:
<http://sourceware.org/bugzilla/show_bug.cgi?id=2644>. We should upgrade to more recent version of glibc :)

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/15923

ChangeSet@1.2310, 2006-11-28 09:26:26+01:00, rafal@quant.(none) +1 -0
  BUG#24507 (rpl_log.test crash slave):
  
  This patch introduces pthread_join() call when terminating slave I/O 
  and SQL threads. This will make the main thread to wait until these 
  threads are really finished avoiding possible race conditions in 
  thread termination code.
  
  Using pthread_join() means that the threads should not be deteached.
  
  This change could also help with BUG#24387.

I don't like the proposed solutions becasue of the following reason:
- It changes the current code to use a pthread_join(); A function we don't use anywhere else and may have unknown problems,
- We create and destroy threads in a lot of other places than in replication and this patch doesn't solve any of these problems. (The reason this shows up in replication tests is mainly becasue we don't have many test, except replications tests, that creates a lot of new connections/threads)
- This is a glibc bug and we shouldn't have to change logic in the program becasue of a system library bug (that is already fixed)
- Slave threads are destroyed in many other places than in terminate_slave_threads() (for example when they loose connection to master)
and if we don't use detached threads we will use up resources for pthread_join() that will never be done.

An alternative solution for the problem that is less intrusive:
- On linux, lets on startup create a dummy thread that we then terminate with pthread_exit(0) at once. This will initialize the NTPL thread variables that causes the race condition and we will not get a crash anymore. After 1/2 a year, when all glibc versions are updated, we can delete this code.

For the linux build machines, I recommend we upgrade to a newer glibc as soon there is a glibc available with the patch. For all staticly compile mysqld versions, this problem will then be solved.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/16124

ChangeSet@1.2323, 2006-11-29 18:11:24+01:00, rafal@quant.(none) +2 -0
  BUG#24507 (rpl_log.test crash slave):
  
  The problem was located to lie inside current NTPL pthread_exit() 
  implementation. Race conditions in this code can lead to segmentation
  fault. Hovewer, this can happen only in a rece between first thread 
  calling pthread_exit() and other threads. 
  
  Workaround implemented in this patch spawns a dummy thread, which
  exits immediately, during thread lib initialization. This will exclude
  segment violations when further threads exit.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/16642

ChangeSet@1.2323, 2006-12-08 11:41:12+01:00, rafal@quant.(none) +2 -0
  BUG#24507 (rpl_log.test crash slave):
  
  The problem was located to lie inside current NPTL pthread_exit() 
  implementation. Race conditions in this code can lead to segmentation
  fault. Hovewer, this can happen only in a race between first thread 
  calling pthread_exit() and other threads. 
  
  Workaround implemented in this patch spawns a dummy thread, which
  exits immediately, during thread lib initialization. This will exclude
  segment violations when further threads exit.

Pushed into 5.0.36, 5.1.15

Noted in 5.0.36, 5.1.15 changelogs.

A workaround was implemented to avoid a race condition in the NPTL
pthread_exit() implementation.

Just to let people know and for better cross-referencing — the patch from this bug causes a regression which is a subject of bug #42850