Bug #99155 main.mysqldump fails occasionally when mysqlimport crashes instead of exits
Submitted: 1 Apr 2020 18:02 Modified: 22 Jul 2020 11:17
Reporter: Herman Lee Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Command-line Clients Severity:S7 (Test Cases)
Version:8.0.17 OS:Any
Assigned to: CPU Architecture:Any

[1 Apr 2020 18:02] Herman Lee
Description:
We noticed the main.mysqldump test failing sometimes with the following output:

sh: line 1: 681098 Segmentation fault      (core dumped) _build-8.0-Debug/runtime_output_directory//mysqlimport --defaults-file=_build-8.0-Debug/mysql-test/var/4/my.cnf --silent --use-threads=2 test _build-8.0-Debug/mysql-test/var/4/tmp/t1.txt _build-8.0-Debug/mysql-test/var/4/tmp/t2.txt _build-8.0-Debug/mysql-test/var/4/std_data/words.dat _build-8.0-Debug/mysql-test/var/4/std_data/words2.dat 2>&1
mysqltest: At line 1700: Command "exec" failed with wrong error: 139, my_errno=175. should have failed with error '1'.

It looks like safe_exit() invoked by mysqlimport when using multiple threads might not be safe because cleanup.

gdb (on an optimized build) shows the following backtrace:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  malloc_consolidate (av=av@entry=0x7fe72c000020) at malloc.c:4461
4461    malloc.c: No such file or directory.
[Current thread is 1 (Thread 0x7fe72b7fe700 (LWP 2852757))]
(gdb) bt
#0  malloc_consolidate (av=av@entry=0x7fe72c000020) at malloc.c:4461
#1  0x00007fe73384c5a2 in malloc_consolidate (av=0x7fe72c000020) at malloc.c:4436
#2  _int_free (av=0x7fe72c000020, p=<optimized out>, have_lock=have_lock@entry=0) at malloc.c:4369
#3  0x00007fe733948b2d in __GI___libc_free (mem=<optimized out>) at malloc.c:3125
#4  tcache_thread_freeres () at malloc.c:2976
#5  0x00007fe733948c62 in __libc_thread_freeres () at thread-freeres.c:29
#6  0x00007fe732f4b6db in start_thread (arg=0x7fe72b7fe700) at pthread_create.c:478
#7  0x00007fe7338d4ebf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) thread apply all bt

Thread 3 (Thread 0x7fe72bfff700 (LWP 2852756)):
#0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x00007fe73384c6bd in _int_free (av=0x7fe72c000020, p=0x7fe72c017b30, have_lock=<optimized out>) at malloc.c:4273
#2  0x00007fe7337fd83a in __run_exit_handlers (status=0, listp=0x0, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:83
#3  0x00007fe7337fd88a in __GI_exit (status=<optimized out>) at exit.c:106
#4  0x00000000005afcde in write_to_table (filename=<optimized out>, mysql=0x1) at /home/herman/rocks-mysql/8.0/client/mysqlimport.cc:387
#5  0x00000000005af290 in worker_thread (arg=0x7fff84a83dd5) at /home/herman/rocks-mysql/8.0/client/mysqlimport.cc:556
#6  0x00007fe732f4b6b6 in start_thread (arg=0x7fe72bfff700) at pthread_create.c:465
#7  0x00007fe7338d4ebf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fe733db72c0 (LWP 2852753)):
#0  futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7fff84a81be8, expected=0, futex_word=0x92c080 <count_threshold+40>) at ../sysdeps/unix/sysv/linux/fut
ex-internal.h:205
#1  __pthread_cond_wait_common (abstime=0x7fff84a81be8, mutex=0x92c030 <counter_mutex>, cond=0x92c058 <count_threshold>) at pthread_cond_wait.c:539
#2  __pthread_cond_timedwait (cond=0x92c058 <count_threshold>, mutex=0x92c030 <counter_mutex>, abstime=0x7fff84a81be8) at pthread_cond_wait.c:667
#3  0x00000000005aed29 in native_cond_timedwait (cond=0x92c080 <count_threshold+40>, mutex=0x189, abstime=0x5e839faa) at /home/herman/rocks-mysql/8.0/include/thr_cond.h:100
#4  main (argc=<optimized out>, argv=<optimized out>) at /home/herman/rocks-mysql/8.0/client/mysqlimport.cc:641

Thread 1 (Thread 0x7fe72b7fe700 (LWP 2852757)):
#0  malloc_consolidate (av=av@entry=0x7fe72c000020) at malloc.c:4461
#1  0x00007fe73384c5a2 in malloc_consolidate (av=0x7fe72c000020) at malloc.c:4436
#2  _int_free (av=0x7fe72c000020, p=<optimized out>, have_lock=have_lock@entry=0) at malloc.c:4369
#3  0x00007fe733948b2d in __GI___libc_free (mem=<optimized out>) at malloc.c:3125
#4  tcache_thread_freeres () at malloc.c:2976
#5  0x00007fe733948c62 in __libc_thread_freeres () at thread-freeres.c:29
#6  0x00007fe732f4b6db in start_thread (arg=0x7fe72b7fe700) at pthread_create.c:478
#7  0x00007fe7338d4ebf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

How to repeat:
For us, it's running mtr --repeat=100 main.mysqldump on a release build, which invokes mysqlimport --use-threads=2. However, given this is a race condition and might be dependent on libraries being used (i.e. cleanup required during exit), it might be difficult to reproduce.

Suggested fix:
 static void safe_exit(int error, MYSQL *mysql) {
   if (ignore_errors) return;
+  if (opt_use_threads) return; /* Process can crash if worker thread exits */
   if (mysql) mysql_close(mysql);
   exit(error);
 }

This is just a workaround based on the code around invoking safe_exit() in the testcase. It might not apply to the general use case for mysqlimport.
[6 Apr 2020 12:18] Georgi Kodinov
Thank you for the bug report and for the analysis. 
Unfortunately the fix you've proposed is not completely enough as it will make e.g. mysqlimport issuing LOCK TABLE and that failing go unnoticed and I do not think this is the intent.
[6 Apr 2020 12:22] Georgi Kodinov
Posted by developer:
 
the recommended fix would be to add new checks and/or new flag to make sure processing stops for all threads in the right way.
[22 Jul 2020 11:17] Paul DuBois
Posted by developer:

Fixed in 8.0.22.

Work was done for test suite. No changelog entry required.