Bug #82886 Server may crash due to a glibc bug in handling short-lived detached threads
Submitted: 7 Sep 2016 10:44 Modified: 4 Oct 2016 16:31
Reporter: Laurynas Biveinis (OCA)
Status: Closed
Category: MySQL Server: InnoDB storage engine
Severity: S1 (Critical)
Version: 5.6
OS: Linux
CPU Architecture: Any
Tags: glibc, pthreads

[7 Sep 2016 10:44] Laurynas Biveinis
Description:
glibc has a bug where a short-lived detached thread that completes before the creating thread has finished executing pthread_create() may crash the server. In MySQL, at least the InnoDB full-text parallel merge threads may be short-lived enough to hit this.

The glibc bug is https://sourceware.org/bugzilla/show_bug.cgi?id=20116; https://sourceware.org/bugzilla/show_bug.cgi?id=19951 may also be related.

How to repeat:
Make several copies of the innodb.innodb-alter test case (for a faster repro, move innodb-alter-kill etc. out of the way, or just write a more precise --do-test regexp). Then run:

$ ./mtr --debug-server --parallel=8 --do-test=innodb-alter --repeat=9000
(...)
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
62	../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x7fd4802f6700 (LWP 2706))]
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
#1  0x00000000008f2f47 in my_write_core (sig=sig@entry=11) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/mysys/stacktrace.c:422
#2  0x000000000066ca0c in handle_fatal_signal (sig=11) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/sql/signal_handler.cc:236
#3  <signal handler called>
#4  __pthread_create_2_1 (newthread=newthread@entry=0x7fd4802f2268, attr=attr@entry=0x7fd4802f2270, start_routine=start_routine@entry=0x9e1350 <fts_parallel_merge(void*)>, arg=arg@entry=0x7fd4580dbe38) at pthread_create.c:713
#5  0x00000000009bb4b6 in os_thread_create_func (func=func@entry=0x9e1350 <fts_parallel_merge(void*)>, arg=arg@entry=0x7fd4580dbe38, thread_id=thread_id@entry=0x7fd4802f22e0) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/storage/innobase/os/os0thread.cc:193
#6  0x00000000009dfb04 in row_fts_start_parallel_merge (merge_info=<optimized out>) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/storage/innobase/row/row0ftsort.cc:984
#7  0x00000000009f7806 in row_merge_build_indexes (trx=0x7fd458089638, old_table=0x7fd458072798, new_table=0x7fd458072798, online=false, indexes=0x7fd4580d77a0, key_numbers=0x7fd4580d77a8, n_indexes=1, table=0x7fd4580b7700, add_cols=0x0, col_map=0x0, add_autoinc=18446744073709551615, sequence=...) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/storage/innobase/row/row0merge.cc:3742
#8  0x000000000096b673 in ha_innobase::inplace_alter_table (this=0x7fd458042680, altered_table=0x7fd4580b7700, ha_alter_info=0x7fd4802f2630) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/storage/innobase/handler/handler0alter.cc:3967
#9  0x0000000000747cad in handler::ha_inplace_alter_table (ha_alter_info=0x7fd4802f2630, altered_table=0x7fd4580b7700, this=<optimized out>) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/sql/handler.h:2974
#10 mysql_inplace_alter_table (target_mdl_request=0x7fd4802f2710, alter_ctx=0x7fd4802f2e20, inplace_supported=HA_ALTER_INPLACE_SHARED_LOCK_AFTER_PREPARE, ha_alter_info=0x7fd4802f2630, altered_table=0x7fd4580b7700, table=0x7fd4580c62a0, table_list=0x7fd458031ab0, thd=0x1931e90) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/sql/sql_table.cc:6850
#11 mysql_alter_table (thd=thd@entry=0x1931e90, new_db=<optimized out>, new_name=<optimized out>, create_info=create_info@entry=0x7fd4802f3fa0, table_list=table_list@entry=0x7fd458031ab0, alter_info=alter_info@entry=0x7fd4802f3ee0, order_num=0, order=0x0, ignore=false) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/sql/sql_table.cc:8904
#12 0x000000000082954e in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x1931e90) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/sql/sql_alter.cc:317
#13 0x00000000006f1cce in mysql_execute_command (thd=thd@entry=0x1931e90) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/sql/sql_parse.cc:5498
#14 0x00000000006f5da8 in mysql_parse (thd=thd@entry=0x1931e90, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7fd4802f5610) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/sql/sql_parse.cc:6972
#15 0x00000000006f70b1 in dispatch_command (command=COM_QUERY, thd=0x1931e90, packet=0x1a0daa1 "ALTER TABLE t1n ADD FULLTEXT INDEX(ct)", packet_length=<optimized out>) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/sql/sql_parse.cc:1441
#16 0x00000000006f8fb9 in do_command (thd=<optimized out>) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/sql/sql_parse.cc:1053
#17 0x00000000006c1722 in do_handle_one_connection (thd_arg=thd_arg@entry=0x1931e90) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/sql/sql_connect.cc:1541
#18 0x00000000006c17c0 in handle_one_connection (arg=arg@entry=0x1931e90) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/sql/sql_connect.cc:1444
#19 0x0000000000b43856 in pfs_spawn_thread (arg=0x1984ed0) at /mnt/workspace/percona-server-5.6-trunk/BUILD_TYPE/release/Host/ubuntu-xenial-64bit/storage/perfschema/pfs.cc:1860
#20 0x00007fd4877256fa in start_thread (arg=0x7fd4802f6700) at pthread_create.c:333
#21 0x00007fd486bbab5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

The crash happens in __pthread_create_2_1 after the new thread has been launched, when it dereferences the struct pthread *pd pointer. strace shows that the memory area containing that structure has been munmap'ped by the spawned thread, which has already exited.
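The precondition for this crash is that the child thread finishes before the parent is done with pthread_create(). The C11 sketch below (not part of the original report; count_early_finishers() and short_lived_child() are hypothetical names) probes how often a trivial child has already reached the end of its start routine when the parent checks, immediately after pthread_create() returns. It deliberately uses joinable threads, whose descriptor is not released until pthread_join(), so it only observes the race window and should not crash even on an affected glibc. The count depends on the scheduler and the number of CPUs and may well be zero.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Set by the child at the very end of its start routine. */
static atomic_int child_done;

static void *short_lived_child(void *arg)
{
    (void)arg;
    atomic_store(&child_done, 1);
    return NULL;
}

/* Hypothetical probe: creates `iterations` short-lived JOINABLE threads and
 * counts how many had already set child_done when the parent looked,
 * right after pthread_create() returned. Joinable threads keep their
 * resources until pthread_join(), so this is safe on an affected glibc.
 * Returns the count, or -1 on a pthread error. */
int count_early_finishers(int iterations)
{
    int early = 0;

    for (int i = 0; i < iterations; i++) {
        pthread_t tid;

        atomic_store(&child_done, 0);
        if (pthread_create(&tid, NULL, short_lived_child, NULL) != 0)
            return -1;

        /* The child may already be done at this point. */
        if (atomic_load(&child_done))
            early++;

        if (pthread_join(tid, NULL) != 0)
            return -1;
    }
    return early;
}
```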

Suggested fix:
Work around the bug by joining selected server threads (only the extremely short-lived ones) with pthread_join instead of pthread_detach'ing them.

With a prototype patch that does this, I can no longer reproduce the crash.
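The joining approach can be sketched in C as follows. This is an illustration of the general pattern only, not the contributed patch (which is not reproduced here); struct worker_arg, short_lived_worker(), run_workers_joined() and the MAX_WORKERS limit are all hypothetical. Workers are created with default (joinable) attributes and every started thread is joined, so no thread's resources can be released while the creating thread is still inside pthread_create().

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 64 /* arbitrary limit for this sketch */

/* Hypothetical per-worker state; not taken from the InnoDB sources. */
struct worker_arg {
    int id;
    int result;
};

/* Stand-in for a very small unit of work, so the thread is short-lived. */
static void *short_lived_worker(void *p)
{
    struct worker_arg *w = p;
    w->result = w->id * 2;
    return NULL;
}

/* Starts n workers as joinable threads (the default) and pthread_join()s
 * each one instead of detaching it. Returns 0 on success, EINVAL if n is
 * too large, or the first pthread error code encountered. */
int run_workers_joined(struct worker_arg *args, size_t n)
{
    pthread_t tids[MAX_WORKERS];
    size_t started = 0;
    int err = 0;

    if (n > MAX_WORKERS)
        return EINVAL;

    for (; started < n; started++) {
        args[started].id = (int)started;
        args[started].result = 0;
        err = pthread_create(&tids[started], NULL, short_lived_worker,
                             &args[started]);
        if (err != 0)
            break;
    }

    /* Join every thread that was actually started, even after an error. */
    for (size_t i = 0; i < started; i++) {
        int join_err = pthread_join(tids[i], NULL);
        if (err == 0)
            err = join_err;
    }
    return err;
}
```

The price of this pattern is that the creator must keep the pthread_t handles and wait for each worker, which fits threads whose results the caller waits for anyway.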
[7 Sep 2016 10:46] Laurynas Biveinis
A standalone C program to show the bug:

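#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Exit immediately so that the thread is as short-lived as possible. */
void *thread_routine(void *arg __attribute__((unused)))
{
    pthread_exit(NULL);
    return NULL; /* not reached */
}

/* pthread_* functions return an error number instead of setting errno,
   so errors are reported with strerror(err) rather than perror(). */
int main(void)
{
    for (int i = 0; i < 32000; i++) {
        pthread_t thread_handle;
        pthread_attr_t thread_attr;

        int err = pthread_attr_init(&thread_attr);
        if (err != 0) {
            fprintf(stderr, "pthread_attr_init: %s\n", strerror(err));
            return 1;
        }

        /* Create the thread in the detached state. */
        err = pthread_attr_setdetachstate(&thread_attr,
                                          PTHREAD_CREATE_DETACHED);
        if (err != 0) {
            fprintf(stderr, "pthread_attr_setdetachstate: %s\n",
                    strerror(err));
            return 4;
        }

        /* On an affected glibc, the new thread may exit and release its
           stack (which holds its struct pthread) before this call returns. */
        err = pthread_create(&thread_handle, &thread_attr, &thread_routine,
                             NULL);
        if (err != 0) {
            fprintf(stderr, "pthread_create: %s\n", strerror(err));
            return 2;
        }

        err = pthread_attr_destroy(&thread_attr);
        if (err != 0) {
            fprintf(stderr, "pthread_attr_destroy: %s\n", strerror(err));
            return 3;
        }
    }

    return 0;
}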

On my VM (Ubuntu 16.04 x86_64), running two copies in parallel results in a crash about every third time. The program differs from the server source in using thread attributes instead of pthread_detach, but this does not appear to be a relevant difference.
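A pthread_detach-based variant of the standalone program could look like the C sketch below. It assumes, rather than takes from the report, that the relevant pattern is a thread detaching itself just before it exits; spawn_self_detaching() and self_detaching_routine() are hypothetical names. Detaching from the parent after pthread_create() has returned would not exercise the same window, because the thread is still joinable while pthread_create() runs.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* The child detaches itself and exits at once. If the parent is still
 * inside pthread_create() at that moment, the thread terminates already
 * detached, which is the same situation as with PTHREAD_CREATE_DETACHED. */
static void *self_detaching_routine(void *arg)
{
    (void)arg;
    pthread_detach(pthread_self());
    return NULL;
}

/* Hypothetical helper: spawns `iterations` self-detaching threads.
 * Returns 0 if every pthread_create() succeeded, 2 otherwise (the same
 * code the standalone program uses for a pthread_create failure). */
int spawn_self_detaching(int iterations)
{
    for (int i = 0; i < iterations; i++) {
        pthread_t tid;
        int err = pthread_create(&tid, NULL, self_detaching_routine, NULL);
        if (err != 0) {
            fprintf(stderr, "pthread_create: %s\n", strerror(err));
            return 2;
        }
        /* tid must not be joined or detached here: the child may already
         * have detached itself and exited. */
    }
    return 0;
}
```

On a glibc that includes the fix, spawn_self_detaching() is expected to return 0; on an affected version, a large iteration count run in parallel copies may crash like the attribute-based program.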
[7 Sep 2016 13:35] Umesh Shastry
Hello Laurynas,

Thank you for the report and test case.
Observed the issue using the provided C test case on Ubuntu 16.04.

Thanks,
Umesh
[8 Sep 2016 13:25] Laurynas Biveinis
Bug 82886 fix for 5.6

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug82886-5.6.patch (application/octet-stream, text), 3.35 KiB.

[8 Sep 2016 13:26] Laurynas Biveinis
Bug 82886 5.7 patch

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug82886-5.7.patch (application/octet-stream, text), 0 bytes.

[8 Sep 2016 13:28] Laurynas Biveinis
The contributed fixes:
- make no attempt to support Windows threads;
- strive to be minimal. If they did not have to be minimal, I would also look into removing FTS_CHILD_EXITING and its associated code, which now looks mostly redundant with the joining.
[8 Sep 2016 13:32] Laurynas Biveinis
Bug 82886 fix for 5.7, non-empty file this time

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug82886-5.7.patch (application/octet-stream, text), 3.94 KiB.

[12 Sep 2016 8:11] Vasil Dimov
Laurynas,

Thank you very much!

The patches have been reviewed and pushed to mysql-5.6 (1a08bd0, 5.6.34) and mysql-5.7 (f8fe0bf, 5.7.16). mysql-trunk does not exhibit this problem.
[4 Oct 2016 16:31] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 5.6.34, 5.7.16, and 8.0.1 releases; here's the changelog entry:

Due to a glibc bug, short-lived detached threads could exit before the
caller has returned from pthread_create(), causing a server exit.

Thanks to Laurynas Biveinis for the patch.
[4 Oct 2016 17:41] Daniel Price
Posted by developer:
 
The fix is in 5.6.35, 5.7.17, and 8.0.1. The changelog entry was updated accordingly.