Bug #90742 | SIGHUP cause mysql server crash. | ||
---|---|---|---|
Submitted: | 4 May 2018 3:42 | Modified: | 3 Oct 2018 1:47 |
Reporter: | Seunguck Lee | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Connection Handling | Severity: | S1 (Critical) |
Version: | 5.7.22, 8.0.11 | OS: | CentOS (Linux mysql_server 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_) |
Assigned to: | CPU Architecture: | x86 | |
Tags: | crash, HUP, SIGHUP |
[4 May 2018 3:42]
Seunguck Lee
[4 May 2018 4:26]
MySQL Verification Team
Hello Seunguck Lee, Thank you for the report and feedback. Thanks, Umesh
[14 Aug 2018 12:51]
Jean-François Gagné
I am able to reproduce this bug with only using concurrency=1 with mysqldslap and MySQL 5.7.22. >> TERMINAL-1: jgagne@jgagne-mysql-tests-01:~/sandboxes/msb_5_7_22$ mysqld_pid=$(pidof mysqld); while date; do kill -SIGHUP "$mysqld_pid" || break; sleep 1; done Tue Aug 14 12:38:35 UTC 2018 Tue Aug 14 12:38:36 UTC 2018 Tue Aug 14 12:38:37 UTC 2018 Tue Aug 14 12:38:38 UTC 2018 Tue Aug 14 12:38:39 UTC 2018 Tue Aug 14 12:38:40 UTC 2018 bash: kill: (20599) - No such process >> TERMINAL-2: jgagne@jgagne-mysql-tests-01:~/sandboxes/msb_5_7_22$ ~/opt/mysql/5.7.22/bin/mysqlslap --defaults-file=./my.sandbox.cnf --query="select 1;" --concurrency=1 --iterations=100000000 Tue Aug 14 12:38:37 UTC 2018 ^C >> MySQL Error Logs: [...] Version: '5.7.22' socket: '/tmp/mysql_sandbox5722.sock' port: 5722 MySQL Community Server (GPL) [...] Event queue status: Element count : 0 Data locked : NO Attempting lock : NO LLA : init_queue:96 LUA : init_queue:104 WOC : NO Next activation : never [Warning] Server shutdown in progress 12:38:39 UTC - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. Attempting to collect some information that could help diagnose the problem. As this is a crash and something is definitely wrong, the information collection process might fail. key_buffer_size=8388608 read_buffer_size=131072 max_used_connections=3 max_threads=151 thread_count=1 connection_count=3 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68195 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something wentterribly wrong... stack_bottom = 0 thread_stack 0x40000 /home/jgagne/opt/mysql/5.7.22/bin/mysqld(my_print_stacktrace+0x35)[0xf4b6d5] /home/jgagne/opt/mysql/5.7.22/bin/mysqld(handle_fatal_signal+0x4a4)[0x7d0d74] /lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0)[0x7f7009d550c0] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSt15_List_node_base4hookEPS_+0xf)[0x7f700901167f] /home/jgagne/opt/mysql/5.7.22/bin/mysqld(_ZN29Per_thread_connection_handler40check_idle_thread_and_enqueue_connectionEP12Channel_info+0xc1)[0xded491] /home/jgagne/opt/mysql/5.7.22/bin/mysqld(_ZN29Per_thread_connection_handler14add_connectionEP12Channel_info+0x13)[0xdeddc3] /home/jgagne/opt/mysql/5.7.22/bin/mysqld(_ZN26Connection_handler_manager22process_new_connectionEP12Channel_info+0x66)[0x7ef186] /home/jgagne/opt/mysql/5.7.22/bin/mysqld(_Z11mysqld_mainiPPc+0x1464)[0x7cb394] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f70086d72e1] /home/jgagne/opt/mysql/5.7.22/bin/mysqld[0x7c0ed9] The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. Same with MySQL 5.7.23: >> TERMINAL-1: jgagne@jgagne-mysql-tests-01:~/sandboxes/msb_5_7_23$ mysqld_pid=$(pidof mysqld); while date; do kill -SIGHUP "$mysqld_pid" || break; sleep 1; done Tue Aug 14 12:48:24 UTC 2018 Tue Aug 14 12:48:26 UTC 2018 Tue Aug 14 12:48:27 UTC 2018 Tue Aug 14 12:48:28 UTC 2018 Tue Aug 14 12:48:29 UTC 2018 Tue Aug 14 12:48:30 UTC 2018 Tue Aug 14 12:48:31 UTC 2018 bash: kill: (20196) - No such process >> TERMINAL-2: jgagne@jgagne-mysql-tests-01:~/sandboxes/msb_5_7_23$ date; ~/opt/mysql/5.7.23/bin/mysqlslap --defaults-file=./my.sandbox.cnf --query="select 1;" --concurrency=1 --iterations=100000000 Tue Aug 14 12:48:26 UTC 2018 ^C >> MySQL Error Logs: [...] Version: '5.7.23' socket: '/tmp/mysql_sandbox5724.sock' port: 5724 MySQL Community Server (GPL) [...] Event queue status: Element count : 0 Data locked : NO Attempting lock : NO LLA : init_queue:96 LUA : init_queue:104 WOC : NO Next activation : never [Warning] Server shutdown in progress 12:48:30 UTC - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. Attempting to collect some information that could help diagnose the problem. As this is a crash and something is definitely wrong, the information collection process might fail. key_buffer_size=8388608 read_buffer_size=131072 max_used_connections=5 max_threads=151 thread_count=2 connection_count=4 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68195 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0 thread_stack 0x40000 /home/jgagne/opt/mysql/5.7.23/bin/mysqld(my_print_stacktrace+0x35)[0xf4e5e5] /home/jgagne/opt/mysql/5.7.23/bin/mysqld(handle_fatal_signal+0x4a4)[0x7d1b64] /lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0)[0x7f39a7f690c0] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSt15_List_node_base4hookEPS_+0xf)[0x7f39a722567f] /home/jgagne/opt/mysql/5.7.23/bin/mysqld(_ZN29Per_thread_connection_handler40check_idle_thread_and_enqueue_connectionEP12Channel_info+0xc1)[0xdeeda1] /home/jgagne/opt/mysql/5.7.23/bin/mysqld(_ZN29Per_thread_connection_handler14add_connectionEP12Channel_info+0x13)[0xdef6d3] /home/jgagne/opt/mysql/5.7.23/bin/mysqld(_ZN26Connection_handler_manager22process_new_connectionEP12Channel_info+0x66)[0x7f02a6] /home/jgagne/opt/mysql/5.7.23/bin/mysqld(_Z11mysqld_mainiPPc+0x146d)[0x7cc17d] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f39a68eb2e1] /home/jgagne/opt/mysql/5.7.23/bin/mysqld[0x7c1cb9] The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash.
[14 Aug 2018 13:27]
Jean-François Gagné
I am also able to reproduce without mysqlslap, with MySQL 5.7.23, and only with the mysql client. Strangely, the crash does not happen at the 1st instance of the mysql client disconnection, but at the 2nd. Also note the "Server shutdown in progress" in the error logs that looks related to the 1st disconnection. Without the reconnection (\r), the problem is not reproduced. >> TERMINAL-1: jgagne@jgagne-mysql-tests-01:~/sandboxes/msb_5_7_23$ mysqld_pid=$(pidof mysqld); while date; do kill -SIGHUP "$mysqld_pid" || break; sleep 1; done Tue Aug 14 13:04:31 UTC 2018 Tue Aug 14 13:04:32 UTC 2018 Tue Aug 14 13:04:33 UTC 2018 Tue Aug 14 13:04:34 UTC 2018 Tue Aug 14 13:04:35 UTC 2018 Tue Aug 14 13:04:36 UTC 2018 Tue Aug 14 13:04:37 UTC 2018 Tue Aug 14 13:04:38 UTC 2018 Tue Aug 14 13:04:39 UTC 2018 bash: kill: (32313) - No such process >> TERMINAL-2: jgagne@jgagne-mysql-tests-01:~/sandboxes/msb_5_7_23$ date; yes "select connection_id(); \\r" | head -n 100000 | ./use -N | tail Tue Aug 14 13:04:33 UTC 2018 ERROR 2013 (HY000) at line 1494: Lost connection to MySQL server at 'reading initial communication packet', system error: 95 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 jgagne@jgagne-mysql-tests-01:~/sandboxes/msb_5_7_23$ date; yes "select connection_id(); \\r" | head -n 100000 | ./use -N | tail Tue Aug 14 13:04:38 UTC 2018 ERROR 2013 (HY000) at line 2: Lost connection to MySQL server at 'reading initial communication packet', system error: 95 1496 1497 >> MySQL Error Logs: [...] Version: '5.7.23' socket: '/tmp/mysql_sandbox5724.sock' port: 5724 MySQL Community Server (GPL) [...] [Warning] Server shutdown in progress [...] Event queue status: Element count : 0 Data locked : NO Attempting lock : NO LLA : init_queue:96 LUA : init_queue:104 WOC : NO Next activation : never 13:04:38 UTC - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. Attempting to collect some information that could help diagnose the problem. As this is a crash and something is definitely wrong, the information collection process might fail. key_buffer_size=8388608 read_buffer_size=131072 max_used_connections=2 max_threads=151 thread_count=0 connection_count=2 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68195 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0 thread_stack 0x40000 /home/jgagne/opt/mysql/5.7.23/bin/mysqld(my_print_stacktrace+0x35)[0xf4e5e5] /home/jgagne/opt/mysql/5.7.23/bin/mysqld(handle_fatal_signal+0x4a4)[0x7d1b64] /lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0)[0x7fdc5b3460c0] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSt15_List_node_base4hookEPS_+0xf)[0x7fdc5a60267f] /home/jgagne/opt/mysql/5.7.23/bin/mysqld(_ZN29Per_thread_connection_handler40check_idle_thread_and_enqueue_connectionEP12Channel_info+0xc1)[0xdeeda1] /home/jgagne/opt/mysql/5.7.23/bin/mysqld(_ZN29Per_thread_connection_handler14add_connectionEP12Channel_info+0x13)[0xdef6d3] /home/jgagne/opt/mysql/5.7.23/bin/mysqld(_ZN26Connection_handler_manager22process_new_connectionEP12Channel_info+0x66)[0x7f02a6] /home/jgagne/opt/mysql/5.7.23/bin/mysqld(_Z11mysqld_mainiPPc+0x146d)[0x7cc17d] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fdc59cc82e1] /home/jgagne/opt/mysql/5.7.23/bin/mysqld[0x7c1cb9] The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash.
[14 Aug 2018 13:51]
Jean-François Gagné
Two more things (sorry for the spam): 1) after the 1st disconnection, if I try to shutdown mysqld, the shutdown process hang (it looks like there is something corrupted in mysqld state). 2) if I disable the thread cache before the test (set global thread_cache_size = 0;), I am not able to reproduce the problem. So a workaround to this could be to disable the thread cache.
[15 Aug 2018 16:42]
Jean-François Gagné
Ok, after reading the code, I have a better understanding of what is happening. When there are threads available in the the Thread Cache and on a connection establishment, a channel_info is pushed on waiting_channel_info_list, wake_pthread is incremented and the blocked threads are woken up via COND_thread_cache (all this is in check_idle_thread_and_enqueue_connection of sql/conn_handler/connection_handler_per_thread.cc). But if above is happening while the Thread Cache is being purged (kill_blocked_pthreads_flag > 0), the woken-up thread will not consume the channel_info but will exit (in block_until_new_connection). This will leave waiting_channel_info_list with connections to be handled without threads in the cache. After all blocked threads are on their way to exit, kill_blocked_pthreads will close all enqueued channel_info. This is what is causing the 1st "Lost connection" and the line in the logs about the shutdown in my reproduction instruction with mysql client. Closing this connection is obviously wrong, but this is not the source of the crash. The source of the crash is that after all connection are closed in kill_blocked_pthreads, wake_pthread is still greater than 0 but there are no items in waiting_channel_info_list. Once a running thread will go to the cache (in block_until_new_connection), because wake_pthread is greater than zero, it will think there is work to be done and will pop waiting_channel_info_list. However, as the list has been emptied in kill_blocked_pthreads, this is getting a bogus pointer and causing the crash.
[16 Aug 2018 14:34]
Jean-François Gagné
Actually, it is a little more complicated than explained in my last comment. The wake_pthread greater than one after the purge is right. But it is not the 1st disconnection after the purge that crashes things (my reproduction was showing that it is the 3rd connection). So the 1st disconnection puts a thread in the cache and this thread consumes wake_pthread. But as the list front is NULL, it will exits. However, before exiting, it will pop the empty list, which corrupts the data structure. The next connection does not crash as the Thread Cache is empty, so a connection is created. Its disconnection will however put a thread in the cache. On the next connection, as there is a thread in the cache, the channel_info will be pushed on a corrupted data structure (corrupted by the 1st disconnection), which cause the SIGSEGV.
[3 Oct 2018 1:47]
Paul DuBois
Posted by developer: Fixed in 8.0.14. Server mishandling of SIGHUP signals could result in a server exit.
[3 Oct 2018 6:34]
J-F Legacy Gagné
This has been closed as fixed in 8.0.14, but it also affect 5.7. Can we hope for a fix in 5.7 ?
[5 Oct 2018 14:17]
Paul DuBois
Posted by developer: Fixed in 5.7.25.