Bug #75028 Unable to shutdown mysqld when HandlerSocket is installed
Submitted: 27 Nov 2014 19:42 Modified: 28 Nov 2014 13:02
Reporter: Marcos Albe (OCA) Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Locking Severity:S2 (Serious)
Version:5.6.21, 5.6.22 OS:Any
Assigned to: CPU Architecture:Any
Tags: HandlerSocket, LOCK_thread_count, mutex

[27 Nov 2014 19:42] Marcos Albe
Description:
It's impossible to stop mysqld after enabling HandlerSocket:

Error log showed

2014-11-25 05:16:14 31200 [Note] /usr/sbin/mysqld: Normal shutdown
2014-11-25 05:16:14 31200 [Note] Giving 24 client threads a chance to die gracefully
2014-11-25 05:16:14 31200 [Note] Event Scheduler: Purging the queue. 0 events
2014-11-25 05:16:14 31200 [Note] Shutting down slave threads
2014-11-25 05:16:16 31200 [Note] Forcefully disconnecting 24 remaining clients

Then pmp showed:

23 __lll_lock_wait(libpthread.so.0),_L_lock_995(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),dena::dbcontext::term_thread(handlersocket.so),dena::hstcpsvr_worker::run(handlersocket.so),dena::thread::thread_main(handlersocket.so),start_thread(libpthread.so.0),clone(libc.so.6)

1 __lll_lock_wait(libpthread.so.0),_L_lock_995(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),remove_global_thread,dena::dbcontext::term_thread(handlersocket.so),dena::hstcpsvr_worker::run(handlersocket.so),dena::thread::thread_main(handlersocket.so),start_thread(libpthread.so.0),clone(libc.so.6)

Definition of dena::dbcontext::term_thread (from https://github.com/DeNA/HandlerSocket-Plugin-for-MySQL/blob/master/handlersocket/database....):

dbcontext::term_thread()
{
DBG_THR(fprintf(stderr, "HNDSOCK thread end %p\n", thd));
unlock_tables_if();
my_pthread_setspecific_ptr(THR_THD, 0);
{
pthread_mutex_lock(&LOCK_thread_count);
#if MYSQL_VERSION_ID >= 50600
remove_global_thread(thd);
#else
--thread_count;
#endif
delete thd;
thd = 0;
pthread_mutex_unlock(&LOCK_thread_count);
my_thread_end();
}
}

And definition of remove_global_thread (from percona-5.6.21/sql/mysqld.cc@892):

void remove_global_thread(THD *thd)
{
  DBUG_PRINT("info", ("remove_global_thread %p current_linfo %p",
                      thd, thd->current_linfo));
  mysql_mutex_lock(&LOCK_thd_remove);
  mysql_mutex_lock(&LOCK_thread_count);
  DBUG_ASSERT(thd->release_resources_done());
  /*
    Used by binlog_reset_master. It would be cleaner to use
    DEBUG_SYNC here, but that's not possible because the THD's debug
    sync feature has been shut down at this point.
   */
  DBUG_EXECUTE_IF("sleep_after_lock_thread_count_before_delete_thd",
                  sleep(5););

  const size_t num_erased= global_thread_list->erase(thd);
  if (num_erased == 1)
    --global_thread_count;
  // Removing a THD that was never added is an error.
  DBUG_ASSERT(1 == num_erased);

  mysql_mutex_unlock(&LOCK_thd_remove);
  mysql_cond_broadcast(&COND_thread_count);
  mysql_mutex_unlock(&LOCK_thread_count);
}

So my understanding is that the second time this tries to acquire the mutex inside remove_global_thread() (mysql_mutex_lock(&LOCK_thread_count)) it will wait forever because LOCK_thread_count is not a recursive mutex:

fgrep -RnI --exclude-dir=build* --exclude-dir=mysql-test --exclude=tags --exclude=*.P* --exclude=*.T* "LOCK_thread_count" . | grep init
./tests/thread_test.c:196: pthread_mutex_init(&LOCK_thread_count,MY_MUTEX_INIT_FAST);
./sql/mysqld.cc:3638: mysql_mutex_init(key_LOCK_thread_count, &LOCK_thread_count, MY_MUTEX_INIT_FAST);
./mysys/thr_lock.c:1608: if ((error= mysql_mutex_init(0, &LOCK_thread_count, MY_MUTEX_INIT_FAST)))
./mysys/thr_alarm.c:883: mysql_mutex_init(0, &LOCK_thread_count, MY_MUTEX_INIT_FAST);

About MY_MUTEX_INIT_FAST:

#ifdef PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP
extern pthread_mutexattr_t my_fast_mutexattr;
#define MY_MUTEX_INIT_FAST &my_fast_mutexattr
#else
#define MY_MUTEX_INIT_FAST NULL
#endif
...
#ifdef PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP
  /*
    Set mutex type to "fast" a.k.a "adaptive"
    In this case the thread may steal the mutex from some other thread
    that is waiting for the same mutex. This will save us some
    context switches but may cause a thread to 'starve forever' while
    waiting for the mutex (not likely if the code within the mutex is
    short).
  */
  pthread_mutexattr_init(&my_fast_mutexattr);
  pthread_mutexattr_settype(&my_fast_mutexattr,
                            PTHREAD_MUTEX_ADAPTIVE_NP);
#endif

How to repeat:
1) Setup MySQL 5.6.21
2) Setup HandlerSocket
3) Add these to you my.cnf:
loose_handlersocket_port = 9998
loose_handlersocket_port_wr = 9999
4) Restart MySQL
5) INSTALL PLUGIN handlersocket SONAME 'handlersocket.so';
6) Stop MySQL... won't happen

Suggested fix:
Make the mutex recursive?
[28 Nov 2014 7:42] MySQL Verification Team
Hello Marcos Albe,

Thank you for the report.
Observed this behavior with 5.6.22.

Thanks,
Umesh
[28 Nov 2014 7:44] MySQL Verification Team
// How to repeat
- Followed steps outlines here
https://github.com/DeNA/HandlerSocket-Plugin-for-MySQL/blob/master/docs-en/installation.en...

// Binary - build details

[ushastry@cluster-repo mysql-advanced-5.6.22]$ more docs/INFO_SRC
revision-id: thirunarayanan.balathandayuth@oracle.com-20141121052002-u0rj3jaqb1i4sv5o
date: 2014-11-21 10:50:02 +0530
build-date: 2014-11-21 06:25:36 +0100
revno: 6235
branch-nick: mysql-5.6.22-release

MySQL source 5.6.22

// Source

[ushastry@cluster-repo setups]$ md5sum mysql-5.6.22.tar.gz
3985b634294482363f3d87e0d67f2262  mysql-5.6.22.tar.gz

// MySQL startup etc
scripts/mysql_install_db --basedir=/data/ushastry/setups/mysql-advanced-5.6.22 --datadir=./data --user=ushastry
bin/mysqld --basedir=/data/ushastry/setups/mysql-advanced-5.6.22 --datadir=./data --core-file --socket=/tmp/mysql.sock  --port=3306 --log-error=./data/log.err --loose_handlersocket_port=9998 --loose_handlersocket_port_wr=9999 --user=ushastry 2>&1 &

// Session 1
// Install plugin, confirm
[ushastry@cluster-repo mysql-advanced-5.6.22]$ bin/mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.22-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show variables like '%version%';
+-------------------------+---------------------------------------------------------+
| Variable_name           | Value                                                   |
+-------------------------+---------------------------------------------------------+
| innodb_version          | 5.6.22                                                  |
| protocol_version        | 10                                                      |
| slave_type_conversions  |                                                         |
| version                 | 5.6.22-enterprise-commercial-advanced                   |
| version_comment         | MySQL Enterprise Server - Advanced Edition (Commercial) |
| version_compile_machine | x86_64                                                  |
| version_compile_os      | linux-glibc2.5                                          |
+-------------------------+---------------------------------------------------------+
7 rows in set (0.00 sec)

mysql> install plugin handlersocket soname 'handlersocket.so';
Query OK, 0 rows affected (0.01 sec)

mysql> show plugins;
+----------------------------+----------+--------------------+------------------+-------------+
| Name                       | Status   | Type               | Library          | License     |
+----------------------------+----------+--------------------+------------------+-------------+
| binlog                     | ACTIVE   | STORAGE ENGINE     | NULL             | PROPRIETARY |
..
.
| partition                  | ACTIVE   | STORAGE ENGINE     | NULL             | PROPRIETARY |
| handlersocket              | ACTIVE   | DAEMON             | handlersocket.so | BSD         |
+----------------------------+----------+--------------------+------------------+-------------+
43 rows in set (0.00 sec)

// Session 2, try to bringdown and notice that it just hangs

[ushastry@cluster-repo mysql-advanced-5.6.22]$ bin/mysqladmin -uroot -p shutdown
Enter password:

^^ hanged

// Session 3
// pmp - http://poormansprofiler.org/

[root@cluster-repo setups]# ./pmp.sh
     16 at,_L_lock_995,__pthread_mutex_lock,inline_mysql_mutex_lock,at,~thr_init,optimized,operator(),optimized,start_thread,clone
      1 select,os_thread_sleep,srv_master_sleep,optimized,start_thread,clone
      1 select,os_thread_sleep,page_cleaner_sleep_if_needed,optimized,start_thread,clone
      1 __poll,vio_io_wait,vio_socket_io_wait,vio_read,net_read_raw_loop,net_read_packet_header,net_read_packet,at,do_command,do_handle_one_connection,handle_one_connection,pfs_spawn_thread,start_thread,clone
      1 optimized,optimized,signal_hand,pfs_spawn_thread,start_thread,clone
      1 __io_getevents_0_4,os_aio_linux_collect,message1=0x7f8f99baee58,,fil_aio_wait,io_handler_thread,start_thread,clone
      1 __io_getevents_0_4,os_aio_linux_collect,message1=0x7f8f993ade58,,fil_aio_wait,io_handler_thread,start_thread,clone
      1 __io_getevents_0_4,os_aio_linux_collect,message1=0x7f8f98bace58,,fil_aio_wait,io_handler_thread,start_thread,clone
      1 __io_getevents_0_4,os_aio_linux_collect,message1=0x7f8f983abe58,,fil_aio_wait,io_handler_thread,start_thread,clone
      1 __io_getevents_0_4,os_aio_linux_collect,message1=0x7f8f97baae58,,fil_aio_wait,io_handler_thread,start_thread,clone
      1 __io_getevents_0_4,os_aio_linux_collect,message1=0x7f8f973a9e58,,fil_aio_wait,io_handler_thread,start_thread,clone
      1 __io_getevents_0_4,os_aio_linux_collect,message1=0x7f8f96ba8e58,,fil_aio_wait,io_handler_thread,start_thread,clone
      1 __io_getevents_0_4,os_aio_linux_collect,message1=0x7f8f963a7e58,,fil_aio_wait,io_handler_thread,start_thread,clone
      1 __io_getevents_0_4,os_aio_linux_collect,message1=0x7f8f95ba6e58,,fil_aio_wait,io_handler_thread,start_thread,clone
      1 __io_getevents_0_4,os_aio_linux_collect,message1=0x7f8f953a5e58,,fil_aio_wait,io_handler_thread,start_thread,clone
      1 at,os_cond_wait_timed,time_in_usec=<value,srv_monitor_thread,start_thread,clone
      1 at,os_cond_wait_timed,time_in_usec=<value,srv_error_monitor_thread,start_thread,clone
      1 at,os_cond_wait_timed,time_in_usec=<value,lock_wait_timeout_thread,start_thread,clone
      1 at,os_cond_wait_timed,time_in_usec=<value,ib_wqueue_timedwait,fts_optimize_thread,start_thread,clone
      1 at,os_cond_wait_timed,time_in_usec=<value,dict_stats_thread,start_thread,clone
      1 at,os_cond_wait,reset_sig_count=1),srv_purge_coordinator_suspend,optimized,start_thread,clone
      1 at,os_cond_wait,reset_sig_count=1),buf_dump_thread,start_thread,clone
      1 at,_L_lock_995,__pthread_mutex_lock,kill_server,optimized,pfs_spawn_thread,start_thread,clone
      1 at,_L_lock_995,__pthread_mutex_lock,inline_mysql_mutex_lock,at,dena::dbcontext::term_thread,~thr_init,optimized,operator(),optimized,start_thread,clone
      1 at,inline_mysql_cond_wait,argv=0x1bf7e98),__libc_start_main,_start

// Session 1

mysql> show variables like '%version%';
ERROR 1053 (08S01): Server shutdown in progress
[28 Nov 2014 7:44] MySQL Verification Team
// pt-pmp

// Session 4

[ushastry@cluster-repo percona-toolkit-2.2.7]$ bin/pt-pmp
Sun Nov 30 14:36:58 IST 2014
     16 __lll_lock_wait,_L_lock_995(libpthread.so.0),__pthread_mutex_lock,inline_mysql_mutex_lock(mysql_thread.h:688),dena::dbcontext::term_thread(mysql_thread.h:688),~thr_init,dena::hstcpsvr_worker::run,operator,dena::thread<dena::worker_throbj>::thread_main,start_thread,clone
     10 __io_getevents_0_4,os_aio_linux_collect(os0file.cc:4934),os_aio_linux_handle(os0file.cc:4934),fil_aio_wait(fil0fil.cc:5753),io_handler_thread(srv0start.cc:485),start_thread,clone
      1 select,os_thread_sleep(os0thread.cc:261),srv_master_sleep(srv0srv.cc:2296),srv_master_thread(srv0srv.cc:2296),start_thread,clone
      1 select,os_thread_sleep(os0thread.cc:261),page_cleaner_sleep_if_needed(buf0flu.cc:2375),buf_flush_page_cleaner_thread(buf0flu.cc:2375),start_thread,clone
      1 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),srv_purge_coordinator_suspend(srv0srv.cc:2633),srv_purge_coordinator_thread(srv0srv.cc:2633),start_thread,clone
      1 pthread_cond_wait,os_cond_wait(os0sync.cc:214),os_event_wait_low(os0sync.cc:214),buf_dump_thread(buf0dump.cc:594),start_thread,clone
      1 pthread_cond_wait,inline_mysql_cond_wait(mysql_thread.h:1162),mysqld_main(mysql_thread.h:1162),__libc_start_main,_start
      1 pthread_cond_timedwait,os_cond_wait_timed(os0sync.cc:177),os_event_wait_time_low(os0sync.cc:177),srv_monitor_thread(srv0srv.cc:1534),start_thread,clone
      1 pthread_cond_timedwait,os_cond_wait_timed(os0sync.cc:177),os_event_wait_time_low(os0sync.cc:177),srv_error_monitor_thread(srv0srv.cc:1765),start_thread,clone
      1 pthread_cond_timedwait,os_cond_wait_timed(os0sync.cc:177),os_event_wait_time_low(os0sync.cc:177),lock_wait_timeout_thread(lock0wait.cc:503),start_thread,clone
      1 pthread_cond_timedwait,os_cond_wait_timed(os0sync.cc:177),os_event_wait_time_low(os0sync.cc:177),ib_wqueue_timedwait(ut0wqueue.cc:154),fts_optimize_thread(fts0opt.cc:3003),start_thread,clone
      1 pthread_cond_timedwait,os_cond_wait_timed(os0sync.cc:177),os_event_wait_time_low(os0sync.cc:177),dict_stats_thread(dict0stats_bg.cc:348),start_thread,clone
      1 __lll_lock_wait,_L_lock_995(libpthread.so.0),__pthread_mutex_lock,kill_server(mysql_thread.h:688),kill_server_thread(mysql_thread.h:688),pfs_spawn_thread(pfs.cc:1860),start_thread,clone
      1 __lll_lock_wait,_L_lock_995(libpthread.so.0),__pthread_mutex_lock,inline_mysql_mutex_lock(mysql_thread.h:688),remove_global_thread(mysql_thread.h:688),one_thread_per_connection_end(mysqld.cc:2748),do_handle_one_connection(sql_connect.cc:989),handle_one_connection(sql_connect.cc:898),pfs_spawn_thread(pfs.cc:1860),start_thread,clone
      1 __lll_lock_wait,_L_lock_995(libpthread.so.0),__pthread_mutex_lock,inline_mysql_mutex_lock(mysql_thread.h:688),remove_global_thread(mysql_thread.h:688),dena::dbcontext::term_thread,~thr_init,dena::hstcpsvr_worker::run,operator,dena::thread<dena::worker_throbj>::thread_main,start_thread,clone
      1 do_sigwait,__sigwait,signal_hand(mysqld.cc:3185),pfs_spawn_thread(pfs.cc:1860),start_thread,clone
[28 Nov 2014 13:02] Jon Olav Hauglid
Hi,

This is not a MySQL bug, but rather a problem with HandlerSocket.
HandlerSocket should not lock LOCK_thread_count when calling 
remove_global_thread(). I suggest contacting the authors of HandlerSocket.