Bug #86902 Dead lock cause mysqld hand
Submitted: 3 Jul 7:10 Modified: 12 Jul 2:08
Reporter: qinglin zhang (OCA) Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.6 OS:Linux
Assigned to: Bogdan Kecman
Tags: log_xid

[3 Jul 7:10] qinglin zhang
Description:
MySQL server may hang when we turn off binlog and load data from clients.

0) The pt-pmp of mysqld lies bellow:
Mon Jul  3 10:03:57 CST 2017
   8473 __lll_lock_wait(libpthread.so.0),_L_lock_975(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),inline_mysql_mutex_lock(mysql_thread.h:705),TC_LOG_MMAP::log_xid(mysql_thread.h:705),TC_LOG_MMAP::commit(log.cc:3034),THD::cleanup(sql_class.cc:1651),THD::release_resources(sql_class.cc:1708),one_thread_per_connection_end(mysqld.cc:3225),do_handle_one_connection(sql_connect.cc:1174),handle_one_connection(sql_connect.cc:961),start_thread(libpthread.so.0),clone(libc.so.6)
     28 poll(libc.so.6),vio_io_wait(viosocket.c:808),vio_socket_io_wait(viosocket.c:70),vio_read(viosocket.c:134),net_read_raw_loop(net_serv.cc:1066),net_read_packet_header(net_serv.cc:1235),net_read_packet(net_serv.cc:1235),my_net_read(net_serv.cc:1736),do_command(sql_parse.cc:1102),do_handle_one_connection(sql_connect.cc:1140),handle_one_connection(sql_connect.cc:961),start_thread(libpthread.so.0),clone(libc.so.6)
     12 pthread_cond_wait,std::condition_variable::wait(libstdc++.so.6),rocksdb::ThreadPoolImpl::Impl::BGThread(threadpool_imp.cc:182),rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(threadpool_imp.cc:263),libstdc++::??(libstdc++.so.6),start_thread(libpthread.so.0),clone(libc.so.6)
     11 __io_getevents_0_4(libaio.so.1),os_aio_linux_collect(os0file.cc:5534),os_aio_linux_handle(os0file.cc:5534),fil_aio_wait(fil0fil.cc:6479),io_handler_thread(srv0start.cc:501),start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_wait,os_cond_wait(os0sync.cc:217),os_event_wait_low2(os0sync.cc:217),srv_worker_thread(srv0srv.cc:3284),start_thread(libpthread.so.0),clone(libc.so.6)
      1 sigwaitinfo(libc.so.6),timer_notify_thread_func(posix_timers.c:77),start_thread(libpthread.so.0),clone(libc.so.6)
      1 sigwait(libpthread.so.0),signal_hand(mysqld.cc:4112),start_thread(libpthread.so.0),clone(libc.so.6)
      1 select(libc.so.6),os_thread_sleep(os0thread.cc:293),srv_master_sleep(srv0srv.cc:3084),srv_master_thread(srv0srv.cc:3084),start_thread(libpthread.so.0),clone(libc.so.6)
      1 select(libc.so.6),os_thread_sleep(os0thread.cc:293),page_cleaner_sleep_if_needed(buf0flu.cc:2425),buf_flush_page_cleaner_thread(buf0flu.cc:2425),start_thread(libpthread.so.0),clone(libc.so.6)
      1 select(libc.so.6),os_thread_sleep(os0thread.cc:293),btr_defragment_thread(btr0defragment.cc:870),start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,rocksdb::port::CondVar::Wait(port_posix.cc:92),WaitInternal(instrumented_mutex.cc:62),rocksdb::InstrumentedCondVar::Wait(instrumented_mutex.cc:62),rocksdb::DeleteScheduler::BackgroundEmptyTrash(delete_scheduler.cc:138),libstdc++::??(libstdc++.so.6),start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,os_cond_wait(os0sync.cc:217),os_event_wait_low2(os0sync.cc:217),srv_purge_coordinator_suspend(srv0srv.cc:3437),srv_purge_coordinator_thread(srv0srv.cc:3437),start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,os_cond_wait(os0sync.cc:217),os_event_wait_low2(os0sync.cc:217),buf_resize_thread(buf0buf.cc:3118),start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,os_cond_wait(os0sync.cc:217),os_event_wait_low2(os0sync.cc:217),buf_dump_thread(buf0dump.cc:625),start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_cond_wait_timed(os0sync.cc:180),os_event_wait_time_low2(os0sync.cc:180),srv_monitor_thread(srv0srv.cc:2299),start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_cond_wait_timed(os0sync.cc:180),os_event_wait_time_low2(os0sync.cc:180),srv_error_monitor_thread(srv0srv.cc:2532),start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_cond_wait_timed(os0sync.cc:180),os_event_wait_time_low2(os0sync.cc:180),lru_manager_sleep_if_needed(buf0flu.cc:2450),buf_flush_lru_manager_thread(buf0flu.cc:2450),start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_cond_wait_timed(os0sync.cc:180),os_event_wait_time_low2(os0sync.cc:180),lock_wait_timeout_thread(lock0wait.cc:518),start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_cond_wait_timed(os0sync.cc:180),os_event_wait_time_low2(os0sync.cc:180),ib_wqueue_timedwait(ut0wqueue.cc:172),fts_optimize_thread(fts0opt.cc:3074),start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_cond_wait_timed(os0sync.cc:180),os_event_wait_time_low2(os0sync.cc:180),dict_stats_thread(dict0stats_bg.cc:542),start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,inline_mysql_cond_timedwait(mysql_thread.h:1214),myrocks::Rdb_drop_index_thread::run(mysql_thread.h:1214),myrocks::Rdb_thread::thread_func(rdb_threads.cc:32),start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,inline_mysql_cond_timedwait(mysql_thread.h:1214),myrocks::Rdb_background_thread::run(mysql_thread.h:1214),myrocks::Rdb_thread::thread_func(rdb_threads.cc:32),start_thread(libpthread.so.0),clone(libc.so.6)
      1 poll(libc.so.6),handle_connections_sockets(mysqld.cc:8031),handle_connections_sockets_all(mysqld.cc:6748),mysqld_main(mysqld.cc:6748),__libc_start_main(libc.so.6),_start
(END)

How to repeat:
Code analyze.

1) Thread X is waiting for LOCK_active in log_xid.

Thread 8476 (Thread 0x7f9c6c2a4700 (LWP 32369)):
#0  0x00007f9ea77a71bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f9ea77a2d38 in _L_lock_975 () from /lib64/libpthread.so.0
#2  0x00007f9ea77a2ce1 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000007d1e66 in inline_mysql_mutex_lock (that=0x17c2b70 <tc_log_mmap+592>) at /data/musa/mysql-5.6/include/mysql/psi/mysql_thread.h:705

#4  TC_LOG_MMAP::log_xid (this=this@entry=0x17c2920 <tc_log_mmap>, thd=thd@entry=0x270ce50, xid=5281169, async=async@entry=false) at /data/musa/mysql-5.6/sql/log.cc:3081

#5  0x00000000007d2180 in TC_LOG_MMAP::commit (this=0x17c2920 <tc_log_mmap>, thd=0x270ce50, all=<optimized out>, async=<optimized out>) at /data/musa/mysql-5.6/sql/log.cc:3034
#6  0x00000000008370d7 in THD::cleanup (this=this@entry=0x270ce50) at /data/musa/mysql-5.6/sql/sql_class.cc:1651
#7  0x00000000008377b8 in THD::release_resources (this=this@entry=0x270ce50) at /data/musa/mysql-5.6/sql/sql_class.cc:1708
#8  0x00000000006f247a in one_thread_per_connection_end (thd=0x270ce50, block_pthread=true) at /data/musa/mysql-5.6/sql/mysqld.cc:3225
#9  0x0000000000842349 in do_handle_one_connection (thd_arg=<optimized out>) at /data/musa/mysql-5.6/sql/sql_connect.cc:1174
#10 0x00000000008426e9 in handle_one_connection (arg=<optimized out>) at /data/musa/mysql-5.6/sql/sql_connect.cc:961
#11 0x00007f9ea77a0dc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f9ea5e1474d in clone () from /lib64/libc.so.6

(gdb) p LOCK_active
$2 = {
  m_mutex = {
    __data = {
      __lock = 2,
      __count = 0,
      __owner = 32370,
      __nusers = 1,
      __kind = 3,
      __spins = 1,
      __list = {
        __prev = 0x0,
        __next = 0x0
      }
    },
    __size = "\002\000\000\000\000\000\000\000r~\000\000\001\000\000\000\003\000\000\000\001", '\000' <repeats 18 times>,
    __align = 2
  },
  m_psi = 0x0
}

2) swith to thread 8475 (the owner of LOCK_active)

Thread 8475 (Thread 0x7f9c6c223700 (LWP 32370)):
#0  0x00007f9ea77a71bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f9ea77a2d38 in _L_lock_975 () from /lib64/libpthread.so.0
#2  0x00007f9ea77a2ce1 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000007d1ec6 in inline_mysql_mutex_lock (that=0x2403380) at /data/musa/mysql-5.6/include/mysql/psi/mysql_thread.h:705

#4  TC_LOG_MMAP::log_xid (this=this@entry=0x17c2920 <tc_log_mmap>, thd=thd@entry=0x25290b0, xid=5281170, async=async@entry=false) at /data/musa/mysql-5.6/sql/log.cc:3098

#5  0x00000000007d2180 in TC_LOG_MMAP::commit (this=0x17c2920 <tc_log_mmap>, thd=0x25290b0, all=<optimized out>, async=<optimized out>) at /data/musa/mysql-5.6/sql/log.cc:3034
#6  0x00000000008370d7 in THD::cleanup (this=this@entry=0x25290b0) at /data/musa/mysql-5.6/sql/sql_class.cc:1651
#7  0x00000000008377b8 in THD::release_resources (this=this@entry=0x25290b0) at /data/musa/mysql-5.6/sql/sql_class.cc:1708
#8  0x00000000006f247a in one_thread_per_connection_end (thd=0x25290b0, block_pthread=true) at /data/musa/mysql-5.6/sql/mysqld.cc:3225
#9  0x0000000000842349 in do_handle_one_connection (thd_arg=<optimized out>) at /data/musa/mysql-5.6/sql/sql_connect.cc:1174
#10 0x00000000008426e9 in handle_one_connection (arg=<optimized out>) at /data/musa/mysql-5.6/sql/sql_connect.cc:961
#11 0x00007f9ea77a0dc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f9ea5e1474d in clone () from /lib64/libc.so.6

$3 = (TC_LOG_MMAP::PAGE *) 0x2403350
(gdb) p p->lock
$4 = {
  m_mutex = {
    __data = {
      __lock = 2,
      __count = 0,
      __owner = 32374,
      __nusers = 1,
      __kind = 3,
      __spins = 1,
      __list = {
        __prev = 0x0,
        __next = 0x0
      }
    },
    __size = "\002\000\000\000\000\000\000\000v~\000\000\001\000\000\000\003\000\000\000\001", '\000' <repeats 18 times>,
    __align = 2
  },
  m_psi = 0x0
}

3) Switch to thread 8471 (the owner of p->lock)

Thread 8471 (Thread 0x7f9c647f8700 (LWP 32374)):
#0  0x00007f9ea77a71bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f9ea77a2d38 in _L_lock_975 () from /lib64/libpthread.so.0
#2  0x00007f9ea77a2ce1 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000007d1f2e in inline_mysql_mutex_lock (that=0x17c2bd0 <tc_log_mmap+688>) at /data/musa/mysql-5.6/include/mysql/psi/mysql_thread.h:705

#4  TC_LOG_MMAP::log_xid (this=this@entry=0x17c2920 <tc_log_mmap>, thd=thd@entry=0x269afd0, xid=<optimized out>, async=async@entry=false) at /data/musa/mysql-5.6/sql/log.cc:3115

#5  0x00000000007d2180 in TC_LOG_MMAP::commit (this=0x17c2920 <tc_log_mmap>, thd=0x269afd0, all=<optimized out>, async=<optimized out>) at /data/musa/mysql-5.6/sql/log.cc:3034
#6  0x00000000008370d7 in THD::cleanup (this=this@entry=0x269afd0) at /data/musa/mysql-5.6/sql/sql_class.cc:1651
#7  0x00000000008377b8 in THD::release_resources (this=this@entry=0x269afd0) at /data/musa/mysql-5.6/sql/sql_class.cc:1708
#8  0x00000000006f247a in one_thread_per_connection_end (thd=0x269afd0, block_pthread=true) at /data/musa/mysql-5.6/sql/mysqld.cc:3225
#9  0x0000000000842349 in do_handle_one_connection (thd_arg=<optimized out>) at /data/musa/mysql-5.6/sql/sql_connect.cc:1174
#10 0x00000000008426e9 in handle_one_connection (arg=<optimized out>) at /data/musa/mysql-5.6/sql/sql_connect.cc:961
#11 0x00007f9ea77a0dc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f9ea5e1474d in clone () from /lib64/libc.so.6

p LOCK_sync

$6 = {
  m_mutex = {
    __data = {
      __lock = 2, 
      __count = 0, 
      __owner = 32376, 
      __nusers = 1, 
      __kind = 3, 
      __spins = 0, 
      __list = {
        __prev = 0x0, 
        __next = 0x0
      }
    }, 
    __size = "\002\000\000\000\000\000\000\000x~\000\000\001\000\000\000\003", '\000' <repeats 22 times>, 
    __align = 2
  }, 
  m_psi = 0x0
}

4) Swith to thread 8469 (the owner of LOCK_sync)
Thread 8469 (Thread 0x7f9c646f6700 (LWP 32376)):
#0  0x00007f9ea77a71bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f9ea77a2d38 in _L_lock_975 () from /lib64/libpthread.so.0
#2  0x00007f9ea77a2ce1 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000007d1ff3 in inline_mysql_mutex_lock (that=0x17c2b70 <tc_log_mmap+592>) at /data/musa/mysql-5.6/include/mysql/psi/mysql_thread.h:705
#4  TC_LOG_MMAP::log_xid (this=this@entry=0x17c2920 <tc_log_mmap>, thd=thd@entry=0x253bd30, xid=<optimized out>, async=async@entry=false) at /data/musa/mysql-5.6/sql/log.cc:3138
#5  0x00000000007d2180 in TC_LOG_MMAP::commit (this=0x17c2920 <tc_log_mmap>, thd=0x253bd30, all=<optimized out>, async=<optimized out>) at /data/musa/mysql-5.6/sql/log.cc:3034
#6  0x00000000008370d7 in THD::cleanup (this=this@entry=0x253bd30) at /data/musa/mysql-5.6/sql/sql_class.cc:1651
#7  0x00000000008377b8 in THD::release_resources (this=this@entry=0x253bd30) at /data/musa/mysql-5.6/sql/sql_class.cc:1708
#8  0x00000000006f247a in one_thread_per_connection_end (thd=0x253bd30, block_pthread=true) at /data/musa/mysql-5.6/sql/mysqld.cc:3225
#9  0x0000000000842349 in do_handle_one_connection (thd_arg=<optimized out>) at /data/musa/mysql-5.6/sql/sql_connect.cc:1174
#10 0x00000000008426e9 in handle_one_connection (arg=<optimized out>) at /data/musa/mysql-5.6/sql/sql_connect.cc:961
#11 0x00007f9ea77a0dc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f9ea5e1474d in clone () from /lib64/libc.so.6

#4  TC_LOG_MMAP::log_xid (this=this@entry=0x17c2920 <tc_log_mmap>, thd=thd@entry=0x253bd30, xid=<optimized out>, async=async@entry=false) at /data/musa/mysql-5.6/sql/log.cc:3138
3138    in /data/musa/mysql-5.6/sql/log.cc
(gdb) p LOCK_active
$7 = {
  m_mutex = {
    __data = {
      __lock = 2, 
      __count = 0, 
      __owner = 32370, 
      __nusers = 1, 
      __kind = 3, 
      __spins = 1, 
      __list = {
        __prev = 0x0, 
        __next = 0x0
      }
    }, 
    __size = "\002\000\000\000\000\000\000\000r~\000\000\001\000\000\000\003\000\000\000\001", '\000' <repeats 18 times>, 
    __align = 2
  }, 
  m_psi = 0x0
}

Suppose there are for threads in the function named log_xid and one thread of the stacks like :

#0  0x00007f9ea77a71bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f9ea77a2d38 in _L_lock_975 () from /lib64/libpthread.so.0
#2  0x00007f9ea77a2ce1 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000007d1e66 in inline_mysql_mutex_lock (that=0x17c2b70 <tc_log_mmap+592>) at /data/musa/mysql-5.6/include/mysql/psi/mysql_thread.h:705
#4  TC_LOG_MMAP::log_xid (this=this@entry=0x17c2920 <tc_log_mmap>, thd=thd@entry=0x25dd250, xid=5280832, async=async@entry=false) at /data/musa/mysql-5.6/sql/log.cc:3081
#5  0x00000000007d2180 in TC_LOG_MMAP::commit (this=0x17c2920 <tc_log_mmap>, thd=0x25dd250, all=<optimized out>, async=<optimized out>) at /data/musa/mysql-5.6/sql/log.cc:3034
#6  0x00000000008370d7 in THD::cleanup (this=this@entry=0x25dd250) at /data/musa/mysql-5.6/sql/sql_class.cc:1651
#7  0x00000000008377b8 in THD::release_resources (this=this@entry=0x25dd250) at /data/musa/mysql-5.6/sql/sql_class.cc:1708
#8  0x00000000006f247a in one_thread_per_connection_end (thd=0x25dd250, block_pthread=true) at /data/musa/mysql-5.6/sql/mysqld.cc:3225
#9  0x0000000000842349 in do_handle_one_connection (thd_arg=<optimized out>) at /data/musa/mysql-5.6/sql/sql_connect.cc:1174
#10 0x00000000008426e9 in handle_one_connection (arg=<optimized out>) at /data/musa/mysql-5.6/sql/sql_connect.cc:961
#11 0x00007f9ea77a0dc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f9ea5e1474d in clone () from /lib64/libc.so.6

The dead lock situation like this:

T1 ----- waiting for LOCK_active 

T2 ----- acquire LOCK_active       ---------------  waiting for p->lock 

T3 ---------------------           aquire  p->lock  -------------   waiting for LOCK_sync and release p->lock

T4 -----                                            acquire LOCK_sync --------------------------------------- waiting for LOCK_active and release LOCK_sync
[7 Jul 0:05] Bogdan Kecman
Hi,

Thanks for the report, looking at this I have issues reproducing. Can you share the config you are using, show global status and show global variables values?

all best
Bogdan
[7 Jul 2:08] qinglin zhang
HI, Bogdan.
   I just turn off binlog and load data from clients and cause mysqld's hang. I found the problem by gdb the context and analyze the function named TC_LOG_MMAP::log_xid.

# cat my.cnf

   [mysqld_safe]

[mysql]
port=3306
prompt=\\u@\\d \\r:\\m:\\s>
default-character-set=utf8                    # UTF-8
no-auto-rehash
skip-insecure-warning                         # We may use passwd in scripts,
  [client]
  port=3306

#innodb
  innodb_stats_persistent=0
  innodb_stats_auto_recalc=OFF
  innodb_data_file_path = ibdata1:4G;ibdata2:16M:autoextend
  innodb_buffer_pool_size=8G
  innodb_buffer_pool_instances=8
  innodb_log_files_in_group=4
  innodb_log_file_size=2G
  innodb_log_buffer_size=32M
  innodb_flush_log_at_trx_commit=1
  innodb_max_dirty_pages_pct_lwm=10
  innodb_max_dirty_pages_pct=60
  innodb_thread_concurrency=16
  innodb_read_io_threads=8
  innodb_write_io_threads=1
  innodb_open_files=60000
  innodb_file_format=Barracuda
  innodb_file_per_table=1
  innodb_flush_method=O_DIRECT
  innodb_change_buffering=all
  innodb_adaptive_flushing=ON
  innodb_flush_neighbors=0
  innodb_adaptive_hash_index=OFF
  innodb_old_blocks_time=1000
  innodb_stats_on_metadata=OFF
  innodb_read_ahead_threshold=0
  innodb_use_native_aio=1
  innodb_lock_wait_timeout=5
  innodb_rollback_on_timeout=0
  innodb_purge_threads=4
  transaction-isolation=READ-COMMITTED
  innodb_checksum_algorithm=crc32
  innodb_disable_sort_file_cache=ON
  innodb_io_capacity_max=6000
  innodb_io_capacity=2000
  innodb_page_size=16k

  innodb_lru_scan_depth=2048
  innodb_sync_array_size=16
  innodb_print_all_deadlocks

  binlog_checksum=none
  log_bin_use_v1_row_events=on
  explicit_defaults_for_timestamp=off
  binlog_row_image=FULL
  binlog_rows_query_log_events=off
  binlog_stmt_cache_size=32768

# myisam
  key_buffer=64M
  myisam_sort_buffer_size=64M
  concurrent_insert=2
  delayed_insert_timeout=300

# replication
  slave_type_conversions="ALL_NON_LOSSY"
  slave_net_timeout=4
  skip-slave-start
  sync_master_info=1000
  sync_relay_log_info=1000
  gtid_mode=ON
  enforce_gtid_consistency=ON

# binlog
  server_id=3306
  binlog_cache_size=32K
  max_binlog_cache_size=2G
  max_binlog_size=500M
  binlog-format=ROW
  sync_binlog=1
  log-slave-updates=1
  expire_logs_days=0
  rpl_stop_slave_timeout=300
  slave_checkpoint_group=1024
  slave_checkpoint_period=300
  slave_parallel_workers=16
  slave_pending_jobs_size_max=1073741824
  slave_rows_search_algorithms='TABLE_SCAN,INDEX_SCAN,HASH_SCAN'
  slave_sql_verify_checksum=off
  master_verify_checksum=off

# TABLE mode is crash-safety, while FILE is not.
  master_info_repository=TABLE
  relay_log_info_repository=TABLE

# server
  default-storage-engine=INNODB
  character-set-server=utf8                     # UTF-8
  lower_case_table_names=1
  skip-external-locking
  open_files_limit=65536
  safe-user-create
  local-infile=1
  max_digest_length=0

  log_slow_admin_statements=1
  log_warnings=1
  long_query_time=1
  slow_query_log=1
  general_log=0

  query_cache_type=0
  query_cache_limit=1M
  query_cache_min_res_unit=1K

  table_definition_cache=65536
  table_open_cache=65536
  metadata_locks_hash_instances=256
  metadata_locks_cache_size=32768
  eq_range_index_dive_limit=200
  table_open_cache_instances=16

  thread_stack=512K
  thread_cache_size=256
  read_rnd_buffer_size=128K
  sort_buffer_size=256K
  join_buffer_size=128K
  read_buffer_size=128K

  port=3306
  skip-name-resolve
  skip-ssl
  max_connections=8500
  max_user_connections=8000
  max_connect_errors=65536
  max_allowed_packet=128M
  connect_timeout=8
  net_read_timeout=30
  net_write_timeout=60
  back_log=1024

  
  allow-multiple-engines
# skip-innodb

  default-tmp-storage-engine=MyISAM
  collation-server=utf8_bin
[7 Jul 2:11] qinglin zhang
Hi Bogdan.
   Thank you for your response, we can find the problem by analyzing the function TC_LOG_MMAP::log_xid, there are four threads in the TC_LOG_MMAP::log_xid at the same time, the following sequence may cause mysqld hang. 

T1 ----- waiting for LOCK_active 

T2 ----- acquire LOCK_active       ---------------  waiting for p->lock 

T3 ---------------------           aquire  p->lock  -------------   waiting for LOCK_sync and release p->lock

T4 -----                                            acquire LOCK_sync --------------------------------------- waiting for LOCK_active and release LOCK_sync

  th
[7 Jul 2:11] qinglin zhang
Hi Bogdan.
   Thank you for your response, we can find the problem by analyzing the function TC_LOG_MMAP::log_xid, there are four threads in the TC_LOG_MMAP::log_xid at the same time, the following sequence may cause mysqld hang. 

T1 ----- waiting for LOCK_active 

T2 ----- acquire LOCK_active       ---------------  waiting for p->lock 

T3 ---------------------           aquire  p->lock  -------------   waiting for LOCK_sync and release p->lock

T4 -----                                            acquire LOCK_sync --------------------------------------- waiting for LOCK_active and release LOCK_sync

  th
[10 Jul 12:05] Bogdan Kecman
Hi,

In your gdb I see that you have a lock but I don't see what's causing it nor I see any bug there.

You stated "I just turn off binlog and load data from clients and cause mysqld's hang. " .. I need more details about this.

1. you stop binlogging on master or on slaves?
2. where do you "load data from clients"? on master? on slaves?
3. how do you "load data from clients"? you are letting your applications connect to master and write to it? you are using sql dump to load data on slaves? you are importing something on master?

MySQL, like any other RDBMS can have deadlocks and that's not a bug, it's normal behavior. You can read about it here
https://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks.html
and here
https://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks-handling.html

I don't see this is a bug attm so please share more details

all best
Bogdan
[11 Jul 2:16] qinglin zhang
Hi Bogdan.
    Thank you again for your response and sorry for trouble you so much. 

1) Dead lock is not bug, but this kind of dead lock cause mysqld hang and can't release connections. when connections exceed max_connections, mysqld hang.

2) The problem only exists at master side.

3) My "loading data" is just insert from proxy, nothing more special.

   Sorry for trouble  and thank you very much.
[11 Jul 2:28] Bogdan Kecman
Hi,

> 1) Dead lock is not bug, but this kind of dead lock
> cause mysqld hang and can't release connections. when
> connections exceed max_connections, mysqld hang.

How are your timeouts configured? What does InnoDB monitor states?
I don't see anything in the gdb that shows a problem.

> 3) My "loading data" is just insert from proxy, nothing more special.

This is not something I can reproduce. 
What I see is that you are having configuration / scaling / performance issue that's right in the MySQL Support alley, not a bug.

best regards
Bogdan
[11 Jul 2:54] qinglin zhang
HI.
   lock timeout is 5 seconds, the context has already gone away, but you can find the bug if you analyze the function named log_xid.
   Anyway , thank you.
[11 Jul 18:50] Bogdan Kecman
Hi,

> but you can find the bug if you analyze the function named log_xid.

not a bug :(

As I said going trough support can help you configure your system properly.
Partial fix is to disable binlog sync, full solution is to upgrade to 5.7.

all best
Bogdan
[11 Jul 22:51] Mark Callaghan
Whether or not this is a bug, tc_log_mmap is a likely problem for anyone trying to use TokuDB or MyRocks. See:
https://bugs.mysql.com/bug.php?id=80870
https://bugs.launchpad.net/percona-server/+bug/1550352
https://github.com/facebook/mysql-5.6/issues/227
[12 Jul 2:08] qinglin zhang
Yeah, we run test with binlog disabled, so there is nothing for binlog to sync. So I think mark is right.
[12 Jul 2:22] Bogdan Kecman
@qinglin zhang, are you 100% sure sync is off and that binlog is off as the gdb I see shows code executing only when binlog is on

@mark, 5.7 and later solves this "properly". While it's not a bug it surely is a "problem" with 5.6, one that is solved in modern mysql

all best
Bogdan
[12 Jul 5:40] Mark Callaghan
I am happy to learn this is fixed in modern MySQL. I have fallen behind but MyRocks is moving to MySQL 8 so I will become current.