MySQL Bugs: #76346: innodb_thread_concurrency=1 and queries using intrinsic temp tables, causes hang

Bug #76346	innodb_thread_concurrency=1 and queries using intrinsic temp tables, causes hang
Submitted:	16 Mar 2015 23:04	Modified:	12 Jun 2015 9:34
Reporter:	Roel Van de Paar
Status:	Closed
Category:	MySQL Server: InnoDB storage engine	Severity:	S1 (Critical)
Version:	5.7.6-m16, 5.7.8	OS:	Any
Assigned to:	Naga Satyanarayana Bodapati	CPU Architecture:	Any
Tags:	query shutdown hang

Description:
Not sure what the cause is, but 5.7.6-m16 hung on shutdown during a QA run. I will upload full stacks (obtained by breaking in with gdb -pid=x). The bug will have to be verified from the stacks and by code review, as making a testcase for this would not be straightforward.

How to repeat:
Stacks/code review. Uploading full gdb stacks & error log.

Error log

Attachment: mysqld.err (application/octet-stream, text), 35.10 KiB.

Ah, this looks like some sort of long semaphore wait issue (see the error log):

2015-03-16T22:51:57.845878Z 0 [Note] Event Scheduler: Purging the queue. 3 events
2015-03-16T23:02:36.908074Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 140046824883968 has waited at trx0purge.cc line 153 for 324.00 seconds the semaphore:
Mutex at 0x7f5f3f67b7a0 created file trx0rseg.cc line 211, lock var 1
Last time reserved in file /bzr/mysql-5.7.6-m16_dbg/storage/innobase/trx/trx0trx.cc line 1528
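
When scanning a long error log for warnings like the one above, a small script can pull out the waiting thread, source location and wait time. The following is only a sketch: the regular expression is modelled on the single "has waited at" line quoted in this report, and the helper name `find_long_waits` is hypothetical, not part of any MySQL tool.

```python
import re

# Matches the "--Thread ... has waited at ..." line of an InnoDB
# long-semaphore-wait warning, in the form quoted in this report.
WAIT_RE = re.compile(
    r"--Thread (?P<thread>\d+) has waited at (?P<file>\S+) line (?P<line>\d+) "
    r"for (?P<seconds>\d+(?:\.\d+)?) seconds"
)


def find_long_waits(log_text, min_seconds=0.0):
    """Return one dict per wait line that waited at least min_seconds."""
    waits = []
    for m in WAIT_RE.finditer(log_text):
        seconds = float(m.group("seconds"))
        if seconds >= min_seconds:
            waits.append({
                "thread": int(m.group("thread")),
                "file": m.group("file"),
                "line": int(m.group("line")),
                "seconds": seconds,
            })
    return waits


# The line below is copied from the error log excerpt in this report.
sample = ("--Thread 140046824883968 has waited at trx0purge.cc line 153 "
          "for 324.00 seconds the semaphore:")
print(find_long_waits(sample))
```

Passing `min_seconds` filters out short waits, which may help when a log contains many such warnings.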

(!) Note that this run was only a single threaded run (i.e. a single client/thread) (!)

gdb trace, taken early on in the wait

Attachment: gdb_std.txt (text/plain), 33.73 KiB.

gdb full trace, taken somewhat later

Attachment: gdb_full.txt (text/plain), 81.65 KiB.

gdb trace, taken later again (diffs from previous gdb_std.txt)

Attachment: gdb_std_later.txt (text/plain), 33.43 KiB.

Guys, I think this issue is really serious and should get high prio attention. It's happening all over the 5.7.6-m16 build from what I can see. Only takes 1-3 trials to have it happen.

Stacks from another example

Attachment: gdb_2_std.txt (text/plain), 30.69 KiB.

Full stacks from the other example mentioned in last comment

Attachment: gdb_2_full.txt (text/plain), 31.71 KiB.

Added another example. The error log is not showing any long semaphore waits yet (13 min lockup):

------------
2015-03-19T23:44:03.203802Z 0 [Note] Shutting down slave threads
2015-03-19T23:44:05.203885Z 0 [Note] Forcefully disconnecting 2 remaining clients
2015-03-19T23:44:05.203896Z 0 [Warning] /sda/MS-mysql-5.7.6-m16-linux-x86_64-debug/bin/mysqld: Forcing close of thread 6  user: 'root'

2015-03-19T23:44:05.203920Z 0 [Note] Event Scheduler: Killing the scheduler thread, thread id 1
2015-03-19T23:44:05.203927Z 0 [Note] Event Scheduler: Waiting for the scheduler thread to reply
2015-03-19T23:44:05.203989Z 0 [Note] Event Scheduler: Stopped
2015-03-19T23:44:05.203998Z 0 [Note] Event Scheduler: Purging the queue. 0 events
------------ <EOF>

There were other errors in the log, from executing various testcase SQL statements. The server was started clean, with a freshly initialized data dir (--init).

Again the trial was single threaded. Notice the mutex issues in the gdb stacks.

Threads hanging around in ut_delay suggest a similar cause to bug #76402.

Agreed re: ut_delay. I removed the options discussed there, but the issue is still present, though apparently less regularly (TBD). Attaching new (and indeed different looking) stacks from a hang.

gdb_std.txt

Attachment: gdb_std.txt (text/plain), 23.31 KiB.

gdb_full.txt

Attachment: gdb_full.txt (text/plain), 99.59 KiB.

The 'long semaphore wait' seems unrelated.

Another set of stacks, similar to last set (after options were removed) gdb_std.txt

Attachment: gdb_std.txt (text/plain), 23.32 KiB.

matching gdb_full.txt

Attachment: gdb_full.txt (text/plain), 99.99 KiB.

First testcase:

DROP DATABASE test;CREATE DATABASE test;USE test;
SET @@session.max_heap_table_size=0;
SET @@global.innodb_thread_concurrency=1;
SELECT name,n_cols,file_format,row_format FROM information_schema.innodb_sys_tables;
select convert_tz(0,'a','a');

The client will hang on the last query. Pressing CTRL+C will not help:

mysql> select convert_tz(0,'a','a');
^C^C -- query aborted
^C^C -- query aborted
^C^C -- query aborted
^C^C -- query aborted

Press CTRL+Z to exit and kill the PID.

When this hang is occurring, one can check status with another client:

mysql> show processlist;
+----+------+-----------+------+---------+------+-------------+------------------------------+
| Id | User | Host      | db   | Command | Time | State       | Info                         |
+----+------+-----------+------+---------+------+-------------+------------------------------+
|  2 | root | localhost | test | Query   |  540 | System lock | select convert_tz(0,'a','a') |
| 29 | root | localhost | test | Query   |    0 | starting    | show processlist             |
+----+------+-----------+------+---------+------+-------------+------------------------------+
2 rows in set (0.00 sec)

Trying to shut down the server in this state will also hang.

Looking at my previous stack in this issue (http://bugs.mysql.com/file.php?id=22521&bug_id=76346) I notice:

#16 0x00000000010a0bd4 in dispatch_command (command=COM_QUERY, thd=0x7f0a9001a000, packet=0x7f0a90512011 "set @@session.time_zone='UTC';", packet_length=30) at /bzr/mysql-5.7.6-m16_dbg/sql/sql_parse.cc:1249

So this would have been another occurrence, likely of the same problem. Issue looks clearly related to timezones, maybe related to nano (see stacks). 

The testcase is 100% reproducible, so I will not add further stacks for it; devs can break in with gdb directly.

#16 0x00000000010a0bd4 in dispatch_command (command=COM_QUERY, thd=0x7fd14f41a000, packet=0x7fd14f446011 "SET @@time_zone = '6';", packet_length=22) at /bzr/mysql-5.7.6-m16_dbg/sql/sql_parse.cc:1249

From another previously uploaded stack. Again, timezone related.

Serious regression in 5.7.6-m16. The issue does NOT reproduce in 5.7.5-m15.

5.7.5-m15 output is immediate:

mysql> select convert_tz(0,'a','a');
+-----------------------+
| convert_tz(0,'a','a') |
+-----------------------+
| NULL                  |
+-----------------------+
1 row in set (0.00 sec)

Another testcase

DROP DATABASE test;CREATE DATABASE test;USE test;
SET @@global.innodb_thread_concurrency=1;
SELECT table_catalog, table_schema, table_name, column_name FROM information_schema.columns WHERE table_catalog IS NULL OR table_catalog <> 'def';
HELP '%function_2';

The last testcase seems to indicate that it's not TZ related.

What seems important is that some SQL queries cause InnoDB tables in the mysql database to be accessed!

Hello Roel,

Thank you for the report and test case.
Observed similar behavior with 5.7.8 (debug/release builds).

Thanks,
Umesh

// 5.7.8 - pmp 

[umshastr@hod03]~/bugs: ./pmp.sh
     10 __io_getevents_0_4,os_aio_linux_collect,os_aio_linux_handle,fil_aio_wait,io_handler_thread,start_thread,clone
      3 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,srv_worker_thread,start_thread,clone
      1 sigwait,signal_hand,pfs_spawn_thread,start_thread,clone
      1 sigwaitinfo,timer_notify_thread_func,pfs_spawn_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,srv_purge_coordinator_suspend,srv_purge_coordinator_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,buf_resize_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,buf_dump_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,native_cond_wait,my_cond_wait,inline_mysql_cond_wait,compress_gtid_table,pfs_spawn_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,srv_monitor_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,srv_error_monitor_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,pc_sleep_if_needed,buf_flush_page_cleaner_coordinator,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,lock_wait_timeout_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,ib_wqueue_timedwait,fts_optimize_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,dict_stats_thread,start_thread,clone
      1 poll,vio_io_wait,vio_socket_io_wait,vio_ssl_read,net_read_raw_loop,net_read_packet_header,net_read_packet,my_net_read,Protocol_classic::read_packet,Protocol_classic::get_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
      1 poll,Mysqld_socket_listener::listen_for_connection_event,Connection_acceptor<Mysqld_socket_listener>::connection_event_loop,mysqld_main,main
      1 nanosleep,os_thread_sleep,srv_master_sleep,srv_master_thread,start_thread,clone
      1 nanosleep,os_thread_sleep,srv_conc_enter_innodb_with_atomics,srv_conc_enter_innodb,innobase_srv_conc_enter_innodb,ha_innobase::index_read,ha_innobase::index_first,ha_innobase::rnd_next,handler::ha_rnd_next,rr_sequential,search_topics,mysqld_help,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
      1
[umshastr@hod03]~/bugs: ./pmp.sh
     10 __io_getevents_0_4,os_aio_linux_collect,os_aio_linux_handle,fil_aio_wait,io_handler_thread,start_thread,clone
      3 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,srv_worker_thread,start_thread,clone
      1 sigwaitinfo,timer_notify_thread_func,pfs_spawn_thread,start_thread,clone
      1 pthread_join,my_thread_join,mysqld_main,main
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,srv_purge_coordinator_suspend,srv_purge_coordinator_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,buf_resize_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,buf_dump_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,native_cond_wait,my_cond_wait,inline_mysql_cond_wait,Global_THD_manager::wait_till_no_thd,close_connections,signal_hand,pfs_spawn_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,srv_monitor_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,srv_error_monitor_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,pc_sleep_if_needed,buf_flush_page_cleaner_coordinator,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,lock_wait_timeout_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,ib_wqueue_timedwait,fts_optimize_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,dict_stats_thread,start_thread,clone
      1 nanosleep,os_thread_sleep,srv_master_sleep,srv_master_thread,start_thread,clone
      1 nanosleep,os_thread_sleep,srv_conc_enter_innodb_with_atomics,srv_conc_enter_innodb,innobase_srv_conc_enter_innodb,ha_innobase::index_read,ha_innobase::index_first,ha_innobase::rnd_next,handler::ha_rnd_next,rr_sequential,search_topics,mysqld_help,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
      1 nanosleep,os_thread_sleep,srv_conc_enter_innodb_with_atomics,srv_conc_enter_innodb,innobase_srv_conc_enter_innodb,ha_innobase::index_read,ha_innobase::index_first,ha_innobase::rnd_next,handler::ha_rnd_next,rr_sequential,join_init_read_record,sub_select,do_select,JOIN::exec,handle_query,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
      1

Posted by developer:
 
Fixed as of the upcoming 5.7.8, 5.8.0 releases, and here's the changelog entry:

When "innodb_thread_concurrency=1", queries on intrinsic temporary tables
caused other sessions to hang. Queries on intrinsic temporary tables
should not increment the number of active threads. 
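
The effect described in the changelog entry can be illustrated with a toy model of an admission counter. This is not InnoDB's code, and the report does not say exactly how the counter ended up stuck; the sequence below (a slot taken for the temporary table is still held when the same query then reads a regular table) is an assumption made purely for illustration. The names `ConcurrencyGate` and `run_query` are hypothetical.

```python
class ConcurrencyGate:
    """Toy admission counter: at most `limit` counted entries at once."""

    def __init__(self, limit):
        self.limit = limit
        self.active = 0

    def try_enter(self, counted=True):
        """Return True if admitted. Uncounted entries are always admitted."""
        if not counted:
            return True
        if self.active >= self.limit:
            return False  # a real server would sleep and retry here
        self.active += 1
        return True

    def leave(self, counted=True):
        """Release a slot taken by a counted entry."""
        if counted:
            self.active -= 1


def run_query(gate, count_temp_table):
    """Touch a temp table, then a regular table, without leaving in between.

    Returns whether the regular-table access was admitted.
    """
    gate.try_enter(counted=count_temp_table)  # intrinsic temp table access
    return gate.try_enter(counted=True)       # regular table access


# Assumed pre-fix behaviour: the temp table access takes the only slot.
print(run_query(ConcurrencyGate(limit=1), count_temp_table=True))   # False
# Behaviour per the changelog: temp table access is not counted.
print(run_query(ConcurrencyGate(limit=1), count_temp_table=False))  # True
```

With a limit of 1, the counted temp table access leaves no slot for anything else, which in this model corresponds to the endless sleep-and-retry seen in the stacks; with a higher limit the same sequence is admitted.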

Thank you for the bug report.

For clarity, this issue is still present in the RC (which is 5.7.7).