Bug #76346 innodb_thread_concurrency=1 and queries using intrinsic temp tables, causes hang
Submitted: 16 Mar 2015 23:04 Modified: 12 Jun 2015 9:34
Reporter: Roel Van de Paar (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:5.7.6-m16, 5.7.8 OS:Any
Assigned to: Naga Satyanarayana Bodapati CPU Architecture:Any
Tags: query shutdown hang

[16 Mar 2015 23:04] Roel Van de Paar
Description:
Not sure what the cause is, but 5.7.6-m16 hung on shutdown during a QA run. Will upload full stacks (broke in with gdb -pid=x). Bug verification to be done using stacks/code review, as making a testcase for this would not be straightforward. 

How to repeat:
Stacks/code review. Uploading full gdb stacks & error log.
[16 Mar 2015 23:09] Roel Van de Paar
Error log

Attachment: mysqld.err (application/octet-stream, text), 35.10 KiB.

[16 Mar 2015 23:10] Roel Van de Paar
Ah, looks like some sort of long semaphore wait issue (ref error log)

2015-03-16T22:51:57.845878Z 0 [Note] Event Scheduler: Purging the queue. 3 events
2015-03-16T23:02:36.908074Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 140046824883968 has waited at trx0purge.cc line 153 for 324.00 seconds the semaphore:
Mutex at 0x7f5f3f67b7a0 created file trx0rseg.cc line 211, lock var 1
Last time reserved in file /bzr/mysql-5.7.6-m16_dbg/storage/innobase/trx/trx0trx.cc line 1528
[16 Mar 2015 23:10] Roel Van de Paar
(!) Note that this run was only a single threaded run (i.e. a single client/thread) (!)
[16 Mar 2015 23:11] Roel Van de Paar
gdb trace, taken early on in the wait

Attachment: gdb_std.txt (text/plain), 33.73 KiB.

[16 Mar 2015 23:11] Roel Van de Paar
gdb full trace, taken somewhat later

Attachment: gdb_full.txt (text/plain), 81.65 KiB.

[16 Mar 2015 23:12] Roel Van de Paar
gdb trace, taken later again (diffs from previous gdb_std.txt)

Attachment: gdb_std_later.txt (text/plain), 33.43 KiB.

[19 Mar 2015 22:58] Roel Van de Paar
Guys, I think this issue is really serious and should get high prio attention. It's happening all over the 5.7.6-m16 build from what I can see. Only takes 1-3 trials to have it happen.
[19 Mar 2015 23:56] Roel Van de Paar
Stacks from another example

Attachment: gdb_2_std.txt (text/plain), 30.69 KiB.

[19 Mar 2015 23:57] Roel Van de Paar
Full stacks from the other example mentioned in last comment

Attachment: gdb_2_full.txt (text/plain), 31.71 KiB.

[19 Mar 2015 23:59] Roel Van de Paar
Added another example. Error log not showing any long semaphore waits yet (13 min lockup);

------------
2015-03-19T23:44:03.203802Z 0 [Note] Shutting down slave threads
2015-03-19T23:44:05.203885Z 0 [Note] Forcefully disconnecting 2 remaining clients
2015-03-19T23:44:05.203896Z 0 [Warning] /sda/MS-mysql-5.7.6-m16-linux-x86_64-debug/bin/mysqld: Forcing close of thread 6  user: 'root'

2015-03-19T23:44:05.203920Z 0 [Note] Event Scheduler: Killing the scheduler thread, thread id 1
2015-03-19T23:44:05.203927Z 0 [Note] Event Scheduler: Waiting for the scheduler thread to reply
2015-03-19T23:44:05.203989Z 0 [Note] Event Scheduler: Stopped
2015-03-19T23:44:05.203998Z 0 [Note] Event Scheduler: Purging the queue. 0 events
------------ <EOF>

There were other errors in the log, from executing various testcase sql's. Server was started clean/with freshly init data dir (--init)
[20 Mar 2015 0:00] Roel Van de Paar
Again the trial was single threaded. Notice the mutex issues in the gdb stacks.
[20 Mar 2015 8:50] Shane Bester
threads hanging around in ut_delay suggest similar cause to bug #76402
[20 Mar 2015 9:28] Roel Van de Paar
Agreed re: ut_delay. I removed the options discussed there, but the issue is still present, though apparently less regular (TBD). Attaching new (and indeed different looking) stacks from a hang.
[20 Mar 2015 9:29] Roel Van de Paar
gdb_std.txt

Attachment: gdb_std.txt (text/plain), 23.31 KiB.

[20 Mar 2015 9:29] Roel Van de Paar
gdb_std.txt

Attachment: gdb_std.txt (text/plain), 23.31 KiB.

[20 Mar 2015 9:29] Roel Van de Paar
gdb_full.txt

Attachment: gdb_full.txt (text/plain), 99.59 KiB.

[20 Mar 2015 10:16] Roel Van de Paar
The 'long semaphore wait' seems unrelated.
[20 Mar 2015 10:17] Roel Van de Paar
Another set of stacks, similar to last set (after options were removed) gdb_std.txt

Attachment: gdb_std.txt (text/plain), 23.32 KiB.

[20 Mar 2015 10:17] Roel Van de Paar
matching gdb_full.txt

Attachment: gdb_full.txt (text/plain), 99.99 KiB.

[23 Mar 2015 22:08] Roel Van de Paar
First testcase;

DROP DATABASE test;CREATE DATABASE test;USE test;
SET @@session.max_heap_table_size=0;
SET @@global.innodb_thread_concurrency=1;
SELECT name,n_cols,file_format,row_format FROM information_schema.innodb_sys_tables;
select convert_tz(0,'a','a');

Client will hang on last query. Pressing CTRL+C will not help;

mysql> select convert_tz(0,'a','a');
^C^C -- query aborted
^C^C -- query aborted
^C^C -- query aborted
^C^C -- query aborted

Press CTRL+Z to exit and kill the PID.

When this hang is occurring, one can check status with another client:

mysql> show processlist;
+----+------+-----------+------+---------+------+-------------+------------------------------+
| Id | User | Host      | db   | Command | Time | State       | Info                         |
+----+------+-----------+------+---------+------+-------------+------------------------------+
|  2 | root | localhost | test | Query   |  540 | System lock | select convert_tz(0,'a','a') |
| 29 | root | localhost | test | Query   |    0 | starting    | show processlist             |
+----+------+-----------+------+---------+------+-------------+------------------------------+
2 rows in set (0.00 sec)

And, trying to shutdown server in this case will hang also.

Looking at my previous stack in this issue (http://bugs.mysql.com/file.php?id=22521&bug_id=76346) I notice:

#16 0x00000000010a0bd4 in dispatch_command (command=COM_QUERY, thd=0x7f0a9001a000, packet=0x7f0a90512011 "set @@session.time_zone='UTC';", packet_length=30) at /bzr/mysql-5.7.6-m16_dbg/sql/sql_parse.cc:1249

So this would have been another occurrence, likely of the same problem. Issue looks clearly related to timezones, maybe related to nano (see stacks). 

Testcase is 100% reproducible so will not add additional stacks for this testcase, devs can break in with gdb directly.
[23 Mar 2015 22:10] Roel Van de Paar
#16 0x00000000010a0bd4 in dispatch_command (command=COM_QUERY, thd=0x7fd14f41a000, packet=0x7fd14f446011 "SET @@time_zone = '6';", packet_length=22) at /bzr/mysql-5.7.6-m16_dbg/sql/sql_parse.cc:1249

From another previously uploaded stack. Again, timezone related.
[23 Mar 2015 22:18] Roel Van de Paar
Serious regression in 5.7.6-m16. The issue does NOT reproduce in 5.7.5-m15.

5.7.5-m15 output is immediate;

mysql> select convert_tz(0,'a','a');
+-----------------------+
| convert_tz(0,'a','a') |
+-----------------------+
| NULL                  |
+-----------------------+
1 row in set (0.00 sec)
[24 Mar 2015 9:33] Roel Van de Paar
Another testcase

DROP DATABASE test;CREATE DATABASE test;USE test;
SET @@global.innodb_thread_concurrency=1;
SELECT table_catalog, table_schema, table_name, column_name FROM information_schema.columns WHERE table_catalog IS NULL OR table_catalog <> 'def';
HELP '%function_2';
[24 Mar 2015 9:34] Roel Van de Paar
Last testcase seems to indicate it's not TZ related.
[24 Mar 2015 9:40] Shane Bester
what seems important is that some sql queries cause innodb tables in the mysql database to be accessed!
[24 Mar 2015 11:02] Umesh Shastry
Hello Roel,

Thank you for the report and test case.
Observed similar behavior with 5.7.8(debug/release builds).

Thanks,
Umesh
[24 Mar 2015 11:04] Umesh Shastry
// 5.7.8 - pmp 

[umshastr@hod03]~/bugs: ./pmp.sh
     10 __io_getevents_0_4,os_aio_linux_collect,os_aio_linux_handle,fil_aio_wait,io_handler_thread,start_thread,clone
      3 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,srv_worker_thread,start_thread,clone
      1 sigwait,signal_hand,pfs_spawn_thread,start_thread,clone
      1 sigwaitinfo,timer_notify_thread_func,pfs_spawn_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,srv_purge_coordinator_suspend,srv_purge_coordinator_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,buf_resize_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,buf_dump_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,native_cond_wait,my_cond_wait,inline_mysql_cond_wait,compress_gtid_table,pfs_spawn_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,srv_monitor_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,srv_error_monitor_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,pc_sleep_if_needed,buf_flush_page_cleaner_coordinator,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,lock_wait_timeout_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,ib_wqueue_timedwait,fts_optimize_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,dict_stats_thread,start_thread,clone
      1 poll,vio_io_wait,vio_socket_io_wait,vio_ssl_read,net_read_raw_loop,net_read_packet_header,net_read_packet,my_net_read,Protocol_classic::read_packet,Protocol_classic::get_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
      1 poll,Mysqld_socket_listener::listen_for_connection_event,Connection_acceptor<Mysqld_socket_listener>::connection_event_loop,mysqld_main,main
      1 nanosleep,os_thread_sleep,srv_master_sleep,srv_master_thread,start_thread,clone
      1 nanosleep,os_thread_sleep,srv_conc_enter_innodb_with_atomics,srv_conc_enter_innodb,innobase_srv_conc_enter_innodb,ha_innobase::index_read,ha_innobase::index_first,ha_innobase::rnd_next,handler::ha_rnd_next,rr_sequential,search_topics,mysqld_help,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
      1
[umshastr@hod03]~/bugs: ./pmp.sh
     10 __io_getevents_0_4,os_aio_linux_collect,os_aio_linux_handle,fil_aio_wait,io_handler_thread,start_thread,clone
      3 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,srv_worker_thread,start_thread,clone
      1 sigwaitinfo,timer_notify_thread_func,pfs_spawn_thread,start_thread,clone
      1 pthread_join,my_thread_join,mysqld_main,main
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,srv_purge_coordinator_suspend,srv_purge_coordinator_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,buf_resize_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event::wait,os_event::wait_low,os_event_wait_low,buf_dump_thread,start_thread,clone
      1 pthread_cond_wait@@GLIBC_2.3.2,native_cond_wait,my_cond_wait,inline_mysql_cond_wait,Global_THD_manager::wait_till_no_thd,close_connections,signal_hand,pfs_spawn_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,srv_monitor_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,srv_error_monitor_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,pc_sleep_if_needed,buf_flush_page_cleaner_coordinator,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,lock_wait_timeout_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,ib_wqueue_timedwait,fts_optimize_thread,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,os_event::timed_wait,os_event::wait_time_low,os_event_wait_time_low,dict_stats_thread,start_thread,clone
      1 nanosleep,os_thread_sleep,srv_master_sleep,srv_master_thread,start_thread,clone
      1 nanosleep,os_thread_sleep,srv_conc_enter_innodb_with_atomics,srv_conc_enter_innodb,innobase_srv_conc_enter_innodb,ha_innobase::index_read,ha_innobase::index_first,ha_innobase::rnd_next,handler::ha_rnd_next,rr_sequential,search_topics,mysqld_help,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
      1 nanosleep,os_thread_sleep,srv_conc_enter_innodb_with_atomics,srv_conc_enter_innodb,innobase_srv_conc_enter_innodb,ha_innobase::index_read,ha_innobase::index_first,ha_innobase::rnd_next,handler::ha_rnd_next,rr_sequential,join_init_read_record,sub_select,do_select,JOIN::exec,handle_query,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
      1
[8 Apr 2015 19:02] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 5.7.8, 5.8.0 releases, and here's the changelog entry:

When "innodb_thread_concurrency=1", queries on intrinsic temporary tables
caused other sessions to hang. Queries on intrinsic temporary tables
should not increment the number of active threads. 

Thank you for the bug report.
[12 Jun 2015 9:34] Roel Van de Paar
For clarity, this issue is still present in the RC (which is 5.7.7)