MySQL Bugs: #91078: Queries stuck in 'query end' forever

Bug #91078	Queries stuck in 'query end' forever
Submitted:	30 May 2018 13:37	Modified:	20 Nov 2023 7:39
Reporter:	Fernando Ipar (OCA)	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	5.7.21	OS:	Red Hat (RHEL 6.7)
Assigned to:		CPU Architecture:	x86

Description:
Sometimes connections get stuck in the 'query end' state and the only way to remove them is to restart the server (kill won't work). 

The queries seen have been simple joins and once a simple single-table primary key look-up select. 
The tables are InnoDB. 

Here's how the queries are seen in 'show processlist': 

     Id: 193287092
   User: <redacted>
   Host: <redacted>
     db: <redacted>
Command: Query
   Time: 1267816
  State: query end
   Info: SELECT * from <table> where <table.pk> = <value>

There is no corresponding transaction shown as ACTIVE in 'show engine innodb status' (to explain, for the example shown above, only one transaction was seen as ACTIVE, inserting for 1 second). 

Here are two stack traces corresponding to connections seen in this state, taken before restarting mysqld: 

Thread 64 (Thread 0x7f2ca353b700 (LWP 13021)):
#0  0x0000003e0600f00d in nanosleep () from /lib64/libpthread.so.0
#1  0x00000000010c3030 in os_thread_sleep (tm=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/storage/innobase/os/os0thread.cc:279
#2  0x000000000105939b in wait (trx=0x7f387595c040, disable=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/storage/innobase/include/trx0trx.h:1557
#3  TrxInInnoDB::enter (trx=0x7f387595c040, disable=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/storage/innobase/include/trx0trx.h:1477
#4  0x00000000010479c1 in TrxInInnoDB (this=0x7f284088d858, thd=0x7f28bc7421e0) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/storage/innobase/include/trx0trx.h:1362
#5  ha_innobase::update_thd (this=0x7f284088d858, thd=0x7f28bc7421e0) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/storage/innobase/handler/ha_innodb.cc:2836
#6  0x0000000001049ada in ha_innobase::external_lock (this=0x7f284088d858, thd=0x7f28bc7421e0, lock_type=2) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/storage/innobase/handler/ha_innodb.cc:15464
#7  0x000000000081802a in handler::ha_external_lock (this=0x7f284088d858, thd=0x7f28bc7421e0, lock_type=2) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/handler.cc:7908
#8  0x0000000000e5f65f in QUICK_RANGE_SELECT::~QUICK_RANGE_SELECT (this=0x7f28bc3aa280, __in_chrg=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/opt_range.cc:1587
#9  0x0000000000e5fe01 in QUICK_RANGE_SELECT::~QUICK_RANGE_SELECT (this=0x7f28bc3aa280, __in_chrg=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/opt_range.cc:1597
#10 0x0000000000e5c655 in delete_elements (this=0x7f2840a40f70, __in_chrg=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/sql_list.h:565
#11 QUICK_ROR_INTERSECT_SELECT::~QUICK_ROR_INTERSECT_SELECT (this=0x7f2840a40f70, __in_chrg=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/opt_range.cc:1917
#12 0x0000000000e5c6f1 in QUICK_ROR_INTERSECT_SELECT::~QUICK_ROR_INTERSECT_SELECT (this=0x7f2840a40f70, __in_chrg=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/opt_range.cc:1923
#13 0x0000000000d528c4 in QEP_TAB::cleanup (this=0x7f284088cab8) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/sql_select.cc:2394
#14 0x0000000000d52a29 in JOIN::destroy (this=0x7f2860a2fa70) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/sql_select.cc:938
#15 0x0000000000d963a8 in st_select_lex::cleanup (this=0x7f28bc4f1310, full=true) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/sql_union.cc:1078
#16 0x0000000000d962db in st_select_lex_unit::cleanup (this=0x7f28bc4f15f0, full=true) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/sql_union.cc:923
#17 0x0000000000d11057 in mysql_execute_command (thd=0x7f28bc7421e0, first_level=true) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/sql_parse.cc:5015
#18 0x0000000000d1612d in mysql_parse (thd=0x7f28bc7421e0, parser_state=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/sql_parse.cc:5582
#19 0x0000000000d1790e in dispatch_command (thd=0x7f28bc7421e0, com_data=0x7f2ca353ad60, command=COM_QUERY) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/sql_parse.cc:1458
#20 0x0000000000d18264 in do_command (thd=0x7f28bc7421e0) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/sql_parse.cc:999
#21 0x0000000000dec77c in handle_connection (arg=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/conn_handler/connection_handler_per_thread.cc:300
#22 0x0000000000fc9354 in pfs_spawn_thread (arg=0x4fb2e1d0) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/storage/perfschema/pfs.cc:2190
#23 0x0000003e06007aa1 in start_thread () from /lib64/libpthread.so.0
#24 0x0000003e05ce893d in clone () from /lib64/libc.so.6

Thread 65 (Thread 0x7f2c5f3af700 (LWP 8118)):
#0  0x0000003e0600f00d in nanosleep () from /lib64/libpthread.so.0
#1  0x00000000010c3030 in os_thread_sleep (tm=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/storage/innobase/os/os0thread.cc:279
#2  0x000000000105939b in wait (trx=0x7f387595b590, disable=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/storage/innobase/include/trx0trx.h:1557
#3  TrxInInnoDB::enter (trx=0x7f387595b590, disable=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/storage/innobase/include/trx0trx.h:1477
#4  0x000000000104625e in TrxInInnoDB (hton=0x36edeb0, thd=0x7f28be5ad600, commit_trx=false) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/storage/innobase/include/trx0trx.h:1362
#5  innobase_commit (hton=0x36edeb0, thd=0x7f28be5ad600, commit_trx=false) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/storage/innobase/handler/ha_innodb.cc:4311
#6  0x0000000000820a67 in ha_commit_low (thd=0x7f28be5ad600, all=false, run_after_commit=true) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/handler.cc:1900
#7  0x0000000000ef2f5e in MYSQL_BIN_LOG::commit (this=0x1e84160, thd=0x7f28be5ad600, all=false) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/binlog.cc:8314
#8  0x0000000000822414 in ha_commit_trans (thd=0x7f28be5ad600, all=false, ignore_global_read_lock=false) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/handler.cc:1796
#9  0x0000000000dcfb52 in trans_commit_stmt (thd=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/transaction.cc:458
#10 0x0000000000d112ec in mysql_execute_command (thd=0x7f28be5ad600, first_level=true) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/sql_parse.cc:5004
#11 0x0000000000d1612d in mysql_parse (thd=0x7f28be5ad600, parser_state=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/sql_parse.cc:5582
#12 0x0000000000d1790e in dispatch_command (thd=0x7f28be5ad600, com_data=0x7f2c5f3aed60, command=COM_QUERY) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/sql_parse.cc:1458
#13 0x0000000000d18264 in do_command (thd=0x7f28be5ad600) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/sql_parse.cc:999
#14 0x0000000000dec77c in handle_connection (arg=<value optimized out>) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/sql/conn_handler/connection_handler_per_thread.cc:300
#15 0x0000000000fc9354 in pfs_spawn_thread (arg=0x4fd7fb10) at /rpm/BUILD/mysql-5.7.21/mysql-5.7.21/storage/perfschema/pfs.cc:2190
#16 0x0000003e06007aa1 in start_thread () from /lib64/libpthread.so.0
#17 0x0000003e05ce893d in clone () from /lib64/libc.so.6

- Group Replication is not in use
- XA transactions are not used
- sync_binlog is 0 
- innodb_support_xa is 1

How to repeat:
I have been unable to reproduce in a non-production environment, but mysqld had to be restarted at least twice in the last few weeks because of this, as while otherwise harmless, connections stuck in that state hold the MDL on the used tables, so it does not allow schema changes to be made. 

Suggested fix:
Don't know. As a workaround, we may try 5.7.10 as that seems to be the last version where the blocking code is not used. Does that seem correct?

Hi,

Thank you for your report. This is an obvious case of typical query cache behaviour. Turn it off and your problems will go away.

Hello, 

Thank you for your response. 

The server is started with query_cache_type set to 0 which according to the manual should avoid acquiring the query cache mutex (https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_query_cache_ty...). 

Do you think we should also set query_cache_size explicitly to 0 here to help with this problem?

Thanks.

Hi,

Yes, that would be wise.

If that does not help, turn your attention to other settings. Next, fragmented OS memory can cause the problem, as well as fragmented disk partition. If you have binary log and other logs, that could be also a problem. With high number of concurrent transactions, log writing can become a bottleneck. This can all be tuned , as described in our manual.

Ok, thanks for the additional info. 

If you agree, I would suggest to at least link 75161 to this one. I understand that if you believe this is the result of a misconfiguration it is ok to resolve this specific case as 'Not a bug', but the fact that a connection can get stuck in an "unkillable" state while holding  the MDL on a table, even when the corresponding client connection is long gone is something that should be considered a bug. 

Thanks.

Hi,

We will consider your suggestion  regarding duplicate reports.

However, when it comes to queries being in status where they are not killable, that is documented in our manual. Simply, this is not like killing a process in Unix. Query can be killed only when it is in such a state that all memory is released, files closed, etc, etc ......  Hence, query can be quite long in a state where it can not be killed.

So the actual problem is: "Query end" status has too many different meanings / potential causes, making it the equivalent of the famous Dennis Ritchie Question Mark?

I think in the case of this specific report the problem is that the threads seem to be waiting on a function that may never return the required, and that perhaps they should not have been waiting in the first place (is_forced_rollback, https://github.com/mysql/mysql-server/blob/mysql-5.7.21/storage/innobase/include/trx0trx.h...).

In my last comment, please s/return the required/return the required result/

Actually, no.

For query to be in that state , all the results were already sent. That is a stage for the cleaning up, including logging , clearing entries in the query cache, etc , etc ....... Yes, many different tasks.

Yes, I understand that the query or queries in the connection had completed as there is none seen in processlist, nor is there an active transaction in InnoDB status. What I meant with 'may never return the required result' is the is_forced_rollback function, which is what the thread was waiting on in a loop.

Sinisa: and the only description of that state in the manual is 

"This state occurs after processing a query but before the freeing items state."

So if a query gets stuck in that state it's really just "something's wrong", and the fact that you at first claimed

"This is an obvious case of typical query cache behaviour. Turn it off and your problems will go away."

and the rowed back to a list of "or this or that or that" clearly shows ... ;)

Hartmut, my dear friend,

Yes, you are correct, but query cache is deprecated already, so we do not need to discuss it.

Hi Fernando,

You are correct that this is not a query cache bug. However, the two threads that you have presented are not in conflict, since those are both waiting.

You should inspect your settings and especially pay attention whether you have utilised FTWRL or some other external lock.

In short, we do not have enough info in order to verify this bug. A repeatable test case, involving in your case multi-thread operation, would be enough.

Hello Sinisia, 

Thanks for confirming it's not query cache related. 

I have been unable to reproduce, will update the bug report if I can do that, or if this happens again and I can produce different traces or any other additional debug from that.

I can't reliably reproduce this either, but I can confirm it's still a bug in 5.7.21
Here's some data that might help:

mysql> select * from information_schema.processlist where id=288978020\G
*************************** 1. row ***************************
           ID: 288978020
         USER: [redacted]
         HOST: [redacted]:46303
           DB: [redacted]
      COMMAND: Killed
         TIME: 1083534
        STATE: query end
         INFO: SELECT  * FROM [table] WHERE  [index_col] = [scalar]
      TIME_MS: 1083533853
    ROWS_SENT: 1
ROWS_EXAMINED: 1
1 row in set (0.02 sec)

mysql> select * from performance_schema.threads where processlist_id=288978020\G
*************************** 1. row ***************************
          THREAD_ID: 288978056
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 288978020
   PROCESSLIST_USER: [redacted]
   PROCESSLIST_HOST: [redacted]
     PROCESSLIST_DB: [redacted]
PROCESSLIST_COMMAND: Query
   PROCESSLIST_TIME: 1083545
  PROCESSLIST_STATE: query end
   PROCESSLIST_INFO: SELECT  * FROM [table] WHERE  [index_col] = [int_scalar]
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: TCP/IP
       THREAD_OS_ID: 56599
1 row in set (0.01 sec)

-- structure of the table being selected from:
CREATE TABLE `table` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `index_col` bigint(20) NOT NULL,
  `other_col` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx` (`index_col`),
  CONSTRAINT `fk` FOREIGN KEY (`index_col`) REFERENCES `table2` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=333789 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

#0  0x00002ac852fa100d in nanosleep () from /lib64/libpthread.so.0
#1  0x0000000001045090 in os_thread_sleep (tm=<value optimized out>)
    at /[buildpath]/storage/innobase/os/os0thread.cc:303
#2  0x0000000000fd1282 in wait (trx=0x2ac87ea358e0, disable=<value optimized out>)
    at /[buildpath]/storage/innobase/include/trx0trx.h:1594
#3  TrxInInnoDB::enter (trx=0x2ac87ea358e0, disable=<value optimized out>)
    at /[buildpath]/storage/innobase/include/trx0trx.h:1514
#4  0x0000000000fc19cc in TrxInInnoDB (hton=0x2ac8567ab1e0, thd=0x2ae398ece100, commit_trx=false)
    at /[buildpath]/storage/innobase/include/trx0trx.h:1399
#5  innobase_commit (hton=0x2ac8567ab1e0, thd=0x2ae398ece100, commit_trx=false)
    at /[buildpath]/storage/innobase/handler/ha_innodb.cc:4936
#6  0x0000000000808b57 in ha_commit_low (thd=0x2ae398ece100, all=false, run_after_commit=true)
    at /[buildpath]/sql/handler.cc:1933
#7  0x0000000000ec12a9 in MYSQL_BIN_LOG::commit (this=0x1d7a8a0, thd=0x2ae398ece100, all=false)
    at /[buildpath]/sql/binlog.cc:8947
#8  0x000000000080aa64 in ha_commit_trans (thd=0x2ae398ece100, all=false, ignore_global_read_lock=false)
    at /[buildpath]/sql/handler.cc:1829
#9  0x0000000000da9ff2 in trans_commit_stmt (thd=<value optimized out>) at /[buildpath]/sql/transaction.cc:458
#10 0x0000000000ce665f in mysql_execute_command (thd=0x2ae398ece100, first_level=true)
    at /[buildpath]/sql/sql_parse.cc:5293
#11 0x0000000000cebb0d in mysql_parse (thd=0x2ae398ece100, parser_state=<value optimized out>)
    at /[buildpath]/sql/sql_parse.cc:5901
#12 0x0000000000cecd6c in dispatch_command (thd=0x2ae398ece100, com_data=0x2ae8b063dc30, command=COM_QUERY)
    at /[buildpath]/sql/sql_parse.cc:1490
#13 0x0000000000cedc64 in do_command (thd=0x2ae398ece100) at /[buildpath]/sql/sql_parse.cc:1021
#14 0x0000000000dc911c in handle_connection (arg=<value optimized out>)
    at /[buildpath]/sql/conn_handler/connection_handler_per_thread.cc:312
#15 0x0000000000f34ef1 in pfs_spawn_thread (arg=0x2ae1667f6e20) at /[buildpath]/storage/perfschema/pfs.cc:2190
#16 0x00002ac852f99aa1 in start_thread () from /lib64/libpthread.so.0
#17 0x00002ac854cf1bcd in clone () from /lib64/libc.so.6

And query cache is disabled, though that was ruled out above.

Hi,

From what the stacktrace shows, it does seem that some external lock is applied in your application(s).

One transaction is going for the rollback and it is waiting on the other transaction, which is doing the same. However, there is an external lock on the second transaction, which leads entire scenario to the wait.

You should make sure that there is no FTWRL or LOCK TABLES issued and you should have  no problems.

Also, try 5.7.25.

there was no ftwrl or lock tables (nor access to do so from clients, and no admin tools that were running execute such) - wish it were that simple.
hoping it's fixed by .25 - you have a release note item to reference which might be it?
we had to skip .22-.24 due to that regression re crashes on binlog activity and the rollout of .25 is being done cautiously to confirm that fix, so the vast majority of servers are still running .21, I'd like to assume it will not recur past that but really need some reference to where it was fixed would give confidence.

Hi,

We can not confirm nor deny whether this behaviour is fixed or not.

For the starters, we do not know whether what you report is bug or not ....

In order to analyse it at all, we need a fully repeatable test case. We can not proceed any step further without it.

That means, the entire scenario. Settings, tables, their contents, the concurrent statements that were run, etc ...

We encountered the same issue in mysql 5.7.20 recently,

pstack and pt-pmp

     95 pthread_cond_wait,Per_thread_connection_handler::block_until_new_connection,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
     10 pthread_cond_wait,os_event_wait_low,os_aio_handler,fil_aio_wait,io_handler_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      8 pthread_cond_wait,pop_jobs_item,slave_worker_exec_job_group,handle_slave_worker,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 pthread_cond_timedwait,MDL_wait::timed_wait,TABLE_SHARE::wait_for_old_version,close_cached_tables,reload_acl_and_cache,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      3 poll(libc.so.6),vio_io_wait,vio_socket_io_wait,vio_read,net_read_raw_loop,net_read_packet,my_net_read,Protocol_classic::read_packet,Protocol_classic::get_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 sigwait(libpthread.so.0),signal_hand,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 sigwaitinfo(libc.so.6),timer_notify_thread_func,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,os_event_wait_low,srv_purge_coordinator_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,os_event_wait_low,buf_resize_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,os_event_wait_low,buf_dump_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,Event_queue::cond_wait,Event_queue::get_top_for_execution_if_time,Event_scheduler::run,event_scheduler_thread,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_wait,compress_gtid_table,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,srv_monitor_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,srv_error_monitor_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,lock_wait_timeout_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,ib_wqueue_timedwait,fts_optimize_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,dict_stats_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,buf_flush_page_cleaner_coordinator,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,MYSQL_BIN_LOG::wait_for_update_relay_log,handle_slave_sql,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 pthread_cond_timedwait,MYSQL_BIN_LOG::wait_for_update_bin_log,Binlog_sender::send_binlog,Binlog_sender::run,mysql_binlog_send,com_binlog_dump_gtid,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 poll(libc.so.6),vio_io_wait,vio_socket_io_wait,vio_read,vio_read_buff,net_read_raw_loop,net_read_packet,my_net_read,cli_safe_read_with_ok,handle_slave_io,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 poll(libc.so.6),Mysqld_socket_listener::listen_for_connection_event,mysqld_main,__libc_start_main(libc.so.6),_start
      1 poll(libc.so.6),Ack_receiver::run,ack_receive_handler,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 nanosleep(libpthread.so.0),os_thread_sleep,srv_master_thread,start_thread(libpthread.so.0),clone(libc.so.6)
      1 nanosleep(libpthread.so.0),os_thread_sleep,ha_innobase::store_lock,mysql_lock_tables,lock_tables,handle_query,execute_sqlcom_select,mysql_execute_command,Prepared_statement::execute,Prepared_statement::execute_loop,mysqld_stmt_execute,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)

thead 11807874 get stuck in InnoDB, sleep forever, and can not be killed, the full stack is

(gdb) bt
#0  0x00002ad51d4af00d in nanosleep () from /lib64/libpthread.so.0
#1  0x0000000000f5acc0 in os_thread_sleep (tm=Unhandled dwarf expression opcode 0xf3) at storage/innobase/os/os0thread.cc:279
#2  0x0000000000edfd8c in wait (this=0x2ad674ec6830, thd=0x2ad6d9112000, to=Unhandled dwarf expression opcode 0xf3) at storage/innobase/include/trx0trx.h:1557
#3  enter (this=0x2ad674ec6830, thd=0x2ad6d9112000, to=Unhandled dwarf expression opcode 0xf3) at storage/innobase/include/trx0trx.h:1477
#4  TrxInInnoDB (this=0x2ad674ec6830, thd=0x2ad6d9112000, to=Unhandled dwarf expression opcode 0xf3) at storage/innobase/include/trx0trx.h:1362
#5  ha_innobase::store_lock (this=0x2ad674ec6830, thd=0x2ad6d9112000, to=Unhandled dwarf expression opcode 0xf3) at storage/innobase/handler/ha_innodb.cc:16321
#6  0x0000000000da815d in get_lock_data (thd=Unhandled dwarf expression opcode 0xf3) at sql/lock.cc:743
#7  mysql_lock_tables (thd=Unhandled dwarf expression opcode 0xf3) at sql/lock.cc:320
#8  0x0000000000c4434d in lock_tables (thd=0x2ad6d9112000, tables=0x2ad6f46fad78, count=Unhandled dwarf expression opcode 0xf3) at sql/sql_base.cc:6632
#9  0x0000000000cd4ef9 in handle_query (thd=0x2ad6d9112000, lex=0x2ad707074030, result=0x2ad6f46fb9b0, added_options=Unhandled dwarf expression opcode 0xf3) at sql/sql_select.cc:151
#10 0x000000000076374f in execute_sqlcom_select (thd=0x2ad6d9112000, all_tables=Unhandled dwarf expression opcode 0xf3) at sql/sql_parse.cc:5395
#11 0x0000000000c99b8d in mysql_execute_command (thd=Unhandled dwarf expression opcode 0xf3) at sql/sql_parse.cc:2919
#12 0x0000000000cc4309 in Prepared_statement::execute (this=0x2ad7072d9380, expanded_query=Unhandled dwarf expression opcode 0xf3) at sql/sql_prepare.cc:3962
#13 0x0000000000cc718a in Prepared_statement::execute_loop (this=0x2ad7072d9380, expanded_query=0x2ad6d8f09cd0, open_cursor=Unhandled dwarf expression opcode 0xf3) at sql/sql_prepare.cc:3569
#14 0x0000000000cc7464 in mysqld_stmt_execute (thd=0x2ad6d9112000, stmt_id=Unhandled dwarf expression opcode 0xf3) at sql/sql_prepare.cc:2573
#15 0x0000000000c9e7e2 in dispatch_command (thd=0x2ad6d9112000, com_data=0x2ad6d8f0a590, command=COM_STMT_EXECUTE) at sql/sql_parse.cc:1506
#16 0x0000000000c9f1e7 in do_command (thd=0x2ad6d9112000) at sql/sql_parse.cc:1052
#17 0x0000000000d5ca78 in handle_connection (arg=Unhandled dwarf expression opcode 0xf3
) at sql/conn_handler/connection_handler_per_thread.cc:301
#18 0x00000000011ad9b1 in pfs_spawn_thread (arg=0x2ad51f6c8c20) at storage/perfschema/pfs.cc:2190
#19 0x00002ad51d4a7aa1 in start_thread () from /lib64/libpthread.so.0

TrxInInnoDB::is_forced_rollback is always true
```
(gdb) p trx->in_innodb
$1 = 4294967295
(gdb) p/t trx->in_innodb
$2 = 11111111111111111111111111111111
```

IMHO, this is a bug somehow.

mysql> show processlist;
+----------+-----------------+----------------------+-------+------------------+---------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| Id       | User            | Host                 | db    | Command          | Time    | State                                                         | Info                                                                                                 |
+----------+-----------------+----------------------+-------+------------------+---------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
|       59 | replicator      | 11.195.191.218:53448 | NULL  | Binlog Dump GTID | 8090020 | Master has sent all binlog to slave; waiting for more updates | NULL                                                                                                 |
|       63 | event_scheduler | localhost            | NULL  | Daemon           | 8090020 | Waiting on empty queue                                        | NULL                                                                                                 |
|       66 | system user     |                      | NULL  | Connect          | 8090020 | Waiting for master to send event                              | NULL                                                                                                 |
|       67 | system user     |                      | NULL  | Connect          |    7075 | Slave has read all relay log; waiting for more updates        | NULL                                                                                                 |
|       68 | system user     |                      | NULL  | Connect          | 8090020 | Waiting for an event from Coordinator                         | NULL                                                                                                 |
|       69 | system user     |                      | NULL  | Connect          | 8090020 | Waiting for an event from Coordinator                         | NULL                                                                                                 |
|       70 | system user     |                      | NULL  | Connect          | 8090020 | Waiting for an event from Coordinator                         | NULL                                                                                                 |
|       71 | system user     |                      | NULL  | Connect          | 8090020 | Waiting for an event from Coordinator                         | NULL                                                                                                 |
|       72 | system user     |                      | NULL  | Connect          | 8090020 | Waiting for an event from Coordinator                         | NULL                                                                                                 |
|       73 | system user     |                      | NULL  | Connect          | 8090020 | Waiting for an event from Coordinator                         | NULL                                                                                                 |
|       74 | system user     |                      | NULL  | Connect          | 8090020 | Waiting for an event from Coordinator                         | NULL                                                                                                 |
|       75 | system user     |                      | NULL  | Connect          | 8090020 | Waiting for an event from Coordinator                         | NULL                                                                                                 |
|   275586 | root          | 11.192.196.223:41278 | NULL  | Sleep            |   53129 |                                                               | NULL                                                                                                 |
|   275621 | root     | 127.0.0.1:53608      | NULL  | Sleep            |      36 |                                                               | NULL                                                                                                 |
|   275697 | root     | 127.0.0.1:57896      | NULL  | Query            |   52815 | Waiting for table flush                                       | FLUSH NO_WRITE_TO_BINLOG TABLES                                                                      |
|   285038 | root     | 127.0.0.1:48462      | NULL  | Query            |   20718 | Waiting for table flush                                       | FLUSH NO_WRITE_TO_BINLOG TABLES                                                                      |
|   288950 | root     | 127.0.0.1:54915      | NULL  | Query            |    7293 | Waiting for table flush                                       | FLUSH NO_WRITE_TO_BINLOG TABLES                                                                      |
|   291039 | root     | 127.0.0.1:56281      | NULL  | Query            |       0 | starting                                                      | show processlist                                                                                     |
| 11807874 | w_user     | 192.168.10.31:39458  | weike | Killed           | 4871128 | init                                                          | SELECT name, round(SUM(num),2) as num FROM `weike_lock_log` WHERE  (  uid = 7340 AND status = 1 ) GR |
| 13660561 | root     | 127.0.0.1:54825      | NULL  | Sleep            |       6 |                                                               | NULL                                                                                                 |
+----------+-----------------+----------------------+-------+------------------+---------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+

this is the result of show processlist, the culprit thread is 11807874.

Thank you for the info.

As I wrote before, we do need a fully repeatable test case.

I think i have a workaround. I have explicity set in my.cnf:

query_cache_type=0
query_cache_size=0

and the problem did not arise in the last 14 days. So I think the problem is in query cache, and query cache is not turned off completely in the default MySQL 5.7. setting.

My version is 5.7.26

Hi,

As I wrote above, query cache is deprecated. Deprecation process involves three steps, of which the last one is total removal, which occurred in 8.0.

Anyway, it is at least five years since everybody knows that this implementation of the query cache is a huge bottleneck.

Sadly, I have to take my statement back. Adjusting query cache settings DID NOT solve the problem. It happened again, after two weeks of uptime :(

Set query_cache_limit to 256 or 128 K.

I also set query_cache_type=0 & query_cache_size=0 in my.cnf.
My version is 5.7.24.
This problem has appeared twice now.
I don't know how to solve it.

does anybody who encounters this bug have stored routines, stored functions, triggers that are frequently used?

It's a bug as in debug builds server will assert if in_innodb and in_depth will go negative.  Release builds don't have that assert so will end up hanging.

Yes, we are using about 200 triggers, 1 stored procedure and 1 task in event scheduler "ON SCHEDULE EVERY 2 SECOND".

We are having the same problem.
Query cache is not relevant :

mysql> SHOW VARIABLES LIKE '%query_cache%';
+------------------------------+---------+
| Variable_name                | Value   |
+------------------------------+---------+
| have_query_cache             | YES     |
| query_cache_limit            | 1048576 |
| query_cache_min_res_unit     | 4096    |
| query_cache_size             | 0       |
| query_cache_type             | OFF     |
| query_cache_wlock_invalidate | OFF     |
+------------------------------+---------+
6 rows in set (0.00 sec)

mysql> SELECT VERSION();
+------------+
| VERSION()  |
+------------+
| 5.7.28-log |
+------------+

Small query is stuck for many hours for no apprent reason.
DB is doing nothing....
tried killing it and it is simply stuck on Command=Killed, Time=25680, State=query end.

No transtactions are involved

------------
TRANSACTIONS
------------
Trx id counter 184986441618
Purge done for trx's n:o < 184986440552 undo n:o < 0 state: running but idle
History list length 31
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 422080922445648, not started estimating records in index range
mysql tables in use 5, locked 0
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 422080922446560, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 422080922448384, not started
0 lock struct(s), heap size 1136, 0 row lock(s)

Hi,

This is a real bug,  and there is one fix in 8.0.22 and 5.7.32 that *may* solve it.  Please wait for that release,  and search the change history notes for Bug #30594501 .  It is related to previously fixed Bug #27325898 .

regards,
sbester

see also:
https://bugs.mysql.com/bug.php?id=110652

HI,

All we can do is to recommend the upgrade.

Query cache is not maintained any more. 

It is far better to rely on InnoDB's AHI (Adaptive Hash Index) then on the query cache, which is obsolete now.

What I find puzzling here is that the symptom of in_innodb==-1 looks like the known bug in innobase_commit_by_xid and innobase_rollback_by_xid, but the original report claims that XA wasn't used, so I am not sure what could cause these functions to be called.

Yes, none of the stacktraces show either  innobase_commit_by_xid and neither innobase_rollback_by_xid.

Hence, this may not be the same bug.

Let us not forget that this is 5.7.21, which is now deprecated.