| Bug #97523 | Infinite loop on query end re TrxInInnoDB::wait() | ||
|---|---|---|---|
| Submitted: | 6 Nov 2019 21:02 | Modified: | 7 Dec 2019 16:55 | 
| Reporter: | Trey Raymond | Email Updates: | |
| Status: | No Feedback | Impact on me: | |
| Category: | MySQL Server | Severity: | S3 (Non-critical) | 
| Version: | 5.7.26 | OS: | Any | 
| Assigned to: | CPU Architecture: | Any | |
| Tags: | hang, loop, mutex | ||
   [6 Nov 2019 21:02]
   Trey Raymond        
  Thread 4 (Thread 0x7fde860ff700 (LWP 212096)):
#0  0x00007ffb6799200d in nanosleep () from /lib64/libpthread.so.0
#1  0x00000000010a33f0 in os_thread_sleep (tm=<value optimized out>)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/storage/innobase/os/os0thread.cc:303
#2  0x00000000010254d2 in wait (trx=0x7ffa66801a90, disable=<value optimized out>)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/storage/innobase/include/trx0trx.h:1815
#3  TrxInInnoDB::enter (trx=0x7ffa66801a90, disable=<value optimized out>)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/storage/innobase/include/trx0trx.h:1735
#4  0x000000000101c25a in TrxInInnoDB (this=0x7f85a1cf2230, thd=0x7f85ba070e00, lock_type=2)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/storage/innobase/include/trx0trx.h:1620
#5  update_thd (this=0x7f85a1cf2230, thd=0x7f85ba070e00, lock_type=2)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/storage/innobase/handler/ha_innodb.cc:3361
#6  ha_innobase::external_lock (this=0x7f85a1cf2230, thd=0x7f85ba070e00, lock_type=2)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/storage/innobase/handler/ha_innodb.cc:17159
#7  0x0000000000806b5a in handler::ha_external_lock (this=0x7f85a1cf2230, thd=0x7f85ba070e00, lock_type=2)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/handler.cc:8444
#8  0x0000000000e1c415 in unlock_external (thd=0x7f85ba070e00, sql_lock=0x7f8585786d90)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/lock.cc:667
#9  mysql_unlock_tables (thd=0x7f85ba070e00, sql_lock=0x7f8585786d90) at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/lock.cc:416
#10 0x0000000000c9408f in close_thread_tables (thd=0x7f85ba070e00) at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/sql_base.cc:1732
#11 0x0000000000ceb428 in mysql_execute_command (thd=0x7f85ba070e00, first_level=false)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/sql_parse.cc:5326
#12 0x0000000000c6a53d in sp_instr_stmt::exec_core (this=<value optimized out>, thd=<value optimized out>, nextp=0x7fde860fd228)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/sp_instr.cc:1057
#13 0x0000000000c6aa6c in sp_lex_instr::reset_lex_and_exec_core (this=0x7f85852fcfd8, thd=0x7f85ba070e00, nextp=0x7fde860fd228, open_tables=false)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/sp_instr.cc:446
#14 0x0000000000c6c544 in sp_lex_instr::validate_lex_and_execute_core (this=0x7f85852fcfd8, thd=0x7f85ba070e00, nextp=0x7fde860fd228, open_tables=false)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/sp_instr.cc:748
#15 0x0000000000c6d2c8 in sp_instr_stmt::execute (this=0x7f85852fcfd8, thd=0x7f85ba070e00, nextp=0x7fde860fd228)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/sp_instr.cc:944
#16 0x0000000000c65925 in sp_head::execute (this=0x7f8583cfb030, thd=0x7f85ba070e00, merge_da_on_success=true)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/sp_head.cc:790
#17 0x0000000000c664fb in sp_head::execute_procedure (this=0x7f8583cfb030, thd=<value optimized out>, args=0x7f85ba0735f8)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/sp_head.cc:1526
---Type <return> to continue, or q <return> to quit---
#18 0x0000000000ceccd0 in mysql_execute_command (thd=0x7f85ba070e00, first_level=true)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/sql_parse.cc:4835
#19 0x0000000000cf074d in mysql_parse (thd=0x7f85ba070e00, parser_state=<value optimized out>, update_userstat=false)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/sql_parse.cc:5905
#20 0x0000000000cf1fb8 in dispatch_command (thd=0x7f85ba070e00, com_data=0x7fde860fec30, command=COM_QUERY)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/sql_parse.cc:1532
#21 0x0000000000cf2964 in do_command (thd=0x7f85ba070e00) at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/sql_parse.cc:1053
#22 0x0000000000dcf1a4 in handle_connection (arg=<value optimized out>)
    at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/conn_handler/connection_handler_per_thread.cc:318
#23 0x0000000000f3e761 in pfs_spawn_thread (arg=0x7f87478c1a20) at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/storage/perfschema/pfs.cc:2190
#24 0x00007ffb6798aaa1 in start_thread () from /lib64/libpthread.so.0
#25 0x00007ffb65c81bdd in clone () from /lib64/libc.so.6
 
   [7 Nov 2019 16:55]
   MySQL Verification Team        
  Thank you for the bug report. #10 0x0000000000c9408f in close_thread_tables (thd=0x7f85ba070e00) at /var/builds/workspace/155674-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/x86_64-linux-gcc/percona-server-5.7.26-29/sql/sql_base.cc:1732 This bug system is for MySQL Oracle product, Percona server isn't supported here please use binary/source for MySQL Oracle. Thanks.
   [8 Dec 2019 1:00]
   Bugs System        
  No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
   [27 Jan 2021 5:59]
   MySQL Verification Team        
  Please try 5.7.32 where I hope it's fixed. See more info on https://bugs.mysql.com/bug.php?id=88757
   [27 Jan 2021 9:53]
   zhai weixiang        
  I think this is possiblely duplicate of the bug#99643 I reported last year
   [27 Jan 2021 10:18]
   MySQL Verification Team        
  so this is the fix i hope solved this, but not sure. (we haven't had any new reports on versions containing this fix). https://github.com/mysql/mysql-server/search?q=30594501&type=commits


Description: This thread is stuck in an unchanging state of 'closing tables' for almost a day: Command: Query Time: 75777 State: closing tables Info: select straight_join concat("insert into ",if(p.database='',"db_tmp_all","db_tmp")," (host,db,user,",group_concat(a.column separator ','),") VALUES ('",var_host,"','",substitute_database_name(p.database),"','",u.username,"',",group_concat("'Y'" separator ','),");") from users u join user_privileges_database p on p.user_id=u.id join available_privileges_database a on p.privilege=a.privilege where u.id=var_user and find_in_set(var_version,a.version)>0 and substitute_database_name(p.database) is not null group by p.database Rows_sent: 46926 Rows_examined: 181625 From perf schema, it's not doing much, but count on wait/synch/mutex/innodb/trx_mutex is rapidly rising. 1257354 as of that 75K seconds. For context: - the statement is inside a stored procedure - the substitute_database_name is an sql function, NO SQL + DETERMINISTIC, just some string operations - there is no explicit transaction, the statement is AC-NL-RO and should be running in that context Stack trace for the thread will be pasted below, too long for form. Some vars: (gdb) p (trx->in_innodb & TRX_FORCE_ROLLBACK)>0 $12 = true (gdb) p/x trx->in_innodb $13 = 0xffffffff (gdb) p/x TRX_FORCE_ROLLBACK $14 = 0x80000000 Notes: - Per mysql, the transaction is not started re innodb status, and last state committed from perf schema, nothing active for this thread. - The thread is stuck in an infinite loop unlocking and relocking the mutex with a 100ms sleep, here: https://github.com/mysql/mysql-server/blob/5.7/storage/innobase/include/trx0trx.h#L1553-L1... - It is unkillable in this state, only restarting the server can clear it - AC-NL-RO statements should never be in this code path at all per https://github.com/mysql/mysql-server/blob/5.7/storage/innobase/trx/trx0trx.cc#L1957 - 'in_innodb' has all 32 bits set to 1 here, which should not be possible. -- most of them are used as a counter, ref: --- https://github.com/mysql/mysql-server/blob/5.7/storage/innobase/include/trx0trx.h#L916-L91... --- https://github.com/mysql/mysql-server/blob/5.7/storage/innobase/include/trx0trx.h#L1496 -- the high bits are flags, ref https://github.com/mysql/mysql-server/blob/5.7/storage/innobase/include/trx0types.h#L63-L7... - with all bits set to 1, the while condition will always return true, per https://github.com/mysql/mysql-server/blob/5.7/storage/innobase/include/trx0trx.h#L1536-L1... How to repeat: This is tough. It recurs somewhat regularly, and on more than one server, but I can't yet trigger a recurrence. I need to work around this in production (going to move the loop around that query to an explicit RO trx, should hide the problem) but we've collected a lot of info here. Suggested fix: this raises a few points: - how did an ac-nl-ro trx get to this state in the first place? something with SP handling different from normal selects? or issue parsing the function? - why is trx->in_innodb overflowed? is it safe to use as a counter+bitmap combo, and if so how could that get so high? - is the while supposed to be checking bit TRX_FORCE_ROLLBACK, instead of TRX_FORCE_ROLLBACK_ASYNC ?