Bug #56696 | server stalls during truncate table for innodb | ||
---|---|---|---|
Submitted: | 9 Sep 2010 17:10 | Modified: | 29 Aug 2012 4:41 |
Reporter: | Mark Callaghan | Email Updates: | |
Status: | Won't fix | Impact on me: | |
Category: | MySQL Server: InnoDB Plugin storage engine | Severity: | S2 (Serious) |
Version: | 5.1.47 | OS: | Any |
Assigned to: | Jimmy Yang | CPU Architecture: | Any |
Tags: | DDL, innodb, stalls |
[9 Sep 2010 17:10]
Mark Callaghan
[9 Sep 2010 17:14]
Mark Callaghan
To reproduce this, create 3 tables: create table t1(i int primary key) engine=innodb; create table t2(i int primary key) engine=innodb; create table t3(i int primary key) engine=innodb; insert into t1 values (1), (2); insert into t2 values (1), (2); insert into t3 values (1), (2); hack mysqld by adding "sleep(30)" to row_truncate_table_for_mysql after dict_sys->mutex is locked restart hacked mysqld # this first gets t3 handler instance cached session 1: select * from t3; truncate table t1; # this blocks on dict_sys->mutex after locking LOCK_open session 2: select * from t2; # this then block on LOCK_open session 3: select * from t3;
[9 Sep 2010 17:51]
MySQL Verification Team
Thank you for the bug report.
[29 Oct 2010 3:57]
Jimmy Yang
Hello, this hang situation is resolved by LOCK_open change in 5.5, ie. MySQL no longer holds LOCK_open while opening table. stack: #0 ha_innobase::open #1 0x0832fe80 in handler::ha_open #2 0x08287d52 in open_table_from_share #3 0x081a4ab0 in open_table <=== release LOCK_open before calling open_table_from_share #4 0x081a6aee in open_and_process_table open_table() { mysql_mutex_unlock(&LOCK_open); error= open_table_from_share(...) } Verified with the repro: Session 1: mysql> select * from t3; +---+ | i | +---+ | 1 | | 2 | +---+ 2 rows in set (2.91 sec) mysql> truncate table t1; In row_truncate_table_for_mysql, after calling row_mysql_lock_data_dictionary(), loop the server: Breakpoint 3, row_truncate_table_for_mysql (table=0x91a7708, trx=0x91d6fa0) at /home/jy/work2/mysql-trunk-innodb/storage/innobase/row/row0mysql.c:2811 2811 foreign = UT_LIST_GET_FIRST(table->referenced_list); (gdb) call srv_debug_loop() <==== Let it loop [Switching to Thread 0xb2baeb70 (LWP 4902)] session 2: Opens t2, hang on dict_mutex, but no LOCK_open held mysql> select * from t2; Breakpoint 1, ha_innobase::open (this=0x91da638, name=0x91d99d8 "./test/t2", mode=2, test_if_locked=2) at /home/jy/work2/mysql-trunk-innodb/storage/innobase/handler/ha_innodb.cc:3795 3795 { The program stopped in another thread while making a function call from GDB. Evaluation of the expression containing the function (srv_debug_loop) will be abandoned. When the function is done executing, GDB will silently stop. (gdb) c Continuing. Session 3, Selection goes fine without block by LOCK_open mysql> select * from t3; +---+ | i | +---+ | 1 | | 2 | +---+ 2 rows in set (0.00 sec) Conclusion: this hang is no longer present in 5.5, resolved by MySQL LOCK_open related change.
[3 Nov 2010 23:36]
Calvin Sun
The issue is fixed in 5.5 release. Due to the complexity and risk of the fix, there is no plan to backport the fix to 5.1.
[29 Aug 2012 2:25]
Mark Callaghan
Is LOCK_open the only problem? If TRUNCATE takes a long time and dict_sys->mutex is held for the duration, won't there be other sessions stuck waiting to lock dict_sys->mutex?
[29 Aug 2012 4:41]
Mark Callaghan
After looking at the code I am much more sure that TRUNCATE TABLE will remain a problem in 5.5. I have instrumentation in 5.1 to count the time in buf_LRU_invalidate_tablespace and the number of times "goto scan_again" is hit. In some cases "goto scan_again" is hit 3000+ times and buf_LRU_invalidate_tablespace takes 200+ seconds. This only occurs for TRUNCATE because AHI entries are removed prior to reaching this function during DROP TABLE (which is what allowed DROP TABLE to be made faster). As the code for faster DROP TABLE still has the "goto scan_again" in buf_LRU_remove_all_pages I think the bug is still there.
[29 Aug 2012 4:41]
Mark Callaghan
After looking at the code I am much more sure that TRUNCATE TABLE will remain a problem in 5.5. I have instrumentation in 5.1 to count the time in buf_LRU_invalidate_tablespace and the number of times "goto scan_again" is hit. In some cases "goto scan_again" is hit 3000+ times and buf_LRU_invalidate_tablespace takes 200+ seconds. This only occurs for TRUNCATE because AHI entries are removed prior to reaching this function during DROP TABLE (which is what allowed DROP TABLE to be made faster). As the code for faster DROP TABLE still has the "goto scan_again" in buf_LRU_remove_all_pages I think the bug is still there.
[5 Jun 2013 13:37]
xiaobin lin
to Mark, yes there is always a dict_sys->mutex is held. That will block other sessions execute queries that need to access the mutex. For example, show craete another_table, inserting data into a innodb table that do not have a primary key, update rows in tables that have foreign key constrant. Perhaps the mutex can be splited into smaller dict_sys->mutex..
[19 Jan 2016 23:43]
Domas Mituzas
you didn't fix TRUNCATE TABLE stalls! http://bugs.mysql.com/bug.php?id=80060