Bug #80060 Concurrent TRUNCATE TABLEs cause stalls
Submitted: 19 Jan 2016 23:43 Modified: 12 Jan 2017 12:44
Reporter: Domas Mituzas Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version:5.6.* OS:Any
Assigned to: CPU Architecture:Any

[19 Jan 2016 23:43] Domas Mituzas
Description:
This is different from many other bugs of this flavor (#68184, #56696).
In this case I'm looking at machines with innodb_adaptive_hash_index.

This one simply shows that even if TRUNCATE TABLE is fine, multiple are not (and other DDLs can have issues with TRUNCATE running):

1. TRUNCATE TABLE
fil_discard_tablespace() : fil_delete_tablespace() : buf_LRU_flush_or_remove_pages will hold dict_sys->mutex while doing expensive LRU scan

2. TRUNCATE TABLE
ha_innobase::close() : dict_table_close() will try to acquire dict_sys->mutex() while holding Table_cache::m_lock

Now that we really got some giant lock, let's enjoy server that is completely locked up:

3. SELECTs 
It is hard to SELECT when TABLE_CACHE is all locked up.

How to repeat:
T1: truncate table t1;
T2: truncate table t2;
T3: SELECT * FROM t3;

expect this:
Thread 4 (Thread 0x7fb7cf5db700 (LWP 1325193)):
#0  0x0000000000ae04bd in buf_LRU_flush_or_remove_pages(unsigned long, buf_remove_t, trx_t const*) (bpage=0x7fb2ade45540) at /oracle-mysql-5.6/storage/innobase/include/buf0buf.ic:304
#1  0x0000000000ae04bd in buf_LRU_flush_or_remove_pages(unsigned long, buf_remove_t, trx_t const*) (id=1814, buf_pool=<optimized out>) at /oracle-mysql-5.6/storage/innobase/buf/buf0lru.cc:301
#2  0x0000000000ae04bd in buf_LRU_flush_or_remove_pages(unsigned long, buf_remove_t, trx_t const*) (id=id@entry=1814, buf_remove=buf_remove@entry=BUF_REMOVE_ALL_NO_WRITE, trx=trx@entry=0x0) at /oracle-mysql-5.6/storage/innobase/buf/buf0lru.cc:916
#3  0x0000000000b1f378 in fil_delete_tablespace(unsigned long, buf_remove_t) (id=id@entry=1814, buf_remove=buf_remove@entry=BUF_REMOVE_ALL_NO_WRITE) at /oracle-mysql-5.6/storage/innobase/fil/fil0fil.cc:3051
#4  0x0000000000b1f741 in fil_discard_tablespace(unsigned long) (id=1814) at /oracle-mysql-5.6/storage/innobase/fil/fil0fil.cc:3164
#5  0x0000000000a1a5d8 in row_truncate_table_for_mysql(dict_table_t*, trx_t*) (table=0x7fa9a3c5e680, trx=0x7fa99a29b9c8) at /oracle-mysql-5.6/storage/innobase/row/row0mysql.cc:3465
#6  0x000000000096122e in ha_innobase::truncate() (this=0x7fa9a683c010) at /oracle-mysql-5.6/storage/innobase/handler/ha_innodb.cc:11356
#7  0x000000000087e3c0 in Sql_cmd_truncate_table::handler_truncate(THD*, TABLE_LIST*, bool) (this=<optimized out>, thd=0x7fa9d243d000, table_ref=0x7fa9d26508e8, is_tmp_table=<optimized out>) at /oracle-mysql-5.6/sql/sql_truncate.cc:242
#8  0x000000000087e870 in Sql_cmd_truncate_table::truncate_table(THD*, TABLE_LIST*) (this=this@entry=0x7fa9d2650e58, thd=thd@entry=0x7fa9d243d000, table_ref=table_ref@entry=0x7fa9d26508e8) at /oracle-mysql-5.6/sql/sql_truncate.cc:487
#9  0x000000000087e966 in Sql_cmd_truncate_table::execute(THD*) (this=0x7fa9d2650e58, thd=0x7fa9d243d000) at /oracle-mysql-5.6/sql/sql_truncate.cc:543
#10 0x0000000000700a22 in mysql_execute_command(THD*, unsigned long long*, unsigned long long*) (thd=thd@entry=0x7fa9d243d000, statement_start_time=statement_start_time@entry=0x7fb7cf5d98d0, post_parse=post_parse@entry=0x7fb7cf5d9ab0) at /oracle-mysql-5.6/sql/sql_parse.cc:5465
#11 0x000000000070624f in mysql_parse(THD*, char*, unsigned int, Parser_state*, unsigned long long*, char*) (thd=thd@entry=0x7fa9d243d000, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7fb7cf5da220, last_timer=last_timer@entry=0x7fb7cf5d9ab0, async_commit=async_commit@entry=0x7fb7cf5d9aa0 "") at /oracle-mysql-5.6/sql/sql_parse.cc:7001
#12 0x00000000007089b1 in dispatch_command(enum_server_command, THD*, char*, unsigned int) (command=COM_QUERY, thd=0x7fa9d243d000, packet=<optimized out>, packet_length=<optimized out>) at /oracle-mysql-5.6/sql/sql_parse.cc:1515
#13 0x000000000070934e in do_command(THD*) (thd=<optimized out>) at /oracle-mysql-5.6/sql/sql_parse.cc:1065
#14 0x00000000006d3732 in do_handle_one_connection(THD*) (thd_arg=<optimized out>) at /oracle-mysql-5.6/sql/sql_connect.cc:1064
#15 0x00000000006d37b9 in handle_one_connection(void*) (arg=<optimized out>) at /oracle-mysql-5.6/sql/sql_connect.cc:929
#16 0x00007fb7ceec9fa8 in start_thread () at /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libpthread.so.0
#17 0x00007fb7cd24d5bd in clone () at /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libc.so.6

Thread 3 (Thread 0x7fb7cf5aa700 (LWP 648830)):
#0  0x00007fb7ceece3dc in pthread_cond_wait@@GLIBC_2.3.2 () at /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libpthread.so.0
#1  0x00000000009c1eec in os_event_wait_low2(os_event_struct*, long) (fast_mutex=<optimized out>, cond=<optimized out>) at /oracle-mysql-5.6/storage/innobase/os/os0sync.cc:217
#2  0x00000000009c1eec in os_event_wait_low2(os_event_struct*, long) (event=0x7fb7cbb95dc0, reset_sig_count=<optimized out>) at /oracle-mysql-5.6/storage/innobase/os/os0sync.cc:692
#3  0x0000000000a4b40f in sync_array_wait_event(sync_array_t*, unsigned long) (arr=0x7fb7cbbf2070, index=1) at /oracle-mysql-5.6/storage/innobase/sync/sync0arr.cc:424
#4  0x0000000000a4d341 in mutex_spin_wait(ib_mutex_t*, char const*, unsigned long) (mutex=0x7fb7cbb95dc0, file_name=file_name@entry=0xdebef8 "/oracle-mysql-5.6/storage/innobase/dict/dict0dict.cc", line=line@entry=568) at /oracle-mysql-5.6/storage/innobase/sync/sync0sync.cc:580
#5  0x0000000000aed788 in dict_table_close(dict_table_t*, unsigned long, unsigned long) (file_name=0xdebef8 "/oracle-mysql-5.6/storage/innobase/dict/dict0dict.cc", line=568, mutex=<optimized out>) at /oracle-mysql-5.6/storage/innobase/include/sync0sync.ic:229
#6  0x0000000000aed788 in dict_table_close(dict_table_t*, unsigned long, unsigned long) (table=0x7fa9a0c1fe80, dict_locked=dict_locked@entry=0, try_drop=try_drop@entry=1) at /oracle-mysql-5.6/storage/innobase/dict/dict0dict.cc:568
#7  0x0000000000a112ab in row_prebuilt_free(row_prebuilt_t*, unsigned long) (prebuilt=0x7fa99f562080, dict_locked=dict_locked@entry=0) at /oracle-mysql-5.6/storage/innobase/row/row0mysql.cc:950
#8  0x000000000095b4ff in ha_innobase::close() (this=0x7fa9d24e0810) at /oracle-mysql-5.6/storage/innobase/handler/ha_innodb.cc:6102
#9  0x0000000000786b99 in closefrm(TABLE*, bool) (table=table@entry=0x7fa9a0f71000, free_share=free_share@entry=true) at /oracle-mysql-5.6/sql/table.cc:2631
#10 0x00000000006b6f36 in intern_close_table(TABLE*) (table=0x7fa9a0f71000) at /oracle-mysql-5.6/sql/sql_base.cc:821
#11 0x0000000000790877 in Table_cache_manager::free_table(THD*, enum_tdc_remove_table_type, TABLE_SHARE*) (this=this@entry=0x138fdc0 <table_cache_manager>, thd=thd@entry=0x7fa9d25a3000, remove_type=remove_type@entry=TDC_RT_REMOVE_ALL, share=<optimized out>) at /oracle-mysql-5.6/sql/table_cache.cc:380
#12 0x00000000006bf29a in tdc_remove_table(THD*, enum_tdc_remove_table_type, char const*, char const*, bool) (thd=thd@entry=0x7fa9d25a3000, remove_type=remove_type@entry=TDC_RT_REMOVE_ALL, db=0x7fa99f43c650 "test", table_name=0x7fa99f43c0b0 "t2", has_lock=has_lock@entry=false) at /oracle-mysql-5.6/sql/sql_base.cc:9384
#13 0x000000000087e587 in Sql_cmd_truncate_table::lock_table(THD*, TABLE_LIST*, bool*) (this=this@entry=0x7fa99f43c658, thd=thd@entry=0x7fa9d25a3000, table_ref=table_ref@entry=0x7fa99f43c0e8, hton_can_recreate=hton_can_recreate@entry=0x7fb7cf5a7440) at /oracle-mysql-5.6/sql/sql_truncate.cc:393
#14 0x000000000087e628 in Sql_cmd_truncate_table::truncate_table(THD*, TABLE_LIST*) (this=this@entry=0x7fa99f43c658, thd=thd@entry=0x7fa9d25a3000, table_ref=table_ref@entry=0x7fa99f43c0e8) at /oracle-mysql-5.6/sql/sql_truncate.cc:464
#15 0x000000000087e966 in Sql_cmd_truncate_table::execute(THD*) (this=0x7fa99f43c658, thd=0x7fa9d25a3000) at /oracle-mysql-5.6/sql/sql_truncate.cc:543
#16 0x0000000000700a22 in mysql_execute_command(THD*, unsigned long long*, unsigned long long*) (thd=thd@entry=0x7fa9d25a3000, statement_start_time=statement_start_time@entry=0x7fb7cf5a88d0, post_parse=post_parse@entry=0x7fb7cf5a8ab0) at /oracle-mysql-5.6/sql/sql_parse.cc:5465
#17 0x000000000070624f in mysql_parse(THD*, char*, unsigned int, Parser_state*, unsigned long long*, char*) (thd=thd@entry=0x7fa9d25a3000, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7fb7cf5a9220, last_timer=last_timer@entry=0x7fb7cf5a8ab0, async_commit=async_commit@entry=0x7fb7cf5a8aa0 "") at /oracle-mysql-5.6/sql/sql_parse.cc:7001
#18 0x00000000007089b1 in dispatch_command(enum_server_command, THD*, char*, unsigned int) (command=COM_QUERY, thd=0x7fa9d25a3000, packet=<optimized out>, packet_length=<optimized out>) at /oracle-mysql-5.6/sql/sql_parse.cc:1515
#19 0x000000000070934e in do_command(THD*) (thd=<optimized out>) at /oracle-mysql-5.6/sql/sql_parse.cc:1065
#20 0x00000000006d3732 in do_handle_one_connection(THD*) (thd_arg=<optimized out>) at /oracle-mysql-5.6/sql/sql_connect.cc:1064
#21 0x00000000006d37b9 in handle_one_connection(void*) (arg=<optimized out>) at /oracle-mysql-5.6/sql/sql_connect.cc:929
#22 0x00007fb7ceec9fa8 in start_thread () at /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libpthread.so.0
#23 0x00007fb7cd24d5bd in clone () at /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libc.so.6

Thread 2 (Thread 0x7fb7cc4d4700 (LWP 2079084)):
#0  0x00007fb7ceed1024 in __lll_lock_wait () at /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libpthread.so.0
#1  0x00007fb7ceecc6eb in _L_lock_1330 () at /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libpthread.so.0
#2  0x00007fb7ceecc65b in pthread_mutex_lock () at /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libpthread.so.0
#3  0x00000000006b7056 in close_thread_table(THD*, TABLE**, bool) (that=<optimized out>) at /oracle-mysql-5.6/include/mysql/psi/mysql_thread.h:688
#4  0x00000000006b7056 in close_thread_table(THD*, TABLE**, bool) (this=<optimized out>) at /oracle-mysql-5.6/sql/table_cache.h:115
#5  0x00000000006b7056 in close_thread_table(THD*, TABLE**, bool) (thd=0x7fa9d2712000, table_ptr=<optimized out>, update_stats=<optimized out>) at /oracle-mysql-5.6/sql/sql_base.cc:1526
#6  0x00000000006b758d in close_thread_tables(THD*, bool) (thd=0x7fa9d2712000) at /oracle-mysql-5.6/sql/sql_base.cc:1248
#7  0x00000000006b758d in close_thread_tables(THD*, bool) (thd=thd@entry=0x7fa9d2712000, async_commit=async_commit@entry=false) at /oracle-mysql-5.6/sql/sql_base.cc:1474
#8  0x00000000006ff3d3 in mysql_execute_command(THD*, unsigned long long*, unsigned long long*) (thd=thd@entry=0x7fa9d2712000, statement_start_time=statement_start_time@entry=0x7fb7cc4d28d0, post_parse=post_parse@entry=0x7fb7cc4d2ab0) at /oracle-mysql-5.6/sql/sql_parse.cc:5614
#9  0x000000000070624f in mysql_parse(THD*, char*, unsigned int, Parser_state*, unsigned long long*, char*) (thd=thd@entry=0x7fa9d2712000, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7fb7cc4d3220, last_timer=last_timer@entry=0x7fb7cc4d2ab0, async_commit=async_commit@entry=0x7fb7cc4d2aa0 "") at /oracle-mysql-5.6/sql/sql_parse.cc:7001
#10 0x00000000007089b1 in dispatch_command(enum_server_command, THD*, char*, unsigned int) (command=COM_QUERY, thd=0x7fa9d2712000, packet=<optimized out>, packet_length=<optimized out>) at /oracle-mysql-5.6/sql/sql_parse.cc:1515
#11 0x000000000070934e in do_command(THD*) (thd=<optimized out>) at /oracle-mysql-5.6/sql/sql_parse.cc:1065
#12 0x00000000006d3732 in do_handle_one_connection(THD*) (thd_arg=<optimized out>) at /oracle-mysql-5.6/sql/sql_connect.cc:1064
#13 0x00000000006d37b9 in handle_one_connection(void*) (arg=<optimized out>) at /oracle-mysql-5.6/sql/sql_connect.cc:929
#14 0x00007fb7ceec9fa8 in start_thread () at /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libpthread.so.0
#15 0x00007fb7cd24d5bd in clone () at /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libc.so.6

Suggested fix:
find ways to scan buffer pool without locking dict_sys->mutex, just like Mark Callaghan suggested in 56696
[29 Jan 2016 16:33] Sinisa Milivojevic
Hi Domas,

Thank you for your bug report. I need to try to repeat it, but I used rather small tables, so no luck so far , as it is tough to repeat a necessary concurrency. 
So, how big should be tables to be TRUNCATEd in order to get the same outcome as you did ??
[1 Feb 2016 18:19] Sinisa Milivojevic
Domas,

I have tried with very large tables. I made sure that adaptive hash index is on. Still, I am not able to repeat it. I have run it several times with 5.6.25 and each time first two tables were truncated, while I got the output of the third one, without any problems.

Just like you, I ran those 3 statements in parallel, just by using bash shell and separating each command with '&'. 

Can you tell me other parameters which might influence that the bug pops up ???

I must also tell you that my iMac is 6.5 years old, runs 10.6.8, with 2.66 GHz Intel Core Duo and 4 Gb of RAM.

Please, provide further info that would help me repeat the bug.
[1 Feb 2016 21:52] Domas Mituzas
table being truncated can be empty, you just need lots of pages on the LRU (so, large non-free buffer pool)
[2 Feb 2016 17:42] Sinisa Milivojevic
Trying to make a test case.
[3 Feb 2016 16:55] Sinisa Milivojevic
Hi Domas,

I have done lots of work in attempting to reproduce this bug.

I used buffer pool of 2 Gb. I have ran several range queries (each using index) , which covered entire contents of four different tables, each containing at least over 15.000 rows and being at least 200 Mb big. I ran those several times, after which I attempted to run your test case. I ran your test case and it worked quite well. I then restored first two tables (that have been truncated) and repeated the entire procedure all over. 

There were no problems, except that it was not lightning fast.

So, no luck yet. What else do you think that I should attempt ???

Have a nice time.
[3 Feb 2016 18:13] Domas Mituzas
Sinisa, oh yes, buffer pool has to be large enough. In our case it was probably more like 50GB, not 5GB, but it is not unlikely to have 200GB buffer pools nowadays (though some people take it far further).

If you want to verify bugs like this on your 6.5 year old iMac, you should actually pay attention to details in the bug and either somehow mimic large LRU scans, add sleeps or something. 

Or you could repro this with a large enough machine. Even 6.5 years ago server class machines could have tens of gigabytes of buffer pool.
[7 Apr 2016 13:24] Sinisa Milivojevic
Domas,

I have been running experiments with 200 Gb buffer pool for over a month. I managed to fill up the tables (each with 8 columns) with 1.5 to 2.0 million rows. I had three indices and ran queries that filled up the AHI.

I was not able to cause hangup with:

truncate & truncate & select ...   &

The best that I was able to achieve is to cause a long stall. Last night, it took the longest, 16 hours for those three commands to finish.

I can verify this bug as performance bug, but not as hangup bug.

Your thoughts ????
[7 Apr 2016 16:00] Domas Mituzas
Sinisa, what the heck are you smoking again. I never said it is "hangup". I said - stall. Let's look a dictionary definition and see if it implies "hangup": 

verb
1 [ no obj. ] (of a motor vehicle or its engine) stop running, typically because of an overload on the engine: her car stalled at the crossroads.
• (of an aircraft or its pilot) reach a condition where the speed is too low to allow effective operation of the controls.
• Sailing have insufficient wind power in the sails to give controlled motion.
• [ with obj. ] cause (an engine, vehicle, aircraft, or boat) to stall.
2 stop or cause to stop making progress: [ no obj. ] : his career had stalled, hers taken off | [ with obj. ] : the government has stalled the much-needed project.
• speak or act in a deliberately vague way in order to gain more time to deal with a question or issue; prevaricate: she was stalling for time.
• [ with obj. ] delay or divert (someone) by stalling: stall him until I've had time to take a look.

Nope, it does not. Just server operation stops for a while and then resumes.
[7 Apr 2016 16:04] Sinisa Milivojevic
Domas,

Then I have verified your bug !!!!

Fully verified stalling bug in AHI.
[7 Apr 2016 16:20] Domas Mituzas
this is a bug with AHI disabled too.
[7 Apr 2016 16:38] Sinisa Milivojevic
Domas,

Thanks for the info.

Detailed instructions to repeat the bug are huge, so they are prepared for a developer that will be assigned.

Also, I had to generate a very large amount of data.
[7 Apr 2016 16:52] Domas Mituzas
thats weird, one can prepare this testcase quite quick by writing tens of gigs of blob data (to populate buffer pool), and then creating and dropping empty tables.
[12 Jan 2017 12:44] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 5.7.18, 8.0.1 release, and here's the changelog entry:

A TRUNCATE TABLE operation held the dict_sys mutex while scanning for and
removing pages from the buffer pool, causing concurrent DDL operations to
stall. The mutex is now released during the scan and acquired again when
the scan is completed. 

Thank you for the bug report.
[27 Sep 2017 19:45] Miguel Solorzano
https://bugs.mysql.com/bug.php?id=87884 marked as duplicate of this one.