Bug #73816 | MySQL instance stalling “doing SYNC index” | ||
---|---|---|---|
Submitted: | 5 Sep 2014 12:56 | Modified: | 23 Feb 2016 19:28 |
Reporter: | Denis Jedig | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Locking | Severity: | S2 (Serious) |
Version: | 5.6.20 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | DML, fulltext, insert, mutex, stall, UPDATE |
[5 Sep 2014 12:56]
Denis Jedig
[10 Jun 2015 13:47]
Brandon Johnson
This is a rather critical issue. It's causing 2 minute locks on all tables.
[9 Nov 2015 8:10]
Aurimas Mikalauskas
I have some further evidence for this. This is happening about once in 24h on our machine. When it happens, we always see the following mutex waits: --Thread 91109132288 has waited at fts0que.cc line 1287 for 59.00 seconds the semaphore: X-lock on RW-latch at 0x152695fc68 created in file fts0fts.cc line 645 a writer (thread id 90488269824) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file not yet reserved line 0 Last time write locked in file /wrkdirs/usr/ports/databases/mysql56-server/work/mysql-5.6.21/storage/innobase/fts/fts0fts.cc line 4453 --Thread 90943929344 has waited at fts0que.cc line 1287 for 49.00 seconds the semaphore: X-lock on RW-latch at 0x152695fc68 created in file fts0fts.cc line 645 a writer (thread id 90488269824) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file not yet reserved line 0 Last time write locked in file /wrkdirs/usr/ports/databases/mysql56-server/work/mysql-5.6.21/storage/innobase/fts/fts0fts.cc line 4453 --Thread 90487953408 has waited at fts0que.cc line 1287 for 48.00 seconds the semaphore: X-lock on RW-latch at 0x152695fc68 created in file fts0fts.cc line 645 a writer (thread id 90488269824) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file not yet reserved line 0 Last time write locked in file /wrkdirs/usr/ports/databases/mysql56-server/work/mysql-5.6.21/storage/innobase/fts/fts0fts.cc line 4453 ... Hence, it's a &cache->lock wait in fts_query_intersect() function here: /* Search the cache for a matching word first. */ rw_lock_x_lock(&cache->lock); With our configuration it takes ~70s to complete and while it is running, there's hundreds of write queries waiting in a 'query end' state. Not sure though which mutex they are waiting on as I didn't have a chance to run PMP on it. And here's the thread doing the sync (and holding the lock): ---TRANSACTION 16597707497, ACTIVE 64 sec doing SYNC index 6 lock struct(s), heap size 1184, 0 row lock(s), undo log entries 18706 TABLE LOCK table "db"."FTS_0000000000001483_000000000000356e_INDEX_1" trx id 16597707497 lock mode IX TABLE LOCK table "db"."FTS_0000000000001483_000000000000356e_INDEX_2" trx id 16597707497 lock mode IX TABLE LOCK table "db"."FTS_0000000000001483_000000000000356e_INDEX_3" trx id 16597707497 lock mode IX TABLE LOCK table "db"."FTS_0000000000001483_000000000000356e_INDEX_4" trx id 16597707497 lock mode IX TABLE LOCK table "db"."FTS_0000000000001483_000000000000356e_INDEX_5" trx id 16597707497 lock mode IX TABLE LOCK table "db"."FTS_0000000000001483_000000000000356e_INDEX_6" trx id 16597707497 lock mode IX MySQL 5.6.21, running on FreeBSD 10.1-RELEASE with the following innodb ft config: | innodb_ft_aux_table | | | innodb_ft_cache_size | 8000000 | | innodb_ft_enable_diag_print | OFF | | innodb_ft_enable_stopword | ON | | innodb_ft_max_token_size | 84 | | innodb_ft_min_token_size | 3 | | innodb_ft_num_word_optimize | 2000 | | innodb_ft_result_cache_limit | 2000000000 | | innodb_ft_server_stopword_table | | | innodb_ft_sort_pll_degree | 2 | | innodb_ft_total_cache_size | 640000000 | | innodb_ft_user_stopword_table | path/search_stopwords |
[14 Dec 2015 12:57]
Aurimas Mikalauskas
I was able to reproduce this by loading a 1GB size FTS indexed table on one thread and running inserts into an unrelated table on the other thread. With replication disabled, unrelated inserts are running just fine. However, when I enable replication, inserts are blocked while the FTS SYNC is in progress. Here's PMP sample from when this is happening. I've used community mysql 5.6.28 for this: 9 blocked threads (generated with mysqlslap --auto-generate-sql-load-type=write): 9 pthread_cond_wait,inline_mysql_cond_wait(mysql_thread.h:1162),Stage_manager::enroll_for(mysql_thread.h:1162),MYSQL_BIN_LOG::change_stage(binlog.cc:6783),MYSQL_BIN_LOG::ordered_commit(binlog.cc:7187),MYSQL_BIN_LOG::commit(binlog.cc:6500),ha_commit_trans(handler.cc:1436),trans_commit(transaction.cc:228),mysql_execute_command(sql_parse.cc:4280),mysql_parse(sql_parse.cc:6386),dispatch_command(sql_parse.cc:1340),do_command(sql_parse.cc:1037),do_handle_one_connection(sql_connect.cc:982),handle_one_connection(sql_connect.cc:898),pfs_spawn_thread(pfs.cc:1860),start_thread,clone And here's the SYNC thread and FTS table load thread: 1 __lll_lock_wait,_L_lock_1081(libpthread.so.0),__GI___pthread_mutex_lock,inline_mysql_mutex_lock(mysql_thread.h:688),MYSQL_BIN_LOG::change_stage(mysql_thread.h:688),MYSQL_BIN_LOG::ordered_commit(binlog.cc:7187),MYSQL_BIN_LOG::commit(binlog.cc:6500),ha_commit_trans(handler.cc:1436),trans_commit(transaction.cc:228),mysql_execute_command(sql_parse.cc:4280),mysql_parse(sql_parse.cc:6386),dispatch_command(sql_parse.cc:1340),do_command(sql_parse.cc:1037),do_handle_one_connection(sql_connect.cc:982),handle_one_connection(sql_connect.cc:898),pfs_spawn_thread(pfs.cc:1860),start_thread,clone 1 fsync,os_file_fsync(os0file.cc:2225),os_file_flush_func(os0file.cc:2225),fil_flush(os0file.ic:352),buf_dblwr_write_single_page(buf0dblwr.cc:1164),buf_flush_write_block_low(buf0flu.cc:959),buf_flush_page(buf0flu.cc:959),buf_flush_single_page_from_LRU(buf0flu.cc:1992),buf_LRU_get_free_block(buf0lru.cc:1367),buf_page_create(buf0buf.cc:3866),fsp_page_create(fsp0fsp.cc:1296),fseg_alloc_free_page_low(fsp0fsp.cc:2570),fseg_alloc_free_page_general(fsp0fsp.cc:2639),btr_page_alloc(btr0btr.cc:1141),btr_page_split_and_insert(btr0btr.cc:1141),btr_cur_pessimistic_insert(btr0cur.cc:1604),row_ins_clust_index_entry_low(row0ins.cc:2492),row_ins_clust_index_entry(row0ins.cc:2890),row_ins(row0ins.cc:2967),row_ins_step(row0ins.cc:2967),que_thr_step(que0que.cc:1059),que_run_threads_low(que0que.cc:1059),que_run_threads(que0que.cc:1059),fts_eval_sql(fts0sql.cc:286),fts_write_node(fts0fts.cc:3877),fts_sync_write_words(fts0fts.cc:4014),fts_sync_index(fts0fts.cc:4014),fts_sync(fts0fts.cc:4014),fts_add_doc_by_id(fts0fts.cc:3565),fts_add(fts0fts.cc:3565),fts_modify(fts0fts.cc:3028),fts_commit_table(fts0fts.cc:3028),fts_commit(fts0fts.cc:3028),trx_commit_low(trx0trx.cc:1352),trx_commit(trx0trx.cc:1416),trx_commit_for_mysql(trx0trx.cc:1416),innobase_commit_low(ha_innodb.cc:3400),innobase_commit(ha_innodb.cc:3400),ha_commit_low(handler.cc:1493),MYSQL_BIN_LOG::process_commit_stage_queue(binlog.cc:6678),MYSQL_BIN_LOG::ordered_commit(binlog.cc:7198),MYSQL_BIN_LOG::commit(binlog.cc:6500),ha_commit_trans(handler.cc:1436),trans_commit_stmt(transaction.cc:434),mysql_execute_command(sql_parse.cc:5037),mysql_parse(sql_parse.cc:6386),dispatch_command(sql_parse.cc:1340),do_command(sql_parse.cc:1037),do_handle_one_connection(sql_connect.cc:982),handle_one_connection(sql_connect.cc:898),pfs_spawn_thread(pfs.cc:1860),start_thread,clone The simple workaround is to switch off binlog_order_commits: SET global binlog_order_commits = off; I'm still not sure how binlog_order_commits works though, so can't guarantee this is safe.
[14 Dec 2015 13:29]
Aurimas Mikalauskas
Just to follow-up, I have made the same tests on 5.7.10-log MySQL Community Server (GPL) and got the same results, hence upgrading to MySQL 5.7 isn't going to solve this. Here's backtraces from 5.7.10: Threads waiting to commit: 8 pthread_cond_wait,native_cond_wait(thr_cond.h:136),my_cond_wait(thr_cond.h:136),inline_mysql_cond_wait(thr_cond.h:136),Stage_manager::enroll_for(thr_cond.h:136),MYSQL_BIN_LOG::change_stage(binlog.cc:8376),MYSQL_BIN_LOG::ordered_commit(binlog.cc:8764),MYSQL_BIN_LOG::commit(binlog.cc:8061),ha_commit_trans(handler.cc:1689),trans_commit(transaction.cc:239),mysql_execute_command(sql_parse.cc:4206),mysql_parse(sql_parse.cc:5508),dispatch_command(sql_parse.cc:1425),do_command(sql_parse.cc:993),handle_connection(connection_handler_per_thread.cc:301),pfs_spawn_thread(pfs.cc:2192),start_thread,clone And the FTS loading and SYNC: 1 __lll_lock_wait,_L_lock_1081(libpthread.so.0),__GI___pthread_mutex_lock,native_mutex_lock(thr_mutex.h:84),my_mutex_lock(thr_mutex.h:84),inline_mysql_mutex_lock(thr_mutex.h:84),MYSQL_BIN_LOG::change_stage(thr_mutex.h:84),MYSQL_BIN_LOG::ordered_commit(binlog.cc:8888),MYSQL_BIN_LOG::commit(binlog.cc:8061),ha_commit_trans(handler.cc:1689),trans_commit(transaction.cc:239),mysql_execute_command(sql_parse.cc:4206),mysql_parse(sql_parse.cc:5508),dispatch_command(sql_parse.cc:1425),do_command(sql_parse.cc:993),handle_connection(connection_handler_per_thread.cc:301),pfs_spawn_thread(pfs.cc:2192),start_thread,clone 1 fsync,os_file_fsync_posix(os0file.cc:2872),os_file_flush_func(os0file.cc:2872),pfs_os_file_flush_func(os0file.ic:402),fil_flush(os0file.ic:402),buf_dblwr_write_single_page(buf0dblwr.cc:1282),buf_flush_write_block_low(buf0flu.cc:1096),buf_flush_page(buf0flu.cc:1096),buf_flush_single_page_from_LRU(buf0flu.cc:2217),buf_LRU_get_free_block(buf0lru.cc:1401),buf_page_init_for_read(buf0buf.cc:5188),buf_read_page_low(buf0rea.cc:149),buf_read_page(buf0rea.cc:406),buf_page_get_gen(buf0buf.cc:4239),btr_cur_search_to_nth_level(btr0cur.cc:1107),btr_pcur_open_low(btr0pcur.ic:465),row_ins_clust_index_entry_low(btr0pcur.ic:465),row_ins_clust_index_entry(row0ins.cc:3158),row_ins_index_entry(row0ins.cc:3288),row_ins_index_entry_step(row0ins.cc:3288),row_ins(row0ins.cc:3288),row_ins_step(row0ins.cc:3288),que_thr_step(que0que.cc:1031),que_run_threads_low(que0que.cc:1031),que_run_threads(que0que.cc:1031),fts_eval_sql(fts0sql.cc:245),fts_write_node(fts0fts.cc:3958),fts_sync_write_words(fts0fts.cc:4099),fts_sync_index(fts0fts.cc:4099),fts_sync(fts0fts.cc:4534),fts_add_doc_by_id(fts0fts.cc:3644),fts_add(fts0fts.cc:2989),fts_commit_table(fts0fts.cc:2989),fts_commit(fts0fts.cc:2989),trx_commit_low(trx0trx.cc:2113),trx_commit(trx0trx.cc:2205),trx_commit_for_mysql(trx0trx.cc:2424),innobase_commit_low(ha_innodb.cc:3881),innobase_commit(ha_innodb.cc:4040),ha_commit_low(handler.cc:1794),MYSQL_BIN_LOG::process_commit_stage_queue(binlog.cc:8260),MYSQL_BIN_LOG::ordered_commit(binlog.cc:8918),MYSQL_BIN_LOG::commit(binlog.cc:8061),ha_commit_trans(handler.cc:1689),trans_commit_stmt(transaction.cc:458),mysql_execute_command(sql_parse.cc:4946),mysql_parse(sql_parse.cc:5508),dispatch_command(sql_parse.cc:1425),do_command(sql_parse.cc:993),handle_connection(connection_handler_per_thread.cc:301),pfs_spawn_thread(pfs.cc:2192),start_thread,clone Similarly, disabling binlog_order_commits solves this here too. So looks like they're all waiting on &LOCK_commit otherwise.
[7 Jan 2016 18:34]
MySQL Verification Team
take a look at the problem discussed here: http://www.speedemy.com/mysql-troubleshooting-crash-course/
[8 Jan 2016 10:22]
Jimmy Yang
Posted by developer: There are 2 potential fix of this bug: 1. While running sync, not to hold fts cache lock, as we will eliminate duplicates later anyway. Only when we delete the rbt tree in memory we will hold the lock 2. Do the sync in a background optimizer thread
[8 Jan 2016 20:28]
Denis Jedig
As so many people seem to be suffering badly (and still actively) from this bug, I would like to repost a workaround found by Daniel Golding and posted at http://dba.stackexchange.com/a/108259/5861: set global binlog_order_commits = 0; [...] Explanation: InnoDB full-text uses a cache (by default 8M in size) containing changes that need to be applied to the actual full-text index on disk. Once the cache fills up, a couple of transactions are created to perform the work of merging the data that was contained in the cache - this tends to be a large amount of random IO, so unless your entire full-text index can be loaded into the buffer pool, it's a long and slow transaction. With binlog_order_commits set to true, all transactions containing inserts and updates, started after the long-running fts_sync_index transaction, must wait until it has completed before they can commit. This is only an issue if binary logging is enabled
[8 Jan 2016 21:21]
Aurimas Mikalauskas
Note that disabling binlog_order_commits will not fix the issue fully. FTS queries against InnoDB tables with FTS index will still block, but at least in this case unrelated queries will not be affected. I have added a few more details on this in the following post: http://www.speedemy.com/mysql-troubleshooting-crash-course/
[9 Jan 2016 9:51]
Denis Jedig
@Aurimas Nice. Looks like a fair bunch of work has been done to produce the writeup. What irritates me is that it took the MySQL dev team more than 15 months just to acknowledge the issue.
[1 Feb 2016 11:23]
Shaohua Wang
fix
Attachment: bug73816.patch (text/x-patch), 17.84 KiB.
[1 Feb 2016 11:23]
Shaohua Wang
We are working on this fix, please give us your feedback if it solves your problem.
[3 Feb 2016 1:10]
Shaohua Wang
new patch
Attachment: bug73816-v2.patch (text/x-patch), 25.56 KiB.
[19 Feb 2016 11:28]
Shaohua Wang
patch v3
Attachment: bug73816-v3.patch (text/x-patch), 18.14 KiB.
[23 Feb 2016 19:28]
Daniel Price
Posted by developer: Fixed as of the upcoming 5.6.30, 5.7.12, 5.8.0 release, and here's the changelog entry: MySQL stalled when syncing the InnoDB full-text index cache. Thank you for the bug report.
[18 Apr 2016 3:18]
MySQL Verification Team
Bug 22996488 - CRASH IN FTS_SYNC_INDEX WHEN DOING DDL IN A LOOP the fix here caused a regression that can lead to crashes. folks should rather wait for 5.6.31.