Bug #115994 InnoDB: system hang because of buf flush wait flushed keeps waiting for ever
Submitted: 4 Sep 2024 6:19 Modified: 4 Sep 2024 7:08
Reporter: Shaohua Wang (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.7.41, 8.0 OS:Any
Assigned to: CPU Architecture:Any

[4 Sep 2024 6:19] Shaohua Wang
Description:
Found a coredump like this for the first time, it semes there is a bug in page cleane logic. Maybe Yasufumi can have a look.

DDL thread holds dict_sys->mutext and dict_operation_lock, and waits for dirty pages flushed.
buf_flush_wait_flushed (new_oldest=new_oldest@entry=14631034488637)

dirty page:
newest_modification = 14631034430421, oldest_modification = 14631034427699,

srv_n_page_cleaners = 1;

infos in gdb:
buf_flush_sync_lsn = 0
p page_cleaner->flush_pass
$2 = 9

only one dirty page in flush list.
{id = {m_space = 4183875, m_page_no = 23786, m_fold = 1953510445}, size = {m_physical = 4096, m_logical = 16384, m_is_compressed = 1}, buf_fix_count = 0, io_fix = BUF_IO_NONE, 
  state = BUF_BLOCK_ZIP_DIRTY, flush_type = 0, buf_pool_index = 6, zip = {data = 0x7ff706705000 "8)", <incomplete sequence \313>, m_end = 2196, m_nonempty = 0, n_blobs = 0, ssize = 3}, 
  hash = 0x0, list = {prev = 0x0, next = 0x0}, flush_observer = 0x0, newest_modification = 14631034430421, oldest_modification = 14631034427699, LRU = {prev = 0x7fea8c921b18, 
    next = 0x7fea8c911918}, old = 1, freed_page_clock = 112981966, access_time = 3220553233, lru_object_id = 1860, flush_object_id = 1860, chunk_no = 0, in_chunk_list = 0, in_zip_list = 1, 
  object_list = {prev = 0x0, next = 0x0}, object_LRU = {prev = 0x7fea8c911918, next = 0x7fea8c921b18}, chunk_list = {prev = 0x7ff618ddcd98, next = 0x7ff618e17590}, zip_list = {
    prev = 0x7fea8c911918, next = 0x7fea8c921b18}}

#0  0x00007ff7de32be9d in nanosleep () from /lib64/libpthread.so.0
#1  0x0000000001056a20 in os_thread_sleep (tm=tm@entry=10000) at os/os0thread.cc:287
#2  0x00000000011c7894 in buf_flush_wait_flushed (new_oldest=new_oldest@entry=14631034488637) at buf/buf0flu.cc:2114
#3  0x000000000102b8bb in log_preflush_pool_modified_pages (new_oldest=14631034488637) at log/log0log.cc:1538
#4  log_checkpoint_margin () at log/log0log.cc:1980
#5  log_check_margins () at log/log0log.cc:2072
#6  0x00000000010793a5 in log_free_check () at include/log0log.ic:499
#7  que_run_threads_low (thr=0x7ff339e7cf88) at que/que0que.cc:1113
#8  que_run_threads (thr=thr@entry=0x7ff33997da18) at que/que0que.cc:1159
#9  0x00000000010b6531 in row_create_table_for_mysql (table=0x7ff3ed5bd088, compression=compression@entry=0x7ff3d143dc50 "", trx=0x7ff7d6b2b3f0, commit=commit@entry=false)
    at row/row0mysql.cc:3058
#10 0x0000000000fefbb2 in prepare_inplace_alter_table_dict (ha_alter_info=ha_alter_info@entry=0x7ff3d737a660, altered_table=altered_table@entry=0x7ff339e35820, old_table=<optimized out>, 
    table_name=<optimized out>, flags=<optimized out>, flags2=<optimized out>, fts_doc_id_col=18446744073709551615, add_fts_doc_id=<optimized out>, add_fts_doc_id@entry=false, 
    add_fts_doc_id_idx=<optimized out>, add_fts_doc_id_idx@entry=false) at handler/handler0alter.cc:4725
#11 0x0000000000ff4116 in ha_innobase::prepare_inplace_alter_table (this=<optimized out>, altered_table=<optimized out>, ha_alter_info=<optimized out>) at handler/handler0alter.cc:6214
#12 0x0000000000d8a9f5 in mysql_inplace_alter_table (target_mdl_request=0x7ff3d737a0a0, alter_ctx=0x7ff3d737abf0, inplace_supported=HA_ALTER_INPLACE_NO_LOCK_AFTER_PREPARE, 
    ha_alter_info=0x7ff3d737a660, altered_table=0x7ff339e35820, table=0x7ff339e8a020, table_list=0x7ff339ed3b88, thd=0x7ff339de2000) at sql_table.cc:7814
#13 mysql_alter_table (thd=thd@entry=0x7ff339de2000, new_db=<optimized out>, new_name=<optimized out>, create_info=create_info@entry=0x7ff3d737b7c0, table_list=table_list@entry=0x7ff339ed3b88, 
    alter_info=alter_info@entry=0x7ff3d737b710) at sql_table.cc:10212
#14 0x0000000000ebcfa0 in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x7ff339de2000) at sql_alter.cc:340
#15 0x0000000000d1c01a in mysql_execute_command (thd=thd@entry=0x7ff339de2000, first_level=first_level@entry=true) at sql_parse.cc:5312
#16 0x0000000000d20add in mysql_parse (thd=thd@entry=0x7ff339de2000, parser_state=parser_state@entry=0x7ff3d737ceb0) at sql_parse.cc:6123
#17 0x0000000000d217ad in dispatch_command (thd=thd@entry=0x7ff339de2000, com_data=com_data@entry=0x7ff3d737d570, command=COM_QUERY) at sql_parse.cc:1648
#18 0x0000000000d2337e in do_command (thd=thd@entry=0x7ff339de2000, dispatcher=dispatcher@entry=0x0) at sql_parse.cc:1086
#19 0x0000000000d2341c in do_command (thd=thd@entry=0x7ff339de2000) at sql_parse.cc:1098
#20 0x0000000000e2e370 in handle_connection (arg=arg@entry=0x7ff45a3ff6c0) at conn_handler/connection_handler_per_thread.cc:320
#21 0x0000000001315444 in pfs_spawn_thread (arg=0x7ff45887bc20) at pfs.cc:2197
#22 0x00007ff7de324ea5 in start_thread () from /lib64/libpthread.so.0
#23 0x00007ff7dcff38dd in clone () from /lib64/libc.so.6

How to repeat:
Not yet to reproduce.

Suggested fix:
Is it possible to disable log_free_check()?
[4 Sep 2024 7:05] MySQL Verification Team
Hello Shaohua,

Thank you for the report and feedback.

As a side note, I would like to add here that (and I assume you are also aware) - per Oracle's Lifetime Support policy, as of October 25, 2023, MySQL 5.7 is covered under Oracle Sustaining Support. 
We don't fix bugs in old versions, don't backport bug fixes. So may I ask you check with the latest MySQL Server 8.0/8.4/9.0 please. Thank you.

Regards,
Umesh
[4 Sep 2024 7:08] Shaohua Wang
Considering the page clean logic is almost the same, I suspect 8.0+ is also affected.
[4 Sep 2024 7:41] MySQL Verification Team
Thank you, Shaohua for the details.

regards,
Umesh
[25 Nov 2024 14:16] Marcin Babij
Posted by developer:
 
Hello Shaohua,
8.0 is not using the `que_run_threads` in `row_create_table_for_mysql` anymore. Furthermore the usages of `que` is greatly reduced in 8.0.
`log_free_check` requires to not hold any latched that could prevent page flushing from progressing. It would be great if you have callstacks from all page cleaners or all threads provided to peek who may be blocking it.
If the `log_free_check` has some work to do, it means the system is not configured efficiently and suffers from redolog files capacity exhaustion.

Overall, it seems this particular bug is 5.7-only.
[15 May 10:29] Arek M
Probably the same issue. Not sure if backtrace is helpful as I run percona fork.

(gdb) bt
#0  0x00007f8ba9c84455 in clock_nanosleep () from /lib64/libc.so.6
#1  0x00007f8ba9c8fe57 in nanosleep () from /lib64/libc.so.6
#2  0x00000000011bdbdd in os_thread_sleep (tm=tm@entry=10000) at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/storage/innobase/os/os0thread.cc:311
#3  0x000000000133f392 in buf_flush_wait_flushed (new_oldest=new_oldest@entry=55213605454167) at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/storage/innobase/buf/buf0flu.cc:2182
#4  0x0000000001185146 in log_preflush_pool_modified_pages (new_oldest=55213605454167) at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/storage/innobase/log/log0log.cc:1946
#5  log_checkpoint_margin () at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/storage/innobase/log/log0log.cc:2403
#6  log_check_margins () at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/storage/innobase/log/log0log.cc:2505
#7  0x00000000011e1844 in log_free_check () at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/storage/innobase/include/log0log.ic:533
#8  que_run_threads_low (thr=<optimized out>) at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/storage/innobase/que/que0que.cc:1115
#9  que_run_threads (thr=<optimized out>) at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/storage/innobase/que/que0que.cc:1161
#10 0x00000000011e2716 in que_eval_sql (info=info@entry=0x0,
    sql=sql@entry=0x174b320 <row_merge_drop_temp_indexes()::sql> "PROCEDURE DROP_TEMP_INDEXES_PROC () IS\nixid CHAR;\nfound INT;\nDECLARE CURSOR index_cur IS\n SELECT ID FROM SYS_INDEXES\n WHERE SUBSTR(NAME,0,1)='\377'\nFOR UPDATE;\nBEGIN\nfound := 1;\nOPEN index_cur;\nWHILE fou"..., reserve_dict_mutex=reserve_dict_mutex@entry=0, trx=trx@entry=0x7f8ba0901c70) at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/storage/innobase/que/que0que.cc:1238
#11 0x0000000001214d0f in row_merge_drop_temp_indexes () at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/storage/innobase/row/row0merge.cc:3846
#12 0x0000000001195289 in recv_recovery_rollback_active () at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/storage/innobase/log/log0recv.cc:4388
#13 0x000000000128bc39 in innobase_start_or_create_for_mysql () at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/storage/innobase/srv/srv0start.cc:2784
#14 0x000000000111f3f4 in innobase_init (p=<optimized out>) at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/storage/innobase/handler/ha_innodb.cc:4844
#15 0x000000000078af55 in ha_initialize_handlerton (plugin=0x35546b8) at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/sql/handler.cc:881
#16 0x0000000000dbc034 in plugin_initialize (plugin=plugin@entry=0x35546b8) at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/sql/sql_plugin.cc:1260
#17 0x0000000000dbe337 in plugin_register_builtin_and_init_core_se (argc=argc@entry=0x1d61da0 <remaining_argc>, argv=0x33c42a8) at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/sql/sql_plugin.cc:1623
#18 0x0000000000e7affb in init_server_components () at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/sql/mysqld.cc:4379
#19 0x0000000000e7c843 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/percona-server-5.7.44-52.1.x86_64/sql/mysqld.cc:5174
#20 0x00007f8ba9bcf17c in ?? () from /lib64/libc.so.6
#21 0x00007f8ba9bcf239 in __libc_start_main () from /lib64/libc.so.6
#22 0x000000000077d3c5 in _start () at ../sysdeps/x86_64/start.S:115
[15 May 10:30] Arek M
thread apply all bt

Attachment: all-threads-backtrace.txt (text/plain), 28.79 KiB.