Bug #101860 Deadlock happened under heavy write and delete workload
Submitted: 3 Dec 2020 18:15    Modified: 8 Feb 13:51
Reporter: chen zongzhi (OCA)
Status: Unsupported
Category: MySQL Server: InnoDB storage engine    Severity: S3 (Non-critical)
Version: 5.6/5.7    OS: Any
Assigned to:    CPU Architecture: Any

[3 Dec 2020 18:15] chen zongzhi
Description:
Deadlock under a heavy write workload.

This is the stack after collapsing it with pt-pmp.

You can see from the stack that:
1. There are many threads waiting in trx_start_low(); they are waiting for trx_sys.mutex. Who holds this mutex?

pthread_cond_wait,os_event_wait_low,sync_array_wait_event,bool,,trx_start_low,row_search_for_mysql,ha_innobase::index_read,handler::ha_index_read_map,handler::read_range_first,handler::multi_range_read_next,QUICK_RANGE_SELECT::get_next,rr_quick,mysql_delete,mysql_execute_command,mysql_parse,dispatch_command,do_handle_one_connection,handle_one_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)

2. It is thread 44 that holds that mutex. It is flushing max_trx_id and committing the mtr, and it is waiting for log_sys.mutex. Who holds log_sys.mutex?

pthread_cond_wait,os_event_wait_low,sync_array_wait_event,bool,,mtr_commit,trx_sys_flush_max_trx_id,trx_start_low,row_search_for_mysql,ha_innobase::index_read,handler::ha_index_read_map,handler::read_range_first,handler::multi_range_read_next,QUICK_RANGE_SELECT::get_next,rr_quick,mysql_delete,mysql_execute_command,mysql_parse,dispatch_command,do_handle_one_connection,handle_one_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)

3. It is thread 74 that holds log_sys.mutex. After committing the mtr and closing the redo log, it needs to check whether a checkpoint should be made, which requires the buffer pool flush_list.mutex, so it is waiting for the flush_list.mutex. Who holds the flush_list.mutex?
pthread_cond_wait,os_event_wait_low,sync_array_wait_event,bool,,mtr_commit,trx_sys_flush_max_trx_id,trx_start_low,row_search_for_mysql,ha_innobase::index_read,handler::ha_index_read_map,handler::read_range_first,handler::multi_range_read_next,QUICK_RANGE_SELECT::get_next,rr_quick,mysql_delete,mysql_execute_command,mysql_parse,dispatch_command,do_handle_one_connection,handle_one_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)

4. It is thread 81 that holds the flush_list.mutex. When the page cleaner thread needs to flush a page, it holds the corresponding flush_list.mutex. However, it must also ensure that the redo log has been flushed to disk before it can flush the corresponding dirty page. In this case the redo log has not yet been flushed to disk, so it has to wait for that. So this thread holds the flush_list.mutex and waits for log_sys.mutex.

Then a deadlock happens: thread 74 holds log_sys.mutex and waits for flush_list.mutex, while thread 81 holds flush_list.mutex and waits for log_sys.mutex.
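
For illustration only (this is not InnoDB code; the names below are invented for the sketch): the cycle above is a classic lock-order inversion, one thread taking the "log mutex" and then needing the "flush list mutex", while the other holds the "flush list mutex" and then waits for the "log mutex". A minimal, self-contained C++ model of the two orderings, using timed try-locks so it reports the inversion instead of hanging, looks like this:

// Self-contained model of the cycle described above (not InnoDB code).
// Thread A mimics the mtr commit path: hold "log_sys.mutex", then ask
// for "flush_list.mutex" to check the checkpoint age.
// Thread B mimics the page cleaner: hold "flush_list.mutex", then wait
// for the log.
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>

using namespace std::chrono_literals;

static std::timed_mutex log_sys_mutex;     // stands in for log_sys.mutex
static std::timed_mutex flush_list_mutex;  // stands in for flush_list.mutex

static void mtr_commit_path() {            // order: log_sys -> flush_list
    std::lock_guard<std::timed_mutex> log_lock(log_sys_mutex);
    std::this_thread::sleep_for(100ms);    // let the other thread lock first
    if (!flush_list_mutex.try_lock_for(1s)) {
        std::puts("mtr commit: holds log_sys.mutex, stuck on flush_list.mutex");
        return;
    }
    flush_list_mutex.unlock();
}

static void page_cleaner_path() {          // order: flush_list -> log_sys
    std::lock_guard<std::timed_mutex> flush_lock(flush_list_mutex);
    std::this_thread::sleep_for(100ms);
    if (!log_sys_mutex.try_lock_for(1s)) {
        std::puts("page cleaner: holds flush_list.mutex, stuck on log_sys.mutex");
        return;
    }
    log_sys_mutex.unlock();
}

int main() {
    std::thread a(mtr_commit_path);
    std::thread b(page_cleaner_path);
    a.join();
    b.join();
    return 0;
}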

How to repeat:
Test under a heavy write and delete workload.

Suggested fix:
Avoid calling buf_pool_get_oldest_modification() when the mtr commits, in the log_close() path.
I find that MySQL 8.0 does not call this function there.

The log checkpointing can decide whether to make a checkpoint by itself.
[3 Dec 2020 18:16] chen zongzhi
the original pstack

Attachment: pstack.txt (text/plain), 150.16 KiB.

[3 Dec 2020 18:17] chen zongzhi
pt-pmp pstack

Attachment: pt-pstack.txt (text/plain), 10.55 KiB.

[4 Dec 2020 13:13] MySQL Verification Team
Hi Mr. zongzhi,

Thank you for your bug report.

I have analysed the stacktraces carefully and it seems to me that you are correct. However, I am not sure whether this bug can be fixed in versions that are older than 8.0.

Can you please let me know whether you succeeded in observing this deadlock in 8.0 at all?

Many thanks in advance.
[6 Dec 2020 18:44] chen zongzhi
No, this deadlock won't happen in MySQL 8.0, since mtr_commit() does not call buf_pool_get_oldest_modification().

I think we can move this logic outside the scope of log_sys.mutex to avoid the deadlock in MySQL 5.6/5.7.

This is the code that I suggest:

diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc
index bcfba19d5b9..23110d646c7 100644
--- a/storage/innobase/mtr/mtr0mtr.cc
+++ b/storage/innobase/mtr/mtr0mtr.cc
@@ -963,11 +963,31 @@ mtr_t::Command::execute()
 		log_flush_order_mutex_enter();
 	}
 
+	/* Snapshot the values needed for the checkpoint check while
+	log_sys.mutex is still held. */
+	lsn_t	lsn = log->lsn;
+	lsn_t	checkpoint_age = lsn - log->last_checkpoint_lsn;
+	lsn_t	max_modified_age_sync = log->max_modified_age_sync;
+	lsn_t	max_checkpoint_age_async = log->max_checkpoint_age_async;
+
 	/* It is now safe to release the log mutex because the
 	flush_order mutex will ensure that we are the first one
 	to insert into the flush list. */
 	log_mutex_exit();
 
+	/* Now that log_sys.mutex has been released, check whether a
+	flush or checkpoint is needed, so that flush_list.mutex is
+	never requested while log_sys.mutex is held. */
+	lsn_t	oldest_lsn = buf_pool_get_oldest_modification();
+
+	if (!oldest_lsn
+	    || lsn - oldest_lsn > max_modified_age_sync
+	    || checkpoint_age > max_checkpoint_age_async) {
+
+		/* check_flush_or_checkpoint would need to become atomic */
+		log->check_flush_or_checkpoint = true;
+	}
+
 	m_impl->m_mtr->m_commit_lsn = m_end_lsn;
 
 	release_blocks();
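
To be clear about this sketch: lsn, checkpoint_age (computed as lsn - last_checkpoint_lsn, the same way log_close() computes it) and the two age limits are read while log_sys.mutex is still held; only buf_pool_get_oldest_modification() and the comparison run after the mutex is released, so flush_list.mutex is never requested under log_sys.mutex. The cost is that check_flush_or_checkpoint may be set from slightly stale values, which the next mtr commit or the master thread can correct, and, as the comment says, check_flush_or_checkpoint would need to become atomic. This is only a sketch, not a tested patch.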
[8 Dec 2020 12:52] MySQL Verification Team
Hi Mr. zongzhi,

Our Development department has taken a look at the code and concluded that this is not a problem with versions older than 8.0, since that parallelism was not present in the previous versions.

Not a bug.
[9 Dec 2020 7:36] chen zongzhi
Can you ask the Development department to check the analysis I published in this issue?
There is indeed a deadlock in our case.
[9 Dec 2020 14:00] MySQL Verification Team
Hi Mr. zongzhi,

Our Development department has analysed the entire report and did not find a problem with the locks that you are writing about.
[14 Dec 2020 0:15] chen zongzhi
Hello, I am still confused about the stacks.
Can your Development department explain the following, which is visible in the stacks?

A deadlock happened: thread 74 holds log_sys.mutex and waits for flush_list.mutex, and thread 81 holds flush_list.mutex and waits for log_sys.mutex.

This really is a deadlock.
[14 Dec 2020 13:59] MySQL Verification Team
Hi,

There is no parallelism in that code in versions 5.6 and 5.7.

Hence, what we would like to see is a fully repeatable test case that leads to the deadlock that you describe.
[6 Feb 18:39] chen zongzhi
It's my mistake.
The user is running a Percona build of MySQL; its version number is 5.6.27-75.0.
In the upstream version, flushing a page does not hold the flush list mutex.
[8 Feb 13:51] MySQL Verification Team
Hi,

Thank you for your feedback.

Since it cannot be repeated with our server, we are setting the status to the correct value.