MySQL Bugs: #106604: Contribution by Tencent Cloud-Native Database team: Purge suspend forever

Bug #106604	Contribution by Tencent Cloud-Native Database team: Purge suspend forever
Submitted:	1 Mar 2022 3:43	Modified:	1 Mar 2022 8:01
Reporter:	yewei Xu (OCA)
Status:	Verified
Category:	MySQL Server: InnoDB storage engine	Severity:	S5 (Performance)
Version:	5.7.37	OS:	Linux
Assigned to:		CPU Architecture:	Any
Tags:	Contribution, purge

Description:
We found that in some situations the purge coordinator suspends indefinitely and does not resume until a new update arrives.

srv_purge_coordinator_suspend:

                if (ret == OS_SYNC_TIME_EXCEEDED) {

                        /* No new records added since wait started then simply
                        wait for new records. The magic number 5000 is an
                        approximation for the case where we have cached UNDO
                        log records which prevent truncate of the UNDO
                        segments. */

                        if (rseg_history_len == trx_sys->rseg_history_len
                            && trx_sys->rseg_history_len < 5000) {

                                stop = true;
                        }
                }

From the comment above, we know that cached UNDO logs were not removed from the rseg history list, and that 5000 is an approximation of the number of cached UNDO logs in that list.

This was true before MySQL 5.7.17, but the fix for Bug #24450908 (UNDO LOG EXISTS AFTER SLOW SHUTDOWN) removed cached UNDO logs from the rseg history list in MySQL 5.7.17, so this code and its comment are out of date.

If this branch is taken and stop is set to true, the purge thread suspends until a new update arrives, even though undo records may still be waiting in the history list.
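To make the stall condition concrete, here is a minimal stand-alone model of the check quoted above. It is only a sketch: the function name stops_before_patch and the sample history lengths (1200, 6000) are hypothetical and chosen for illustration, and nothing here is taken from the InnoDB sources beyond the comparison itself.

```cpp
#include <cstdint>

// Stand-alone model of the pre-patch timeout check in
// srv_purge_coordinator_suspend(). The function name is hypothetical;
// only the comparison mirrors the code quoted in the report.
constexpr bool stops_before_patch(std::uint64_t len_at_wait_start,
                                  std::uint64_t len_now) {
    // "No new records since the wait started" and "fewer than 5000 entries".
    return len_at_wait_start == len_now && len_now < 5000;
}

// Illustrative scenario: 1200 entries are pending and no writes arrive
// during the wait. The check still tells the coordinator to stop.
static_assert(stops_before_patch(1200, 1200),
              "a non-empty but unchanged history list triggers stop");

// A backlog at or above 5000 keeps the coordinator running.
static_assert(!stops_before_patch(6000, 6000),
              "a large backlog does not trigger stop");
```

In this model any unchanged history length below 5000 yields stop, which matches the behaviour described above when the list is non-empty and no further updates come in.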

How to repeat:
Run the MTR test bugfix_purge_suspend_forever.test from repeat.patch.

Suggested fix:
Only set stop to true when trx_sys->rseg_history_len == 0.

--- a/storage/innobase/srv/srv0srv.cc
+++ b/storage/innobase/srv/srv0srv.cc
@@ -2733,17 +2733,11 @@ srv_purge_coordinator_suspend(
                rw_lock_x_unlock(&purge_sys->latch);

                if (ret == OS_SYNC_TIME_EXCEEDED) {
-
-                       /* No new records added since wait started then simply
-                       wait for new records. The magic number 5000 is an
-                       approximation for the case where we have cached UNDO
-                       log records which prevent truncate of the UNDO
-                       segments. */
-
-                       if (rseg_history_len == trx_sys->rseg_history_len
-                           && trx_sys->rseg_history_len < 5000) {
-
+                       if (trx_sys->rseg_history_len == 0) {
                                stop = true;
+                       } else {
+                               os_event_wait_time_low(
+                               slot->event, 100 * SRV_PURGE_MAX_TIMEOUT, sig_count);
                        }
                }
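
As a rough illustration of what the patched branch decides on a timeout, the sketch below models it as a pure function. The names TimeoutDecision, on_timeout_patched and kPurgeMaxTimeoutUs are hypothetical, and the value 10000 (microseconds) for SRV_PURGE_MAX_TIMEOUT is an assumption that should be checked against srv0srv.cc; the real code calls os_event_wait_time_low() rather than returning a value.

```cpp
#include <cstdint>

// ASSUMPTION: stands in for SRV_PURGE_MAX_TIMEOUT, taken here to be
// 10000 microseconds. Verify against srv0srv.cc before relying on it.
constexpr std::uint64_t kPurgeMaxTimeoutUs = 10000;

// Hypothetical result type: what the coordinator does after a timed-out wait.
struct TimeoutDecision {
    bool stop;                    // true: suspend until new work is signalled
    std::uint64_t extra_wait_us;  // otherwise: length of the additional timed wait
};

// Model of the patched branch: stop only when the history list is empty,
// otherwise wait again with a timeout 100 times longer.
constexpr TimeoutDecision on_timeout_patched(std::uint64_t history_len) {
    return history_len == 0
               ? TimeoutDecision{true, 0}
               : TimeoutDecision{false, 100 * kPurgeMaxTimeoutUs};
}

// Empty history list: the coordinator may suspend.
static_assert(on_timeout_patched(0).stop, "empty list suspends");
// Non-empty list (1200 is an illustrative value): keep going after a wait.
static_assert(!on_timeout_patched(1200).stop, "backlog does not suspend");
```

Under the assumed timeout value the additional wait comes to about one second, after which the coordinator loop gets another chance to purge the remaining history.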

repeat.patch

Attachment: 0001-repeat-purge-suspend-forever.patch (application/octet-stream, text), 4.02 KiB.

fix.patch

Attachment: fix.patch (application/octet-stream, text), 848 bytes.

Hello Yewei Xu,

Thank you for the report and contribution.

regards,
Umesh