Bug #106604 Contribution by Tencent Cloud-Native Database team: Purge suspend forever
Submitted: 1 Mar 2022 3:43 Modified: 1 Mar 2022 8:01
Reporter: yewei Xu (OCA)
Status: Verified
Category: MySQL Server: InnoDB storage engine
Severity: S5 (Performance)
Version: 5.7.37
OS: Linux
Assigned to:
CPU Architecture: Any
Tags: Contribution, purge

[1 Mar 2022 3:43] yewei Xu
Description:
We found that in some situations purge suspends indefinitely and resumes only when a new update arrives.

srv_purge_coordinator_suspend:

                if (ret == OS_SYNC_TIME_EXCEEDED) {

                        /* No new records added since wait started then simply
                        wait for new records. The magic number 5000 is an
                        approximation for the case where we have cached UNDO
                        log records which prevent truncate of the UNDO
                        segments. */

                        if (rseg_history_len == trx_sys->rseg_history_len
                            && trx_sys->rseg_history_len < 5000) {

                                stop = true;
                        }
                }

According to the comment above, cached UNDO logs are not removed from the rseg history list, and 5000 is an approximation of the number of cached UNDO logs in that list.

This was true before MySQL 5.7.17, but the fix for Bug #24450908 (UNDO LOG EXISTS AFTER SLOW SHUTDOWN) removed cached UNDO from the rseg history list in MySQL 5.7.17, so this code and its comment are out of date.

If execution reaches this code and stop is assigned true, the purge thread suspends indefinitely, even if the history list is not empty, until a new update arrives.
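
To make the stall condition concrete, the timeout branch quoted above can be modelled as a standalone predicate. This is only a sketch: should_stop_on_timeout and kCachedUndoApprox are hypothetical names, not InnoDB symbols, and the latching and wake-up handling of the real function are left out.

```cpp
#include <cstdint>

// Hypothetical stand-in for the magic number 5000 in the quoted code.
constexpr std::uint64_t kCachedUndoApprox = 5000;

// Models only the timeout branch of srv_purge_coordinator_suspend as quoted
// in this report (an illustrative simplification, not the InnoDB source).
//   len_at_wait_start: history list length when the coordinator began waiting
//   len_now:           stands in for trx_sys->rseg_history_len after the timeout
// Returns true when the quoted code would set stop = true.
constexpr bool should_stop_on_timeout(std::uint64_t len_at_wait_start,
                                      std::uint64_t len_now) {
    return len_at_wait_start == len_now && len_now < kCachedUndoApprox;
}
```

Under this model, should_stop_on_timeout(100, 100) is true: the history list is not empty, yet the coordinator stops and nothing but a new update will wake it.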

How to repeat:
Run the mtr test bugfix_purge_suspend_forever.test included in repeat.patch.

Suggested fix:
Only set stop to true when trx_sys->rseg_history_len == 0.

--- a/storage/innobase/srv/srv0srv.cc
+++ b/storage/innobase/srv/srv0srv.cc
@@ -2733,17 +2733,11 @@ srv_purge_coordinator_suspend(
                rw_lock_x_unlock(&purge_sys->latch);
 
                if (ret == OS_SYNC_TIME_EXCEEDED) {
-
-                       /* No new records added since wait started then simply
-                       wait for new records. The magic number 5000 is an
-                       approximation for the case where we have cached UNDO
-                       log records which prevent truncate of the UNDO
-                       segments. */
-
-                       if (rseg_history_len == trx_sys->rseg_history_len
-                           && trx_sys->rseg_history_len < 5000) {
-
+                       if (trx_sys->rseg_history_len == 0) {
                                stop = true;
+                       } else {
+                               os_event_wait_time_low(
+                               slot->event, 100 * SRV_PURGE_MAX_TIMEOUT, sig_count);
                        }
                }
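
The decision made by the patched branch can be sketched in isolation as follows. The names TimeoutAction, patched_timeout_action and patched_extra_wait are hypothetical and exist only for this illustration; the value and unit of SRV_PURGE_MAX_TIMEOUT are not stated in this report, so it is passed in as a parameter.

```cpp
#include <cstdint>

// What the coordinator does once a timed wait has expired, under the patch.
enum class TimeoutAction { Stop, WaitAgain };

// len_now stands in for trx_sys->rseg_history_len.
// Stop only when there is nothing left to purge; otherwise wait again.
constexpr TimeoutAction patched_timeout_action(std::uint64_t len_now) {
    return len_now == 0 ? TimeoutAction::Stop : TimeoutAction::WaitAgain;
}

// Length of the extra wait in the patch's else branch, expressed in the same
// unit as max_timeout (a stand-in for SRV_PURGE_MAX_TIMEOUT).
constexpr std::uint64_t patched_extra_wait(std::uint64_t max_timeout) {
    return 100 * max_timeout;
}
```

With an unchanged history list of length 100, patched_timeout_action returns WaitAgain, whereas the pre-patch condition (unchanged length below 5000) would have set stop to true.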
[1 Mar 2022 3:44] yewei Xu
repeat.patch

Attachment: 0001-repeat-purge-suspend-forever.patch (application/octet-stream, text), 4.02 KiB.

[1 Mar 2022 3:44] yewei Xu
fix.patch

Attachment: fix.patch (application/octet-stream, text), 848 bytes.

[1 Mar 2022 8:01] MySQL Verification Team
Hello Yewei Xu,

Thank you for the report and contribution.

regards,
Umesh