Bug #105592 Lob's purge may suffer from nested mtr, which cause FSEG NOT FULL list broken.
Submitted: 16 Nov 2021 11:39 Modified: 10 May 2022 14:32
Reporter: biao li Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:MySQL 8.0.27, 8.0.28 OS:Linux
Assigned to: CPU Architecture:x86
Tags: lob, purge, regression, rollback

[16 Nov 2021 11:39] biao li
Description:
When we rollback a 'TRX_UNDO_UPD_DEL_REC' record contained more than than two lob fields, we may get an confict mtr:

[mtr_t: this=0x7fc2f147c630, [mtr_memo_slot_t: object=0x7fbe3c5bde00, type=32 (MTR_MEMO_MODIFY), page_id=[page id: space=2, page number=626]]
[mtr_memo_slot_t: object=0x7fc2bf52fa70, type=32 (MTR_MEMO_MODIFY), page_id=[page id: space=2, page number=2]]
[mtr_memo_slot_t: object=0x7fc2bf52f500, type=32 (MTR_MEMO_MODIFY), page_id=[page id: space=2, page number=0]]
[mtr_memo_slot_t: object=0x7fc2bf52f500, type=4 (MTR_MEMO_PAGE_SX_FIX), page_id=[page id: space=2, page number=0]]
[mtr_memo_slot_t: object=0x7fc2bf52fa70, type=4 (MTR_MEMO_PAGE_SX_FIX), page_id=[page id: space=2, page number=2]]
[mtr_memo_slot_t: object=0x7fb924271fc8, type=128 (MTR_MEMO_X_LOCK), page_id=[page id: space=4294967295, page number=4294967295]]
[mtr_memo_slot_t: object=0x7fc2bf530298, type=4 (MTR_MEMO_PAGE_SX_FIX), page_id=[page id: space=2, page number=4]]
[mtr_memo_slot_t: object=0x7fbe3c5be0b8, type=2 (MTR_MEMO_PAGE_X_FIX), page_id=[page id: space=2, page number=627]]
[mtr_memo_slot_t: object=0x7fbe3c5bde00, type=2 (MTR_MEMO_PAGE_X_FIX), page_id=[page id: space=2, page number=626]]
[mtr_memo_slot_t: object=0x7fbe3c5bde00, type=2 (MTR_MEMO_PAGE_X_FIX), page_id=[page id: space=2, page number=626]]
]
[mtr_t: this=0x7fc2f147db00, [mtr_memo_slot_t: object=0x7fbe3c5bde00, type=2 (MTR_MEMO_PAGE_X_FIX), page_id=[page id: space=2, page number=626]]
[mtr_memo_slot_t: object=0x7fc2bf530298, type=32 (MTR_MEMO_MODIFY), page_id=[page id: space=2, page number=4]]
[mtr_memo_slot_t: object=0x7fc2bf52f500, type=32 (MTR_MEMO_MODIFY), page_id=[page id: space=2, page number=0]]
[mtr_memo_slot_t: object=0x7fc2bf52fa70, type=32 (MTR_MEMO_MODIFY), page_id=[page id: space=2, page number=2]]
[mtr_memo_slot_t: object=0x7fc2bf52f500, type=4 (MTR_MEMO_PAGE_SX_FIX), page_id=[page id: space=2, page number=0]]
[mtr_memo_slot_t: object=0x7fc2bf52fa70, type=4 (MTR_MEMO_PAGE_SX_FIX), page_id=[page id: space=2, page number=2]]
[mtr_memo_slot_t: object=0x7fb924271fc8, type=128 (MTR_MEMO_X_LOCK), page_id=[page id: space=4294967295, page number=4294967295]]
[mtr_memo_slot_t: object=0x7fc2bf530298, type=4 (MTR_MEMO_PAGE_SX_FIX), page_id=[page id: space=2, page number=4]]
[mtr_memo_slot_t: object=0x7fc2a64df338, type=32 (MTR_MEMO_MODIFY), page_id=[page id: space=2, page number=127]]
[mtr_memo_slot_t: object=0x7fc2a64df338, type=2 (MTR_MEMO_PAGE_X_FIX), page_id=[page id: space=2, page number=127]]
[mtr_memo_slot_t: object=0x7fc2f6e5d1a0, type=64 (MTR_MEMO_S_LOCK), page_id=[page id: space=4294967295, page number=4294967295]]
[mtr_memo_slot_t: object=0x7fc2bf530298, type=2 (MTR_MEMO_PAGE_X_FIX), page_id=[page id: space=2, page number=4]]
[mtr_memo_slot_t: object=0x7fc2bf530298, type=8 (MTR_MEMO_BUF_FIX), page_id=[page id: space=2, page number=4]]
[mtr_memo_slot_t: object=0x7fb924271c58, type=256 (MTR_MEMO_SX_LOCK), page_id=[page id: space=4294967295, page number=4294967295]]
]

2021-11-15T15:55:37.065291+08:00 264 [ERROR] [MY-013183] [InnoDB] Assertion failure: lob0first.cc:306:!local_mtr.conflicts_with(m_mtr) thread 140475248424704
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
07:55:37 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7fb91c000da0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fc2f1482b70 thread_stack 0x100000
/home/lb/mysql-server/install/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x43) [0x47cf781]
/home/lb/mysql-server/install/bin/mysqld(handle_fatal_signal+0x279) [0x34981c6]
/lib64/libpthread.so.0(+0xf630) [0x7fc2fe679630]
/lib64/libc.so.6(gsignal+0x37) [0x7fc2fcbc3387]
/lib64/libc.so.6(abort+0x148) [0x7fc2fcbc4a78]
/home/lb/mysql-server/install/bin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x1b9) [0x4c1bc4d]
/home/lb/mysql-server/install/bin/mysqld(lob::first_page_t::free_all_data_pages()+0x225) [0x4eb39bd]
/home/lb/mysql-server/install/bin/mysqld(lob::first_page_t::make_empty()+0x18) [0x4eb423c]
/home/lb/mysql-server/install/bin/mysqld(lob::first_page_t::destroy()+0x18) [0x4eb4214]
/home/lb/mysql-server/install/bin/mysqld(lob::purge(lob::DeleteContext*, dict_index_t*, unsigned long, unsigned long, unsigned long, upd_field_t const*, purge_node_t*)+0x54f) [0x4ebe84e]
/home/lb/mysql-server/install/bin/mysqld(lob::BtrContext::free_externally_stored_fields(unsigned long, unsigned long, bool, unsigned long, purge_node_t*)+0x282) [0x49c70d0]
/home/lb/mysql-server/install/bin/mysqld(btr_cur_pessimistic_delete(dberr_t*, unsigned long, btr_cur_t*, unsigned int, bool, unsigned long, unsigned long, unsigned long, mtr_t*, btr_pcur_t*, purge_node_t*)+0x391) [0x4c993a3]
/home/lb/mysql-server/install/bin/mysqld() [0x4ee3024]
/home/lb/mysql-server/install/bin/mysqld() [0x4ee3671]
/home/lb/mysql-server/install/bin/mysqld(row_undo_mod(undo_node_t*, que_thr_t*)+0x31d) [0x4ee601a]
/home/lb/mysql-server/install/bin/mysqld() [0x4b58869]
/home/lb/mysql-server/install/bin/mysqld(row_undo_step(que_thr_t*)+0xbb) [0x4b58aa8]
/home/lb/mysql-server/install/bin/mysqld() [0x4ab7c66]
/home/lb/mysql-server/install/bin/mysqld() [0x4ab7e31]
/home/lb/mysql-server/install/bin/mysqld(que_run_threads(que_thr_t*)+0x94) [0x4ab8084]
/home/lb/mysql-server/install/bin/mysqld() [0x4beb1ee]
/home/lb/mysql-server/install/bin/mysqld() [0x4beb392]
/home/lb/mysql-server/install/bin/mysqld() [0x4beb74d]
/home/lb/mysql-server/install/bin/mysqld(trx_rollback_for_mysql(trx_t*)+0x51) [0x4bebaf0]
/home/lb/mysql-server/install/bin/mysqld() [0x48f8886]
/home/lb/mysql-server/install/bin/mysqld(ha_rollback_low(THD*, bool)+0x168) [0x362738c]
/home/lb/mysql-server/install/bin/mysqld(MYSQL_BIN_LOG::rollback(THD*, bool)+0x26a) [0x4382572]
/home/lb/mysql-server/install/bin/mysqld(ha_rollback_trans(THD*, bool)+0x13d) [0x3627653]
/home/lb/mysql-server/install/bin/mysqld(trans_rollback(THD*)+0xb4) [0x3440c3d]
/home/lb/mysql-server/install/bin/mysqld(mysql_execute_command(THD*, bool)+0x400d) [0x32b0d8c]
/home/lb/mysql-server/install/bin/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x742) [0x32b3d74]
/home/lb/mysql-server/install/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x14a3) [0x32aa4cf]
/home/lb/mysql-server/install/bin/mysqld(do_command(THD*)+0x47e) [0x32a879c]
/home/lb/mysql-server/install/bin/mysqld() [0x3483d10]
/home/lb/mysql-server/install/bin/mysqld() [0x5017161]
/lib64/libpthread.so.0(+0x7ea5) [0x7fc2fe671ea5]
/lib64/libc.so.6(clone+0x6d) [0x7fc2fcc8bb0d]

How to repeat:
Run the following testcase:

--source include/have_debug.inc
--source include/count_sessions.inc

CREATE TABLE `t1` (
  `a` int NOT NULL,
  `b` text,
  `c` text,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB;

# Prepare Data.                                                                                                                                                                                                                   
--disable_query_log
let $i= 0;
while ($i < 125)
{
  if ($i == 23) 
  {
    eval INSERT INTO `t1` VALUES($i, repeat("a",32768),repeat("b",32768));
  }

  if ($i != 23) 
  {
    eval INSERT INTO `t1` VALUES($i, repeat("a",49152),repeat("b",49152));
  }
  inc $i; 
}
--enable_query_log

connect(conn1,localhost,root,,);
connection default;
# disable purge thread to construct `update on delete mark record`
SET global innodb_purge_stop_now=ON;
DELETE FROM t1 WHERE a = 24; 

connection conn1;
BEGIN;
INSERT INTO `t1` VALUES(24, repeat("c",49152),repeat("d",49152));

# enable purge thread to update purge's view.
SET global innodb_purge_run_now=ON;
SELECT sleep(10);

# rollback for undo `TRX_UNDO_UPD_DEL_REC`
ROLLBACK;

connection default;
disconnect conn1;
--source include/count_sessions.inc 

Suggested fix:
Use local mtr in first page instead of the btr_mtr, especially the function lob::first_page_t::dealloc() to free the page.

Why:
  When a page is freed, once the full extent to which the page belongs becomes not full, and the extent will be added to the `FSEG_NOT_FULL` list.
	
  Suppose there following scenario, let's say there are two mtrs: mtr1, mtr2:

    mtr1 start
      free page A and add A's extent to `FSEG_NOT_FULL`, and the FSEG_NOT_FULL list is:
        X(next: A, pre:null) -> A(next: null, pre: X)
    
      mtr2 start
        free page B and add B's extent to `FSEG_NOT_FULL`, and the FSEG_NOT_FULL list is:
          X(next:A, prev:null) -> A(next:B,prev:X) -> B(next:null,prev:A)
      mtr2 commit
	  
    mtr1 commit.
  
  mtr2 commit before mtr1, but mtr2's operation relies mtr1's result. Once server crashs after mtr1 commits and restart the server, MySQL will firstly replay mtr2's redo log and then replay mtr1's redo log. Finnay the `FSEG_NOT_FULL` list will be:
  
    X(next:A, prev:null) -> A(next:null,prev:X) -> B(next:null,prev:A).
  
  Node A's next node is null, but Node B's previous node is A, the double linked list is broken.

  When user rolls back a 'TRX_UNDO_UPD_DEL_REC' record contained more than than two lob fields, it will invoke `lob::BtrContext::free_externally_stored_fields`(Note: MySQL pass `rollback = false` and `purge_node = nullptr` to it). In the function, if there are two lob cols, the mtrs we used is as following:
   col 1:
      lob::purge() -> first_page_t::destroy()
         lob::first_page_t::free_all_data_pages()     // local_mtr1
         lob::first_page_t::free_all_index_pages()    // local_mtr2
         lob::first_page_t::dealloc();                // btr_mtr
		  
   col 2:
      lob::purge() -> first_page_t::destroy()
         lob::first_page_t::free_all_data_pages()     // local_mtr3
         lob::first_page_t::free_all_index_pages()    // local_mtr4
         lob::first_page_t::dealloc();                // btr_mtr

  btr_mtr is similar to the previous mtr1, `local_mtr3` and `local_mtr4` is similar to the previous mtr2, the nested mtr may result in an invalid `FSEG_NOT_FULL` list.
	
  However, in MySQL 8.0.27, commit 415c4aec03292f097f4ba9817be69828f93abe8a add some asserts, so we will get an assert crash.
[16 Nov 2021 11:39] biao li
Before 8.0.26, purge also suffers from this nested mtr problem, however, the commit 5ba03cadf63b6c53a780ffd887d903eae6aac4ee(in 8.0.26) happens to fix the issue for purge, however, for the rollback case, there is still the problem.

key code:
  if (rec_type == TRX_UNDO_DEL_MARK_REC || ok_to_free) {
    ut_ad(btr_first.get_page_type() == FIL_PAGE_TYPE_LOB_FIRST);

    if (dict_index_is_online_ddl(index)) {
      row_log_table_blob_free(index, ref.page_no());
    }
    /* use the btr_mtr to free lob's first page, if we continue to free another lob field, nested mtr occurs.*/
    if (purge_node == nullptr) {
      btr_first.destroy();
    } else {
      /* In this case, the LOB is left with only the first page.  Subsequently
      the LOB first page number in the LOB reference is set to FIL_NULL.
      This means that the LOB page is only accessible via an in-memory
      reference held in the purge node. If a crash happens after the btr_mtr
      commit and before freeing the LOB first page, then the LOB first page
      will be leaked. We need to come up with a mechanism to avoid this leak.*/
      btr_first.make_empty();
      purge_node->add_lob_page(index, page_id);
    }

    ref.set_page_no(FIL_NULL, mtr);
    ref.set_length(0, mtr);

    return;
  }
[16 Nov 2021 12:34] MySQL Verification Team
Hello biao li,

Thank you for the report and test case.
Verified as described.

regards,
Umesh
[21 Jan 2022 3:52] Rahul Sisondia
Since there are two columns involved, I think the analysis added by biao is simplified version of the following. Correct me if I am wrong. 

mtr1 start
    
      mtr2 start
        free page A and add A's extent to `FSEG_NOT_FULL`, and the FSEG_NOT_FULL list is:
          X(next: A, pre:null) -> A(next: null, pre: X) 
      mtr2 commit
      
      free page B and add B's extent to `FSEG_NOT_FULL`, and the FSEG_NOT_FULL list is:
        X(next:A, prev:null) -> A(next:B,prev:X) -> B(next:null,prev:A)
            
      mtr3 start
        free page C and add C's extent to `FSEG_NOT_FULL`, and the FSEG_NOT_FULL list is:
          X(next:A, prev:null) -> A(next:B,prev:X) -> B(next:null,prev:A) -> C(next:null,prev:B) 
      mtr3 commit

     free page D and add D's extent to `FSEG_NOT_FULL`, and the FSEG_NOT_FULL list is:
        X(next:A, prev:null) -> A(next:B,prev:X) -> B(next:null,prev:A) -> C(next:null,prev:B)->D(... 
           	  
mtr1 commit.
    >>>>Crashed
    
replay mtr2
  X(next: A, pre:null) -> A(next: null, pre: X) 
replay mtr3          
  X(next:A, prev:null) -> A(next:null, prev:X) -> C(next:null,prev:B)    <---Broken

mtr1 : btree mtr
mtr2 : local mtr for column `b` in the table t1 
mtr3 : local mtr for column `c` in the table t2
[21 Jan 2022 5:27] Rahul Sisondia
I also think this might be related to https://bugs.mysql.com/bug.php?id=105176
[21 Jan 2022 7:05] Rahul Sisondia
Just to complete the previous comment. And what would happen when the mtr1 will be replayed. 

replay mtr2
  X(next: A, pre:null) -> A(next: null, pre: X) 
replay mtr3          
  X(next:A, prev:null) -> A(next:null, prev:X) -> C(next:null,prev:B)    <---Broken        
replay mtr1
   X(next:A, prev:null) ->A(next:B,prev:X) -> B(next:null,prev:A) -> C(next:null,prev:B)   < --- Broken
   <not shown for D>
[25 Jan 2022 3:09] Rahul Sisondia
Proposed patch :

diff --git a/storage/innobase/lob/lob0purge.cc b/storage/innobase/lob/lob0purge.cc
index 584fe362a18..8456eaf3438 100644
--- a/storage/innobase/lob/lob0purge.cc
+++ b/storage/innobase/lob/lob0purge.cc
@@ -509,7 +509,7 @@ void purge(DeleteContext *ctx, dict_index_t *index, trx_id_t trxid,
   do not make conflicting modifications. */
   ut_ad(!lob_mtr.conflicts_with(mtr));
   mtr_commit(&lob_mtr);
-  first.set_mtr(ctx->get_mtr());
+  mtr_start(&lob_mtr);
   first.load_x(page_id, page_size);

   bool ok_to_free = (rec_type == TRX_UNDO_UPD_EXIST_REC ||
@@ -525,9 +525,12 @@ void purge(DeleteContext *ctx, dict_index_t *index, trx_id_t trxid,
     }
     first.destroy();
   }
-
-  ref.set_page_no(FIL_NULL, mtr);
-  ref.set_length(0, mtr);
+  page_id = ctx->block()->get_page_id();
+  ut_a(buf_page_get(page_id, page_size, RW_X_LATCH, &lob_mtr) == ctx->block());
+  ref.set_page_no(FIL_NULL, &lob_mtr);
+  ref.set_length(0, &lob_mtr);
+  ut_ad(!lob_mtr.conflicts_with(mtr));
+  mtr_commit(&lob_mtr);
 }

 } /* namespace lob */
[18 Feb 2022 6:52] MySQL Verification Team
8.0.28 mtr test results

Attachment: 105592_8.0.28_debug.results (application/octet-stream, text), 100.87 KiB.

[22 Feb 2022 2:37] Annamalai Gurusami
Posted by developer:
 
Yes, I am able to reproduce the problem.  I'll analysis the problem and will report back.
[10 May 2022 14:32] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 8.0.30 release:

Purging a record with multiple binary large object values raised an
insertion failure due to a mini-transaction (mtr) conflict.