Bug #120698 Contribution by Tencent: Compressed pages' records may be out of order when using the change buffer
Submitted: 16 Jun 3:17 Modified: 16 Jun 9:02
Reporter: Adria Lee
Status: Open
Category: MySQL Server: InnoDB storage engine
Severity: S1 (Critical)
Version: 8.0.44
OS: Any
CPU Architecture: Any

[16 Jun 3:17] Adria Lee
Description:
In production, we observed that the last record of a page was greater than the first record of its next (right sibling) page.

DB_AdIndex_97/Tbl_AdIndex_0 is a compressed table:

```
2026-06-15T06:47:25.330933+08:00 0 [ERROR] InnoDB: btr_check_sibling_boundary: 
last record on left page >= first record on right page! index 
`FUId_FCreativeTemplateId` table DB_AdIndex_97/Tbl_AdIndex_0 left_page_no 14802482 
right_page_no 14802487
2026-06-15T06:47:25.330960+08:00 0 [ERROR] InnoDB: left page last record:
PHYSICAL RECORD: n_fields 6; compact format; info bits 0 
 0: len 8; hex 0000000004fb21f9; asc       ! ;;
 1: len 4; hex 000002d1; asc     ;;     
 2: len 8; hex 80000018e4c63d8d; asc       = ;;
 3: len 8; hex 800000191edf46c3; asc       F ;;
 4: len 8; hex 800000191edfc5b0; asc         ;;
 5: len 8; hex 800000191edfc15e; asc        ^;;
2026-06-15T06:47:25.331085+08:00 0 [Note] InnoDB: n_owned: 0; heap_no: 166; next rec: 112
2026-06-15T06:47:25.331089+08:00 0 [ERROR] InnoDB: right page first record:
PHYSICAL RECORD: n_fields 6; compact format; info bits 0 
 0: len 8; hex 0000000004fb21f9; asc       ! ;;
 1: len 4; hex 000002d1; asc     ;;     
 2: len 8; hex 80000018e4c63d8d; asc       = ;;
 3: len 8; hex 8000000000000000; asc         ;;
 4: len 8; hex 8000000000000000; asc         ;;
 5: len 8; hex 800000191edfcfb8; asc         ;;
2026-06-15T06:47:25.331207+08:00 0 [Note] InnoDB: n_owned: 0; heap_no: 2; next rec: 174
```
Note: we added this check (`btr_check_sibling_boundary`) ourselves to verify the order of records across adjacent pages.
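A sibling-boundary check of this kind can be sketched as follows. This is a simplified, hypothetical model, not the reporter's `btr_check_sibling_boundary` (whose code is not shown): each record is a list of fields, and fields are compared as big-endian byte strings, which matches numeric order for InnoDB's integer encodings (unsigned values stored big-endian, signed values with the sign bit flipped). Real InnoDB comparisons also handle collations, NULLs and variable-length types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified record model: one big-endian byte string per field.
using Field = std::vector<uint8_t>;
using Record = std::vector<Field>;

// Parse a hex string such as "80000018e4c63d8d" into bytes.
Field from_hex(const std::string& hex) {
  Field out;
  for (size_t i = 0; i + 1 < hex.size(); i += 2) {
    out.push_back(static_cast<uint8_t>(std::stoi(hex.substr(i, 2), nullptr, 16)));
  }
  return out;
}

// Compare field by field; inside a field, bytes compare lexicographically.
// Returns a negative value, zero, or a positive value.
int compare_records(const Record& a, const Record& b) {
  const size_t n = a.size() < b.size() ? a.size() : b.size();
  for (size_t i = 0; i < n; ++i) {
    if (a[i] < b[i]) return -1;
    if (b[i] < a[i]) return 1;
  }
  if (a.size() < b.size()) return -1;
  if (b.size() < a.size()) return 1;
  return 0;
}

// Sibling invariant: last record on the left page < first record on the right page.
bool sibling_boundary_ok(const Record& left_last, const Record& right_first) {
  return compare_records(left_last, right_first) < 0;
}
```

Fed the two records from the log, fields 0 to 2 are equal and the comparison first differs at field 3 (`800000191edf46c3` on the left versus `8000000000000000` on the right), so the check fails, matching the error above.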

The `zip_page_handler` function assumes that if `access_time != 0`, the change buffer merge can be skipped. This rests on the belief that `access_time != 0` and `IBUF_BITMAP_BUFFERED != 0` never hold at the same time; in reality, both can hold together.
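The gate described above can be modelled in a few lines. The struct and function names are hypothetical; the sketch only captures the decision the report attributes to `zip_page_handler`, assuming that once the merge is allowed to run, it consults the bitmap itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the state consulted when a compressed page is decompressed.
struct ZipPageState {
  uint32_t access_time;  // treated as "0 = not yet accessed in this incarnation"
  bool ibuf_buffered;    // mirrors the page's IBUF_BITMAP_BUFFERED bit
};

// Gate as described in the report: a non-zero access_time means "skip the merge".
bool merge_runs_flawed(const ZipPageState& s) {
  return s.access_time == 0;
}

// A pending ibuf entry is left unapplied when the bitmap says "buffered"
// but the access_time gate decides to skip the merge.
bool merge_wrongly_skipped(const ZipPageState& s) {
  return s.ibuf_buffered && !merge_runs_flawed(s);
}
```

The state with a non-zero `access_time` and the buffered bit set is exactly the combination the code assumes cannot occur.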

When LRU eviction discards the uncompressed frame of a compressed page, the page briefly disappears from `page_hash`. During this window, another thread uses change buffering to write an `ibuf` entry for the page. The "compressed-only" descriptor is then re-linked with a non-zero `access_time`, so a later decompression skips the `ibuf` merge because `access_time != 0`. The pending `ibuf` entry is therefore not applied until after the page boundary has shifted, which ultimately violates the record order across pages.

The chronological order of events is:

  1. Thread B (eviction) enters buf_LRU_block_remove_hashed holding LRU_list_mutex +
     hash_lock.
  2. Thread B removes the page via HASH_DELETE.
  3. Thread B calls rw_lock_x_unlock(hash_lock) ← window opens 
     (still holding LRU_list_mutex).
  4. Thread A (DML) at this point goes through btr0cur and calls buf_page_get_gen
     (BUF_GET_IF_IN_POOL); holding only hash_lock, it finds nothing.
  5. Thread A enters ibuf_insert → buf_page_get_also_watch: holding only hash_lock, 
     finds nothing → passes.
  6. Thread A enters ibuf_insert_low → buf_page_peek: holding only hash_lock, 
     finds nothing → passes.
  7. Thread A sets BUFFERED=1 and writes the entry into the ibuf.
  8. Thread B re-acquires hash_lock in buf_LRU_free_page and HASH_INSERTs b (the
     compressed descriptor carrying the stale access_time) back into the page hash.
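The interleaving can be replayed deterministically in a single thread, which makes the skipped merge easy to see. This is a toy model under stated assumptions: `page_hash`, the ibuf bitmap and the ibuf itself are plain maps, locks are not modelled (the call order stands in for the interleaving), and all names are hypothetical rather than InnoDB's.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Toy descriptor: only the fields this race depends on (hypothetical names).
struct Desc {
  uint32_t access_time;         // non-zero once the page has been accessed
  bool has_uncompressed_frame;  // false for a compressed-only descriptor
};

struct Sim {
  std::unordered_map<uint32_t, Desc> page_hash;    // page_no -> descriptor
  std::unordered_map<uint32_t, bool> ibuf_bitmap;  // page_no -> BUFFERED bit
  std::unordered_map<uint32_t, int> ibuf_entries;  // page_no -> pending entries

  // Steps 4-7 (Thread A): buffer the change only if the page is not found.
  bool dml_try_buffer(uint32_t page_no) {
    if (page_hash.count(page_no) != 0) return false;  // resident: no buffering
    ibuf_bitmap[page_no] = true;
    ibuf_entries[page_no] += 1;
    return true;
  }

  // Later access: decompress and apply the access_time gate from the report.
  void decompress(uint32_t page_no) {
    Desc& d = page_hash.at(page_no);
    const bool skip_merge = d.access_time != 0;
    if (!skip_merge && ibuf_bitmap[page_no]) {
      ibuf_entries[page_no] = 0;  // merge applies every pending entry
      ibuf_bitmap[page_no] = false;
    }
    d.has_uncompressed_frame = true;
  }
};

// Replays steps 1-8 plus one later decompression; returns entries still pending.
int replay_race(uint32_t page_no, bool reset_access_time_on_free) {
  Sim s;
  s.page_hash[page_no] = Desc{1000, true};  // page was accessed earlier
  // Steps 1-3 (Thread B): remove the page from page_hash; the window opens.
  Desc b = s.page_hash.at(page_no);
  s.page_hash.erase(page_no);
  // Steps 4-7 (Thread A): the page is not found, so the change is buffered.
  s.dml_try_buffer(page_no);
  // Step 8 (Thread B): re-insert the compressed-only descriptor.
  b.has_uncompressed_frame = false;
  if (reset_access_time_on_free) b.access_time = 0;  // reporter's suggestion 1
  s.page_hash[page_no] = b;
  // Later access: decide whether the pending entry gets merged.
  s.decompress(page_no);
  return s.ibuf_entries[page_no];
}
```

With `reset_access_time_on_free = false`, the stale `access_time` survives step 8 and the entry stays pending; passing `true` models the reporter's first suggested fix, and the entry is merged on the next decompression.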

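Afterwards, the page may be split. Once that happens, the page (by page_id) that the ibuf entry was buffered against may no longer cover the entry's key range, so applying the entry there later breaks the order between siblings.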
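Why a deferred merge breaks sibling order once the page has split can be shown with integer keys. This hypothetical sketch assumes a leaf page is a sorted vector, a split moves the upper half to a new right sibling, and the late merge inserts into the page the entry was buffered against without re-checking the sibling's key range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a leaf page is a sorted vector of integer keys.
using Page = std::vector<int>;

// Split: move the upper half of `left` into a new right sibling.
Page split(Page& left) {
  const auto mid = left.begin() + static_cast<std::ptrdiff_t>(left.size() / 2);
  Page right(mid, left.end());
  left.erase(mid, left.end());
  return right;
}

// Late merge: insert into the page the entry was buffered against, keeping
// that page sorted but never checking the right sibling's key range.
void apply_buffered_entry(Page& target, int key) {
  target.insert(std::upper_bound(target.begin(), target.end(), key), key);
}

// Sibling invariant: last key on the left page < first key on the right page.
bool boundary_ok(const Page& left, const Page& right) {
  return left.empty() || right.empty() || left.back() < right.front();
}

// Entry for `key` buffered against `left`; the page splits; the merge runs late.
bool late_merge_keeps_order(Page left, int key) {
  Page right = split(left);
  apply_buffered_entry(left, key);
  return boundary_ok(left, right);
}
```

For a page `{10, 20, 30, 40}` and a buffered key `35`, the split leaves `{10, 20}` and `{30, 40}`; the late merge yields `{10, 20, 35}`, whose last key exceeds the right sibling's first key, the same kind of violation as in the log. A buffered key of `15` happens to stay in range, which may help explain why the bug is rare.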

The underlying defect is that `access_time == 0` is overloaded to mean "this incarnation still needs an ibuf merge", but `access_time` is freely overwritten by LRU, read-ahead and zip-access bookkeeping, and it is inherited across eviction of the uncompressed frame, so it is not a reliable gate for the change-buffer merge.

How to repeat:
The race is hard to reproduce; analyzing the code is more practical.

Suggested fix:
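1. Reset `access_time` to 0 when the uncompressed frame of a compressed page is freed, so the compressed-only descriptor does not carry a stale value.
2. Call `ibuf_merge_or_delete_for_page` whenever `IBUF_BITMAP_BUFFERED` is 1, rather than relying on `access_time`.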
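Both suggested changes can be sketched against a minimal, hypothetical model of the page state (not InnoDB's actual structures or signatures): suggestion 1 clears the inherited `access_time` when the frame is freed, and suggestion 2 bases the merge decision on the bitmap bit alone.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the page state relevant to the merge decision.
struct ZipPageState {
  uint32_t access_time;
  bool ibuf_buffered;  // the page's IBUF_BITMAP_BUFFERED bit
};

// Suggestion 1: when the uncompressed frame is freed and the compressed-only
// descriptor is re-linked, do not let it inherit a stale access_time.
void on_free_uncompressed_frame(ZipPageState& s) {
  s.access_time = 0;
}

// Suggestion 2: decide on the merge from the bitmap bit, not from access_time.
bool should_merge(const ZipPageState& s) {
  return s.ibuf_buffered;
}
```

Gating on the bitmap bit is the more robust option, because the bit is set by the same operation that writes the ibuf entry. The likely cost is that checking it means reading the ibuf bitmap page, whereas `access_time` is already in memory, which may be why the shortcut was used in the first place.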
[16 Jun 14:22] Jean-François Gagné
Wow, great finding! I am curious whether 9.7 is affected by this.

I am curious because Oracle disabled the InnoDB Change Buffer in 8.4.0 and re-enabled it in 9.5.0. The justification given for this is obscure, and there are reasons to believe that it was disabled in 8.4.0 because it was unsafe, and that once issues were fixed in 9.3, it was safe again to re-enable it in 9.5.

I am alluding to this in the blog post linked below; the relevant quote is also below.

https://jfg-mysql.blogspot.com/2026/02/more-than-flushing-also-caching-for-innodb-flush-me...

> Change Buffering / innodb_change_buffering, enabled by default in 8.0, disabled in the new defaults of 8.4.0 (doc / release notes), is re-enabled in 9.5.0 [...].  A little weird, don't you think?  [...]  Searching the GitHub repository for WL #16967 [...] shed some light on this turnaround [...]