Bug #69780 Fix for bug 14606334 in 5.6.11 breaks backward compatibility for InnoDB recovery
Submitted: 18 Jul 2013 14:03
Modified: 9 Oct 2015 8:58
Reporter: Alexey Kopytov
Status: Closed
Category: MySQL Server: InnoDB storage engine
Severity: S3 (Non-critical)
Version: 5.6.12
OS: Any
Assigned to:
CPU Architecture: Any

[18 Jul 2013 14:03] Alexey Kopytov
Description:
The following fix in 5.6.11 / 5.7.1 introduced a change that can make
InnoDB crash on recovery when replaying redo logs created on earlier
versions:

"InnoDB: The server could exit during an attempt by InnoDB to reorganize
or compress a compressed secondary index page. (Bug #14606334)"

With that fix InnoDB resets an index page after deleting the last record
on the page. This is from page_cur_delete_rec():

	if (page_get_n_recs(page) == 1) {
		...
		page_create_empty(page_cur_get_block(cursor),
				  const_cast<dict_index_t*>(index), mtr);
		return;
	}

In particular, page_create_empty() resets the records heap.

Server versions before 5.6.11 do not reset the heap, so they may later
insert records into free slots on the heap and store the corresponding
offsets in MLOG_COMP_REC_INSERT records in the redo log. When a later
version replays MLOG_COMP_REC_DELETE during recovery and has to delete
the last record on a page, it resets the records heap. Recovery then
crashes if a MLOG_COMP_REC_INSERT record replayed afterwards carries an
offset that is beyond the new heap bounds.
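
The mismatch can be illustrated with a toy model. This is only a sketch under stated assumptions, not InnoDB code: ToyPage, RedoEntry, apply_delete() and replay() are hypothetical names, the record size of 8 is illustrative, and 120 is taken as the heap top of an empty page because that is the value reported in the debug log later in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, not InnoDB code: a page is reduced to three counters.
struct ToyPage {
    std::size_t heap_top;    // analogue of PAGE_HEAP_TOP
    std::size_t n_recs;      // analogue of PAGE_N_RECS
    std::size_t free_slots;  // deleted slots that a later insert may reuse
};

constexpr std::size_t kEmptyHeapTop = 120;  // heap top of an empty page (assumed)
constexpr std::size_t kRecSize = 8;         // illustrative record size

enum class Op { Insert, Delete };

struct RedoEntry {
    Op op;
    std::size_t cursor_offset;  // for Insert: offset of the record to insert after
};

// Deletes one record. Precondition: the page holds at least one record.
void apply_delete(ToyPage& page, bool reset_when_emptied) {
    assert(page.n_recs > 0);
    if (page.n_recs == 1 && reset_when_emptied) {
        page = ToyPage{kEmptyHeapTop, 0, 0};  // 5.6.11-style reset of the heap
        return;
    }
    --page.n_recs;      // heap top stays put, the slot goes to the free list
    ++page.free_slots;
}

// Replays the log on a copy of the page. Returns the index of the first
// entry that cannot be applied, or -1 if every entry applies.
int replay(ToyPage page, const std::vector<RedoEntry>& log,
           bool reset_when_emptied) {
    for (std::size_t i = 0; i < log.size(); ++i) {
        const RedoEntry& e = log[i];
        if (e.op == Op::Delete) {
            if (page.n_recs == 0) return static_cast<int>(i);
            apply_delete(page, reset_when_emptied);
            continue;
        }
        // Analogue of the heap-bounds check: the logged offset must not
        // lie beyond the current heap top.
        if (e.cursor_offset > page.heap_top) return static_cast<int>(i);
        if (page.free_slots > 0) {
            --page.free_slots;          // reuse a free slot
        } else {
            page.heap_top += kRecSize;  // extend the heap
        }
        ++page.n_recs;
    }
    return -1;
}
```

Replaying a delete of the last record followed by an insert at a high offset succeeds in this model when reset_when_emptied is false, and stops at the insert when it is true.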

Naturally, this change also makes XtraBackup/MEB incompatible with
previous server versions.

How to repeat:
The testcase can likely be reduced, but the following procedure is simple enough:

1. Download the Sakila DB from http://dev.mysql.com/doc/index-other.html
2. Use the following script with MySQL < 5.6.11 (I tested on 5.6.10):

---
#!/bin/sh

# Prerequisite: the Sakila DB has been downloaded and unpacked to /tmp/sakila-db

set -e

mysql -uroot -e "DROP DATABASE IF EXISTS sakila"

mysql -uroot < /tmp/sakila-db/sakila-schema.sql

mysql -uroot < /tmp/sakila-db/sakila-data.sql

for t in actor address category city country customer film film_actor film_category film_text inventory language payment rental staff store; do
    mysql -uroot sakila <<EOF
SET foreign_key_checks=0;
DELETE FROM $t;
EOF
done

killall -9 mysqld
---

3. After the script kills mysqld, upgrade to 5.6.11/5.6.12

4. InnoDB will crash on recovery. Release builds crash with a segfault; debug builds crash with the following assertion failure:

---
2013-07-18 17:25:30 130ffc000  InnoDB: Assertion failure in thread 5117034496 in file page0page.ic line 657
InnoDB: Failing assertion: page_offset(rec) <= page_header_get_field(page, PAGE_HEAP_TOP)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
---
[18 Jul 2013 14:12] Alexey Kopytov
Debug log:

...
# The following deletes the last record on the page
record type = 42 PAGE_HEAP_TOP=15408
# Page heap is reset in 5.6.11+
record type = 8 PAGE_HEAP_TOP=120 PAGE_N_RECS=0
record type = 38 PAGE_HEAP_TOP=120 PAGE_N_RECS=0 offset=99
record type = 38 PAGE_HEAP_TOP=128 PAGE_N_RECS=1 offset=125
record type = 38 PAGE_HEAP_TOP=136 PAGE_N_RECS=2 offset=133
record type = 38 PAGE_HEAP_TOP=144 PAGE_N_RECS=3 offset=141
# The following applies just fine with MySQL < 5.6.11, because the heap
# is not reset, but crashes on 5.6.11+
record type = 38 PAGE_HEAP_TOP=152 PAGE_N_RECS=4 offset=8093
[29 Jul 2013 11:49] Shane Bester
This looks like the bug I filed internally:
Bug 16996584 - 5.6.11+ REGRESSION: MULTIPLE CRASHES DURING CRASH RECOVERY
[29 Jul 2013 11:57] Umesh Shastry
Hello Alexey,

Thank you for the bug report. 
As Shane pointed out, this is a duplicate of the internally reported Bug 16996584 - 5.6.11+ REGRESSION: MULTIPLE CRASHES DURING CRASH RECOVERY

Thanks,
Umesh
[5 Aug 2013 9:54] Marko Mäkelä
I am sorry for introducing this regression. The new function page_create_empty() should not have been invoked when applying redo log entries for deleting records, because this could bring the PAGE_FREE list out of sync with subsequent redo log entries for the page.
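
The rule stated above, not emptying the page while redo log entries are being applied, can be sketched in a toy model. This is not the actual patch: ToyPage and toy_delete_rec() are hypothetical names, the applying_redo_log parameter stands in for a check such as recv_recovery_is_on(), and 120 is assumed to be the heap top of an empty page, as in the debug log earlier in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Toy model, not InnoDB code: a page is reduced to three counters.
struct ToyPage {
    std::size_t heap_top;    // analogue of PAGE_HEAP_TOP
    std::size_t n_recs;      // analogue of PAGE_N_RECS
    std::size_t free_slots;  // analogue of the length of the PAGE_FREE list
};

constexpr std::size_t kEmptyHeapTop = 120;  // heap top of an empty page (assumed)

// Deletes one record. Precondition: the page holds at least one record.
// applying_redo_log is a hypothetical stand-in for recv_recovery_is_on().
void toy_delete_rec(ToyPage& page, bool applying_redo_log) {
    assert(page.n_recs > 0);
    if (page.n_recs == 1 && !applying_redo_log) {
        // Normal operation: emptying the page is safe, because the reset
        // itself is written to the redo log.
        page = ToyPage{kEmptyHeapTop, 0, 0};
        return;
    }
    // During recovery (or when other records remain): keep the heap and
    // the free list as they are, so later logged offsets stay valid.
    --page.n_recs;
    ++page.free_slots;
}
```

With applying_redo_log set to true, deleting the last record leaves heap_top untouched, so a later insert entry that refers to an old offset still falls within the heap.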

We should improve our cross-version regression testing. This is somewhat of a corner case, requiring a suitable combination of INSERT, UPDATE and DELETE operations and giving the purge an opportunity to run.
[9 Oct 2015 8:58] Umesh Shastry
Noted in 5.6.14, 5.7.2 changelogs. 

"A regression introduced in the fix for Bug #14606334 would cause crashes on 
startup during crash recovery."
[15 Jul 2016 9:53] zhai weixiang
Hi,
I just checked the code and found that page_create_empty may still be invoked in the function page_delete_rec_list_start during crash recovery. All other places that invoke page_create_empty check recv_recovery_is_on() first.

Am I missing something? Thank you for any comment :)