MySQL Bugs: #79674: InnoDB table space corruption prevents MySQL server from starting

Bug #79674	InnoDB table space corruption prevents MySQL server from starting
Submitted:	16 Dec 2015 16:08	Modified:	17 Jul 2018 7:36
Reporter:	Dennis Buteyn	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S1 (Critical)
Version:	5.5.46	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	corrupt, crash, innodb

Description:
During endurance tests on one of our products we noticed highly reproducible fatal failures of MySQL. The entire table space becomes unrecoverable, resulting in complete data loss. The issue exists cross-platform (ARM and x64) and cross-distro (Debian and Ubuntu). The same issue also appears in MariaDB. I will provide the MariaDB logs as they are much more informative.

All recovery attempts suggested at http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html failed. Setting innodb_force_recovery progressively to 1-6 did not allow recovery.

I constructed some diagnostic utilities from MySQL source which allowed me to verify that there is no corruption in ibdata0, ib_logfile0 nor ib_logfile1. This was done by recalculating and verifying checksums of pages and blocks respectively.

For the page in question however (page 1225) I found a discrepancy. My utility correctly calculates and verifies the old and new checksum as 0xf9d68471 and 0x8419b0e8. MySQL however reports that it should have been 0x04fd4e17 and 0xe2eb5049. It also incorrectly reports the old checksum on disk as 0x00000000.

I also found the LSN reported by MySQL to be incorrect, on disk I found 969649444 but MySQL claims 995272885.

How to repeat:
The problem can be reproduced with 50-80% reliability by performing insertions on a large table with index or by importing a large MySQL dump (with indexes), large being +100k rows. Power should then be interrupted to the system. To my understanding this corrupts the index that was being updated by rows being inserted.

Suggested fix:
Uncertain, it appears to be some logic issue in page0page.c.

MySQL error log

Attachment: error.log (text/x-log), 52.44 KiB.

Diagnostic utility output of the affected page

Attachment: page-1225.log (text/x-log), 286 bytes.

I've finished analysing the ibdata1 file and could not find any problems with it. All records on the specified page are in order, the record directory is also proper. I do not know what is wrong with the code below. It would be very appreciated to get some kind of response as I do not know what further information I need to provide.

page = page_align(rec);
first_slot = page_dir_get_nth_slot(page, 0);
slot = page_dir_get_nth_slot(page, page_dir_get_n_slots(page) - 1);

if (page_is_comp(page)) {
	while (rec_get_n_owned_new(r) == 0) {
		r = rec_get_next_ptr_const(r, TRUE);
		// ...
	}
} else {
	// ...
}
rec_offs_bytes = mach_encode_2(r - page);
while (UNIV_LIKELY(*(uint16*) slot != rec_offs_bytes)) {
	if (UNIV_UNLIKELY(slot == first_slot)) {
		// Reaches here and asserts
	}
}

Detailed output of all records on the affected page

Attachment: records.log (text/x-log), 92.02 KiB.

I have also found the cause of the discrepancy I found previously in the error log and created a separate bug report for it: http://bugs.mysql.com/bug.php?id=79713

Bug found found in multiple systems.

For what it's worth, I found one anomaly on Win64, maybe you can look if similar issue may occur on your system?

https://bugs.mysql.com/bug.php?id=81859

Hi,

First , please reply to the question posed by my colleague Shane Bester.

Second, switching power off requires that entire system is setup as ACID one, not just MySQL. First of all, all ACID parameters of InnoDB, like O_DIRECT etc, should be used. Next, your disk controller should have its. cache turned off. Your disk must have its cache turned of and your entire filesystem must have caching turned off on Linux. I do not think that this is possible on Linux, but it is possible on Solaris.

As I wrote, your ACID settings in InnoDB should be set at maximum security level. You also must try to mount your filesystem without any cache. You should also check out whether your kernel and filesystem support O_DIRECT or any similar settings.

I do not believe the bug that Shane referenced is related, considering it has preprocessor blocks that are specific for Windows. Moreover the bug I reported specifically states that the file is probably damaged in some way, but not by normal operation.

Sinisia, while I understand the requirements for perfect and reliable operation, this is not what this bug report is about. This bug report is about when damage does occurs (after all, we do not live in a perfect world), ALL data is lost AND MySQL enters an unrecoverable state.

For our use case, we can live with a couple rows damaged/lost as these rows are most likely the ones changed or added recently. In our case, they just hold statistics and logging information. What we cannot live with is a system that becomes completely inoperable by MySQL entering an infinite restart loop.

Requiring that we must use battery backups, disable all write caching (have fun getting ANY performance out of your storage media with that) just sounds like pointing fingers. Yes, the file is corrupt. Yes, it occurred due to non-ideal conditions. Yes, you should probably do all those things if you have really really valuable data. I never denied that this happened under less-than-ideal conditions.

However, these requirements do no apply when you do not value your data that much. As stated previously, we do not care if some of it is lost. We do care when ALL of it is lost. We care more when the database service itself is lost. Somehow an assertion was hit. In other words, a fault was detect but there was no fault handling. No amount of ACID-compliance will handle a fault for which no code exists.

Claiming you cannot reproduce the conditions of an assertion after two years is insulting. Any competent programmer can glean even from the code posted in my previous comments how to reproduce it. I wish I was lucky enough to get bug reports as detailed as this one, usually I have to work with a screen shot of an error message or less!

So let's see how unreproducible this bug is. The first condition states:
while (*slot != rec_offs_bytes)
where:
slot = page_dir_get_nth_slot(page, page_dir_get_n_slots(page) - 1)
and:
rec_offs_bytes = mach_encode_2(r - page)

This appears to be a loop over all slots within some page, attempting to locate some record. Not very interesting.

Now for the second condition:
if (slot == first_slot)
where:
first_slot = page_dir_get_nth_slot(page, 0)

In other words: If the slot being examined is the first slot of that page, crash and burn.

What I think happened is that the page directory is messed up and page_dir_get_nth_slot is returning nonsense.

So here is the million dollar question: Is it acceptable for MySQL to fail completely when one or more pages are lost? If the answer is yes, you may want to reconsider your target audience. Bad things happen and stuff does break, no matter how hard you try. The difference lies in how such situations are handled.

Hi,

Yes, we know that we are not living in the perfect world.

However, we can not control operating system settings, filesystem settings or the hardware. Corruption that you experience is due to the fact that you have not setup your host for ACID performance. You also might not have set all InnoDB parameters so that loss of data is minimised.

As an example of the latter, 5.5 still supports raw disk partitions. Hence, if you would set all OS / hardware parameters properly, disabled Innodb file-per-table option and use raw disk partitions instead of the filesystem(s).

Needless to say, if you setup your system and MySQL to be 100 % ACID, your performance will suffer. There is no balance here. You either do frequent backups with a risk to suffer total loss of data, or you have full ACID compatibility with no loss of data.

However, as the things are as you describe, there is nothing we can do regarding the amount of the data loss.

Anyway, most of the above is described in our Reference Manual, in the chapter about InnoDB.

I see... So total loss of data and complete denial of service is acceptable due to some flipped bits in the page directory. I cannot argue with that logic.

For what its worth, at the time we changed the engine for the table causing the issue to MyISAM and had no further failures. Not only did MyISAM give us a more stable system, performance was improved as well.

Thank you for your time, you can close this S1 severity bug (raised after re-reading the definitions) as "working as intended".