Bug #103429 Information about corruption when speaking about innodb_flush_log_at_trx_commit
Submitted: 22 Apr 2021 12:15 Modified: 22 Apr 2021 16:25
Reporter: Niranjan R Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:All OS:Any
Assigned to: CPU Architecture:Any
Tags: corruption, Durability

[22 Apr 2021 12:15] Niranjan R
Description:
In the below documentation about the variable "innodb_flush_log_at_trx_commit", there is a Caution stating that:
"Many operating systems and some disk hardware fool the flush-to-disk operation. They may tell mysqld that the flush has taken place, even though it has not. 
In this case, the durability of transactions is not guaranteed even with the recommended settings, and in the worst case, a power outage can corrupt InnoDB data."

[+] https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_flush_log_at_...

== From the above statements, it is clear that durability of transactions might not be guaranteed because of the behavior of OS and/or Hardware.

That being said, why will this cause a "CORRUPTION" in InnoDB data. (EVEN IN THE WORST CASE).
As per my understanding, even in the worst case, there should just be "DATA LOSS" due to durability issues.
But, Why will there be a "CORRUPTION"?

Can this be corrected?

== If my understanding is wrong, and if it is indeed expected to cause "CORRUPTION", can you please explain how it can cause corruption.

Further, if durability issues due to OS/Hardware can cause "CORRUPTION", then can we also incur that just setting the variable "innodb_flush_log_at_trx_commit" to 0 or 2 also cause "CORRUPTION"(in the worst case), since the DURABILITY is lost even when you set this variable to a value other than 1?

If setting "innodb_flush_log_at_trx_commit" to 0 or 2 can also cause "CORRUPTION", can you please add the same to the document for more clarity?

How to repeat:
Documentation correction

Suggested fix:
If durability of transaction is not guaranteed, there should only be data loss and not corruption.

If durability issues can cause corruption, then setting the variable "innodb_flush_log_at_trx_commit" to 0 or 2 should also cause corruption.
[22 Apr 2021 13:36] MySQL Verification Team
Hi Mr. R,

Thank you for your bug report.

However, this is not a bug.

What we wrote in that manual is based on our direct experience for the last 20 (twenty) years. Namely, computers can have cache RAM on both disk controller chip / card or on hard disk itself. So, when you flush and sync a file, hardware will return OK for both to the operating system, BUT ...... all data are still in cache and not on the disk.

Now, if the power outage occurs, all these data, which could be Gbs large will be TOTALLY lost. This would lead, not just to the loss of data, but to the total corruption of data, because parts of the tablespace will be on disk and parts will be lost in the caches, when the power is out.
[22 Apr 2021 14:36] Niranjan R
Thanks for the update.

So, considering the explanation given, can I conclude that corruption due to flushing can happen only if Hardware and/or OS returns "OK" before actually flushing everything to disk.
Because, as far as I can see, this is the only scenario where "part" of the data is in cache, and "part" of it is in "DISK"

Considering the case where Hardware/OS is 'NOT' fooling flush-to-disk operation, corruption will not happen even if you set the variable "innodb_flush_log_at_trx_commit" to 0(or 2).
Because in this case, the data is either completely on disk, or not on disk. There is no "partial" data written to disk.

Is my understanding correct?
[22 Apr 2021 15:27] Sunny Bains
As long as the sequence of writes are correct as per ARIES ie redo/wal writes before data file writes, there shouldn't be any corruption just loss of data. If the disk firmware is lying about  persisting the data to disk and the the firmware  writes the data from its cache out of order ie. data pages before the redo logged pages that have the deltas from the data files pages,  then there is a chance of corruption.
[22 Apr 2021 16:25] Niranjan R
Thanks for the clarification, Sunny.

So, I understand that the corruption can happen if and only if the writes happen in a 'wrong' sequence.

And based on my understanding this 'wrong' sequence will come into picture only if the firmware is lying about persisting the data to disk.

Considering a scenario where the firmware is not lying, I don't think the sequence of write is going to change even if you set the variable "innodb_flush_log_at_trx_commit" to 0(or 2).

So, is it safe to conclude that if the firmware is "NOT" lying, then there won't be any corruption even if you set the variable "innodb_flush_log_at_trx_commit" to 0(or 2)?