Bug #108654 Document explains change buffer merging, but refers to it as "purging"
Submitted: 29 Sep 2022 16:46
Modified: 7 Oct 2022 13:04
Reporter: Niranjan R
Status: Verified
Category: MySQL Server: Documentation
Severity: S3 (Non-critical)
Version: 5.7
OS: Any
Assigned to:
CPU Architecture: Any
Tags: Change buffer, documentation, purge

[29 Sep 2022 16:46] Niranjan R
Description:
The document about change buffer (https://dev.mysql.com/doc/refman/8.0/en/innodb-change-buffer.html) mentions:

Periodically, the purge operation that runs when the system is mostly idle, or during a slow shutdown, writes the updated index pages to disk. The purge operation can write disk blocks for a series of index values more efficiently than if each value were written to disk immediately. 

How to repeat:
NA

Suggested fix:
I don't think it is the purge operation that is responsible for writing the updated index pages to disk.

Can this be corrected?
[30 Sep 2022 11:43] MySQL Verification Team
Hi Mr. Niranjan,

Thank you for your documentation bug report.

We have checked the source code and we have consulted with the InnoDB team and this is a full description.

The purge thread is the only one that reads the undo log in batches and applies the undo log changes to the table pages; the regular flush thread then flushes them.

Since the flush thread is described in a separate chapter, what the documentation states is absolutely correct.

Not a bug.
[30 Sep 2022 20:26] Niranjan R
Thanks for the update.

I understand this point:
"The purge thread is the only one that reads the undo log in batches and applies the undo log changes to the table pages; the regular flush thread then flushes them."

But the concern is that the document states that the purge operation "writes the updated index pages to disk", which seems incorrect.

Can you please check again?
[3 Oct 2022 11:59] MySQL Verification Team
Hi,

Yes, the documentation is correct.

The purge thread is writing the pages and the separate flush thread is flushing them. These are two separate operations.

The flushing thread has its own section in the documentation.
[4 Oct 2022 21:39] Valter Rehn
The phrase "The purge thread is writing the pages and the separate flush thread is flushing them. These are two separate operations." does not have the same meaning as "the purge operation ... writes the updated index pages to disk".

Even if not incorrect, this line in the documentation is confusing. 

How about adding a small clarification, changing from 
"Periodically, the purge operation that runs when the system is mostly idle, or during a slow shutdown, writes the updated index pages to disk. The purge operation can write disk blocks for a series of index values more efficiently than if each value were written to disk immediately."

To
"Periodically, the purge operation that runs when the system is mostly idle, or during a slow shutdown, writes the updated index pages to disk. With purge buffering, the purge operation can write disk blocks for a series of index values more efficiently than if each value were written to disk immediately."
[7 Oct 2022 9:01] Jakub Lopuszanski
FWIW I side with the reporter.
The documentation sounds like nonsense to me.
First of all I see no reason why this paragraph is in the article about "change buffering" at all.
It seems to mix 3 completely different concepts:

1. Change Buffering, which is the practice of using a completely separate B-tree to accumulate changes intended for secondary index pages which are currently not in the BP (Buffer Pool), so that modification of those secondary indexes can be postponed until the target page is read back into the Buffer Pool. Here the task of applying the delayed changes is carried out by the I/O completion thread which happens to handle the read operation of that page some time in the future. (Or by a slightly different mechanism during slow shutdown.)

2. Purging of the Undo Log, which is the duty of the Purge Threads (srv_purge_coordinator_thread and srv_n_purge_threads-1 instances of srv_worker_thread). This operation consists of removing old transactions' records from the Undo Log (records which are also in the undo log chains hanging off the clustered index records), and it requires removing old delete-marked records from the secondary indexes which pointed to those no-longer-needed versions of clustered index records.

3. (Dirty Pages) Flushing, a.k.a. Fuzzy Checkpointing, a.k.a. Page Cleaning, which is a task performed by Page Cleaners (buf_flush_page_coordinator_thread and m_page_cleaner_workers_n instances of buf_flush_page_cleaner_thread). It consists of taking a dirty page modified a long time ago from a BP instance's flush list, writing it back to disk, removing it from the flush list and marking it as clean again. This eventually allows the checkpoint LSN to advance.

I think this article should focus on 1., but instead it seems to talk about 3. using terminology of 2.
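
To make the distinction between the three concepts concrete, here is a toy Python sketch. It is only an illustration under heavy simplifying assumptions: the class ToyEngine and all of its methods and fields are hypothetical names invented for this example, not InnoDB internals. Pages are modelled as plain sets of index entries, and there is no undo log, no LSN and no threading.

```python
# Toy model only: every name here is hypothetical and not part of InnoDB.
class ToyEngine:
    def __init__(self):
        self.disk = {}           # page_id -> set of index entries stored on disk
        self.buffer_pool = {}    # page_id -> set of index entries cached in memory
        self.dirty = set()       # page_ids changed in memory but not yet written
        self.change_buffer = {}  # page_id -> list of pending (op, entry) changes

    def secondary_index_change(self, page_id, op, entry):
        """Apply the change if the page is cached, otherwise buffer it."""
        if page_id in self.buffer_pool:
            self._apply(page_id, op, entry)
        else:
            self.change_buffer.setdefault(page_id, []).append((op, entry))

    def read_page(self, page_id):
        """Concept 1: reading a page merges its buffered changes."""
        if page_id not in self.buffer_pool:
            self.buffer_pool[page_id] = set(self.disk.get(page_id, set()))
            for op, entry in self.change_buffer.pop(page_id, []):
                self._apply(page_id, op, entry)
        return self.buffer_pool[page_id]

    def purge(self, page_id, entry):
        """Concept 2: purge removes an entry that is no longer needed.

        It only modifies the page in memory (or buffers the removal);
        it never writes anything to disk itself.
        """
        self.secondary_index_change(page_id, "delete", entry)

    def flush(self):
        """Concept 3: flushing writes dirty pages to disk."""
        written = sorted(self.dirty)
        for page_id in written:
            self.disk[page_id] = set(self.buffer_pool[page_id])
        self.dirty.clear()
        return written

    def _apply(self, page_id, op, entry):
        page = self.buffer_pool[page_id]
        if op == "insert":
            page.add(entry)
        else:
            page.discard(entry)
        self.dirty.add(page_id)


engine = ToyEngine()
engine.disk[7] = {"a", "b"}
engine.secondary_index_change(7, "insert", "c")  # page 7 not cached: buffered
nothing_to_flush = engine.flush()                # no dirty pages yet
merged = set(engine.read_page(7))                # merge happens on read
engine.purge(7, "a")                             # page cached: changed in memory
disk_before_flush = set(engine.disk[7])          # disk still has the old page
flushed = engine.flush()                         # only now is the page written
```

In this sketch the disk copy of page 7 changes only inside flush(), which is the point of the comment above: merging and purging modify pages in memory, and a separate mechanism writes them out.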
[7 Oct 2022 13:04] MySQL Verification Team
Hi Mr. R,

Since our Development team considers that the documentation needs to be reorganised, we are verifying this bug.

We do not know how much time it will take to reorganise the entire chapter.

Verified for both 5.7 and 8.0.