MySQL Bugs: #94403: Question about the lwm_lsn and dpa_lsn in log_get_available_for_checkpoint

Bug #94403	Question about the lwm_lsn and dpa_lsn in log_get_available_for_checkpoint_lsn
Submitted:	20 Feb 2019 6:30	Modified:	26 Feb 2019 14:18
Reporter:	Zongzhi Chen (OCA)	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	8.0.14	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Hello guys

I was confuse about the lwm_lsn and dpa_lsn in log_get_available_for_checkpoint_lsn().

  } else {
    /* Cannot go beyound dpa_lsn.

    Note that log_closer might still not advance dpa enough,
    because it might be scheduled out. Yes, the returned lwm
    guarantees it is able to advance, but it needs to do it! */

    lwm_lsn = std::min(lwm_lsn, dpa_lsn);
  }

we know that the dpa_lsn is the recent_closed.tail().
And the lwm_lsn is the buf_pool_get_oldest_modification_approx() - recent_closed.size().

We know that buf_pool_get_oldest_modification_approx() get the lsn from the buffer pool. Then after subtract recent_closed.size(), we can promise that the lwm_lsn is smaller than the dpa_lsn.

Even in the case as the comment said, since adding the page to the flush list and clear the recent closed buffer operation in the mtr_commit() won't be atomic. It is possible that we may add the page to the flush list, but we havn't advance the recent closed buffer. However, since the page_lsn is equal the start of the mtr, then after subtracting to the size of the recent closed buffer, it is still smaller than the dpa_lsn

How to repeat:
I have add a ut_a before here, and I find that the server never crashed.

  } else {
    /* Cannot go beyound dpa_lsn.

    Note that log_closer might still not advance dpa enough,
    because it might be scheduled out. Yes, the returned lwm
    guarantees it is able to advance, but it needs to do it! */

    ut_a(lwm_lsn <= dpa_lsn);
    lwm_lsn = std::min(lwm_lsn, dpa_lsn);
  }

Suggested fix:
we don't need the dpa_lsn here, maybe we only need the dpa_lsn when we find the lwm_lsn == 0

Hello, first of all, I would like to say that it is very good analysis!

You are right in principle - we could simply assign lsn to the lwm_lsn in that branch and it would be correct.

However there is a reason why it's written that way (doing extra computation which *almost* always evaluates to the lwm_lsn). Let me try to explain that below.

As you noticed, the lwm_lsn promises itself that it is smaller or equal to the recent_closed's tail (that's guaranteed because before we add pages to flush lists, we wait for space in the recent_closed).

However, the assertion you added there (for a test), could fail (!).

That's because load of the dpa_lsn and load of the lwm_lsn are not a single atomic operation. After the dpa_lsn is loaded, new pages could be added and the recent_closed could become advanced before the lwm_lsn is loaded. The dpa_lsn would be then outdated.

If we swapped order of these two loads, the assertion would then always hold.

However, for the case of empty flush lists (lwm_lsn == 0), we need to have the dpa_lsn from *before* the load of the lwm_lsn. This is extremely important, because otherwise following could happen (if we swapped order of the loads):

1. After we detected the lwm_lsn is zero, but before we loaded the dpa_lsn:
- new dirty pages could be added,
- recent_closed could become advanced up to the current lsn.

2. Then we would read the dpa_lsn (equal to the current_lsn) and write checkpoint at such lsn - bypassing the new dirty pages which were added to flush lists.

This explains why we must not swap order of these two loads.

However, still following options are correct:

1. The new assertion could be based on yet another load of the recent_closed's tail (after load of the lwm_lsn).

2. We could skip the new assertion and skip the computation there - just assign the lwm_lsn.

3. Current solution. Note that the dpa_lsn could be smaller than the lwm_lsn if it is outdated, because of the order of these two loads.

Now, if we decided for solution 2 (which you suggest AFAIU), then we would introduce kind of "optimization" for unlikely scenario with the outdated dpa_lsn. It would be correct, but the result of function would no longer be a consistent decision made according to the two values we loaded there: the dpa_lsn and the lwm_lsn.

In case lwm_lsn == 0 - it would stay consistent.

In case lwn_lsn != 0, it would ignore the fact that possibly the dpa_lsn is much smaller (very unlikely as the thread would have to be scheduled out for longer between these two loads, and the loads happen one after another in the code).

The consistency I mentioned here, is no longer important as soon as we exit the function. However I believe it provides some safety for any future changes to that function.

I believe it's better to stay with the current approach. However one could consider the difference as a matter of taste.

I think the comment could be enhanced there to cover what I described here (stressing the fact that the two loads are not atomic operation).

I am open to suggestions and discussion here. Please let me know what you think.

Thanks for the very good and deep analysis of the code!

Thank you for you detailed explain. It make sense now.
We had discussed that the load of two lsn won't be atomic. However, we hadn't  find the corner case..

Hi Chen,

I understood that you are happy with the explanation provided.

Hence, I have changed the status of this report.

If you have any other related questions, or find other problems stemming from those watermarks (which is how I call them), please do let us know.

Thank you for your report.