MySQL Bugs: #60776: InnoDB Max Purge Lag setting is wrongly designed

Bug #60776	InnoDB Max Purge Lag setting is wrongly designed
Submitted:	6 Apr 2011 11:42	Modified:	1 May 2013 20:09
Reporter:	Dimitri KRAVTCHUK	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB Plugin storage engine	Severity:	S2 (Serious)
Version:	any	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	innodb, innodb_max_purge_lag

Description:
Currently the innodb_max_purge_lag=N feature is implemented within all InnoDB versions like:

trx0purge.c:
...
       /* If we cannot advance the 'purge view' because of an old
        'consistent read view', then the DML statements cannot be delayed.
        Also, srv_max_purge_lag <= 0 means 'infinity'. Note: we do a dirty
        read of the trx_sys_t data structure here, without holding
        trx_sys->read_view_mutex. */
        if (srv_max_purge_lag > 0
            && !UT_LIST_GET_LAST(trx_sys->view_list)) {
                float   ratio = (float) trx_sys->rseg_history_len
                        / srv_max_purge_lag;
                if (ratio > ULINT_MAX / 10000) {
                        /* Avoid overflow: maximum delay is 4295 seconds */
                        delay = ULINT_MAX;
                } else if (ratio > 1) {
                        /* If the history list length exceeds the
                        innodb_max_purge_lag, the
                        data manipulation statements are delayed
                        by at least 5000 microseconds. */
                        delay = (ulint) ((ratio - .5) * 10000);
                }

...

Problems:

#1 - in most cases DML delay will not be involved due condition check for a presence of consistent read, which finally will be true if there are *any* reads are currently running within InnoDB (even not accessing the data which should be purged).. - then, after all, as it's hard to say which exactly session is blocking a purge from progressing (if it's so), it's better simply to remove this condition check and involve DML delays always when the purge lag is out-passing innodb_max_purge_lag setting..

#2 - the max possible delay value is too high!! - currently it's even *very* dangerous, because if one will use purge lag setting today there may happen a situation when the delay timeout will remain constantly incremented but never applied due consistent read condition, and reach a very high value before it'll be finally used - so once the delay is applied it may delay a DML for one hour for ex. (or even more - you may just google user forums to find such cases).. - So, I think the max value should be limited to say 5 or 10 seconds, or be configurable..

How to repeat:
Find any Read+Write test which is having purge lag growing during the execution (DBT2 for ex., or dbSTRESS). Then restart the same test, but with setting innodb_max_purge_lag=500000 (for ex.) and observe purge lag still continuing to grow..

Suggested fix:
Proposed solutions:

  for #1: 
    - completely remove condition check for consistent read
    - add a flag and counter into purge thread to account situations when purge was blocked

  for #2:
    - reduce the max DML delay to 10 sec. or make it configurable

NOTE: in some situations DML delay may not help anymore (when you have a massive DELETE operation for ex.) - so, once the max DML delay is applied and purge lag continues to grow - it's better to add delays on REDO writes in this case as we cannot slow down any DML statements which are already running..

NOTE: for the moment using innodb_max_purge_lag is giving the best performance results when we have to keep the max possible performance and keep purge lag stable (ex: http://dimitrik.free.fr/blog/archives/2010/05/mysql-performance-improving-stability.html (section "Purge Lag") - so it's quite important to get it fixed asap (as a starter: there are just 2 lines to change)..

It's worth remembering why we have this setting: it's to prevent the server from filling the hard drive, not for normal use throttling. The setting should normally be at a high anti-disaster value, not one that is ever reached during normal activities, including normal backups.

For normal work it's the normal purging that should handle things, not this.

The rest of the discussion covers normal use, not anti-disaster use. That is, it covers use of this setting for tasks which it wasn't intended for.

I agree that the maximum delay is too high. From a support perspective what we get is someone with a large lag and we can't tell them to set the max purge lag to the final target value immediately or it will effectively kill their server. Instead we have to tell them to reduce it gradually. Some gradual increasing of the delay from a low value to a higher one over time if the initial delay doesn't produce a decrease would be more helpful.

It's also worth considering what happens during backups: the lag can increase to a value in the few billion range during the backup due to disk I/O contention lasting for many hours. It's OK to change the maximum purge lag for a backup to a higher value but then you again need to be very careful when reducing it, lest you freeze your server. And this has to be done while nobody is watching the server because backups are usually unattended. There's no problem with having a high purge lag during a backup.

However, consider a server that does all of its work in large batch transactions that run for many hours. It may really be necessary to insert large delays between those because of the large amount of work that each does.

This causes me to think that we need two different things:

1. the existing disaster prevention setting, innodb_max_purge_lag.
2. purging able to use smaller throttling, but only if it's been behind for a long time and isn't already catching up, lest it throttle due to normal daily variations like backup.

Both purging and flushing need to be able to throttle foreground threads because it's possible to submit more work to a cached data set than the underlying durable storage can keep up with. But that throttling needs to ramp up gradually, not start with a cliff.

Added a changelog entry for 5.6.5:

"The "innodb_max_purge_lag" variable controls how to delay DML operations
when purge operations are lagging. Previously, if an old consistent read
view was detected, DML operations would not be delayed even though the
purge lag exceeded the "innodb_max_purge_lag" setting. 

Additionally, if the "innodb_max_purge_lag" setting was used, situations could arise in which the DML delay time would continue to increase but not be applied
right away due to the presence an old consistent read view. This could
result in a lengthy DML delay when the accumulated DML delay time is
eventually applied. 

This fix caps the DML delay at a maximum value, removes the consistent read check, and revises the DML delay calculation."