Bug #117691 MySQL 8.4 and 9 are not taking advantage of the Linux Page Cache (contrarily to 8.0).
Submitted: 12 Mar 19:12 Modified: 26 Mar 14:41
Reporter: Jean-François Gagné Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:8.4.0, 8.4.4, 9.2.0 OS:Linux
Assigned to: CPU Architecture:Any
Tags: regression

[12 Mar 19:12] Jean-François Gagné
Description:
Hi,

when doing IO tests with MySQL 9.2.0, I am not able to simulate fast IOs that I was able to simulate with previous versions of MySQL.  The way I simulate such fast IO is with a small InnoDB Buffer Pool and an ibd file in the Linux Page Cache.

I trace this down to MySQL 8.4.0.  In MySQL 8.0.41, I am able to simulate fast IOs, but with 9.2.0, 8.4.4 and 8.4.0, I am not.  These tests are done on the same Debian vm in AWS with gp3 disks.  See How to repeat for details.

Setting this as S2 / Serious, because a sub-optimal configuration of MySQL 8.0 (small buffer pool) on a server with a lot of RAM will lead to reduced performance after upgrade.  For the same reason, I tagged as "Regression".  There might be an argument for setting this as a S5 / Performance problem, but the regression makes me think S2 is more appropriate.

This might be caused by new InnoDB defaults in 8.4, but I have not checked in more detail.  Because of the unknown source, setting this as Category MySQL Server.

Many thanks for looking into this,

Jean-François Gagné

How to repeat:
######################
# Environment details.

# Hardware details.
$ cat /proc/cpuinfo  | grep proc | wc -l
8

$ cat /proc/meminfo  | grep MemTotal
MemTotal:       32346516 kB

# OS details.
$ cat /etc/debian_version 
12.7

$ uname -a
Linux ip-172-31-0-54 6.1.0-26-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64 GNU/Linux

##########################
# Tests with MySQL 8.0.41.

dbdeployer deploy single mysql_8.0.41

# Filling one table with data.
# Setting things for having a 3 GB table with 4 rows per 16 KB InnoDB Page.
# The pv are a trick to time command execution.
nb_rows=$((3*1024*1024*1024 / (16*1024) * 4))
{
  ./use <<< "
     CREATE DATABASE test_jfg;
     CREATE TABLE test_jfg.t (id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY)"

  seq 1 $nb_rows |
    awk '{print "(null)"}' |
    tr " " "," | paste -s -d "$(printf ',%.0s' {1..100})\n" |
    sed -e 's/.*/INSERT INTO t values &;/' |
    ./use test_jfg | pv -t

  { echo "ALTER TABLE t ADD COLUMN c0 CHAR(200) DEFAULT ''"
           seq -f " ADD COLUMN c%.0f CHAR(240) DEFAULT ''" 1 15
  } | paste -s -d "," | ./use test_jfg

  ./use test_jfg <<< "ALTER TABLE t FORCE"      | pv -t
  ./use test_jfg <<< "FLUSH TABLE t FOR EXPORT" | pv -t

  ls -lh data/test_jfg/t.ibd
}

# Below is the output of above.
0:00:39
0:03:06
0:00:00
-rw-r----- 1 jgagne jgagne 4.7G Mar 12 15:07 data/test_jfg/t.ibd

# Generating load.
while sleep 0.1; do ./use -N test_jfg <<< "SET @i = ROUND(RAND() * $nb_rows); SELECT * from t where id = @i;"; done

# Monitoring IOs (columns are: date, nb_ios, total_wait_usec, and avg_wait_usec).
sql4="select COUNT_STAR, SUM_TIMER_WAIT/1000/1000"; \
sql4="$sql4 from file_summary_by_event_name where EVENT_NAME = 'wait/io/file/innodb/innodb_data_file'"; \
while sleep 1; do date; ./use -N performance_schema <<< "$sql4"; done |
  stdbuf -oL paste -s -d " \n" | awk -W interactive '{
    aa=$7-a; bb=$8-b;
    a =$7;   b =$8;
    NF=6; print $0, aa, bb, bb/aa}' | tail -n +2

# Making sure nothing is in the page cache.
sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"

# Without a cache, we have avg_wait_usec ~600 usec.
Wed Mar 12 19:00:50 UTC 2025 16 9631.26 601.954
Wed Mar 12 19:00:51 UTC 2025 15 9172.63 611.508
Wed Mar 12 19:00:52 UTC 2025 14 8980.29 641.449

# Loading the table in cache.
pv -etbr data/test_jfg/t.ibd > /dev/null
4.63GiB 0:00:36 [ 129MiB/s]            

# When the table is in cache, we have fast IOs (~7 usec).
Wed Mar 12 19:02:05 UTC 2025 15 106.806 7.1204
Wed Mar 12 19:02:06 UTC 2025 17 107.185 6.30497
Wed Mar 12 19:02:07 UTC 2025 15 109.916 7.32773

# When repeating above with 8.4.4 (or 8.4.0 or 9.2.0),
#   I have similar uncached IOs.
Wed Mar 12 19:04:19 UTC 2025 10 6431.71 643.171
Wed Mar 12 19:04:20 UTC 2025 17 10822.1 636.595
Wed Mar 12 19:04:21 UTC 2025 12 6994.24 582.853

# But I do not have fast cached IOs.
Wed Mar 12 19:05:35 UTC 2025 13 7872.91 605.608
Wed Mar 12 19:05:36 UTC 2025 15 8442.59 562.839
Wed Mar 12 19:05:37 UTC 2025 13 7935.47 610.42

# And I know the table is in cache because below.
date; pv -etbr data/test_jfg/t.ibd > /dev/null
Wed Mar 12 19:05:35 UTC 2025
4.63GiB 0:00:00 [8.89GiB/s]

# EOF.
[14 Mar 7:45] MySQL Verification Team
Default changed:

https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_flush_method

https://dev.mysql.com/doc/refman/8.4/en/innodb-parameters.html#sysvar_innodb_flush_method
[24 Mar 12:38] Jean-François Gagné
This bug has been in Analyzing since Thu, Mar 13.  Anything I can help you with for verifying if ?
[24 Mar 13:08] MySQL Verification Team
Hello Jean-François,

My apologies, was caught up in other bugs and I missed this.
I'll verify this by tomorrow(will request you if I have any issues while reproducing). Thank you.

Also, my senior colleague Shane mentioned to me that defaults have changed from 8.0 to 8.4/9.2 and that might be the reason here(will check while verifying):

Default changed:

https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_flush_method
https://dev.mysql.com/doc/refman/8.4/en/innodb-parameters.html#sysvar_innodb_flush_method

regards,
Umesh
[25 Mar 15:09] MySQL Verification Team
Hello Jean-François,

My apologies for the delay on this(Was occupied in some other bugs, later needed compute with sudo)
Verified as described.

Later tonight will check 8.4.4 with innodb_flush_method=fsync and update my findings.

regards,
Umesh
[25 Mar 15:50] MySQL Verification Team
Hello Jean-François,

I checked 8.4.4 with --innodb-flush-method=fsync and conclusion from my tests i.e. 8.4.4 with --innodb-flush-method=fsync is same as that of 8.0.41 i.e When the table is in cache, both 8.4 and 8.0 have fast IOs.

Could you please check this from your end as well? Thank you.
Joining the test results shortly.

Sincerely,
Umesh
[25 Mar 15:56] MySQL Verification Team
8.0.41, 8.4.4 test results

Attachment: 117691.results (application/octet-stream, text), 11.17 KiB.

[26 Mar 14:41] Jean-François Gagné
Thanks for verifying this bug Umesh.

I confirm that with innodb-flush-method=fsync, I get the cache effect back with 8.4, details below.  So as I suspected, this was caused by the new defaults in 8.4, and you and Shane narrowed this down to innodb-flush-method, thanks again for investigating that.

Now I think the question is: Is this intended or not ?  The new default of innodb_flush_method=O_DIRECT might be better for flushing (see below quote from LeFred post about new default [1]), but this coming with the drawback of not caching reads might impact a lot of MySQL installations.

[1]: https://lefred.be/content/mysql-8-4-lts-new-production-ready-defaults-for-innodb/

> When supported, O_DIRECT has always been the preferred value and we recommended using it to bypass the filesystem cache to flush InnoDB changes to disk (for data files and log files).

Note that not caching reads with innodb-flush-method=O_DIRECT might be "working as intended", see below quotes from the documentation [2], the important part being "InnoDB uses O_DIRECT to open the data files".  But it is unclear from the documentation that this also applies for reading datafiles, and it is probably something that needs clarification.

[2]: https://dev.mysql.com/doc/refman/8.4/en/innodb-parameters.html#sysvar_innodb_flush_method

> O_DIRECT or 4: InnoDB uses O_DIRECT (or directio() on Solaris) to open the data files, and uses fsync() to flush both the data and log files. This option is available on some GNU/Linux versions, FreeBSD, and Solaris.

Penalizing reads for making writes faster is not a tradeoff that I would have imposed to all MySQL users (maybe it was a blindspot when changing the default, or maybe it was actually intended).  So maybe the default for innodb-flush-method should be rethought, even thought changing it is not a small thing as 8.4 is not supposed to get incompatible changes.

Also, unclear if it makes sense to have O_DIRECT for flushing and not have it for reading (or also have a parameter for reading), I will let you and other Oracle Engineers think about this.

In all cases, the impact remains: when upgrading from 8.0 to 8.4, someone who was leveraging the Linux Page Cache in 8.0 (not allocating RAM to the Buffer Pool) will see degraded performances after the upgrade, and either has to set innodb-flush-method to fsync or tune the Buffer Pool.

There might be another more convoluted impact when running MySQL in containers.  With a container with little RAM on a host with a lot of RAM, a data file might not fit in the memory allocated to the container, but might fit in the RAM of the host.  I do not know enough about how Linux manages the Page Cache with containers to predict the impact, and I do not have time to test.

###################
# With 8.4 default.

./use -N <<< "show global variables like 'innodb_flush_method'"
innodb_flush_method     O_DIRECT

[...same results as before...]

###################
# With 8.0 default.

echo "innodb-flush-method=fsync" >> my.sandbox.cnf
./stop; ./start

[...Generating load and Monitoring IOs (see How to repeat)...]

# Making sure nothing is in the page cache.
sudo bash -c "echo 3 > /proc/sys/vm/drop_caches"

# Without a cache.
Wed Mar 26 14:17:03 UTC 2025 11 8023.97 729.451
Wed Mar 26 14:17:05 UTC 2025 15 11568.4 771.23
Wed Mar 26 14:17:06 UTC 2025 12 8487.53 707.294

# Loading the table in cache.
Wed Mar 26 14:17:43 UTC 2025
4.63GiB 0:00:36 [ 128MiB/s]            

# With innodb-flush-method=fsync, we see the cache effect.
Wed Mar 26 14:18:28 UTC 2025 14 169.589 12.1135                                                                        
Wed Mar 26 14:18:29 UTC 2025 14 158.048 11.2891
Wed Mar 26 14:18:30 UTC 2025 15 153.433 10.2289
[28 Mar 7:44] MySQL Verification Team
Hello Jean-François,

Thank you for verifying and confirming that with innodb-flush-method=fsync you get the cache effect
back with 8.4.

Sincerely,
Umesh