MySQL Bugs: #117533: The performance is poor when concurrently reading undo records

Bug #117533	The performance is poor when concurrently reading undo records
Submitted:	20 Feb 13:37	Modified:	27 Mar 8:31
Reporter:	Ruyi Zhang (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S5 (Performance)
Version:	8.0.41	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	fil_system

Description:
Recently, our team has found that when there are high - concurrent reads (using S latch) on pages in the undo space (there are two undo spaces by default), the page reading efficiency of MySQL will be significantly reduced. 

We may not be able to observe this phenomenon in general table access, but it is very easy to notice it when reading undo records concurrently. 

Although multiple read sessions read undo pages through S latch, the call of `fil_space_get_page_size` (specifically, `Fil_shard::mutex_acquire`) in `trx_undo_get_undo_rec_low` prevents parallelism among the read sessions.

How to repeat:
`cat scan_old_recs.lua`

```
require("oltp_common")

sysbench.cmdline.commands.prepare = {
   function ()
      if (not sysbench.opt.auto_inc) then
         -- Create empty tables on prepare when --auto-inc is off, since IDs
         -- generated on prepare may collide later with values generated by
         -- sysbench.rand.unique()
         sysbench.opt.table_size=0
      end

      cmd_prepare()
      con:query("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;")
      con:query("begin")
   end,
   sysbench.cmdline.PARALLEL_COMMAND
}

function prepare_statements()
   -- We do not use prepared statements here, but oltp_common.sh expects this
   -- function to be defined
end

function event()
   local table_name = "sbtest" .. sysbench.rand.uniform(1, sysbench.opt.tables)
   con:query("set transaction_isolation='REPEATABLE-READ';")
   con:query("set autocommit=off")

   con:query(string.format("select sum(id) from %s force index(primary) limit 100", table_name))

end
```

Start the read load `scan_old_recs`. Subsequently, if there is no write operation on the sbtest1 table, `scan_old_recs` will not read the undo - rec.

```
/usr/bin/sysbench scan_old_recs --mysql-socket=/tmp/mysql.sock  --mysql-user=root --mysql-db=test --tables=1 --table-size=10000 --report-interval=1 --percentile=99 --max-time=3000000 --threads=32 --rand-type=uniform prepare/run
```

Update the sbtest1 table so that `scan_old_recs` reads the undo-rec when scanning records.

```
update sbtest1 set k=k+1;
Query OK, 10000 rows affected (0.39 sec)
Rows matched: 10000  Changed: 10000  Warnings: 0
```

The test results show that after the sbtest1 table is updated, the performance of the `scan_old_recs` read load is significantly reduced.

```
[ 1s ] thds: 32 tps: 10881.48 qps: 32708.32 (r/w/o: 10881.48/0.00/21826.84) lat (ms,99%): 4.57 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 32 tps: 12010.28 qps: 36029.84 (r/w/o: 12010.28/0.00/24019.56) lat (ms,99%): 4.74 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 32 tps: 12504.17 qps: 37511.51 (r/w/o: 12504.17/0.00/25007.34) lat (ms,99%): 4.74 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 32 tps: 13726.21 qps: 41180.62 (r/w/o: 13726.21/0.00/27454.41) lat (ms,99%): 4.65 err/s: 0.00 reconn/s: 0.00
[ 5s ] thds: 32 tps: 14084.23 qps: 42251.70 (r/w/o: 14084.23/0.00/28167.47) lat (ms,99%): 4.57 err/s: 0.00 reconn/s: 0.00
[ 6s ] thds: 32 tps: 14316.86 qps: 42947.59 (r/w/o: 14316.86/0.00/28630.73) lat (ms,99%): 4.49 err/s: 0.00 reconn/s: 0.00
[ 7s ] thds: 32 tps: 14394.25 qps: 43186.74 (r/w/o: 14394.25/0.00/28792.50) lat (ms,99%): 4.49 err/s: 0.00 reconn/s: 0.00
[ 8s ] thds: 32 tps: 3562.11 qps: 10686.32 (r/w/o: 3562.11/0.00/7124.21) lat (ms,99%): 183.21 err/s: 0.00 reconn/s: 0.00
[ 9s ] thds: 32 tps: 192.00 qps: 576.00 (r/w/o: 192.00/0.00/384.00) lat (ms,99%): 189.93 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 32 tps: 184.00 qps: 551.99 (r/w/o: 184.00/0.00/367.99) lat (ms,99%): 189.93 err/s: 0.00 reconn/s: 0.00
[ 11s ] thds: 32 tps: 182.00 qps: 546.01 (r/w/o: 182.00/0.00/364.01) lat (ms,99%): 189.93 err/s: 0.00 reconn/s: 0.00
[ 12s ] thds: 32 tps: 182.00 qps: 546.00 (r/w/o: 182.00/0.00/364.00) lat (ms,99%): 186.54 err/s: 0.00 reconn/s: 0.00
[ 13s ] thds: 32 tps: 184.99 qps: 554.98 (r/w/o: 184.99/0.00/369.99) lat (ms,99%): 183.21 err/s: 0.00 reconn/s: 0.00
[ 14s ] thds: 32 tps: 184.01 qps: 552.03 (r/w/o: 184.01/0.00/368.02) lat (ms,99%): 189.93 err/s: 0.00 reconn/s: 0.00
```

Suggested fix:
In `buf_page_init_for_read`, `buf_block_t` is also associated with `fil_space_t` under the protection of `Fil_shard::mutex_acquire`(There may also be concurrent calls on other paths.
). 

Therefore, perhaps `Fil_shard::m_mutex` should be replaced with `rw_lock_t` to improve the concurrency among read sessions.

I tried to remove the call to `fil_space_get_page_size` in `trx_undo_get_undo_rec_low`, but the TPS only increased from over 100 to over 300. The locking and unlocking of `purge_sys->latch` in `trx_undo_get_undo_rec` seems to have become the new bottleneck.

Hello Ruyi Zhang,

Thank you for the report and feedback.
Verified as described.

regards,
Umesh