Bug #113505 Performance schema counter increments make parallel index create slow
Submitted: 24 Dec 2023 18:23 Modified: 11 Jan 18:42
Reporter: Mark Callaghan Email Updates:
Status: Verified Impact on me:
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version:8.0.35 OS:Any
Assigned to: CPU Architecture:Any

[24 Dec 2023 18:23] Mark Callaghan
On small and big servers InnoDB parallel index create is much faster with performance_schema =0 than with =1. The problem is memory system contention from this line of code that all of the parallel threads are incrementing:

By "small server" I mean an 8-core mini-PC and index create is ~1.5X faster with perf_schema=0.

By "big server" I mean a 2-socket, 12-core server and index create is up to 2.4X faster with perf_schema=0.

I confirmed this by repeating index creation using MySQL 8.0.35 as-is and then repeating it using a build with that line of code removed. Tests were done for perf_schema =0 and =1

How to repeat:
Run the insert benchmark using 1 client session and 80M rows in one table.
There is a benchmark step that creates 3 secondary indexes on the table using one ALTER statement.

I am sure you can do this without the Insert Benchmark, but if you want to use it then get it from https://github.com/mdcallag/mytools/blob/master/bench/ibench/iibench.py

At this git hash 17ae9bb8fb58304

# loads the table with a PK index but no secondary indexes
python3 iibench.py --dbms=mysql --db_name=ib --secs_per_report=1 --db_host= --db_user=root --db_password=pw --engine=innodb --engine_options= --unique_checks=1 --bulk_load=0 --max_rows=80000000 --table_name=pi1 --setup --num_secondary_indexes=0 --data_length_min=10 --data_length_max=20 --rows_per_commit=100 --inserts_per_second=0 --query_threads=0 --seed=1703371433 --dbopt=none --my_id=1

# creates the secondary indexes
python3 iibench.py --dbms=mysql --db_name=ib --secs_per_report=1 --db_host= --db_user=root --db_password=pw --engine=innodb --engine_options= --unique_checks=1 --bulk_load=0 --secondary_at_end --max_rows=100 --table_name=pi1 --num_secondary_indexes=3 --data_length_min=10 --data_length_max=20 --rows_per_commit=100 --inserts_per_second=0 --query_threads=0 --seed=1703372349 --dbopt=none --my_id=1

Suggested fix:
Don't increment that counter so often or use thread-local counters.
[24 Dec 2023 18:39] Mark Callaghan
I wrote more about it here
[24 Dec 2023 19:36] Mark Callaghan
I also tried removing the calls to inc() in ddl0par-scan.cc but that didn't make a big difference.
[24 Dec 2023 19:38] Mark Callaghan
I wonder whether the n_rows counters here should be split across multiple cache lines to avoid cache line conflicts, but my hacky attempt didn't show a benefit.

[27 Dec 2023 5:26] MySQL Verification Team
Hello Mark,

Thank you for the report and feedback.
I'll try to reproduce at my end and would get back to you if anything further needed on this.

[8 Jan 8:04] MySQL Verification Team
Thank you, Mark for the detailed explanation at https://smalldatum.blogspot.com/2023/12/create-innodb-indexes-2x-faster-with.html

Attempted only 3 times, didn't notice even once fixed build is 2x faster but surely much better than "as is" build(most likely due to diff in h/w used, cmake options used for the source build etc.). For now verifying this report but if possible could you please share the exact cmake options used for the build then I'll give it a try once again(will see if I get even bigger box to reproduce). If you think I'm missing anything else then kindly suggest and will re-attempt. Thank you.

Case: When performance_schema=1 then the fixed build is ~2X faster than the as-is build

Conf	      As Is build   MarkC fixed build
ddl1		1366.4 secs   1192.4 secs
ddl4		1069.7 secs   928.4 secs
ddl8		1112.4 secs   995.5 secs
ddl16		1276.3 secs   1201.6 secs

ddl1		1390.4 secs   1210.4 secs
ddl4		1250.5 secs   1128.1 secs
ddl8		1162.4 secs   1050.5 secs
ddl16		1176.3 secs   1130.2 secs

ddl1		1200.1 secs   1082.4 secs
ddl4		1069.5 secs   978.4 secs
ddl8		1212.2 secs   1095.5 secs
ddl16		1276.6 secs   1201.6 secs 

Case: When performance_schema=0 then there is no difference between the as-is and fixed builds. Didn't observe any noticeable drop with or without fix(including p_s 0).

[8 Jan 8:05] MySQL Verification Team
Test results - 8.0.35

Attachment: 113505_8.0.35_AsIs_VS_MarkCFix.log (application/octet-stream, text), 48.40 KiB.

[9 Jan 15:05] Mark Callaghan
What size server do you use -- number of cores (not HW threads), memory, etc?

CMake script was:

BF=" -g1 "
CF=" $BF "
CXXF=" $BF "

cmake .. \
  -DWITH_SSL=system \
  -DWITH_ZLIB=bundled \
  -DWITH_BOOST=$PWD/../boost \
[10 Jan 6:57] MySQL Verification Team
Thank you for the cmake details.
I'll try to attempt once again and update you on the same instance/bigger(bug# is already verified).

>>What size server do you use -- number of cores (not HW threads), memory, etc?
This is one of our test server having 120GB+ RAM and 16 cores(have included environment details in the Attachment: 113505_8.0.35_AsIs_VS_MarkCFix.log).  

[11 Jan 13:40] MySQL Verification Team
Hello Mark,

Thank you once again for the cmake details. On a larger box(250G+ RAM, 224 CPU(s) aarch64 instance - more details are in the test results file which I'll attach later), re-verified and confirmed that fixed build is >2X faster than the as-is build:

Conf	As-is build	Source build with MarkC fix
ddl1	1220.5 secs	      1037.1 secs
ddl4	486.9 secs	      328.7 secs
ddl8	380.5 secs	      121.3 secs
ddl16	314.9 secs	      151.2 secs

Internally, this was concluded to be "Innodb" specific issue and thus re-categorized hence changing category here to reflect the same. Thank you.

[11 Jan 18:42] Mark Callaghan
Thank you for doing a great job on the verification.
[12 Jan 9:30] MySQL Verification Team
Re-test on huge test box instance results - 8.0.35

Attachment: 113505_8.0.35_AsIs_VS_MarkCFix_Aarch64.log (application/octet-stream, text), 48.05 KiB.