Bug #112621 Performance degradation from 8.0.30 onwards related to performance_schema
Submitted: 4 Oct 2023 16:39 Modified: 20 Nov 2023 15:30
Reporter: Marc Reilly Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Performance Schema Severity:S5 (Performance)
Version:8.0.34 OS:Any
Assigned to: CPU Architecture:Any
Tags: performance, performance_schema, stored routine

[4 Oct 2023 16:39] Marc Reilly
Description:
While doing some perf testing on 8028-8034 I have noticed that when performance schema is enabled, in some cases we are seeing up to 2x performance degradation between versions -  especially when using stored procedures which execute multiple statements. To demonstrate the behavior I have put together the below sysbench test to mimic it using a while loop, the same can be reprod by just using multiple statements inside a proc.

Based on the lua script I seen the following p95 latency on each version with 10 concurrent threads:
Hardware: ec2 r6i.8xlarge 32 (vCPU/256 GiB)
Dominant perf top func:
- 8028: pfs_end_statement_v2
- 8030-8034: pfs_end_statement_vc

There were four different configs, since pfs_end_statement_vc was introduced after 8028 I will use 8028 as the baseline

pfs_off : performance schema disabled
+---------+---------------------+------------------------+--------------------+
| version | pfs_off p95 latency | Diff between releases: | Diff from baseline |
+---------+---------------------+------------------------+--------------------+
|    8028 |                0.24 | BASELINE               |                    |
|    8030 |                0.24 | 0.00%                  | 0.00%              |
|    8031 |                0.26 | 8.33%                  | 8.33%              |
|    8032 |                0.26 | 0.00%                  | 8.33%              |
|    8033 |                0.28 | 7.69%                  | 16.67%             |
|    8034 |                0.26 | -7.14%                 | 8.33%              |
+---------+---------------------+------------------------+--------------------+

pfs_on_default(MySQL default) : default with minimal instrumentation enabled
+---------+--------------------------------------------+------------------------+--------------------+
| version | pfs_on_default(MySQL default)  p95 latency | Diff between releases: | Diff from baseline |
+---------+--------------------------------------------+------------------------+--------------------+
|    8028 |                                       0.52 | BASELINE               |                    |
|    8030 |                                       0.56 | 7.69%                  | 7.69%              |
|    8031 |                                       0.56 | 0.00%                  | 7.69%              |
|    8032 |                                       0.57 | 1.79%                  | 9.62%              |
|    8033 |                                        0.8 | 40.35%                 | 53.85%             |
|    8034 |                                        0.8 | 0.00%                  | 53.85%             |
+---------+--------------------------------------------+------------------------+--------------------+

pfs_on_waits : Perf schema enabled with additional instrumentation
+---------+-----------------------------------------+------------------------+--------------------+
| version | pfs_on_waits(See [1] below) p95 latency | Diff between releases: | Diff from baseline |
+---------+-----------------------------------------+------------------------+--------------------+
|    8028 |                                    0.53 | BASELINE               |                    |
|    8030 |                                     0.6 | 13.21%                 | 13.21%             |
|    8031 |                                    0.61 | 1.67%                  | 15.09%             |
|    8032 |                                    0.57 | -6.56%                 | 7.55%              |
|    8033 |                                    0.81 | 42.11%                 | 52.83%             |
|    8034 |                                    0.89 | 9.88%                  | 67.92%             |
+---------+-----------------------------------------+------------------------+--------------------+

pfs_on_waits_statements_off :  Perf schema enabled with additional instrumentation except I disabled events_statements_current consumer
+---------+--------------------------------------------------------+------------------------+--------------------+
| version | pfs_on_waits_statements_off(see [2] below) p95 latency | Diff between releases: | Diff from baseline |
+---------+--------------------------------------------------------+------------------------+--------------------+
|    8028 |                                                   0.34 | BASELINE               |                    |
|    8030 |                                                   0.35 | 2.94%                  | 2.94%              |
|    8031 |                                                   0.36 | 2.86%                  | 5.88%              |
|    8032 |                                                   0.35 | -2.78%                 | 2.94%              |
|    8033 |                                                   0.39 | 11.43%                 | 14.71%             |
|    8034 |                                                   0.37 | -5.13%                 | 8.82%              |
+---------+--------------------------------------------------------+------------------------+--------------------+

[1]
performance-schema-consumer-global-instrumentation='ON'
performance-schema-consumer-thread-instrumentation='ON'
performance-schema-consumer-events-waits-current='ON'
performance-schema-instrument='wait/%=ON'
performance_schema='1'

[2]
Same as [1] except events_waits_current is disabled
update performance_schema.setup_consumers set enabled='no' where name='events_statements_current';

How to repeat:
1. Load this stored proc
```
drop procedure if exists test.loop_test;
delimiter //
create procedure test.loop_test(d_d_id int, loop_counter int )
BEGIN
DECLARE d_d_id	    	INTEGER;
DECLARE loop_counter  	INT;
DECLARE id  	INT;
SET loop_counter = 1;
WHILE loop_counter <= 50 DO
SET d_d_id = loop_counter;
select d_d_id into id;
COMMIT;
set loop_counter = loop_counter + 1;
END WHILE;
select id;
END//
delimiter ;
```

2. Run the following sysbench lua script on 10 threads against each version/combination
```
$ cat /usr/local/share/sysbench/loop.lua
#!/usr/bin/env sysbench
require("oltp_common")
function prepare_statements()
   -- We do not use prepared statements here, but oltp_common.sh expects this
   -- function to be defined
end

function event()
   local rs = con:query([[CALL loop_test(1,1)]])
   while rs ~= nil do
      rs = con:next_result()
   end
   check_reconnect()
end
```
for i in 28 30 31 32 33 34;
do 
sysbench /usr/local/share/sysbench/loop.lua \
--db-driver=mysql --mysql-user=msandbox \
--mysql-password=msandbox --mysql-db=test \
--threads=10 --report-interval=10 --time=180 --skip-trx \
--mysql-socket=/tmp/mysql_sandbox80${i}.sock --mysql-port=80${i} \
run;
done

3. Observe differences in througput and latency between minor releases.

Suggested fix:
It seems that these initial perf differences came in with the introduction of the pfs_end_statement_vc in place of pfs_end_statement_v2 around 8029/8030 and seems related to how stats in events waits current are updated(sub_stmt_stat->m*). 

In addition, the increased overhead in 8033 onwards seems concening, there is a 42.11% increase in latency from 8032-->8034 and taking all into account a 67.92% increase in latency between 8028 and 8034 for the pfs_on_waits scenario.
[4 Oct 2023 16:41] Marc Reilly
updated affected versions
[17 Oct 2023 6:39] MySQL Verification Team
Hello Marc Reilly,

Thank you for the report and test case.
Verified as described.

regards,
Umesh
[20 Nov 2023 15:30] Edward Gilmore
Posted by developer:
 
Added the following note to the MySQL Server 8.0.36 and 8.3.0 release notes:
	
The performance of the Performance Schema statement instrumentation has been improved.
Specifically, collecting MESSAGE_TEXT data is now more efficient.