Description:
While doing some perf testing on 8028-8034 I have noticed that when performance schema is enabled, in some cases we are seeing up to 2x performance degradation between versions - especially when using stored procedures which execute multiple statements. To demonstrate the behavior I have put together the below sysbench test to mimic it using a while loop, the same can be reprod by just using multiple statements inside a proc.
Based on the lua script I seen the following p95 latency on each version with 10 concurrent threads:
Hardware: ec2 r6i.8xlarge 32 (vCPU/256 GiB)
Dominant perf top func:
- 8028: pfs_end_statement_v2
- 8030-8034: pfs_end_statement_vc
There were four different configs, since pfs_end_statement_vc was introduced after 8028 I will use 8028 as the baseline
pfs_off : performance schema disabled
+---------+---------------------+------------------------+--------------------+
| version | pfs_off p95 latency | Diff between releases: | Diff from baseline |
+---------+---------------------+------------------------+--------------------+
| 8028 | 0.24 | BASELINE | |
| 8030 | 0.24 | 0.00% | 0.00% |
| 8031 | 0.26 | 8.33% | 8.33% |
| 8032 | 0.26 | 0.00% | 8.33% |
| 8033 | 0.28 | 7.69% | 16.67% |
| 8034 | 0.26 | -7.14% | 8.33% |
+---------+---------------------+------------------------+--------------------+
pfs_on_default(MySQL default) : default with minimal instrumentation enabled
+---------+--------------------------------------------+------------------------+--------------------+
| version | pfs_on_default(MySQL default) p95 latency | Diff between releases: | Diff from baseline |
+---------+--------------------------------------------+------------------------+--------------------+
| 8028 | 0.52 | BASELINE | |
| 8030 | 0.56 | 7.69% | 7.69% |
| 8031 | 0.56 | 0.00% | 7.69% |
| 8032 | 0.57 | 1.79% | 9.62% |
| 8033 | 0.8 | 40.35% | 53.85% |
| 8034 | 0.8 | 0.00% | 53.85% |
+---------+--------------------------------------------+------------------------+--------------------+
pfs_on_waits : Perf schema enabled with additional instrumentation
+---------+-----------------------------------------+------------------------+--------------------+
| version | pfs_on_waits(See [1] below) p95 latency | Diff between releases: | Diff from baseline |
+---------+-----------------------------------------+------------------------+--------------------+
| 8028 | 0.53 | BASELINE | |
| 8030 | 0.6 | 13.21% | 13.21% |
| 8031 | 0.61 | 1.67% | 15.09% |
| 8032 | 0.57 | -6.56% | 7.55% |
| 8033 | 0.81 | 42.11% | 52.83% |
| 8034 | 0.89 | 9.88% | 67.92% |
+---------+-----------------------------------------+------------------------+--------------------+
pfs_on_waits_statements_off : Perf schema enabled with additional instrumentation except I disabled events_statements_current consumer
+---------+--------------------------------------------------------+------------------------+--------------------+
| version | pfs_on_waits_statements_off(see [2] below) p95 latency | Diff between releases: | Diff from baseline |
+---------+--------------------------------------------------------+------------------------+--------------------+
| 8028 | 0.34 | BASELINE | |
| 8030 | 0.35 | 2.94% | 2.94% |
| 8031 | 0.36 | 2.86% | 5.88% |
| 8032 | 0.35 | -2.78% | 2.94% |
| 8033 | 0.39 | 11.43% | 14.71% |
| 8034 | 0.37 | -5.13% | 8.82% |
+---------+--------------------------------------------------------+------------------------+--------------------+
[1]
performance-schema-consumer-global-instrumentation='ON'
performance-schema-consumer-thread-instrumentation='ON'
performance-schema-consumer-events-waits-current='ON'
performance-schema-instrument='wait/%=ON'
performance_schema='1'
[2]
Same as [1] except events_waits_current is disabled
update performance_schema.setup_consumers set enabled='no' where name='events_statements_current';
How to repeat:
1. Load this stored proc
```
drop procedure if exists test.loop_test;
delimiter //
create procedure test.loop_test(d_d_id int, loop_counter int )
BEGIN
DECLARE d_d_id INTEGER;
DECLARE loop_counter INT;
DECLARE id INT;
SET loop_counter = 1;
WHILE loop_counter <= 50 DO
SET d_d_id = loop_counter;
select d_d_id into id;
COMMIT;
set loop_counter = loop_counter + 1;
END WHILE;
select id;
END//
delimiter ;
```
2. Run the following sysbench lua script on 10 threads against each version/combination
```
$ cat /usr/local/share/sysbench/loop.lua
#!/usr/bin/env sysbench
require("oltp_common")
function prepare_statements()
-- We do not use prepared statements here, but oltp_common.sh expects this
-- function to be defined
end
function event()
local rs = con:query([[CALL loop_test(1,1)]])
while rs ~= nil do
rs = con:next_result()
end
check_reconnect()
end
```
for i in 28 30 31 32 33 34;
do
sysbench /usr/local/share/sysbench/loop.lua \
--db-driver=mysql --mysql-user=msandbox \
--mysql-password=msandbox --mysql-db=test \
--threads=10 --report-interval=10 --time=180 --skip-trx \
--mysql-socket=/tmp/mysql_sandbox80${i}.sock --mysql-port=80${i} \
run;
done
3. Observe differences in througput and latency between minor releases.
Suggested fix:
It seems that these initial perf differences came in with the introduction of the pfs_end_statement_vc in place of pfs_end_statement_v2 around 8029/8030 and seems related to how stats in events waits current are updated(sub_stmt_stat->m*).
In addition, the increased overhead in 8033 onwards seems concening, there is a 42.11% increase in latency from 8032-->8034 and taking all into account a 67.92% increase in latency between 8028 and 8034 for the pfs_on_waits scenario.