Bug #93365 Query on performance_schema.data_locks causes replication issues
Submitted: 27 Nov 2018 14:39 Modified: 7 May 9:07
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Performance Schema Severity:S2 (Serious)
Version:8.0.12 OS:Any
Assigned to: CPU Architecture:Any
Tags: replication

[27 Nov 2018 14:39] Daniël van Eeden
Description:
Situation:
- SQL Thread: A big Delete_rows event
- Some monitoring colletor: SELECT * FROM performance_schema.data_locks

Result: 
Last_SQL_Error: Slave SQL thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.

2018-11-27T08:42:08.710441Z 5 [Warning] [MY-010584] [Repl] Slave SQL for channel '': Could not execute Delete_rows event on table foo.bar; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log binlog.208550, end_log_pos 54871005, Error_code: MY-001205

Adding a MAX_EXECUTION_TIME query hint did help a bit, but removing the query on data_locks fixed the issue.

The query on data_locks also seemed to block 'show processlist' amongst others

How to repeat:
Query P_S data_locks every minute
Run a multi gigabyte Delete_rows event

Expected result: Replication continues and P_S query returns
Actual result: Replication retries on HA_ERR_LOCK_WAIT_TIMEOUT and dies after a number of retries.

Suggested fix:
Make P_S data_locks not block other processes.

Note that on 5.7 we used INFORMATION_SCHEMA.INNODB_LOCKS instead. This didn't seem to cause the same issues (even on 8.0.x)
[3 Dec 2018 14:21] Sinisa Milivojevic
Hi,

Thank you for your bug report.

Yes, it is true that querying  P_S on data_locks will set locks that will prevent many operations, including the ones that you have mentioned.

Hence, this is a designed behaviour. However, if you wish, I can promote this bug report into the feature request that will split some of those locks into few more with more limited range of queries that would be affected.

What is your opinion on this proposition ???

Many thanks in advance.
[3 Dec 2018 18:58] Daniël van Eeden
Splitting this might help a bit. However I think the InnoDB specific tables this replaces didn't have this behaviour. Maybe the InnoDB specific tables should be brought back?

I think documentation should be updated with a clear warning about the implications of querying this table.
[6 Dec 2018 13:45] Sinisa Milivojevic
Hi Daniel,

I am in full agreement with you and I am verifying this as a code bug.

After a code bug is resolved, documentation has to be updated as well.

Thank you for your contribution.
[14 Jan 20:52] Sveta Smirnova
Sinisa,

Performance Schema is always advertised as a schema with fewer locks than Information Schema, not as one having more locks. Monitoring query which causes stalls of the useful job makes no sense.
[15 Jan 12:53] Sinisa Milivojevic
I agree that this change in behaviour goes against proclaimed objectives, so I am raising the Severity.
[7 May 9:07] Erlend Dahl
Despite our best efforts, we are unable to reproduce this problem.