Bug #116978 Global schema lock deadlock avoidance
Submitted: 14 Dec 2024 11:56
Modified: 14 Dec 2024 12:44
Reporter: Mikael Ronström
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 8.0.34
OS: Any
Assigned to:
CPU Architecture: Any

[14 Dec 2024 11:56] Mikael Ronström
Description:
When disks are overloaded, the CPU spends most of its time in I/O wait and very little progress is made. This can produce timeout errors (266) simply because of the slow progress, not because of a real deadlock. To raise the bar for getting this error when acquiring the global schema lock, we wait until 5 such timeouts have occurred before we report a deadlock situation.

When releasing the global schema lock, we report an error if the release failed, e.g. due to a timeout. The only thing closing the transaction does is release the lock, and that will happen for sure. So there is no need to fail the metadata transaction due to this error; it is enough to report a warning.

How to repeat:
Run ./mtr --suite=ndb_ddl --parallel=12 on a machine with limited disk resources.

Suggested fix:
Introduce a timeout counter in gsl_lock_ext so that at least e.g. 5 timeouts must occur before a timeout is converted into a deadlock.
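
A minimal sketch of the counter idea, not the actual gsl_lock_ext code. The LockResult enum, the acquire_gsl_with_timeout_budget function, and the try_lock_once callback are hypothetical names used only for illustration; the threshold of 5 is the "e.g. 5" from this report.

```cpp
#include <functional>

// Hypothetical status codes for this sketch; not the real NDB API types.
enum class LockResult { Acquired, Timeout, Deadlock, OtherError };

// Assumed threshold, taken from the "e.g. 5" in the report.
constexpr int kMaxGslTimeouts = 5;

// try_lock_once stands in for one attempt to take the global schema lock.
// It is assumed to return Timeout when the attempt fails with error 266.
LockResult acquire_gsl_with_timeout_budget(
    const std::function<LockResult()>& try_lock_once,
    int max_timeouts = kMaxGslTimeouts) {
  int timeouts = 0;
  for (;;) {
    const LockResult r = try_lock_once();
    if (r != LockResult::Timeout) {
      return r;  // success or a non-timeout error: pass through unchanged
    }
    if (++timeouts >= max_timeouts) {
      return LockResult::Deadlock;  // only now convert timeouts to deadlock
    }
    // Otherwise retry. Real code would probably back off before retrying.
  }
}
```

With this shape, four consecutive timeouts followed by a successful attempt still acquire the lock; only the fifth consecutive timeout is reported as a deadlock.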

If a timeout happens when releasing the global schema lock, ignore the error and only report a warning about it; don't return a failure to the upper layers.
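
The release side could look roughly like the sketch below. ReleaseResult, release_gsl_tolerant, and the warnings vector are hypothetical stand-ins for whatever the real code uses to close the transaction and log warnings. Letting non-timeout errors still propagate is an assumption of this sketch, based on the wording "if a timeout happens".

```cpp
#include <string>
#include <vector>

// Hypothetical outcome of closing the transaction that holds the lock.
struct ReleaseResult {
  bool ok;
  int error_code;  // 266 is the timeout error mentioned in this report
};

constexpr int kNdbTimeoutError = 266;

// Returns true when the upper layers should treat the release as successful.
// A timeout is downgraded to a warning; other errors are still reported.
bool release_gsl_tolerant(const ReleaseResult& res,
                          std::vector<std::string>& warnings) {
  if (res.ok) {
    return true;
  }
  if (res.error_code == kNdbTimeoutError) {
    warnings.push_back("Timeout (error " + std::to_string(res.error_code) +
                       ") when releasing global schema lock, ignored");
    return true;  // the lock is released when the transaction closes anyway
  }
  return false;  // assumption: non-timeout errors keep their old behaviour
}
```
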
[14 Dec 2024 12:44] MySQL Verification Team
Hello Mikael,

Thank you for the report and feedback.

Sincerely,
Umesh