MySQL Bugs: #116978: Global schema lock deadlock avoidance

Bug #116978	Global schema lock deadlock avoidance
Submitted:	14 Dec 2024 11:56	Modified:	14 Dec 2024 12:44
Reporter:	Mikael Ronström
Status:	Verified
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	8.0.34	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
When disks are overloaded, the CPU is put into IO wait state and very little progress is made. This can lead to timeout errors (266) that are caused by slow progress rather than by a real deadlock. To raise the bar for getting this error when acquiring the global schema lock, we wait until 5 such timeouts have occurred before we report a deadlock situation.

When releasing the global schema lock, we report an error if the release failed, e.g. due to a timeout. The only thing the close of the transaction does is release the lock, and this will happen for sure. So there is no need to fail the metadata transaction due to this error, but it is OK to report a warning.

How to repeat:
Run ./mtr --suite=ndb_ddl --parallel=12 on a machine with limited disk resources.

Suggested fix:
Introduce a timeout counter in gsl_lock_ext such that a minimum number of timeouts (e.g. 5) must happen before a timeout is converted into a deadlock.
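
A minimal sketch of the counter idea, not the actual gsl_lock_ext code. The names LockResult, GslOutcome and acquire_with_timeout_budget are hypothetical, and the try_lock callback stands in for one lock attempt against NDB that may fail with the timeout error (266):

```cpp
#include <cassert>
#include <functional>

// Hypothetical result of a single lock attempt (illustration only).
enum class LockResult { Acquired, Timeout, OtherError };

// Hypothetical outcome reported to the caller.
enum class GslOutcome { Locked, Deadlock, Failed };

// Retry the lock attempt and convert timeouts into a deadlock only after
// max_timeouts of them have been seen. The default of 5 follows the
// example value in the report.
GslOutcome acquire_with_timeout_budget(
    const std::function<LockResult()> &try_lock, int max_timeouts = 5) {
  assert(max_timeouts > 0);
  int timeouts = 0;
  for (;;) {
    switch (try_lock()) {
      case LockResult::Acquired:
        return GslOutcome::Locked;
      case LockResult::Timeout:
        // A single timeout may just mean slow disks, so keep retrying
        // until the budget is spent.
        if (++timeouts >= max_timeouts) return GslOutcome::Deadlock;
        break;
      case LockResult::OtherError:
        return GslOutcome::Failed;
    }
  }
}
```

With this shape, a lock attempt that times out four times and then succeeds still returns Locked; only the fifth timeout is reported as a deadlock.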

If a timeout happens when releasing the global schema lock, ignore the error and only report a warning about it. Don't return a failure to the upper layers.
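
The release side could look roughly like the sketch below. ReleaseResult, WarningSink and release_gsl are hypothetical names used only for illustration, and the sink stands in for whatever warning mechanism the server uses. Whether errors other than the timeout should also be downgraded is left open; this sketch assumes they still fail:

```cpp
#include <string>
#include <vector>

// Error code for the timeout mentioned in the report.
constexpr int kTimeoutError = 266;

// Hypothetical outcome of closing the transaction that holds the lock.
struct ReleaseResult {
  bool ok;
  int error_code;
};

// Hypothetical stand-in for the server's warning reporting.
struct WarningSink {
  std::vector<std::string> messages;
  void warn(const std::string &msg) { messages.push_back(msg); }
};

// Returns true when the caller may treat the release as successful.
// A timeout is downgraded to a warning because closing the transaction
// releases the lock regardless.
bool release_gsl(const ReleaseResult &res, WarningSink &sink) {
  if (res.ok) return true;
  if (res.error_code == kTimeoutError) {
    sink.warn("Timeout while releasing global schema lock, error " +
              std::to_string(res.error_code));
    return true;  // do not fail the metadata transaction
  }
  return false;  // assumption: other errors are still reported as failures
}
```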

Hello Mikael,

Thank you for the report and feedback.

Sincerely,
Umesh