Description:
When disks are overloaded, the CPU spends much of its time in IO wait state and very little progress is made. This can lead to timeout errors (266) that occur simply because so little work gets done. To raise the bar for reporting this error when acquiring the global schema lock, we wait until 5 such timeouts have occurred before reporting a deadlock.
When releasing the global schema lock, we report an error if the release fails, e.g. due to a timeout. However, the only thing closing the transaction does is release the lock, and that is guaranteed to happen. So there is no need to fail the metadata transaction because of this error; reporting a warning is sufficient.
How to repeat:
Run ./mtr --suite=ndb_ddl --parallel=12 on a machine with limited disk resources.
Suggested fix:
Introduce a timeout counter in gsl_lock_ext so that a minimum number of timeouts (e.g. 5) must occur before a timeout is converted into a deadlock error.
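A minimal C++ sketch of the suggested counter follows. It assumes a hypothetical `try_lock` callback that makes one attempt to take the global schema lock and returns 0 on success or an NDB error code otherwise. The names `acquire_gsl_with_timeout_counter`, `GslResult` and `kMaxGslTimeouts` are illustrative only and do not reflect the actual signature of gsl_lock_ext; a real implementation would probably also back off between attempts.

```cpp
#include <cassert>
#include <functional>

// Error codes used in this sketch: 266 is the timeout error named in the
// report; 0 means the attempt succeeded.
constexpr int kOk = 0;
constexpr int kTimeout = 266;

enum class GslResult { Acquired, Deadlock, OtherError };

// Hypothetical threshold: number of timeouts tolerated before the timeout
// is reported as a deadlock.
constexpr int kMaxGslTimeouts = 5;

// Retry policy sketch. 'try_lock' stands in for a single attempt to take the
// global schema lock (hypothetical callback, not the real NDB API).
GslResult acquire_gsl_with_timeout_counter(const std::function<int()>& try_lock,
                                           int max_timeouts = kMaxGslTimeouts) {
  assert(max_timeouts > 0);
  int timeouts = 0;
  for (;;) {
    const int err = try_lock();
    if (err == kOk) return GslResult::Acquired;
    // Errors other than a timeout are passed up unchanged.
    if (err != kTimeout) return GslResult::OtherError;
    // A timeout may only mean the node is starved for IO, so retry until
    // the counter reaches the threshold, then report a deadlock.
    if (++timeouts >= max_timeouts) return GslResult::Deadlock;
  }
}
```

With the default threshold, an attempt that times out three times and then succeeds returns `Acquired`, while a lock that keeps timing out is reported as `Deadlock` on the fifth timeout.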
If a timeout occurs when releasing the global schema lock, ignore the error and only report a warning about it; do not return a failure to the upper layers.
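The release path can be sketched the same way. This is a hypothetical wrapper, not the actual ndbcluster code: `release_lock` stands in for the release attempt and `warn` for whatever warning mechanism the server uses. Both are assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>

constexpr int kReleaseOk = 0;

// Hypothetical wrapper: any release error is downgraded to a warning,
// because closing the transaction releases the lock anyway.
int release_gsl_ignoring_errors(
    const std::function<int()>& release_lock,
    const std::function<void(const std::string&)>& warn) {
  assert(release_lock);  // a release callback must be supplied
  const int err = release_lock();
  if (err != kReleaseOk) {
    warn("Failed to release global schema lock, error: " + std::to_string(err));
  }
  // The caller always sees success; the failure is never propagated.
  return kReleaseOk;
}
```

In use, `warn` could be a lambda that writes to `std::cerr`. The metadata transaction then continues as if the release had succeeded.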