Bug #77886 User locks slow down after some time in heavy contention
Submitted: 30 Jul 2015 13:05
Modified: 30 Aug 2015 15:08
Reporter: Ben Clewett
Status: No Feedback
Impact on me: None
Category: MySQL Server: Locking
Severity: S4 (Feature request)
Version: 5.6.24
OS: Any (5.6.24-ndb-7.4.6-cluster-gpl-log)
Assigned to:
CPU Architecture: Any

[30 Jul 2015 13:05] Ben Clewett
Description:
Sorry this is a bit vague, but it's been consistent over several recent versions.  Nobody else seems to have reported this, so I thought I should.

In a production system we have 30 or so MySQL connections trying to get and release a named lock, in order to implement a site-wide mutex.  After some time this procedure will slow to such an extent that our system becomes unusable.

This can be fixed by an unrelated OPTIMIZE TABLE statement.

I am sorry I cannot tie this down further at this time.

How to repeat:
Have 30 or so connections requesting the same named lock using:

SELECT GET_LOCK('ws_queue_lock',2);
-- Sleep for about 0.1 seconds
SELECT RELEASE_LOCK('ws_queue_lock');
-- Sleep for about 5 seconds

On a continuous loop.
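
For reference, the load above can be generated with a small driver along the following lines (a minimal sketch only, assuming the mysql-connector-python package and placeholder connection details, not our production code):

import threading
import time

import mysql.connector  # assumes mysql-connector-python is installed

CONFIG = {  # placeholder credentials
    "host": "127.0.0.1",
    "user": "test",
    "password": "test",
    "database": "test",
}

def worker():
    conn = mysql.connector.connect(**CONFIG)
    cur = conn.cursor()
    while True:
        cur.execute("SELECT GET_LOCK('ws_queue_lock', 2)")
        cur.fetchall()
        time.sleep(0.1)   # hold the lock for about 0.1 seconds
        cur.execute("SELECT RELEASE_LOCK('ws_queue_lock')")
        cur.fetchall()
        time.sleep(5)     # idle for about 5 seconds before the next attempt

# 30 or so connections looping continuously
for _ in range(30):
    threading.Thread(target=worker, daemon=True).start()

while True:
    time.sleep(60)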

Normally this happens so fast that the PROCESSLIST shows only one or zero GET_LOCK entries.

After some hours/days/weeks, the system will enter a 'slow' mode where locks are granted at a greatly delayed rate.  The PROCESSLIST will now show 10 or so connections queueing for a lock.  This makes the system unusable.

A PROCESSLIST entry looks like:

     ID: 618753094
   USER: ***
   HOST: ***
     DB: ***
COMMAND: Query
   TIME: 0
  STATE: User lock
   INFO: SELECT GET_LOCK('ws_queue_lock',2)

Stopping, waiting, and re-starting all connections will not fix this issue.

However, an OPTIMIZE TABLE statement on some random, unrelated InnoDB table will fix this issue immediately, and the system will return to normal for another few hours/days/weeks.

(This table is DDL'd by the connections but is otherwise unrelated, and can be one of several tables; possibly any table in any engine.)
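
For what it's worth, the workaround can be scripted along these lines (again only a sketch, assuming mysql-connector-python and the PROCESS privilege; 'unrelated_table' is a placeholder for one of the tables we actually use):

import time

import mysql.connector  # assumes mysql-connector-python is installed

conn = mysql.connector.connect(host="127.0.0.1", user="root", password="test")  # placeholders
cur = conn.cursor()
while True:
    # Count connections currently queueing on GET_LOCK (STATE = 'User lock').
    cur.execute(
        "SELECT COUNT(*) FROM INFORMATION_SCHEMA.PROCESSLIST WHERE STATE = 'User lock'"
    )
    (queued,) = cur.fetchone()
    if queued >= 10:
        # 'slow' mode detected: an OPTIMIZE TABLE on an unrelated table clears it.
        cur.execute("OPTIMIZE TABLE test.unrelated_table")
        cur.fetchall()  # OPTIMIZE TABLE returns a result set
    time.sleep(30)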
[30 Jul 2015 14:01] MySQL Verification Team
Hi!

What you are describing is expected behavior. This is how GET_LOCK() has been designed since MySQL 4.0. There is a single mutex lock for all named locks.

However, it would be possible to create a separate mutex for each different name. That would be a feature request.
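
As a rough illustration of the difference (a toy Python model only, not the actual server code):

import threading

# Current design (toy model): one process-wide mutex guards the bookkeeping
# for every GET_LOCK()/RELEASE_LOCK() call, so sessions contend on it even
# when they use different lock names.
single_mutex = threading.Lock()

def get_lock_current(name):
    with single_mutex:
        pass  # record that 'name' is now held by this session

# Proposed feature (toy model): a separate mutex per distinct name, so only
# sessions asking for the same name would queue behind each other.
per_name_mutexes = {}
registry_guard = threading.Lock()

def get_lock_proposed(name):
    with registry_guard:  # brief critical section just to look up/create the entry
        mutex = per_name_mutexes.setdefault(name, threading.Lock())
    with mutex:
        pass  # record that 'name' is now held by this session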

Unless you have any comments, this is what I would do.
[30 Jul 2015 14:09] Ben Clewett
Thanks for replying to my report, and explaining how the named locks work.  Certainly this might explain something.

However it does not explain the degradation of performance over time.  

My qualitative observation is that the named lock has a 'slow' mode and a 'fast' mode, with a significant difference in performance between them.  Once MySQL is in the 'slow' mode, you cannot break out of it, except by issuing an unrelated OPTIMIZE TABLE statement.  This feels like a bug...
[30 Jul 2015 14:14] MySQL Verification Team
What you describe is exactly the expected behavior. Since there is a single mutex for all user locks, increasing their number will lead to a very large slowdown.
[30 Jul 2015 14:32] Ben Clewett
Thanks again for getting back to me.  I am sorry I am not explaining myself very well.

At all times I have the same number of connections/threads in MySQL, all running at the same rate.  This is constant.

At the start it runs fast.  Then something happens, and MySQL goes into 'slow' mode.  Same number of connections and same rate.  Then I issue an OPTIMIZE TABLE, and MySQL jumps back into 'fast' mode, until the next onset at some later time.

When it jumps into 'slow' mode, my threads are spending so much time waiting for the lock that they don't actually do much work.
[30 Jul 2015 15:08] MySQL Verification Team
Can you provide an entire, fully repeatable test case that will prove what you are claiming on each of its runs? The test case should contain all the necessary details and must reproduce the reported behavior on every run.
[31 Aug 2015 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".