Bug #68207 too much contention on "os_mutex"
Submitted: 28 Jan 2013 17:01
Modified: 31 Jan 2013 11:50
Reporter: Mark Callaghan
Status: Verified
Category: MySQL Server: InnoDB storage engine
Severity: S3 (Non-critical)
Version: 5.6
OS: Any
Assigned to:
CPU Architecture: Any

[28 Jan 2013 17:01] Mark Callaghan
Which use of os_mutex_create is the problem when too much contention is reported for "os_mutex"? The callers in my 5.6 branch are:
os/os0file.cc:	os_file_count_mutex = os_mutex_create();
os/os0file.cc:		os_file_seek_mutexes[i] = os_mutex_create();
os/os0file.cc:	array->mutex = os_mutex_create();
os/os0sync.cc:	os_sync_mutex = os_mutex_create();
sync/sync0arr.cc:	arr->os_mutex = os_mutex_create();

See the recent updates to http://bugs.mysql.com/bug.php?id=68079 for an example of this.

How to repeat:
see linked bug, read source to find how "os_mutex" is used by PS
[31 Jan 2013 11:50] Marc ALFF
Hi Mark.

In this case, the performance schema reports statistics against
"wait/synch/mutex/innodb/os_mutex", because this is how the instrument was declared to the performance schema.

The problem is that there is common code, namely os_mutex_create(),
used for many distinct purposes.

A much better way to instrument the code would be to pass the mutex key that defines what the mutex is used for as a parameter to os_mutex_create(),
instead of using os_mutex_key for everything.
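
As a rough illustration of that idea, here is a minimal, self-contained sketch. It is not InnoDB code: the mutex_key struct, the two key names, the acquisitions map (a toy stand-in for the performance schema), and run_example() are all hypothetical, invented only to show how a key passed by the caller lets each use of the common code report under its own instrument name.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a performance schema mutex key. In this sketch a
// key is only the instrument name that lock activity is reported against.
struct mutex_key {
    const char* name;
};

// Hypothetical per-purpose keys; the instrument names are invented here.
static const mutex_key file_count_key = {"wait/synch/mutex/innodb/os_file_count_mutex"};
static const mutex_key sync_array_key = {"wait/synch/mutex/innodb/sync_array_mutex"};

// A mutex that remembers which key (purpose) it was created with.
struct os_mutex {
    std::mutex handle;
    const mutex_key* key;
    explicit os_mutex(const mutex_key* k) : key(k) {}
};

// Toy replacement for the performance schema: acquisitions per instrument
// name. Not thread-safe; this sketch only uses it from a single thread.
static std::map<std::string, unsigned long> acquisitions;

// The caller says what the mutex is for, instead of one key for everything.
std::unique_ptr<os_mutex> os_mutex_create(const mutex_key& key) {
    assert(key.name != nullptr);
    return std::unique_ptr<os_mutex>(new os_mutex(&key));
}

void os_mutex_enter(os_mutex& m) {
    m.handle.lock();
    ++acquisitions[m.key->name];  // attributed to the caller's instrument
}

void os_mutex_exit(os_mutex& m) {
    m.handle.unlock();
}

// Two mutexes built by the same common code, reported separately.
void run_example() {
    auto file_count_mutex = os_mutex_create(file_count_key);
    auto sync_array_mutex = os_mutex_create(sync_array_key);

    os_mutex_enter(*file_count_mutex);
    os_mutex_exit(*file_count_mutex);

    for (int i = 0; i < 3; ++i) {
        os_mutex_enter(*sync_array_mutex);
        os_mutex_exit(*sync_array_mutex);
    }
}
```

With a single shared key, all four acquisitions in run_example() would land under one name. With per-purpose keys, the hot mutex shows up under its own instrument.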

Note that there have been related cases where common code reported waits under a generic instrument and had to be changed to use more specific instruments, for better granularity (and visibility, clarity).

For example, the replication code was fixed to use different instruments to differentiate the binary log from the relay log, on a relaying slave.

As such, I consider this not a feature request but an actual bug, in the instrumented code.

Looking at the related issues, I am personally convinced that the contention found is related to the sync_array (still investigating this area).

A possible way to validate this, while waiting for the bug fix, is to look at table performance_schema.mutex_instances, to see how many "os_mutex" instances are actually hot, and whether that number matches the number of sync_array os_mutex instances, which is sync_array_size.

-- Marc
[31 Jan 2013 12:05] Marc ALFF
Query that can help to understand the contention:

mysql> select * from events_waits_summary_by_instance where EVENT_NAME = "wait/synch/mutex/innodb/os_mutex" order by COUNT_STAR desc;