Bug #54982 trx_create and kernel_mutex
Submitted: 3 Jul 2010 16:12 Modified: 13 Feb 2012 1:46
Reporter: Mark Callaghan Email Updates:
Status: Won't fix Impact on me:
Category:MySQL Server: InnoDB Plugin storage engine Severity:S5 (Performance)
Version:5.1.47 OS:Any
Assigned to: Sunny Bains CPU Architecture:Any
Tags: innodb, kernel_mutex
Triage: Triaged: D3 (Medium) / R2 (Low) / E2 (Low)

[3 Jul 2010 16:12] Mark Callaghan
Why does InnoDB lock kernel_mutex before calling trx_create? trx_create calls several memory alloc functions that may call malloc. These are not cheap. kernel_mutex is one the most or second most contended mutex in InnoDB. This increases contention and I don't understand why it must be locked when trx_create is called. I don't see any reading/writing of kernel_mutex protected resources within trx_create.

How to repeat:
read trx_create and the callers of it

Suggested fix:
change trx_allocate_for_mysql to call trx_create before locking kernel_mutex
[3 Jul 2010 16:54] Mark Callaghan
I have the same question about trx_free
[4 Jul 2010 0:31] Sunny Bains
There is no requirement, I'm trying to get rid of the kernel mutex altogether.
In my work in progress code, trx_create() doesn't need to be covered by any
mutex. trx_free() does require mutex coverage for trx_print() only.  The work
is currently stalled because of issues with locking code being the new bottleneck
after factoring out several mutexes from the kernel mutex.

See my comment for bug# 53825, is it possible to provide some numbers so
that we can decided whether to ditch the lock rec bitmap or work around it
as Heikki suggests ?
[5 Jul 2010 16:45] Mark Callaghan
For some of our workloads a big problem is read_view_open_now which does the following while kernel_mutex is locked:
1) allocate (8 * #open_transactions) bytes
2) allocate sizeof(read_view_t) bytes
3) iterate over open transactions to populate array from 1

I don't know what you can do about 3) but I think you can do things to avoid 1) and 2) including extending a struct to have space for 2) and possible for 1)
[5 Jul 2010 16:46] Mark Callaghan
I pinged Harrison about 53825
[5 Jul 2010 17:08] Mark Callaghan
See http://bugs.mysql.com/bug.php?id=49169 for more on read_view_open_now
[6 Jul 2010 2:57] Sunny Bains
If I understand your suggestion correctly then this is what you are suggesting is that
instead of what we have now:

        read_view_t*    view;

        view = mem_heap_alloc(heap, sizeof(read_view_t));

        view->n_trx_ids = n;    
        view->trx_ids = mem_heap_alloc(heap, n * sizeof(dulint));

We do this:

view = mem_heap_alloc(heap, sizeof(*view) + n * sizeof(dulint));

And view ->trx_ids then becomes (dulint*) (view+1).

I think that is a reasonable suggestion.
[6 Jul 2010 17:59] Mark Callaghan
I am trying something different. I first tried having read_view_create_low callers optionally provide the memory for read_view_t and trx_ids. But I thought that was too messy. Instead of that I am changing the allocation of global_read_view_heap to be large enough so that the allocations in read_view_create_low do not need to call malloc.
[7 Jul 2010 0:42] Sunny Bains
Why not then simply have a pool of (sizeof(dulint*) * SOME_MAX_VALUE) array
hanging of trx_sys_t ?
[7 Jul 2010 3:33] Mark Callaghan
It gets messy when you try to pass that in. There are two callers for read_view_create_low -- one to create the global read view and the other to create it for a cursor. The global read view can use it, the cursor cannot.

Also, how big is big enough for a static allocation. My servers might have many open transactions at times (100 to 1000). And dulint is 16 bytes on x86-64 servers so the array can be 16kb (16 * 1000).
[7 Jul 2010 4:26] Sunny Bains
On a 64 bit server:

  16 * 2048  = 32K * 2048 instances = 64M

It is a quick and easy way to test what if any benefit there is in the
change to memory allocation. The pool should give you O(1) alloc/free
behaviour and I can't see how it can get any faster than that. ie. you
then have a lower bound.
[8 Jul 2010 17:41] Mark Callaghan
I don't have good results yet. It may be that most of the overhead is the loop in read_view_open_now. I need to profile that function and determine whether trx_id_t should be changed from dulint to ib_uint64_t.
[9 Jul 2010 0:29] Sunny Bains
If you find that changing it from dulint to ib_uint64_t is what you need then
we already have a patch that has been checked in for 5.6. The patch also
changes red_view_sees_trx_id() from a linear search to a binary search. This
latter change reduces the overhead of the function considerably.
[9 Jul 2010 6:32] Sunny Bains
I found an old patch where I tried to speed up the creation of the
trx_ids array using loop unrolling :-). You could try the same.