Bug #54982 | trx_create and kernel_mutex | ||
---|---|---|---|
Submitted: | 3 Jul 2010 16:12 | Modified: | 13 Feb 2012 1:46 |
Reporter: | Mark Callaghan | Email Updates: | |
Status: | Won't fix | Impact on me: | |
Category: | MySQL Server: InnoDB Plugin storage engine | Severity: | S5 (Performance) |
Version: | 5.1.47 | OS: | Any |
Assigned to: | Sunny Bains | CPU Architecture: | Any |
Tags: | innodb, kernel_mutex |
[3 Jul 2010 16:12]
Mark Callaghan
[3 Jul 2010 16:54]
Mark Callaghan
I have the same question about trx_free
[4 Jul 2010 0:31]
Sunny Bains
There is no requirement, I'm trying to get rid of the kernel mutex altogether. In my work in progress code, trx_create() doesn't need to be covered by any mutex. trx_free() does require mutex coverage for trx_print() only. The work is currently stalled because of issues with locking code being the new bottleneck after factoring out several mutexes from the kernel mutex. See my comment for bug# 53825, is it possible to provide some numbers so that we can decided whether to ditch the lock rec bitmap or work around it as Heikki suggests ?
[5 Jul 2010 16:45]
Mark Callaghan
For some of our workloads a big problem is read_view_open_now which does the following while kernel_mutex is locked: 1) allocate (8 * #open_transactions) bytes 2) allocate sizeof(read_view_t) bytes 3) iterate over open transactions to populate array from 1 I don't know what you can do about 3) but I think you can do things to avoid 1) and 2) including extending a struct to have space for 2) and possible for 1)
[5 Jul 2010 16:46]
Mark Callaghan
I pinged Harrison about 53825
[5 Jul 2010 17:08]
Mark Callaghan
See http://bugs.mysql.com/bug.php?id=49169 for more on read_view_open_now
[6 Jul 2010 2:57]
Sunny Bains
If I understand your suggestion correctly then this is what you are suggesting is that instead of what we have now: read_view_t* view; view = mem_heap_alloc(heap, sizeof(read_view_t)); view->n_trx_ids = n; view->trx_ids = mem_heap_alloc(heap, n * sizeof(dulint)); We do this: view = mem_heap_alloc(heap, sizeof(*view) + n * sizeof(dulint)); And view ->trx_ids then becomes (dulint*) (view+1). I think that is a reasonable suggestion.
[6 Jul 2010 17:59]
Mark Callaghan
I am trying something different. I first tried having read_view_create_low callers optionally provide the memory for read_view_t and trx_ids. But I thought that was too messy. Instead of that I am changing the allocation of global_read_view_heap to be large enough so that the allocations in read_view_create_low do not need to call malloc.
[7 Jul 2010 0:42]
Sunny Bains
Why not then simply have a pool of (sizeof(dulint*) * SOME_MAX_VALUE) array hanging of trx_sys_t ?
[7 Jul 2010 3:33]
Mark Callaghan
It gets messy when you try to pass that in. There are two callers for read_view_create_low -- one to create the global read view and the other to create it for a cursor. The global read view can use it, the cursor cannot. Also, how big is big enough for a static allocation. My servers might have many open transactions at times (100 to 1000). And dulint is 16 bytes on x86-64 servers so the array can be 16kb (16 * 1000).
[7 Jul 2010 4:26]
Sunny Bains
On a 64 bit server: 16 * 2048 = 32K * 2048 instances = 64M It is a quick and easy way to test what if any benefit there is in the change to memory allocation. The pool should give you O(1) alloc/free behaviour and I can't see how it can get any faster than that. ie. you then have a lower bound.
[8 Jul 2010 17:41]
Mark Callaghan
I don't have good results yet. It may be that most of the overhead is the loop in read_view_open_now. I need to profile that function and determine whether trx_id_t should be changed from dulint to ib_uint64_t.
[9 Jul 2010 0:29]
Sunny Bains
If you find that changing it from dulint to ib_uint64_t is what you need then we already have a patch that has been checked in for 5.6. The patch also changes red_view_sees_trx_id() from a linear search to a binary search. This latter change reduces the overhead of the function considerably.
[9 Jul 2010 6:32]
Sunny Bains
I found an old patch where I tried to speed up the creation of the trx_ids array using loop unrolling :-). You could try the same.