| Bug #14638 | ptr_compare processes data only byte wise | ||
|---|---|---|---|
| Submitted: | 4 Nov 2005 13:10 | Modified: | 12 Dec 2009 15:41 |
| Reporter: | Gunnar von Boehn | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: General | Severity: | S5 (Performance) |
| Version: | 5.0-BK, 5.1.43-bzr | OS: | Any (any) |
| Assigned to: | Assigned Account | CPU Architecture: | Any |
[4 Nov 2005 13:28]
Valeriy Kravchuk
In current 5.0-BK sources (ChangeSet@1.1957.1.18, 2005-11-03 20:29:21+02:00, jani@ua141d10.elisa.omakaista.fi) one may find the following in mysys/ptr_cmp.c (NOT ptr_compare.c):
static int ptr_compare(uint *compare_length, uchar **a, uchar **b)
{
reg3 int length= *compare_length;
reg1 uchar *first,*last;
first= *a; last= *b;
while (--length)
{
if (*first++ != *last++)
return (int) first[-1] - (int) last[-1];
}
return (int) first[0] - (int) last[0];
}
If it can be optimized for some plafroms - great! Let it be...
[20 Feb 2008 12:36]
Martin Hansson
Benchmark comparing memory comparison performance.
Attachment: bug14638.test (application/octet-stream, text), 787 bytes.
[20 Feb 2008 12:36]
Martin Hansson
I don't believe it is worthwhile hand-optimizing library functions. Especially since gcc versions after 4.1.2 inline their own versions of them. Following is a patch that replaces our hand-turned copy routine with a wrapper for memcmp. However, the attached benchmark gives a slightly worse performance for it. Benchmark run on a 32 bit Pentium Core2 Quad. unfixed: 32.811 memcmp: 36.595 s
[20 Feb 2008 12:53]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/42642 ChangeSet@1.2594, 2008-02-20 13:44:15+01:00, mhansson@riffraff.(none) +4 -0 Bug#14638: ptr_compare processes data only byte wise The comparison function for memory regions is not the optimal way to compare memory regions on modern architectures. This patch replaces the hand-optimized function with a wrapper for memcmp, which is inlined in gcc versions >= 4.1.2.
[20 Feb 2008 12:54]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/42643 ChangeSet@1.2594, 2008-02-20 13:51:23+01:00, mhansson@riffraff.(none) +4 -0 Bug#14638: ptr_compare processes data only byte wise The comparison function for memory regions is not the optimal way to compare memory regions on modern architectures. This patch replaces the hand-optimized function with a wrapper for memcmp, which is inlined in gcc versions >= 4.1.2.
[19 Jun 2008 8:56]
Martin Hansson
It is the general consensus that we should not try and implement our own memcmp's. It is not a current priority.
[12 Dec 2009 15:41]
Valeriy Kravchuk
In current 5.1.43 from bzr ptr_compare still looks the same for all platforms:
static int ptr_compare(size_t *compare_length, uchar **a, uchar **b)
{
reg3 int length= *compare_length;
reg1 uchar *first,*last;
first= *a; last= *b;
while (--length)
{
if (*first++ != *last++)
return (int) first[-1] - (int) last[-1];
}
return (int) first[0] - (int) last[0];
}

Description: in mysys/ptr_compare.c are several compare functions which are used by our sorting to order rows (ORDER BY) These compare functions process the data(string) only byte wise. Platforms which allow unaligned 32bit access could compare longer strings much faster if we would change the function to work on 32bit words. Example of current 8 bit compare static int ptr_compare(uint *compare_length, uchar **a, uchar **b){ reg3 int length= *compare_length; reg1 uchar *first,*last; first= *a; last= *b; while (--length){ if (*first++ != *last++) return (int) first[-1] - (int) last[-1]; } return (int) first[0] - (int) last[0]; } For possible speedup see below How to repeat: na Suggested fix: Big endian architectures which support unaligned 32bit word access could very easely speed up the compare. Here is a asm example which is up to 5 times faster than our current function. It handles every case and can be used as dropin replacement for every ptr_compare_x function. asm( ".text \n" ".align 4 \n" ".globl ptr_compare_asm \n" "ptr_compare_asm: \n" " \n" " lwz r9,0(r4) \n" // put first string pointer into r9 " lwz r4,0(r5) \n" // second string pointer in r4 " lwz r5,0(r3) \n" // length in r5 " \n" " rlwinm 0,5,0,0,29 \n" // r0 = len & ~3 " subf r5,r0,r5 \n" // r5 = len - r0 " rlwinm r0,0,30,2,31 \n" // r0 = r0 >> 2 " \n" " li r6,0 \n" // offset, see cmp_tail_loop " cmpli cr0,r0,0 \n" " beq- cr0,cmp_tail_shortcut \n" " \n" // main compare loop " addi r9,r9,-4 \n" " addi r4,r4,-4 \n" " mtctr r0 \n" // move number of words to CTR "cmp_loop: \n" " lwzu r7,4(r9) \n" // load u32 from first string " lwzu r8,4(r4) \n" // load u32 from second string " cmplw cr0,r7,r8 \n" // compare the two u32s, put result in CR0 " bdnzt eq, cmp_loop \n" // if equal and not at end, loop " beq cr0,cmp_maybe_tail \n" // if equal, needs tail checking " li r3,-1 \n" // return value -1 " bltlr cr0 \n" " li r3,1 \n" // return value 1 " blr \n" // return " \n" "cmp_maybe_tail: \n" " cmpli cr7,r5,0 \n" " li r3,0 \n" // return 0 if no tail " beqlr cr7 \n" " \n" // handle the rest here, if length wasn't a multiple of 4 " li r6,4 \n" // r9/r4 are pointers to the current " \n" // position - 4, so add 4 here "cmp_tail_shortcut: \n" " mtctr r5 \n" // load CTR with tail bytes "cmp_tail_loop: \n" " lbzx r7,r6,r9 \n" // load MEM[r9+r6,1] " lbzx r8,r6,r4 \n" // load MEM[r4+r6,1] " subf. r3,r8,r7 \n" // subtract with CR update " addi r6,r6,1 \n" " bdnzt eq, cmp_tail_loop \n" // if equal and not at end, loop " blr \n" // return Cheers Gunnar