Bug #14638 ptr_compare processes data only byte wise
Submitted: 4 Nov 2005 13:10 Modified: 12 Dec 2009 15:41
Reporter: Gunnar von Boehn Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: General Severity:S5 (Performance)
Version:5.0-BK, 5.1.43-bzr OS:Any (any)
Assigned to: Assigned Account CPU Architecture:Any

[4 Nov 2005 13:10] Gunnar von Boehn
Description:
in mysys/ptr_compare.c
are several compare functions which are used by our sorting to order rows (ORDER BY)
These compare functions process the data(string) only byte wise.
Platforms which allow unaligned 32bit access could compare longer strings much
faster if we would change the function to work on 32bit words.

Example of current 8 bit compare
static int ptr_compare(uint *compare_length, uchar **a, uchar **b){
  reg3 int length= *compare_length;
  reg1 uchar *first,*last;
  first= *a; last= *b;
  while (--length){
    if (*first++ != *last++)
     return (int) first[-1] - (int) last[-1];
  }
  return (int) first[0] - (int) last[0];
}

For possible speedup see below

How to repeat:
na

Suggested fix:
Big endian architectures which support unaligned 32bit word access 
could very easely speed up the compare.

Here is a asm example which is up to 5 times faster than our current function.

It handles every case and can be used as dropin replacement for every ptr_compare_x function.

asm(
".text                          \n"
".align 4                       \n"
".globl ptr_compare_asm                 \n"
"ptr_compare_asm:                       \n"
"                                  \n"
"       lwz     r9,0(r4)            \n" // put first string pointer into r9
"       lwz     r4,0(r5)            \n" // second string pointer in r4
"       lwz     r5,0(r3)            \n" // length in r5
"                                     \n"
"       rlwinm  0,5,0,0,29               \n" // r0 = len & ~3
"       subf    r5,r0,r5              \n" // r5 = len - r0
"       rlwinm  r0,0,30,2,31            \n" // r0 = r0 >> 2
"                                       \n"
"       li      r6,0                  \n" // offset, see cmp_tail_loop
"       cmpli   cr0,r0,0            \n"
"       beq-    cr0,cmp_tail_shortcut   \n"
"                                       \n"
                // main compare loop
"       addi    r9,r9,-4                \n"
"       addi    r4,r4,-4                \n"
"       mtctr   r0                      \n"     // move number of words to CTR
"cmp_loop:                              \n"
"       lwzu    r7,4(r9)                \n"     // load u32 from first string
"       lwzu    r8,4(r4)                \n"     // load u32 from second string
"       cmplw   cr0,r7,r8               \n"     // compare the two u32s, put result in CR0
"       bdnzt   eq, cmp_loop            \n"     // if equal and not at end, loop
"       beq     cr0,cmp_maybe_tail      \n"     // if equal, needs tail checking
"       li      r3,-1                   \n"     // return value -1
"       bltlr   cr0                     \n"
"       li      r3,1                    \n"     // return value 1
"       blr                             \n"     // return
"                                       \n"
"cmp_maybe_tail:                        \n"
"       cmpli   cr7,r5,0                \n"
"       li      r3,0                    \n"     // return 0 if no tail
"       beqlr   cr7                     \n"
"                                       \n"     // handle the rest here, if length wasn't a multiple of 4
"       li      r6,4                    \n"     // r9/r4 are pointers to the current
"                                       \n"     // position - 4, so add 4 here
"cmp_tail_shortcut:                     \n"
"       mtctr   r5                      \n"     // load CTR with tail bytes
"cmp_tail_loop:                         \n"
"       lbzx    r7,r6,r9                \n"     // load MEM[r9+r6,1]
"       lbzx    r8,r6,r4              \n"     // load MEM[r4+r6,1]
"       subf.   r3,r8,r7             \n"     // subtract with CR update
"       addi    r6,r6,1              \n"
"       bdnzt eq, cmp_tail_loop \n"     // if equal and not at end, loop
"       blr                     \n"     // return

Cheers
Gunnar
[4 Nov 2005 13:28] Valeriy Kravchuk
In current 5.0-BK sources (ChangeSet@1.1957.1.18, 2005-11-03 20:29:21+02:00, jani@ua141d10.elisa.omakaista.fi) one may find the following in mysys/ptr_cmp.c (NOT ptr_compare.c):

static int ptr_compare(uint *compare_length, uchar **a, uchar **b)
{
  reg3 int length= *compare_length;
  reg1 uchar *first,*last;

  first= *a; last= *b;
  while (--length)
  {
    if (*first++ != *last++)
      return (int) first[-1] - (int) last[-1];
  }
  return (int) first[0] - (int) last[0];
}

If it can be optimized for some plafroms - great! Let it be...
[20 Feb 2008 12:36] Martin Hansson
Benchmark comparing memory comparison performance.

Attachment: bug14638.test (application/octet-stream, text), 787 bytes.

[20 Feb 2008 12:36] Martin Hansson
I don't believe it is worthwhile hand-optimizing library functions. Especially since gcc versions after 4.1.2 inline their own versions of them. 

Following is a patch that replaces our hand-turned copy routine with a wrapper for memcmp. However, the attached benchmark gives a slightly worse performance for it.

Benchmark run on a 32 bit Pentium Core2 Quad. 

unfixed: 32.811
memcmp:  36.595 s
[20 Feb 2008 12:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/42642

ChangeSet@1.2594, 2008-02-20 13:44:15+01:00, mhansson@riffraff.(none) +4 -0
  Bug#14638: ptr_compare processes data only byte wise
  
  The comparison function for memory regions is not the optimal way to 
  compare memory regions on modern architectures. 
  This patch replaces the hand-optimized function with a wrapper for
  memcmp, which is inlined in gcc versions >= 4.1.2.
[20 Feb 2008 12:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/42643

ChangeSet@1.2594, 2008-02-20 13:51:23+01:00, mhansson@riffraff.(none) +4 -0
  Bug#14638: ptr_compare processes data only byte wise
  
  The comparison function for memory regions is not the optimal way to 
  compare memory regions on modern architectures. 
  This patch replaces the hand-optimized function with a wrapper for
  memcmp, which is inlined in gcc versions >= 4.1.2.
[19 Jun 2008 8:56] Martin Hansson
It is the general consensus that we should not try and implement our own memcmp's. It is not a current priority.
[12 Dec 2009 15:41] Valeriy Kravchuk
In current 5.1.43 from bzr ptr_compare still looks the same for all platforms:

static int ptr_compare(size_t *compare_length, uchar **a, uchar **b)
{
  reg3 int length= *compare_length;
  reg1 uchar *first,*last;

  first= *a; last= *b;
  while (--length)
  {
    if (*first++ != *last++)
      return (int) first[-1] - (int) last[-1];
  }
  return (int) first[0] - (int) last[0];
}