Description:
In the bug report mysqlbug14637(http://bugs.mysql.com/bug.php?id=14637), reporter finds that two functions, my_hash_sort_simple() and my_lengthsp_8bit(), trim blank spaces from the end of a string. These two functions parse the string byte wise and are quite slow for longer strings.
The reporter suggests a word wise way to parse the string, and the patch for this bug exactly does the same thing. Another important thing worth mentioning is that the patch employs a string length threshold, and only when the length of string is longer than 20, the patched function will use word-wise comparison.
I check the patched version 6.0.11-alpha, and find these five places are patched:
mysql-6.0.11-alpha/strings/ctype-simple.c: my_hash_sort_simple function
mysql-6.0.11-alpha/strings/ctype-simple.c: my_lengthsp_8bit function
mysql-6.0.11-alpha/strings/ctype-mb.c: my_hash_sort_mb_bin function
mysql-6.0.11-alpha/strings/ctype-latin1.c: my_hash_sort_latin1_de function
mysql-6.0.11-alpha/strings/ctype-bin.c: my_hash_sort_8bit_bin function
But in file strings/ctype-utf8.c, there is a my_hash_sort_utf8 function:
static void my_hash_sort_utf8(CHARSET_INFO *cs, const uchar *s, size_t slen,
ulong *n1, ulong *n2)
{
my_wc_t wc;
int res;
const uchar *e=s+slen;
MY_UNICASE_INFO **uni_plane= cs->caseinfo;
/*
Remove end space. We have to do this to be able to compare
'A ' and 'A' as identical
*/
while (e > s && e[-1] == ' ')
e--;
while ((s < e) && (res=my_utf8_uni(cs,&wc, (uchar *)s, (uchar*)e))>0 )
{
int plane = (wc>>8) & 0xFF;
wc = uni_plane[plane] ? uni_plane[plane][wc & 0xFF].sort : wc;
n1[0]^= (((n1[0] & 63)+n2[0])*(wc & 0xFF))+ (n1[0] << 8);
n2[0]+=3;
n1[0]^= (((n1[0] & 63)+n2[0])*(wc >> 8))+ (n1[0] << 8);
n2[0]+=3;
s+=res;
}
}
I check the uses of this function, and find that the use of my_hash_sort_utf8 is very similar to other my_hash_sort_xxx function. So the character comparison part in my_hash_sort_utf8 should also be patched.
And these 6 places are not patched in mysql-5.1.55 version, and I feel that they all need to be patched in that version.
How to repeat:
code review
Suggested fix:
I put the patch in mysql-6.0.11-alpha here:
static inline const uchar *skip_trailing_space(const uchar *ptr,size_t len)
{
const uchar *end= ptr + len;
if (len > 20)
{
const uchar *end_words= (const uchar *)(intptr)
(((ulonglong)(intptr)end) / SIZEOF_INT * SIZEOF_INT);
const uchar *start_words= (const uchar *)(intptr)
((((ulonglong)(intptr)ptr) + SIZEOF_INT - 1) / SIZEOF_INT * SIZEOF_INT);
DBUG_ASSERT(((ulonglong)(intptr)ptr) >= SIZEOF_INT);
if (end_words > ptr)
{
while (end > end_words && end[-1] == 0x20)
end--;
if (end[-1] == 0x20 && start_words < end_words)
while (end > start_words && ((unsigned *)end)[-1] == SPACE_INT)
end -= SIZEOF_INT;
}
}
while (end > ptr && end[-1] == 0x20)
end--;
return (end);
}