Bug #60349 trim beginning spaces processes data only byte wise (continue of bug14637 (3))
Submitted: 4 Mar 2011 21:20 Modified: 15 Jan 2013 15:25
Reporter: Linhai Song Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server Severity:S5 (Performance)
Version:5.1.55 OS:Any
Assigned to: CPU Architecture:Any
Tags: performance

[4 Mar 2011 21:20] Linhai Song
Description:
In the bug report mysqlbug14637(http://bugs.mysql.com/bug.php?id=14637), reporter finds that two functions, my_hash_sort_simple() and my_lengthsp_8bit(), trim blank spaces from the end of a string. These two functions parse the string byte wise and are quite slow for longer strings.

The patch for this bug has a string length threshold. If the string length is shorter than the threshold, it will perform byte wise comparison. And if the string length is longer than the threshold, it will perform word wise comparison. After employing this patch, we will not get a worse performance, and in some situation, we will get better performance.

In the bug report http://bugs.mysql.com/bug.php?id=60348, I report some trim trailing blank spaces code places, which can use the patch directly. After that, I check the code again, and find some trim beginning blank spaces code places. I think they should be patched in a similar way. 

I list the places I found as follows: 

1 /mysql-5.1.55/mysys/my_handler.c:68
	        if (piks &&

    for (end= a + a_length-length; a < end ; a++)
    {
      if (*a != ' ')
	return (*a < ' ') ? -swap : swap;
    }

===============================================================================
2 /mysql-5.1.55/mysys/mf_format.c:139
	      while (*++pos == ' ') {};			/* Use old extension */

===============================================================================
3 /mysql-5.1.55/mysys/mf_wfile.c:42
	  
  while (*str == ' ')
    str++;

===============================================================================
4 /mysql-5.1.55/mysys/mf_wfile.c:47
	    while (*++str == ' ') {};

===============================================================================
5 /mysql-5.1.55/mysys/typelib.c:94
	      while (*i == ' ')
     while (*i == ' ')
	i++;					/* skip_end_space */

===============================================================================
6 mysql-5.1.55/mysys/my_handler.c:383
	        for ( ; alength && *a == ' ' ; a++, alength--) ;

===============================================================================
7 /mysql-5.1.55/mysys/my_handler.c:384
	        for ( ; blength && *b == ' ' ; b++, blength--) ;

===============================================================================
8 /mysql-5.1.55/client/mysqladmin.cc:710
	      

while (*status == ' ') status++;	/* to next info */

===============================================================================
9 /mysql-5.1.55/client/mysql.cc:4460
	      while (*status_str == ' ')
  while (*status_str == ' ')
        status_str++;  /* to next info */

===============================================================================
10 /mysql-5.1.55/client/mysqltest.cc:4594
	    while (*p && *p == ' ')
  while (*p && *p == ' ')
      p++;

===============================================================================
11 /mysql-5.1.55/client/mysqltest.cc:1017

      while (*ptr && *ptr == ' ')
        ptr++;

===============================================================================
12 /mysql-5.1.55/storage/innobase/rem/rem0cmp.c:177
			for (; a_length && *a == ' '; a++, a_length--);

===============================================================================
13 /mysql-5.1.55/storage/innodb_plugin/rem/rem0cmp.c:187
			for (; a_length && *a == ' '; a++, a_length--);

===============================================================================
14 /mysql-5.1.55/storage/myisam/mi_search.c:475
		    for ( ; k < k_end && *k == ' '; k++) ;

===============================================================================
15 /mysql-5.1.55/storage/myisam/mi_key.c:259
	

while (pos < end && pos[0] == ' ')
	  pos++;

===============================================================================
16 /mysql-5.1.55/storage/myisam/mi_dynrec.c:1158

  while (pos < end && *pos == ' ')
	    pos++;

===============================================================================
17 /mysql-5.1.55/storage/myisam/mi_dynrec.c:1038

  while (pos < end && *pos == ' ')
	    pos++;

===============================================================================
18 /mysql-5.1.55/storage/myisam/myisampack.c:2586
		  for (pos=start_pos ; pos < end_pos && pos[0] == ' ' ; pos++) ;

===============================================================================
19 /mysql-5.1.55/sql/item_xmlfunc.cc:1359
	  for ( ; beg < end && *beg == ' ' ; beg++) ; // skip leading spaces

==============================================================================
20 /mysql-5.1.55/sql/field.cc:2363
	  for (str=ptr ; *str == ' ' ; str++) ;
==============================================================================
21 /mysql-5.1.55/sql/mysqld.cc:9075
	  while (*pos == ' ') pos++;
==============================================================================
22 /mysql-5.1.55/sql/sql_cache.cc:587
 while (sql[i] == ' ')
    ++i;

How to repeat:
code review

Suggested fix:

static inline const uchar *skip_beginning_space(const uchar *ptr,size_t len)
{
  const uchar *end= ptr + len;

  if (len > 20)
  {
    const uchar *end_words= (const uchar *)(intptr)
      (((ulonglong)(intptr)end) / SIZEOF_INT * SIZEOF_INT);
    const uchar *start_words= (const uchar *)(intptr)
       ((((ulonglong)(intptr)ptr) + SIZEOF_INT - 1) / SIZEOF_INT * SIZEOF_INT);

    DBUG_ASSERT(((ulonglong)(intptr)ptr) >= SIZEOF_INT);
    if (end > start_words)
    {
      while (ptr < start_words && ptr[0] == 0x20)
        ptr++;
      if (ptr[0] == 0x20 && start_words < end_words)
        while (ptr < end_words && ((unsigned *)ptr)[0] == SPACE_INT)
          ptr += SIZEOF_INT;
    }
  }
  while (ptr < end && ptr[0] == 0x20)
    ptr++;
  return (ptr);
}
[12 Mar 2011 2:07] Linhai Song
I have done some unit test for the patch of mysqlbug14637, and found that if the number of blank characters is larger than 4, patch version will work better. 

I put my unit test results as follows:

blank characters              patch                   un-patch
       1                      0.016                    0.013
       2                      0.019                    0.017
       3                      0.022                    0.019
       4                      0.02                     0.21
       5                      0.021                    0.024
       6                      0.023                    0.026
       7                      0.028                    0.029
       8                      0.023                    0.03
       9                      0.024                    0.035
       10                     0.028                    0.037
       11                     0.029                    0.04
       12                     0.024                    0.044
       13                     0.026                    0.045
       14                     0.028                    0.048
       15                     0.031                    0.051

code fragments are run 1000000 in my unit test, and time unit is second.

I think trim beginning spaces must have similar results.
[15 Jan 2013 15:25] Matthew Lord
I think that Shane has identified the underlying issue and created an internal feature request for it (14057034).

Due to this, I will mark this as verified.

Thank you for your helpful reports, Linhai!