Bug #60348 trim trailing spaces processes data only byte wise (continue of bug14637 (2))
Submitted: 4 Mar 2011 20:14 Modified: 15 Jan 2013 15:24
Reporter: Linhai Song Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server Severity:S5 (Performance)
Version:5.1.55 OS:Any
Assigned to: CPU Architecture:Any
Tags: performance

[4 Mar 2011 20:14] Linhai Song
Description:
In the bug report mysqlbug14637(http://bugs.mysql.com/bug.php?id=14637), reporter finds that two functions, my_hash_sort_simple() and my_lengthsp_8bit(), trim blank spaces from the end of a string. These two functions parse the string byte wise and are quite slow for longer strings.

The reporter suggests a word wise way to parse the string, and the patch for this bug exactly does the same thing. Another important thing worth mentioning is that the patch employs a string length threshold, and only when the length of string is longer than 20, the patched function will use word-wise comparison. In my point of view, I think all skip_trailing_space operations should use this patch. After we have done this, we will get at least not worse performance, and in some situation we can get better performance. 

In the bug report of mysqlbug60345 (http://bugs.mysql.com/bug.php?id=60345), I report some missing bug places in mysql-5.1.55 and mysql-6.0.11. Some of these places are fixed in mysql-6.0.11, and some of these places are not fixed in either of these two versions. 

After checking other parts of codes, I find 12 places which can use the patch directly. I believe after we use the patch, we will not get worse performance, and in some situation, we will get better performance. 

1 /mysql-5.1.55/cmd-line-utils/libedit/refresh.c:497
			
	while (ofd < o) {
		if (o[-1] != ' ')
			break;
		o--;
	}

2 /mysql-5.1.55/storage/innobase/row/row0mysql.c:333

                                   while (col_len > 0
				       && ptr[col_len - 1] == 0x20) {
					col_len--;}

3 /mysql-5.1.55/storage/innobase/row/row0mysql.c:372
                                 while (col_len > 0
				       && ptr[col_len - 1] == 0x20) {
					col_len--;}

4 /mysql-5.1.55/storage/innodb_plugin/row/row0mysql.c:377
	                   while (col_len > 0
				       && ptr[col_len - 1] == 0x20) {
					col_len--;}
5 /mysql-5.1.55/storage/innodb_plugin/row/row0mysql.c:416
			
      while (col_len > n_chars && ptr[col_len - 1] == 0x20) {
			col_len--;}
6 /mysql-5.1.55/storage/myisam/mi_key.c:264
  while (end > pos && end[-1] == ' ')
	  end--;

7 /mysql-5.1.55/storage/myisam/mi_dynrec.c:1153
  while (end > record && *(end-1) == ' ')
	    end--;
8 /mysql-5.1.55/storage/myisam/mi_dynrec.c:1033
   while (end > from && *(end-1) == ' ')
	    end--;
9 /mysql-5.1.55/storage/myisam/myisampack.c:2538
		  for (pos=end_pos ; pos > start_pos && pos[-1] == ' ' ; pos--) ;

10 /mysql-5.1.55/sql/field.cc:6904

  while (end > ptr && end[-1] == ' ')
    end--;

11 /mysql-5.1.55/sql/mysqld.cc:9081

  while (end > pos && end[-1] == ' ')
	end--;	

12 /mysql-5.1.55/sql/sql_select.cc:14484

   for (str=copy->str,end= str+copy->length;
               end > str && end[-1] == ' ' ;
               end--) ;

How to repeat:
code review

Suggested fix:
use the patch of mysql14637

static inline const uchar *skip_trailing_space(const uchar *ptr,size_t len)
{
  const uchar *end= ptr + len;

  if (len > 20)
  {
    const uchar *end_words= (const uchar *)(intptr)
      (((ulonglong)(intptr)end) / SIZEOF_INT * SIZEOF_INT);
    const uchar *start_words= (const uchar *)(intptr)
       ((((ulonglong)(intptr)ptr) + SIZEOF_INT - 1) / SIZEOF_INT * SIZEOF_INT);

    DBUG_ASSERT(((ulonglong)(intptr)ptr) >= SIZEOF_INT);
    if (end_words > ptr)
    {
      while (end > end_words && end[-1] == 0x20)
        end--;
      if (end[-1] == 0x20 && start_words < end_words)
        while (end > start_words && ((unsigned *)end)[-1] == SPACE_INT)
          end -= SIZEOF_INT;
    }
  }
  while (end > ptr && end[-1] == 0x20)
    end--;
  return (end);
}
[4 Mar 2011 22:07] Linhai Song
there is another place follow this situation:

13 /mysql-5.1.55/sql/sql_select.cc:14484
	          

 for (str=copy->str,end= str+copy->length;
               end > str && end[-1] == ' ' ;
               end--) ;
[12 Mar 2011 2:05] Linhai Song
I have done some unit test, and found that if the number of blank characters is larger than 4, patch version will work better. 

I put my unit test results as follows:

blank characters              patch                   un-patch
       1                      0.016                    0.013
       2                      0.019                    0.017
       3                      0.022                    0.019
       4                      0.02                     0.21
       5                      0.021                    0.024
       6                      0.023                    0.026
       7                      0.028                    0.029
       8                      0.023                    0.03
       9                      0.024                    0.035
       10                     0.028                    0.037
       11                     0.029                    0.04
       12                     0.024                    0.044
       13                     0.026                    0.045
       14                     0.028                    0.048
       15                     0.031                    0.051

code fragments are run 1000000 in my unit test, and time unit is second.
[15 Jan 2013 15:24] Matthew Lord
I think that Shane has identified the underlying issue and created an internal feature request for it (14057034).

Due to this, I will mark this as verified.

Thank you for your helpful reports, Linhai!