Bug #60351 Trimming the same character processes data only byte-wise (continuation of bug #14637 (4))
Submitted: 4 Mar 2011 22:15 Modified: 15 Jan 2013 15:27
Reporter: Linhai Song
Status: Verified
Category: MySQL Server    Severity: S5 (Performance)
Version: 5.1.55    OS: Any
CPU Architecture: Any
Tags: Contribution, performance

[4 Mar 2011 22:15] Linhai Song
Description:
In bug report #14637 (http://bugs.mysql.com/bug.php?id=14637), the reporter found that two functions, my_hash_sort_simple() and my_lengthsp_8bit(), trim blank spaces from the end of a string. Both functions scan the string byte by byte and are quite slow for longer strings.

The patch for that bug uses a string-length threshold: strings shorter than the threshold are scanned byte by byte, and longer strings are scanned word by word. With this patch, performance does not get worse, and in some situations it gets better.
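The threshold idea can be sketched in C as below. This is a minimal illustration, not the code of the bug #14637 patch: the threshold value (20), the name trim_trailing_spaces and the use of 64-bit words are hypothetical choices made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical threshold: strings up to this length use only the byte loop. */
#define WORD_TRIM_THRESHOLD 20

/* Eight ASCII spaces (0x20) packed into one 64-bit word. */
static const uint64_t SPACES8 = 0x2020202020202020ULL;

/* Return the length of s[0..len) with trailing ASCII spaces removed. */
static size_t trim_trailing_spaces(const unsigned char *s, size_t len)
{
  if (len > WORD_TRIM_THRESHOLD)
  {
    /* Word-wise phase: drop 8 bytes at a time while they are all spaces.
       memcpy avoids unaligned-access and strict-aliasing problems. */
    while (len >= sizeof(uint64_t))
    {
      uint64_t w;
      memcpy(&w, s + len - sizeof w, sizeof w);
      if (w != SPACES8)
        break;
      len -= sizeof w;
    }
  }
  /* Byte-wise phase: finish the remainder (or the whole short string). */
  while (len > 0 && s[len - 1] == ' ')
    len--;
  return len;
}
```

Reading words through a fixed-size memcpy keeps the sketch portable to platforms that fault on unaligned loads; optimizing compilers usually turn it into a single load.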

In bug report http://bugs.mysql.com/bug.php?id=60348, I reported several places that trim trailing blank spaces and can use the patch directly. In bug report http://bugs.mysql.com/bug.php?id=60349, I reported several places that trim leading blank spaces and can be patched in a similar way.

There are also places that skip a run of the same character at the beginning or the end of a string, and they can be patched in a similar way:

1  /mysql-5.1.55/strings/str2int.c:111
    while (*src == '0') src++;

===============================================================================
2  /mysql-5.1.55/strings/strtod.c:137
    /* Skip pre-zero for easier calculation of overflows */
    while (*str == '0')
    {
      if (++str == end)
        goto done;
      start_of_number= 0;                         /* Found digit */
    }

===============================================================================
3  /mysql-5.1.55/strings/ctype-simple.c:1107
    for(str++ ; str != end && *str == '0' ; str++);

===============================================================================
4  /mysql-5.1.55/sql/field.cc:1992
    for (; from!=end && *from == '0'; from++) ;   // Read prezeros

===============================================================================
5  /mysql-5.1.55/sql/field.cc:2098
    for (; int_digits_tail_from != frac_digits_from &&
         *int_digits_tail_from == '0'; int_digits_tail_from++) ;

===============================================================================
6  /mysql-5.1.55/sql/field.cc:9012
    for (; length && !*from; from++, length--) ;         // skip left 0's

===============================================================================
7  /mysql-5.1.55/sql/field.cc:9434
    for (; length && !*from; from++, length--) ;         // skip left 0's

===============================================================================
8  /mysql-5.1.55/sql/sql_analyse.cc:249
    for (str++; *(end - 1) == '0'; end--) ; // jump over zeros at the end

===============================================================================
9  /mysql-5.1.55/sql/sql_cache.cc:1433
    while (sql[i]=='(')
      i++;

===============================================================================
10 /mysql-5.1.55/server-tools/instance-manager/instance_map.cc:117
    parse_option(option, option_name, option_value);
    while (*ptr == '-')
      ++ptr;
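The leading-character loops listed above could use the same word-wise idea. The sketch below is a minimal illustration and assumes the caller knows the buffer length. Several of the listed loops (items 1, 9 and 10) only stop at the first non-matching byte, and reading eight bytes at a time there could run past the end of the buffer, so those would first need an explicit end bound. count_leading_char is a hypothetical helper, not an existing MySQL function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return how many leading bytes of s[0..len) are equal to c. */
static size_t count_leading_char(const unsigned char *s, size_t len,
                                 unsigned char c)
{
  /* Replicate c into all eight bytes of a 64-bit word. */
  const uint64_t pattern = (uint64_t)c * 0x0101010101010101ULL;
  size_t i = 0;

  /* Word-wise phase: advance 8 bytes while every byte equals c. */
  while (len - i >= sizeof(uint64_t))
  {
    uint64_t w;
    memcpy(&w, s + i, sizeof w);
    if (w != pattern)
      break;
    i += sizeof w;
  }
  /* Byte-wise phase: locate the first byte that differs from c. */
  while (i < len && s[i] == c)
    i++;
  return i;
}
```

Because every byte of the pattern is the same, the comparison gives the same answer on little-endian and big-endian machines.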

I have found two other situations.
The first is skipping a repeated two-byte sequence (0x00 0x20, a UCS-2 space) at the end of a column value:

1  /mysql-5.1.55/storage/innobase/row/row0mysql.c:326
    while (col_len >= 2 && ptr[col_len - 2] == 0x00
           && ptr[col_len - 1] == 0x20) {
        col_len -= 2;
    }

===============================================================================

2  /mysql-5.1.55/storage/innodb_plugin/row/row0mysql.c:370
    while (col_len >= 2 && ptr[col_len - 2] == 0x00
           && ptr[col_len - 1] == 0x20) {
        col_len -= 2;
    }
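For the two row0mysql.c loops, the pair test can be widened to four pairs at a time: the eight bytes 0x00 0x20 0x00 0x20 0x00 0x20 0x00 0x20 ending at col_len are exactly what four iterations of the original loop would check. A hedged sketch follows; trim_trailing_ucs2_spaces is a hypothetical name, and size_t stands in for whatever length type InnoDB uses there.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return col_len with trailing UCS-2 spaces (bytes 0x00 0x20) removed. */
static size_t trim_trailing_ucs2_spaces(const unsigned char *ptr,
                                        size_t col_len)
{
  /* Four UCS-2 spaces in memory order; loading them with memcpy gives a
     pattern that compares correctly regardless of host byte order. */
  static const unsigned char spaces4[8] =
    {0x00, 0x20, 0x00, 0x20, 0x00, 0x20, 0x00, 0x20};
  uint64_t pattern;
  memcpy(&pattern, spaces4, sizeof pattern);

  /* Word-wise phase: drop four space pairs (8 bytes) per step. */
  while (col_len >= sizeof pattern)
  {
    uint64_t w;
    memcpy(&w, ptr + col_len - sizeof w, sizeof w);
    if (w != pattern)
      break;
    col_len -= sizeof w;
  }
  /* Pair-wise phase: the same two-byte test as the original loop. */
  while (col_len >= 2 && ptr[col_len - 2] == 0x00 && ptr[col_len - 1] == 0x20)
    col_len -= 2;
  return col_len;
}
```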

The second is comparing two strings and stopping at the first position where they differ:

/mysql-5.1.55/cmd-line-utils/libedit/refresh.c:483
    for (o = old, n = new; *o && (*o == *n); o++, n++)
        continue;
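The libedit loop could also compare eight bytes at a time. This is a minimal sketch assuming both buffers hold at least n bytes. Unlike the original loop, it does not stop at a NUL terminator, so a caller would pass the length of the shorter string. common_prefix_len is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the length of the common prefix of a[0..n) and b[0..n). */
static size_t common_prefix_len(const char *a, const char *b, size_t n)
{
  size_t i = 0;

  /* Word-wise phase: skip 8-byte blocks that are identical in both. */
  while (n - i >= sizeof(uint64_t))
  {
    uint64_t wa, wb;
    memcpy(&wa, a + i, sizeof wa);
    memcpy(&wb, b + i, sizeof wb);
    if (wa != wb)
      break;
    i += sizeof wa;
  }
  /* Byte-wise phase: find the exact position of the first difference. */
  while (i < n && a[i] == b[i])
    i++;
  return i;
}
```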

I think these two situations should also be patched.

How to repeat:
code review

Suggested fix:
A patch similar to the one for bug #14637 (http://bugs.mysql.com/bug.php?id=14637).
[12 Mar 2011 2:10] Linhai Song
I have run some unit tests on the patch for bug #14637 and found that the patched version performs better when the number of trailing blank characters is larger than 4.

My unit test results are as follows:

Trailing blanks    Patched (s)    Unpatched (s)
       1              0.016           0.013
       2              0.019           0.017
       3              0.022           0.019
       4              0.02            0.21
       5              0.021           0.024
       6              0.023           0.026
       7              0.028           0.029
       8              0.023           0.03
       9              0.024           0.035
      10              0.028           0.037
      11              0.029           0.04
      12              0.024           0.044
      13              0.026           0.045
      14              0.028           0.048
      15              0.031           0.051

Each code fragment was run 1,000,000 times in my unit test; times are in seconds.
[15 Jan 2013 15:27] Matthew Lord
I think that Shane has identified the underlying issue and created an internal feature request for it (14057034).

Because of this, I am marking this report as verified.

Thank you for your helpful reports, Linhai!