Bug #83247 Severe regression in my_strnxfrm() from mysql-5.5 -> 5.6
Submitted: 3 Oct 2016 14:02 Modified: 13 Oct 2016 14:41
Reporter: Ole John Aske Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6.33 OS:Any
Assigned to: CPU Architecture:Any

[3 Oct 2016 14:02] Ole John Aske
Description:
There is a severe performance regression in the internal function my_strnxfrm()
from mysql-5.5 to 5.6... and into later versions.

Using the test program below:

In mysql 5.5:

perf stat ./mysqlchars 1000000
mysqlchars
Timing 1000000 iterations
Done

 Performance counter stats for './mysqlchars 1000000':

        271.960839 task-clock                #    0.999 CPUs utilized          
                 7 context-switches          #    0.026 K/sec                  
                 1 cpu-migrations            #    0.004 K/sec                  
               804 page-faults               #    0.003 M/sec                  
       809,536,382 cycles                    #    2.977 GHz                     [50.07%]
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
     1,625,790,615 instructions              #    2.01  insns per cycle         [75.00%]
       323,778,351 branches                  # 1190.533 M/sec                   [75.00%]
         5,582,210 branch-misses             #    1.72% of all branches         [75.00%]

       0.272336814 seconds time elapsed

---- And then 5.6.33 ------

perf stat ./mysqlchars 1000000
mysqlchars
Timing 1000000 iterations
Done

 Performance counter stats for './mysqlchars 1000000':

       2562.233439 task-clock                #    1.000 CPUs utilized          
                13 context-switches          #    0.005 K/sec                  
                 1 cpu-migrations            #    0.000 K/sec                  
               804 page-faults               #    0.314 K/sec                  
     7,635,860,919 cycles                    #    2.980 GHz                     [50.00%]
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
    21,279,965,921 instructions              #    2.79  insns per cycle         [75.02%]
     5,236,207,871 branches                  # 2043.611 M/sec                   [75.02%]
        19,301,342 branch-misses             #    0.37% of all branches         [75.00%]

       2.562713772 seconds time elapsed

That is ~10x slower!

Part of the regression has been identified to be related to a code change 
when appending pad characters (my_strxfrm_pad_nweights_unicode()). Due
to this change the compiler will not any longer do loop unrolling 
when generating code in the pad-loop - That contribute to 2-4x of
the regression (depending on hardware. A patch for this will be 
contributed for ctype-utf8.c xfrm-padding. However similar changes
should be considered for the other ctype-*.c.

However, there remains a ~5x regression due to the strnxfrm for
my_charset_utf8_bin is now being handled by my_strnxfrm_unicode() 
instead of by my_strnxfrm_mb_bin(). 

That change seems to have come from:

commit ffe804129c002821d96a8907b3616abf78be1211
Author: Alexander Barkov <bar@mysql.com>
Date:   Mon Mar 22 13:13:41 2010 +0400

    Back-porting tasks from mysql-6.0:
    
    WL#3664 strnxfrm() changes for prefix keys and NOPAD
    WL#3716 WEIGHT_STRING function
    WL#4013 Unicode german2 collation

It remains to be analyzed whether this was a required change,
or if it happened by accident.

How to repeat:

#include <my_global.h>
#include <my_sys.h>
#include <my_user.h>
#include <m_string.h>
#include <m_ctype.h>
#include <hash.h>
#include <stdarg.h>

int main(int argc, char **argv)
{
  MY_INIT("mysqlchars");

  printf("mysqlchars\n");

  if (argc != 2)
  {
    printf("Bad usage\n");
    exit(-1);
  }

  int numIterations = atoi(argv[1]);
  const int keyCols = 12;
  const int keyColChars = 80;
  const int bytesPerChar = 3;
  const int keyBytes = keyColChars * bytesPerChar;
  const int bufferBytes = keyCols * keyBytes;
  
  const int setKeyCols = 6;

  unsigned char source[bufferBytes];
  unsigned char dest[bufferBytes];
  
  /* Setup source */
  const char* content="PolyFilla27773";
  const int setStrLen = strlen(content);

  int offset = 0;
  for (int k=0; k<setKeyCols; k++)
  {
    memcpy(source + offset, content, setStrLen);
    offset+= keyBytes;
  }

  printf("Timing %u iterations\n", numIterations);
  
  for (int i=0; i < numIterations; i++)
  {
    offset = 0;
    for (int k=0; k < keyCols; k++)
    {
      if (k < setKeyCols)
      {
        my_strnxfrm(&my_charset_utf8_bin, dest+offset, keyBytes, source+offset, setStrLen);
      }
      else
      {
        my_strnxfrm(&my_charset_utf8_bin, dest+offset, keyBytes, source+offset, 0);
      }
      offset+= keyBytes;
    }
  }

  printf("Done\n");

  my_end(0);
  return 0;
}
[13 Oct 2016 14:41] Paul Dubois
Posted by developer:
 
Noted in 8.0.1 changelog.

The my_strnxfrm() function was reimplemented to improve performance 
of character set operations that use it.
[15 Dec 2016 15:06] Paul Dubois
Posted by developer:
 
Revised changelog entry:

The performance of UTF-8 binary collations was improved.