Bug #51249 performance regression in mysql 5.1.41
Submitted: 17 Feb 2010 17:16 Modified: 23 Feb 2010 16:43
Reporter: Mark Callaghan Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.1.41, mysql-5.5.99 OS:Any
Assigned to: Sergei Glukhov CPU Architecture:Any
Tags: 5.1.41, grant, performance, regression

[17 Feb 2010 17:16] Mark Callaghan
Description:
MySQL 5.1.41 is much slower for concurrent queries that fetch by primary key than 5.1.40. This problem is independent of the storage engine.

sysbench results in transactions/second for 16 threads:
* 133001 -> 5.0.84 innodb
* 122766 -> 5.1.38 innodb plugin 
* 121403 -> 5.1.40 innodb plugin
*  86433 -> 5.1.41 innodb plugin

* 118948 -> 5.0.84 myisam
* 111926 -> 5.1.38 myisam
* 110277 -> 5.1.40 myisam
*  68656 -> 5.1.41 myisam

How to repeat:
sysbench command line:
../sysbench --test=oltp --oltp-table-size=2000000 --max-time=60 \
--max-requests=0 --mysql-table-engine=innodb --db-ps-mode=disable \
--mysql-engine-trx=yes --oltp-read-only --oltp-skip-trx \
--oltp-test-mode=simple --oltp-point-select-all-cols \
--oltp-dist-type=uniform --oltp-range-size=1000 X
where X is prepare, then run

--oltp-point-select-all-cols requires use of http://launchpad.net/sysbench/0.4

client and mysqld run on same server
server is 16 core nehalem -- 8 real cores X 2 HW threads per core

my.cnf:
[mysqld]
plugin-load=innodb=ha_innodb_plugin.so;innodb_trx=ha_innodb_plugin.so;innodb_locks=ha_innodb_plugin.so;innodb_lock_waits=ha_innodb_plugin.so;innodb_cmp=ha_innodb_plugin.so;innodb_cmp_reset=ha_innodb_plugin.so;innodb_cmpmem=ha_innodb_plugin.so;innodb_cmpmem_reset=ha_innodb_plugin.so;libpbxt.so

pbxt
pbxt_index_cache_size=1000M
pbxt_record_cache_size=1000M

innodb_buffer_pool_size=2000M
innodb_log_file_size=100M
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=0
max_connections=2000
table_cache=2000
innodb_max_dirty_pages_pct=80

Suggested fix:
When I run poor man's profiler to get stack traces during the benchmark, there is a lot of contention for LOCK_grant as displayed by this stack:
__lll_mutex_lock_wait,pthread_rwlock_rdlock,check_grant,check_table_access,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection,start_thread,clone

I think that part of the problem is this change. One guess is that the comparisons with my_charset_utf8_bin are slower than with system_charset_info, but that is not the only cause for the performance difference.

revno: 1810.3961.1
committer: Sergey Glukhov <Sergey.Glukhov@sun.com>
branch nick: mysql-5.0-bugteam
timestamp: Tue 2009-10-27 12:09:19 +0400
message:
  Bug#41049 does syntax "grant" case insensitive?
  Problem 1:
  column_priv_hash uses utf8_general_ci collation
  for the key comparison. The key consists of user name,
  db name and table name. Thus user with privileges on table t1
  is able to perform the same operation on T1
  (the similar situation with user name & db name, see acl_cache).
  So collation which is used for column_priv_hash and acl_cache
  should be case sensitive.
  The fix:
  replace system_charset_info with my_charset_utf8_bin for
  column_priv_hash and acl_cache
  Problem 2:
  The same situation with proc_priv_hash, func_priv_hash,
  the only difference is that Routine name is case insensitive.
  So the fix is to use my_charset_utf8_bin for
  proc_priv_hash & func_priv_hash and convert routine name into lower
  case before writing the element into the hash and
  before looking up the key.
  Additional fix: mysql.procs_priv Routine_name field collation
  is changed to utf8_general_ci.
  It's necessary for REVOKE command
  (to find a field by routine hash element values).
  Note: 
  It's safe for lower-case-table-names mode too because
  db name & table name are converted into lower case
  (see GRANT_NAME::GRANT_NAME).
[18 Feb 2010 7:53] Giuseppe Maxia
probably related to Bug#51272
[18 Feb 2010 11:00] Sergei Glukhov
Hi,

I checked 5.1.[40|41|45] trees on 32bit OS using Bug#51249 test,
there is no considerable performance degradation:

5.1.40: transactions: 539498 (8991.60 per sec.)
5.1.41: transactions: 535800 (8929.97 per sec.)
5.1.45: transactions: 537302 (8954.99 per sec.)

I suppose you use 64bit OS for testing, is it so?
[18 Feb 2010 12:04] Alexey Stroganov
Hi Mark, 

I've run your test on box with 8 cores for 5.1.39/5.1.40/5.1.41 and don't see 
notable drops, just some slight variance:

#
# Test: POINT_SELECT_ALL_COLUMNS:throughput, TPS
#
# Server 1 - 5.1.39
# Server 2 - 5.1.40
# Server 3 - 5.1.41
#
#             Server 1  Server 2  Server 3
# Thread        INNODB    INNODB    INNODB
          8   53250.08  53042.33  52298.77
         16   64194.03  64244.51  63137.66
#
#             Server 1  Server 2  Server 3
# Thread        MYISAM    MYISAM    MYISAM
          8   81815.43  82577.68  81300.79
         16   81004.77  79801.87  79776.50
[18 Feb 2010 12:54] Alexey Stroganov
just to clarify - innodb results posted above are for innodb_plugin in these releases.
[18 Feb 2010 15:05] Mark Callaghan
Note with these results for the 8-core server, there is another performance regression between 5.1.38 and 5.1.40. It is independent of storage engine (reproduces for myisam, innodb plugin). And it reproduces on an 8 core server. I filed http://bugs.mysql.com/bug.php?id=51285 for that.

@Sergey Gluhov - I think your results are for 1 thread. The problem doesn't appear until 16 threads (--num-threads=... option for sysbench)

@Alexey Stroganov - I too cannot repeat this on an 8 core server. It requires a 16 core server. My test server uses Nehalem with 8 real cores X 2 HW threads per core. It reports 16 CPUs when hyperthreading (HT) is on and 8 CPUs when off. The results earlier in this report are from 16 CPUs with HT on. I repeated the tests with 8 CPUs and HT off.

Below I report results for 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024 threads. The binaries are:
my5138 - mysql 5.1.38 (innodb plugin 1.0.4)
my5140 - mysql 5.1.40 (innodb plugin 1.0.4)
my5141 - mysql 5.1.41 (innodb plugin 1.0.5)

Innodb plugin for 16 CPUs and HT on:
  1     2     4     8      16     32     64    128   256   512   1024 
7380 14248 29527 55302 122766  120263 118870 108139 47091 30571 18414 my5138
7835 15138 29203 52240  86433   72591  70742  57496 39253 25957 16216 my5141

Innodb plugin for 8 CPUs and HT off:
regression after 5.1.38 starts at 128 threads
  1     2     4     8      16     32     64    128   256   512   1024
8835 17482 38935 85122  84690   84111  82990  81326 75501 45811 21732 my5138
8879 18405 40494 85424  83698   81055  78629  52533 32070 19957 12040 my5140
8534 18023 38781 85857  83245   81657  78754  52102 32207 19919 12431 my5141

MyISAM for 16 CPUs with HT on:
regression in 5.1.41 starts at 16 threads
  1     2     4     8      16     32     64    128   256   512   1024
7213 15684 29775 53251 111926  106993 111754 105556 46470 29786 18149 my5138
7310 15117 28828 51758  68656   67561  67372  56958 39430 25937 15598 my5141

MyISAM for 8 CPUs with HT off:
regression after 5.1.38 is obvious at 128 threads
  1     2     4     8      16     32     64    128   256   512   1024
8991 16794 39066 80860  81091   81668  80831  78566 72943 35295 20908 my5138
9287 18211 39800 82929  79917   78941  76866  52276 32674 20039 11963 my5140
8426 17116 39631 83269  80263   78765  77597  57462 32483 19887 12060 my5141
[22 Feb 2010 21:14] Mark Callaghan
binaries:
my5138 - 5.1.38 -O2, -fno-omit-frame-pointer --without-fast-mutexes
my5140 - 5.1.40 -O2, -fno-omit-frame-pointer --without-fast-mutexes
my5141 - 5.1.41 -O2, -fno-omit-frame-pointer --without-fast-mutexes
my5144 - 5.1.44 -O2, -fno-omit-frame-pointer --without-fast-mutexes
my5141wfm - 5.1.41 -O2, -fno-omit-frame-pointer --with-fast-mutexes
my5144wfm - 5.1.44 -O2, -fno-omit-frame-pointer --with-fast-mutexes
my5144opt - 5.1.44 --without-fast-mutexes and uses -O3 by default
my5141wfmopt - 5.1.41 --with-fast-mutexes and uses -O3 by default
my5144wfmopt - 5.1.44 --with-fast-mutexes and uses -O3 by default

There are several things to note:
* 5.1.44 is much better than earlier versions at high concurrency without using --with-fast-mutexes
* I cannot reproduce the results for which I opened this bug (5.1.41 was much slower than 5.1.40). It is possible my 5.1.40 build used --with-fast-mutexes

I think this should be closed as 'cannot reproduce'

Transactions per second from sysbench readonly for InnoDB
  1     2     4     8     16     32     64    128    256   512  1024
7371 15213 29522 53386  88257  72475  70591  57203  39411 26126 15687  my5138
7289 13418 30422 52629  84754  72322  70562  58145  39739 26313 16183  my5140
7327 13876 29894 52379  87767  72690  70985  57593  39242 25920 15648  my5141
8179 15308 29546 54528 123425 119408 118610 110951  46720 29748 18787  my5144

8046 14617 29477 53473 124566 121269 118703 113215 106673 43920 21101  my5141wfm
8251 13657 30576 54793 123665 121089 119198 113337 105275 37597 20282  my5144wfm
7801 15252 31119 56074 126246 121729 121149 110764  48259 30618 19094  my5144opt
7822 15569 30599 53254 125769 124286 122390 116264 108702 40026 20944    5144wfmopt

Transactions per second from sysbench readonly for MyISAM
  1     2     4     8     16     32     64    128    256   512  1024
7735 15636 28956 54766  72928  69480  69030  58691  40026 26475 15921  my5138
7520 14597 30001 53577  72465  69377  68638  57898  39529 26387 15976  my5140
7930 14801 29344 53177  72186  69496  69260  57768  39853 26052 15548  my5141
7717 15253 30407 51848 119521 113086 116003 108197  46631 30073 17979  my5144

7297 15032 30033 51760 121001 117144 115557 111824 111108 38925 20782  my5141wfm
7919 15470 29921 55474 121744 116424 114552 111897 109731 37933 22784  my5144wfm
7177 14981 30441 54984 120744 113811 117082 108195  48195 31354 19176  my5144opt
8330 16405 31086 52737 124536 118904 117588 114778 113070 38755 37934         5144wfmopt