Bug #38993 Inconsistent behaviour of optimizer in "statistics" phase
Submitted: 24 Aug 2008 19:03 Modified: 26 Dec 2008 11:39
Reporter: Patrick van Bergen Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.1.26, 5.0.67 OS:Any
Assigned to: CPU Architecture:Any

[24 Aug 2008 19:03] Patrick van Bergen
Description:
In our use of MySQL we come across the following situation: a specific query, that contains many self-joined tables, was performing very well (taking less than 0.1 second) up until recently. The number of records in the tables increased steadily (7.000 and 200.000). Suddenly the query became very slow (45 seconds).

"Show processlist" shows that the optimizer is spending nearly all this time in the state "statistics".

A combination of running "check table" and "optimize table" on the tables involved brought performance back to its original level.

However, a restart of the MySQL server regressed the poor performance level once more.

The situation has been found on MySQL 5.0.23, 5.0.67, and 5.1.26 for Windows (running Vista) and 5.0.67 for Linux (Ubuntu).

respectfully,

Patrick van Bergen

How to repeat:
This situation is repeatable by importing the database that I will upload shortly via FTP (it is, of necessity, larger than 500Kb) and by executing the accompanying sql query.

After importing the database ("pro1"), perform these steps to get poor performance:

- execute the query from within a MySQL client

and perform these steps (Windows Vista) to get good performance:

from the command prompt:
- perform "mysqlcheck -o -u root -p pro1"
from the MySQL client:
- "optimize table mod_lime_int;"
- execute the query
[25 Aug 2008 4:36] Valeriy Kravchuk
Thank you for a problem report. I had found 2 files on our FTP server, both are 393216 bytes in size. Is this correct?
[26 Aug 2008 5:27] Valeriy Kravchuk
I had found your new file, 1324546 bytes in size. Is it correct?
[5 Sep 2008 19:53] Patrick van Bergen
I assume you have all the information needed; if there's anything I can do to help, let me know!
[25 Dec 2008 15:11] Valeriy Kravchuk
Sorry for a delay with this bug report.

I do not see any problem with this query on 5.1.31 in a small virtual machine. Please, check:

openxs@suse:/home2/openxs/dbs/5.1> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.31-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> explain SELECT SQL_NO_CACHE `aaaaa`.`subject_id` `a`, `nnnnnn`.`subject_id` `b`, `gggggg`.`object` `g`, `hhhhhh`.`object` `h`, `mmmmmm`.`object` `m`, `iiiiii`.`object` `i`, `jjjjjj`.`object` `j`
    -> FROM `mod_lime_varchar` `aaaaa`
    -> INNER JOIN `mod_lime_int` `bbbbbb`  ON  ((`bbbbbb`.`subject_id` = `aaaaa`.`subject_id`) AND (`bbbbbb`.`predicate_id` = 28) AND (`bbbbbb`.`active` = 1))
    -> INNER JOIN `mod_lime_int` `cccccc`  ON  ((`cccccc`.`subject_id` = `aaaaa`.`subject_id`) AND (`cccccc`.`predicate_id` = 29) AND (`cccccc`.`active` = 1))
    -> INNER JOIN `mod_lime_int` `dddddd`  ON  ((`dddddd`.`subject_id` = `aaaaa`.`subject_id`) AND (`dddddd`.`predicate_id` = 30) AND (`dddddd`.`active` = 1))
    -> INNER JOIN (`mod_lime_varchar` `nnnnnn`
    -> LEFT JOIN `mod_lime_int` `eeeeee`  ON  ((`eeeeee`.`subject_id` = `nnnnnn`.`subject_id`) AND (`eeeeee`.`predicate_id` = 17) AND (`eeeeee`.`active` = 1))
    -> LEFT JOIN `mod_lime_int` `ffffff`  ON  ((`ffffff`.`subject_id` = `nnnnnn`.`subject_id`) AND (`ffffff`.`predicate_id` = 66) AND (`ffffff`.`active` = 1))
    -> INNER JOIN `mod_lime_int` `gggggg`  ON  ((`gggggg`.`subject_id` = `nnnnnn`.`subject_id`) AND (`gggggg`.`predicate_id` = 64) AND (`gggggg`.`active` = 1))
    -> INNER JOIN `mod_lime_int` `hhhhhh`  ON  ((`hhhhhh`.`subject_id` = `nnnnnn`.`subject_id`) AND (`hhhhhh`.`predicate_id` = 67) AND (`hhhhhh`.`active` = 1))
    -> INNER JOIN `mod_lime_int` `mmmmmm`  ON  ((`mmmmmm`.`subject_id` = `nnnnnn`.`subject_id`) AND (`mmmmmm`.`predicate_id` = 68) AND (`mmmmmm`.`active` = 1))
    -> LEFT JOIN `mod_lime_int` `iiiiii`  ON  ((`iiiiii`.`subject_id` = `nnnnnn`.`subject_id`) AND (`iiiiii`.`predicate_id` = 17253) AND (`iiiiii`.`active` = 1))
    -> LEFT JOIN `mod_lime_int` `jjjjjj`  ON  ((`jjjjjj`.`subject_id` = `nnnnnn`.`subject_id`) AND (`jjjjjj`.`predicate_id` = 17348) AND (`jjjjjj`.`active` = 1))
    -> ) ON  ((`nnnnnn`.`predicate_id` = 2) AND (`nnnnnn`.`object` = 'def') AND (`nnnnnn`.`active` = 1) AND (`dddddd`.`object` = `nnnnnn`.`subject_id`))
    -> LEFT JOIN `mod_lime_int` `kkkkkk`  ON  ((`kkkkkk`.`subject_id` = `aaaaa`.`subject_id`) AND (`kkkkkk`.`predicate_id` = 94) AND (`kkkkkk`.`active` = 1))
    -> LEFT JOIN `mod_lime_int` `llllll`  ON  ((`llllll`.`subject_id` = `aaaaa`.`subject_id`) AND (`llllll`.`predicate_id` = 95) AND (`llllll`.`active` = 1))
    -> WHERE `aaaaa`.`predicate_id` = 2 AND `aaaaa`.`object` = 'abc' AND `aaaaa`.`active` = 1 AND  (( ((`kkkkkk`.`object` IS NULL) OR ( ((`kkkkkk`.`object` < 1218914111) AND (`kkkkkk`.`object` < `llllll`.`object`)) )) ) AND (`ffffff`.`object` = 1))
    -> \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ffffff
         type: ref
possible_keys: subject_id,predicate_id
          key: predicate_id
      key_len: 8
          ref: const,const
         rows: 11
        Extra: Using where
...

*************************** 14. row ***************************
           id: 1
  select_type: SIMPLE
        table: llllll
         type: ref
possible_keys: subject_id,predicate_id
          key: subject_id
      key_len: 8
          ref: test.bbbbbb.subject_id,const
         rows: 1
        Extra: Using where
14 rows in set (0.11 sec)

mysql> SELECT SQL_NO_CACHE `aaaaa`.`subject_id` `a`, `nnnnnn`.`subject_id` `b`, `gggggg`.`object` `g`, `hhhhhh`.`object` `h`, `mmmmmm`.`object` `m`, `iiiiii`.`object` `i`, `jjjjjj`.`object` `j`
    -> FROM `mod_lime_varchar` `aaaaa`
    -> INNER JOIN `mod_lime_int` `bbbbbb`  ON  ((`bbbbbb`.`subject_id` = `aaaaa`.`subject_id`) AND (`bbbbbb`.`predicate_id` = 28) AND (`bbbbbb`.`active` = 1))
    -> INNER JOIN `mod_lime_int` `cccccc`  ON  ((`cccccc`.`subject_id` = `aaaaa`.`subject_id`) AND (`cccccc`.`predicate_id` = 29) AND (`cccccc`.`active` = 1))
    -> INNER JOIN `mod_lime_int` `dddddd`  ON  ((`dddddd`.`subject_id` = `aaaaa`.`subject_id`) AND (`dddddd`.`predicate_id` = 30) AND (`dddddd`.`active` = 1))
    -> INNER JOIN (`mod_lime_varchar` `nnnnnn`
    -> LEFT JOIN `mod_lime_int` `eeeeee`  ON  ((`eeeeee`.`subject_id` = `nnnnnn`.`subject_id`) AND (`eeeeee`.`predicate_id` = 17) AND (`eeeeee`.`active` = 1))
    -> LEFT JOIN `mod_lime_int` `ffffff`  ON  ((`ffffff`.`subject_id` = `nnnnnn`.`subject_id`) AND (`ffffff`.`predicate_id` = 66) AND (`ffffff`.`active` = 1))
    -> INNER JOIN `mod_lime_int` `gggggg`  ON  ((`gggggg`.`subject_id` = `nnnnnn`.`subject_id`) AND (`gggggg`.`predicate_id` = 64) AND (`gggggg`.`active` = 1))
    -> INNER JOIN `mod_lime_int` `hhhhhh`  ON  ((`hhhhhh`.`subject_id` = `nnnnnn`.`subject_id`) AND (`hhhhhh`.`predicate_id` = 67) AND (`hhhhhh`.`active` = 1))
    -> INNER JOIN `mod_lime_int` `mmmmmm`  ON  ((`mmmmmm`.`subject_id` = `nnnnnn`.`subject_id`) AND (`mmmmmm`.`predicate_id` = 68) AND (`mmmmmm`.`active` = 1))
    -> LEFT JOIN `mod_lime_int` `iiiiii`  ON  ((`iiiiii`.`subject_id` = `nnnnnn`.`subject_id`) AND (`iiiiii`.`predicate_id` = 17253) AND (`iiiiii`.`active` = 1))
    -> LEFT JOIN `mod_lime_int` `jjjjjj`  ON  ((`jjjjjj`.`subject_id` = `nnnnnn`.`subject_id`) AND (`jjjjjj`.`predicate_id` = 17348) AND (`jjjjjj`.`active` = 1))
    -> ) ON  ((`nnnnnn`.`predicate_id` = 2) AND (`nnnnnn`.`object` = 'def') AND (`nnnnnn`.`active` = 1) AND (`dddddd`.`object` = `nnnnnn`.`subject_id`))
    -> LEFT JOIN `mod_lime_int` `kkkkkk`  ON  ((`kkkkkk`.`subject_id` = `aaaaa`.`subject_id`) AND (`kkkkkk`.`predicate_id` = 94) AND (`kkkkkk`.`active` = 1))
    -> LEFT JOIN `mod_lime_int` `llllll`  ON  ((`llllll`.`subject_id` = `aaaaa`.`subject_id`) AND (`llllll`.`predicate_id` = 95) AND (`llllll`.`active` = 1))
    -> WHERE `aaaaa`.`predicate_id` = 2 AND `aaaaa`.`object` = 'abc' AND `aaaaa`.`active` = 1 AND  (( ((`kkkkkk`.`object` IS NULL) OR ( ((`kkkkkk`.`object` < 1218914111) AND (`kkkkkk`.`object` < `llllll`.`object`)) )) ) AND (`ffffff`.`object` = 1))
    -> ;
+-------+-------+-----+---+---+------+------+
| a     | b     | g   | h | m | i    | j    |
+-------+-------+-----+---+---+------+------+
| 13887 |    63 | 201 | 1 | 2 |    0 |   12 |
...
| 17366 | 17252 | 201 | 3 | 2 |    0 | NULL |
+-------+-------+-----+---+---+------+------+
314 rows in set (1.02 sec)

Same pland and result with 5.0.76. Note also that in case of many UPDATEs and DELETEs you have to do OPTIMIZE TABLE periodically for optimal performance.
[26 Dec 2008 11:14] Patrick van Bergen
Hello Valeriy,

I just imported the test database once again and ran the testquery, like you did, a few times. Query times were like before: about 45 seconds. I then performed ANALYZE TABLE on both tables involved. This immediately brought down the query time to 0.03 seconds(!)

Since the bugreport I learned that the statistics that MySQL collects on tables can be inefficient. These statistics are not only renewed by performing OPTIMIZE TABLE, but also on SHOW TABLE STATUS and SHOW INDEX. MySQL 5.1.17 will even have a property to suppress this statistics renewal (http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#option_mysqld_innodb_stats_o...).

10 minutes later I ran OPTIMIZE TABLE on both tables again, and performance was back to 45 seconds(!)

Could you try this as well, running OPTIMIZE TABLE followed by performing the query, a few times, in your test-environment?

Thanks for taking this time. The issue is still very important to us.
[26 Dec 2008 11:39] Patrick van Bergen
In my previous post I used "ANALYZE TABLE" once. This should say "OPTIMIZE TABLE".