Bug #35850 Performance regression in 5.1.23/5.1.24
Submitted: 5 Apr 2008 4:09 Modified: 20 Jun 2010 22:50
Reporter: Jeb Mershon Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.1.23 OS:Any (Tested w/ SUSE)
Assigned to: CPU Architecture:Any

[5 Apr 2008 4:09] Jeb Mershon
Description:
FlightStats benchmark results for certain queries (all including subqueries) in the 5.1.23 and 5.1.24 releases are showing performance regression when compared with 5.0 releases. The 3 queries identified all take approximately 50% longer to return in comparison to 5.0.58. This regression appears with InnoDB, but not with MyISAM.

Packages used in testing:
 mysql-enterprise-5.0.58-linux-x86_64-glibc23.tar.gz
 mysql-5.1.24-rc-linux-x86_64-glibc23.tar.gz

How to repeat:
First query -  172.35s under 5.0.58, 243.45s under 5.1.24
select count(*) 
from FlightStats_test.ontime_all 
where origin in 
  (select code from FlightStats_test.airports where state = 'TX' and major = 'Y')
and destination not in 
  (select code from FlightStats_test.airports where state = 'TX' and major = 'Y');

5.0.58 EXPLAIN output:
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: ontime_all
         type: index
possible_keys: NULL
          key: idx_origin
      key_len: 6
          ref: NULL
         rows: 37568853
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 3
  select_type: DEPENDENT SUBQUERY
        table: airports
         type: index_subquery
possible_keys: idx_code,idx_state,idx_state_2,idx_major
          key: idx_code
      key_len: 4
          ref: func
         rows: 1
        Extra: Using where
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: airports
         type: index_subquery
possible_keys: idx_code,idx_state,idx_state_2,idx_major
          key: idx_code
      key_len: 4
          ref: func
         rows: 1
        Extra: Using where
3 rows in set (0.00 sec)

5.1.24 EXPLAIN output:
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: ontime_all
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 37568853
        Extra: Using where
*************************** 2. row ***************************
           id: 3
  select_type: DEPENDENT SUBQUERY
        table: airports
         type: index_subquery
possible_keys: idx_code,idx_state,idx_state_2,idx_major
          key: idx_code
      key_len: 4
          ref: func
         rows: 1
        Extra: Using where
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: airports
         type: index_subquery
possible_keys: idx_code,idx_state,idx_state_2,idx_major
          key: idx_code
      key_len: 4
          ref: func
         rows: 1
        Extra: Using where
3 rows in set (0.04 sec)

Second query -  189.43s under 5.0.58, 280.19s under 5.1.24
select count(*) 
from FlightStats_test.ontime_all ot
where exists 
  (select 1 from FlightStats_test.airports a 
   where state = 'TX' and major = 'Y' and 
         a.code = ot.origin)

5.0.58 EXPLAIN output:
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: ot
         type: index
possible_keys: NULL
          key: idx_origin
      key_len: 6
          ref: NULL
         rows: 37568853
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: a
         type: ref
possible_keys: idx_code,idx_state,idx_state_2,idx_major
          key: idx_code
      key_len: 4
          ref: FlightStats_test.ot.origin
         rows: 1
        Extra: Using where
2 rows in set (0.00 sec)

5.1.24 EXPLAIN output:
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: ot
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 37568853
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: a
         type: ref
possible_keys: idx_code,idx_state,idx_state_2,idx_major
          key: idx_code
      key_len: 4
          ref: FlightStats_test.ot.origin
         rows: 1
        Extra: Using where
2 rows in set (0.00 sec)

Third Query - 184.41s under 5.0.58, 277.72s under 5.1.24
select count(*) from
  FlightStats_test.ontime_all ot inner join
  FlightStats_test.airports a on a.code = ot.origin
where
  not exists 
    (select 1 from FlightStats_test.airports b 
     where (state != 'TX' or (state = 'TX' and major = 'N')) and 
            b.code = ot.origin)

5.0.58 EXPLAIN output:
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: ot
         type: index
possible_keys: idx_origin
          key: idx_origin
      key_len: 6
          ref: NULL
         rows: 37568853
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: a
         type: ref
possible_keys: idx_code
          key: idx_code
      key_len: 4
          ref: FlightStats_test.ot.origin
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: a
         type: ref
possible_keys: idx_code,idx_state,idx_state_2,idx_major
          key: idx_code
      key_len: 4
          ref: FlightStats_test.ot.origin
         rows: 1
        Extra: Using where
3 rows in set (0.00 sec)

5.1.24 EXPLAIN output:
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: ot
         type: index
possible_keys: idx_origin
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 37568853
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: a
         type: ref
possible_keys: idx_code
          key: idx_code
      key_len: 4
          ref: FlightStats_test.ot.origin
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: a
         type: ref
possible_keys: idx_code,idx_state,idx_state_2,idx_major
          key: idx_code
      key_len: 4
          ref: FlightStats_test.ot.origin
         rows: 1
        Extra: Using where
3 rows in set (0.00 sec)

Note the similar changes in all 3 explains when moving from 5.0.58 to 5.1.24. For the ontime_all table (ot alias), the plan goes from 'Using where; Using index' to 'Using where'.

Table sizes for the affected queries:
ontime_all - 37,568,583 rows
airports - 19,793 rows

Because of the size of the dataset, it will probably be easiest to look at this on the QA system where this was identified. A trimmed down version of the test script has been created to run through these queries under both versions.

To run the minimal set of tests (this will probably take 30 min or so to complete):

Connect to dl360-g5-b as qauser.
cd to ~/flightstats
execute: runsubqry.sh

This script copies a base dataset and then executes the 3 queries in the same manner as the FlightStats tests. Note that there is a fourth query that starts each run to populate cache. Without this, the first problematic query doesn't give the same results. Currently, the script is set up to run both 5.1.24 and 5.0.58 with InnoDB. The results are placed in subdirectories under ~/flightstats/output

To view the summarized timing results together, run:
 perl subq_results.pl output
[21 Apr 2008 22:16] Sergey Petrunya
Is this issue about subqueries at all? All the posted EXPLAIN pairs differ in the same way, and the difference is in how the first table is accessed: 

 
-5.0.58 EXPLAIN output:
+5.1.24 EXPLAIN output:
 *************************** 1. row ***************************
            id: 1
   select_type: PRIMARY
         table: ot
          type: index
 possible_keys: NULL
-          key: idx_origin
-      key_len: 6
+          key: PRIMARY
+      key_len: 4
           ref: NULL
          rows: 37568853
-        Extra: Using where; Using index
+        Extra: Using where
 *************************** 2. row ***************************
[21 Apr 2008 22:42] Sergey Petrunya
We need to run this query

select count(*) from FlightStats_test.ontime_all where origin = destination;

and see if there is any EXPLAIN and/or speed difference betwen 5.0.58 and
5.1.24
[22 Apr 2008 18:10] Jeb Mershon
As Sergey suspected, it looks like this is not a subquery issue.

Here is the requested information:

-- 5.0.58 Cold
mysql> select count(*) from FlightStats_test.ontime_all where origin = destination;
+----------+
| count(*) |
+----------+
|     6262 | 
+----------+
1 row in set (3 min 26.98 sec)

-- 5.0.58 Warm
mysql> select count(*) from FlightStats_test.ontime_all where origin = destination;
+----------+
| count(*) |
+----------+
|     6262 | 
+----------+
1 row in set (11.15 sec)

-- 5.0.58 
mysql> explain select count(*) from FlightStats_test.ontime_all where origin = destination \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_all
         type: index
possible_keys: NULL
          key: idx_origin
      key_len: 6
          ref: NULL
         rows: 37568853
        Extra: Using where; Using index
1 row in set (0.00 sec)

-- 5.1.24 Cold
mysql> select count(*) from FlightStats_test.ontime_all where origin = destination;
+----------+
| count(*) |
+----------+
|     6262 | 
+----------+
1 row in set (1 min 8.11 sec)

-- 5.1.24 Warm
mysql> select count(*) from FlightStats_test.ontime_all where origin = destination;
+----------+
| count(*) |
+----------+
|     6262 | 
+----------+
1 row in set (1 min 13.88 sec)

-- 5.1.24 
mysql> explain select count(*) from FlightStats_test.ontime_all where origin = destination \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_all
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 37568853
        Extra: Using where
1 row in set (0.01 sec)

The timings noted earlier were the taken after running several queries in sequence. Here are cold and warm timings for the 'exists' query:

-- 5.0.58 Cold
mysql> select count(*) from FlightStats_test.ontime_all ot where exists (select 1 from FlightStats_test.airports a where state = 'TX' and major = 'Y' and a.code = ot.origin);
+----------+
| count(*) |
+----------+
|  4360697 | 
+----------+
1 row in set (6 min 53.10 sec)

-- 5.0.58 Warm
mysql> select count(*) from FlightStats_test.ontime_all ot where exists (select 1 from FlightStats_test.airports a where state = 'TX' and major = 'Y' and a.code = ot.origin);
+----------+
| count(*) |
+----------+
|  4360697 | 
+----------+
1 row in set (3 min 7.43 sec)

-- 5.1.24 Cold
mysql> select count(*) from FlightStats_test.ontime_all ot
    -> where
    -> exists (select 1 from FlightStats_test.airports a where state = 'TX' and major = 'Y' and a.code = ot.origin);
+----------+
| count(*) |
+----------+
|  4360697 | 
+----------+
1 row in set (4 min 36.67 sec)

-- 5.1.24 Warm
mysql> select count(*) from FlightStats_test.ontime_all ot where exists (select 1 from FlightStats_test.airports a where state = 'TX' and major = 'Y' and a.code = ot.origin);
+----------+
| count(*) |
+----------+
|  4360697 | 
+----------+
1 row in set (4 min 37.03 sec)

It appears that 5.1.24 gives a very consistent timing whether warm or cold. 5.0.58 is returning a slower cold result (as compared to 5.1.24), but a faster warm result. It was this discrepancy between 5.0.58 and 5.1.24 warm timings that was noted.
[23 Apr 2008 3:19] Sergey Petrunya
Jeb, thanks for the info. I was able to repeat (different plan choice between 5.0 and 5.1, didn't benchmark which is faster) on first 1M rows from ontime_all.dump file, with the subquery-free query:

explain select count(*) from ontime_all where origin = destination;
[23 Apr 2008 3:25] Sergey Petrunya
Summarizing the previous post: for the query

explain select count(*) from ontime_all where origin = destination;

We get: 

5.0.58: full index scan on index idx_origin:
Cold - 3 min 26.98 sec
Warm - 11.15 sec

5.1.24: full index scan on index PRIMARY:
Cold 1 min 8.11 sec
Warm 1 min 13.88 sec
[23 Apr 2008 3:31] Sergey Petrunya
The change seems to have been introduced by fix for BUG#26447.

Look what we got from Heikki on that bug:

> [19 Feb 2007 14:57] Heikki Tuuri
>
> Hi!
> 
> This is a docs bug, an SQL optimizer bug, and an SQL interpreter bug.
>
> The fact that SELECT * FROM t does not use the PRIMARY KEY (= clustered
> index)to retrieve the rows might be considered an optimizer bug. Using the
> clustered index for a table scan is always faster than using a secondary 
> index.

We took this advice and put it as a rule-based optimization into 5.1's code. 
The rule is actually not crucial, so we could remove it.

Perhaps we could discuss this with Heikki also.
[23 Apr 2008 4:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45858

ChangeSet@1.2580, 2008-04-23 08:35:21+04:00, sergefp@mysql.com +1 -0
  BUG#35850: Subquery (actually table access) performance regression in 5.1.23/5.1.24:
  - Disable the rule introduced by BUG#35850.
  This is an experimental patch
[23 Apr 2008 4:37] Sergey Petrunya
Jeb, could you please try re-running selects and subquery queries on 5.1 with the above patch applied?
[24 Apr 2008 5:33] Jeb Mershon
After applying the patch, results look more like those returned by 5.0.58:

-- 5.1.24 Cold
mysql> select count(*) from FlightStats_test.ontime_all where origin = destination;
+----------+
| count(*) |
+----------+
|     6262 | 
+----------+
1 row in set (3 min 24.54 sec)

-- 5.1.24 Warm
mysql> select count(*) from FlightStats_test.ontime_all where origin = destination;
+----------+
| count(*) |
+----------+
|     6262 | 
+----------+
1 row in set (10.28 sec)

-- 5.1.24 
mysql> explain select count(*) from FlightStats_test.ontime_all where origin = destination \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_all
         type: index
possible_keys: NULL
          key: idx_origin
      key_len: 6
          ref: NULL
         rows: 37568853
        Extra: Using where; Using index
1 row in set (0.00 sec)

-- 5.1.24 Cold
mysql> select count(*) from FlightStats_test.ontime_all ot where exists (select 1 from FlightStats_test.airports a where state = 'TX' and major = 'Y' and a.code = ot.origin);
+----------+
| count(*) |
+----------+
|  4360697 | 
+----------+
1 row in set (6 min 55.79 sec)

-- 5.1.24 Warm
mysql> select count(*) from FlightStats_test.ontime_all ot where exists (select 1 from FlightStats_test.airports a where state = 'TX' and major = 'Y' and a.code = ot.origin);
+----------+
| count(*) |
+----------+
|  4360697 | 
+----------+
1 row in set (3 min 11.58 sec)

-- 5.1.24
mysql> explain select count(*) from FlightStats_test.ontime_all ot where exists (select 1 from FlightStats_test.airports a where state = 'TX' and major = 'Y' and a.code = ot.origin) \G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: ot
         type: index
possible_keys: NULL
          key: idx_origin
      key_len: 6
          ref: NULL
         rows: 37568853
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: a
         type: ref
possible_keys: idx_code,idx_state,idx_state_2,idx_major
          key: idx_code
      key_len: 4
          ref: FlightStats_test.ot.origin
         rows: 1
        Extra: Using where
2 rows in set (0.00 sec)

-- 5.1.24 Cold
mysql> select count(*) from FlightStats_test.ontime_all ot inner join FlightStats_test.airports a on a.code = ot.origin where not exists (select 1 from FlightStats_test.airports a where (state != 'TX' or (state = 'TX' and major = 'N')) and a.code = ot.origin);
+----------+
| count(*) |
+----------+
|  4360697 | 
+----------+
1 row in set (6 min 51.03 sec)

-- 5.1.24 Warm
+----------+
| count(*) |
+----------+
|  4360697 | 
+----------+
1 row in set (3 min 5.05 sec)

-- 5.1.24
mysql> explain select count(*) from FlightStats_test.ontime_all ot inner join FlightStats_test.airports a on a.code = ot.origin where not exists (select 1 from FlightStats_test.airports a where (state != 'TX' or (state = 'TX' and major = 'N')) and a.code = ot.origin) \G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: ot
         type: index
possible_keys: idx_origin
          key: idx_origin
      key_len: 6
          ref: NULL
         rows: 37568853
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: a
         type: ref
possible_keys: idx_code
          key: idx_code
      key_len: 4
          ref: FlightStats_test.ot.origin
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: a
         type: ref
possible_keys: idx_code,idx_state,idx_state_2,idx_major
          key: idx_code
      key_len: 4
          ref: FlightStats_test.ot.origin
         rows: 1
        Extra: Using where
3 rows in set (0.00 sec)
[5 May 2008 17:23] Sergey Petrunya
Changed synopsis to more accurately reflect the nature of the bug
[7 May 2008 5:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/46438

ChangeSet@1.2613, 2008-05-07 09:58:21+04:00, sergefp@mysql.com +5 -0
  BUG#35850 "Performance regression in 5.1.23/5.1.24"
  - Disable the "prefer full scan on clustered primary key over full scan
    of any secondary key" rule introduced by BUG#35850.
  - Update test results accordingly 
  (bk trigger: file this for BUG#35850)
[7 May 2008 9:03] Heikki Tuuri
I would recommend leaving 5.1.24 as is. The reason:

In many cases, a disk-based table scan through a secondary index is a lot slower than through the clustered index, because in a secondary index, pages are usually scattered randomly on the disk.

Doing the table scan through the clustered index gives a more predictable performance. I would prefer that.

Regards, 

Heikki
[7 May 2008 9:05] Heikki Tuuri
A question to Jeb: can you add suitable indexes, so that a table scan is not needed?
[14 May 2008 15:21] Bugs System
Pushed into 5.1.25-rc
[22 May 2008 9:50] Bugs System
Pushed into 6.0.6-alpha
[29 May 2008 3:39] Paul DuBois
Noted in 5.1.25, 6.0.6 changelogs.

The "prefer full scan on clustered primary key over full scan of any
secondary key" optimizer rule introduced by Bug#26447 caused a
performance regression for some queries, so it has been disabled.
[26 Jun 2008 16:13] Sergey Petrunya
See also BUG#36259
[5 May 2010 15:11] Bugs System
Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[6 May 2010 14:09] Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.
[28 May 2010 6:01] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)
[28 May 2010 6:30] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[28 May 2010 6:58] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[30 May 2010 1:04] Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.
[17 Jun 2010 12:05] Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 12:49] Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 13:32] Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)