Bug #48763 Slow Query, sub query + query returning in 88 seconds
Submitted: 13 Nov 2009 14:46 Modified: 21 Mar 2011 4:17
Reporter: Roberto Spadim (Basic Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.45 OS:Linux
Assigned to: Roy Lyseng CPU Architecture:Any
Tags: semijoin

[13 Nov 2009 14:46] Roberto Spadim
Description:
Hello, i'm having a problem with a query, my table have 120.000 rows, and in slow query log, i have this line: Rows_examined: 58.663.238, could anyone help me on query optimization or maybe on optimizer error?
i will send query and table on attachments

How to repeat:
create table, insert data
run query:

SELECT SQL_SMALL_RESULT SQL_CACHE  SUM(a.pecas_ant +IF(a.data_est='2009-11-01',0,a.pecas_e-a.pecas_s)) AS pecas_ant,SUM(a.quant_ant +IF(a.data_est='2009-11-01',0,a.quant_e-a.quant_s)) AS quant_ant,SUM(a.pliq_ant  +IF(a.data_est='2009-11-01',0,a.pliq_e-a.pliq_s)) AS pliq_ant,SUM(a.pbruto_ant+IF(a.data_est='2009-11-01',0,a.pbruto_e-a.pbruto_s)) AS pbruto_ant,SUM(a.vliq_ant  +IF(a.data_est='2009-11-01',0,a.vliq_e-a.vliq_s)) AS vliq_ant,SUM(a.vbruto_ant+IF(a.data_est='2009-11-01',0,a.vbruto_e-a.vbruto_s)) AS vbruto_ant,SUM(a.custo_ant +IF(a.data_est='2009-11-01',0,a.custo_e-a.custo_s)) AS custo_ant,SUM(a.pecas_rastr_ant +IF(a.data_est='2009-11-01',0,a.pecas_rastr_e-a.pecas_rastr_s)) AS pecas_rastr_ant,SUM(a.quant_rastr_ant +IF(a.data_est='2009-11-01',0,a.quant_rastr_e-a.quant_rastr_s)) AS quant_rastr_ant,SUM(a.pliq_rastr_ant  +IF(a.data_est='2009-11-01',0,a.pliq_rastr_e-a.pliq_rastr_s)) AS pliq_rastr_ant,SUM(a.pbruto_rastr_ant+IF(a.data_est='2009-11-01',0,a.pbruto_rastr_e-a.pbruto_rastr_s)) AS pbruto_rastr_ant,SUM(a.vliq_rastr_ant  +IF(a.data_est='2009-11-01',0,a.vliq_rastr_e-a.vliq_rastr_s)) AS vliq_rastr_ant,SUM(a.vbruto_rastr_ant+IF(a.data_est='2009-11-01',0,a.vbruto_rastr_e-a.vbruto_rastr_s)) AS vbruto_rastr_ant,SUM(a.custo_rastr_ant +IF(a.data_est='2009-11-01',0,a.custo_rastr_e-a.custo_rastr_s)) AS custo_rastr_ant FROM est_dia AS a WHERE a.item_id=50 AND a.item_id_red=1514 AND a.unidade_id=1000 AND (a.centro_atividade,a.data_est) IN (SELECT b.centro_atividade,MAX(b.data_est) FROM est_dia AS b FORCE INDEX(est_dia_data) WHERE b.item_id=50 AND b.item_id_red=1514 AND b.unidade_id=1000 AND b.data_est<='2009-11-01' GROUP BY b.centro_atividade) GROUP BY a.item_id,a.item_id_red;
[13 Nov 2009 14:53] Roberto Spadim
create table file is more than 500kb, download here:
http://www.spadim.com.br/tmp.zip

thanks
[13 Nov 2009 15:32] Valeriy Kravchuk
Thank you for the problem report. Please (while I am downloading your test case), send the results of EXPLAIN for your select statement.
[13 Nov 2009 15:52] Valeriy Kravchuk
I had to rename table from your dump... anyway, I see the following EXPLAIN results:

+----+--------------------+-------+-------+---------------------------------------------------------+--------------+---------+-------------------+------+-----------------------------------------------------------+
| id | select_type        | table | type  | possible_keys                                           | key          | key_len | ref               | rows | Extra                                                     |
+----+--------------------+-------+-------+---------------------------------------------------------+--------------+---------+-------------------+------+-----------------------------------------------------------+
|  1 | PRIMARY            | a     | ref   | PRIMARY,est_dia,est_dia_data,item_est_dia,item_est_dia2 | est_dia      | 9       | const,const,const | 7632 | Using where                                               | 
|  2 | DEPENDENT SUBQUERY | b     | range | est_dia_data                                            | est_dia_data | 12      | NULL              | 7315 | Using where; Using index; Using temporary; Using filesort | 
+----+--------------------+-------+-------+---------------------------------------------------------+--------------+---------+-------------------+------+-----------------------------------------------------------+
2 rows in set (0.01 sec)

mysql> select 7632*7315;
+-----------+
| 7632*7315 |
+-----------+
|  55828080 | 
+-----------+
1 row in set (0.00 sec)

So, that optimizer estimations are correct enough, and you indeed will have to read that many rows, because MySQL 5.0 considers your subquery as correlated (dependent), and execute it for each row from the outer query. This problem is fixed in 6.0.x already:

77-52-7-73:6.0-codebase openxs$ bin/mysql -uroot testReading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 6.0.14-alpha-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> alter table est_dia_bkp_20091113 rename to est_dia;Query OK, 0 rows affected (0.00 sec)mysql> explain SELECT SQL_SMALL_RESULT SQL_CACHE  SUM(a.pecas_ant +IF(a.data_est='2009-11-01',0,a.pecas_e-a.pecas_s)) AS pecas_ant,SUM(a.quant_ant +IF(a.data_est='2009-11-01',0,a.quant_e-a.quant_s)) AS quant_ant,SUM(a.pliq_ant  +IF(a.data_est='2009-11-01',0,a.pliq_e-a.pliq_s)) AS pliq_ant,SUM(a.pbruto_ant+IF(a.data_est='2009-11-01',0,a.pbruto_e-a.pbruto_s)) AS pbruto_ant,SUM(a.vliq_ant  +IF(a.data_est='2009-11-01',0,a.vliq_e-a.vliq_s)) AS vliq_ant,SUM(a.vbruto_ant+IF(a.data_est='2009-11-01',0,a.vbruto_e-a.vbruto_s)) AS vbruto_ant,SUM(a.custo_ant +IF(a.data_est='2009-11-01',0,a.custo_e-a.custo_s)) AS custo_ant,SUM(a.pecas_rastr_ant +IF(a.data_est='2009-11-01',0,a.pecas_rastr_e-a.pecas_rastr_s)) AS pecas_rastr_ant,SUM(a.quant_rastr_ant +IF(a.data_est='2009-11-01',0,a.quant_rastr_e-a.quant_rastr_s)) AS quant_rastr_ant,SUM(a.pliq_rastr_ant  +IF(a.data_est='2009-11-01',0,a.pliq_rastr_e-a.pliq_rastr_s)) AS pliq_rastr_ant,SUM(a.pbruto_rastr_ant+IF(a.data_est='2009-11-01',0,a.pbruto_rastr_e-a.pbruto_rastr_s)) AS pbruto_rastr_ant,SUM(a.vliq_rastr_ant  +IF(a.data_est='2009-11-01',0,a.vliq_rastr_e-a.vliq_rastr_s)) AS vliq_rastr_ant,SUM(a.vbruto_rastr_ant+IF(a.data_est='2009-11-01',0,a.vbruto_rastr_e-a.vbruto_rastr_s)) AS vbruto_rastr_ant,SUM(a.custo_rastr_ant +IF(a.data_est='2009-11-01',0,a.custo_rastr_e-a.custo_rastr_s)) AS custo_rastr_ant FROM est_dia AS a WHERE a.item_id=50 AND a.item_id_red=1514 AND a.unidade_id=1000 AND (a.centro_atividade,a.data_est) IN (SELECT b.centro_atividade,MAX(b.data_est) FROM est_dia AS b FORCE INDEX(est_dia_data) WHERE b.item_id=50 AND b.item_id_red=1514 AND b.unidade_id=1000 AND b.data_est<='2009-11-01' GROUP BY b.centro_atividade) GROUP BY a.item_id,a.item_id_red;
+----+-------------+-------+-------+---------------------------------------------------------+--------------+---------+-------------------+------+-----------------------------------------------------------+
| id | select_type | table | type  | possible_keys                                           | key          | key_len | ref               | rows | Extra                                                     |
+----+-------------+-------+-------+---------------------------------------------------------+--------------+---------+-------------------+------+-----------------------------------------------------------+
|  1 | PRIMARY     | a     | ref   | PRIMARY,est_dia,est_dia_data,item_est_dia,item_est_dia2 | est_dia      | 9       | const,const,const | 7632 | Using where                                               |
|  2 | SUBQUERY    | b     | range | est_dia_data                                            | est_dia_data | 12      | NULL              | 7315 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------+-------+---------------------------------------------------------+--------------+---------+-------------------+------+-----------------------------------------------------------+
2 rows in set (0.00 sec)

For 5.0 and 5.1 you have to rewrite your query.
[13 Nov 2009 15:55] Valeriy Kravchuk
With 6.0 my laptop executed this query only 0.18 seconds:

    -> a.item_id,a.item_id_red;
+------------+------------+--------------+--------------+----------+------------+----------------+-----------------+-----------------+----------------+------------------+----------------+------------------+-----------------+
| pecas_ant  | quant_ant  | pliq_ant     | pbruto_ant   | vliq_ant | vbruto_ant | custo_ant      | pecas_rastr_ant | quant_rastr_ant | pliq_rastr_ant | pbruto_rastr_ant | vliq_rastr_ant | vbruto_rastr_ant | custo_rastr_ant |
+------------+------------+--------------+--------------+----------+------------+----------------+-----------------+-----------------+----------------+------------------+----------------+------------------+-----------------+
| 6493.00000 | 6493.00000 | 715570.31100 | 715570.31100 |  0.00000 |    0.00000 | 15611681.93961 |      6493.00000 |      6493.00000 |   715570.31100 |     715570.31100 |        0.00000 |          0.00000 |  15634640.39961 |
+------------+------------+--------------+--------------+----------+------------+----------------+-----------------+-----------------+----------------+------------------+----------------+------------------+-----------------+
1 row in set (0.18 sec)

Eventually this optimizer enhancement will be backported to 5.5 or whatever 5.x that will be declared new GA version.
[13 Nov 2009 17:23] Roberto Spadim
result from explain:

id|select_type|table|type|possible_keys|key|HEX(key_len)|ref|rows|Extra
1|PRIMARY|a|ref|PRIMARY,est_dia,est_dia_data,item_est_dia,item_est_dia2|est_dia|39|const,const,const|7634|Using where
2|DEPENDENT SUBQUERY|b|range|est_dia_data|est_dia_data|3132|NULL|7317|Using where; Using index; Using temporary; Using filesort

maybe with a upgrade from mysql 5.0 to 6.0 could solve this problem? any idea how could i rewrite this query?
[13 Nov 2009 18:00] Valeriy Kravchuk
You derived table instead of subquery, something like this:

77-52-7-73:5.0 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.0.88-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SELECT SQL_SMALL_RESULT SQL_CACHE  SUM(a.pecas_ant
    -> +IF(a.data_est='2009-11-01',0,a.pecas_e-a.pecas_s)) AS pecas_ant,SUM(a.quant_ant
    -> +IF(a.data_est='2009-11-01',0,a.quant_e-a.quant_s)) AS quant_ant,SUM(a.pliq_ant 
    -> +IF(a.data_est='2009-11-01',0,a.pliq_e-a.pliq_s)) AS
    -> pliq_ant,SUM(a.pbruto_ant+IF(a.data_est='2009-11-01',0,a.pbruto_e-a.pbruto_s)) AS
    -> pbruto_ant,SUM(a.vliq_ant  +IF(a.data_est='2009-11-01',0,a.vliq_e-a.vliq_s)) AS
    -> vliq_ant,SUM(a.vbruto_ant+IF(a.data_est='2009-11-01',0,a.vbruto_e-a.vbruto_s)) AS
    -> vbruto_ant,SUM(a.custo_ant +IF(a.data_est='2009-11-01',0,a.custo_e-a.custo_s)) AS
    -> custo_ant,SUM(a.pecas_rastr_ant
    -> +IF(a.data_est='2009-11-01',0,a.pecas_rastr_e-a.pecas_rastr_s)) AS
    -> pecas_rastr_ant,SUM(a.quant_rastr_ant
    -> +IF(a.data_est='2009-11-01',0,a.quant_rastr_e-a.quant_rastr_s)) AS
    -> quant_rastr_ant,SUM(a.pliq_rastr_ant 
    -> +IF(a.data_est='2009-11-01',0,a.pliq_rastr_e-a.pliq_rastr_s)) AS
    -> pliq_rastr_ant,SUM(a.pbruto_rastr_ant+IF(a.data_est='2009-11-01',0,a.pbruto_rastr_e-a.pbruto_rastr_s))
    -> AS pbruto_rastr_ant,SUM(a.vliq_rastr_ant 
    -> +IF(a.data_est='2009-11-01',0,a.vliq_rastr_e-a.vliq_rastr_s)) AS
    -> vliq_rastr_ant,SUM(a.vbruto_rastr_ant+IF(a.data_est='2009-11-01',0,a.vbruto_rastr_e-a.vbruto_rastr_s))
    -> AS vbruto_rastr_ant,SUM(a.custo_rastr_ant
    -> +IF(a.data_est='2009-11-01',0,a.custo_rastr_e-a.custo_rastr_s)) AS custo_rastr_ant 
    -> FROM est_dia as a, 
    -> (SELECT b.centro_atividade c1, MAX(b.data_est) c2 FROM
    -> est_dia AS b FORCE INDEX(est_dia_data) WHERE b.item_id=50 AND b.item_id_red=1514 AND
    -> b.unidade_id=1000 AND b.data_est<='2009-11-01' GROUP BY b.centro_atividade) as c
    -> WHERE a.item_id=50 AND a.item_id_red=1514 AND a.unidade_id=1000 AND
    -> (a.centro_atividade,a.data_est) IN (select c.c1, c.c2) GROUP BY
    -> a.item_id,a.item_id_red;
+-------------+-------------+---------------+---------------+----------+------------+----------------+-----------------+-----------------+----------------+------------------+----------------+------------------+-----------------+
| pecas_ant   | quant_ant   | pliq_ant      | pbruto_ant    | vliq_ant | vbruto_ant | custo_ant      | pecas_rastr_ant | quant_rastr_ant | pliq_rastr_ant | pbruto_rastr_ant | vliq_rastr_ant | vbruto_rastr_ant | custo_rastr_ant |
+-------------+-------------+---------------+---------------+----------+------------+----------------+-----------------+-----------------+----------------+------------------+----------------+------------------+-----------------+
| 12637.00000 | 12637.00000 | 1397136.69200 | 1397136.69200 |  0.00000 |    0.00000 | 30875531.15087 |     12637.00000 |     12637.00000 |  1397136.69200 |    1397136.69200 |        0.00000 |          0.00000 |  30893605.37342 | 
+-------------+-------------+---------------+---------------+----------+------------+----------------+-----------------+-----------------+----------------+------------------+----------------+------------------+-----------------+
1 row in set (0.55 sec)

Compare execution time to your original query on my laptop with default settings for server variables:

mysql> SELECT SQL_SMALL_RESULT SQL_CACHE  SUM(a.pecas_ant
    -> +IF(a.data_est='2009-11-01',0,a.pecas_e-a.pecas_s)) AS pecas_ant,SUM(a.quant_ant
    -> +IF(a.data_est='2009-11-01',0,a.quant_e-a.quant_s)) AS quant_ant,SUM(a.pliq_ant 
    -> +IF(a.data_est='2009-11-01',0,a.pliq_e-a.pliq_s)) AS
    -> pliq_ant,SUM(a.pbruto_ant+IF(a.data_est='2009-11-01',0,a.pbruto_e-a.pbruto_s)) AS
    -> pbruto_ant,SUM(a.vliq_ant  +IF(a.data_est='2009-11-01',0,a.vliq_e-a.vliq_s)) AS
    -> vliq_ant,SUM(a.vbruto_ant+IF(a.data_est='2009-11-01',0,a.vbruto_e-a.vbruto_s)) AS
    -> vbruto_ant,SUM(a.custo_ant +IF(a.data_est='2009-11-01',0,a.custo_e-a.custo_s)) AS
    -> custo_ant,SUM(a.pecas_rastr_ant
    -> +IF(a.data_est='2009-11-01',0,a.pecas_rastr_e-a.pecas_rastr_s)) AS
    -> pecas_rastr_ant,SUM(a.quant_rastr_ant
    -> +IF(a.data_est='2009-11-01',0,a.quant_rastr_e-a.quant_rastr_s)) AS
    -> quant_rastr_ant,SUM(a.pliq_rastr_ant 
    -> +IF(a.data_est='2009-11-01',0,a.pliq_rastr_e-a.pliq_rastr_s)) AS
    -> pliq_rastr_ant,SUM(a.pbruto_rastr_ant+IF(a.data_est='2009-11-01',0,a.pbruto_rastr_e-a.pbruto_rastr_s))
    -> AS pbruto_rastr_ant,SUM(a.vliq_rastr_ant 
    -> +IF(a.data_est='2009-11-01',0,a.vliq_rastr_e-a.vliq_rastr_s)) AS
    -> vliq_rastr_ant,SUM(a.vbruto_rastr_ant+IF(a.data_est='2009-11-01',0,a.vbruto_rastr_e-a.vbruto_rastr_s))
    -> AS vbruto_rastr_ant,SUM(a.custo_rastr_ant
    -> +IF(a.data_est='2009-11-01',0,a.custo_rastr_e-a.custo_rastr_s)) AS custo_rastr_ant FROM
    -> est_dia AS a WHERE a.item_id=50 AND a.item_id_red=1514 AND a.unidade_id=1000 AND
    -> (a.centro_atividade,a.data_est) IN (SELECT b.centro_atividade,MAX(b.data_est) FROM
    -> est_dia AS b FORCE INDEX(est_dia_data) WHERE b.item_id=50 AND b.item_id_red=1514 AND
    -> b.unidade_id=1000 AND b.data_est<='2009-11-01' GROUP BY b.centro_atividade) GROUP BY
    -> a.item_id,a.item_id_red;
+-------------+-------------+---------------+---------------+----------+------------+----------------+-----------------+-----------------+----------------+------------------+----------------+------------------+-----------------+
| pecas_ant   | quant_ant   | pliq_ant      | pbruto_ant    | vliq_ant | vbruto_ant | custo_ant      | pecas_rastr_ant | quant_rastr_ant | pliq_rastr_ant | pbruto_rastr_ant | vliq_rastr_ant | vbruto_rastr_ant | custo_rastr_ant |
+-------------+-------------+---------------+---------------+----------+------------+----------------+-----------------+-----------------+----------------+------------------+----------------+------------------+-----------------+
| 12637.00000 | 12637.00000 | 1397136.69200 | 1397136.69200 |  0.00000 |    0.00000 | 30875531.15087 |     12637.00000 |     12637.00000 |  1397136.69200 |    1397136.69200 |        0.00000 |          0.00000 |  30893605.37342 | 
+-------------+-------------+---------------+---------------+----------+------------+----------------+-----------------+-----------------+----------------+------------------+----------------+------------------+-----------------+
1 row in set (6 min 0.69 sec)

Results looks the same (and different fro, 6.0.14, so that new version may have a bug actually :), but compare 1 second to 6 minutes :)
[1 Feb 2011 20:57] Roberto Spadim
that´s the point, i have another bug in mysql bug list
a idea is:
on subquery i put a sql modifier to make it a list (execute subquery and save it as a temporary table), some modifier like SQL_BUFFERED_SUBQUERY
this could replace query:
(SELECT SQL_BUFFERED_SUBQUERY field FROM table WHERE with some filters)
to
(1,2,3,4,5,6,7,8,9,10,12,13,14,15,16)
or
(SELECT field FROM temporary_table WITHOUT FILTERS, MAYBE FIELD COULD BE A INDEX FOR FASTER SELECT)
[21 Mar 2011 4:17] Roberto Spadim
i will close since bug 24770 is similar to this
[26 Mar 2012 19:39] Paul DuBois
Noted in 5.6.5 changelog.

Several subquery performance issues were resolved through the
implementation of semi-join subquery optimization strategies.