MySQL Bugs: #87641: Execution plan differs after upgrade to 5.7

Bug #87641	Execution plan differs after upgrade to 5.7
Submitted:	1 Sep 2017 14:20	Modified:	6 Sep 2017 15:47
Reporter:	KORBULY Francois	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	5.6.33 / 5.7.18	OS:	CentOS (6 (x86_64))
Assigned to:		CPU Architecture:	Any
Tags:	execution plan;difference;5.6;5.7

Description:
Planning to upgrade our production servers to MySQL Server 5.7, we have updaded our QC environment to perform testing.

Migration was from 5.6.33 to 5.7.18.

And we have an real performance issue with one of our queries, where the execution plan on 5.7 is not the same as on 5.6, resulting in a very slow response time (1 second on 5.6, 30 seconds on 5.7)

Both databases (QC and PROD) are complete replica of each other (tables, schema, structure, index, foreign_keys)
Same data too, since we refresh often to perform our tests.

Query :

SELECT DISTINCT sa.attribute 
FROM sample_attribute sa 
WHERE (SELECT sk.category FROM sample s JOIN sample_kind sk ON (sk.id = s.sample_kind_id) WHERE s.id = sa.sample_id) = 'physiological'
ORDER BY sa.attribute

Execution plan (5.6)
    id  select_type         table   type    possible_keys            key                      key_len  ref                         rows  Extra                                  
------  ------------------  ------  ------  -----------------------  -----------------------  -------  ------------------------  ------  ---------------------------------------
     1  PRIMARY             sa      range   sample_attribute_unique  sample_attribute_unique  767      (NULL)                       101  Using where; Using index for group-by  
     2  DEPENDENT SUBQUERY  s       eq_ref  PRIMARY,sample_kind_id   PRIMARY                  4        nk_lims.sa.sample_id           1  (NULL)                                 
     2  DEPENDENT SUBQUERY  sk      eq_ref  PRIMARY                  PRIMARY                  4        nk_lims.s.sample_kind_id       1  (NULL)                                 

Execution plan (5.7)
    id  select_type         table   partitions  type    possible_keys            key                      key_len  ref                              rows  filtered  Extra                     
------  ------------------  ------  ----------  ------  -----------------------  -----------------------  -------  ----------------------------  -------  --------  --------------------------
     1  PRIMARY             sa      (NULL)      index   sample_attribute_unique  sample_attribute_unique  771      (NULL)                        4589316    100.00  Using where; Using index  
     2  DEPENDENT SUBQUERY  s       (NULL)      eq_ref  PRIMARY,sample_kind_id   PRIMARY                  4        nk_lims_dev.sa.sample_id            1    100.00  (NULL)                    
     2  DEPENDENT SUBQUERY  sk      (NULL)      eq_ref  PRIMARY                  PRIMARY                  4        nk_lims_dev.s.sample_kind_id        1    100.00  (NULL)                    
                                                                                                                                                                                              

I could not find a specific change as too maybe rewrite certains queries due to change in 5.7 Optimizer.

I tried to re-analyse/optimize all tables, to no avail.

Maybe you could hint us as to what we can do or if it can be a bug.
Because of the performance drop, this is a show-stopper for us.

Thanks for your help.
Francois

How to repeat:
- See tables structure in attached file
- Tables 'sample' and 'sample_attribute' are several millions rows.
- I don't know if specific to our structure, data, so hard to describe how to reproduce

Create tables SQL

Attachment: Bug_87641_table_structure.txt (text/plain), 3.94 KiB.

In order to be able to analyze your issue, it would be helpful if you could provide optimizer for both 5.6 and 5.7.  (For instructions on how to obtain such trace, see http://oysteing.blogspot.co.uk/2016/01/how-to-get-optimizer-trace-for-query.html)

I also suggest to see if the following variant performs better:

SELECT DISTINCT sa.attribute 
FROM sample_attribute sa 
WHERE sa.sample_id IN 
  (SELECT s.id 
   FROM sample s JOIN sample_kind sk ON (sk.id = s.sample_kind_id) 
   WHERE sk.category = 'physiological')
ORDER BY sa.attribute

Query optimizer trace on 5.6 Mysql server

Attachment: optimizer_trace_5.6.txt (text/plain), 19.89 KiB.

Query optimizer trace on 5.7 Mysql server

Attachment: optimizer_trace_5.7.txt (text/plain), 16.89 KiB.

Hi Øystein,

Thanks for your answer.

Please find the 2 traces in files for our query on both servers.

Also, I gave a try with your suggested query.
Execution time on 5.6 : 29 sec
Execution time on 5.7 : 17 sec

Let me know if you'd like to have optimizer trace for you suggested query on both servers, if it can help.

I wish you a nice day,
Francois

Looking at the optimizer trace, it seems that the cause for the change in query plan is the fix to Bug#83005: "When the Optimiser is using index for group-by it often gives wrong results."  This bug was fixed in 5.6.36 and 5.7.18.  Hence, you should be able to see the performance regression also in later versions of 5.6.

AFAICT, also the query discussed here may give wrong result with the old query plan.  "Using index for group-by" means that MySQL will only check the one row for each value of sa.attribute.  If this row happens to be for a sample that is not of a "physiological" kind, this attribute value will not be part of the result even if there are other samples for this attribute value that are in the requested category.

In other words, the change in query plan is intentional since the old plan is not guaranteed to give correct results.