Bug #96549 MySQL query optimizer rewrites query and gets different results than MySQL5.5
Submitted: 15 Aug 2019 10:59 Modified: 28 Aug 2019 12:23
Reporter: lalit Choudhary Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6,5.7,8.0, 5.6.45, 5.7.27, 8.0.17 OS:Any
Assigned to: CPU Architecture:Any

[15 Aug 2019 10:59] lalit Choudhary
Description:
As shown in the following example MySQL Query optimizer is rewriting the query in 5.6,5.7,8.0 and removing the 'group by' that is in the subquery, which results in the difference in the query result. 
we don't see any error and warnings.

Loaded attached bug_dump_mysql.sql

Query:
SELECT now() date_created,
       NULL date_undo,
       spf.created_by_id created_by_id,
       spf.date_created staff_preferred_facility_date_created,
       spf.date_modified date_modified,
       spf.facility_id facility_id,
       spf.id spf_id,
       spf.modified_by_id modified_by_id,
       spf.staff_preferences_id staff_preferences_id,
       spf.default_facility default_facility
FROM test.staff_preferred_facility_test spf
         JOIN test.staff_preferences_test sp ON spf.staff_preferences_id = sp.id
         JOIN test.account_user_test fu ON fu.id = sp.staff_id
         JOIN test.account_test f ON fu.account_id = f.id
WHERE f.account_name='Test_Account'
AND spf.id NOT IN
(SELECT spf.id spf_id
FROM test.staff_preferred_facility_test spf
         JOIN test.staff_preferences_test sp ON spf.staff_preferences_id = sp.id
         JOIN test.account_user_test fu ON fu.id = sp.staff_id
         JOIN test.account_test f ON fu.account_id = f.id
WHERE f.account_name='Test_Account'
GROUP BY spf.facility_id, sp.id)
ORDER BY spf_id;

 mysql 5.5 Query result:

+---------------------+-----------+---------------+---------------------------------------+---------------------+-------------+--------+----------------+----------------------+------------------+
| date_created        | date_undo | created_by_id | staff_preferred_facility_date_created | date_modified       | facility_id | spf_id | modified_by_id | staff_preferences_id | default_facility |
+---------------------+-----------+---------------+---------------------------------------+---------------------+-------------+--------+----------------+----------------------+------------------+
| 2019-08-15 16:17:52 |      NULL |          7517 | 2019-01-09 21:57:22                   | 2019-01-09 21:57:22 |         107 |   2956 |           NULL |                  429 |                0 |
| 2019-08-15 16:17:52 |      NULL |          7517 | 2019-01-09 21:57:22                   | 2019-01-09 21:57:22 |         149 |   2957 |           NULL |                  429 |                0 |
| 2019-08-15 16:17:52 |      NULL |          7517 | 2019-01-09 21:57:22                   | 2019-01-09 21:57:22 |          49 |   2958 |           NULL |                  429 |                0 |
| 2019-08-15 16:17:52 |      NULL |          7517 | 2019-01-15 20:25:14                   | 2019-01-15 20:25:14 |          49 |   2969 |           NULL |                  240 |                0 |
| 2019-08-15 16:17:52 |      NULL |          7517 | 2019-01-15 20:25:14                   | 2019-01-15 20:25:14 |         107 |   2970 |           NULL |                  240 |                0 |
+---------------------+-----------+---------------+---------------------------------------+---------------------+-------------+--------+----------------+----------------------+------------------+
5 rows in set (0.00 sec)

MySQL 5.6, 5.7,8.0 result:

mysql [localhost] {msandbox} (test1) > SELECT now() date_created,
    ->        NULL date_undo,
    ->        spf.created_by_id created_by_id,
    ->        spf.date_created staff_preferred_facility_date_created,
    ->        spf.date_modified date_modified,
    ->        spf.facility_id facility_id,
    ->        spf.id spf_id,
    ->        spf.modified_by_id modified_by_id,
    ->        spf.staff_preferences_id staff_preferences_id,
    ->        spf.default_facility default_facility
    -> FROM test.staff_preferred_facility_test spf
    ->          JOIN test.staff_preferences_test sp ON spf.staff_preferences_id = sp.id
    ->          JOIN test.account_user_test fu ON fu.id = sp.staff_id
    ->          JOIN test.account_test f ON fu.account_id = f.id
    -> WHERE f.account_name='Test_Account'
    -> AND spf.id NOT IN
    -> (SELECT spf.id spf_id
    -> FROM test.staff_preferred_facility_test spf
    ->          JOIN test.staff_preferences_test sp ON spf.staff_preferences_id = sp.id
    ->          JOIN test.account_user_test fu ON fu.id = sp.staff_id
    ->          JOIN test.account_test f ON fu.account_id = f.id
    -> WHERE f.account_name='Test_Account'
    -> GROUP BY spf.facility_id, sp.id)
    -> ORDER BY spf_id;
Empty set (0.00 sec)

How to repeat:
1. start mysql with default settings and Load attached bug_dump_mysql.sql

2. run the given query on MySQL 5.5  and mysql 5.6,5.7,8.0 version for comparison. 

Suggested fix:
The result should be consistent, even if there is any change due to optimizer_switch settings it should give warning/error.
[15 Aug 2019 10:59] lalit Choudhary
load data

Attachment: bug_dump_mysql.sql (application/sql, text), 6.41 KiB.

[19 Aug 2019 5:47] MySQL Verification Team
Hello Lalit,

Thank you for the report and test case.
Verified as described.

regards,
Umesh
[19 Aug 2019 5:48] MySQL Verification Team
Test results - 8.0.17, 5.7.27, 5.6.45 and 5.5.62

Attachment: 96549.results (application/octet-stream, text), 40.92 KiB.

[28 Aug 2019 12:23] Roy Lyseng
Hi lalit,

we do not consider this to be a bug.

The reason is that the subquery as presented here is non-deterministic, because it selects a column that is not part of the GROUP BY columns. If you specify only the subquery, you will see the message "Expression #1 of SELECT list is not in GROUP BY clause and contains non-aggregated column 'test.spf.id'". In the case of non-deterministic subqueries, the result of the complete query may differ.

We do not provide a way to eliminate removal of GROUP BY clauses in such cases, but there is a workaround you can use if you are sure this is the result that you want. It involves using the ANY_VALUE() function and wrapping the subquery in a CTE (Common Table Expression):

with q as
   (SELECT any_value(spf.id) spf_id
    FROM test.staff_preferred_facility_test spf
         JOIN test.staff_preferences_test sp
           ON spf.staff_preferences_id = sp.id
         JOIN test.account_user_test fu
           ON fu.id = sp.staff_id
         JOIN test.account_test f
           ON fu.account_id = f.id
    WHERE f.account_name='Test_Account'
    GROUP BY spf.facility_id, sp.id
   )
SELECT now() date_created,
       NULL date_undo,
       spf.created_by_id created_by_id,
       spf.date_created staff_preferred_facility_date_created,
       spf.date_modified date_modified,
       spf.facility_id facility_id,
       spf.id spf_id,
       spf.modified_by_id modified_by_id,
       spf.staff_preferences_id staff_preferences_id,
       spf.default_facility default_facility
FROM test.staff_preferred_facility_test spf
     JOIN test.staff_preferences_test sp
       ON spf.staff_preferences_id = sp.id
     JOIN test.account_user_test fu
       ON fu.id = sp.staff_id
     JOIN test.account_test f
       ON fu.account_id = f.id
WHERE f.account_name='Test_Account' AND
      spf.id NOT IN (SELECT spf_id from q);

The workaround seems to be correct for 8.0. In 5.6 and 5.6, you should be able to replace the CTE with a derived table.