Bug #119039 Inconsistent GROUP BY results between MySQL 5.5 and 5.7 for complex CASE expression in derived table
Submitted: 19 Sep 6:23 Modified: 19 Sep 15:49
Reporter: mengjia Li Email Updates:
Status: Analyzing Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.7 OS:Any
Assigned to: MySQL Verification Team CPU Architecture:Any

[19 Sep 6:23] mengjia Li
Description:
After upgrading from MySQL 5.5 to 5.7, a statistical query that uses a complex CASE expression in a derived table returns inconsistent results. In MySQL 5.5, the query returns 5 rows as expected, while in MySQL 5.7 it returns 432 rows, with many duplicate apptype values and incorrect counts.

How to repeat:
Create the table userlabels_static as provided.
CREATE TABLE userlabels_static (
    appid VARCHAR(20) NOT NULL,
    uid VARCHAR(20) NOT NULL,
    app_categoryname_first_regist VARCHAR(50),
    app_categoryname_second_regist VARCHAR(50),
    app_categoryname_third_regist VARCHAR(50)
);

Execute the following query:

sql
SELECT 
    apptype,
    COUNT(DISTINCT appid) AS appidcnt,
    COUNT(DISTINCT uid) AS uidcnt
FROM (
    SELECT 
        CASE  when app_categoryname_first_regist = 'Office ' and app_categoryname_second_regist = 'Other' then 'Office ' 
            -- ... (long CASE expression as provided)
            ELSE 'Other' 
        END AS apptype,
        appid,
        uid
    FROM userlabels_static
    WHERE appid IS NOT NULL AND appid != ''
) t 
GROUP BY t.apptype 
ORDER BY 2 DESC;

Expected Result (MySQL 5.5):

text
+-----------+----------+--------+
| apptype   | appidcnt | uidcnt |
+-----------+----------+--------+
| Other     |      726 |    718 |
| Office    |      137 |    136 |
| Education |       45 |     45 |
| Medical   |       38 |     38 |
| Conference|       29 |     29 |
+-----------+----------+--------+

Actual Result (MySQL 5.7):
Returns 432 rows with many duplicate apptype values and incorrect counts.

Execution Plans:

MySQL 5.5 uses a derived table and temporary table + filesort.
+----+-------------+-------------------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table             | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+-------------------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | PRIMARY     | <derived2>        | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using temporary; Using filesort |
|  2 | DERIVED     | userlabels_static | ALL  | NULL          | NULL | NULL    | NULL |  996 | Using where                     |
+----+-------------+-------------------+------+---------------+------+---------+------+------+---------------------------------+

MySQL 5.7 does not use a derived table in the plan and performs a full scan with temporary + filesort.
+----+-------------+-------------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+
| id | select_type | table             | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                        |
+----+-------------+-------------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | userlabels_static | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1000 |    81.00 | Using where; Using temporary; Using filesort |
+----+-------------+-------------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+

Request:
Please investigate whether this is a known issue or a regression in MySQL 5.7 related to derived tables, CASE expressions, or GROUP BY behavior.