Description:
After upgrading from MySQL 5.5 to 5.7, a statistical query that uses a complex CASE expression in a derived table returns inconsistent results. In MySQL 5.5, the query returns 5 rows as expected, while in MySQL 5.7 it returns 432 rows, with many duplicate apptype values and incorrect counts.
How to repeat:
Create the table userlabels_static as provided.
CREATE TABLE userlabels_static (
appid VARCHAR(20) NOT NULL,
uid VARCHAR(20) NOT NULL,
app_categoryname_first_regist VARCHAR(50),
app_categoryname_second_regist VARCHAR(50),
app_categoryname_third_regist VARCHAR(50)
);
Execute the following query:
sql
SELECT
apptype,
COUNT(DISTINCT appid) AS appidcnt,
COUNT(DISTINCT uid) AS uidcnt
FROM (
SELECT
CASE when app_categoryname_first_regist = 'Office ' and app_categoryname_second_regist = 'Other' then 'Office '
-- ... (long CASE expression as provided)
ELSE 'Other'
END AS apptype,
appid,
uid
FROM userlabels_static
WHERE appid IS NOT NULL AND appid != ''
) t
GROUP BY t.apptype
ORDER BY 2 DESC;
Expected Result (MySQL 5.5):
text
+-----------+----------+--------+
| apptype | appidcnt | uidcnt |
+-----------+----------+--------+
| Other | 726 | 718 |
| Office | 137 | 136 |
| Education | 45 | 45 |
| Medical | 38 | 38 |
| Conference| 29 | 29 |
+-----------+----------+--------+
Actual Result (MySQL 5.7):
Returns 432 rows with many duplicate apptype values and incorrect counts.
Execution Plans:
MySQL 5.5 uses a derived table and temporary table + filesort.
+----+-------------+-------------------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+------+---------------+------+---------+------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 1000 | Using temporary; Using filesort |
| 2 | DERIVED | userlabels_static | ALL | NULL | NULL | NULL | NULL | 996 | Using where |
+----+-------------+-------------------+------+---------------+------+---------+------+------+---------------------------------+
MySQL 5.7 does not use a derived table in the plan and performs a full scan with temporary + filesort.
+----+-------------+-------------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+
| 1 | SIMPLE | userlabels_static | NULL | ALL | NULL | NULL | NULL | NULL | 1000 | 81.00 | Using where; Using temporary; Using filesort |
+----+-------------+-------------------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------+
Request:
Please investigate whether this is a known issue or a regression in MySQL 5.7 related to derived tables, CASE expressions, or GROUP BY behavior.