Bug #119056 | column value change cased by SUM(DISTINCT) changed to AVG(DISTINCT)*COUNT(DISTINCT) | ||
---|---|---|---|
Submitted: | 22 Sep 15:21 | Modified: | 23 Sep 1:51 |
Reporter: | cl hl | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 9.4.0 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[22 Sep 15:21]
cl hl
[22 Sep 17:19]
MySQL Verification Team
Just changing sum() to avg()*(count) does not change anything: mysql> SELECT huk80.amount AS col_1, SUM(DISTINCT huk80.user_id) AS col_2 FROM orders AS huk80 GROUP BY huk80.amount, huk80.order_date ORDER BY huk80.order_date ASC LIMIT 5; +--------+-------+ | col_1 | col_2 | +--------+-------+ | 770.00 | 9052 | | 166.00 | 9722 | | 626.00 | 2170 | | 309.00 | 9722 | | 37.00 | 1172 | +--------+-------+ 5 rows in set (0.001 sec) mysql> SELECT huk80.amount AS col_1, (AVG(DISTINCT huk80.user_id) * COUNT(DISTINCT huk80.user_id)) AS col_2 FROM orders AS huk80 GROUP BY huk80.amount, huk80.order_date ORDER BY huk80.order_date ASC LIMIT 5; +--------+-----------+ | col_1 | col_2 | +--------+-----------+ | 770.00 | 9052.0000 | | 166.00 | 9722.0000 | | 626.00 | 2170.0000 | | 309.00 | 9722.0000 | | 37.00 | 1172.0000 | +--------+-----------+ 5 rows in set (0.001 sec)
[22 Sep 17:20]
MySQL Verification Team
Adding union: mysql> (SELECT huk80.amount AS col_1, (AVG(DISTINCT huk80.user_id) * COUNT(DISTINCT huk80.user_id)) AS col_2 FROM orders AS huk80 GROUP BY huk80.amount, huk80.order_date ORDER BY huk80.order_date ASC LIMIT 5) UNION ALL SELECT eiv4.user_id AS col_1, eiv4.id AS col_2 FROM orders AS eiv4; +---------+-----------+ | col_1 | col_2 | +---------+-----------+ | 770.00 | 3485.0000 | | 166.00 | 3485.0000 | | 626.00 | 3485.0000 | | 309.00 | 3485.0000 | | 37.00 | 3485.0000 | | 1172.00 | 3530.0000 | | 2170.00 | 27.0000 | | 3415.00 | 7965.0000 | | 3485.00 | 3088.0000 | | 3485.00 | 6421.0000 | | 7588.00 | 3246.0000 | | 9052.00 | 1178.0000 | | 9052.00 | 9518.0000 | | 9722.00 | 8856.0000 | | 9722.00 | 9904.0000 | +---------+-----------+ 15 rows in set (0.001 sec) mysql>
[22 Sep 17:30]
cl hl
not only union cause difference ,but also AVG()*count cause difference.maybe avg*count part is also a logical bug
[22 Sep 17:34]
MySQL Verification Team
This is not a bug. This is a subtle but important SQL aliasing and naming issue that can lead to unexpected results, especially when combining queries using UNION ALL (or UNION). First query alone produces expected results—col_1 and col_2 have the computed values from your aggregates as you intended. When we add the union, suddenly all of the col_2 values from the first (top) part are showing as 3485.0000, which corresponds to the user_id value 3485 found in your data (remember, one user has id 3485). This typically occurs because LIMIT with ORDER BY inside a subquery/derived table that is then unioned is not guaranteed to be executed as expected; MySQL (and some other RDBMSs) will remove or ignore the ORDER BY/LIMIT inside the union branch unless you wrap it in a derived/inline view or subquery. Most RDBMSs ignore the ORDER BY ... LIMIT 5 within the union member unless you are using a subquery. As a result, you end up with seemingly "garbage" (unexpected) values, or (often) you see an artifact of the optimizer re-ordering or substituting values as it processes the union. But that's not the main cause of your col_2 issue here. Core Issue is "Ambiguous Aliases + Column Promotion" If you run the query without a wrapping SELECT, the first query performs aggregate calculations and returns those as col_1 and col_2. When you union this with another SELECT, the column names and data types are determined by the first select list. MySQL does not preserve the ORDER BY/LIMIT inside the parentheses of the union branch unless you enclose it in a derived/inline view (a subquery in the FROM clause). Instead, you end up with a result set where rows from both SELECTs are combined, and the values for col_2 are not guaranteed to be correct—often, unintentionally, the evaluation of expressions gets corrupted or overridden by the optimizer, especially if temporary tables are involved or if column types differ. Solution is to wrap your first SELECT in a derived table: mysql> SELECT * -> FROM ( -> SELECT huk80.amount AS col_1, -> (AVG(DISTINCT huk80.user_id) * COUNT(DISTINCT huk80.user_id)) AS col_2 -> FROM orders AS huk80 -> GROUP BY huk80.amount, huk80.order_date -> ORDER BY huk80.order_date ASC -> LIMIT 5 -> ) AS main_part -> -> UNION ALL -> -> SELECT eiv4.user_id AS col_1, eiv4.id AS col_2 FROM orders AS eiv4; +---------+-----------+ | col_1 | col_2 | +---------+-----------+ | 770.00 | 9052.0000 | | 166.00 | 9722.0000 | | 626.00 | 2170.0000 | | 309.00 | 9722.0000 | | 37.00 | 1172.0000 | | 1172.00 | 3530.0000 | | 2170.00 | 27.0000 | | 3415.00 | 7965.0000 | | 3485.00 | 3088.0000 | | 3485.00 | 6421.0000 | | 7588.00 | 3246.0000 | | 9052.00 | 1178.0000 | | 9052.00 | 9518.0000 | | 9722.00 | 8856.0000 | | 9722.00 | 9904.0000 | +---------+-----------+ 15 rows in set (0.001 sec) mysql> Kind regards Bogdan Kecman, MySQL Principal Technical Engineer MySQL @ Oracle, Belgrade, Serbia
[23 Sep 1:51]
MySQL Verification Team
Hi, I discussed this with a colleague and we believe that this could actually be a bug. I already shown you the workaround, but we will verify this and work on this more. Kind regards Bogdan