Bug #119056 column value change cased by SUM(DISTINCT) changed to AVG(DISTINCT)*COUNT(DISTINCT)
Submitted: 22 Sep 15:21 Modified: 23 Sep 1:51
Reporter: cl hl Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:9.4.0 OS:Any
Assigned to: CPU Architecture:Any

[22 Sep 15:21] cl hl
Description:
after changing the aggregate function SUM(DISTINCT) to AVG(DISTINCT)*COUNT(DISTINCT),the col_2's value changes.

mysql> (SELECT huk80.amount AS col_1, SUM(DISTINCT huk80.user_id) AS col_2 FROM orders AS huk80 GROUP BY huk80.amount, huk80.order_date ORDER BY huk80.order_date ASC LIMIT 5) UNION ALL SELECT eiv4.user_id AS col_1, eiv4.id AS col_2 FROM orders AS eiv4 UNION ALL SELECT hhp28.id AS col_1, hhp28.amount AS col_2 FROM orders AS hhp28 WHERE (hhp28.status <> 'sample_47');
+---------+---------+
| col_1   | col_2   |
+---------+---------+
|  770.00 | 9052.00 |
|  166.00 | 9722.00 |
|  626.00 | 2170.00 |
|  309.00 | 9722.00 |
|   37.00 | 1172.00 |
| 1172.00 | 3530.00 |
| 2170.00 |   27.00 |
| 3415.00 | 7965.00 |
| 3485.00 | 3088.00 |
| 3485.00 | 6421.00 |
| 7588.00 | 3246.00 |
| 9052.00 | 1178.00 |
| 9052.00 | 9518.00 |
| 9722.00 | 8856.00 |
| 9722.00 | 9904.00 |
|   27.00 |  626.00 |
| 1178.00 |   64.00 |
| 3088.00 |   63.00 |
| 3246.00 |  782.00 |
| 3530.00 |   37.00 |
| 6421.00 |  902.00 |
| 7965.00 |  562.00 |
| 8856.00 |  309.00 |
| 9518.00 |  770.00 |
| 9904.00 |  166.00 |
+---------+---------+
25 rows in set (0.009 sec)

mysql> (SELECT huk80.amount AS col_1, (AVG(DISTINCT huk80.user_id) * COUNT(DISTINCT huk80.user_id)) AS col_2 FROM orders AS huk80 GROUP BY huk80.amount, huk80.order_date ORDER BY huk80.order_date ASC LIMIT 5) UNION ALL SELECT eiv4.user_id AS col_1, eiv4.id AS col_2 FROM orders AS eiv4 UNION ALL SELECT hhp28.id AS col_1, hhp28.amount AS col_2 FROM orders AS hhp28 WHERE (hhp28.status <> 'sample_47');
+---------+-----------+
| col_1   | col_2     |
+---------+-----------+
|  770.00 | 3485.0000 |
|  166.00 | 3485.0000 |
|  626.00 | 3485.0000 |
|  309.00 | 3485.0000 |
|   37.00 | 3485.0000 |
| 1172.00 | 3530.0000 |
| 2170.00 |   27.0000 |
| 3415.00 | 7965.0000 |
| 3485.00 | 3088.0000 |
| 3485.00 | 6421.0000 |
| 7588.00 | 3246.0000 |
| 9052.00 | 1178.0000 |
| 9052.00 | 9518.0000 |
| 9722.00 | 8856.0000 |
| 9722.00 | 9904.0000 |
|   27.00 |  626.0000 |
| 1178.00 |   64.0000 |
| 3088.00 |   63.0000 |
| 3246.00 |  782.0000 |
| 3530.00 |   37.0000 |
| 6421.00 |  902.0000 |
| 7965.00 |  562.0000 |
| 8856.00 |  309.0000 |
| 9518.00 |  770.0000 |
| 9904.00 |  166.0000 |
+---------+-----------+
25 rows in set (0.004 sec)

How to repeat:
DROP DATABASE IF EXISTS test;
CREATE DATABASE IF NOT EXISTS test;
USE test;
CREATE TABLE users (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) NULL,
    age INT NULL,
    created_at DATE NOT NULL,
    sex VARCHAR(10) NOT NULL,
    PRIMARY KEY (id)
);

CREATE TABLE orders (
    id INT NOT NULL AUTO_INCREMENT,
    user_id INT NOT NULL,
    amount DECIMAL(10,2) NOT NULL,
    status VARCHAR(50) NOT NULL,
    order_date DATE NOT NULL,
    PRIMARY KEY (id),
    FOREIGN KEY (user_id) REFERENCES users(id) 
);

INSERT INTO users (id, name, email, age, created_at, sex) VALUES (7588, 'sample_970', '1950901515@qq.com', 71, '2025-07-07', 'boy');
INSERT INTO users (id, name, email, age, created_at, sex) VALUES (2170, 'sample_318', '3858244564@qq.com', 34, '2025-05-12', 'boy');
INSERT INTO users (id, name, email, age, created_at, sex) VALUES (6219, 'sample_453', '8089211259@qq.com', 83, '2024-12-01', 'girl');
INSERT INTO users (id, name, email, age, created_at, sex) VALUES (8492, 'sample_128', '3366242209@qq.com', 22, '2024-11-12', 'girl');
INSERT INTO users (id, name, email, age, created_at, sex) VALUES (1172, 'sample_590', '1484604155@qq.com', 50, '2024-10-14', 'girl');
INSERT INTO users (id, name, email, age, created_at, sex) VALUES (3415, 'sample_434', '2281411193@qq.com', 6, '2025-04-27', 'girl');
INSERT INTO users (id, name, email, age, created_at, sex) VALUES (9722, 'sample_312', '7012963222@qq.com', 87, '2025-08-07', 'girl');
INSERT INTO users (id, name, email, age, created_at, sex) VALUES (9052, 'sample_722', '3294990367@qq.com', 55, '2025-02-07', 'boy');
INSERT INTO users (id, name, email, age, created_at, sex) VALUES (8253, 'sample_766', '3785152080@qq.com', 91, '2024-11-14', 'girl');
INSERT INTO users (id, name, email, age, created_at, sex) VALUES (3485, 'sample_615', '5883334870@qq.com', 28, '2025-09-22', 'girl');

INSERT INTO orders (id, user_id, amount, status, order_date) VALUES (3530, 1172, 37, 'to_finish', '2025-08-03');
INSERT INTO orders (id, user_id, amount, status, order_date) VALUES (9518, 9052, 770, 'finished', '2025-01-19');
INSERT INTO orders (id, user_id, amount, status, order_date) VALUES (3246, 7588, 782, 'to_finish', '2025-09-17');
INSERT INTO orders (id, user_id, amount, status, order_date) VALUES (3088, 3485, 63, 'to_finish', '2025-09-13');
INSERT INTO orders (id, user_id, amount, status, order_date) VALUES (9904, 9722, 166, 'finished', '2025-01-30');
INSERT INTO orders (id, user_id, amount, status, order_date) VALUES (6421, 3485, 902, 'finishing', '2025-09-19');
INSERT INTO orders (id, user_id, amount, status, order_date) VALUES (8856, 9722, 309, 'finishing', '2025-07-24');
INSERT INTO orders (id, user_id, amount, status, order_date) VALUES (1178, 9052, 64, 'finishing', '2025-09-04');
INSERT INTO orders (id, user_id, amount, status, order_date) VALUES (27, 2170, 626, 'finishing', '2025-04-02');
INSERT INTO orders (id, user_id, amount, status, order_date) VALUES (7965, 3415, 562, 'finishing', '2025-09-19');

(SELECT huk80.amount AS col_1, SUM(DISTINCT huk80.user_id) AS col_2 FROM orders AS huk80 GROUP BY huk80.amount, huk80.order_date ORDER BY huk80.order_date ASC LIMIT 5) UNION ALL SELECT eiv4.user_id AS col_1, eiv4.id AS col_2 FROM orders AS eiv4 UNION ALL SELECT hhp28.id AS col_1, hhp28.amount AS col_2 FROM orders AS hhp28 WHERE (hhp28.status <> 'sample_47');
(SELECT huk80.amount AS col_1, (AVG(DISTINCT huk80.user_id) * COUNT(DISTINCT huk80.user_id)) AS col_2 FROM orders AS huk80 GROUP BY huk80.amount, huk80.order_date ORDER BY huk80.order_date ASC LIMIT 5) UNION ALL SELECT eiv4.user_id AS col_1, eiv4.id AS col_2 FROM orders AS eiv4 UNION ALL SELECT hhp28.id AS col_1, hhp28.amount AS col_2 FROM orders AS hhp28 WHERE (hhp28.status <> 'sample_47');
[22 Sep 17:19] MySQL Verification Team
Just changing sum() to avg()*(count) does not change anything:

mysql> SELECT huk80.amount AS col_1, SUM(DISTINCT huk80.user_id) AS col_2 FROM orders AS huk80 GROUP BY huk80.amount, huk80.order_date ORDER BY huk80.order_date ASC LIMIT 5;
+--------+-------+
| col_1  | col_2 |
+--------+-------+
| 770.00 |  9052 |
| 166.00 |  9722 |
| 626.00 |  2170 |
| 309.00 |  9722 |
|  37.00 |  1172 |
+--------+-------+
5 rows in set (0.001 sec)

mysql> SELECT huk80.amount AS col_1, (AVG(DISTINCT huk80.user_id) * COUNT(DISTINCT huk80.user_id)) AS col_2 FROM orders AS huk80 GROUP BY huk80.amount, huk80.order_date ORDER BY huk80.order_date ASC LIMIT 5;
+--------+-----------+
| col_1  | col_2     |
+--------+-----------+
| 770.00 | 9052.0000 |
| 166.00 | 9722.0000 |
| 626.00 | 2170.0000 |
| 309.00 | 9722.0000 |
|  37.00 | 1172.0000 |
+--------+-----------+
5 rows in set (0.001 sec)
[22 Sep 17:20] MySQL Verification Team
Adding union:

mysql> (SELECT huk80.amount AS col_1, (AVG(DISTINCT huk80.user_id) * COUNT(DISTINCT huk80.user_id)) AS col_2 FROM orders AS huk80 GROUP BY huk80.amount, huk80.order_date ORDER BY huk80.order_date ASC LIMIT 5) UNION ALL SELECT eiv4.user_id AS col_1, eiv4.id AS col_2 FROM orders AS eiv4;
+---------+-----------+
| col_1   | col_2     |
+---------+-----------+
|  770.00 | 3485.0000 |
|  166.00 | 3485.0000 |
|  626.00 | 3485.0000 |
|  309.00 | 3485.0000 |
|   37.00 | 3485.0000 |
| 1172.00 | 3530.0000 |
| 2170.00 |   27.0000 |
| 3415.00 | 7965.0000 |
| 3485.00 | 3088.0000 |
| 3485.00 | 6421.0000 |
| 7588.00 | 3246.0000 |
| 9052.00 | 1178.0000 |
| 9052.00 | 9518.0000 |
| 9722.00 | 8856.0000 |
| 9722.00 | 9904.0000 |
+---------+-----------+
15 rows in set (0.001 sec)

mysql>
[22 Sep 17:30] cl hl
not only union cause difference ,but also AVG()*count cause difference.maybe avg*count part is also a logical bug
[22 Sep 17:34] MySQL Verification Team
This is not a bug. This is a subtle but important SQL aliasing and naming issue that can lead to unexpected results, especially when combining queries using UNION ALL (or UNION).

First query alone produces expected results—col_1 and col_2 have the computed values from your aggregates as you intended.
When we add the union, suddenly all of the col_2 values from the first (top) part are showing as 3485.0000, which corresponds to the user_id value 3485 found in your data (remember, one user has id 3485).

This typically occurs because LIMIT with ORDER BY inside a subquery/derived table that is then unioned is not guaranteed to be executed as expected; MySQL (and some other RDBMSs) will remove or ignore the ORDER BY/LIMIT inside the union branch unless you wrap it in a derived/inline view or subquery.

Most RDBMSs ignore the ORDER BY ... LIMIT 5 within the union member unless you are using a subquery. As a result, you end up with seemingly "garbage" (unexpected) values, or (often) you see an artifact of the optimizer re-ordering or substituting values as it processes the union. 

But that's not the main cause of your col_2 issue here. 

Core Issue is "Ambiguous Aliases + Column Promotion"

If you run the query without a wrapping SELECT, the first query performs aggregate calculations and returns those as col_1 and col_2. 

When you union this with another SELECT, the column names and data types are determined by the first select list. 

MySQL does not preserve the ORDER BY/LIMIT inside the parentheses of the union branch unless you enclose it in a derived/inline view (a subquery in the FROM clause). Instead, you end up with a result set where rows from both SELECTs are combined, and the values for col_2 are not guaranteed to be correct—often, unintentionally, the evaluation of expressions gets corrupted or overridden by the optimizer, especially if temporary tables are involved or if column types differ.
     
Solution is to wrap your first SELECT in a derived table:

mysql> SELECT *
    -> FROM (
    ->   SELECT huk80.amount AS col_1,
    ->          (AVG(DISTINCT huk80.user_id) * COUNT(DISTINCT huk80.user_id)) AS col_2
    ->     FROM orders AS huk80
    ->    GROUP BY huk80.amount, huk80.order_date
    ->    ORDER BY huk80.order_date ASC
    ->    LIMIT 5
    -> ) AS main_part
    -> 
    -> UNION ALL
    -> 
    -> SELECT eiv4.user_id AS col_1, eiv4.id AS col_2 FROM orders AS eiv4;
+---------+-----------+
| col_1   | col_2     |
+---------+-----------+
|  770.00 | 9052.0000 |
|  166.00 | 9722.0000 |
|  626.00 | 2170.0000 |
|  309.00 | 9722.0000 |
|   37.00 | 1172.0000 |
| 1172.00 | 3530.0000 |
| 2170.00 |   27.0000 |
| 3415.00 | 7965.0000 |
| 3485.00 | 3088.0000 |
| 3485.00 | 6421.0000 |
| 7588.00 | 3246.0000 |
| 9052.00 | 1178.0000 |
| 9052.00 | 9518.0000 |
| 9722.00 | 8856.0000 |
| 9722.00 | 9904.0000 |
+---------+-----------+
15 rows in set (0.001 sec)

mysql> 

Kind regards
Bogdan Kecman, MySQL Principal Technical Engineer
MySQL @ Oracle, Belgrade, Serbia
[23 Sep 1:51] MySQL Verification Team
Hi,

I discussed this with a colleague and we believe that this could actually be a bug.

I already shown you the workaround, but we will verify this and work on this more.

Kind regards
Bogdan