Bug #113060 The result of avg(distinct xx)is different between from primary and secondary
Submitted: 13 Nov 2023 10:23 Modified: 13 Nov 2023 12:07
Reporter: XIAOJING LI Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DML Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: avg distinct, sum

[13 Nov 2023 10:23] XIAOJING LI
Description:
The result of avg(distinct xx)is different between from primary and secondary indexes.
coloum xx is varchar type

How to repeat:
CREATE TABLE `t1` (
  `a` int(11) NOT NULL AUTO_INCREMENT,
  `b` varchar(40) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `c` varchar(40) DEFAULT NULL,
  PRIMARY KEY (`a`),
  KEY `idx_b_l` (`b`),
  KEY `idx_c_l` (`c`)
);

insert into t1(b,c) values ('6RXmcRyA48c0yvZ2I' ,'6RXmcRyA48c0yvZ2I');
insert into t1(b,c) values ('G6UHU26b8dkvwpYiM' ,'G6UHU26b8dkvwpYiM');
insert into t1(b,c) values ('0XS2E55maqch' ,'0XS2E55maqch');
insert into t1(b,c) values ('0qwCMwobKYOcmLyfRXlagA8ukENJv' ,'0qwCMwobKYOcmLyfRXlagA8ukENJv');
insert into t1(b,c) values ('3avxIDL4YPrZVHie4rOiPrK8ZIJwEPqZaioHK' ,'3avxIDL4YPrZVHie4rOiPrK8ZIJwEPqZaioHK');
insert into t1(b,c) values ('5DHC2QScT6P6lXgRtHajXOfztB7ddjYH1LLUC3' ,'5DHC2QScT6P6lXgRtHajXOfztB7ddjYH1LLUC3');
insert into t1(b,c) values ('5pdGQ2NTM3c2uRgCg9NspSE' ,'5pdGQ2NTM3c2uRgCg9NspSE');
insert into t1(b,c) values ('6Bk06GVtwZaKqg01' ,'6Bk06GVtwZaKqg01');
insert into t1(b,c) values ('6VFO4xJPwvxGzReYzVj6dwTSIechnSSCyz9iY' ,'6VFO4xJPwvxGzReYzVj6dwTSIechnSSCyz9iY');
insert into t1(b,c) values ('6hLBs3Rnd5elLLVv1ip3A2U6G1dkIApKDkiCy' ,'6hLBs3Rnd5elLLVv1ip3A2U6G1dkIApKDkiCy');

mysql> select avg(distinct b) from t1 force index(primary);
+--------------------+
| avg(distinct b)    |
+--------------------+
| 3.5|
+--------------------+
1 row in set (0.01 sec)

mysql> select avg(distinct b) from t1;
+--------------------+
| avg(distinct b)    |
+--------------------+
| 3.7 |
+--------------------+
1 row in set, 10 warnings (0.01 sec)

mysql> select sum(distinct b) from t1 force index(primary);
+-----------------+
| sum(distinct b) |
+-----------------+
|              14 |
+-----------------+
1 row in set (0.00 sec)

mysql> select sum(distinct b) from t1;
+-----------------+
| sum(distinct b) |
+-----------------+
|              37 |
+-----------------+
1 row in set, 10 warnings (0.00 sec)
[13 Nov 2023 12:07] MySQL Verification Team
Hi Mr. LI,

Thank you for your bug report.

We have managed to repeat it with 8.0.35:

avg(distinct b)
3.5
avg(distinct b)
3.7
sum(distinct b)
14
sum(distinct b)
37

Verified as reported.
[20 Nov 2023 11:40] huahua xu
Hi XIAOJING LI,
 
For your query statement `select avg(distinct b) from t1`, the optimizer use covering index skip scan for deduplication on t1.idx_b_l, and aggregate them, then average them. 

For your query statement `select avg(distinct b) from t1 force index(primary)`, the optimizer collects all the data ('0XS2E55maqch', 'G6UHU26b8dkvwpYiM' and '0XS2E55maqch' all have been converted to 0.0) into an unique for removing of duplicates, then aggregate and average them.