Bug #100504 | Count distinct is inconsistent when swapping two fields being counted. | ||
---|---|---|---|
Submitted: | 12 Aug 2020 11:58 | Modified: | 18 Aug 2020 2:50 |
Reporter: | Zhifeng Hu | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S1 (Critical) |
Version: | 8.0.21, 5.7.31 | OS: | Linux (Arch Linux) |
Assigned to: | CPU Architecture: | x86 (x86_64) | |
Tags: | aggragation, COUNT DISTINCT |
[12 Aug 2020 11:58]
Zhifeng Hu
[12 Aug 2020 13:09]
MySQL Verification Team
Hello Zhifeng Hu, Thank you for the report and test case. Observed this with 8.0.21 build. regards, Umesh
[12 Aug 2020 13:11]
MySQL Verification Team
- 8.0.21 build mysql> source 100504.sql; . mysql> SELECT count(distinct `col_bit(64)_undef_signed`, `col_enum_undef_signed`) FROM table_190000_utf8_undef; +---------------------------------------------------------------------+ | count(distinct `col_bit(64)_undef_signed`, `col_enum_undef_signed`) | +---------------------------------------------------------------------+ | 28748 | +---------------------------------------------------------------------+ 1 row in set (0.12 sec) mysql> SELECT count(distinct `col_enum_undef_signed`, `col_bit(64)_undef_signed`) FROM table_190000_utf8_undef; +---------------------------------------------------------------------+ | count(distinct `col_enum_undef_signed`, `col_bit(64)_undef_signed`) | +---------------------------------------------------------------------+ | 43860 | +---------------------------------------------------------------------+ 1 row in set (0.13 sec)
[12 Aug 2020 13:11]
MySQL Verification Team
I'll add 5.6/5.7 results later on
[14 Aug 2020 4:58]
MySQL Verification Team
5.7.31 - affected with provided test case bin/mysql -uroot -S /tmp/mysql_ushastry.sock --loose-local-infile=1 Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 Server version: 5.7.31 MySQL Community Server (GPL) Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> create database test; Query OK, 1 row affected (0.00 sec) mysql> use test Database changed mysql> source 100504.sql; Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected, 1 warning (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.01 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 9630 rows affected (0.29 sec) Records: 9630 Duplicates: 0 Warnings: 0 Query OK, 9529 rows affected (0.26 sec) Records: 9529 Duplicates: 0 Warnings: 0 Query OK, 9508 rows affected (0.25 sec) Records: 9508 Duplicates: 0 Warnings: 0 Query OK, 9540 rows affected (0.26 sec) Records: 9540 Duplicates: 0 Warnings: 0 Query OK, 9508 rows affected (0.26 sec) Records: 9508 Duplicates: 0 Warnings: 0 Query OK, 9539 rows affected (0.26 sec) Records: 9539 Duplicates: 0 Warnings: 0 Query OK, 9523 rows affected (0.26 sec) Records: 9523 Duplicates: 0 Warnings: 0 Query OK, 9533 rows affected (0.26 sec) Records: 9533 Duplicates: 0 Warnings: 0 Query OK, 9510 rows affected (0.26 sec) Records: 9510 Duplicates: 0 Warnings: 0 Query OK, 9536 rows affected (0.26 sec) Records: 9536 Duplicates: 0 Warnings: 0 Query OK, 9484 rows affected (0.26 sec) Records: 9484 Duplicates: 0 Warnings: 0 Query OK, 9440 rows affected (0.25 sec) Records: 9440 Duplicates: 0 Warnings: 0 Query OK, 9420 rows affected (0.26 sec) Records: 9420 Duplicates: 0 Warnings: 0 Query OK, 9440 rows affected (0.26 sec) Records: 9440 Duplicates: 0 Warnings: 0 Query OK, 9433 rows affected (0.26 sec) Records: 9433 Duplicates: 0 Warnings: 0 Query OK, 9425 rows affected (0.25 sec) Records: 9425 Duplicates: 0 Warnings: 0 Query OK, 9438 rows affected (0.25 sec) Records: 9438 Duplicates: 0 Warnings: 0 Query OK, 9439 rows affected (0.26 sec) Records: 9439 Duplicates: 0 Warnings: 0 Query OK, 9429 rows affected (0.25 sec) Records: 9429 Duplicates: 0 Warnings: 0 Query OK, 9442 rows affected (0.25 sec) Records: 9442 Duplicates: 0 Warnings: 0 Query OK, 254 rows affected (0.01 sec) Records: 254 Duplicates: 0 Warnings: 0 Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.01 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected, 1 warning (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) mysql> SELECT count(distinct `col_bit(64)_undef_signed`, `col_enum_undef_signed`) FROM table_190000_utf8_undef; +---------------------------------------------------------------------+ | count(distinct `col_bit(64)_undef_signed`, `col_enum_undef_signed`) | +---------------------------------------------------------------------+ | 28748 | +---------------------------------------------------------------------+ 1 row in set (0.12 sec) mysql> SELECT count(distinct `col_enum_undef_signed`, `col_bit(64)_undef_signed`) FROM table_190000_utf8_undef; +---------------------------------------------------------------------+ | count(distinct `col_enum_undef_signed`, `col_bit(64)_undef_signed`) | +---------------------------------------------------------------------+ | 43860 | +---------------------------------------------------------------------+ 1 row in set (0.15 sec)
[15 Aug 2020 5:00]
Yushan ZHANG
Which is more interesting... mysql> select count(*) from (select distinct `col_bit(64)_undef_signed`, `col_enum_undef_signed` FROM table_190000_utf8_undef ) t; +----------+ | count(*) | +----------+ | 52960 | +----------+ 1 row in set (0.36 sec) mysql> select count(*) from (select distinct `col_enum_undef_signed`, `col_bit(64)_undef_signed` FROM table_190000_utf8_undef) t; +----------+ | count(*) | +----------+ | 52960 | +----------+ 1 row in set (0.41 sec)
[17 Aug 2020 3:09]
Zhifeng Hu
@Yushan, the more interesting case is not actually interesting, because distinct did not erase out entries with NULL field, while count(distinct) ignores them. The result of count(*) from select distinct is correct AFAIK.
[17 Aug 2020 4:32]
Yushan ZHANG
@Zhifeng, I modified the query to a nested one, the `count(*)` is in the outer query while the inner query is equivalent to the original one.
[18 Aug 2020 2:50]
Zhifeng Hu
@Yushan, they are not really equivalent. As I said, select count(distinct a, b) from t; the result is not equal to the number of rows in select distinct a, b from t; becuase the later one includes rows where a and/or b is NULL, while count(distinct a, b) ignores them. BTW, this bug is derived from an issue here, take a look if it might help. https://github.com/pingcap/tidb/issues/17241