Bug #100504 Count distinct is inconsistent when swapping two fields being counted.
Submitted: 12 Aug 2020 11:58 Modified: 18 Aug 2020 2:50
Reporter: Zhifeng Hu Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:8.0.21, 5.7.31 OS:Linux (Arch Linux)
Assigned to: CPU Architecture:x86 (x86_64)
Tags: aggragation, COUNT DISTINCT

[12 Aug 2020 11:58] Zhifeng Hu
Description:
The following two SQL queries give different results on a database,

mysql> SELECT count(distinct `col_bit(64)_undef_signed`, `col_enum_undef_signed`) FROM table_190000_utf8_undef;
+---------------------------------------------------------------------+
| count(distinct `col_bit(64)_undef_signed`, `col_enum_undef_signed`) |
+---------------------------------------------------------------------+
|                                                               28748 |
+---------------------------------------------------------------------+
1 row in set (0.11 sec)

mysql> SELECT count(distinct `col_enum_undef_signed`, `col_bit(64)_undef_signed`) FROM table_190000_utf8_undef;
+---------------------------------------------------------------------+
| count(distinct `col_enum_undef_signed`, `col_bit(64)_undef_signed`) |
+---------------------------------------------------------------------+
|                                                               43860 |
+---------------------------------------------------------------------+
1 row in set (0.10 sec)

which does not make sense.

How to repeat:
the test data can be obtained here: https://github.com/pingcap/tidb/files/4649993/multi_column.txt

use test;
source multi_column.txt

and use the two above SQL queries, you will have it.

Suggested fix:
N/A
[12 Aug 2020 13:09] MySQL Verification Team
Hello Zhifeng Hu,

Thank you for the report and test case.
Observed this with 8.0.21 build.

regards,
Umesh
[12 Aug 2020 13:11] MySQL Verification Team
- 8.0.21 build
mysql> source 100504.sql;
.

mysql> SELECT count(distinct `col_bit(64)_undef_signed`, `col_enum_undef_signed`) FROM table_190000_utf8_undef;
+---------------------------------------------------------------------+
| count(distinct `col_bit(64)_undef_signed`, `col_enum_undef_signed`) |
+---------------------------------------------------------------------+
|                                                               28748 |
+---------------------------------------------------------------------+
1 row in set (0.12 sec)

mysql> SELECT count(distinct `col_enum_undef_signed`, `col_bit(64)_undef_signed`) FROM table_190000_utf8_undef;
+---------------------------------------------------------------------+
| count(distinct `col_enum_undef_signed`, `col_bit(64)_undef_signed`) |
+---------------------------------------------------------------------+
|                                                               43860 |
+---------------------------------------------------------------------+
1 row in set (0.13 sec)
[12 Aug 2020 13:11] MySQL Verification Team
I'll add 5.6/5.7 results later on
[14 Aug 2020 4:58] MySQL Verification Team
5.7.31 - affected with provided test case

bin/mysql -uroot -S /tmp/mysql_ushastry.sock --loose-local-infile=1                                                                                                  Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.31 MySQL Community Server (GPL)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> use test
Database changed
mysql> source 100504.sql;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 9630 rows affected (0.29 sec)
Records: 9630  Duplicates: 0  Warnings: 0

Query OK, 9529 rows affected (0.26 sec)
Records: 9529  Duplicates: 0  Warnings: 0

Query OK, 9508 rows affected (0.25 sec)
Records: 9508  Duplicates: 0  Warnings: 0

Query OK, 9540 rows affected (0.26 sec)
Records: 9540  Duplicates: 0  Warnings: 0

Query OK, 9508 rows affected (0.26 sec)
Records: 9508  Duplicates: 0  Warnings: 0

Query OK, 9539 rows affected (0.26 sec)
Records: 9539  Duplicates: 0  Warnings: 0

Query OK, 9523 rows affected (0.26 sec)
Records: 9523  Duplicates: 0  Warnings: 0

Query OK, 9533 rows affected (0.26 sec)
Records: 9533  Duplicates: 0  Warnings: 0

Query OK, 9510 rows affected (0.26 sec)
Records: 9510  Duplicates: 0  Warnings: 0

Query OK, 9536 rows affected (0.26 sec)
Records: 9536  Duplicates: 0  Warnings: 0

Query OK, 9484 rows affected (0.26 sec)
Records: 9484  Duplicates: 0  Warnings: 0

Query OK, 9440 rows affected (0.25 sec)
Records: 9440  Duplicates: 0  Warnings: 0

Query OK, 9420 rows affected (0.26 sec)
Records: 9420  Duplicates: 0  Warnings: 0

Query OK, 9440 rows affected (0.26 sec)
Records: 9440  Duplicates: 0  Warnings: 0

Query OK, 9433 rows affected (0.26 sec)
Records: 9433  Duplicates: 0  Warnings: 0

Query OK, 9425 rows affected (0.25 sec)
Records: 9425  Duplicates: 0  Warnings: 0

Query OK, 9438 rows affected (0.25 sec)
Records: 9438  Duplicates: 0  Warnings: 0

Query OK, 9439 rows affected (0.26 sec)
Records: 9439  Duplicates: 0  Warnings: 0

Query OK, 9429 rows affected (0.25 sec)
Records: 9429  Duplicates: 0  Warnings: 0

Query OK, 9442 rows affected (0.25 sec)
Records: 9442  Duplicates: 0  Warnings: 0

Query OK, 254 rows affected (0.01 sec)
Records: 254  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> SELECT count(distinct `col_bit(64)_undef_signed`, `col_enum_undef_signed`) FROM table_190000_utf8_undef;
+---------------------------------------------------------------------+
| count(distinct `col_bit(64)_undef_signed`, `col_enum_undef_signed`) |
+---------------------------------------------------------------------+
|                                                               28748 |
+---------------------------------------------------------------------+
1 row in set (0.12 sec)

mysql> SELECT count(distinct `col_enum_undef_signed`, `col_bit(64)_undef_signed`) FROM table_190000_utf8_undef;
+---------------------------------------------------------------------+
| count(distinct `col_enum_undef_signed`, `col_bit(64)_undef_signed`) |
+---------------------------------------------------------------------+
|                                                               43860 |
+---------------------------------------------------------------------+
1 row in set (0.15 sec)
[15 Aug 2020 5:00] Yushan ZHANG
Which is more interesting...

mysql> select count(*) from (select distinct `col_bit(64)_undef_signed`, `col_enum_undef_signed` FROM table_190000_utf8_undef
) t;
+----------+
| count(*) |
+----------+
|    52960 |
+----------+
1 row in set (0.36 sec)

mysql> select count(*) from (select distinct `col_enum_undef_signed`, `col_bit(64)_undef_signed` FROM table_190000_utf8_undef) t;
+----------+
| count(*) |
+----------+
|    52960 |
+----------+
1 row in set (0.41 sec)
[17 Aug 2020 3:09] Zhifeng Hu
@Yushan, the more interesting case is not actually interesting, because distinct did not erase out entries with NULL field, while count(distinct) ignores them. The result of count(*) from select distinct is correct AFAIK.
[17 Aug 2020 4:32] Yushan ZHANG
@Zhifeng, I modified the query to a nested one, the `count(*)` is in the outer query while the inner query is equivalent to the original one.
[18 Aug 2020 2:50] Zhifeng Hu
@Yushan, they are not really equivalent. As I said, 

select count(distinct a, b) from t;

the result is not equal to the number of rows in

select distinct a, b from t;

becuase the later one includes rows where a and/or b is NULL, while count(distinct a, b) ignores them.

BTW, this bug is derived from an issue here, take a look if it might help.

https://github.com/pingcap/tidb/issues/17241