Description:
See this test case. The second SELECT is more restrictive than the first SELECT because of the WHERE clause, but why the second SELECT returns bigger number of estimated rows than that of the first SELECT?
CREATE TABLE t0(c0 VARCHAR(500), c1 TINYINT(83));
INSERT INTO t0 VALUES(1, 2), (3, NULL);
INSERT INTO t0(c0) VALUES(2);
INSERT INTO t0(c1) VALUES(NULL);
CREATE INDEX i0 USING HASH ON t0(c1, c0);
ANALYZE TABLE t0 UPDATE HISTOGRAM ON c0, c1;
EXPLAIN SELECT DISTINCT t0.c1 AS ref0 FROM t0 GROUP BY t0.c1; -- rows: 3
EXPLAIN SELECT DISTINCT t0.c1 AS ref0 FROM t0 WHERE t0.c0 > t0.c1 GROUP BY t0.c1; -- rows: 4
How to repeat:
$ sudo docker run -it -p 13306:3306 -e MYSQL_ROOT_PASSWORD=123456 mysql
$ mysql --host 127.0.0.1 -u root --password=123456 --port 13306
Then create a database and type the above test case.
Suggested fix:
I expect the second SELECT should return no more rows than the first SELECT.
As a reference, if we change the column types of c0 and c1 to INT, the results are expected as both SELECT return 4 rows.