Bug #108833 Suspicious Estimated Rows
Submitted: 20 Oct 2022 10:11 Modified: 20 Oct 2022 12:34
Reporter: JINSHENG BA Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0.31 OS:Ubuntu (20.04)
Assigned to: CPU Architecture:x86
Tags: cardinality estimation

[20 Oct 2022 10:11] JINSHENG BA
Description:
See this test case. The second SELECT is more restrictive than the first SELECT because of the WHERE clause, but why the second SELECT returns bigger number of estimated rows than that of the first SELECT?

CREATE TABLE t0(c0 VARCHAR(500), c1 TINYINT(83));
INSERT INTO t0 VALUES(1, 2), (3, NULL);
INSERT INTO t0(c0) VALUES(2);
INSERT INTO t0(c1) VALUES(NULL);
CREATE INDEX i0 USING HASH ON t0(c1, c0);
ANALYZE TABLE t0 UPDATE HISTOGRAM ON c0, c1;

EXPLAIN SELECT DISTINCT t0.c1 AS ref0 FROM t0 GROUP BY t0.c1; -- rows: 3
EXPLAIN SELECT DISTINCT t0.c1 AS ref0 FROM t0 WHERE t0.c0 > t0.c1 GROUP BY t0.c1; -- rows: 4

How to repeat:
$ sudo docker run -it -p 13306:3306 -e MYSQL_ROOT_PASSWORD=123456 mysql
$ mysql --host 127.0.0.1 -u root --password=123456 --port 13306

Then create a database and type the above test case.

Suggested fix:
I expect the second SELECT should return no more rows than the first SELECT.

As a reference, if we change the column types of c0 and c1 to INT, the results are expected as both SELECT return 4 rows.
[20 Oct 2022 12:34] MySQL Verification Team
Hello Jinsheng Ba,

Thank you for the report and test case.

regards,
Umesh