MySQL Bugs: #108833: Suspicious Estimated Rows

Bug #108833	Suspicious Estimated Rows
Submitted:	20 Oct 2022 10:11	Modified:	20 Oct 2022 12:34
Reporter:	JINSHENG BA	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	8.0.31	OS:	Ubuntu (20.04)
Assigned to:		CPU Architecture:	x86
Tags:	cardinality estimation

Description:
See this test case. The second SELECT is more restrictive than the first SELECT because of the WHERE clause, but why the second SELECT returns bigger number of estimated rows than that of the first SELECT?

CREATE TABLE t0(c0 VARCHAR(500), c1 TINYINT(83));
INSERT INTO t0 VALUES(1, 2), (3, NULL);
INSERT INTO t0(c0) VALUES(2);
INSERT INTO t0(c1) VALUES(NULL);
CREATE INDEX i0 USING HASH ON t0(c1, c0);
ANALYZE TABLE t0 UPDATE HISTOGRAM ON c0, c1;

EXPLAIN SELECT DISTINCT t0.c1 AS ref0 FROM t0 GROUP BY t0.c1; -- rows: 3
EXPLAIN SELECT DISTINCT t0.c1 AS ref0 FROM t0 WHERE t0.c0 > t0.c1 GROUP BY t0.c1; -- rows: 4

How to repeat:
$ sudo docker run -it -p 13306:3306 -e MYSQL_ROOT_PASSWORD=123456 mysql
$ mysql --host 127.0.0.1 -u root --password=123456 --port 13306

Then create a database and type the above test case.

Suggested fix:
I expect the second SELECT should return no more rows than the first SELECT.

As a reference, if we change the column types of c0 and c1 to INT, the results are expected as both SELECT return 4 rows.

Hello Jinsheng Ba,

Thank you for the report and test case.

regards,
Umesh