Bug #79271 InnoDB uses non-covering index on bool column when estimate is 50% of the table
Submitted: 13 Nov 2015 13:44 Modified: 7 Dec 2015 16:03
Reporter: Jiří Kavalík Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6 OS:Any
Assigned to: MySQL Verification Team CPU Architecture:Any

[13 Nov 2015 13:44] Jiří Kavalík
Description:
When there is a tinyint column with 0/1 values and only handful of rows contain one of them, optimizer computes the rows extimate for using index on that column as 50% of the rows instead of 95%+ and moreover it actually uses the index instead of a table scan even when there are additional conditions on a non-indexed column (so PK ref access will be needed anyway).

Selecting the "rare" value gives much better estimate (and using index is the right choice in that case).

I was able to repeat it with 5.5 http://sqlfiddle.com/#!2/31d54/1 and 5.6 http://sqlfiddle.com/#!9/31d549/1

I tested it between 15k - 1M rows and did observe it in production on 3.5M table. It seemed to behave correctly (optimizer prefering tablescan) somewhere under 15k rows.

I do not have access to 5.7 currently so cannot verify it there.

How to repeat:
DROP TABLE IF EXISTS skewed_bool;
CREATE TABLE skewed_bool (id INT AUTO_INCREMENT PRIMARY KEY, val INT NOT NULL, flag bool NOT NULL, KEY(flag)) engine=innodb;
INSERT INTO skewed_bool(val, flag) VALUES (5,0),(5,0),(5,1),(5,0),(5,0),(5,0),(5,0),(5,0);
INSERT INTO skewed_bool(val, flag) SELECT 5, 0 FROM skewed_bool;
INSERT INTO skewed_bool(val, flag) VALUES (5,1);
INSERT INTO skewed_bool(val, flag) SELECT 5, 0 FROM skewed_bool;
INSERT INTO skewed_bool(val, flag) VALUES (5,1);
INSERT INTO skewed_bool(val, flag) SELECT 5, 0 FROM skewed_bool;
INSERT INTO skewed_bool(val, flag) VALUES (5,1);
INSERT INTO skewed_bool(val, flag) SELECT 5, 0 FROM skewed_bool;
INSERT INTO skewed_bool(val, flag) VALUES (5,1);
INSERT INTO skewed_bool(val, flag) SELECT 5, 0 FROM skewed_bool;
INSERT INTO skewed_bool(val, flag) VALUES (5,1);
INSERT INTO skewed_bool(val, flag) SELECT 5, 0 FROM skewed_bool;
INSERT INTO skewed_bool(val, flag) VALUES (5,1);
INSERT INTO skewed_bool(val, flag) SELECT 5, 0 FROM skewed_bool;
INSERT INTO skewed_bool(val, flag) VALUES (5,1);
INSERT INTO skewed_bool(val, flag) SELECT 5, 0 FROM skewed_bool;
INSERT INTO skewed_bool(val, flag) VALUES (5,1);
INSERT INTO skewed_bool(val, flag) SELECT 5, 0 FROM skewed_bool;
INSERT INTO skewed_bool(val, flag) VALUES (5,1);
INSERT INTO skewed_bool(val, flag) SELECT 5, 0 FROM skewed_bool;
INSERT INTO skewed_bool(val, flag) VALUES (5,1);
INSERT INTO skewed_bool(val, flag) SELECT 5, 0 FROM skewed_bool;
INSERT INTO skewed_bool(val, flag) VALUES (5,1);
INSERT INTO skewed_bool(val, flag) SELECT 5, 0 FROM skewed_bool;
INSERT INTO skewed_bool(val, flag) VALUES (5,1);
 
EXPLAIN SELECT COUNT(1) FROM skewed_bool WHERE val > 1 AND flag = 1;
EXPLAIN SELECT COUNT(1) FROM skewed_bool WHERE val > 1 AND flag = 0;
SHOW INDEX FROM skewed_bool;
analyze TABLE skewed_bool;
SHOW INDEX FROM skewed_bool;
EXPLAIN SELECT COUNT(1) FROM skewed_bool WHERE val > 1 AND flag = 1;
EXPLAIN SELECT COUNT(1) FROM skewed_bool WHERE val > 1 AND flag = 0;
SELECT COUNT(1), flag FROM skewed_bool GROUP BY flag;

Suggested fix:
Observe the "heuristics" 20% rows -> full table scan as done in other cases.
[7 Dec 2015 16:03] MySQL Verification Team
Hi Jiří,

Thanks for bug submission. Unfortunately at the moment this is expected behavior and not a bug. You can work around the problem by using index hints
https://dev.mysql.com/doc/refman/5.7/en/index-hints.html

kind regards
Bogdan Kecman