| Bug #77720 | "IS bool" condition in where provides worse execution plan than "= integer" | ||
|---|---|---|---|
| Submitted: | 14 Jul 2015 15:41 | Modified: | 24 Jul 2015 15:14 |
| Reporter: | Pep Pla | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
| Version: | 5.6 | OS: | MacOS (Mavericks) |
| Assigned to: | CPU Architecture: | Any | |
[14 Jul 2015 15:48]
Pep Pla
I also reproduced this bug in 5.5 on FreeBSD
[14 Jul 2015 15:51]
Pep Pla
Test case for partitioned tables
Attachment: test_case_p.sql (application/sql, text), 743 bytes.
[14 Jul 2015 15:52]
Pep Pla
Test case for indexed tables
Attachment: test_case_i.sql (application/sql, text), 669 bytes.
[23 Jul 2015 15:05]
MySQL Verification Team
Hi! Thank you for your bug report. In order to proceed with testing your report, we need your MySQL settings. Can you send us the arguments that you have passed to mysqld daemon and can you also send us you global config file. Can you also try to change your settings for: * innodb_stats_persistent * innodb_stats_persistent_sample_pages * innodb_stats_sample_pages * innodb_stats_transient_sample_pages Simply, turn persistent stats on and increase other variables by certain factor. Also, check whether optimizer_prune_level is ON and turn on engine_condition_pushdown and index_condition_pushdown if those are off. Then, repeat your tests and let us know whether the results are the same ...
[24 Jul 2015 15:14]
MySQL Verification Team
This bug seems to be fixed in the latest version, at least for 5.7. Just as in your test case, first comes explain for IS FALSE and then for = 0; Here is my output for the case when non-partitioned table is used: +----+-------------+------------+------------+-------+---------------+------------+---------+------+-------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------------+-------+---------------+------------+---------+------+-------+----------+--------------------------+ | 1 | SIMPLE | test_table | NULL | index | NULL | bool_field | 5 | NULL | 10001 | 100.00 | Using where; Using index | +----+-------------+------------+------------+-------+---------------+------------+---------+------+-------+----------+--------------------------+ 1 row in set, 2 warnings (0.00 sec) +----+-------------+------------+------------+------+---------------+------------+---------+-------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------------+------+---------------+------------+---------+-------+------+----------+-------------+ | 1 | SIMPLE | test_table | NULL | ref | bool_field | bool_field | 1 | const | 4984 | 100.00 | Using index | +----+-------------+------------+------------+------+---------------+------------+---------+-------+------+----------+-------------+ 1 row in set, 2 warnings (0.00 sec) There are some differences, but in both cases, the index on column `bool_field` is used. Do note that "using index" does not mean that index is used for searching. It only means that index is scanned and not used in the optimal manner. Here are the results for partitioned table, in the same order: id select_type table partitions type possible_keys key key_len ref rows filtered Extra 1 SIMPLE test_table field_true,field_false ALL NULL NULL NULL NULL 10001 100.00 Using where id select_type table partitions type possible_keys key key_len ref rows filtered Extra 1 SIMPLE test_table field_false ALL NULL NULL NULL NULL 4961 100.00 Using where Results are practically identical. Index is not utilized in both cases, although I increased number of rows. As I wrote above, here too, index is not used optimally. This is expected behavior. If you read our manual you will find out that searching by index when cca. 50 % of rows satisfy the condition is not optimal !!!!! It is simply faster to read sequentially, either rows or index then to use complex B+-trees searches. As I said, it is explained in the manual. I also tried all the extras and the results were identical.

Description: Indexes and partitions on bool fields deliver different execution plans when WHERE condition has "IS TRUE"/"IS FALSE" than "=1"/"=0" In the case of partitions, when the condition is "IS <bool>" is not able to run against only one of the partitions and scans all of them. In the case of non partitioned tables with an index on the boolean column, when the condition is "IS <bool>" is not able to use the index (the extra column in EXPLAIN only says "using where") For partitioned tables: is bool +----+-------------+------------+------------------------+------+---------------+------+---------+------+------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+------------------------+------+---------------+------+---------+------+------+-------------+ | 1 | SIMPLE | test_table | field_true,field_false | ALL | NULL | NULL | NULL | NULL | 5874 | Using where | +----+-------------+------------+------------------------+------+---------------+------+---------+------+------+-------------+ 1 row in set (0.00 sec) = integer +----+-------------+------------+-------------+------+---------------+------+---------+------+------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------------+------+---------------+------+---------+------+------+-------------+ | 1 | SIMPLE | test_table | field_false | ALL | NULL | NULL | NULL | NULL | 2966 | Using where | +----+-------------+------------+-------------+------+---------------+------+---------+------+------+-------------+ 1 row in set (0.) For indexed tables is nool +----+-------------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+ | 1 | SIMPLE | test_table | index | NULL | bool_field | 5 | NULL | 5117 | 100.00 | Using where; Using index | +----+-------------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+ 1 row in set, 1 warning (0.00 sec) = integer +----+-------------+------------+------+---------------+------------+---------+-------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------+---------------+------------+---------+-------+------+----------+-------------+ | 1 | SIMPLE | test_table | ref | bool_field | bool_field | 1 | const | 2533 | 100.00 | Using index | +----+-------------+------------+------+---------------+------------+---------+-------+------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) How to repeat: Use this script for indexed tables; DROP TABLE test_table; CREATE TABLE `test_table` ( `id` int(11) NOT NULL AUTO_INCREMENT, `bool_field` bool NOT NULL DEFAULT false, PRIMARY KEY (`id`), KEY (bool_field,id) ) ENGINE INNODB; DROP PROCEDURE FILL_TABLE; DELIMITER $$ CREATE PROCEDURE FILL_TABLE() BEGIN DECLARE A INT; SET A=1; bucle: LOOP IF A > 5000 THEN LEAVE bucle; END IF; SET A = A + 1; INSERT INTO test_table VALUES (null,FLOOR(RAND()+.5)); END LOOP bucle; END$$ DELIMITER ; CALL FILL_TABLE; EXPLAIN EXTENDED SELECT id,bool_field FROM test_table WHERE bool_field IS FALSE; EXPLAIN EXTENDED SELECT id,bool_field FROM test_table WHERE bool_field = 0; And this for partitioned DROP TABLE test_table; CREATE TABLE test_table ( id int(11) NOT NULL AUTO_INCREMENT, bool_field bool NOT NULL DEFAULT false, KEY (`id`) ) PARTITION BY LIST(bool_field) ( PARTITION field_true VALUES IN (true), PARTITION field_false VALUES IN (false) ); DROP PROCEDURE FILL_TABLE; DELIMITER $$ CREATE PROCEDURE FILL_TABLE() BEGIN DECLARE A INT; SET A=1; bucle: LOOP IF A > 5000 THEN LEAVE bucle; END IF; SET A = A + 1; INSERT INTO test_table VALUES (null,FLOOR(RAND()+.5)); END LOOP bucle; END$$ DELIMITER ; CALL FILL_TABLE; EXPLAIN PARTITIONS SELECT id,bool_field FROM test_table WHERE bool_field IS FALSE; EXPLAIN PARTITIONS SELECT id,bool_field FROM test_table WHERE bool_field = 0;