Bug #77720 "IS bool" condition in where provides worse execution plan than "= integer"
Submitted: 14 Jul 2015 15:41 Modified: 24 Jul 2015 15:14
Reporter: Pep Pla Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6 OS:MacOS (Mavericks)
Assigned to: CPU Architecture:Any

[14 Jul 2015 15:41] Pep Pla
Description:
Indexes and partitions on bool fields deliver different execution plans when WHERE condition has "IS TRUE"/"IS FALSE" than "=1"/"=0"

In the case of partitions, when the condition is "IS <bool>" is not able to run against only one of the partitions and scans all of them.

In the case of non partitioned tables with an index on the boolean column, when the condition is "IS <bool>" is not able to use the index (the extra column in EXPLAIN only says "using where")

For partitioned tables:
is bool
+----+-------------+------------+------------------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table      | partitions             | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+------------+------------------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | test_table | field_true,field_false | ALL  | NULL          | NULL | NULL    | NULL | 5874 | Using where |
+----+-------------+------------+------------------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

= integer
+----+-------------+------------+-------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table      | partitions  | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+------------+-------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | test_table | field_false | ALL  | NULL          | NULL | NULL    | NULL | 2966 | Using where |
+----+-------------+------------+-------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.)

For indexed tables
is nool
+----+-------------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
| id | select_type | table      | type  | possible_keys | key        | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | test_table | index | NULL          | bool_field | 5       | NULL | 5117 |   100.00 | Using where; Using index |
+----+-------------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

= integer
+----+-------------+------------+------+---------------+------------+---------+-------+------+----------+-------------+
| id | select_type | table      | type | possible_keys | key        | key_len | ref   | rows | filtered | Extra       |
+----+-------------+------------+------+---------------+------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | test_table | ref  | bool_field    | bool_field | 1       | const | 2533 |   100.00 | Using index |
+----+-------------+------------+------+---------------+------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

How to repeat:
Use this script for indexed tables;
DROP TABLE test_table;
CREATE TABLE `test_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `bool_field` bool NOT NULL DEFAULT false,
  PRIMARY KEY (`id`),
  KEY (bool_field,id)
) ENGINE INNODB;
DROP PROCEDURE FILL_TABLE;
DELIMITER $$
CREATE PROCEDURE FILL_TABLE()
BEGIN
   DECLARE A INT;
   SET A=1;
   bucle: LOOP
      IF A > 5000 THEN
         LEAVE bucle;
      END IF;
      SET A = A + 1;
      INSERT INTO test_table VALUES (null,FLOOR(RAND()+.5));
   END LOOP bucle;
END$$
DELIMITER ;
CALL FILL_TABLE;
EXPLAIN EXTENDED
SELECT id,bool_field
FROM test_table
WHERE bool_field IS FALSE;
EXPLAIN EXTENDED
SELECT id,bool_field
FROM test_table
WHERE bool_field = 0;

And this for partitioned
DROP TABLE test_table;
CREATE TABLE test_table (
  id int(11) NOT NULL AUTO_INCREMENT,
  bool_field bool NOT NULL DEFAULT false,
  KEY (`id`)
)
PARTITION BY LIST(bool_field) (
    PARTITION field_true VALUES IN (true),
    PARTITION field_false VALUES IN (false)
);
DROP PROCEDURE FILL_TABLE;
DELIMITER $$
CREATE PROCEDURE FILL_TABLE()
BEGIN
   DECLARE A INT;
   SET A=1;
   bucle: LOOP
      IF A > 5000 THEN
         LEAVE bucle;
      END IF;
      SET A = A + 1;
      INSERT INTO test_table VALUES (null,FLOOR(RAND()+.5));
   END LOOP bucle;
END$$
DELIMITER ;
CALL FILL_TABLE;
EXPLAIN PARTITIONS
SELECT id,bool_field
FROM test_table
WHERE bool_field IS FALSE;
EXPLAIN PARTITIONS
SELECT id,bool_field
FROM test_table
WHERE bool_field = 0;
[14 Jul 2015 15:48] Pep Pla
I also reproduced this bug in 5.5 on FreeBSD
[14 Jul 2015 15:51] Pep Pla
Test case for partitioned tables

Attachment: test_case_p.sql (application/sql, text), 743 bytes.

[14 Jul 2015 15:52] Pep Pla
Test case for indexed tables

Attachment: test_case_i.sql (application/sql, text), 669 bytes.

[23 Jul 2015 15:05] MySQL Verification Team
Hi!

Thank you for your bug report. In order to proceed with testing your report, we need your MySQL settings. Can you send us the arguments that you have passed to mysqld daemon and can you also send us you global config file.

Can you also try to change your settings for:

* innodb_stats_persistent 
* innodb_stats_persistent_sample_pages 
* innodb_stats_sample_pages 
* innodb_stats_transient_sample_pages

Simply, turn persistent stats on and increase other variables by certain factor. Also, check whether optimizer_prune_level is ON and turn on engine_condition_pushdown and
index_condition_pushdown if those are off.

Then, repeat your tests and let us know whether the results are the same ...
[24 Jul 2015 15:14] MySQL Verification Team
This bug seems to be fixed in the latest version, at least for 5.7.

Just as in your test case, first comes explain for IS FALSE and then for = 0;

Here is my output for the case when non-partitioned table is used:

+----+-------------+------------+------------+-------+---------------+------------+---------+------+-------+----------+--------------------------+
| id | select_type | table      | partitions | type  | possible_keys | key        | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+------------+------------+-------+---------------+------------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | test_table | NULL       | index | NULL          | bool_field | 5       | NULL | 10001 |   100.00 | Using where; Using index |
+----+-------------+------------+------------+-------+---------------+------------+---------+------+-------+----------+--------------------------+
1 row in set, 2 warnings (0.00 sec)

+----+-------------+------------+------------+------+---------------+------------+---------+-------+------+----------+-------------+
| id | select_type | table      | partitions | type | possible_keys | key        | key_len | ref   | rows | filtered | Extra       |
+----+-------------+------------+------------+------+---------------+------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | test_table | NULL       | ref  | bool_field    | bool_field | 1       | const | 4984 |   100.00 | Using index |
+----+-------------+------------+------------+------+---------------+------------+---------+-------+------+----------+-------------+
1 row in set, 2 warnings (0.00 sec)

There are some differences, but in both cases, the index on column `bool_field` is used. Do note that "using index" does not mean that index is used for searching. It only means that index is scanned and not used in the optimal manner.

Here are the results for partitioned table, in the same order:

id      select_type     table   partitions      type    possible_keys   key     key_len ref     rows    filtered        Extra
1       SIMPLE  test_table      field_true,field_false  ALL     NULL    NULL    NULL    NULL    10001   100.00  Using where

id      select_type     table   partitions      type    possible_keys   key     key_len ref     rows    filtered        Extra
1       SIMPLE  test_table      field_false     ALL     NULL    NULL    NULL    NULL    4961    100.00  Using where

Results are practically identical. Index is not utilized in both cases, although I increased number of rows.

As I wrote above, here too, index is not used optimally. This is expected behavior. If you read our manual you will find out that searching by index when cca. 50 % of rows satisfy the condition is not optimal !!!!! It is simply faster to read sequentially, either rows or index then to use complex B+-trees searches. As I said, it is explained in the manual.

I also tried all the extras and the results were identical.