Bug #80067 Index on BIT column is NOT used when column name only is used in WHERE clause
Submitted: 20 Jan 2016 10:50 Modified: 20 Jan 2016 11:39
Reporter: Valeriy Kravchuk Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6.28, 5.7.10 OS:Any
Assigned to: CPU Architecture:Any
Tags: bit, missing manual, Optimizer

[20 Jan 2016 10:50] Valeriy Kravchuk
Description:
MySQL Manual does NOT say anything specific on how to use BIT columns in WHERE clauses, iut just says how to use BIT literals (see https://dev.mysql.com/doc/refman/5.6/en/bit-type.html and https://dev.mysql.com/doc/refman/5.6/en/bit-field-literals.html).

So, one may have a habit to consider BIT(1) equal to BOOL in a sense that "A value of zero is considered false. Nonzero values are considered true". This works as expected and produces correct results, but it seems index on BIT(1) column is NOT used even when forced:

mysql> explain select id, c2 from tbit FORCE INDEX(c1) where c1;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | tbit  | ALL  | NULL          | NULL | NULL    | NULL |    5 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

See more details below.

How to repeat:
mysql> create table tbit(id int primary key, c1 bit(1), c2 char(100), key(c1));
Query OK, 0 rows affected (0.66 sec)

mysql> insert into tbit values (1,1,'a');
Query OK, 1 row affected (0.03 sec)

mysql> insert into tbit values (2,0,'a');
Query OK, 1 row affected (0.03 sec)

mysql> insert into tbit values (3,0,'a');
Query OK, 1 row affected (0.03 sec)

mysql> insert into tbit values (4,0,'a');
Query OK, 1 row affected (0.04 sec)

mysql> insert into tbit values (6,0,'b');
Query OK, 1 row affected (0.03 sec)

mysql> analyze table tbit;
+-----------+---------+----------+----------+
| Table     | Op      | Msg_type | Msg_text |
+-----------+---------+----------+----------+
| test.tbit | analyze | status   | OK       |
+-----------+---------+----------+----------+
1 row in set (0.08 sec)

In the following cases index is used as expected:

mysql> explain select id, c2 from tbit where c1=1;
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
|  1 | SIMPLE      | tbit  | ref  | c1            | c1   | 2       | const |    1 | NULL  |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
1 row in set (0.00 sec)

mysql> explain select id, c2 from tbit where c1=b'1';
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
|  1 | SIMPLE      | tbit  | ref  | c1            | c1   | 2       | const |    1 | NULL  |
+----+-------------+-------+------+---------------+------+---------+-------+------+-------+
1 row in set (0.00 sec)

But not in this case:

mysql> explain select id, c2 from tbit where c1;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | tbit  | ALL  | NULL          | NULL | NULL    | NULL |    5 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

While results are the same:

mysql> select id, c2 from tbit where c1;
+----+------+
| id | c2   |
+----+------+
|  1 | a    |
+----+------+
1 row in set (0.00 sec)

mysql> select id, c2 from tbit where c1=1;
+----+------+
| id | c2   |
+----+------+
|  1 | a    |
+----+------+
1 row in set (0.00 sec)

Suggested fix:
Rewrite queries having C1 mentioned without condition in WHERE clause if C1 is a BIT(1) column, to reference it as C1=b'1' before further optimizations (or othewrwise allow to use the index on column when it makes sense from performance point of view).

While this is not implemented, explain performance implications of this use of BIT(1) columns in the manual.
[20 Jan 2016 11:39] MySQL Verification Team
Hello Valeriy,

Thank you for the report and test case.
Observed this with 5.6.28 and 5.7.10 builds.

Thanks,
Umesh
[20 Jan 2016 12:59] Øystein Grøvlen
Hi Valeriy,

I do not understand why you focus on BIT columns.  INTEGER columns have same behavior:

mysql> show create table t5;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                        |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t5    | CREATE TABLE `t5` (
  `pk` int(11) NOT NULL,
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`pk`),
  KEY `idx` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0,00 sec)

mysql> explain select * from t5 force index(idx) where a;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t5    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    5 |    80.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0,00 sec)

mysql> explain select * from t5 force index(idx) where a>0;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | t5    | NULL       | range | idx           | idx  | 5       | NULL |    5 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0,00 sec)
[15 Feb 2016 12:44] Sveta Smirnova
Øystein,

I believe worst case here is what this is true for BOOLEAN also.
[17 Jun 2020 23:13] Mark Guinness
Also reproduced on MySQL 8.0, see fiddle at https://www.db-fiddle.com/f/cmDYMj5j2SL7uTjZikzBaY/1.

These use the index:

select id, c2 from tbit where c1 > 0;
select id, c2 from tbit where c1 = TRUE;

These ignore the index:

select id, c2 from tbit where c1;
select id, c2 from tbit where c1 IS TRUE;

Note: The MariaDB documentation at https://mariadb.com/kb/en/is/ makes the following distinction for the IS operator.

"There is an important difference between using IS TRUE or comparing a value with TRUE using =. When using =, only 1 equals to TRUE. But when using IS TRUE, all values which are logically true (like a number > 1) return TRUE."