MySQL Bugs: #47906: Having an extra field in a table changes optimizer index choice for other column

Bug #47906	Having an extra field in a table changes optimizer index choice for other column
Submitted:	8 Oct 2009 4:56	Modified:	8 Nov 2009 6:21
Reporter:	Roel Van de Paar	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S1 (Critical)
Version:	5.0.85, 5.1.37, 5.1.39, 6.0.9	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
CREATE TABLE `a` (`id` int,`p` varchar(20) KEY) ENGINE=MyISAM;
CREATE TABLE `b` (         `p` varchar(20) KEY) ENGINE=MyISAM;

mysql> EXPLAIN SELECT * FROM a WHERE p LIKE '%2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
        Extra: Using where
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM b WHERE p LIKE '%2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 22
          ref: NULL
         rows: 3
        Extra: Using where; Using index
1 row in set (0.00 sec)

---------

mysql> EXPLAIN SELECT * FROM a WHERE p LIKE '2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 22
          ref: NULL
         rows: 1
        Extra: Using where
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM b WHERE p LIKE '2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 22
          ref: NULL
         rows: 3
        Extra: Using where; Using index
1 row in set (0.00 sec)

---------

mysql> EXPLAIN SELECT * FROM a WHERE p LIKE '%2'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
        Extra: Using where
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM b WHERE p LIKE '%2'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 22
          ref: NULL
         rows: 3
        Extra: Using where; Using index
1 row in set (0.00 sec)

How to repeat:
A small testcase that highlights the issue

---------
DROP TABLE IF EXISTS a; DROP TABLE IF EXISTS b;

CREATE TABLE `a` (`id` int,`p` varchar(20) KEY) ENGINE=MyISAM;
CREATE TABLE `b` (         `p` varchar(20) KEY) ENGINE=MyISAM;

insert into a values (1,"123"),(2,"456"),(3,"789"); 
insert into b values   ("123"),  ("456"),  ("789");

EXPLAIN SELECT * FROM a WHERE p LIKE '%2%'\G
EXPLAIN SELECT * FROM b WHERE p LIKE '%2%'\G

EXPLAIN SELECT * FROM a WHERE p LIKE '2%'\G
EXPLAIN SELECT * FROM b WHERE p LIKE '2%'\G

EXPLAIN SELECT * FROM a WHERE p LIKE '%2'\G
EXPLAIN SELECT * FROM b WHERE p LIKE '%2'\G
---------

A larger testcase that shows all possibilities

---------
DROP TABLE IF EXISTS a; DROP TABLE IF EXISTS b; DROP TABLE IF EXISTS c; DROP TABLE IF EXISTS d;

CREATE TABLE `a` (`id` int,`p` varchar(20) KEY) ENGINE=MyISAM;
CREATE TABLE `b` (         `p` varchar(20) KEY) ENGINE=MyISAM;
CREATE TABLE `c` (`id` int,`p` varchar(20) KEY) ENGINE=InnoDB;
CREATE TABLE `d` (         `p` varchar(20) KEY) ENGINE=InnoDB;

insert into a values (1,"123"),(2,"456"),(3,"789"),(4,"135"),(5,"246"),(6,"357"),(7,"468"),(8,"579"),(9,"222"); 
insert into b values   ("123"),  ("456"),  ("789"),(  "135"),(  "246"),(  "357"),(  "468"),(  "579"),(  "222"); 
insert into c values (1,"123"),(2,"456"),(3,"789"),(4,"135"),(5,"246"),(6,"357"),(7,"468"),(8,"579"),(9,"222"); 
insert into d values   ("123"),  ("456"),  ("789"),(  "135"),(  "246"),(  "357"),(  "468"),(  "579"),(  "222"); 

EXPLAIN SELECT * FROM a WHERE p LIKE '%2%'\G
EXPLAIN SELECT * FROM b WHERE p LIKE '%2%'\G
EXPLAIN SELECT * FROM c WHERE p LIKE '%2%'\G
EXPLAIN SELECT * FROM d WHERE p LIKE '%2%'\G

EXPLAIN SELECT * FROM a WHERE p LIKE '2%'\G
EXPLAIN SELECT * FROM b WHERE p LIKE '2%'\G
EXPLAIN SELECT * FROM c WHERE p LIKE '2%'\G
EXPLAIN SELECT * FROM d WHERE p LIKE '2%'\G

EXPLAIN SELECT * FROM a WHERE p LIKE '%2'\G
EXPLAIN SELECT * FROM b WHERE p LIKE '%2'\G
EXPLAIN SELECT * FROM c WHERE p LIKE '2%'\G
EXPLAIN SELECT * FROM d WHERE p LIKE '2%'\G
---------

Same result on 5.1.39 and 6.0.9, with a slight difference on 6.0.9 ('Using MRR') on third explain:

mysql> EXPLAIN EXTENDED SELECT * FROM a WHERE p LIKE '%2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 9
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.02 sec)

mysql> EXPLAIN EXTENDED SELECT * FROM b WHERE p LIKE '%2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 22
          ref: NULL
         rows: 9
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

mysql> 
mysql> EXPLAIN EXTENDED SELECT * FROM a WHERE p LIKE '2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 22
          ref: NULL
         rows: 2
     filtered: 100.00
        Extra: Using index condition; Using MRR
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN EXTENDED SELECT * FROM b WHERE p LIKE '2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 22
          ref: NULL
         rows: 9
     filtered: 22.22
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

mysql> 
mysql> EXPLAIN EXTENDED SELECT * FROM a WHERE p LIKE '%2'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 9
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN EXTENDED SELECT * FROM b WHERE p LIKE '%2'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 22
          ref: NULL
         rows: 9
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

No workaround. Force index doesn't work:

mysql> EXPLAIN SELECT * FROM a FORCE INDEX (PRIMARY) WHERE p LIKE '%2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
        Extra: Using where
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM b FORCE INDEX (PRIMARY) WHERE p LIKE '%2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 22
          ref: NULL
         rows: 3
        Extra: Using where; Using index
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM a FORCE INDEX (`p`) WHERE p LIKE '%2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
        Extra: Using where
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM b FORCE INDEX (`p`) WHERE p LIKE '%2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 22
          ref: NULL
         rows: 3
        Extra: Using where; Using index
1 row in set (0.00 sec)

I can confirm the results with 5.0.x also:

C:\Program Files\MySQL\MySQL Server 5.1\bin>mysql -uroot -proot -P3308 test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 30
Server version: 5.0.85-community-nt MySQL Community Edition (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> DROP TABLE IF EXISTS a; DROP TABLE IF EXISTS b;
Query OK, 0 rows affected, 1 warning (0.00 sec)

Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> CREATE TABLE `a` (`id` int,`p` varchar(20) KEY) ENGINE=MyISAM;
Query OK, 0 rows affected (0.05 sec)

mysql> CREATE TABLE `b` (         `p` varchar(20) KEY) ENGINE=MyISAM;
Query OK, 0 rows affected (0.06 sec)

mysql> insert into a values (1,"123"),(2,"456"),(3,"789");
Query OK, 3 rows affected (0.03 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> insert into b values   ("123"),  ("456"),  ("789");
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql>
mysql> EXPLAIN SELECT * FROM a WHERE p LIKE '%2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
        Extra: Using where
1 row in set (0.03 sec)

mysql> EXPLAIN SELECT * FROM b WHERE p LIKE '%2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 62
          ref: NULL
         rows: 3
        Extra: Using where; Using index
1 row in set (0.00 sec)

mysql>
mysql> EXPLAIN SELECT * FROM a WHERE p LIKE '2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 62
          ref: NULL
         rows: 1
        Extra: Using where
1 row in set (0.03 sec)

mysql> EXPLAIN SELECT * FROM b WHERE p LIKE '2%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 62
          ref: NULL
         rows: 3
        Extra: Using where; Using index
1 row in set (0.00 sec)

mysql>
mysql> EXPLAIN SELECT * FROM a WHERE p LIKE '%2'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
        Extra: Using where
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM b WHERE p LIKE '%2'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 62
          ref: NULL
         rows: 3
        Extra: Using where; Using index
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM b WHERE p LIKE '%2'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 62
          ref: NULL
         rows: 3
        Extra: Using where; Using index
1 row in set (0.00 sec)

But the only potential problem I see here is this case:

SELECT * FROM a WHERE p LIKE '2%';

vs.

SELECT * FROM b WHERE p LIKE '2%';

and the problem is: Why use "index" for table b instead of "range", as for table a? But I'd say that "index" in this case is just "too big" "range", that includes all index records. Quite possible this is a good idea for table that small.

For other cases index is covering for table b, so it makes perfect sense to scan entire index (even for MyISAM table, it is in memory) instead of scanning entire table.

But even for this case, plan actually will depend on number of rows and distribution, and constants used in the query. Let me add some more rows:

mysql> insert into a values (4,"223"),(5,"256"),(6,"289");
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> insert into b values ("223"),("256"),("289");
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> EXPLAIN SELECT * FROM b WHERE p LIKE '1%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 62
          ref: NULL
         rows: 1
        Extra: Using where; Using index
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM a WHERE p LIKE '1%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 62
          ref: NULL
         rows: 1
        Extra: Using where
1 row in set (0.00 sec)

So, for more rows and other constant we can easily get "range" for both tables, and the only different will be a covering index usage for table b ("Using index").

We need much bigger table and more evidence (execution time, 'Handler_read%' statistics etc) to clam that these is a bug here. Formally different access pathes for different tables on similar data are NOT a problem by themselves, IMHO.

Done a lot of research into this bug.

The problem I saw was not with the queries for table b, but rather with the queries for table a.

The issues seen are:
#1 For table a, the optimizer chooses to use no index at all (for the %2% and %2 examples).
#2 It chooses to use a range (for the 2% example), instead of an index.
#3 FORCE INDEX does not work at all for these queries.

A few conclusions in regards #1 and #3:

1. Table a is disk bound for %2% and %2 on MyISAM (InnoDB output looks similar):

| Handler_read_rnd_next | 10000000 |

2. Table b is index bound:

| Handler_read_first | 1 |
| Handler_read_next | 9999999 |

Now, even though the index is not covering for the columns requested on table a, one may think that fully scanning through an entire index (which may possibly reside in memory) for %2% and %2, and simultaneously reading the relevant rows from the data file (to obtain other columns) [as matching rows are being found in the index], would be faster than reading through the entire data file.

However, there are a few caveats: any other non-covered columns requested (like in this example the 'extra' id column from table a) still have to be read from the data file, and the index may not reside in memory, eliminating most optimization possibilities due to the need to swap back and forth between the index file (on disk) and the data file (on disk). Another unexplored issue is that it may not be possible to use cardinality estimates when using '%2%'.

Hence, any possible optimization here is very small, if any.

Surprisingly, during actual testing, I generally found that table b, though using less CPU, was even slower than table a (more total elapsed times), which seems quite odd given the fact that there is one column less for table b, that the index is covering (and being used as per Handler output), and especially since I had preloaded the index into a large 5gb key_buffer. Possibly this is because of how the index is formed (non-sequential?). Any other ideas as to why would be welcome.

3. The fact that FORCE INDEX does not work can be explained by the manual:

'You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.'
http://dev.mysql.com/doc/refman/5.1/en/index-hints.html

In other words, it is still only a hint, not an actual 'force index'.

In summary, for issues #1 and #3, I now consider this to be not a bug, nor a feature request.

However, it would be interesting to learn why table b processes slower than table a, as per the above.

For issue #2, Valeriy's note shows that when more items are added, it chooses a range. It would be interesting to do some more in-depth testing on this one as mentioned.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".