Bug #47906 | Having an extra field in a table changes optimizer index choice for other column | ||
---|---|---|---|
Submitted: | 8 Oct 2009 4:56 | Modified: | 8 Nov 2009 6:21 |
Reporter: | Roel Van de Paar | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S1 (Critical) |
Version: | 5.0.85, 5.1.37, 5.1.39, 6.0.9 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[8 Oct 2009 4:56]
Roel Van de Paar
[8 Oct 2009 5:00]
Roel Van de Paar
Same result on 5.1.39 and 6.0.9, with a slight difference on 6.0.9 ('Using MRR') on third explain: mysql> EXPLAIN EXTENDED SELECT * FROM a WHERE p LIKE '%2%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: a type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 9 filtered: 100.00 Extra: Using where 1 row in set, 1 warning (0.02 sec) mysql> EXPLAIN EXTENDED SELECT * FROM b WHERE p LIKE '%2%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: b type: index possible_keys: NULL key: PRIMARY key_len: 22 ref: NULL rows: 9 filtered: 100.00 Extra: Using where; Using index 1 row in set, 1 warning (0.00 sec) mysql> mysql> EXPLAIN EXTENDED SELECT * FROM a WHERE p LIKE '2%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: a type: range possible_keys: PRIMARY key: PRIMARY key_len: 22 ref: NULL rows: 2 filtered: 100.00 Extra: Using index condition; Using MRR 1 row in set, 1 warning (0.00 sec) mysql> EXPLAIN EXTENDED SELECT * FROM b WHERE p LIKE '2%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: b type: index possible_keys: PRIMARY key: PRIMARY key_len: 22 ref: NULL rows: 9 filtered: 22.22 Extra: Using where; Using index 1 row in set, 1 warning (0.00 sec) mysql> mysql> EXPLAIN EXTENDED SELECT * FROM a WHERE p LIKE '%2'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: a type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 9 filtered: 100.00 Extra: Using where 1 row in set, 1 warning (0.00 sec) mysql> EXPLAIN EXTENDED SELECT * FROM b WHERE p LIKE '%2'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: b type: index possible_keys: NULL key: PRIMARY key_len: 22 ref: NULL rows: 9 filtered: 100.00 Extra: Using where; Using index 1 row in set, 1 warning (0.00 sec)
[8 Oct 2009 5:12]
Roel Van de Paar
No workaround. Force index doesn't work: mysql> EXPLAIN SELECT * FROM a FORCE INDEX (PRIMARY) WHERE p LIKE '%2%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: a type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 3 Extra: Using where 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM b FORCE INDEX (PRIMARY) WHERE p LIKE '%2%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: b type: index possible_keys: NULL key: PRIMARY key_len: 22 ref: NULL rows: 3 Extra: Using where; Using index 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM a FORCE INDEX (`p`) WHERE p LIKE '%2%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: a type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 3 Extra: Using where 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM b FORCE INDEX (`p`) WHERE p LIKE '%2%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: b type: index possible_keys: NULL key: PRIMARY key_len: 22 ref: NULL rows: 3 Extra: Using where; Using index 1 row in set (0.00 sec)
[8 Oct 2009 6:21]
Valeriy Kravchuk
I can confirm the results with 5.0.x also: C:\Program Files\MySQL\MySQL Server 5.1\bin>mysql -uroot -proot -P3308 test Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 30 Server version: 5.0.85-community-nt MySQL Community Edition (GPL) Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> DROP TABLE IF EXISTS a; DROP TABLE IF EXISTS b; Query OK, 0 rows affected, 1 warning (0.00 sec) Query OK, 0 rows affected, 1 warning (0.00 sec) mysql> CREATE TABLE `a` (`id` int,`p` varchar(20) KEY) ENGINE=MyISAM; Query OK, 0 rows affected (0.05 sec) mysql> CREATE TABLE `b` ( `p` varchar(20) KEY) ENGINE=MyISAM; Query OK, 0 rows affected (0.06 sec) mysql> insert into a values (1,"123"),(2,"456"),(3,"789"); Query OK, 3 rows affected (0.03 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> insert into b values ("123"), ("456"), ("789"); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> mysql> EXPLAIN SELECT * FROM a WHERE p LIKE '%2%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: a type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 3 Extra: Using where 1 row in set (0.03 sec) mysql> EXPLAIN SELECT * FROM b WHERE p LIKE '%2%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: b type: index possible_keys: NULL key: PRIMARY key_len: 62 ref: NULL rows: 3 Extra: Using where; Using index 1 row in set (0.00 sec) mysql> mysql> EXPLAIN SELECT * FROM a WHERE p LIKE '2%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: a type: range possible_keys: PRIMARY key: PRIMARY key_len: 62 ref: NULL rows: 1 Extra: Using where 1 row in set (0.03 sec) mysql> EXPLAIN SELECT * FROM b WHERE p LIKE '2%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: b type: index possible_keys: PRIMARY key: PRIMARY key_len: 62 ref: NULL rows: 3 Extra: Using where; Using index 1 row in set (0.00 sec) mysql> mysql> EXPLAIN SELECT * FROM a WHERE p LIKE '%2'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: a type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 3 Extra: Using where 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM b WHERE p LIKE '%2'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: b type: index possible_keys: NULL key: PRIMARY key_len: 62 ref: NULL rows: 3 Extra: Using where; Using index 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM b WHERE p LIKE '%2'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: b type: index possible_keys: NULL key: PRIMARY key_len: 62 ref: NULL rows: 3 Extra: Using where; Using index 1 row in set (0.00 sec) But the only potential problem I see here is this case: SELECT * FROM a WHERE p LIKE '2%'; vs. SELECT * FROM b WHERE p LIKE '2%'; and the problem is: Why use "index" for table b instead of "range", as for table a? But I'd say that "index" in this case is just "too big" "range", that includes all index records. Quite possible this is a good idea for table that small. For other cases index is covering for table b, so it makes perfect sense to scan entire index (even for MyISAM table, it is in memory) instead of scanning entire table. But even for this case, plan actually will depend on number of rows and distribution, and constants used in the query. Let me add some more rows: mysql> insert into a values (4,"223"),(5,"256"),(6,"289"); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> insert into b values ("223"),("256"),("289"); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> EXPLAIN SELECT * FROM b WHERE p LIKE '1%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: b type: range possible_keys: PRIMARY key: PRIMARY key_len: 62 ref: NULL rows: 1 Extra: Using where; Using index 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM a WHERE p LIKE '1%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: a type: range possible_keys: PRIMARY key: PRIMARY key_len: 62 ref: NULL rows: 1 Extra: Using where 1 row in set (0.00 sec) So, for more rows and other constant we can easily get "range" for both tables, and the only different will be a covering index usage for table b ("Using index"). We need much bigger table and more evidence (execution time, 'Handler_read%' statistics etc) to clam that these is a bug here. Formally different access pathes for different tables on similar data are NOT a problem by themselves, IMHO.
[15 Oct 2009 5:18]
Roel Van de Paar
Done a lot of research into this bug. The problem I saw was not with the queries for table b, but rather with the queries for table a. The issues seen are: #1 For table a, the optimizer chooses to use no index at all (for the %2% and %2 examples). #2 It chooses to use a range (for the 2% example), instead of an index. #3 FORCE INDEX does not work at all for these queries. A few conclusions in regards #1 and #3: 1. Table a is disk bound for %2% and %2 on MyISAM (InnoDB output looks similar): | Handler_read_rnd_next | 10000000 | 2. Table b is index bound: | Handler_read_first | 1 | | Handler_read_next | 9999999 | Now, even though the index is not covering for the columns requested on table a, one may think that fully scanning through an entire index (which may possibly reside in memory) for %2% and %2, and simultaneously reading the relevant rows from the data file (to obtain other columns) [as matching rows are being found in the index], would be faster than reading through the entire data file. However, there are a few caveats: any other non-covered columns requested (like in this example the 'extra' id column from table a) still have to be read from the data file, and the index may not reside in memory, eliminating most optimization possibilities due to the need to swap back and forth between the index file (on disk) and the data file (on disk). Another unexplored issue is that it may not be possible to use cardinality estimates when using '%2%'. Hence, any possible optimization here is very small, if any. Surprisingly, during actual testing, I generally found that table b, though using less CPU, was even slower than table a (more total elapsed times), which seems quite odd given the fact that there is one column less for table b, that the index is covering (and being used as per Handler output), and especially since I had preloaded the index into a large 5gb key_buffer. Possibly this is because of how the index is formed (non-sequential?). Any other ideas as to why would be welcome. 3. The fact that FORCE INDEX does not work can be explained by the manual: 'You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.' http://dev.mysql.com/doc/refman/5.1/en/index-hints.html In other words, it is still only a hint, not an actual 'force index'. In summary, for issues #1 and #3, I now consider this to be not a bug, nor a feature request. However, it would be interesting to learn why table b processes slower than table a, as per the above. For issue #2, Valeriy's note shows that when more items are added, it chooses a range. It would be interesting to do some more in-depth testing on this one as mentioned.
[9 Nov 2009 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".