Description:
When joining two tables on a column that is nullable on one of the tables, index range scan cannot be used (and cannot even be forced) on that table unless you add a condition:
WHERE column IS NOT NULL
How to repeat:
CREATE TABLE t_leading (id INT NOT NULL PRIMARY KEY, selector INT, value TEXT NOT NULL, KEY ix_leading_selector (selector)) ENGINE=InnoDB;
CREATE TABLE t_driven (id INT NOT NULL PRIMARY KEY, value TEXT NOT NULL) Engine=InnoDB;
INSERT
INTO t_leading (id, selector, value)
VALUES
(1, 1, LPAD('Value 1', 3000, ' ')),
(2, NULL, LPAD('Value 2', 3000, ' ')),
(3, NULL, LPAD('Value 3', 3000, ' ')),
(4, NULL, LPAD('Value 4', 3000, ' ')),
(5, NULL, LPAD('Value 5', 3000, ' ')),
(6, NULL, LPAD('Value 6', 3000, ' ')),
(7, NULL, LPAD('Value 7', 3000, ' ')),
(8, NULL, LPAD('Value 8', 3000, ' ')),
(9, NULL, LPAD('Value 9', 3000, ' ')),
(10, NULL, LPAD('Value 10', 3000, ' '));
INSERT
INTO t_driven (id, value)
VALUES
(1, LPAD('Value 1', 3000, ' ')),
(2, LPAD('Value 2', 3000, ' ')),
(3, LPAD('Value 3', 3000, ' ')),
(4, LPAD('Value 4', 3000, ' ')),
(5, LPAD('Value 5', 3000, ' ')),
(6, LPAD('Value 6', 3000, ' ')),
(7, LPAD('Value 7', 3000, ' ')),
(8, LPAD('Value 8', 3000, ' ')),
(9, LPAD('Value 9', 3000, ' ')),
(10, LPAD('Value 10', 3000, ' '));
EXPLAIN
SELECT *
FROM t_leading l FORCE INDEX (ix_leading_selector)
JOIN
t_driven d FORCE INDEX (PRIMARY)
ON d.id = l.selector;
1, 'SIMPLE', 'l', 'ALL', 'ix_leading_selector', '', '', '', 10, ''
1, 'SIMPLE', 'd', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.l.selector', 1, ''
EXPLAIN
SELECT *
FROM t_leading l FORCE INDEX (ix_leading_selector)
JOIN
t_driven d FORCE INDEX (PRIMARY)
ON d.id = l.selector
WHERE l.selector IS NOT NULL;
1, 'SIMPLE', 'l', 'range', 'ix_leading_selector', 'ix_leading_selector', '5', '', 1, 'Using where'
1, 'SIMPLE', 'd', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.l.selector', 1, ''
Since these queries are in fact identical (a satisfied equality condition implies NOT NULL on both columns), I expected same plan to be used for both queries, which would use ix_leading_selector, as FORCE INDEX says.
However, the first query uses full table scan over t_leading, even despite FORCE INDEX being used.
This is not efficient, since there is only 1 non-NULL column and 9 NULL columns, and the table rows very large.
Besides, this behavior contradicts documentation, which says:
> You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.
There certainly is a way to use this index, since this query, which differs only in JOIN order, does use it:
EXPLAIN
SELECT *
FROM t_driven d FORCE INDEX (PRIMARY)
STRAIGHT_JOIN
t_leading l FORCE INDEX (ix_leading_selector)
ON d.id = l.selector;
1, 'SIMPLE', 'd', 'ALL', 'PRIMARY', '', '', '', 10, ''
1, 'SIMPLE', 'l', 'ref', 'ix_leading_selector', 'ix_leading_selector', '5', 'test.d.id', 5, 'Using where'
Suggested fix:
Fix the optimizer algorithm so that the index can be FORCED for the first query