MySQL Bugs: #5606: Nonoptimal join type

Bug #5606	Nonoptimal join type
Submitted:	16 Sep 2004 10:19	Modified:	7 Oct 2004 12:27
Reporter:	Eugene Mamchits	Email Updates:
Status:	Won't fix	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	4.0.21	OS:	Any (any)
Assigned to:		CPU Architecture:	Any

Description:
MySQL prefers ref join type instead of range check for each record
(that may be much more quick) and no way to force the right behaviour.

The problem is that so we cannot include 'use/ignore/force key'
into request because MySQL uses right key here but not as good
as possible.

How to repeat:
#!/bin/sh

mysql="mysql test"

echo '
  create table `tab` (
    `id` int(10),
    `prm` int(10),
    key `id-prm` (`id`, `prm`)
  );

  create table `ctl` (
    `id` int(10),
    `prm_from` int(10), prm_to int(10)
  );

  insert into `ctl` (`id`, `prm_from`, `prm_to`) values (1, 0, 256), (0, 0, 65536);
' | ${mysql}

{
for i in `seq 1 128`; do
  echo "${i}" >&2
  echo 'insert into `tab` (`prm`, `id`) values'
  c=""
  for j in `seq 1 8192`; do
    echo -n "${c} (floor(rand() * 65536), if(prm % 256, 1, 0))"
    c=","
  done
  echo ";"
done
} | ${mysql}

# End of script
explain select count(1) from `ctl` inner join `tab`
on
 `tab`.`id` = `ctl`.`id` and
 `tab`.`prm` >= `ctl`.`prm_from` and `tab`.`prm` < `ctl`.`prm_to`
;

+-------+------+---------------+--------+---------+--------+------+--------------------------+
| table | type | possible_keys | key    | key_len | ref    | rows | Extra                    |
+-------+------+---------------+--------+---------+--------+------+--------------------------+
| ctl   | ALL  | NULL          | NULL   |    NULL | NULL   |    2 |                          |
| tab   | ref  | id-prm        | id-prm |       5 | ctl.id | 4096 | Using where; Using index |
+-------+------+---------------+--------+---------+--------+------+--------------------------+

select count(1) from `ctl` inner join `tab`
on
 `tab`.`id` = `ctl`.`id` and
 `tab`.`prm` >= `ctl`.`prm_from` and `tab`.`prm` < `ctl`.`prm_to`
;

+----------+
| count(1) |
+----------+
|     8210 |
+----------+
1 row in set (3.30 sec)

For comparison, the same job with table `tab` using ranges needs
much less time:

explain select count(1) from `tab`
where
  (`id` = 1 and `prm` >= 0 and `prm` < 256) or
  (`id` = 0 and `prm` >= 0 and `prm` < 65536)
;

+-------+-------+---------------+--------+---------+------+------+--------------------------+
| table | type  | possible_keys | key    | key_len | ref  | rows | Extra                    |
+-------+-------+---------------+--------+---------+------+------+--------------------------+
| tab   | range | id-prm        | id-prm |      10 | NULL | 6115 | Using where; Using index |
+-------+-------+---------------+--------+---------+------+------+--------------------------+

select count(1) from `tab`
where
  (`id` = 1 and `prm` >= 0 and `prm` < 256) or
  (`id` = 0 and `prm` >= 0 and `prm` < 65536)
;

+----------+
| count(1) |
+----------+
|     8210 |
+----------+
1 row in set (0.04 sec)

Suggested fix:
Select optimizer modification needed assuming the longer part
of key using for ranges is always better than the shorter part for ref
if it is really so.

If not may be good to give to keyword 'force key' an additional property
for such cases.

The problem here is that the 'range_check' is a very expensive method to check rows as we build up a total range tree for every lookup. MySQL uses this now only as a last resort if there wasn't any keys that could be used.

We are constantly improving the optimizer and we plan to improve the 'range_check' cost calculation so that we can use that more often.

I will put on our todo to add a new keywords so that one can force the range_check method.
Unfortunately this is not likely to happen until in MySQL 5.x