Bug #22367 Optimizer uses ref join type instead of eq_ref for simple join on strings
Submitted: 14 Sep 2006 20:52 Modified: 26 Oct 2006 3:31
Reporter: jocelyn fournier (Silver Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.24a, 5.1 BK OS:Linux (Linux)
Assigned to: Georgi Kodinov CPU Architecture:Any

[14 Sep 2006 20:52] jocelyn fournier
Description:
Hi,

When doing a simple join with strings, I figured out MySQL was using ref join type instead of eq_ref.

Is this expected ?

How to repeat:
CREATE TABLE a (`id` int(10) unsigned NOT NULL auto_increment,`createdby` varchar(20) NOT NULL default '',PRIMARY KEY  (`id`));
CREATE TABLE b (`id` int(10) unsigned NOT NULL auto_increment,`login` varchar(20) NOT NULL default '',PRIMARY KEY  (`id`),UNIQUE KEY (login));
INSERT INTO a VALUES (1,'a'),(2,'b'),(3,'c');
INSERT INTO b VALUES (1,'a'),(2,'b'),(3,'c');
EXPLAIN SELECT a.id FROM a LEFT JOIN b ON b.login=a.createdby WHERE a.id=3;

+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | a     | const | PRIMARY       | PRIMARY | 4       | const |    1 |             |
|  1 | SIMPLE      | b     | ref   | login         | login   | 22      | const |    1 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.00 sec)

I would have expect MySQL to use an eq_ref optimisation for table b since it has a unique key on 'login'.

BTW, using EXPLAIN EXTENDED, it reports something strange :

mysql> SHOW WARNINGS;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                       |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `test`.`a`.`id` AS `id` from `test`.`a` left join `test`.`b` on((`test`.`b`.`login` = `test`.`a`.`createdby`)) where 1 |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Why do we have WHERE 1 here, instead of where `test`.`a`.`id` = 3 ?

Regards,
  Jocelyn
[15 Sep 2006 12:59] Sveta Smirnova
Thank you for the report.

According to http://dev.mysql.com/doc/refman/5.0/en/explain.html and quotes from it:

----<START QUOTE>----
eq_ref
...
It is used when all parts of an index are used by the join and the index is a PRIMARY KEY or UNIQUE index.
...
ref
...
ref is used if the join uses only a leftmost prefix of the key or if the key is not a PRIMARY KEY or UNIQUE index
----<END QUOTE>----

I mark this bug as "Verified".

MySQL 4.1 is not affected:

<skip filling tables>
mysql> EXPLAIN SELECT a.id FROM a LEFT JOIN b ON b.login=a.createdby WHERE
    -> a.id=3;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | a     | const | PRIMARY       | PRIMARY |       4 | const |    1 |             |
|  1 | SIMPLE      | b     | const | login         | login   |      20 | const |    1 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.00 sec)
[27 Sep 2006 8:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/12590

ChangeSet@1.2280, 2006-09-27 11:04:07+03:00, gkodinov@macbook.gmz +6 -0
  Bug #22367: Optimizer uses ref join type instead of eq_ref for simple join on 
               strings
  MySQL is setting the flag HA_END_SPACE_KEYS for all the keys that reference
  text or varchar columns with collation different than binary.
  This was done to handle correctly the situation where a lookup on such a key
  may return more than 1 row because of the presence of many rows that differ
  only by the amount of trailing space in the table's string column.
  Inserting such values however appears to violate the unique checks on 
  INSERT/UPDATE. Thus that flag must not be set as it will prevent the optimizer
  from choosing a faster access method.
  This fix removes the setting of the HA_END_SPACE_KEYS flag.
[27 Sep 2006 14:13] jocelyn fournier
Hi,

-Note	1003	select `test`.`t8`.`pseudo` AS `pseudo`,(select `test`.`t8`.`email` AS `email`
from `test`.`t8` where (`test`.`t8`.`pseudo` = (select `test`.`t8`.`pseudo` AS `pseudo`
from `test`.`t8` where (`test`.`t8`.`pseudo` = _latin1'joce')))) AS `(SELECT email FROM
t8 WHERE pseudo=(SELECT pseudo FROM t8 WHERE pseudo='joce'))` from `test`.`t8` where
(`test`.`t8`.`pseudo` = (select `test`.`t8`.`pseudo` AS `pseudo` from `test`.`t8` where
(`test`.`t8`.`pseudo` = _latin1'joce')))
+Note	1003	select `test`.`t8`.`pseudo` AS `pseudo`,(select `test`.`t8`.`email` AS `email`
from `test`.`t8` where 1) AS `(SELECT email FROM t8 WHERE pseudo=(SELECT pseudo FROM t8
WHERE pseudo='joce'))` from `test`.`t8` where 1

=> Any idea why the EXTENDED SELECT returns WHERE 1 whereas before it was returning properly the WHERE part of the query ?

Regards,
  Jocelyn
[5 Oct 2006 16:12] Sergei Golubchik
Yes (though it's just an idea, as you requested, I didn't check the code to verify it).

If you look at the output of the EXPLAIN - the table itself, not the warning - you'll see that optimizer was using 'ref' join, and after the fix it was changed to 'const'. The manual says:
"
   * `const'

     The table has at most one matching row, which is read at the
     start of the query. Because there is only one row, values
     from the column in this row can be regarded as constants by
     the rest of the optimizer.  `const' tables are very fast
     because they are read only once.
"
but actually there's more to it, for 'const' tables, MySQL read the matching row very early in the execution, in fact, it reads it during optimizing, many optimizations are done _after_ this one row is read. MySQL reads this row and basically substitutes column references with plain constants - values from this row. In the remaining of optimizations these values are treated exactly like constants (e.g. they're subject to constant propagation).

The above was a fact, now goes an idea.

I suspect that after reading const tables and replacing appropriate columns with constants WHERE clause is optimized away. the warning is generated from the transformed query after WHERE is optimized away, so you see no WHERE clause. On the other hand, when prining an expression tree every column prints its original name, even if it was transformed to a constant. So, the printed query looks illogical.
[5 Oct 2006 16:35] jocelyn fournier
Hi Sergei,

Thanks for the explanation !
Do you want me to open an other bug report for this issue ?

Thanks,
  Jocelyn
[6 Oct 2006 16:57] Sergei Golubchik
No need to, I suspect it's the same as Bug#22331
(and Bug#21939 is also related)
[16 Oct 2006 15:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13749

ChangeSet@1.2280, 2006-10-16 18:09:58+03:00, gkodinov@macbook.gmz +6 -0
  Bug #22367: Optimizer uses ref join type instead of eq_ref for simple join on 
               strings
  MySQL is setting the flag HA_END_SPACE_KEYS for all the keys that reference
  text or varchar columns with collation different than binary.
  This was done to handle correctly the situation where a lookup on such a key
  may return more than 1 row because of the presence of many rows that differ
  only by the amount of trailing space in the table's string column.
  Inserting such values however appears to violate the unique checks on 
  INSERT/UPDATE. Thus that flag must not be set as it will prevent the optimizer
  from choosing a faster access method.
  This fix removes the setting of the HA_END_SPACE_KEYS flag.
[21 Oct 2006 9:13] Georgi Kodinov
Pushed in 5.0.27/5.1.13-beta
[26 Oct 2006 3:31] Paul DuBois
Noted in 5.0.30, 5.1.13 changelogs.