Bug #104928 LEFT JOIN does not use function-part index
Submitted: 13 Sep 2021 13:25 Modified: 14 Sep 2021 13:53
Reporter: Александр Ммммммм Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DML Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[13 Sep 2021 13:25] Александр Ммммммм
Description:
If I create index with part based on function MySQL doenst use with path in LEFT JOIN

How to repeat:
mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.23    |
+-----------+
1 row in set (0.00 sec)
mysql> DROP TABLE IF EXISTS tmp_documents, tmp_document_positions;
Query OK, 0 rows affected (0.02 sec)

mysql> CREATE TABLE tmp_documents (
    ->     id INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT
    ->     , date DATE
    ->     , KEY (date)
    -> );
Query OK, 0 rows affected (0.03 sec)
mysql> CREATE TABLE tmp_document_positions (
    ->     id INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT
    ->     , document_id INT UNSIGNED NOT NULL
    ->     , quantity INT
    ->     , KEY document_id_abs_quantity (document_id, (ABS(quantity)))
    -> );
Query OK, 0 rows affected (0.03 sec)
mysql> INSERT tmp_documents
    -> WITH RECURSIVE num(n) AS (
    ->     SELECT 0 n
    ->     UNION ALL
    ->     SELECT n + 1
    ->     FROM num
    ->     WHERE n + 1 <= 100
    -> )
    -> SELECT NULL, DATE(NOW() - INTERVAL num.n DAY)
    -> FROM num;
Query OK, 101 rows affected (0.00 sec)
Records: 101  Duplicates: 0  Warnings: 0
mysql> INSERT tmp_document_positions (document_id, quantity)
    -> WITH RECURSIVE num(n) AS (
    ->     SELECT 1 n
    ->     UNION ALL
    ->     SELECT n + 1
    ->     FROM num
    ->     WHERE n + 1 <= 100
    -> )
    -> SELECT d.id, n
    -> FROM tmp_documents d, num
    -> ;
Query OK, 10100 rows affected (0.28 sec)
Records: 10100  Duplicates: 0  Warnings: 0

INNER JOIN works fine:

mysql> EXPLAIN ANALYZE
    -> SELECT *
    -> FROM tmp_documents td
    -> JOIN tmp_document_positions tdp ON (tdp.document_id, (ABS(quantity))) = (td.id, 5)
    -> WHERE td.date = DATE(NOW())
    -> \G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join  (cost=0.70 rows=1) (actual time=0.052..0.055 rows=1 loops=1)
    -> Index lookup on td using date (date=cast(now() as date))  (cost=0.35 rows=1) (actual time=0.020..0.021 rows=1 loops=1)
    -> Index lookup on tdp using document_id_abs_quantity (document_id=td.id, abs(quantity)=5)  (cost=0.35 rows=1) (actual time=0.030..0.032 rows=1 loops=1)
1 row in set (0.00 sec)

LEFT JOIN does not use second part index based on function:

mysql> EXPLAIN ANALYZE
    -> SELECT *
    -> FROM tmp_documents td
    -> LEFT JOIN tmp_document_positions tdp ON (tdp.document_id, (ABS(quantity))) = (td.id, 5)
    -> WHERE td.date = DATE(NOW())
    -> \G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop left join  (cost=0.70 rows=1) (actual time=0.058..0.232 rows=1 loops=1)
    -> Index lookup on td using date (date=cast(now() as date))  (cost=0.35 rows=1) (actual time=0.020..0.021 rows=1 loops=1)
    -> Filter: (abs(tdp.quantity) = 5)  (cost=0.35 rows=1) (actual time=0.036..0.209 rows=1 loops=1)
        -> Index lookup on tdp using document_id_abs_quantity (document_id=td.id)  (cost=0.35 rows=1) (actual time=0.032..0.200 rows=100 loops=1)
1 row in set (0.00 sec)

But if I create VIRTIAL column and make index without function everything will be ok.

mysql> ALTER TABLE tmp_document_positions
    -> DROP KEY document_id_abs_quantity;
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> ALTER TABLE tmp_document_positions
    -> ADD abs_quantity INT AS (ABS(quantity)) INVISIBLE
    -> , ADD KEY document_id_abs_quantity_gen (document_id, abs_quantity);
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> EXPLAIN ANALYZE
    -> SELECT *
    -> FROM tmp_documents td
    -> JOIN tmp_document_positions tdp ON (tdp.document_id, abs_quantity) = (td.id, 5)
    -> WHERE td.date = DATE(NOW())
    -> \G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join  (cost=0.71 rows=1) (actual time=0.060..0.063 rows=1 loops=1)
    -> Index lookup on td using date (date=cast(now() as date))  (cost=0.35 rows=1) (actual time=0.033..0.034 rows=1 loops=1)
    -> Index lookup on tdp using document_id_abs_quantity_gen (document_id=td.id, abs_quantity=5)  (cost=0.36 rows=1) (actual time=0.025..0.027 rows=1 loops=1)
1 row in set (0.00 sec)

mysql> EXPLAIN ANALYZE
    -> SELECT *
    -> FROM tmp_documents td
    -> LEFT JOIN tmp_document_positions tdp ON (tdp.document_id, abs_quantity) = (td.id, 5)
    -> WHERE td.date = DATE(NOW())
    -> \G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop left join  (cost=0.71 rows=1) (actual time=0.038..0.041 rows=1 loops=1)
    -> Index lookup on td using date (date=cast(now() as date))  (cost=0.35 rows=1) (actual time=0.018..0.019 rows=1 loops=1)
    -> Index lookup on tdp using document_id_abs_quantity_gen (document_id=td.id, abs_quantity=5)  (cost=0.36 rows=1) (actual time=0.018..0.020 rows=1 loops=1)
1 row in set (0.00 sec)
[14 Sep 2021 12:04] MySQL Verification Team
Hi Mr. Mmmmmmm,

Thank you very much for your bug report.

However, we do need some additional information from you.

First of all, EXPLAIN ANALYSE for both INNER and OUTER JOIN look exactly the same, except for the costs involved and the timing. Both of these exceptions are, actually, expected behaviour.

Next, have you tried not using row expressions, but using separate column expressions ???

Regarding virtual column itself, that is expected behaviour.

We are waiting on your feedback.
[14 Sep 2021 12:35] MySQL Verification Team
Hi Mr. Mmmmmmm,

We have made some further analysis and decided that your report is indeed a bug.

Simply, abs(tdp.quantity) = 5 should have been pushed down into the index, instead of being left as a filter.

Verified as reported.
[14 Sep 2021 13:53] Александр Ммммммм
yes, it is! Different behavior of the optimizer under similar conditions

I also discovered something similar in the work of local variables vs constants (variable binding)

I think I'll issue a bug report ...