Bug #114703 The different use of in and = before subquery leads to the change of SQL result
Submitted: 19 Apr 2024 14:08 Modified: 25 Apr 2024 13:48
Reporter: shu William Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.28 OS:CentOS (7.9)
Assigned to: CPU Architecture:x86 (x86_64)
Tags: subquery

[19 Apr 2024 14:08] shu William
Description:
I have an SQL statement. When in and = are combined with a subquery, the result sets returned by them are inconsistent.

1、SQL
SELECT
al.object_id3 AS product_id,
al.object_id1 AS domain_id,
DATE(al.created_dt) AS view_date,
COUNT(al.object_id3) AS count
FROM activitylog AS al
LEFT JOIN domaincompanymap AS dcm
ON dcm.company_id = al.login_company_id
AND dcm.domain_id = al.login_domain_id
WHERE al.activitylogkey_id = (select id from activitylogkey where name = 'PRODUCT_VIEW')
AND dcm.account_type_lookup_id = 201
AND al.created_dt >= '2024-03-17'
AND al.object_id3 IS NOT NULL
GROUP BY al.object_id3, view_date;

2、use = before the subquery
SELECT
al.object_id3 AS product_id,
al.object_id1 AS domain_id,
DATE(al.created_dt) AS view_date,
COUNT(al.object_id3) AS count
FROM activitylog AS al
LEFT JOIN domaincompanymap AS dcm
ON dcm.company_id = al.login_company_id
AND dcm.domain_id = al.login_domain_id
WHERE al.activitylogkey_id = (select id from activitylogkey where name = 'PRODUCT_VIEW')
AND dcm.account_type_lookup_id = 201
AND al.created_dt >= '2024-03-17'
AND al.object_id3 IS NOT NULL
GROUP BY al.object_id3, view_date;

The result set I got is:
|     708911 |       933 | 2024-03-21 |     1 |
|     683087 |       933 | 2024-03-21 |     2 |
|     770104 |       933 | 2024-03-21 |     1 |
+------------+-----------+------------+-------+
148 rows in set (1.35 sec)

explain analyze 
 -> Table scan on <temporary>  (actual time=0.002..0.020 rows=148 loops=1)
    -> Aggregate using temporary table  (actual time=178.123..178.151 rows=148 loops=1)
        -> Nested loop inner join  (cost=41739.14 rows=788) (actual time=0.370..177.736 rows=187 loops=1)
            -> Filter: ((al.activitylogkey_id = (select #2)) and (al.object_id3 is not null))  (cost=38982.75 rows=7875) (actual time=0.070..148.740 rows=11812 loops=1)
                -> Index range scan on al using activitylog_createddt_nix over ('2024-03-17 00:00:00' <= created_dt), with index condition: (al.created_dt >= TIMESTAMP'2024-03-17 00:00:00')  (cost=38982.75 rows=86608) (actual time=0.024..142.879 rows=42323 loops=1)
                -> Select #2 (subquery in condition; run only once)
                    -> Filter: (activitylogkey.`name` = 'PRODUCT_VIEW')  (cost=0.35 rows=1) (actual time=0.024..0.026 rows=1 loops=1)
                        -> Index lookup on activitylogkey using activitylogkey_name_nix (name='PRODUCT_VIEW')  (cost=0.35 rows=1) (actual time=0.022..0.024 rows=1 loops=1)
            -> Filter: (dcm.account_type_lookup_id = 201)  (cost=0.25 rows=0) (actual time=0.002..0.002 rows=0 loops=11812)
                -> Single-row index lookup on dcm using domaincompanymap_domainid_companyid_uix (domain_id=al.login_domain_id, company_id=al.login_company_id), with index condition: (dcm.domain_id = al.login_domain_id)  (cost=0.25 rows=1) (actual time=0.002..0.002 rows=0 loops=11812)
 |

3、use in before the subquery
SELECT
al.object_id3 AS product_id,
al.object_id1 AS domain_id,
DATE(al.created_dt) AS view_date,
COUNT(al.object_id3) AS count
FROM activitylog AS al
LEFT JOIN domaincompanymap AS dcm
ON dcm.company_id = al.login_company_id
AND dcm.domain_id = al.login_domain_id
WHERE al.activitylogkey_id in (select id from activitylogkey where name = 'PRODUCT_VIEW')
AND dcm.account_type_lookup_id = 201
AND al.created_dt >= '2024-03-17'
AND al.object_id3 IS NOT NULL
GROUP BY al.object_id3, view_date;

The result set I got is:
|     748985 |       934 | 2024-03-20 |     1 |
|     748984 |       934 | 2024-03-20 |     1 |
|     709166 |       933 | 2024-03-20 |     1 |
+------------+-----------+------------+-------+
148 rows in set (10.54 sec)

explain analyze 
| -> Table scan on <temporary>  (actual time=0.002..0.020 rows=148 loops=1)
    -> Aggregate using temporary table  (actual time=9574.275..9574.303 rows=148 loops=1)
        -> Nested loop inner join  (cost=688.90 rows=0) (actual time=5912.164..9573.831 rows=187 loops=1)
            -> Nested loop inner join  (cost=687.75 rows=3) (actual time=5744.669..9544.705 rows=11812 loops=1)
                -> Filter: (activitylogkey.`name` = 'PRODUCT_VIEW')  (cost=0.35 rows=1) (actual time=0.023..0.029 rows=1 loops=1)
                    -> Index lookup on activitylogkey using activitylogkey_name_nix (name='PRODUCT_VIEW')  (cost=0.35 rows=1) (actual time=0.021..0.027 rows=1 loops=1)
                -> Filter: ((al.created_dt >= TIMESTAMP'2024-03-17 00:00:00') and (al.object_id3 is not null))  (cost=491.38 rows=3) (actual time=5744.645..9543.856 rows=11812 loops=1)
                    -> Index lookup on al using activitylog_typelogincompanyloginuser_nix (activitylogkey_id=activitylogkey.id), with index condition: (al.activitylogkey_id = activitylogkey.id)  (cost=491.38 rows=1963) (actual time=0.340..8495.810 rows=2617714 loops=1)
            -> Filter: (dcm.account_type_lookup_id = 201)  (cost=0.25 rows=0) (actual time=0.002..0.002 rows=0 loops=11812)
                -> Single-row index lookup on dcm using domaincompanymap_domainid_companyid_uix (domain_id=al.login_domain_id, company_id=al.login_company_id), with index condition: (dcm.domain_id = al.login_domain_id)  (cost=0.25 rows=1) (actual time=0.002..0.002 rows=0 loops=11812)
 |

4、subquery only 1 row
mysql> select id from activitylogkey where name = 'PRODUCT_VIEW';
+----+
| id |
+----+
| 84 |
+----+
1 row in set (0.22 sec)

5、CTE rewrite
with q1 as 
(SELECT al.object_id3 AS product_id,
al.object_id1 AS domain_id,
DATE(al.created_dt) AS view_date,
COUNT(al.object_id3) AS coun,
al.activitylogkey_id as activitylogkey_id
FROM activitylog AS al
LEFT JOIN domaincompanymap AS dcm
ON dcm.company_id = al.login_company_id
AND dcm.domain_id = al.login_domain_id
WHERE 
dcm.account_type_lookup_id = 201
AND al.created_dt >= '2024-01-17'
AND al.object_id3 IS NOT NULL
GROUP BY al.object_id3, view_date),
q2 as (select id from activitylogkey where name = 'PRODUCT_VIEW')
select q1.product_id,q1.domain_id,q1.coun from q1 where q1.activitylogkey_id in (select id from q2);

use in
|     708911 |       933 |    1 |
|     683087 |       933 |    2 |
|     770104 |       933 |    1 |
+------------+-----------+------+
148 rows in set (0.41 sec)

use =
|     708911 |       933 |    1 |
|     683087 |       933 |    2 |
|     770104 |       933 |    1 |
+------------+-----------+------+
148 rows in set (0.40 sec)

How to repeat:
There is no way for the time being.

Suggested fix:
There is no way for the time being.
[19 Apr 2024 15:38] MySQL Verification Team
HI Mr. William,

Thank you for your bug report.

However, this is not a bug.

IN operator is identical with = ANY combination. Without ANY, ALL or similar, you can not know which value will be used for comparison. That is according to SQL standard.
[20 Apr 2024 12:21] shu William
There is only one result of the subquery. pls check explain and subquery
Why is the result set different? 
I don't think = and in should cause it.
[22 Apr 2024 3:16] shu William
There is only one result of the subquery. pls check explain and subquery again
Why is the result set different?
I don't think = and in should cause it.
[22 Apr 2024 10:10] MySQL Verification Team
Hi Mr. William,

Thank you for your comments.

Let us inform you that this is a forum for the bug reports with repeatable test cases. The test case should contain a set of SQL statements that always leads to the bug that you are reporting. We do not see DDL and DML statements that would enable us to run your queries.

Next, you are using an old release of 8.0. Please, try with the latest release available.

Not a bug.
[23 Apr 2024 23:47] shu William
This is a query statement, not DDL or DML.I provided the result set of the query, in the attachment. pls check
[24 Apr 2024 10:45] MySQL Verification Team
HI,

We have analysed your result set and determined that result sets are consistent.

Thanks for your interest in MySQL.
[25 Apr 2024 13:48] shu William
Did you really check the result set? The first line is inconsistent.
[25 Apr 2024 14:14] MySQL Verification Team
Hi Mr. William,

Thank you for your feedback.

However, we have two problems with your bug report.

First of all, this is a forum for the reports with fully repeatable test cases. Such a test case should consist of a set of SQL statements that would always show the bug that you report. We have never received such a test case from you. We have only the queries and not the entire tables.

Second, you have provided us with 5 (five) different results sets , so we do not know which ones of those five sets are you comparing.

Third, we repeat that IN and = ANY statements are identical, as per SQL standard.

Fourth, if you use only = before nested query and that nested query returns more then one result, then you should get an error.

Fifth, you should check your queries against 8.0.36 and not against 8.0.28.

Hence, we have lot's of feedback to get from you in this report.