Bug #113054 mysql explain filtered inaccurate when find_in_set function used
Submitted: 12 Nov 2023 10:36 Modified: 13 Nov 2023 10:40
Reporter: harry harry Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0.32 OS:CentOS
Assigned to: CPU Architecture:Any

[12 Nov 2023 10:36] harry harry
Description:
mysql explain filtered inaccurate when find_in_set function used

How to repeat:
This sql return actual rows is 33:

mysql> select count(*) from ms_transferhourse m0     WHERE  m0.DataDate BETWEEN '2023-10-01' AND '2023-10-25'         AND  FIND_IN_SET( m0
    -> .EnterpriseID, '2793862' )         AND  FIND_IN_SET( m0.PigFarmID,'2204141338210000077,2203030933030000077' );
+----------+
| count(*) |
+----------+
|       33 |
+----------+
1 row in set (0.51 sec)

But this sql explain estimate return rows is: 251750 * 100% = 251750, I think the result is wrong.

mysql> explain select count(*) from  ms_transferhourse m0     WHERE  m0.DataDate BETWEEN '2023-10-01' AND '2023-10-25'         AND  FIND_IN_SET
( m0 .EnterpriseID, '2793862' )         AND  FIND_IN_SET( m0.PigFarmID,'2204141338210000077,2203030933030000077' );
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+--------+----------+------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key          | key_len | ref  | rows   | filtered | Extra                              |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+--------+----------+------------------------------------+
|  1 | SIMPLE      | m0    | NULL       | range | idx_DataDate  | idx_DataDate | 3       | NULL | 251750 |   100.00 | Using index condition; Using where |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+--------+----------+------------------------------------+
1 row in set, 1 warning (0.00 sec)

Now I drop the index idx_DataDate :

mysql> alter table ms_transferhourse drop index  idx_DataDate ;
Query OK, 0 rows affected (7.29 sec)
Records: 0  Duplicates: 0  Warnings: 0

And then run this sql explain again:
mysql> desc select count(*) from  ms_transferhourse m0     WHERE  m0.DataDate BETWEEN '2023-10-01' AND '2023-10-25'         AND  FIND_IN_SET( m0.EnterpriseID, '2793862' )         AND  FIND_IN_SET( m0.PigFarmID,'2204141338210000077,2203030933030000077' ) \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: m0
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2845859
     filtered: 5.58
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

Now this sql explain return the estimate is :2845859 * 5.58% = 165, Near true value 33.

I think mysql exlain  filterd is not inaccurate in the above situation.
[13 Nov 2023 10:40] MySQL Verification Team
Hi Mr. harry,

Thank you for your bug report.

However, we can not repeat what you are reporting.

We need a fully repeatable test case, including all tables involved.

Also, there must be a difference in the number of rows when the index is used and when it is not used.

Your table statistics may not be correct. Hence, that is why we recommend running ANALYZE TABLE, so that table and index stats are redone. That operation is done by MySQL when idle, but meanwhile, there might have been a number of rows changed or added. Hence, that is why the above command has to be run.

This is expected behaviour and not a bug.