Bug #96219 | Optimizer not using indexes when they have only a few non-NULL values | ||
---|---|---|---|
Submitted: | 16 Jul 2019 10:05 | Modified: | 15 Nov 2019 14:54 |
Reporter: | S. M. | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S4 (Feature request) |
Version: | OS: | Any | |
Assigned to: | CPU Architecture: | Any |
[16 Jul 2019 10:05]
S. M.
[16 Jul 2019 12:53]
MySQL Verification Team
Hi Mr. Pisellino, Thank you for your bug report. Before we proceed to getting a test case from you, we have to get some clarifications from you. First of all, "using index" does not mean that index is used efficiently. That actually means that the entire index is scanned, which is absolutely necessary for the DISTINCT clause ... Next, there can be only one index used for each table instance in the query. This begs the question on whether there is usable index on the `id` column of the `carts` table. Last, but not least, try using inner join instead of the straight one ...... Let us know the results.
[16 Jul 2019 13:57]
Saverio Miroddi
> First of all, "using index" does not mean that index is used efficiently. That actually means that the entire index is scanned, which is absolutely necessary for the DISTINCT clause ... True. In this specific case, based on the statistics that MySQL has, it's doing the correct thing; assuming that: - it's going to perform essentially a cartesian product; - for each record, to traverse the index and then access the data page; it follows that is less expensive to perform a nested block loop, because it saves the index traversal (and that it allows a sequential read of the child table). I'll follow up on this on the last point. > Next, there can be only one index used for each table instance in the query. This begs the question on whether there is usable index on the `id` column of the `carts` table. Yes; it's the primary key. > Last, but not least, try using inner join instead of the straight one ...... When using an inner join rather than a straight join, mysql reorders the tables, yielding a suboptimal but still usable plan. I actually tested it before - the intent was to show what I think is the wrong prediction on the joinable child rows. It's hard for me (but maybe not impossible) to provide data for a more conventional case (ie. using inner or left join). I think reformulating the bug could solve this problem. If we think it as: "Optimizer significantly overestimates the number of JOIN rows, leading to (significantly) suboptimal plans"? would that be accepted? This makes the bug concept more generic and simpler in nature, being independent from the queries/joins themselves (inner/straight/left). The focus is on a simple fact - that the statistics on the carts table are wrong: the rows expected should be low (k*10⁰) rather than the number of table rows (k*10⁵). note: I've mistakenly used a disused account when opening the account, apologies.
[16 Jul 2019 15:04]
MySQL Verification Team
Hi, Actually, this is not a bug. Simply, you are using STRAIGHT_JOIN in a wrong manner. So, if INNER JOIN works fine, this is not a bug. This is all explained in our Reference Manual. Send me the EXPLAIN with the INNER JOIN.
[18 Jul 2019 14:39]
Saverio Miroddi
Thanks, I'll collect some further data, and post the details!
[17 Aug 2019 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[26 Aug 2019 12:41]
MySQL Verification Team
Hello Mr. S.M., Seems like you did not read our Reference Manual. When the optimiser concludes that there is an index that can be used (key != NULL) and when that index is used with "Using index" method, then the entire index is scanned. Table is not scanned, just the index in it's entirety. In that case, NULLs can not be excluded, since it is scanning, not jumping around B+ tree. Answers to all other of your questions can be found in the Reference Manual.
[26 Aug 2019 16:33]
Saverio Miroddi
I see; thanks for the prompt replies.
[27 Aug 2019 12:10]
MySQL Verification Team
You are truly welcome ....
[22 Oct 2019 15:09]
Saverio Miroddi
Hello! I've found a dataset that consistently reproduces the issue (on MySQL 8.0.15). The procedure, with some comments, follow; I can make the data available, if required. Preparation: ANALYZE TABLE children; SELECT COUNT(*), SUM(parent_id IS NOT NULL) `non_nulls` FROM children; -- +----------+-----------+ -- | COUNT(*) | non_nulls | -- +----------+-----------+ -- | 6619200 | 3 | -- +----------+-----------+ CREATE TEMPORARY TABLE parents (id INT PRIMARY KEY); INSERT INTO parents VALUES (319467910),(319467912),(319467914),(319467915); The base (unhinted) query form takes around 27": EXPLAIN DELETE c.* FROM children c JOIN parents p ON p.id = c.parent_id WHERE c.an_attribute IS NULL ; -- +----+-------------+-------+------------+-------+------------------------+---------+---------+------+---------+----------+-------------+ -- | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | -- +----+-------------+-------+------------+-------+------------------------+---------+---------+------+---------+----------+-------------+ -- | 1 | SIMPLE | p | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 4 | 100.00 | Using index | -- | 1 | DELETE | c | NULL | ALL | children_parent_id_idx | NULL | NULL | NULL | 6327845 | 3.33 | Using where | -- +----+-------------+-------+------------+-------+------------------------+---------+---------+------+---------+----------+-------------+ The hinted version takes around 0.04": EXPLAIN DELETE c.* FROM parents p JOIN children c FORCE INDEX (children_parent_id_index) ON p.id = c.parent_id WHERE c.an_attribute IS NULL ; -- +----+-------------+-------+------------+-------+------------------------+------------------------+---------+-------------------------------+---------+----------+-------------+ -- | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | -- +----+-------------+-------+------------+-------+------------------------+------------------------+---------+-------------------------------+---------+----------+-------------+ -- | 1 | SIMPLE | p | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 4 | 100.00 | Using index | -- | 1 | DELETE | c | NULL | ref | children_parent_id_idx | children_parent_id_idx | 5 | mydb.p.id | 2109281 | 10.00 | Using where | -- +----+-------------+-------+------------+-------+------------------------+------------------------+---------+-------------------------------+---------+----------+-------------+ In unhinted query version, MySQL doesn't use the `children_parent_id_idx` index, which leads to a very slow plan. Based on other similar instances of the problem we've had in the last years, it seems that MySQL/InnoDB tends not to use indexes that have a proportionally small amount of non-NULL values. In such conditions, MySQL generally rearranges the tables order, which yields a suboptimal but still viable plan, but in some cases, like the above, the optimization is excessively suboptimal.
[23 Oct 2019 12:45]
MySQL Verification Team
Hello Mr. Severio, If you can improve performance with optimiser hints, then please use them. That is why they have been invented. Nobody ever made a perfect optimiser that always find the best manner of execution, regardless of what is thrown at it. What you have reported here is a feature request for the Optimiser. If you send me all your data and I am capable of repeating what you see, then I will verify this report as aa feature request. Do note that feature requests are usually implemented in some future version.
[28 Oct 2019 10:24]
Saverio Miroddi
Thanks for the follow up. One question I have is: which are the technical grounds that make this case special, so that the optimizer refuses to use the index on `payments`, even when a FTS is considered very expensive (by using FORCE INDEX), and therefore, making this issue a feature request? The most basic test case is ultimately a simple two-table SELECT/JOIN: `SELECT p.* FROM _carts_to_delete ctd JOIN payments p ON ctd.id = p.cart_id`.
[4 Nov 2019 13:48]
MySQL Verification Team
Hi, This case is special because it is a boundary case. These type of cases are the reason why we have optimiser hints. There is no perfect SQL optimiser , that will always use the best path. Also, I do not see what version are you exactly using. You should also, first check your test case on 8.0.18. I also do not see a difference in the EXPLAIN with or without optimiser hint.
[4 Nov 2019 13:49]
MySQL Verification Team
Last, but not least, we do need a test case which does not use STRAIGHT_JOIN. As I wrote to you before, you use that reserved word totally wrongly.
[4 Nov 2019 13:50]
MySQL Verification Team
I have also looked at the "File" tab and I do not see anything uploaded.
[15 Nov 2019 9:35]
Saverio Miroddi
> This case is special because it is a boundary case. These type of cases are the reason why we have optimiser hints. Can you be more detailed about this? I'm trying to understand the problem, in particular, because this doesn't seem to affect SELECTs, but only DELETEs. > Also, I do not see what version are you exactly using. You should also, first check your test case on 8.0.18. OK. I've tested also on 8.0.18; the slow queries are actually slower (~8.9"). > I also do not see a difference in the EXPLAIN with or without optimiser hint. > Last, but not least, we do need a test case which does not use STRAIGHT_JOIN. As I wrote to you before, you use that reserved word totally wrongly. If you look at the queries provided, you'll see that two of them are regular JOINs. Given their plan is the same, there is one test case that can be used. The reason for providing the version with the hint is for completeness; trying all the options ultimately lead to the query that run fast. STRAIGHT_JOIN alone wasn't enough, and FORCE INDEX alone wasn't enough as well; I had to use both. > I have also looked at the "File" tab and I do not see anything uploaded. I've uploaded the file in the Oracle FTP (`mysql-bug-data-96219.zip`). File is still there, since if I try to reupload, I get the error `File //support/incoming/mysql-bug-data-96219.zip already exists, Please specify a different filename`.
[15 Nov 2019 9:36]
Saverio Miroddi
> The reason for providing the version with the hint is for completeness Correction. I meant: "The reason for providing all the versions (vanilla, STRAIGHT_JOIN, hist) is for completeness."
[15 Nov 2019 12:48]
MySQL Verification Team
Hi, Can you try to ANALYZE both tables prior to running your queries. Second, what is a difference in speed, exactly, when no other queries are running. Third, scanning indices has nothing to do with Cartesian products.
[15 Nov 2019 14:54]
MySQL Verification Team
Hi, I have actually run your HUGE test case and managed to repeat the behaviour. However, this turns out to be expected behaviour, which is here to stay. As our Reference Manual clearly states: " STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer processes the tables in a suboptimal order. " Hence, this is a behaviour that is here to stay for quite some time.
[20 Nov 2019 10:45]
Saverio Miroddi
Ok; for my own understanding: which are specific technical reasons for the optimizer not to use the `payments_cart_id_index` index, with the following query and the dataset provided?: ``` EXPLAIN SELECT p.* FROM _carts_to_delete c STRAIGHT_JOIN payments p ON c.id = p.cart_id; +----+-------------+-------+------------+-------+------------------------+---------+---------+------+---------+----------+----------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+------------------------+---------+---------+------+---------+----------+----------------------------------------------------+ | 1 | SIMPLE | c | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 1000 | 100.00 | Using index | | 1 | SIMPLE | p | NULL | ALL | payments_cart_id_index | NULL | NULL | NULL | 7035624 | 50.00 | Using where; Using join buffer (Block Nested Loop) | +----+-------------+-------+------------+-------+------------------------+---------+---------+------+---------+----------+----------------------------------------------------+ ``` the dataset provided appears to be an uncommon circumstance; under regular circumstances, the index is used: ``` EXPLAIN SELECT p.* FROM _carts_to_delete c STRAIGHT_JOIN payments p ON c.id = p.cart_id; +----+-------------+-------+------------+-------+------------------------+------------------------+---------+------------+---------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+------------------------+------------------------+---------+------------+---------+----------+-------------+ | 1 | SIMPLE | c | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 1311 | 100.00 | Using index | | 1 | SIMPLE | p | NULL | ref | payments_cart_id_index | payments_cart_id_index | 5 | my_db.c.id | 2075564 | 100.00 | NULL | +----+-------------+-------+------------+-------+------------------------+------------------------+---------+------------+---------+----------+-------------+ ``` Note that I've used `SELECT`s; they yield the same plan as `DELETE`s. On both tables, `ANALYZE TABLE` has been run.
[20 Nov 2019 12:32]
MySQL Verification Team
The answwer is quite obvious. A change in index statistics. Anyway, this is not a forum for answering anyone's questions. It is only a forum for bugs with repeatable test case, which shows a behaviour that is different from the one in our Reference Manual.