Bug #67638 | Odd optimizer choice for UPDATE .. ORDER BY ... LIMIT query | |
---|---|---|---
Submitted: | 19 Nov 2012 17:00 | Modified: | 26 Nov 2012 17:24
Reporter: | Sergey Petrunya | Email Updates: |
Status: | Verified | Impact on me: |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical)
Version: | 5.6.8 | OS: | Any
Assigned to: | | CPU Architecture: | Any
[19 Nov 2012 17:00]
Sergey Petrunya
[19 Nov 2012 19:52]
Erlend Dahl
Moved to the internal bug db for evaluation by the Optimizer team.
[20 Nov 2012 11:38]
MySQL Verification Team
Thank you for your report. Looks like this issue has been fixed in 5.6.9-rc which is not yet released for community/download.

#### 5.5.28

[root@node-1 mysql-5.5.28]# bin/mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.28 MySQL Community Server (GPL)

Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database defect67638;
Query OK, 1 row affected (0.00 sec)

mysql> use defect67638;
Database changed
mysql> CREATE TABLE `ten` (
    ->   `a` int(11) DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.05 sec)

mysql> INSERT INTO `ten` VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
Query OK, 10 rows affected (0.01 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> CREATE TABLE `t10` (
    ->   `a` int(11) DEFAULT NULL,
    ->   `b` int(11) DEFAULT NULL,
    ->   KEY `a` (`a`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t10
    -> select
    ->   A.a+10*B.a+100*C.a + 1000*D.a,
    ->   A.a+10*B.a+100*C.a + 1000*D.a
    -> from
    ->   ten A, ten B, ten C, ten D;
Query OK, 10000 rows affected (0.11 sec)
Records: 10000  Duplicates: 0  Warnings: 0

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> update t10 set b=10 order by a limit 2;
Query OK, 0 rows affected (0.03 sec)
Rows matched: 2  Changed: 0  Warnings: 0

mysql> show status like 'Handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 3     |
| Handler_read_last          | 0     |
| Handler_read_next          | 1     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 2     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 2     |
| Handler_write              | 0     |
+----------------------------+-------+
16 rows in set (0.00 sec)

### mysql-5.6.9-rc - NOT yet released, but the reported issue seems fixed

[root@node-1 mysql-5.6.9]# bin/mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.9-rc-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)

Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database defect67638;
Query OK, 1 row affected (0.00 sec)

mysql> use defect67638;
Database changed
mysql> CREATE TABLE `ten` (
    ->   `a` int(11) DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.03 sec)

mysql> INSERT INTO `ten` VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
Query OK, 10 rows affected (0.00 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> CREATE TABLE `t10` (
    ->   `a` int(11) DEFAULT NULL,
    ->   `b` int(11) DEFAULT NULL,
    ->   KEY `a` (`a`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.04 sec)

mysql> insert into t10
    -> select
    ->   A.a+10*B.a+100*C.a + 1000*D.a,
    ->   A.a+10*B.a+100*C.a + 1000*D.a
    -> from
    ->   ten A, ten B, ten C, ten D;
Query OK, 10000 rows affected (0.22 sec)
Records: 10000  Duplicates: 0  Warnings: 0

mysql> explain update t10 set b=10 order by a limit 2;
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra           |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------+
|  1 | SIMPLE      | t10   | ALL  | a             | a    | 5       | NULL |    2 | Using temporary |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------+
1 row in set (0.00 sec)

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> update t10 set b=10 order by a limit 2;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2  Changed: 2  Warnings: 0

mysql> show status like 'Handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 2     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 3     |
| Handler_read_last          | 0     |
| Handler_read_next          | 1     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 2     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 2     |
| Handler_write              | 0     |
+----------------------------+-------+
18 rows in set (0.00 sec)
[20 Nov 2012 12:35]
Erlend Dahl
Not repeatable on latest trunk version (5.6.9).
[21 Nov 2012 7:14]
Erlend Dahl
Re-opening since the optimizer team managed to reproduce on head of trunk.
[21 Nov 2012 8:59]
Olav Sandstå
I think this change in execution plan is caused by the introduction of persistent statistics for InnoDB in 5.6, which is now the default way of updating the statistics. Before InnoDB had persistent statistics, the statistics for the table were updated by the user thread doing the insert. With persistent statistics enabled, the statistics are normally updated by a background thread.

If I make any of the following changes, I get a similar execution plan as in 5.5 for this query (see the sketch below):

a. add ANALYZE TABLE t10 after inserting the data
b. add a sleep of 10 seconds to the test case after inserting the data
c. disable persistent statistics by using --innodb-stats-persistent=0

Thus, I think the reason for the change in execution plan is that when a thread inserts the data and immediately afterwards runs the update statement, the statistics for the table have not yet been updated. This causes the server to select a different execution plan. I have not looked into the details of what the statistics are when running the update statement, or the choices made by the optimizer.
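For reference, a minimal sketch of workarounds (a) and (c), using the tables from the test case above; the option spelling follows the command-line form mentioned above:

-- Workaround (a): rebuild the table/index statistics before the UPDATE,
-- instead of waiting for the background statistics thread.
ANALYZE TABLE t10;
UPDATE t10 SET b = 10 ORDER BY a LIMIT 2;

-- Workaround (c): start the server with persistent statistics disabled,
-- e.g. mysqld --innodb-stats-persistent=0, so statistics are updated as in 5.5.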
[21 Nov 2012 10:25]
Sergey Petrunya
Thanks for the explanation! This property of persistent index statistics is very interesting for anyone doing anything with the query optimizer.
[21 Nov 2012 10:31]
Sergey Petrunya
So, I wanted to see what's going on. I've modified the original test case to run an EXPLAIN that would show both the rec_per_key and the table->stats.records statistics, like this:

...
insert into t10
select
  A.a+10*B.a+100*C.a + 1000*D.a,
  A.a+10*B.a+100*C.a + 1000*D.a
from
  ten A, ten B, ten C, ten D;

explain select * from t10 A, t10 B FORCE INDEX (a) where A.b=B.a;
select sleep(10);
explain select * from t10 A, t10 B FORCE INDEX (a) where A.b=B.a;
[21 Nov 2012 10:34]
Sergey Petrunya
... Then I ran it on 5.6 and I've got:

explain select * from t10 A, t10 B FORCE INDEX (a) where A.b=B.a;
id  select_type  table  type  possible_keys  key   key_len  ref       rows   Extra
1   SIMPLE       A      ALL   NULL           NULL  NULL     NULL      10000  Using where
1   SIMPLE       B      ref   a              a     5        test.A.b  5000   NULL
select sleep(10);
sleep(10)
0
explain select * from t10 A, t10 B FORCE INDEX (a) where A.b=B.a;
id  select_type  table  type  possible_keys  key   key_len  ref       rows   Extra
1   SIMPLE       A      ALL   NULL           NULL  NULL     NULL      10157  Using where
1   SIMPLE       B      ref   a              a     5        test.A.b  5000   NULL

and on 5.5, I've got:

explain select * from t10 A, t10 B FORCE INDEX (a) where A.b=B.a;
id  select_type  table  type  possible_keys  key   key_len  ref       rows   Extra
1   SIMPLE       A      ALL   NULL           NULL  NULL     NULL      10266
1   SIMPLE       B      ref   a              a     5        test.A.b  1      Using where
select sleep(10);
sleep(10)
0
explain select * from t10 A, t10 B FORCE INDEX (a) where A.b=B.a;
id  select_type  table  type  possible_keys  key   key_len  ref       rows   Extra
1   SIMPLE       A      ALL   NULL           NULL  NULL     NULL      10266
1   SIMPLE       B      ref   a              a     5        test.A.b  1      Using where
[21 Nov 2012 10:45]
Sergey Petrunya
Observations:

* In 5.6, the statistics update doesn't make a huge difference: #rows went from 10000 to 10157 (that is, from the precise value to an imprecise one :). Still, this was enough to change the query plan.
* What about rows=5000 in the row for table B? This is the rec_per_key statistic. The real value is 1, and 5.5 sets it correctly. Is it considered a bug that 5.6 has an estimate that is so far off? Btw, ANALYZE TABLE changes it to be correct.

It seems ANALYZE is much more important now - without it, the statistics are waaaay off.
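A hedged sketch of how these estimates can be inspected while reproducing this: SHOW INDEX and ANALYZE TABLE are standard statements, and the query against mysql.innodb_index_stats assumes the default 5.6 persistent-statistics tables and that the test tables live in the `test` database.

-- Cardinality here reflects the rec_per_key estimate the optimizer sees.
SHOW INDEX FROM t10;

-- The persistent statistics themselves, as written by the background thread
-- or by ANALYZE TABLE (5.6 persistent-statistics tables assumed).
SELECT index_name, stat_name, stat_value
FROM mysql.innodb_index_stats
WHERE database_name = 'test' AND table_name = 't10';

-- Rebuilding the statistics corrects the rec_per_key estimate, per the observation above.
ANALYZE TABLE t10;
SHOW INDEX FROM t10;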
[26 Nov 2012 15:18]
Olav Sandstå
Thanks for sharing your observations. Here is what I think happens when using the persistent statistics implemented for InnoDB in 5.6:

The first time the query is run is immediately after creating the table and filling it with data. Filling data into the table has triggered a task for a background thread to update the statistics for the table, but this has not yet been done at the time the query is first run.

First run of the query: the server opens the table and creates table objects for the query. When the table objects are created, the code retrieves statistics from the storage engine for the total number of records and for rec_per_key. The estimate for the number of records seems to be based on an "in memory counter" of the number of records inserted into this table, since there are no persistent statistics for this yet. The rec_per_key statistics are based on the persistent statistics. Since there are no persistent statistics at this point, it sets rec_per_key = number of records / 2 (which explains why you see 5000 as the record estimate for the ref access). Some time later the background thread in InnoDB updates the persistent statistics for this table.

Second run of the query: the table objects are cached, so on the second execution of the query the same table objects as in the first execution are re-used. In this case we only update the information about the number of records, which is now read from the persistent statistics (which explains the change from 10000 to 10157). The rec_per_key is not re-read from the storage engine, and thus we re-use the initial (very wrong) rec_per_key estimate.

I might be wrong, but it seems that the rec_per_key value is not updated for the lifetime of the cached table object. It seems to be necessary to force a close of the open table objects in order to get the rec_per_key estimate updated to the value in InnoDB's persistent statistics. This explains why running e.g. FLUSH TABLES or ANALYZE TABLE "fixes" the problem (see the sketch below).
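A minimal sketch of that last point, using the tables from this report; it relies only on the standard FLUSH TABLES / ANALYZE TABLE statements:

-- rec_per_key was cached when the table object was first opened.
EXPLAIN SELECT * FROM t10 A, t10 B FORCE INDEX (a) WHERE A.b = B.a;

-- Force the cached table objects to be closed (or rebuild the statistics),
-- so the next open re-reads rec_per_key from InnoDB's persistent statistics.
FLUSH TABLES t10;
-- or: ANALYZE TABLE t10;

EXPLAIN SELECT * FROM t10 A, t10 B FORCE INDEX (a) WHERE A.b = B.a;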
[26 Nov 2012 17:24]
Sergey Petrunya
Thanks for the explanations. This means optimizer bugs must be handled with greater care in 5.6+: one has to make sure the table being used already has decent index statistics (and keep in mind that having correct table statistics doesn't mean the index statistics are good).
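A hedged sketch of what that implies for a 5.6+ optimizer test case (same tables as above; the only addition is the explicit ANALYZE TABLE step, so the plan does not depend on the timing of the background statistics thread):

-- Load the data as in the original test case.
INSERT INTO t10
SELECT A.a+10*B.a+100*C.a + 1000*D.a,
       A.a+10*B.a+100*C.a + 1000*D.a
FROM ten A, ten B, ten C, ten D;

-- Make both the table and the index statistics deterministic before testing the optimizer.
ANALYZE TABLE t10;

EXPLAIN UPDATE t10 SET b = 10 ORDER BY a LIMIT 2;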
[27 Nov 2012 10:41]
Vasil Dimov
Hello, the comment in "[21 Nov 8:59] Olav Sandstaa" summarizes the situation very well.