Bug #70617 | Default persistent stats can cause unexpected long query times | ||
---|---|---|---|
Submitted: | 14 Oct 2013 19:18 | Modified: | 27 May 2015 8:38 |
Reporter: | Justin Swanhart | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S2 (Serious) |
Version: | 5.6.13 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | analyze, Optimizer, regression, statistics |
[14 Oct 2013 19:18]
Justin Swanhart
[14 Oct 2013 19:22]
Justin Swanhart
Change severity.
[15 Oct 2013 9:54]
Arnaud Adant
Hi Justin, Do you have a SQL test case that could be verified in 5.6.13 ? Yes, persistent statistics have an impact on performance. Especially if you create a lot of small tables. Arnaud Adant
[15 Oct 2013 22:35]
Justin Swanhart
No - I don't have a test case. I have hours of lost work. You admit the change can result in bad plans. ALTER TABLE .. ADD KEY won't add new stats properly. Tables have invalid row counts. Tables that have a lot of rows changed will have invalid statistics and might do index scans instead of FTS. Use your imagination and think about how statistics are used.
[17 Oct 2013 13:30]
MySQL Verification Team
sample testcase. Sometimes EXPLAIN will show table scan for a few seconds until stats are avail.. Is this what you're seeing ?
Attachment: bug70617_testcase1.sql (application/octet-stream, text), 1.05 KiB.
[17 Oct 2013 13:47]
MySQL Verification Team
In my case I am guessing the background stats thread wakes up every MIN_RECALC_INTERVAL seconds to run.
[17 Oct 2013 14:56]
MySQL Verification Team
this testcase should take ~10 seconds to execute. it waits for stats...
Attachment: bug70617_testcase_poll.sql (application/octet-stream, text), 1.04 KiB.
[17 Oct 2013 15:42]
MySQL Verification Team
The crux of the matter is statistics on new tables, and heavily modified tables will be out of date until you run ANALYZE TABLE, or wait for the background thread to calculate stats, which is every ~10 seconds.
[18 Oct 2013 20:40]
Justin Swanhart
Is this background thread that recalculates stats documented somewhere?
[27 Oct 2013 13:26]
Valeriy Kravchuk
Here is my documentation request about the background thread: http://bugs.mysql.com/bug.php?id=70741
[26 Nov 2013 14:13]
Vasil Dimov
Hello, --- cut --- [15 Oct 9:54] Arnaud Adant ... Yes, persistent statistics have an impact on performance. Especially if you create a lot of small tables. [15 Oct 22:35] Justin Swanhart ... You admit the change can result in bad plans. --- cut --- No! What Arnaud said is that if you execute CREATE TABLE e.g. 1000 times in a row, that may be slower - the CREATE TABLE itself. Not worse execution plans. And improvements have been made to the performance of CREATE TABLE recently. Now, while it is theoretically possible, so far we haven't found a case where the persistent stats sampling algorithm would result in a worse estimates, compared to the old, transient stats algorithm.
[26 Nov 2013 14:35]
Vasil Dimov
--- cut --- [14 Oct 19:18] Justin Swanhart ... Do you know of any negative performance implications to disabling the persistent stats? --- cut --- If you switch off persistent stats, then InnoDB will start using the old stats method, which we call transient stats - the way it works in 5.5 and older. The transient stats have the following problems, which drove us to implement the new persistent stats: . The transient stats sampling algorithm sometimes comes up with a very wrong, or even bogus estimates. When this happens, it usually results in bogus execution plans. . The transient stats sampling algorithm results could vary quite a lot from run to run, which in combination with the point below may result in an unexpected execution plans change for good or for worse. . The transient stats are automatically updated on various conditions, sometimes quite unexpected by the user, which may cause execution plans to change out of the blue. . The transient stats are updated in the user thread, so eventually, now and then, at some point a DML query will execute slower compared to its usual execution time. The persistent stats are updated in an asynchronous manner by a background thread and I think your problems are somewhere around here. . The transient stats are not saved on disk, lost upon server restart, which means a possibly different execution plan when the server boots up after the restart. . The user can't fiddle with the transient stats. If desired, the persistent stats could be viewed _and modified_ in the tables mysql.innodb_table_stats and mysql.innodb_index_stats (use "FLUSH TABLE ...;" to force InnoDB to reread the stats from those two tables).
[26 Nov 2013 14:54]
Vasil Dimov
--- cut --- [17 Oct 15:42] Shane Bester The crux of the matter is statistics on new tables, and heavily modified tables will be out of date until you run ANALYZE TABLE, or wait for the background thread to calculate stats, which is every ~10 seconds. --- cut --- Correct, except with a small addition: usually the background stats recalc will start immediately after the DML that triggered the 10% threshold. But if stats were recalculated not long ago, then there will be an artificial delay to avoid recalculating stats of this table more often then once per 10 seconds. For example: Let table t has 100 rows. 11:30:00 idle 11:30:02 7 rows in t are modified 11:30:03 3 rows in t are modified -> this will trigger an immediate background stats recalc (at 11:30:03 o'clock) 11:30:05 4 rows in t are modified 11:30:07 6 rows in t are modified -> this will trigger an event that t's stats need to be recalculated, but the background thread will notice that it last calculated them in 11:30:03 and now is 11:30:07, so it will wait until 11:30:13 before it actually recalculates the stats.
[26 Nov 2013 15:00]
Vasil Dimov
And now to the actual issue reported here - why would stats and execution plans be relevant at all for a query like: replace into `tmp` select * from `rc`; it should always cause a full scan of the table `rc`, yes? Or maybe I miss something here...
[26 Nov 2013 15:09]
MySQL Verification Team
the query is: select * from `prod` where `b`='45';
[27 May 2015 8:38]
Vasil Dimov
Closing as "not a bug". See my earlier comments for detailed explanations.