Bug #75428 InnoDB persistent statistics not persistent enough
Submitted: 7 Jan 2015 1:24 Modified: 14 Jan 2015 17:18
Reporter: Eric Bergen (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.6 OS:Any
Assigned to: CPU Architecture:Any

[7 Jan 2015 1:24] Eric Bergen
Description:
When using innodb persistent statistics the stats are only updated when about 10% of the rows are modified. The counter to track the number of rows modified does not survive restarts. If you restart or upgrade mysql often enough statistics never get updated. The 10% threshold is also not configurable. 

How to repeat:
Restart mysqld at a regular cadence. Observe show table status getting wildly out of sync.

Suggested fix:
Provide tunables for the threshold to update statistics. Also provide a method for triggering a statistics update that doesn't include doing a full analyze table. One idea is to make innodb_stats_on_metadata a session variable so some sessions can control if they trigger statistics updates or not.
[7 Jan 2015 1:27] Eric Bergen
The persistent statistics back things like data_length and index_length in information_schema.tables. When the stats don't update these values aren't updated which makes them nearly useless.
[7 Jan 2015 15:32] Morgan Tocker
Is there value in having a maximum age attached to statistics, as well as the 10% threshold?  This age threshold could be checked when opening the table.

My thought process is that as well as rows being modified outdating the statistics, there might be changes to the statistics collection (via changes to stats_sample_pages or as bugs are fixed etc.)   The historical bug #43660 comes to mind.
[8 Jan 2015 0:09] Harrison Fisk
More details also at:

https://www.facebook.com/MySQLatFacebook/posts/10152729777891696
[14 Jan 2015 9:51] Vasil Dimov
"Also provide a method for triggering a statistics update that doesn't include doing a full analyze table" -- but the statistics update that is done by InnoDB internally, when 10% of the table is changed, is exactly the same as to what happens when you run ANALYZE TABLE.

Notice that by default the stats gathering (done internally by InnoDB or by ANALYZE TABLE) will sample 20 leaf pages from each index. For big tables this may be insufficient to get a good representation. It can be overridden on a per table basis with:

ALTER TABLE t STATS_SAMPLE_PAGES=...;
[14 Jan 2015 17:18] Eric Bergen
The counter that tracks the number of rows updated is not persistent (joke in the bug title) so when mysql is restarted the counter gets reset. If mysql is restarted before 10% of the rows are changed the statistics don't get updated. When you upgrade mysql as often as we do some tables never get updated. 

Analyze table doesn't work if you have a mix of long and short running queries. For example if you're taking a backup and run analyze table the analyze will successfully update the table statistics but every query that attempts to access the table until your backup finishes will get blocked on "Waiting for flush". My idea with the session level stats was to be able to get access to the newest values for data_length and index_length but without the pain of analyze table.
[26 Feb 2015 22:00] Ben Krug
Here is a test case.  I'm sure it can be simplified, I have a few unnecessary steps for demonstration, and it also requires 4 restarts of the database.  Not sure of a good way to script that into test cases.

(Note it is not 100% deterministic, since InnoDB stats are not 100% deterministic.  But it worked in my trials.)

DROP TABLE IF EXISTS stats;

CREATE TABLE stats (a int primary key auto_increment, b int, key idx_b (b));

INSERT INTO stats (b) values (1),(2),(3),(4);

show indexes from stats;

--- (cardinality of 4 for b)

UPDATE stats SET b=1;

--- (flush twice, sometimes estimates are bad)

flush tables;
show indexes from stats;
flush tables;
show indexes from stats;

--- (should see lower cardinality for b)

UPDATE stats SET b=a;
flush tables;
show indexes from stats;
flush tables;
show indexes from stats;

-- (cardinality should be back to 4 for b)

--shutdown
UPDATE STATS set b=5 where a=1;
--shutdown
UPDATE STATS set b=5 where a=2;
--shutdown
UPDATE STATS set b=5 where a=3;
--shutdown
UPDATE STATS set b=5 where a=4;

flush tables;
show indexes from stats;
flush tables;
show indexes from stats;

-- (stats still show cardinality 4 for b, although b=5 in all rows now)