Bug #32712 ANALYZE TABLE takes a lot of time for MyISAM tables
Submitted: 26 Nov 2007 11:16 Modified: 27 Nov 2007 21:44
Reporter: Kristian Koehntopp Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: MyISAM storage engine Severity:S4 (Feature request)
Version:5.0.46-enterprise-gpl-log OS:Any
Assigned to: CPU Architecture:Any

[26 Nov 2007 11:16] Kristian Koehntopp
Description:
For InnoDB the "Analyze Table" statement is fast, and independent of table size, because InnoDB implements "Analyze" as ten index dives taking representative samples of the table data instead of doing a full scan.

For MyISAM, "Analyze" becomes slower when the table grows in size. This is not acceptable for large tables.

How to repeat:
Create a 500G MyISAM table, run Analyze table on it.

Suggested fix:
Make MyISAM use the row count.

If row count < 1000, do a full scan.
If row count >= 1000, divide row count by 100 giving step, and take a sample of the data at n * step for step = 0 to 100. Use this as a representative sample instead of doing exact counts.
[26 Nov 2007 11:19] Kristian Koehntopp
Hmm, thinking about this, this might not work unmodified: The reason why index dives work for InnoDB is that they are INDEX and not DATA dives - if you drop down to the leaves of the PK in InnoDB at 10 different equidistant locations, you'll get a representative sample, because data in an index is ordered.

Still, we can optimize this for MyISAM as well, if the MyISAM table has a primary key - if such a thing is present, we can copy the InnoDB behaviour and make Analyze fast.
[16 Jul 2009 9:41] Andrii Nikitin
Alternatively, it would be great if ANALYZE has possibility to collect statistics without holding table lock for a long time.
E.g. it can optionally perform several iterations (let's say for each 10% of table), holding lock only for each iteration. In this case accuracy may be worse but I think many users will be happy with such approach if lock is not held for long time.
[26 Jul 2009 11:30] Valeriy Kravchuk
Bug #3865 was marked as a duplicate of this one.