Bug #58551 Optimize table after load or insert into ... select gives 25% speed increase
Submitted: 29 Nov 2010 9:33 Modified: 8 Dec 2010 3:19
Reporter: Kristof Van Cleemput Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S5 (Performance)
Version:5.0, 5.1, (probably All) OS:Any
Assigned to: Paul DuBois CPU Architecture:Any
Tags: insert into, load, Optimize, speed

[29 Nov 2010 9:33] Kristof Van Cleemput
Description:
When you do a load or insert into ... select and after this you do an optimize table you can gain a speed increase of 25% when querying on those tables.
The manual states that this shouldn't be done normally but I guess it's wrong.

When I did an explain select I noticed that some primary keys returned not 1 row but like 28 rows and after a few tries I stumbled upon the optimize table.

There is a speed gain even on small tables but the biggest ones come from large to huge tables off course.

How to repeat:
create a table, put 1 million records into it with insert into ... select
Make up some other tables to join with this one.
Execute a select with a few joins on this table.
(Maybe do it once more to see if there's a change due to caching)
Now optimize the table and do the same select again. => 25% faster

Suggested fix:
auto optimize table when doing a load or insert into ... select.
It's so simple, it's hard to believe that nobody ever came up with this before ;-)

Maybe you can even do this in other cases to, like updates on large tables.
Users shouldn't have to do an optimize table it should be automated.

Giving the results it would be a large boost for every mysql user that has some large table or that frequently uses temptables.
[29 Nov 2010 9:37] Kristof Van Cleemput
Better synopsis
[29 Nov 2010 9:46] Kristof Van Cleemput
The speed increase can be a lot higher than 25%
I even have a case where the query without optimize took 87s and with optimize 18s.
So I think that the speed gain will keep on rising depending on the table sizes in the queries.
[29 Nov 2010 9:51] Valeriy Kravchuk
What exact manual page do you refer to? Please, provide URL. 

What storage engine is used for the table you load data into?
[29 Nov 2010 10:23] Kristof Van Cleemput
In this case we use mysql for read-only operations so ENGINE=MyISAM
But I expect it's propably the same on most other engines to (didn't test that).

The page in the manual is http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html

And it states: In most setups, you need not run OPTIMIZE TABLE at all. Even if you do a lot of updates to variable-length rows, it is not likely that you need to do this more than once a week or month and only on certain tables.
[29 Nov 2010 10:30] Valeriy Kravchuk
Please, send the results of EXPLAIN for some problematic query before and after OPTIMIZE.
[29 Nov 2010 10:51] Kristof Van Cleemput
Before

id;select_type;table;type;possible_keys;key;key_len;ref;rows;Extra
1;SIMPLE;v;ALL;(null);(null);(null);(null);406;Using temporary; Using filesort
1;SIMPLE;k;ref;lo_prest_kalender1,lo_prest_kalender2,lo_prest_kalender3;lo_prest_kalender3;24;dmp01i0049owervik.v.pid,dmp01i0049owervik.v.opid,const,const;1;Using where
1;SIMPLE;d1;ref;gl_kode_ext_def1,gl_kode_ext_def2;gl_kode_ext_def1;20;const,dmp01i0049owervik.k.tp,dmp01i0049owervik.k.kode,dmp01i0049owervik.k.ext;1;Using where
1;SIMPLE;f;ref;ft_dagen1;ft_dagen1;20;const,const,dmp01i0049owervik.v.rsz_dagenstelsel,dmp01i0049owervik.v.m_verwerking;11;
1;SIMPLE;t1;eq_ref;PRIMARY;PRIMARY;8;const,dmp01i0049owervik.d1.tellerkode;1;
1;SIMPLE;oo;ref;PRIMARY;PRIMARY;8;dmp01i0049owervik.v.pid,dmp01i0049owervik.k.opid;12;Using where
1;SIMPLE;oc1;ref;lo_osn_code2;lo_osn_code2;4;dmp01i0049owervik.oo.osn1_id;1;
1;SIMPLE;oc2;ref;lo_osn_code2;lo_osn_code2;4;dmp01i0049owervik.oo.osn2_id;1;
1;SIMPLE;oc3;ref;lo_osn_code2;lo_osn_code2;4;dmp01i0049owervik.oo.osn3_id;1;
1;SIMPLE;oc4;ref;lo_osn_code2;lo_osn_code2;4;dmp01i0049owervik.oo.osn4_id;1;
1;SIMPLE;oc5;ref;lo_osn_code2;lo_osn_code2;4;dmp01i0049owervik.oo.osn5_id;1;
1;SIMPLE;w;ref;PRIMARY;PRIMARY;8;dmp01i0049owervik.k.pid,dmp01i0049owervik.k.opid;28;Using where
1;SIMPLE;sd;ref;gl_116_45;gl_116_45;9;dmp01i0049owervik.w.internnr;11;
1;SIMPLE;g;ref;lo_graadid1;lo_graadid1;4;dmp01i0049owervik.w.graadid;1;

after
id;select_type;table;type;possible_keys;key;key_len;ref;rows;Extra
1;SIMPLE;v;ALL;(null);(null);(null);(null);406;Using temporary; Using filesort
1;SIMPLE;f;ref;ft_dagen1;ft_dagen1;20;const,const,dmp01i0049owervik.v.rsz_dagenstelsel,dmp01i0049owervik.v.m_verwerking;1;
1;SIMPLE;oo;ref;PRIMARY;PRIMARY;8;dmp01i0049owervik.v.pid,dmp01i0049owervik.v.opid;1;Using where
1;SIMPLE;oc1;ref;lo_osn_code2;lo_osn_code2;4;dmp01i0049owervik.oo.osn1_id;1;
1;SIMPLE;oc2;ref;lo_osn_code2;lo_osn_code2;4;dmp01i0049owervik.oo.osn2_id;1;
1;SIMPLE;oc3;ref;lo_osn_code2;lo_osn_code2;4;dmp01i0049owervik.oo.osn3_id;1;
1;SIMPLE;oc4;ref;lo_osn_code2;lo_osn_code2;4;dmp01i0049owervik.oo.osn4_id;1;
1;SIMPLE;oc5;ref;lo_osn_code2;lo_osn_code2;4;dmp01i0049owervik.oo.osn5_id;1;
1;SIMPLE;k;ref;lo_prest_kalender1,lo_prest_kalender2,lo_prest_kalender3;lo_prest_kalender3;24;dmp01i0049owervik.oo.pid,dmp01i0049owervik.oo.opid,const,const;1;Using where
1;SIMPLE;d1;ref;gl_kode_ext_def1,gl_kode_ext_def2;gl_kode_ext_def1;20;const,dmp01i0049owervik.k.tp,dmp01i0049owervik.k.kode,dmp01i0049owervik.k.ext;1;Using where
1;SIMPLE;t1;eq_ref;PRIMARY;PRIMARY;8;const,dmp01i0049owervik.d1.tellerkode;1;
1;SIMPLE;w;ref;PRIMARY;PRIMARY;8;dmp01i0049owervik.k.pid,dmp01i0049owervik.k.opid;2;Using where
1;SIMPLE;sd;ref;gl_116_45;gl_116_45;9;dmp01i0049owervik.w.internnr;1;
1;SIMPLE;g;ref;lo_graadid1;lo_graadid1;4;dmp01i0049owervik.w.graadid;1;
[8 Dec 2010 3:19] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.

I will replace the "In most cases" statement with:

After extensive changes to a table, this statement may also improve
performance of statements that use the table, sometimes significantly.