Bug #53631 Primary key index size optimisation disagrees with docs
Submitted: 13 May 2010 20:54 Modified: 26 May 2010 20:59
Reporter: Chris Wilson Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: MyISAM storage engine Severity:S3 (Non-critical)
Version:5.0.41-community-log OS:Linux (CentOS 5.4)
Assigned to: CPU Architecture:Any

[13 May 2010 20:54] Chris Wilson
Description:
The MyISAM documentation states that:

"When rows are inserted in sorted order (as when you are using an AUTO_INCREMENT column), the index tree is split so that the high node only contains one key. This improves space utilization in the index tree."

However, I observe the exact opposite. Placing the increasing, most varied columns first (a timestamp) makes the index four times larger, not smaller.

For example, I load the database with the dump file that I will attach shortly:

chris@gcc(~)$ mysql -u root pmacct_temp < /tmp/pmacct.sql

And then check the index sizes:

chris@gcc(~)$ sudo ls -l /var/lib/mysql/pmacct_temp/
total 239888
-rw-rw---- 1 mysql mysql      9224 May 13 21:14 acct_v7.frm
-rw-rw---- 1 mysql mysql  62230271 May 13 21:18 acct_v7.MYD
-rw-rw---- 1 mysql mysql  14924800 May 13 21:18 acct_v7.MYI
-rw-rw---- 1 mysql mysql      9224 May 13 21:18 acct_v7_summary.frm
-rw-rw---- 1 mysql mysql 152754922 May 13 21:30 acct_v7_summary.MYD
-rw-rw---- 1 mysql mysql  15418368 May 13 21:30 acct_v7_summary.MYI
-rw-rw---- 1 mysql mysql        65 May 13 21:13 db.opt
-rw-rw---- 1 mysql mysql      8554 May 13 21:30 test.frm
-rw-rw---- 1 mysql mysql         7 May 13 21:30 test.MYD
-rw-rw---- 1 mysql mysql      1024 May 13 21:30 test.MYI

Just to be sure, I change the primary key to the same structure that it already has:

mysql> alter table acct_v7 drop primary key, add primary key (agent_id, class_id, mac_src, mac_dst, vlan, as_src, as_dst, ip_src,
    ->   ip_dst, src_port, dst_port, ip_proto, tos, stamp_inserted);
Query OK, 412121 rows affected (3 min 13.61 sec)
Records: 412121  Duplicates: 0  Warnings: 0

And then check the file sizes again:

chris@gcc(~)$ sudo ls -l /var/lib/mysql/pmacct_temp/
total 239888
-rw-rw---- 1 mysql mysql      9224 May 13 21:31 acct_v7.frm
-rw-rw---- 1 mysql mysql  62230271 May 13 21:34 acct_v7.MYD
-rw-rw---- 1 mysql mysql  14924800 May 13 21:34 acct_v7.MYI
-rw-rw---- 1 mysql mysql      9224 May 13 21:18 acct_v7_summary.frm
-rw-rw---- 1 mysql mysql 152754922 May 13 21:30 acct_v7_summary.MYD
-rw-rw---- 1 mysql mysql  15418368 May 13 21:30 acct_v7_summary.MYI
-rw-rw---- 1 mysql mysql        65 May 13 21:13 db.opt
-rw-rw---- 1 mysql mysql      8554 May 13 21:30 test.frm
-rw-rw---- 1 mysql mysql         7 May 13 21:30 test.MYD
-rw-rw---- 1 mysql mysql      1024 May 13 21:30 test.MYI

And then I change the primary key to put the stamp_inserted column first, as this should maximise the variance in the first column and also ensure that rows are inserted in order:

mysql> alter table acct_v7 drop primary key, add primary key (stamp_inserted, agent_id, class_id, mac_src, mac_dst, vlan, as_src, as_dst, ip_src,   ip_dst, src_port, dst_port, ip_proto, tos);
Query OK, 412121 rows affected (59.84 sec)
Records: 412121  Duplicates: 0  Warnings: 0

And check the size again:

chris@gcc(~)$ sudo ls -l /var/lib/mysql/pmacct_temp/
total 289576
-rw-rw---- 1 mysql mysql      9224 May 13 21:35 acct_v7.frm
-rw-rw---- 1 mysql mysql  62230271 May 13 21:36 acct_v7.MYD
-rw-rw---- 1 mysql mysql  65755136 May 13 21:36 acct_v7.MYI
-rw-rw---- 1 mysql mysql      9224 May 13 21:18 acct_v7_summary.frm
-rw-rw---- 1 mysql mysql 152754922 May 13 21:30 acct_v7_summary.MYD
-rw-rw---- 1 mysql mysql  15418368 May 13 21:30 acct_v7_summary.MYI
-rw-rw---- 1 mysql mysql        65 May 13 21:13 db.opt
-rw-rw---- 1 mysql mysql      8554 May 13 21:30 test.frm
-rw-rw---- 1 mysql mysql         7 May 13 21:30 test.MYD
-rw-rw---- 1 mysql mysql      1024 May 13 21:30 test.MYI

Note that now the index on acct_v7 is four times bigger than before! This is not what I would expect from the documentation.

How to repeat:
Load the dump file for this bug report, and execute the steps given above.

Suggested fix:
Please correct the documentation to explain how to use this feature properly to reduce index size rather than increasing it.
[14 May 2010 3:24] Valeriy Kravchuk
I do not see any dump file uploaded.
[14 May 2010 19:17] Chris Wilson
Bug data trimmed to make it easier to work with, but still demonstrate the problem, and attached.
[14 May 2010 19:41] Chris Wilson
Perhaps "improve space utilisation" actually means that performance is better, rather than that the file size is smaller. With a larger data set, I notice a 20% reduction in INSERT time:

chris@gcc(tmp)$ time mysql -u root pmacct_temp < pmacct-short.sql
real    0m26.496s
user    0m2.352s
sys     0m0.917s

chris@gcc(tmp)$ time mysql -u root pmacct_temp < pmacct-short.sql
real    0m26.463s
user    0m2.283s
sys     0m0.978s

chris@gcc(tmp)$ vi pmacct-short.sql
(move stamp_inserted from last to first in the primary key of acct_v7)

chris@gcc(tmp)$ time mysql -u root pmacct_temp < pmacct-short.sql
real    0m19.833s
user    0m2.312s
sys     0m0.861s

chris@gcc(tmp)$ time mysql -u root pmacct_temp < pmacct-short.sql
real    0m20.022s
user    0m2.374s
sys     0m0.942s
[26 May 2010 20:59] Sveta Smirnova
Thank you for the feedback.

You insert in not sorted order starting from mac_src and mac_dst fields at least. Simple query

 alter table acct_v7 order by `agent_id`,`class_id`,`mac_src`,`mac_dst`,`vlan`,`as_src`,`as_dst`,`ip_src`,`ip_dst`,`src_port`,`dst_port`,`ip_proto`,`tos`,`stamp_inserted`;

decreases size of index file 2 times. So I am closing this as not a bug.