Bug #65889 Appended primary key stored inefficiently in unique surrogate keys
Submitted: 13 Jul 2012 0:21 Modified: 13 Jul 2012 4:09
Reporter: Trey Raymond Email Updates:
Status: Verified Impact on me:
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:<=5.5 OS:Any
Assigned to: CPU Architecture:Any
Tags: appended primary key, INDEX, key, secondary, surrogate
Triage: Needs Triage: D5 (Feature request)

[13 Jul 2012 0:21] Trey Raymond
Easier to see this in percona, with the better stats in info schema, but the problem is with all innodb.

In unique surrogate keys, the PK-appending logic needs to be slightly different than in non-unique keys, to avoid needless overhead.  information_schema.INNODB_INDEX_STATS reports unique keys as not having it appended as an actual key field in rows_per_key (unlike what you see for a non-unique key), which would be the ideal behavior, but in actuality it is on page.  Rather than being appended as an extra key value, which requires storage in all internal nodes of the key, it should be appended only in leaf nodes - the same way the data is appended to the leaf nodes of the primary key.  With the inherent 1:1 relationship between leaf values for the unique key and pk values, you will always find the needed records quickly.

Removing this wasteful key storage has the potential to significantly speed up reads and writes to unique surrogate keys.  First off, more keys will be stored per internal (non-leaf) node, meaning a higher order tree, and less height - this corresponds to less iops for every lookup on the key.  Second, having more keys per internal node reduces the need to restructure (split/combine) pages during write operations - and as unique key writes can't be placed in the insert buffer, it will not just reduce disk load, but also improve query response time.  The improvements would be truly dramatic on tables with a large primary key.

Examples of how this can not be used and won't hurt things:
- a lookup using this key will always find one record (or the amount needed for a range) as it has a 1:1 relationship with the primary key
- a sort on the primary key via unique key equality (select * from table where unique_field=1 order by primary_key) is irrelevant as it would only return one record
- a sort on the primary key where only one field from a compound unique is filtered is also irrelevant as index optimization can't be used past leftmost prefix (select unique_field2 from table where unique_field1=1 order by primary_key)
- a range on the unique key will not be able to do any primary key filtering as it is not a leftmost prefix.  if the primary key has an equality, it would use that key instead.

How to repeat:
create table test (
id int unsigned not null auto_increment,
name binary(36) not null,
primary key (id),
unique key (name)
) engine=innodb;

insert into test (name) values (uuid());
insert into test (name) select uuid() from test;
-- repeat the last to populate the table with a few million rows, it will double each run

while it's still in the buffer pool, check leaf page count in innodb_index_stats table, pull the index id for the surrogate from the innodb_sys_indexes table as $index_id, then run:

select data_size/n_recs,count(*) from INNODB_BUFFER_POOL_PAGES_INDEX group by 1;

and you can see by the counts the leaf nodes vs internal nodes.  Note that the bytes per record is almost identical, but the internal nodes have an extra 4 bytes to store the child page id.  Imagine if we could remove the byte count of the primary key, how many more keys per internal node we could store!  Consider tables with compound or otherwise large pk's!

Suggested fix:
For a unique key, store the appended primary key only in the leaf nodes, similar to the data columns stored in the clustered pk.