Bug #81814 InnoDB adaptive hash index uses a bad partitioning algorithm for the real world
Submitted: 11 Jun 2016 16:25 Modified: 14 Jun 2016 13:55
Reporter: Jeremy Cole (Basic Quality Contributor) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.7.11 OS:Any
Assigned to: CPU Architecture:Any
Tags: adaptive hash index, ahi, innodb_adaptive_hash_index_parts

[11 Jun 2016 16:25] Jeremy Cole
Description:
The current algorithm used for partitioning data into the adaptive hash index (which has innodb_adaptive_hash_index_parts parts, by default 8) produces an almost worst case uneven distribution when the server contains many tables with an identical schema. The algorithm is as follows:

/** Get the hash-table based on index attributes.
A table is selected from an array of tables using pair of index-id, space-id.
@param[in]	index	index handler
@return hash table */
UNIV_INLINE
hash_table_t*
btr_get_search_table(const dict_index_t* index)
{
	ut_ad(index != NULL);

	ulint	ifold = ut_fold_ulint_pair(static_cast<ulint>(index->id),
					   static_cast<ulint>(index->space));

	return(btr_search_sys->hash_tables[ifold % btr_ahi_parts]);
}

The ut_fold_ulint_pair function is as follows:

#define UT_HASH_RANDOM_MASK	1463735687
#define UT_HASH_RANDOM_MASK2	1653893711

/*************************************************************//**
Folds a pair of ib_ulint_ts.
@return folded value */
static
ib_ulint_t
ut_fold_ib_ulint_t_pair(
/*====================*/
	ib_ulint_t	n1,	/*!< in: ib_ulint_t */
	ib_ulint_t	n2)	/*!< in: ib_ulint_t */
{
	return(((((n1 ^ n2 ^ UT_HASH_RANDOM_MASK2) << 8) + n1)
		^ UT_HASH_RANDOM_MASK) + n2);
}

So in summary the AHI is partitioned by ut_fold_ulint_pair(index_id, space) % 8. This is, in practice, essentially a proxy for the distance between the index_id and space. Since both index_id and space are essentially auto-incremented values, and with innodb_file_per_table, a new space is allocated for each table, the two maintain a fixed distance when many tables with the same schema are created on a server, causing the AHI to be partitioned and used unevenly (and in the worst case for a server with thousands of identical tables, such as in a partitioned system, these could all fall into the same single AHI partition), such as:

Hash table size 5843083, node heap has 0 buffer(s)
Hash table size 5843083, node heap has 0 buffer(s)
Hash table size 5843083, node heap has 0 buffer(s)
Hash table size 5843083, node heap has 0 buffer(s)
Hash table size 5843083, node heap has 20889 buffer(s)
Hash table size 5843083, node heap has 1 buffer(s)
Hash table size 5843083, node heap has 0 buffer(s)
Hash table size 5843083, node heap has 1 buffer(s)

This is due to a bad interaction with both the ut_fold_ulint_pair (which is a poor hash anyway) and the default number of partitions being 8. The result of ut_fold_ulint_pair(n1, n2) and ut_fold_ulint_pair(n1+1, n2+1) are often the exact same value, and in all cases have the same result modulus 8.

I believe this bug affects all versions of MySQL, but I have not exhaustively tested.

How to repeat:
This problem can be trivially reproduced:

1. Create several tables with only a primary key.
2. Insert data.
3. Create secondary keys (if desired).
4. Run a workload against the tables.

All primary keys (and thus all tables) will fall into the same single AHI partition, rather than being evenly distributed across partitions.

A small Ruby program which demonstrates the problem follows:

#!/usr/bin/env ruby

UT_HASH_RANDOM_MASK	 = 1463735687
UT_HASH_RANDOM_MASK2 = 1653893711
def ut_fold_ulint_pair(n1, n2)
  (((((n1 ^ n2 ^ UT_HASH_RANDOM_MASK2) << 8) + n1) ^ UT_HASH_RANDOM_MASK) + n2)
end

innodb_adaptive_hash_index_parts = 8
index_id = (rand * 1000000000).floor
space_id = (rand * 1000000000).floor

puts "innodb_adaptive_hash_index_parts=%i" % [ innodb_adaptive_hash_index_parts ]
(0...10).each do |i|
  index_id += 1
  space_id += 1
  fold = ut_fold_ulint_pair(index_id, space_id)
	puts "index_id=%i, space_id=%i, fold=%i, ahi_slot=%i" % [
    index_id, space_id, fold,
    fold % innodb_adaptive_hash_index_parts,
  ]
end

And its output is, e.g.:

innodb_adaptive_hash_index_parts=8
index_id=936096520, space_id=759598558, fold=516258342765, ahi_slot=5
index_id=936096521, space_id=759598559, fold=516258342765, ahi_slot=5
index_id=936096522, space_id=759598560, fold=516258339693, ahi_slot=5
index_id=936096523, space_id=759598561, fold=516258339693, ahi_slot=5
index_id=936096524, space_id=759598562, fold=516258336621, ahi_slot=5
index_id=936096525, space_id=759598563, fold=516258336621, ahi_slot=5
index_id=936096526, space_id=759598564, fold=516258339693, ahi_slot=5
index_id=936096527, space_id=759598565, fold=516258339693, ahi_slot=5
index_id=936096528, space_id=759598566, fold=516258350973, ahi_slot=5
index_id=936096529, space_id=759598567, fold=516258350973, ahi_slot=5

Suggested fix:
Use a better hashing algorithm to partition the adaptive hash index. As a workaround, setting innodb_adaptive_hash_index_parts to a different value (not a power of 2, ideally prime?) also partitions more usefully, although it still doesn't seem great. I have tried innodb_adaptive_hash_index_parts=11 for instance and it partitions a bit better, but not great, as can be demonstrated by the Ruby program:

innodb_adaptive_hash_index_parts=11
index_id=49870460, space_id=375996162, fold=508786651645, ahi_slot=8
index_id=49870461, space_id=375996163, fold=508786651645, ahi_slot=8
index_id=49870462, space_id=375996164, fold=508786654717, ahi_slot=0
index_id=49870463, space_id=375996165, fold=508786654717, ahi_slot=0
index_id=49870464, space_id=375996166, fold=508786616589, ahi_slot=9
index_id=49870465, space_id=375996167, fold=508786616589, ahi_slot=9
index_id=49870466, space_id=375996168, fold=508786617613, ahi_slot=10
index_id=49870467, space_id=375996169, fold=508786617613, ahi_slot=10
index_id=49870468, space_id=375996170, fold=508786614541, ahi_slot=7
index_id=49870469, space_id=375996171, fold=508786614541, ahi_slot=7
[11 Jun 2016 17:03] Jeremy Cole
Oops, I accidentally copied the wrong one of the ut_fold_ulint_pair functions (heh, there's more than one), but it's the same implementation, anyway:

/*************************************************************//**
Folds a pair of ulints.
@return	folded value */
UNIV_INLINE
ulint
ut_fold_ulint_pair(
/*===============*/
	ulint	n1,	/*!< in: ulint */
	ulint	n2)	/*!< in: ulint */
{
	return(((((n1 ^ n2 ^ UT_HASH_RANDOM_MASK2) << 8) + n1)
		^ UT_HASH_RANDOM_MASK) + n2);
}
[14 Jun 2016 13:55] Sinisa Milivojevic
Jeremy,

I agree with you on the entire matter.

I think that a better hash algorithm should be used and have division based on the prime number. Traditionally, in MySQL, it was 7 ...... :-)

Verified as a feature request.
[14 Jun 2016 14:57] Mark Callaghan
Traditional hash function was also a source of fun a few years back:
http://bugs.mysql.com/bug.php?id=66473
[15 Jun 2016 2:37] Sunny Bains
I think this should be treated like a normal bug and not a feature request. It is an in memory data structure, should be an easy change and easy to test.

I recommend that we push it in the next release.
[2 Mar 2017 15:56] OCA Admin
Contribution submitted via Github - Bug #81814: InnoDB adaptive hash index uses a bad partitioning algorithm for the 
(*) Contribution by Alexey Kopytov (Github akopytov, mysql-server/pull/122#issuecomment-283647748): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_108539733.txt (text/plain), 2.33 KiB.

[29 May 2018 16:07] OCA Admin
Contribution submitted via Github - Bug #81814: InnoDB adaptive hash index uses a bad partitioning algorithm 
(*) Contribution by Alexey Kopytov (Github akopytov, mysql-server/pull/210#issuecomment-392573374): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_190803597.txt (text/plain), 2.28 KiB.