MySQL Bugs: #81814: InnoDB adaptive hash index uses a bad partitioning algorithm for the real world

Bug #81814	InnoDB adaptive hash index uses a bad partitioning algorithm for the real world
Submitted:	11 Jun 2016 16:25	Modified:	6 Apr 2022 20:59
Reporter:	Jeremy Cole (Basic Quality Contributor) (OCA)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5.7.11	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	adaptive hash index, ahi, innodb_adaptive_hash_index_parts

Description:
The current algorithm used for partitioning data into the adaptive hash index (which has innodb_adaptive_hash_index_parts parts, by default 8) produces an almost worst case uneven distribution when the server contains many tables with an identical schema. The algorithm is as follows:

/** Get the hash-table based on index attributes.
A table is selected from an array of tables using pair of index-id, space-id.
@param[in]	index	index handler
@return hash table */
UNIV_INLINE
hash_table_t*
btr_get_search_table(const dict_index_t* index)
{
	ut_ad(index != NULL);

	ulint	ifold = ut_fold_ulint_pair(static_cast<ulint>(index->id),
					   static_cast<ulint>(index->space));

	return(btr_search_sys->hash_tables[ifold % btr_ahi_parts]);
}

The ut_fold_ulint_pair function is as follows:

#define UT_HASH_RANDOM_MASK	1463735687
#define UT_HASH_RANDOM_MASK2	1653893711

/*************************************************************//**
Folds a pair of ib_ulint_ts.
@return folded value */
static
ib_ulint_t
ut_fold_ib_ulint_t_pair(
/*====================*/
	ib_ulint_t	n1,	/*!< in: ib_ulint_t */
	ib_ulint_t	n2)	/*!< in: ib_ulint_t */
{
	return(((((n1 ^ n2 ^ UT_HASH_RANDOM_MASK2) << 8) + n1)
		^ UT_HASH_RANDOM_MASK) + n2);
}

So in summary the AHI is partitioned by ut_fold_ulint_pair(index_id, space) % 8. This is, in practice, essentially a proxy for the distance between the index_id and space. Since both index_id and space are essentially auto-incremented values, and with innodb_file_per_table, a new space is allocated for each table, the two maintain a fixed distance when many tables with the same schema are created on a server, causing the AHI to be partitioned and used unevenly (and in the worst case for a server with thousands of identical tables, such as in a partitioned system, these could all fall into the same single AHI partition), such as:

Hash table size 5843083, node heap has 0 buffer(s)
Hash table size 5843083, node heap has 0 buffer(s)
Hash table size 5843083, node heap has 0 buffer(s)
Hash table size 5843083, node heap has 0 buffer(s)
Hash table size 5843083, node heap has 20889 buffer(s)
Hash table size 5843083, node heap has 1 buffer(s)
Hash table size 5843083, node heap has 0 buffer(s)
Hash table size 5843083, node heap has 1 buffer(s)

This is due to a bad interaction with both the ut_fold_ulint_pair (which is a poor hash anyway) and the default number of partitions being 8. The result of ut_fold_ulint_pair(n1, n2) and ut_fold_ulint_pair(n1+1, n2+1) are often the exact same value, and in all cases have the same result modulus 8.

I believe this bug affects all versions of MySQL, but I have not exhaustively tested.

How to repeat:
This problem can be trivially reproduced:

1. Create several tables with only a primary key.
2. Insert data.
3. Create secondary keys (if desired).
4. Run a workload against the tables.

All primary keys (and thus all tables) will fall into the same single AHI partition, rather than being evenly distributed across partitions.

A small Ruby program which demonstrates the problem follows:

#!/usr/bin/env ruby

UT_HASH_RANDOM_MASK	 = 1463735687
UT_HASH_RANDOM_MASK2 = 1653893711
def ut_fold_ulint_pair(n1, n2)
  (((((n1 ^ n2 ^ UT_HASH_RANDOM_MASK2) << 8) + n1) ^ UT_HASH_RANDOM_MASK) + n2)
end

innodb_adaptive_hash_index_parts = 8
index_id = (rand * 1000000000).floor
space_id = (rand * 1000000000).floor

puts "innodb_adaptive_hash_index_parts=%i" % [ innodb_adaptive_hash_index_parts ]
(0...10).each do |i|
  index_id += 1
  space_id += 1
  fold = ut_fold_ulint_pair(index_id, space_id)
	puts "index_id=%i, space_id=%i, fold=%i, ahi_slot=%i" % [
    index_id, space_id, fold,
    fold % innodb_adaptive_hash_index_parts,
  ]
end

And its output is, e.g.:

innodb_adaptive_hash_index_parts=8
index_id=936096520, space_id=759598558, fold=516258342765, ahi_slot=5
index_id=936096521, space_id=759598559, fold=516258342765, ahi_slot=5
index_id=936096522, space_id=759598560, fold=516258339693, ahi_slot=5
index_id=936096523, space_id=759598561, fold=516258339693, ahi_slot=5
index_id=936096524, space_id=759598562, fold=516258336621, ahi_slot=5
index_id=936096525, space_id=759598563, fold=516258336621, ahi_slot=5
index_id=936096526, space_id=759598564, fold=516258339693, ahi_slot=5
index_id=936096527, space_id=759598565, fold=516258339693, ahi_slot=5
index_id=936096528, space_id=759598566, fold=516258350973, ahi_slot=5
index_id=936096529, space_id=759598567, fold=516258350973, ahi_slot=5

Suggested fix:
Use a better hashing algorithm to partition the adaptive hash index. As a workaround, setting innodb_adaptive_hash_index_parts to a different value (not a power of 2, ideally prime?) also partitions more usefully, although it still doesn't seem great. I have tried innodb_adaptive_hash_index_parts=11 for instance and it partitions a bit better, but not great, as can be demonstrated by the Ruby program:

innodb_adaptive_hash_index_parts=11
index_id=49870460, space_id=375996162, fold=508786651645, ahi_slot=8
index_id=49870461, space_id=375996163, fold=508786651645, ahi_slot=8
index_id=49870462, space_id=375996164, fold=508786654717, ahi_slot=0
index_id=49870463, space_id=375996165, fold=508786654717, ahi_slot=0
index_id=49870464, space_id=375996166, fold=508786616589, ahi_slot=9
index_id=49870465, space_id=375996167, fold=508786616589, ahi_slot=9
index_id=49870466, space_id=375996168, fold=508786617613, ahi_slot=10
index_id=49870467, space_id=375996169, fold=508786617613, ahi_slot=10
index_id=49870468, space_id=375996170, fold=508786614541, ahi_slot=7
index_id=49870469, space_id=375996171, fold=508786614541, ahi_slot=7

Oops, I accidentally copied the wrong one of the ut_fold_ulint_pair functions (heh, there's more than one), but it's the same implementation, anyway:

/*************************************************************//**
Folds a pair of ulints.
@return	folded value */
UNIV_INLINE
ulint
ut_fold_ulint_pair(
/*===============*/
	ulint	n1,	/*!< in: ulint */
	ulint	n2)	/*!< in: ulint */
{
	return(((((n1 ^ n2 ^ UT_HASH_RANDOM_MASK2) << 8) + n1)
		^ UT_HASH_RANDOM_MASK) + n2);
}

Jeremy,

I agree with you on the entire matter.

I think that a better hash algorithm should be used and have division based on the prime number. Traditionally, in MySQL, it was 7 ...... :-)

Verified as a feature request.

Traditional hash function was also a source of fun a few years back:
http://bugs.mysql.com/bug.php?id=66473

I think this should be treated like a normal bug and not a feature request. It is an in memory data structure, should be an easy change and easy to test.

I recommend that we push it in the next release.

Contribution submitted via Github - Bug #81814: InnoDB adaptive hash index uses a bad partitioning algorithm for the (*) Contribution by Alexey Kopytov (Github akopytov, mysql-server/pull/122#issuecomment-283647748): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_108539733.txt (text/plain), 2.33 KiB.

Contribution submitted via Github - Bug #81814: InnoDB adaptive hash index uses a bad partitioning algorithm (*) Contribution by Alexey Kopytov (Github akopytov, mysql-server/pull/210#issuecomment-392573374): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_190803597.txt (text/plain), 2.28 KiB.

Was this fixed?  This bug is still in verified state.

Posted by developer:
 
This will be fixed in MySQL Server 8.0.30.
The fix bases on changing the InnoDB's hash function to much stronger one.
Unit tests are added to assure the reporter's test case and other similar sequences are handled correctly, that is are distributed quite uniformly.