MySQL Bugs: #96930: lf_hash get performance regression when the bucket size is too large

Bug #96930	lf_hash get performance regression when the bucket size is too large
Submitted:	18 Sep 2019 13:55	Modified:	25 Sep 2019 11:51
Reporter:	Zongzhi Chen (OCA)	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:		OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
After insert lots of item to lf_hash, lf_hash will increase it's bucket size.  However, after delete the item in lf_hash, the bucket size won't decrease, there is still lots of dummy nodes in it.

we add an iterator operation on lf_hash. 
When iterate the lf_hash, it have to skip lots of dummy node, so the performance regression

so can we add a new operation to reduce the bucket size and delete dummy node.

How to repeat:
read the code and run test

Hi Mr. zongzhi,

Thank you for your bug report.

However, we need further info in order to be able to proceed with the processing of this bug.

First of all, which MySQL version displays the behaviour that you describe.

Next, can you point us to the exact functions and source code files in our source code where insertion and deletion occur.

Next, can you come up with a standalone example of using lf_hash, which will show clearly and unequivocally that size of the memory allocated is not reduced.

Last, but not least, why is it so important for you that size gets reduced ??? The current algorithm is optimised for the speed, not size !!!!!

The code version is 8.0.17

I have asked the same question in MariaDB community, 

https://jira.mariadb.org/browse/MDEV-20630

In MySQL, we only use lf_hash to store MDL lock. However, in mariadb, they also use lf_hash to store the active transactions in trx_sys to replace the rw_trx_list. 
In our own MySQL version, we port these features from MariaDB to MySQL.

The difference is that the lf_hash used in trx_sys need to iterate the lf_hash to get the whole active transaction, and store them to a new readview.

so when there is lots of dummy nodes in lf_hash, the iterate operation almost 2000 slower than the clean lf_hash.

Hi Mr. zongzhi,

Thank you for your reply.

I now understand you completely. When used only for MDL, it is quite OK to have gaps in lf_hash. Which is what we use this algorithm for. However, when you use it for transactions, this makes problems.

Since we do not plan to use this algorithm for transaction, for us, this is not a bug.