Bug #91796 trx_sys scalability fixes
Submitted: 26 Jul 2018 7:10 Modified: 3 Jan 2019 11:30
Reporter: Sandeep Sethia (OCA) Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version:8.0 OS:CentOS
Assigned to: CPU Architecture:ARM
Tags: Contribution, innodb

[26 Jul 2018 7:10] Sandeep Sethia
Description:
Misc trx_sys scalability fixes
 trx_erase_lists(): trx->read_view is owned by current thread and thus doesn't need trx_sys.mutex protection for reading it's value. Move trx->read_view check out of mutex trx_start_low(): moved assertion out of mutex. Call ReadView::creator_trx_id() directly: allows to inline this one-line method.

How to repeat:
Code review
[26 Jul 2018 7:11] Sandeep Sethia
Misc trx_sys scalability fixes

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: scalibilitypatch.rtf (application/msword, text), 3.48 KiB.

[26 Jul 2018 7:12] Sandeep Sethia
Adding version
[26 Jul 2018 7:17] MySQL Verification Team
Hello Sandeep Sethia,

Thank you for the report and contribution.

Thanks,
Umesh
[3 Jan 2019 11:30] Erlend Dahl
Posted by developer (Bin X Su):

Thanks for the contribution!
 
However, I checked the patch again, and also found some problems:
 
1. About the changes in trx_start_low(), it releases the trx_sys mutex
earlier to get better performance, but, it leaves the access to
trx_sys_rw_trx_add() and trx_sys->rw_trx_list without the mutex protection,
which should be wrong, since concurrent update or read to these two will
always hold the trx_sys mutex. Here the patch removes the mutex protection
may lead to race condition.
 
2. About the change in trx_erase_lists(), the patch changes the original
logic without proper comments or explanation, I feel this is dangerous. Also
this change is to move a memory read out of mutex protection, I'm not sure
about how it will gain from this change.
 
3. About the last change to inline the single line method, this should be
safe and correct. But still I'm not sure how it will help the performance.
 
My suggestion is that filer probably reconsiders the improvements for point 1
and 2 in this patch, and see if there would be better and safer fixes. If so,
then we can try improve the code, including point 3 above. If only for point
3, I'm not sure if it is worth pushing this part of changes.
 
Anyway, thanks the filer for checking code around this. I would temporarily
close this bug as "won't fix", and if there are follow-up patch(es) which is better and safer, we can re-visit this bug.
[3 Jan 2019 11:31] Erlend Dahl
Note - to ensure proper handling, please file new bugs for any follow-up patches.