Bug #116249 Addressing Unfriendliness Problems in NUMA Environments
Submitted: 27 Sep 6:48    Modified: 27 Sep 8:27
Reporter: Bin Wang (OCA)
Status: Verified
Category: MySQL Server: Replication    Severity: S5 (Performance)
Version: all versions    OS: Any
CPU Architecture: Any

[27 Sep 6:48] Bin Wang
Description:
The remove_item_from_jobs and append_item_to_jobs functions both rely primarily on the pending_jobs_lock latch to manage the job queues. append_item_to_jobs, executed by the scheduling (coordinator) thread, is called once per event to enqueue it into a worker's queue. remove_item_from_jobs, executed by the worker threads, is called once per event to dequeue it.
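The handoff described above can be modeled as a mutex-protected producer/consumer queue. The following is a minimal C++ sketch under stated assumptions: only the names append_item_to_jobs, remove_item_from_jobs and pending_jobs_lock come from the report, while the `JobQueue` struct, its `std::deque` storage, the `not_empty` condition variable and the `lock_acquisitions` counter are hypothetical simplifications (MySQL's actual worker queue and locking code differ). It shows that every event costs one latch acquisition on the scheduling side and another on the worker side.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical, simplified model of one worker's job queue (not MySQL code).
struct JobQueue {
  std::mutex pending_jobs_lock;       // the single latch both sides contend on
  std::condition_variable not_empty;  // signalled whenever an event is enqueued
  std::deque<int> jobs;               // queued events (represented by ids here)
  std::size_t lock_acquisitions = 0;  // calls that took the latch; guarded by it
};

// Scheduling thread: one latch acquisition per enqueued event.
void append_item_to_jobs(JobQueue& q, int event) {
  {
    std::lock_guard<std::mutex> guard(q.pending_jobs_lock);
    ++q.lock_acquisitions;
    q.jobs.push_back(event);
  }
  q.not_empty.notify_one();  // wake a waiting worker after releasing the latch
}

// Worker thread: one latch acquisition per dequeued event; blocks while empty.
// Re-acquisitions performed internally by wait() are not counted.
int remove_item_from_jobs(JobQueue& q) {
  std::unique_lock<std::mutex> guard(q.pending_jobs_lock);
  ++q.lock_acquisitions;
  q.not_empty.wait(guard, [&q] { return !q.jobs.empty(); });
  assert(!q.jobs.empty());  // the wait predicate guarantees an event is present
  int event = q.jobs.front();
  q.jobs.pop_front();
  return event;
}
```

For example, enqueuing two events and then dequeuing both from a single thread leaves `lock_acquisitions` at 4: two latch acquisitions per event, regardless of how many workers exist.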

In high-throughput scenarios, the scheduling thread calls append_item_to_jobs at a high rate to enqueue events, while numerous worker threads concurrently call remove_item_from_jobs to dequeue them. Because both functions acquire and release the same latch, this leads to significant latch contention: every event costs at least two latch acquisitions (one to enqueue it, one to dequeue it). With event processing rates reaching several hundred thousand per second, the latch is acquired and released at least that many times per second, and contention between the scheduling thread and the worker threads can become severe.
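To observe this contention directly, the hypothetical harness below (a sketch, not MySQL code) has one scheduling thread enqueue events while several worker threads dequeue them through a single `std::mutex`. Every acquisition first tries to take the mutex without blocking (`std::try_to_lock`); a failed attempt means another thread held it, which is used here as an approximation of a contended acquisition. The `CountingQueue` class, the `run` function and the `ContentionStats` fields are illustrative names assumed for this example.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Counters for the single shared latch (illustrative, not a MySQL structure).
struct ContentionStats {
  std::size_t acquisitions = 0;  // total latch acquisitions
  std::size_t contended = 0;     // acquisitions where the latch was already held
};

class CountingQueue {
 public:
  void push(int event) {
    {
      std::unique_lock<std::mutex> g = lock();
      items_.push_back(event);
    }
    cv_.notify_one();
  }
  // Returns false once the queue has been closed and fully drained.
  bool pop(int& out) {
    std::unique_lock<std::mutex> g = lock();
    cv_.wait(g, [this] { return !items_.empty() || closed_; });
    if (items_.empty()) return false;
    out = items_.front();
    items_.pop_front();
    return true;
  }
  void close() {
    {
      std::unique_lock<std::mutex> g = lock();
      closed_ = true;
    }
    cv_.notify_all();  // release every worker still waiting
  }
  ContentionStats stats() {
    std::lock_guard<std::mutex> g(m_);  // this read is not counted
    return stats_;
  }

 private:
  // Acquire the latch, recording whether another thread already held it.
  std::unique_lock<std::mutex> lock() {
    std::unique_lock<std::mutex> g(m_, std::try_to_lock);
    const bool was_contended = !g.owns_lock();
    if (was_contended) g.lock();  // fall back to a blocking acquisition
    ++stats_.acquisitions;
    if (was_contended) ++stats_.contended;
    return g;
  }

  std::mutex m_;
  std::condition_variable cv_;
  std::deque<int> items_;
  bool closed_ = false;
  ContentionStats stats_;
};

// One scheduling thread (the caller) enqueues `events`; `workers` threads
// dequeue them. Returns the number of events the workers consumed.
std::size_t run(CountingQueue& q, int events, int workers) {
  assert(events >= 0 && workers > 0);
  std::atomic<std::size_t> consumed{0};
  std::vector<std::thread> pool;
  for (int w = 0; w < workers; ++w) {
    pool.emplace_back([&q, &consumed] {
      int e;
      while (q.pop(e)) consumed.fetch_add(1, std::memory_order_relaxed);
    });
  }
  for (int i = 0; i < events; ++i) q.push(i);
  q.close();
  for (auto& t : pool) t.join();
  return consumed.load();
}
```

Calling `run(q, 10000, 4)` consumes all 10000 events and records exactly 2 * 10000 + 4 + 1 = 20005 acquisitions (one per push, one per successful pop, one final pop per worker, one for `close`). The `contended` count depends on the machine, core count and scheduler, so it has no fixed expected value; it is the number to compare across thread placements.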

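Frequent acquisition and release of the latch causes context switches, because threads that find it held block and must later be woken. In NUMA environments, these context switches lead to cache data migrating between NUMA nodes (for example, when a woken thread resumes on a CPU in a different node from the one whose caches hold the latch and queue data), which reduces replay efficiency on the replica.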

How to repeat:
Run the replica on a NUMA machine with mysqld using all NUMA nodes, under a high-throughput replication workload. Replay performance on the replica is poor.
[27 Sep 6:51] Bin Wang
If you're interested in learning more, you can refer to the following link:
https://advancedmysql.github.io/The-Art-of-Problem-Solving-in-Software-Engineering_How-to-...
[27 Sep 8:27] MySQL Verification Team
Thanks for the writeup