Bug #120597 performance_schema atomic values maintaining CPU cache-friendliness
Submitted: 2 Jun 6:58
Reporter: Alex Zimnitski
Status: Open
Category: MySQL Server: Performance Schema
Severity: S5 (Performance)
Version: 8.0.46, 8.4.9
OS: Linux
Assigned to:
CPU Architecture: x86

[2 Jun 6:58] Alex Zimnitski
Description:
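False sharing is a well-known problem on multiprocessor systems: when threads on different CPUs write to data that lives in the same cache line, the line bounces between cores and multi-threaded performance degrades.

The CPU cache has a significant impact on performance.
performance_schema tries to stay cache-friendly by padding frequently accessed atomic values to a full cache line: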

/**
  An atomic @c uint32 variable, guaranteed to be alone in a CPU cache line.
  This is for performance, for variables accessed very frequently.
*/
struct PFS_cacheline_atomic_uint32 {
  std::atomic<uint32> m_u32;
  char m_full_cache_line[PFS_CACHE_LINE_SIZE - sizeof(std::atomic<uint32>)];

  PFS_cacheline_atomic_uint32() : m_u32(0) {}
};

struct PFS_cacheline_atomic_uint64 {
struct PFS_cacheline_atomic_size_t {
...
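
For reference, a minimal standalone sketch of the padding technique used above, next to an alignas-based variant. The names PaddedU32, AlignedU32 and kCacheLineSize are hypothetical and not part of the server source, and the 64-byte line size is an assumption. It illustrates that padding fixes the size of the struct but not its alignment, so a padded object sits alone in a cache line only if its storage starts on a cache-line boundary. Whether the server's allocator guarantees that is not covered by this report.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed cache line size; the real value is platform-specific.
constexpr std::size_t kCacheLineSize = 64;

// Padding only: the struct fills one cache line, but it may start in the
// middle of a line and then straddle two of them.
struct PaddedU32 {
  std::atomic<std::uint32_t> value{0};
  char pad[kCacheLineSize - sizeof(std::atomic<std::uint32_t>)];
};

// Alignment: the compiler places every object on a cache-line boundary and
// rounds the size up to a multiple of the alignment.
struct alignas(kCacheLineSize) AlignedU32 {
  std::atomic<std::uint32_t> value{0};
};

// Holds on common platforms where std::atomic<uint32_t> is 4 bytes.
static_assert(sizeof(PaddedU32) == kCacheLineSize,
              "padding fills one cache line");
// The padded struct is only as aligned as its atomic member.
static_assert(alignof(PaddedU32) == alignof(std::atomic<std::uint32_t>),
              "padding does not raise the alignment");
static_assert(alignof(AlignedU32) == kCacheLineSize,
              "alignas raises the alignment to a cache line");
static_assert(sizeof(AlignedU32) == kCacheLineSize,
              "size is rounded up to the alignment");
```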

How to repeat:
Run the server on a machine with 2 sockets (64 logical CPUs / 32 physical cores per socket) and 4 NUMA nodes.
Disabling performance_schema significantly improves performance.

Suggested fix:
Remove the padding of a single shared atomic to a full cache line.
Split each atomic value into separate counters, one per CPU.
Aggregate the separate counters when performance_schema reads the value.
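
A rough sketch of the idea, not a patch against the server. ShardedCounter, Shard and kCacheLine are hypothetical names, and the 64-byte line size is an assumption. The shard is picked by hashing the thread id as a portable stand-in for the current CPU; a real implementation would probably use the CPU number instead. Each shard is kept on its own cache line here so that shards do not falsely share with each other; whether to keep that per-shard alignment is a separate trade-off (memory versus contention).

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Assumed cache line size; the real value is platform-specific.
constexpr std::size_t kCacheLine = 64;

// One shard: an atomic counter aligned to its own cache line.
struct alignas(kCacheLine) Shard {
  std::atomic<std::uint64_t> value{0};
};

// Hypothetical sharded counter (illustration only, not MySQL code).
class ShardedCounter {
 public:
  // Intended use: one shard per CPU. At least one shard is always created.
  explicit ShardedCounter(std::size_t shards)
      : m_shards(shards != 0 ? shards : 1) {}

  // Writers touch only their own shard, so concurrent increments from
  // different threads usually hit different cache lines.
  void add(std::uint64_t n) {
    const std::size_t i =
        std::hash<std::thread::id>{}(std::this_thread::get_id()) %
        m_shards.size();
    m_shards[i].value.fetch_add(n, std::memory_order_relaxed);
  }

  // Readers aggregate all shards. The result is not an atomic snapshot,
  // which is normally acceptable for statistics.
  std::uint64_t sum() const {
    std::uint64_t total = 0;
    for (const Shard &s : m_shards) {
      total += s.value.load(std::memory_order_relaxed);
    }
    return total;
  }

 private:
  std::vector<Shard> m_shards;  // C++17 allocates over-aligned elements correctly
};
```

The counter would be sized once at startup, for example with std::thread::hardware_concurrency() shards, and sum() would be called only on the read path of performance_schema, which is much colder than the increment path.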