Description:
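False sharing is a well-known problem in multiprocessor systems: it degrades the performance of multi-threaded programs when threads on different CPUs write to data that shares a cache line.
The CPU cache has a significant impact on performance.
performance_schema tries to stay cache-friendly by padding frequently accessed atomic variables to a full cache line: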
/**
An atomic @c uint32 variable, guaranteed to be alone in a CPU cache line.
This is for performance, for variables accessed very frequently.
*/
struct PFS_cacheline_atomic_uint32 {
  std::atomic<uint32> m_u32;
  char m_full_cache_line[PFS_CACHE_LINE_SIZE - sizeof(std::atomic<uint32>)];
  PFS_cacheline_atomic_uint32() : m_u32(0) {}
};
struct PFS_cacheline_atomic_uint64 {
struct PFS_cacheline_atomic_size_t {
...
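For illustration only, here is a minimal sketch of the padding technique quoted above. It assumes a 64-byte cache line (the value of PFS_CACHE_LINE_SIZE is not shown in this report) and uses hypothetical names (`kLine`, `padded_counter`, `on_distinct_lines`) that are not part of performance_schema. The quoted struct only pads the size; whether its start address is also aligned is not visible in the excerpt, so the sketch adds `alignas` and checks both properties.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. PFS_CACHE_LINE_SIZE may differ.
constexpr std::size_t kLine = 64;

// Hypothetical type for this sketch, not a performance_schema class.
// alignas fixes the start address; the pad array fixes the size.
struct alignas(kLine) padded_counter {
  std::atomic<std::uint32_t> value{0};
  char pad[kLine - sizeof(std::atomic<std::uint32_t>)];
};

static_assert(sizeof(padded_counter) == kLine, "one counter per line");
static_assert(alignof(padded_counter) == kLine, "counter starts on a line");

// Returns true when the two counters start in different cache lines.
inline bool on_distinct_lines(const padded_counter *a,
                              const padded_counter *b) {
  assert(a != nullptr && b != nullptr);
  const auto line_a = reinterpret_cast<std::uintptr_t>(a) / kLine;
  const auto line_b = reinterpret_cast<std::uintptr_t>(b) / kLine;
  return line_a != line_b;
}
```

With both size and alignment fixed, adjacent elements of an array of `padded_counter` cannot share a line. Padding alone does not guarantee this if the array itself starts in the middle of a line.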
How to repeat:
Run on a server with 2 sockets (64 logical CPUs, 32 physical cores per socket) and 4 NUMA nodes.
Disabling performance_schema significantly improves performance.
Suggested fix:
Remove the alignment to a full cache line.
Split each atomic value into separate per-CPU values.
Aggregate the separate values in performance_schema.
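The "split, then aggregate" idea above could look roughly like the following sketch. This is not the performance_schema implementation: `sharded_counter`, `kShards`, `kCacheLine` and `run_demo` are hypothetical names, the shard count is fixed rather than taken from the real CPU count, and a hash of the thread id stands in for "current CPU". Each shard is kept on its own (assumed 64-byte) line so that the shards themselves do not share a line; whether that padding is wanted is a separate question from the first suggestion.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Assumptions for this sketch: 64-byte lines and a fixed shard count.
constexpr std::size_t kCacheLine = 64;
constexpr std::size_t kShards = 16;

// Hypothetical type, not an existing performance_schema class.
class sharded_counter {
 public:
  // A writer updates only the shard picked for its thread.
  void add(std::uint64_t n) {
    m_shards[shard_index()].value.fetch_add(n, std::memory_order_relaxed);
  }

  // Aggregation step: sum all shards. The result is exact once writers
  // have finished, and only approximate while they are still running.
  std::uint64_t read() const {
    std::uint64_t total = 0;
    for (const auto &s : m_shards) {
      total += s.value.load(std::memory_order_relaxed);
    }
    return total;
  }

 private:
  struct alignas(kCacheLine) shard {
    std::atomic<std::uint64_t> value{0};
  };

  // Stand-in for "current CPU": a hash of the calling thread's id.
  static std::size_t shard_index() {
    return std::hash<std::thread::id>{}(std::this_thread::get_id()) % kShards;
  }

  std::array<shard, kShards> m_shards{};
};

// Small driver: `threads` writers each add 1 `iters` times.
inline std::uint64_t run_demo(unsigned threads, std::uint64_t iters) {
  assert(threads > 0);
  sharded_counter counter;
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < threads; ++t) {
    pool.emplace_back([&counter, iters] {
      for (std::uint64_t i = 0; i < iters; ++i) counter.add(1);
    });
  }
  for (auto &th : pool) th.join();
  return counter.read();
}
```

The trade-off is that writes become cheap and mostly uncontended, while each read has to touch every shard. That suits counters that are updated far more often than they are queried.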