Bug #72811 | Set NUMA mempolicy for optimum mysqld performance | ||
---|---|---|---|
Submitted: | 30 May 2014 7:49 | Modified: | 13 Jul 2015 12:08 |
Reporter: | Stewart Smith | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Compiling | Severity: | S5 (Performance) |
Version: | 5.6, 5.7 | OS: | Linux |
Assigned to: | CPU Architecture: | Any | |
Tags: | numa, PowerPC |
[30 May 2014 7:49]
Stewart Smith
[30 May 2014 7:51]
Stewart Smith
set memory policy interleaved for global structures, node local for runtime allocations (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: mempolicy.patch (text/x-patch), 789 bytes.
[30 May 2014 8:20]
MySQL Verification Team
Hello Stewart, Thank you for the report and contribution. Thanks, Umesh
[2 Jun 2014 1:44]
Stewart Smith
here's a little microbenchmark program that can work out the actual latency for doing memory operations from one NUMA node to another versus local: wget http://ozlabs.org/~anton/junkcode/latency2001.c gcc -O2 -o latency2001 latency2001.c sudo -s echo 2048 > /proc/sys/vm/nr_hugepages echo 1000000000000 > /proc/sys/kernel/shmmax # Node local allocation for i in `seq 0 4 127` do ./latency2001 -a $i -c $i -l 128M done # Allocate on memory on CPU 0 for i in `seq 0 4 127` do ./latency2001 -a 0 -c $i -l 128M done Output like following (these are made up numbers, just to ensure I don't divulge anything I'm not meant to): 134217728 500.00 cycles 500.00 ns 134217728 600.00 cycles 600.00 ns 134217728 800.00 cycles 800.00 ns 134217728 1000.00 cycles 1000.00 ns (i.e. it's easy to see that as you access memory further away it gets more expensive) You'll want to vary the parameters to seq there for the hardware you're running on - e.g. on Intel the 4 is probably going to be a 2 (rather than 4 for POWER7 and 8 for POWER8 - this is the amount of hardware threads per CPU core, no use in running benchmark on each SMT thread)
[4 Jun 2014 1:13]
Stewart Smith
Explicitly set NUMA memory allocate policy. Now for 5.7 with CMake foo (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: numa-mempolicy-v2.patch (text/x-patch), 2.54 KiB.
[4 Jun 2014 1:16]
Stewart Smith
My updated patch is for 5.7.4-m14 (although should work on 5.6 too). If it finds numactl.h and libnuma, it will build with setting numa memory policy and output information to this effect on mysqld startup If it doesn't find them, it will compile anyway and display an appropriate notice on mysqld startup. Possible enhancements that people may want: 1) find out how to do this on Windows - I have no idea and don't have a current mysqld on win32 build environment. 2) server variable?
[4 Jun 2014 1:49]
Stewart Smith
on further reading, using MAP_POPULATE flag to mmap when allocating buffer pool may be needed to help force the allocation of the pages at the right time.
[16 Sep 2014 11:32]
Daniƫl van Eeden
This could be a duplicate of Bug #57241.
[17 Sep 2014 0:56]
Stewart Smith
It possibly is. My patch is under OCA though, so it can be merged as-is or with any amendment needed.
[4 Jun 2015 21:40]
Stewart Smith
This is still something that should be resolved. there's two approaches I see as being possible: 1) buffer pool instances map to NUMA nodes this possibly gives better latency for a number of workloads, as a whole BP page will be node local, and if there is locality to a BP-instance for a query, that thread is likely to be scheduled on the node where the data is. 2) a patch like this, interleave everything except connection specific This could be interim while (1) is sorted. On larger NUMA machines, the links between nodes can be non-symmetrical too, with lower bandwidth over some links, so simply interleaving everything (especially connection specific memory) is unlikely to be the best thing to do.
[13 Jul 2015 12:08]
Daniel Price
Posted by developer: Fixed as of the upcoming 5.6.27, 5.7.9, 5.8.0 releases, and here's the changelog entry: The new innodb_numa_interleave read-only configuration option allows you to enable the NUMA interleave memory policy for allocation of the InnoDB buffer pool. When innodb_numa_interleave is enabled, the NUMA memory policy is set to MPOL_INTERLEAVE for the mysqld process. After the InnoDB buffer pool is allocated, the NUMA memory policy is set back to MPOL_DEFAULT. This option is only available on NUMA-enabled systems. Thanks to Stewart Smith for the patch.
[3 Jan 2016 6:18]
Tim Smith
I install the latest version of MySQL 5.6.28 (x64) on Windows. mysql> show global variables like '%version%'; +-------------------------+------------------------------+ | Variable_name | Value | +-------------------------+------------------------------+ | innodb_version | 5.6.28 | | protocol_version | 10 | | slave_type_conversions | | | version | 5.6.28-log | | version_comment | MySQL Community Server (GPL) | | version_compile_machine | x86_64 | | version_compile_os | Win64 | +-------------------------+------------------------------+ 7 rows in set (0.00 sec) mysql> show global variables like '%numa%'; Empty set (0.00 sec) The manual (http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_numa_interleav...) says: "For the innodb_numa_interleave option to be available, MySQL must be compiled on a NUMA-enabled system." Does it mean 1. MySQL 5.6.28 (x64) on Windows is not compiled on a NUMA-enabled system by Oracle? 2. MySQL 5.6.28 (x64) on Windows cannot get the optimal performance on a server with more than one CPU socket?
[5 Jan 2016 17:22]
Joe Grasse
I went to try this out and ran into Bug #78953.