MySQL Bugs: #79379: Inconsistent innodb_buffer_pool

Bug #79379	Inconsistent innodb_buffer_pool_size handling
Submitted:	23 Nov 2015 6:45	Modified:	30 Dec 2015 19:43
Reporter:	Alexey Kopytov
Status:	Verified
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	5.7.9	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
innodb_buffer_pool_size is handled quite differently depending on
whether it is set at startup or dynamically.

Setting it dynamically invokes innodb_buffer_pool_size_update(), which
eventually calls buf_pool_resize(), which does quite a lot of push-ups
to calculate the real buffer pool size from the user-provided value,
i.e. to take rounding, instances and chunks into account.

Setting it from a configuration file or the command line does not invoke
innodb_buffer_pool_size_update(). Instead the user-provided value is
simply copied to the internal variable verbatim in innodb_init().
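
For illustration, the kind of rounding applied on the dynamic path can
be sketched as follows. This is a minimal model, not the server code: it
assumes the only rule is rounding up to a multiple of instances x chunk
unit (the rule this report later attributes to buf_pool_size_align()),
and the function and parameter names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the rounding on the dynamic path: round the
// requested size up to a multiple of (instances * chunk_unit).
// Not the server code; the single-rule simplification is an assumption.
std::uint64_t align_pool_size(std::uint64_t requested,
                              std::uint64_t instances,
                              std::uint64_t chunk_unit) {
  const std::uint64_t m = instances * chunk_unit;
  assert(m > 0);
  // Already a multiple: keep as is. Otherwise go to the next multiple.
  return (requested % m == 0) ? requested : (requested / m + 1) * m;
}
```

With an assumed single instance and a 128 MiB chunk unit, a request of
200 MiB becomes 256 MiB under this rule, while 256 MiB is left
unchanged. Per the description above, the startup path applies no such
adjustment.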

The result is the following surprising behavior:

mysql> select @@innodb_buffer_pool_size;
+---------------------------+
| @@innodb_buffer_pool_size |
+---------------------------+
|                  25165824 |
+---------------------------+
1 row in set (0.00 sec)

mysql> set global innodb_buffer_pool_size=10485760;
Query OK, 0 rows affected (0.00 sec)

mysql> select @@innodb_buffer_pool_size;
+---------------------------+
| @@innodb_buffer_pool_size |
+---------------------------+
|                  20971520 |
+---------------------------+
1 row in set (0.00 sec)

mysql> set global innodb_buffer_pool_size=25165824;
Query OK, 0 rows affected (0.00 sec)

mysql> select @@innodb_buffer_pool_size;
+---------------------------+
| @@innodb_buffer_pool_size |
+---------------------------+
|                  50331648 |
+---------------------------+
1 row in set (0.00 sec)

How to repeat:
This is fairly easy to repeat on x86 with large pages.

1. Configure large page support as described at
https://dev.mysql.com/doc/refman/5.7/en/large-page-support.html

2. Run the following command: ./mtr --mysqld=--large-pages sys_vars.innodb_buffer_pool_size_basic

MTR's internal server state check will fail as follows:
...
@@ -209,7 +209,7 @@
 INNODB_BUFFER_POOL_LOAD_ABORT  OFF
 INNODB_BUFFER_POOL_LOAD_AT_STARTUP     ON
 INNODB_BUFFER_POOL_LOAD_NOW    OFF
-INNODB_BUFFER_POOL_SIZE        25165824
+INNODB_BUFFER_POOL_SIZE        50331648
 INNODB_CHANGE_BUFFERING        all
 INNODB_CHANGE_BUFFER_MAX_SIZE  25
 INNODB_CHECKSUMS       ON
...

Suggested fix:
Unify static-vs-dynamic behavior.

Hello Alexey,

Thank you for the report.
Verified as described.

Thanks,
Umesh

##

root@ubuntux86:/home/ushastry/Downloads# grep -i huge /proc/meminfo
AnonHugePages:    270336 kB
HugePages_Total:      40
HugePages_Free:       40
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

root@ubuntux86:/home/ushastry/Downloads# cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.2 LTS"
NAME="Ubuntu"
VERSION="14.04.2 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.2 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"

root@ubuntux86:/home/ushastry/Downloads# free -m -t
             total       used       free     shared    buffers     cached
Mem:          4051       3717        334          7         52       2884
-/+ buffers/cache:        779       3271
Swap:         4092          0       4092
Total:        8144       3717       4426

root@ubuntux86:/home/ushastry/Downloads# md5sum mysql-5.7.9-linux-glibc2.5-i686.tar.gz
d3f7abc7cf5533ec7268f3d5ecb350b2  mysql-5.7.9-linux-glibc2.5-i686.tar.gz
root@ubuntux86:/home/ushastry/Downloads# md5sum mysql-5.7.9-linux-glibc2.5-i686.tar
59bcf280fd317031ee59733e9c512fea  mysql-5.7.9-linux-glibc2.5-i686.tar
root@ubuntux86:/home/ushastry/Downloads# md5sum mysql-test-5.7.9-linux-glibc2.5-i686.tar.gz
072adc6dfdf1176144980929b320b3a4  mysql-test-5.7.9-linux-glibc2.5-i686.tar.gz
root@ubuntux86:/home/ushastry/Downloads# 

-- 
ushastry@ubuntux86:~/Downloads/mysql-5.7.9-linux-glibc2.5-i686/mysql-test$ ./mtr --mysqld=--large-pages sys_vars.innodb_buffer_pool_size_basic
Logging: ./mtr  --mysqld=--large-pages sys_vars.innodb_buffer_pool_size_basic
2015-11-23T12:24:49.633244Z 0 [Warning] Changed limits: max_open_files: 1024 (requested 5000)
2015-11-23T12:24:49.633795Z 0 [Warning] Changed limits: table_open_cache: 431 (requested 2000)
MySQL Version 5.7.9
Checking supported features...
 - SSL connections supported
Collecting tests...
Checking leftover processes...
Removing old var directory...
Creating var directory '/home/ushastry/Downloads/mysql-5.7.9-linux-glibc2.5-i686/mysql-test/var'...
Installing system database...

==============================================================================

TEST                                      RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
sys_vars.innodb_buffer_pool_size_basic   [ pass ]    136

MTR's internal check of the test case 'sys_vars.innodb_buffer_pool_size_basic' failed.
This means that the test case does not preserve the state that existed
before the test case was executed.  Most likely the test case did not
do a proper clean-up. It could also be caused by the previous test run
by this thread, if the server wasn't restarted.
This is the diff of the states of the servers before and after the
test case was executed:
mysqltest: Logging to '/home/ushastry/Downloads/mysql-5.7.9-linux-glibc2.5-i686/mysql-test/var/tmp/check-mysqld_1.log'.
mysqltest: Results saved in '/home/ushastry/Downloads/mysql-5.7.9-linux-glibc2.5-i686/mysql-test/var/tmp/check-mysqld_1.result'.
mysqltest: Connecting to server localhost:13000 (socket /home/ushastry/Downloads/mysql-5.7.9-linux-glibc2.5-i686/mysql-test/var/tmp/mysqld.1.sock) as 'root', connection 'default', attempt 0 ...
mysqltest: ... Connected.
mysqltest: Start processing test commands from './include/check-testcase.test' ...
mysqltest: ... Done processing test commands.
--- /home/ushastry/Downloads/mysql-5.7.9-linux-glibc2.5-i686/mysql-test/var/tmp/check-mysqld_1.result	2015-11-23 15:25:09.937883000 +0300
+++ /home/ushastry/Downloads/mysql-5.7.9-linux-glibc2.5-i686/mysql-test/var/tmp/check-mysqld_1.reject	2015-11-23 15:25:12.116793000 +0300
@@ -209,7 +209,7 @@
 INNODB_BUFFER_POOL_LOAD_ABORT	OFF
 INNODB_BUFFER_POOL_LOAD_AT_STARTUP	ON
 INNODB_BUFFER_POOL_LOAD_NOW	OFF
-INNODB_BUFFER_POOL_SIZE	25165824
+INNODB_BUFFER_POOL_SIZE	50331648
 INNODB_CHANGE_BUFFERING	all
 INNODB_CHANGE_BUFFER_MAX_SIZE	25
 INNODB_CHECKSUMS	ON

mysqltest: Result content mismatch

not ok
safe_process[9642]: Child process: 9643, exit: 1

--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 0.136 of 28 seconds executing testcases

Check of testcase failed for: sys_vars.innodb_buffer_pool_size_basic

Completed: All 1 tests were successful.

This is a regression introduced by the following commit:

---
commit a6a2933edcd419abd271e987d647546c7306649d
Author: Annamalai Gurusami <annamalai.gurusami@oracle.com>
Date:   Wed Jul 29 16:26:45 2015 +0530

    Bug #21348684 SIGABRT DURING RESIZING THE INNODB BUFFER POOL ONLINE
    WITH MEMORY FULL CONDITION
    
    Problem:
    
    InnoDB buffer pool resize is done by background thread.
    Consider the following scenario:
    
    1.  User issues "set global innodb_buffer_pool_resize = X;", where
        X is very high and allocation of such large memory is bound to fail.
    2.  The system variable innodb_buffer_pool_resize is immediately
        set to X, and then an event is generated to resize buffer pool.
    3.  The background thread wakes up with the event generated.  It
        proceeds to do buffer pool resize.
    4.  The background thread may fully or partially fail depending on the
        available memory in the machine.
    
    The memory allocation failure was considered as a fatal error.  Also,
    the system variable was not updated to the actual value of the innodb
    buffer pool.
    
    Solution:
    
    If memory allocation fails, it should not be considered as a fatal failure.
    And update the system variable to the actual value of the innodb buffer
    pool after doing resize.
    
    rb#9581 approved by Deb.
---

The important bit here is that after this commit the actual
innodb_buffer_pool_size value is calculated based on the allocated
memory and may differ from the user-provided one due to either various
alignment rules (OS page size, chunk size, etc.) or chunk allocation
failures.

A probably unexpected consequence of that change is that on
configurations or architectures where the OS page size is larger than
the InnoDB page size (non-x86 architectures, or x86 with large pages
enabled), a simple operation like "SET GLOBAL
innodb_buffer_pool_size=@@innodb_buffer_pool_size" results in a
different (and larger) variable value every time it is executed.
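
A toy model may help to see why setting the variable to its own value
does not round-trip. It is a sketch under stated assumptions, not the
server's algorithm: the chunk unit (1 MiB) and the per-chunk descriptor
overhead (32 KiB) are assumed values, the 2 MiB OS page matches the
Hugepagesize shown earlier in this report, and all names are
hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Round n up to a multiple of m (m > 0).
std::uint64_t round_up(std::uint64_t n, std::uint64_t m) {
  assert(m > 0);
  return (n + m - 1) / m * m;
}

// Toy model (an assumption, not the server's algorithm) of the value
// reported after "SET GLOBAL innodb_buffer_pool_size = requested":
//  1. the request is split into chunks of chunk_unit bytes;
//  2. each chunk is allocated with extra descriptor bytes and rounded
//     up to the OS page size;
//  3. the reported size is the usable (non-descriptor) memory, rounded
//     up to a multiple of chunk_unit.
std::uint64_t reported_after_set(std::uint64_t requested,
                                 std::uint64_t chunk_unit,
                                 std::uint64_t descriptor_bytes,
                                 std::uint64_t os_page) {
  const std::uint64_t n_chunks = round_up(requested, chunk_unit) / chunk_unit;
  const std::uint64_t allocated =
      round_up(chunk_unit + descriptor_bytes, os_page);
  return round_up(n_chunks * (allocated - descriptor_bytes), chunk_unit);
}
```

Under these assumptions, reported_after_set(25165824, 1 MiB, 32 KiB,
4 KiB) returns 25165824, so the value round-trips with small OS pages.
With a 2 MiB OS page the same call returns 50331648, and feeding that
result back in returns 99614720. The first step happens to match the
transcript in the description, which is suggestive but not a confirmed
explanation.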

That is generally not a big issue, but it complicates MTR test cases
that modify innodb_buffer_pool_size, because the usual logic ("remember
the current variable value before modifying, then modify, then restore
the old value to keep the MTR check happy") no longer works: the
user-provided and the actually reported values of
innodb_buffer_pool_size always differ.

That's why the following MTR tests fail on platforms with 64k page size
even with bug #79378 fixed:

sys_vars.innodb_buffer_pool_size_basic 
innodb.innodb_buffer_pool_resize_debug 
innodb_zip.cmp_drop_table

I suggest reverting that part of the fix for bug #21348684, i.e. always
keep innodb_buffer_pool_size as the user-provided value (with some
simple alignment rules), and expose the actually allocated value through
a status/INNODB_METRICS variable.

On further analysis, this code from buf_chunk_init() is what causes the
static-vs-dynamic and alignment inconsistencies with large pages:

	/* Reserve space for the block descriptors. */
	mem_size += ut_2pow_round((mem_size / UNIV_PAGE_SIZE) * (sizeof *block)
				  + (UNIV_PAGE_SIZE - 1), UNIV_PAGE_SIZE);

We align the total buffer pool size on many levels (to UNIV_PAGE_SIZE in
buf_chunk_init(), to OS page size in os_mem_alloc_large() and to
'srv_buf_pool_instances * srv_buf_pool_chunk_unit' in
buf_pool_size_align()). Unfortunately, all that alignment occurs after
reserving space for block descriptors in the above code, which breaks
alignment even if the original chunk size is properly aligned on all
levels. That in turn results in all kinds of inconsistencies
demonstrated in this bug report.
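
The arithmetic can be sketched in isolation. The block below mirrors the
quoted expression, assuming ut_2pow_round() rounds down to a multiple of
a power of two. descriptor_size stands in for sizeof(*block), whose real
value depends on the build, so the 400 bytes used in the note after the
code is purely an assumption. Function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Round n down to a multiple of m, where m is a power of two (the
// behavior assumed here for ut_2pow_round in the quoted snippet).
std::uint64_t round_down_pow2(std::uint64_t n, std::uint64_t m) {
  assert(m != 0 && (m & (m - 1)) == 0);
  return n & ~(m - 1);
}

// Chunk size after reserving space for block descriptors, following
// the quoted expression. descriptor_size is a stand-in (assumption)
// for sizeof(*block).
std::uint64_t with_descriptors(std::uint64_t mem_size,
                               std::uint64_t page_size,
                               std::uint64_t descriptor_size) {
  return mem_size +
         round_down_pow2((mem_size / page_size) * descriptor_size +
                             (page_size - 1),
                         page_size);
}

// Size requested from the OS once rounded up to the OS page size.
std::uint64_t os_allocation(std::uint64_t mem_size, std::uint64_t os_page) {
  return round_down_pow2(mem_size + (os_page - 1), os_page);
}
```

With a 1 MiB chunk, 16 KiB pages and the assumed 400-byte descriptor,
with_descriptors() yields 1081344 bytes. That is still a multiple of
4 KiB, so small OS pages add nothing, but rounding it up to a 2 MiB
large page gives 2097152 bytes, roughly double the chunk that was
requested.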

I wonder if it makes any sense to reserve extra bytes for block
descriptors. It would be more consistent to stay within the limits
passed by the caller to buf_chunk_init(). That is, make buf_chunk_init()
allocate a memory block of the original mem_size argument and share that
memory for both block frames and descriptors by removing the above code
from buf_chunk_init(). The remaining code in buf_chunk_init() can handle
it just fine: it adjusts the chunk size to make enough room for both
frames and descriptors.

That alone would solve the inconsistency problem for large OS pages and
test cases, at least with properly aligned buffer pool sizes. On the
other hand, it would mean slightly fewer buffer pool pages for the same
innodb_buffer_pool_size values. But then again, it would make
innodb_buffer_pool_size more "fair" and intuitive (no auto-adjustments
by the server for properly aligned sizes), which I think is good enough
for all intents and purposes.
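
To put a rough number on "slightly fewer", here is a simplified
comparison. It ignores aligning the first frame to a page boundary and
any OS-level rounding, the descriptor size is again an assumed stand-in
for sizeof(*block), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Frames that fit when frames and descriptors share one block of
// mem_size bytes (simplified: no frame alignment, no OS rounding).
std::uint64_t frames_within_budget(std::uint64_t mem_size,
                                   std::uint64_t page_size,
                                   std::uint64_t descriptor_size) {
  assert(page_size > 0);
  return mem_size / (page_size + descriptor_size);
}

// Frames when descriptor space is reserved on top of mem_size, so the
// whole of mem_size is available for frames (same simplifications).
std::uint64_t frames_with_extra_space(std::uint64_t mem_size,
                                      std::uint64_t page_size) {
  assert(page_size > 0);
  return mem_size / page_size;
}
```

For a 1 MiB chunk with 16 KiB pages and an assumed 400-byte descriptor,
this gives 62 frames within the budget versus 64 with the extra
reservation, i.e. a loss of about 3% under these assumptions.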

Contribution submitted via Github - Bug #79379: Inconsistent innodb_buffer_pool_size handling (*) Contribution by Alexey Kopytov (Github akopytov, mysql-server/pull/45#issuecomment-168211582): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_54827727.txt (text/plain), 2.23 KiB.

Contribution submitted via Github - MySQL bug #79379: Inconsistent innodb_buffer_pool_size handling (*) Contribution by Alexey Kopytov (Github akopytov, mysql-server/pull/207#issuecomment-392049842): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_190533119.txt (text/plain), 2.28 KiB.