Bug #88071 An arresting Performance degradation when set sort_buffer_size=32M
Submitted: 12 Oct 8:17 Modified: 22 Nov 12:55
Reporter: ashe sun Email Updates:
Status: Need Feedback Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.7.19 OS:Any
Assigned to:

[12 Oct 8:17] ashe sun
Description:
An arresting Performance degradation when I set sort_buffer_size=32M with sysbench oltp tests.

sort_buffer_size=31M
threads: 256, tps: 7332.99, reads: 102331.92, writes: 0.00, response time: 41.90ms (95%), errors: 0.00, reconnects:  0.00

sort_buffer_size=32M
threads: 256, tps: 4139.01, reads: 58050.08, writes: 0.00, response time: 86.00ms (95%), errors: 0.00, reconnects:  0.00

And then I fund a critical value,it is 33550440/33550439

sort_buffer_size=33550439
threads: 256, tps: 7410.05, reads: 103828.67, writes: 0.00, response time: 40.17ms (95%), errors: 0.00, reconnects:  0.00

sort_buffer_size=33550440;
threads: 256, tps: 4276.00, reads: 59815.01, writes: 0.00, response time: 82.00ms (95%), errors: 0.00, reconnects:  0.00

the perf-tools show me that when I set sort_buffer_size>=33550440,the mysqld hit a lot of page_fault.

How to repeat:
Just set sort_buffer_size >= 33550440,and then run sysbench oltp test,read only or not.
[12 Oct 8:20] ashe sun
perf-sgv of different sort_buffer_size,you can open it with google browser

Attachment: sort_buffer_size_33550439.svg (image/svg+xml, text), 215.58 KiB.

[12 Oct 8:21] ashe sun
sort_buffer_size_33550439

Attachment: sort_buffer_size_33550440.svg (image/svg+xml, text), 346.81 KiB.

[8 Nov 12:17] Sinisa Milivojevic
Hi!

Thank you for your bug report.

First of all, sort buffer is a session buffer, so if all 256 threads require sorting, you will spend exactly 8 Gb only on the sort buffer. So, if you have more then 10 or 12 Gb RAM, you do not have that problem.

If RAM is not a problem for you, then send us the exact command, which we shall try to repeat on our test. computer with 1 Tb of RAM.
[18 Nov 10:08] Alexey Kopytov
I'm fairly sure it is a modern version of the problem that has been
described and discussed before:

- https://mysqlha.wordpress.com/2007/09/06/read-buffer-performance-hit/
- https://mysqlha.wordpress.com/2007/09/10/more-on-malloc-speed/

It's just that the way malloc() works has somewhat changed since
2007. Quoting http://man7.org/linux/man-pages/man3/mallopt.3.html:

"
       M_MMAP_THRESHOLD
              For allocations greater than or equal to the limit specified
              (in bytes) by M_MMAP_THRESHOLD that can't be satisfied from
              the free list, the memory-allocation functions employ mmap(2)
              instead of increasing the program break using sbrk(2).

              Allocating memory using mmap(2) has the significant advantage
              that the allocated memory blocks can always be independently
              released back to the system.  (By contrast, the heap can be
              trimmed only if memory is freed at the top end.)  On the other
              hand, there are some disadvantages to the use of mmap(2):
              deallocated space is not placed on the free list for reuse by
              later allocations; memory may be wasted because mmap(2)
              allocations must be page-aligned; and the kernel must perform
              the expensive task of zeroing out memory allocated via
              mmap(2).  Balancing these factors leads to a default setting
              of 128*1024 for the M_MMAP_THRESHOLD parameter.

              The lower limit for this parameter is 0.  The upper limit is
              DEFAULT_MMAP_THRESHOLD_MAX: 512*1024 on 32-bit systems or
              4*1024*1024*sizeof(long) on 64-bit systems.

              Note: Nowadays, glibc uses a dynamic mmap threshold by
              default.  The initial value of the threshold is 128*1024, but
              when blocks larger than the current threshold and less than or
              equal to DEFAULT_MMAP_THRESHOLD_MAX are freed, the threshold
              is adjusted upward to the size of the freed block.  When
              dynamic mmap thresholding is in effect, the threshold for
              trimming the heap is also dynamically adjusted to be twice the
              dynamic mmap threshold.  Dynamic adjustment of the mmap
              threshold is disabled if any of the M_TRIM_THRESHOLD,
              M_TOP_PAD, M_MMAP_THRESHOLD, or M_MMAP_MAX parameters is set.
"

So the default threshold between sbrk() and mmap() used to be 128K. On
modern systems a dynamic threshold is used with
DEFAULT_MMAP_THRESHOLD_MAX as the maximum value. With
DEFAULT_MMAP_THRESHOLD_MAX being 32M by default on 64-bit systems.

I would recommend experimenting with malloc() parameters mentioned in
the man page and the test program from the second blog post.
[18 Nov 13:20] Alexey Kopytov
What probably makes the mmap() code path in malloc() even slower with
high concurrency is a long-known mmap() scalability problem in
the Linux kernel. Basically, mmap() calls from multiple threads in the
same process are serialized: https://lkml.org/lkml/2013/1/2/299
[20 Nov 12:47] Sinisa Milivojevic
Kaamos,

Thank you. Still if you have X Gb of RAM on your machine and allocate X * 1.5 Gb, then you have to expect the slowdown .....
[20 Nov 21:42] Mark Callaghan
more results at http://smalldatum.blogspot.com/2017/11/concurrent-large-allocations-glibc.html
[21 Nov 1:38] ashe sun
Hi,Sinisa Milivojevic:
      My computer is 256GB RAM.There is  still 211 GB RAM  free  when I run this test sort_buffer_size=33550440.
[21 Nov 1:44] ashe sun
Hi,Sinisa Milivojevic:

My test command:
 ./sysbench --test=./db/oltp.lua  --oltp-table-size=160000000 --oltp-tables-count=1 --mysql-db=sysbench --mysql-host=100.110.0.9  --mysql-user=sysbench --mysql-password=****** --mysql-port=13308 --num-threads=256 --max-time=60 --mysql-table-engine=INNODB --db-driver=mysql --report-interval=1 --max-requests=10000000  run

It is just a general oltp test. 

mysql> show create table sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=160113831 DEFAULT CHARSET=utf8 MAX_ROWS=1000000
1 row in set (0.00 sec)
[21 Nov 14:29] Sinisa Milivojevic
Hi!

If you have read very valuable comments by experienced developers in this very bug, this turns out to be a problem with GNU malloc, which is shipped with GLIBC.

So, please try using jemalloc or tcmalloc and see if things improve.  Unfortunately, I do not have access to the LInux box where I could install these Malloc libraries .....
[22 Nov 7:58] ashe sun
Hello,Sinisa Milivojevic
      I have installed  tcmalloc libraries, and use it with mysql. It is not useful.
[22 Nov 12:55] Sinisa Milivojevic
Hi!

Please check three issues:

* you might have tcmalloc installed, but you have to instruct dynamic library linker that it should link with it and not with GNU malloc within glibc

* have you tried jemalloc as well ??

* how many cores do you have ??