Bug #64150 performance drops with glibc 2.13 when threads are created
Submitted: 27 Jan 2012 17:46 Modified: 26 Mar 2012 18:09
Reporter: Mark Callaghan
Status: Can't repeat
Category: MySQL Server: DML   Severity: S5 (Performance)
Version: 5.1.52   OS: Any
Assigned to:   CPU Architecture: Any
Tags: glibc, performance

[27 Jan 2012 17:46] Mark Callaghan
Description:
See https://www.facebook.com/note.php?note_id=10150494400690933 for the full story. Performance degrades once many threads have been created and destroyed. The workaround is to set thread_cache_size to a large value so that threads are reused rather than repeatedly created and destroyed. MySQL can guarantee this won't be a problem by allocating thread stacks with malloc and passing them to pthread_create.

How to repeat:
Described in the Facebook note linked above.

Suggested fix:
Allocate thread stacks using malloc before calling pthread_create.

Consider linking with jemalloc.
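
A minimal sketch of the suggested fix, not MySQL code: the caller allocates the stack and hands it to pthread_create through pthread_attr_setstack, so glibc does not allocate one itself. The names run_on_heap_stack, sketch_worker and SKETCH_STACK_SIZE are hypothetical, and the 256 KiB size is an arbitrary choice for illustration. posix_memalign stands in for plain malloc here so the stack is page-aligned. Whether this actually avoids mmap depends on the allocator in use, since an allocator may itself serve large requests with mmap.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stack size for illustration; not taken from MySQL. */
#define SKETCH_STACK_SIZE (256 * 1024)

/* Stand-in for real per-connection work: stores 42 through arg. */
static void *sketch_worker(void *arg)
{
    int *result = arg;
    *result = 42;
    return NULL;
}

/* Runs fn(arg) on a thread whose stack is supplied by the caller's
   allocator rather than allocated inside pthread_create().
   Waits for the thread to finish, then frees the stack.
   Returns 0 on success or an errno-style error code. */
int run_on_heap_stack(void *(*fn)(void *), void *arg, size_t stack_size)
{
    void *stack = NULL;
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return EINVAL;

    /* Page-aligned allocation from the process allocator. */
    rc = posix_memalign(&stack, (size_t)page, stack_size);
    if (rc != 0)
        return rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0) {
        free(stack);
        return rc;
    }

    /* Tell pthread_create() to use our buffer as the thread stack. */
    rc = pthread_attr_setstack(&attr, stack, stack_size);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        free(stack);
        return rc;
    }

    /* The stack may only be released once the thread has terminated. */
    rc = pthread_join(tid, NULL);
    free(stack);
    return rc;
}
```

A caller would invoke it as run_on_heap_stack(sketch_worker, &result, SKETCH_STACK_SIZE) and check for a zero return. Note that a caller-supplied stack gets no guard page from the threading library, so a stack overflow is not trapped the way it is with a default stack.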
[27 Jan 2012 22:33] Davi Arnaut
Isn't malloc going to use mmap anyway?
[27 Jan 2012 22:44] Davi Arnaut
Also, I'm not sure about the relation between jemalloc and the thread stack. When you use jemalloc, somehow glibc does not use mmap anymore?
[13 Mar 2012 20:01] Mark Callaghan
Reproduction can be done with one sysbench table. Save the script below as script.sh and invoke it as:
  while :; do bash script.sh 128 1 10 ; done

-------

# a test table with 1 row is sufficient

nt=$1   # number of sysbench client threads
np=$2   # number of sysbench runs per invocation
s=$3    # duration of each run in seconds

for i in $( seq 1 "$np" ); do
  /data/sysbench.new --batch --batch-delay=5 --test=oltp --mysql-db=test \
      --oltp-table-size=1 --max-time="$s" --max-requests=0 \
      --mysql-table-engine=innodb --db-ps-mode=disable \
      --mysql-engine-trx=yes --oltp-table-name=sbtest1 --oltp-read-only \
      --oltp-skip-trx --oltp-test-mode=simple --oltp-point-select-all-cols \
      --oltp-dist-type=uniform --oltp-range-size=100 \
      --num-threads="$nt" --seed-rng="$i" run > o.1

  # o.1 is overwritten by every run, so report the result inside the loop
  grep transactions: o.1
done

----

The command line requires my branch of sysbench which has the benefit of doing per-interval performance metrics -- https://code.launchpad.net/~mdcallag/sysbench/0.4-dev

----

Example output when there isn't a problem:

# while :; do bash b.sh 128 1 10 ; done
[1331697100]     transactions:                        723153 (72296.59 per sec.)
[1331697112]     transactions:                        737465 (73724.95 per sec.)
[1331697123]     transactions:                        716812 (71662.84 per sec.)
[1331697135]     transactions:                        723451 (72324.46 per sec.)
[1331697146]     transactions:                        745423 (74515.82 per sec.)
[1331697158]     transactions:                        735637 (73545.78 per sec.)
[1331697170]     transactions:                        732437 (73224.99 per sec.)
[1331697181]     transactions:                        713139 (71296.18 per sec.)
[1331697193]     transactions:                        712114 (71192.17 per sec.)
[1331697204]     transactions:                        710616 (71042.40 per sec.)
[1331697216]     transactions:                        717644 (71742.54 per sec.)
[1331697228]     transactions:                        718576 (71835.66 per sec.)
[1331697239]     transactions:                        721236 (72101.95 per sec.)
[1331697251]     transactions:                        718568 (71836.64 per sec.)
[1331697262]     transactions:                        746296 (74609.48 per sec.)
[1331697274]     transactions:                        718190 (71799.78 per sec.)
[1331697286]     transactions:                        720597 (72040.34 per sec.)

And an example where the problem occurs. It takes about 5000 thread create/destroy operations before the slowdown appears:

[1331697551]     transactions:                        1499999 (149966.41 per sec.)
[1331697562]     transactions:                        1528138 (152777.01 per sec.)
[1331697574]     transactions:                        1372559 (137229.92 per sec.)
[1331697585]     transactions:                        1298904 (129859.70 per sec.)
[1331697597]     transactions:                        408541 (40843.22 per sec.)
[1331697609]     transactions:                        457317 (45717.58 per sec.)
[1331697620]     transactions:                        368105 (36796.29 per sec.)
[1331697632]     transactions:                        355782 (35568.13 per sec.)
[13 Mar 2012 20:02] Mark Callaghan
my.cnf during the test

[mysqld]
innodb_buffer_pool_size=16G
innodb_log_file_size=1900M
innodb_flush_log_at_trx_commit=1
innodb_doublewrite=1
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=0
innodb_max_dirty_pages_pct=80

innodb_file_format=barracuda
innodb_file_per_table

max_connections=2000
table_cache=2000

key_buffer_size=200M
innodb_io_capacity=1000

log_bin
sync_binlog=1

query_cache_size=0
query_cache_type=0
innodb_thread_concurrency=0
thread_cache_size=0
[26 Mar 2012 18:09] Sveta Smirnova
We could not reproduce this on our side after weeks of testing, so closing as "Can't repeat".