Bug #34399 MySQL hangs with connection pooling option enabled
Submitted: 7 Feb 2008 22:56 Modified: 24 Apr 2009 15:31
Reporter: luojia chen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S1 (Critical)
Version:6.0.5 OS:Any (Solaris Express Community Edition snv_81 SPARC, Linux)
Assigned to: Paul Dubois CPU Architecture:Any
Triage: Triaged: D1 (Critical) / R2 (Low) / E3 (Medium)

[7 Feb 2008 22:56] luojia chen
Description:
System:
HW: SunFire v880  
    8x750MHZ
    32GB RAM

Test workload: sysbench
MySQL is configured as bellow:
PARAM="--port=3306 \
--socket=/tmp/mysql.sock \
--user=root \
--datadir=$FSPATH \
--basedir=$BASEPATH \
--log-error=/usr/local/mysql/data/global_error.txt \
--max_connections=12000 \
--max_connect_errors=10 \
--table_cache=2048 \
--max_prepared_stmt_count=32764 \
--max_allowed_packet=1048576 \
--binlog_cache_size=1048576 \
--max_heap_table_size=67108864 \
--sort_buffer_size=65536 \
--join_buffer_size=1048576 \
--thread_cache=8 \
--thread_concurrency=16 \
--thread_stack=64K \
--query_cache_size=0 \
--query_cache_limit=2M \
--ft_min_word_len=4 \
--default_table_type=Innodb \
--transaction_isolation=REPEATABLE-READ \
--tmp_table_size=64M \
--skip-locking \
--server-id=1 \
--thread_handling=pool-of-threads \
--thread_pool_size=10 \
--innodb_data_home_dir=$FSPATH \
--innodb_data_file_path=ibdata1:100M:autoextend \
--innodb_log_group_home_dir=$FSPATH \
--innodb_buffer_pool_size=1G\
--innodb_additional_mem_pool_size=20M \
--innodb_log_file_size=500M \
--innodb_log_buffer_size=8M \
--innodb_flush_log_at_trx_commit=1 \
--innodb_lock_wait_timeout=300 \
--innodb_locks_unsafe_for_binlog=1 \
--innodb_max_dirty_pages_pct=90 \
--innodb_thread_concurrency=0" 

When run the read-write test, the mysql server can't be reached, and error message in the "/usr/local/mysql/data/global_error.txt" shows "port-dissociate: No such file or directory"

How to repeat:
1. Generate the load
Load 1M row database using the following command:
sysbench --test=oltp --oltp-table-size=1000000 --mysql-db=sbtest --max-requests=0 prepare

2. Run the Test
Running sysbench OLTP read-write test case with 1000 user connections using the following command:
sysbench --test=oltp --oltp-read-only=off --oltp-dist-type=special --num-threads=1000 --oltp-table-size=1000000 --mysql-db=sbtest --max-requests=0 --max-time=600 run

While running the sysbench with the above command, I couldn't reach the MySQL server from mysql client - while running #mysql, mysql> wasn't shown up)
[8 Feb 2008 7:22] Sveta Smirnova
Thank you for the report.

Please provide output of SHOW VARIABLES LIKE 'max_connections';
[8 Feb 2008 7:28] Sveta Smirnova
Please also provide output of `ulimit -a` and error log file.
[8 Feb 2008 19:09] luojia chen
#ulimit -a
fime(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 8192
coredump(blocks) unlimited
nofiles(descriptors) 256
memory(kbytes) unlimited

mysql>show variables like 'max_connections'
max_connections      12000
[8 Feb 2008 22:25] luojia chen
uploaded the error log file: bug-data-34399.tar.gz
[12 Feb 2008 16:35] Susanne Ebrecht
max_connections      12000
That's too much for Solaris.

http://dev.mysql.com/doc/refman/5.0/en/too-many-connections.html

Try to use not more then 1024.

It depends on the Solaris thread handling.
[13 Feb 2008 1:35] luojia chen
My system has 32G memory. While not setting thread_handling as "pool-of-threads", MySQL runs well without this problem (used the same settings(max-connections=12000)

If I reduced the "max-connections=1200", the same problem happened
[29 Feb 2008 19:29] luojia chen
The same problem was observed while reducing:
max_connections = 512

MySQL run well without such problem if not setting "thread_handling" as "pool-of-threads" in the same sysbench rw test:
sysbench --test=oltp --oltp-read-only=off --oltp-dist-type=special --num-thread=512 --oltp-table-size=1000000 --mysql-db=sbtest --max-requests=0 --max-time=600 run
[13 Mar 2008 21:31] Sveta Smirnova
Thank you for the feedback.

I can not repeat described behavior. Although nearest machine I found with SunOS sol10-sparc-c 5.10 Generic_118833-33 sun4u sparc SUNW,Sun-Fire-V245 Solaris
[2 May 2008 23:41] luojia chen
I can repeat the same problem on my 2-way Sun Blade 1000 server.

I have put my system(Sun Fire v880) with test app/scripts set up ready to access on Internet. If it is possible for you, would you check it my system. If so, I can send to you the login in seperate email
[5 May 2008 19:40] Sveta Smirnova
Jenny,

thank you for the feedback and offer.

If you could open connection to your server for time enough for me to verify the report (probably 1-3 working days, depend from other issues I am loaded with) and for developers to fix (unknown) it would be possible for us to use your server.
[5 May 2008 23:53] luojia chen
Hi, Sveta,

Please let me know your emails, so that I would send to you the login in seperate email.

Thanks
[6 May 2008 5:45] Sveta Smirnova
Jenny,

you can add comment to bug report privately. But I just sent email from my address, so you can use it if you wish.
[11 May 2008 8:34] Sveta Smirnova
Thank you for the provided information.

Verified as described on your machine.
[21 May 2008 7:19] Venu Anuganti
any update on this bug ? whether this will be fixed or what is the problem ?
[21 May 2008 7:20] Venu Anuganti
By the way I can see this behavior on RHEL too with simple mysqlslap
[2 Oct 2008 20:03] Sveta Smirnova
Bug #39822 reported on Linux was marked as duplicate of this one.
[20 Mar 2009 7:12] Sveta Smirnova
Bug #43753 reported on Linux was marked as duplicate of this one.
[20 Apr 2009 11:01] Magne Mæhre
This is not a bug per se, but a case for which the thread pool implementation will not work.

If you have a load that consists of transactions with multiple DDL operations, _and_ these set an exclusive lock (i.e INSERT/UPDATE), _and_ you have less threads available in the pool than the number of sessions that try to execute these transactions, you may run into this problem.

A simple example, using two sessions (S1 and S2), and one thread in the pool (T1) :

S1 :  BEGIN;
T1 : executes and returns
S1 :  UPDATE t SET a=1 WHERE id = 7;
T1 : executes and returns
S2 :  BEGIN
T1 : executes and returns
S2 :  UPDATE t SET a=1 WHERE id = 7;
T1 : executes, but runs into a lock an holds
S1 : COMMIT
T1 : still executing S2's UPDATE, so the COMMIT is queued
... (hang for 50 seconds) ...
T1 : Lock timeout for the S2's UPDATE, return with an error
S2 : Gets an error
T1 : Execute S1 COMMIT and return
S1 : get an OK.

The default lock timeout value for innodb is 50 seconds. 

Increasing the number of worker threads will reduce the probability of the problem, but you will need at least the same number of worker threads as the number of running sessions to avoid it completely.

I'm setting this report to Documenting, so the doc crew can assess if we need to clarify the documentation on this point.
[21 Apr 2009 15:57] Paul Dubois
The operation of the thread pool is described here:
http://dev.mysql.com/doc/refman/6.0/en/connection-threads.html

There is some discussion about conditions under which pooling will not work well, but I agree that it doesn't cover the situation described by Magne. I'll look into updating the docs.
[22 Apr 2009 18:45] Paul Dubois
Reassigning to myself as docs bug.
[24 Apr 2009 15:31] Paul Dubois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.

Addition to
http://dev.mysql.com/doc/refman/6.0/en/connection-threads.html:

 Thread pooling works best if the normal workload consists
 mostly of lightweight queries, with no more than a few
 simultaneous transactions or long-running statements.

 Thread pooling is less suitable when you have N threads in
 the pool but it is possible that all or most of the threads
 will be tied up servicing transactions or long-running
 statements simultaneously. If this happens, the workload from
 other clients will be stalled. Configuring the server to use
 a pool larger than the maximum number of simultaneous
 transactions or long-running statements will help in this
 case.

 The following examples illustrate when a too-small thread
 pool might cause problems. Assume a thread pool size of 1 for
 simplicity, but the same issues can occur for a larger pool
 and larger number of clients.

    * One client issues a statement that takes a long time to
     process. This ties up the service thread. If other
     clients issue statements, they are queued and must wait
     for the statement currently executing to finish.

    * One client acquires an exclusive lock on a table. If
     another client attempts to use the table, its statement
     blocks and the service thread remains occupied waiting
     for the block to end. Any further statements from the
     original client cannot execute, either, because there is
     no free thread available.

     In the following scenario, conflicting lock requests
     cause deadlock.

     Session 1:

LOCK TABLES t1 WRITE;

     The service thread executes the statement and becomes
     free.

     Session 2:

LOCK TABLES t1 WRITE;

     The service thread starts to execute the statement but
     blocks waiting for the exiting lock on t1 to be released.
     The thread does not become free.

     Session 1:

INSERT INTO t1 VALUES(1,2);

     The statement cannot execute and is queued because the
     service thread remains allocated to session 2. At this
     point, the clients are deadlocked and neither cannot
     continue.

     A similar situation can occur in the absence of explicit
     locks if simultaneous transactions attempt to access
     tables such that one transaction blocks another. Assume
     that t1 is an InnoDB table.

     Session 1:

START TRANSACTION;
SELECT * FROM t1 WHERE id = 7 FOR UPDATE;

     The service thread executes each of the statements in
     turn and becomes free. The SELECT
     statement acquires an exclusive lock for the selected row
     or rows.

     Session 2:

START TRANSACTION;
UPDATE t1 SET a = a+1 WHERE id = 7;

     The service thread executes the first statement and
     attempts to execute the second. However, the UPDATE
     blocks because it must lock the same rows already locked
     by session 1, so the service thread waits and does not
     become free.

     Session 1:

COMMIT;

     The statement cannot execute and is queued because the
     service thread remains allocated to the session 2
     statement. At this point, the transactions are
     deadlocked. Eventually, InnoDB times out waiting for the
     lock and the session 2 UPDATE fails with an error. The
     service thread becomes free and executes the COMMIT for
     session 1.