Bug #63908 mysql-proxy with LUA script get stuck when the clients' concurrency too much
Submitted: 3 Jan 2012 4:28 Modified: 26 Jan 2012 3:04
Reporter: zhongliang kang Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Proxy Severity:S1 (Critical)
Version:0.8.2 OS:Linux (Suse10-2.6.16.60-0.21 )
Assigned to:
Tags: mysql-proxy concurrency stuck
Triage: Needs Triage: D2 (Serious)

[3 Jan 2012 4:28] zhongliang kang
Description:
basic info:
./mysql-proxy -V
mysql-proxy 0.8.2
  chassis: mysql-proxy 0.8.2
  glib2: 2.16.6
  libevent: 1.4.13-stable
  LUA: Lua 5.1.4
    package.path: /usr/local/mysql-proxy/lib/mysql-proxy/lua/?.lua;
    package.cpath: /usr/local/mysql-proxy/lib/mysql-proxy/lua/?.so;
-- modules
  proxy: 0.8.2
------------------------------------------------
testing server info:
CPU:  CPU frequency is 2.13 GHz * 4 (4 CPUs)
RAM: 32G
NIC: 1G

---------------basic info end-----------------------------
---------------lua script content------------
proxy.global.master_index = proxy.global.master_index or 1
function read_query(packet)
                proxy.connection.backend_ndx = proxy.global.master_index
return

----------------------lua script end------------------

-----------------the case-------------------------------
when I tried to test mysql-proxy without lua script,when the clients' concurrency  is 5000,each client got 1000 query, mysql-proxy can handled the task in 163 SEC, and the result is OK.

But when I tried to test mysql-proxy with LUA script,the content as above,when the clients' concurrency  is 5000(even 1000,in test), the mysql-proxy got stuck and didnot work any more. Then I repeated the test , the case repeated.

------------------describe the case-----------------------------
on the mysql-DB server,the phenomenon is like this: 
all threads are sleep.

.....
| 115128 | sky  | 10.164.9.218:49614 | test_took_sql | Sleep       |     69 |                | NULL             |
| 115129 | sky  | 10.164.9.218:49615 | test_took_sql | Sleep       |     69 |                | NULL             |
| 115130 | sky  | 10.164.9.218:49616 | test_took_sql | Sleep       |     69 |                | NULL             |
| 115131 | sky  | 10.164.9.218:49617 | test_took_sql | Sleep       |     69 |                | NULL             |
| 115132 | sky  | 10.164.9.218:49618 | test_took_sql | Sleep       |     69 |                | NULL             |
| 115133 | sky  | 10.164.9.218:49619 | test_took_sql | Sleep       |     69 |                | NULL             |
+--------+------+--------------------+---------------+-------------+--------+----------------+------------------+
5002 rows in set (0.01 sec)

On the mysql-proxy server,the phenomenon is like below,all proxy threads stuck at 'sendto(4,",",1,0,NULL,...':

mysql@NODE_1:/usr/local/mysql-proxy/bin> pstree -p 3437
mysql-proxy(3437)-+-{mysql-proxy}(3438)
                  |-{mysql-proxy}(3439)
                  |-{mysql-proxy}(3440)
                  `-{mysql-proxy}(3441)
mysql@NODE_1:/usr/local/mysql-proxy/bin> strace -p 3439
Process 3439 attached - interrupt to quit
sendto(4, ".", 1, 0, NULL, 0 <unfinished ...>
Process 3439 detached

On the client server,the mysql-proxy server cannot connect again,when you input the password,no response received(4406 is the port mysql-proxy listen):

mysql -h 10.164.9.218 -u sky -p -P 4406 
Enter password: 

How to repeat:
1.start the mysql-proxy
2.get 5000 clients connect the mysql-proxy ,but donot run the query
3.all 5000 clients query at the same time
4.the case repeat

Suggested fix:
I and my friends found that the stuck's position mybe is here:

"chassis-event-thread.c": 
     in function  void chassis_event_add(chassis *chas, struct event *ev) {
     send(chas->threads->event_notify_fds[1], C("."), 0); /* ping the event handler */  //!!!here,all thread were stuck here! 

and we test by add the follow test print verified it:
1.add some test message

        int m_pid=pthread_self();
        g_message("%d,bf sync\n",m_pid);
        g_async_queue_push(chas->threads->event_queue, op);

        g_message("%d,send start...\n",m_pid);
        send(chas->threads->event_notify_fds[1], C("."), 0); /* ping the event handler */
        g_message("%d,send end...\n",m_pid);

2.repeat the case,in the mysql-proxy.log ,when the proxy stuck,we got these info:

2012-01-03 11:37:21: (message) 1082132800,send start...

2012-01-03 11:37:21: (message) -1135401696,send end...

2012-01-03 11:37:21: (message) 1107310912,bf sync

2012-01-03 11:37:21: (message) -1135401696,bf sync

2012-01-03 11:37:21: (message) 1098918208,send end...

2012-01-03 11:37:21: (message) 1090525504,send start...

2012-01-03 11:37:21: (message) 1098918208,bf sync

2012-01-03 11:37:21: (message) 1098918208,send start...

2012-01-03 11:37:21: (message) 1107310912,send start...

2012-01-03 11:37:21: (message) -1135401696,send start...

there just many " send start " with no "send end" in the log.
and the mysql-proxy start info is like below:

2012-01-03 11:57:53: (message) mysql-proxy 0.8.2 started
2012-01-03 11:57:53: (debug) max open file-descriptors = 102400
2012-01-03 11:57:53: (message) proxy listening on port 0.0.0.0:4406
2012-01-03 11:57:53: (message) added read/write backend: 10.164.5.111:3306
2012-01-03 11:57:53: (message) added read/write backend: 10.164.5.108:3306
2012-01-03 11:57:53: (message) chassis-event-thread.c:383: starting 4 threads

-------------------------

so maybe there is something wrong in the  notification of the mysql-proxy.
///
send(chas->threads->event_notify_fds[1], C("."), 0);
///
[3 Jan 2012 4:44] zhongliang kang
and more:
we try to add a test print at :  chassis-event-thread.c
    166         unsigned int m_pid=pthread_self();
    167         if(received>0){
    168                 g_message(" %u now queue event num: %d\n",m_pid,received);
    169         }
    170         /* the pipe has one . per event, remove as many as we received */

and in the output of mysql-proxy.log ,we got the follow output:
my question is : if the proxy get stuck has some relation with the number of event in the queue?
-----------mysql-proxy.log----------------

2012-01-03 11:37:21: (message)  1107310912 now queue event num: 4
2012-01-03 11:37:21: (message)  1107310912 now queue event num: 1
2012-01-03 11:37:21: (message)  1107310912 now queue event num: 4
2012-01-03 11:37:21: (message)  1098918208 now queue event num: 46
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 15
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 17
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 15
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 14
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 16
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 16
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 17
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 17
2012-01-03 11:37:21: (message)  1107310912 now queue event num: 15
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 75
2012-01-03 11:37:21: (message)  1098918208 now queue event num: 40
2012-01-03 11:37:21: (message)  1098918208 now queue event num: 95
2012-01-03 11:37:21: (message)  1107310912 now queue event num: 32
2012-01-03 11:37:21: (message)  1107310912 now queue event num: 41
2012-01-03 11:37:21: (message)  3159565600 now queue event num: 29
2012-01-03 11:37:21: (message)  1098918208 now queue event num: 79
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 73
2012-01-03 11:37:21: (message)  3159565600 now queue event num: 11
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 73
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 68
2012-01-03 11:37:21: (message)  1090525504 now queue event num: 19
2012-01-03 11:37:21: (message)  1107310912 now queue event num: 113
2012-01-03 11:37:21: (message)  1090525504 now queue event num: 46
2012-01-03 11:37:21: (message)  3159565600 now queue event num: 7
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 6
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 54
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 49
2012-01-03 11:37:21: (message)  3159565600 now queue event num: 37
2012-01-03 11:37:21: (message)  1090525504 now queue event num: 153
2012-01-03 11:37:21: (message)  3159565600 now queue event num: 22
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 47
2012-01-03 11:37:21: (message)  3159565600 now queue event num: 20
2012-01-03 11:37:21: (message)  1107310912 now queue event num: 23
2012-01-03 11:37:21: (message)  3159565600 now queue event num: 39
2012-01-03 11:37:21: (message)  1107310912 now queue event num: 126
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 28
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 31
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 118
2012-01-03 11:37:21: (message)  1090525504 now queue event num: 73
2012-01-03 11:37:21: (message)  3159565600 now queue event num: 96
2012-01-03 11:37:21: (message)  1098918208 now queue event num: 3
2012-01-03 11:37:21: (message)  1098918208 now queue event num: 68
2012-01-03 11:37:21: (message)  1107310912 now queue event num: 83
2012-01-03 11:37:21: (message)  1090525504 now queue event num: 108
2012-01-03 11:37:21: (message)  1107310912 now queue event num: 20
2012-01-03 11:37:21: (message)  1090525504 now queue event num: 153
2012-01-03 11:37:21: (message)  3159565600 now queue event num: 120
2012-01-03 11:37:21: (message)  1107310912 now queue event num: 24
2012-01-03 11:37:21: (message)  1098918208 now queue event num: 68
2012-01-03 11:37:21: (message)  1107310912 now queue event num: 45
2012-01-03 11:37:21: (message)  1098918208 now queue event num: 68
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 55
2012-01-03 11:37:21: (message)  1107310912 now queue event num: 78
2012-01-03 11:37:21: (message)  1090525504 now queue event num: 49
2012-01-03 11:37:21: (message)  1090525504 now queue event num: 114
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 74
2012-01-03 11:37:21: (message)  1098918208 now queue event num: 69
2012-01-03 11:37:21: (message)  3159565600 now queue event num: 68
2012-01-03 11:37:21: (message)  1090525504 now queue event num: 218
2012-01-03 11:37:21: (message)  1098918208 now queue event num: 54
2012-01-03 11:37:21: (message)  3159565600 now queue event num: 43
2012-01-03 11:37:21: (message)  3159565600 now queue event num: 124
2012-01-03 11:37:21: (message)  1098918208 now queue event num: 132
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 112
2012-01-03 11:37:21: (message)  1107310912 now queue event num: 217
2012-01-03 11:37:21: (message)  3159565600 now queue event num: 71
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 64
2012-01-03 11:37:21: (message)  1090525504 now queue event num: 120
2012-01-03 11:37:21: (message)  1082132800 now queue event num: 178
2012-01-03 11:37:21: (message)  1098918208 now queue event num: 89
2012-01-03 11:37:21: (message)  1090525504 now queue event num: 213
2012-01-03 11:37:21: (message)  3159565600 now queue event num: 64
2012-01-03 11:37:21: (message)  1107310912 now queue event num: 112
2012-01-03 11:37:21: (message)  3159565600 now queue event num: 225
2012-01-03 11:37:21: (message)  1098918208 now queue event num: 88

---------------stuck here------------------
[3 Jan 2012 10:45] zhongliang kang
And one more question:

#event-threads
event-threads = 5

when the mysql-proxy get stuck, by strace all the mysql-proxy's threads,like bellow,All the threads are send the event,and dead here,that means no thread to receive the event? 

---------
mysql@NODE_2:/usr/local/mysql-proxy-release-sky/bin> pstree -p 12807
mysql-proxy(12807)-+-{mysql-proxy}(12808)
                   |-{mysql-proxy}(12809)
                   |-{mysql-proxy}(12810)
                   `-{mysql-proxy}(12811)
mysql@NODE_2:/usr/local/mysql-proxy-release-sky/bin> strace -p 12807
Process 12807 attached - interrupt to quit
sendto(4, ".", 1, 0, NULL, 0 <unfinished ...>
Process 12807 detached
mysql@NODE_2:/usr/local/mysql-proxy-release-sky/bin> strace -p 12808
Process 12808 attached - interrupt to quit
sendto(4, ".", 1, 0, NULL, 0 <unfinished ...>
Process 12808 detached
mysql@NODE_2:/usr/local/mysql-proxy-release-sky/bin> strace -p 12809
Process 12809 attached - interrupt to quit
sendto(4, ".", 1, 0, NULL, 0 <unfinished ...>
Process 12809 detached
mysql@NODE_2:/usr/local/mysql-proxy-release-sky/bin> strace -p 12810
Process 12810 attached - interrupt to quit
sendto(4, ".", 1, 0, NULL, 0 <unfinished ...>
Process 12810 detached
mysql@NODE_2:/usr/local/mysql-proxy-release-sky/bin> strace -p 12811
Process 12811 attached - interrupt to quit
sendto(4, ".", 1, 0, NULL, 0 <unfinished ...>
Process 12811 detached
------------
[14 Jan 2012 10:22] Sveta Smirnova
Thank you for the report.

You wrote:

----<q>----
when I tried to test mysql-proxy without lua script,when the clients' concurrency  is
5000,each client got 1000 query, mysql-proxy can handled the task in 163 SEC, and the
result is OK.

But when I tried to test mysql-proxy with LUA script,the content as above,when the
clients' concurrency  is 5000(even 1000,in test), the mysql-proxy got stuck and didnot
work any more. Then I repeated the test , the case repeated.
----</q>----

What is the difference between 2 cases? I see you mentioned same 5000 number of concurrent client.

Also have you tried to connect to MySQL server without proxy? Can it survive such a workload?
[16 Jan 2012 6:16] zhongliang kang
----<q>----
when I tried to test mysql-proxy without lua script,when the clients'
concurrency  is
5000,each client got 1000 query, mysql-proxy can handled the task in
163 SEC, and the
result is OK.

But when I tried to test mysql-proxy with LUA script,the content as
above,when the
clients' concurrency  is 5000(even 1000,in test), the mysql-proxy got
stuck and didnot
work any more. Then I repeated the test , the case repeated.
----</q>----

What is the difference between 2 cases? I see you mentioned same 5000
number of concurrent client.

Also have you tried to connect to MySQL server without proxy? Can it
survive such a workload?
=============
The difference is:  case 1 run with no lua script,but case 2 did.
When I tested at the same concurrency of 5000, the proxy with lua script got stuck(case 2),but case 1 (with no lua script) worked good.

And I tried the test with no proxy,5000 test clients conneted to mysql server directly,It was OK.

This case can repeat at up to 5000 client's concurrency.

I guess if this would happened when there is too much event generate by the proxy with lua script,and the proxy cannot handle them in time,so get  stuck?
[16 Jan 2012 18:44] Sveta Smirnova
Thank you for the feedback.

Got it. Please also send command you use to start MySQL Proxy, so I can imitate your test as best as possible.
[16 Jan 2012 18:49] zhongliang kang
OK. Thank you for your help :)  
the start command is below:

./mysql-proxy  --defaults-file=/etc/proxy_my.cnf

#cat /etc/proxy_my.cnf |grep -v "#"
[mysql-proxy]
log-file = /usr/local/mysql-proxy/log/mysql-proxy.log
log-level = debug
admin-address=0.0.0.0:4401
admin-username=proxy 
admin-password=proxy
admin-lua-script=/usr/local/mysql-proxy/lib/mysql-proxy/lua/admin.lua

proxy-address=0.0.0.0:4406
proxy-backend-addresses=10.164.5.111:3306

basedir=/usr/local/mysql-proxy

daemon = true

event-threads = 5
[25 Jan 2012 19:31] Sveta Smirnova
Thank you for the feedback.

Verified, although in my case when threads stuck depends from quantity of parallel threads and not only lua script. Also without lua script I only need 2750 threads to notice "stuck" while with lua script I need more.

Your Lua code contained error, I used following:

proxy.global.master_index = proxy.global.master_index or 1
function read_query(packet)
                proxy.connection.backend_ndx = proxy.global.master_index
end
[25 Jan 2012 19:32] Sveta Smirnova
program which creates threads, then sends query

Attachment: bug63908_1.cpp (text/x-c++src), 1.29 KiB.

[26 Jan 2012 3:04] zhongliang kang
Verified, although in my case when threads stuck depends from quantity of parallel
threads and not only lua script. Also without lua script I only need 2750 threads tonotice "stuck" while with lua script I need more.
---
Yes,I used mysqlslap,and when 5000 threads without lua, the proxy got stuck.
At first I didnot notice its a bug,so I ignored it :(

I'm sorry  about the lua script error,I'm a beginner to lua.

Thank you for your verified,and thank you for your work!

Hi Sveta :

One of my friends had fix the stuck problem by another way,we are not sure if that is ok,I'll ask him to share his changes later.