| Bug #63908 | mysql-proxy with LUA script get stuck when the clients' concurrency too much | ||
|---|---|---|---|
| Submitted: | 3 Jan 2012 4:28 | Modified: | 26 Jan 2012 3:04 |
| Reporter: | zhongliang kang | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Proxy | Severity: | S1 (Critical) |
| Version: | 0.8.2 | OS: | Linux (Suse10-2.6.16.60-0.21 ) |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | mysql-proxy concurrency stuck | ||
[3 Jan 2012 4:44]
zhongliang kang
and more:
we try to add a test print at : chassis-event-thread.c
166 unsigned int m_pid=pthread_self();
167 if(received>0){
168 g_message(" %u now queue event num: %d\n",m_pid,received);
169 }
170 /* the pipe has one . per event, remove as many as we received */
and in the output of mysql-proxy.log ,we got the follow output:
my question is : if the proxy get stuck has some relation with the number of event in the queue?
-----------mysql-proxy.log----------------
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 4
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 1
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 4
2012-01-03 11:37:21: (message) 1098918208 now queue event num: 46
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 15
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 17
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 15
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 14
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 16
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 16
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 17
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 17
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 15
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 75
2012-01-03 11:37:21: (message) 1098918208 now queue event num: 40
2012-01-03 11:37:21: (message) 1098918208 now queue event num: 95
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 32
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 41
2012-01-03 11:37:21: (message) 3159565600 now queue event num: 29
2012-01-03 11:37:21: (message) 1098918208 now queue event num: 79
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 73
2012-01-03 11:37:21: (message) 3159565600 now queue event num: 11
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 73
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 68
2012-01-03 11:37:21: (message) 1090525504 now queue event num: 19
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 113
2012-01-03 11:37:21: (message) 1090525504 now queue event num: 46
2012-01-03 11:37:21: (message) 3159565600 now queue event num: 7
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 6
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 54
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 49
2012-01-03 11:37:21: (message) 3159565600 now queue event num: 37
2012-01-03 11:37:21: (message) 1090525504 now queue event num: 153
2012-01-03 11:37:21: (message) 3159565600 now queue event num: 22
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 47
2012-01-03 11:37:21: (message) 3159565600 now queue event num: 20
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 23
2012-01-03 11:37:21: (message) 3159565600 now queue event num: 39
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 126
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 28
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 31
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 118
2012-01-03 11:37:21: (message) 1090525504 now queue event num: 73
2012-01-03 11:37:21: (message) 3159565600 now queue event num: 96
2012-01-03 11:37:21: (message) 1098918208 now queue event num: 3
2012-01-03 11:37:21: (message) 1098918208 now queue event num: 68
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 83
2012-01-03 11:37:21: (message) 1090525504 now queue event num: 108
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 20
2012-01-03 11:37:21: (message) 1090525504 now queue event num: 153
2012-01-03 11:37:21: (message) 3159565600 now queue event num: 120
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 24
2012-01-03 11:37:21: (message) 1098918208 now queue event num: 68
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 45
2012-01-03 11:37:21: (message) 1098918208 now queue event num: 68
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 55
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 78
2012-01-03 11:37:21: (message) 1090525504 now queue event num: 49
2012-01-03 11:37:21: (message) 1090525504 now queue event num: 114
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 74
2012-01-03 11:37:21: (message) 1098918208 now queue event num: 69
2012-01-03 11:37:21: (message) 3159565600 now queue event num: 68
2012-01-03 11:37:21: (message) 1090525504 now queue event num: 218
2012-01-03 11:37:21: (message) 1098918208 now queue event num: 54
2012-01-03 11:37:21: (message) 3159565600 now queue event num: 43
2012-01-03 11:37:21: (message) 3159565600 now queue event num: 124
2012-01-03 11:37:21: (message) 1098918208 now queue event num: 132
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 112
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 217
2012-01-03 11:37:21: (message) 3159565600 now queue event num: 71
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 64
2012-01-03 11:37:21: (message) 1090525504 now queue event num: 120
2012-01-03 11:37:21: (message) 1082132800 now queue event num: 178
2012-01-03 11:37:21: (message) 1098918208 now queue event num: 89
2012-01-03 11:37:21: (message) 1090525504 now queue event num: 213
2012-01-03 11:37:21: (message) 3159565600 now queue event num: 64
2012-01-03 11:37:21: (message) 1107310912 now queue event num: 112
2012-01-03 11:37:21: (message) 3159565600 now queue event num: 225
2012-01-03 11:37:21: (message) 1098918208 now queue event num: 88
---------------stuck here------------------
[3 Jan 2012 10:45]
zhongliang kang
And one more question:
#event-threads
event-threads = 5
when the mysql-proxy get stuck, by strace all the mysql-proxy's threads,like bellow,All the threads are send the event,and dead here,that means no thread to receive the event?
---------
mysql@NODE_2:/usr/local/mysql-proxy-release-sky/bin> pstree -p 12807
mysql-proxy(12807)-+-{mysql-proxy}(12808)
|-{mysql-proxy}(12809)
|-{mysql-proxy}(12810)
`-{mysql-proxy}(12811)
mysql@NODE_2:/usr/local/mysql-proxy-release-sky/bin> strace -p 12807
Process 12807 attached - interrupt to quit
sendto(4, ".", 1, 0, NULL, 0 <unfinished ...>
Process 12807 detached
mysql@NODE_2:/usr/local/mysql-proxy-release-sky/bin> strace -p 12808
Process 12808 attached - interrupt to quit
sendto(4, ".", 1, 0, NULL, 0 <unfinished ...>
Process 12808 detached
mysql@NODE_2:/usr/local/mysql-proxy-release-sky/bin> strace -p 12809
Process 12809 attached - interrupt to quit
sendto(4, ".", 1, 0, NULL, 0 <unfinished ...>
Process 12809 detached
mysql@NODE_2:/usr/local/mysql-proxy-release-sky/bin> strace -p 12810
Process 12810 attached - interrupt to quit
sendto(4, ".", 1, 0, NULL, 0 <unfinished ...>
Process 12810 detached
mysql@NODE_2:/usr/local/mysql-proxy-release-sky/bin> strace -p 12811
Process 12811 attached - interrupt to quit
sendto(4, ".", 1, 0, NULL, 0 <unfinished ...>
Process 12811 detached
------------
[14 Jan 2012 10:22]
Sveta Smirnova
Thank you for the report. You wrote: ----<q>---- when I tried to test mysql-proxy without lua script,when the clients' concurrency is 5000,each client got 1000 query, mysql-proxy can handled the task in 163 SEC, and the result is OK. But when I tried to test mysql-proxy with LUA script,the content as above,when the clients' concurrency is 5000(even 1000,in test), the mysql-proxy got stuck and didnot work any more. Then I repeated the test , the case repeated. ----</q>---- What is the difference between 2 cases? I see you mentioned same 5000 number of concurrent client. Also have you tried to connect to MySQL server without proxy? Can it survive such a workload?
[16 Jan 2012 6:16]
zhongliang kang
----<q>---- when I tried to test mysql-proxy without lua script,when the clients' concurrency is 5000,each client got 1000 query, mysql-proxy can handled the task in 163 SEC, and the result is OK. But when I tried to test mysql-proxy with LUA script,the content as above,when the clients' concurrency is 5000(even 1000,in test), the mysql-proxy got stuck and didnot work any more. Then I repeated the test , the case repeated. ----</q>---- What is the difference between 2 cases? I see you mentioned same 5000 number of concurrent client. Also have you tried to connect to MySQL server without proxy? Can it survive such a workload? ============= The difference is: case 1 run with no lua script,but case 2 did. When I tested at the same concurrency of 5000, the proxy with lua script got stuck(case 2),but case 1 (with no lua script) worked good. And I tried the test with no proxy,5000 test clients conneted to mysql server directly,It was OK. This case can repeat at up to 5000 client's concurrency. I guess if this would happened when there is too much event generate by the proxy with lua script,and the proxy cannot handle them in time,so get stuck?
[16 Jan 2012 18:44]
Sveta Smirnova
Thank you for the feedback. Got it. Please also send command you use to start MySQL Proxy, so I can imitate your test as best as possible.
[16 Jan 2012 18:49]
zhongliang kang
OK. Thank you for your help :) the start command is below: ./mysql-proxy --defaults-file=/etc/proxy_my.cnf #cat /etc/proxy_my.cnf |grep -v "#" [mysql-proxy] log-file = /usr/local/mysql-proxy/log/mysql-proxy.log log-level = debug admin-address=0.0.0.0:4401 admin-username=proxy admin-password=proxy admin-lua-script=/usr/local/mysql-proxy/lib/mysql-proxy/lua/admin.lua proxy-address=0.0.0.0:4406 proxy-backend-addresses=10.164.5.111:3306 basedir=/usr/local/mysql-proxy daemon = true event-threads = 5
[25 Jan 2012 19:31]
Sveta Smirnova
Thank you for the feedback.
Verified, although in my case when threads stuck depends from quantity of parallel threads and not only lua script. Also without lua script I only need 2750 threads to notice "stuck" while with lua script I need more.
Your Lua code contained error, I used following:
proxy.global.master_index = proxy.global.master_index or 1
function read_query(packet)
proxy.connection.backend_ndx = proxy.global.master_index
end
[25 Jan 2012 19:32]
Sveta Smirnova
program which creates threads, then sends query
Attachment: bug63908_1.cpp (text/x-c++src), 1.29 KiB.
[26 Jan 2012 3:04]
zhongliang kang
Verified, although in my case when threads stuck depends from quantity of parallel threads and not only lua script. Also without lua script I only need 2750 threads tonotice "stuck" while with lua script I need more. --- Yes,I used mysqlslap,and when 5000 threads without lua, the proxy got stuck. At first I didnot notice its a bug,so I ignored it :( I'm sorry about the lua script error,I'm a beginner to lua. Thank you for your verified,and thank you for your work! Hi Sveta : One of my friends had fix the stuck problem by another way,we are not sure if that is ok,I'll ask him to share his changes later.

Description: basic info: ./mysql-proxy -V mysql-proxy 0.8.2 chassis: mysql-proxy 0.8.2 glib2: 2.16.6 libevent: 1.4.13-stable LUA: Lua 5.1.4 package.path: /usr/local/mysql-proxy/lib/mysql-proxy/lua/?.lua; package.cpath: /usr/local/mysql-proxy/lib/mysql-proxy/lua/?.so; -- modules proxy: 0.8.2 ------------------------------------------------ testing server info: CPU: CPU frequency is 2.13 GHz * 4 (4 CPUs) RAM: 32G NIC: 1G ---------------basic info end----------------------------- ---------------lua script content------------ proxy.global.master_index = proxy.global.master_index or 1 function read_query(packet) proxy.connection.backend_ndx = proxy.global.master_index return ----------------------lua script end------------------ -----------------the case------------------------------- when I tried to test mysql-proxy without lua script,when the clients' concurrency is 5000,each client got 1000 query, mysql-proxy can handled the task in 163 SEC, and the result is OK. But when I tried to test mysql-proxy with LUA script,the content as above,when the clients' concurrency is 5000(even 1000,in test), the mysql-proxy got stuck and didnot work any more. Then I repeated the test , the case repeated. ------------------describe the case----------------------------- on the mysql-DB server,the phenomenon is like this: all threads are sleep. ..... | 115128 | sky | 10.164.9.218:49614 | test_took_sql | Sleep | 69 | | NULL | | 115129 | sky | 10.164.9.218:49615 | test_took_sql | Sleep | 69 | | NULL | | 115130 | sky | 10.164.9.218:49616 | test_took_sql | Sleep | 69 | | NULL | | 115131 | sky | 10.164.9.218:49617 | test_took_sql | Sleep | 69 | | NULL | | 115132 | sky | 10.164.9.218:49618 | test_took_sql | Sleep | 69 | | NULL | | 115133 | sky | 10.164.9.218:49619 | test_took_sql | Sleep | 69 | | NULL | +--------+------+--------------------+---------------+-------------+--------+----------------+------------------+ 5002 rows in set (0.01 sec) On the mysql-proxy server,the phenomenon is like below,all proxy threads stuck at 'sendto(4,",",1,0,NULL,...': mysql@NODE_1:/usr/local/mysql-proxy/bin> pstree -p 3437 mysql-proxy(3437)-+-{mysql-proxy}(3438) |-{mysql-proxy}(3439) |-{mysql-proxy}(3440) `-{mysql-proxy}(3441) mysql@NODE_1:/usr/local/mysql-proxy/bin> strace -p 3439 Process 3439 attached - interrupt to quit sendto(4, ".", 1, 0, NULL, 0 <unfinished ...> Process 3439 detached On the client server,the mysql-proxy server cannot connect again,when you input the password,no response received(4406 is the port mysql-proxy listen): mysql -h 10.164.9.218 -u sky -p -P 4406 Enter password: How to repeat: 1.start the mysql-proxy 2.get 5000 clients connect the mysql-proxy ,but donot run the query 3.all 5000 clients query at the same time 4.the case repeat Suggested fix: I and my friends found that the stuck's position mybe is here: "chassis-event-thread.c": in function void chassis_event_add(chassis *chas, struct event *ev) { send(chas->threads->event_notify_fds[1], C("."), 0); /* ping the event handler */ //!!!here,all thread were stuck here! and we test by add the follow test print verified it: 1.add some test message int m_pid=pthread_self(); g_message("%d,bf sync\n",m_pid); g_async_queue_push(chas->threads->event_queue, op); g_message("%d,send start...\n",m_pid); send(chas->threads->event_notify_fds[1], C("."), 0); /* ping the event handler */ g_message("%d,send end...\n",m_pid); 2.repeat the case,in the mysql-proxy.log ,when the proxy stuck,we got these info: 2012-01-03 11:37:21: (message) 1082132800,send start... 2012-01-03 11:37:21: (message) -1135401696,send end... 2012-01-03 11:37:21: (message) 1107310912,bf sync 2012-01-03 11:37:21: (message) -1135401696,bf sync 2012-01-03 11:37:21: (message) 1098918208,send end... 2012-01-03 11:37:21: (message) 1090525504,send start... 2012-01-03 11:37:21: (message) 1098918208,bf sync 2012-01-03 11:37:21: (message) 1098918208,send start... 2012-01-03 11:37:21: (message) 1107310912,send start... 2012-01-03 11:37:21: (message) -1135401696,send start... there just many " send start " with no "send end" in the log. and the mysql-proxy start info is like below: 2012-01-03 11:57:53: (message) mysql-proxy 0.8.2 started 2012-01-03 11:57:53: (debug) max open file-descriptors = 102400 2012-01-03 11:57:53: (message) proxy listening on port 0.0.0.0:4406 2012-01-03 11:57:53: (message) added read/write backend: 10.164.5.111:3306 2012-01-03 11:57:53: (message) added read/write backend: 10.164.5.108:3306 2012-01-03 11:57:53: (message) chassis-event-thread.c:383: starting 4 threads ------------------------- so maybe there is something wrong in the notification of the mysql-proxy. /// send(chas->threads->event_notify_fds[1], C("."), 0); ///