Bug #45274 Proxy hangs connections during Service Manager restart
Submitted: 2 Jun 2009 19:27 Modified: 8 Jun 2009 19:32
Reporter: Diego Medina
Status: Closed
Category: MySQL Enterprise Monitor: Agent Severity: S2 (Serious)
Version: 2.1.0.1048 OS: Any
Assigned to: Jan Kneschke CPU Architecture: Any

[2 Jun 2009 19:27] Diego Medina
Description:
While the agent tries to get the quanconfig information from the Service Manager, it blocks all new connections through the proxy port.

This situation can happen when you restart the Service Manager.

How to repeat:
1- Install and start the agent and service manager. You need to monitor a mysql server that is *not* the mysql server that comes with the service manager (otherwise you would need a few extra steps).
2- On a new terminal, run this:

for i in `seq 1 100000`; do echo "Connection number:  $i";          mysql -h127.0.0.1 -P4040 -uusername -pwrongpassword    ; done;

(If you use the right password, add -e "exit;" to the mysql command so that each client exits immediately instead of staying in an interactive session.)

3- On another terminal restart the service manager
4- Go back to the terminal where you are trying all the new connections
5- Notice that at some point the counter stops increasing. This is when the proxy blocks all new connections.
[2 Jun 2009 20:02] Diego Medina
For Mac users who do not have seq, you can use bash brace expansion instead:

for i in {1..100000};
do echo "Connection number:  $i";
   mysql -h127.0.0.1 -P4040 -uusername -pwrongpassword;
done;
[2 Jun 2009 20:31] Sloan Childers
per bug council, setting priority
[3 Jun 2009 14:54] Jan Kneschke
(gdb) thread apply all bt

Thread 5 (process 22594 thread 0x2703):
#0  0x93a336fa in select$DARWIN_EXTSN ()
#1  0x004cab08 in Curl_socket_ready ()
#2  0x004c1b1a in Curl_perform ()
#3  0x006bad07 in luacurl_easy_perform (L=0x307e80) at lua-curl-0.3.0/lua-curl.c:579
#4  0x00037263 in luaD_precall ()
#5  0x0004278e in luaV_execute ()
#6  0x000376a0 in luaD_call ()
#7  0x00033101 in f_call ()
#8  0x00036b7b in luaD_rawrunprotected ()
#9  0x000379c2 in luaD_pcall ()
#10 0x00033175 in lua_pcall ()
#11 0x00408927 in agent_dc_lua_update_values_iter (_key=0x33cb60, _value=0x33b940, _conf=0x307e80) at job_collect_lua.c:322
#12 0x001148c6 in g_hash_table_foreach () at gstring.h:153
#13 0x00408deb in agent_dc_lua_update_values (class=0x0, _userdata=0x0) at job_collect_lua.c:367
#14 0x00403d46 in job_collect_get_value (ns=0x30e800, target=0x37d4d0, conf=0x32b6b0, data=0x3d18b0) at job_collect.c:81
#15 0x0040a9e0 in job_collect_lua_thread (_thr=0x0) at job_collect_lua.c:1015
#16 0x0015198e in g_thread_create_proxy ()
#17 0x93a15155 in _pthread_start ()
#18 0x93a15012 in thread_start ()

Thread 2 (process 22594 thread 0x1103):
#0  0x93a336fa in select$DARWIN_EXTSN ()
#1  0x004cab08 in Curl_socket_ready ()
#2  0x004c1b1a in Curl_perform ()
#3  0x00414e96 in network_io_send_curl (request_content=0x3b09f0, io_config=0x324ec0) at network-io.c:192
#4  0x00416b7e in network_io_thread (_thr=0x0) at network-io.c:999
#5  0x0015198e in g_thread_create_proxy ()
#6  0x93a15155 in _pthread_start ()
#7  0x93a15012 in thread_start ()

Thread 1 (process 22594 local thread 0x2d03):
#0  0x939e42ce in semaphore_wait_signal_trap ()
#1  0x939ebda5 in pthread_mutex_lock ()
#2  0x0002f2f0 in lua_scope_get (sc=0x307d70, pos=0x6b765 "network-mysqld.c:688") at lua-scope.c:93
#3  0x0005d5c0 in plugin_call (srv=0x307d20, con=0x33bc90, state=10) at network-mysqld.c:688
#4  0x0005e313 in network_mysqld_con_handle (event_fd=8, events=2, user_data=0x33bc90) at network-mysqld.c:1174
#5  0x0007c013 in event_base_loop ()
#6  0x0007c2d9 in event_base_dispatch ()
#7  0x000314ee in chassis_mainloop (_chas=0x307d20) at chassis-mainloop.c:306
#8  0x0000323f in main_cmdline (argc=1, argv=0xbffff684) at chassis.c:1122
#9  0x00001a66 in start ()
[3 Jun 2009 15:19] Jan Kneschke
The above stack trace shows the locking problem if the MEM server runs its repo-queries through the QUAN proxy:

* Thread 5 is the lua collector which uses curl to get the quanconfig from MEM.
* Thread 2 is the heartbeat thread trying to send data to MEM.
* Thread 1 is the proxy thread waiting for the lua lock held by Thread 5

All the lua execution is single-threaded. Thread 5 has the lock, thread 1 wants to have it before it can forward the query to the DB. 

Thread 5 only releases the lua lock once MEM has sent back data, but MEM waits until Thread 1 has returned data from MySQL. Neither side can make progress, so the proxy hangs.
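
The circular wait described above can be modelled in a few lines. This is only an illustrative Python sketch, not the agent's C code: the name simulate_circular_wait, the two events, and the timeouts are assumptions made for the model. The timeouts exist only so the model terminates; in the agent there is no timeout and both threads block indefinitely.

```python
import threading

def simulate_circular_wait(timeout=0.2):
    """Model of the hang: collector holds the lock while waiting on MEM,
    MEM waits on the proxy, the proxy waits on the lock."""
    lua_lock = threading.Lock()               # stands in for the single lua lock
    collector_holds_lock = threading.Event()  # sequencing helper for the model
    mysql_replied = threading.Event()         # set once the proxy forwards MEM's query
    result = {}

    def collector():
        # "Thread 5": takes the lock, then performs the request to MEM.
        with lua_lock:
            collector_holds_lock.set()
            # MEM only answers after its own query through the proxy returns.
            # Wait longer than the proxy so the outcome is deterministic.
            result["collector_got_config"] = mysql_replied.wait(2 * timeout)

    def proxy():
        # "Thread 1": needs the lock before it can forward MEM's query.
        collector_holds_lock.wait()
        got_lock = lua_lock.acquire(timeout=timeout)
        result["proxy_got_lock"] = got_lock
        if got_lock:
            mysql_replied.set()
            lua_lock.release()

    threads = [threading.Thread(target=collector), threading.Thread(target=proxy)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

if __name__ == "__main__":
    print(simulate_circular_wait())
```

In this model both entries of the returned dict come back False: the proxy never obtains the lock, and the collector never receives its config.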

2 ways to fix it: 
1) make the lua execution multi-threaded (see MySQL Proxy 0.9)
2) run the curl request in C-land and only grab the lua lock when we want to update the data.
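
Option 2 can be sketched as follows. Again this is a hedged Python model rather than the actual C change: fetch_quanconfig, update_quanconfig, and lua_state are hypothetical names, and comparing the fetched config with the stored one is just one assumed way to decide whether there is anything to apply.

```python
import threading

lua_lock = threading.Lock()          # stands in for the single lua lock
lua_state = {"quanconfig": None}     # hypothetical stand-in for the lua context

def fetch_quanconfig():
    """Hypothetical placeholder for the slow HTTP request to MEM."""
    return {"enabled": True}

def update_quanconfig(fetch=fetch_quanconfig):
    # Slow network part runs without the lock, so the proxy thread can
    # keep taking the lock and forwarding queries in the meantime.
    new_config = fetch()

    # Assumption: "something to apply" means the config exists and changed.
    if new_config is None or new_config == lua_state["quanconfig"]:
        return False                 # lock is never taken in this case

    # Short critical section: only the in-memory update holds the lock.
    with lua_lock:
        lua_state["quanconfig"] = new_config
    return True

if __name__ == "__main__":
    print(update_quanconfig())       # first call applies the config
    print(update_quanconfig())       # second call finds nothing new
```

The point of the design is that the lock is held only for the in-memory update, so a slow or restarting Service Manager can delay the collector but can no longer stall connections going through the proxy.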
[8 Jun 2009 14:00] Jan Kneschke
------------------------------------------------------------
revno: 1353
committer: jan@mysql.com
branch nick: trunk
timestamp: Mon 2009-06-08 14:16:14 +0200
message:
  fixed link/compile of the job_collect_mysql tests for quanconfig
------------------------------------------------------------
revno: 1352
committer: jan@mysql.com
branch nick: trunk
timestamp: Mon 2009-06-08 14:06:14 +0200
message:
  split the get-quanconfig into a c-based curl part and a lua based apply-config part
  
    * only take the LUA_LOCK if we really have something to apply to the lua context
    * moved the curl part out into the mysql collector
[8 Jun 2009 19:32] Diego Medina
Verified fixed on 2.1.0.1059