Bug #68067 The vio_read_buff (libmysqlclient.so.18) will block in multi-thread program
Submitted: 10 Jan 2013 9:32 Modified: 8 Jan 2014 8:46
Reporter: Nan Xiao Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: C API (client library) Severity:S4 (Feature request)
Version: OS:Solaris
Assigned to: CPU Architecture:Any
Tags: block, unixODBC, vio_read

[10 Jan 2013 9:32] Nan Xiao
Description:
Hi, all:

    My multi-thread Client application uses unixODBC Connection pool to access MySQL Server, and found in some cases: when MySQL Server doesn't send response(for example, The MySQL server's disk is full), the thread will block for ever.

    OS: Solaris 10
    The unixodbc version: 2.3.10
    The MySQL Connector/ODBC version: 5.1.10
    libmysqlclient version: libmysqlclient.so.18

    Some threads' call stack:
   -----------------  lwp# 2 / thread# 2  --------------------
 ff0cc400 read     (1f, 713640, 4000)
 fe9b5130 vio_read_buff (0, 734550, 4, 0, 734550, fe9b50d8) + 58
 fe9a7190 _Z12my_real_readP6st_netPj (724d80, fedfa954, 734555, 734555, 0, ffffffff) + 70
 fe9a7564 my_net_read (724d80, 0, 0, 0, 0, 0) + 168
 fe9a1644 cli_safe_read (724d80, e, 0, 0, 0, 0) + 60
 fe9a3b8c cli_advanced_command (724d80, e, 0, 0, 0, 0) + 1b8
 fe99a7f0 mysql_ping (724d80, 0, 0, 0, 0, 4b9) + 28
 fecbf21c MySQLGetConnectAttr (724d78, 4b9, fedfabc8, fedfacc4, 0, 0) + 1b4
 fec9d66c SQLGetConnectAttrImpl (724d78, 4b9, fedfacc4, 0, 0, ff1f3798) + 38
 fec9d614 SQLGetConnectAttr (724d78, 4b9, fedfacc4, 0, 0, 2) + 2c
 ff1911fc search_for_pool (723578, eb7e8, fffffffd, 1000, 72a5d8, 717b40) + 4d8
 ff192054 SQLConnect (723578, eb7e8, fffffffd, eb828, fffffffd, fffffffd) + 744
 000a3bc0 db_connect (c5c80, eb7e8, fedfbf80, fffffff8, fedfbf80, eb7e8) + 260
 000a2ccc db_monitor_thread (633388, fedfc000, 0, 0, a2bc0, 0) + 10c
 ff0c8a20 _lwp_start (0, 0, 0, 0, 0, 0)
......
-----------------  lwp# 9 / thread# 9  --------------------
 ff0cc400 read     (6, 6ec1a0, 4000)
 fe9b5130 vio_read_buff (0, 6bcae0, 4, fffffffb, 6bcae0, fe9b50d8) + 58
 fe9a7190 _Z12my_real_readP6st_netPj (6cffb0, fe17a8b4, 93, 0, 0, ffffffff) + 70
 fe9a7564 my_net_read (6cffb0, 3, 0, 0, 650e70, 93) + 168
 fe9a1644 cli_safe_read (6cffb0, 3, 0, 0, 650e70, 93) + 60
 fe9a20c8 cli_read_query_result (6cffb0, 650e70, 93, 1c00, ff0da310, fe9a20b8) + 10
 fe9a234c mysql_real_query (6cffb0, 650e70, 93, 0, 6d06d8, fecd3578) + 2c
 fecb54e4 do_query (6e7e10, 650e70, 0, 0, 0, 0) + 1f0
 fecb8504 my_SQLExecute (6e7e10, fe17aef8, fffffffd, 0, 4800, 4b5b) + 950
 fec9d194 SQLExecDirect (6e7e10, fe17aef8, fffffffd, 6ec114, 0, 1) + 60
 ff197f94 SQLExecDirect (6c0418, fe17aef8, fffffffd, 1e, fee23a00, fe17ae28) + 380
 000a468c db_execute_for_query (bea68, fe17ae28, fe17aee0, 6b8078, 1e, 4) + 408
 0007e470 select_record_from_db (bea68, 1, fe17aef8, 6b8078, 1e, fe17aee0) + 21c
 00081320 ss7_db_thread_process_get_retry_msg_from_db_event (663688, 0, 46, bea68, bea78, 0) + 414
 00080e2c thread_ss7_process_db (633540, fe17c000, 0, 0, 80cf8, 0) + 134
 ff0c8a20 _lwp_start (0, 0, 0, 0, 0, 0)
......
-----------------  lwp# 23 / thread# 23  --------------------
 ff0cc400 read     (1e, 71b480, 4000)
 fe9b5130 vio_read_buff (0, 7188a8, 4, 29, 7188a8, fe9b50d8) + 58
 fe9a7190 _Z12my_real_readP6st_netPj (71ad20, fd37a5b4, 186, 0, 0, ffffffff) + 70
 fe9a7564 my_net_read (71ad20, 3, 0, 0, 6ae958, 186) + 168
 fe9a1644 cli_safe_read (71ad20, 3, 0, 0, 6ae958, 186) + 60
 fe9a20c8 cli_read_query_result (71ad20, 6ae958, 186, 1c00, ff0da310, fe9a20b8) + 10
 fe9a234c mysql_real_query (71ad20, 6ae958, 186, 0, 71b448, fecd3578) + 2c
 fecb54e4 do_query (725b70, 6ae958, 0, 0, 0, 725b70) + 1f0
 fecb8504 my_SQLExecute (725b70, fd37abb0, fffffffd, 0, 4800, 4b5b) + 950
 fec9d194 SQLExecDirect (725b70, fd37abb0, fffffffd, 0, 0, 1) + 60
 ff197f94 SQLExecDirect (725588, fd37abb0, fffffffd, 0, fee2aa00, fd37aa98) + 380
 000a41e4 db_execute_for_modify (720e88, fd37abb0, c20b0, 62d028, be138, ffffee58) + 2e8
 0008ab9c store_ss7_msg_in_database (720e88, fd37be50, 28, 720e88, b7d58, 4) + a98
 0007d4c8 store_queue_message_in_database (720e88, fd37be50, 721ab4, 721a8c, 64bcf8, fd37be50) + cc
 00079ff4 normal_submit_deliver_logic_for_ip_to_ss7_storeforward (720dc0, 6cfe97, 0, 0, 6cfe97, 0) + 18c
 00044920 action_ii_on_receive_tcp_submit_ind (720dc0, 6aa190, 17, 0, 6aa190, 720dc0) + 6fc
 0003d86c thread_system_worker (633770, fd37c000, 0, 0, 3d6a4, 0) + 1c8
 ff0c8a20 _lwp_start (0, 0, 0, 0, 0, 0)
......
-----------------  lwp# 33 / thread# 33  --------------------
 ff0cc400 read     (1b, 6aa630, 4000)
 fe9b5130 vio_read_buff (0, 6fa078, 4, 29, 6fa078, fe9b50d8) + 58
 fe9a7190 _Z12my_real_readP6st_netPj (64fdb0, fc97a5b4, 186, 0, 0, ffffffff) + 70
 fe9a7564 my_net_read (64fdb0, 3, 0, 0, 651200, 186) + 168
 fe9a1644 cli_safe_read (64fdb0, 3, 0, 0, 651200, 186) + 60
 fe9a20c8 cli_read_query_result (64fdb0, 651200, 186, 1c00, ff0da310, fe9a20b8) + 10
 fe9a234c mysql_real_query (64fdb0, 651200, 186, 0, 6504d8, fecd3578) + 2c
 fecb54e4 do_query (6c2b00, 651200, 0, 0, 0, 6c2b00) + 1f0
 fecb8504 my_SQLExecute (6c2b00, fc97abb0, fffffffd, 0, 4800, 4b5b) + 950
 fec9d194 SQLExecDirect (6c2b00, fc97abb0, fffffffd, 0, 0, 1) + 60
 ff197f94 SQLExecDirect (6fc088, fc97abb0, fffffffd, 0, fee2fa00, fc97aa98) + 380
 000a41e4 db_execute_for_modify (6e5f98, fc97abb0, c20b0, 62d028, be138, ffffee58) + 2e8
 0008ab9c store_ss7_msg_in_database (6e5f98, fc97be50, 28, 6e5f98, b7d58, 4) + a98
 0007d4c8 store_queue_message_in_database (6e5f98, fc97be50, 6e6bc4, 6e6b9c, 632a28, fc97be50) + cc
 00079ff4 normal_submit_deliver_logic_for_ip_to_ss7_storeforward (6e5ed0, 6bb0d7, 0, 0, 6bb0d7, 0) + 18c
 00044920 action_ii_on_receive_tcp_submit_ind (6e5ed0, 649d78, 17, 0, 649d78, 6e5ed0) + 6fc
 0003d86c thread_system_worker (633900, fc97c000, 0, 0, 3d6a4, 0) + 1c8
 ff0c8a20 _lwp_start (0, 0, 0, 0, 0, 0)

How to repeat:
When MySQL server's disk is full, the client thread will block for ever.
[11 Jan 2013 3:09] Bogdan Degtyariov
Are you using the release version of MySQL ODBC driver or you built it yourself?
Please note that under no circumstances the driver should use libmysqlclient.so.18 because the ODBC driver is statically linked against the client library. This means that the client library code is embedded in the driver binary file and ldd should not show this dependance at all.
[11 Jan 2013 3:26] Bogdan Degtyariov
You can also try setting READTIMEOUT=N where N is the number of seconds to wait before returning the timeout error.
This option can be set either in the connection string or in odbc.ini file in the appropriate data source section.
[11 Jan 2013 3:37] Nan Xiao
Hi, Bogdan:
   Thanks very much for your reply!
   I will test READTIMEOUT=N method.
Best Regards
Nan Xiao
[11 Jan 2013 6:06] Nan Xiao
Hi, Bogdan:
    I add the following set for my DSN in odbc.ini:
   [DSN_Name]
    ......
    readtimeout = 10
    writetimeout = 10
    But this seems doen't work, the thread is still blocking. So Is there any other reason? Thanks very much in advance!

Best Regards
Nan Xiao
[15 Jan 2013 7:31] Bogdan Degtyariov
Can you please answer my question about libmysqlclient.so.18 dependance?
[15 Jan 2013 7:50] Nan Xiao
Hi, Bogdan:
    (1) Q: Are you using the release version of MySQL ODBC driver or you built it yourself?
         A: I build it myself.

    I configure driver in odbcinst.ini, and odbcinst.ini likes this:
   root@192.168.23.236 # cat odbcinst.ini 
[ODBC]
Trace=No
Trace File=/tmp/sql.log
Pooling=Yes

[MySQL]
Description=MySQL test database
Driver=/data1/susie/tools/mysql-connector-odbc-5.1.10-src/lib/libmyodbc5.so
UsageCount=2
FileUsage=1
DontDLClose=1
CPTimeout=30
option=3145731

Then ldd /data1/susie/tools/mysql-connector-odbc-5.1.10-src/lib/libmyodbc5.so, it outputs:
root@192.168.23.236 # ldd /data1/susie/tools/mysql-connector-odbc-5.1.10-src/lib/libmyodbc5.so
        libodbc.so.2 =>  /usr/lib/libodbc.so.2
        libodbcinst.so.2 =>      /usr/lib/libodbcinst.so.2
        libmysqlclient.so.18 =>  /usr/local/sql/lib/libmysqlclient.so.18
        libthread.so.1 =>        /usr/lib/libthread.so.1
        libm.so.2 =>     /usr/lib/libm.so.2
        libgcc_s.so.1 =>         /usr/lib/libgcc_s.so.1
        libltdl.so.7 =>  /usr/local/lib/libltdl.so.7
        libiconv.so.2 =>         /usr/local/lib/libiconv.so.2
        libc.so.1 =>     /usr/lib/libc.so.1
        libz.so =>       /usr/lib/libz.so
        libnsl.so.1 =>   /usr/lib/libnsl.so.1
        librt.so.1 =>    /usr/lib/librt.so.1
        libsocket.so.1 =>        /usr/lib/libsocket.so.1
        libmp.so.2 =>    /usr/lib/libmp.so.2
        libmd.so.1 =>    /usr/lib/libmd.so.1
        libscf.so.1 =>   /usr/lib/libscf.so.1
        libaio.so.1 =>   /usr/lib/libaio.so.1
        libdoor.so.1 =>  /usr/lib/libdoor.so.1
        libuutil.so.1 =>         /usr/lib/libuutil.so.1
        libgen.so.1 =>   /usr/lib/libgen.so.1
        /platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1
        /platform/SUNW,Sun-Fire-V240/lib/libmd_psr.so.1

It displays /usr/local/sql/lib/libmysqlclient.so.18 is linked.

Best Regards
Nan Xiao
[15 Jan 2013 8:50] Bogdan Degtyariov
Thanks for your reply Nan.
As I already mentioned before linking against libmysqlclient.so.18 is not the right way for MySQL driver. This might be the reason why you are getting the application thread blocked.

On Connector/ODBC download page you can get different packages for Solaris on x86 or Sparc platforms. Please try one of them and tell me if the problem is resolved.
[15 Jan 2013 9:57] Nan Xiao
Hi, Bogdan:
   Thanks very much for your reply!
   From http://www.mysql.com/downloads/connector/odbc/#downloads page, I find there is no package for sparc 32bit platform. My application is 32bit program. So I think I also need to compile it myself. Is it right?

Best Regards
Nan Xiao
[16 Jan 2013 8:50] Bogdan Degtyariov
Hello Nan,

Unfortunately, we do not ship 32-bit SPARC driver because we MySQL Server is not built for 32-bit SPARC. So, I guess you have to build both the mysql client library and the ODBC driver.

In order to link  the ODBC driver properly against the static library libmysqlclient_r.a you need to remove libmysqlclient.so libraries.
This is the default behavior of ld, which first looks for the shared library and if it fails then links the static library.

Note that we do not recommend linking 5.1 driver against libmysqlclient older than 5.5.

Please try to re-build the drive rand let me know how it worked for you.
[16 Jan 2013 9:00] Nan Xiao
Hi, Bogdan:
    Thanks very much for your patient reply, and you are a nice man!
    I will try your proposal ASAP, and reply to you.

Best Regards
Nan Xiao
[17 Jan 2013 5:29] Nan Xiao
Hi, Bogdan:
    I re-build a MySQL client library, which only produced static library. But when I try to build ODBC driver, there are a lot of link errors.
    The steps of building ODBC driver:
    (1) cmake -G "Unix Makefiles" -DWITH_UNIXODBC=1 -DCMAKE_BUILD_TYPE=Debug;
    (2) make;
     The result is:
     ......
     __umoddi3                           0xb90       /usr/local/asg_mysql/lib/libmysqlclient_r.a(ctype-ucs2.c.o)
__umoddi3                           0x1130      /usr/local/asg_mysql/lib/libmysqlclient_r.a(ctype-ucs2.c.o)
ld: fatal: relocations remain against allocatable but non-writable sections
collect2: ld returned 1 exit status
make[3]: *** [lib/libmyodbc5.so] Error 1
make[2]: *** [driver/CMakeFiles/myodbc5.dir/all] Error 2
make[1]: *** [driver/CMakeFiles/myodbc5.dir/rule] Error 2
make: *** [myodbc5] Error 2

     It seems ODBC driver can't link static library correctly.
     Could you please check it? Thanks very much in advance!

Best Regards
Nan Xiao
[18 Jan 2013 6:39] Bogdan Degtyariov
Hi Nan, 

It seems you have encountered the Solaris-specific problem with ld relocations.
Please run <mysql_server_dir>/bin/mysql_config --cflags and check whether it prints -fPIC among other compiler flags.

If it is not present you need to rebuild MySQL server with -fPIC.
[21 Jan 2013 3:09] Nan Xiao
Hi, Bogdan:

    With your great help, I built a odbc driver with static client library.
    But with my test, the thread is still blocking. So could you provide some other debug methods (for example, analyze some logs) to help analyze this issue? Thanks in advace!

Best Regards
Nan Xiao
[21 Jan 2013 8:18] Bogdan Degtyariov
Nan, can you please give me the stack trace where the block occurs like you did in the initial description? Does it look the same?
[21 Jan 2013 10:56] Nan Xiao
Hi, Bogdan:

   Yes, it seems like the original stack, the threads are blocking in read() function call. Thanks!

Best Regards
Nan Xiao
[22 Jan 2013 3:15] Nan Xiao
Hi, Bogdan:

    After checking MySQL 5.5.21 source code, I found the following clues for this issue:

    In my_net_set_read_timeout function:
    {
      ......
      #ifdef NO_ALARM
      if (net->vio)
          vio_timeout(net->vio, 0, timeout);
      #endif
      ......
    }
    Because I only build MySQL client, so I think the NO_ALARM is defined.

    In vio_timeout function:
    {
     #if defined(SO_SNDTIMEO) && defined(SO_RCVTIMEO)
     ......
     #else
     /*
     Platforms not suporting setting of socket timeout should either use
     thr_alarm or just run without read/write timeout(s)
     */
     #endif
    }

    Because SO_RCVTIMEO and SO_SNDTIMEO socket options don't work in Solaris, I think this is the root cause.

    Is this right? BTW, what's the differences between linking ODBC driver with static client library and dynamic client libaray? Thanks in advance!

Best Regards
Nan Xiao
[23 Jan 2013 6:42] Bogdan Degtyariov
Hi Nan,

You are right, mysql client library code sets the socket timeout using SO_RCVTIMEO option, which is not supported in Solaris. I checked for possible workarounds in the Internet, but all of them are not reliable enough. Looks like it has to be fixed in Solaris itself.

About your question with static linking. It matters a very great deal which way of linking you choose for ODBC driver because the driver cannot use just any MySQL client library. For instance MySQL ODBC Driver 5.1 requires MySQL Client library 5.5. The static linking makes sure that the required client library code is embedded into the ODBC driver. In case of the dynamic linking using the wrong version of libmysqlclient.so could cause bad side effects.

I am afraid there is nothing we can do about this bug, so setting the status "not a bug". Please let me know if you have any other ideas or questions.
[23 Jan 2013 8:41] Nan Xiao
Hi, Bogdan:

    Thanks very much for your kind reply!
    Personally, I think writetimeout is optional, but readtimeout is must. Because lack of readtimeout will cause read() block for ever.
    I think we can use special processing for Soalris OS in MySQL code:
    (1) Modify struct st_vio (Violite.h):
        struct st_vio
        {
           ......
	   /* Add the following members */
           #ifdef __sun 
           struct timeval read_timeout;
           struct timeval write_timeout;
           #endif
        }

    (2) Modify vio_read function (using select mechanism to check read timeout):
        size_t vio_read(Vio * vio, uchar* buf, size_t size)
	{
	  size_t r;
	#ifdef __sun
	  fd_set readfds;
	#endif
	  DBUG_ENTER("vio_read");
	  DBUG_PRINT("enter", ("sd: %d  buf: 0x%lx  size: %u", vio->sd, (long) buf,
	                       (uint) size));
	
	  /* Ensure nobody uses vio_read_buff and vio_read simultaneously */
	  DBUG_ASSERT(vio->read_end == vio->read_pos);
	#ifdef __WIN__
	  r = recv(vio->sd, buf, size,0);
	#else
	  errno=0;					/* For linux */
	#ifdef __sun
	  FD_ZERO(&readfds);
	  FD_SET(vio->sd,&readfds);
	  r = select(vio->sd+1,&readfds,NULL,NULL,&vio->read_timeout);
	  if (r > 0)
	  {
	    r = read(vio->sd, buf, size);
	  }
	#else
	  r = read(vio->sd, buf, size);
	#endif
	#endif /* __WIN__ */
	#ifndef DBUG_OFF
	  if (r == (size_t) -1)
	  {
	    DBUG_PRINT("vio_error", ("Got error %d during read",errno));
	  }
	#endif /* DBUG_OFF */
	  DBUG_PRINT("exit", ("%ld", (long) r));
	  DBUG_RETURN(r);
	}
	
      (3) Modify vio_timeout function (store timeout value in read_timeout or write_timeout):
        void vio_timeout(Vio *vio, uint which, uint timeout)
	{
	#if defined(SO_SNDTIMEO) && defined(SO_RCVTIMEO)
	  int r;
	  DBUG_ENTER("vio_timeout");
	
	  {
	#ifdef __WIN__
	  /* Windows expects time in milliseconds as int */
	  int wait_timeout= (int) timeout * 1000;
	#else
	  /* POSIX specifies time as struct timeval. */
	  struct timeval wait_timeout;
	  wait_timeout.tv_sec= timeout;
	  wait_timeout.tv_usec= 0;
	#endif
	#ifdef __sun
	  if (which)
	  {
	    vio->write_timeout = wait_timeout;
	  }
	  else
	  {
	    vio->read_timeout = wait_timeout;
	  }
	  r = 0;
	#else
	  r= setsockopt(vio->sd, SOL_SOCKET, which ? SO_SNDTIMEO : SO_RCVTIMEO,
	                IF_WIN((const char*), (const void*))&wait_timeout,
	                sizeof(wait_timeout));
	#endif
	  }
	
	  if (r != 0)
	    DBUG_PRINT("error", ("setsockopt failed: %d, errno: %d", r, socket_errno));
	  DBUG_VOID_RETURN;
	#else
	/*
	  Platforms not suporting setting of socket timeout should either use
	  thr_alarm or just run without read/write timeout(s)
	*/
	#endif
	}
	
Best Regards
Nan Xiao
[23 Jan 2013 9:59] Bogdan Degtyariov
Nan, 

thank you for your deep analysis of the problem and for proposing the patch.

Unfortunately, I see a problem there: putting select() into a heavily-used function such as vio_read() might seriously affect the performance. 
select() handles file descriptors one by one, so the more descriptors it checks the slower it gets. 

Creating such bottle neck inside a performance-critical piece of code is not something our developers can agree with.
[23 Jan 2013 10:19] Nan Xiao
Hi, Bogdan:

    Thanks for your reply!
    About the bottle neck issue, I have the following concerns:
    (1) The select mechaism only works in Solaris, no impact on other OSs;
    (2) There is only one socket need to be checked, so I think the performance is OK.
    (3) If not use select(), other mechanism (/dev/poll, kqueue) is OK?
    
Best Regards
Nan Xiao
[23 Jan 2013 11:03] Bogdan Degtyariov
Nan,
Here is a good article that compares select and poll approaches:
"... poll and select are basically the same speed-wise: slow... "
http://daniel.haxx.se/docs/poll-vs-select.html

Anyway, this report category has to be changed from "MySQL Connector/ODBC" to "MySQL Server C API". It will be taken by an appropriate person from the server development team.
[24 Jan 2013 9:27] Nan Xiao
Hi, Bogdan:

    Thanks very much for your reply!
    Is this issue will be automatically assigned to appropriate engineer? Or need I submit a new issue? Thanks very much!

Best Regards
Nan Xiao
[29 Jan 2013 4:57] Bogdan Degtyariov
Nan, 

this bug is verified, so I have copied it into our internal bug system where it gets assigned to a team leader who decides about the proper engineer to take care of this bug. The public report will be updated when the patch is ready.
Thanks for your patience.
[29 Jan 2013 5:06] Nan Xiao
Hi, Bogdan:

    Thanks very much for your reply and help!
    You can close the issue now.

Best Regards
Nan Xiao
[7 Mar 2013 13:42] Rafal Somla
Our current implementation of read timeouts uses platform specific mechanisms and can not work on some platforms such as Solaris. In that case OS read call on socket blocks and this is what was observed in this bug report.

We treat it as a feature request to extend implementation of read timeouts so that it covers other platforms too.
[26 Mar 2013 21:42] Paul DuBois
Added this to the discussion of MYSQL_OPT_READ_TIMEOUT in
http://dev.mysql.com/doc/refman/5.5/en/mysql-options.html:

"
Implementation of this timeout uses mechanisms that may not be 
available on all platforms. On such a platform, a client that issues
a read call might under certain circumstances might wait without 
timing out. For example, a client might no time out if the server is
not responding because it is waiting for a "disk full" condition to
clear. The timeout is known not to work on Solaris. 
"
[8 Jan 2014 8:46] Erlend Dahl
Setting to "verified" since we will discontinue the use of "to be fixed
later".