Bug #68067 | The vio_read_buff (libmysqlclient.so.18) will block in multi-thread program | ||
---|---|---|---|
Submitted: | 10 Jan 2013 9:32 | Modified: | 8 Jan 2014 8:46 |
Reporter: | Nan Xiao | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: C API (client library) | Severity: | S4 (Feature request) |
Version: | OS: | Solaris | |
Assigned to: | CPU Architecture: | Any | |
Tags: | block, unixODBC, vio_read |
[10 Jan 2013 9:32]
Nan Xiao
[11 Jan 2013 3:09]
Bogdan Degtyariov
Are you using the release version of MySQL ODBC driver or you built it yourself? Please note that under no circumstances the driver should use libmysqlclient.so.18 because the ODBC driver is statically linked against the client library. This means that the client library code is embedded in the driver binary file and ldd should not show this dependance at all.
[11 Jan 2013 3:26]
Bogdan Degtyariov
You can also try setting READTIMEOUT=N where N is the number of seconds to wait before returning the timeout error. This option can be set either in the connection string or in odbc.ini file in the appropriate data source section.
[11 Jan 2013 3:37]
Nan Xiao
Hi, Bogdan: Thanks very much for your reply! I will test READTIMEOUT=N method. Best Regards Nan Xiao
[11 Jan 2013 6:06]
Nan Xiao
Hi, Bogdan: I add the following set for my DSN in odbc.ini: [DSN_Name] ...... readtimeout = 10 writetimeout = 10 But this seems doen't work, the thread is still blocking. So Is there any other reason? Thanks very much in advance! Best Regards Nan Xiao
[15 Jan 2013 7:31]
Bogdan Degtyariov
Can you please answer my question about libmysqlclient.so.18 dependance?
[15 Jan 2013 7:50]
Nan Xiao
Hi, Bogdan: (1) Q: Are you using the release version of MySQL ODBC driver or you built it yourself? A: I build it myself. I configure driver in odbcinst.ini, and odbcinst.ini likes this: root@192.168.23.236 # cat odbcinst.ini [ODBC] Trace=No Trace File=/tmp/sql.log Pooling=Yes [MySQL] Description=MySQL test database Driver=/data1/susie/tools/mysql-connector-odbc-5.1.10-src/lib/libmyodbc5.so UsageCount=2 FileUsage=1 DontDLClose=1 CPTimeout=30 option=3145731 Then ldd /data1/susie/tools/mysql-connector-odbc-5.1.10-src/lib/libmyodbc5.so, it outputs: root@192.168.23.236 # ldd /data1/susie/tools/mysql-connector-odbc-5.1.10-src/lib/libmyodbc5.so libodbc.so.2 => /usr/lib/libodbc.so.2 libodbcinst.so.2 => /usr/lib/libodbcinst.so.2 libmysqlclient.so.18 => /usr/local/sql/lib/libmysqlclient.so.18 libthread.so.1 => /usr/lib/libthread.so.1 libm.so.2 => /usr/lib/libm.so.2 libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 libltdl.so.7 => /usr/local/lib/libltdl.so.7 libiconv.so.2 => /usr/local/lib/libiconv.so.2 libc.so.1 => /usr/lib/libc.so.1 libz.so => /usr/lib/libz.so libnsl.so.1 => /usr/lib/libnsl.so.1 librt.so.1 => /usr/lib/librt.so.1 libsocket.so.1 => /usr/lib/libsocket.so.1 libmp.so.2 => /usr/lib/libmp.so.2 libmd.so.1 => /usr/lib/libmd.so.1 libscf.so.1 => /usr/lib/libscf.so.1 libaio.so.1 => /usr/lib/libaio.so.1 libdoor.so.1 => /usr/lib/libdoor.so.1 libuutil.so.1 => /usr/lib/libuutil.so.1 libgen.so.1 => /usr/lib/libgen.so.1 /platform/SUNW,Sun-Fire-V240/lib/libc_psr.so.1 /platform/SUNW,Sun-Fire-V240/lib/libmd_psr.so.1 It displays /usr/local/sql/lib/libmysqlclient.so.18 is linked. Best Regards Nan Xiao
[15 Jan 2013 8:50]
Bogdan Degtyariov
Thanks for your reply Nan. As I already mentioned before linking against libmysqlclient.so.18 is not the right way for MySQL driver. This might be the reason why you are getting the application thread blocked. On Connector/ODBC download page you can get different packages for Solaris on x86 or Sparc platforms. Please try one of them and tell me if the problem is resolved.
[15 Jan 2013 9:57]
Nan Xiao
Hi, Bogdan: Thanks very much for your reply! From http://www.mysql.com/downloads/connector/odbc/#downloads page, I find there is no package for sparc 32bit platform. My application is 32bit program. So I think I also need to compile it myself. Is it right? Best Regards Nan Xiao
[16 Jan 2013 8:50]
Bogdan Degtyariov
Hello Nan, Unfortunately, we do not ship 32-bit SPARC driver because we MySQL Server is not built for 32-bit SPARC. So, I guess you have to build both the mysql client library and the ODBC driver. In order to link the ODBC driver properly against the static library libmysqlclient_r.a you need to remove libmysqlclient.so libraries. This is the default behavior of ld, which first looks for the shared library and if it fails then links the static library. Note that we do not recommend linking 5.1 driver against libmysqlclient older than 5.5. Please try to re-build the drive rand let me know how it worked for you.
[16 Jan 2013 9:00]
Nan Xiao
Hi, Bogdan: Thanks very much for your patient reply, and you are a nice man! I will try your proposal ASAP, and reply to you. Best Regards Nan Xiao
[17 Jan 2013 5:29]
Nan Xiao
Hi, Bogdan: I re-build a MySQL client library, which only produced static library. But when I try to build ODBC driver, there are a lot of link errors. The steps of building ODBC driver: (1) cmake -G "Unix Makefiles" -DWITH_UNIXODBC=1 -DCMAKE_BUILD_TYPE=Debug; (2) make; The result is: ...... __umoddi3 0xb90 /usr/local/asg_mysql/lib/libmysqlclient_r.a(ctype-ucs2.c.o) __umoddi3 0x1130 /usr/local/asg_mysql/lib/libmysqlclient_r.a(ctype-ucs2.c.o) ld: fatal: relocations remain against allocatable but non-writable sections collect2: ld returned 1 exit status make[3]: *** [lib/libmyodbc5.so] Error 1 make[2]: *** [driver/CMakeFiles/myodbc5.dir/all] Error 2 make[1]: *** [driver/CMakeFiles/myodbc5.dir/rule] Error 2 make: *** [myodbc5] Error 2 It seems ODBC driver can't link static library correctly. Could you please check it? Thanks very much in advance! Best Regards Nan Xiao
[18 Jan 2013 6:39]
Bogdan Degtyariov
Hi Nan, It seems you have encountered the Solaris-specific problem with ld relocations. Please run <mysql_server_dir>/bin/mysql_config --cflags and check whether it prints -fPIC among other compiler flags. If it is not present you need to rebuild MySQL server with -fPIC.
[21 Jan 2013 3:09]
Nan Xiao
Hi, Bogdan: With your great help, I built a odbc driver with static client library. But with my test, the thread is still blocking. So could you provide some other debug methods (for example, analyze some logs) to help analyze this issue? Thanks in advace! Best Regards Nan Xiao
[21 Jan 2013 8:18]
Bogdan Degtyariov
Nan, can you please give me the stack trace where the block occurs like you did in the initial description? Does it look the same?
[21 Jan 2013 10:56]
Nan Xiao
Hi, Bogdan: Yes, it seems like the original stack, the threads are blocking in read() function call. Thanks! Best Regards Nan Xiao
[22 Jan 2013 3:15]
Nan Xiao
Hi, Bogdan: After checking MySQL 5.5.21 source code, I found the following clues for this issue: In my_net_set_read_timeout function: { ...... #ifdef NO_ALARM if (net->vio) vio_timeout(net->vio, 0, timeout); #endif ...... } Because I only build MySQL client, so I think the NO_ALARM is defined. In vio_timeout function: { #if defined(SO_SNDTIMEO) && defined(SO_RCVTIMEO) ...... #else /* Platforms not suporting setting of socket timeout should either use thr_alarm or just run without read/write timeout(s) */ #endif } Because SO_RCVTIMEO and SO_SNDTIMEO socket options don't work in Solaris, I think this is the root cause. Is this right? BTW, what's the differences between linking ODBC driver with static client library and dynamic client libaray? Thanks in advance! Best Regards Nan Xiao
[23 Jan 2013 6:42]
Bogdan Degtyariov
Hi Nan, You are right, mysql client library code sets the socket timeout using SO_RCVTIMEO option, which is not supported in Solaris. I checked for possible workarounds in the Internet, but all of them are not reliable enough. Looks like it has to be fixed in Solaris itself. About your question with static linking. It matters a very great deal which way of linking you choose for ODBC driver because the driver cannot use just any MySQL client library. For instance MySQL ODBC Driver 5.1 requires MySQL Client library 5.5. The static linking makes sure that the required client library code is embedded into the ODBC driver. In case of the dynamic linking using the wrong version of libmysqlclient.so could cause bad side effects. I am afraid there is nothing we can do about this bug, so setting the status "not a bug". Please let me know if you have any other ideas or questions.
[23 Jan 2013 8:41]
Nan Xiao
Hi, Bogdan: Thanks very much for your kind reply! Personally, I think writetimeout is optional, but readtimeout is must. Because lack of readtimeout will cause read() block for ever. I think we can use special processing for Soalris OS in MySQL code: (1) Modify struct st_vio (Violite.h): struct st_vio { ...... /* Add the following members */ #ifdef __sun struct timeval read_timeout; struct timeval write_timeout; #endif } (2) Modify vio_read function (using select mechanism to check read timeout): size_t vio_read(Vio * vio, uchar* buf, size_t size) { size_t r; #ifdef __sun fd_set readfds; #endif DBUG_ENTER("vio_read"); DBUG_PRINT("enter", ("sd: %d buf: 0x%lx size: %u", vio->sd, (long) buf, (uint) size)); /* Ensure nobody uses vio_read_buff and vio_read simultaneously */ DBUG_ASSERT(vio->read_end == vio->read_pos); #ifdef __WIN__ r = recv(vio->sd, buf, size,0); #else errno=0; /* For linux */ #ifdef __sun FD_ZERO(&readfds); FD_SET(vio->sd,&readfds); r = select(vio->sd+1,&readfds,NULL,NULL,&vio->read_timeout); if (r > 0) { r = read(vio->sd, buf, size); } #else r = read(vio->sd, buf, size); #endif #endif /* __WIN__ */ #ifndef DBUG_OFF if (r == (size_t) -1) { DBUG_PRINT("vio_error", ("Got error %d during read",errno)); } #endif /* DBUG_OFF */ DBUG_PRINT("exit", ("%ld", (long) r)); DBUG_RETURN(r); } (3) Modify vio_timeout function (store timeout value in read_timeout or write_timeout): void vio_timeout(Vio *vio, uint which, uint timeout) { #if defined(SO_SNDTIMEO) && defined(SO_RCVTIMEO) int r; DBUG_ENTER("vio_timeout"); { #ifdef __WIN__ /* Windows expects time in milliseconds as int */ int wait_timeout= (int) timeout * 1000; #else /* POSIX specifies time as struct timeval. */ struct timeval wait_timeout; wait_timeout.tv_sec= timeout; wait_timeout.tv_usec= 0; #endif #ifdef __sun if (which) { vio->write_timeout = wait_timeout; } else { vio->read_timeout = wait_timeout; } r = 0; #else r= setsockopt(vio->sd, SOL_SOCKET, which ? SO_SNDTIMEO : SO_RCVTIMEO, IF_WIN((const char*), (const void*))&wait_timeout, sizeof(wait_timeout)); #endif } if (r != 0) DBUG_PRINT("error", ("setsockopt failed: %d, errno: %d", r, socket_errno)); DBUG_VOID_RETURN; #else /* Platforms not suporting setting of socket timeout should either use thr_alarm or just run without read/write timeout(s) */ #endif } Best Regards Nan Xiao
[23 Jan 2013 9:59]
Bogdan Degtyariov
Nan, thank you for your deep analysis of the problem and for proposing the patch. Unfortunately, I see a problem there: putting select() into a heavily-used function such as vio_read() might seriously affect the performance. select() handles file descriptors one by one, so the more descriptors it checks the slower it gets. Creating such bottle neck inside a performance-critical piece of code is not something our developers can agree with.
[23 Jan 2013 10:19]
Nan Xiao
Hi, Bogdan: Thanks for your reply! About the bottle neck issue, I have the following concerns: (1) The select mechaism only works in Solaris, no impact on other OSs; (2) There is only one socket need to be checked, so I think the performance is OK. (3) If not use select(), other mechanism (/dev/poll, kqueue) is OK? Best Regards Nan Xiao
[23 Jan 2013 11:03]
Bogdan Degtyariov
Nan, Here is a good article that compares select and poll approaches: "... poll and select are basically the same speed-wise: slow... " http://daniel.haxx.se/docs/poll-vs-select.html Anyway, this report category has to be changed from "MySQL Connector/ODBC" to "MySQL Server C API". It will be taken by an appropriate person from the server development team.
[24 Jan 2013 9:27]
Nan Xiao
Hi, Bogdan: Thanks very much for your reply! Is this issue will be automatically assigned to appropriate engineer? Or need I submit a new issue? Thanks very much! Best Regards Nan Xiao
[29 Jan 2013 4:57]
Bogdan Degtyariov
Nan, this bug is verified, so I have copied it into our internal bug system where it gets assigned to a team leader who decides about the proper engineer to take care of this bug. The public report will be updated when the patch is ready. Thanks for your patience.
[29 Jan 2013 5:06]
Nan Xiao
Hi, Bogdan: Thanks very much for your reply and help! You can close the issue now. Best Regards Nan Xiao
[7 Mar 2013 13:42]
Rafal Somla
Our current implementation of read timeouts uses platform specific mechanisms and can not work on some platforms such as Solaris. In that case OS read call on socket blocks and this is what was observed in this bug report. We treat it as a feature request to extend implementation of read timeouts so that it covers other platforms too.
[26 Mar 2013 21:42]
Paul DuBois
Added this to the discussion of MYSQL_OPT_READ_TIMEOUT in http://dev.mysql.com/doc/refman/5.5/en/mysql-options.html: " Implementation of this timeout uses mechanisms that may not be available on all platforms. On such a platform, a client that issues a read call might under certain circumstances might wait without timing out. For example, a client might no time out if the server is not responding because it is waiting for a "disk full" condition to clear. The timeout is known not to work on Solaris. "
[8 Jan 2014 8:46]
Erlend Dahl
Setting to "verified" since we will discontinue the use of "to be fixed later".