Bug #34176 | Connector/J hang retrieving large data set w/5.0.45 server | |
---|---|---|---
Submitted: | 30 Jan 2008 20:53 | Modified: | 3 Mar 2008 19:21 |
Reporter: | Jeremy Gilbert | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | Connector / J | Severity: | S2 (Serious) |
Version: | 5.0.8 | OS: | Linux (Red Hat Enterprise WS 3 (Taroon Update 9)) |
Assigned to: | CPU Architecture: | Any | |
Tags: | Connector/J, hang, jdk1.6.0_02, ReadAheadInputStream.fill(4), Red Hat, Server 5.0.45, socketRead0() |
[30 Jan 2008 20:53]
Jeremy Gilbert
[30 Jan 2008 21:07]
Jeremy Gilbert
Client and server traces, client stack trace
Attachment: traces.tar.gz (application/x-gzip, text), 464.61 KiB.
[30 Jan 2008 21:10]
Jeremy Gilbert
I had to strip some lines out of the mysqld.trace to meet the 500 KB limit on files included with the bug report. In doing so, some of the line counts from my original description became incorrect. Here is a new analysis of the server trace with the correct line numbers. In "mysqld.trace", starting at line 88632, the server begins processing the query. From there until line 95442 it appears to be analyzing the query and preparing to fetch data. From there until line 95539 it appears to be fetching rows. It then seems to do some sorting until line 96009. Next it looks like it starts buffering up the response and sending it in blocks of 16384 bytes (see lines 102265, 108525, 114702, 120996, and 127097). At this point it seems to have reached the maximum buffered capacity of the socket, because the write returns EAGAIN, indicating that it would block. It looks like the server then enables blocking mode and tries to write again. Then, at line 128255, this blocked write returns EINTR, although I don't know what the mechanism is here.
[30 Jan 2008 23:10]
Jeremy Gilbert
I have done more research, and I think this may somehow be a Linux kernel bug. I switched back to MySQL server 4.1.9 standard and tried the same query, expecting it to work, but it hung. So I rebooted with an older version of the kernel (2.4.21-52.ELsmp), tried again with MySQL server 4.1.9, and it worked without error. Now it gets stranger and more disturbing. I then rebooted again with the latest kernel (2.4.21-53.ELsmp) and tried it with MySQL server 4.1.9, expecting it to hang, but it worked without any error. Then I uninstalled MySQL server 4.1.9, installed MySQL server 5.0.45, tried the query, and it works. So I am willing to bet that if I had just rebooted the Linux box earlier, the problem would have gone away. That does not make me very comfortable, but it probably means that it isn't a MySQL bug after all. Do you know whether other strange MySQL problems have been reported against the latest Red Hat 3 kernel?
[31 Jan 2008 8:26]
Tonci Grgin
Hi Jeremy and thanks for your report. I consulted with my colleagues and no one is aware of any such problem... Maybe netstat will provide enough info so you can file a bug report with RH?
[1 Mar 2008 0:02]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[3 Mar 2008 17:22]
Jeremy Gilbert
Upgrading to JDK 6 update 4 solved the problem. Beware of a possible incompatibility between JDK 6 update 2 and glibc-2.3.2-95.50 when using Connector/J. Using 'netstat' and 'pstack' I was able to see that the MySQL server was properly sending the reply (the Send-Q of the socket reached its maximum size), so I focused on the Java client code using Connector/J and saw that it was blocked in the pthreads recv() call, which seemed odd to me because clearly there was data to receive. I remembered that I had recently run up2date on this server, and one of the packages that was upgraded was glibc, which contains the pthreads libraries. I considered the possibility that there might be some incompatibility between the JDK I was using and the new version of pthreads, so I upgraded to JDK 6 update 4 and the problem went away.
[3 Mar 2008 19:21]
Tonci Grgin
Jeremy, if I read you correctly, this is !Bg. Thanks for following up on this and providing valuable info to all. Thanks for your interest in MySQL.
[6 Jul 2008 4:17]
Rico H.
I think this really is a bug in the MySQL server, and I believe I have found the cause: it is related to the query cache. There must be some kind of bug, because sometimes the server stops sending rows to the JDBC driver, so after a long time a java.io.EOFException is thrown. Whenever this blocking issue occurs, re-executing the same program ends up with the same result. However, you will see that issuing a "reset query cache" (or restarting mysql) makes your program work again... but only for some time, until it happens again. My workarounds are (depending on your needs):

a) Disable the query cache by setting "query_cache_size=0" in your my.cnf.

b) Change your streaming queries so that they include the SQL_NO_CACHE flag. This is what I have done, so now my queries look like "select SQL_NO_CACHE * from t1". No more hangs so far.

c) Change query_cache_type to a different option (but this will involve adding flags to some queries).

Honestly, I think there is some kind of bug around the query cache not being able to handle so many results (due to the streaming result set). Some bugs point to OS networking issues, but I have updated my Red Hat 5.2 box with the latest packages and kernel, and also updated to Java 1.6.0_06, MySQL 5.0.51a and JDBC 5.1.6, and the same thing happened. I hope this helps you. I have lost a lot of time debugging this, as it is very hard to reproduce.
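Workaround (b) can be automated on the client side. A tiny helper along these lines (the class and method names are mine, not part of Connector/J) inserts MySQL's SQL_NO_CACHE hint after the leading SELECT keyword before the query is sent:

```java
public class NoCacheRewriter {
    // Inserts MySQL's SQL_NO_CACHE hint after the leading SELECT keyword,
    // so the result of a streaming query never lands in the query cache.
    // Statements that are not plain SELECTs, or that already carry the
    // hint, are returned unchanged.
    public static String addNoCache(String sql) {
        if (sql.toUpperCase().contains("SQL_NO_CACHE")) {
            return sql;
        }
        return sql.replaceFirst("(?i)^(\\s*select\\b)", "$1 SQL_NO_CACHE");
    }

    public static void main(String[] args) {
        System.out.println(addNoCache("select * from t1"));
        // prints: select SQL_NO_CACHE * from t1
    }
}
```

Passing every streaming query through such a rewriter keeps the hint in one place instead of scattered across the codebase.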
[26 Oct 2009 11:46]
Tomaž Šolc
Hi, we are seeing this exact bug on Debian Lenny (MySQL 5.0.51a, kernel 2.6.31) with programs in Java, C++ and Python, so I am confident that this problem is not limited to Java. It appears to be triggered when a large streaming query (i.e. UseQueryResult in C++, SSCursor in Python) is interrupted after only a handful of rows have been retrieved. My guess is that when the amount of data actually read from the server is smaller than query_cache_limit, a corrupted query cache entry somehow gets created. After that, the same query will expose the problem until the server is restarted. But if you disable the query cache as Rico described, the query works normally. You can also "repair" the "broken" query by just adding a space to it (because the query cache works on the raw, unparsed query string). I am attaching a simple test case in C++ that exposes this bug. If you run it with the "_exit()" line enabled, it will create the broken query cache entry I mentioned. The table I used for this test is the image SQL dump from Wikipedia, but any sufficiently large table will do.
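The "add a space" repair works precisely because the query cache keys its entries on the raw query text, not on the parsed statement. A toy model in plain Java (no MySQL involved; the map merely stands in for the cache, and the "corrupted" value is hypothetical) shows why a one-character change bypasses a poisoned entry:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a cache keyed on the raw, unparsed query string, the way
// MySQL's query cache is. A bad entry is stored under one exact byte
// sequence, so any textual change -- even an extra space -- misses it.
public class RawKeyCacheDemo {
    public static final Map<String, String> cache = new HashMap<>();

    public static String lookup(String rawSql) {
        return cache.get(rawSql); // no parsing, no whitespace normalization
    }

    public static void main(String[] args) {
        cache.put("select * from t1", "<corrupted entry>");
        // The exact same text hits the broken entry...
        System.out.println(lookup("select * from t1"));
        // ...while the same query with one extra space misses the cache
        // entirely, so the server executes it fresh.
        System.out.println(lookup("select  * from t1"));
    }
}
```

This also explains why "reset query cache" clears the symptom: it discards the bad entry rather than fixing anything about the query.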
[26 Oct 2009 11:47]
Tomaž Šolc
Test case for the query cache bug
Attachment: mysql_cache_bug.cpp (text/x-c++src), 597 bytes.
[26 Oct 2009 12:33]
Tonci Grgin
Tomaž, Rico, thanks for your thoughts and tests. There are numerous bug reports filed against the server's query cache, so please search BugsDB.
[26 Oct 2009 12:59]
Tomaž Šolc
Search returns over 680 bug reports for "query cache". I haven't checked them all, but from other searches that I did, this is the only bug report that describes exactly what I'm seeing on my servers. Can you perhaps reassign the category from Connector/J to Server: Query Cache, or should I open a new bug report?
[26 Oct 2009 13:09]
Tonci Grgin
Tomaž, sure, but I have already passed this report to my server colleagues and will let them make the ruling.
[26 Oct 2009 13:11]
Tomaž Šolc
Ok. Thanks for your help Tonci.
[26 Oct 2009 13:47]
Valeriy Kravchuk
If you suspect a server bug, please try to repeat this with a recent, actively supported server version, 5.0.86 and/or 5.1.40, and let us know the results.