Bug #39251 EOFException on big queries, mysqld reports connection OK
Submitted: 4 Sep 2008 16:50 Modified: 26 Apr 2013 8:59
Reporter: Renaud Waldura Email Updates:
Status: Can't repeat Impact on me:
None 
Category:Connector / J Severity:S2 (Serious)
Version:5.1.6 OS:Solaris (SunOS 5.10 Generic_137111-01 sun4v sparc SUNW,Sun-Fire-T200)
Assigned to: Alexander Soklakov CPU Architecture:Any

[4 Sep 2008 16:50] Renaud Waldura
Description:
We've built a Lucene indexer that iterates through all 10M rows of a table. The primary key is an 8-char string, evenly distributed. The indexer selects portions of the table to work on according to the first 2 letters of the key.

SELECT * FROM metadata_table WHERE key LIKE 'aa%'
SELECT * FROM metadata_table WHERE key LIKE 'ab%'
SELECT * FROM metadata_table WHERE key LIKE 'ac%'

etc. The indexer is multithreaded; many of these queries are executed concurrently, up to a factor we set, typically between 30 and 100. Each thread opens its own connection to the database, and issues a query as above. Each query selects about 15K rows; there are 26*26 =~ 700 such queries total.

On a typical execution, between 2 and 10 of these queries fail. Sometimes when trying to open a database connection[1], sometimes when executing the query[2]. 

[1] com.mysql.jdbc.CommunicationsException: Communications link failure

Last packet sent to the server was 150 ms ago.
        at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1070)
        at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2103)
        at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:718)
        at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:298)
        at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:282)
        at com.mysql.jdbc.jdbc2.optional.MysqlDataSource.getConnection(MysqlDataSource.java:422)
        at com.mysql.jdbc.jdbc2.optional.MysqlDataSource.getConnection(MysqlDataSource.java:137)
        at com.mysql.jdbc.jdbc2.optional.MysqlDataSource.getConnection(MysqlDataSource.java:107)
        at edu.ucsf.library.ltdl2.index.IndexingFunction.runWithErrors(IndexingFunction.java:76)
        at edu.ucsf.library.ltdl2.index.IndexingFunction.runWithErrorsAndRetry(IndexingFunction.java:170)
        at edu.ucsf.library.ltdl2.index.IndexingFunction.run(IndexingFunction.java:130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
        at java.lang.Thread.run(Thread.java:595)
Caused by: com.mysql.jdbc.CommunicationsException: Communications link failure

Last packet sent to the server was 149 ms ago.
        at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1070)
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2985)
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3414)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:910)
        at com.mysql.jdbc.MysqlIO.secureAuth411(MysqlIO.java:3923)
        at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1273)
        at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2031)
        ... 12 more
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
        at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2431)
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2882)
        ... 18 more

[2] com.mysql.jdbc.CommunicationsException: Communications link failure

Last packet sent to the server was 93486 ms ago.
        at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1070)
        at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:1642)
        at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1398)
        at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2816)
        at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:467)
        at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2510)
        at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1746)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2135)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2536)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2465)
        at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1383)
        at edu.ucsf.library.ltdl2.index.IndexingFunction.selectMandatoryAndOptionalFields(IndexingFunction.java:207)
        at edu.ucsf.library.ltdl2.index.IndexingFunction.runWithErrors(IndexingFunction.java:94)
        at edu.ucsf.library.ltdl2.index.IndexingFunction.runWithErrorsAndRetry(IndexingFunction.java:170)
        at edu.ucsf.library.ltdl2.index.IndexingFunction.run(IndexingFunction.java:130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
        at java.lang.Thread.run(Thread.java:595)
Caused by: java.io.EOFException: Can not read response from server. Expected to read 14 bytes, read 6 bytes before connection was unexpectedly lost.
        at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2431)
        at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:1623)
        ... 16 more

If we trap SQLException 08S01, close the connection, open a new one, and re-do the query, it succeeds. 

mysqld --log-warnings=2 doesn't report the connection as being dropped. We've reproduced this error when there is no physical network involved; where the mysql server runs on the same machine as the indexer.

We've tried tweaking useReadAheadInput and useUnbufferedInput to no avail. We've also tried streaming results sets but the errors were similar. We've bumped up every timeout we've come across, increased cache sizes, etc. We haven't been able to reproduce the error using the MySQL command-line client. It may very well be a mysqld bug, but since so far only the JDBC driver has claimed an error, this is where I'm reporting it first. :)

I realize this is a difficult issue, and I'd like to get some feedback on how to track it further. While our application manages to recover from these errors, it has a negative performance impact. See also attached debugging output.

MySQL server:
| protocol_version                | 10                                                   |
| version                         | 5.0.27-standard-log                                  |
| version_comment                 | MySQL Community Edition - Standard (GPL)             |
| version_compile_machine         | sparc                                                |
| version_compile_os              | sun-solaris2.10                                      |

How to repeat:
I'm sorry, I don't have a small self-contained test case yet. Instead I'm attaching a complete debugging output, generated with useReadAheadInput=false and traceProtocol=true. I apologize, for it is nearly 300MB decompressed. I can whittle it down if needed, and answer any questions.

Look for CommunicationsException to finds the errors quickly.
[4 Sep 2008 16:53] Renaud Waldura
Compressed log file, DEBUG level, traceProtocol=true

Attachment: ZAPPLE_RC3j.log.bz2 (application/octet-stream, text), 291.27 KiB.

[4 Sep 2008 19:20] Renaud Waldura
java version "1.5.0_12"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_12-b04)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_12-b04, mixed mode)
[5 Sep 2008 8:23] Tonci Grgin
Hi Renaud and thanks for your report.

Even though you do not have small and self-contained test case at this time, I urge you to make one. It should not be that hard to boil down your code to few simple routines doing exactly what you described. Please attach it afterwards so we can check.

As for my immediate thoughts, there is about zillion things that might have gone wrong here. Java framework by itself is multithreaded but that does not mean things are shareable across threads. So I should really have test case to prove/disprove the object sharing across threads first. Afterwards, I'd like you to attach my.cnf file used to start MySQL server. Are there any network HW problems on your side that can cause this?

Until such time you attach repeatable test case, I'm setting this report to "Need Feedback".
[5 Oct 2008 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[26 Apr 2013 8:59] Alexander Soklakov
Hi Renaud,

There is no feedback since Sep 2008, so I close this report as "Can't repeat".
Please, feel free to reopen it if the problem still exists with current
driver version.