Description:
We've built a Lucene indexer that iterates through all 10M rows of a table. The primary key is an 8-char string, evenly distributed. The indexer selects portions of the table to work on according to the first 2 letters of the key.
SELECT * FROM metadata_table WHERE key LIKE 'aa%'
SELECT * FROM metadata_table WHERE key LIKE 'ab%'
SELECT * FROM metadata_table WHERE key LIKE 'ac%'
etc. The indexer is multithreaded; many of these queries are executed concurrently, up to a factor we set, typically between 30 and 100. Each thread opens its own connection to the database, and issues a query as above. Each query selects about 15K rows; there are 26*26 =~ 700 such queries total.
On a typical execution, between 2 and 10 of these queries fail. Sometimes when trying to open a database connection[1], sometimes when executing the query[2].
[1] com.mysql.jdbc.CommunicationsException: Communications link failure
Last packet sent to the server was 150 ms ago.
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1070)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2103)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:718)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:298)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:282)
at com.mysql.jdbc.jdbc2.optional.MysqlDataSource.getConnection(MysqlDataSource.java:422)
at com.mysql.jdbc.jdbc2.optional.MysqlDataSource.getConnection(MysqlDataSource.java:137)
at com.mysql.jdbc.jdbc2.optional.MysqlDataSource.getConnection(MysqlDataSource.java:107)
at edu.ucsf.library.ltdl2.index.IndexingFunction.runWithErrors(IndexingFunction.java:76)
at edu.ucsf.library.ltdl2.index.IndexingFunction.runWithErrorsAndRetry(IndexingFunction.java:170)
at edu.ucsf.library.ltdl2.index.IndexingFunction.run(IndexingFunction.java:130)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
Caused by: com.mysql.jdbc.CommunicationsException: Communications link failure
Last packet sent to the server was 149 ms ago.
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1070)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2985)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3414)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:910)
at com.mysql.jdbc.MysqlIO.secureAuth411(MysqlIO.java:3923)
at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1273)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2031)
... 12 more
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2431)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2882)
... 18 more
[2] com.mysql.jdbc.CommunicationsException: Communications link failure
Last packet sent to the server was 93486 ms ago.
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1070)
at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:1642)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1398)
at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2816)
at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:467)
at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2510)
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1746)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2135)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2536)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2465)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1383)
at edu.ucsf.library.ltdl2.index.IndexingFunction.selectMandatoryAndOptionalFields(IndexingFunction.java:207)
at edu.ucsf.library.ltdl2.index.IndexingFunction.runWithErrors(IndexingFunction.java:94)
at edu.ucsf.library.ltdl2.index.IndexingFunction.runWithErrorsAndRetry(IndexingFunction.java:170)
at edu.ucsf.library.ltdl2.index.IndexingFunction.run(IndexingFunction.java:130)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
Caused by: java.io.EOFException: Can not read response from server. Expected to read 14 bytes, read 6 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2431)
at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:1623)
... 16 more
If we trap SQLException 08S01, close the connection, open a new one, and re-do the query, it succeeds.
mysqld --log-warnings=2 doesn't report the connection as being dropped. We've reproduced this error when there is no physical network involved; where the mysql server runs on the same machine as the indexer.
We've tried tweaking useReadAheadInput and useUnbufferedInput to no avail. We've also tried streaming results sets but the errors were similar. We've bumped up every timeout we've come across, increased cache sizes, etc. We haven't been able to reproduce the error using the MySQL command-line client. It may very well be a mysqld bug, but since so far only the JDBC driver has claimed an error, this is where I'm reporting it first. :)
I realize this is a difficult issue, and I'd like to get some feedback on how to track it further. While our application manages to recover from these errors, it has a negative performance impact. See also attached debugging output.
MySQL server:
| protocol_version | 10 |
| version | 5.0.27-standard-log |
| version_comment | MySQL Community Edition - Standard (GPL) |
| version_compile_machine | sparc |
| version_compile_os | sun-solaris2.10 |
How to repeat:
I'm sorry, I don't have a small self-contained test case yet. Instead I'm attaching a complete debugging output, generated with useReadAheadInput=false and traceProtocol=true. I apologize, for it is nearly 300MB decompressed. I can whittle it down if needed, and answer any questions.
Look for CommunicationsException to finds the errors quickly.