Bug #7435 Cannot "stream" large result sets.
Submitted: 20 Dec 2004 17:26 Modified: 31 Mar 2014 7:56
Reporter: Dan Armbrust Email Updates:
Status: Can't repeat Impact on me:
None 
Category:Connector / J Severity:S2 (Serious)
Version:3.0.16 OS:Microsoft Windows (WinXP)
Assigned to: Alexander Soklakov CPU Architecture:Any

[20 Dec 2004 17:26] Dan Armbrust
Description:
I'm using MySQL 4.0.20-standard, along with 3.0.16-ga of the Connector/J.

I've also had this problem with all earlier versions of Connector/J.

I'm writing a program that reads all of the data from a database, and converts it into another format before writing to a new database.  So, I need to read every single row of several tables, and some of the tables have 20 million+ rows.

I am using the new "streaming" feature to bring the results back one at a time.

Example Query:
PreparedStatement getCodingSchemeInfo = umlsConnection2_
                .prepareStatement("SELECT  LAT, CODE, CUI, TS, STT, STR FROM MRCONSO WHERE SAB = ? ORDER BY CODE",
                                  ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        getCodingSchemeInfo.setFetchSize(Integer.MIN_VALUE);
        getCodingSchemeInfo.setString(1, UMLSCodingSchemeName);

        ResultSet results = getCodingSchemeInfo.executeQuery();

This works fine, some of the time.  Other times, (and at completely random locations in the result set) it bails out with this:

java.sql.SQLException: Communication link failure: java.io.EOFException, underlying cause: null

** BEGIN NESTED EXCEPTION ** 

java.io.EOFException

STACKTRACE:

java.io.EOFException
	at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1394)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:1570)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:1929)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:1906)
	at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:997)
	at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:313)
	at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:285)
	at com.mysql.jdbc.ResultSet.next(ResultSet.java:2490)
	at [snip my code]

** END NESTED EXCEPTION **

	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:1713)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:1929)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:1906)
	at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:997)
	at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:313)
	at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:285)
	at com.mysql.jdbc.ResultSet.next(ResultSet.java:2490)
	at [snip my code]

This is making it nearly impossible to get all of the results out of this database.  

How to repeat:
Easy for me to do, hard for me to write up :)

Make a large table - one that I have problems with has about 5 million rows - but I don't know if the size of the database is related to the problem - other than it increases the likleyhood that I won't make it through the full result set.

Do a "streamed" select from the table.

Iterate over the results.

Possibly related - I maintain two different connections to the sql server - one returning the streamed results, and another for going to get more data from other tables for each row in the streamed result.
[20 Dec 2004 18:00] Mark Matthews
What's your network timeout situation? (i.e. what is the following variable set to on the MySQL server):

net_write_timeout (the default is 30 seconds or so)

(see http://dev.mysql.com/doc/mysql/en/Server_system_variables.html for more details)

If your process isn't 'consuming' rows fast enough, MySQL will think the client has crashed, and will kill the connection.

The alternatives are to adjust the variables above, process your result set in pieces (using 'LIMIT'), or wait for MySQL-5.0 and Connector/J-3.2 which has a rudimentary 'cursor' capability to ask for rows 'n-at-a-time'.
[20 Dec 2004 18:37] Dan Armbrust
Origionally, net_read and net_write timeout were both set to the default - 30.

I tried changing them to see if it would help - I changed write to 60 (no change in behavior) but read wouldn't change.  I then tried changing the read timeout, but it wouldn't change... I just tried again now - and it did change (can you only do this if there aren't any other queries running?)

Currently testing to see if it has any affect.

Is the new feature you mentioned available in alpha or beta yet?

Thanks for you very fast response.
[20 Dec 2004 19:17] Dan Armbrust
Changing the read timeout to 60 also not have an obvious affect - I still get the exception at roughly the same point.
[5 Jan 2005 11:15] Thomas Breitkreuz
Hi,

we have also a problem with big result sets in steaming mode.

Scenario:
MySQL DB and the application (our ETL processor) on the same Machine (4 Itanium). The cross product of some tables results in 80 billion rows.

Bug:
After about 2 billion rows the JDBC driver cant't retrieve the row data. It seems, that there are some 32bit integer values (signed 32bit overflow) used in streaming mode.

Bye Thomas
[5 Jan 2005 15:01] Mark Matthews
Thomas,

Do you get any exception, or does the JDBC driver just hang?
[5 Jan 2005 15:02] Mark Matthews
Dan,

To enable fetching rows in batches, you need to use Connector/J 3.2.0 (alpha, just released), a recent version of MySQL-5.0, as well as use prepared statements to issue the query, and call .setFetchSize(somePositiveInteger) on the statement before you issue the query.
[6 Jan 2005 12:07] Thomas Breitkreuz
Hi Mark,

I got an SQLException (Msg: Before start of result set) while getString(). The previous next() works, but increments the internal row counter ....

I have looked at the driver sources. The row position is internaly stored as an 'int'. So signed 32bit is the limit :-((.
The JDBC documentation specifies some methods (getRow, absolute, relative) to get the row number. They use also 'int'. Other manufacturer (like Oracle) have the same problem with this methods. But they also supports the navigation (next/previous) in greater ResultSet (cursor usage).

Suggestion
If the MySQL network protcol (I found also int in the C/C++ APIs :-(() supports row positioning with 64bit values, the driver should use internaly long for row countings. The methods absolute and relative works as is. The method getRow returns Integer.MIN_VALUE, if the internal row count is greater than Integer.MAX_VALUE.

Bye Thomas
[6 Jan 2005 21:17] Mark Matthews
Thomas,

I think the fix for this is simple. Contact me at mark@mysql.com if you'd like to test it, otherwise I need to come up with a 2 billion row testcase.
[14 Feb 2005 22:54] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[31 Mar 2014 7:56] Alexander Soklakov
I close this report as "Can't repeat" because there is no feedback for a long time and codebase is too old. Please, feel free to reopen it if the problem still exists in current driver.