Bug #91858 Closing not fully retrieved resultset takes long time
Submitted: 1 Aug 2018 18:37    Modified: 3 Aug 2018 15:59
Reporter: Dmitriy Shirokov
Status: Verified               Impact on me: None
Category: Connector / J        Severity: S4 (Feature request)
Version: master                OS: Any
Assigned to: Assigned Account  CPU Architecture: Any

[1 Aug 2018 18:37] Dmitriy Shirokov
Description:
I execute a query returning a very large number of records (100 million). I iterate over these records, selecting the ones that match very complex criteria (i.e. criteria that cannot be expressed as a WHERE clause). When I have gathered sufficient data (statistics) I want to exit. At this point com.mysql.jdbc.ResultSetImpl#close is called, which calls com.mysql.jdbc.RowDataDynamic#close, which contains an infinite loop with the comment "// drain data". In my case I might read 10 million records to find the 10,000 matching my criteria, but the attempt to close will keep reading the rest even though I do not need them.
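The cost can be illustrated with a toy model of a streaming result set whose close() drains the remainder, mirroring the "// drain data" loop (a sketch for illustration only; StreamingRows and all row counts are made up, this is not Connector/J code):

```java
// Toy model of a streaming result set whose close() drains the rest,
// mirroring the drain loop in com.mysql.jdbc.RowDataDynamic#close.
// All names and numbers are illustrative, not Connector/J internals.
public class DrainModel {
    static class StreamingRows {
        final long totalRows;
        long read = 0;
        StreamingRows(long totalRows) { this.totalRows = totalRows; }
        boolean next() {
            if (read < totalRows) { read++; return true; }
            return false;
        }
        // close() must leave the connection clean, so it reads everything left.
        void close() { while (next()) { /* drain */ } }
    }

    public static void main(String[] args) {
        StreamingRows rows = new StreamingRows(100_000_000L);
        long matches = 0;
        while (rows.next()) {
            if (rows.read % 1000 == 0) matches++; // stand-in for the complex criteria
            if (matches == 10_000) break;         // enough statistics, stop early
        }
        long readBeforeClose = rows.read;         // 10,000,000 rows actually needed
        rows.close();                             // drains the other 90,000,000 rows
        System.out.println(readBeforeClose + " read before close, " + rows.read + " after close");
    }
}
```

Even though only 10 million rows were needed, close() walks the remaining 90 million.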

I cannot add a LIMIT to the SQL, because I do not know in advance whether I must scan 10 million or 20 million records to find 10,000 matches.

How to repeat:
I use mybatis to execute a select with a forward-only cursor, reading one record at a time (I cannot fit all the data into memory):
<select id="readTrainingData" resultMap="TrainingRecord" fetchSize="-2147483648" resultSetType="FORWARD_ONLY">...

I use ResultHandler<> as a callback to process each row. ResultHandler has a stop() method which I use to tell mybatis to stop reading, at which point it calls JDBC to close the ResultSet.
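For reference, the mybatis settings above correspond to this plain-JDBC setup (a sketch; where the Connection comes from is assumed). Per the Connector/J documentation, row-by-row streaming requires a forward-only, read-only statement with fetch size Integer.MIN_VALUE, which is exactly the fetchSize="-2147483648" in the mapping:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Plain-JDBC equivalent of the mybatis configuration above: Connector/J
// streams rows one at a time only for a TYPE_FORWARD_ONLY, CONCUR_READ_ONLY
// statement with fetch size Integer.MIN_VALUE.
public class StreamingSelect {
    static Statement streamingStatement(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                              ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE); // tell Connector/J to stream
        return stmt;
    }

    public static void main(String[] args) {
        // The mybatis fetchSize value is just Integer.MIN_VALUE spelled out.
        System.out.println(Integer.MIN_VALUE); // -2147483648
    }
}
```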

Suggested fix:
I do not know MySQL and JDBC internals well enough to say whether skipping the drain would be safe, but that is what is needed: stop draining the result set and just discard the remaining rows.
[2 Aug 2018 12:43] Chiranjeevi Battula
Hello Dmitriy Shirokov,

Thank you for the bug report.
Could you please provide a repeatable test case (exact steps, full stack trace, sample code, etc. - mark it as private if you prefer) so we can confirm this issue at our end?

Thanks,
Chiranjeevi.
[2 Aug 2018 15:11] Dmitriy Shirokov
Hello, Chiranjeevi,

This is the top of the call stack:
close:151, RowDataDynamic (com.mysql.jdbc)
realClose:6676, ResultSetImpl (com.mysql.jdbc)
close:851, ResultSetImpl (com.mysql.jdbc)
close:-1, HikariProxyResultSet (com.zaxxer.hikari.pool)
closeResultSet:265, DefaultResultSetHandler (org.apache.ibatis.executor.resultset)
...

If you look at https://github.com/mysql/mysql-connector-j/blob/release/5.1/src/com/mysql/jdbc/RowDataDyna..., at the while loop at line 155 with the comment // drain the rest of the records, it is obvious that it will take time to drain millions of records if that is what the SQL returns.

If I have time, I will build a pure mysql.jdbc sample.

I have already implemented a workaround: page through the records in 100K batches. Not ideal (it needs more code), but it works. Draining ~100K records is reasonably fast.
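The paging workaround can be sketched as follows (numbers and the keyset column are illustrative): with keyset-paged queries of 100K rows each, closing early only drains the unread tail of the current page, never the tail of the whole result set:

```java
// Sketch of the paging workaround: read rows in keyset-paged batches,
// e.g.  SELECT ... WHERE id > ? ORDER BY id LIMIT 100000
// (the "id" keyset column is an assumption for illustration).
// Early exit then drains at most one partial page.
public class PagedScan {
    static final long PAGE = 100_000;

    // Rows drained on early exit when paging: the unread remainder of the
    // current page only, never the remainder of the full result set.
    static long drainedOnEarlyExit(long stopAtRow) {
        long posInPage = stopAtRow % PAGE;
        return posInPage == 0 ? 0 : PAGE - posInPage;
    }

    public static void main(String[] args) {
        System.out.println(drainedOnEarlyExit(10_000_000)); // 0 (page boundary)
        System.out.println(drainedOnEarlyExit(10_000_123)); // 99877 (< one page)
    }
}
```

So the worst case drops from tens of millions of drained rows to under 100K, which matches the "reasonably fast" observation above.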

Thank you
Dmitriy
[3 Aug 2018 12:19] Alexander Soklakov
Hi Dmitriy,

After a ResultSet is closed, the Connection may still be used. So if we do not drain all of the result set data, it could be wrongly consumed when other statements retrieve their results.

I see only one way to quickly discard this unneeded data: call ((ConnectionImpl) conn).abortInternal(). It forces the socket to close, so it is fast, but you will need to create a new connection to proceed.
[3 Aug 2018 13:38] Dmitriy Shirokov
Hi, Alexander,
Thank you for the explanation. Maybe then my paging approach is the best :)
I guess you could add a config parameter to control the record drain, say maxRecordsToDrain, beyond which the socket is closed instead. Or a more sophisticated approach: measure the time to reopen a connection (say 100 ms), measure the drain rate (say 10K records per 100 ms), and compute maxRecordsToDrain dynamically (~10K in this case). Of course there will be unfortunate cases where the number of records to drain is just above the threshold :(
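The dynamic heuristic above amounts to a one-line calculation (maxRecordsToDrain is a hypothetical parameter, not an existing Connector/J option): drain only while that is cheaper than reopening the connection.

```java
// Sketch of the proposed dynamic threshold. maxRecordsToDrain is a
// hypothetical driver parameter, not an existing Connector/J option.
public class DrainThreshold {
    // recordsPerMs: measured drain rate; reconnectMs: measured reconnect cost.
    // Draining more than this many records costs more than a fresh connection.
    static long maxRecordsToDrain(double recordsPerMs, double reconnectMs) {
        return (long) (recordsPerMs * reconnectMs);
    }

    public static void main(String[] args) {
        // Example from the comment above: reconnect ~100 ms,
        // drain rate ~10K records per 100 ms => threshold ~10K records.
        System.out.println(maxRecordsToDrain(10_000.0 / 100.0, 100.0)); // 10000
    }
}
```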
Thank you
Dmitriy
[3 Aug 2018 15:59] Alexander Soklakov
Ok, I'll keep it as a feature request in the hope that we can find an appropriate solution in the future.