Bug #91858 Closing not fully retrieved resultset takes long time
Submitted: 1 Aug 2018 18:37    Modified: 3 Aug 2018 15:59
Reporter: Dmitriy Shirokov
Status: Verified               Impact on me: None
Category: Connector / J        Severity: S4 (Feature request)
Version: master                OS: Any
Assigned to: Assigned Account  CPU Architecture: Any

[1 Aug 2018 18:37] Dmitriy Shirokov
Description:
I execute a query returning a very large number of records (100 million). I iterate over these records, selecting the ones that match very complex criteria (i.e. criteria that cannot be expressed as a WHERE clause). When I have gathered sufficient data (statistics) I want to exit. At this point com.mysql.jdbc.ResultSetImpl#close is called, which calls com.mysql.jdbc.RowDataDynamic#close, which contains an infinite loop with the comment "// drain data". In my case I might read 10 million records to find the 10,000 matching my criteria, but the attempt to close will keep reading the rest even though I do not need them.
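The cost can be illustrated with a toy model of a streaming result set whose close() drains the remainder, mirroring the "// drain data" loop (a sketch for illustration only; StreamingRows and all row counts are made up, this is not Connector/J code):

```java
// Toy model of a streaming result set whose close() drains the rest,
// mirroring the drain loop in com.mysql.jdbc.RowDataDynamic#close.
// All names and numbers are illustrative, not Connector/J internals.
public class DrainModel {
    static class StreamingRows {
        final long totalRows;
        long read = 0;
        StreamingRows(long totalRows) { this.totalRows = totalRows; }
        boolean next() {
            if (read < totalRows) { read++; return true; }
            return false;
        }
        // close() must leave the connection clean, so it reads everything left.
        void close() { while (next()) { /* drain */ } }
    }

    public static void main(String[] args) {
        StreamingRows rows = new StreamingRows(100_000_000L);
        long matches = 0;
        while (rows.next()) {
            if (rows.read % 1000 == 0) matches++; // stand-in for the complex criteria
            if (matches == 10_000) break;         // enough statistics, stop early
        }
        long readBeforeClose = rows.read;         // 10,000,000 rows actually needed
        rows.close();                             // drains the other 90,000,000 rows
        System.out.println(readBeforeClose + " read before close, " + rows.read + " after close");
    }
}
```

Even though only 10 million rows were needed, close() walks the remaining 90 million.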

I cannot add a LIMIT to the SQL, because I do not know in advance whether I must scan 10 million or 20 million records to find 10,000 matches.

How to repeat:
I use mybatis to execute a select with a forward-only cursor, reading one record at a time (I cannot fit all the data into memory):
<select id="readTrainingData" resultMap="TrainingRecord" fetchSize="-2147483648" resultSetType="FORWARD_ONLY">...

I use ResultHandler<> as a callback to process each row. ResultHandler has a stop() method which I use to tell mybatis to stop reading, at which point it calls JDBC to close the ResultSet.
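For reference, the mybatis settings above correspond to this plain-JDBC setup (a sketch; where the Connection comes from is assumed). Per the Connector/J documentation, row-by-row streaming requires a forward-only, read-only statement with fetch size Integer.MIN_VALUE, which is exactly the fetchSize="-2147483648" in the mapping:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Plain-JDBC equivalent of the mybatis configuration above: Connector/J
// streams rows one at a time only for a TYPE_FORWARD_ONLY, CONCUR_READ_ONLY
// statement with fetch size Integer.MIN_VALUE.
public class StreamingSelect {
    static Statement streamingStatement(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                              ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE); // tell Connector/J to stream
        return stmt;
    }

    public static void main(String[] args) {
        // The mybatis fetchSize value is just Integer.MIN_VALUE spelled out.
        System.out.println(Integer.MIN_VALUE); // -2147483648
    }
}
```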

Suggested fix:
I do not know MySQL and JDBC internals well enough to say whether skipping the drain would be safe, but that is what is needed: stop draining the result set and just discard the remaining rows.
[2 Aug 2018 12:43] Chiranjeevi Battula
Hello Dmitriy Shirokov,

Thank you for the bug report.
Could you please provide a repeatable test case (exact steps, full stack trace, sample code, etc. - mark it as private if you prefer) so we can confirm this issue at our end?

Thanks,
Chiranjeevi.
[2 Aug 2018 15:11] Dmitriy Shirokov
Hello, Chiranjeevi,

This is the top of the call stack:
close:151, RowDataDynamic (com.mysql.jdbc)
realClose:6676, ResultSetImpl (com.mysql.jdbc)
close:851, ResultSetImpl (com.mysql.jdbc)
close:-1, HikariProxyResultSet (com.zaxxer.hikari.pool)
closeResultSet:265, DefaultResultSetHandler (org.apache.ibatis.executor.resultset)
...

If you look at https://github.com/mysql/mysql-connector-j/blob/release/5.1/src/com/mysql/jdbc/RowDataDyna..., at the while loop at line 155 with the comment // drain the rest of the records, it is obvious that it will take time to drain millions of records if that is what the SQL returns.

If I have time, I will build a pure mysql.jdbc sample.

I have already implemented a workaround: page through the records in 100K batches. Not ideal (it needs more code), but it works. Draining ~100K records is reasonably fast.
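The paging workaround can be sketched as follows (numbers and the keyset column are illustrative): with keyset-paged queries of 100K rows each, closing early only drains the unread tail of the current page, never the tail of the whole result set:

```java
// Sketch of the paging workaround: read rows in keyset-paged batches,
// e.g.  SELECT ... WHERE id > ? ORDER BY id LIMIT 100000
// (the "id" keyset column is an assumption for illustration).
// Early exit then drains at most one partial page.
public class PagedScan {
    static final long PAGE = 100_000;

    // Rows drained on early exit when paging: the unread remainder of the
    // current page only, never the remainder of the full result set.
    static long drainedOnEarlyExit(long stopAtRow) {
        long posInPage = stopAtRow % PAGE;
        return posInPage == 0 ? 0 : PAGE - posInPage;
    }

    public static void main(String[] args) {
        System.out.println(drainedOnEarlyExit(10_000_000)); // 0 (page boundary)
        System.out.println(drainedOnEarlyExit(10_000_123)); // 99877 (< one page)
    }
}
```

So the worst case drops from tens of millions of drained rows to under 100K, which matches the "reasonably fast" observation above.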

Thank you
Dmitriy
[3 Aug 2018 12:19] Alexander Soklakov
Hi Dmitriy,

After a ResultSet is closed, the Connection may still be used. So if we do not drain all of the result set data, it could be wrongly consumed when other statements retrieve their results.

I see only one way to quickly discard this unneeded data: call ((ConnectionImpl) conn).abortInternal(). It forces the socket to close, so it is fast, but you will need to create a new connection to proceed.
[3 Aug 2018 13:38] Dmitriy Shirokov
Hi, Alexander,
Thank you for the explanation. Maybe then my paging approach is the best :)
I guess you could add a config parameter to control the record drain, say maxRecordsToDrain, beyond which the socket is closed instead. Or a more sophisticated approach: measure the time to reopen a connection (say 100 ms), measure the drain rate (say 10K records per 100 ms), and compute maxRecordsToDrain dynamically (~10K in this case). Of course there will be unfortunate cases where the number of records to drain is just above the threshold :(
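The dynamic heuristic above amounts to a one-line calculation (maxRecordsToDrain is a hypothetical parameter, not an existing Connector/J option): drain only while that is cheaper than reopening the connection.

```java
// Sketch of the proposed dynamic threshold. maxRecordsToDrain is a
// hypothetical driver parameter, not an existing Connector/J option.
public class DrainThreshold {
    // recordsPerMs: measured drain rate; reconnectMs: measured reconnect cost.
    // Draining more than this many records costs more than a fresh connection.
    static long maxRecordsToDrain(double recordsPerMs, double reconnectMs) {
        return (long) (recordsPerMs * reconnectMs);
    }

    public static void main(String[] args) {
        // Example from the comment above: reconnect ~100 ms,
        // drain rate ~10K records per 100 ms => threshold ~10K records.
        System.out.println(maxRecordsToDrain(10_000.0 / 100.0, 100.0)); // 10000
    }
}
```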
Thank you
Dmitriy
[3 Aug 2018 15:59] Alexander Soklakov
Ok, I'll keep it as a feature request in the hope that we can find an appropriate solution in the future.