Bug #42970 containsOnDuplicateKeyInString() implementation is killing PreparedStatement
Submitted: 18 Feb 2009 13:36
Modified: 18 Feb 2009 15:13
Reporter: Stephane Bailliez
Status: Duplicate
Category: Connector / J
Severity: S2 (Serious)
Version: 5.1.7
OS: Any
Assigned to:
CPU Architecture: Any

[18 Feb 2009 13:36] Stephane Bailliez
Description:

containsOnDuplicateKeyInString() is a major performance killer.
All time is spent in the corresponding StringUtils.indexOfIgnoreCase() and the multiple character by character toUpperCase() calls.

With queries about 3.5 KB long returning 1 row in the result set, and a test doing 100 calls, upgrading from 5.1.6 to 5.1.7 causes a more than 10x slowdown, from 4.5 s to over 50 s. In the application, with various queries, this causes a general slowdown of about 5x.

How to repeat:
1) take a long query string of several KB
2) check performance of containsOnDuplicateKeyInString() on that string
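
The two steps above can be sketched as a small timing harness. This is only a rough illustration: containsOnDuplicateKeyInString() is internal to the driver, so the sketch times whatever check is passed in as a Predicate. The class and method names (LongQueryTiming, buildLongSelect, timeNanos) and the table and column names are made up for the example.

```java
import java.util.function.Predicate;

// Hypothetical helper, not part of Connector/J.
final class LongQueryTiming {

    /** Builds a read-only SELECT of at least minLength characters by appending columns. */
    static String buildLongSelect(int minLength) {
        StringBuilder sql = new StringBuilder("SELECT ");
        int i = 0;
        while (sql.length() < minLength) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append("t.column_").append(i++);
        }
        return sql.append(" FROM some_table t WHERE t.id = ?").toString();
    }

    /** Runs the given check 'calls' times on sql and returns the elapsed nanoseconds. */
    static long timeNanos(Predicate<String> check, String sql, int calls) {
        boolean sink = false;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink ^= check.test(sql); // fold results so the loop is not optimized away
        }
        long elapsed = System.nanoTime() - start;
        if (sink) {
            System.out.print(""); // keeps 'sink' observable without printing anything
        }
        return elapsed;
    }
}
```

For example, `LongQueryTiming.timeNanos(check, LongQueryTiming.buildLongSelect(3500), 100)` times 100 calls of a check on a query of roughly the size described above. A single cold run like this is only indicative; a proper benchmark would add warm-up iterations.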
[18 Feb 2009 13:37] Stephane Bailliez
Bumping to severity S2 (serious), since I feel this makes the driver more or less unusable in real-life scenarios.
[18 Feb 2009 13:40] Tonci Grgin
Hi Stephane and thanks for your report.

I think we're aware of this, I just can't find the right quote now.
[18 Feb 2009 13:49] Tonci Grgin
This is a duplicate of Bug#41532. Please see my comments there.
[18 Feb 2009 14:16] Stephane Bailliez
Just to add that in this case ALL my queries are purely read-only (i.e. SELECT).

Without digging into the whole reason why it is done that way, it looks to me like you would get more benefit from first looking for a naive case-insensitive match of ' ON DUPLICATE KEY UPDATE ' and only digging deeper with more complex parsing if there is a match. Otherwise absolutely all queries are affected by this extremely slow generic parsing.
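
A minimal sketch of such a pre-check, under some assumptions: it looks only for the single token DUPLICATE rather than the full phrase, so that unusual whitespace between the keywords cannot cause a miss, and it treats a hit as "maybe" (the token could still sit inside a string literal or comment). The class and method names are hypothetical; only String.regionMatches is a real JDK call.

```java
// Hypothetical helper, not the driver's code.
final class DuplicateKeyPrefilter {

    /** Case-insensitive substring test built on String.regionMatches(ignoreCase, ...). */
    static boolean containsIgnoreCase(String haystack, String needle) {
        int last = haystack.length() - needle.length();
        for (int i = 0; i <= last; i++) {
            if (haystack.regionMatches(true, i, needle, 0, needle.length())) {
                return true;
            }
        }
        return false;
    }

    /**
     * Cheap filter: false means the statement cannot contain
     * ON DUPLICATE KEY UPDATE, true means the expensive parser
     * still has to decide.
     */
    static boolean mayContainOnDuplicateKey(String sql) {
        return containsIgnoreCase(sql, "DUPLICATE");
    }
}
```

With a filter like this, a plain SELECT returns false immediately and never reaches the marker-aware parsing; only statements that mention the keyword pay for the full scan.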

I'm also not sure it is necessary to use Character.toUpperCase and to worry about the locale when you're ultimately looking for ASCII characters, but maybe I'm missing something obvious (NB: some comments in the code might be helpful for the reader).
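
To illustrate the ASCII point, here is a rough sketch of a search that folds only the letters a-z and therefore needs no Character.toUpperCase/toLowerCase calls. It assumes the needle is already upper-case ASCII (as the SQL keywords are); the names AsciiSearch, toAsciiUpper and indexOfIgnoreCaseAscii are made up for the example and are not the driver's API.

```java
// Hypothetical helper, not part of Connector/J.
final class AsciiSearch {

    /** Upper-cases a-z only; every other character is returned unchanged. */
    static char toAsciiUpper(char c) {
        return (c >= 'a' && c <= 'z') ? (char) (c - ('a' - 'A')) : c;
    }

    /**
     * Returns the index of the first ASCII-case-insensitive occurrence of
     * upperNeedle in haystack, or -1 if there is none.
     * upperNeedle must already be upper-case ASCII.
     */
    static int indexOfIgnoreCaseAscii(String haystack, String upperNeedle) {
        int n = upperNeedle.length();
        int last = haystack.length() - n;
        for (int i = 0; i <= last; i++) {
            int j = 0;
            while (j < n && toAsciiUpper(haystack.charAt(i + j)) == upperNeedle.charAt(j)) {
                j++;
            }
            if (j == n) {
                return i;
            }
        }
        return -1;
    }
}
```

One behavioural difference to be aware of: non-ASCII characters that Character.toUpperCase would map onto an ASCII letter are left alone here, so they never match a keyword. Whether that matters for SQL keywords is exactly the open question above.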
[18 Feb 2009 15:13] Stephane Bailliez
An example of the profiling:

4 PreparedStatements means 4 calls to containsOnDuplicateKeyInString(), which generate more than 12M calls to String.charAt(), 9M calls to Character.toUpperCase() and 4.5M calls to Character.toLowerCase().

  99,9% - 386 s - 4 inv. com.mysql.jdbc.JDBC4PreparedStatement.<init>
    99,9% - 386 s - 4 inv. com.mysql.jdbc.PreparedStatement.<init> (line: 47)
      99,9% - 386 s - 4 inv. com.mysql.jdbc.PreparedStatement$ParseInfo.<init> (line: 635)
        99,9% - 386 s - 4 inv. com.mysql.jdbc.PreparedStatement.containsOnDuplicateKeyInString (line: 202)
          99,9% - 386 s - 4 inv. com.mysql.jdbc.StringUtils.indexOfIgnoreCaseRespectMarker (line: 5193)
            99,7% - 386 s - 3 002 inv. com.mysql.jdbc.StringUtils.indexOfIgnoreCase (line: 1023)
              18,0% - 69 578 ms - 9 530 899 inv. java.lang.String.charAt (line: 946)
              10,3% - 40 015 ms - 5 506 413 inv. java.lang.Character.toUpperCase (line: 946)
              7,4% - 28 716 ms - 4 024 486 inv. java.lang.Character.toLowerCase (line: 946)
              6,4% - 24 725 ms - 3 420 156 inv. java.lang.Character.toUpperCase (line: 963)
              6,2% - 23 975 ms - 3 420 156 inv. java.lang.String.charAt (line: 963)
              0,8% - 3 228 ms - 450 342 inv. java.lang.Character.toLowerCase (line: 974)
              0,8% - 2 932 ms - 450 342 inv. java.lang.String.charAt (line: 974)
[18 Feb 2009 15:39] Philippe Martin
Hi,

This is the same bug as Bug#41532. If you want to test my fix, you just have to download the latest zip file and add it to the classpath before the connector jar.
I have been using it for almost 2 months without problems.