Bug #61105 Avoid Java bottleneck by using explicit Charset for byte[]<->String conversions
Submitted: 9 May 2011 19:55 Modified: 25 Jun 2013 21:38
Reporter: David Engberg
Status: Closed
Category:Connector / J Severity:S3 (Non-critical)
Version:5.1.16 OS:Any
Assigned to: Assigned Account CPU Architecture:Any

[9 May 2011 19:55] David Engberg
Description:
We're using Connector/J in a high-volume, high-concurrency service.  At times, we see a performance slowdown within the service, which we've traced to a concurrency flaw within the JVM code that translates named encodings (e.g. "utf-8") into Charsets.  It shows up as a number of threads blocked while converting a byte array to a String or vice versa, for example:

  java.lang.Thread.State: BLOCKED (on object monitor)
       at sun.nio.cs.FastCharsetProvider.charsetForName(Unknown Source)
       - waiting to lock <0x00007f89a49cdfd0> (a sun.nio.cs.StandardCharsets)
       at java.nio.charset.Charset.lookup2(Unknown Source)
       at java.nio.charset.Charset.lookup(Unknown Source)
       at java.nio.charset.Charset.isSupported(Unknown Source)
       at java.lang.StringCoding.lookupCharset(Unknown Source)
       at java.lang.StringCoding.encode(Unknown Source)
       at java.lang.String.getBytes(Unknown Source)
       at com.mysql.jdbc.StringUtils.getBytes(StringUtils.java:499)

This isn't a true deadlock, since each thread will eventually finish, but it can significantly affect concurrency if there are a number of threads making heavy use of:
   new String(byte[] b, String encoding)
   String.getBytes()
   String.getBytes(String encoding)

This is, unfortunately, a known bottleneck within the JVM:
http://blog.inuus.com/vox/2008/05/the-mysteries-of-java-character-set-performance.html
http://halfbottle.blogspot.com/2009/07/charset-continued-i-wrote-about.html
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6790402

How to repeat:
Use Connector/J to make a lot of requests to MySQL across a lot of threads, and occasionally dump the Java thread stacks to see which threads are blocked in Charset.lookup2, waiting to lock sun.nio.cs.StandardCharsets.
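
For anyone who wants to look at this without a MySQL server, here is a minimal stress sketch. It is not Connector/J code: the class name CharsetLookupStress, the thread count and the iteration count are all made up for illustration. It only exercises String.getBytes(String) against String.getBytes(Charset) from many threads. Whether blocked threads actually appear in a thread dump will depend on the JVM version, so treat the timings as a rough hint, not a measurement.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stress harness for illustration; not part of Connector/J.
final class CharsetLookupStress {

    // Starts `threads` workers that each encode the same String `iterations`
    // times, either by charset name or with a pre-resolved Charset.
    // Returns the total number of bytes produced as a sanity check.
    static long run(int threads, int iterations, boolean byName) {
        final String text = "some row data";            // 13 ASCII chars, 13 bytes in UTF-8
        final Charset utf8 = Charset.forName("UTF-8");  // resolved once, outside the loop
        final AtomicLong total = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                long n = 0;
                try {
                    for (int i = 0; i < iterations; i++) {
                        // The by-name call goes through the JVM's charset lookup each time.
                        byte[] b = byName ? text.getBytes("utf-8") : text.getBytes(utf8);
                        n += b.length;
                    }
                } catch (UnsupportedEncodingException e) {
                    throw new IllegalStateException(e);
                }
                total.addAndGet(n);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total.get();
    }

    public static void main(String[] args) {
        for (boolean byName : new boolean[] {true, false}) {
            long start = System.nanoTime();
            run(32, 2000000, byName);
            long ms = (System.nanoTime() - start) / 1000000;
            System.out.println((byName ? "by name:    " : "by Charset: ") + ms + " ms");
        }
    }
}
```

While the by-name pass is running, a thread dump (jstack or kill -3) is the place to look for frames like the ones in the description.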

Suggested fix:
To avoid this bottleneck in the JVM, I'd suggest adding a helper function to StringUtils:

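    // Requires java.nio.charset.Charset and java.util.concurrent.ConcurrentHashMap.
    // Caches name-to-Charset lookups so the hot path avoids Charset.forName().
    private static final ConcurrentHashMap<String, Charset> encodingToCharsetCache =
        new ConcurrentHashMap<String, Charset>();

    public static Charset getCharset(String enc) {
        Charset charset = encodingToCharsetCache.get(enc);
        if (charset == null) {
            // Two threads may both miss and both call put(); that is harmless,
            // since both store a Charset for the same name.
            charset = Charset.forName(enc);
            encodingToCharsetCache.put(enc, charset);
        }
        return charset;
    }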

Then ...

Replace:
  new String(bytes, "encoding")
with:
  new String(bytes, StringUtils.getCharset("encoding"))

Replace:
  s.getBytes()
with:
  s.getBytes(StringUtils.getCharset("ISO-8859-1"))   // assumes ISO-8859-1; s.getBytes() itself uses the platform default charset

Replace:
  s.getBytes("encoding")
with:
  s.getBytes(StringUtils.getCharset("encoding"))

I think that we'd get the most value from replacing these occurrences in:

StringUtils.getBytes*()
ResultSetImpl.getStringInternal()
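
Putting the helper and the replacements together, a self-contained sketch could look like the following. The class name CharsetCacheSketch and the encode/decode method names are made up for illustration and are not Connector/J API. It also uses putIfAbsent instead of put, which is an optional variation on the suggestion above so that all callers end up sharing whichever entry was stored first.

```java
import java.nio.charset.Charset;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the helper proposed for StringUtils.
final class CharsetCacheSketch {

    private static final ConcurrentMap<String, Charset> CACHE =
            new ConcurrentHashMap<String, Charset>();

    static Charset getCharset(String enc) {
        Charset cs = CACHE.get(enc);          // common case: cache hit, no JVM charset lookup
        if (cs == null) {
            cs = Charset.forName(enc);        // slow path, taken only on a cache miss
            Charset prev = CACHE.putIfAbsent(enc, cs);
            if (prev != null) {
                cs = prev;                    // another thread stored an entry first
            }
        }
        return cs;
    }

    // Replacement for s.getBytes("encoding")
    static byte[] encode(String s, String enc) {
        return s.getBytes(getCharset(enc));
    }

    // Replacement for new String(bytes, "encoding")
    static String decode(byte[] bytes, String enc) {
        return new String(bytes, getCharset(enc));
    }
}
```

Two things to keep in mind with this shape: the cache is keyed by the name exactly as passed in, so "utf-8" and "UTF-8" become separate entries for the same Charset, and an unknown name now surfaces as an unchecked exception from Charset.forName instead of the checked UnsupportedEncodingException.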

Sorry, I'd submit a patch, but I'm on OSX and can't easily get the current trunk to compile.  If it would help a lot to do so, I can work on it.
[11 May 2011 9:20] Tonci Grgin
Hello David, and thank you for an excellent report! Probably the most interesting one I have had in a long time.

We discussed this yesterday, and the decision is to start working on an implementation as soon as possible.
[22 Jun 2011 14:58] Christopher Schultz
Tonci,

Over in Apache Tomcat, we implemented a strategy similar to that proposed here:
http://halfbottle.blogspot.com/2009/07/charset-continued-i-wrote-about.html

The advantage of this solution is that it does not depend on methods available only in Java 1.6; it needs only NIO support.

Have a look at https://issues.apache.org/bugzilla/show_bug.cgi?id=51400#c5 for the patch we have for Tomcat; hidden in there is the use of Charset.decode(ByteBuffer.wrap(...)).
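
As a rough sketch of that NIO-based conversion (not the actual Tomcat or Connector/J patch): the class and method names below are made up, and the encode direction is my assumption of the mirror image, since only the Charset.decode(ByteBuffer.wrap(...)) call is mentioned above. It uses only java.nio classes and does not need String.getBytes(Charset) or new String(byte[], Charset).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

// Hypothetical illustration of byte[] <-> String conversion through NIO only.
final class NioConversionSketch {

    // byte[] -> String using an already-resolved Charset.
    static String decode(byte[] bytes, Charset cs) {
        return cs.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // String -> byte[]; assumed counterpart to decode, not taken from the patch.
    static byte[] encode(String s, Charset cs) {
        ByteBuffer buf = cs.encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];  // copy only the encoded bytes
        buf.get(out);
        return out;
    }
}
```

The Charset passed in would come from a cache or a constant, so the name lookup is still kept off the hot path.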
[22 Jun 2011 15:29] Mark Matthews
Christopher, see http://bazaar.launchpad.net/~mark-mysql/connectorj/5.1/revision/1063

(We arrived at the same approach.)

Thanks for the pointer to your fix though!