Description:
We're using Connector/J in a high-volume, high-concurrency service. At times, we see a performance slowdown within the service, which we've traced to a concurrency flaw within the JVM code that translates named encodings (e.g. "utf-8") into Charsets. It shows up as a number of blocked threads trying to convert a byte array to a String or vice versa, for example:
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.nio.cs.FastCharsetProvider.charsetForName(Unknown Source)
- waiting to lock <0x00007f89a49cdfd0> (a sun.nio.cs.StandardCharsets)
at java.nio.charset.Charset.lookup2(Unknown Source)
at java.nio.charset.Charset.lookup(Unknown Source)
at java.nio.charset.Charset.isSupported(Unknown Source)
at java.lang.StringCoding.lookupCharset(Unknown Source)
at java.lang.StringCoding.encode(Unknown Source)
at java.lang.String.getBytes(Unknown Source)
at com.mysql.jdbc.StringUtils.getBytes(StringUtils.java:499)
This isn't a true deadlock, since each thread will eventually finish, but it can significantly affect concurrency if there are a number of threads making heavy use of:
new String(byte[] b, String encoding)
String.getBytes()
String.getBytes(String encoding)
This is, unfortunately, a known bottleneck within the JVM:
http://blog.inuus.com/vox/2008/05/the-mysteries-of-java-character-set-performance.html
http://halfbottle.blogspot.com/2009/07/charset-continued-i-wrote-about.html
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6790402
How to repeat:
Use Connector/J to make a lot of requests to MySQL across a lot of threads, and occasionally dump the Java thread stacks (e.g. with jstack) to see which threads are blocked in Charset.lookup2 waiting to lock sun.nio.cs.StandardCharsets.
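For anyone who wants to look at the contention without a MySQL server, here is a minimal sketch that hammers String.getBytes(String) from several threads. The class name CharsetContentionDemo and the thread/iteration counts are my own assumptions, not anything from Connector/J, and whether threads actually show up as BLOCKED depends on the JVM version (newer JVMs may not exhibit it at all).

```java
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-alone harness; not part of Connector/J.
public class CharsetContentionDemo {

    // Runs `threads` workers that each encode a short string `iterations`
    // times by charset *name*, forcing a name-to-Charset lookup per call.
    // Returns the total number of bytes produced across all workers.
    static long run(int threads, final int iterations) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<Future<Long>>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(new Callable<Long>() {
                    public Long call() throws UnsupportedEncodingException {
                        long total = 0;
                        for (int i = 0; i < iterations; i++) {
                            total += "hello".getBytes("utf-8").length;
                        }
                        return total;
                    }
                }));
            }
            long sum = 0;
            for (Future<Long> f : results) {
                sum += f.get();
            }
            return sum;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed defaults; raise them and take thread dumps while it runs.
        int threads = args.length > 0 ? Integer.parseInt(args[0]) : 8;
        int iterations = args.length > 1 ? Integer.parseInt(args[1]) : 100000;
        System.out.println("bytes encoded: " + run(threads, iterations));
    }
}
```

While this runs with a large iteration count, a thread dump on an affected JVM should show stacks similar to the one in the description.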
Suggested fix:
To avoid this bottleneck in the JVM, I'd suggest adding a helper function to StringUtils:
    private static final ConcurrentHashMap<String, Charset> encodingToCharsetCache =
        new ConcurrentHashMap<String, Charset>();

    public static Charset getCharset(String enc) {
        Charset charset = encodingToCharsetCache.get(enc);
        if (charset == null) {
            charset = Charset.forName(enc);
            encodingToCharsetCache.put(enc, charset);
        }
        return charset;
    }
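A compilable version of the helper above, as a rough sketch. The wrapper class name CharsetCache is an assumption for illustration (the suggestion is to put the method in StringUtils), and I've swapped put for putIfAbsent so that racing threads end up sharing one cached entry; the plain put in the snippet above is also safe, just slightly less tidy.

```java
import java.nio.charset.Charset;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical holder class; in Connector/J this would live in StringUtils.
public class CharsetCache {

    private static final ConcurrentMap<String, Charset> encodingToCharsetCache =
            new ConcurrentHashMap<String, Charset>();

    // Resolves an encoding name to a Charset, hitting Charset.forName only
    // on the first lookup of each distinct name string.
    public static Charset getCharset(String enc) {
        Charset charset = encodingToCharsetCache.get(enc);
        if (charset == null) {
            // Throws an unchecked exception for unknown or illegal names.
            charset = Charset.forName(enc);
            Charset previous = encodingToCharsetCache.putIfAbsent(enc, charset);
            if (previous != null) {
                charset = previous; // another thread cached it first
            }
        }
        return charset;
    }

    public static void main(String[] args) {
        Charset cs = getCharset("utf-8");
        byte[] bytes = "hello".getBytes(cs);
        System.out.println(new String(bytes, cs));
    }
}
```

Note that the cache is keyed on the name exactly as passed in, so "utf-8" and "UTF-8" get separate entries that resolve to the same Charset. That costs one extra lookup per spelling, which seems acceptable.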
Then ...
Replace:
new String(bytes, "encoding")
with:
new String(bytes, StringUtils.getCharset("encoding"))
Replace:
s.getBytes()
with:
s.getBytes(Charset.defaultCharset()) // no-arg getBytes() uses the platform default charset; ISO-8859-1 is only the JVM's fallback if that default is unsupported
Replace:
s.getBytes("encoding")
with:
s.getBytes(StringUtils.getCharset("encoding"))
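As a sanity check on the replacements above, here is a small sketch comparing the String-name overloads with the Charset overloads (both sets exist since Java 6). The class and method names are made up for illustration. One practical difference at the call sites: the Charset overloads do not declare the checked UnsupportedEncodingException, so some try/catch blocks would become unnecessary.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical check; not Connector/J code.
public class CharsetOverloadDemo {

    // Encodes and decodes `s` via both the name-based and Charset-based
    // overloads and reports whether the results agree.
    static boolean sameResult(String s, String enc) throws UnsupportedEncodingException {
        Charset cs = Charset.forName(enc);

        byte[] byName = s.getBytes(enc);   // looks up the charset by name
        byte[] byCharset = s.getBytes(cs); // uses the resolved Charset directly

        String decodedByName = new String(byName, enc);
        String decodedByCharset = new String(byCharset, cs);

        return Arrays.equals(byName, byCharset)
                && decodedByName.equals(decodedByCharset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(sameResult("h\u00e9llo", "utf-8"));
    }
}
```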
I think that we'd get the most value from replacing these occurrences in:
StringUtils.getBytes*()
ResultSetImpl.getStringInternal()
Sorry, I'd submit a patch, but I'm on OSX and can't easily get the current trunk to compile. If it would help a lot to do so, I can work on it.