Bug #109131 Wrong default clobCharacterEncoding
Submitted: 17 Nov 2022 22:04 Modified: 18 Nov 2022 9:48
Reporter: Michał Sobkiewicz
Status: Analyzing
Category: Connector / J Severity: S3 (Non-critical)
Version: 8.0.31 OS: Any
Assigned to: MySQL Verification Team CPU Architecture: Any

[17 Nov 2022 22:04] Michał Sobkiewicz
According to https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-connp-props-blob-clob-processing...., the default character encoding used for sending and retrieving TEXT, MEDIUMTEXT and LONGTEXT values (clobCharacterEncoding) should be the same as characterEncoding.

It seems to me that the default clobCharacterEncoding can in some cases be unrelated to characterEncoding: in practice it is the same as file.encoding up to Java 17 and UTF-8 from Java 18 onward.

It doesn't even matter whether characterEncoding is set explicitly or not (according to https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-charsets.html#idm458067..., for Connector/J 8.0.26 and later, the character encoding defaults to UTF8).

When java.sql.PreparedStatement.setClob(int, Reader) is used,
com.mysql.cj.NativeQueryBindValue.writeAsText(Message) calls
com.mysql.cj.protocol.a.ReaderValueEncoder.readBytes(Reader, BindValue).
There, forcedEncoding, computed as this.propertySet.getStringProperty(PropertyKey.clobCharacterEncoding).getStringValue(), is null, which very soon leads to a problematic String.getBytes() call. Without a charset argument, String.getBytes() encodes with the JVM default charset, not with characterEncoding.
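To illustrate why a charset-less String.getBytes() is a problem here, this is a minimal standalone sketch that does not touch Connector/J at all. The class and method names are made up for illustration. It encodes the test string from this report with an explicit charset, which is what String.getBytes() effectively does with whatever the JVM default happens to be.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Standalone illustration only; none of these names exist in Connector/J.
public class DefaultCharsetDemo {
    static final String SAMPLE = "Hello world, Καλημέρα κόσμε, コンニチハ";

    // Encode and decode with the same charset and return what survives.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // unmappable characters are replaced
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // With -Dfile.encoding=US-ASCII on JDK 17 this reports US-ASCII,
        // and String.getBytes() without arguments would use it.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("US-ASCII: " + roundTrip(SAMPLE, StandardCharsets.US_ASCII));
        System.out.println("UTF-8:    " + roundTrip(SAMPLE, StandardCharsets.UTF_8));
    }
}
```

With US-ASCII the Greek and Japanese characters come back as replacement characters, while the UTF-8 round trip returns the original string.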

In the case of java.sql.PreparedStatement.setString(int, String), everything is fine.

How to repeat:
git clone https://github.com/perceptron8/clob-encoding
cd clob-encoding
mvn test -DargLine="-Dfile.encoding=US-ASCII"


Please, please use JDK 17. As I mentioned earlier, the default character encoding changed in JDK 18 (you know that better than me), so there the test would unexpectedly pass instead of failing miserably.


You can replace
insert.setClob(1, new StringReader("Hello world, Καλημέρα κόσμε, コンニチハ"));
with
insert.setString(1, "Hello world, Καλημέρα κόσμε, コンニチハ");
to see that the encoding works as expected in that case.

Suggested fix:
Maybe this.propertySet.getStringProperty(PropertyKey.characterEncoding).getStringValue()
should be used as a fallback when computing forcedEncoding inside
com.mysql.cj.protocol.a.ReaderValueEncoder.readBytes(Reader, BindValue)?
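
The fallback I have in mind could look roughly like the sketch below. This is not the actual Connector/J code: resolveClobCharset and encode are hypothetical helpers, the two String parameters stand in for the values of the clobCharacterEncoding and characterEncoding properties, and the final UTF-8 default is my assumption based on the documented default for 8.0.26 and later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the suggested fallback; not Connector/J source.
public class ClobEncodingFallback {

    // Prefer clobCharacterEncoding, then characterEncoding, then UTF-8 (assumed default).
    static Charset resolveClobCharset(String clobCharacterEncoding, String characterEncoding) {
        if (clobCharacterEncoding != null) {
            return Charset.forName(clobCharacterEncoding);
        }
        if (characterEncoding != null) {
            return Charset.forName(characterEncoding);
        }
        return StandardCharsets.UTF_8;
    }

    // Always encode with an explicit charset, never with the JVM default.
    static byte[] encode(String text, String clobCharacterEncoding, String characterEncoding) {
        return text.getBytes(resolveClobCharset(clobCharacterEncoding, characterEncoding));
    }

    public static void main(String[] args) {
        // clobCharacterEncoding unset, characterEncoding set explicitly
        System.out.println(resolveClobCharset(null, "UTF-8"));
        // both unset: falls back to the assumed default
        System.out.println(resolveClobCharset(null, null));
    }
}
```

The point is only that the charset passed to getBytes is never left to file.encoding.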

In the meantime, an obvious workaround is to set clobCharacterEncoding explicitly in the JDBC URL.
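
For example, something along these lines. The host, port and database name are placeholders, and buildUrl and buildProperties are just illustrative helpers. Only the two property names, characterEncoding and clobCharacterEncoding, come from the Connector/J documentation.

```java
import java.util.Properties;

// Illustrative helpers; host, port and database are placeholders.
public class ClobEncodingWorkaround {

    // Builds a JDBC URL that sets both encodings explicitly.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?characterEncoding=UTF-8&clobCharacterEncoding=UTF-8";
    }

    // The same settings as connection properties instead of URL parameters.
    static Properties buildProperties() {
        Properties props = new Properties();
        props.setProperty("characterEncoding", "UTF-8");
        props.setProperty("clobCharacterEncoding", "UTF-8");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("localhost", 3306, "test"));
        System.out.println(buildProperties());
    }
}
```

Either the URL or the Properties object can then be passed to DriverManager.getConnection as usual.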