Bug #73747 Add a config option to ignore non-BMP characters
Submitted: 28 Aug 2014 3:26 Modified: 29 Aug 2014 15:00
Reporter: Eric Hartford Email Updates:
Status: Not a Bug Impact on me:
None 
Category:Connector / J Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: CPU Architecture:Any

[28 Aug 2014 3:26] Eric Hartford
Description:
MySQL doesn't support non-BMP (Basic Multilingual Plane) characters.
When ConnectorJ receives an insert that contains BMP characters, it throws an exception.
I would like a config option for ConnectorJ that would strip non-BMP characters rather than throwing an exception.

How to repeat:
Create a table with utf8 encoding.  (not utf8mb4)
Use ConnectorJ to insert a row with a non-BMP character.

Expected:  There is a config option to suppress exceptions in this case, by stripping the offending characters from the insert.

Observed:  exception thrown, and there is no option to silently strip offending characters.

Suggested fix:
Please add a config option called something like "utf8OutsideBmpIgnore" that will strip out any non-Bmp characters instead of throwing an exception.
[28 Aug 2014 14:58] Todd Farmer
Hi Eric,

Thanks for the bug report.  Do you have a simple repeatable test case you can provide?  It would also be useful to provide the full stack trace for the Exception you are getting.  Have you tried setting jdbcCompliantTruncation=false?  By default, Connector/J sets SQL_MODE=STRICT_TRANS_TABLES, which causes data truncation to generate errors on the server.  If that is the trigger of the Exception, jdbcCompliantTruncation=false should suppress it (although the server-side truncation behavior may not do what you are wanting).
[28 Aug 2014 18:39] Eric Hartford
ok here is a test case.  Please change the connection string to have the appropriate credentials.  Please ensure Connector/J is in the classpath.

test.sql
  create database test;
  use test;
  create table t (c char(20) character set utf8);

test.java
  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.SQLException;
  import java.sql.Statement;
  public class test { 
    public static void main(String[] args) { 
      Connection conn = null;
      Statement stmt = null;
      try {
        conn =
          DriverManager.getConnection("jdbc:mysql://localhost/test?" +
                                      "user=x&password=x&useUnicode=true&characterEncoding=UTF-8");
        stmt = conn.createStatement();
        stmt.executeUpdate("insert into t (c) values ('Hello
[28 Aug 2014 18:40] Eric Hartford
looks like this very comment field cannot support 4-byte characters, it truncated my message.
[28 Aug 2014 18:41] Eric Hartford
ok here is a test case.  Please change the connection string to have the appropriate credentials.  Please ensure Connector/J is in the classpath.

test.sql
  create database test;
  use test;
  create table t (c char(20) character set utf8);

test.java
  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.SQLException;
  import java.sql.Statement;
  public class test { 
    public static void main(String[] args) { 
      Connection conn = null;
      Statement stmt = null;
      try {
        conn =
          DriverManager.getConnection("jdbc:mysql://localhost/test?" +
                                      "user=x&password=x&useUnicode=true&characterEncoding=UTF-8");
        stmt = conn.createStatement();
        stmt.executeUpdate("insert into t (c) values ('Hello {insert emoji here} dear')");
      } catch (SQLException ex) {
        System.out.println("SQLException: " + ex.getMessage());
        System.out.println("SQLState: " + ex.getSQLState());
        System.out.println("VendorError: " + ex.getErrorCode());
      }
    }
  }

This will throw an exception something like:
SQLException: Incorrect string value: '\xF0\x9F\x98\x8A\x0A[...' for column 'c' at row 1

I would like to have a setting that will strip the 4-byte characters.  So the insert would succeed, but the row would contain "Hello  dear" instead of "Hello {insert emoji here} dear"
[28 Aug 2014 18:45] Eric Hartford
If you use mac, type an emoji with ctrl-command-space.
if you use windows 8.1, use the soft keyboard to type a smily character.
[28 Aug 2014 18:47] Todd Farmer
Hi Eric,

That would be the exact truncating behavior of running without STRICT_TRANS_TABLES enabled I was mentioning.  :)

What you are observing is a server-related behavior (choose between truncation or error, depending on SQL_MODE - but the option to ignore specific bytes is not supported).  It's possible to do some pre-processing on the Connector/J side (you could do something using statement interceptors, for example).  That would probably be the better approach than to have Connector/J inspect and replace, character-by-character, out-of-range characters for every command.
[28 Aug 2014 19:00] Eric Hartford
ok i will look into the statement interceptors.
thank you for the information.