Bug #88529 Incorrect string value
Submitted: 17 Nov 2017 6:11 Modified: 28 Nov 2017 11:07
Reporter: chunyang xu Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.7.18 OS:Linux
Assigned to: CPU Architecture:Any
Tags: Incorrect string value

[17 Nov 2017 6:11] chunyang xu
Description:
when l using  "load data local infile ... into table_name" command  for  load  data exported from db2  into  mysql ,  a few line  encounter   the error  like "ERROR 1366 (HY000): Incorrect string value: '\xF0\xA4\x8B\xAE' .

How to repeat:
first, l create a table  xcy_test, like this:

 CREATE TABLE "xcytest" (
  "a" int(11) NOT NULL,
  "b" varchar(200) DEFAULT NULL,
  "c" varchar(200) CHARACTER SET utf8mb4 DEFAULT NULL,
  PRIMARY KEY ("a")
) ENGINE=InnoDB DEFAULT CHARSET=utf8

then  using insert  command   for  inserting data into table .

mysql> insert into  xcytest(a,c) values(6,0xF0A48BAE); 
Query OK, 1 row affected (0.01 sec)

mysql> select * from xcytest;
+---+------+------+
| a | b    | c    |
+---+------+------+
| 6 | NULL | ?    |

mysql> insert into  xcytest(a,b) values(7,0xF0A48BAE); 
ERROR 1366 (HY000): Incorrect string value: '\xF0\xA4\x8B\xAE' for column 'b' at row 1

Suggested fix:
when l review the mysql  source code , and then find the function "my_valid_mbcharlen_utf8"

code  as follow :

static
int my_valid_mbcharlen_utf8(const CHARSET_INFO *cs MY_ATTRIBUTE((unused)),
                            const uchar *s, const uchar *e)
{
  uchar c;

  if (s >= e)
    return MY_CS_TOOSMALL;

  c= s[0];
  if (c < 0xf0)
    return my_valid_mbcharlen_utf8mb3(s, e);

#ifdef UNICODE_32BIT
  if (c < 0xf8 && sizeof(my_wc_t)*8 >= 32)
  {
    if (s+4 > e) /* We need 4 characters */
      return MY_CS_TOOSMALL4;

    if (!(IS_CONTINUATION_BYTE(s[1]) &&
          IS_CONTINUATION_BYTE(s[2]) &&
          IS_CONTINUATION_BYTE(s[3]) &&
          (c >= 0xf1 || s[1] >= 0x90)))
      return MY_CS_ILSEQ;

    return 4;
  }
  if (c < 0xfc && sizeof(my_wc_t)*8 >= 32)
  {
    if (s+5 >e) /* We need 5 characters */
      return MY_CS_TOOSMALL5;

    if (!(IS_CONTINUATION_BYTE(s[1]) &&
          IS_CONTINUATION_BYTE(s[2]) &&
          IS_CONTINUATION_BYTE(s[3]) &&
          IS_CONTINUATION_BYTE(s[4]) &&
          (c >= 0xf9 || s[1] >= 0x88)))
      return MY_CS_ILSEQ;

    return 5;
  }
  if (c < 0xfe && sizeof(my_wc_t)*8 >= 32)
  {
    if ( s+6 >e ) /* We need 6 characters */
      return MY_CS_TOOSMALL6;

    if (!(IS_CONTINUATION_BYTE(s[1]) &&
          IS_CONTINUATION_BYTE(s[2]) &&
          IS_CONTINUATION_BYTE(s[3]) &&
          IS_CONTINUATION_BYTE(s[4]) &&
          IS_CONTINUATION_BYTE(s[5]) &&
          (c >= 0xfd || s[1] >= 0x84)))
      return MY_CS_ILSEQ;

    return 6;
  }
#endif
  return MY_CS_ILSEQ;
}

 after this , l modify the CmakeList.txt file, and then recompile the mysqld  using  -D UNICODE_32BIT , after than.   exec the  same sql . and successful.

mysql> insert into  xcytest(a,b) values(7,0xF0A48BAE); 
Query OK, 1 row affected (0.01 sec)
[17 Nov 2017 7:08] chunyang xu
correctly insert after recomplie  mysqld

Attachment: correct_insert_into_mysql.png (image/png, text), 10.01 KiB.

[17 Nov 2017 7:10] chunyang xu
this issue cause data can't load data into mysql .
[17 Nov 2017 11:31] Peter Laursen
This does not raise an error for me on the offical 5.7.18 binary:

 CREATE TABLE xcytest (
  a INT(11) NOT NULL,
  b VARCHAR(200) CHARACTER SET utf8mb4 DEFAULT NULL,
  c VARCHAR(200) CHARACTER SET utf8mb4 DEFAULT NULL,
  PRIMARY KEY (a)
) ENGINE=INNODB DEFAULT CHARSET=utf8;

INSERT INTO  xcytest(a,b) VALUES(7,'0xF0A48BAE'); 

(used utf8mb4 for column `b` and used quotes according to my configuration (no ANSI quotes).
[17 Nov 2017 13:58] chunyang xu
yes , using character set  utf8mb4 for column b  , can  insert the values '0xF0A48BAE', but , if  check the result using 'select * from xcytest' , the value of column b is  '?' .  Pls re-check it . thanks .
[17 Nov 2017 14:11] Peter Laursen
Actually My INSERT statement was wrong. I should not have quoted the value as a literal string  string. 

INSERT INTO  xcytest(a,b) VALUES(7,0xF0A48BAE);
SELECT * FROM `xcytest`;

-- returns

     a  b           c       
------  ----------  --------
     7  0xF0A48BAE  (NULL)  
     9  ?           (NULL)  

But I am not sure if the display of "?" is simply a font issue (maybe no font support for this specific character?)

And besides I forgot my usual signature here

-- Peter
-- not a MySQL/Oracle person.
[17 Nov 2017 14:51] chunyang xu
Peter , Thank you all the same.   the correct result after insert,should display as  the Attachment: correct_insert_into_mysql.png  l uploaded .  ? , may be not correct.
[23 Nov 2017 15:10] MySQL Verification Team
Hi!

You need to use a client interface that supports fully utfmb4, meaning with all of its valid characters.

So, try client interface like the latest WorkBench and let us know if it works.

Also, the correct character set for such string is utf8mb4.
[23 Nov 2017 15:28] Peter Laursen
Obviously!  How could I miss that? :-(

In SQLyog (that supports utf8mb4 but uses (3 byte) utf8 as default as we still support old servers that don't have utf8mb4:

SET NAMES utf8mb4;
SELECT * FROM `xcytest`;
/*
     a  b           c       
------  ----------  --------
     7  0xF0A48BAE  (NULL)  
     9  𤋮          (NULL)  
*/                          

-- Peter
[23 Nov 2017 15:43] MySQL Verification Team
Thank you, very much, Peter Laursen !!!

I had a queue of work before I attempted the same with one shell, so thank you for saving my time !!!!
[28 Nov 2017 11:07] chunyang xu
thanks to  both of you  for helping me .   let me understand it absoulty .