Bug #70372 UTF-8 character causes MySQL to lose data
Submitted: 17 Sep 2013 19:31
Modified: 18 Sep 2013 15:14
Reporter: Chad Thomas
Status: Not a Bug
Category: MySQL Server: Charsets
Severity: S1 (Critical)
Version: 5.5.31-0+wheezy1
OS: Linux (Debian)
CPU Architecture: Any

[17 Sep 2013 19:31] Chad Thomas
Description:
A UTF-8 character causes MySQL to drop every character that follows it in the inserted value.

INSERT into `test_table` VALUES ('Everything up to this point is kept.

How to repeat:
DROP DATABASE IF EXISTS `utf8-test-database`;

CREATE DATABASE `utf8-test-database`;

USE `utf8-test-database`;

DROP TABLE IF EXISTS `test_table`;

CREATE TABLE `test_table` (
  `meta_value` longtext
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT into `test_table` VALUES ('Everything up to this point is kept.
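The tracker cut the INSERT above off at the very character that triggers the problem, so the script below is a hypothetical reconstruction, not the reporter's exact test case. It spells an assumed 4-byte character (U+1F600, UTF-8 bytes F0 9F 98 80) as a hex literal so that it survives web forms, and it sets an empty sql_mode to mimic the non-strict default of MySQL 5.5; both choices are assumptions made for illustration.

```sql
-- Hypothetical reconstruction: the original INSERT text was truncated.
DROP DATABASE IF EXISTS `utf8-test-database`;
CREATE DATABASE `utf8-test-database`;
USE `utf8-test-database`;

CREATE TABLE `test_table` (
  `meta_value` longtext
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Assumption: non-strict mode, as in a default MySQL 5.5 install.
SET SESSION sql_mode = '';

-- X'F09F9880' is the 4-byte UTF-8 encoding of U+1F600, which the
-- 3-byte utf8 character set cannot store.
INSERT INTO `test_table` VALUES (
  CONCAT('Everything up to this point is kept. ', X'F09F9880', ' This text is lost.')
);

-- A warning 1366 (Incorrect string value) should be listed here, and the
-- stored value should end just before the 4-byte character.
SHOW WARNINGS;
SELECT `meta_value` FROM `test_table`;
```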
[17 Sep 2013 19:43] Chad Thomas
Test SQL file

Attachment: utf8-test-database_2013-09-17.sql (application/octet-stream, text), 1.24 KiB.

[17 Sep 2013 19:45] Chad Thomas
It looks like the bug tracker is affected by this bug as well: I pasted the steps to reproduce earlier in the ticket, and my report was cut off as soon as it hit a Unicode character.
[17 Sep 2013 19:45] Chad Thomas
I tested on: 5.5.31-0+wheezy1
[17 Sep 2013 21:22] Peter Laursen
@Chad ..

A tip: attach your SQL test case as a plain-text file! HTML formatting sometimes corrupts text when it conflicts with HTML/XML characters that need to be encoded, or when the text uses characters that delimit an HTML tag (< or >).

-- Peter
(not a MySQL/Oracle person)
[17 Sep 2013 21:45] MySQL Verification Team
Please try utf8mb4 instead of utf8: http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html . Thanks.
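Switching the test case to utf8mb4 could look roughly like the sketch below, which reuses the database and table names from this report. It converts a utf8 table in place with ALTER TABLE ... CONVERT TO CHARACTER SET and then sets the connection to utf8mb4 as well, so the server does not read incoming 4-byte sequences as invalid 3-byte utf8. utf8mb4 and utf8mb4_general_ci are available from MySQL 5.5.3 onward; the collation choice is an assumption.

```sql
-- Sketch: recreate the report's table, then migrate it to utf8mb4.
CREATE DATABASE IF NOT EXISTS `utf8-test-database`;
USE `utf8-test-database`;
DROP TABLE IF EXISTS `test_table`;

CREATE TABLE `test_table` (
  `meta_value` longtext
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Rewrites the table and changes the column's character set.
ALTER TABLE `test_table`
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

-- The connection must speak utf8mb4 too.
SET NAMES utf8mb4;

-- Same assumed 4-byte character (U+1F600) as a hex literal.
INSERT INTO `test_table` VALUES (
  CONCAT('Everything up to this point is kept. ', _utf8mb4 X'F09F9880', ' So is this.')
);

SELECT `meta_value` FROM `test_table`;
```

Existing databases can be given the same default with ALTER DATABASE ... CHARACTER SET utf8mb4, which only affects tables created afterwards.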
[17 Sep 2013 22:07] Chad Thomas
Updated test case

Attachment: utf8-test-database_2013-09-17.sql (application/octet-stream, text), 1.24 KiB.

[17 Sep 2013 22:09] Chad Thomas
utf8mb4 does in fact work. But should the default behavior of utf8 really be to drop all characters after such a character is found?
[17 Sep 2013 22:09] Chad Thomas
Or rather, a UTF-8 character that needs more than three bytes.
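Whether the data is dropped silently depends on the SQL mode: in non-strict mode MySQL stores what it can and only raises a warning, while a strict mode such as STRICT_ALL_TABLES turns the same "Incorrect string value" condition into an error that rejects the row. The sketch below assumes the utf8 table from this report and the same hypothetical 4-byte character as a hex literal.

```sql
-- Sketch: make unstorable 4-byte input fail loudly instead of truncating.
CREATE DATABASE IF NOT EXISTS `utf8-test-database`;
USE `utf8-test-database`;
DROP TABLE IF EXISTS `test_table`;

CREATE TABLE `test_table` (
  `meta_value` longtext
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

SET SESSION sql_mode = 'STRICT_ALL_TABLES';

-- Under strict mode this should fail with error 1366 and store no row.
INSERT INTO `test_table` VALUES (
  CONCAT('Everything up to this point is kept. ', X'F09F9880')
);
```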
[18 Sep 2013 15:14] MySQL Verification Team
Thank you for the feedback. As mentioned in the Manual, utf8 uses a maximum of three bytes per character and contains only BMP (Basic Multilingual Plane) characters. If you need characters beyond that, use the utf8mb4 character set. Thanks.
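The three-byte limit can be checked directly from SQL: BMP characters (U+0000 to U+FFFF) need at most three bytes in UTF-8, while supplementary characters need four. The two code points below, U+20AC (euro sign) and U+1F600, are arbitrary examples written as their UTF-8 hex encodings.

```sql
-- U+20AC EURO SIGN: inside the BMP, 3 bytes, storable in utf8.
SELECT CHAR_LENGTH(_utf8 X'E282AC') AS chars,        -- 1
       LENGTH(_utf8 X'E282AC')      AS bytes;        -- 3

-- U+1F600: outside the BMP, 4 bytes, requires utf8mb4.
SELECT CHAR_LENGTH(_utf8mb4 X'F09F9880') AS chars,   -- 1
       LENGTH(_utf8mb4 X'F09F9880')      AS bytes;   -- 4
```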