Bug #53005 Invalid UTF-8 characters cause inserted text to be truncated with no warning
Submitted: 20 Apr 2010 20:05 Modified: 26 Apr 2010 14:55
Reporter: Matthew Sielski Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: General Severity:S3 (Non-critical)
Version:5.0.77 OS:Linux
Assigned to: CPU Architecture:Any

[20 Apr 2010 20:05] Matthew Sielski
Description:
I apologize for submitting this against an old version (5.0.77) however it's the latest version which RHEL 5.5 supports.  I could not find this bug in the MySQL bug tracker, but if it is present and fixed then hopefully I can pass that information along to RHEL's bug tracker.  I am unable to install a version beyond 5.0.77 to test if this has been fixed subsequently.

When inserting text with invalid UTF-8 characters into a longtext field, the text is truncated at the first occurrence of an invalid character.

I don't know what the generally accepted way of handling invalid characters is, but deleting everything after the invalid characters with no warning seems wrong.  

I would expect one of the following:

1. Statement fails with error message.
2. Invalid characters are skipped and a warning is issued.
3. Current behavior continues but with a warning issued.
4. Invalid characters are stored unchanged (if that's even possible).

I ran my test at the command line with mysql client (5.0.77-4.el5_4.2) against mysql server (5.0.77-4.el5_4.2).

How to repeat:
SET NAMES "utf8";
CREATE DATABASE test_x DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
USE test_x;
CREATE TABLE test (
  body longtext NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO test (body) VALUES ('test before
[20 Apr 2010 20:08] Matthew Sielski
Even submitting this to the bug tracker caused the entry to be truncated - so, maybe this isn't a bug but the character I'm experiencing problems with is some kind of EOF character.  I can't paste it here, but it appears in Firefox as a box with 100 on the top row and 083 on the bottom row.
[21 Apr 2010 5:22] Sveta Smirnova
Thank you for the report.

Please save test case in *txt file, then atttach it to the bug report.
[21 Apr 2010 13:13] Matthew Sielski
Test case SQL

Attachment: mysql-53005.txt (text/plain), 266 bytes.

[23 Apr 2010 9:10] Sveta Smirnova
Thank you for the feedback.

Special character you use is outside of 3byte UTF8 character set. Please use version 5.5.3-m3 and character set utf8mb4 if you want to use such characters. See http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html for details.
[23 Apr 2010 14:02] Matthew Sielski
Thank you for your response.  

I realize that the character in question is outside the valid range of UTF-8 and I don't actually want to store this character (although I did list it as one of four possible acceptable outcomes).

My main concern is that in this situation MySQL server:

1. is asked to perform an operation, 
2. fails to do so, and 
3. reports that it was successful.  

I can't imagine how this maintains data integrity.  

If the requested operation couldn't be completed then how is responding without an error or warning not a bug?
[23 Apr 2010 18:12] Sveta Smirnova
Thank you for the feedback.

It shows warning in my environment. Have you issued SHOW WARNINGS after statement?
[26 Apr 2010 14:55] Matthew Sielski
I'm sorry, yes it does show a warning.  My application was set to pass warnings on to me but it's not doing that - so it an error in my application.  I did not issue SHOW WARNINGS in my direct test case against MySQL.

Thanks for your help!