| Bug #53005 | Invalid UTF-8 characters cause inserted text to be truncated with no warning | ||
|---|---|---|---|
| Submitted: | 20 Apr 2010 20:05 | Modified: | 26 Apr 2010 14:55 |
| Reporter: | Matthew Sielski | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: General | Severity: | S3 (Non-critical) |
| Version: | 5.0.77 | OS: | Linux |
| Assigned to: | CPU Architecture: | Any | |
[20 Apr 2010 20:08]
Matthew Sielski
Even submitting this to the bug tracker caused the entry to be truncated - so, maybe this isn't a bug but the character I'm experiencing problems with is some kind of EOF character. I can't paste it here, but it appears in Firefox as a box with 100 on the top row and 083 on the bottom row.
[21 Apr 2010 5:22]
Sveta Smirnova
Thank you for the report. Please save test case in *txt file, then atttach it to the bug report.
[21 Apr 2010 13:13]
Matthew Sielski
Test case SQL
Attachment: mysql-53005.txt (text/plain), 266 bytes.
[23 Apr 2010 9:10]
Sveta Smirnova
Thank you for the feedback. Special character you use is outside of 3byte UTF8 character set. Please use version 5.5.3-m3 and character set utf8mb4 if you want to use such characters. See http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html for details.
[23 Apr 2010 14:02]
Matthew Sielski
Thank you for your response. I realize that the character in question is outside the valid range of UTF-8 and I don't actually want to store this character (although I did list it as one of four possible acceptable outcomes). My main concern is that in this situation MySQL server: 1. is asked to perform an operation, 2. fails to do so, and 3. reports that it was successful. I can't imagine how this maintains data integrity. If the requested operation couldn't be completed then how is responding without an error or warning not a bug?
[23 Apr 2010 18:12]
Sveta Smirnova
Thank you for the feedback. It shows warning in my environment. Have you issued SHOW WARNINGS after statement?
[26 Apr 2010 14:55]
Matthew Sielski
I'm sorry, yes it does show a warning. My application was set to pass warnings on to me but it's not doing that - so it an error in my application. I did not issue SHOW WARNINGS in my direct test case against MySQL. Thanks for your help!

Description: I apologize for submitting this against an old version (5.0.77) however it's the latest version which RHEL 5.5 supports. I could not find this bug in the MySQL bug tracker, but if it is present and fixed then hopefully I can pass that information along to RHEL's bug tracker. I am unable to install a version beyond 5.0.77 to test if this has been fixed subsequently. When inserting text with invalid UTF-8 characters into a longtext field, the text is truncated at the first occurrence of an invalid character. I don't know what the generally accepted way of handling invalid characters is, but deleting everything after the invalid characters with no warning seems wrong. I would expect one of the following: 1. Statement fails with error message. 2. Invalid characters are skipped and a warning is issued. 3. Current behavior continues but with a warning issued. 4. Invalid characters are stored unchanged (if that's even possible). I ran my test at the command line with mysql client (5.0.77-4.el5_4.2) against mysql server (5.0.77-4.el5_4.2). How to repeat: SET NAMES "utf8"; CREATE DATABASE test_x DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci; USE test_x; CREATE TABLE test ( body longtext NOT NULL ) ENGINE=MyISAM DEFAULT CHARSET=utf8; INSERT INTO test (body) VALUES ('test before