Bug #56127 STRICT Mode UTF-8 validation gives 1366 Error for valid string
Submitted: 19 Aug 2010 20:54 Modified: 21 Sep 2010 10:10
Reporter: Cole Nielsen
Status: No Feedback
Category: MySQL Server Severity: S2 (Serious)
Version: 5.1.49-community OS: Windows (XP Pro SP3)
Assigned to: CPU Architecture: Any
Tags: strict mode, UTF-8 - seems there's a 2 byte limit

[19 Aug 2010 20:54] Cole Nielsen
Description:
When the server is in strict mode, attempting to write a valid UTF-8 string containing 3-byte characters results in MySQL error 1366, which cites the first two bytes of the character.

The default charsets are: 
server: UTF-8
DB: UTF-8
Table: UTF-8

Running "SET NAMES 'UTF8'" before attempting to write the data doesn't help.

I tried both the utf8_general_ci and utf8_unicode_ci collations. I also arbitrarily tried a few others, but I can't recall which.

Reviewing the documentation turned up nothing about a UTF-8 2-Byte limit.

It is possible that this limitation comes from the PHP MySQL extensions, but it's odd that disabling strict mode on the server resolves the issue.

How to repeat:
I used both the PHP MySQL and MySQLi extensions, with STRICT mode enabled on the MySQL server.

1. Connect
2. Run SET NAMES 'utf8'
3. Create a table with the UTF-8 charset and a varchar field
4. Generate a UTF-8 string containing 1-, 2-, 3-, and 4-byte characters
5. Pass it through a regex validation that checks byte ranges and grouping
6. If valid, attempt to store it in the database with an INSERT statement
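Steps 4 and 5 can be sketched roughly as follows. This is not the code from the original test (that was PHP and is not included in this report); it is a minimal Python illustration, assuming the byte ranges for well-formed UTF-8 given in RFC 3629. The names `is_valid_utf8`, `char_lengths`, and `sample` are hypothetical. Reporting each character's encoded length helps pin down whether a rejected character is really three bytes or four.

```python
import re

# Byte-level alternatives for one well-formed UTF-8 character.
# Ranges follow RFC 3629: no overlong forms, no surrogates,
# nothing above U+10FFFF.
_CHAR = (
    rb"[\x00-\x7F]"                          # 1 byte
    rb"|[\xC2-\xDF][\x80-\xBF]"              # 2 bytes
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"          # 3 bytes, no overlongs
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"   # 3 bytes
    rb"|\xED[\x80-\x9F][\x80-\xBF]"          # 3 bytes, no surrogates
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"       # 4 bytes, no overlongs
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"           # 4 bytes
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"       # 4 bytes, up to U+10FFFF
)
UTF8_CHAR = re.compile(_CHAR)
UTF8_STRING = re.compile(rb"(?:" + _CHAR + rb")*")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the whole byte string is well-formed UTF-8."""
    return UTF8_STRING.fullmatch(data) is not None


def char_lengths(data: bytes):
    """Return the encoded length in bytes of each character in valid input."""
    return [len(m.group()) for m in UTF8_CHAR.finditer(data)]


# Hypothetical test string: 'a' (1 byte), U+00E9 (2 bytes),
# U+20AC (3 bytes), U+1F600 (4 bytes).
sample = "a\u00e9\u20ac\U0001F600".encode("utf-8")

if is_valid_utf8(sample):
    print(char_lengths(sample))  # [1, 2, 3, 4]
```

Only strings that pass this check would go on to the INSERT in step 6, so an error 1366 at that point is raised by the server rather than by malformed input.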

Suggested fix:
Strict mode validation of UTF-8 should account for at least 1-, 2-, 3-, and 4-byte characters.
[21 Aug 2010 10:10] Sveta Smirnova
Thank you for the report.

Please provide the PHP code that demonstrates how you perform the INSERT.
[21 Sep 2010 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".