Bug #12404 LOWER() lowercases characters in UTF-8 incorrectly
Submitted: 5 Aug 2005 19:59 Modified: 26 Aug 2005 16:10
Reporter: Jeremy Cole (Basic Quality Contributor) (OCA)
Status: Won't fix
Category:MySQL Server Severity:S2 (Serious)
Version:4.1.13 OS:Any (All)
Assigned to: Alexander Barkov CPU Architecture:Any

[5 Aug 2005 19:59] Jeremy Cole
Description:
MySQL 4.1.13 lowercases some UTF-8 characters incorrectly.  It appears that when lowercasing a character shortens its byte length (in our example, LATIN CAPITAL LETTER I WITH DOT ABOVE), the result is not properly truncated.

In the example provided, the character 0xC4B0 should become 0x69 when lowercased, but it appears to carry over the B0 from its capitalized representation, becoming 0x69B0, which is incorrect.
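
To make the suspected failure mode concrete, here is a minimal Python sketch of the byte arithmetic. It is only an illustration of the hypothesis above, not MySQL's actual code: the function names and the single hard-coded mapping are assumptions made for the example.

```python
# Hypothetical illustration of the suspected bug; NOT taken from MySQL's source.
# U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) is two bytes in UTF-8,
# while the lowercase result expected in this report (0x69) is one byte.
SRC = "\u0130".encode("utf-8")   # b'\xc4\xb0'
LOWER = bytes([0x69])            # expected lowercase form per the report


def lower_in_place_buggy(buf: bytes) -> bytes:
    """Overwrite the leading bytes but keep the original length."""
    out = bytearray(buf)
    out[0:len(LOWER)] = LOWER    # writes 0x69 over 0xC4, leaves 0xB0 behind
    return bytes(out)


def lower_truncating(buf: bytes) -> bytes:
    """Overwrite the leading bytes, then drop the leftover tail."""
    out = bytearray(buf)
    out[0:len(LOWER)] = LOWER
    del out[len(LOWER):]         # shrink the result to the new byte length
    return bytes(out)


if __name__ == "__main__":
    print(SRC.hex())                        # input bytes
    print(lower_in_place_buggy(SRC).hex())  # stale trailing byte kept
    print(lower_truncating(SRC).hex())      # tail removed
```

The first function reproduces the 0x69B0 pattern seen on 4.1.13; the second yields the 0x69 that 5.0.7 returns.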

I've tested this against 5.0.7, and it appears to work correctly.  I get back 0x69 for lower(t).  Maybe this was fixed in 5.0 but never backported to 4.1?

How to repeat:
DROP TABLE IF EXISTS u;
CREATE TABLE u (
  t varchar(255) character set utf8 collate utf8_bin NOT NULL 
) ;

insert into u (t) values (0xc4b0);

select t, hex(t), lower(t), hex(lower(t)) from u;
[5 Aug 2005 20:15] Kolbe Kegel
Confirmed that 5.0.10 exhibits the correct behavior, while 4.1.13 handles this case incorrectly.
[8 Aug 2005 14:19] Alexander Barkov
Fixing this bug required significant changes.  It was intentionally decided
to fix this problem only in 5.0.x, to avoid big changes in 4.1.
[9 Aug 2005 13:12] MySQL Verification Team
Setting correct status.
[26 Aug 2005 16:10] Kolbe Kegel
This bug is fixed in MySQL 5.0. The decision was made not to backport the fix to the 4.1 tree because significant changes were required to accommodate this fix.

The limitation is now documented in MySQL Manual Section 10.11.1. Unicode Character Sets [http://dev.mysql.com/doc/mysql/en/charset-unicode-sets.html].