MySQL Bugs: #12404: LOWER() lowercases characters in UTF-8 incorrectly

Bug #12404	LOWER() lowercases characters in UTF-8 incorrectly
Submitted:	5 Aug 2005 19:59	Modified:	26 Aug 2005 16:10
Reporter:	Jeremy Cole (Basic Quality Contributor) (OCA)
Status:	Won't fix
Category:	MySQL Server	Severity:	S2 (Serious)
Version:	4.1.13	OS:	Any (All)
Assigned to:	Alexander Barkov	CPU Architecture:	Any

Description:
MySQL 4.1.13 lowercases characters incorrectly. It appears that when lowercasing a character shortens its byte length (in our example, LATIN CAPITAL LETTER I WITH DOT ABOVE, U+0130), the result is not properly truncated to the new length.

In the example provided, the character 0xC4B0 should become 0x69 when lowercased, but it appears to carry over the B0 from its capitalized representation, becoming 0x69B0, which is incorrect.
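The failure mode can be illustrated with a small toy model. This is a sketch under stated assumptions, not MySQL's actual code. The model assumes the server writes each lowercased character over the original bytes in a buffer sized for the input, and never trims that buffer to the number of bytes it actually wrote. The function names `lower_char`, `lower_buggy`, and `lower_fixed` and the `SIMPLE_LOWER` table are hypothetical.

The table is needed because Python's own `str.lower()` applies full Unicode case mapping. Under that mapping, U+0130 becomes "i" followed by U+0307 COMBINING DOT ABOVE. The report instead expects the simple one-to-one mapping to U+0069.

```python
# Toy model (assumption): illustrates the reported symptom, not MySQL internals.

# Simple one-to-one lowercase mapping for the character in the report:
# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE (UTF-8 C4 B0) -> U+0069 'i' (UTF-8 69).
SIMPLE_LOWER = {"\u0130": "\u0069"}


def lower_char(ch: str) -> str:
    """Lowercase one character using the simple mapping; ASCII via str.lower()."""
    if ch in SIMPLE_LOWER:
        return SIMPLE_LOWER[ch]
    return ch.lower() if ch.isascii() else ch


def lower_buggy(data: bytes) -> bytes:
    """Overwrite bytes in place but keep the input's original byte length."""
    out = bytearray(data)  # result buffer pre-sized to the input length
    pos = 0                # number of result bytes actually written
    for ch in data.decode("utf-8"):
        encoded = lower_char(ch).encode("utf-8")
        out[pos:pos + len(encoded)] = encoded  # write over the old bytes
        pos += len(encoded)
    return bytes(out)      # BUG: stale bytes after `pos` are still returned


def lower_fixed(data: bytes) -> bytes:
    """Build the result from the converted characters, so its length is exact."""
    return b"".join(lower_char(ch).encode("utf-8") for ch in data.decode("utf-8"))


if __name__ == "__main__":
    upper_i_dot = bytes.fromhex("c4b0")
    print(lower_buggy(upper_i_dot).hex().upper())  # 69B0 (leftover B0 byte)
    print(lower_fixed(upper_i_dot).hex().upper())  # 69
```

When no character changes byte length (for example plain ASCII), both functions return the same result. The stale trailing byte only appears when the lowercase form is shorter than the uppercase form, which matches the 0x69B0 result described above.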

I've tested this against 5.0.7, and it appears to work correctly.  I get back 0x69 for lower(t).  Maybe this was fixed in 5.0 but never backported to 4.1?

How to repeat:
DROP TABLE IF EXISTS u;
CREATE TABLE u (
  t varchar(255) character set utf8 collate utf8_bin NOT NULL
);

insert into u (t) values (0xc4b0);

select t, hex(t), lower(t), hex(lower(t)) from u;

Confirmed that 5.0.10 exhibits the correct behavior, while 4.1.13 handles this case incorrectly.

This bug required significant changes. It was intentionally decided
to fix this problem only in 5.0.x, to avoid big changes in 4.1.

Setting correct status.

This bug is fixed in MySQL 5.0. The decision was made not to backport the fix to the 4.1 tree because significant changes were required to accommodate it.

The limitation is now documented in MySQL Manual Section 10.11.1. Unicode Character Sets [http://dev.mysql.com/doc/mysql/en/charset-unicode-sets.html].