Bug #8610 The ucs2_turkish_ci collation fails with upper('i')
Submitted: 18 Feb 2005 21:10 Modified: 22 Jun 2005 0:09
Reporter: Peter Gulutzan
Status: Closed
Category: MySQL Server Severity: S3 (Non-critical)
Version: 5.0.3-alpha-debug OS: Linux (SUSE 9.2)
Assigned to: Alexander Barkov CPU Architecture: Any

[18 Feb 2005 21:10] Peter Gulutzan
If column s1 has a Turkish collation, then UPPER('i') should return the dotted capital I (U+0130, capital I with dot above).
With character set latin5 and collation latin5_turkish_ci, it does.
With character set ucs2 and collation ucs2_turkish_ci, it does not.
With character set utf8 and collation utf8_turkish_ci, it does not.
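The Turkish rule for the four i-letters can be sketched in Python as follows. This is only an illustration of the expected mapping, not MySQL code; `turkish_upper` and `turkish_lower` are hypothetical helper names. Python's built-in `str.upper()` is locale-independent, so `'i'.upper()` returns plain `'I'` (the same result the ucs2 and utf8 collations gave), which is why the Turkish special cases are checked before falling back to it.

```python
# Hypothetical helpers illustrating Turkish case mapping for the i-letters.
# Special cases are applied first; everything else uses Python's default mapping.
_TR_UPPER = {"i": "\u0130", "\u0131": "I"}   # i -> dotted I, dotless i -> I
_TR_LOWER = {"I": "\u0131", "\u0130": "i"}   # I -> dotless i, dotted I -> i


def turkish_upper(s: str) -> str:
    """Uppercase s using the Turkish dotted/dotless i rules."""
    return "".join(_TR_UPPER.get(ch, ch.upper()) for ch in s)


def turkish_lower(s: str) -> str:
    """Lowercase s using the Turkish dotted/dotless i rules."""
    return "".join(_TR_LOWER.get(ch, ch.lower()) for ch in s)


if __name__ == "__main__":
    # Same four code points as the test table in this report.
    for ch in ["\u0130", "\u0131", "I", "i"]:
        print(f"{ord(ch):04X} -> upper {ord(turkish_upper(ch)):04X}")
```

Under these rules the last row would map 0069 to 0130 rather than 0049.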

How to repeat:
mysql> create table tu (s1 char(2) character set ucs2 collate ucs2_turkish_ci);
Query OK, 0 rows affected (0.15 sec)

mysql> insert into tu values (0x0130) /* Capital I with dot above */;
Query OK, 1 row affected (0.00 sec)

mysql> insert into tu values (0x0131) /* Small i with no dot */;
Query OK, 1 row affected (0.00 sec)

mysql> insert into tu values (0x0049) /* I */;
Query OK, 1 row affected (0.00 sec)

mysql> insert into tu values (0x0069) /* i */;
Query OK, 1 row affected (0.00 sec)

mysql> select hex(s1), hex(upper(s1)) from tu;
+---------+----------------+
| hex(s1) | hex(upper(s1)) |
+---------+----------------+
| 0130    | 0130           |
| 0131    | 0049           |
| 0049    | 0049           |
| 0069    | 0049           |
+---------+----------------+
4 rows in set (0.00 sec)
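To make the failing row explicit, the observed output can be compared against the expected Turkish uppercase mapping. This is a small hypothetical check written for illustration; the observed values are copied from the session in this report, and the expected values follow the Turkish dotted/dotless i rule.

```python
# Observed (hex(s1), hex(upper(s1))) rows from the ucs2_turkish_ci session above.
OBSERVED = [("0130", "0130"), ("0131", "0049"), ("0049", "0049"), ("0069", "0049")]

# Expected Turkish uppercase for each input code point (hex strings).
TURKISH_UPPER = {"0130": "0130", "0131": "0049", "0049": "0049", "0069": "0130"}


def mismatches(rows):
    """Return (input, got, expected) for every row that breaks the Turkish rule."""
    return [(src, got, TURKISH_UPPER[src])
            for src, got in rows
            if got != TURKISH_UPPER[src]]


if __name__ == "__main__":
    for src, got, want in mismatches(OBSERVED):
        print(f"U+{src}: got U+{got}, expected U+{want}")
```

Only the lowercase dotted i (0069) is wrong; the dotless i already uppercases correctly to 0049.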
[18 Feb 2005 21:18] Jorge del Conde
Verified with 5.0.3 from our BitKeeper tree.
[28 Mar 2005 9:50] Alexander Barkov
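It is relatively simple to fix this for UCS2.
It's harder to fix for UTF8, because a lowercase letter and its
uppercase counterpart can have different byte lengths: for example,
'i' (U+0069) takes 1 byte in UTF-8, while U+0130 takes 2 bytes.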
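The length difference can be checked with a short Python sketch. It assumes that UCS-2 bytes for these Basic Multilingual Plane characters are the same as Python's `utf-16-be` encoding (true for code points below U+10000); `ucs2_len` and `utf8_len` are hypothetical helper names.

```python
# Turkish case pairs: (lowercase, uppercase).
PAIRS = [("\u0069", "\u0130"),   # i -> I with dot above
         ("\u0131", "\u0049")]   # dotless i -> I


def ucs2_len(s: str) -> int:
    # For BMP characters, UTF-16BE without a BOM is byte-for-byte UCS-2.
    return len(s.encode("utf-16-be"))


def utf8_len(s: str) -> int:
    return len(s.encode("utf-8"))


if __name__ == "__main__":
    for lo, up in PAIRS:
        print(f"U+{ord(lo):04X} -> U+{ord(up):04X}: "
              f"UCS-2 {ucs2_len(lo)} -> {ucs2_len(up)} bytes, "
              f"UTF-8 {utf8_len(lo)} -> {utf8_len(up)} bytes")
```

In UCS-2 every character is 2 bytes, so a same-length, in-place conversion still works. In UTF-8 the i to U+0130 mapping grows from 1 to 2 bytes, and U+0131 to I shrinks from 2 to 1.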

I suggest moving this problem from the bug system to the Worklog,
together with the German UPPER('SHARP S') problem.

Moreover, nobody has complained, so this is fairly low priority.
[9 Apr 2005 17:55] Jim Winstead
The characters in question are:


(Sorry, just using this as an excuse to verify UTF-8 support of bugs.mysql.com.)
[6 Jun 2005 11:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

[6 Jun 2005 12:00] Alexander Barkov
Fixed in 5.0.7.
Won't fix in 4.1.x.
[22 Jun 2005 0:09] Mike Hillyer
Documented in 5.0.7 changelog:

<listitem><para>The ucs2_turkish_ci collation fails with upper('i').
    UPPER/LOWER can now return a string of a different length. (Bug #8610)</para></listitem>
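The changelog's point, that UPPER/LOWER may return a string whose length differs from the input, can be illustrated with a conversion that writes into a fresh buffer instead of converting in place. This is a minimal Python sketch, not MySQL's implementation (the server code is not shown in this report); `turkish_upper_utf8` is a hypothetical name, and only the Turkish i-letter rules are special-cased.

```python
# Turkish special cases; all other characters use Python's default uppercase.
TR_UPPER = {"i": "\u0130", "\u0131": "I"}


def turkish_upper_utf8(src: bytes) -> bytes:
    """Uppercase UTF-8 bytes into a new buffer; the result may be longer or shorter."""
    out = bytearray()
    for ch in src.decode("utf-8"):
        out += TR_UPPER.get(ch, ch.upper()).encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    src = "kilit".encode("utf-8")
    dst = turkish_upper_utf8(src)
    print(len(src), len(dst), dst.decode("utf-8"))  # 5 7 KİLİT
```

Each 1-byte 'i' becomes a 2-byte U+0130, so the output grows. A string containing dotless i (2 bytes each) shrinks, since each one becomes a 1-byte 'I'.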