| Bug #28916 | LDML doesn't work for utf8 and is not described in the manual | ||
|---|---|---|---|
| Submitted: | 6 Jun 2007 8:10 | Modified: | 30 May 2008 17:26 |
| Reporter: | Alexander Barkov | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Documentation | Severity: | S3 (Non-critical) |
| Version: | 5.0, 5.1 | OS: | Any |
| Assigned to: | Paul DuBois | CPU Architecture: | Any |
[6 Jun 2007 8:11]
Alexander Barkov
Diff file to add user defined Unicode collations using LDML
Attachment: Index.xml.diff (text/x-patch), 596 bytes.
[6 Jun 2007 12:11]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/28193

ChangeSet@1.2515, 2007-06-06 17:09:59+05:00, bar@mysql.com +9 -0
Bug#28916 LDML doesn't work for utf8 and is not described in the manual
- Adding missing initialization for utf8 collations
- Minor code clean-ups: renaming variables, moving code into a new separate function.
- Adding test, to check that both ucs2 and utf8 user defined collations work (ucs2_test_ci and utf8_test_ci)
- Adding Vietnamese collation as a complex user defined collation example.
[7 Jun 2007 12:56]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/28295

ChangeSet@1.2515, 2007-06-07 17:55:55+05:00, bar@mysql.com +9 -0
Bug#28916 LDML doesn't work for utf8 and is not described in the manual
- Adding missing initialization for utf8 collations
- Minor code clean-ups: renaming variables, moving code into a new separate function.
- Adding test, to check that both ucs2 and utf8 user defined collations work (ucs2_test_ci and utf8_test_ci)
- Adding Vietnamese collation as a complex user defined collation example.
[8 Jun 2007 8:35]
Alexander Barkov
Pushed into 5.0.44-rpl
Pushed into 5.1.20-rpl

To documentation team: This bug can be closed only after we have a new section in the manual, explaining how to add user defined Unicode collations using LDML. I'm going to write this section soon. Please wait for me :)
[21 Jun 2007 20:12]
Bugs System
Pushed into 5.0.46
[21 Jun 2007 20:15]
Bugs System
Pushed into 5.1.20-beta
[23 Jun 2007 8:16]
Jon Stephens
Waiting for info from Bar. :)
[1 Oct 2007 10:50]
Alexander Barkov
At the Heidelberg DevConf, Bar gave a session "How to add a collation". Paul now has all information to write a manual section on LDML using Bar's presentation.
[15 Nov 2007 15:21]
Paul DuBois
Changing category to Documentation, assigning to myself.
[30 May 2008 17:26]
Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.

The manuals now contain a new section on adding new collations:

http://dev.mysql.com/doc/refman/4.1/en/adding-collation.html
http://dev.mysql.com/doc/refman/5.0/en/adding-collation.html
http://dev.mysql.com/doc/refman/5.1/en/adding-collation.html
http://dev.mysql.com/doc/refman/6.0/en/adding-collation.html

This covers simple collations for 8-bit character sets and LDML-based collations for Unicode character sets. The 4.1 manual does not have instructions for adding LDML collations because that is not supported in 4.1.

Description:
Some time ago, the ability to add user-defined Unicode collations was implemented. This feature does not require mysqld to be recompiled to add a new Unicode collation: it uses the so-called "Locale Data Markup Language (LDML)", which can be embedded directly into the character set and collation index file Index.xml.

There are two problems with the LDML implementation:
1. It works only for UCS2, but does not work for UTF8.
2. It is not documented in the manual.

How to repeat:
1. Apply the patch (attached in the "files" section of this bug report) to the file Index.xml of your MySQL installation (typically /usr/share/mysql/charsets/Index.xml). It adds a similar user-defined collation to both UCS2 and UTF8, with a rule making the letter 'b' compare equal to the letter 'a'.

2. Run this script:

    #
    # Check if it works with UCS2
    #
    drop table if exists t1;
    create table t1 (c1 char(1) character set ucs2 collate ucs2_test_ci);
    insert into t1 values ('a');
    select * from t1 where c1='b';
    #
    # Check that it works with UTF8
    #
    drop table if exists t1;
    create table t1 (c1 char(1) character set utf8 collate utf8_test_ci);
    insert into t1 values ('a');
    select * from t1 where c1='b';

3. Check its output:

    mysql> drop table if exists t1;
    Query OK, 0 rows affected, 1 warning (0.00 sec)

    mysql> create table t1 (c1 char(1) character set ucs2 collate ucs2_test_ci);
    Query OK, 0 rows affected (0.00 sec)

    mysql> insert into t1 values ('a');
    Query OK, 1 row affected (0.00 sec)

    mysql> select * from t1 where c1='b';
    +------+
    | c1   |
    +------+
    | a    |
    +------+
    1 row in set (0.00 sec)

    mysql>
    mysql> drop table if exists t1;
    Query OK, 0 rows affected (0.00 sec)

    mysql> create table t1 (c1 char(1) character set utf8 collate utf8_test_ci);
    ERROR 1273 (HY000): Unknown collation: 'utf8_test_ci'

So the server successfully added the user-defined collation "ucs2_test_ci" and correctly compared 'a' as equal to 'b', but it failed to add "utf8_test_ci" and returned an "Unknown collation" error.

Suggested fix:
1. Fix the collation routines to be able to load Unicode collations for both UCS2 and UTF8.
2. Add an LDML description to the manual, so that users can easily add their own collations.
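
Illustration of step 1 in "How to repeat": the attached Index.xml.diff is not reproduced in this report, so the fragment below is only a sketch of the kind of entries it describes, not the actual patch. The collation names come from the test script; the id values are placeholders (pick IDs that are unused in your installation, for example by checking the Id column of SHOW COLLATION), and the use of `<s>` (secondary difference) to make 'b' compare equal to 'a' is an assumption based on the reported query result.

```xml
<!-- Sketch only: add each <collation> inside the existing <charset> element
     of the same name in Index.xml. The id values are placeholders. -->
<charset name="ucs2">
  <!-- existing ucs2 collation entries stay here -->
  <collation name="ucs2_test_ci" id="253">
    <rules>
      <reset>a</reset>  <!-- start from the position of 'a' -->
      <s>b</s>          <!-- assumed: 'b' then compares equal to 'a' -->
    </rules>
  </collation>
</charset>

<charset name="utf8">
  <!-- existing utf8 collation entries stay here -->
  <collation name="utf8_test_ci" id="252">
    <rules>
      <reset>a</reset>
      <s>b</s>
    </rules>
  </collation>
</charset>
```

Index.xml is read at server startup, so mysqld has to be restarted before running the test script against a changed file.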