Bug #37129 LDML lacks <i> rule
Submitted: 1 Jun 2008 14:37
Modified: 7 Mar 2010 18:24
Reporter: Alexander Barkov
Status: Closed
Category: MySQL Server: Charsets
Severity: S3 (Non-critical)
Version:
OS: Any
Assigned to: Alexander Barkov
CPU Architecture: Any

[1 Jun 2008 14:37] Alexander Barkov
Description:
A new article was recently added to the manual:
http://dev.mysql.com/doc/refman/5.1/en/adding-collation-unicode-uca.html

The LDML specification at http://www.unicode.org/reports/tr35/
defines <i> as the "identical" shift rule. The MySQL LDML
parser supports only the <p>, <s> and <t> shift rules; <i>
is not supported.

The phonebook collation example in the above manual section
uses <s> shift rules, which is wrong:

<rules>
  <reset>\u0000</reset>
  <s>\u0020</s> <!-- space -->
  <s>\u0028</s> <!-- l p -->
  <s>\u0029</s> <!-- r p -->
  <s>\u002B</s> <!-- plus -->
  <s>\u002D</s> <!-- hyphen -->
</rules>

The correct rules are:

<rules>
  <reset>\u0000</reset>
  <i>\u0020</i> <!-- space -->
  <i>\u0028</i> <!-- l p -->
  <i>\u0029</i> <!-- r p -->
  <i>\u002B</i> <!-- plus -->
  <i>\u002D</i> <!-- hyphen -->
</rules>

<s> was used only because the <i> rule is missing.
The difference does not matter yet, but it will
as soon as we implement
"WL#896 Primary, Secondary and Tertiary Sorts".

It is better to use the correct rules in the examples
from the start.

How to repeat:
Change the rules to:

<rules>
  <reset>\u0000</reset>
  <i>\u0020</i> <!-- space -->
  <i>\u0028</i> <!-- l p -->
  <i>\u0029</i> <!-- r p -->
  <i>\u002B</i> <!-- plus -->
  <i>\u002D</i> <!-- hyphen -->
</rules>

and follow the "how to add a collation" instructions.
MySQL fails to add this collation.
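
For context, a minimal sketch of where these rules sit in Index.xml
when following the manual's instructions. The collation name
utf8_phone_ci comes from the manual example; the id value 252 is only
a placeholder assumption here, and any unused collation ID should be
chosen instead:

<charset name="utf8">
  <!-- existing collation entries for utf8 stay unchanged -->
  <collation name="utf8_phone_ci" id="252"> <!-- id is a placeholder assumption -->
    <rules>
      <reset>\u0000</reset>
      <i>\u0020</i> <!-- space -->
      <i>\u0028</i> <!-- left parenthesis -->
      <i>\u0029</i> <!-- right parenthesis -->
      <i>\u002B</i> <!-- plus -->
      <i>\u002D</i> <!-- hyphen -->
    </rules>
  </collation>
</charset>

With a parser that lacks <i> support, a definition of this shape is
what fails to load.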

Suggested fix:
Implement the <i> shift rule and fix the manual section.
[12 Sep 2008 11:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53957

2834 Alexander Barkov	2008-09-12
      Bug#37129 LDML lacks <i> rule
      Problem: LDML didn't understand '<i>' tag in
      character set definition file Index.xml.
      Manual incorrectly used '<s>' instead of '<i>' in:
      http://dev.mysql.com/doc/refman/5.1/en/adding-collation-unicode-uca.html
      Fix: Adding support for '<i>' tag.
      Manual should be changed accordingly.
[12 Sep 2008 11:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53960

2834 Alexander Barkov	2008-09-12
      Bug#37129 LDML lacks <i> rule
      Problem: LDML didn't understand '<i>' tag in
      character set definition file Index.xml.
      Manual incorrectly used '<s>' instead of '<i>' in:
       http://dev.mysql.com/doc/refman/5.1/en/adding-collation-unicode-uca.html
      Fix:
      - Adding support for '<i>' tag. Manual should be changed to use '<i>'.
      - Adding tests for the fixed version of the collation "utf8_phone_ci"
      (from the above manual article).
[31 Oct 2008 12:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57553

2898 Alexander Barkov	2008-10-31
      Bug#37129 LDML lacks <i> rule
      Problem: LDML didn't understand '<i>' tag in
      character set definition file Index.xml.
      Manual incorrectly used '<s>' instead of '<i>' in:
       http://dev.mysql.com/doc/refman/5.1/en/adding-collation-unicode-uca.html
      Fix:
      - Adding support for '<i>' tag. Manual should be changed to use '<i>'.
      - Adding tests for the fixed version of the collation "utf8_phone_ci"
      (from the above manual article).
      
      ------------- This line and the following will be ignored --------------
      
      modified:
        mysql-test/r/ctype_ldml.result
        mysql-test/std_data/Index.xml
        mysql-test/t/ctype_ldml.test
        strings/ctype-uca.c
        strings/ctype.c
      unknown:
        LOG
        nohup.out
        libmysql/probes.h@
        libmysql_r/probes.h@
        mysql-test/std_data/AAA
[31 Oct 2008 12:35] Alexander Barkov
Pushed into 6.0.8-bugteam.
[10 Nov 2008 10:54] Bugs System
Pushed into 6.0.8-alpha  (revid:bar@mysql.com-20081031122542-4x7lpd3zms9xsdo4) (version source revid:bar@mysql.com-20081031122542-4x7lpd3zms9xsdo4) (pib:5)
[11 Nov 2008 16:08] Paul DuBois
The version is actually 6.0.9.
[11 Nov 2008 17:43] Paul DuBois
Noted in 6.0.9 changelog.

MySQL support for adding collations using LDML specifications did not
support the <i> identity rule that indicates one character sorts
identically to another. The <i> rule now is supported. 

Also updated http://dev.mysql.com/doc/refman/6.0/en/adding-collation-unicode-uca.html accordingly.
[29 Oct 2009 13:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/88584

2910 Alexander Barkov	2009-10-29
      Backporting Bug#37129 LDML lacks <i> rule
[9 Nov 2009 9:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/89750

2941 Alexander Barkov	2009-11-09
      Backporting Bug#37129 LDML lacks <i> rule
[20 Nov 2009 12:58] Bugs System
Pushed into 6.0.14-alpha (revid:kostja@sun.com-20091120124947-yi6h2jbgw0kbciwm) (version source revid:epotemkin@mysql.com-20091109132131-ad1gk2d2tn9o5i3l) (merge vers: 6.0.14-alpha) (pib:13)
[11 Dec 2009 6:05] Bugs System
Pushed into 5.6.0-beta (revid:alik@sun.com-20091211055628-ltr7fero363uev7r) (version source revid:alik@sun.com-20091211055453-717czhtezc74u8db) (merge vers: 5.6.0-beta) (pib:13)
[11 Dec 2009 19:35] Paul DuBois
Noted in 5.6.0 changelog.
[6 Mar 2010 10:59] Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:vvaintroub@mysql.com-20091211201717-03qf8ckwiw0np80p) (merge vers: 5.6.0-beta) (pib:16)
[7 Mar 2010 18:24] Paul DuBois
Moved 5.6.0 changelog entry to 5.5.3.