Bug #51976 LDML collations issue (cyrillic example)
Submitted: 12 Mar 2010 7:06
Modified: 18 Jun 2010 1:13
Reporter: Alexandr Evstigneev
Status: Closed
Category: MySQL Server: Charsets
Severity: S1 (Critical)
Version: 5.5.2-m2, 5.1.29-rc-log, 5.1, 5.5.99 bzr
OS: FreeBSD (7.0. 64bit)
Assigned to: Alexander Barkov
CPU Architecture: Any
Tags: collation, cyrillic, LDML
Triage: Triaged: D2 (Serious) / R1 (None/Negligible) / E2 (Low)

[12 Mar 2010 7:06] Alexandr Evstigneev
Description:
My task was to make MySQL distinguish between the Russian letters ie (е) and io (ё).
I added a collation using LDML, as described here: http://dev.mysql.com/doc/refman/5.1/en/adding-collation-unicode-uca.html

This works: MySQL now distinguishes the two letters. However,
if I use the lcase/ucase functions on a column with the custom collation, server 5.1 (5.1.29) crashes and server 5.5 (5.5.2-m2) gives me an empty result.

How to repeat:
Add my own collation to the utf8 section of mysql/charsets/Index.xml:

<collation name="utf8_russian_ci" id="253">
    <rules>
        <reset>\u0415</reset><p>\u0401</p><t>\u0451</t>
    </rules>
</collation>
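
For readers unfamiliar with the LDML subset, the rule above can be read as: take Е (U+0415) as the anchor, place Ё (U+0401) after it with a primary difference, then place ё (U+0451) after Ё with a tertiary difference. The following Python sketch is not part of the original report; it only decodes the escapes in the rule to show which letters are involved. The names `RULE`, `LEVELS` and `decode_rule` are illustrative.

```python
import re
import unicodedata

# The <rules> line from the collation definition above, copied verbatim.
RULE = r"<reset>\u0415</reset><p>\u0401</p><t>\u0451</t>"

# Meaning of the LDML elements used in this rule.
LEVELS = {"reset": "anchor", "p": "primary difference", "t": "tertiary difference"}

def decode_rule(rule):
    """Return (tag, character, Unicode name) for each element in the rule."""
    decoded = []
    for tag, code in re.findall(r"<(\w+)>\\u([0-9A-Fa-f]{4})</\1>", rule):
        char = chr(int(code, 16))
        decoded.append((tag, char, unicodedata.name(char)))
    return decoded

for tag, char, name in decode_rule(RULE):
    print(f"{LEVELS[tag]:>20}: {char} ({name})")
```
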

Then create the table:
create table _example(
a char(4) not null 
)character set utf8 collate utf8_russian_ci;

And fill it with data:
insert into _example (a) values
('а'),
('б'),
('в'),
('г'),
('д'),
('е'),
('ё'),
('ж'),
('з'),
('и'),
('й'),
('к'),
('л'),
('м'),
('н'),
('о'),
('п'),
('р'),
('с'),
('т'),
('у'),
('ф'),
('х'),
('ц'),
('ч'),
('ш'),
('щ'),
('ь'),
('ы'),
('ъ'),
('э'),
('ю'),
('я'),
('А'),
('Б'),
('В'),
('Г'),
('Д'),
('Е'),
('Ё'),
('Ж'),
('З'),
('И'),
('Й'),
('К'),
('Л'),
('М'),
('Н'),
('О'),
('П'),
('Р'),
('С'),
('Т'),
('У'),
('Ф'),
('Х'),
('Ц'),
('Ч'),
('Ш'),
('Щ'),
('Ь'),
('Ы'),
('Ъ'),
('Э'),
('Ю'),
('Я');

Then run select lcase(a) from _example;
And voilà.
[12 Mar 2010 7:21] Susanne Ebrecht
Many thanks for writing a bug report.

I am not able to repeat this by using MySQL 5.1.45.

Are you sure the data is stored correctly in your database?

Please provide the output of

SELECT a, length(a), hex(a) FROM _example;
[12 Mar 2010 7:28] Alexandr Evstigneev
5.5.2-m2

mysql> SELECT a, lcase(a), length(a), hex(a) FROM _example;
+----+----------+-----------+--------+
| a  | lcase(a) | length(a) | hex(a) |
+----+----------+-----------+--------+
| а  |          |         2 | D0B0   |
| б  |          |         2 | D0B1   |
| в  |          |         2 | D0B2   |
| г  |          |         2 | D0B3   |
| д  |          |         2 | D0B4   |
| е  |          |         2 | D0B5   |
| ё  |          |         2 | D191   |
| ж  |          |         2 | D0B6   |
| з  |          |         2 | D0B7   |
| и  |          |         2 | D0B8   |
| й  |          |         2 | D0B9   |
| к  |          |         2 | D0BA   |
| л  |          |         2 | D0BB   |
| м  |          |         2 | D0BC   |
| н  |          |         2 | D0BD   |
| о  |          |         2 | D0BE   |
| п  |          |         2 | D0BF   |
| р  |          |         2 | D180   |
| с  |          |         2 | D181   |
| т  |          |         2 | D182   |
| у  |          |         2 | D183   |
| ф  |          |         2 | D184   |
| х  |          |         2 | D185   |
| ц  |          |         2 | D186   |
| ч  |          |         2 | D187   |
| ш  |          |         2 | D188   |
| щ  |          |         2 | D189   |
| ь  |          |         2 | D18C   |
| ы  |          |         2 | D18B   |
| ъ  |          |         2 | D18A   |
| э  |          |         2 | D18D   |
| ю  |          |         2 | D18E   |
| я  |          |         2 | D18F   |
| А  |          |         2 | D090   |
| Б  |          |         2 | D091   |
| В  |          |         2 | D092   |
| Г  |          |         2 | D093   |
| Д  |          |         2 | D094   |
| Е  |          |         2 | D095   |
| Ё  |          |         2 | D081   |
| Ж  |          |         2 | D096   |
| З  |          |         2 | D097   |
| И  |          |         2 | D098   |
| Й  |          |         2 | D099   |
| К  |          |         2 | D09A   |
| Л  |          |         2 | D09B   |
| М  |          |         2 | D09C   |
| Н  |          |         2 | D09D   |
| О  |          |         2 | D09E   |
| П  |          |         2 | D09F   |
| Р  |          |         2 | D0A0   |
| С  |          |         2 | D0A1   |
| Т  |          |         2 | D0A2   |
| У  |          |         2 | D0A3   |
| Ф  |          |         2 | D0A4   |
| Х  |          |         2 | D0A5   |
| Ц  |          |         2 | D0A6   |
| Ч  |          |         2 | D0A7   |
| Ш  |          |         2 | D0A8   |
| Щ  |          |         2 | D0A9   |
| Ь  |          |         2 | D0AC   |
| Ы  |          |         2 | D0AB   |
| Ъ  |          |         2 | D0AA   |
| Э  |          |         2 | D0AD   |
| Ю  |          |         2 | D0AE   |
| Я  |          |         2 | D0AF   |
+----+----------+-----------+--------+
66 rows in set (0.00 sec)
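
The hex(a) values above can be cross-checked outside the server. The Python sketch below is not part of the original report; it assumes only that the column holds UTF-8, compares a few of the reported hex values with the UTF-8 encoding of the same letters, and shows what a working lcase() would be expected to return for the uppercase ones. The `REPORTED` sample and the `utf8_hex` helper are illustrative.

```python
# A few (letter, hex(a)) pairs copied from the result set above.
REPORTED = {
    "а": "D0B0",
    "ё": "D191",
    "р": "D180",
    "Ё": "D081",
    "Я": "D0AF",
}

def utf8_hex(text):
    """Uppercase hex of the UTF-8 encoding, in the same form as MySQL's HEX()."""
    return text.encode("utf-8").hex().upper()

for letter, reported_hex in REPORTED.items():
    matches = utf8_hex(letter) == reported_hex
    # str.lower() stands in for a working lcase(); it is never empty for these letters.
    print(letter, reported_hex, "ok" if matches else "MISMATCH", "->", letter.lower())
```

If every sampled row matches, the stored bytes are valid UTF-8 for the intended letters, which points at the case conversion rather than the data.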
[12 Mar 2010 7:59] Alexandr Evstigneev
5.1.29 crashes on the query SELECT a, lcase(a), length(a), hex(a) FROM _example;
All tables are marked as damaged.

Here is the output of SELECT a, length(a), hex(a) FROM _example;

mysql> SELECT a, length(a), hex(a) FROM _example;
+------+-----------+--------+
| a    | length(a) | hex(a) |
+------+-----------+--------+
| а   |         2 | D0B0   |
| б   |         2 | D0B1   |
| в   |         2 | D0B2   |
| г   |         2 | D0B3   |
| д   |         2 | D0B4   |
| е   |         2 | D0B5   |
| ё   |         2 | D191   |
| ж   |         2 | D0B6   |
| з   |         2 | D0B7   |
| и   |         2 | D0B8   |
| й   |         2 | D0B9   |
| к   |         2 | D0BA   |
| л   |         2 | D0BB   |
| м   |         2 | D0BC   |
| н   |         2 | D0BD   |
| о   |         2 | D0BE   |
| п   |         2 | D0BF   |
| р   |         2 | D180   |
| с   |         2 | D181   |
| т   |         2 | D182   |
| у   |         2 | D183   |
| ф   |         2 | D184   |
| х   |         2 | D185   |
| ц   |         2 | D186   |
| ч   |         2 | D187   |
| ш   |         2 | D188   |
| щ   |         2 | D189   |
| ь   |         2 | D18C   |
| ы   |         2 | D18B   |
| ъ   |         2 | D18A   |
| э   |         2 | D18D   |
| ю   |         2 | D18E   |
| я   |         2 | D18F   |
| А   |         2 | D090   |
| Б   |         2 | D091   |
| В   |         2 | D092   |
| Г   |         2 | D093   |
| Д   |         2 | D094   |
| Е   |         2 | D095   |
| Ё   |         2 | D081   |
| Ж   |         2 | D096   |
| З   |         2 | D097   |
| И   |         2 | D098   |
| Й   |         2 | D099   |
| К   |         2 | D09A   |
| Л   |         2 | D09B   |
| М   |         2 | D09C   |
| Н   |         2 | D09D   |
| О   |         2 | D09E   |
| П   |         2 | D09F   |
| Р   |         2 | D0A0   |
| С   |         2 | D0A1   |
| Т   |         2 | D0A2   |
| У   |         2 | D0A3   |
| Ф   |         2 | D0A4   |
| Х   |         2 | D0A5   |
| Ц   |         2 | D0A6   |
| Ч   |         2 | D0A7   |
| Ш   |         2 | D0A8   |
| Щ   |         2 | D0A9   |
| Ь   |         2 | D0AC   |
| Ы   |         2 | D0AB   |
| Ъ   |         2 | D0AA   |
| Э   |         2 | D0AD   |
| Ю   |         2 | D0AE   |
| Я   |         2 | D0AF   |
+------+-----------+--------+
66 rows in set (0.08 sec)
[12 Mar 2010 8:44] Susanne Ebrecht
Did you also change mysys/charset-def.c and config/ac-macros/character_sets.m4?

Did you re-compile the code after adding the collation?

You will find the full instructions for adding a new character set and/or a new collation here:

http://dev.mysql.com/doc/refman/5.1/en/adding-character-set.html
[12 Mar 2010 8:51] Alexandr Evstigneev
No, I didn't, because I'm not adding a character set, only a collation. And the manual section about collations plainly says: "UCA collations for Unicode character sets can be added to MySQL without recompiling by using a subset of the Locale Data Markup Language (LDML)".
[12 Mar 2010 9:39] Susanne Ebrecht
Please provide the xml file.
[12 Mar 2010 9:46] Alexandr Evstigneev
full charsets/Index.xml with utf8_russian_ci collation added

Attachment: Index.xml (text/xml), 18.55 KiB.

[12 Mar 2010 9:46] Alexandr Evstigneev
Uploaded to files.
[14 Mar 2010 9:19] Sveta Smirnova
Thank you for the feedback.

The crash is only repeatable with old 5.1 versions; current 5.1 returns a set of empty strings, as does the 5.5 series.

Wrong results verified as described.
[14 Mar 2010 9:56] Alexandr Evstigneev
5.1.44 on WinXP has the same problem: the result is empty.
[15 Mar 2010 7:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/103172

3400 Alexander Barkov	2010-03-15
      Bug #51976 LDML collations issue
      Problem: caseup_multiply and casedn_multiply members
      were not initialized for a dynamic collation, so
      UPPER() and LOWER() functions returned empty strings.
      Fix: initializing the members properly.
[22 Mar 2010 12:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/103975

3410 Alexander Barkov	2010-03-22
      Bug #51976 LDML collations issue      
      
      Problem: caseup_multiply and casedn_multiply members      
      were not initialized for a dynamic collation, so          
      UPPER() and LOWER() functions returned empty strings.      
      Fix: initializing the members properly.
      
      Adding tests:
        mysql-test/r/ctype_ldml.result
        mysql-test/t/ctype_ldml.test
      
      Applying the fix:
        mysys/charset.c
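
To illustrate why an uninitialized multiplier can produce empty strings, here is a minimal Python model. It is not the server code. It assumes that the case-conversion routine sizes its output buffer as the input length in bytes times the collation's multiplier, so a multiplier left at 0 leaves no room for any output. The function `convert_case` and its parameters are hypothetical.

```python
def convert_case(text, multiply, to_lower=True):
    """Hypothetical model of a case conversion with a size-limited output buffer.

    Assumption: the output capacity is len(input bytes) * multiply, which
    mirrors the role the commit message gives to caseup_multiply and
    casedn_multiply.
    """
    capacity = len(text.encode("utf-8")) * multiply
    out = bytearray()
    for ch in text:
        converted = (ch.lower() if to_lower else ch.upper()).encode("utf-8")
        if len(out) + len(converted) > capacity:
            break  # no room left in the output buffer
        out += converted
    return out.decode("utf-8")

# Multiplier left at 0 (the bug, under the assumption above): nothing fits.
print(repr(convert_case("Ё", multiply=0)))
# Multiplier initialized to 1 (the fix, under the same assumption).
print(repr(convert_case("Ё", multiply=1)))
```

In this model every row comes back empty with a multiplier of 0, regardless of the letter, which is consistent with the result set the reporter posted.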
[22 Mar 2010 13:17] Alexander Barkov
Pushed into mysql-5.1-bugteam (5.1.46)
Pushed into mysql-pe (6.0.14-alpha)
[26 Mar 2010 8:21] Bugs System
Pushed into 5.5.4-m3 (revid:alik@sun.com-20100326080914-2pz8ns984e0spu03) (version source revid:alexey.kopytov@sun.com-20100322132851-8j3m42x4ldi1kca5) (merge vers: 5.5.3-m2) (pib:16)
[26 Mar 2010 8:25] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100326081116-m3v4l34yhr43mtsv) (version source revid:alik@sun.com-20100325072612-4sds00ix8ajo1e84) (pib:16)
[26 Mar 2010 8:30] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100326081944-qja07qklw1p2w7jb) (version source revid:alik@sun.com-20100325073410-4t4i9gu2u1pge7xb) (merge vers: 6.0.14-alpha) (pib:16)
[6 Apr 2010 7:58] Bugs System
Pushed into 5.1.46 (revid:sergey.glukhov@sun.com-20100405111026-7kz1p8qlzglqgfmu) (version source revid:bar@mysql.com-20100322122759-97i1u39pndttjde2) (merge vers: 5.1.46) (pib:16)
[12 Apr 2010 22:12] Paul Dubois
Noted in 5.1.46, 5.5.5, 6.0.14 changelogs.

For LDML-defined collations, some data structures were not
initialized properly to enable UPPER() and LOWER() to work correctly.
[17 Jun 2010 11:56] Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 12:35] Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:martin.skold@mysql.com-20100609211156-tsac5qhw951miwtt) (merge vers: 5.1.46-ndb-6.2.19) (pib:16)
[17 Jun 2010 13:22] Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)