Bug #71995 | LDML collation size is limited | ||
---|---|---|---|
Submitted: | 10 Mar 2014 15:56 | Modified: | 31 Mar 2014 13:36 |
Reporter: | Александр Евстигнеев | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S2 (Serious) |
Version: | 5.5.36, 5.5.37 | OS: | Windows (Win7, 64 bit, FreeBSD 9.2) |
Assigned to: | Georgi Kodinov | CPU Architecture: | Any |
Tags: | charset, collation, LDML, utf8 |
[10 Mar 2014 15:56]
Александр Евстигнеев
[11 Mar 2014 8:52]
Hartmut Holzgraefe
It worked fine for me when I created "Unicode distinct case-insensitive", which is a pretty big one:

http://www.skysql.com/blogs/hartmut/adding-case-insensitive-distinct-unicode-collation
ftp://ftp.skysql.com/downloads/hartmut/utf8_distinct_ci.xml

You may be running into one of these though:

* Bug #65593 "parse errors in loadable UCA / LDML collations are silently ignored" http://bugs.mysql.com/bug.php?id=65593
* Bug #68142 "UCA / LDML parser does not complain about invalid/unsupported backslash sequence" http://bugs.mysql.com/bug.php?id=68142
* Bug #68143 "Validity of LDML collations is checked too late" http://bugs.mysql.com/bug.php?id=68143
* Bug #68144 "Collation name missing from log messages about LDML definition problems" http://bugs.mysql.com/bug.php?id=68144
[11 Mar 2014 11:18]
Александр Евстигнеев
Looks like you are right, but I need to do some additional testing. If you are right, there is a mistake in my LDML data that stops the parser at some position. Is there any way to find and fix it? I've checked manually and found nothing.
[15 Mar 2014 11:49]
Александр Евстигнеев
UTF text file to define characters ordering
Attachment: utfsource.txt (text/plain), 475 bytes.
[15 Mar 2014 11:50]
Александр Евстигнеев
Ruleset, generated from UTF source file
Attachment: rules.txt (text/plain), 3.33 KiB.
[15 Mar 2014 12:00]
Александр Евстигнеев
Collation testing query
Attachment: test.sql (application/octet-stream, text), 19.99 KiB.
[15 Mar 2014 12:01]
Александр Евстигнеев
OK, I've done some additional testing and research.

1) Your huge collation file from ftp://ftp.skysql.com/downloads/hartmut/utf8_distinct_ci.xml does not work in the versions I specified. Tested on Windows 7 64-bit. The symptoms are the same.

2) I wrote a small Perl script that converts a file like http://bugs.mysql.com/file.php?id=21172&bug_id=71995 into rules for a collation file. Here is what I got: http://bugs.mysql.com/file.php?id=21173&bug_id=71995 (in my case the collation is named utf_test_ci and has id = 253). So the rules are correct and contain no human mistakes. I also wrote a script that generates a test query from the same file: http://bugs.mysql.com/file.php?id=21174&bug_id=71995

What we have: it seems that only the first 96 rules work; the rest are ignored. If you comment out about 20 rules without breaking the format, the next 20 rules start working. So now I'm really sure this is some kind of buffering problem or a limit on the number of rules.
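For readers who want to reproduce this kind of generated ruleset, here is a minimal Python sketch of the conversion idea. It is not the reporter's Perl script, and the layout of the attached files is an assumption: the sketch takes an already ordered sequence of characters and emits one `<reset>`/`<p>` pair per adjacent pair, using the `\uXXXX` notation. The names `to_escape`, `build_rules`, and `count_rules` are hypothetical helpers introduced only for this illustration.

```python
def to_escape(ch: str) -> str:
    """Render one BMP character in \\uXXXX notation."""
    cp = ord(ch)
    if cp > 0xFFFF:
        # Assumption: only BMP characters are handled in this sketch.
        raise ValueError("character outside the BMP: U+%X" % cp)
    return "\\u%04X" % cp


def build_rules(chars) -> str:
    """Build a <rules> block that sorts chars in the given order.

    Each rule resets to the previous character and declares the
    current one as a primary difference after it.
    """
    chars = list(chars)
    if len(chars) < 2:
        raise ValueError("need at least two characters to order")
    lines = ["<rules>"]
    for prev, cur in zip(chars, chars[1:]):
        lines.append(
            "  <reset>%s</reset><p>%s</p>" % (to_escape(prev), to_escape(cur))
        )
    lines.append("</rules>")
    return "\n".join(lines)


def count_rules(rules_xml: str) -> int:
    """Count the generated rules (one <reset> per rule in this sketch)."""
    return rules_xml.count("<reset>")


if __name__ == "__main__":
    rules = build_rules("abc")
    print(rules)
    print("rules:", count_rules(rules))
```

Counting the rules this way makes it easy to check whether a generated ruleset goes past the roughly 96 working rules observed above.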
[15 Mar 2014 12:32]
Александр Евстигнеев
Tested this collation section with MySQL 5.6.16: it works great, so it's definitely a problem in 5.5.36.
[15 Mar 2014 12:34]
Александр Евстигнеев
Accidentally changed the version and OS of the original ticket. Sorry.
[17 Mar 2014 20:14]
Sveta Smirnova
Thank you for the report. Verified as described. Version 5.6 is not affected.