Bug #65593 | parse errors in loadable UCA / LDML collations are silently ignored | ||
---|---|---|---|
Submitted: | 12 Jun 2012 21:22 | Modified: | 22 Jan 2013 15:20 |
Reporter: | Hartmut Holzgraefe | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 5.5.21, 5.5.23 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[12 Jun 2012 21:22]
Hartmut Holzgraefe
[13 Jun 2012 6:07]
Valeriy Kravchuk
Thank you for the bug report. Verified with 5.5.23 on Windows also:

...
mysql> SELECT * FROM phonebook ORDER BY phone;
+-------+--------------------+
| name  | phone              |
+-------+--------------------+
| Sanja | +380 (912) 8008005 |
| Bar   | +7-912-800-80-01   |
| Svoj  | +7 912 800 80 02   |
| Ramil | (7912) 800 80 03   |
| Hf    | +7 (912) 800 80 04 |
+-------+--------------------+
5 rows in set (0.03 sec)

mysql> exit
Bye

C:\Program Files\MySQL\MySQL Server 5.5\bin>net stop mysql55
The MySQL55 service is stopping.
The MySQL55 service was stopped successfully.

C:\Program Files\MySQL\MySQL Server 5.5\bin>net start mysql55
The MySQL55 service is starting.
The MySQL55 service was started successfully.

C:\Program Files\MySQL\MySQL Server 5.5\bin>mysql -uroot -proot -P3312 test
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.23 MySQL Community Server (GPL)

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SELECT * FROM phonebook ORDER BY phone;
ERROR 1273 (HY000): Unknown collation 'utf8_phone_ci' in table 'phonebook' definition

But no errors in the error log:

120613 9:04:55 [Note] Event Scheduler: Loaded 0 events
120613 9:04:55 [Note] C:\Program Files\MySQL\MySQL Server 5.5\bin\mysqld: ready for connections.
Version: '5.5.23' socket: '' port: 3312 MySQL Community Server (GPL)
[29 Jul 2012 23:17]
Paul DuBois
Noted in 5.6.6 changelog. Parse errors that occurred while loading UCA or LDML collation descriptions were not written to the error log.
[15 Jan 2013 12:58]
Hartmut Holzgraefe
While this is fixed for actual XML parsing problems (e.g. wrong tag spelling), it still isn't fixed for the given "how to reproduce" example:

* \0000 (with missing 'u') is accepted even though http://dev.mysql.com/doc/refman/5.6/en/ldml-rules.html says that character names can be written literally or in \u#### format only ... I'm not sure how \0 is going to be interpreted here, maybe it is somehow valid but not mentioned?

* Anyway, the collation using just that single <reset>\0000</reset> rule is accepted, and it shows in SHOW COLLATION just fine, but fails when actually trying to use it:

> SHOW COLLATION LIKE 'utf8_test';
+-----------+---------+-----+---------+----------+---------+
| Collation | Charset | Id  | Default | Compiled | Sortlen |
+-----------+---------+-----+---------+----------+---------+
| utf8_test | utf8    | 253 |         |          |       8 |
+-----------+---------+-----+---------+----------+---------+

> create table t1(id int primary key,d char collate utf8_revdig_ci);
ERROR 1273 (HY000): Unknown collation: 'utf8_revdig_ci'

The only extra error message now is a very obscure

Shift expected at ''

both in the output of SHOW WARNINGS and in the error log ...
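The backslash question above can be illustrated with a small stand-alone check. This is only a sketch in Python, not the server's parser: the function name `invalid_escapes` is hypothetical, and it assumes the strictest reading of the manual page, namely that the only valid escape is `\u` followed by exactly four hexadecimal digits.

```python
import re

# Assumption: the only valid escape is \u plus exactly four hex digits.
# The optional group is empty when a backslash starts anything else.
ESCAPE = re.compile(r"\\(u[0-9A-Fa-f]{4})?")


def invalid_escapes(rule_text):
    """Return the offsets of backslashes that do not start a valid escape."""
    return [m.start() for m in ESCAPE.finditer(rule_text) if m.group(1) is None]


print(invalid_escapes(r"\0000"))   # [0]  -> the rule text from this report
print(invalid_escapes(r"\u0041"))  # []   -> documented form
```

Under that assumption, the `\0000` rule would be flagged at offset 0 instead of being accepted silently.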
[15 Jan 2013 13:01]
Hartmut Holzgraefe
the utf8_test / utf8_revdig name mismatch was a copy/paste error on my side ... in the actual test cases the name was either utf8_test or utf8_revdig consistently ...
[15 Jan 2013 13:13]
Hartmut Holzgraefe
Ok, the actual error is that no shift rules (<p>, <s>, <t>) are given after the reset rule. This happens regardless of the reset rule's content; the same effect can be seen when using

<collation name="utf8_test" id="253">
  <rules>
    <reset>A</reset>
  </rules>
</collation>

So things come down to these distinct problems:

* it is not clear whether a backslash in front of anything else but a 'u' is valid at all, and how it is interpreted if it is indeed valid syntax ...

* a <reset> not followed by a shift rule is not supported, but is reported in a very obscure way at best (neither mentioning the name of the collation nor the name of the <reset> rule, so effectively just saying "something's wrong somewhere ... or so ...")

* validity of collations is not checked at load time but only later at use time
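The second and third problems can be sketched as a load-time check that names both the collation and the offending rule. This is a hypothetical Python illustration, not the server's code: `find_dangling_resets` is an invented name, and it assumes that <p>, <s> and <t> are the only shift rules, as listed in this comment.

```python
import xml.etree.ElementTree as ET

# Assumption: only the shift rules named in this report are recognised.
SHIFT_TAGS = {"p", "s", "t"}


def find_dangling_resets(collation_xml):
    """Return (collation name, reset text) for each <reset> that is not
    immediately followed by a shift rule."""
    collation = ET.fromstring(collation_xml)
    problems = []
    for rules in collation.iter("rules"):
        children = list(rules)
        for index, child in enumerate(children):
            if child.tag != "reset":
                continue
            # The element right after the <reset>, if there is one.
            following = children[index + 1] if index + 1 < len(children) else None
            if following is None or following.tag not in SHIFT_TAGS:
                problems.append((collation.get("name"), (child.text or "").strip()))
    return problems


BAD = """<collation name="utf8_test" id="253">
  <rules>
    <reset>A</reset>
  </rules>
</collation>"""

print(find_dangling_resets(BAD))  # [('utf8_test', 'A')]
```

Running such a check while the collation file is read would let the message carry the collation name and the reset text, rather than surfacing only when the collation is first used.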
[16 Jan 2013 13:18]
Erlend Dahl
Hartmut, if you still have concerns, please file a new bug. Continuing the discussion here will just make us lose track of the issue.
[22 Jan 2013 15:20]
Hartmut Holzgraefe
Ok, refiled as

* bug #68142 "UCA / LDML parser does not complain about invalid/unsupported backslash sequence"
* bug #68143 "Validity of LDML collations is checked too late"
* bug #68144 "Collation name missing from log messages about LDML definition problems"