Bug #68143 Validity of LDML collations is checked too late
Submitted: 22 Jan 2013 15:09 Modified: 22 Jan 2013 18:23
Reporter: Hartmut Holzgraefe
Status: Verified
Category: MySQL Server: Charsets Severity: S3 (Non-critical)
Version: 5.7.1 OS: Any
Assigned to: CPU Architecture:Any

[22 Jan 2013 15:09] Hartmut Holzgraefe
Description:
If an LDML collation definition is valid XML but contains a logic error, such as a <reset> rule that is not followed by a shift rule (<p>, <s>, <t>), no error is raised at load time.

The user-defined collation also shows up in SHOW COLLATION as if it were fine, but any attempt to use it raises a misleading "Unknown collation: 'collation_name'" error.

How to repeat:
Add a simple test collation like

  <collation name="utf8_test" id="253">
    <rules>
      <reset>A</reset>
    </rules>
  </collation>

to the utf8 section of share/charsets/Index.xml and restart the MySQL server.
Check the server error log to confirm that no errors or warnings about the utf8_test collation were logged.

Then try to use the collation.

First, verify that the collation was actually loaded:

mysql> show collation like 'utf8_test';
+-----------+---------+-----+---------+----------+---------+
| Collation | Charset | Id  | Default | Compiled | Sortlen |
+-----------+---------+-----+---------+----------+---------+
| utf8_test | utf8    | 254 |         |          |       8 |
+-----------+---------+-----+---------+----------+---------+
1 row in set (0.01 sec)

Now try to actually use the collation:

mysql> create table t1(id int primary key, c char(1) collate utf8_test);
ERROR 1273 (HY000): Unknown collation: 'utf8_test2'

"Unknown collation" is not helpful here, as it directly contradicts the SHOW COLLATION output.

SHOW WARNINGS gives more details:

mysql> show warnings;
+---------+------+---------------------------------+
| Level   | Code | Message                         |
+---------+------+---------------------------------+
| Error   | 1273 | Unknown collation: 'utf8_test2' |
| Warning | 1273 | Shift expected at ''            |
+---------+------+---------------------------------+
2 rows in set (0.00 sec)

So the warning suggests that something is actually wrong with the collation definition, and that "Unknown" really means "Invalid".

Suggested fix:
* Verify at load time that a user-defined collation is actually usable

* Change the error message from "Unknown" to "Invalid", "Malformed", or something similar
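
To illustrate the kind of check the first suggestion asks for, here is a minimal Python sketch that flags every <reset> not directly followed by a shift rule. It is not how the server implements or should implement this; the function name find_dangling_resets is hypothetical, and the set of shift tags is limited to the three named in this report (<p>, <s>, <t>), which is an assumption, since the server's parser may accept others.

```python
import xml.etree.ElementTree as ET

# Assumption: only the shift tags named in this report.
# The server's LDML parser may accept additional rule tags.
SHIFT_TAGS = {"p", "s", "t"}


def find_dangling_resets(collation_xml):
    """Return the text of each <reset> that is not directly followed by a shift rule."""
    root = ET.fromstring(collation_xml)
    problems = []
    for rules in root.iter("rules"):
        children = list(rules)
        for i, child in enumerate(children):
            if child.tag != "reset":
                continue
            # The element right after <reset>, if there is one.
            nxt = children[i + 1] if i + 1 < len(children) else None
            if nxt is None or nxt.tag not in SHIFT_TAGS:
                problems.append(child.text or "")
    return problems


# The test collation from "How to repeat".
BROKEN = """
<collation name="utf8_test" id="253">
  <rules>
    <reset>A</reset>
  </rules>
</collation>
"""

if __name__ == "__main__":
    for anchor in find_dangling_resets(BROKEN):
        print(f"Shift expected after <reset>{anchor}</reset>")
```

Run against the test collation above, the sketch reports the lone <reset>A</reset>. A check of this kind at load time would let the problem be logged on startup rather than surfacing only when the collation is first used.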
[22 Jan 2013 15:15] Hartmut Holzgraefe
There's a mix of "utf8_test" and "utf8_test2" in my original report text. This is an editing/copying mistake on my side: I ran tests with two different collations and didn't pay attention when copying the messages over.
[22 Jan 2013 18:23] Sveta Smirnova
Thank you for the report.

Verified as described.