Bug #29977 | MySQL Persian collation (utf8_persian_ci) incorrectly sorts Harakat | ||
---|---|---|---|
Submitted: | 23 Jul 2007 13:28 | Modified: | 29 Sep 2007 18:14 |
Reporter: | Roozbeh Pournader | | |
Status: | Verified | | |
Category: | MySQL Server: Charsets | Severity: | S4 (Feature request) |
Version: | 5.0.22 | OS: | Linux (CentOS 5) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[23 Jul 2007 13:28]
Roozbeh Pournader
[7 Aug 2007 20:59]
MySQL Verification Team
Bug: http://bugs.mysql.com/bug.php?id=30277 was marked as a duplicate of this one.
[24 Aug 2007 13:40]
Peter Gulutzan
Jody McIntyre, who submitted the Persian patch, did check ICU. See http://lists.mysql.com/internals/15841

These are the Unicode characters from 064B to 0652:

- 064B;ARABIC FATHATAN
- 064C;ARABIC DAMMATAN
- 064D;ARABIC KASRATAN
- 064E;ARABIC FATHA
- 064F;ARABIC DAMMA
- 0650;ARABIC KASRA
- 0651;ARABIC SHADDA
- 0652;ARABIC SUKUN

Source: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

Apparently those characters are Harakat, which are vowel marks, and are therefore like Hebrew niqqud. (There is a reference-manual comment about niqqud here: http://dev.mysql.com/doc/refman/5.1/en/charset-unicode-sets.html .) Harakat are combining characters, so Bug#29977 is actually a feature request rather than a bug. Accordingly, I am changing the severity to S4.

MySQL has two worklog tasks outstanding:

- WL#898 Primary, Secondary and Tertiary Sorts
- WL#3770 Unicode-compliant comparison and sorting of combining characters

Those are not marked as 'private', so they are probably visible on forge.mysql.com, or soon will be. Without first working on those tasks, sorting Harakat correctly may be difficult, but we wish the best of luck to patch submitters.

Incidentally, I think Bug#30277 "Collation for Persian letters" is in fact not a duplicate; it is a non-bug. It appears that the writer of Bug#30277 was merely unaware that utf8_persian_ci exists.
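To make the comment's point concrete, here is a minimal Python sketch. It is an illustration added for this write-up, not MySQL code. It first lists U+064B to U+0652 using Python's standard `unicodedata` module, which shows that each of these characters is a nonspacing mark (general category `Mn`) with a nonzero canonical combining class. It then contrasts plain code-point ordering with `toy_sort_key`, a hypothetical two-level key that ignores marks at the primary level and uses them only to break ties. This is roughly the idea behind a primary/secondary sort such as WL#898. The primary level still uses raw code-point order for the base letters, which only stands in for real Persian letter weights. This sketch is neither the Unicode Collation Algorithm nor a description of how utf8_persian_ci actually works.

```python
import unicodedata

# Harakat listed in the comment: U+064B (FATHATAN) through U+0652 (SUKUN).
HARAKAT = [chr(cp) for cp in range(0x064B, 0x0653)]


def describe(ch):
    """Return (code point, Unicode name, general category, combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


def toy_sort_key(s):
    """Hypothetical two-level sort key (illustration only).

    Primary level: the string with all nonspacing marks (category 'Mn')
    removed, so words differing only in Harakat tie here. Base letters are
    compared by raw code point, a stand-in for real Persian letter weights.
    Secondary level: the sequence of marks, used only to break such ties.
    """
    primary = "".join(c for c in s if unicodedata.category(c) != "Mn")
    secondary = tuple(ord(c) for c in s if unicodedata.category(c) == "Mn")
    return (primary, secondary)


if __name__ == "__main__":
    for ch in HARAKAT:
        print(*describe(ch))

    # BEH+DAL, BEH+FATHA+DAL, BEH+ZAIN (escapes avoid bidi display issues).
    words = ["\u0628\u062F", "\u0628\u064E\u062F", "\u0628\u0632"]
    print("code-point order:", [ascii(w) for w in sorted(words)])
    print("toy two-level order:", [ascii(w) for w in sorted(words, key=toy_sort_key)])
```

With plain code-point order, the vowelled word U+0628 U+064E U+062F sorts after U+0628 U+0632, which separates it from its unvowelled form U+0628 U+062F. Under the toy key it sorts directly after U+0628 U+062F, because FATHA only matters once the base letters tie.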
[29 Sep 2007 18:14]
Valeriy Kravchuk
Thank you for a reasonable feature request.