Bug #44523 | Feature request/proposition: Croatian utf8 collation (utf8_croatian_ci) | ||
---|---|---|---|
Submitted: | 28 Apr 2009 15:32 | Modified: | 11 Nov 2010 17:14 |
Reporter: | Neven Jacmenovic | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S4 (Feature request) |
Version: | 5.x, 6.x | OS: | Any |
Assigned to: | Alexander Barkov | CPU Architecture: | Any |
Tags: | collation, croatian, utf8, utf8_croatian_ci |
[28 Apr 2009 15:32]
Neven Jacmenovic
[29 Apr 2009 4:42]
Tomo Krajina
Yep, I agree, we Croatian MySQL users desperately need a UTF8 collation.
[29 Apr 2009 8:06]
Tonci Grgin
Hi Neven, and thanks for your report. This is a known problem and we do talk about it every so often. There is also a worklog for this, but without much progress so far. As I'm informed, the problem is in our letters Nj, Lj... This needs a different approach from the one in effect today, so I really don't know how long it will take to implement.
https://intranet.mysql.com/worklog/Server-RawIdeaBin/?tid=3286 (internal):
--<cut>--
The problem is that these collations do not support contractions: DŽ, LJ and NJ, which must be treated as single letters.
Sorting order should be: A,B,C,Č,Ć,D,DŽ,Đ,E,F,G,H,I,J,K,L,LJ,M,N,NJ,O,P,Q,R,S,Š,T,U,V,W,X,Y,Z,Ž
MySQL is also missing collations utf8_croatian_ci and ucs2_croatian_ci.
--<cut>--
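To make the contraction requirement concrete, here is a minimal Python sketch of a sort key that follows the order quoted in the worklog excerpt and treats DŽ, LJ and NJ as single letters. It is an illustration only and not how MySQL implements collations; the names `CROATIAN_ALPHABET`, `CONTRACTIONS` and `croatian_sort_key` are made up for this example, and the handling of characters outside the alphabet (sorted before all letters, by code point) is an assumption.

```python
import unicodedata

# Letter order copied from the worklog excerpt; "dž", "lj" and "nj" are
# contractions that count as single letters.
CROATIAN_ALPHABET = [
    "a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i", "j",
    "k", "l", "lj", "m", "n", "nj", "o", "p", "q", "r", "s", "š", "t", "u",
    "v", "w", "x", "y", "z", "ž",
]
RANK = {letter: index for index, letter in enumerate(CROATIAN_ALPHABET)}
CONTRACTIONS = ("dž", "lj", "nj")


def croatian_sort_key(text):
    """Return a case-insensitive sort key (hypothetical helper)."""
    # NFC makes sure "č" is one code point rather than "c" + combining caron.
    s = unicodedata.normalize("NFC", text).lower()
    key = []
    i = 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in CONTRACTIONS:
            # Consume both characters of the digraph as one letter.
            key.append(RANK[pair])
            i += 2
        else:
            ch = s[i]
            # Assumption: unknown characters (spaces, digits, ...) get a
            # negative rank, so they sort before every letter.
            key.append(RANK.get(ch, ord(ch) - 0x110000))
            i += 1
    return tuple(key)


names = ["Luka", "Ljubić", "Lovro", "Njegoš", "Novak", "Nada",
         "Đuro", "Džemal", "Dunja", "Ćosić", "Čavić", "Cvitan"]
print(sorted(names, key=croatian_sort_key))
```

With this key, "Ljubić" sorts after "Luka" and "Njegoš" after "Novak", whereas a plain code-point sort places "Lj" before "Lo" and "Nj" before "No". The sketch always reads d+ž, l+j and n+j as digraphs, which is also what a contraction-based collation does.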
[29 Apr 2009 8:23]
Neven Jacmenovic
Hi Tonci, maestro! Croatian is not the only language with contractions. I've been following the experiments with Hungarian and Vietnamese contractions, but I was unable to use the same technique for Croatian utf8: http://bugs.mysql.com/file.php?id=6814 Best regards Neven
[29 Apr 2009 8:29]
Alexander Barkov
Hi Neven, Thank you very much for the reasonable request. We could not add a Croatian collation so far because MySQL didn't support contractions between non-ASCII characters, so it was not possible to support dž correctly. Right now we're finishing this task: http://forge.mysql.com/worklog/task.php?id=2673 This patch (among other features) makes it possible to handle digraphs like dž correctly. The patch is already available and is under code review. After code review is done, the patch will appear in a so-called "feature preview" tree. After that, adding a Croatian collation will be very simple, just a matter of half an hour. It is very likely that the Croatian collation will appear in the same feature preview tree in May or June 2009, so you'll be able to download it and give it a try. At the moment, though, I don't have an estimate of when Croatian will appear in an official release.
[29 Apr 2009 8:30]
Tonci Grgin
Thanks for the info, Bar.
[29 Apr 2009 8:31]
Goran Ucpe
Hey Neven! Yes, I agree with you on the importance of this issue. Lately, more and more government institutions are requiring digital projects, and those projects usually involve large numbers of NAMES in databases (for example: population counts, voting day ballots, state employees, etc.). These databases cannot be sorted correctly because of the issues with NJ, LJ, and DŽ, and that is starting to cause problems at the state level.
[29 Apr 2009 8:32]
Neven Jacmenovic
Btw, I just realized that this ticket has been assigned to Mr. Alexander Barkov, who is the original author of the Hungarian experiment mentioned above. Hi Alexander! :) The list of supporters for this feature is growing in my original forum post: http://forums.mysql.com/read.php?20,260051,260051#msg-260051
[29 Apr 2009 8:44]
Neven Jacmenovic
Alexander, thank you for such great news. You certainly made my day! I will be following progress on this like a hawk. Goran, yes, my point exactly. So far, if sorting was an issue in one table, we could use the old-school hack of creating a new column for the ORDER BY clause that stored alternative names, e.g. Belančić -> belancxicy, Čutura -> cxutura, etc. But those hacks slow down development and cause damage in the long run. The hacking logic lives at the application level, so MySQL couldn't be used for more complex queries. Please keep me posted, guys! Best regards Neven
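For readers unfamiliar with the workaround described above, the following Python sketch shows roughly how such an alternative-name column could be generated in the application. Only the č -> cx and ć -> cy substitutions come from the examples in the comment; every other entry in the table, and the names `ASCII_HACK` and `ascii_sort_name`, are assumptions added for illustration.

```python
import unicodedata

ASCII_HACK = {
    "č": "cx",   # from the comment: Belančić -> belancxicy
    "ć": "cy",   # from the comment: Belančić -> belancxicy
    # The entries below are NOT in the thread; they are hypothetical
    # extensions of the same idea to the remaining special letters.
    "dž": "dx",
    "đ": "dy",
    "lj": "lx",
    "nj": "nx",
    "š": "sx",
    "ž": "zx",
}


def ascii_sort_name(name):
    """Build the plain-ASCII string stored in the extra ORDER BY column."""
    s = unicodedata.normalize("NFC", name).lower()
    out = []
    i = 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in ASCII_HACK:
            # A digraph (or the final single character) has a substitute.
            out.append(ASCII_HACK[pair])
            i += 2
        else:
            # Single characters fall back to themselves when unmapped.
            out.append(ASCII_HACK.get(s[i], s[i]))
            i += 1
    return "".join(out)


print(ascii_sort_name("Belančić"))  # belancxicy
print(ascii_sort_name("Čutura"))    # cxutura
```

The scheme is only approximate: a name containing "cz", for example, would sort after one containing "ć", and the extra column has to be kept in sync by application code, which is the maintenance cost the comment complains about.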
[29 Apr 2009 8:56]
Damir Ribaric
We really need that feature! Thank you!
[7 Aug 2009 8:49]
Neven Jacmenovic
Hi guys, any update on the progress of this? Thanks in advance! Best regards Neven
[27 Nov 2009 12:45]
Tonci Grgin
Neven, good news! See http://www.collation-charts.org/articles/croatian.htm and http://forge.mysql.com/worklog/task.php?id=2673. Feel free to report any problems or thoughts you have while using these patches straight to me (if you can't express them in English), and I'll pass them on to Bar. These links are also posted on Forum.hr under "mysql".
[30 Nov 2009 15:35]
Neven Jacmenovic
Great news, great news indeed, my friend. Looking good so far! We even managed to apply the patch to 5.0.51, and we are working with Alexander Barkov on further tests. Here is the test DB dump: http://www.nivas.hr/pub/mysql_utf8_croatian_ci/test_croatian.sql And this is the expected ORDER BY output: http://www.nivas.hr/pub/mysql_utf8_croatian_ci/output.txt
[11 Nov 2010 17:14]
Alexander Barkov
The Croatian collation has been added to mysql-5.6. It is currently being documented. See here for status updates: http://forge.mysql.com/worklog/task.php?id=5476