Bug #42126 | utf8_hungarian_ci issue | ||
---|---|---|---|
Submitted: | 15 Jan 2009 8:28 | Modified: | 23 Feb 2009 14:26 |
Reporter: | Istvan Mazal | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 5.1.30 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[15 Jan 2009 8:28]
Istvan Mazal
[15 Jan 2009 9:23]
Istvan Mazal
all inconsistencies in hungarian language! set names utf8; select 'a' = 'á' collate utf8_hungarian_ci;---1 select 'A' = 'Á' collate utf8_hungarian_ci;---1 select 'í' = 'i' collate utf8_hungarian_ci;---1 select 'Í' = 'I' collate utf8_hungarian_ci;---1 select 'ű' = 'u' collate utf8_hungarian_ci;---1 select 'Ű' = 'U' collate utf8_hungarian_ci;---1 select 'ő' = 'o' collate utf8_hungarian_ci;---1 select 'Ő' = 'O' collate utf8_hungarian_ci;---1 select 'ü' = 'u' collate utf8_hungarian_ci;---1 select 'Ü' = 'U' collate utf8_hungarian_ci;---1 select 'ö' = 'o' collate utf8_hungarian_ci;---1 select 'Ö' = 'O' collate utf8_hungarian_ci;---1 select 'ú' = 'u' collate utf8_hungarian_ci;---1 select 'Ú' = 'U' collate utf8_hungarian_ci;---1 select 'ó' = 'o' collate utf8_hungarian_ci;---1 select 'Ó' = 'O' collate utf8_hungarian_ci;---1 select 'é' = 'e' collate utf8_hungarian_ci;---1 select 'É' = 'E' collate utf8_hungarian_ci;---1
[15 Jan 2009 11:35]
Susanne Ebrecht
Many thanks for writing a bug report. Please also look here: http://www.collation-charts.org/mysql60/mysql604.utf8_hungarian_ci.html Unfortunately, we can't find a link (in English) for official Hungarian sorting rules. If you will be able to provide a link with official sorting rules for Hungaria then please provide them. We would prefer this in English but when it is just in Hungarian, then I am pretty sure we will find somebody for translation. As far as we don't have other official rules from Hungaria we won't change this implementation. So this is not a bug at the moment.
[15 Jan 2009 17:48]
Istvan Mazal
Thanks for quick reply. Here is the official hungarian sorting rules (in hungarian languages): http://hu.wikisource.org/wiki/A_magyar_helyes%C3%ADr%C3%A1s_szab%C3%A1lyai/A_bet%C5%B1k#A_... Summary above, is our sorting is this: A, Á, B, C, CS, D, DZ, DZS, E, É, F, G, GY, H, I, Í, J, K, L, LY, M, N, NY, O, Ó, Ö, Ő, P, Q, R, S, SZ, T, TY, U, Ú, Ü, Ű, V, W, X, Y, Z, ZS. Yes, we have vocals with two and three letters, but that is non-critical if in one ordering after "csalit" become "cukor" and not reverse, but that is yet critical if in one index field letter "A" equal with letter "Á". I suggest, use this (http://www.collation-charts.org/mysql60/mysql604.utf8_hungarian_ci.html) collation chart with this modify: A, Á, B, C, D, E, É, F, G, H, I, Í, J, K, L, M, N, O, Ó, Ö, Ő, P, Q, R, S, T, U, Ú, Ü, Ű, V, W, X, Y, Z. a, á, b, c, d, e, é, f, g, h, i, í, j, k, l, m, n, o, ó, ö, ő, p, q, r, s, t, u, ú, ü, ű, v, w, x, y, z. "À" and "à" letters are not hungarians. Best regards: István Mazál
[23 Feb 2009 13:26]
A. Sz. Szelp
Sorry, István, your quick description is wrong. According (also to the official document you have linked), the document http://www.collation-charts.org/mysql60/mysql604.utf8_hungarian_ci.html is "more correct" than what you describe. áÁ sorts as, but after aA éÉ sorts as, but after eE íÍ sorts as, but after iI óÓ sorts as, but after oO úÚ sorts as, but after úÚ őŐ sorts as, but after öÖ űŰ sorts as, but after üÜ. And yes, indeed, öÖ and üÜ sort _separately_ from oO and uU. The vowels are (but for àÀ which are indeed not part of the Hungarian alphabet) correct in the Mysql collation. What is missing is the digraph and trigraph consonants. The correct collation (also as outlined in the "A magyar helyesírás szabályai" by the Hungarian Academy of Sciences) is (only using small letters in this list): a á b c cs d dz dzs e é f g gy h i í j k l ly m n ny o ó ö ő p q r s sz t ty u ú ü ű v w x y z zs
[23 Feb 2009 14:26]
Istvan Mazal
Thx for reply. Yes, i see, the problem is not in the collation. The problem is this: In hungarian sorting a=á e=é o=ó ö=ő u=ú ü=ű it's right, because in indexes a!=á e!=é o!=ó ö!=ő u!=ú ü!=ű and in MySQL this is the malfunction: "szar" (shit) is equal to "szár" (stalk) in MySQL. Sorting is not equal with indexing! Best regards: István