Bug #42126 utf8_hungarian_ci issue
Submitted: 15 Jan 2009 8:28 Modified: 23 Feb 2009 14:26
Reporter: Istvan Mazal Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.1.30 OS:Any
Assigned to: CPU Architecture:Any

[15 Jan 2009 8:28] Istvan Mazal
Description:
In hungarian language meg!=még. In MySQL 5.1.30 only with utf8_bin character set work the following code below:

How to repeat:
-- phpMyAdmin SQL Dump
-- version 2.11.6
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Creating time: 2009. Jan 15. 09:14
-- Server Version: 5.1.30
-- PHP Version: 5.2.6

SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";

CREATE DATABASE `proba` DEFAULT CHARACTER SET utf8 COLLATE utf8_hungarian_ci;
USE `proba`;

CREATE TABLE IF NOT EXISTS `proba` (
  `proba` varchar(255) COLLATE utf8_hungarian_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_hungarian_ci;

INSERT INTO `proba` (`proba`) VALUES
('meg'),
('még');

SELECT * FROM `proba` where `proba`='még'

The bad result (the bug):

proba
------
meg
még
[15 Jan 2009 9:23] Istvan Mazal
all inconsistencies in hungarian language!

set names utf8;
select 'a' = 'á' collate utf8_hungarian_ci;---1
select 'A' = 'Á' collate utf8_hungarian_ci;---1
select 'í' = 'i' collate utf8_hungarian_ci;---1
select 'Í' = 'I' collate utf8_hungarian_ci;---1
select 'ű' = 'u' collate utf8_hungarian_ci;---1
select 'Ű' = 'U' collate utf8_hungarian_ci;---1
select 'ő' = 'o' collate utf8_hungarian_ci;---1
select 'Ő' = 'O' collate utf8_hungarian_ci;---1
select 'ü' = 'u' collate utf8_hungarian_ci;---1
select 'Ü' = 'U' collate utf8_hungarian_ci;---1
select 'ö' = 'o' collate utf8_hungarian_ci;---1
select 'Ö' = 'O' collate utf8_hungarian_ci;---1
select 'ú' = 'u' collate utf8_hungarian_ci;---1
select 'Ú' = 'U' collate utf8_hungarian_ci;---1
select 'ó' = 'o' collate utf8_hungarian_ci;---1
select 'Ó' = 'O' collate utf8_hungarian_ci;---1
select 'é' = 'e' collate utf8_hungarian_ci;---1
select 'É' = 'E' collate utf8_hungarian_ci;---1
[15 Jan 2009 11:35] Susanne Ebrecht
Many thanks for writing a bug report.

Please also look here:
http://www.collation-charts.org/mysql60/mysql604.utf8_hungarian_ci.html

Unfortunately, we can't find a link (in English) for official Hungarian sorting rules.

If you will be able to provide a link with official sorting rules for Hungaria then please provide them. We would prefer this in English but when it is just in Hungarian, then I am pretty sure we will find somebody for translation.

As far as we don't have other official rules from Hungaria we won't change this implementation. So this is not a bug at the moment.
[15 Jan 2009 17:48] Istvan Mazal
Thanks for quick reply.

Here is the official hungarian sorting rules (in hungarian languages):
http://hu.wikisource.org/wiki/A_magyar_helyes%C3%ADr%C3%A1s_szab%C3%A1lyai/A_bet%C5%B1k#A_...
Summary above, is our sorting is this:
A, Á, B, C, CS, D, DZ, DZS, E, É, F, G, GY, H, I, Í, J, K, L, LY, M, N, NY, O, Ó, Ö, Ő, P, Q, R, S, SZ, T, TY, U, Ú, Ü, Ű, V, W, X, Y, Z, ZS.

Yes, we have vocals with two and three letters, but that is non-critical if in one ordering after "csalit" become "cukor" and not reverse, but that is yet critical if in one index field letter "A" equal with letter "Á".
I suggest, use this (http://www.collation-charts.org/mysql60/mysql604.utf8_hungarian_ci.html) collation chart with this modify:
A, Á, B, C, D, E, É, F, G, H, I, Í, J, K, L, M, N, O, Ó, Ö, Ő, P, Q, R, S, T, U, Ú, Ü, Ű, V, W, X, Y, Z.
a, á, b, c, d, e, é, f, g, h, i, í, j, k, l, m, n, o, ó, ö, ő, p, q, r, s, t, u, ú, ü, ű, v, w, x, y, z.
"À" and "à" letters are not hungarians.

Best regards: István Mazál
[23 Feb 2009 13:26] A. Sz. Szelp
Sorry, István, your quick description is wrong.
According (also to the official document you have linked), the document http://www.collation-charts.org/mysql60/mysql604.utf8_hungarian_ci.html is "more correct" than what you describe.
áÁ sorts as, but after aA
éÉ sorts as, but after eE
íÍ sorts as, but after iI
óÓ sorts as, but after oO
úÚ sorts as, but after úÚ
őŐ sorts as, but after öÖ
űŰ sorts as, but after üÜ.
And yes, indeed, öÖ and üÜ sort _separately_ from oO and uU.

The vowels are (but for àÀ which are indeed not part of the Hungarian alphabet) correct in the Mysql collation.
What is missing is the digraph and trigraph consonants.

The correct collation (also as outlined in the "A magyar helyesírás szabályai" by the Hungarian Academy of Sciences) is (only using small letters in this list):
a á
b
c
cs
d
dz
dzs
e é
f
g
gy
h
i í
j
k
l
ly
m
n
ny
o ó
ö ő
p
q
r
s
sz
t
ty
u ú
ü ű
v
w
x
y
z
zs
[23 Feb 2009 14:26] Istvan Mazal
Thx for reply.
Yes, i see, the problem is not in the collation. The problem is this:
In hungarian sorting 
a=á
e=é
o=ó
ö=ő
u=ú
ü=ű
it's right, because in indexes
a!=á
e!=é
o!=ó
ö!=ő
u!=ú
ü!=ű
and in MySQL this is the malfunction: "szar" (shit) is equal to "szár" (stalk) in MySQL.
Sorting is not equal with indexing!

Best regards: István