Bug #32540 UTF8 collation for the Maltese alphabet is incomplete
Submitted: 20 Nov 2007 18:46 Modified: 4 Dec 2007 20:04
Reporter: Matthew C
Status: Verified
Category: MySQL Server: Charsets Severity: S4 (Feature request)
Version: 5.0.45 OS: Windows (Server 2003)
Assigned to: Assigned Account CPU Architecture: Any
Tags: alphabet, character set, characters, charset, collation, Maltese, Unicode, utf8

[20 Nov 2007 18:46] Matthew C
Description:
A full-text search for 'zejt' or 'żejt' brings up the result 'Ħobż biż-żejt', but a full-text search for 'hobz' does not, although one for 'ħobz' does.

I have taken a look at the collation document here: http://myoffice.izhnet.ru/bar/~bar/charts/utf8_general_ci.html

Bizarrely, z=ż, but Ħ!=h. In Maltese h and ħ (lowercase Ħ) are two different characters. But then again so are z and ż.

How to repeat:
SELECT 'ż' = 'z' COLLATE utf8_general_ci;

Returns: 1

SELECT 'ċ' = 'c' COLLATE utf8_general_ci;

Returns: 1

SELECT 'ħ' = 'h' COLLATE utf8_general_ci;

Returns: 0
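
One possible explanation for the asymmetry, offered here as an assumption and not confirmed anywhere in this report: 'ż', 'ċ' and 'ġ' each have a canonical Unicode decomposition into a base letter plus a combining dot above, whereas 'ħ' (U+0127) has none. A collation that folds letters by dropping diacritics would therefore equate the first three with their base letters but leave 'ħ' alone. The Python sketch below only inspects the Unicode character data and does not query MySQL. `base_letter` is a hypothetical helper name.

```python
import unicodedata

def base_letter(ch):
    """Return the first code point of ch's canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", ch)[0]

# Show each Maltese letter, its base letter, and its raw decomposition.
for ch in "żċġħ":
    decomposition = unicodedata.decomposition(ch) or "(none)"
    print(ch, "->", base_letter(ch), "| decomposition:", decomposition)
```

The letters with a dot above reduce to 'z', 'c' and 'g', while 'ħ' comes back unchanged because it has no decomposition to strip.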

Suggested fix:
In the Maltese alphabet, 'z' and 'ż' are two distinct letters, as are 'g' and 'ġ', and 'c' and 'ċ'; the dot above the character distinguishes one from the other. Likewise, 'ħ' and 'h' are two different letters in the Maltese alphabet.

So either arrange the collation so that:

z!=ż, c!=ċ, g!=ġ and h!=ħ

or do the reverse, so that all four pairs compare as equal:

z=ż, c=ċ, g=ġ and h=ħ

I have to say that the latter solution is more attractive to me, as inputting Maltese characters is a hassle on any OS, and being able to get the result 'Ħobż' after searching for 'hobz' is a great facility.
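
Until a collation treats all four pairs alike, the second option can be approximated on the application side. The Python sketch below is an illustration under stated assumptions, not part of MySQL: `MALTESE_FOLD` and `fold_maltese` are hypothetical names, and the idea presumes that a folded copy of the text is stored in a separate column and that search terms are folded the same way before querying it.

```python
# Hypothetical mapping from Maltese letters to their plain Latin counterparts.
MALTESE_FOLD = str.maketrans({
    "ċ": "c", "ġ": "g", "ħ": "h", "ż": "z",
    "Ċ": "C", "Ġ": "G", "Ħ": "H", "Ż": "Z",
})

def fold_maltese(text):
    """Replace Maltese-specific letters with plain Latin letters."""
    return text.translate(MALTESE_FOLD)

print(fold_maltese("Ħobż biż-żejt"))  # Hobz biz-zejt
```

With both the stored copy and the search term folded this way, a search for 'hobz' and one for 'ħobż' would hit the same row under a case-insensitive collation.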
[29 Nov 2007 0:48] Peter Gulutzan
Looks like a feature request.

There was a discussion about Maltese collation on MySQL forums,
http://forums.mysql.com/read.php?103,183888,183888#msg-183888
[3 Dec 2007 23:42] Matthew C
Yes, now I understand that this is actually not a bug. MySQL uses an older version of the Unicode Collation Algorithm. Are there any plans to integrate the newer version, which correctly collates Maltese characters?
[4 Dec 2007 0:59] Peter Gulutzan
Perhaps you will consider doing it yourself?

We tried to encourage such a project for
bug#4745 "Add Vietnamese collation for the ucs2 and utf8 Unicode character sets".
See the comments dated 6 July and 3 August on that bug report.
[4 Dec 2007 7:13] Matthew C
That's just what I'm going to do. Thanks for pointing me to those comments.
[4 Dec 2007 10:04] Susanne Ebrecht
Looking at the discussion, I'll set this to "Not a bug".
[4 Dec 2007 20:04] Sveta Smirnova
Set to "Verified", as a feature request is in any case not a bug.