Bug #58149 no proper collation for Swedish
Submitted: 11 Nov 2010 22:27 Modified: 17 Nov 2010 9:34
Reporter: Peter Laursen (Basic Quality Contributor)
Status: Verified
Category:MySQL Server: Charsets Severity:S4 (Feature request)
Version:5.0+ OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: qc

[11 Nov 2010 22:27] Peter Laursen
Description:
We had a very long discussion here: http://bugs.mysql.com/bug.php?id=57765
I forgot one point that I think deserves a separate report.

Nordic/Scandinavian users who want to use utf8 have to use utf8_danish_ci to handle the Nordic character 'å'.

The problem is that in utf8_danish_ci, 'å' = 'aa'. This is acceptable (though not perfect) for Danish, but it is wrong for Swedish: in Swedish, 'å' was never equal to 'aa'.

Let me explain the background: the sound now written as 'å' in all the Nordic languages appeared in the 16th century, when local dialects of the old 'Norse' language (almost identical to modern Icelandic) were replaced by (heavily German-influenced) modern Danish and Swedish. From that very time, Swedish had the letter 'å' to represent this sound. Danish originally did not have 'å'; the sound was written (as far as I remember) as 'ou'. Around 250 years ago the spelling 'aa' became common for it. Between 150 and 100 years ago, 'å' started to become common in Danish as well. It was adopted under Swedish inspiration, as part of the democratic peasants' movement of the time. An official language reform in Denmark around 60 years ago replaced 'aa' with 'å' (but the form 'aa' still survives in personal names and geographical names).

However, in Swedish 'aa' was never equal to 'å'. Swedish never needed such a construction, as it had 'å' from the beginning. Using utf8_danish_ci for Swedish therefore produces errors in matching and sorting wherever a double 'a' ('aa') occurs.
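The matching and sorting difference can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not MySQL's implementation: the two alphabets, the plain "aa" to "å" substitution, and the helper names `sort_key`, `danish_key`, `swedish_key` and `equal` are hypothetical stand-ins for real collation weight tables. The model only knows the letters a-z plus the three Nordic vowels, and uses the Swedish name "Saab", which contains a double 'a'.

```python
# Toy model of the 'å' = 'aa' problem. The alphabets and the simple "aa" -> "å"
# substitution are illustrative assumptions, not MySQL's real collation weights.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"   # Danish order: ... z æ ø å
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"  # Swedish order: ... z å ä ö


def sort_key(word: str, alphabet: str, aa_means_a_ring: bool) -> tuple:
    """Map a word to a tuple of letter ranks (case-insensitive, like a *_ci collation)."""
    word = word.lower()
    if aa_means_a_ring:
        # Danish-style contraction: the digraph "aa" counts as the single letter "å".
        word = word.replace("aa", "å")
    # str.index raises ValueError for letters outside this toy alphabet.
    return tuple(alphabet.index(ch) for ch in word)


def danish_key(word: str) -> tuple:
    return sort_key(word, DANISH_ALPHABET, aa_means_a_ring=True)


def swedish_key(word: str) -> tuple:
    return sort_key(word, SWEDISH_ALPHABET, aa_means_a_ring=False)


def equal(a: str, b: str, key) -> bool:
    """Two strings 'match' under a collation if their sort keys are identical."""
    return key(a) == key(b)


if __name__ == "__main__":
    names = ["Saab", "Sylt", "Sala"]
    print(sorted(names, key=danish_key))      # ['Sala', 'Sylt', 'Saab']
    print(sorted(names, key=swedish_key))     # ['Saab', 'Sala', 'Sylt']
    print(equal("Saab", "Såb", danish_key))   # True
    print(equal("Saab", "Såb", swedish_key))  # False
```

Under the Danish-style key, "Saab" is pushed to the end of the list and matches "Såb"; under the Swedish-style key it sorts before "Sala" and matches nothing else. That is the kind of matching and sorting error the report describes when utf8_danish_ci is used for Swedish text.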

(FYI: the Nordic sound 'å' is very close to the 'o' in English 'monster'.)

How to repeat:
see above

Suggested fix:
Create *unicode*_swedish_ci collations. Make them as close to latin1_swedish_ci as possible. And BTW: that would make life easy for people replacing latin1 with Unicode.
[11 Nov 2010 22:35] Peter Laursen
Sorry, the link was wrong.
I intended to refer to http://bugs.mysql.com/bug.php?id=57877
[12 Nov 2010 22:24] Peter Laursen
.. and BTW, I think the same applies to Finnish and Estonian as well.
[17 Nov 2010 8:54] Susanne Ebrecht
We already have a worklog for it:
http://forge.mysql.com/worklog/task.php?id=5170
[17 Nov 2010 9:17] Alexander Barkov
Hi Peter,

Thanks for the historic background!
Now we know better where similar things in some languages come from ;)

Can you please have a look at the utf8_swedish_ci chart:

http://www.collation-charts.org/mysql60/mysql604.utf8_swedish_ci.html

I believe it works very close to what we have in latin1_swedish_ci:

http://www.collation-charts.org/mysql60/mysql604.latin1_swedish_ci.html

except for Thorn, O With Stroke, and Sharp S (SZ ligature).

But these three letters are not really Swedish or Finnish letters, are they?
[17 Nov 2010 9:34] Peter Laursen
I don't know if I will be able to understand.  I have bookmarked it for the weekend or next week.

but 
1) 'Sharp s' (if I understand what you refer to) is a purely German phenomenon - not used in Nordic languages at all to my best knowledge.
2) eth (Ð | ð) and thorn (Þ | þ) are modern Icelandic letters and are not used in any other modern Nordic language. But they are very old Nordic/Germanic letters. I think you may find them occasionally in very old Danish or Swedish writing as well (very old meaning probably 300-400 years old).