Bug #27879 collation for utf8 in west europe
Submitted: 17 Apr 2007 9:42
Modified: 13 Sep 2007 19:27
Reporter: Susanne Ebrecht
Status: Closed
Category: MySQL Server: Documentation
Severity: S3 (Non-critical)
Version:
OS: Any
Assigned to: Paul DuBois
CPU Architecture: Any

[17 Apr 2007 9:42] Susanne Ebrecht
Description:
Hi all,

It's not a bug, just a hint and a suggestion.

Look at it from the point of view of a user with only a little knowledge. This is what I have to do:

I want to create a new database with utf8 encoding.
I look at \h create database for the right syntax.
Then I look at show collation to see what is available for utf8.
But I don't find a German collation.

I look in the documentation and have to click several times to find out that I have to use utf8_unicode_ci.

That is annoying. All the examples use ISO encodings, and it takes time to find the Western European collation for utf8.

Also, users often don't think about encoding. When you install MySQL, it often uses utf8 with utf8_general_ci as the default. But utf8_general_ci doesn't sort correctly for German, French, etc.

I think there should be a hint for Western European users, at several places in the documentation, that the right collation for utf8 is utf8_unicode_ci: for example, in the installation section, in the create database section, ...
Of course, it is more important that these hints appear in the German, French, etc. documentation than in the English documentation.

Regards,

Susanne

How to repeat:
Just try to be a user without much knowledge.
[17 Apr 2007 10:16] Valeriy Kravchuk
Thank you for a reasonable documentation request.
[13 Sep 2007 19:27] Paul DuBois
Thanks for your suggestion. However, I've considered this report several times now, and I conclude that "if you want German collation, use utf8_general_ci" is such a specialized piece of advice that it's not really appropriate for inclusion at "several positions within the manual." The proper place for it is in the discussion of collations, which is where it is.

http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html:

"utf8_general_ci also is satisfactory for both German and French, except that ‘ß’ is equal to ‘s’, and not to ‘ss’. If this is acceptable for your application, then you should use utf8_general_ci because it is faster. Otherwise, use utf8_unicode_ci because it is more accurate."
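To make the quoted difference concrete, here is a toy Python model. It is only a sketch under stated assumptions: it does not use MySQL's real weight tables, and the function names general_ci_key and unicode_ci_key are hypothetical. The general_ci-like key assigns exactly one weight per character, so 'ß' can only equal a single letter ('s'); the unicode_ci-like key allows expansions, so 'ß' compares like 'ss'. Both ignore case and accents, roughly as the _ci collations do for German and French text.

```python
import unicodedata


def _strip_accents(text: str) -> str:
    # Decompose (e.g. "ü" -> "u" + U+0308 COMBINING DIAERESIS) and drop the marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def general_ci_key(s: str) -> list[str]:
    """Toy stand-in (hypothetical) for utf8_general_ci: one weight per character, no expansions."""
    key = []
    for ch in s:
        if ch in ("ß", "ẞ"):
            # Sharp s has no decomposition; map it to the single letter "s",
            # matching the manual's note that 'ß' is equal to 's' here.
            key.append("s")
        else:
            key.append(_strip_accents(ch).casefold())
    return key


def unicode_ci_key(s: str) -> str:
    """Toy stand-in (hypothetical) for utf8_unicode_ci: expansions allowed, so 'ß' weighs like 'ss'."""
    # str.casefold() expands "ß" to "ss"; accents are then ignored as above.
    return _strip_accents(s.casefold())


if __name__ == "__main__":
    print(general_ci_key("Straße") == general_ci_key("Strase"))   # True
    print(general_ci_key("Straße") == general_ci_key("Strasse"))  # False
    print(unicode_ci_key("Straße") == unicode_ci_key("Strasse"))  # True
    print(general_ci_key("Müller") == general_ci_key("MULLER"))   # True
```

In MySQL itself the choice is made when defining the database, table, or column, for example CREATE DATABASE ... CHARACTER SET utf8 COLLATE utf8_unicode_ci.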