Bug #18576 Latin1 character set is obsolete, should use euro-compatible latin9 as default
Submitted: 28 Mar 2006 16:00 Modified: 28 Mar 2006 16:06
Reporter: Bruce Attah
Status: Verified
Category:MySQL Server: Charsets Severity:S4 (Feature request)
Version: OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Triage: Triaged: D5 (Feature request)

[28 Mar 2006 16:00] Bruce Attah
Description:
It is very odd that Latin 1 is the default Western character set in the configuration wizard, and UTF-8 is the default "international" one. I think it would be good if you changed the defaults to UTF-8 for Western data, and UTF-16 for the "international" setup, for the next release of this tool.

Here's why:

For a modern database application, if it is expected that only a Western character set will be used, then Latin 9 (ISO-8859-15) should be the default, as Latin 1 (ISO-8859-1) is officially obsolete. The difference between Latin 1 and Latin 9 is that Latin 9 includes the Euro character, and Latin 1 does not. Someone based in the US or Australia who never enters the Euro character in their data will not notice any difference whether they use Latin 1 or Latin 9, but if they do ever enter the Euro character, and Latin 1 is the character set, they could have problems.
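
As a small illustration of the difference described above, the following Python sketch tries to encode the Euro character in both character sets. It is not part of the original report, and it relies on Python's built-in `latin-1` and `iso8859-15` codecs as stand-ins for the two ISO standards; it says nothing about how MySQL itself implements its character sets.

```python
# Illustration only: Python's codecs for ISO-8859-1 and ISO-8859-15,
# not MySQL's own character set implementations.
EURO = "\u20ac"  # the Euro sign


def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


latin1_ok = can_encode(EURO, "latin-1")      # ISO-8859-1 (Latin 1)
latin9_ok = can_encode(EURO, "iso8859-15")   # ISO-8859-15 (Latin 9)
latin9_byte = EURO.encode("iso8859-15")      # the single byte Latin 9 uses

print(latin1_ok, latin9_ok, latin9_byte)
```

Text without the Euro character (plain ASCII, for example) encodes to the same bytes under both codecs, which is why a user who never types it sees no difference.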

Meanwhile, for a web-based application that accepts input such as names and addresses, names of uploaded files, or forum messages, or any application that has an international user base, even if it is just a European one, it is inadvisable to use an 8-bit character set. It is pretty much guaranteed that someone will enter something (such as a Slavic name, place name or file name) that contains accented letters that cannot be represented in the character set being used, and problems will arise. UTF-8 solves that problem, because it stores ASCII characters in a single byte, but can store any Unicode character in up to four bytes. A Western user won't notice the difference between UTF-8 and their usual 8-bit character set for plain ASCII text, because the first 128 characters are identical. UTF-8, then, should be the default, regardless of whether someone is based in Europe, Australia or the Americas, if it is anticipated that text data will all be entered in Western languages.
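
The Slavic-name case can be sketched in a few lines of Python. The name used here is only a sample chosen for illustration, and the codecs are Python's own, not MySQL's.

```python
# Illustration only: a sample name containing U+0159 (r with caron),
# a letter that does not exist in Latin 1.
name = "Dvo\u0159\u00e1k"

try:
    name.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# UTF-8 can represent the name; the two accented letters take two bytes each.
utf8_name = name.encode("utf-8")

# For plain ASCII text, UTF-8 and Latin 1 produce identical bytes.
ascii_text = "plain ASCII"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("latin-1")

print(fits_latin1, len(utf8_name), same_bytes)
```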

Meanwhile, if one expects to store mainly non-Western text in the database, then it is wisest to store it in UTF-16, because it will be more compact. Characters that take three bytes in UTF-8 (such as most commonly used Chinese characters) take just two bytes in UTF-16.
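
The byte counts involved can be measured directly. The sketch below is an illustration using Python's `utf-8` and `utf-16-le` codecs (the latter chosen so that no byte order mark is counted); the sample strings are arbitrary.

```python
# Illustration only: compare encoded sizes of a few arbitrary samples.
samples = {
    "ASCII": "hello",
    "Chinese (Basic Multilingual Plane)": "\u4e2d\u6587",
    "Outside the BMP": "\U00020000",  # a CJK Extension B ideograph
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for label, text in samples.items():
    print(label, encoded_sizes(text))
```

The trade-off runs both ways: text that is mostly ASCII doubles in size under UTF-16, while text made up of characters from the Basic Multilingual Plane above U+07FF shrinks from three bytes per character to two.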

Incidentally, since the Windows operating system prefers to store characters (including filename characters) as two-byte entities, and Java and C# both use two-byte characters internally, it could be more efficient to store text as UTF-16, rather than converting between encodings when reading and writing. 

How to repeat:
Run the Configuration Wizard until you see the Character Set screen.
[28 Mar 2006 16:06] Valeriy Kravchuk
Thank you for a reasonable feature request.
[8 Oct 2008 12:46] Alexander Barkov
See also:
Bug#37738 - Latin9 for MySQL
[6 Mar 2009 13:53] Hajo Skwirblies
Is there any chance that this will be implemented in the future?

Target Version is (now) 6.x but I can't find it in the Reference Manual (http://dev.mysql.com/doc/refman/6.0/en/charset-charsets.html).
[9 Mar 2009 8:45] Alexander Barkov
Hello Hajo,
latin9 will most likely be added in 6.1.
[9 Mar 2009 11:16] Hajo Skwirblies
Thank you very much!
[18 Mar 2014 7:22] Daniël van Eeden
As explained on http://dev.mysql.com/doc/refman/5.6/en/charset-we-sets.html
MySQL's latin1 is CP1252, not ISO-8859-1 (which is known as latin1).

MySQL's latin1 (CP1252) is euro-compatible (€ is at 0x80)
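
The difference at byte 0x80 can be seen by decoding it both ways. This sketch uses Python's `cp1252` and `latin-1` codecs as an illustration of the two encodings; it does not query MySQL.

```python
# Illustration only: the same byte decoded under the two encodings.
raw = b"\x80"

as_cp1252 = raw.decode("cp1252")       # Windows-1252: the Euro sign
as_iso_8859_1 = raw.decode("latin-1")  # ISO-8859-1: a C1 control character

print(repr(as_cp1252), repr(as_iso_8859_1))
```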