MySQL Bugs: #18576: Latin1 character set is obsolete, should use euro-compatible latin9 as default

Bug #18576	Latin1 character set is obsolete, should use euro-compatible latin9 as default
Submitted:	28 Mar 2006 16:00	Modified:	28 Mar 2006 16:06
Reporter:	Bruce Attah
Status:	Verified
Category:	MySQL Server: Charsets	Severity:	S4 (Feature request)
Version:		OS:	Any
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
It is very odd that Latin 1 is the default Western character set in the configuration wizard, and UTF-8 is the default "international" one. I think it would be good if you changed the defaults to UTF-8 for Western data and UTF-16 for the "international" setup in the next release of this tool.

Here's why:

For a modern database application, if it is expected that only a Western character set will be used, then Latin 9 (ISO-8859-15) should be the default, as Latin 1 (ISO-8859-1) is officially obsolete. The most visible difference between Latin 1 and Latin 9 is that Latin 9 includes the Euro character and Latin 1 does not. Someone based in the US or Australia who never enters the Euro character in their data will not notice any difference whether they use Latin 1 or Latin 9, but if they ever do enter the Euro character while Latin 1 is the character set, they could have problems.
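To illustrate the Euro problem outside MySQL, here is a minimal sketch using Python's standard codecs. The helper name `can_encode` is made up for this example, and Python's `iso-8859-1` and `iso-8859-15` codecs are assumed to stand in for the character sets discussed above.

```python
# The euro sign, written as an escape so the file encoding does not matter.
EURO = "\u20ac"

def can_encode(text: str, charset: str) -> bool:
    """Return True if every character in text exists in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True

# Latin 9 (ISO-8859-15) has a code point for the euro sign.
latin9_bytes = EURO.encode("iso-8859-15")

# Latin 1 (ISO-8859-1) has none, so encoding fails.
latin1_ok = can_encode(EURO, "iso-8859-1")

print(latin9_bytes, latin1_ok)
```

In ISO-8859-15 the euro sign takes the position that ISO-8859-1 gives to the generic currency sign.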

Meanwhile, for a web-based application that accepts input such as names and addresses, names of uploaded files, or forum messages, or any application that has an international user base, even if it is just a European one, it is inadvisable to use an 8-bit character set, as it is pretty much guaranteed that someone will enter something (such as a Slavic name, place name or file name) that contains accented letters that cannot be represented in the character set being used, and problems will arise. UTF-8 solves that problem, because it stores each ASCII character in eight bits, but can store any Unicode character using up to four bytes. A Western user won't notice the difference if they're using UTF-8 or their usual 8-bit character set, because the first 128 characters are identical. UTF-8, then, should be the default, regardless of whether someone is based in Europe, Australia or the Americas, if it is anticipated that text data will all be entered in Western languages.
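A short sketch of both points, again with Python's standard codecs rather than MySQL itself. The sample names and the helper `fits` are hypothetical and chosen only for illustration.

```python
def fits(text: str, charset: str) -> bool:
    """Return True if text can be represented in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True

ascii_name = "Smith"
slavic_name = "Dvo\u0159\u00e1k"  # "Dvořák": r with caron is not in Latin 1

# Pure ASCII text produces identical bytes in UTF-8 and in Latin 1.
same_bytes = ascii_name.encode("utf-8") == ascii_name.encode("iso-8859-1")

# The Slavic name cannot be stored in Latin 1 ...
slavic_in_latin1 = fits(slavic_name, "iso-8859-1")

# ... but UTF-8 stores it, spending two bytes on each accented letter.
slavic_utf8 = slavic_name.encode("utf-8")

print(same_bytes, slavic_in_latin1, len(slavic_utf8))
```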

Meanwhile, if one expects to store mainly non-Western text in the database, then it is wisest to store it in UTF-16, because it will be more compact. Characters that are stored in three bytes in UTF-8 (such as most of the Chinese character set) are just two bytes in UTF-16.
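The size trade-off can be measured directly. This is a rough sketch with hypothetical sample strings; it uses the `utf-16-le` codec so that no byte order mark is counted, and it says nothing about how MySQL lays out rows on disk.

```python
def sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

samples = {
    "english": "hello",              # five ASCII letters
    "chinese": "\u4e2d\u6587",       # two common Chinese characters
    "supplementary": "\U00020000",   # one character outside the 16-bit range
}

# Map each label to its (utf8_bytes, utf16_bytes) pair.
report = {label: sizes(text) for label, text in samples.items()}

for label, (utf8_len, utf16_len) in report.items():
    print(f"{label}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

ASCII text doubles in size under UTF-16, while characters outside the 16-bit range take four bytes in both encodings, so the saving depends on the actual mix of data.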

Incidentally, since the Windows operating system prefers to store characters (including filename characters) as two-byte entities, and Java and C# both use two-byte characters internally, it could be more efficient to store text as UTF-16, rather than converting between encodings when reading and writing. 

How to repeat:
Run the Configuration Wizard until you see the Character Set screen.

Thank you for a reasonable feature request.

See also:
Bug#37738 - Latin9 for MySQL

Is there any chance that this will be implemented in the future?

Target Version is (now) 6.x but I can't find it in the Reference Manual (http://dev.mysql.com/doc/refman/6.0/en/charset-charsets.html).

Hello Hajo,
latin9 will most likely be added in 6.1.

Thank you very much!

As explained on http://dev.mysql.com/doc/refman/5.6/en/charset-we-sets.html
MySQL's latin1 is CP1252, not ISO-8859-1 (which is known as latin1).

MySQL's latin1 (CP1252) is euro-compatible (€ is at 0x80).
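The 0x80 position can be checked with Python's `cp1252` codec. This is only a sketch: it assumes the Python codec is a fair stand-in for MySQL's latin1, and it does not query a MySQL server.

```python
# The euro sign, written as an escape so the file encoding does not matter.
EURO = "\u20ac"

# In CP1252 the euro sign is the single byte 0x80.
cp1252_euro = EURO.encode("cp1252")

# The same byte read as CP1252 gives the euro sign back ...
as_cp1252 = b"\x80".decode("cp1252")

# ... but read as true ISO-8859-1 it is the control character U+0080.
as_iso = b"\x80".decode("iso-8859-1")

print(cp1252_euro, as_cp1252 == EURO, hex(ord(as_iso)))
```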