Bug #6653 Choice of Default Character Set
Submitted: 16 Nov 2004 3:44 Modified: 25 Nov 2004 5:37
Reporter: B Jones Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.1.7 OS:Linux (Linux)
Assigned to: CPU Architecture:Any

[16 Nov 2004 3:44] B Jones
Description:
From MySQL 4.1.2 onwards the defaul character set of latin1 was changed to 
latin1_swedish_ci.  Understand MySQL is a Swedish company and this will be a bonus to Swedish users, but surely keeping a default suitable for International users would have been more sensible?  This change is costing us time and money and we really can't see the point.

How to repeat:
Install MySQL 4.1.2 or later 

Suggested fix:
Choose an International character set as the default.
[16 Nov 2004 11:32] Alexander Keremidarski
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.mysql.com/documentation/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Additional info:

latin1_swedish_ci is *not* a Character set. It is Collation.

In 4.1 every Character set can be used with various Collations. It is important to understand the  difference between both.

In 4.0 there was single "object" called Character set which in fact consists of both Character set and Collation. In other words every Character set in 4.0 was bound to a sinle Collation. 

The default in 4.0 was latin1 with Swedish collation so there is no change in this respect.

There is no such thing as "International collation" simply because most alphabets are used with several languages each of hem with it's own rules. Even with ut8 character set which can reperesent various alphabeths different collations must be used to satisfy different languages needs.

Both default_character_set and default_collation are runtime paramaters for mysqld. There is no problem to change them in my.cnf
[17 Nov 2004 0:07] Stephen Thorne
When doing queries with tables created after the upgrade, we received error 1267.

#1267 - Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='

The solution we were forced to implement was to write a script which would go and update every char, varchar and text field in every table in every database to have a standard character set (or collation, if you prefer).
[20 Nov 2004 7:17] Andrej Pirman
You say that before 4.1 it was all "deafult", charset and collation.

Now, I upgraded from 4.0 to 4.1.17 and I get:
  #1267 - Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '=' 

And I did not change none of parameters, I run it default now and I did run it on default charset before.
I have databases with millinos of records, and they are all created by few hundreds different users. So I cannot go to each user and tell them to change their scripts to preserve special local characters. I simply need to preserve database records as-they-are, without any charset translations.

So, upgrading to 4.1 with default parameters needs changing default collation, and it *does* lead to MySQL ERROR 1267, which means, that this *is* a bug.
As I understand it.
[20 Nov 2004 9:46] Sergei Golubchik
what is the query that results in this error ?
[21 Nov 2004 23:19] B Jones
We still consider it a bug too, despite the reclassification downwards.  MySQL labs should not be changing default settings if that means breaking a lot of existing installations, and really we can't see the point of this change.  We gather this has affected a lot of people.  We still think this is a bad idea.

Our system guys fixed the problem for us by recompiling MySQL removing the Swedish setttings.  There's a related Bug #5980 it fixed too.  Basically they took the Swedish defaults out of 4.1.  I'll get them to post a description of exactly how they did this.
[24 Nov 2004 5:56] B Jones
Sergei wrote:
  > what is the query that results in this error ?

Anything mixing strings with different collations.  Reportedly this is another manifestation:

SELECT IFNULL(NULL, _utf8'string')

Since we recompiled MySQL we have got rid of both instances of this bug.
[24 Nov 2004 8:41] Sergei Golubchik
"Anything mixing strings with different collations." is not the query that worked before the upgrade, as in 4.0 you cannot "mix strings with different collations".
Also, "mixing strings with different collations" won't be fixed by recompiling the server e.g.

  SELECT CONCAT(_ascii "a", _utf8 "b");

So, what is the query that worked in 4.0 but doesn't work in 4.1 ?
[24 Nov 2004 22:54] B Jones
> "Anything mixing strings with different collations." is not the query that
> worked before the upgrade, as in 4.0 you cannot "mix strings with different
> collations".
It was version 4.1.2 which changed the default collation to Swedish, 
at which point different collations became an issue.

> Also, "mixing strings with different collations" won't be fixed 
> by recompiling the server
But when our system guys recompiled as described, 
it did fix this and the Swedish NULL bug (#5980).

In case your other customers would like this fix, 
here's the description of the how-to from systems:

"They have to recompile MySQL from the source code.
Here are the parameters that I passed to the configure script
--prefix=/usr/local/mysql-4.1.7 --with-mysqld-user=mysql --with-charset=utf8 --with-collation=utf8_general_ci --with-extra-charsets=none --with-csv-storage-engine --with-isam --enable-thread-safe-client --enable-local-infile --enable-assembler --disable-shared --with-client-ldflags=-all-static --with-mysqld-ldflags=-all-static

Otherwise the defaults were charset latin1 collation latin1_swedish
Install no other charsets to avoid the NULL bug.
These will work with the 4.1 series and should work with series 5.0.
The remaining parameters are installation defaults.

We have a ready-to-run binary in Redhat RPM format containing this fix.
If you'd like this in RPM e-mail josh@innurve.com"

Also see http://bugs.mysql.com/bug.php?id=5980&error=no
for a description of the related Swedish NULL bug.
[24 Nov 2004 23:55] Paul DuBois
The default collation has always been swedish. That's
not what changed.  What changed is that now metadata
is represented in utf8.  This affects things like database
and table names, and a few functions.  That's what caused
the problems you're seeing. See:

http://dev.mysql.com/doc/mysql/en/Charset-metadata.html

(I'm not pointing this out as a solution to any problems,
only as a clarification of the source of the difficulties.)
[25 Nov 2004 5:37] Alexander Barkov
Dear B Jones,

We made several fixes in 4.1.8 to make it not
so strict in mixing collations and character sets.
SELECT IFNULL(NULL, _utf8'string') now works fine, among
many other things.

What has to be finished yet is UNION problem described in
"#5980 [Com]: NULL requires a characterset in a union".

http://bugs.mysql.com/bug.php?id=6653

We hope to fix it soon.
[26 Nov 2004 2:20] B Jones
Thanks.  That is good news.  Collations made the port to 4.1 very painful.   Making these less stern will make the transition easier for those that follow.  Meanwhile the above fix can hold those affected over until then.

cheers
bj
[16 Dec 2004 14:47] Nobu Toge
I am wondering how 4.1.8 is coming along. I would be happy to beta-test it if the developers feel worthwhile, and if I am allowed. I am heavily using ujis coded character string data in combination with english-latin everywhere in my installation.