Bug #19292 libmysql default-charset cannot be changed
Submitted: 24 Apr 2006 8:15 Modified: 26 Jul 2006 14:47
Reporter: [ name withheld ]
Status: Verified
Category: MySQL Server: Charsets Severity: S4 (Feature request)
Version: <=5.1 OS: Linux (Linux)
Assigned to: Assigned Account CPU Architecture: Any
Triage: Triaged: D5 (Feature request)

[24 Apr 2006 8:15] [ name withheld ]
Description:
The latest MySQL manual states:

"When a client connects to a MySQL server, the server indicates to the client what the server's default character set is. The client switches to this character set for this connection."

But this does not work with any of the PHP extensions based on libmysql (ext/mysql, ext/mysqli and ext/pdo_mysql). They use a plain mysql_connect(), which forces

character_set_client
character_set_results
character_set_connection

to be set to the value MySQL was compiled with (./configure --with-charset=X).

It is not possible to change that by configuration; you have to use "SET NAMES..." if you need a different value. That's OK for new applications, and yes, new applications can use ext/mysqli with its charset functions, but it's a big problem for legacy applications. A MySQL admin often cannot, or may not, change applications. This has not caused many problems so far, because almost everybody still uses MySQL compiled with latin1, but now the Gentoo Linux distribution has switched its stable MySQL package to "./configure --with-charset=utf8" (dev-db/mysql-4.1.14-r1). This breaks almost every international PHP/MySQL application, because the PHP extensions now use a libmysql which forces

character_set_client = utf8
character_set_results = utf8
character_set_connection  = utf8

To solve this, the Gentoo developers patched all the PHP extensions to read the my.cnf defaults on every HTTP request/connection:

mysql_options(H->server, MYSQL_READ_DEFAULT_GROUP, option_section);

http://svn.gnqs.org/projects/gentoo-php-overlay/browser/patches/php-patches/5.1.3/5.1.3/ph...

I don't think this is a good idea. But without the patch it only works if you patch every application with a "SET NAMES latin1" statement.

Now I'd like to know where this issue should be solved. Using utf8 is a good thing IMO, but it's very painful if you have a lot of applications to migrate/patch.

I see the following options:

1. Applications based on libmysql should read the my.cnf defaults (such as default-character-set in the [client] section), so the charset can be configured there -> Upstream PHP extensions need to be patched as described above

2. libmysql should behave as stated in the manual: "When a client connects to a MySQL server, the server indicates to the client what the server's default character set is. The client switches to this character set for this connection."
This way you could configure it using the [mysqld] section

3. Add a "unicode" USE flag to the mysql ebuild, so the admin can decide at installation time whether the mysql packages (e.g. libmysql) should default to utf8 or latin1. But you always have to recompile MySQL and PHP if you want to change the default charset for the PHP extensions.

4. Switch back to latin1, as the MySQL AB binaries and other distributions do. But this way you only delay the problem, you don't solve it!

5. something else? 

What would you recommend to do?
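
To make the difference between options 1 and 2 concrete, here is a minimal Python sketch of one possible precedence order. It is only a model of the proposal, not libmysql's actual behaviour: the function name resolve_client_charset and the order ([client] option first, then the server's default, then the compiled-in value) are assumptions made for illustration.

```python
import configparser


def resolve_client_charset(compiled_default, my_cnf_text=None, server_default=None):
    """Hypothetical precedence: [client] option, then server default, then compiled-in."""
    if my_cnf_text:
        # my.cnf is INI-like; allow_no_value tolerates bare flags such as "quick".
        parser = configparser.ConfigParser(allow_no_value=True)
        parser.read_string(my_cnf_text)
        configured = parser.get("client", "default-character-set", fallback=None)
        if configured:
            return configured  # option 1: configured in the [client] section
    if server_default:
        return server_default  # option 2: adopt what the server announces
    return compiled_default    # today's behaviour: the value baked into libmysql


sample_cnf = "[client]\ndefault-character-set = latin1\n"
print(resolve_client_charset("utf8", sample_cnf))
print(resolve_client_charset("utf8", None, "latin1"))
print(resolve_client_charset("utf8"))
```

With either option, an admin could keep a utf8-compiled library and still give legacy applications a latin1 connection without touching their code.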

You can find some more information here: 
http://svn.gnqs.org/projects/gentoo-php-overlay/ticket/125

Bug reports / discussion  from Gentoo Linux users:
https://bugs.gentoo.org/show_bug.cgi?id=129761
https://forums.gentoo.org/viewtopic-t-436439.html

How to repeat:
1. compile mysql with "./configure --with-charset=utf8"
2. compile PHP with the created libmysql
3. try to read/write latin1 data to mysql without using SET NAMES or the mysqli charset functions

Remember, the problem is not about new applications, but about legacy applications, which run into huge problems when mysql is compiled with "./configure --with-charset=utf8", and the admin cannot do much about it.
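
The mismatch in step 3 can be illustrated without a server. The sketch below is plain Python and only models the byte-level misinterpretation; the sample string and the helper name interpret are made up for illustration, and a real server may warn, truncate or store garbage instead of rejecting the data.

```python
# Text as a legacy application would send it: encoded in latin1.
legacy_bytes = "Gr\u00fc\u00dfe".encode("latin-1")  # b'Gr\xfc\xdfe'


def interpret(raw, connection_charset):
    """Decode raw bytes the way a given connection charset would read them."""
    try:
        return raw.decode(connection_charset)
    except UnicodeDecodeError:
        return None  # the bytes are not valid in this charset


as_latin1 = interpret(legacy_bytes, "latin-1")  # round-trips cleanly
as_utf8 = interpret(legacy_bytes, "utf-8")      # 0xFC is not valid UTF-8

print(repr(as_latin1))
print(repr(as_utf8))
```

The same bytes are fine when the connection is latin1 and invalid when it is utf8, which is why unchanged applications break once the compiled-in default changes.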

Suggested fix:
change libmysql to use the charset the server suggests (if it does not set the charset on its own)
[21 May 2006 10:36] Valeriy Kravchuk
Thank you for the problem report. All the ideas you proposed sound like feature requests to me. Do you agree with me?

If you do not want to "patch" applications to use mysqli and SET NAMES properly, you can build a separate set of MySQL client libraries with any default character set you want, put it in a separate directory, and make PHP/your applications use it.
[2 Jun 2006 9:29] [ name withheld ]
It's not that I don't "want" to change the PHP applications, but if you are working on servers with 1000s of shared hosts, that's an impossible task, and not every admin is allowed to do that. Don't you think it should be possible to change the character-set of PHP/MySQL extensions by configuration?

Another problem is that security issues like this one:
http://bugs.mysql.com/bug.php?id=8378

are still not solved for the PHP extensions, because they don't read my.cnf, so mysql_real_escape_string() always uses the compiled-in default character set, not what you configure in the [client] section.
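
Why the escaping charset matters can be shown in a few lines of Python. This is a sketch under assumptions: it assumes the linked report concerns escaping that does not match the connection's multi-byte character set, gbk is picked here only as an example of such a charset (it is not named in this thread), and naive_escape is a hypothetical stand-in for escaping done under a single-byte default, not the real mysql_real_escape_string().

```python
def naive_escape(raw: bytes) -> bytes:
    """Byte-wise escaping that ignores multi-byte characters (hypothetical helper)."""
    return raw.replace(b"\\", b"\\\\").replace(b"'", b"\\'")


payload = b"\xbf'"               # 0xBF followed by a single quote
escaped = naive_escape(payload)  # bytes BF 5C 27: a backslash was inserted

# A server reading the connection as gbk pairs 0xBF with the inserted 0x5C
# into one two-byte character, so the quote is no longer escaped.
decoded = escaped.decode("gbk")

print(escaped)
print(len(decoded), decoded.endswith("'"), "\\" in decoded)
```

Escaping is only safe when the library escapes with the same character set the server uses for the connection, which is the point of the comment above.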