Description:
The latest MySQL manual states:
"When a client connects to a MySQL server, the server indicates to the client what the server's default character set is. The client switches to this character set for this connection."
But this does not work with all the PHP-Extensions based on libmysql (ext/mysql, ext/mysqli and ext/pdo_mysql). They use a simple mysql_connect() which forces
character_set_client
character_set_results
character_set_connection
to be set to the value MySQL was compiled with (./configure --with-charset=X).
It is not possible to change this through configuration; you have to issue "SET NAMES..." to override it. That is fine for new applications, which can also use ext/mysqli with its charset functions, but it is a big problem for legacy applications, because a MySQL admin often cannot or may not change them. Until now this has caused few problems, because nearly everybody uses MySQL compiled with latin1. However, the Gentoo Linux distribution has switched its stable MySQL package to "./configure --with-charset=utf8" (dev-db/mysql-4.1.14-r1). This breaks almost every international PHP/MySQL application, because the PHP extensions now use a libmysql which forces
character_set_client = utf8
character_set_results = utf8
character_set_connection = utf8
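To illustrate why legacy data breaks under these settings, here is a minimal sketch in C. It assumes the application sends plain ISO-8859-1 bytes (MySQL's latin1 is really cp1252, which differs in the 0x80-0x9F range) and uses a simplified UTF-8 check. The function names is_valid_utf8 and latin1_to_utf8 are made up for this example and are not part of libmysql.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if buf[0..len) looks like well-formed UTF-8, 0 otherwise.
   Simplified check: lead bytes C0, C1 and F5..FF are rejected, but not
   every overlong or surrogate sequence is detected. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t follow;                      /* continuation bytes expected */
        if (c < 0x80)                    follow = 0;
        else if (c >= 0xC2 && c <= 0xDF) follow = 1;
        else if (c >= 0xE0 && c <= 0xEF) follow = 2;
        else if (c >= 0xF0 && c <= 0xF4) follow = 3;
        else return 0;                      /* not a legal lead byte */
        if (follow > len - i - 1)
            return 0;                       /* sequence is truncated */
        for (size_t k = 1; k <= follow; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;                   /* not a continuation byte */
        i += follow + 1;
    }
    return 1;
}

/* Converts ISO-8859-1 bytes to UTF-8. out must have room for 2 * len
   bytes. Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];               /* ASCII is identical in both */
        } else {
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return n;
}
```

For example, the latin1 string "café" ends in the single byte 0xE9, which is not valid UTF-8 on its own. A legacy application that sends it over a connection declared as utf8 therefore hands the server malformed input, while the same text encoded as UTF-8 would end in the two bytes 0xC3 0xA9.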
To solve this, the Gentoo devs patched/forked all the PHP extensions to read the my.cnf defaults for every HTTP request/connection:
mysql_options(H->server, MYSQL_READ_DEFAULT_GROUP, option_section);
http://svn.gnqs.org/projects/gentoo-php-overlay/browser/patches/php-patches/5.1.3/5.1.3/ph...
I don't think this is a good idea. But without the patch, things only work if you patch every application with a "SET NAMES latin1" statement.
Now I'd like to know where this issue should be solved. Using utf8 is a good thing IMO, but it's very painful if you have a lot of applications to migrate/patch.
I see the following options:
1. Applications based on libmysql should read the my.cnf defaults (like default-character-set in the [client] section), so the charset can be configured there -> Upstream PHP extensions need to be patched as described above.
2. libmysql should behave as stated in the manual: "When a client connects to a MySQL server, the server indicates to the client what the server's default character set is. The client switches to this character set for this connection."
This way you could configure it using the [mysqld] section.
3. Add a "unicode" USE flag to the mysql ebuild, so the admin can decide at installation time whether the mysql packages (e.g. libmysql) should default to utf8 or latin1. But you would always have to recompile MySQL and PHP if you want to change the default charset for the PHP extensions.
4. Switch back to latin1, as the MySQL AB binaries and other distributions do. But this way you only delay the problem, you don't solve it!
5. something else?
What would you recommend?
You can find some more information here:
http://svn.gnqs.org/projects/gentoo-php-overlay/ticket/125
Bug reports / discussion from Gentoo Linux users:
https://bugs.gentoo.org/show_bug.cgi?id=129761
https://forums.gentoo.org/viewtopic-t-436439.html
How to repeat:
1. Compile MySQL with "./configure --with-charset=utf8".
2. Compile PHP against the resulting libmysql.
3. Try to read/write latin1 data from/to MySQL without using SET NAMES or the mysqli charset functions.
Remember, the problem is not about new applications, but about legacy applications, which run into huge problems when MySQL is compiled with "./configure --with-charset=utf8", and the admin cannot do much about it.
Suggested fix:
Change libmysql to use the charset the server suggests (unless the client sets the charset on its own).
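The precedence I have in mind could be sketched roughly like this. This is only an illustration of the decision order, not actual libmysql code. The function pick_connection_charset and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libmysql: decides which charset a
   new connection should use.
     client_option    - charset the application or my.cnf asked for, or NULL
     server_default   - charset the server announced at connect, or NULL
     compiled_default - charset the client library was built with */
const char *pick_connection_charset(const char *client_option,
                                    const char *server_default,
                                    const char *compiled_default)
{
    assert(compiled_default != NULL);

    /* 1. An explicit client-side setting always wins. */
    if (client_option != NULL && client_option[0] != '\0')
        return client_option;

    /* 2. Otherwise follow what the server suggests, as the manual says. */
    if (server_default != NULL && server_default[0] != '\0')
        return server_default;

    /* 3. Only fall back to the compiled-in value as a last resort. */
    return compiled_default;
}
```

With this order, a libmysql built with utf8 that connects to a latin1 server without any client-side setting would end up with latin1, so legacy applications keep working, and the admin could control the default from the [mysqld] section.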