Bug #48182 mysql_error returns some error strings in wrong charset
Submitted: 20 Oct 2009 13:00 Modified: 10 Nov 2009 9:40
Reporter: Carsten Wiedmann Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: C API (client library) Severity:S3 (Non-critical)
Version:5.1.39 OS:Windows
Assigned to: CPU Architecture:Any

[20 Oct 2009 13:00] Carsten Wiedmann
Description:
Hello,

assuming a MYSQL_SET_CHARSET_NAME  = "latin1" in mysql_options, mysql_error() still returns some errors in "utf8".

You can simply test this in a latin1 shell (CP1252) with the MySQL command line client. (see below)

Regards,
Carsten

How to repeat:
1)
| # mysql -u Müller --default-character-set=latin1
| ERROR 1045 (28000): Access denied for user 'Müller'@'localhost' ...

Reading the error message I must assume this command:
| # mysql -u 'Müller--default-character-set=latin1

BTW: If I have a user "Müller", I can connect without problems.

2)
| # mysql -u root --default-character-set=latin1 test
| mysql> SELECT `Müller`;
| ERROR 1054 (42S22): Unknown column 'Müller' in 'field list'

Reading the error message I must assume this SQL Statement:
| SELECT `Müller`;

| mysql> SELEC `Müller`;
| ERROR 1064 (42000): You have an error in your SQL syntax;
| check the manual that corresponds to your MySQL server version
|  for the right syntax to use near 'SELEC `Müller`' at line 1

In this case the error message is returned in the correct charset "latin1".

BTW:
| mysql> SHOW VARIABLES LIKE "character_set_%";
| | character_set_client     | latin1                             |
| | character_set_connection | latin1                             |
| | character_set_database   | latin1                             |
| | character_set_filesystem | binary                             |
| | character_set_results    | latin1                             |
| | character_set_server     | latin1                             |
| | character_set_system     | utf8                               |
[21 Oct 2009 8:21] Susanne Ebrecht
Hello Carsten,

many thanks for writing a bug report. But this is not a bug.

Please also check output of:
$ locale

It should be ISO-8859-15 if you want to use charset_client latin1. Otherwise charset_client should be utf8 (if locale is utf8).

More deeper informations in German about that topic you will find in my blog here:
http://www.miracee.org/2009/03/30/encoding-handling-mysql-cli-5858398/

On German Windows it is codepage 850.

This should work on unix/linux when you are using utf8 as locale:
$ mysql -u Müller --default-character-set=utf8

and on Windows:
mysql -u Müller --default-character-set=cp850

If this don't work then your user Müller is stored wrong in the database.

Please let you give output from:
SELECT user, length(user) from mysql.user where user like '%ller';

If length for Müller is not 7 then Müller is stored wrong. The column mysql.user is using utf8 and so you should get 2 bytes for the ü.
[21 Oct 2009 9:16] Carsten Wiedmann
You are wrong. Or maybe you don't understand the problem:

> Please also check output of:
> $ locale
On Unix:
LANG=de_DE.ISO8859-15

> On German Windows it is codepage 850.
On Windows my console codepage is 1252.

--> On both OS exactly what MySQL ist using: "latin1".

> Please let you give output from:
> SELECT user, length(user) from mysql.user where user like '%ller';
> 
> If length for Müller is not 7 then Müller is stored wrong.

mysql> SELECT user, length(user) from mysql.user where user like '%ller';
+--------+--------------+
| user   | length(user) |
+--------+--------------+
| Müller |            7 |
+--------+--------------+

> The column mysql.user is using utf8

Correct.. And now how MySQL is working:
With a --default-character-set=latin1, MySQL is translating my client charset latin1 to the (in this case) column collation utf8_binary, and returned values are translated back to latin1.
And that's the reason I have have no problems connecting to the server with the user "Müller", IF there is an user "Müller" in MySQL.

Of course connecting to the server or using German umlauts is not my problem and only an example, so you can easily verify the problem:
If there is no user "Müller" (normal for a test system), the error message is:
| ERROR 1045 (28000): Access denied for user 'Müller'@'localhost'

You can see, the server is sending back the error message in utf8, even it should translate results to "latin1". (character_set_results = latin1). Thus, the error messsage must be:
| ERROR 1045 (28000): Access denied for user 'Müller'@'localhost'.

And as you can see in my other examples, this behavior is not consistent. Sometimes the error message is in utf8, sometimes in latin1.
--> Giving support to a customer is not easy, if the error message does not show what the customer is using.

Regards,
Carsten
[21 Oct 2009 10:02] Peter Laursen
It is also reproducable using SET NAMES:

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> select `Müller`;
ERROR 1054 (42S22): Unknown column 'Mller' in 'field list'

mysql> select Müller;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use near 'üller
' at line 1

(using client in cmd.exe on Danish Windows) 

I tend to agree with Carsten that in error messages the client character set is not always respected by server. More specifically it seems to happen when a non-existing database object name or non-existing referred database content is included in the error message.
[21 Oct 2009 11:27] Peter Laursen
also: 

peter@linux-o957:/etc> mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.0.51a SUSE MySQL RPM

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> set names latin1;
Query OK, 0 rows affected (0.01 sec)

mysql> select `Müller`;
ERROR 1054 (42S22): Unknown column 'Müller' in 'field list'

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select `Müller`;
ERROR 1054 (42S22): Unknown column 'Müller' in 'field list'
mysql>
[26 Oct 2009 7:29] Susanne Ebrecht
This all works fine on Linux.

Seems it is a Windows problem.

I will analyse it as soon as I have a Windows system.

On a German Windows system it should be default-character-set=cp850 and nothing else.

Or even SET NAMES CP850;

Windows don't know utf8.

Please test this too.
[26 Oct 2009 7:32] Susanne Ebrecht
Tests on Linux:

Gnome Terminal = utf8

$ mysql -u Müller --default-character-set=utf8
mysql> SHOW GRANTS;
Result: GRANT ALL on *.* TO Müller@'localhost'

Gnome Terminal = ISO-8859-15
$ mysql -u Müller --default-character-set=utf8
mysql> SHOW GRANTS;
Result: GRANT ALL on *.* TO Müller@'localhost'

On plane linux console with locale = utf8 all works fine with default-character-set=utf8 and with locale = ISO-8859-15 all works fine with default-character-set=latin1

So when here is a problem, then it is Windows only
[26 Oct 2009 7:34] Susanne Ebrecht
Oops, I made a typo ...

Of course I meant here if Gnome Terminal is ISO-8859-15 then I used:
$ mysql -u Müller --default-character-set=latin1
[26 Oct 2009 8:44] Peter Laursen
@Susanne .. did you notice my example on SuSE11 ?
[27 Oct 2009 15:25] Carsten Wiedmann
@ Peter
She still thinks we have a problem with connecting to the server.

@Susanne
Once more, we know how to connect to the server. Our problem is the error message returned from the server (if any). This error message is sometimes in a wrong encoding.

OK, I've posted this in the category "C API". Maybe you are more familiar in C (as with Windows ;-) ). Please store the following code with your editor (ISO8859 encoding) as "test.c":
| #include <stdio.h>
| #if defined _WIN32
| 	#include <winsock2.h>
| #else
| 	#include <sys/socket.h>
| #endif
| #include <mysql.h>
| 
| int main(void) {
| 	MYSQL *hMySQL;
| 	int iCount;
| 
| 	hMySQL = mysql_init(NULL);
| 	mysql_options(hMySQL, MYSQL_SET_CHARSET_NAME, "latin1");
| 
| 	if (!mysql_real_connect(hMySQL, NULL, "Müller", "foo", NULL, 0, NULL, 0)) {
| 		printf("\nerror string from MySQL (connection set to 'latin1'):\n");
| 		iCount = printf(mysql_error(hMySQL));
| 		printf("\ncharacter count: %d bytes\n\n", iCount);
| 
| 		printf("error string from source code (source stored in editor with 'ISO-8859'):\n");
| 		iCount = printf("Access denied for user 'Müller'@'localhost' (using password: YES)");
| 		printf("\ncharacter count: %d bytes\n", iCount);
| 	}
| 
| 	return 0;
| }

Now compile it with:
# gcc -I/usr/local/include/mysql -L/usr/local/lib/mysql -lmysqlclient test.c -o test
or
# cl.exe -IC:\mysql\include test.cpp /link /LIBPATH:C:\mysql\lib\opt libmysql.lib

execute the program with:
# ./test > test.txt
or
# test.exe > test.txt

Open the file "test.txt" in your editor and post the result here. And of course, please explain the output. Especially the part with "error string from MySQL" and the "character count".

Regards,
Carsten
[10 Nov 2009 9:40] Susanne Ebrecht
Carsten,

now I think I know what you mean.

Our error messages aren't stored in a database. They are stored in a file and they are utf8 hardcoded.

So this is not a bug, this is an expected behaviour.
[10 Nov 2009 9:43] Susanne Ebrecht
Peter,

your SUSE example:

you didn't switch terminal encoding. Your terminal was utf8.

When you want to use "SET NAMES latin1" then you need to set the terminal encoding to iso-8859-15.