Bug #75466 Error message has incorrect encoding in russian lang
Submitted: 9 Jan 2015 11:32
Modified: 27 Jan 2015 19:05
Reporter: Виталий Климин
Status: Not a Bug
Category: MySQL Server
Severity: S3 (Non-critical)
Version: 5.6.21
OS: Linux (Debian 7)
Assigned to:
CPU Architecture: Any

[9 Jan 2015 11:32] Виталий Климин
Description:
var_dump:

object(DB)#5 (19) { ["affected_rows"]=> NULL ["client_info"]=> NULL ["client_version"]=> int(50538) ["connect_errno"]=> int(1045) ["connect_error"]=> string(279) "\0414\043E\0441\0442\0443\043F \0437\0430\043A\0440\044B\0442 \0434\043B\044F \043F\043E\043B\044C\0437\043E\0432\0430\0442\0435\043B\044F '1php'@'109.234.37.157' (\0431\044B\043B \0438\0441\043F\043E\043B\044C\0437\043E\0432\0430\043D \043F\0430\0440\043E\043B\044C: \0414\0410)" ["errno"]=> NULL ["error"]=> NULL ["error_list"]=> NULL ["field_count"]=> NULL ["host_info"]=> NULL ["info"]=> NULL ["insert_id"]=> NULL ["server_info"]=> NULL ["server_version"]=> NULL ["stat"]=> NULL ["sqlstate"]=> NULL ["protocol_version"]=> NULL ["thread_id"]=> NULL ["warning_count"]=> NULL }

Here, the value
["connect_error"]=> string(279) "\0414\043E\0441\0442\0443\043F ...

should instead be
["connect_error"]=> string(279) "\u0414\u043E\u0441\u0442\u0443\u043F ...

that is, Russian text in \uXXXX form.

How to repeat:
Connect from PHP without a password and run var_dump() on mysqli::connect_error.
[13 Jan 2015 17:46] Sveta Smirnova
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php

The default character set for MySQL is latin1. To see Russian letters, you have to set a suitable character set:

$ php -r '$m=mysqli_connect("127.0.0.1","root",""); $m->set_charset("utf8");$m->query("select * from mysql.users;"); var_dump($m);'
object(mysqli)#1 (19) {
  ["affected_rows"]=>
  int(-1)
  ["client_info"]=>
  string(79) "mysqlnd 5.0.11-dev - 20120503 - $Id: bf9ad53b11c9a57efdb1057292d73b928b8c5c77 $"
  ["client_version"]=>
  int(50011)
  ["connect_errno"]=>
  int(0)
  ["connect_error"]=>
  NULL
  ["errno"]=>
  int(1146)
  ["error"]=>
  string(54) "Таблица 'mysql.users' не существует"
  ["error_list"]=>
  array(1) {
    [0]=>
    array(3) {
      ["errno"]=>
      int(1146)
      ["sqlstate"]=>
      string(5) "42S02"
      ["error"]=>
      string(54) "Таблица 'mysql.users' не существует"
    }
  }
  ["field_count"]=>
  int(0)
  ["host_info"]=>
  string(20) "127.0.0.1 via TCP/IP"
  ["info"]=>
  NULL
  ["insert_id"]=>
  int(0)
  ["server_info"]=>
  string(6) "5.6.21"
  ["server_version"]=>
  int(50621)
  ["stat"]=>
  string(134) "Uptime: 1270  Threads: 1  Questions: 1229  Slow queries: 0  Opens: 67  Flush tables: 1  Open tables: 60  Queries per second avg: 0.967"
  ["sqlstate"]=>
  string(5) "00000"
  ["protocol_version"]=>
  int(10)
  ["thread_id"]=>
  int(609)
  ["warning_count"]=>
  int(0)
}
[13 Jan 2015 18:24] Виталий Климин
php -r '$m = new mysqli("89.111.181.xxx","qroot","", "", 23306);'

PHP Warning:  mysqli::mysqli(): (28000/1045): \0414\043E\0441\0442\0443\043F \0437\0430\043A\0440\044B\0442 \0434\043B\044F \043F\043E\043B\044C\0437\043E\0432\0430\0442\0435\043B\044F 'qroot'@'89.111.181.xxx' (\0431\044B\043B \0438\0441\043F\043E\043B\044C\0437\043E\0432\0430\043D \043F\0430\0440\043E\043B\044C: \041D\0415\0422) in Command line code on line 1

Must be
PHP Warning:  mysqli::mysqli(): (28000/1045): \u0414\u043E\u0441\u0442\u0443\u043F \u0437\u0430\u043A\u0440\u044B\u0442 \u0434\u043B\u044F \u043F\u043E\u043B\u044C\u0437\u043E\u0432\u0430\u0442\u0435\u043B\u044F 'qroot'@'89.111.181.xxx' (\u0431\u044B\u043B \u0438\u0441\u043F\u043E\u043B\u044C\u0437\u043E\u0432\u0430\u043D \u043F\u0430\u0440\u043E\u043B\u044C: \u041D\u0415\u0422) in Command line code on line 1
[15 Jan 2015 19:11] Sveta Smirnova
Thank you for the feedback.

So you are actually complaining about the missing 'u' character. Could you please point to a standard that requires it? I cannot find one at unicode.org; rather, I found that this is implementation-dependent.
[16 Jan 2015 4:38] Виталий Климин
OK, if I understand you correctly. In my.cnf I have these relevant settings:

[mysqld]
lc-messages=ru_RU
character-set-server=utf8

To decode an error message such as:

PHP Warning:  mysqli::mysqli(): (28000/1045): \u0414\u043E\u0441\u0442\u0443\u043F \u0437\u0430\u043A\u0440\u044B\u0442 \u0434\u043B\u044F \u043F\u043E\u043B\u044C\u0437\u043E\u0432\u0430\u0442\u0435\u043B\u044F 'qroot'@'89.111.181.xxx' (\u0431\u044B\u043B \u0438\u0441\u043F\u043E\u043B\u044C\u0437\u043E\u0432\u0430\u043D \u043F\u0430\u0440\u043E\u043B\u044C: \u041D\u0415\u0422) in Command line code on line 1

I use the json_decode() function on the string wrapped in escaped double quotes ("...") and get the correct Russian message. Try:

$ php -r 'echo json_decode("\"PHP Warning:  mysqli::mysqli(): (28000/1045): \u0414\u043E\u0441\u0442\u0443\u043F \u0437\u0430\u043A\u0440\u044B\u0442 \u0434\u043B\u044F \u043F\u043E\u043B\u044C\u0437\u043E\u0432\u0430\u0442\u0435\u043B\u044F 'qroot'@'89.111.181.xxx' (\u0431\u044B\u043B \u0438\u0441\u043F\u043E\u043B\u044C\u0437\u043E\u0432\u0430\u043D \u043F\u0430\u0440\u043E\u043B\u044C: \u041D\u0415\u0422) in Command line code on line 1\"");'

Readable result:

PHP Warning:  mysqli::mysqli(): (28000/1045): Доступ закрыт для пользователя qroot@89.111.181.xxx (был использован пароль: НЕТ) in Command line code on line 1

Do you think I am using an incorrect method to decode the error message?
[16 Jan 2015 12:37] Sveta Smirnova
Thank you for the feedback.

You are using a correct method to decode the message; MySQL simply never promises to print error messages in an encoding that json_decode will understand. You can see, for example, at http://www.utf8icons.com/character/1079/cyrillic-small-letter-ze that the \xxxx style is used in CSS, and you can see even more styles for other languages at http://www.fileformat.info/info/unicode/char/0437/index.htm. I simply don't find any standard, saying that each and every language should use \uxxxx style.
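For messages that have already been captured in this form, a minimal PHP sketch of a decoder follows. It assumes that each backslash followed by four hexadecimal digits stands for one Unicode code point, which matches the sample output in this report but is not a documented guarantee. The function name decode_hex_escapes is hypothetical, and mb_chr() needs PHP 7.2 or later with the mbstring extension.

```php
<?php
// Hypothetical helper: turn "\0414\043E"-style escapes into UTF-8 text.
// Assumption: every "\XXXX" (four hex digits) is one Unicode code point.
function decode_hex_escapes(string $s): string
{
    $decoded = preg_replace_callback(
        '/\\\\([0-9A-Fa-f]{4})/',            // a literal backslash plus 4 hex digits
        function (array $m): string {
            $ch = mb_chr(hexdec($m[1]), 'UTF-8');
            return $ch === false ? $m[0] : $ch;  // leave invalid code points untouched
        },
        $s
    );
    return $decoded ?? $s;                   // fall back to the input on a regex error
}

// Single quotes keep the backslashes literal in PHP source.
echo decode_hex_escapes('\0414\043E\0441\0442\0443\043F'), "\n"; // prints "Доступ"
```

This only helps when reading logs after the fact. A user or host name that happens to contain a backslash followed by four hex digits would be decoded as well.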

Anyway, to get correct Russian text in PHP you need not only to specify lc-messages=ru_RU and character-set-server=utf8 under the [mysqld] section of your my.cnf, but also to set the client encoding with the function mysqli_set_charset($link, 'utf8'). See also http://php.net/manual/en/mysqlinfo.concepts.charset.php
[16 Jan 2015 13:12] Виталий Климин
I do not fully understand your answer.

You said: "I simply don't find any standard, saying that each and every language should use \uxxxx style."

Well, so MySQL is not required to use the \uxxxx encoding style.
Then what encoding does MySQL use? I don't understand what the \xxxx encoding is. How do I decode it in PHP?

Might it be better to use the \uxxxx style and json_decode()?
[16 Jan 2015 16:32] Sveta Smirnova
Thank you for the feedback.

In your environment the MySQL server uses the UTF8 encoding and the client uses the default encoding, latin1. To see Russian characters you don't need json_decode or any similar function; you need to specify the proper client encoding in your PHP program with the function mysqli_set_charset($link, 'utf8'). This is not a MySQL bug.
[16 Jan 2015 17:44] Виталий Климин
Sorry, but that is misleading.
You cannot call mysqli::set_charset() until the mysqli object has been initialized.
Initializing the object causes the error before the correct encoding can be set!

$ php -r '$m = new mysqli("89.111.181.xxx","qroot","", "", 23306); $m->set_charset("UTF8");'

PHP Warning:  mysqli::mysqli(): (28000/1045): \0414\043E\0441\0442\0443\043F \0437\0430\043A\0440\044B\0442 \0434\043B\044F \043F\043E\043B\044C\0437\043E\0432\0430\0442\0435\043B\044F 'qroot'@'cp.worldwide-ad-network.bz' (\0431\044B\043B \0438\0441\043F\043E\043B\044C\0437\043E\0432\0430\043D \043F\0430\0440\043E\043B\044C: \041D\0415\0422) in Command line code on line 1
PHP Warning:  mysqli::set_charset(): Couldn't fetch mysqli in Command line code on line 1

So it is impossible to set the correct encoding if initializing the mysqli object fails.
Is this a PHP problem?
[16 Jan 2015 18:39] Sveta Smirnova
Thank you for the feedback.

Yes, this is a PHP issue:

$ /usr/local/mysql/bin/mysql -h127.0.0.1 -uroots mysql
ERROR 1044 (42000): Для пользователя ''@'localhost' доступ к базе данных 'mysql' закрыт

Actually, PHP should read the [client] section of my.cnf (default-character-set), but it does not in my case.

Please report it at bugs.php.net
[27 Jan 2015 18:46] Sveta Smirnova
Actually, this is a PHP documentation bug: you can use mysqli_options to set the character set before connecting.

$ php -r '$m=mysqli_init(); $m->options(MYSQLI_SET_CHARSET_NAME, "utf8"); $m->connect("127.0.0.1","root","f");'

Warning: mysqli::connect(): (HY000/1045): Доступ закрыт для пользователя 'root'@'localhost' (был использован пароль: ДА) in Command line code on line 1
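In script form, the same approach might look like the sketch below. It reuses the placeholder host and credentials from the one-liner above, calls real_connect() (the documented companion of mysqli_init()) in place of connect(), and turns off mysqli exceptions so that the failure can be read from mysqli_connect_error(). The function name connect_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: request utf8 before the handshake, so that even
// connection errors arrive in a character set that can hold Russian text.
function connect_utf8(string $host, string $user, string $password): ?mysqli
{
    mysqli_report(MYSQLI_REPORT_OFF);   // report failures through return values
    $m = mysqli_init();
    // Must happen before connecting; set_charset() only works on a live link.
    $m->options(MYSQLI_SET_CHARSET_NAME, 'utf8');
    if (!@$m->real_connect($host, $user, $password)) {
        // With lc-messages=ru_RU on the server, this text should be readable.
        echo mysqli_connect_errno(), ': ', mysqli_connect_error(), "\n";
        return null;
    }
    return $m;
}

// Placeholder host and credentials, as in the example above.
$link = connect_utf8('127.0.0.1', 'root', 'f');
```

Returning null on failure keeps the caller from invoking methods on an unusable object, which is what produced the "Couldn't fetch mysqli" warning earlier in this thread.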
[27 Jan 2015 18:57] Виталий Климин
Yes! You are right! It work!
[27 Jan 2015 19:04] Sveta Smirnova
https://bugs.php.net/bug.php?id=68923
[27 Jan 2015 19:05] Виталий Климин
It workS!