Bug #17151 Importing tar dumps fails to properly render cp1251 symbols in utf8
Submitted: 6 Feb 2006 12:08 Modified: 9 Feb 2006 17:32
Reporter: Plamen Gradinarov Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.1.14 OS:Linux (Linux)
Assigned to: CPU Architecture:Any

[6 Feb 2006 12:08] Plamen Gradinarov
Description:
Importing archived dumps - both via cpbackups (cPanel) or manually, after exporting via phpMyAdmin - fails to correctly render cp1251 symbols in utf8. There are several instances of Capital letter И turning into small letter п, ш turning into ѿ and another two or three examples I do not recall now.

Restoring accounts through cPanel makes the utf8-based cyrillic sites worthless, the same happens when tar.gz files are imported directly from phpMyAdmin.

When the database.sql is extracted from the tarball and imported with no compression, the reprted sympols are adequately rendered.

How to repeat:
Repeats on all 4.1.xx versions of MySQL and 4.xx and 5.xx PHP, as well with all sites using utf8 charset for their Cyrillic pages.
[6 Feb 2006 12:14] Plamen Gradinarov
reported symbols
[6 Feb 2006 19:04] Plamen Gradinarov
"When the database.sql is extracted from the tarball and imported with no
compression, the reported symbols are adequately rendered."

This turned out to be a false positive. There are still malformations in the Cyrillic code.
[7 Feb 2006 10:56] Valeriy Kravchuk
Thank you for a problem report. I've changed category to a more appropriate one. 

Please, send the

SHOW VARIABLES LIKE 'char%';
SHOW VARIABLES LIKE 'collat%';

results from your phpMyAdmin or any other environment you use to restore the dump.

Can you, please, create and upload the smallest dump that demonstrates the problem you described?
[7 Feb 2006 11:38] Plamen Gradinarov
Server variables and settings:
Variable   Session value   Global value
character set client  utf8  cp1251  
character set connection  cp1251  cp1251  
character set database  cp1251  cp1251  
character set results  utf8  cp1251  
character set server  cp1251  cp1251  
character set system  utf8  utf8  
character sets dir  /usr/share/mysql/charsets/  /usr/share/mysql/charsets/  
collation connection  cp1251_general_ci  cp1251_general_ci  
collation database  cp1251_general_ci  cp1251_general_ci  
collation server  cp1251_general_ci  cp1251_general_ci

* * *
Example with all utf8 db and collation right from the start - you type И and ш and they turn right away into what you see
http://india.mudrost.org
[7 Feb 2006 16:23] Plamen Gradinarov
The above date were from the phpMyAdmin Variables info

Now from the command line

mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | cp1251                     |
| character_set_results    | latin1                     |
| character_set_server     | cp1251                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
7 rows in set (0.00 sec)
[7 Feb 2006 16:25] Plamen Gradinarov
Here are the collations from the command line

mysql> SHOW VARIABLES LIKE 'collat%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | cp1251_general_ci |
| collation_server     | cp1251_general_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
[7 Feb 2006 16:31] Plamen Gradinarov
This is in my.cnf as charset and collation wise

init-connect="SET NAMES cp1251"
default-character-set=cp1251
default-collation=cp1251_general_ci
[7 Feb 2006 16:52] MySQL Verification Team
Can you check whether cp1251 charset is properly set on charset_connection in the session where dump is occuring, both before and after dumping the file.

Also, confirm that all your tables and char-derived columns have the above charset defined.
[7 Feb 2006 17:18] Plamen Gradinarov
Seems it is not set properly because when I now used this

mysqldump --default-character-set=cp1251 my_database1 > my_database.sql

then back to another db
mysql my_database2 < my_database.sql 

and connected the original site to the new db, all characters were there, no malformations.
[7 Feb 2006 21:25] Plamen Gradinarov
unified the charset and the collation

mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | cp1251                     |
| character_set_connection | cp1251                     |
| character_set_database   | cp1251                     |
| character_set_results    | cp1251                     |
| character_set_server     | cp1251                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
7 rows in set (0.00 sec)

mysql> SHOW VARIABLES LIKE 'collat%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | cp1251_general_ci |
| collation_database   | cp1251_general_ci |
| collation_server     | cp1251_general_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)

The problem with the two letters persists.
[7 Feb 2006 22:34] Plamen Gradinarov
Last feedback

Added this directive

[mysqldump]
default-character-set=cp1251

Now the export and import of dump via console is seamless. Exporting and importing by phpMyAdmin is still getting errors.

I will wait for the automatic cpbackups to take place in 24 hours and will report back.
[8 Feb 2006 9:01] Valeriy Kravchuk
So, it looks like a problem of proper settings in phpMyAdmin and, possibly, other frontends, but not a bug in MySQL server. Please, inform about the results with CPanel backups as soon as you get them.
[9 Feb 2006 4:05] Nick
We are experincing an identical problem.  We tried a fresh import from the backup file and it did not work either.
[9 Feb 2006 17:14] Plamen Gradinarov
Just restored via WHM today's cpbackup of an utf8-based Cyrillic site. No errors so far found. The check-letters are there in their proper form.

http://www.novini.org/taxonomy/term/8
Можно проверить буквы "И" и "ш"

MySQL has nothing to do with this problem. It is a matter to be tweaked in my.cnf.

Exporting and importing the dump through phpMyAdmin still reproduce the errors.

Nick, my working my.cnf has been posted here
http://www.mysql.ru/webboard/index.html?n1=7492&n2=1&page=1

Thank you all for helping me sort out this mess.