Bug #16291 mysqldump corrupts string-constants with non-ascii-chars
Submitted: 8 Jan 2006 23:41 Modified: 11 Jul 2007 19:52
Reporter: Roland Volkmann Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S2 (Serious)
Version:5.0.19-BK, 5.0.18-nt OS:Linux (Linux, Windows XP Prof.)
Assigned to: Alexander Nozdrin CPU Architecture:Any
Tags: rt_q1_2007

[8 Jan 2006 23:41] Roland Volkmann
Description:
If you have views or stored procedures in an innodb-database where string-constants are used containing non-ascii-chars (e.g. german umlauts), these non-ascii-chars are replaced by '?' (question marks) in the dump generated with "mysqldump --default-character-set=latin1". If you reload such a dump, views are corrupted, and stored procedures are not recreated.

How to repeat:
mysql> create database ttt;
Query OK, 1 row affected (0.00 sec)

mysql> use ttt;
Database changed
mysql> set names cp850;   <--- this is only because of windows cmd line
Query OK, 0 rows affected (0.00 sec)

/* With a GUI-Tool using latin1, the above command isn't neccessary */

mysql> create view v_test as select 'äöü';
Query OK, 0 rows affected (0.01 sec)
mysql> select * from v_test;
+-----+
| äöü |
+-----+
| äöü |
+-----+
1 row in set (0.00 sec)  /* everything is ok */

mysql> create procedure p_test() select 'äöü';
Query OK, 0 rows affected (0.00 sec)

mysql> call p_test();
+-----+
| äöü |
+-----+
| äöü |
+-----+
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)  /* everything is ok */

>>> Now make a dump (in original it's only one command-line):

mysqldump.exe --default-character-set=latin1 --databases --routines --single-transaction --opt --user=root --password=secret --result-file=ttt.sql

>>> if you look into the dump file "ttt.sql", you will find the umlauts replaced by '???'.

>> if you drop database ttt and load the dump file, the newly created database ttt only has view v_test, which doesn't work any more because of the replaced string. The stored procedure isn't restored at all.

With best regards,

Roland.
[10 Jan 2006 10:22] Valeriy Kravchuk
Thank you for a problem report. Please, send the results of 

show variables like 'collat%';

and 

show variables like 'char%';

statements from mysql client just before create view and create procedure statements. Just to be sure...
[10 Jan 2006 10:53] Roland Volkmann
Here are the requested infos from mysql client:

mysql> create database ttt;
Query OK, 1 row affected (0.05 sec)

mysql> use ttt;
Database changed
mysql> set names cp850;
Query OK, 0 rows affected (0.06 sec)

mysql> show variables like 'collat%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | cp850_general_ci  |
| collation_database   | latin1_german2_ci |
| collation_server     | latin1_german2_ci |
+----------------------+-------------------+
3 rows in set (0.03 sec)

mysql> show variables like 'char%';
+--------------------------+----------------------------------------------------
-+
| Variable_name            | Value
 |
+--------------------------+----------------------------------------------------
-+
| character_set_client     | cp850
 |
| character_set_connection | cp850
 |
| character_set_database   | latin1
 |
| character_set_results    | cp850
 |
| character_set_server     | latin1
 |
| character_set_system     | utf8
 |
| character_sets_dir       | D:\Programme\MySQL\MySQL Server 5.0\share\charsets\
 |
+--------------------------+----------------------------------------------------
-+
7 rows in set (0.01 sec)

mysql>

And here the results of the same queries from EMS SQL Manager:

Variable_name                                      Value                              
---------------------------------------------------------                
collation_connection                               latin1_swedish_ci                                  
collation_database                                 latin1_german2_ci                                  
collation_server                                   latin1_german2_ci              

Variable_name                                      Value                              
---------------------------------------------------------                
character_set_client                               latin1                                              
character_set_connection                           latin1                                              
character_set_database                             latin1                                              
character_set_results                              latin1                                              
character_set_server                               latin1                                              
character_set_system                               utf8                                                
character_sets_dir                                 D:\Programme\MySQL\MySQL Server 5.0\share\charsets\ 

With best regards,

Roland.
[21 Jan 2006 14:48] Valeriy Kravchuk
Verified just as described with 5.0.19-BK on Linux:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3 to server version: 5.0.19
 
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
 
mysql> create database ttt;
Query OK, 1 row affected (0.00 sec)
 
mysql> use ttt;
Database changed
mysql> set names cp850;
Query OK, 0 rows affected (0.07 sec)
 
mysql> create view v_test as select 'äöü';
Query OK, 0 rows affected (0.04 sec)
 
mysql> select * from v_test;
+--------+
| äöü |
+--------+
| äöü |
+--------+
1 row in set (0.04 sec)
 
mysql> create procedure p_test() select 'äöü';
Query OK, 0 rows affected (0.00 sec)
 
mysql> call p_test();
+--------+
| äöü |
+--------+
| äöü |
+--------+
1 row in set (0.00 sec)
 
Query OK, 0 rows affected (0.03 sec)
 
mysql> exit
Bye
[openxs@Fedora 5.0]$ bin/mysqldump --default-character-set=latin1 --databases ttt --routines --single-transaction --opt --user=root --result-file=ttt.sql
[openxs@Fedora 5.0]$ cat ttt.sql
-- MySQL dump 10.10
--
-- Host: localhost    Database: ttt
-- ------------------------------------------------------
-- Server version       5.0.19
 
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES latin1 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
 
--
-- Current Database: `ttt`
--
 
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `ttt` /*!40100 DEFAULT CHARACTER SET latin1 */;
 
USE `ttt`;
 
--
-- Table structure for table `v_test`
--
 
DROP TABLE IF EXISTS `v_test`;
/*!50001 DROP VIEW IF EXISTS `v_test`*/;
/*!50001 DROP TABLE IF EXISTS `v_test`*/;
/*!50001 CREATE TABLE `v_test` (
  `?�?�??` varchar(6)
) */;
 
--
-- Dumping routines for database 'ttt'
--
DELIMITER ;;
/*!50003 DROP PROCEDURE IF EXISTS `p_test` */;;
/*!50003 SET SESSION SQL_MODE=""*/;;
/*!50003 CREATE PROCEDURE `p_test`()
select '���' */;;
/*!50003 SET SESSION SQL_MODE=@OLD_SQL_MODE*/;;
DELIMITER ;
 
--
-- Current Database: `ttt`
--
 
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `ttt` /*!40100 DEFAULT CHARACTER SET latin1 */;
 
USE `ttt`;
 
--
-- View structure for view `v_test`
--
 
/*!50001 DROP TABLE IF EXISTS `v_test`*/;
/*!50001 DROP VIEW IF EXISTS `v_test`*/;
/*!50001 CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `v_test` AS select _cp850'���' AS `?�?�??`*/;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;

So, it really looks like a bug or something that should be clearly documented.
[21 Jan 2006 18:12] Valeriy Kravchuk
Sorry, my fault. 

What we got is a result of charsets that were not properly set. Just skip that --default-character-set and check the results yourself. Or set it properly, according to data you are trying to dump. 

It is not a bug, but intended behaviour.
[23 Jan 2006 20:45] Sergei Golubchik
It's a bug.
[12 Apr 2006 14:54] Sergei Golubchik
test case:
create view v_test as select 'äöü';
create procedure p_test() select 'äöü';
show create view v_test;
show create procedure p_test;
result:
CREATE ALGORITHM=UNDEFINED DEFINER=`serg`@`localhost` SQL SECURITY DEFINER VIEW `v_test` AS select _latin1'???' AS `äöü`
CREATE DEFINER=`serg`@`localhost` PROCEDURE `p_test`() select '???'
[10 Jul 2006 6:29] Greg Lehey
This bug was entered against backup.  As serg has shown, this is not the case.
Changing to 'character sets'.
[11 Jul 2006 4:28] Greg Lehey
Checking the code: SHOW CREATE VIEW calls mysqld_show_create() (sql_show.cc), while SHOW CREATE PROCEDURE calls sp_head::show_create_procedure() (sp_head.cc).

mysqld_show_create() handles the information directly and uses the correct character set, latin1.  ::show_create_procedure() calls Protocol::send_fields() (protocol.cc), which thinks that the character set is UTF.  Investigation continues.
[11 Aug 2006 13:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/10311

ChangeSet@1.2236, 2006-08-11 15:29:04+02:00, andrey@example.com +1 -0
  Fix for bug#16291: mysqldump corrupts string-constants with non-ascii-chars
  The fix is the part related to stored procedures. There is no test case but one can use this one.
  set names latin1;
  CREATE PROCEDURE `p_test_latin1`() select "öäü";
  --switch the konsole (not mysql) to utf8 to view correctly utf8
  --then switch the connection to utf8
  set names utf8;
  show create procedure p_test_latin1;
[11 Aug 2006 13:34] Andrey Hristov
set names latin1;
CREATE PROCEDURE `p_test_latin1`() select "öäü";
set names utf8;
SHOW CREATE PROCEDURE `p_test_latin1`;
[1 Nov 2006 11:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/14663

ChangeSet@1.2299, 2006-11-01 14:36:16+03:00, anozdrin@alik. +6 -0
  This is a preliminary parch for fixing CREATE VIEW - part problem
  of BUG#16291: mysqldump corrupts string-constants with non-ascii-chars
[11 Dec 2006 19:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/16793

ChangeSet@1.2318, 2006-12-11 22:51:35+03:00, anozdrin@booka. +4 -0
  This is a preliminary patch for BUG#16291: mysqldump corrupts
  string-constants with non-ascii-chars.
  
  The fix is:
    - to store view-query in the original character set;
    - to store additional attributes -- original character sets
      (client, connection, server);
    - to use original character sets when parsing the view;
    - to add charset-columns into SHOW CREATE VIEW output;
    - to add charset-columns into I_S.VIEWS pseudo-table;
    - to update mysqldump so that it puts charset-switching statements
      before CREATE VIEW statement. They are required because now
      CREATE VIEW is in the original charset (not the mysqldump-client's
      character set).
[23 Jan 2007 12:57] Alexander Nozdrin
BUG#23622 (mysqldump/mysql do not dump/show triggers with cp1251 chars) marked
as a duplicate of this bug.
[27 Jun 2007 15:11] Konstantin Osipov
See the note for Bug#11986
[28 Jun 2007 23:49] Konstantin Osipov
Queued into 5.1-runtime
[28 Jun 2007 23:54] Konstantin Osipov
Was also reviewed by Alexander Barkov
[28 Jun 2007 23:56] Konstantin Osipov
ChangeSet
  anozdrin/alik@ibm.|ChangeSet|20070628173454|18968    2007/06/28 21:34:54+04:00
anozdrin@ibm. +82 -0
  Patch for the following bugs:
    - BUG#11986: Stored routines and triggers can fail if the code
      has a non-ascii symbol
    - BUG#16291: mysqldump corrupts string-constants with non-ascii-chars
    - BUG#19443: INFORMATION_SCHEMA does not support charsets properly
    - BUG#21249: Character set of SP-var can be ignored
    - BUG#25212: Character set of string constant is ignored (stored routines)
    - BUG#25221: Character set of string constant is ignored (triggers)
  
  There were a few general problems that caused these bugs:
  1. Character set information of the original (definition) query for views,
     triggers, stored routines and events was lost.
  2. mysqldump output query in client character set, which can be
     inappropriate to encode definition-query.
  3. INFORMATION_SCHEMA used strings with mixed encodings to display object
     definition;
  
  1. No query-definition-character set.
  
  In order to compile query into execution code, some extra data (such as
  environment variables or the database character set) is used. The problem
  here was that this context was not preserved. So, on the next load it can
  differ from the original one, thus the result will be different.
  
  The context contains the following data:
    - client character set;
    - connection collation (character set and collation);
    - collation of the owner database;
  
  The fix is to store this context and use it each time we parse (compile)
  and execute the object (stored routine, trigger, ...).
  
  2. Wrong mysqldump-output.
  
  The original query can contain several encodings (by means of character set
  introducers). The problem here was that we tried to convert original query
  to the mysqldump-client character set.
  
  Moreover, we stored queries in different character sets for different
  objects (views, for one, used UTF8, triggers used original character set).
  
  The solution is
    - to store definition queries in the original character set;
    - to change SHOW CREATE statement to output definition query in the
      binary character set (i.e. without any conversion);
    - introduce SHOW CREATE TRIGGER statement;
    - to dump special statements to switch the context to the original one
      before dumping and restore it afterwards.
  
  Note, in order to preserve the database collation at the creation time,
  additional ALTER DATABASE might be used (to temporary switch the database
  collation back to the original value). In this case, ALTER DATABASE
  privilege will be required. This is a backward-incompatible change.
  
  3. INFORMATION_SCHEMA showed non-UTF8 strings
  
  The fix is to generate UTF8-query during the parsing, store it in the object
  and show it in the INFORMATION_SCHEMA.
  
  Basically, the idea is to create a copy of the original query convert it to
  UTF8. Character set introducers are removed and all text literals are
  converted to UTF8.
  
  This UTF8 query is intended to provide user-readable output. It must not be
  used to recreate the object.  Specialized SHOW CREATE statements should be
  used for this.
  
  The reason for this limitation is the following: the original query can
  contain symbols from several character sets (by means of character set
  introducers).
  
  Example:
  
    - original query:
      CREATE VIEW v1 AS SELECT _cp1251 'Hello' AS c1;
  
    - UTF8 query (for INFORMATION_SCHEMA):
      CREATE VIEW v1 AS SELECT 'Hello' AS c1;
[2 Jul 2007 15:03] Bugs System
Pushed into 5.0.46
[2 Jul 2007 15:05] Bugs System
Pushed into 5.1.21-beta
[11 Jul 2007 19:52] Paul DuBois
Noted in 5.1.21 changelog.

(Description applies to Bug#11986, Bug#16291, Bug#19443,
Bug#21249, Bug#25212, Bug#25221)

Several issues were identified for stored programs
containing non-ASCII symbols. These issues involved
conversion errors due to incomplete character set
information when translating these objects to and from
stored format, such as:

* Parsing the original object definition so that it
  can be stored.

* Compiling the stored definition into executable form
  when the object is invoked.

* Retrieval of object definitions from 
  INFORMATION_SCHEMA tables.

* Displaying the object definition in SHOW statements.
  This issue also affected mysqldump, which uses SHOW.

The fix for the problems is to store character set
information from the object creation context so that this
information is available when the object needs to be used
later. The context includes the client character set, the
connection character set and collation, and the collation
of the database with which the object is associated.
As a result of the patch, several tables have new
columns:

* In the mysql database, the proc and event tables now
  have these columns: character_set_client, 
  collation_connection, db_collation, body_utf8.

* In INFORMATION_SCHEMA, the VIEWS table now has these
  columns: CHARACTER_SET_CLIENT, COLLATION_CONNECTION.
  The ROUTINES, TRIGGERS, and EVENT tables now have
  these columns: CHARACTER_SET_CLIENT,
  COLLATION_CONNECTION, DATABASE_COLLATION.

These columns store the session values of the
character_set_client and collation_connection system
variables, and the collation of the database with which
the object is associated. The values are those in effect
at object creation time. (The saved database collation is
not the value of the collation_database system variable,
which applies to the default database; the database that
contains the object is not necessarily the default
database.)

Several SHOW statements now display additional columns
corresponding to the new table columns. These statements
are: SHOW CREATE EVENT, SHOW CREATE FUNCTION, SHOW CREATE
PROCEDURE, SHOW CREATE VIEW, SHOW EVENTS, SHOW FUNCTION
STATUS, SHOW PROCEDURE STATUS, SHOW TRIGGERS.
[9 Aug 2007 13:18] Bugs System
Pushed into 5.1.22-beta
[4 Mar 2011 14:13] MprDave MprDave
Many thanks for posting such great info here. I was just seeking for some info on this topic.