Bug #19443 INFORMATION_SCHEMA does not support charsets properly
Submitted: 30 Apr 2006 12:08 Modified: 5 Jul 2007 13:43
Reporter: Alexander Drozdov Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Information schema Severity:S2 (Serious)
Version:5.0.20/5.0BK/5.1BK OS:Any (All)
Assigned to: Alexander Nozdrin CPU Architecture:Any
Tags: rt_q1_2007

[30 Apr 2006 12:08] Alexander Drozdov
Description:
SHOW TRIGGERS forgets non-latin symbols in column names. mysqldump also doesn't work properly with these triggers.

How to repeat:
mysql> create table Table1 (eng int(1), рус int(1));
Query OK, 0 rows affected (0.06 sec)

Note that I use koi8r character set, and second column name has been written using russian letters.

mysql> show variables like '%char%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | koi8r                            |
| character_set_connection | koi8r                            |
| character_set_database   | utf8                             |
| character_set_filesystem | binary                           |
| character_set_results    | koi8r                            |
| character_set_server     | utf8                             |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/share/mysql/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.00 sec)

mysql> create trigger Table1_1 before insert on Table1 for each row set NEW.eng=
5;
Query OK, 0 rows affected (0.02 sec)

mysql> show triggers;
+----------+--------+--------+---------------+--------+---------+----------+----------------+
| Trigger  | Event  | Table  | Statement     | Timing | Created | sql_mode | Definer        |
+----------+--------+--------+---------------+--------+---------+----------+----------------+
| Table1_1 | INSERT | Table1 | set NEW.eng=5 | BEFORE |         |          | root@localhost |
+----------+--------+--------+---------------+--------+---------+----------+----------------+
1 row in set (0.00 sec)

That is OK.

mysql> drop trigger Table1_1;
Query OK, 0 rows affected (0.00 sec)

mysql> create trigger Table1_1 before insert on Table1 for each row set NEW.рус=5;
Query OK, 0 rows affected (0.00 sec)

mysql> show triggers;
+----------+--------+--------+-----------+--------+---------+----------+----------------+
| Trigger  | Event  | Table  | Statement | Timing | Created | sql_mode | Definer        |
+----------+--------+--------+-----------+--------+---------+----------+----------------+
| Table1_1 | INSERT | Table1 | set NEW.  | BEFORE |         |          | root@localhost |
+----------+--------+--------+-----------+--------+---------+----------+----------------+
1 row in set (0.00 sec)

As you can see, the Statement column is wrong.

If using the last trigger, it works fine (except strange problems with ODBC conection from Access: MySQL server reports an error if I insert a row into the table using Access, but reports no error if I add a row just after creating a trigger via INSERT command in the MySQL console client; in the last case future Access inserts also work).

I set the severity to serious because I cannot use mysqldump utility to backup my databases with russian column names properly.
[2 May 2006 14:54] MySQL Verification Team
Thank you for the bug report.
mysql> show triggers;
+----------+--------+--------+-----------+--------+---------+----------+----------------+
| Trigger  | Event  | Table  | Statement | Timing | Created | sql_mode | Definer        |
+----------+--------+--------+-----------+--------+---------+----------+----------------+
| Table1_1 | INSERT | Table1 | set
NEW.  | BEFORE | NULL    |          | root@localhost |
+----------+--------+--------+-----------+--------+---------+----------+----------------+
1 row in set (0.02 sec)
[29 May 2006 10:12] Alexander Barkov
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

http://lists.mysql.com/commits/6956
[18 Jul 2006 7:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/9260
[23 Jan 2007 12:57] Alexander Nozdrin
BUG#23622 (mysqldump/mysql do not dump/show triggers with cp1251 chars) marked
as a duplicate of this bug.
[27 Jun 2007 17:01] Konstantin Osipov
Patch approved just like Bug#11986
[28 Jun 2007 23:48] Konstantin Osipov
Queued into 5.1-runtime
[28 Jun 2007 23:58] Konstantin Osipov
The parser part was also reviewed by Marc Alff.
[28 Jun 2007 23:58] Konstantin Osipov
ChangeSet
  anozdrin/alik@ibm.|ChangeSet|20070628173454|18968    2007/06/28 21:34:54+04:00
anozdrin@ibm. +82 -0
  Patch for the following bugs:
    - BUG#11986: Stored routines and triggers can fail if the code
      has a non-ascii symbol
    - BUG#16291: mysqldump corrupts string-constants with non-ascii-chars
    - BUG#19443: INFORMATION_SCHEMA does not support charsets properly
    - BUG#21249: Character set of SP-var can be ignored
    - BUG#25212: Character set of string constant is ignored (stored routines)
    - BUG#25221: Character set of string constant is ignored (triggers)
  
  There were a few general problems that caused these bugs:
  1. Character set information of the original (definition) query for views,
     triggers, stored routines and events was lost.
  2. mysqldump output query in client character set, which can be
     inappropriate to encode definition-query.
  3. INFORMATION_SCHEMA used strings with mixed encodings to display object
     definition;
  
  1. No query-definition-character set.
  
  In order to compile query into execution code, some extra data (such as
  environment variables or the database character set) is used. The problem
  here was that this context was not preserved. So, on the next load it can
  differ from the original one, thus the result will be different.
  
  The context contains the following data:
    - client character set;
    - connection collation (character set and collation);
    - collation of the owner database;
  
  The fix is to store this context and use it each time we parse (compile)
  and execute the object (stored routine, trigger, ...).
  
  2. Wrong mysqldump-output.
  
  The original query can contain several encodings (by means of character set
  introducers). The problem here was that we tried to convert original query
  to the mysqldump-client character set.
  
  Moreover, we stored queries in different character sets for different
  objects (views, for one, used UTF8, triggers used original character set).
  
  The solution is
    - to store definition queries in the original character set;
    - to change SHOW CREATE statement to output definition query in the
      binary character set (i.e. without any conversion);
    - introduce SHOW CREATE TRIGGER statement;
    - to dump special statements to switch the context to the original one
      before dumping and restore it afterwards.
  
  Note, in order to preserve the database collation at the creation time,
  additional ALTER DATABASE might be used (to temporary switch the database
  collation back to the original value). In this case, ALTER DATABASE
  privilege will be required. This is a backward-incompatible change.
  
  3. INFORMATION_SCHEMA showed non-UTF8 strings
  
  The fix is to generate UTF8-query during the parsing, store it in the object
  and show it in the INFORMATION_SCHEMA.
  
  Basically, the idea is to create a copy of the original query convert it to
  UTF8. Character set introducers are removed and all text literals are
  converted to UTF8.
  
  This UTF8 query is intended to provide user-readable output. It must not be
  used to recreate the object.  Specialized SHOW CREATE statements should be
  used for this.
  
  The reason for this limitation is the following: the original query can
  contain symbols from several character sets (by means of character set
  introducers).
  
  Example:
  
    - original query:
      CREATE VIEW v1 AS SELECT _cp1251 'Hello' AS c1;
  
    - UTF8 query (for INFORMATION_SCHEMA):
      CREATE VIEW v1 AS SELECT 'Hello' AS c1;
[2 Jul 2007 15:06] Bugs System
Pushed into 5.1.21-beta
[5 Jul 2007 13:43] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented bugfix in 5.1.21 changelog.
[11 Jul 2007 19:52] Paul DuBois
Noted in 5.1.21 changelog.

(Description applies to Bug#11986, Bug#16291, Bug#19443,
Bug#21249, Bug#25212, Bug#25221)

Several issues were identified for stored programs
containing non-ASCII symbols. These issues involved
conversion errors due to incomplete character set
information when translating these objects to and from
stored format, such as:

* Parsing the original object definition so that it
  can be stored.

* Compiling the stored definition into executable form
  when the object is invoked.

* Retrieval of object definitions from 
  INFORMATION_SCHEMA tables.

* Displaying the object definition in SHOW statements.
  This issue also affected mysqldump, which uses SHOW.

The fix for the problems is to store character set
information from the object creation context so that this
information is available when the object needs to be used
later. The context includes the client character set, the
connection character set and collation, and the collation
of the database with which the object is associated.
As a result of the patch, several tables have new
columns:

* In the mysql database, the proc and event tables now
  have these columns: character_set_client, 
  collation_connection, db_collation, body_utf8.

* In INFORMATION_SCHEMA, the VIEWS table now has these
  columns: CHARACTER_SET_CLIENT, COLLATION_CONNECTION.
  The ROUTINES, TRIGGERS, and EVENT tables now have
  these columns: CHARACTER_SET_CLIENT,
  COLLATION_CONNECTION, DATABASE_COLLATION.

These columns store the session values of the
character_set_client and collation_connection system
variables, and the collation of the database with which
the object is associated. The values are those in effect
at object creation time. (The saved database collation is
not the value of the collation_database system variable,
which applies to the default database; the database that
contains the object is not necessarily the default
database.)

Several SHOW statements now display additional columns
corresponding to the new table columns. These statements
are: SHOW CREATE EVENT, SHOW CREATE FUNCTION, SHOW CREATE
PROCEDURE, SHOW CREATE VIEW, SHOW EVENTS, SHOW FUNCTION
STATUS, SHOW PROCEDURE STATUS, SHOW TRIGGERS.
[11 Apr 2008 19:50] Sveta Smirnova
Bug #35998 was marked as duplicate of this one.