MySQL Bugs: #11986: Stored routines and triggers can fail if the code has a non-ascii symbol

Bug #11986	Stored routines and triggers can fail if the code has a non-ascii symbol
Submitted:	17 Jul 2005 3:11	Modified:	11 Jul 2007 19:52
Reporter:	Sergey Petrunya	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Charsets	Severity:	S3 (Non-critical)
Version:	5.0-bk	OS:	Linux (Linux Suse 9.3)
Assigned to:	Alexander Nozdrin	CPU Architecture:	Any
Tags:	rt_q1_2007

Description:
SHOW CREATE PROCEDURE fails if
  * the procedure body has multibyte UTF-8 symbols.
  * the procedure is not in the cache.

How to repeat:
Run mysql-test-run on the following .test file, and see int to fail on the last line.
I'll attach the test a file in case copy-paste destroys UTF8 byte sequences.

connect (conn1,localhost,root,,);
connect (conn2,localhost,root,,);

connection conn1;
use test;
delimiter |;
set character set utf8|
create procedure testproc()
  lÃ¤bel1: begin end|
call testproc()|
set character set default|
show create procedure testproc|

delimiter ;|
set character set default;
connection conn2;

show create procedure testproc;

.test file

Attachment: sp_charset_failure.test (application/octet-stream, text), 352 bytes.

TEST                            RESULT
-------------------------------------------------------
sp_charset_failure             [ fail ]

Errors are (from /home/miguel/dbs/mysql-5.0/mysql-test/var/log/mysqltest-time) :
/home/miguel/dbs/mysql-5.0/client/.libs/mysqltest: At line 18: query 'show create procedure testproc' failed: 1064:

Here the test fails but when I modified the utf8 data to be some cyrillic then everything worked fine (2 different connections).

conn1:

mysql> create procedure testproc2() лейbel1: begin end|
Query OK, 0 rows affected (0.01 sec)

conn2:

mysql> show create procedure testproc2\G
*************************** 1. row ***************************
       Procedure: testproc2
        sql_mode:
Create Procedure: CREATE PROCEDURE `test`.`testproc2`()
лейbel1: begin end
1 row in set (0.00 sec)

this happens because we just stuff the procedure body in a LONGBLOB, and then try to parse it in db_load_routine() using whatever is the current connection character set when loading it. we need to store the character set of the body in the mysql.proc table, and use that character set when re-parsing the body in db_load_routine().

it looks to me like this doesn't just impact labels, it also appears to mess with string literals in the procedure body.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/10194

ChangeSet@1.2235, 2006-08-09 12:52:30+02:00, andrey@lmy004. +3 -0
  Fix for bug#11986 (SHOW CREATE PROCEDURE fails if procedure has UTF8 symbols and is not in cache)
  
  For the documentation:
  An SP which contains non-latin1 characters when used from another connection could be unusable.
  During load, for example CALL statement (or SHOW CREATE, or ALTER), the server will report a syntax error.
  Possible is also an error about mysql.proc being corrupted. The users does not need to update
  their mysql.proc, and this errors will disappear with this patch.
  DROP procedure is not affected by these problems as it actually does not load the SP from disk if it isn't
  cached.
  
  
  Solved with:
  Preserve the current character set of the client, of the results and the connection.
  Set them temporarily for the the parsing phase to utf8, as before storage into mysql.proc the body
  is converted to utf8. If we don't do this preservation, the current user charset is used, which could
  be different than utf8 and in this case the SP renders unusable. Also we can get an error
  that mysql.proc is corrupted.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28717

ChangeSet@1.2550, 2007-06-14 10:47:17+04:00, anozdrin@ibm. +9 -0
  The second cleanup patch in scope of BUG#11986.
  
  1. Introduce parse_sql() as a high-level replacement for MYSQLparse().
  parse_sql() is responsible to switch and restore "parser context"
  (THD::m_lip for now).
  
  2. Fix typo in sp.cc: THD::spcont should be reset *before* calling
  the parser.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28719

ChangeSet@1.2551, 2007-06-14 11:40:31+04:00, anozdrin@ibm. +4 -0
  This is the 3-rd part of patch for BUG#11986:
  remove redundant "body" from Event_parse_data (use sp_head::m_body).

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28763

ChangeSet@1.2552, 2007-06-14 18:32:08+04:00, anozdrin@ibm. +9 -0
  This the 4-th patch in scope of CS patch (BUG#11986).
  
  The patch contains the following changes:
    - Introduce auxilary functions to convenient work with character sets:
      - resolve_charset();
      - resolve_collation();
      - get_default_db_collation();
    - Introduce set_lex_string();
    - Refactor Table_trigger_list::process_triggers() &
      sp_head::execute_trigger() to be consistent with other code;
    - Move reusable code from add_table_for_trigger() into
      build_trn_path() and load_table_name_for_trigger() to be used
      in the following patch.
    - Rename triggers_file_ext and trigname_file_ext into TRN_EXT and
      TRG_EXT respectively.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28764

ChangeSet@1.2552, 2007-06-14 18:35:59+04:00, anozdrin@ibm. +9 -0
  The second cleanup patch in scope of BUG#11986.
  
  1. Introduce parse_sql() as a high-level replacement for MYSQLparse().
  parse_sql() is responsible to switch and restore "parser context"
  (THD::m_lip for now).
  
  2. Fix typo in sp.cc: THD::spcont should be reset *before* calling
  the parser.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28765

ChangeSet@1.2553, 2007-06-14 18:49:17+04:00, anozdrin@ibm. +4 -0
  This is the 3-rd part of patch for BUG#11986:
  remove redundant "body" from Event_parse_data (use sp_head::m_body).

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28771

ChangeSet@1.2554, 2007-06-14 19:23:55+04:00, anozdrin@ibm. +9 -0
  This the 4-th patch in scope of CS patch (BUG#11986).
  
  The patch contains the following changes:
    - Introduce auxilary functions to convenient work with character sets:
      - resolve_charset();
      - resolve_collation();
      - get_default_db_collation();
    - Introduce lex_string_set();
    - Refactor Table_trigger_list::process_triggers() &
      sp_head::execute_trigger() to be consistent with other code;
    - Move reusable code from add_table_for_trigger() into
      build_trn_path(), check_trn_exists() and load_table_name_for_trigger()
      to be used in the following patch.
    - Rename triggers_file_ext and trigname_file_ext into TRN_EXT and
      TRG_EXT respectively.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28794

ChangeSet@1.2555, 2007-06-14 22:14:52+04:00, anozdrin@ibm. +3 -0
  Part of patch for BUG#11986: make sp_head::m_body_begin pointer
  private and provide a setter for it. The setter will be used to
  construct UTF-query in the following patches.

Pushed into 5.1.20-beta

The patch is, overall, approved.
Marc's review is pending for the parser part.
We shall take time to resolve a lot of follow ups this patch might create.

Queued into 5.1-runtime

The patch is too large, a manual link to the CS:

ChangeSet
anozdrin/alik@ibm.|ChangeSet|20070628173454|18968 2007/06/28 21:34:54+04:00 anozdrin@ibm. +82 -0
Patch for the following bugs:
- BUG#11986: Stored routines and triggers can fail if the code
has a non-ascii symbol
- BUG#16291: mysqldump corrupts string-constants with non-ascii-chars
- BUG#19443: INFORMATION_SCHEMA does not support charsets properly
- BUG#21249: Character set of SP-var can be ignored
- BUG#25212: Character set of string constant is ignored (stored routines)
- BUG#25221: Character set of string constant is ignored (triggers)

There were a few general problems that caused these bugs:
1. Character set information of the original (definition) query for views,
triggers, stored routines and events was lost.
2. mysqldump output query in client character set, which can be
inappropriate to encode definition-query.
3. INFORMATION_SCHEMA used strings with mixed encodings to display object
definition;

1. No query-definition-character set.

In order to compile query into execution code, some extra data (such as
environment variables or the database character set) is used. The problem
here was that this context was not preserved. So, on the next load it can
differ from the original one, thus the result will be different.

The context contains the following data:
- client character set;
- connection collation (character set and collation);
- collation of the owner database;

The fix is to store this context and use it each time we parse (compile)
and execute the object (stored routine, trigger, ...).

2. Wrong mysqldump-output.

The original query can contain several encodings (by means of character set
introducers). The problem here was that we tried to convert original query
to the mysqldump-client character set.

Moreover, we stored queries in different character sets for different
objects (views, for one, used UTF8, triggers used original character set).

The solution is
- to store definition queries in the original character set;
- to change SHOW CREATE statement to output definition query in the
binary character set (i.e. without any conversion);
- introduce SHOW CREATE TRIGGER statement;
- to dump special statements to switch the context to the original one
before dumping and restore it afterwards.

Note, in order to preserve the database collation at the creation time,
additional ALTER DATABASE might be used (to temporary switch the database
collation back to the original value). In this case, ALTER DATABASE
privilege will be required. This is a backward-incompatible change.

3. INFORMATION_SCHEMA showed non-UTF8 strings

The fix is to generate UTF8-query during the parsing, store it in the object
and show it in the INFORMATION_SCHEMA.

Basically, the idea is to create a copy of the original query convert it to
UTF8. Character set introducers are removed and all text literals are
converted to UTF8.

This UTF8 query is intended to provide user-readable output. It must not be
used to recreate the object. Specialized SHOW CREATE statements should be
used for this.

The reason for this limitation is the following: the original query can
contain symbols from several character sets (by means of character set
introducers).

Example:

- original query:
CREATE VIEW v1 AS SELECT _cp1251 'Hello' AS c1;

- UTF8 query (for INFORMATION_SCHEMA):
CREATE VIEW v1 AS SELECT 'Hello' AS c1;

Pushed into 5.1.21-beta

Noted in 5.1.21 changelog.

(Description applies to Bug#11986, Bug#16291, Bug#19443,
Bug#21249, Bug#25212, Bug#25221)

Several issues were identified for stored programs
containing non-ASCII symbols. These issues involved
conversion errors due to incomplete character set
information when translating these objects to and from
stored format, such as:

* Parsing the original object definition so that it
  can be stored.

* Compiling the stored definition into executable form
  when the object is invoked.

* Retrieval of object definitions from 
  INFORMATION_SCHEMA tables.

* Displaying the object definition in SHOW statements.
  This issue also affected mysqldump, which uses SHOW.

The fix for the problems is to store character set
information from the object creation context so that this
information is available when the object needs to be used
later. The context includes the client character set, the
connection character set and collation, and the collation
of the database with which the object is associated.
As a result of the patch, several tables have new
columns:

* In the mysql database, the proc and event tables now
  have these columns: character_set_client, 
  collation_connection, db_collation, body_utf8.

* In INFORMATION_SCHEMA, the VIEWS table now has these
  columns: CHARACTER_SET_CLIENT, COLLATION_CONNECTION.
  The ROUTINES, TRIGGERS, and EVENT tables now have
  these columns: CHARACTER_SET_CLIENT,
  COLLATION_CONNECTION, DATABASE_COLLATION.

These columns store the session values of the
character_set_client and collation_connection system
variables, and the collation of the database with which
the object is associated. The values are those in effect
at object creation time. (The saved database collation is
not the value of the collation_database system variable,
which applies to the default database; the database that
contains the object is not necessarily the default
database.)

Several SHOW statements now display additional columns
corresponding to the new table columns. These statements
are: SHOW CREATE EVENT, SHOW CREATE FUNCTION, SHOW CREATE
PROCEDURE, SHOW CREATE VIEW, SHOW EVENTS, SHOW FUNCTION
STATUS, SHOW PROCEDURE STATUS, SHOW TRIGGERS.

Bug #23318 Server change DDL automatically was marked a duplicate of this bug.

Pushed into 5.1.24-rc

Pushed into 6.0.5-alpha