Bug #25221 | Character set of string constant is ignored | ||
---|---|---|---|
Submitted: | 20 Dec 2006 18:23 | Modified: | 11 Jul 2007 19:52 |
Reporter: | Alexander Nozdrin | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Stored Routines | Severity: | S3 (Non-critical) |
Version: | 5.0 | OS: | Any |
Assigned to: | Alexander Nozdrin | CPU Architecture: | Any |
Tags: | rt_q1_2007 |
[20 Dec 2006 18:23]
Alexander Nozdrin
[28 Jun 2007 23:50]
Konstantin Osipov
Queued into 5.1-runtime
[28 Jun 2007 23:53]
Konstantin Osipov
Actually also was reviewed by Alexander Barkov
[28 Jun 2007 23:53]
Konstantin Osipov
Final changeset: ChangeSet anozdrin/alik@ibm.|ChangeSet|20070628173454|18968 2007/06/28 21:34:54+04:00 anozdrin@ibm. +82 -0 Patch for the following bugs: - BUG#11986: Stored routines and triggers can fail if the code has a non-ascii symbol - BUG#16291: mysqldump corrupts string-constants with non-ascii-chars - BUG#19443: INFORMATION_SCHEMA does not support charsets properly - BUG#21249: Character set of SP-var can be ignored - BUG#25212: Character set of string constant is ignored (stored routines) - BUG#25221: Character set of string constant is ignored (triggers) There were a few general problems that caused these bugs: 1. Character set information of the original (definition) query for views, triggers, stored routines and events was lost. 2. mysqldump output query in client character set, which can be inappropriate to encode definition-query. 3. INFORMATION_SCHEMA used strings with mixed encodings to display object definition; 1. No query-definition-character set. In order to compile query into execution code, some extra data (such as environment variables or the database character set) is used. The problem here was that this context was not preserved. So, on the next load it can differ from the original one, thus the result will be different. The context contains the following data: - client character set; - connection collation (character set and collation); - collation of the owner database; The fix is to store this context and use it each time we parse (compile) and execute the object (stored routine, trigger, ...). 2. Wrong mysqldump-output. The original query can contain several encodings (by means of character set introducers). The problem here was that we tried to convert original query to the mysqldump-client character set. Moreover, we stored queries in different character sets for different objects (views, for one, used UTF8, triggers used original character set). The solution is - to store definition queries in the original character set; - to change SHOW CREATE statement to output definition query in the binary character set (i.e. without any conversion); - introduce SHOW CREATE TRIGGER statement; - to dump special statements to switch the context to the original one before dumping and restore it afterwards. Note, in order to preserve the database collation at the creation time, additional ALTER DATABASE might be used (to temporary switch the database collation back to the original value). In this case, ALTER DATABASE privilege will be required. This is a backward-incompatible change. 3. INFORMATION_SCHEMA showed non-UTF8 strings The fix is to generate UTF8-query during the parsing, store it in the object and show it in the INFORMATION_SCHEMA. Basically, the idea is to create a copy of the original query convert it to UTF8. Character set introducers are removed and all text literals are converted to UTF8. This UTF8 query is intended to provide user-readable output. It must not be used to recreate the object. Specialized SHOW CREATE statements should be used for this. The reason for this limitation is the following: the original query can contain symbols from several character sets (by means of character set introducers). Example: - original query: CREATE VIEW v1 AS SELECT _cp1251 'Hello' AS c1; - UTF8 query (for INFORMATION_SCHEMA): CREATE VIEW v1 AS SELECT 'Hello' AS c1;
[2 Jul 2007 15:05]
Bugs System
Pushed into 5.1.21-beta
[11 Jul 2007 19:52]
Paul DuBois
Noted in 5.1.21 changelog. (Description applies to Bug#11986, Bug#16291, Bug#19443, Bug#21249, Bug#25212, Bug#25221) Several issues were identified for stored programs containing non-ASCII symbols. These issues involved conversion errors due to incomplete character set information when translating these objects to and from stored format, such as: * Parsing the original object definition so that it can be stored. * Compiling the stored definition into executable form when the object is invoked. * Retrieval of object definitions from INFORMATION_SCHEMA tables. * Displaying the object definition in SHOW statements. This issue also affected mysqldump, which uses SHOW. The fix for the problems is to store character set information from the object creation context so that this information is available when the object needs to be used later. The context includes the client character set, the connection character set and collation, and the collation of the database with which the object is associated. As a result of the patch, several tables have new columns: * In the mysql database, the proc and event tables now have these columns: character_set_client, collation_connection, db_collation, body_utf8. * In INFORMATION_SCHEMA, the VIEWS table now has these columns: CHARACTER_SET_CLIENT, COLLATION_CONNECTION. The ROUTINES, TRIGGERS, and EVENT tables now have these columns: CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION. These columns store the session values of the character_set_client and collation_connection system variables, and the collation of the database with which the object is associated. The values are those in effect at object creation time. (The saved database collation is not the value of the collation_database system variable, which applies to the default database; the database that contains the object is not necessarily the default database.) Several SHOW statements now display additional columns corresponding to the new table columns. These statements are: SHOW CREATE EVENT, SHOW CREATE FUNCTION, SHOW CREATE PROCEDURE, SHOW CREATE VIEW, SHOW EVENTS, SHOW FUNCTION STATUS, SHOW PROCEDURE STATUS, SHOW TRIGGERS.