Bug #62306 Diacritics Not Handled Correctly in Stored Proc Parameters
Submitted: 31 Aug 2011 0:52 Modified: 31 Aug 2011 17:25
Reporter: Trevor Conn Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Stored Routines Severity:S2 (Serious)
Version:5.1.56 OS:Any
Assigned to: CPU Architecture:Any

[31 Aug 2011 0:52] Trevor Conn
Description:
I have a repeatable example of diacritics that aren't handled correctly when part of arguments supplied to a stored procedure. However if I run inline SQL using the same diacritics, they work fine. Using stored procedures, the name Atisha Dīpaṃkara is saved to the database as Atisha D?pa?kara. Indeed, I output the value before the statement is run in the stored proc below and the value is already incorrect just by virtue of passing it.

How to repeat:
delimiter $$

CREATE TABLE `name_test` (
  `name_id` int(10) unsigned NOT NULL,
  `name` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
  UNIQUE KEY `name_id_UNIQUE` (`name_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci$$

DROP PROCEDURE IF EXISTS `sp_name_test_insert` $$
CREATE PROCEDURE `sp_name_test_insert` (
  vNameID integer,
  vName varchar(100)
)
BEGIN
  
 /* NOTE: The supplied value for vName is output here and it's already incorrect. */
 select concat('name: ', vName);

 insert into name_test ( name_id, name) values (vNameID, vName);

END $$

delimiter ;

/* Now call this */
call sp_name_test_insert(1, 'Atisha Dīpaṃkara');

/* And then call this */
insert into name_test ( name_id, name) values (2, 'Atisha Dīpaṃkara');

/* Finally, view your results and you'll see the two names are different -- one incorrect, the other correct. */
select * from name_test;

Suggested fix:
I have no idea.
[31 Aug 2011 9:49] Peter Laursen
I cannot reproduce this.  The last SELECT returns

name_id  name               
-------  -------------------
      1  Atisha Dīpaṃkara
      2  Atisha Dīpaṃkara

(using MySQL 5.1.58 and SQLyog as client). 

Peter
(not a MySQL person)
[31 Aug 2011 14:04] Valeriy Kravchuk
I also can not repeat this on current mysql-5.1:

macbook-pro:5.1 openxs$ bin/mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.60 Source distribution

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> delimiter $$
mysql> 
mysql> CREATE TABLE `name_test` (
    ->   `name_id` int(10) unsigned NOT NULL,
    ->   `name` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
    ->   UNIQUE KEY `name_id_UNIQUE` (`name_id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci$$
Query OK, 0 rows affected (0.09 sec)

mysql> 
mysql> DROP PROCEDURE IF EXISTS `sp_name_test_insert` $$
Query OK, 0 rows affected, 1 warning (0.02 sec)

mysql> CREATE PROCEDURE `sp_name_test_insert` (
    ->   vNameID integer,
    ->   vName varchar(100)
    -> )
    -> BEGIN
    ->   
    ->  /* NOTE: The supplied value for vName is output here and it's already incorrect. */
    ->  select concat('name: ', vName);
    -> 
    ->  insert into name_test ( name_id, name) values (vNameID, vName);
    -> 
    -> END $$
Query OK, 0 rows affected (0.02 sec)

mysql> delimiter ;
mysql> call sp_name_test_insert(1, 'Atisha Dīpaṃkara');
+---------------------------+
| concat('name: ', vName)   |
+---------------------------+
| name: Atisha Dīpaṃkara |
+---------------------------+
1 row in set (0.00 sec)

Query OK, 1 row affected (0.00 sec)

mysql> insert into name_test ( name_id, name) values (2, 'Atisha Dīpaṃkara');
Query OK, 1 row affected (0.00 sec)

mysql> 
mysql> select * from name_test;
+---------+---------------------+
| name_id | name                |
+---------+---------------------+
|       1 | Atisha Dīpaṃkara |
|       2 | Atisha Dīpaṃkara |
+---------+---------------------+
2 rows in set (0.00 sec)

mysql> show variables like 'char%';
+--------------------------+---------------------------------------------+
| Variable_name            | Value                                       |
+--------------------------+---------------------------------------------+
| character_set_client     | latin1                                      |
| character_set_connection | latin1                                      |
| character_set_database   | latin1                                      |
| character_set_filesystem | binary                                      |
| character_set_results    | latin1                                      |
| character_set_server     | latin1                                      |
| character_set_system     | utf8                                        |
| character_sets_dir       | /Users/openxs/dbs/5.1/share/mysql/charsets/ |
+--------------------------+---------------------------------------------+
8 rows in set (0.00 sec)

So, I wonder what is the output of 

show variables like 'char%';

in your case and what OS do you use for MySQL server?
[31 Aug 2011 17:13] Trevor Conn
Thanks for the responses. Based on your feedback, I did one thing I hadn't tried and that is use the MySQL command line tool to run my test. Via that route, the results are correct. I guess I should have mentioned that I was using MySQL Workbench 5.2.31 when executing the SQL I provided. However the command line approach appears to be the only one that works and I don't know why that would be. The error is also present when inserting data from PHP.

In any case, probably means it isn't a MySQL bug if the command line approach works.
[31 Aug 2011 17:16] Trevor Conn
To Valeriy:

mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

The hosting provider's OS is some flavor of Linux. The above comes from their machine.
[31 Aug 2011 17:25] Valeriy Kravchuk
OK, and what is the output of:

show variables like 'char%';

from Workbench? (Time to upgrade to 5.2.34 by the way.)
[31 Aug 2011 17:25] Trevor Conn
Closing. Valeriy pointed me down the right track by displaying the charset information. There's something amiss in the charsets as a result of a "set names" call. I reran my test case via MySQL Workbench after executing:

set names 'latin1'

and it worked fine. Via MySQL Workbench, the values had previously been set to 'utf8'. I would assume that utf8 would be more proper, but I guess not.

Thanks for taking a look, folks.
[31 Aug 2011 17:44] Peter Laursen
this is suspicious: 

| character_set_client     | latin1                     |
| character_set_connection | latin1                     |

What happens it you SET NAMES UTF8 as the very first statement? In my understanding you cannot really expect full unicode range properly supported with the above.  But still weird that is works differently from inside and outside a routine.

I also think you should tell what client or connector you use.

(I hope you dont mind that I comment).