Bug #39527 REPLACE() fails to match special characters when used in Stored Function
Submitted: 18 Sep 2008 22:26 Modified: 7 Jan 2015 16:16
Reporter: Daniel Rhoden Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S1 (Critical)
Version:5.1.22, 5.0, 5.1, 6.0 bzr OS:Linux
Assigned to: Paul DuBois CPU Architecture:Any
Tags: replace(), Stored Functions, stored procedures

[18 Sep 2008 22:26] Daniel Rhoden
Description:
Goal is to replace accented characters like, Ä Å Ç, with A A C, and so on.

In a standard SQL statement it is possible to do this:

[code]
SELECT REPLACE('Ä test', 0xC384, 'A') as test
[/code]

And you get:
'A test'

The same test inside a Stored Function fails to match.  It continues to produce:
'Ä test'

How to repeat:
Attempt to define the following script:

[code]
DROP FUNCTION IF EXISTS `strip_accents`;
DELIMITER $$
CREATE FUNCTION `strip_accents`(bubba VARCHAR(240)) RETURNS varchar(240) CHARSET utf8
    DETERMINISTIC
BEGIN
  DECLARE clean_string VARCHAR(240);
  SET clean_string = REPLACE(bubba, 0xC384, 'A');
  SET clean_string = REPLACE(clean_string, 'test', 'BUG');
  RETURN clean_string;
END
$$
[/code]

Now execute the following sql statement:
[code]
SELECT strip_accents('Ä test') as test
[/code]

Suggested fix:
Don't know how or where to fix it.  Sorry.
[18 Sep 2008 22:27] Daniel Rhoden
stored function test sql

Attachment: strip_accents.sql (application/octet-stream, text), 339 bytes.

[19 Sep 2008 5:19] Sveta Smirnova
Thank you for the report.

Verified as described.
[28 Oct 2008 7:58] Sveta Smirnova
Real test case used for verifying this:

$cat src/tests/archive/*39527*
set names utf8;

DROP FUNCTION IF EXISTS `strip_accents`;
DELIMITER $$;
CREATE FUNCTION `strip_accents`(bubba VARCHAR(240)) RETURNS varchar(240) CHARSET utf8
    DETERMINISTIC
BEGIN
  DECLARE clean_string VARCHAR(240);
  SET clean_string = REPLACE(bubba, 0xC384, 'A');
  SET clean_string = REPLACE(clean_string, 'test', 'BUG');
  RETURN clean_string;
END
$$

delimiter ;$$

SELECT strip_accents('Ä test') as test, REPLACE('Ä test', 0xC384, 'A');
[10 Nov 2008 12:02] Susanne Ebrecht
This is indeed a bug in my eyes. Or to say it better: behaviour isn't familiar.

Parameters of functions will be encoded to the database encoding instead of client encoding.

1) Create a latin1 database and use them: 
create database foo charset latin1;
use foo;

2) set names to your environment encoding. i.e. when you terminal is utf8 then use: set names utf8, if it is iso-8859-15 use latin1, ...

3) take the test from above, create the function

Result:
if terminal/client is set to utf8 then test will fail
only if terminal and client is set to latin1 it will pass

1a) Create a utf8 database and use it:
create database bar charset utf8;
use bar;

2a) set names to your environment encoding. i.e. when you terminal is utf8 then use: set names utf8, if it is iso-8859-15 use latin1, ...

3a) take the test from above, create the function

Result:
if terminal/client is set to utf8 then test will pass
if terminal/client is set to another charset test will fail
[10 Nov 2008 13:15] Susanne Ebrecht
Many thanks to bring this topic up.

We discussed this internally with the result that this is neither a bug nor a feature request.

I any case it is a lack of documentation and so I set bug as documentation bug.

Let me help you first; Your function will always work (doesn't matter which client encoding you are using) by coding it so:

CREATE FUNCTION `strip_accents`(bubba VARCHAR(240) CHARACTER SET utf8) RETURNS varchar(240) CHARSET utf8 DETERMINISTIC
BEGIN
  DECLARE clean_string VARCHAR(240);
  SET clean_string = REPLACE(bubba, 0xC384, 'A');
  SET clean_string = REPLACE(clean_string, 'test', 'BUG');
  RETURN clean_string;
END

Now let me explain how our functions according to character set work:

Functions and procedures always belong to databases.

When you create a function or procedure it needs a character set in which it will be stored. For this MySQL is using the database charset.

Of course your environment encoding and your character_set_client, character_set_result sometimes also character_set_connection has to match to your environment. When your Linux terminal is using ISO-8859-15 then these variables should be latin1 if your terminal is utf8 then these variables should be utf8. You can change these variables by using: SET NAMES <charaset name>;

Let's say your database is latin1 and your whole environment is utf8 and of course you made a SET NAMES utf8 first.

Now you create a function and use signs that are different in latin1 and utf8 like German umlauts (ä, ö, ü, ß). MySQL recognise: "ahhh, client is utf8 and database is latin1". Totally automatic and transparent it will convert your code into latin1 and store it.

The same will happen with the parameter of a function. The system recognise: "ahhh, user client is utf8 and database is latin1 which means function will only work, when I convert all to latin1 first". So your parameters will be converted to latin1 before function get executed.

Consider usually when you compare something then users do it by comparing the signs. Like: is variable == 'ä'?. For this comparing both side should have same character set.

When you want a comparing by using a specific character set then you have to tell the system (function/procedure) manually which character set you want.

CREATE FUNCTION `strip_accents`(bubba VARCHAR(240) CHARACTER SET utf8) <--- this means, the function will always convert the value of bubba into utf8. Doesn't matter if character set is utf8 or latin1 or latin2 or whatever.

REPLACE(bubba, 0xC384, 'A'); <--- 0xC384 is utf8, bubba is utf8 because you told it at parameter definition, so utf8 will be compared with utf8 and all works fine.

Summary:
When you want to use another charset in a function then database character set then you have to tell the function which charset you want to use by setting all parameters/variables which should have your charset to your charset manually.
[17 Jan 2011 9:48] Bobby k
Hi Guys,

I am not sure if this is deserving of a separate issue (or it's the same issue on this thread) or if it's just the lack of documentation, forgive me if thats the case!

But, related to replace() with non-characters: Basically we had data imported incorrectly to our magento database, and added a mumble jumble of ascii characters to some of our product descriptions in place of where an apostrophe should be.

So the first thing that made sense was to try to just replace the whole string:

update catalog_product_entity_text  set value = replace(value ,'’','test')

But this did not replace anything at all, as if it did not find the text anywhere. I tried variants of this also, such as escaping the comma's and apostrophes, etc..

Anyways,

When I replaced that same string, but in smaller chunks, like so, it worked just fine:

update catalog_product_entity_text set value = replace(value ,'ÃÆâ€','');
update catalog_product_entity_text set value = replace(value ,'™Ãƒâ€','');
update catalog_product_entity_text set value = replace(value ,'’','');
update catalog_product_entity_text set value = replace(value ,'Ãâ€','');

What's odd to me, is that there was no one defining ascii character that seemed to cause it not to find the string, after playing around with it for a bit...

Thanks
[18 Sep 2014 11:09] Jon Stephens
Reassigning to myself.
[23 Sep 2014 13:00] Paul DuBois
Reclaiming. I have some background discussion with developers to be used for this.
[7 Jan 2015 16:16] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly.

The important takeaway from the preceding discussion is this:

"
Summary:
When you want to use another charset in a function then database character set then you have to tell the function which charset you want to use by setting all parameters/variables which should have your charset to your charset manually.
"

I'll update the CREATE PROCEDURE and database-charset sections to point this out more explicitly.

http://dev.mysql.com/doc/refman/5.7/en/create-procedure.html
http://dev.mysql.com/doc/refman/5.7/en/charset-database.html