Bug #31107 SUBSTR produces wrong result if it used within function with utf8 and with TEXT
Submitted: 20 Sep 2007 12:42 Modified: 7 Nov 2008 16:19
Reporter: Victoria Reznichenko Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.0.46 OS:Any
Assigned to: CPU Architecture:Any

[20 Sep 2007 12:42] Victoria Reznichenko
Description:
If SUBSTR() is used within function with TEXT column and utf8 character set it gives wrong result.

Here is an example of function:

mysql> create function test1(str TEXT) RETURNS TEXT BEGIN DECLARE _rem TEXT CHARACTER SET utf8; DECLARE _pos INT DEFAULT 2; SET _rem = str; SET _pos = 2; SET _rem = SUBSTR( _rem, _pos + 1 ); RETURN _rem; end//
Query OK, 0 rows affected (0.00 sec)

mysql> select test1('A,B,C,X,Y,Z')//
+----------------------+
| test1('A,B,C,X,Y,Z') |
+----------------------+
|       Y,Z            |
+----------------------+
1 row in set (0.00 sec)

expected result is 'B,C,X,Y,Z'

If I use IN parameter in SUBSTR directly, it works fine:

mysql> create function test2(str TEXT) RETURNS TEXT BEGIN RETURN SUBSTR( str, 3 ); end//
Query OK, 0 rows affected (0.00 sec)

mysql> select test2('A,B,C,X,Y,Z')//
+----------------------+
| test2('A,B,C,X,Y,Z') |
+----------------------+
| B,C,X,Y,Z            |
+----------------------+
1 row in set (0.00 sec)

Also if I use VARCHAR(100) instead of TEXT it also works fine:

mysql> create function test3(str VARCHAR(100)) RETURNS VARCHAR(100) BEGIN DECLARE _rem VARCHAR(100); SET _rem = str; SET _rem = SUBSTR( _rem, 3 ); RETURN _rem; end//
Query OK, 0 rows affected (0.00 sec)

mysql> select test3('A,B,C,X,Y,Z')//
+----------------------+
| test3('A,B,C,X,Y,Z') |
+----------------------+
| B,C,X,Y,Z            |
+----------------------+
1 row in set (0.00 sec)

mysql> show variables like "character%";
+--------------------------+-------------------------------------------------------+
| Variable_name            | Value                                                 |
+--------------------------+-------------------------------------------------------+
| character_set_client     | utf8                                                  |
| character_set_connection | utf8                                                  |
| character_set_database   | utf8                                                  |
| character_set_filesystem | binary                                                |
| character_set_results    | utf8                                                  |
| character_set_server     | utf8                                                  |
| character_set_system     | utf8                                                  |
| character_sets_dir       | /data2/users/victoria/mysql5046/share/mysql/charsets/ |
+--------------------------+-------------------------------------------------------+
8 rows in set (0.00 sec)

How to repeat:
DELIMITER //
CREATE FUNCTION test1(str TEXT) RETURNS TEXT BEGIN DECLARE _rem TEXT CHARACTER SET utf8; DECLARE _pos INT DEFAULT 2; SET _rem = str; SET _pos = 2; SET _rem = SUBSTR( _rem, _pos + 1 ); RETURN _rem; end//
DELIMITER ;
select test1('A,B,C,X,Y,Z');
[15 Oct 2007 8:59] Alexander Barkov
Victoria,

I cannot reproduce this problem.

This is the test script:

set names utf8;
select version();
DROP FUNCTION IF EXISTS test1;
DELIMITER //
CREATE FUNCTION test1(str TEXT) RETURNS TEXT
BEGIN
  DECLARE _rem TEXT CHARACTER SET utf8;
  DECLARE _pos INT DEFAULT 2;
  SET _rem = str;
  SET _pos = 2;
  SET _rem = SUBSTR( _rem, _pos + 1 );
  RETURN _rem;
END//
DELIMITER ;
SELECT test1('A,B,C,X,Y,Z');
DROP FUNCTION test1;

This is its output with different versions:

version()
5.0.22
test1('A,B,C,X,Y,Z')
B,C,X,Y,Z

version()
5.0.46-debug
test1('A,B,C,X,Y,Z')
B,C,X,Y,Z

version()
5.0.50-debug
test1('A,B,C,X,Y,Z')
B,C,X,Y,Z
[15 Oct 2007 11:10] MySQL Verification Team
server character set should be utf8 as well:

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select version();
+-----------------------+
| version()             |
+-----------------------+
| 5.0.46-enterprise-gpl |
+-----------------------+
1 row in set (0.00 sec)

mysql> DROP FUNCTION IF EXISTS test1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> DELIMITER //
mysql> CREATE FUNCTION test1(str TEXT) RETURNS TEXT
    -> BEGIN
    ->   DECLARE _rem TEXT CHARACTER SET utf8;
    ->   DECLARE _pos INT DEFAULT 2;
    ->   SET _rem = str;
    ->   SET _pos = 2;
    ->   SET _rem = SUBSTR( _rem, _pos + 1 );
    ->   RETURN _rem;
    -> END//
Query OK, 0 rows affected (0.00 sec)

mysql> DELIMITER ;
mysql> SELECT test1('A,B,C,X,Y,Z');
+----------------------+
| test1('A,B,C,X,Y,Z') |
+----------------------+
|       Y,Z            |
+----------------------+
1 row in set (0.00 sec)

mysql> DROP FUNCTION test1;
Query OK, 0 rows affected (0.00 sec)
[29 Jul 2008 11:01] Dave Marshall
I can confirm the same problem. Putting the result of the substring into a different TEXT variable works.

mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT VERSION();
+------------------------+
| VERSION()              |
+------------------------+
| 5.0.51a-3ubuntu5.1-log | 
+------------------------+
1 row in set (0.05 sec)

mysql> DROP FUNCTION IF EXISTS test1;
Query OK, 0 rows affected (0.00 sec)

mysql> DELIMITER //
mysql> CREATE FUNCTION test1(str TEXT) RETURNS TEXT NO SQL
    -> BEGIN
    ->     DECLARE _rem TEXT CHARACTER SET utf8;
    ->     DECLARE _pos INT DEFAULT 2;
    ->     SET _rem = str;
    ->     SET _pos = 2;
    ->     SET _rem = SUBSTR(_rem, _pos + 1);
    ->     RETURN _rem;
    -> END//
Query OK, 0 rows affected (0.00 sec)

mysql> DELIMITER ;
mysql> SELECT test1('A,B,C,X,Y,Z');
+----------------------+
| test1('A,B,C,X,Y,Z') |
+----------------------+
|       Y,Z            | 
+----------------------+
1 row in set (0.00 sec)

mysql> DROP FUNCTION test1;
Query OK, 0 rows affected (0.00 sec)

mysql> DELIMITER //
mysql> CREATE FUNCTION test1(str TEXT) RETURNS TEXT NO SQL
    -> BEGIN
    ->     DECLARE _rem TEXT CHARACTER SET utf8;
    ->     DECLARE _rem_temp TEXT CHARACTER SET utf8;
    ->     DECLARE _pos INT DEFAULT 2;
    ->     SET _rem = str;
    ->     SET _pos = 2;
    ->     SET _rem_temp = SUBSTR(_rem, _pos + 1);
    ->     RETURN _rem_temp;
    -> END//
Query OK, 0 rows affected (0.00 sec)

mysql> DELIMITER ;
mysql> SELECT test1('A,B,C,X,Y,Z');
+----------------------+
| test1('A,B,C,X,Y,Z') |
+----------------------+
| B,C,X,Y,Z            | 
+----------------------+
1 row in set (0.00 sec)

mysql> DROP FUNCTION test1;
Query OK, 0 rows affected (0.00 sec)

mysql>
[31 Oct 2008 14:21] Sergei Glukhov
The bugs is not repeatable on latest main 5.0,5.1,6.0 trees:
----------
+show variables like "character%";
+Variable_name	Value
+character_set_client	utf8
+character_set_connection	utf8
+character_set_database	utf8
+character_set_filesystem	binary
+character_set_results	utf8
+character_set_server	utf8
+character_set_system	utf8
+character_sets_dir	/home/gluh/MySQL/mysql-6.0/sql/share/charsets/
+CREATE FUNCTION test1(str TEXT) RETURNS TEXT NO SQL
+BEGIN
+DECLARE _rem TEXT CHARACTER SET utf8;
+DECLARE _pos INT DEFAULT 2;
+SET _rem = str;
+SET _pos = 2;
+SET _rem = SUBSTR( _rem, _pos + 1 );
+RETURN _rem;
+END//
+SELECT test1('A,B,C,X,Y,Z');
+test1('A,B,C,X,Y,Z')
+B,C,X,Y,Z
+DROP FUNCTION IF EXISTS test1;
+CREATE FUNCTION test1(str TEXT) RETURNS TEXT NO SQL
+BEGIN
+DECLARE _rem TEXT CHARACTER SET utf8;
+DECLARE _rem_temp TEXT CHARACTER SET utf8;
+DECLARE _pos INT DEFAULT 2;
+SET _rem = str;
+SET _pos = 2;
+SET _rem_temp = SUBSTR(_rem, _pos + 1);
+RETURN _rem_temp;
+END//
+SELECT test1('A,B,C,X,Y,Z');
+test1('A,B,C,X,Y,Z')
+B,C,X,Y,Z
----------
[7 Nov 2008 13:30] MySQL Verification Team
looks like bug #38469 ?  need valgrind to test properly.
[7 Nov 2008 13:46] Susanne Ebrecht
I have the same problem like Alexander Barkov here. Doesn't matter which version. I can't repeat this on my systems. I tested 5.0.45, 5.0.46, 5.0.51a and actual source trees for 5.0, 5.1 and 6.0.

It seems this is OS/distribution related.

I found a machine where this behaviour is repeatable with 5.0.70 and I found a hint that this is already fixed in 5.0.72.

I will test this ...
[7 Nov 2008 16:18] Susanne Ebrecht
I could reproduce this on Red Hat Enterprise Linux by using MySQL 5.0.46.

On same machine I tested with bzr tree version (5.0.74 at the moment) and all worked fine.

This bug got fixed by fix of bug #38469. I will set this bug here as duplicate of the other bug because there you will find more fixing informations.