Bug #27876 SF with cyrillic variable name fails during execution (regression)
Submitted: 17 Apr 2007 8:38 Modified: 4 Jun 2007 18:26
Reporter: Andrey Hristov Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Stored Routines Severity:S3 (Non-critical)
Version:5.1 OS:Any
Assigned to: Marc Alff CPU Architecture:Any
Tags: regression

[17 Apr 2007 8:38] Andrey Hristov
Description:
 This is a regression. The test case used to work with 5.1.17 but doesn't any more with 5.1.18 . The bug is verified by Miguel. Found because PHP test suite hangs.

5.1.17-beta-community-nt-debug
ret=0
ret=0
res=3404392
ret=0
ret=0
ret=0
res=3404560
Done!

5.1.18-beta-nt:
c:\temp>func
ret=0
ret=0
res=3273304
ret=0
ret=0
ret=0
res=0  <-- NULL result
Done! 

How to repeat:
#include <stdio.h>
#include <mysql.h>
int main()
{
	MYSQL conn;
	int r;
	MYSQL_RES *res;
	mysql_init(&conn);
	mysql_real_connect(&conn, "127.0.0.1", "root", "", "test", 3306, NULL, CLIENT_MULTI_RESULTS);
	r = mysql_query(&conn, "set names utf8");
	printf("ret=%d\n", r);
	r = mysql_query(&conn, "select version()");
	printf("ret=%d\n", r);
	res = mysql_store_result(&conn);
	printf("res=%d\n", res);
	r = mysql_query(&conn, "DROP FUNCTION IF EXISTS функцийка");
	printf("ret=%d\n", r);
	r = mysql_query(&conn, "CREATE FUNCTION функцийка( параметър_версия VARCHAR(25)) RETURNS VARCHAR(25) DETERMINISTIC RETURN параметър_версия");
	printf("ret=%d\n", r);
	r = mysql_query(&conn, "SELECT функцийка(VERSION())");
	printf("ret=%d\n", r);
	res = mysql_store_result(&conn);
	printf("res=%d\n", res);
	printf("Done!\n");
}
[17 Apr 2007 8:38] Andrey Hristov
test case

Attachment: func.c (text/plain), 872 bytes.

[24 May 2007 23:47] Marc Alff
ANALYSIS

The encoding of the parameter name, in UTF8, contains a 0x80 0x5f sequence.

At runtime, the execution of the function fails in:
(gdb) where
#0  my_error (nr=1054, MyFlags=0) at my_error.c:74
#1  0x00000000006546aa in find_field_in_tables (thd=0xeb52d8, item=0xeda8e0, first_table=0x0, last_table=0x0, ref=0xedaa48, report_error=REPORT_ALL_ERRORS, check_privileges=true, register_tree_change=true) at sql_base.cc:5046
#2  0x000000000056a188 in Item_field::fix_outer_field (this=0xeda8e0, thd=0xeb52d8, from_field=0x400bf9a8, reference=0xedaa48) at item.cc:3667
#3  0x000000000056a801 in Item_field::fix_fields (this=0xeda8e0, thd=0xeb52d8, reference=0xedaa48) at item.cc:3869
#4  0x0000000000798bf2 in sp_prepare_func_item (thd=0xeb52d8, it_addr=0xedaa48) at sp_head.cc:303
#5  0x0000000000798d50 in sp_eval_expr (thd=0xeb52d8, result_field=0xec8728, expr_item_ptr=0xedaa48) at sp_head.cc:338
#6  0x00000000007a1ea1 in sp_rcontext::set_return_value (this=<value optimized out>, thd=0x0, return_value_item=0xeda8c8) at sp_rcontext.cc:158
#7  0x00000000007989f4 in sp_instr_freturn::exec_core (this=<value optimized out>, thd=0x0, nextp=<value optimized out>) at sp_head.cc:2907
#8  0x0000000000799194 in sp_lex_keeper::reset_lex_and_exec_core (this=0xedaa58, thd=0xeb52d8, nextp=0x400bfd84, open_tables=false, instr=0xedaa18) at sp_head.cc:2452
#9  0x00000000007994d8 in sp_instr_freturn::execute (this=0xedaa18, thd=0xeb52d8, nextp=0x400bfd84) at sp_head.cc:2885
#10 0x000000000079d339 in sp_head::execute (this=0xeda308, thd=0xeb52d8) at sp_head.cc:1097
#11 0x000000000079fd82 in sp_head::execute_function (this=0xeda308, thd=0xeb52d8, argp=<value optimized out>, argcount=<value optimized out>, return_value_fld=0xec8728) at sp_head.cc:1543
#12 0x0000000000580e1a in Item_func_sp::execute_impl (this=0xec76d0, thd=0xeb52d8) at item_func.cc:5385
#13 0x00000000005866bc in Item_func_sp::execute (this=0xec76d0) at item_func.cc:5321
#14 0x000000000058d2b1 in Item_func_sp::val_str (this=0x41e, str=0x0) at item_func.h:1529
#15 0x00000000005632c3 in Item::send (this=0xec76d0, protocol=0xeb5690, buffer=0x400c0510) at item.cc:4907
#16 0x00000000005f9952 in select_send::send_data (this=0xec8690, items=<value optimized out>) at sql_class.cc:1214
#17 0x000000000068267d in JOIN::exec (this=0xee3c68) at sql_select.cc:1590
#18 0x0000000000684e52 in mysql_select (thd=0xeb52d8, rref_pointer_array=0xeb6e58, tables=0x0, wild_num=0, fields=<value optimized out>, conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2149861888, result=0xec8690, unit=0xeb68b0, select_lex=0xeb6c70) at sql_select.cc:2264
#19 0x0000000000685389 in handle_select (thd=0xeb52d8, lex=0xeb6820, result=0xec8690, setup_tables_done_option=0) at sql_select.cc:258
#20 0x0000000000616e8f in execute_sqlcom_select (thd=0xeb52d8, all_tables=0x0) at sql_parse.cc:4431
#21 0x0000000000619f00 in mysql_execute_command (thd=0xeb52d8) at sql_parse.cc:1835
#22 0x000000000061ff03 in mysql_parse (thd=0xeb52d8, inBuf=0xec7438 "SELECT Ñ\204Ñ\203нкÑ\206ийка(VERSION())", length=36, found_semicolon=0x400c2020) at sql_parse.cc:5342
#23 0x0000000000620a8e in dispatch_command (command=COM_QUERY, thd=0xeb52d8, packet=<value optimized out>, packet_length=37) at sql_parse.cc:906
#24 0x00000000006218ef in do_command (thd=0xeb52d8) at sql_parse.cc:668
#25 0x0000000000612e83 in handle_one_connection (arg=<value optimized out>) at sql_connect.cc:1091
#26 0x00002aaaaade6135 in start_thread () from /lib/libpthread.so.0
#27 0x00002aaaab46252e in clone () from /lib/libc.so.6
#28 0x0000000000000000 in ?? ()

It seems the item name at runtime is truncated, just after 0x5F,
which happen to be the '_' character in ascii, but really is the second part
of a multi-byte character here.

(gdb) x /40xb 0xeda8c8
0xeda8c8:	0xd0	0xbf	0xd0	0xb0	0xd1	0x80	0xd0	0xb0
0xeda8d0:	0xd0	0xbc	0xd0	0xb5	0xd1	0x82	0xd1	0x8a
0xeda8d8:	0xd1	0x80	0x5f	0x00	0xa5	0xa5	0xa5	0xa5
0xeda8e0:	0x70	0xd8	0x9a	0x00	0x00	0x00	0x00	0x00
0xeda8e8:	0x00	0x00	0x00	0x00	0xa5	0xa5	0xa5	0xa5
[25 May 2007 0:30] Marc Alff
ANALYSIS (II).

Ok, found the root cause.

When the prototype of skip_rear_comments changed from const uchar *
to const char*, the following code :
  (endp[-1] <= ' ')
broke, and started to wipe out every character in the [0-0x20] *and* [0x80-0xFF]
range, when they are at the end of the body of a stored procedure / function,
and event (in 5.1).

So, all non ascii characters at the end of <name> in "SELECT <name>;" are lost,
when this code appear in the body of a SP/SF/Trigger.

The real solution for this issue is to remove the function
skip_rear_comment entirely, since it's flawed for other reasons,
so the proper fix for this report depends on Bug#25411.
[25 May 2007 17:37] Marc Alff
Note that 5.0.42 is also affected.
[25 May 2007 18:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/27354

ChangeSet@1.2501, 2007-05-25 11:29:53-06:00, malff@weblab.(none) +2 -0
  Bug#27876 (SF with cyrillic variable name fails during execution (regression))
  
  The root cause of this bug is related to the function skip_rear_comments,
  in sql_lex.cc
  
  Recent code changes in skip_rear_comments changed the prototype from
  "const uchar*" to "const char*", which had an unforseen impact on this test:
    (endp[-1] < ' ')
  With unsigned characters, this code filters bytes of value [0x00 - 0x20]
  With *signed* characters, this also filters bytes of value [0x80 - 0xFF].
  
  This caused the regression reported, considering cyrillic characters in the
  parameter name to be whitespace, and truncated.
  Note that the regression is present both in 5.0 and 5.1.
  
  With this fix:
  - [0x80 - 0xFF] bytes are no longer considered whitespace.
  This alone fixes the regression.
  
  In addition, filtering [0x00 - 0x20] was found bogus and abusive,
  so that the code now filters only [0x00, 0x09 (\t), 0x0A (\n) and 0x0D
  (\r)], when looking for whitespace.
  
  Note that this fix is only addressing the regression affecting UTF-8
  in general, but does not address a more fundamental problem with
  skip_rear_comments: parsing a string *backwards*, starting at end[-1],
  is not safe with multi-bytes characters, so that end[-1] can confuse the
  last byte of a multi-byte characters with a characters to filter out.
  
  The only known impact of this remaining issue affects objects that have to
  meet all the conditions below:
  
  - the object is a FUNCTION / PROCEDURE / TRIGGER / EVENT
  - the body consist of only *1* instruction, and does *not* contain a
    BEGIN-END block
  - the instruction ends, lexically, with <ident> <whitespace>* ';'
    For example, "select <ident>;" or "return <ident>;"
  - The last character of <ident> is a multi-byte character
  - the last byte of this character is 0x09, 0x0A, 0x0D, ';' '*' or '/'
  
  In this case, the body of the object will be truncated after parsing,
  and stored in an invalid format.
  
  This last issue has not been fixed in this patch, since the real fix
  will be implemented by Bug 25411 (trigger code truncated), which is caused
  by the very same code.
  The real problem is that the function skip_rear_comments is only a
  work-around, and should be removed entirely: see the proposed patch for
  bug 25411 for details.
[25 May 2007 21:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/27375

ChangeSet@1.2501, 2007-05-25 14:36:01-06:00, malff@weblab.(none) +5 -0
  Bug#27876 (SF with cyrillic variable name fails during execution (regression))
  
  The root cause of this bug is related to the function skip_rear_comments,
  in sql_lex.cc
  
  Recent code changes in skip_rear_comments changed the prototype from
  "const uchar*" to "const char*", which had an unforseen impact on this test:
    (endp[-1] < ' ')
  With unsigned characters, this code filters bytes of value [0x00 - 0x20]
  With *signed* characters, this also filters bytes of value [0x80 - 0xFF].
  
  This caused the regression reported, considering cyrillic characters in the
  parameter name to be whitespace, and truncated.
  Note that the regression is present both in 5.0 and 5.1.
  
  With this fix:
  - [0x80 - 0xFF] bytes are no longer considered whitespace.
  This alone fixes the regression.
  
  In addition, filtering [0x00 - 0x20] was found bogus and abusive,
  so that the code now filters uses my_isspace when looking for whitespace.
  
  Note that this fix is only addressing the regression affecting UTF-8
  in general, but does not address a more fundamental problem with
  skip_rear_comments: parsing a string *backwards*, starting at end[-1],
  is not safe with multi-bytes characters, so that end[-1] can confuse the
  last byte of a multi-byte characters with a characters to filter out.
  
  The only known impact of this remaining issue affects objects that have to
  meet all the conditions below:
  
  - the object is a FUNCTION / PROCEDURE / TRIGGER / EVENT / VIEW
  - the body consist of only *1* instruction, and does *not* contain a
    BEGIN-END block
  - the instruction ends, lexically, with <ident> <whitespace>* ';'?
    For example, "select <ident>;" or "return <ident>;"
  - The last character of <ident> is a multi-byte character
  - the last byte of this character is ';' '*', '/' or whitespace
  
  In this case, the body of the object will be truncated after parsing,
  and stored in an invalid format.
  
  This last issue has not been fixed in this patch, since the real fix
  will be implemented by Bug 25411 (trigger code truncated), which is caused
  by the very same code.
  The real problem is that the function skip_rear_comments is only a
  work-around, and should be removed entirely: see the proposed patch for
  bug 25411 for details.
[25 May 2007 22:24] Marc Alff
Patch approved by Igor, with a minor correction (sent by email)
[1 Jun 2007 19:21] Bugs System
Pushed into 5.0.44
[1 Jun 2007 19:22] Bugs System
Pushed into 5.1.20-beta
[4 Jun 2007 18:26] Paul Dubois
Noted in 5.0.44, 5.1.20 changelogs.

A stored program that uses a variable name containing multibyte
characters could fail to execute.