Bug #27876 | SF with cyrillic variable name fails during execution (regression) | ||
---|---|---|---|
Submitted: | 17 Apr 2007 8:38 | Modified: | 4 Jun 2007 18:26 |
Reporter: | Andrey Hristov | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Stored Routines | Severity: | S3 (Non-critical) |
Version: | 5.1 | OS: | Any |
Assigned to: | Marc ALFF | CPU Architecture: | Any |
Tags: | regression |
[17 Apr 2007 8:38]
Andrey Hristov
[17 Apr 2007 8:38]
Andrey Hristov
test case
Attachment: func.c (text/plain), 872 bytes.
[24 May 2007 23:47]
Marc ALFF
ANALYSIS The encoding of the parameter name, in UTF8, contains a 0x80 0x5f sequence. At runtime, the execution of the function fails in: (gdb) where #0 my_error (nr=1054, MyFlags=0) at my_error.c:74 #1 0x00000000006546aa in find_field_in_tables (thd=0xeb52d8, item=0xeda8e0, first_table=0x0, last_table=0x0, ref=0xedaa48, report_error=REPORT_ALL_ERRORS, check_privileges=true, register_tree_change=true) at sql_base.cc:5046 #2 0x000000000056a188 in Item_field::fix_outer_field (this=0xeda8e0, thd=0xeb52d8, from_field=0x400bf9a8, reference=0xedaa48) at item.cc:3667 #3 0x000000000056a801 in Item_field::fix_fields (this=0xeda8e0, thd=0xeb52d8, reference=0xedaa48) at item.cc:3869 #4 0x0000000000798bf2 in sp_prepare_func_item (thd=0xeb52d8, it_addr=0xedaa48) at sp_head.cc:303 #5 0x0000000000798d50 in sp_eval_expr (thd=0xeb52d8, result_field=0xec8728, expr_item_ptr=0xedaa48) at sp_head.cc:338 #6 0x00000000007a1ea1 in sp_rcontext::set_return_value (this=<value optimized out>, thd=0x0, return_value_item=0xeda8c8) at sp_rcontext.cc:158 #7 0x00000000007989f4 in sp_instr_freturn::exec_core (this=<value optimized out>, thd=0x0, nextp=<value optimized out>) at sp_head.cc:2907 #8 0x0000000000799194 in sp_lex_keeper::reset_lex_and_exec_core (this=0xedaa58, thd=0xeb52d8, nextp=0x400bfd84, open_tables=false, instr=0xedaa18) at sp_head.cc:2452 #9 0x00000000007994d8 in sp_instr_freturn::execute (this=0xedaa18, thd=0xeb52d8, nextp=0x400bfd84) at sp_head.cc:2885 #10 0x000000000079d339 in sp_head::execute (this=0xeda308, thd=0xeb52d8) at sp_head.cc:1097 #11 0x000000000079fd82 in sp_head::execute_function (this=0xeda308, thd=0xeb52d8, argp=<value optimized out>, argcount=<value optimized out>, return_value_fld=0xec8728) at sp_head.cc:1543 #12 0x0000000000580e1a in Item_func_sp::execute_impl (this=0xec76d0, thd=0xeb52d8) at item_func.cc:5385 #13 0x00000000005866bc in Item_func_sp::execute (this=0xec76d0) at item_func.cc:5321 #14 0x000000000058d2b1 in Item_func_sp::val_str (this=0x41e, str=0x0) at item_func.h:1529 #15 0x00000000005632c3 in Item::send (this=0xec76d0, protocol=0xeb5690, buffer=0x400c0510) at item.cc:4907 #16 0x00000000005f9952 in select_send::send_data (this=0xec8690, items=<value optimized out>) at sql_class.cc:1214 #17 0x000000000068267d in JOIN::exec (this=0xee3c68) at sql_select.cc:1590 #18 0x0000000000684e52 in mysql_select (thd=0xeb52d8, rref_pointer_array=0xeb6e58, tables=0x0, wild_num=0, fields=<value optimized out>, conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2149861888, result=0xec8690, unit=0xeb68b0, select_lex=0xeb6c70) at sql_select.cc:2264 #19 0x0000000000685389 in handle_select (thd=0xeb52d8, lex=0xeb6820, result=0xec8690, setup_tables_done_option=0) at sql_select.cc:258 #20 0x0000000000616e8f in execute_sqlcom_select (thd=0xeb52d8, all_tables=0x0) at sql_parse.cc:4431 #21 0x0000000000619f00 in mysql_execute_command (thd=0xeb52d8) at sql_parse.cc:1835 #22 0x000000000061ff03 in mysql_parse (thd=0xeb52d8, inBuf=0xec7438 "SELECT Ñ\204Ñ\203нкÑ\206ийка(VERSION())", length=36, found_semicolon=0x400c2020) at sql_parse.cc:5342 #23 0x0000000000620a8e in dispatch_command (command=COM_QUERY, thd=0xeb52d8, packet=<value optimized out>, packet_length=37) at sql_parse.cc:906 #24 0x00000000006218ef in do_command (thd=0xeb52d8) at sql_parse.cc:668 #25 0x0000000000612e83 in handle_one_connection (arg=<value optimized out>) at sql_connect.cc:1091 #26 0x00002aaaaade6135 in start_thread () from /lib/libpthread.so.0 #27 0x00002aaaab46252e in clone () from /lib/libc.so.6 #28 0x0000000000000000 in ?? () It seems the item name at runtime is truncated, just after 0x5F, which happen to be the '_' character in ascii, but really is the second part of a multi-byte character here. (gdb) x /40xb 0xeda8c8 0xeda8c8: 0xd0 0xbf 0xd0 0xb0 0xd1 0x80 0xd0 0xb0 0xeda8d0: 0xd0 0xbc 0xd0 0xb5 0xd1 0x82 0xd1 0x8a 0xeda8d8: 0xd1 0x80 0x5f 0x00 0xa5 0xa5 0xa5 0xa5 0xeda8e0: 0x70 0xd8 0x9a 0x00 0x00 0x00 0x00 0x00 0xeda8e8: 0x00 0x00 0x00 0x00 0xa5 0xa5 0xa5 0xa5
[25 May 2007 0:30]
Marc ALFF
ANALYSIS (II). Ok, found the root cause. When the prototype of skip_rear_comments changed from const uchar * to const char*, the following code : (endp[-1] <= ' ') broke, and started to wipe out every character in the [0-0x20] *and* [0x80-0xFF] range, when they are at the end of the body of a stored procedure / function, and event (in 5.1). So, all non ascii characters at the end of <name> in "SELECT <name>;" are lost, when this code appear in the body of a SP/SF/Trigger. The real solution for this issue is to remove the function skip_rear_comment entirely, since it's flawed for other reasons, so the proper fix for this report depends on Bug#25411.
[25 May 2007 17:37]
Marc ALFF
Note that 5.0.42 is also affected.
[25 May 2007 18:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/27354 ChangeSet@1.2501, 2007-05-25 11:29:53-06:00, malff@weblab.(none) +2 -0 Bug#27876 (SF with cyrillic variable name fails during execution (regression)) The root cause of this bug is related to the function skip_rear_comments, in sql_lex.cc Recent code changes in skip_rear_comments changed the prototype from "const uchar*" to "const char*", which had an unforseen impact on this test: (endp[-1] < ' ') With unsigned characters, this code filters bytes of value [0x00 - 0x20] With *signed* characters, this also filters bytes of value [0x80 - 0xFF]. This caused the regression reported, considering cyrillic characters in the parameter name to be whitespace, and truncated. Note that the regression is present both in 5.0 and 5.1. With this fix: - [0x80 - 0xFF] bytes are no longer considered whitespace. This alone fixes the regression. In addition, filtering [0x00 - 0x20] was found bogus and abusive, so that the code now filters only [0x00, 0x09 (\t), 0x0A (\n) and 0x0D (\r)], when looking for whitespace. Note that this fix is only addressing the regression affecting UTF-8 in general, but does not address a more fundamental problem with skip_rear_comments: parsing a string *backwards*, starting at end[-1], is not safe with multi-bytes characters, so that end[-1] can confuse the last byte of a multi-byte characters with a characters to filter out. The only known impact of this remaining issue affects objects that have to meet all the conditions below: - the object is a FUNCTION / PROCEDURE / TRIGGER / EVENT - the body consist of only *1* instruction, and does *not* contain a BEGIN-END block - the instruction ends, lexically, with <ident> <whitespace>* ';' For example, "select <ident>;" or "return <ident>;" - The last character of <ident> is a multi-byte character - the last byte of this character is 0x09, 0x0A, 0x0D, ';' '*' or '/' In this case, the body of the object will be truncated after parsing, and stored in an invalid format. This last issue has not been fixed in this patch, since the real fix will be implemented by Bug 25411 (trigger code truncated), which is caused by the very same code. The real problem is that the function skip_rear_comments is only a work-around, and should be removed entirely: see the proposed patch for bug 25411 for details.
[25 May 2007 21:08]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/27375 ChangeSet@1.2501, 2007-05-25 14:36:01-06:00, malff@weblab.(none) +5 -0 Bug#27876 (SF with cyrillic variable name fails during execution (regression)) The root cause of this bug is related to the function skip_rear_comments, in sql_lex.cc Recent code changes in skip_rear_comments changed the prototype from "const uchar*" to "const char*", which had an unforseen impact on this test: (endp[-1] < ' ') With unsigned characters, this code filters bytes of value [0x00 - 0x20] With *signed* characters, this also filters bytes of value [0x80 - 0xFF]. This caused the regression reported, considering cyrillic characters in the parameter name to be whitespace, and truncated. Note that the regression is present both in 5.0 and 5.1. With this fix: - [0x80 - 0xFF] bytes are no longer considered whitespace. This alone fixes the regression. In addition, filtering [0x00 - 0x20] was found bogus and abusive, so that the code now filters uses my_isspace when looking for whitespace. Note that this fix is only addressing the regression affecting UTF-8 in general, but does not address a more fundamental problem with skip_rear_comments: parsing a string *backwards*, starting at end[-1], is not safe with multi-bytes characters, so that end[-1] can confuse the last byte of a multi-byte characters with a characters to filter out. The only known impact of this remaining issue affects objects that have to meet all the conditions below: - the object is a FUNCTION / PROCEDURE / TRIGGER / EVENT / VIEW - the body consist of only *1* instruction, and does *not* contain a BEGIN-END block - the instruction ends, lexically, with <ident> <whitespace>* ';'? For example, "select <ident>;" or "return <ident>;" - The last character of <ident> is a multi-byte character - the last byte of this character is ';' '*', '/' or whitespace In this case, the body of the object will be truncated after parsing, and stored in an invalid format. This last issue has not been fixed in this patch, since the real fix will be implemented by Bug 25411 (trigger code truncated), which is caused by the very same code. The real problem is that the function skip_rear_comments is only a work-around, and should be removed entirely: see the proposed patch for bug 25411 for details.
[25 May 2007 22:24]
Marc ALFF
Patch approved by Igor, with a minor correction (sent by email)
[1 Jun 2007 19:21]
Bugs System
Pushed into 5.0.44
[1 Jun 2007 19:22]
Bugs System
Pushed into 5.1.20-beta
[4 Jun 2007 18:26]
Paul DuBois
Noted in 5.0.44, 5.1.20 changelogs. A stored program that uses a variable name containing multibyte characters could fail to execute.