Bug #31081 server crash in regexp function
Submitted: 18 Sep 2007 14:15
Modified: 3 Dec 2007 17:50
Reporter: Shane Bester (Platinum Quality Contributor)
Status: Closed
Category: MySQL Server: General
Severity: S1 (Critical)
Version: 5.0.48-enterprise, 5.1.23BK
OS: Any
Assigned to: Alexander Barkov
CPU Architecture: Any
Tags: bfsm_2007_10_18

[18 Sep 2007 14:15] Shane Bester
Description:
0x820111b handle_segfault + 541
0x84eed05 ordinary + 42
0x84edb0c p_ere_exp + 484
0x84ed824 p_ere + 52
0x84ed720 my_regcomp + 432
0x818f539 Item_func_regex::fix_fields(THD*, Item**) + 825
0x827914c find_order_in_list(THD*, Item**, TABLE_LIST*, st_order*, List<Item>&, List<Item>&, bool) + 614
0x8279205 setup_order(THD*, Item**, TABLE_LIST*, List<Item>&, List<Item>&, st_order*) + 83
0x82964b7 mysql_delete(THD*, TABLE_LIST*, Item*, st_sql_list*, unsigned long long, unsigned long long, bool) + 647
0x8210e6f mysql_execute_command(THD*) + 11583
0x8217550 mysql_parse(THD*, char const*, unsigned int, char const**) + 372
0x820cc00 dispatch_command(enum_server_command, THD*, char*, unsigned int) + 2354
0x820c2c2 do_command(THD*) + 600
0x820acbd handle_one_connection + 255
0x40038aa7 _end + 931807543
0x4017ec2e _end + 933143230

How to repeat:
drop table if exists `t1`;
create table `t1` (`col001` set('a') charset ucs2 collate ucs2_latvian_ci ,key(`col001` ))engine=myisam;
insert into `t1` values (),(),(),(),(),(),(),();
delete from t1 order by (col001 regexp pi()) asc limit 10;

Suggested fix:
.
[21 Sep 2007 10:05] Gleb Shchepa
The problem is incomplete code in ctype_uca.c: the ucs2_latvian_ci, ucs2_swedish_ci, etc. collations have a NULL value in the CHARSET_INFO::ctype field instead of a pointer to a valid ctype array, and no such array exists.
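To illustrate why a NULL ctype pointer is fatal: an 8-bit regex compiler classifies characters (for example, to handle case-insensitive matching) by looking each byte up in a per-character-set table. The sketch below is illustrative only. The struct `toy_charset`, the flags `TOY_UPPER`/`TOY_LOWER`, and the functions `toy_isalpha*` and `toy_fill_ascii` are hypothetical stand-ins, not MySQL's real CHARSET_INFO definitions. With `ctype == NULL`, the unguarded lookup dereferences a null pointer, which is consistent with the crash inside `ordinary()` in the reported stack trace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits stored in a per-byte classification table. */
#define TOY_UPPER 0x01
#define TOY_LOWER 0x02

/* Hypothetical stand-in for a charset descriptor: `ctype` should point to
   a 256-entry table of flag bytes, one per possible byte value. */
typedef struct {
  const char *name;
  const unsigned char *ctype;
} toy_charset;

/* Unguarded lookup: if cs->ctype is NULL this dereferences a null pointer
   and crashes, the failure mode described for the ucs2 UCA collations. */
static int toy_isalpha_unguarded(const toy_charset *cs, unsigned char c) {
  return (cs->ctype[c] & (TOY_UPPER | TOY_LOWER)) != 0;
}

/* Guarded lookup: returns -1 instead of crashing when the table is missing,
   so a caller could reject the character set before compiling a pattern. */
static int toy_isalpha(const toy_charset *cs, unsigned char c) {
  if (cs->ctype == NULL)
    return -1;
  return toy_isalpha_unguarded(cs, c);
}

/* Fills a plain ASCII classification table for demonstration purposes. */
static void toy_fill_ascii(unsigned char table[256]) {
  for (int i = 0; i < 256; i++) {
    table[i] = 0;
    if (i >= 'A' && i <= 'Z') table[i] |= TOY_UPPER;
    if (i >= 'a' && i <= 'z') table[i] |= TOY_LOWER;
  }
}
```

The guard is only one possible defence and is not how this bug was fixed; per the commit message below, the fix converts such character sets to utf8 before the regex library sees them.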
[27 Sep 2007 23:08] Jeffrey Pugh
Is this possibly related to bug #31159?
[1 Oct 2007 12:15] Alexander Barkov
A simplified test which demonstrates the same crash:

drop table if exists t1;
create table t1 (col001 set('a') charset ucs2 collate ucs2_latvian_ci);
insert into t1 values ();
select * from t1 where col001 regexp pi();
[1 Oct 2007 14:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/34716

ChangeSet@1.2534, 2007-10-01 19:01:51+05:00, bar@mysql.com +11 -0
  Bug#31081 server crash in regexp function
  Problem: The "regex" library written by Henry Spencer
  does not support tricky character sets like UCS2.
  Fix: convert tricky character sets to UTF8 before calling
  regex functions.
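The approach in this commit message can be sketched in C: convert the UCS-2 argument to UTF-8, then hand the byte string to an 8-bit regex engine. This is a minimal sketch under stated assumptions, not the committed patch. It assumes big-endian UCS-2 input, uses the POSIX `regcomp`/`regexec` API as a stand-in for the Henry Spencer library bundled with MySQL, and the helper names `ucs2be_to_utf8` and `ucs2_regex_match` are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Converts big-endian UCS-2 (2 bytes per character, no surrogates) to
   NUL-terminated UTF-8. Returns the number of bytes written, excluding the
   terminator, or -1 if the input is malformed or `dst` is too small. */
static long ucs2be_to_utf8(const unsigned char *src, size_t srclen,
                           char *dst, size_t dstcap) {
  size_t out = 0;
  if (dstcap == 0 || srclen % 2 != 0)
    return -1;                          /* no room, or truncated character */
  for (size_t i = 0; i < srclen; i += 2) {
    unsigned int cp = ((unsigned int)src[i] << 8) | src[i + 1];
    unsigned char buf[3];
    size_t n;
    if (cp >= 0xD800 && cp <= 0xDFFF)
      return -1;                        /* surrogates are not valid UCS-2 */
    if (cp < 0x80) {                    /* 1-byte sequence (ASCII) */
      buf[0] = (unsigned char)cp;
      n = 1;
    } else if (cp < 0x800) {            /* 2-byte sequence */
      buf[0] = (unsigned char)(0xC0 | (cp >> 6));
      buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
      n = 2;
    } else {                            /* 3-byte sequence */
      buf[0] = (unsigned char)(0xE0 | (cp >> 12));
      buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
      buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
      n = 3;
    }
    if (out + n + 1 > dstcap)
      return -1;                        /* keep room for the terminator */
    for (size_t k = 0; k < n; k++)
      dst[out++] = (char)buf[k];
  }
  dst[out] = '\0';
  return (long)out;
}

/* Converts a UCS-2 subject to UTF-8 and matches it against a pattern with
   the byte-oriented POSIX regex API. Returns 1 on match, 0 on no match,
   and -1 on a conversion, compile, or execution error. */
static int ucs2_regex_match(const unsigned char *subject, size_t len,
                            const char *pattern) {
  char utf8[256];
  regex_t re;
  int rc;
  if (ucs2be_to_utf8(subject, len, utf8, sizeof utf8) < 0)
    return -1;
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  rc = regexec(&re, utf8, 0, NULL, 0);
  regfree(&re);
  if (rc == 0)
    return 1;
  return rc == REG_NOMATCH ? 0 : -1;
}
```

A byte-oriented engine still sees a multi-byte UTF-8 sequence as several characters, so a pattern such as `^.$` would not match a single non-ASCII character. That is presumably why the changelog entry promises correct results only when the converted string contains 8-bit characters.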
[5 Oct 2007 7:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/34950

ChangeSet@1.2534, 2007-10-05 12:15:11+05:00, bar@mysql.com +11 -0
  Bug#31081 server crash in regexp function
  Problem: The "regex" library written by Henry Spencer
  does not support tricky character sets like UCS2.
  Fix: convert tricky character sets to UTF8 before calling
  regex functions.
[16 Oct 2007 10:29] Alexander Barkov
Pushed into 5.0.55-rpl
Pushed into 5.1.23-rpl
[24 Oct 2007 7:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/36229

ChangeSet@1.2543, 2007-10-24 12:08:33+05:00, bar@mysql.com +1 -0
  Bug#31081 server crash in regexp function
  Additional fix for valgrind warning
[27 Nov 2007 10:49] Bugs System
Pushed into 5.0.54
[27 Nov 2007 10:50] Bugs System
Pushed into 5.1.23-rc
[27 Nov 2007 10:52] Bugs System
Pushed into 6.0.4-alpha
[3 Dec 2007 17:50] Paul DuBois
Noted in 5.0.54, 5.1.23, 6.0.4 changelogs.

REGEXP operations could cause a server crash for character sets such
as ucs2. Now the arguments are converted to utf8 if possible, to
allow correct results to be produced if the resulting string contains
only 8-bit characters.