Bug #31999 | REGEXP match too many string for accentued character between bracket | ||
---|---|---|---|
Submitted: | 31 Oct 2007 17:38 | Modified: | 16 Nov 2007 11:33 |
Reporter: | Séb Cabot | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 5.1.22-rc, 5.0, 4.1 BK | OS: | MacOS |
Assigned to: | CPU Architecture: | Any | |
Tags: | Characters Sets, REGEXP |
[31 Oct 2007 17:38]
Séb Cabot
[13 Nov 2007 9:17]
Sveta Smirnova
Thank you for the report. Please provide output of SHOW VARIABLES LIKE 'char%' and SHOW VARIABLES LIKE 'coll%'
[15 Nov 2007 19:45]
Séb Cabot
As requested, here the output of output of SHOW VARIABLES LIKE 'char%' and SHOW VARIABLES LIKE 'coll%' mysql> SHOW VARIABLES LIKE 'char%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 8 rows in set (0.02 sec) mysql> SHOW VARIABLES LIKE 'coll%'; +----------------------+-------------------+ | Variable_name | Value | +----------------------+-------------------+ | collation_connection | utf8_general_ci | | collation_database | utf8_bin | | collation_server | latin1_swedish_ci | +----------------------+-------------------+ 3 rows in set (0.00 sec)
[16 Nov 2007 10:35]
Sveta Smirnova
Thank you for the report. Verified as described.
[16 Nov 2007 11:33]
Sergei Golubchik
This is expected and documented here: http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html "Warning The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal. " It will be ultimately fixed when we will replace the regex library in MySQL.