Bug #44838 REGEXP search yields wrong results
Submitted: 13 May 2009 1:00 Modified: 13 May 2009 5:37
Reporter: Miguel Sousa
Status: Not a Bug
Category: MySQL Server: General Severity: S2 (Serious)
Version: 5.0.67-community OS: Any
Assigned to: CPU Architecture: Any
Tags: regexp search

[13 May 2009 1:00] Miguel Sousa
Description:
The database has one table containing six rows with the following content:

অ
অহ
থ
দ
দহ
হ

A REGEXP search using the pattern '^[অহ]+$' returns all 6 rows,
whereas the correct result is only the following 3 rows:

অ
অহ
হ

How to repeat:
Follow the steps in the Description.

Suggested fix:
None
[13 May 2009 5:37] Sveta Smirnova
Thank you for the report.

But according to http://dev.mysql.com/doc/refman/5.0/en/charset-restrictions.html:

The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.

So this is currently "Not a Bug".

Support for multi-byte character sets is planned. See http://forge1.mysql.com/worklog/task.php?id=353 for details.
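
To illustrate the byte-wise behaviour described in the manual excerpt above, here is a minimal Python sketch that compares character-wise and byte-wise matching of the reported pattern. It assumes that both the stored data and the pattern are UTF-8 encoded, which the report does not state. It uses Python's `re` module, not the server's regex library, and it does not model any character set or collation handling on the server, so it need not reproduce the exact row set from the report. The helper names `char_matches` and `byte_matches` are made up for this example.

```python
import re

# Rows and pattern copied from the report.
ROWS = ["অ", "অহ", "থ", "দ", "দহ", "হ"]
PATTERN = "^[অহ]+$"

# Character-wise matching: the class holds two code points.
char_re = re.compile(PATTERN)

# Byte-wise matching (assumption: data and pattern are both UTF-8).
# The class now holds the individual bytes of অ and হ, not two characters.
byte_re = re.compile(PATTERN.encode("utf-8"))


def char_matches(rows):
    """Rows that match when the pattern is applied to characters."""
    return [r for r in rows if char_re.match(r)]


def byte_matches(rows):
    """Rows that match when the pattern is applied to raw UTF-8 bytes."""
    return [r for r in rows if byte_re.match(r.encode("utf-8"))]


if __name__ == "__main__":
    # Show the UTF-8 bytes of each row next to the row itself.
    for row in ROWS:
        print(row, row.encode("utf-8").hex())
    print("character-wise:", char_matches(ROWS))
    print("byte-wise:", byte_matches(ROWS))
```

Under these assumptions দ (bytes E0 A6 A6) passes the byte-level pattern because each of its bytes also occurs in অ (E0 A6 85) or হ (E0 A6 B9), even though দ itself is not in the character class.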