Bug #64370 Increase Unicode support for REGEXP
Submitted: 17 Feb 2012 21:27 Modified: 15 May 2012 16:23
Reporter: David Dykstra
Status: Duplicate
Category: MySQL Server: Charsets Severity: S4 (Feature request)
Version: 5.5.14 OS: Any
Assigned to: CPU Architecture: Any
Tags: REGEXP, Unicode

[17 Feb 2012 21:27] David Dykstra
Description:
REGEXP returns incorrect results when the pattern or the data contains Unicode characters.

My guess is that REGEXP mishandles multi-byte Unicode characters: I think it matches individual bytes of a character's encoding rather than the whole character.
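The byte-level hypothesis can be illustrated outside the server. The following is a minimal sketch in Python, not MySQL's actual implementation: it assumes the engine sees both pattern and data as raw UTF-8 bytes, and the helper names `bytewise_fullmatch` and `charwise_fullmatch` are hypothetical. The sample string uses ∪ (U+222A, bytes E2 88 AA), which shares its first two bytes with ∩ (U+2229, bytes E2 88 A9).

```python
import re


def bytewise_fullmatch(pattern: str, text: str) -> bool:
    """Match on raw UTF-8 bytes, as a byte-oriented engine would (assumption)."""
    return re.fullmatch(pattern.encode("utf-8"), text.encode("utf-8")) is not None


def charwise_fullmatch(pattern: str, text: str) -> bool:
    """Match on whole Unicode characters."""
    return re.fullmatch(pattern, text) is not None


PATTERN = "[^∩]*"   # as bytes this becomes the class [^\xe2\x88\xa9]
SAMPLE = "A ∪ B"    # contains no ∩, but does contain the bytes E2 and 88

if __name__ == "__main__":
    print(bytewise_fullmatch(PATTERN, SAMPLE))  # False: byte E2 is excluded by the class
    print(charwise_fullmatch(PATTERN, SAMPLE))  # True: no ∩ character is present
```

Under this assumption, the negated class excludes each of the three bytes of ∩ separately, so any character sharing one of those bytes is rejected even though ∩ itself never appears.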

My default collation is utf8_bin.

How to repeat:
The following doesn't work properly:
SELECT * FROM `table` WHERE `column_with_unicode_strings` REGEXP '[^∩]*'

The following does:
SELECT * FROM `table` WHERE `column_with_unicode_strings` NOT LIKE '%∩%'
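When comparing the two queries above, note that an unanchored `[^∩]*` can match the empty string, so even a fully character-aware engine would accept every non-NULL value. A regular expression equivalent to `NOT LIKE '%∩%'` would need anchors. The sketch below uses Python's `re` module as a stand-in for a character-aware engine (an assumption, since it is not MySQL's REGEXP), and the helper name `matches` is hypothetical.

```python
import re

# Unanchored: the pattern may match zero characters anywhere in the string.
unanchored = re.compile("[^∩]*")
# Anchored: every character from start to end must differ from ∩.
anchored = re.compile("^[^∩]*$")


def matches(regex: "re.Pattern[str]", text: str) -> bool:
    """Return True if the regex matches anywhere in text (search semantics)."""
    return regex.search(text) is not None


if __name__ == "__main__":
    print(matches(unanchored, "A ∩ B"))  # True: matches "A " (or even the empty string)
    print(matches(anchored, "A ∩ B"))    # False: the string contains ∩
    print(matches(anchored, "A B"))      # True: no ∩ anywhere
```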
[11 May 2012 8:25] Valeriy Kravchuk
I think this is a duplicate of, or yet another case of, Bug #30241. Please check.
[11 May 2012 15:22] David Dykstra
This bug is confirmed as yet another duplicate of Bug #30241. Please note the many requests for enhancement of the REGEXP system.
[15 May 2012 16:23] Valeriy Kravchuk
Duplicate of Bug #30241.