Bug #71779 regexp :upper: should work in case insensitive fields normally.
Submitted: 20 Feb 2014 13:48 Modified: 24 Jul 2014 19:40
Reporter: Sergio Abreu Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: CPU Architecture:Any
Tags: case insensitive, REGEXP, upper

[20 Feb 2014 13:48] Sergio Abreu
Description:
I was trying to select nicknames in a database that had some of then containing Capital LETTERS.
As the field collation is case insensitive, the REGEXP function with [[:upper:]] doesn't work.
You will say that I have to change it to case-sensitive to solve the problem.
BUT...
If MySql had a better intelligence, using REGEXP '[[:upper:]]+' SHOULD check for capital LETTERS normally, independent of the field character colation being case sensitive or case insensitive.

How to repeat:
Do a select on a case insensitive field using regexp with [[:upper:]] and you will see that this regexp is ignored, or does not work.

Suggested fix:
Making regexp function work independent of the case-sensitive aspect of the field.
[20 Feb 2014 14:16] Peter Laursen
I don't think it is related to collation being case sensitive or case insensitive (in other words, I belive you would have exactly the same issue with a CS or binary collation).

As far as I remember from a previous discussion here, the implementation of regular expression simply does not check for and do not consider case sensitivity of collations.  In my understanding they are also incompatible. 

However I find (to my pleasant surprise) that you may use constructions like

SELECT 'abc' REGEXP 'a' COLLATE utf8_bin; -- returns 1/true
SELECT 'abc' REGEXP 'A' COLLATE utf8_bin; -- returns 0/false
(where it is not the regular expression that handles the case sensivity, but the COLLATE clause does)

.. and this could be a point to add to documentation somewhere, I think actually. 

Peter
(not a MySQL/Oracle person)
[20 Feb 2014 14:34] Sergio Abreu
Thank you for the quick response, and I found an easier way:
Just have to add BINARY in the query.

Select .... from ... where  info REGEXP BINARY '[[:upper:]]+'

but IMHO, I think the BINARY behavior of REGEXP function "should the DEFAULT BEHAVIOR" because it is BAD that REGEXP '[A-Z]' matches 'abcde' that are all lower case...  and that "LOWER(field) REGEXP '[[:upper:]]+'" matches anything, because there should not found any CAPITAL LETTER inside a LOWER converted field!
So, my comment is:
MySql REGEXP should use the BINARY behavior BY DEFAULT, and to do a case-insensitive search, there could be add different command... 
Because Regular expressions should be very restrictive and precise when looking for patterns. It is a fault of MySql, in my oppinion, that we have to use REGEXP BINARY to accomplish the correct behavior of REGEXP...
[20 Feb 2014 14:37] Sergio Abreu
I think it does get related with the field collation.
Because when I created a general_cs field to test the same SELECT, the REGEXP behavior was exactly the REGEXP BINARY one.
[24 Jul 2014 19:40] Sveta Smirnova
Thank you for the report.

Closing as "Not a bug", because last comment.