Bug #746 adding "SIMILAR TO" as an alias for RLIKE
Submitted: 28 Jun 2003 6:28 Modified: 14 Jul 2003 16:00
Reporter: Daniel Penning Email Updates:
Status: Closed Impact on me:
Category:MySQL Server Severity:S4 (Feature request)
Version: OS:
Assigned to: Peter Gulutzan CPU Architecture:Any

[28 Jun 2003 6:28] Daniel Penning
"SIMILAR TO" should be supported as it is the SQL-99 syntax for regexp matching.

SELECT * FROM foo WHERE bar SIMILAR TO '[a-d].*'

How to repeat:
[14 Jul 2003 16:00] Peter Gulutzan
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.mysql.com/documentation/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

SIMILAR TO is indeed a standard-SQL feature, so it is on our list of things to do eventually. 
But is a non-core (i.e. optional) feature, so we assign it a lower priority than other things. 
In any case, the suggestion -- "add SIMILAR TO as an alias for RLIKE" -- makes the task 
seem simpler than it is. Here are some reasons why (I use the word REGEXP instead of 
RLIKE in what follows). 
We currently support REGEXP (see http://www.mysql.com/doc/en/Regexp.html). 
It would be nice to support the standard-SQL predicate 
SIMILAR TO, which is non-core Feature T141. 
REGEXP and SIMILAR TO do the same thing: look for 
patterns in strings. However, I believe that we 
cannot just add SIMILAR TO as an alias for REGEXP. 
There are syntax differences. They're small, but 
enough to make REGEXP and SIMILAR TO incompatible. 
The specifications for REGEXP are far more extensive 
than the specifications for SIMILAR TO, so I've just 
listed here the REGEXP behaviour that would prohibit 
SIMILAR TO from working according to the SQL standard, 
if it was a REGEXP alias. 
The differences I've noticed are: 
1. Is the set of special match characters the same? 
   The problem here is that SIMILAR TO, like LIKE, 
   allows '%' and '_' for wildcarding. REGEXP does 
   not. REGEXP assumes it. For example: 
   SIMILAR TO:         'rain' SIMILAR TO 'a' is FALSE 
   REGEXP:             'rain' REGEXP 'a' is TRUE 
   SIMILAR TO:         'rain' SIMILAR TO '_a%' is TRUE 
   REGEXP:             'rain' REGEXP '_a%' is FALSE 
2. What happens if the pattern argument is ''? 
   SIMILAR TO:        finds nothing 
   REGEXP:            error 
3. What happens if the pattern argument has a trailing ' '? 
   SIMILAR TO:        'a b' SIMILAR TO '[a] %' is TRUE 
   REGEXP:            'a b' REGEXP '[a] ' is FALSE 
4. Are regular character set identifiers the same? 
   Regular character set identifiers (called "character classes" 
   in the MySQL manual) are predefined designations for 
   alphabetic, numeric, etc. For example, in the string 
   '[[:alnum:]]', alnum is an identifier meaning "any 
   alphanumeric character is okay". 
   With SIMILAR TO, the identifiers must be upper case. 
   With REGEXP, the identifiers must be lower case. 
   Thus ':ALNUM:' is compulsory for SIMILAR TO but illegal for REGEXP. 
   This is trivial, we just have to convert identifiers to lower case. 
   REGEXP has:     alpha, upper, lower, digit, alnum, space 
5. How do you escape if you want to match against a special character? 
   SIMILAR TO:        this is decided by the optional clause ESCAPE 'e' 
   REGEXP:            you have to enclose in square brackets 
6. Does collation have the same effect? 
   Consider an expression, using the default (Swedish) collation: 
   SIMILAR TO:        'ü' similar to '[y-z]' is TRUE 
   REGEXP:            'ü' regexp '[y-z]'     is FALSE