Bug #746 | adding "SIMILAR TO" as an alias for RLIKE | ||
---|---|---|---|
Submitted: | 28 Jun 2003 6:28 | Modified: | 14 Jul 2003 16:00 |
Reporter: | Daniel Penning | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S4 (Feature request) |
Version: | OS: | ||
Assigned to: | Peter Gulutzan | CPU Architecture: | Any |
[28 Jun 2003 6:28]
Daniel Penning
[14 Jul 2003 16:00]
Peter Gulutzan
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.mysql.com/documentation/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php SIMILAR TO is indeed a standard-SQL feature, so it is on our list of things to do eventually. But is a non-core (i.e. optional) feature, so we assign it a lower priority than other things. In any case, the suggestion -- "add SIMILAR TO as an alias for RLIKE" -- makes the task seem simpler than it is. Here are some reasons why (I use the word REGEXP instead of RLIKE in what follows). SIMILAR TO versus REGEXP We currently support REGEXP (see http://www.mysql.com/doc/en/Regexp.html). It would be nice to support the standard-SQL predicate SIMILAR TO, which is non-core Feature T141. REGEXP and SIMILAR TO do the same thing: look for patterns in strings. However, I believe that we cannot just add SIMILAR TO as an alias for REGEXP. There are syntax differences. They're small, but enough to make REGEXP and SIMILAR TO incompatible. The specifications for REGEXP are far more extensive than the specifications for SIMILAR TO, so I've just listed here the REGEXP behaviour that would prohibit SIMILAR TO from working according to the SQL standard, if it was a REGEXP alias. The differences I've noticed are: 1. Is the set of special match characters the same? The problem here is that SIMILAR TO, like LIKE, allows '%' and '_' for wildcarding. REGEXP does not. REGEXP assumes it. For example: SIMILAR TO: 'rain' SIMILAR TO 'a' is FALSE REGEXP: 'rain' REGEXP 'a' is TRUE SIMILAR TO: 'rain' SIMILAR TO '_a%' is TRUE REGEXP: 'rain' REGEXP '_a%' is FALSE 2. What happens if the pattern argument is ''? SIMILAR TO: finds nothing REGEXP: error 3. What happens if the pattern argument has a trailing ' '? SIMILAR TO: 'a b' SIMILAR TO '[a] %' is TRUE REGEXP: 'a b' REGEXP '[a] ' is FALSE 4. Are regular character set identifiers the same? Regular character set identifiers (called "character classes" in the MySQL manual) are predefined designations for alphabetic, numeric, etc. For example, in the string '[[:alnum:]]', alnum is an identifier meaning "any alphanumeric character is okay". With SIMILAR TO, the identifiers must be upper case. With REGEXP, the identifiers must be lower case. Thus ':ALNUM:' is compulsory for SIMILAR TO but illegal for REGEXP. This is trivial, we just have to convert identifiers to lower case. SIMILAR TO has: ALPHA, UPPER, LOWER, DIGIT, ALNUM, SPACE, WHITESPACE REGEXP has: alpha, upper, lower, digit, alnum, space 5. How do you escape if you want to match against a special character? SIMILAR TO: this is decided by the optional clause ESCAPE 'e' REGEXP: you have to enclose in square brackets 6. Does collation have the same effect? Consider an expression, using the default (Swedish) collation: SIMILAR TO: 'ü' similar to '[y-z]' is TRUE REGEXP: 'ü' regexp '[y-z]' is FALSE