Bug #29590 Request to add support for back-references in regular expressions
Submitted: 6 Jul 2007 5:30 Modified: 6 Jul 2007 18:45
Reporter: Robbie Haertel
Status: Verified
Category:MySQL Server: General Severity:S4 (Feature request)
Version: OS:Any
Assigned to: CPU Architecture:Any

[6 Jul 2007 5:30] Robbie Haertel
Description:
I hereby request that back-references be allowed in REGEXP expressions (note: this is different from other feature requests for regular expression-based substitution). Not only are they part of the POSIX standard, but many other database systems support them, e.g. Oracle and PostgreSQL.

Here is an example query:

SELECT word
FROM words
WHERE word REGEXP '([aeiou])[^aeiou]\1';

(Example hits: ADA, NASA, etc.)
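
To make the intended semantics concrete, here is a minimal sketch in Python (not MySQL) of what the pattern is expected to match. The function name `matching_words` and the sample word list are made up for illustration, and `re.IGNORECASE` is an assumption that stands in for a case-insensitive comparison on the `word` column. The second pattern, without the back-reference, is included only to show the difference.

```python
import re

# Same pattern as in the query: a vowel, one non-vowel character,
# then the SAME vowel again (\1 refers back to the first group).
# IGNORECASE is an assumption standing in for a case-insensitive column.
BACKREF = re.compile(r"([aeiou])[^aeiou]\1", re.IGNORECASE)

# For contrast: without the back-reference, any vowel may follow.
NO_BACKREF = re.compile(r"[aeiou][^aeiou][aeiou]", re.IGNORECASE)


def matching_words(words, pattern=BACKREF):
    """Return the words in which the pattern matches anywhere (hypothetical helper)."""
    return [w for w in words if pattern.search(w)]


# Hypothetical sample data, not taken from any real table.
sample = ["ADA", "NASA", "MYSQL", "regex", "data", "edit"]

print(matching_words(sample))              # words with a repeated vowel
print(matching_words(sample, NO_BACKREF))  # also includes "edit"
```

With the back-reference, "edit" is rejected because the vowel after the "d" differs from the one before it; without it, "edit" is accepted.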

Thanks for this consideration.

How to repeat:
Run the example query; it does not return the correct results.
[6 Jul 2007 9:48] Sergei Golubchik
To my surprise, back references are NOT part of the POSIX standard, at least according to "man 7 regex":
.....
DESCRIPTION
       Regular  expressions  (``RE''s),  as  defined  in  POSIX.2, come in two
       forms:  modern  REs  (roughly  those  of  egrep;  POSIX.2  calls  these
       ``extended''  REs)  and  obsolete  REs (roughly those of ed(1); POSIX.2
       ``basic'' REs).  Obsolete REs mostly exist for  backward  compatibility
       in  some  old  programs;  they  will  be discussed at the end.
.....
       Obsolete  (``basic'')  regular  expressions differ in several respects.
.....
       parenthesized  subexpression  (after a possible leading `^').  Finally,
       there is one new type of atom, a back reference: `\' followed by a non-
       zero decimal digit d matches the same sequence of characters matched by
.....

As you can see, back references are only supported in *basic* REs, not in extended REs.

The Henry Spencer regex library, which we use in MySQL, supports back references, but only in basic RE mode. MySQL uses extended REs.
[6 Jul 2007 18:45] Robbie Haertel
Perhaps there could be an option (either at compile time or at run time, but preferably the latter) to choose between the basic and extended syntax?
[6 Jul 2007 19:20] Sergei Golubchik
Yes, this is a possibility.
My preference, though, would be to move to a completely different regexp library, one that is more powerful and supports multi-byte character sets :)