Bug #29590 | Request to add support for back-references in regular expressions | ||
---|---|---|---|
Submitted: | 6 Jul 2007 5:30 | Modified: | 6 Jul 2007 18:45 |
Reporter: | Robbie Haertel | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: General | Severity: | S4 (Feature request) |
Version: | | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[6 Jul 2007 5:30]
Robbie Haertel
[6 Jul 2007 9:48]
Sergei Golubchik
To my surprise, back references are NOT part of the POSIX standard for extended REs, at least according to "man 7 regex":

.....
DESCRIPTION
Regular expressions (``RE''s), as defined in POSIX.2, come in two forms: modern REs (roughly those of egrep; POSIX.2 calls these ``extended'' REs) and obsolete REs (roughly those of ed(1); POSIX.2 ``basic'' REs). Obsolete REs mostly exist for backward compatibility in some old programs; they will be discussed at the end.
.....
Obsolete (``basic'') regular expressions differ in several respects.
.....
parenthesized subexpression (after a possible leading `^'). Finally, there is one new type of atom, a back reference: `\' followed by a non-zero decimal digit d matches the same sequence of characters matched by
.....

As you can see, back references are only supported in *basic* REs, not in extended REs. The Henry Spencer regex library that we use in MySQL supports back references, but only in basic RE mode. MySQL uses extended REs.
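
For readers unfamiliar with the feature, here is a minimal sketch of what a back reference does. It uses Python's `re` module purely as a stand-in: that engine is neither POSIX basic nor POSIX extended syntax, and it is not the Henry Spencer library discussed above. The helper name `has_repeated_word` is hypothetical and chosen only for illustration.

```python
import re


def has_repeated_word(text: str) -> bool:
    """Return True if some word is immediately followed by itself.

    Hypothetical helper for illustration only; not part of MySQL.
    """
    # (\w+) captures a word as group 1; \1 is the back reference, which
    # must match exactly the same characters that group 1 matched.
    return re.search(r"\b(\w+) \1\b", text) is not None


print(has_repeated_word("this is is a typo"))  # True
print(has_repeated_word("this is a test"))     # False
```

Without a back reference the pattern could only require "a word followed by a word", not "the same word twice", which is the capability being requested here.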
[6 Jul 2007 18:45]
Robbie Haertel
Perhaps there could be an option (either at compile time or at run time, but preferably the latter) to choose between the basic and extended syntax?
[6 Jul 2007 19:20]
Sergei Golubchik
Yes, this is a possibility. My choice would be, though, to move to a completely different regexp library, more powerful, and with support for multi-byte character sets :)