Bug #1624 regexp bug
Submitted: 22 Oct 2003 7:46 Modified: 16 Dec 2003 0:39
Reporter: Eugene Toder
Status: Won't fix
Category: MySQL Server: MyISAM storage engine Severity: S3 (Non-critical)
Version: ver 12.21 Distrib 4.0.15 OS: FreeBSD (freebsd)
Assigned to: Alexander Barkov CPU Architecture: Any

[22 Oct 2003 7:46] Eugene Toder
Description:
I have found that regexp '[a-b]' (where 'a' and 'b' are placeholders for real characters) matches characters that satisfy:
    ord('a') <= ord(x) <= ord('b'),
whereas the proper behavior should be matching characters
    'a' <= x <= 'b',
where the comparison is done using a character-set-specific function. (That is, regexp uses character codes instead of their natural ordering.)

How to repeat:
The difference matters with a character set like koi8, whose character ordering is brain-damaged. For example, set the charset to koi8 and try
regexp '[а-я]' (the first and last Russian letters). The results are far from expected.
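To illustrate the mismatch, here is a minimal Python sketch. It is not MySQL code: it assumes Python's built-in koi8_r codec as a stand-in for the server charset, takes the 32 letters а..я (without ё) as the alphabet, and uses helper names that are purely illustrative. It compares the range taken by character code with the range taken by alphabetical order.

```python
# Illustration only: Python's built-in "koi8_r" codec stands in for the
# server character set; none of this is MySQL code.

# The 32 Russian letters а..я in alphabetical order (ё is left out because
# it sits outside the а..я block in both Unicode and KOI8-R).
ALPHABET = [chr(cp) for cp in range(ord("а"), ord("я") + 1)]


def koi8_code(ch):
    """Return the single-byte KOI8-R code of a character."""
    return ch.encode("koi8_r")[0]


def matched_by_code_range(lo, hi):
    """Letters x with code(lo) <= code(x) <= code(hi), as the regexp does."""
    lo_code, hi_code = koi8_code(lo), koi8_code(hi)
    return [ch for ch in ALPHABET if lo_code <= koi8_code(ch) <= hi_code]


def matched_by_alphabet(lo, hi):
    """Letters x with lo <= x <= hi in alphabetical order, as expected."""
    return ALPHABET[ALPHABET.index(lo):ALPHABET.index(hi) + 1]


by_code = matched_by_code_range("а", "я")
by_alphabet = matched_by_alphabet("а", "я")
missed = [ch for ch in by_alphabet if ch not in by_code]

print("matched by code range:", "".join(by_code))
print("missed:", "".join(missed))
```

Under these assumptions the code-based range covers only 17 of the 32 letters, because KOI8-R places 'я' at 0xD1, ahead of 'р', 'с', 'т' and the rest, and 'ю' at 0xC0, just before 'а'.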
[15 Dec 2003 22:56] Alexander Barkov
Unfortunately, the regular expression library is far from being excellent.
We decided not to fix this bug in the near future.
The regex library supports only case sensitivity. It does not support
all MySQL collation features, and one of the unsupported features is
ranges for tricky letter orderings such as the one in koi8r. As a workaround,
you can list all the Cyrillic characters explicitly instead of using a range,
i.e. [abcdef] instead of [a-f].
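The workaround can be automated. The following is a rough Python sketch, not part of MySQL: expand_range is a hypothetical helper, and the alphabet а..я (without ё) is an assumption. It builds the explicit character list from the two endpoints, and the last lines check with Python's own byte-wise re module that the expanded class matches a letter whose KOI8-R code lies outside the code range.

```python
import re


def expand_range(lo, hi, alphabet):
    """Return a bracket expression listing every letter from lo to hi.

    `alphabet` is a string holding the letters in their natural order.
    """
    start, end = alphabet.index(lo), alphabet.index(hi)
    if start > end:
        raise ValueError("range start sorts after range end")
    return "[" + alphabet[start:end + 1] + "]"


# Assumed alphabet: the 32 letters а..я in alphabetical order, without ё.
RUSSIAN = "".join(chr(cp) for cp in range(ord("а"), ord("я") + 1))

char_class = expand_range("а", "е", RUSSIAN)
print(char_class)  # [абвгде]

# Byte-wise check in KOI8-R: 'в' is 0xD7, outside the code range 0xC1-0xC5
# that [а-е] would cover, but the explicit list still matches it.
explicit = re.compile(char_class.encode("koi8_r"))
by_code = re.compile(b"[\xc1-\xc5]")
letter = "в".encode("koi8_r")
print(bool(explicit.fullmatch(letter)), bool(by_code.fullmatch(letter)))
```

The string printed first is what would be pasted into the regexp pattern in place of the range.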
[16 Dec 2003 0:39] Sergei Golubchik
Just a note: "We decided not to fix this bug in the near future" means that we will not fix this bug in this regex library.

Instead we plan to switch to another regex library - that is better, faster, and works correctly with different charsets.

The timeframe for the switch is still unknown :(