| Bug #1624 | regexp bug | ||
|---|---|---|---|
| Submitted: | 22 Oct 2003 7:46 | Modified: | 16 Dec 2003 0:39 |
| Reporter: | Eugene Toder | Email Updates: | |
| Status: | Won't fix | Impact on me: | |
| Category: | MySQL Server: MyISAM storage engine | Severity: | S3 (Non-critical) |
| Version: | ver 12.21 Distrib 4.0.15 | OS: | FreeBSD (freebsd) |
| Assigned to: | Alexander Barkov | CPU Architecture: | Any |
[15 Dec 2003 22:56]
Alexander Barkov
Unfortunately, the regular expression library is far from excellent. We decided not to fix this bug in the near future. The regex library only handles case sensitivity. It does not support all MySQL collation features, and one of the unsupported features is ranges over tricky letter orderings such as the one in koi8r. As a workaround, you can list all the Cyrillic characters explicitly instead of using a range, i.e. [abcdef] instead of [a-f].
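The workaround above can be generated rather than typed by hand. The following is a minimal sketch in Python, not part of MySQL; the helper name `explicit_class` is hypothetical, and it assumes that Unicode code point order matches the intended alphabetical order for the range (true for 'а' to 'я', which excludes 'ё').

```python
def explicit_class(first: str, last: str) -> str:
    """Expand an alphabetical range into an explicit bracket expression.

    Assumption: Unicode code point order is the intended alphabetical
    order for the given endpoints. No escaping is done, so this is only
    suitable for plain letters, not for ']', '^', '-' or '\\'.
    """
    chars = "".join(chr(cp) for cp in range(ord(first), ord(last) + 1))
    return "[" + chars + "]"


# The Latin example from the comment above.
latin = explicit_class("a", "f")

# The Cyrillic case: every lowercase letter from 'а' to 'я' listed one by one.
cyrillic = explicit_class("а", "я")
print(latin)
print(cyrillic)
```

The resulting string would then be pasted into the REGEXP pattern in place of the range, using the same character set as the connection.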
[16 Dec 2003 0:39]
Sergei Golubchik
Just a note: "We decided not to fix this bug in the near future" means that we will not fix this bug in this regex library. Instead we plan to switch to another regex library that is better, faster, and works correctly with different charsets. The timeframe for the switch is still unknown :(

Description: I have found that regexp '[a-b]' (where 'a' and 'b' are placeholders for real characters) matches characters x that satisfy ord('a') <= ord(x) <= ord('b'), whereas the proper behavior would be to match characters with 'a' <= x <= 'b', where the comparison is done using the character-set-specific function. (That is, regexp uses character codes instead of the characters' natural ordering.)

How to repeat: The difference matters when using a character set like koi8, which has a brain-damaged character ordering. For example, set the charset to koi8 and try regexp '[а-я]' (the first and last Russian letters). The results are far from expected.
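The mismatch described in this report can be reproduced outside MySQL. The sketch below is only an illustration in Python, using its `koi8_r` codec and a byte-level regular expression as a stand-in for a matcher that compares raw character codes; it is not the MySQL regex library itself, and the variable names are made up for the example.

```python
import re

# Lowercase Russian letters 'а'..'я' in alphabetical (Unicode) order.
letters = [chr(cp) for cp in range(0x0430, 0x0450)]

# KOI8-R byte values of the two range endpoints.
lo = "а".encode("koi8_r")[0]
hi = "я".encode("koi8_r")[0]

# A byte-level character class: matches by code value, the way the
# report describes regexp '[a-b]' behaving.
byte_range = re.compile(b"[" + bytes([lo]) + b"-" + bytes([hi]) + b"]")

matched = [ch for ch in letters if byte_range.fullmatch(ch.encode("koi8_r"))]
missed = [ch for ch in letters if ch not in matched]

print("endpoints: 0x%02X .. 0x%02X" % (lo, hi))
print("matched:", "".join(matched))
print("missed: ", "".join(missed))
```

Because KOI8-R places 'я' at 0xD1, in the middle of the lowercase block, the byte range covers only 17 of the 32 letters, and letters such as 'в' and 'с' fall outside it.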