Bug #75438 | REGEXP is case sensitive for most Russian letters | | |
---|---|---|---|
Submitted: | 7 Jan 2015 18:21 | Modified: | 25 Nov 2019 22:15 |
Reporter: | Павел Сикорский | | |
Status: | Closed | | |
Category: | MySQL Server: DML | Severity: | S3 (Non-critical) |
Version: | 5.1.68, 5.6.20, 5.6.24 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
Tags: | REGEXP, russian, utf8_unicode_ci | | |
[7 Jan 2015 18:21]
Павел Сикорский
[7 Jan 2015 18:32]
Павел Сикорский
tags changed
[7 Jan 2015 19:42]
Peter Laursen
Trying with the Danish character 'æ':

    SELECT
      @v:='æ' COLLATE utf8_unicode_ci letter,
      UCASE(@v) REGEXP LCASE(@v) RUvsL,
      LCASE(@v) REGEXP UCASE(@v) RLvsU,
      UCASE(@v) LIKE LCASE(@v) LUvsL,
      LCASE(@v) LIKE UCASE(@v) LLvsU,
      UCASE(@v), LCASE(@v);
    /* returns
    letter  RUvsL  RLvsU  LUvsL  LLvsU  UCASE(@v)  LCASE(@v)
    ------  -----  -----  -----  -----  ---------  ---------
    æ           0      1      1      1  Æ          æ
    */

I think that regular expressions have serious issues with non-ASCII characters. I remember running into something similar before.

-- Peter -- not an Oracle/MySQL person
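The MySQL 5.x reference manual warns that REGEXP and RLIKE work in a byte-wise fashion and are not multibyte safe. The Python sketch below is only an illustration, not MySQL's regex code: it prints the UTF-8 bytes of each letter pair used in this report and compares a hypothetical byte-oriented fold, `ascii_fold` (built on `bytes.lower()`, which only changes A-Z), with Python's Unicode-aware `re.IGNORECASE`. It does not reproduce MySQL's asymmetric 0/1 results. It only shows why byte-level case handling breaks down for two-byte letters.

```python
import re

# Letter pairs from this report: (lowercase, uppercase).
PAIRS = [("a", "A"), ("æ", "Æ"), ("п", "П")]


def utf8_hex(s: str) -> str:
    """Return the UTF-8 encoding of s as space-separated hex bytes."""
    return s.encode("utf-8").hex(" ")


def ascii_fold(s: str) -> bytes:
    """Lowercase the UTF-8 bytes of s.

    Hypothetical stand-in for a byte-oriented engine (not MySQL's code):
    bytes.lower() only maps ASCII A-Z and leaves bytes >= 0x80 unchanged.
    """
    return s.encode("utf-8").lower()


def unicode_icase(pattern: str, subject: str) -> bool:
    """Case-insensitive search with Python's Unicode-aware re module."""
    return re.search(pattern, subject, re.IGNORECASE) is not None


for lower, upper in PAIRS:
    print(
        f"{lower}={utf8_hex(lower):<6} {upper}={utf8_hex(upper):<6}",
        "byte fold equal:", ascii_fold(lower) == ascii_fold(upper),
        "unicode match:", unicode_icase(lower, upper) and unicode_icase(upper, lower),
    )
```

For п/П the encodings are d0 bf and d0 9f, and for æ/Æ they are c3 a6 and c3 86: the lead byte is shared, and the case difference sits in the continuation byte, which an ASCII-only fold leaves unchanged.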
[7 Jan 2015 19:47]
Peter Laursen
See also http://bugs.mysql.com/bug.php?id=63439 (umlauts, as in Swedish, German, Hungarian, etc.) and http://bugs.mysql.com/bug.php?id=30241 (Hebrew).
[8 Jan 2015 7:40]
MySQL Verification Team
Hello Павел Сикорский,

Thank you for the report and test case.

Thanks,
Umesh
[8 Jan 2015 7:42]
MySQL Verification Team
    // 5.6.24
    mysql> SELECT
        -> @v:='a' COLLATE utf8_unicode_ci letter,
        -> UCASE(@v) REGEXP LCASE(@v) RUvsL,
        -> LCASE(@v) REGEXP UCASE(@v) RLvsU,
        -> UCASE(@v) LIKE LCASE(@v) LUvsL,
        -> LCASE(@v) LIKE UCASE(@v) LLvsU,
        -> UCASE(@v), LCASE(@v);
    +--------+-------+-------+-------+-------+-----------+-----------+
    | letter | RUvsL | RLvsU | LUvsL | LLvsU | UCASE(@v) | LCASE(@v) |
    +--------+-------+-------+-------+-------+-----------+-----------+
    | a      |     1 |     1 |     1 |     1 | A         | a         |
    +--------+-------+-------+-------+-------+-----------+-----------+
    1 row in set (0.00 sec)

    mysql> SELECT
        -> @v:='п' COLLATE utf8_unicode_ci letter,
        -> UCASE(@v) REGEXP LCASE(@v) RUvsL,
        -> LCASE(@v) REGEXP UCASE(@v) RLvsU,
        -> UCASE(@v) LIKE LCASE(@v) LUvsL,
        -> LCASE(@v) LIKE UCASE(@v) LLvsU,
        -> UCASE(@v), LCASE(@v);
    +--------+-------+-------+-------+-------+-----------+-----------+
    | letter | RUvsL | RLvsU | LUvsL | LLvsU | UCASE(@v) | LCASE(@v) |
    +--------+-------+-------+-------+-------+-----------+-----------+
    | п      |     0 |     1 |     1 |     1 | П         | п         |
    +--------+-------+-------+-------+-------+-----------+-----------+
    1 row in set (0.00 sec)

    mysql> show variables like '%version%';
    +-------------------------+---------------------------------------------------------+
    | Variable_name           | Value                                                   |
    +-------------------------+---------------------------------------------------------+
    | innodb_version          | 5.6.24                                                  |
    | protocol_version        | 10                                                      |
    | slave_type_conversions  |                                                         |
    | version                 | 5.6.24-enterprise-commercial-advanced                   |
    | version_comment         | MySQL Enterprise Server - Advanced Edition (Commercial) |
    | version_compile_machine | x86_64                                                  |
    | version_compile_os      | linux-glibc2.5                                          |
    +-------------------------+---------------------------------------------------------+
    7 rows in set (0.00 sec)

    mysql> \s
    --------------
    bin/mysql  Ver 14.14 Distrib 5.6.24, for linux-glibc2.5 (x86_64) using  EditLine wrapper

    Connection id:          1
    Current database:       test
    Current user:           root@localhost
    SSL:                    Not in use
    Current pager:          more
    Using outfile:          ''
    Using delimiter:        ;
    Server version:         5.6.24-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)
    Protocol version:       10
    Connection:             Localhost via UNIX socket
    Server characterset:    latin1
    Db characterset:        latin1
    Client characterset:    utf8
    Conn. characterset:     utf8
    UNIX socket:            /tmp/75438.sock
    Uptime:                 4 min 39 sec

    Threads: 1  Questions: 12  Slow queries: 0  Opens: 67  Flush tables: 1  Open tables: 60  Queries per second avg: 0.043
    --------------
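For comparison, the hypothetical helper `case_matrix` below recomputes the four result columns of the test query using Unicode-aware operations. Two assumptions apply: REGEXP is approximated by a case-insensitive `re.search`, and LIKE without wildcards under a case-insensitive collation is approximated by `casefold()` equality. Neither is an exact model of MySQL semantics. Under these assumptions every column should be 1 for a, п and æ. By contrast, the 5.6.24 session above returns RUvsL = 0 for п.

```python
import re


def case_matrix(letter: str) -> dict:
    """Recompute the report's RUvsL/RLvsU/LUvsL/LLvsU columns for one letter.

    Approximations (assumptions, not MySQL semantics):
      REGEXP -> case-insensitive, Unicode-aware re.search
      LIKE   -> casefold() equality (the query's LIKE patterns have no wildcards)
    """
    up, low = letter.upper(), letter.lower()

    def regexp(subject: str, pattern: str) -> int:
        # re.escape keeps the letter literal, as in the original query.
        found = re.search(re.escape(pattern), subject, re.IGNORECASE)
        return int(found is not None)

    def like(subject: str, pattern: str) -> int:
        return int(subject.casefold() == pattern.casefold())

    return {
        "RUvsL": regexp(up, low),  # UCASE(@v) REGEXP LCASE(@v)
        "RLvsU": regexp(low, up),  # LCASE(@v) REGEXP UCASE(@v)
        "LUvsL": like(up, low),    # UCASE(@v) LIKE LCASE(@v)
        "LLvsU": like(low, up),    # LCASE(@v) LIKE UCASE(@v)
    }


for letter in ["a", "п", "æ"]:
    print(letter, case_matrix(letter))
```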
[25 Nov 2019 22:15]
Roy Lyseng
Posted by developer: Fixed in 8.0 with the new REGEXP implementation.