| Bug #75438 | REGEXP is case sensitive for most russian letters | ||
|---|---|---|---|
| Submitted: | 7 Jan 2015 18:21 | Modified: | 25 Nov 2019 22:15 |
| Reporter: | Павел Сикорский | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: DML | Severity: | S3 (Non-critical) |
| Version: | 5.1.68, 5.6.20, 5.6.24 | OS: | Any |
| Assigned to: | | CPU Architecture: | Any |
| Tags: | REGEXP, russian, utf8_unicode_ci | ||
[7 Jan 2015 18:21]
Павел Сикорский
[7 Jan 2015 18:32]
Павел Сикорский
tags changed
[7 Jan 2015 19:42]
Peter Laursen
Trying with Danish character 'æ':

SELECT
@v:='æ' COLLATE utf8_unicode_ci letter,
UCASE(@v) REGEXP LCASE(@v) RUvsL,
LCASE(@v) REGEXP UCASE(@v) RLvsU,
UCASE(@v) LIKE LCASE(@v) LUvsL,
LCASE(@v) LIKE UCASE(@v) LLvsU,
UCASE(@v), LCASE(@v);

/* returns
letter   RUvsL   RLvsU   LUvsL   LLvsU  UCASE(@v)  LCASE(@v)
------  ------  ------  ------  ------  ---------  -----------
æ            0       1       1       1  Æ          æ
*/

I think that regular expressions have serious issues with non-ASCII characters. I remember I bumped into something similar before.

-- Peter
-- not an Oracle/MySQL person
[7 Jan 2015 19:47]
Peter Laursen
See also:
http://bugs.mysql.com/bug.php?id=63439 (umlauts as in Swedish, German, Hungarian etc.)
http://bugs.mysql.com/bug.php?id=30241 (Hebrew)
[8 Jan 2015 7:40]
MySQL Verification Team
Hello Павел Сикорский,
Thank you for the report and test case.
Thanks,
Umesh
[8 Jan 2015 7:42]
MySQL Verification Team
// 5.6.24
mysql> SELECT
-> @v:='a' COLLATE utf8_unicode_ci letter,
-> UCASE(@v) REGEXP LCASE(@v) RUvsL,
-> LCASE(@v) REGEXP UCASE(@v) RLvsU,
-> UCASE(@v) LIKE LCASE(@v) LUvsL,
-> LCASE(@v) LIKE UCASE(@v) LLvsU,
-> UCASE(@v), LCASE(@v);
+--------+-------+-------+-------+-------+-----------+-----------+
| letter | RUvsL | RLvsU | LUvsL | LLvsU | UCASE(@v) | LCASE(@v) |
+--------+-------+-------+-------+-------+-----------+-----------+
| a | 1 | 1 | 1 | 1 | A | a |
+--------+-------+-------+-------+-------+-----------+-----------+
1 row in set (0.00 sec)
mysql> SELECT
-> @v:='п' COLLATE utf8_unicode_ci letter,
-> UCASE(@v) REGEXP LCASE(@v) RUvsL,
-> LCASE(@v) REGEXP UCASE(@v) RLvsU,
-> UCASE(@v) LIKE LCASE(@v) LUvsL,
-> LCASE(@v) LIKE UCASE(@v) LLvsU,
-> UCASE(@v), LCASE(@v);
+--------+-------+-------+-------+-------+-----------+-----------+
| letter | RUvsL | RLvsU | LUvsL | LLvsU | UCASE(@v) | LCASE(@v) |
+--------+-------+-------+-------+-------+-----------+-----------+
| п | 0 | 1 | 1 | 1 | П | п |
+--------+-------+-------+-------+-------+-----------+-----------+
1 row in set (0.00 sec)
mysql> show variables like '%version%';
+-------------------------+---------------------------------------------------------+
| Variable_name | Value |
+-------------------------+---------------------------------------------------------+
| innodb_version | 5.6.24 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.6.24-enterprise-commercial-advanced |
| version_comment | MySQL Enterprise Server - Advanced Edition (Commercial) |
| version_compile_machine | x86_64 |
| version_compile_os | linux-glibc2.5 |
+-------------------------+---------------------------------------------------------+
7 rows in set (0.00 sec)
mysql> \s
--------------
bin/mysql Ver 14.14 Distrib 5.6.24, for linux-glibc2.5 (x86_64) using EditLine wrapper
Connection id: 1
Current database: test
Current user: root@localhost
SSL: Not in use
Current pager: more
Using outfile: ''
Using delimiter: ;
Server version: 5.6.24-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: latin1
Db characterset: latin1
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /tmp/75438.sock
Uptime: 4 min 39 sec
Threads: 1 Questions: 12 Slow queries: 0 Opens: 67 Flush tables: 1 Open tables: 60 Queries per second avg: 0.043
--------------
[25 Nov 2019 22:15]
Roy Lyseng
Posted by developer: Fixed in 8.0 with new REGEXP implementation
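For readers who want to check the behaviour on their own server, the test case from this report can be condensed into the sketch below. It is an illustration and not part of the original thread. It assumes a utf8 or utf8mb4 connection with a case-insensitive collation. The `REGEXP_LIKE()` call with the `'i'` match type exists only in the 8.0 regular expression implementation, so that statement will fail on 5.x. The last statement is a possible workaround for older servers, offered as an untested assumption: it folds both operands to lower case before matching, which is only reasonable when the pattern consists of literal text.

```sql
-- Sketch only: condensed re-run of the REGEXP checks from this report.
SET @v = 'п';

-- On a server with the new REGEXP implementation both columns are
-- expected to be 1; the 5.6.24 run above shows 0 and 1.
SELECT
  UCASE(@v) REGEXP LCASE(@v) AS RUvsL,
  LCASE(@v) REGEXP UCASE(@v) AS RLvsU;

-- 8.0 only: request case-insensitive matching explicitly.
SELECT REGEXP_LIKE(UCASE(@v), LCASE(@v), 'i') AS explicit_ci;

-- Possible workaround for pre-8.0 servers (assumption, not from this thread):
-- fold the subject and the pattern to the same case before matching.
SELECT LCASE(UCASE(@v)) REGEXP LCASE(@v) AS folded;
```

Folding the pattern also lower-cases any upper-case literals inside it, so this approach does not suit patterns that rely on case-specific matching.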
