Bug #27782 | MYSQL SOUNDEX collation, utf8_hungarian_ci shows false positive | ||
---|---|---|---|
Submitted: | 12 Apr 2007 12:43 | Modified: | 4 Sep 2007 16:54 |
Reporter: | Laszlo KAROLYI | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 5.0.26, 5.0.33, 5.0.37 (latest) too | OS: | Linux (gentoo and debian) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | soundex, utf8_hungarian_ci |
[12 Apr 2007 12:43]
Laszlo KAROLYI
[12 Apr 2007 12:45]
Laszlo KAROLYI
sorry, the workaround query really looks like this: select * from proba where SUBSTR(SOUNDEX(duma), 2) = SUBSTR(SOUNDEX('Ápr. 6.án elköltöztünk. De a postán át lett irányíttatva az érkező posta ill. az új tulaj is szólna. Nem hiszem, hogy a kerületben kavarodott volna el, postásunk nagyon megbízható. Ajánlva adtátok fel?'), 2) i forgot the secons SOUNDEX.
[13 Apr 2007 11:34]
Sveta Smirnova
Thank you for the report. I can not repeat it with current sources. Please try with current 5.0.37 version and if you can repeat provide output of statements SHOW VARIABLES LIKE '%coll%'; and SHOW VARIABLES LIKE '%char%';
[17 Apr 2007 10:37]
Laszlo KAROLYI
Hello Sveta, I have compiled 5.0.37, and it seems like the problem is still not solved. Here are the settings you asked for: mysql> SHOW VARIABLES LIKE '%coll%'; +----------------------+---------------------+ | Variable_name | Value | +----------------------+---------------------+ | collation_connection | latin2_hungarian_ci | | collation_database | latin2_hungarian_ci | | collation_server | latin2_hungarian_ci | +----------------------+---------------------+ 3 rows in set (0.00 sec) mysql> SHOW VARIABLES LIKE '%char%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | latin2 | | character_set_connection | latin2 | | character_set_database | latin2 | | character_set_filesystem | binary | | character_set_results | latin2 | | character_set_server | latin2 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 8 rows in set (0.00 sec) The settings are from the mysql command line, but in the code i use 'SET NAMES utf8'. After that command, the outputs are looking like this: mysql> SHOW VARIABLES LIKE '%coll%'; +----------------------+---------------------+ | Variable_name | Value | +----------------------+---------------------+ | collation_connection | utf8_general_ci | | collation_database | latin2_hungarian_ci | | collation_server | latin2_hungarian_ci | +----------------------+---------------------+ 3 rows in set (0.00 sec) mysql> SHOW VARIABLES LIKE '%char%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | latin2 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin2 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 8 rows in set (0.00 sec) Regards, Laszlo
[17 Apr 2007 10:39]
Laszlo KAROLYI
Sorry, i forgot to change the database too. Here are the outputs after changing the database: mysql> use crxforum; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> set names utf8; Query OK, 0 rows affected (0.43 sec) mysql> SHOW VARIABLES LIKE '%char%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin2 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 8 rows in set (0.00 sec) mysql> SHOW VARIABLES LIKE '%coll%'; +----------------------+---------------------+ | Variable_name | Value | +----------------------+---------------------+ | collation_connection | utf8_general_ci | | collation_database | utf8_hungarian_ci | | collation_server | latin2_hungarian_ci | +----------------------+---------------------+ 3 rows in set (0.36 sec) mysql> -- Laszlo
[17 Apr 2007 20:12]
Sveta Smirnova
test case options
Attachment: bug27782-master.opt (application/octet-stream, text), 69 bytes.
[17 Apr 2007 20:12]
Sveta Smirnova
test case
Attachment: bug27782.test (application/octet-stream, text), 1.45 KiB.
[17 Apr 2007 20:13]
Sveta Smirnova
Thank you for the feedback. Verified as described on Linux using attached test case. Only community tree is affected.
[4 Sep 2007 16:54]
Alexander Barkov
I believe this problem was fixed in 5.0.40 under terms of the bug#22638 "SOUNDEX broken for international characters". http://bugs.mysql.com/bug.php?id=22638 I can repeat this problem with 5.0.22, but it works fine 5.0.46. I'm closing this bug as a duplicate for #22638