| Bug #37413 | MySQL does not recognise diacritics (accents) when performing SELECT | ||
|---|---|---|---|
| Submitted: | 15 Jun 2008 14:18 | Modified: | 16 Jun 2008 11:27 |
| Reporter: | Ken DV | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
| Version: | 5.0.45 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | accents, diacritics | ||
[15 Jun 2008 14:18]
Ken DV
[15 Jun 2008 19:44]
Peter Laursen
I think you should understand COLLATIONS: See for instance how two different german collations handle UMLAUT characters: http://dev.mysql.com/doc/refman/5.0/en/charset-we-sets.html (and related paragraphs. set collation_connection = utf8_general_ci; SELECT 'éáú' = 'eau'; -- returns '1' set collation_connection = utf8_unicode_ci; SELECT 'éáú' = 'eau'; -- returns '0' set collation_connection = utf8_danish_ci; SELECT 'éáú' = 'eau'; -- returns '1' set collation_connection = utf8_bin; SELECT 'éáú' = 'eau'; -- returns '0' set collation_connection = utf8_general_ci; -- note now that for a specific statement you can collate as you want -- and ignore collation_connection setting SELECT 'éáú' = 'eau' collate utf8_bin; -- returns '0' I will only comment on the example with utf8_danish_ci. This is correct behaviour as e is alphabetized like é in Danish. What is the case with other COLLATIONS designed to match other natural languages I do not know - there might be some where they are alphabetized differently (but I have pointed out before that COLLLATIONS provided with MySQL are very rudimentary - for instance _cs (case ensitive) collations are largely missing!). Anyway you can 1) use the " .. collate utf8_bin" clause in string comparisons. 2) create tables (or even colums) with a "COLLATE utf8_bin" clause for those data where no 'natural language rules' but 'byte value rules' should apply. .. if you do not want to apply any 'natural language' alphabetisation rules. Peter (not a MySQL person)
[15 Jun 2008 19:46]
Peter Laursen
oops .. 3rd-4th line in the example should read set collation_connection = utf8_unicode_ci; SELECT 'éáú' = 'eau'; -- returns '1'
[15 Jun 2008 20:30]
Peter Laursen
yes ... SELECT 'éáú' = 'eau' collate utf8_bin; -- or use 'collate latin1_bin' if you have latin1 data This will use a BINARY collation where no accented and nonaccented characters are considered identical (no matter what is the current charset/collation setting for the connection). With a BIANRY collation every character as identified by its byte/octet pattern only. Non-binary colllations implement 'natural language rules' for string comparison, filtering, ordering, grouping etc (to the extend that proper collations are available) Anayway .. you should use a couple of hours understanding this paragraph http://dev.mysql.com/doc/refman/5.0/en/charset-charsets.html .. in the MySQL manual, if you are new to MySQL 4.1 or higher!
[15 Jun 2008 20:31]
Peter Laursen
or set collation_connection = utf8_bin; SELECT 'éáú' = 'eau'; -- returns '0'
[16 Jun 2008 11:27]
MySQL Verification Team
Thank you for the bug report. Please see the several comments form Peter.
