Bug #12227 | select * from x where col like '%ä%' doesn't work with UTF8 | ||
---|---|---|---|
Submitted: | 27 Jul 2005 20:54 | Modified: | 29 Jul 2005 6:57 |
Reporter: | Markus Popp | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server | Severity: | S4 (Feature request) |
Version: | 4.1.13, 5.0.10 | OS: | Windows (Windows, Linux) |
Assigned to: | CPU Architecture: | Any |
[27 Jul 2005 20:54]
Markus Popp
[28 Jul 2005 12:14]
Marko Mäkelä
Are you sure that the non-ASCII characters in the script are encoded in utf8 and not e.g. latin1? Do something like SELECT HEX('ü') in the script and compare the result against a hex dump (od -t x1) of the script.
[28 Jul 2005 12:25]
Markus Popp
I've tested the following in MySQL query browser. When I make 'show variables', all character settings are 'utf8' and collation is generally 'utf8_general_ci'. Also my 'waehrungen' table is set to utf8, so nothing else should be involved here anywhere. select hex('ü'); returns 'C3BC'. If I extract the 'ü' from 'Türkei' (from the 'waehrungen' table) and put it into the hex-function: SELECT hex(substring(land, 2, 1)) FROM waehrungen I also get 'C3BC'.
[28 Jul 2005 13:04]
Marko Mäkelä
If you submit the query to the query browser, will it then work properly? If yes, then it is probable that the character set has been misconfigured on the PHP side. Can you execute those queries in a PHP script?
[28 Jul 2005 15:14]
Markus Popp
No, I've tested those examples with QueryBrowser and it returns both rows, although I should only get 'Türkei'.
[29 Jul 2005 6:57]
Marko Mäkelä
Sorry, I didn't read the original description carefully enough. It wasn't a character encoding issue after all. In the default collation of utf8, the umlauted characters are equivalent to the non-umlaut ones. You can force an umlaut-sensitive collation, e.g., like this: CREATE TABLE `waehrungen` ( `bezeichnung` varchar(50) NOT NULL default '', `land` varchar(50) NOT NULL default '' collate utf8_bin, `iso` varchar(3) NOT NULL default '', `waehrung` varchar(100) NOT NULL default '', PRIMARY KEY (`bezeichnung`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8; See the documentation for possible collations. utf8_bin is most probably not what you really want.