Bug #36043 | UTF-8 string is broken | ||
---|---|---|---|
Submitted: | 13 Apr 2008 22:03 | Modified: | 16 Apr 2008 17:09 |
Reporter: | Victor Porton | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: General | Severity: | S2 (Serious) |
Version: | 5.0.51a | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[13 Apr 2008 22:03]
Victor Porton
[16 Apr 2008 16:32]
Susanne Ebrecht
Many thanks for reporting a bug. This is not a bug. Lemme try to explain: Your database: UTF8 Let's say as example: mysql> show variables like '%char%' character_set_client Latin1 Now you want to insert data: Your terminal or your environment use UTF8 encoding. You type some text here. Because character_set_client is set to Latin1 the mysql daemon know thinks that it will get Latin1 data and will recode them to UTF8. You can see if this happens by using length(). UTF8 needs 1-4 bytes for one sign and Latin1 always just need one sign. Just insert into a text field letters like 'ä' or 'ö'. Then do: SELECT LENGTH(my_text_field) FROM my_database; These letter need 2 bytes in UTF8. When you now won't get 2 bytes for one of these letters then your data are stored wrong in your database. When you will get 1 or 4 bytes then this is wrong. When this is happened, that the data already stored wrong at the database then you don't have a chance to repair it easily. You have to dump the data to recode them manually and to upload them again. To avoid this: Look that your environment encoding (terminal encoding or whatever) will match to the character_set_client variable. Means on Unix/Linux: Use ISO-8859-15 for Latin1 and UTF8 for UTF8. The other way to avoid it is: Look what you use as environment encoding and then set the server variables to that environment encoding. For example: when your environment is using utf8 then make a "SET NAMES UTF8" before you will start inserting data. The set names command will set some other needful variables too. When your data are stored right at the database, then the problem isn't so difficult to solve. When you will get curious stuff out of the database then you have to look if character_set_client and your environment encoding will match. If character_set_client is Latin1 and your environment is UTF8 then switch your environment to ISO-8859-15. The other possiblitiy is here to make a "SET NAMES UTF8" to change character_set_client and some other necessary variables to UTF8.
[16 Apr 2008 16:54]
Peter Laursen
@both of you .. to this "When this is happened, that the data already stored wrong at the database then you don't have a chance to repair it easily. You have to dump the data to recode them manually and to upload them again." there IS an easier way refer to: http://webyog.com/faq/content/34/152/en/my-special-characters-display-as-_noisy_-latin-cha... "So basically what you need to do is to perform changes (ALTER TABLE) in two operations like: latin1 varchar >> varbinary (or BLOB) >> utf8 varchar (and similar for char and TEXT types)... for every encoded column" And the same the other way around (utf8 >> latin1 or whatever) of course! @Susanne: how would you *in practice* do this recoding of a dump? Why not use the *knowledge* of the MySQL server about encodings? 'pushing in' a non-encoded step is the solution!
[16 Apr 2008 17:09]
Victor Porton
I already have UTF-8 set database, indeed it does not work. character_set_client utf8 character_set_connection utf8 character_set_database binary character_set_filesystem binary character_set_results utf8 character_set_server cp1251 character_set_system utf8
[28 Apr 2008 15:20]
Susanne Ebrecht
To figure out, which character set your database is using you should use the command: SHOW CREATE DATABASE dbname; Also SHOW CREATE TABLE table_name;