Bug #3267 table data is broken after ALTER TABLE CHARACTER SET
Submitted: 22 Mar 2004 22:28 Modified: 1 Apr 2004 17:47
Reporter: Andrey Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S2 (Serious)
Version:4.1.2 OS:Windows (Win2k)
Assigned to: Paul DuBois CPU Architecture:Any

[22 Mar 2004 22:28] Andrey
Description:
Server running in LATIN1 charset.
Database charset is LATIN1 too.
For your information, I cite result of "show variables like '%char%'":
----------------------------
character_set_server	latin1
character_set_system	utf8
character_set_database	latin1
character_set_client	latin1
character_set_connectionlatin1
character-sets-dir	c:/inet/mysql/4.1.2-alpha-2004-02-18/share/charsets
character_set_results	latin1
-----------------------------

I create table, fill it with data in my native language then change table charset. Data become broken:

CREATE TABLE chartest(a char(1));
INSERT INTO chartest VALUES ('à')..('ÿ') # russion alphabet ...
ALTER TABLE chartest CHARACTER SET cp1251;
SELECT * FROM chartest;
# select returns set of "?".

In MYD file _all_ records replaced by "?" (char code 0x3F). 
May be it's not a bug, but this behavior looks like not normal.

I don't test case when ALTER runs before INSERT, because i already have data in tables and I want to change table charset. I looked for information in doc's but there are no solution suggested ...

How to repeat:
[23 Mar 2004 4:08] Alexander Keremidarski
This is in fact expected behaviour, but unfortunately not well decumented. Therefore I am changing the Category and assigning it to Docs team.

The root of the problem is:

mysql> SET NAMES latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT CONVERT('абв' USING cp1251); -- 'абв' in bulgarian alphabet 
+---------------------------+
| CONVERT('а' USING cp1251) |
+---------------------------+
| ???                       |
+---------------------------+

This is because character set conversion between latin1 and cp1251 is impossible.

The workaround is to convert to BINARY first which in 4.1 means "No character set" and strings are treate as sequence of bytes. Then convert from BINARY to cp1251

mysql> SELECT CONVERT('абв' USING BINARY); -- result is without charset 
+-----------------------------+
| CONVERT('абв' USING BINARY) |
+-----------------------------+
| абв                         |
+-----------------------------+
1 row in set (0.00 sec)
 
mysql> SELECT CONVERT(CONVERT('абв' USING BINARY) USING cp1251);
+---------------------------------------------------+
| CONVERT(CONVERT('абв' USING BINARY) USING cp1251) |
+---------------------------------------------------+
| абв                                               |
+---------------------------------------------------+
[23 Mar 2004 4:21] Andrey
Ok. I've got your idea. 

As I can see, data in my databases stored in cp1251 (inspite of server charset is latin1). And this work very well except ORDER BY'ing and LIKE'ing.
Can I change _table_ charset into BINARY and then into CP1251? I think in this case data should be unchanged. 
Or may be there are some way to just change table charset only in header, without data converting?
[1 Apr 2004 17:47] Paul DuBois
Thank you for your bug report. This issue has been addressed in the
documentation. The updated documentation will appear on our website
shortly, and will be included in the next release of the relevant
product(s).

Additional info:

Conversion procedure now is documented at:

http://www.mysql.com/doc/en/Charset-convert.html