Bug #28862 Extended Latin1 characters get lost in CSV engine
Submitted: 4 Jun 2007 6:02
Modified: 20 Jun 2007 0:50
Reporter: Alexander Barkov
Status: Closed
Category: MySQL Server: Charsets
Severity: S3 (Non-critical)
Version: 5.1, 5.0
OS: Any
Assigned to: Alexander Barkov
CPU Architecture: Any

[4 Jun 2007 6:02] Alexander Barkov
Description:
Extended Latin1 characters (in the range 0xC0..0xFF) get lost
after inserting into a CSV table.

How to repeat:
set names latin1;
show variables like 'have_csv';
drop table if exists t1;
create table t1 (
  c varchar(1),
  name varchar(64)
) character set latin1 engine=csv;
insert into t1 values (0xC0,'LATIN CAPITAL LETTER A WITH GRAVE');
insert into t1 values (0xE0,'LATIN SMALL LETTER A WITH GRAVE');
insert into t1 values (0xEE,'LATIN SMALL LETTER I WITH CIRCUMFLEX');
insert into t1 values (0xFE,'LATIN SMALL LETTER THORN');
insert into t1 values (0xF7,'DIVISION SIGN');
insert into t1 values (0xFF,'LATIN SMALL LETTER Y WITH DIAERESIS');
select hex(c), c, name from t1;

Output is:

+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| have_csv      | YES   |
+---------------+-------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)

+--------+------+--------------------------------------+
| hex(c) | c    | name                                 |
+--------+------+--------------------------------------+
| 3F     | ?    | LATIN CAPITAL LETTER A WITH GRAVE    |
| NULL   | NULL | LATIN SMALL LETTER A WITH GRAVE      |
| NULL   | NULL | LATIN SMALL LETTER I WITH CIRCUMFLEX |
| 3F     | ?    | LATIN SMALL LETTER THORN             |
| 3F     | ?    | DIVISION SIGN                        |
| 3F     | ?    | LATIN SMALL LETTER Y WITH DIAERESIS  |
+--------+------+--------------------------------------+
6 rows in set, 4 warnings (0.00 sec)

mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xC0' for column 'c' at row 0 |
| Warning | 1366 | Incorrect string value: '\xFE' for column 'c' at row 3 |
| Warning | 1366 | Incorrect string value: '\xF7' for column 'c' at row 4 |
| Warning | 1366 | Incorrect string value: '\xFF' for column 'c' at row 5 |
+---------+------+--------------------------------------------------------+
4 rows in set (0.00 sec)

Suggested fix:
Fix the CSV engine to support extended characters.
[4 Jun 2007 7:52] Sveta Smirnova
Thank you for the report.

Verified as described.
[9 Jun 2007 5:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28451

ChangeSet@1.2518, 2007-06-09 10:34:56+05:00, bar@mysql.com +3 -0
  Bug#28862 Extended Latin1 characters get lost in CSV engine
  Problem: Temporary buffer which is used for quoting and escaping
  was initialized to character set utf8, and thus didn't allow
  to store data in other character sets.
  Fix: changing character set of the buffer to be able to
  store any arbitrary sequence of bytes.
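
The changeset comment above can be illustrated outside the server. The following is a minimal Python sketch, not the server code: it only assumes that a buffer tagged as utf8 rejects byte sequences that are not well-formed UTF-8, which is how the comment describes the problem. The helper names `is_valid_utf8` and `is_valid_latin1` are hypothetical and exist only for this illustration. It shows that each byte from the test case is a complete latin1 character but is not a valid UTF-8 sequence on its own.

```python
# Bytes taken from the INSERT statements in the test case of this report.
SAMPLE_BYTES = [0xC0, 0xE0, 0xEE, 0xFE, 0xF7, 0xFF]


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def is_valid_latin1(raw: bytes) -> bool:
    """Return True if raw decodes as latin1 (every single byte does)."""
    try:
        raw.decode("latin-1")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for value in SAMPLE_BYTES:
        raw = bytes([value])
        print(
            f"0x{value:02X}: "
            f"utf8={is_valid_utf8(raw)} latin1={is_valid_latin1(raw)}"
        )
```

A lone byte in the range 0xC0..0xFF is either a UTF-8 lead byte with no continuation bytes after it or a byte that never occurs in UTF-8, so a utf8-typed buffer cannot hold it unchanged, while a buffer that accepts arbitrary bytes can.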
[15 Jun 2007 6:20] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28835

ChangeSet@1.2521, 2007-06-15 11:19:35+05:00, bar@mysql.com +3 -0
    Bug#28862 Extended Latin1 characters get lost in CSV engine
    Problem: Temporary buffer which is used for quoting and escaping
    was initialized to character set utf8, and thus didn't allow
    to store data in other character sets.
    Fix: changing character set of the buffer to be able to
    store any arbitrary sequence of bytes.
[15 Jun 2007 6:32] Alexander Barkov
Pushed into 5.0.44-engines
Pushed into 5.1.20-engines
[18 Jun 2007 7:49] Bugs System
Pushed into 5.1.20-beta
[18 Jun 2007 7:50] Bugs System
Pushed into 5.0.44
[18 Jun 2007 19:19] Paul DuBois
Noted in 5.0.44, 5.1.20 changelogs.

Non-utf8 characters could get mangled when stored in CSV tables.
[19 Jun 2007 10:27] Daniel Fischer
Didn't make it into 5.0.44, will be in 5.0.46.
[20 Jun 2007 0:50] Paul DuBois
Moved 5.0.44 changelog entry to 5.0.46.
[12 Jul 2008 9:15] Aloke Nath
I have a MySQL Server installed with the Latin-1 charset. Do I have to do anything special in my application to store names with extended characters? If I use the mysql client utility, it seems to lose these characters (it stores them as ?).

Thanks,
Aloke