Bug #55113 mysqlimport should have an alternative charset option
Submitted: 9 Jul 2010 6:47
Reporter: Mikiya Okuno
Status: Verified
Category: MySQL Server: Command-line Clients
Severity: S2 (Serious)
Version:
OS: Any
Assigned to: Assigned Account
CPU Architecture: Any

[9 Jul 2010 6:47] Mikiya Okuno
Description:
The LOAD DATA command reads file contents using the default database's character set, and the mysqlimport command sets that character set to "binary". As a result, file contents are read with the "binary" character set and collation, whatever the real encoding is. This is not a problem unless the real encoding is sjis or cp932. Because sjis and cp932 characters may have "\" (0x5C) as their second byte, sjis and cp932 strings are escaped incorrectly when they are read as "binary" escaped strings. This is the well-known 5C problem.
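
To illustrate the 5C problem described above, here is a minimal Python sketch. The function unescape_bytewise is a hypothetical, simplified model of escape handling that looks at single bytes only; it is not the actual MySQL implementation, and it ignores special sequences such as \n or \t. It only shows why a trail byte of 0x5C is lost when the data is treated as "binary".

```python
# Illustration only: a simplified, hypothetical model of byte-wise escape
# handling. This is NOT the real LOAD DATA parser.

def unescape_bytewise(data: bytes, escape: int = 0x5C) -> bytes:
    """Drop each escape byte and keep the byte after it literally."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == escape:
            # The escape byte itself is discarded; the next byte (if any)
            # is copied through unchanged.
            if i + 1 < len(data):
                out.append(data[i + 1])
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


raw = bytes.fromhex("955c")      # the value inserted in the report (0x955c)
line = raw + b"\n"               # one field followed by a line terminator

# Read as cp932, the two bytes form a single character with no backslash.
text = raw.decode("cp932")
print(len(text), "\\" in text)

# Read byte by byte, the trail byte 0x5C is taken as an escape character
# and disappears, leaving a broken lead byte behind.
print(unescape_bytewise(line).hex())
```

A charset-aware reader would have to decode multi-byte characters before looking for the escape character, which is what setting the character set to sjis or cp932 instead of binary would allow.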

IMHO, this "bad escape" cannot be avoided as long as sjis/cp932 strings are read with the "binary" charset. So the mysqlimport command should have an option that sets the database character set to sjis/cp932 rather than binary.

How to repeat:
1) Create a table and populate it with a 5C character (a character whose second byte is 0x5C)

mysql> use test
mysql> create table sjis_load(a char(100) character set sjis);
mysql> insert into sjis_load values(0x955c); # 表

2) Dump it to a file and truncate the table

mysql> select * into outfile 'sjis_load.txt' from sjis_load;
mysql> truncate sjis_load;

3) Load the table contents using the mysqlimport command

shell> mysqlimport -uuser -ppassword test /var/lib/mysql/test/sjis_load.txt

Suggested fix:
I see three options.

1. A new option for mysqlimport to specify a charset to use.
2. Let mysqlimport use the --default-character-set option.
3. Allow SELECT ... INTO OUTFILE to dump sjis/cp932 columns in hex format. (Columns can be read correctly if they are not enclosed.) Or let mysqldump do so.
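
As a rough illustration of why option 3 would sidestep the problem, the following Python sketch hex-encodes a field value and restores it. The helper names to_hex_field and from_hex_field are hypothetical and do not correspond to any existing MySQL option or output format; the point is only that hex text consists of the bytes for 0-9 and A-F, so it can never contain 0x5C.

```python
# Illustration only: hypothetical helpers, not an existing MySQL feature.

def to_hex_field(value: bytes) -> bytes:
    """Encode raw column bytes as ASCII hex digits."""
    return value.hex().upper().encode("ascii")


def from_hex_field(field: bytes) -> bytes:
    """Restore the raw column bytes from ASCII hex digits."""
    return bytes.fromhex(field.decode("ascii"))


raw = bytes.fromhex("955c")        # the value from the report (0x955c)
field = to_hex_field(raw)

print(field)                       # hex text as it would appear in the file
print(0x5C in field)               # whether a backslash byte is present
print(from_hex_field(field) == raw)
```

With such a representation the reader does not need to know the column's character set to avoid misreading a trail byte as an escape character.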