Bug #31677 | SELECT INTO OUTFILE never escapes multibyte character | ||
---|---|---|---|
Submitted: | 17 Oct 2007 23:10 | Modified: | 19 Aug 2009 16:53 |
Reporter: | Gleb Shchepa | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Parser | Severity: | S3 (Non-critical) |
Version: | 5.0 and up | OS: | Any |
Assigned to: | Marc ALFF | CPU Architecture: | Any |
[17 Oct 2007 23:10]
Gleb Shchepa
[17 Oct 2007 23:12]
Gleb Shchepa
test case
Attachment: 31677.test (application/octet-stream, text), 720 bytes.
[16 Nov 2007 20:17]
Gleb Shchepa
Test case above is not precise. See correct test below. Also there is a test for single-byte 8bit termination/enclosed character and single-byte data: LOAD DATA INFILE is 8bit clear, but SELECT INTO OUTFILE is not, so encode/decode process in destructive (data corruption): mysql> SELECT VERSION(); +--------------+ | VERSION() | +--------------+ | 5.0.52-debug | +--------------+ 1 row in set (0.00 sec) mysql> mysql> SET NAMES utf8; Query OK, 0 rows affected (0.00 sec) mysql> SET character_set_database= utf8; Query OK, 0 rows affected (0.00 sec) mysql> mysql> CREATE TABLE t1 (c1 VARCHAR(256), c2 VARCHAR(256)) CHARACTER SET utf8; Query OK, 0 rows affected (0.01 sec) mysql> INSERT INTO t1 VALUES (0xC381, 0xC381); Query OK, 1 row affected (0.00 sec) mysql> SELECT * FROM t1; +------+------+ | c1 | c2 | +------+------+ | ц | ц | +------+------+ 1 row in set (0.00 sec) mysql> SELECT * INTO OUTFILE 't.txt' FROM t1; Query OK, 1 row affected (0.00 sec) mysql> TRUNCATE t1; Query OK, 0 rows affected (0.00 sec) # # Escaping is not necessary, correct result: # mysql> LOAD DATA INFILE 't.txt' INTO TABLE t1; Query OK, 1 row affected (0.00 sec) Records: 1 Deleted: 0 Skipped: 0 Warnings: 0 mysql> SELECT * FROM t1; +------+------+ | c1 | c2 | +------+------+ | ц | ц | +------+------+ 1 row in set (0.00 sec) ######################################################### mysql> SELECT * INTO OUTFILE 't.txt' FIELDS ENCLOSED BY 0xC3 TERMINATED BY 0x81 FROM t1; Query OK, 1 row affected (0.00 sec) mysql> TRUNCATE t1; Query OK, 0 rows affected (0.00 sec) # # Escaping is required, incorrect result: # mysql> LOAD DATA INFILE 't.txt' INTO TABLE t1 FIELDS ENCLOSED BY 0xC3 TERMINATED BY 0x81; Query OK, 1 row affected, 1 warning (0.00 sec) Records: 1 Deleted: 0 Skipped: 0 Warnings: 1 mysql> SELECT * FROM t1; +--------+------+ | c1 | c2 | +--------+------+ | ццц | NULL | +--------+------+ 1 row in set (0.00 sec) ######################################################### # # Data encoding is not 8bit clear: # mysql> CREATE TABLE t1 (c1 VARCHAR(256)); Query OK, 0 rows affected (0.01 sec) mysql> INSERT INTO t1 VALUES (0xC3); Query OK, 1 row affected (0.00 sec) mysql> SELECT HEX(c1) FROM t1; +---------+ | HEX(c1) | +---------+ | C3 | +---------+ 1 row in set (0.00 sec) mysql> SELECT * INTO OUTFILE 't.txt' FIELDS ENCLOSED BY 0xC3 FROM t1; Query OK, 1 row affected (0.00 sec) mysql> TRUNCATE t1; Query OK, 0 rows affected (0.00 sec) mysql> mysql> LOAD DATA INFILE 't.txt' INTO TABLE t1 FIELDS ENCLOSED BY 0xC3; Query OK, 1 row affected (0.00 sec) Records: 1 Deleted: 0 Skipped: 0 Warnings: 0 mysql> SELECT HEX(c1) FROM t1; +---------+ | HEX(c1) | +---------+ | C3C30A | +---------+ 1 row in set (0.00 sec)
[16 Nov 2007 20:18]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/37979 ChangeSet@1.2566, 2007-11-16 23:44:55+04:00, gshchepa@gleb.loc +4 -0 Fixed bug #31677. 1. Multi-byte characters was never escaped, because they was always skipped before escaping process. 2. 8bit escape characters, termination and enclosed characters were silently ignored by SELECT INTO query, but LOAD DATA INFILE algorithm is 8bit-clear, so data was corrupted during encoding.
[19 Nov 2007 13:39]
Sergei Golubchik
This will be fixed in 5.1 after bug#30946 is fixed
[19 Aug 2009 16:35]
Marc ALFF
With the following script: ------------------ select version(); drop table if exists t1; SET NAMES utf8; SET character_set_database= utf8; CREATE TABLE t1 (c1 VARCHAR(256), c2 VARCHAR(256)) CHARACTER SET utf8; INSERT INTO t1 VALUES (0xC381, 0xC381); SELECT * FROM t1; SELECT * INTO OUTFILE 'ta.txt' FROM t1; TRUNCATE t1; LOAD DATA INFILE 'ta.txt' INTO TABLE t1; SELECT * FROM t1; SELECT * INTO OUTFILE 'tb.txt' FIELDS ENCLOSED BY 0xC3 TERMINATED BY 0x81 FROM t1; TRUNCATE t1; LOAD DATA INFILE 'tb.txt' INTO TABLE t1 FIELDS ENCLOSED BY 0xC3 TERMINATED BY 0x81; SELECT * FROM t1; drop table t1; ------------------ the result is with 5.4.4: select version(); version() 5.4.4-alpha-debug-log drop table if exists t1; Warnings: Note 1051 Unknown table 't1' SET NAMES utf8; SET character_set_database= utf8; CREATE TABLE t1 (c1 VARCHAR(256), c2 VARCHAR(256)) CHARACTER SET utf8; INSERT INTO t1 VALUES (0xC381, 0xC381); SELECT * FROM t1; c1 c2 Á Á SELECT * INTO OUTFILE 'ta.txt' FROM t1; TRUNCATE t1; LOAD DATA INFILE 'ta.txt' INTO TABLE t1; SELECT * FROM t1; c1 c2 Á Á SELECT * INTO OUTFILE 'tb.txt' FIELDS ENCLOSED BY 0xC3 TERMINATED BY 0x81 FROM t1; Warnings: Warning 1638 Non-ASCII separator arguments are not fully supported TRUNCATE t1; LOAD DATA INFILE 'tb.txt' INTO TABLE t1 FIELDS ENCLOSED BY 0xC3 TERMINATED BY 0x81; Warnings: Warning 1638 Non-ASCII separator arguments are not fully supported Warning 1261 Row 1 doesn't contain data for all columns SELECT * FROM t1; c1 c2 ÁÁÁ NULL drop table t1;