Bug #78371 Dumpfile with utf16 can't
Submitted: 8 Sep 2015 19:59 Modified: 9 Sep 2015 7:14
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.7.8, 5.7.9 OS:Any
Assigned to: CPU Architecture:Any
Tags: Unicode, utf16, utf16le

[8 Sep 2015 19:59] Daniël van Eeden
Description:
If I dump a table with character set utf16 or utf16le I can't read in back in with the same character set.

The warnings on creating the dump are expected:
https://en.wikipedia.org/wiki/UTF-16#U.2BD800_to_U.2BDFFF

Test data:
https://github.com/dveeden/mysqlunicodedata

How to repeat:
mysql-5.7.8-rc-log> SET NAMES utf8mb4;
Query OK, 0 rows affected (0.00 sec)

mysql-5.7.8-rc-log> SELECT * FROM ucd INTO OUTFILE 'ucd.out';
Query OK, 29215 rows affected (0.06 sec)

mysql-5.7.8-rc-log> SELECT * FROM ucd INTO OUTFILE 'ucd_utf32.out' CHARACTER SET 'utf32';
Query OK, 29215 rows affected (0.13 sec)

mysql-5.7.8-rc-log> SELECT * FROM ucd INTO OUTFILE 'ucd_utf16.out' CHARACTER SET 'utf16';
Query OK, 29215 rows affected, 6 warnings (0.08 sec)

Warning (Code 1366): Incorrect string value: '\xED\xA0\x80' for column 'char' at row 14872
Warning (Code 1366): Incorrect string value: '\xED\xAD\xBF' for column 'char' at row 14873
Warning (Code 1366): Incorrect string value: '\xED\xAE\x80' for column 'char' at row 14874
Warning (Code 1366): Incorrect string value: '\xED\xAF\xBF' for column 'char' at row 14875
Warning (Code 1366): Incorrect string value: '\xED\xB0\x80' for column 'char' at row 14876
Warning (Code 1366): Incorrect string value: '\xED\xBF\xBF' for column 'char' at row 14877
mysql-5.7.8-rc-log> SELECT * FROM ucd INTO OUTFILE 'ucd_utf16le.out' CHARACTER SET 'utf16le';
Query OK, 29215 rows affected, 6 warnings (0.08 sec)

Warning (Code 1366): Incorrect string value: '\xED\xA0\x80' for column 'char' at row 14872
Warning (Code 1366): Incorrect string value: '\xED\xAD\xBF' for column 'char' at row 14873
Warning (Code 1366): Incorrect string value: '\xED\xAE\x80' for column 'char' at row 14874
Warning (Code 1366): Incorrect string value: '\xED\xAF\xBF' for column 'char' at row 14875
Warning (Code 1366): Incorrect string value: '\xED\xB0\x80' for column 'char' at row 14876
Warning (Code 1366): Incorrect string value: '\xED\xBF\xBF' for column 'char' at row 14877
mysql-5.7.8-rc-log> CREATE TABLE ucd2 LIKE ucd;
Query OK, 0 rows affected (0.06 sec)

mysql-5.7.8-rc-log> CREATE TABLE ucd_utf16 LIKE ucd;
Query OK, 0 rows affected (0.06 sec)

mysql-5.7.8-rc-log> CREATE TABLE ucd_utf16le LIKE ucd;
Query OK, 0 rows affected (0.06 sec)

mysql-5.7.8-rc-log> CREATE TABLE ucd_utf32 LIKE ucd;
Query OK, 0 rows affected (0.06 sec)

mysql-5.7.8-rc-log> LOAD DATA INFILE 'ucd.out' INTO TABLE ucd2;
Query OK, 29215 rows affected (0.87 sec)
Records: 29215  Deleted: 0  Skipped: 0  Warnings: 0

mysql-5.7.8-rc-log> LOAD DATA INFILE 'ucd_utf16.out' INTO TABLE ucd_utf16 CHARACTER SET 'utf16';
ERROR 1062 (23000): Duplicate entry '' for key 'char'
mysql-5.7.8-rc-log> LOAD DATA INFILE 'ucd_utf16le.out' INTO TABLE ucd_utf16le CHARACTER SET 'utf16le';
ERROR 1366 (HY000): Incorrect string value: '\x00\x09\x3C\x00\x63\x00...' for column 'name' at row 10
mysql-5.7.8-rc-log> LOAD DATA INFILE 'ucd_utf32.out' INTO TABLE ucd_utf32 CHARACTER SET 'utf32';
Query OK, 29215 rows affected (0.51 sec)
Records: 29215  Deleted: 0  Skipped: 0  Warnings: 0

mysql-5.7.8-rc-log> SELECT `char`,value, name FROM ucd LIMIT 6 OFFSET 14871;
+------+-------+-----------------------------------------+
| char | value | name                                    |
+------+-------+-----------------------------------------+
| xxx    | D800  | <Non Private Use High Surrogate, First> |
| xxx    | DB7F  | <Non Private Use High Surrogate, Last>  |
| xxx    | DB80  | <Private Use High Surrogate, First>     |
| xxx    | DBFF  | <Private Use High Surrogate, Last>      |
| xxx    | DC00  | <Low Surrogate, First>                  |
| xxx    | DFFF  | <Low Surrogate, Last>                   |
+------+-------+-----------------------------------------+
6 rows in set (0.02 sec)

mysql-5.7.8-rc-log> CHECKSUM TABLE ucd, ucd2, ucd_utf16, ucd_utf16le, ucd_utf32;
+-------------------------+-----------+
| Table                   | Checksum  |
+-------------------------+-----------+
| unicodedata.ucd         | 349222814 |
| unicodedata.ucd2        | 349222814 |
| unicodedata.ucd_utf16   |         0 |
| unicodedata.ucd_utf16le |         0 |
| unicodedata.ucd_utf32   | 349222814 |
+-------------------------+-----------+
5 rows in set (0.09 sec)

special characters are replaced with xxx as bugs.mysql.com doesn't allow utf8mb4 yet.
[8 Sep 2015 20:02] Daniël van Eeden
With IGNORE it is able to get most rows back…

mysql-5.7.8-rc-log> LOAD DATA INFILE 'ucd_utf16.out' IGNORE INTO TABLE ucd_utf16 CHARACTER SET 'utf16';
Query OK, 28832 rows affected, 3822 warnings (0.50 sec)
Records: 29335  Deleted: 0  Skipped: 503  Warnings: 3822

Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x01\x09\x00\x44\x00\x45...' for column 'name' at row 2126
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x02\x09\x00\x44\x00\x45...' for column 'name' at row 2127
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x03\x09\x00\x44\x00\x45...' for column 'name' at row 2128
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x04\x09\x00\x44\x00\x45...' for column 'name' at row 2129
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x05\x09\x00\x44\x00\x45...' for column 'name' at row 2130
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x06\x09\x00\x44\x00\x45...' for column 'name' at row 2131
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x07\x09\x00\x44\x00\x45...' for column 'name' at row 2132
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x08\x09\x00\x44\x00\x45...' for column 'name' at row 2133
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1265): Data truncated for column 'category' at row 2134
Warning (Code 1265): Data truncated for column 'bidi_mirrored' at row 2134
Warning (Code 1265): Data truncated for column 'ISO_comment' at row 2134
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1261): Row 2135 doesn't contain data for all columns
Warning (Code 1261): Row 2135 doesn't contain data for all columns
Warning (Code 1261): Row 2135 doesn't contain data for all columns
Warning (Code 1261): Row 2135 doesn't contain data for all columns
Warning (Code 1261): Row 2135 doesn't contain data for all columns
Warning (Code 1261): Row 2135 doesn't contain data for all columns
Warning (Code 1261): Row 2135 doesn't contain data for all columns
Warning (Code 1261): Row 2135 doesn't contain data for all columns
Warning (Code 1261): Row 2135 doesn't contain data for all columns
Warning (Code 1261): Row 2135 doesn't contain data for all columns
Warning (Code 1261): Row 2135 doesn't contain data for all columns
Warning (Code 1261): Row 2135 doesn't contain data for all columns
Warning (Code 1261): Row 2135 doesn't contain data for all columns
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1265): Data truncated for column 'char' at row 2136
Warning (Code 1265): Data truncated for column 'category' at row 2136
Warning (Code 1265): Data truncated for column 'bidi_class' at row 2136
Warning (Code 1265): Data truncated for column 'bidi_mirrored' at row 2136
Warning (Code 1261): Row 2136 doesn't contain data for all columns
Warning (Code 1366): Incorrect string value: '\x0B\x09\x00\x44\x00\x45...' for column 'name' at row 2137
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x0C\x09\x00\x44\x00\x45...' for column 'name' at row 2138
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x0D\x09\x00\x44\x00\x45...' for column 'name' at row 2139
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x0E\x09\x00\x44\x00\x45...' for column 'name' at row 2140
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x0F\x09\x00\x44\x00\x45...' for column 'name' at row 2141
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x10\x09\x00\x44\x00\x45...' for column 'name' at row 2142
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x11\x09\x00\x44\x00\x45...' for column 'name' at row 2143
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x12\x09\x00\x44\x00\x45...' for column 'name' at row 2144
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x13\x09\x00\x44\x00\x45...' for column 'name' at row 2145
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x14\x09\x00\x44\x00\x45...' for column 'name' at row 2146
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x15\x09\x00\x44\x00\x45...' for column 'name' at row 2147
Warning (Code 1062): Duplicate entry '' for key 'char'
Warning (Code 1366): Incorrect string value: '\x16\x09\x00\x44\x00\x45...' for column 'name' at row 2148
Warning (Code 1062): Duplicate entry '' for key 'char'
[9 Sep 2015 7:14] MySQL Verification Team
Hi Daniel,

Thank you for the report.
Verified as described with 5.7.8/5.7.9 builds.

Thanks,
Umesh