| Bug #71746 | mysqldump may use utf8mb4 as default-character-set | ||
|---|---|---|---|
| Submitted: | 17 Feb 2014 8:53 | ||
| Reporter: | Sadao Hiratsuka | Email Updates: | |
| Status: | Open | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S4 (Feature request) |
| Version: | 5.6.16 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | mysqldump | ||
[5 Mar 2015 22:32]
Stephen Van Bruwaene
I'm unable to dump tables that contain utf8mb4 character sets with emoji -- dump happens, but when I try to import, emoji are lost, and I get duplicate errors
[6 Mar 2015 5:31]
MySQL Verification Team
Can you dump/reload if you perform the dump with options like this? mysqldump --hex-blob --default-character-set=binary
[6 Sep 2015 9:05]
Daniël van Eeden
Test data: https://github.com/dveeden/mysqlunicodedata (this is a script to import the unicode character database into MySQL, it also has a dump of the data) The 4-byte characters get replaced by a '?' (why not �, that would be more correct). The solution is easy: set utf8mb4 as character set for mysqldump. $ mysqldump --skip-extended-insert unicodedata | grep DOLPHIN INSERT INTO `ucd` VALUES ('1F42C','?','DOLPHIN','So','0','ON','','','','','N','','','','',''); $ mysqldump --default-character-set=utf8mb4 --skip-extended-insert unicodedata | grep DOLPHIN INSERT INTO `ucd` VALUES ('1F42C','
[6 Sep 2015 9:06]
Daniël van Eeden
With workaround because bugs.mysql.com is not yet using utf8mb4 :( Test data: https://github.com/dveeden/mysqlunicodedata (this is a script to import the unicode character database into MySQL, it also has a dump of the data) The 4-byte characters get replaced by a '?' (why not �, that would be more correct). The solution is easy: set utf8mb4 as character set for mysqldump. $ mysqldump --skip-extended-insert unicodedata | grep DOLPHIN INSERT INTO `ucd` VALUES ('1F42C','?','DOLPHIN','So','0','ON','','','','','N','','','','',''); $ mysqldump --default-character-set=utf8mb4 --skip-extended-insert unicodedata | grep DOLPHIN INSERT INTO `ucd` VALUES ('1F42C','___ACTUAL__DOLPHIN__CHAR__','DOLPHIN','So','0','ON','','','','','N','','','','',''); This data is in a utf8mb4 column, mysqldump should detect that and - Give an error/warning if it can't properly dump the data - Use utf8mb4 Why not make utf8mb4 the default for mysqldump?
[6 Sep 2015 9:10]
Daniël van Eeden
This also affects mysqlpump, created a separate bugreport for that Bug #78348 mysqlpump support for 4-byte UTF-8
[6 Sep 2015 9:12]
Daniël van Eeden
For bugs.mysql.com there is this open bug... Bug #62458 Not possible to report issues involving utf8mb4 characters
[6 Sep 2015 9:15]
Daniël van Eeden
Fixing this could be as easy as:
------------------------------------------------------------------
diff --git a/client/base/mysql_connection_options.cc b/client/base/mysql_connection_options.cc
index 4a11c89..97c78a2 100644
--- a/client/base/mysql_connection_options.cc
+++ b/client/base/mysql_connection_options.cc
@@ -161,7 +161,7 @@ MYSQL* Mysql_connection_options::create_connection()
}
else
{
- mysql_options(connection, MYSQL_SET_CHARSET_NAME, "utf8");
+ mysql_options(connection, MYSQL_SET_CHARSET_NAME, "utf8mb4");
}
if (this->m_plugin_dir.has_value())
mysql_options(connection, MYSQL_PLUGIN_DIR,
------------------------------------------------------------------
[7 Sep 2015 9:57]
Daniël van Eeden
I don't consider this 'S4 (Feature request)', but 'S2 (Serious)' because it silently drops the data.
[8 Sep 2015 5:30]
MySQL Verification Team
it does not matter what charset is used - one cannot mysqldump text data and import it properly. You have to use hex for all char fields. I filed this years ago, internally: Bug 14271638 - CHARACTER SET DUPLICATE HANDLING SEEMS BROKEN e.g. Try dump this table and import it: --- set names utf8; drop table if exists t1; create table t1(a varchar(1) character set armscii8 collate armscii8_bin, unique key(a))engine=innodb; insert into t1 values (0x29),(0xa4); ---
[5 Feb 2016 15:16]
Brad Kent
MySQL Workbench also passes default-character-set=utf8 and the option is not configurable
[1 Jan 2018 1:31]
Daniel Black
utf8mb4 made the default in ebaff9fffc958030a57d8ea7f1f2d527cac1df64 / WL#10353

Description: mysqldump uses utf8 as default-character-set by default, but it can't dump utf8mb4 data correctly. If utf8 is chosen for character capacity, utf8mb4 is better than utf8. How to repeat: mysql> CREATE TABLE mb4 (point VARCHAR(10) PRIMARY KEY, data VARCHAR(10)) CHARACTER SET utf8mb4; mysql> INSERT INTO mb4 (point, data) VALUES ('01f300', UNHEX('f09f8c80')); <= U+1F300 CYCLONE $ mysqldump --no-defaults -u root -p test mb4 ... /*!40101 SET NAMES utf8 */; -- -- Dumping data for table `mb4` -- LOCK TABLES `mb4` WRITE; /*!40000 ALTER TABLE `mb4` DISABLE KEYS */; INSERT INTO `mb4` VALUES ('01f300','?'); <= can't dump correctly /*!40000 ALTER TABLE `mb4` ENABLE KEYS */; UNLOCK TABLES; Suggested fix: Please use utf8mb4 as default-character-set.