Bug #71746 | mysqldump may use utf8mb4 as default-character-set | ||
---|---|---|---|
Submitted: | 17 Feb 2014 8:53 | ||
Reporter: | Sadao Hiratsuka | Email Updates: | |
Status: | Open | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S4 (Feature request) |
Version: | 5.6.16 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | mysqldump |
[17 Feb 2014 8:53]
Sadao Hiratsuka
[5 Mar 2015 22:32]
Stephen Van Bruwaene
I'm unable to dump tables that contain utf8mb4 character sets with emoji -- dump happens, but when I try to import, emoji are lost, and I get duplicate errors
[6 Mar 2015 5:31]
MySQL Verification Team
Can you dump/reload if you perform the dump with options like this? mysqldump --hex-blob --default-character-set=binary
[6 Sep 2015 9:05]
Daniël van Eeden
Test data: https://github.com/dveeden/mysqlunicodedata (this is a script to import the unicode character database into MySQL, it also has a dump of the data) The 4-byte characters get replaced by a '?' (why not �, that would be more correct). The solution is easy: set utf8mb4 as character set for mysqldump. $ mysqldump --skip-extended-insert unicodedata | grep DOLPHIN INSERT INTO `ucd` VALUES ('1F42C','?','DOLPHIN','So','0','ON','','','','','N','','','','',''); $ mysqldump --default-character-set=utf8mb4 --skip-extended-insert unicodedata | grep DOLPHIN INSERT INTO `ucd` VALUES ('1F42C','
[6 Sep 2015 9:06]
Daniël van Eeden
With workaround because bugs.mysql.com is not yet using utf8mb4 :( Test data: https://github.com/dveeden/mysqlunicodedata (this is a script to import the unicode character database into MySQL, it also has a dump of the data) The 4-byte characters get replaced by a '?' (why not �, that would be more correct). The solution is easy: set utf8mb4 as character set for mysqldump. $ mysqldump --skip-extended-insert unicodedata | grep DOLPHIN INSERT INTO `ucd` VALUES ('1F42C','?','DOLPHIN','So','0','ON','','','','','N','','','','',''); $ mysqldump --default-character-set=utf8mb4 --skip-extended-insert unicodedata | grep DOLPHIN INSERT INTO `ucd` VALUES ('1F42C','___ACTUAL__DOLPHIN__CHAR__','DOLPHIN','So','0','ON','','','','','N','','','','',''); This data is in a utf8mb4 column, mysqldump should detect that and - Give an error/warning if it can't properly dump the data - Use utf8mb4 Why not make utf8mb4 the default for mysqldump?
[6 Sep 2015 9:10]
Daniël van Eeden
This also affects mysqlpump, created a separate bugreport for that Bug #78348 mysqlpump support for 4-byte UTF-8
[6 Sep 2015 9:12]
Daniël van Eeden
For bugs.mysql.com there is this open bug... Bug #62458 Not possible to report issues involving utf8mb4 characters
[6 Sep 2015 9:15]
Daniël van Eeden
Fixing this could be as easy as: ------------------------------------------------------------------ diff --git a/client/base/mysql_connection_options.cc b/client/base/mysql_connection_options.cc index 4a11c89..97c78a2 100644 --- a/client/base/mysql_connection_options.cc +++ b/client/base/mysql_connection_options.cc @@ -161,7 +161,7 @@ MYSQL* Mysql_connection_options::create_connection() } else { - mysql_options(connection, MYSQL_SET_CHARSET_NAME, "utf8"); + mysql_options(connection, MYSQL_SET_CHARSET_NAME, "utf8mb4"); } if (this->m_plugin_dir.has_value()) mysql_options(connection, MYSQL_PLUGIN_DIR, ------------------------------------------------------------------
[7 Sep 2015 9:57]
Daniël van Eeden
I don't consider this 'S4 (Feature request)', but 'S2 (Serious)' because it silently drops the data.
[8 Sep 2015 5:30]
MySQL Verification Team
it does not matter what charset is used - one cannot mysqldump text data and import it properly. You have to use hex for all char fields. I filed this years ago, internally: Bug 14271638 - CHARACTER SET DUPLICATE HANDLING SEEMS BROKEN e.g. Try dump this table and import it: --- set names utf8; drop table if exists t1; create table t1(a varchar(1) character set armscii8 collate armscii8_bin, unique key(a))engine=innodb; insert into t1 values (0x29),(0xa4); ---
[5 Feb 2016 15:16]
Brad Kent
MySQL Workbench also passes default-character-set=utf8 and the option is not configurable
[1 Jan 2018 1:31]
Daniel Black
utf8mb4 made the default in ebaff9fffc958030a57d8ea7f1f2d527cac1df64 / WL#10353