Bug #79317 | MySQL 5.6.27 trying to validate binary data as UTF-8 | ||
---|---|---|---|
Submitted: | 17 Nov 2015 22:15 | Modified: | 16 Dec 2015 13:52 |
Reporter: | Giovanni Go | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Data Types | Severity: | S3 (Non-critical) |
Version: | 5.5.46, 5.6.27, 5.7.9.1 | OS: | Any |
Assigned to: | Guilhem Bichot | CPU Architecture: | Any |
[17 Nov 2015 22:15]
Giovanni Go
[2 Dec 2015 11:20]
Chiranjeevi Battula
Hello Giovanni Go, Thank you for the bug report. Verified this behavior on MySql Server 5.5.46, 5.6.27, 5.7.9.1 as described. Thanks, Chiranjeevi.
[2 Dec 2015 11:20]
Chiranjeevi Battula
Test results: sql scripts: CREATE TABLE user ( uuid binary(16) NOT NULL, PRIMARY KEY (`uuid`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; INSERT INTO user (uuid) VALUES ('admin𝌆'); select * from user; SHOW WARNINGS; 5.6.27 mysql> status -------------- C:\Program Files\MySQL\MySQL Server 5.6\bin\mysql.exe Ver 14.14 Distrib 5.6.27, for Win64 (x86_64) Connection id: 8 Current database: test Current user: root@localhost SSL: Not in use Using delimiter: ; Server version: 5.6.27-log MySQL Community Server (GPL) Protocol version: 10 Connection: localhost via TCP/IP Server characterset: utf8 Db characterset: utf8 Client characterset: utf8 Conn. characterset: utf8 TCP port: 3306 Uptime: 1 hour 14 min 6 sec warnings message: # Level, Code, Message Warning, 1300, Invalid utf8 character string: 'F09D8C' 5.7.9.1 mysql> status -------------- C:\Program Files\MySQL\MySQL Server 5.7\bin\mysql.exe Ver 14.14 Distrib 5.7.9, for Win64 (x86_64) Connection id: 5 Current database: test Current user: root@localhost SSL: Not in use Using delimiter: ; Server version: 5.7.9-log MySQL Community Server (GPL) Protocol version: 10 Connection: localhost via TCP/IP Server characterset: utf8 Db characterset: utf8 Client characterset: utf8 Conn. characterset: utf8 TCP port: 5791 Uptime: 28 min 35 sec warnings message: # Level, Code, Message Warning, 1300, Invalid utf8 character string: 'F09D8C' 5.5.46 mysql> status -------------- C:\Program Files\MySQL\MySQL Server 5.5\bin\mysql.exe Ver 14.14 Distrib 5.5.46, for Win64 (x86) Connection id: 7 Current database: test Current user: root@localhost SSL: Not in use Using delimiter: ; Server version: 5.5.46-log MySQL Community Server (GPL) Protocol version: 10 Connection: localhost via TCP/IP Server characterset: utf8 Db characterset: utf8 Client characterset: utf8 Conn. characterset: utf8 TCP port: 5546 Uptime: 22 min 48 sec warnings message: # Level, Code, Message Warning, 1300, Invalid utf8 character string: 'F09D8C'
[14 Dec 2015 10:15]
Ashim Mishra
Is there any work-around available for this bug ?
[14 Dec 2015 18:16]
Giovanni Go
Depends on how this bug affects you. In particular with Django, Django will treat any MySQL warnings as exception if you have DEBUG turned on. Obviously, you can turn it off or disable this behavior by editing django/db/backends/mysql/base.py:51 where it checks if DEBUG is true. This is not ideal as it still would be useful to know about about warnings and setting DEBUG to true is useful in our dev environments. Another route is to send the uuid as hex. Not ideal as that would 32 characters instead of 16 so doubling the amount the data that needs to be sent. Hope that helps!
[15 Dec 2015 14:30]
Guilhem Bichot
Hi Giovanni, replace: cursor.execute(""" INSERT INTO `user` (uuid) VALUES (%s) """, my_uuid) with cursor.execute(""" INSERT INTO `user` (uuid) VALUES (_binary %s) """, my_uuid) Mind the underscore. It's "_binary", not "binary". This "_binary" tells MySQL that the following string is to be interpreted as binary, not to be interpreted/validated as utf8. http://dev.mysql.com/doc/refman/5.7/en/charset-literal.html Please let us know if it solves the issue.
[15 Dec 2015 15:19]
Guilhem Bichot
There are also per-session settings influencing how the query is treated: set character_set_connection=binary And also SET NAMES. More details at: http://dev.mysql.com/doc/refman/5.7/en/charset-connection.html
[15 Dec 2015 18:22]
Giovanni Go
That seems to work! By any chance you know how to get this to work in the Django ORM so it will specify '_binary' before %s?
[15 Dec 2015 20:28]
Guilhem Bichot
Django cannot guess when validation is wanted and when it's not, I think. Only in the latter case is adding "_binary" a good idea. For sure, I don't know how to configure Django...
[16 Dec 2015 13:52]
Guilhem Bichot
A recap: - illegal utf8 characters can be harmful to users, so MySQL must warn about their presence. - Django is sending a binary string, which has no reason to be valid utf8; but by default all strings are in the value of @@character_set_connection (likely utf8), so warnings are raised. Such warnings undesirable, but MySQL cannot guess the string has no reason to be valid utf8. The solution is to tell MySQL, see below. - So the solution is * to alter connection's charset settings (see http://dev.mysql.com/doc/refman/5.7/en/charset-connection.html), * or to prefix the string with "_binary" (a charset introducer, see http://dev.mysql.com/doc/refman/5.7/en/charset-literal.html ). - I suggest filing a bug against Django, that it should apply some suggestion above to insert UUIDs. An alternate suggestion for Django is to send the UUID in x'' notation: https://dev.mysql.com/doc/refman/5.7/en/hexadecimal-literals.html : insert into user (uuid) values(x'1234af0d...'); - Because it's not a MySQL bug, I'm closing it. But we can continue the discussion here if needed.
[25 Jan 2016 2:14]
Ryan Brothers
I am running into a similar issue with trying to store binary data in a mediumblob field. The above suggestion to prefix the binary data with _binary removes the warning, but if I mysqldump my table and reload it, then a separate warning appears for every row in my table. Is there a way to suppress the warnings there too?
[25 Jan 2016 10:27]
Guilhem Bichot
I can repeat the problem just above, with 5.7.11. I used a BINARY column but BLOB could show it as well: create table t(a binary(100)); insert into t values(x'EE0C6D03C34C11E5B1640026B977EB17'); select hex(a) from t; EE0C6D03C34C11E5B1640026B977EB17000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 All correct, now dump it: $ ../client/mysqldump -uroot -S var/tmp/mysqld.1.sock test >/tmp/dump.sql $ cat /tmp/dump.sql -- MySQL dump 10.13 Distrib 5.7.12, for Linux (x86_64) ... /*!40101 SET NAMES utf8 */; ... INSERT INTO `t` VALUES ('\� m\�L\��d\0&�w\�\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0'); So, raw bytes are in the dump. Now, load this dump with MySQL command-line client: mysql> source /tmp/dump.sql ... INSERT INTO `t` VALUES ('\� m\�L\��d\0&�w\�\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0') Query OK, 1 row affected, 1 warning (0,00 sec) Warning (Code 1300): Invalid utf8 character string: 'EE0C6D' The content is correctly inserted, fortunately: mysql> select hex(a) from t; EE0C6D03C34C11E5B1640026B977EB17000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 | So the problem is this warning during restore. We need to analyse this.
[25 Jan 2016 22:36]
Ryan Brothers
Guilhem - thanks for confirming the issue. I am running into it on 5.6.28 too. Should I open a new ticket since this ticket is closed, or would you be able to re-open this ticket?
[26 Jan 2016 9:06]
Guilhem Bichot
This report remains closed as "not a bug". I made a new report for the mysqldump issue : http://bugs.mysql.com/bug.php?id=80150 .
[26 Jan 2016 9:43]
Guilhem Bichot
Filed https://code.djangoproject.com/ticket/26139 to see if Django's INSERT could be modified.
[25 Apr 2016 13:30]
Guilhem Bichot
Posted by developer: - mysqldump issue has been filed as separate bug - Django team pushed a fix in their code. - Closing this report.