Bug #79317 MySQL 5.6.27 trying to validate binary data as UTF-8
Submitted: 17 Nov 2015 22:15 Modified: 16 Dec 2015 13:52
Reporter: Giovanni Go Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Data Types Severity:S3 (Non-critical)
Version:5.5.46, 5.6.27, 5.7.9.1 OS:Any
Assigned to: Guilhem Bichot CPU Architecture:Any

[17 Nov 2015 22:15] Giovanni Go
Description:
We use UUID stored as binary(16) PK to identify users.

We're seeing an issue where MySQL is trying to validate binary as UTF8 starting with MySQL 5.6.27 when trying to insert (and presumingly select) users. It will insert the row, but since Django treats MySQL warnings as exception, Django will bail out.

I'm not sure why MySQL trying to do this validation as per the doc:

[quote]
The BINARY and VARBINARY types are similar to CHAR and VARCHAR, except that they contain binary strings rather than non-binary strings. That is, they contain byte strings rather than character strings. This means that they have no character set, and sorting and comparison are based on the numeric values of the bytes in the values. 
[/quote]
[url=http://dev.mysql.com/doc/refman/5.7/en/binary-varbinary.html]Reference[/url]

I believe this is the commit that is breaking things for us: https://github.com/mysql/mysql-server/commit/33a2e5abd.

Status
---
[code]
mysql> status
--------------
mysql  Ver 14.14 Distrib 5.7.9, for osx10.11 (x86_64) using  EditLine wrapper

Connection id:		90
Current database:	
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.7.9-log Homebrew
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	utf8
Db     characterset:	utf8
Client characterset:	utf8
Conn.  characterset:	utf8
UNIX socket:		/tmp/mysql.sock
[/code]
Note that above shows version 5.7.9, but this started with 5.6.27

How to repeat:
Setup: Python 2.7, Django 1.6.6, MySQL-python 1.2.5, Ubuntu 14.04/OSX

Sample Python 2.7 code
[code]
import MySQLdb
import uuid

db = MySQLdb.connect('localhost', user='root', passwd='')

cursor = db.cursor()
cursor.execute("DROP DATABASE IF EXISTS `uuidtest`")
cursor.execute("CREATE DATABASE `uuidtest`")

db.select_db("uuidtest")
cursor.execute("""
          CREATE TABLE `user` (
              `uuid` binary(16) NOT NULL,
              PRIMARY KEY (`uuid`)
            ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
          """)
db.commit()

my_uuid = (uuid.uuid4().bytes,)

# Insert UUIDs into database
cursor.execute("""
    INSERT INTO `user` (uuid)
    VALUES (%s)
""", my_uuid)
db.commit()
cursor.close()
[/code]

Output
---
[code]./simple.py:27: Warning: Invalid utf8 character string: 'BE7BB6'
  """, my_uuid)[/code]

---
my.cnf
---
Should be using the default
[2 Dec 2015 11:20] Chiranjeevi Battula
Hello Giovanni Go,

Thank you for the bug report.
Verified this behavior on MySql Server 5.5.46, 5.6.27, 5.7.9.1 as described.

Thanks,
Chiranjeevi.
[2 Dec 2015 11:20] Chiranjeevi Battula
Test results:

sql scripts:

CREATE TABLE user ( uuid binary(16) NOT NULL, PRIMARY KEY (`uuid`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
 
INSERT INTO user (uuid) VALUES ('admin𝌆');

select * from user;
 
SHOW WARNINGS;

5.6.27

mysql> status
--------------
C:\Program Files\MySQL\MySQL Server 5.6\bin\mysql.exe  Ver 14.14 Distrib 5.6.27, for Win64 (x86_64)

Connection id:          8
Current database:       test
Current user:           root@localhost
SSL:                    Not in use
Using delimiter:        ;
Server version:         5.6.27-log MySQL Community Server (GPL)
Protocol version:       10
Connection:             localhost via TCP/IP
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
TCP port:               3306
Uptime:                 1 hour 14 min 6 sec

warnings message:

# Level, Code, Message
Warning, 1300, Invalid utf8 character string: 'F09D8C'

5.7.9.1

mysql> status
--------------
C:\Program Files\MySQL\MySQL Server 5.7\bin\mysql.exe  Ver 14.14 Distrib 5.7.9, for Win64 (x86_64)

Connection id:          5
Current database:       test
Current user:           root@localhost
SSL:                    Not in use
Using delimiter:        ;
Server version:         5.7.9-log MySQL Community Server (GPL)
Protocol version:       10
Connection:             localhost via TCP/IP
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
TCP port:               5791
Uptime:                 28 min 35 sec

warnings message:

# Level, Code, Message
Warning, 1300, Invalid utf8 character string: 'F09D8C'

5.5.46

mysql> status
--------------
C:\Program Files\MySQL\MySQL Server 5.5\bin\mysql.exe  Ver 14.14 Distrib 5.5.46, for Win64 (x86)

Connection id:          7
Current database:       test
Current user:           root@localhost
SSL:                    Not in use
Using delimiter:        ;
Server version:         5.5.46-log MySQL Community Server (GPL)
Protocol version:       10
Connection:             localhost via TCP/IP
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
TCP port:               5546
Uptime:                 22 min 48 sec

warnings message:

# Level, Code, Message
Warning, 1300, Invalid utf8 character string: 'F09D8C'
[14 Dec 2015 10:15] Ashim Mishra
Is there any work-around available for this bug ?
[14 Dec 2015 18:16] Giovanni Go
Depends on how this bug affects you.

In particular with Django, Django will treat any MySQL warnings as exception if you have DEBUG turned on. Obviously, you can turn it off or disable this behavior by editing django/db/backends/mysql/base.py:51 where it checks if DEBUG is true. This is not ideal as it still would be useful to know about about warnings and setting DEBUG  to true is useful in our dev environments.

Another route is to send the uuid as hex. Not ideal as that would 32 characters instead of 16 so doubling the amount the data that needs to be sent.

Hope that helps!
[15 Dec 2015 14:30] Guilhem Bichot
Hi Giovanni,

replace:

cursor.execute("""
    INSERT INTO `user` (uuid)
    VALUES (%s)
""", my_uuid)

with

cursor.execute("""
    INSERT INTO `user` (uuid)
    VALUES (_binary %s)
""", my_uuid)

Mind the underscore. It's "_binary", not "binary".
This "_binary" tells MySQL that the following string is to be interpreted as binary, not to be interpreted/validated as utf8.
http://dev.mysql.com/doc/refman/5.7/en/charset-literal.html

Please let us know if it solves the issue.
[15 Dec 2015 15:19] Guilhem Bichot
There are also per-session settings influencing how the query is treated:
set character_set_connection=binary
And also SET NAMES.
More details at:
http://dev.mysql.com/doc/refman/5.7/en/charset-connection.html
[15 Dec 2015 18:22] Giovanni Go
That seems to work!

By any chance you know how to get this to work in the Django ORM so it will specify '_binary' before %s?
[15 Dec 2015 20:28] Guilhem Bichot
Django cannot guess when validation is wanted and when it's not, I think. Only in the latter case is adding "_binary" a good idea. For sure, I don't know how to configure Django...
[16 Dec 2015 13:52] Guilhem Bichot
A recap:

- illegal utf8 characters can be harmful to users, so MySQL must warn about their presence.

- Django is sending a binary string, which has no reason to be valid utf8; but by default all strings are in the value of @@character_set_connection (likely utf8), so warnings are raised. Such warnings undesirable, but MySQL cannot guess the string has no reason to be valid utf8. The solution is to tell MySQL, see below.

- So the solution is
  * to alter connection's charset settings (see http://dev.mysql.com/doc/refman/5.7/en/charset-connection.html),
  * or to prefix the string with "_binary" (a charset introducer, see
http://dev.mysql.com/doc/refman/5.7/en/charset-literal.html ).

- I suggest filing a bug against Django, that it should apply some suggestion above to insert UUIDs. An alternate suggestion for Django is to send the UUID in x'' notation:
https://dev.mysql.com/doc/refman/5.7/en/hexadecimal-literals.html :
insert into user (uuid) values(x'1234af0d...');

- Because it's not a MySQL bug, I'm closing it. But we can continue the discussion here if needed.
[25 Jan 2016 2:14] Ryan Brothers
I am running into a similar issue with trying to store binary data in a mediumblob field.  The above suggestion to prefix the binary data with _binary removes the warning, but if I mysqldump my table and reload it, then a separate warning appears for every row in my table.  Is there a way to suppress the warnings there too?
[25 Jan 2016 10:27] Guilhem Bichot
I can repeat the problem just above, with 5.7.11. I used a BINARY column but BLOB could show it as well:

create table t(a binary(100));
insert into t values(x'EE0C6D03C34C11E5B1640026B977EB17');
select hex(a) from t;
EE0C6D03C34C11E5B1640026B977EB17000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

All correct, now dump it:

$ ../client/mysqldump -uroot -S var/tmp/mysqld.1.sock test >/tmp/dump.sql

$ cat /tmp/dump.sql 
-- MySQL dump 10.13  Distrib 5.7.12, for Linux (x86_64)
...
/*!40101 SET NAMES utf8 */;
...
INSERT INTO `t` VALUES ('\�
                           m\�L\��d\0&�w\�\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0');

So, raw bytes are in the dump.
Now, load this dump with MySQL command-line client:
mysql> source /tmp/dump.sql
...
INSERT INTO `t` VALUES ('\�
                           m\�L\��d\0&�w\�\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0')
Query OK, 1 row affected, 1 warning (0,00 sec)

Warning (Code 1300): Invalid utf8 character string: 'EE0C6D'

The content is correctly inserted, fortunately:
mysql> select hex(a) from t;
EE0C6D03C34C11E5B1640026B977EB17000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 |

So the problem is this warning during restore.
We need to analyse this.
[25 Jan 2016 22:36] Ryan Brothers
Guilhem - thanks for confirming the issue.  I am running into it on 5.6.28 too.  Should I open a new ticket since this ticket is closed, or would you be able to re-open this ticket?
[26 Jan 2016 9:06] Guilhem Bichot
This report remains closed as "not a bug". I made a new report for the mysqldump issue : http://bugs.mysql.com/bug.php?id=80150 .
[26 Jan 2016 9:43] Guilhem Bichot
Filed https://code.djangoproject.com/ticket/26139 to see if Django's INSERT could be modified.
[25 Apr 2016 13:30] Guilhem Bichot
Posted by developer:
 
- mysqldump issue has been filed as separate bug
- Django team pushed a fix in their code.
- Closing this report.