Bug #67930 mysqldump POINT field format incompatible with UTF-8, hex-blob doesn't apply
Submitted: 17 Dec 2012 18:58 Modified: 27 Dec 2012 17:21
Reporter: Martin Hradil Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Backup Severity:S3 (Non-critical)
Version:5.5 OS:Any
Assigned to: CPU Architecture:Any

[17 Dec 2012 18:58] Martin Hradil
Description:
mysqldump dumps POINT fields as a string with bytes >127 unescaped.
Since binary POINT data isn't traditionally UTF-8 compatible, such data can't be directly imported with charset UTF-8 which is a problem if there are any UTF-8 fields present.

Additionally, specifying --hex-blob on the mysqldump command line has no effect since it doesn't affect POINT.

Sample output from less(1) - note that the ^A, <E6> <99> and <9C> are actually unescaped bytes:

INSERT INTO `test` VALUES (
    'Žluťoučký kůň', '\0\0\0\0^A^A\0\0\0<E6>>9\n^P<99>H@3p@KW<9C>0@'
);

Sorry, couldn't test it on 5.6 but after reading mysqldump-5.6 manpage, I see no relevant changes.

How to repeat:
CREATE TABLE `test` (
  `utf8str` varchar(256),
  `point` point,
) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

INSERT INTO `test` VALUES (
   'Žluťoučký kůň', X'000000000101000000e63e390a109948403370404b579c3040'
);

Run mysqldump.
See output.

Try importing it back and see it either fail or break utf8str field encoding.

Suggested fix:
I can see a few alternatives:

* Add POINT to data types affected by --hex-blob, or ideally add an option to specify a list.

* Export POINT data as POINT(1.2, 3.4).
  Or possibly as AsWKT() output but then, MySQL would have to be able to import such data (ie. implicit GeomFromText()).

* Quote 8bit characters if they don't fit the current encoding.
[17 Dec 2012 19:39] Sveta Smirnova
Thank you for the report.

Why this is a problem for you? I tested restore and data was restored fine:

Before restore:

select hex(utf8str), hex(point) from test;
hex(utf8str)	hex(point)
C5BD6C75C5A56F75C48D6BC3BD206BC5AFC588	000000000101000000E63E390A109948403370404B579C3040

After restore:

select hex(utf8str), hex(point) from test;
hex(utf8str)	hex(point)
C5BD6C75C5A56F75C48D6BC3BD206BC5AFC588	000000000101000000E63E390A109948403370404B579C3040
[17 Dec 2012 19:46] Martin Hradil
> Why this is a problem for you? I tested restore and data was restored fine

Oh, sorry, I wasn't specific enough..

It's not a problem when you use mysql itself to restore the data.

It is a problem though when you try to use *any* other tool. Say, phpmyadmin. Or adminer. Basically once you use a web based tool or anything that expects some kind of consistent encoding, you need valid UTF-8 data.

And please don't say it's a bug in that other tool, it stops being so when all are affected.
[17 Dec 2012 20:12] Sveta Smirnova
Thank you for the feedback.

I am sorry, but I have to close it, because this is utility error.

PhpMyAdmin uses PHP. See how this dump works with PHP:

mysql> CREATE TABLE `test` (   `utf8str` varchar(256),   `point` point ) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Query OK, 0 rows affected (0.13 sec)

mysql> INSERT INTO `test` VALUES (    'Žluťoučký kůň', X'000000000101000000e63e390a109948403370404b579c3040' );de_ci;
Query OK, 1 row affected (0.00 sec)

mysql> \q

$ ../client/mysqldump -uroot -S ./var/tmp/mysqld.1.sock test >dump.sql

mysql> select hex(utf8str), hex(point) from test;
+----------------------------------------+----------------------------------------------------+
| hex(utf8str)                           | hex(point)                                         |
+----------------------------------------+----------------------------------------------------+
| C5BD6C75C5A56F75C48D6BC3BD206BC5AFC588 | 000000000101000000E63E390A109948403370404B579C3040 |
+----------------------------------------+----------------------------------------------------+
1 row in set (0.00 sec)

mysql> \q

$ php -r '$res = mysqli_connect("127.0.0.1", "root", "", "test", 13000); $query = file_get_contents("./dump.sql"); mysqli_multi_query($res, $query); var_dump(mysqli_error($res));'
string(0) ""

mysql> select hex(utf8str), hex(point) from test;
+----------------------------------------+----------------------------------------------------+
| hex(utf8str)                           | hex(point)                                         |
+----------------------------------------+----------------------------------------------------+
| C5BD6C75C5A56F75C48D6BC3BD206BC5AFC588 | 000000000101000000E63E390A109948403370404B579C3040 |
+----------------------------------------+----------------------------------------------------+
1 row in set (0.00 sec)

So proper use of PHP allows to restore dump without any issue.
[26 Dec 2012 23:20] Martin Hradil
I'm sorry but that's not sufficient.
While PHP itself can concievably be used to do it properly, such data cannot be pasted in a web browser, that's the problem.

But, even if we ignore that, we can't ignore the *fact* that mysql simply dumps UTF-8 incompatible data when specifically asked to dump UTF-8, so that's the real issue here. And that, is definitely MySQL bug.
[26 Dec 2012 23:24] Martin Hradil
A follow-up question .. Regardless of whether you agree it's a bug or not, if I sent a patch for mysqldump that makes it possible to do:
  mysqldump --hex-blob=POINT,LINE ...
(so, adding an optional parameter to hex-blob - a comma separated list of types to add to the types blobbed by default), would it be accepted?
[27 Dec 2012 1:32] Sveta Smirnova
Thank you for the feedback.

> But, even if we ignore that, we can't ignore the *fact* that mysql
simply dumps UTF-8 incompatible data when specifically asked to dump
UTF-8, so that's the real issue here. And that, is definitely MySQL
bug.

This is how it worked since very long time. If we start rejecting this now we will break many old applications.

> A follow-up question .. Regardless of whether you agree it's a bug or
not, if I sent a patch for mysqldump that makes it possible to do:
  mysqldump --hex-blob=POINT,LINE ...
(so, adding an optional parameter to hex-blob - a comma separated list
of types to add to the types blobbed by default), would it be accepted?

It depends. Usually procedure is as follow:

1. You need to sign OCA agreement as described in "Contribution" tab in this bug report (http://bugs.mysql.com/bug.php?id=67930&contribs=1)

2. You need either open new feature request or convert this bug to feature request, then attach patch to the report.

3. Our development and product management teams will examine the patch and consider both if the feature is useful addition and if the patch is of good quality and can be added to the code. If both conditions are satisfied patch will be accepted. Also, if development thinks they can implement same feature in better way they can reject the patch, but implement the feature.
[27 Dec 2012 17:21] Martin Hradil
> This is how it worked since very long time.
> If we start rejecting this now we will break many old applications.

Agreed, changing the default is not an option, that's why I proposed those options.. at least extra quoting can't break anything. And changing default hex-blob behaviour probably wouldn't either, since POINT is a binary type anyway (but then it would have to be extended to all binary types to be consistent).

> It depends. Usually procedure is as follow:
> 1. You need to sign OCA agreement as described in "Contribution" tab in this
> bug report (http://bugs.mysql.com/bug.php?id=67930&contribs=1)

Ok, no problem with that, assuming it doesn't stop me from contributing the patch *as myself* to different projects (say, MariaDB). And if I read it correctly, it's fine.

> 2. You need either open new feature request or convert this bug to feature
> request, then attach patch to the report.

OK, I'll create a new one (referencing this one) so that the intent is clear without reading through everything in here.

> 3. ...

Thank you for your guidance, much appreciated