Bug #25131 Regression with UNHEX() and binary fields.
Submitted: 18 Dec 2006 9:07 Modified: 21 Feb 2007 19:08
Reporter: Alexis ROYER Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.0.26/5.0BK/5.1BK OS:Windows (Windows XP/Linux)
Assigned to: Paul DuBois CPU Architecture:Any
Tags: BINARY, UNHEX

[18 Dec 2006 9:07] Alexis ROYER
Description:
UNHEX() function returns NULL on the result of HEX() stored in a BINARY field.

How to repeat:
Here is a log to reproduce.

mysql> SELECT @@VERSION;
+-------------------------+
| @@VERSION               |
+-------------------------+
| 5.0.26-community-nt-log |
+-------------------------+
1 row in set (0.20 sec)

mysql> CREATE TABLE mytable(mybin BINARY(128));
Query OK, 0 rows affected (0.17 sec)

mysql> SHOW CREATE TABLE mytable;
+---------+----------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                       |
+---------+----------------------------------------------------------------------------------------------------+
| mytable | CREATE TABLE `mytable` (
  `mybin` binary(128) default NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+---------+----------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> INSERT INTO mytable VALUES(HEX('hello'));
Query OK, 1 row affected (0.06 sec)

mysql> SELECT * FROM mytable;
+----------------------------------------------------------------------------------------------------------------------------------+
| mybin                                                                                                                            |
+----------------------------------------------------------------------------------------------------------------------------------+
| 68656C6C6F                                                                                                                       |
+----------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT UNHEX(mybin) FROM mytable;
+--------------+
| UNHEX(mybin) |
+--------------+
| NULL         |
+--------------+
1 row in set (0.00 sec)

mysql> SELECT UNHEX(CAST(mybin AS CHAR)) FROM mytable;
+----------------------------+
| UNHEX(CAST(mybin AS CHAR)) |
+----------------------------+
| NULL                       |
+----------------------------+
1 row in set (0.00 sec)

Suggested fix:
As you can see in the log, I tried casting the binary field in some string to have UNHEX() work better, but it didn't work the way I do.
I guess MySQL 4.1.10-nt had some implicit data type conversion between binary and varchars since it worked in this previous version.
[18 Dec 2006 11:30] MySQL Verification Team
Thank you for the bug report.

mysql> SELECT @@VERSION;
+--------------+
| @@VERSION    |
+--------------+
| 4.1.23-debug |
+--------------+
1 row in set (0.01 sec)

mysql> CREATE TABLE mytable(mybin BINARY(128));
Query OK, 0 rows affected (0.03 sec)

mysql> SHOW CREATE TABLE mytable\G
*************************** 1. row ***************************
       Table: mytable
Create Table: CREATE TABLE `mytable` (
  `mybin` binary(128) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> INSERT INTO mytable VALUES(HEX('hello'));
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM mytable;
+------------+
| mybin      |
+------------+
| 68656C6C6F |
+------------+
1 row in set (0.00 sec)

mysql> SELECT UNHEX(mybin) FROM mytable;
+--------------+
| UNHEX(mybin) |
+--------------+
| hello        |
+--------------+
1 row in set (0.00 sec)
-----------------------------------------------------------------------------------

mysql> SELECT @@VERSION;
+--------------+
| @@VERSION    |
+--------------+
| 5.0.32-debug | 
+--------------+
1 row in set (0.00 sec)

mysql> CREATE TABLE mytable(mybin BINARY(128));
Query OK, 0 rows affected (0.01 sec)

mysql> SHOW CREATE TABLE mytable\G
*************************** 1. row ***************************
       Table: mytable
Create Table: CREATE TABLE `mytable` (
  `mybin` binary(128) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> INSERT INTO mytable VALUES(HEX('hello'));
Query OK, 1 row affected (0.01 sec)

mysql> SELECT * FROM mytable\G
*************************** 1. row ***************************
mybin: 68656C6C6F
1 row in set (0.00 sec)

mysql> SELECT UNHEX(mybin) FROM mytable;
+--------------+
| UNHEX(mybin) |
+--------------+
| NULL         | 
+--------------+
1 row in set (0.00 sec)
[24 Jan 2007 22:22] Timothy Smith
Hello, Alexis.

First, thanks for filing this bug report; we've looked closely at it and agree that it's a surprising difference from previous behavior.

I'm changing this to a Documentation bug, because I want to give our documentation team a chance to improve the manual if they see fit.

This behavior change is a result of the incompatible change that is listed on our 4.1 -> 5.0 upgrade guide here:

http://dev.mysql.com/doc/refman/5.0/en/upgrading-from-4-1.html

=================

Warning: Incompatible change. For BINARY columns, the pad value and how it is handled has changed as of MySQL 5.0.15. The pad value for inserts now is 0x00 rather than space, and there is no stripping of the pad value for retrievals. For details, see Section 11.4.2, “The BINARY and VARBINARY Types”.

=================

This change can affect many results, not just the results of how UNHEX() works.    Since this is an intentional feature change in the 5.0 version, it's not a bug that will be reverted.

Alexis, the best way to handle this is to convert your data to an appropriate data type for 5.0 and later versions.  The BINARY type should be used only for fixed-width values which store true binary data, where NUL bytes (0x00) are significant.

The closest alternative, for your purposes, would be CHAR(128) BINARY, which will behave like the old (4.1 and earlier) BINARY type.  That is, it will pad the end of the value with spaces, strip any trailing spaces when selecting the value, and will sort and compare values in byte-by-byte fashion (using the binary collation for the column's character set; e.g., latin1_bin).

Other alternatives which will also work are VARBINARY, TINYBLOB, VARCHAR BINARY, etc.  And, if you're only storing the hex representation of some data in this field, then using a plain CHAR or VARCHAR field will actually work best, with a case-insensitive collation: you'd want, for example, 'd3E4' = 'D3e4' to be true, since they both represent the same data when unhex()d.

For the documentation team:  I think it would be helpful to describe, in the binary-varbinary.html page, why it makes sense for BINARY to not strip trailing zeros.  Imagine, for example, storing IP addresses as BINARY(4).  Storing the value of 0xffffff00, and retrieving 0x00ffffff, would be a bad thing.  Also, if you notice any other way to improve the documentation around this incompatible behavior change in 5.0, it probably deserves a bit more attention.

Best regards,

Timothy
[21 Feb 2007 19:08] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.

The real issue has been misidentified by the preceding
analysis (in my opinion). The explanation is fairly simple, 
although it will be complicated by the presence of an
unrelated bug that affects some of the results shown for 
earlier discussion in this bug report.

In regard to UNHEX(), the change in BINARY columns such
that trailing nul bytes no longer are stripped is not the
issue. That only serves to highlight the actual behavior of 
the UNHEX() function: UNHEX() returns NULL if it encounters
_any_ non-hexadecimal character in the argument. It doesn't
matter whether the argument comes from a BINARY column,
and the "bad" character need not be a nul byte. It can be
any character outside the range '0' .. '9', 'A' .. 'F',
'a' .. 'f'. Example:

mysql> SELECT UNHEX('GG');
+-------------+
| UNHEX('GG') |
+-------------+
| NULL        |
+-------------+

(Note: this result appears to contradict the result shown 
by Tim on 24 Jan, where an argument of '6666  ' produces
a non-NULL value. That result is spurious, and is due to
a bug in some versions of the mysql client that displayed
NULL values as spaces. You will see spaces for MySQL 5.0.19
through 5.0.25, and 5.1.8 through 5.1.11, I believe.)

I will attempt to clarify the description for UNHEX() in
the manual with regard to legal input strings:

The characters in the argument string must be legal
hexadecimal digits: '0' .. '9', 'A' .. 'F', 'a' .. 'f'.
If UNHEX() encounters any non-hexadecimal digits in the
argument, it returns NULL:

mysql> SELECT UNHEX('GG');
+-------------+
| UNHEX('GG') |
+-------------+
| NULL        |
+-------------+

A NULL result can occur if the argument to UNHEX() is a
BINARY column, because values are padded with 0x00 bytes
when stored but those bytes are not stripped on retrieval.
For example 'aa' is stored into a CHAR(3) column as 'aa
' and retrieved as 'aa' (with the trailing pad space
stripped), so UNHEX() for the column value returns 'A'.
By contrast 'aa' is stored into a BINARY(3) column as
'aa\0' and retrieved as 'aa\0' (with the trailing pad 0x00
byte not stripped). '\0' is not a legal hexadecimal digit,
so UNHEX() for the column value returns NULL.