Bug #100064 Cannot remove padding 0x00 from a binary string with replace
Submitted: 1 Jul 2020 12:32 Modified: 2 Jul 2020 14:24
Reporter: Yushan ZHANG Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.7.30 OS:Any
Assigned to: CPU Architecture:Any
Tags: BINARY, REPLACE

[1 Jul 2020 12:32] Yushan ZHANG
Description:
mysql> select binary col1 from t1;
+------------------------------------------------------------------------+
| binary col1                                                            |
+------------------------------------------------------------------------+
| 1000110                                                                |
| NULL                                                                   |
+------------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> select col1 from t1 where binary col1 = 1000110 ;
+------------------------------------------------------------------------+
| col1                                                                   |
+------------------------------------------------------------------------+
| 1000110                                                                |
+------------------------------------------------------------------------+
1 row in set (0.00 sec)

-- the padding 0x00 are not removed
mysql> select replace(binary col1, '0x00','') from t1;
+------------------------------------------------------------------------+
| replace(binary col1, '0x00','')                                        |
+------------------------------------------------------------------------+
| NULL                                                                   |
| 1000110                                                                |
+------------------------------------------------------------------------+
2 rows in set (0.00 sec)

-- this should yield one row
mysql> select col1 from t1 where replace(binary col1, '0x00','') = '1000110' ;
Empty set (0.00 sec)

How to repeat:
create table t1 (col0 CHAR(179), col1 BINARY(70), col2 SMALLINT PRIMARY KEY);
insert into t1 values (NULL, '1000110', '-32768');
insert into t1 values (NULL, NULL, '0');
create index usygq on t1 (col1, col2);
select replace(binary col1, '0x00','') from t1;
select col1 from t1 where replace(binary col1, '0x00','') = '1000110' ;
[2 Jul 2020 12:23] MySQL Verification Team
Hi Mr. CHANG,

Thank you for your bug report.

However, this is not a bug.

Simply, I do not see `0x00` anywhere in that BINARY. Also, BINARY domains can't be treated as numeric or strings.

Not a bug.
[2 Jul 2020 13:26] Yushan ZHANG
This is the explanation of padding '0x00' in the documentation:

When BINARY values are stored, they are right-padded with the pad value to the specified length. The pad value is 0x00 (the zero byte). Values are right-padded with 0x00 for inserts, and no trailing bytes are removed for retrievals. All bytes are significant in comparisons, including ORDER BY and DISTINCT operations. 0x00 and space differ in comparisons, with 0x00 sorting before space. 

Example: For a BINARY(3) column, 'a ' becomes 'a \0' when inserted. 'a\0' becomes 'a\0\0' when inserted. Both inserted values remain unchanged for retrievals. 

https://dev.mysql.com/doc/refman/8.0/en/binary-varbinary.html

This is what actually stored in the table (YES, it is with padding `0x00`):

mysql> insert into t1 values (NULL, '1000110', '-32768');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values (NULL, NULL, '0');
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 values (NULL, '1000110 ', '-32767');
Query OK, 1 row affected (0.00 sec)

mysql> select hex(col1) from t1;
+----------------------------------------------------------------------------------------------------------------------------------------------+
| hex(col1)                                                                                                                                    |
+----------------------------------------------------------------------------------------------------------------------------------------------+
| NULL                                                                                                                                         |
| 31303030313130000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 |
| 31303030313130200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 |
+----------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

You could see the empty string '' is saved as `20` in the last row.

And the documentation for `BINARY` is consistent, every byte is significant in comparison:

BINARY expr 
The BINARY operator converts the expression to a binary string (a string that has the binary character set and binary collation). A common use for BINARY is to force a character string comparison to be done byte by byte using numeric byte values rather than character by character. The BINARY operator also causes trailing spaces in comparisons to be significant. For information about the differences between the binary collation of the binary character set and the _bin collations of nonbinary character sets, see Section 10.8.5, “The binary Collation Compared to _bin Collations”. 
https://dev.mysql.com/doc/refman/8.0/en/binary-varbinary.html

I agree my claim that: `select col1 from t1 where replace(binary col1, '0x00','') = '1000110';` should return one row is wrong, but this doesn't conflict with the claim that `replace` should be valid on the binary string.
[2 Jul 2020 13:45] Yushan ZHANG
-- doesn't work
mysql> select replace(hex(col1), '0x00', '') from t1;
+----------------------------------------------------------------------------------------------------------------------------------------------+
| replace(hex(col1), '0x00', '')                                                                                                               |
+----------------------------------------------------------------------------------------------------------------------------------------------+
| NULL                                                                                                                                         |
| 31303030313130000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 |
| 31303030313130200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 |
+----------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

-- works
mysql> select replace(hex(col1), '00', '') from t1;
+------------------------------+
| replace(hex(col1), '00', '') |
+------------------------------+
| NULL                         |
| 31303030313130               |
| 3130303031313020             |
+------------------------------+
3 rows in set (0.00 sec)
[2 Jul 2020 14:09] Yushan ZHANG
Thank you for the reminder. But the problem is not with the zero byte.

The replace function is expected to be valid on the binary strings.
If it cannot replace the `0x00`, it doesn't have the expected behavior.

For the padding `0x00`, it means '\0' if I understand correctly.

For this query, it is expected to remove the `0x00`:

mysql> select replace(col1, '0x00', '') from t1;
+------------------------------------------------------------------------+
| replace(col1, '0x00', '')                                              |
+------------------------------------------------------------------------+
| NULL                                                                   |
| 1000110                                                                |
| 1000110                                                                |
+------------------------------------------------------------------------+
3 rows in set (0.00 sec)
[2 Jul 2020 14:11] Yushan ZHANG
But why this would work:

mysql> select replace(col1, '\0', '') from t1;
+-------------------------+
| replace(col1, '\0', '') |
+-------------------------+
| NULL                    |
| 1000110                 |
| 1000110                 |
+-------------------------+
3 rows in set (0.00 sec

Is the '\0' equivalent to `0x00` when considering the paddings?
[2 Jul 2020 14:14] Yushan ZHANG
This is described in the documentation, but why you said that it is not a zero byte?

````
When BINARY values are stored, they are right-padded with the pad value to the specified length. The pad value is 0x00 (the zero byte)....

Example: For a BINARY(3) column, 'a ' becomes 'a \0' when inserted. 'a\0' becomes 'a\0\0' when inserted. Both inserted values remain unchanged for retrievals. 
````

Doesn't this mean `\0` and `0x00` are equivalent?
[2 Jul 2020 14:20] MySQL Verification Team
Hi,

You are attempting to replace empty binary string with binary zero, which is one and the same.

Also, this is not a forum for asking questions, but for repeatable test cases that demonstrate some buggy behaviour.

For questions, please go to forums.mysql.com .
[2 Jul 2020 14:24] Yushan ZHANG
I'm just trying to clarify my misunderstanding. 
Now I get the point, thank you.