Bug #69666 Strange result types for LPAD/RPAD and CONCAT when mixing binary and non-binary
Submitted: 3 Jul 2013 19:09 Modified: 22 Jul 2014 19:40
Reporter: Arthur O'Dwyer Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Data Types Severity:S3 (Non-critical)
Version:5.5.31, 5.6.21 OS:Any
Assigned to: CPU Architecture:Any

[3 Jul 2013 19:09] Arthur O'Dwyer
Description:
$ mysql --version
mysql  Ver 14.14 Distrib 5.5.31, for debian-linux-gnu (x86_64) using readline 6.2
$ mysqld --version
mysqld  Ver 5.5.31-0ubuntu0.12.04.2 for debian-linux-gnu on x86_64 ((Ubuntu))

This bug report is about the behavior of LPAD/RPAD and CONCAT in the type system. The return type of these functions varies depending on their argument types, which is somewhat expected --- but the return type *ALSO* varies depending on the constant-ness of the arguments, and on their order. This bad behavior is unpredictable and could lead to bugs; it's also impossible to explain to users.

Here's a truth table of what I observed with MySQL 5.5. My test program is pasted into the "How to repeat" section of this report.

LHS constant? LHS binary? RHS constant? RHS binary? ==> LPAD/RPAD result binary? CONCAT result binary? (What would we expect?)
T T T T ==> T T (T)
T T T F ==> T T (T)
T T F T ==> T T (T)
T T F F ==> T F (T) (Why is CONCAT's behavior different from LPAD's here?)
T F T T ==> T T (T)
T F T F ==> F F (F)
T F F T ==> T T (T)
T F F F ==> F F (F)
F T T T ==> T T (T)
F T T F ==> T T (T)
F T F T ==> T T (T)
F T F F ==> T T (T)
F F T T ==> F F (T) (Why is this not "T T", since it involves binary data?)
F F T F ==> F F (F)
F F F T ==> T T (T)
F F F F ==> F F (F)

How to repeat:
drop database if exists d;
create database d;
use d;
create table foo (bpiroshk varbinary(20), cpiroshk varchar(20), bA varbinary(5), cA varchar(5));
insert into foo values (_binary "пирожки", "пирожки", _binary "A", "A");

## These SELECT queries will return "Aпирожки" if the LPAD is done binary-wise,
## or "AAAAAAAAпирожки" if it's done character-wise.

select lpad(_binary "пирожки", 15, _binary "A") from foo;
select lpad(_binary "пирожки", 15, "A") from foo;
select lpad(_binary "пирожки", 15, bA) from foo;
select lpad(_binary "пирожки", 15, cA) from foo;
select lpad("пирожки", 15, _binary "A") from foo;
select lpad("пирожки", 15, "A") from foo;
select lpad("пирожки", 15, bA) from foo;
select lpad("пирожки", 15, cA) from foo;

select lpad(bpiroshk, 15, _binary "A") from foo;
select lpad(bpiroshk, 15, "A") from foo;
select lpad(bpiroshk, 15, bA) from foo;
select lpad(bpiroshk, 15, cA) from foo;
select lpad(cpiroshk, 15, _binary "A") from foo;
select lpad(cpiroshk, 15, "A") from foo;
select lpad(cpiroshk, 15, bA) from foo;
select lpad(cpiroshk, 15, cA) from foo;

## Given this behavior of LPAD, we can test the return type of CONCAT.
## These SELECT queries will return "AпирожкиA" if the LPAD is done binary-wise,
## or "AAAAAAAAпирожкиA" if it's done character-wise; which from above we can
## infer that the CONCAT itself was done binary-wise or character-wise.

select lpad(concat(_binary "пирожки", _binary "A"), 16, "A") from foo;
select lpad(concat(_binary "пирожки", "A"), 16, "A") from foo;
select lpad(concat(_binary "пирожки", bA), 16, "A") from foo;
select lpad(concat(_binary "пирожки", cA), 16, "A") from foo;
select lpad(concat("пирожки", _binary "A"), 16, "A") from foo;
select lpad(concat("пирожки", "A"), 16, "A") from foo;
select lpad(concat("пирожки", bA), 16, "A") from foo;
select lpad(concat("пирожки", cA), 16, "A") from foo;

select lpad(concat(bpiroshk, _binary "A"), 16, "A") from foo;
select lpad(concat(bpiroshk, "A"), 16, "A") from foo;
select lpad(concat(bpiroshk, bA), 16, "A") from foo;
select lpad(concat(bpiroshk, cA), 16, "A") from foo;
select lpad(concat(cpiroshk, _binary "A"), 16, "A") from foo;
select lpad(concat(cpiroshk, "A"), 16, "A") from foo;
select lpad(concat(cpiroshk, bA), 16, "A") from foo;
select lpad(concat(cpiroshk, cA), 16, "A") from foo;

Suggested fix:
We would very much like to see a simple rule implemented; for example, "If either operand is non-binary, the other operand is promoted to binary before the pad/concat operation is performed, and the result is binary. Otherwise the result is non-binary." There are only two deviations from this simple rule in the truth table above. Let's fix them!

There may also be bad behaviors when one or both operands are the empty string, since CONCAT has a special optimization to skip empty strings. The types of these arguments should still be taken into account when computing the return type.
[21 Jul 2014 18:10] Sveta Smirnova
Thank you for the report.

In the concat part you have few rows similar to select lpad(concat("пирожки", cA), 16, "A") from foo; I mean all have "A", but not bA or _binary "A" like in simple LPAD example. And result from some of these rows do not correspond to result of simple LPAD. Is this a typo? If not please explain why results should be same?
[21 Jul 2014 21:51] Arthur O'Dwyer
@Sveta: As described in the #comment above, I'm using lpad(X, 16, "A") as a proxy for collation(X). Originally this was because I didn't know the collation(X) builtin existed. Does this help clear up your confusion?

mysql> select 1 as 'LHS constant',1 as 'LHS binary',1 as 'RHS constant',1 as 'RHS binary', lpad(_binary "пирожки", 15, _binary "A") as 'lpad', collation(lpad(_binary "пирожки", 15, _binary "A")) as 'collation.....', coercibility(lpad(_binary "пирожки", 15, _binary "A")) as 'coerc' from foo union all
    -> select 1,1,1,0, lpad(_binary "пирожки", 15, "A"), collation(lpad(_binary "пирожки", 15, "A")), coercibility(lpad(_binary "пирожки", 15, "A")) from foo union all
    -> select 1,1,0,1, lpad(_binary "пирожки", 15, bA), collation(lpad(_binary "пирожки", 15, bA)), coercibility(lpad(_binary "пирожки", 15, bA)) from foo union all
    -> select 1,1,0,0, lpad(_binary "пирожки", 15, cA), collation(lpad(_binary "пирожки", 15, cA)), coercibility(lpad(_binary "пирожки", 15, cA)) from foo union all
    -> select 1,0,1,1, lpad("пирожки", 15, _binary "A"), collation(lpad("пирожки", 15, _binary "A")), coercibility(lpad("пирожки", 15, _binary "A")) from foo union all
    -> select 1,0,1,0, lpad("пирожки", 15, "A"), collation(lpad("пирожки", 15, "A")), coercibility(lpad("пирожки", 15, "A")) from foo union all
    -> select 1,0,0,1, lpad("пирожки", 15, bA), collation(lpad("пирожки", 15, bA)), coercibility(lpad("пирожки", 15, bA)) from foo union all
    -> select 1,0,0,0, lpad("пирожки", 15, cA), collation(lpad("пирожки", 15, cA)), coercibility(lpad("пирожки", 15, cA)) from foo;
+--------------+------------+--------------+------------+------------------------+-----------------+-------+
| LHS constant | LHS binary | RHS constant | RHS binary | lpad                   | collation.....  | coerc |
+--------------+------------+--------------+------------+------------------------+-----------------+-------+
|            1 |          1 |            1 |          1 | Aпирожки               | binary          |     4 |
|            1 |          1 |            1 |          0 | Aпирожки               | binary          |     4 |
|            1 |          1 |            0 |          1 | Aпирожки               | binary          |     2 |
|            1 |          1 |            0 |          0 | Aпирожки               | utf8_general_ci |     2 |
|            1 |          0 |            1 |          1 | Aпирожки               | binary          |     4 |
|            1 |          0 |            1 |          0 | AAAAAAAAпирожки        | utf8_general_ci |     4 |
|            1 |          0 |            0 |          1 | Aпирожки               | binary          |     2 |
|            1 |          0 |            0 |          0 | AAAAAAAAпирожки        | utf8_general_ci |     2 |
+--------------+------------+--------------+------------+------------------------+-----------------+-------+
8 rows in set (0.00 sec)

mysql> select 0,1,1,1, lpad(bpiroshk, 15, _binary "A") as 'lpad', collation(lpad(bpiroshk, 15, _binary "A")) as 'collation.....', coercibility(lpad(bpiroshk, 15, _binary "A")) as 'coerc' from foo union all
    -> select 0,1,1,0, lpad(bpiroshk, 15, "A"), collation(lpad(bpiroshk, 15, "A")), coercibility(lpad(bpiroshk, 15, "A")) from foo union all
    -> select 0,1,0,1, lpad(bpiroshk, 15, bA), collation(lpad(bpiroshk, 15, bA)), coercibility(lpad(bpiroshk, 15, bA)) from foo union all
    -> select 0,1,0,0, lpad(bpiroshk, 15, cA), collation(lpad(bpiroshk, 15, cA)), coercibility(lpad(bpiroshk, 15, cA)) from foo union all
    -> select 0,0,1,1, lpad(cpiroshk, 15, _binary "A"), collation(lpad(cpiroshk, 15, _binary "A")), coercibility(lpad(cpiroshk, 15, _binary "A")) from foo union all
    -> select 0,0,1,0, lpad(cpiroshk, 15, "A"), collation(lpad(cpiroshk, 15, "A")), coercibility(lpad(cpiroshk, 15, "A")) from foo union all
    -> select 0,0,0,1, lpad(cpiroshk, 15, bA), collation(lpad(cpiroshk, 15, bA)), coercibility(lpad(cpiroshk, 15, bA)) from foo union all
    -> select 0,0,0,0, lpad(cpiroshk, 15, cA), collation(lpad(cpiroshk, 15, cA)), coercibility(lpad(cpiroshk, 15, cA)) from foo;
+---+---+---+---+------------------------+-----------------+-------+
| 0 | 1 | 1 | 1 | lpad                   | collation.....  | coerc |
+---+---+---+---+------------------------+-----------------+-------+
| 0 | 1 | 1 | 1 | Aпирожки               | binary          |     2 |
| 0 | 1 | 1 | 0 | Aпирожки               | binary          |     2 |
| 0 | 1 | 0 | 1 | Aпирожки               | binary          |     2 |
| 0 | 1 | 0 | 0 | Aпирожки               | binary          |     2 |
| 0 | 0 | 1 | 1 | AAAAAAAAпирожки        | utf8_general_ci |     2 |
| 0 | 0 | 1 | 0 | AAAAAAAAпирожки        | utf8_general_ci |     2 |
| 0 | 0 | 0 | 1 | Aпирожки               | binary          |     2 |
| 0 | 0 | 0 | 0 | AAAAAAAAпирожки        | utf8_general_ci |     2 |
+---+---+---+---+------------------------+-----------------+-------+
8 rows in set (0.00 sec)
[21 Jul 2014 21:51] Arthur O'Dwyer
(continued from above)

mysql> select 1,1,1,1, lpad(concat(_binary "пирожки", _binary "A"), 16, "A") as 'lpad concat', collation(lpad(concat(_binary "пирожки", _binary "A"), 16, "A")) as 'collation.....', coercibility(lpad(concat(_binary "пирожки", _binary "A"), 16, "A")) as 'coerc' from foo union all
    -> select 1,1,1,0, lpad(concat(_binary "пирожки", "A"), 16, "A"), collation(lpad(concat(_binary "пирожки", "A"), 16, "A")), coercibility(lpad(concat(_binary "пирожки", "A"), 16, "A")) from foo union all
    -> select 1,1,0,1, lpad(concat(_binary "пирожки", bA), 16, "A"), collation(lpad(concat(_binary "пирожки", bA), 16, "A")), coercibility(lpad(concat(_binary "пирожки", bA), 16, "A")) from foo union all
    -> select 1,1,0,0, lpad(concat(_binary "пирожки", cA), 16, "A"), collation(lpad(concat(_binary "пирожки", cA), 16, "A")), coercibility(lpad(concat(_binary "пирожки", cA), 16, "A")) from foo union all
    -> select 1,0,1,1, lpad(concat("пирожки", _binary "A"), 16, "A"), collation(lpad(concat("пирожки", _binary "A"), 16, "A")), coercibility(lpad(concat("пирожки", _binary "A"), 16, "A")) from foo union all
    -> select 1,0,1,0, lpad(concat("пирожки", "A"), 16, "A"), collation(lpad(concat("пирожки", "A"), 16, "A")), coercibility(lpad(concat("пирожки", "A"), 16, "A")) from foo union all    -> select 1,0,0,1, lpad(concat("пирожки", bA), 16, "A"), collation(lpad(concat("пирожки", bA), 16, "A")), coercibility(lpad(concat("пирожки", bA), 16, "A")) from foo union all
    -> select 1,0,0,0, lpad(concat("пирожки", cA), 16, "A"), collation(lpad(concat("пирожки", cA), 16, "A")), coercibility(lpad(concat("пирожки", cA), 16, "A")) from foo;
+---+---+---+---+-------------------------+-----------------+-------+
| 1 | 1 | 1 | 1 | lpad concat             | collation.....  | coerc |
+---+---+---+---+-------------------------+-----------------+-------+
| 1 | 1 | 1 | 1 | AпирожкиA               | binary          |     4 |
| 1 | 1 | 1 | 0 | AпирожкиA               | binary          |     4 |
| 1 | 1 | 0 | 1 | AпирожкиA               | binary          |     2 |
| 1 | 1 | 0 | 0 | AAAAAAAAпирожкиA        | utf8_general_ci |     2 |
| 1 | 0 | 1 | 1 | AпирожкиA               | binary          |     4 |
| 1 | 0 | 1 | 0 | AAAAAAAAпирожкиA        | utf8_general_ci |     4 |
| 1 | 0 | 0 | 1 | AпирожкиA               | binary          |     2 |
| 1 | 0 | 0 | 0 | AAAAAAAAпирожкиA        | utf8_general_ci |     2 |
+---+---+---+---+-------------------------+-----------------+-------+
8 rows in set (0.00 sec)

mysql> select 0,1,1,1, lpad(concat(bpiroshk, _binary "A"), 16, "A") as 'lpad concat', collation(lpad(concat(bpiroshk, _binary "A"), 16, "A")) as 'collation.....', coercibility(lpad(concat(bpiroshk, _binary "A"), 16, "A")) as 'coerc' from foo union all
    -> select 0,1,1,0, lpad(concat(bpiroshk, "A"), 16, "A"), collation(lpad(concat(bpiroshk, "A"), 16, "A")), coercibility(lpad(concat(bpiroshk, "A"), 16, "A")) from foo union all
    -> select 0,1,0,1, lpad(concat(bpiroshk, bA), 16, "A"), collation(lpad(concat(bpiroshk, bA), 16, "A")), coercibility(lpad(concat(bpiroshk, bA), 16, "A")) from foo union all
    -> select 0,1,0,0, lpad(concat(bpiroshk, cA), 16, "A"), collation(lpad(concat(bpiroshk, cA), 16, "A")), coercibility(lpad(concat(bpiroshk, cA), 16, "A")) from foo union all
    -> select 0,0,1,1, lpad(concat(cpiroshk, _binary "A"), 16, "A"), collation(lpad(concat(cpiroshk, _binary "A"), 16, "A")), coercibility(lpad(concat(cpiroshk, _binary "A"), 16, "A")) from foo union all
    -> select 0,0,1,0, lpad(concat(cpiroshk, "A"), 16, "A"), collation(lpad(concat(cpiroshk, "A"), 16, "A")), coercibility(lpad(concat(cpiroshk, "A"), 16, "A")) from foo union all
    -> select 0,0,0,1, lpad(concat(cpiroshk, bA), 16, "A"), collation(lpad(concat(cpiroshk, bA), 16, "A")), coercibility(lpad(concat(cpiroshk, bA), 16, "A")) from foo union all
    -> select 0,0,0,0, lpad(concat(cpiroshk, cA), 16, "A"), collation(lpad(concat(cpiroshk, cA), 16, "A")), coercibility(lpad(concat(cpiroshk, cA), 16, "A")) from foo;
+---+---+---+---+-------------------------+-----------------+-------+
| 0 | 1 | 1 | 1 | lpad concat             | collation.....  | coerc |
+---+---+---+---+-------------------------+-----------------+-------+
| 0 | 1 | 1 | 1 | AпирожкиA               | binary          |     2 |
| 0 | 1 | 1 | 0 | AпирожкиA               | binary          |     2 |
| 0 | 1 | 0 | 1 | AпирожкиA               | binary          |     2 |
| 0 | 1 | 0 | 0 | AпирожкиA               | binary          |     2 |
| 0 | 0 | 1 | 1 | AAAAAAAAпирожкиA        | utf8_general_ci |     2 |
| 0 | 0 | 1 | 0 | AAAAAAAAпирожкиA        | utf8_general_ci |     2 |
| 0 | 0 | 0 | 1 | AпирожкиA               | binary          |     2 |
| 0 | 0 | 0 | 0 | AAAAAAAAпирожкиA        | utf8_general_ci |     2 |
+---+---+---+---+-------------------------+-----------------+-------+
8 rows in set (0.00 sec)
[22 Jul 2014 16:45] Sveta Smirnova
Thank you for the feedback.

> @Sveta: As described in the #comment above, I'm using lpad(X, 16, "A") as a proxy for collation(X). Originally this was because I didn't know the collation(X) builtin existed. Does this help clear up your confusion?

I still don't understand why you compare different things and expect same result. Lets examine row #4.

select lpad(_binary "пирожки", 15, cA) from foo;

It returns:

1 |          1 |            0 |          0 | Aпирожки               | utf8_general_ci |     2 |

Corresponding row for CONCAT is:

select lpad(concat(_binary "пирожки", cA), 16, "A") from foo;

And it returns:

| 1 | 1 | 0 | 0 | AAAAAAAAпирожкиA        | utf8_general_ci |     2 |

Issue here is that in the first case you call LPAD on following VAR_STRING with binary collation, number and VAR_STRING with collation utf8_general_ci:

mysql> select _binary "пирожки", 15, cA from foo;
Field   1:  ``
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  binary (63)
Length:     14
Max_length: 14
Decimals:   31
Flags:      NOT_NULL BINARY 

Field   2:  `15`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       LONGLONG
Collation:  binary (63)
Length:     2
Max_length: 2
Decimals:   0
Flags:      NOT_NULL BINARY NUM 

Field   3:  `cA`
Catalog:    `def`
Database:   `test`
Table:      `foo`
Org_table:  `foo`
Type:       VAR_STRING
Collation:  utf8_general_ci (33)
Length:     15
Max_length: 1
Decimals:   0
Flags:      

+----------------+----+------+
|                | 15 | cA   |
+----------------+----+------+
| пирожки        | 15 | A    |
+----------------+----+------+
1 row in set (0.01 sec)

In the case if you use CONCAT LPAD receives result of the CONCAT and it is called on VAR_STRING with collation utf8_general_ci, number and VAR_STRING with collation utf8_general_ci:

mysql> select concat(_binary "пирожки", 15, cA), 16, "A" from foo;
Field   1:  `concat(_binary "пирожки", 15, cA)`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  utf8_general_ci (33)
Length:     63
Max_length: 17
Decimals:   31
Flags:      

Field   2:  `16`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       LONGLONG
Collation:  binary (63)
Length:     2
Max_length: 2
Decimals:   0
Flags:      NOT_NULL BINARY NUM 

Field   3:  `A`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  utf8_general_ci (33)
Length:     3
Max_length: 1
Decimals:   31
Flags:      NOT_NULL 

+------------------------------------------+----+---+
| concat(_binary "пирожки", 15, cA)        | 16 | A |
+------------------------------------------+----+---+
| пирожки15A                               | 16 | A |
+------------------------------------------+----+---+
1 row in set (0.01 sec)

So I don't get why do you expect different results here.
[22 Jul 2014 18:01] Arthur O'Dwyer
@Sveta: Okay, let's forget about CONCAT for the time being, because it's confusing. Let's stick to *just* the LPAD function.

mysql> select lpad(_binary "пирожки", 15, cA), collation(lpad(_binary "пирожки", 15, cA)) from foo;
+----------------------------------------+---------------------------------------------------+
| lpad(_binary "пирожки", 15, cA)        | collation(lpad(_binary "пирожки", 15, cA))        |
+----------------------------------------+---------------------------------------------------+
| Aпирожки                               | utf8_general_ci                                   |
+----------------------------------------+---------------------------------------------------+
1 row in set (0.00 sec)

mysql> select lpad(_binary "пирожки", 15, 'A'), collation(lpad(_binary "пирожки", 15, 'A')) from foo;
+-----------------------------------------+----------------------------------------------------+
| lpad(_binary "пирожки", 15, 'A')        | collation(lpad(_binary "пирожки", 15, 'A'))        |
+-----------------------------------------+----------------------------------------------------+
| Aпирожки                                | binary                                             |
+-----------------------------------------+----------------------------------------------------+

Here 'A' and cA have the same type and collation, and so the customer naturally expects that the result of the LPAD will be the same in both cases. And it is, **in value.** We can see here that the LPAD was performed UTF8-wise, rather than byte-wise. (Do you understand what I mean by those terms? I can back up and explain those if you want.)

**However,** as you can see, MySQL has secretly decided that one of those UTF8-wise LPAD results ought to get the binary (bytewise) collation from now on, whereas the other one remains UTF8. This is unexpected behavior, impossible to explain to the customer.

We can return to CONCAT later, but for now let's stick to only LPAD.
[22 Jul 2014 19:03] Sveta Smirnova
Arthur,

I agree that behavior which you provide in last comment is buggy, but before I mark this report as "Verified" lets finish with CONCAT.
[22 Jul 2014 19:18] Arthur O'Dwyer
Here's one of the self-inconsistent cases from CONCAT. However, the main issue here is that LPAD is inconsistent with CONCAT. If you're planning to accept that as a bug, you should probably just accept it now, and we can always come back to CONCAT later.

mysql> select collation(concat(cpiroshk, _binary "A")) from foo;
+------------------------------------------+
| collation(concat(cpiroshk, _binary "A")) |
+------------------------------------------+
| utf8_general_ci                          |
+------------------------------------------+
1 row in set (0.00 sec)

mysql> select collation(concat(cpiroshk, bA)) from foo;
+---------------------------------+
| collation(concat(cpiroshk, bA)) |
+---------------------------------+
| binary                          |
+---------------------------------+
[22 Jul 2014 19:40] Sveta Smirnova
Thank you for the feedback.

Verified as described.