Bug #112854 The relationship between length and string type is not consistent in response
Submitted: 27 Oct 2023 9:36 Modified: 30 Oct 2023 6:23
Reporter: Yang Keao Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: JSON Severity:S3 (Non-critical)
Version:8.0 OS:Linux
Assigned to: CPU Architecture:Any

[27 Oct 2023 9:36] Yang Keao
Description:
The relationship between `VAR_STRING`, `MEDIUM_BLOB` and `LONG_BLOB` is not consistent.

It's expected that the length of `VAR_STRING` column is always smaller than 65536, the length of `MEDIUM_BLOB` is always smaller than 16777216, and in other situation the type will be `LONG_BLOB`. However, it's not always the case:

```
select IFNULL(JSON_EXTRACT('"abc"', '$'), '0');
```

This SQL will give you a column with `VAR_STRING` type and 4294967292 length:

```
mysql> select IFNULL(JSON_EXTRACT('"abc"', '$'), '0');
Field   1:  `IFNULL(JSON_EXTRACT('"abc"', '$'), '0')`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  utf8mb4_0900_ai_ci (255)
Length:     4294967292
Max_length: 5
Decimals:   31
Flags:      NOT_NULL BINARY 

+-----------------------------------------+
| IFNULL(JSON_EXTRACT('"abc"', '$'), '0') |
+-----------------------------------------+
| "abc"                                   |
+-----------------------------------------+
1 row in set (0.00 sec)
```

For other function, it behaves well:

```
mysql> select json_unquote(json_extract('"abc"', '$'));
Field   1:  `json_unquote(json_extract('"abc"', '$'))`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       LONG_BLOB
Collation:  utf8mb4_0900_ai_ci (255)
Length:     4294967295
Max_length: 3
Decimals:   31
Flags:      BINARY 

+------------------------------------------+
| json_unquote(json_extract('"abc"', '$')) |
+------------------------------------------+
| abc                                      |
+------------------------------------------+
1 row in set (0.00 sec)

```

I'm not sure whether it's a bug or not. Is it possible to conclude several rules so that I can understand how the `Type` and `Length` are calculated? (or is these "rules" documented on the website? I didn't find them)

How to repeat:
Run the following two statements, their `Type`s are different:

```
select IFNULL(JSON_EXTRACT('"abc"', '$'), '0');
select json_unquote(json_extract('"abc"', '$'));
```

Suggested fix:
I've tried to read some source codes of MySQL. The following function fixes the relationship between string types and length:

```cpp
  inline void set_data_type_string(uint32 max_l) {
    max_length = max_l * collation.collation->mbmaxlen;
    decimals = DECIMAL_NOT_SPECIFIED;
    if (max_length <= Field::MAX_VARCHAR_WIDTH)
      set_data_type(MYSQL_TYPE_VARCHAR);
    else if (max_length <= Field::MAX_MEDIUM_BLOB_WIDTH)
      set_data_type(MYSQL_TYPE_MEDIUM_BLOB);
    else
      set_data_type(MYSQL_TYPE_LONG_BLOB);
  }
```

However, it seems that it doesn't take effect for each function. I'm not sure whether it's intentionally or by mistake.
[27 Oct 2023 10:43] MySQL Verification Team
Hi Mr. Keao,

Thank you for your bug report.

You are reporting a bug in the lengths of various strings, but you are using only JSON type. That is a totally separate case.

To prove your point, declare a table with the three types that you are reporting and try to prove that our documentation is wrong.

Not a bug.
[30 Oct 2023 6:23] Yang Keao
> You are reporting a bug in the lengths of various strings, but you are using only JSON type. That is a totally separate case.

The direct string type works fine. My point is that both of the following expressions are **string**, right?

1. `IFNULL(JSON_EXTRACT('"abc"', '$'), '0')`
2. `json_unquote(json_extract('"abc"', '$'))`

However, they have different type in the return value: one is `VAR_STRING` and the other is `LONG_BLOB` (as shown in the original comment). I hope them to have consistent types.
[30 Oct 2023 12:21] MySQL Verification Team
Hi,

Actually, the result is correct.

Those two statements return the data types according to the expression. One resolves as the string of the variable length, while the other can not resolve as a string, hence it returns a BLOB.