Bug #102843 CSV Engine (tina) does not handle json type
Submitted: 7 Mar 2021 4:02
Modified: 2 Jun 2021 18:38
Reporter: Kaiwang CHen (OCA)
Status: Closed
Category: MySQL Server: CSV
Severity: S3 (Non-critical)
Version: 8.0.23
OS: Any
Assigned to:
CPU Architecture: Any

[7 Mar 2021 4:02] Kaiwang CHen
The CSV engine (tina) allows creating a JSON column and writing a row to it; however, it cannot read the row back.

How to repeat:
mysql> create table t1 (c json not null) ENGINE CSV;
Query OK, 0 rows affected (0.08 sec)

mysql> insert t1 values ('{"a":1}');
Query OK, 1 row affected (0.01 sec)

mysql> select * from t1;
ERROR 3144 (22032): Cannot create a JSON value from a string with CHARACTER SET 'binary'.

[7 Mar 2021 4:32] Kaiwang CHen
The direct cause is that ha_tina::find_current_row() treats the attribute buffer as binary, while Field_json checks the charset with ensure_utf8mb4(). The field is always persisted as my_charset_bin (in encode_quote), although Field_json::val_str() always produces my_charset_utf8mb4_bin.


      if ((*field)->store(buffer.ptr(), buffer.length(), buffer.charset(),
                          is_enum ? CHECK_FIELD_IGNORE : CHECK_FIELD_WARN)) {
        if (!is_enum) goto err;


  String attribute(attribute_buffer, sizeof(attribute_buffer), &my_charset_bin);
  for (Field **field = table->field; *field; field++) {
    (*field)->val_str(&attribute, &attribute);

[8 Mar 2021 5:36] MySQL Verification Team
Hello Kaiwang,

Thank you for the report and feedback.

[2 Jun 2021 18:38] Jon Stephens
Documented fix as follows in the MySQL 8.0.26 changelog:

    Reading JSON values from tables that used the CSV storage engine
    raised an error such as "Cannot create a JSON value from a
    string with CHARACTER SET 'binary'". This happened because the
    CSV engine uses my_charset_bin as the character set for the
    record buffer but creation of JSON values includes an explicit
    check for my_charset_bin, and raises an error if this character
    set is given.

    We handle this issue by passing the actual character set of the
    column instead of the character set of the buffer holding the
    data, which is always binary.
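
The difference between the two read paths can be illustrated with a small standalone sketch. This is not MySQL source: `Charset`, `JsonField`, `RecordBuffer`, and both `read_with_*` functions are hypothetical stand-ins invented for illustration, and the check inside `store()` only loosely models the ensure_utf8mb4() behaviour described in the report. It assumes, as in the excerpt from find_current_row(), that a non-zero return from `store()` means failure.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins; none of these are MySQL's real classes.
enum class Charset { Binary, Utf8mb4 };

// Mimics a JSON column: it rejects text labelled as binary,
// loosely modelled on the ensure_utf8mb4() check named in the report.
struct JsonField {
  Charset column_charset = Charset::Utf8mb4;
  std::string value;

  // Returns true on error, matching the convention assumed above.
  bool store(const char *ptr, std::size_t len, Charset cs) {
    if (cs == Charset::Binary) return true;  // corresponds to ERROR 3144
    value.assign(ptr, len);
    return false;
  }
};

// Mimics the engine's record buffer, which is always labelled binary.
struct RecordBuffer {
  std::string bytes;
  Charset charset() const { return Charset::Binary; }
};

// Behaviour reported in the bug: forward the buffer's charset to the field.
bool read_with_buffer_charset(JsonField &f, const RecordBuffer &b) {
  return f.store(b.bytes.data(), b.bytes.size(), b.charset());
}

// Behaviour described in the changelog: forward the column's own charset.
bool read_with_column_charset(JsonField &f, const RecordBuffer &b) {
  return f.store(b.bytes.data(), b.bytes.size(), f.column_charset);
}
```

With a buffer holding `{"a":1}`, `read_with_buffer_charset()` reports an error and leaves the field empty, while `read_with_column_charset()` stores the same bytes successfully. The bytes are identical in both cases; only the charset label passed alongside them changes.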