Bug #98496 JSON columns allow empty string, allowing invalid JSON in binlog row images
Submitted: 5 Feb 2020 20:21 Modified: 11 Sep 2023 7:38
Reporter: Jeremy Cole (Basic Quality Contributor) (OCA) Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: JSON Severity:S3 (Non-critical)
Version:5.7, 8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: binlog, empty string, json, null, RBR

[5 Feb 2020 20:21] Jeremy Cole
Description:
## tl; dr ##

The JSON type allows an empty string (which is an invalid JSON document) to be inserted into JSON-typed columns by bypassing validation checks using INSERT IGNORE or by disabling strict mode. The empty string inserted is then coerced silently to JSON null on read. However, these invalid values are in fact stored in the underlying tables, and may then be observed at any later point in row-based binary log row images and potentially other contexts.

## Background ##

The JSON column type stores a JSON document, and allows functions to manipulate its contents as JSON. Per the JSON specification (https://www.json.org/json-en.html), a JSON document may contain an object (even an empty object as {}), an array (using [ and ]), or a single value (including null). However, per the specification, an empty string is *not* a valid JSON document. MySQL performs validation of the JSON document provided, and generally rejects invalid JSON documents using an error formatted as:

ERROR 3140 (22032): Invalid JSON text: <explanation>.

Separately, MySQL supports an INSERT IGNORE syntax which ignores many types of errors, including JSON validation errors. It also supports disabling of strict modes related to data validation. These features are a source of frequent consternation as they would repair a <type> NOT NULL (without a DEFAULT clause) to some form of a zero value in order to force insertion. The zero value for a string is the *empty string*, which as described above is an invalid JSON document, so should be rejected for a JSON column.

To complicate matters, the JSON column type is actually stored internally as TEXT, which is in fact a sub-class of the BLOB type. The BLOB type does not support defining a default value, so it is not possible to, for instance, define a column as JSON NOT NULL DEFAULT '{}' which would allow it to be populated with a known and valid default value (in this case an empty object {}) when omitted in an INSERT statement.

When combining these behaviors by creating a JSON NOT NULL column (with no DEFAULT clause, as it's disallowed), and inserting into it with a missing or disallowed NULL value, an empty string is in fact inserted and stored in the underlying table, despite being an invalid JSON document.

At some point in the development of the JSON column type, MySQL decided to silently "repair" invalid empty string JSON documents to JSON null values. This silent repair has been a frequent source of tricky bugs and misbehaviors related to the JSON column type.

## References

1. https://bugs.mysql.com/bug.php?id=87734
2. https://bugs.mysql.com/bug.php?id=79432
3. https://bugs.mysql.com/bug.php?id=98063
4. https://bugs.mysql.com/bug.php?id=86502
5. https://bugs.mysql.com/bug.php?id=93167
6. https://bugs.mysql.com/bug.php?id=85755
7. https://bugs.mysql.com/bug.php?id=78527
8. https://bugs.mysql.com/bug.php?id=77733
9. https://github.com/mysql/mysql-server/commit/b6959fe6d553baeaa52161038b97d69b3b0b269b
10. https://github.com/siddontang/go-mysql/issues/266 and https://github.com/siddontang/go-mysql/pull/267

How to repeat:
## Observed behavior

A table is created containing a JSON NOT NULL typed column:

mysql> create table t (id int not null, j json not null, primary key (id));
Query OK, 0 rows affected (0.04 sec)

Normally (and by default), a client connected with strict mode enabled will not be able to insert SQL NULL or an empty string into the column:

mysql> insert into t (id, j) values (1, NULL);
ERROR 1048 (23000): Column 'j' cannot be null

mysql> insert into t (id) values (1);
ERROR 1364 (HY000): Field 'j' doesn't have a default value

mysql> insert into t (id, j) values (1, '');
ERROR 3140 (22032): Invalid JSON text: "The document is empty." at position 0 in value for column 't.j'.

Despite the apparent limitation of JSON columns to only store valid JSON documents, which would disallow storing of an empty string, a client can insert an empty string into the column using INSERT IGNORE and an SQL NULL:

mysql> insert ignore into t (id, j) values (1, NULL);
Query OK, 1 row affected, 1 warning (0.01 sec)

It can similarly insert an invalid value by not providing a value at all for the column j:

mysql> set sql_mode='';
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> insert into t (id) values (1);
Query OK, 1 row affected, 1 warning (0.02 sec)
```

Despite the warnings returned, the INSERT [IGNORE] commands actually succeed, and insert an empty string instead of NULL. On read, however, this empty string is coerced to a JSON null instead of the empty string (which, as mentioned before, would be an invalid JSON document):

mysql> select * from t;
+----+------+
| id | j    |
+----+------+
|  1 | null |
+----+------+
1 row in set (0.00 sec)

While the result shows "null" here (which is valid, if potentially wrongly coerced), the actual table in fact stored an empty string, and thus the row images written to the binary logs contain this invalid empty string value for the JSON-typed column j (which is @2 in the mysqldump output below) in all contexts which they appear.

For INSERT:

# at 5646
#200204 18:05:32 server id 1  end_log_pos 5690 CRC32 0xabd64784 	Write_rows: table id 114 flags: STMT_END_F
### INSERT INTO `test`.`t`
### SET
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2='' /* JSON meta=4 nullable=0 is_null=0 */

For UPDATE:

# at 5904
#200204 18:08:00 server id 1  end_log_pos 5958 CRC32 0x8ce30381 	Update_rows: table id 114 flags: STMT_END_F
### UPDATE `test`.`t`
### WHERE
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2='' /* JSON meta=4 nullable=0 is_null=0 */
### SET
###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
###   @2='' /* JSON meta=4 nullable=0 is_null=0 */

For DELETE:

# at 6172
#200204 18:08:31 server id 1  end_log_pos 6216 CRC32 0x54498c6d 	Delete_rows: table id 114 flags: STMT_END_F
### DELETE FROM `test`.`t`
### WHERE
###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
###   @2='' /* JSON meta=4 nullable=0 is_null=0 */

These row images then cannot be re-inserted directly (or even have their values matched through SELECT queries) with any safe query formulation. This requires special-casing code in software which deals with binary logs, such as Debezium, Ghostferry, gh-ost, and others.

Suggested fix:
Values inserted in JSON-typed columns should be consistent between the underlying tables and the binary logs, and should be valid values if the column type uses validation.

This would likely mean coercing the empty string to "null" _before_ writing it to the table, so that table data and binary logs are consistent.
[6 Feb 2020 6:45] MySQL Verification Team
Hi Jeremy,

Thanks for the report. I appreciate the details.

Verified as stated.
[7 Sep 2023 14:10] Matt Lord
I cannot repeat the original issue, but we've run into a similar one:

mysql> show global variables like "%version%";
+--------------------------+------------------------------+
| Variable_name            | Value                        |
+--------------------------+------------------------------+
| admin_tls_version        | TLSv1.2,TLSv1.3              |
| innodb_version           | 8.0.34                       |
| protocol_version         | 10                           |
| replica_type_conversions |                              |
| slave_type_conversions   |                              |
| tls_version              | TLSv1.2,TLSv1.3              |
| version                  | 8.0.34                       |
| version_comment          | MySQL Community Server - GPL |
| version_compile_machine  | arm64                        |
| version_compile_os       | macos13                      |
| version_compile_zlib     | 1.2.13                       |
+--------------------------+------------------------------+
11 rows in set (0.00 sec)

mysql> create table jstest (id int not null auto_increment primary key, json_data json not null);
Query OK, 0 rows affected (0.00 sec)

mysql> set session sql_mode='';
Query OK, 0 rows affected (0.00 sec)

mysql> insert into jstest (id, json_data) values (1, '');
ERROR 3140 (22032): Invalid JSON text: "The document is empty." at position 0 in value for column 'jstest.json_data'.
mysql> insert ignore into jstest (id, json_data) values (1, '');
ERROR 3140 (22032): Invalid JSON text: "The document is empty." at position 0 in value for column 'jstest.json_data'.

mysql> insert into jstest (id, json_data) values (1, NULL);
ERROR 1048 (23000): Column 'json_data' cannot be null
mysql> insert ignore into jstest (id, json_data) values (1, NULL);
Query OK, 1 row affected, 1 warning (0.01 sec)

mysql> show warnings;
+---------+------+-----------------------------------+
| Level   | Code | Message                           |
+---------+------+-----------------------------------+
| Warning | 1048 | Column 'json_data' cannot be null |
+---------+------+-----------------------------------+
1 row in set (0.00 sec)

mysql> select * from jstest;
+----+-----------+
| id | json_data |
+----+-----------+
|  1 | null      |
+----+-----------+
1 row in set (0.00 sec)

In this case the error is converted to a warning and it's inserting a literal string 'null' which is not a valid JSON document. One could argue that this is how old MySQL / non-strict mode could be expected to work (allowing invalid data) with INSERT IGNORE:
https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html#ignore-effect-on-execution

BUT, using an invalid implicit default for the column/type here seems like a bug to me. I would expect a valid "empty" value to be used which in the case of JSON would be '{}' rather than 'null'.

Would you prefer that I open a new bug report for this? Or do you feel it's NOT a bug for some reason?

Thanks!
[8 Sep 2023 9:20] Knut Anders Hatlen
Hi Matt,

'null' is not an invalid JSON value. The JSON specification allows three literals: true, false and null. Whether 'null' or '{}' is the best value to use for this case, can of course be argued, but using 'null' is not causing non-JSON data to be inserted into the table, as you seem to think.
[8 Sep 2023 13:47] Matt Lord
Hi Knut!

Thank you for the response. I can see that you are correct. From https://www.rfc-editor.org/rfc/rfc8259 :
3.  Values

   A JSON value MUST be an object, array, number, or string, or one of
   the following three literal names:

      false
      null
      true

   The literal names MUST be lowercase.  No other literal names are
   allowed.

And this is already supported in Vitess' vtgate:
mysql> create table jsontest (id int not null auto_increment primary key, json_data json not null);
Query OK, 0 rows affected (0.03 sec)

mysql> insert into jsontest (json_data) values ('null');
Query OK, 1 row affected (0.00 sec)

mysql> insert into jsontest (json_data) values ('true');
Query OK, 1 row affected (0.01 sec)

mysql> insert into jsontest (json_data) values ('false');
Query OK, 1 row affected (0.01 sec)

mysql> insert into jsontest (json_data) values ('nil');
ERROR 3140 (22032): target: commerce.0.primary: vttablet: rpc error: code = Unknown desc = Invalid JSON text: "Invalid value." at position 1 in value for column 'jsontest.json_data'. (errno 3140) (sqlstate 22032) (CallerID: userData1): Sql: "insert into jsontest(json_data) values (:vtg1 /* VARCHAR */)", BindVars: {vtg1: "type:VARCHAR value:\"nil\""}

Thanks again!

P.S. Could this bug be closed then? At least I didn't seem able to repeat it. Perhaps I was missing something though.
[11 Sep 2023 7:38] Knut Anders Hatlen
As far as I can tell, the problem with the binlog and the table disagreeing was fixed in bug#87734. I see 'null' reported by mysqlbinlog too now. So I assume it's OK to close it, yes. Done.

Thanks.