Bug #98496 JSON columns allow empty string, allowing invalid JSON in binlog row images
Submitted: 5 Feb 2020 20:21 Modified: 6 Feb 2020 6:45
Reporter: Jeremy Cole (Basic Quality Contributor) (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: JSON Severity:S3 (Non-critical)
Version:5.7, 8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: binlog, empty string, json, null, RBR

[5 Feb 2020 20:21] Jeremy Cole
Description:
## tl; dr ##

The JSON type allows an empty string (which is an invalid JSON document) to be inserted into JSON-typed columns by bypassing validation checks using INSERT IGNORE or by disabling strict mode. The empty string inserted is then coerced silently to JSON null on read. However, these invalid values are in fact stored in the underlying tables, and may then be observed at any later point in row-based binary log row images and potentially other contexts.

## Background ##

The JSON column type stores a JSON document, and allows functions to manipulate its contents as JSON. Per the JSON specification (https://www.json.org/json-en.html), a JSON document may contain an object (even an empty object as {}), an array (using [ and ]), or a single value (including null). However, per the specification, an empty string is *not* a valid JSON document. MySQL performs validation of the JSON document provided, and generally rejects invalid JSON documents using an error formatted as:

ERROR 3140 (22032): Invalid JSON text: <explanation>.

Separately, MySQL supports an INSERT IGNORE syntax which ignores many types of errors, including JSON validation errors. It also supports disabling of strict modes related to data validation. These features are a source of frequent consternation as they would repair a <type> NOT NULL (without a DEFAULT clause) to some form of a zero value in order to force insertion. The zero value for a string is the *empty string*, which as described above is an invalid JSON document, so should be rejected for a JSON column.

To complicate matters, the JSON column type is actually stored internally as TEXT, which is in fact a sub-class of the BLOB type. The BLOB type does not support defining a default value, so it is not possible to, for instance, define a column as JSON NOT NULL DEFAULT '{}' which would allow it to be populated with a known and valid default value (in this case an empty object {}) when omitted in an INSERT statement.

When combining these behaviors by creating a JSON NOT NULL column (with no DEFAULT clause, as it's disallowed), and inserting into it with a missing or disallowed NULL value, an empty string is in fact inserted and stored in the underlying table, despite being an invalid JSON document.

At some point in the development of the JSON column type, MySQL decided to silently "repair" invalid empty string JSON documents to JSON null values. This silent repair has been a frequent source of tricky bugs and misbehaviors related to the JSON column type.

## References

1. https://bugs.mysql.com/bug.php?id=87734
2. https://bugs.mysql.com/bug.php?id=79432
3. https://bugs.mysql.com/bug.php?id=98063
4. https://bugs.mysql.com/bug.php?id=86502
5. https://bugs.mysql.com/bug.php?id=93167
6. https://bugs.mysql.com/bug.php?id=85755
7. https://bugs.mysql.com/bug.php?id=78527
8. https://bugs.mysql.com/bug.php?id=77733
9. https://github.com/mysql/mysql-server/commit/b6959fe6d553baeaa52161038b97d69b3b0b269b
10. https://github.com/siddontang/go-mysql/issues/266 and https://github.com/siddontang/go-mysql/pull/267

How to repeat:
## Observed behavior

A table is created containing a JSON NOT NULL typed column:

mysql> create table t (id int not null, j json not null, primary key (id));
Query OK, 0 rows affected (0.04 sec)

Normally (and by default), a client connected with strict mode enabled will not be able to insert SQL NULL or an empty string into the column:

mysql> insert into t (id, j) values (1, NULL);
ERROR 1048 (23000): Column 'j' cannot be null

mysql> insert into t (id) values (1);
ERROR 1364 (HY000): Field 'j' doesn't have a default value

mysql> insert into t (id, j) values (1, '');
ERROR 3140 (22032): Invalid JSON text: "The document is empty." at position 0 in value for column 't.j'.

Despite the apparent limitation of JSON columns to only store valid JSON documents, which would disallow storing of an empty string, a client can insert an empty string into the column using INSERT IGNORE and an SQL NULL:

mysql> insert ignore into t (id, j) values (1, NULL);
Query OK, 1 row affected, 1 warning (0.01 sec)

It can similarly insert an invalid value by not providing a value at all for the column j:

mysql> set sql_mode='';
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> insert into t (id) values (1);
Query OK, 1 row affected, 1 warning (0.02 sec)
```

Despite the warnings returned, the INSERT [IGNORE] commands actually succeed, and insert an empty string instead of NULL. On read, however, this empty string is coerced to a JSON null instead of the empty string (which, as mentioned before, would be an invalid JSON document):

mysql> select * from t;
+----+------+
| id | j    |
+----+------+
|  1 | null |
+----+------+
1 row in set (0.00 sec)

While the result shows "null" here (which is valid, if potentially wrongly coerced), the actual table in fact stored an empty string, and thus the row images written to the binary logs contain this invalid empty string value for the JSON-typed column j (which is @2 in the mysqldump output below) in all contexts which they appear.

For INSERT:

# at 5646
#200204 18:05:32 server id 1  end_log_pos 5690 CRC32 0xabd64784 	Write_rows: table id 114 flags: STMT_END_F
### INSERT INTO `test`.`t`
### SET
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2='' /* JSON meta=4 nullable=0 is_null=0 */

For UPDATE:

# at 5904
#200204 18:08:00 server id 1  end_log_pos 5958 CRC32 0x8ce30381 	Update_rows: table id 114 flags: STMT_END_F
### UPDATE `test`.`t`
### WHERE
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2='' /* JSON meta=4 nullable=0 is_null=0 */
### SET
###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
###   @2='' /* JSON meta=4 nullable=0 is_null=0 */

For DELETE:

# at 6172
#200204 18:08:31 server id 1  end_log_pos 6216 CRC32 0x54498c6d 	Delete_rows: table id 114 flags: STMT_END_F
### DELETE FROM `test`.`t`
### WHERE
###   @1=2 /* INT meta=0 nullable=0 is_null=0 */
###   @2='' /* JSON meta=4 nullable=0 is_null=0 */

These row images then cannot be re-inserted directly (or even have their values matched through SELECT queries) with any safe query formulation. This requires special-casing code in software which deals with binary logs, such as Debezium, Ghostferry, gh-ost, and others.

Suggested fix:
Values inserted in JSON-typed columns should be consistent between the underlying tables and the binary logs, and should be valid values if the column type uses validation.

This would likely mean coercing the empty string to "null" _before_ writing it to the table, so that table data and binary logs are consistent.
[6 Feb 2020 6:45] MySQL Verification Team
Hi Jeremy,

Thanks for the report. I appreciate the details.

Verified as stated.