MySQL Bugs: #85755: JSON containing null value is extracted as a string "null"

Bug #85755	JSON containing null value is extracted as a string "null"
Submitted:	1 Apr 2017 0:53	Modified:	22 Jan 2019 22:26
Reporter:	Dave Pullin (Basic Quality Contributor)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: JSON	Severity:	S2 (Serious)
Version:	5.7.17	OS:	CentOS (centos 7)
Assigned to:		CPU Architecture:	Any

Description:
Wrapping a null value in a JSON, and then extracting it yields the 4 charcaters string "null", not a null value.

select json_extract(json_object( "a",null),'$.a');
+--------------------------------------------+
| json_extract(json_object( "a",null),'$.a') |
+--------------------------------------------+
| null                                       |
+--------------------------------------------+
1 row in set (0.00 sec)

*Looks* right, but it is not the null value. 

select length(json_extract(json_object( "a",NULL),'$.a'));
+----------------------------------------------------+
| length(json_extract(json_object( "a",NULL),'$.a')) |
+----------------------------------------------------+
|                                                  4 |
+----------------------------------------------------+
1 row in set (0.00 sec)

select ifnull(json_extract(json_object( "a",NULL),'$.a'),'it is null');
+-----------------------------------------------------------------+
| ifnull(json_extract(json_object( "a",NULL),'$.a'),'it is null') |
+-----------------------------------------------------------------+
| null                                                            |
+-----------------------------------------------------------------+
1 row in set (0.00 sec)

How to repeat:
select json_extract(json_object( "a",null),'$.a');

Suggested fix:

preserve the null value.

Hello Dave Pullin,

Thank you for the report.

Thanks,
Umesh

Posted by developer:
 
Hi Dave,

I think what you are seeing is the intended behaviour.

JSON_OBJECT('a', NULL) evaluates to the JSON document { "a" : null }. Here, JSON_OBJECT has converted the SQL NULL value to the JSON null literal, which are different in SQL. When you call JSON_EXTRACT with path $.a on { "a" : null }, it extracts the JSON null literal from that document. If JSON_EXTRACT had not found anything on the specified path, it would have returned SQL NULL instead. So the difference between SQL NULL and the JSON null literal allows you do distinguish between the case where JSON_EXTRACT found a (JSON) null value and the case where it didn't find anything.

I didn't find any mentioning of how JSON_OBJECT handles NULL values in the reference manual, only in the design documents for WL#7909 which added the JSON_OBJECT function. Unless I have overlooked something in the manual, maybe what we should do is clarify how JSON_OBJECT handles NULL arguments in the manual.

I fear you may be correct - it is "working as designed".

I submit that this is a design error. Failure to round-trip values is encouraging obscure and difficult to handle errors.

JSON_CONTAINS should be used to determine if the object contains the path.
(while there appears to be no good way to determine if JSON_extract returns json null.)

Hi Dave,

Just to clarify the current behaviour of JSON_EXTRACT and null values:

1. When the input value is SQL NULL, the return value is SQL NULL:

SELECT JSON_EXTRACT(NULL, '$'), JSON_EXTRACT(NULL, '$') IS NULL
-> (NULL, 1)

2. When the path does not exist in the input value, the return value is SQL NULL:

SELECT JSON_EXTRACT('{"a":1}', '$.b'), JSON_EXTRACT('{"a":1}', '$.b') IS NULL
-> (NULL, 1)

3. When the path exists, and it contains the JSON null literal, the return value is the JSON null literal, which is not SQL NULL:

SELECT JSON_EXTRACT('null', '$'), JSON_EXTRACT('null', '$') IS NULL
-> (null, 0)

SELECT JSON_EXTRACT('{"a":null}', '$.a'), JSON_EXTRACT('{"a":null}', '$.a') IS NULL
-> (null, 0)

Which of these do you want to change, and what do you want them to return?

I understand your dilemma. Perhaps json_extract is not the right place to fix the problem. The core problem is that there ought to way that any value can be put into a JSON and retrieved as the original value: ie. some function Y such that 

Y(json_extract(json_object("a",x),'$.a') <=> x

is true for all x.

I figured that json_unquote() was the function Y(), or more specifically, that
 json_column->>'$.a' would do the job.

Unfortunately I 'simplified out' the JSON_UNQUOTE in my bug report.

My expectation was that
  
 insert (json_column) values(x) 
then
 json_column->>'$.a' <=> x
would be true for all x. But it is not true when x is null.

In which case the fix may be to make json_unquote(json_null) return SQL null

Similarly json_unquote(json_literal_boolean) should return SQL boolean

The underlying problem may be that MySQL isn't treating the JSON literals as entities distinct from their written forms, but as character strings equal to the written forms.

note that
select length(false),length(true),length(null)

,length(json_extract(json_object('a',false),'$.a'))
,length(json_extract(json_object('a',true),'$.a'))
,length(json_extract(json_object('a',null),'$.a'))

,length(json_unquote(json_extract(json_object('a',false),'$.a')))
,length(json_unquote(json_extract(json_object('a',true),'$.a')))
,length(json_unquote(json_extract(json_object('a',null),'$.a')))
;

-->  ( 1,1,null, 5,4,4, 5,4,4)

Hi,

while it would at this point introduce a backward incompatibility, I second Dave Pullin's proposal that json_unquote should return SQL values whenever possible.

It's quite unexpected, baffling and error-prone for:

select if(json_unquote(json_extract(cast('{"id": null}' as JSON), '$.id')) is null, "is null", "is not null")

to produce "is not null" (same goes for boolean values), while equivalent statements for strings and numbers behave as expected:

select if(json_unquote(json_extract(cast('{"id": "blabla"}' as JSON), '$.id')) = "blabla", "is blabla", "is not blabla")
select if(json_unquote(json_extract(cast('{"id": 10}' as JSON), '$.id')) = 10, "is 10", "is not 10")

The ability to discriminate between "value is null" and "value is undefined", while still possible using just json_extract, is indeed better served by json_contains (or, possibly, by introducing an UNDEFINED mysql value, although that might have far reaching implications).

Bug #93167 marked as duplicate of this one

To add some examples here. I have had this use case (filter things or retrieve things based on a json property that can sometimes be null and sometimes be unset) and these are the best ways I found to work around it:

To filter:

    SELECT * FROM xx WHERE JSON_TYPE(JSON_EXTRACT(attributes, "$.property")) != "NULL"

To query:

    SELECT IF(JSON_TYPE(JSON_EXTRACT(attributes, "$.property"))="NULL", NULL, attributes->>"$.property") AS property FROM xx

As you can see, this is rather long for what I initially expected I could write as (combining both filter and reading out the value here in one query, which is almost shorter than either of the above):

    SELECT attributes->>"$.property" FROM xx WHERE attributes->>"$.property" IS NOT NULL

I understand that changing the semantics of ->> at this point might be harmful. Adding a new way to express this shorter though would be very welcome. The current workaround is verbose, hard to read and error-prone.

Here is an example, where JSON_DECODE would work as an alternative to JSON_EXTRACT, but which would return a mysql NULL if it reads a json null.

    SELECT JSON_DECODE(attributes, "$.property") FROM xx WHERE JSON_DECODE(attributes, "$.property") IS NOT NULL

Thanks for considering this as a possible improvement.

Json_deCode() seems like a good idea.

Perhaps paired with json_code() such that

Json_decode(json_code(x))<=>x

For any x database column value or SQL expression.

I agree with the suggestion of those that have commented here (either adding a JSON_DECODE or changing the meaning of JSON_EXTRACT/->> operator).  

If backwards compatibility is the concern, the MySQL way seems like to add a new server setting along the lines 'JSON_EXTRACT_USES_SQL_TYPES' or whatever naming makes sense.

Then, the 'simple' queries of column_name->>'$.value' will work the way that pretty much every developer who begins experimenting with mysql JSON data types expects, while allowing a route for production environments that use the current system to maintain that reality.

If something like JSON_DECODE is added, I would personally prefer it had a shorthand operator, as I agree with the OP that as a computer scientist "make the common case fast" is relevant here.  If I want to test presence of a node, there is JSON_CONTAINS, but most of the time (in a SELECT, SET, ORDER BY, WHERE, etc) I do not want to wrap things in a 'conversion' function that turns 'null' or 'false' into NULL and FALSE.

So my personal suggestion would be to add a JSON_DECODE operator, and make ->> operator use JSON_DECODE instead of JSON_EXTRACT.  Add a server behavior flag that allows making ->> continue to use JSON_EXTRACT for those that need it until the next major release, when the default for new installs can 'switch'.  Or, replace JSON_EXTRACT with the proposed JSON_DECODE and base behavior off the same setting.

I get that to the MySQL devs it is working as designed, but they must understand it is not working as expected.  If JSON null/boolean and MySQL NULL/BOOLEAN (lets not even talk about the absence of a true BOOLEAN) are truly incompatible, then the query:
SELECT JSON_OBJECT('a', NULL);
Should not result in '{"a": null}'

But the fact that native MySQL types TRUE, FALSE, NULL can be encoded but not decoded makes this a unidirectional operation, which should be a naughty word in database systems.  

Currently, bidirectionality of boolean values requires the query:
SELECT CASE JSON_EXTRACT(JSON_OBJECT('a', TRUE), '$.a') WHEN 'true' THEN TRUE WHEN 'false' THEN FALSE ELSE NULL END;

It should require
SELECT JSON_EXTRACT(JSON_OBJECT('a', TRUE), '$.a');
or
SELECT JSON_OBJECT('a', TRUE)->>'$.a';
(Which surprisingly, doesn't seem to pass the query parser, I'm not sure why)

The less common case, where we want to distinguish between a present JSON literal null and the absence of the value entirely (generally not an important distinction in my career) can and will still be handled by:
SELECT JSON_CONTAINS(JSON_OBJECT('a', NULL), 'null', '$.a');

It seems like if a new function/operator were to be added (and not necessarily swapped in to replace JSON_EXTRACT/->>) this would be a low impact addition that would make the JSON features of MySQL capable of being used in WHERE/HAVING/ORDER tests in the expected way without having to resort the the undesirable wrapping mentioned above and elsewhere as workarounds.

Was this ever addressed/resolved? I just ran into this issue and per the discussion in this report, I agree; it's a problem with the current JSON encapsulation. This is not the expected behavior and working around it requires a ton of unnecessary checking/scrubbing.

The solution is a new function JSON_VALUE (defined in the SQL standard);
Something like
SELECT JSON_VALUE(JSON_OBJECT('a', false), '$.a' RETURNING <SQL_type_here>);
Example:
SELECT JSON_VALUE(JSON_OBJECT('a', false), '$.a' returning signed);
should return integer 0,
SELECT JSON_VALUE(JSON_OBJECT('a', true), '$.a' returning signed);
should return integer 1,
SELECT JSON_VALUE(JSON_OBJECT('a', null), '$.a' returning signed);
should return NULL (SQL's NULL).
And all these 3 return values fit well with fed to IS FALSE/TRUE/NULL.

MySQL 8.0.20 won't have this function. But I have seen what will be in 8.0.21 if everything goes right. All I can say is: don't worry, we have not neglected your issue.

Processing JSON null literals is inconsistent... NULL is a perfectly valid value, but in MySql JSON-functions it's (sometimes) used for transforations eg., when calling JSON_MERGE_PATCH or JSON_MERGE_PRESERVE. A literal null value can be stored with JSON_OBJECT, but null values are lost (sometimes) if it's included in a patch or preserve operation.

It seems like the consensus on this is that JSON_VALUE is a valid replacement for JSON_EXTRACT when null values being treated as expected is needed. I don't believe this is true. There are cases where JSON_EXTRACT can be used where JSON_VALUE cannot. For example: I have a loop where I need to be able to iterate over values in a json object. This means that I need to be able to use a dynamic value for the path.

JSON_EXTRACT(new_vals, concat('$.', @selector))
JSON_VALUE(new_vals, concat('$.', @selector))

JSON_EXTRACT works fine. JSON_VALUE gives a syntax error since, as the doc says, JSON_VALUE requires the path to be a string literal. What is the reason for making JSON_VALUE require a string literal for the path? 

I really think that this problem needs a more valid and complete solution.