Bug #84523 JSON_EXTRACT using wildcards takes forever
Submitted: 16 Jan 2017 20:17 Modified: 8 Aug 2017 15:21
Reporter: Anthony Stump Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: JSON Severity:S2 (Serious)
Version:5.7.16, 5.7.17 OS:Ubuntu
Assigned to: CPU Architecture:Any
Tags: json

[16 Jan 2017 20:17] Anthony Stump
Description:
Searching for a key with wildcard characters takes 300x the amount of time finding the key with the direct JSON path does. Furthermore, You can not use direct JSON path expressions to find a key in a nested array without searching using wild cards.

How to repeat:
Store JSON in 2 array formats and compare speed using key searches.

I was storing in a [{"key":{"subkey":"val","subkey":"val"}}, {"key":{"subkey":"val","subkey":"val"}}]

I was having to find the data in MySQL using json_extract(field, '$**.key') - which was taking roughly 30 seconds to scan the data on 1 single row.

I took the [ ] master array out, now I have the data in a {"key":{"subkey":"val","subkey":"val"}}, {"key":{"subkey":"val","subkey":"val"}} array structure - I can access the data using '$.key' instead of '$**.key' - and it's instant.

Suggested fix:
Figure out why it takes so long when you search for a key with ** wildcard characters versus outright.
[16 Jan 2017 20:19] Anthony Stump
Note: I submitted this as a P2 bug because it renders extracting JSON data from MySQL pretty much useless if you have a large JSON field - and can actually cause Apache2 to become completely unresponsive.
[17 Jan 2017 4:21] MySQL Verification Team
Hello  Anthony Stump,

Thank you for the report.
Could you please provide exact test case(create table statement using SHOW CREATE TABLE <table_name>\G, insert query and SELECT query which is showing performance issue) to reproduce the issue at our end? You may want to mark it as private after posting here.

If you can provide more information, feel free to add it to this bug and change the status back to 'Open'.

Thank you for your interest in MySQL.

Thanks,
Umesh
[17 Jan 2017 13:45] Anthony Stump
There is no sensitive info in my tables so no risk in sharing them.

Old, not working:  
https://drive.google.com/open?id=0B6ScTEIoK3tPbGhrZkJOZmhRUTQ

New
https://drive.google.com/open?id=0B6ScTEIoK3tPYTFzdVp1Zm9oV3M

Note, even though the new table has the name it has, there is no json indexing built yet.

Query that fails on old, works on new StationDataIndexed table, instantly.

select json_extract(jsonData, '$.KOJC') from StationData;

Query that takes 30 seconds, even with a 1 row constraint:

select json_extract(jsonData, '$**.KOJC') from StationData limit 1;
[17 Jan 2017 16:50] MySQL Verification Team
I'm unable to download from the google drive, could be some temporary problem or uploaded files no longer exists?

Attachment: 84523.png (image/png, text), 50.29 KiB.

[17 Jan 2017 16:55] Anthony Stump
I am able to view the links fine. Thats not a Google error, likely a corporate firewall blocking your access.
[17 Jan 2017 17:58] MySQL Verification Team
Are you able to attach the files here using the Files tab?. Thanks.
[17 Jan 2017 18:42] Anthony Stump
Because of the file sizes I can not attach. I am sftping them.

mysql-bug-oldStations.zip (~70 MB) is the old format that causes the 30 min queries as I can not use JSON path expression and have to use wildcard.

mysql-bug-newStations.zip (~1.8 GB) is the new working schema.
[17 Jan 2017 18:43] Anthony Stump
The newStations dump should be done in 15min.
[18 Jan 2017 14:55] Morgan Tocker
I can reproduce on a table with one row (4M JSON document):

mysql [localhost] {msandbox} (bug84523) > select json_extract(doc, '$**.KOJC') FROM t1\G
*************************** 1. row ***************************
json_extract(doc, '$**.KOJC'): [{"D0": "1.1", "T0": "7.1", "CCL": "9.999e+20", "CIN": "-0.14", "WD0": "339.5", "WS0": "13", "CAPE": "0", "D100": "-87.4", "D125": "-82.9", "D150": "-78.7", "D175": "-74.6", "D200": "-72.1", "D225": "-69.1", "D250": "-61.9", "D275": "-55.8", "D300": "-57.1", "D325": "-55.3", "D350": "-52.3", "D375": "-49.0", "D400": "-45.4", "D425": "-40.8", "D450": "-36.1", "D475": "-32.1", "D500": "-29.0", "D525": "-26.4", "D550": "-24.1", "D575": "-21.9", "D600": "-19.8", "D625": "-17.0", "D650": "-14.3", "D675": "-11.5", "D700": "-9.4", "D725": "-7.7", "D750": "-7.3", "D775": "-6.9", "D800": "-6.4", "D825": "-5.2", "D850": "-3.2", "D875": "-1.7", "D900": "-0.7", "D925": "-0.1", "D950": "0.4", "D975": "1.4", "FZLV": "11679.79", "PWAT": "0.48", "SLCL": "642.60", "T100": "-68.4", "T125": "-64.7", "T150": "-63.0", "T175": "-62.8", "T200": "-62.4", "T225": "-57.9", "T250": "-52.6", "T275": "-48.1", "T300": "-44.1", "T325": "-40.1", "T350": "-36.4", "T375": "-32.6", "T400": "-28.8", "T425": "-25.0", "T450": "-21.4", "T475": "-18.0", "T500": "-14.9", "T525": "-11.9", "T550": "-9.1", "T575": "-6.6", "T600": "-4.3", "T625": "-2.2", "T650": "-0.1", "T675": "1.9", "T700": "3.7", "T725": "5.4", "T750": "6.4", "T775": "7.1", "T800": "7.6", "T825": "8.2", "T850": "9.1", "T875": "9.5", "T900": "8.1", "T925": "4.9", "T950": "5.7", "T975": "7.6", "WZLV": "2418.7", "D1000": "2.7", "T1000": "8.9", "WD100": "253.2", "WD125": "266.1", "WD150": "268.0", "WD175": "262.0", "WD200": "257.0", "WD225": "256.8", "WD250": "259.1", "WD275": "261.7", "WD300": "264.7", "WD325": "267.0", "WD350": "266.5", "WD375": "262.8", "WD400": "257.9", "WD425": "254.2", "WD450": "252.3", "WD475": "251.4", "WD500": "251.8", "WD525": "252.3", "WD550": "252.9", "WD575": "252.3", "WD600": "251.5", "WD625": "249.9", "WD650": "248.4", "WD675": "248.1", "WD700": "248.5", "WD725": "249.5", "WD750": "249.7", "WD775": "247.0", "WD800": "239.2", "WD825": "225.7", "WD850": "225.1", "WD875": "234.4", "WD900": "295.1", "WD925": "340.6", "WD950": "341.4", "WD975": "340.5", "WS100": "67", "WS125": "80", "WS150": "105", "WS175": "112", "WS200": "118", "WS225": "121", "WS250": "123", "WS275": "120", "WS300": "112", "WS325": "105", "WS350": "95", "WS375": "86", "WS400": "81", "WS425": "80", "WS450": "79", "WS475": "79", "WS500": "77", "WS525": "73", "WS550": "70", "WS575": "66", "WS600": "62", "WS625": "58", "WS650": "56", "WS675": "56", "WS700": "57", "WS725": "57", "WS750": "53", "WS775": "46", "WS800": "37", "WS825": "30", "WS850": "26", "WS875": "18", "WS900": "10", "WS925": "19", "WS950": "20", "WS975": "14", "HGT500": "561", "WD1000": "340.8", "WS1000": "14", "Weather": "Fair", "Dewpoint": "32.0", "Pressure": "1005.3", "WindGust": "23.0", "WindSpeed": "15.0", "TimeString": "Last Updated on Jan 11 2017, 4:53 pm CST", "Visibility": "10.00", "Temperature": "39.0", "WindDegrees": "340", "WindDirection": "North", "RelativeHumidity": "76"}]
1 row in set (2.21 sec)

mysql [localhost] {msandbox} (bug84523) > select json_extract(doc, '$.KOJC') FROM t1\G
*************************** 1. row ***************************
json_extract(doc, '$.KOJC'): NULL
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (bug84523) > select length(doc)/1024/1024 from t1;
+-----------------------+
| length(doc)/1024/1024 |
+-----------------------+
|            4.42477036 |
+-----------------------+
1 row in set (0.15 sec)

mysql [localhost] {msandbox} (bug84523) > SELECT version();
+-----------+
| version() |
+-----------+
| 5.7.17    |
+-----------+
1 row in set (0.00 sec)

I will attach the reduced test-case.
[18 Jan 2017 14:58] Morgan Tocker
8.0.1 results:

mysql [localhost] {msandbox} (bug84523) > SELECT version();
+------------------------------------------+
| version()                                |
+------------------------------------------+
| 8.0.1-dmr-enterprise-commercial-advanced |
+------------------------------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (bug84523) >  select json_extract(doc, '$.KOJC') FROM t1\G
*************************** 1. row ***************************
json_extract(doc, '$.KOJC'): NULL
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (bug84523) > select json_extract(doc, '$**.KOJC') FROM t1\G
*************************** 1. row ***************************
json_extract(doc, '$**.KOJC'): [{"D0": "1.1", "T0": "7.1", "CCL": "9.999e+20", "CIN": "-0.14", "WD0": "339.5", "WS0": "13", "CAPE": "0", "D100": "-87.4", "D125": "-82.9", "D150": "-78.7", "D175": "-74.6", "D200": "-72.1", "D225": "-69.1", "D250": "-61.9", "D275": "-55.8", "D300": "-57.1", "D325": "-55.3", "D350": "-52.3", "D375": "-49.0", "D400": "-45.4", "D425": "-40.8", "D450": "-36.1", "D475": "-32.1", "D500": "-29.0", "D525": "-26.4", "D550": "-24.1", "D575": "-21.9", "D600": "-19.8", "D625": "-17.0", "D650": "-14.3", "D675": "-11.5", "D700": "-9.4", "D725": "-7.7", "D750": "-7.3", "D775": "-6.9", "D800": "-6.4", "D825": "-5.2", "D850": "-3.2", "D875": "-1.7", "D900": "-0.7", "D925": "-0.1", "D950": "0.4", "D975": "1.4", "FZLV": "11679.79", "PWAT": "0.48", "SLCL": "642.60", "T100": "-68.4", "T125": "-64.7", "T150": "-63.0", "T175": "-62.8", "T200": "-62.4", "T225": "-57.9", "T250": "-52.6", "T275": "-48.1", "T300": "-44.1", "T325": "-40.1", "T350": "-36.4", "T375": "-32.6", "T400": "-28.8", "T425": "-25.0", "T450": "-21.4", "T475": "-18.0", "T500": "-14.9", "T525": "-11.9", "T550": "-9.1", "T575": "-6.6", "T600": "-4.3", "T625": "-2.2", "T650": "-0.1", "T675": "1.9", "T700": "3.7", "T725": "5.4", "T750": "6.4", "T775": "7.1", "T800": "7.6", "T825": "8.2", "T850": "9.1", "T875": "9.5", "T900": "8.1", "T925": "4.9", "T950": "5.7", "T975": "7.6", "WZLV": "2418.7", "D1000": "2.7", "T1000": "8.9", "WD100": "253.2", "WD125": "266.1", "WD150": "268.0", "WD175": "262.0", "WD200": "257.0", "WD225": "256.8", "WD250": "259.1", "WD275": "261.7", "WD300": "264.7", "WD325": "267.0", "WD350": "266.5", "WD375": "262.8", "WD400": "257.9", "WD425": "254.2", "WD450": "252.3", "WD475": "251.4", "WD500": "251.8", "WD525": "252.3", "WD550": "252.9", "WD575": "252.3", "WD600": "251.5", "WD625": "249.9", "WD650": "248.4", "WD675": "248.1", "WD700": "248.5", "WD725": "249.5", "WD750": "249.7", "WD775": "247.0", "WD800": "239.2", "WD825": "225.7", "WD850": "225.1", "WD875": "234.4", "WD900": "295.1", "WD925": "340.6", "WD950": "341.4", "WD975": "340.5", "WS100": "67", "WS125": "80", "WS150": "105", "WS175": "112", "WS200": "118", "WS225": "121", "WS250": "123", "WS275": "120", "WS300": "112", "WS325": "105", "WS350": "95", "WS375": "86", "WS400": "81", "WS425": "80", "WS450": "79", "WS475": "79", "WS500": "77", "WS525": "73", "WS550": "70", "WS575": "66", "WS600": "62", "WS625": "58", "WS650": "56", "WS675": "56", "WS700": "57", "WS725": "57", "WS750": "53", "WS775": "46", "WS800": "37", "WS825": "30", "WS850": "26", "WS875": "18", "WS900": "10", "WS925": "19", "WS950": "20", "WS975": "14", "HGT500": "561", "WD1000": "340.8", "WS1000": "14", "Weather": "Fair", "Dewpoint": "32.0", "Pressure": "1005.3", "WindGust": "23.0", "WindSpeed": "15.0", "TimeString": "Last Updated on Jan 11 2017, 4:53 pm CST", "Visibility": "10.00", "Temperature": "39.0", "WindDegrees": "340", "WindDirection": "North", "RelativeHumidity": "76"}]
1 row in set (5.27 sec)
[18 Jan 2017 14:59] Morgan Tocker
Reduced Testcase

Attachment: bug84523.sql.gz (application/gzip, text), 907.44 KiB.

[18 Jan 2017 16:11] Morgan Tocker
I also tried the test case on Mac OS X.  The difference is much larger:

json_extract(doc, '$.KOJC')  -> 0.01 sec
json_extract(doc, '$**.KOJC') -> 1 min 0.45 sec
[19 Jan 2017 4:06] MySQL Verification Team
Thank you Morgan!
[19 Jan 2017 8:10] Naveen Kumar
$**.key wildcard on a large JSON field will ofcourse take huge time as it has to check each and every key no matter how deep it is (coz of the wildcard).

You can use JSON path expressions on a JSON array using $[*].key wildcard but suggest not $**.key wildcard as previous is effective in current scenario.

$**.key wildcard is supposed to be used on JSON object where you don't know how deep the searching key is present.

I think this is not a bug. Maybe the execution time on ** wildcard is too high even for a single record, but I assume deep traversal might need this much time.
[19 Jan 2017 8:19] Naveen Kumar
SELECT SQL_NO_CACHE JSON_EXTRACT(doc, '$[*].KOJC') FROM t1;

1 row in set (0.00 sec)
[20 Jan 2017 14:40] Anthony Stump
So even if it is a deep query I still think that 30 seconds for 1 row is rather bad.

Especially when I can loop through and find any value in the JSON string in PHP in a fraction of the time, provided I'm not pulling so much data that PHP generates the dreaded memory size exceeded error (the whole reason why I wasn't able to use the data in the original form I had!)

PHP can iterate through and find the station in the original format in a period of time that is un-noticable. 

$jsonData=json_decode($row['jsonData']);

foreach ($jsonData as $thisJsonData) {
 foreach ($thisJsonData as $station=>$data) {
   if ($station == "KOJC") {
     Temperature = $data['Temperature'];
   }
 }
}

why does that take milliseconds when simply finding the key with the $**.KOJC  method takes 30 seconds? Even if it is deep traversal?
[9 Mar 2017 16:56] Morgan Tocker
Posted by developer:
 
This has improved considerably in 8.0.1.  New run times:

Linux:
json_extract(doc, '$.KOJC')     : 0.01 sec
json_extract(doc, '$**.KOJC') : 0.2 sec

Mac OS X:
json_extract(doc, '$.KOJC')     : 0.00 sec
json_extract(doc, '$**.KOJC') : 3.21 sec
[11 Jun 2017 10:30] Knut Anders Hatlen
Posted by developer:
 
The improvements in 8.0.1 seem to be a result of Bug#25151440 ALLOW MOVING OR EMPLACING ELEMENTS INTO PREALLOCED_ARRAY.
[8 Aug 2017 15:21] Jon Stephens
Documented fix in the MySQL 8.0.3 changelog as follows:

    Searches with JSON_EXTRACT() that used wildcards took an inordinate
    amount of time.

    See also BUG#83959.

Closed.