Bug #98836 GeoJSON Util.importJson: Input does not start with a JSON object at offset 0
Submitted: 4 Mar 2020 20:36 Modified: 4 Jun 2020 9:59
Reporter: Tim D Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Document Store: MySQL Shell Severity:S3 (Non-critical)
Version:8.0.19 OS:Windows (10 Pro)
Assigned to: CPU Architecture:x86 (i7-4790k)

[4 Mar 2020 20:36] Tim D
Description:
I can successfully import a trivial json from file, but I am receiving an error from importJson when trying to import a GeoJSON of 81,437 bytes.

I can successfully SQL INSERT INTO [table] the same GeoJSON as a literal string into a table column of type json, and rountrip validate the GeoJSON with this query

SELECT ST_IsValid(ST_GeomFromGeoJSON(geojson)) FROM test;

The troublesome GeoJSON can be downloaded from here:

https://environment.data.gov.uk/flood-monitoring/id/floodAreas/053WAF113LWA/polygon

 I will also attach it to this ticket.

How to repeat:
d:\>"c:\Program Files\MySQL\MySQL Shell 8.0\bin\mysqlsh.exe" root@127.0.0.1:33060/eaflooddata --import "d:\temp\FloodAreaPolygon.json"
Please provide the password for 'root@127.0.0.1:33060': ****************
Save password for 'root@127.0.0.1:33060'? [Y]es/[N]o/Ne[v]er (default No):
Importing from file "d:\temp\FloodAreaPolygon.json" to collection `eaflooddata`.`FloodAreaPolygon` in MySQL Server at 127.0.0.1:33060

Processed 0 bytes in 0 documents in 0.0027 sec (0.00 documents/s)
Total successfully imported documents 0 (0.00 documents/s)

ERROR: Util.importJson: Input does not start with a JSON object at offset 0

Suggested fix:
Smells like a json size limit issue in the importJson utility.
[4 Mar 2020 20:37] Tim D
The troublesome GeoJSON

Attachment: FloodAreaPolygon.json (application/json, text), 79.53 KiB.

[5 Mar 2020 4:15] MySQL Verification Team
Hello Tim Downs,

Thank you for the report and test case.
Verified as described with 8.0.19 build.

regards,
Umesh
[10 Mar 2020 18:02] Krzysztof Grzadziel
Hi Tim,

thanks for bug report. The reason why you are not able to import attached d:\temp\FloodAreaPolygon.json is that your version of FloodAreaPolygon.json contains UTF-8 BOM (byte mark order) at the beginning of the file. MySQL Shell currently does not support importing files that contains UTF-8 BOM.

As workaround please remove UTF-8 BOM from files that you are try to import (byte sequence ef bb bf). As a note, UTF-8 BOM is not needed for UTF-8 encoded files, because byte order in UTF-8 files is unambiguous. Some text editors always add BOM at the beginning of the file during save. It is recommended to not use such editors, or configure them to not do that.

Nevertheless, we will add support for such files in the future.

[1]
kg@parasol ~ » xxd -l 16 ~/Downloads/FloodAreaPolygon.json                     
00000000: efbb bf7b 0d0a 2020 2274 7970 6522 3a20  ...{..  "type": 

[2]
kg@parasol ~ » wget https://environment.data.gov.uk/flood-monitoring/id/floodAreas/053WAF113LWA/polygon
--2020-03-10 18:42:33--  https://environment.data.gov.uk/flood-monitoring/id/floodAreas/053WAF113LWA/polygon
Resolving environment.data.gov.uk (environment.data.gov.uk)... 51.141.123.204
Connecting to environment.data.gov.uk (environment.data.gov.uk)|51.141.123.204|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 78147 (76K) [application/json]
Saving to: ‘polygon’

polygon             100%[===================>]  76,32K  --.-KB/s    in 0,06s   

2020-03-10 18:42:33 (1,23 MB/s) - ‘polygon’ saved [78147/78147]

kg@parasol ~ » xxd -l 16 polygon                                         483ms 
00000000: 7b0a 2020 2274 7970 6522 3a20 2246 6561  {.  "type": "Fea
[10 Mar 2020 19:34] Tim D
Nice work Krzysztof. For the record I was using notepad++ on Windows 10. Thank you for the workaround and being to prepared to include it in a future build. Tim.
[4 Jun 2020 9:59] Margaret Fisher
Posted by developer:
 
Changelog entry added for MySQL Shell 8.0.21:

MySQL Shell's JSON import utility importJSON() now handles UTF-8 encoded files that include a BOM (byte mark order) at the start, which is the sequence 0xEF 0xBB 0xBF. As a workaround in earlier releases, remove this byte sequence, which is not needed.