Bug #80712 Invalid data for ST_GeomFromGeoJSON() is hard to diagnose
Submitted: 12 Mar 2016 11:17 Modified: 31 May 2016 15:46
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: GIS Severity:S3 (Non-critical)
Version:5.7.11 OS:Any
Assigned to: CPU Architecture:Any
Tags: geo, geojson, gis, json, spatial

[12 Mar 2016 11:17] Daniël van Eeden
Description:
Troubleshooting virtual columns with ST_GeomFromGeoJSON is difficult because it doesn't tell you which row has invalid data.

It is stored as JSON type, so it is valid JSON data.

How to repeat:
mysql> ALTER TABLE test_features ADD COLUMN `geometry` geometry GENERATED ALWAYS AS (ST_GeomFromGeoJSON(feature->'$.geometry',2)) STORED;
ERROR 3072 (HY000): Invalid GeoJSON data provided to function st_geomfromgeojson
mysql> ALTER TABLE test_features ADD COLUMN `geometry` geometry GENERATED ALWAYS AS (ST_GeomFromGeoJSON(feature->'$.geometry',2)) VIRTUAL;
Query OK, 0 rows affected (0.11 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT LENGTH(`geometry`) FROM test_features;
ERROR 3072 (HY000): Invalid GeoJSON data provided to function st_geomfromgeojson
mysql> SHOW CREATE TABLE test_features\G
*************************** 1. row ***************************
       Table: test_features
Create Table: CREATE TABLE `test_features` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `feature` json NOT NULL,
  `street` varchar(30) GENERATED ALWAYS AS (json_unquote(json_extract(`feature`,'$.properties.STREET'))) VIRTUAL,
  `geometry` geometry GENERATED ALWAYS AS (st_geomfromgeojson(json_extract(`feature`,'$.geometry'),2)) VIRTUAL,
  PRIMARY KEY (`id`),
  KEY `street` (`street`)
) ENGINE=InnoDB AUTO_INCREMENT=206561 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> SELECT COUNT(`geometry`) FROM test_features;
ERROR 3072 (HY000): Invalid GeoJSON data provided to function st_geomfromgeojson
[12 Mar 2016 11:20] Daniël van Eeden
Related:
https://gist.github.com/dveeden/c5b9fbf56e4559d9f400
https://www.percona.com/blog/2016/03/07/json-document-fast-lookup-with-mysql-5-7/
http://databaseblog.myname.nl/2016/03/re-json-document-fast-lookup-with-mysql.html
[12 Mar 2016 11:57] Daniël van Eeden
This works:
SELECT 
ST_geomFromGeoJSON(feature->'$.geometry',2) 
FROM test_features 
WHERE json_type(feature->'$.geometry')<>'NULL'

The invalid data is for rows like this where $.geometry is null:
mysql> SELECT feature, feature->'$.geometry' FROM test_features WHERE id=45504\G
*************************** 1. row ***************************
              feature: {"type": "Feature", "geometry": null, "properties": {"TO_ST": null, "BLKLOT": "1245061", "STREET": null, "FROM_ST": null, "LOT_NUM": "061", "ST_TYPE": null, "ODD_EVEN": null, "BLOCK_NUM": "1245", "MAPBLKLOT": "1245061"}}
feature->'$.geometry': null
1 row in set (0.00 sec)
[12 Mar 2016 11:58] Daniël van Eeden
Adding a ST_IsValidGeoJSON() function would be nice.
Also it would be nice to handle null/NULL values differently to make this work for virtual columns
[15 Mar 2016 6:56] MySQL Verification Team
Hello Daniël,

Thank you for the report.

Thanks,
Umesh
[15 Mar 2016 6:57] MySQL Verification Team
--

-- 5.7.11

Database changed
mysql> CREATE TABLE `test_features` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `feature` json NOT NULL,
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=InnoDB;
ERROR 1050 (42S01): Table 'test_features' already exists
mysql> drop table test_features;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE `test_features` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `feature` json NOT NULL,
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)

- Insert

mysql> INSERT INTO test.test_features(feature)
VALUES('{
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [
                    -122.41983177253881,
                    37.80720512387136,
                    0
                ],
                [
                    -122.41983177253881,
                    37.80720512387136,
                    0
                ]
            ]
        ]
    },
    "properties": {
        "TO_ST": "600",
        "BLKLOT": "0010001",
        "STREET": "BEACH",
        "FROM_ST": "600",
        "LOT_NUM": "001",
        "ST_TYPE": "ST",
        "ODD_EVEN": "E",
        "BLOCK_NUM": "0010",
        "MAPBLKLOT": "0010001"
    }
}')

mysql> ALTER TABLE test_features ADD COLUMN `geometry` geometry GENERATED ALWAYS AS (ST_GeomFromGeoJSON(feature->'$.geometry',2)) VIRTUAL;
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE test_features ADD COLUMN `street` varchar(30) GENERATED ALWAYS AS (json_unquote(json_extract(`feature`,'$.properties.STREET'))) VIRTUAL;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table test_features\G
*************************** 1. row ***************************
       Table: test_features
Create Table: CREATE TABLE `test_features` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `feature` json NOT NULL,
  `geometry` geometry GENERATED ALWAYS AS (st_geomfromgeojson(json_extract(`feature`,'$.geometry'),2)) VIRTUAL,
  `street` varchar(30) GENERATED ALWAYS AS (json_unquote(json_extract(`feature`,'$.properties.STREET'))) VIRTUAL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql>  SELECT COUNT(`geometry`) FROM test_features;
ERROR 3072 (HY000): Invalid GeoJSON data provided to function st_geomfromgeojson
mysql> SELECT LENGTH(`geometry`) FROM test_features;
ERROR 3072 (HY000): Invalid GeoJSON data provided to function st_geomfromgeojson
mysql>
[22 Mar 2016 9:12] Erik Frøseth
Posted by developer:
 
Here is a small analysis of the problem:

The GeoJSON standard specifies the following: "A complete GeoJSON data structure is always an object (in JSON terms).". That means that if the input to ST_GeomFromGeoJSON is NOT a JSON object, it is considered illegal. And as with almost all other MySQL functions, if any of the input arguments to ST_GeomFromGeoJSON is SQL NULL, we return SQL NULL.

The use case reported can be boiled down to the following statements for simplicity:

  CREATE TABLE t1 (col1 JSON);
  INSERT INTO t1 VALUES ('{ "type": "Feature", "geometry": null, "properties": {} }');
  ALTER TABLE t1 ADD COLUMN `col2` geometry GENERATED ALWAYS AS (ST_GeomFromGeoJSON(col1->'$.geometry')) VIRTUAL;
  SELECT * FROM t1;

The last SELECT fails, because when the virtual column 'col2' tries to extract the property 'geometry' from the JSON field, it gets a JSON NULL and not a SQL NULL. Hence, it is considered as an illegal/invalid GeoJSON input.

The fix for this is rather simple; Treat JSON NULL as a SQL NULL in ST_GeomFromGeoJSON. The consequence of this fix is that the following SQL becomes valid:

  SELECT ST_GeomFromGeoJSON(CAST('null' AS JSON));

I do not see any issues with allowing the above statement, as the use case given by the reported seems like a rather common use case. So I suggest that we handle JSON NULL as a SQL NULL in ST_GeomFromGeoJSON.

I'm not sure though whether this should be considered as a bug or feature request...
[31 May 2016 15:46] Paul DuBois
Posted by developer:
 
Noted in 5.7.14 changelog.

ST_GeomFromGeoJSON() treated JSON NULL as invalid input. Now it
treats JSON NULL as SQL NULL and thus returns SQL NULL for JSON NULL
input.