Bug #111506 Spatial index cardinality equals to the number of rows of the table.
Submitted: 21 Jun 2023 3:46 Modified: 21 Jun 2023 7:25
Reporter: Aristotle Po Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.7.42, 8.0.33 OS:Any
Assigned to: CPU Architecture:Any
Tags: cardinality, spatial index

[21 Jun 2023 3:46] Aristotle Po
Description:
We observed that spatial index cardinality estimate is based on the total number of rows of the table. 
While the documentation does not state such behavior : 
https://dev.mysql.com/doc/mysql-infoschema-excerpt/8.0/en/information-schema-statistics-ta...
https://dev.mysql.com/doc/refman/8.0/en/show-index.html
> Cardinality
> An estimate of the number of unique values in the index.
It does not make an exception for spatial index.

Got similar result on testing on both latest version below :
Server version: 5.7.42 MySQL Community Server (GPL)
Server version: 8.0.33 MySQL Community Server - GPL

mysql [localhost:5742] {msandbox} (db6) > SHOW INDEXES FROM ip_ranges;
+-----------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table     | Non_unique | Key_name                    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ip_ranges |          0 | PRIMARY                     |            1 | i           | A         |      292500 |     NULL | NULL   |      | BTREE      |         |               |
| ip_ranges |          1 | idx_start_end               |            1 | ip_start    | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| ip_ranges |          1 | idx_start_end               |            2 | ip_end      | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| ip_ranges |          1 | index_ip_ranges_on_ip_range |            1 | ip_range    | A         |      292500 |       32 | NULL   |      | SPATIAL    |         |               |
+-----------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)

mysql [localhost:5742] {msandbox} (db6) > SELECT COUNT(*), ip_range FROM ip_ranges GROUP BY 2;
+----------+---------------------------------------------------------------------------------------------------+
| COUNT(*) | ip_range                                                                                          |
+----------+---------------------------------------------------------------------------------------------------+
|   262144 |              �A      �?�A          �A          �A      �?�A      �?         |
|    32768 |                 pA      �?   pA           00pA           00pA      �?   pA      �?    |
+----------+---------------------------------------------------------------------------------------------------+
2 rows in set (0.61 sec)

How to repeat:
############################################
# Prepare the schema
############################################
CREATE DATABASE db6;
USE db6;
CREATE TABLE ip_ranges(
        i INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
        , ip_start int unsigned not null
        , ip_end int unsigned not null
        , KEY idx_start_end(ip_start,ip_end)
);

SET @new_ip_start = INET_ATON('1.1.1.1');
SET @new_ip_end   = INET_ATON('1.1.3.3');
SET @new2_ip_start= INET_ATON('2.1.1.1');
SET @new2_ip_end  = INET_ATON('2.1.3.3');

INSERT INTO  ip_ranges(ip_start, ip_end) VALUES (@new_ip_start , @new_ip_end) ;
INSERT INTO  ip_ranges(ip_start, ip_end) VALUES (@new2_ip_start, @new2_ip_end) ;

-- Below changes are from https://bignerdranch.com/blog/using-mysql-spatial-extensions-for-range-queries/
ALTER TABLE ip_ranges add ip_range POLYGON NULL;
UPDATE ip_ranges
  SET ip_range=Polygon(
    LineString(
      Point(ip_start, 1),
      Point(ip_start, 0),
      Point(ip_end, 0),
      Point(ip_end, 1),
      Point(ip_start, 1)
    )
  );
ALTER TABLE ip_ranges MODIFY ip_range POLYGON NOT NULL;
CREATE SPATIAL INDEX index_ip_ranges_on_ip_range on ip_ranges (ip_range);

############################################
# Load data more data
############################################
for i in {1..15};
do
        mysql -h127.0.0.1 -P5742 db6 -e"INSERT INTO ip_ranges(ip_start, ip_end, ip_range) SELECT ip_start, ip_end, ip_range FROM ip_ranges v1 WHERE EXISTS ( SELECT 1 FROM ip_ranges v2 WHERE v1.ip_start=v2.ip_start AND v1.ip_end=v2.ip_end AND v2.i=1)"
done

for i in {1..18};
do
        mysql -h127.0.0.1 -P5742 db6 -e"INSERT INTO ip_ranges(ip_start, ip_end, ip_range) SELECT ip_start, ip_end, ip_range FROM ip_ranges v1 WHERE EXISTS ( SELECT 1 FROM ip_ranges v2 WHERE v1.ip_start=v2.ip_start AND v1.ip_end=v2.ip_end AND v2.i=2)"
done

############################################
# Count the data and get unique values for spatial column
############################################
mysql [localhost:5742] {msandbox} (db6) > SELECT COUNT(*), ip_range FROM ip_ranges GROUP BY 2;
+----------+---------------------------------------------------------------------------------------------------+
| COUNT(*) | ip_range                                                                                          |
+----------+---------------------------------------------------------------------------------------------------+
|   262144 |              �A      �?�A          �A          �A      �?�A      �?         |
|    32768 |                 pA      �?   pA           00pA           00pA      �?   pA      �?    |
+----------+---------------------------------------------------------------------------------------------------+
2 rows in set (0.61 sec).

############################################
# Update statistics
############################################
mysql [localhost:5742] {msandbox} (db6) > ANALYZE TABLE ip_ranges;
+---------------+---------+----------+----------+
| Table         | Op      | Msg_type | Msg_text |
+---------------+---------+----------+----------+
| db6.ip_ranges | analyze | status   | OK       |
+---------------+---------+----------+----------+
1 row in set (0.01 sec)

############################################
# Verify statistics : 
# ISSUE : We can see that cardinality estimate of spatial index in not the unique count value but the total number of rows for the table. 
############################################

mysql [localhost:5742] {msandbox} (db6) > SHOW INDEXES FROM ip_ranges;
+-----------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table     | Non_unique | Key_name                    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ip_ranges |          0 | PRIMARY                     |            1 | i           | A         |      292500 |     NULL | NULL   |      | BTREE      |         |               |
| ip_ranges |          1 | idx_start_end               |            1 | ip_start    | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| ip_ranges |          1 | idx_start_end               |            2 | ip_end      | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
| ip_ranges |          1 | index_ip_ranges_on_ip_range |            1 | ip_range    | A         |      292500 |       32 | NULL   |      | SPATIAL    |         |               |
+-----------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)

mysql [localhost:5742] {msandbox} (db6) > SELECT * FROM INFORMATION_SCHEMA.STATISTICS WHERE table_name = 'ip_ranges' AND table_schema = 'db6';
+---------------+--------------+------------+------------+--------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+----------+------------+---------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | NON_UNIQUE | INDEX_SCHEMA | INDEX_NAME                  | SEQ_IN_INDEX | COLUMN_NAME | COLLATION | CARDINALITY | SUB_PART | PACKED | NULLABLE | INDEX_TYPE | COMMENT | INDEX_COMMENT |
+---------------+--------------+------------+------------+--------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+----------+------------+---------+---------------+
| def           | db6          | ip_ranges  |          0 | db6          | PRIMARY                     |            1 | i           | A         |      292500 |     NULL | NULL   |          | BTREE      |         |               |
| def           | db6          | ip_ranges  |          1 | db6          | idx_start_end               |            1 | ip_start    | A         |           1 |     NULL | NULL   |          | BTREE      |         |               |
| def           | db6          | ip_ranges  |          1 | db6          | idx_start_end               |            2 | ip_end      | A         |           1 |     NULL | NULL   |          | BTREE      |         |               |
| def           | db6          | ip_ranges  |          1 | db6          | index_ip_ranges_on_ip_range |            1 | ip_range    | A         |      292500 |       32 | NULL   |          | SPATIAL    |         |               |
+---------------+--------------+------------+------------+--------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+----------+------------+---------+---------------+
4 rows in set (0.00 sec)

############################################
# While checking innodb_index_stats, we cannot see the data for spatial index  below.
############################################
mysql [localhost:5742] {msandbox} (db6) > SELECT * FROM mysql.innodb_index_stats WHERE database_name='db6' and table_name = 'ip_ranges';
+---------------+------------+---------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name    | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+---------------+---------------------+--------------+------------+-------------+-----------------------------------+
| db6           | ip_ranges  | PRIMARY       | 2023-06-21 08:57:08 | n_diff_pfx01 |     292500 |          20 | i                                 |
| db6           | ip_ranges  | PRIMARY       | 2023-06-21 08:57:08 | n_leaf_pages |       2500 |        NULL | Number of leaf pages in the index |
| db6           | ip_ranges  | PRIMARY       | 2023-06-21 08:57:08 | size         |       2532 |        NULL | Number of pages in the index      |
| db6           | ip_ranges  | idx_start_end | 2023-06-21 08:57:08 | n_diff_pfx01 |          1 |           3 | ip_start                          |
| db6           | ip_ranges  | idx_start_end | 2023-06-21 08:57:08 | n_diff_pfx02 |          1 |           3 | ip_start,ip_end                   |
| db6           | ip_ranges  | idx_start_end | 2023-06-21 08:57:08 | n_diff_pfx03 |     295681 |          20 | ip_start,ip_end,i                 |
| db6           | ip_ranges  | idx_start_end | 2023-06-21 08:57:08 | n_leaf_pages |        319 |        NULL | Number of leaf pages in the index |
| db6           | ip_ranges  | idx_start_end | 2023-06-21 08:57:08 | size         |        353 |        NULL | Number of pages in the index      |
+---------------+------------+---------------+---------------------+--------------+------------+-------------+-----------------------------------+
8 rows in set (0.00 sec)

############################################
# Checking the source code of mysql-5.7.42/storage/innobase/dict/dict0stats.cc :
# Based on a function name, the index is ignored if it is an FTS or spatial, if it should be dropped, if it's corrupted, and if it's not committed. So the SPATIAL KEY will be ignored in the stats
############################################

 158 /*********************************************************************//**
 159 Checks whether an index should be ignored in stats manipulations:
 160 * stats fetch
 161 * stats recalc
 162 * stats save
 163 @return true if exists and all tables are ok */
 164 UNIV_INLINE
 165 bool
 166 dict_stats_should_ignore_index(
 167 /*===========================*/
 168         const dict_index_t*     index)  /*!< in: index */
 169 {
 170         return((index->type & DICT_FTS)
 171                || dict_index_is_corrupted(index)
 172                || dict_index_is_spatial(index)
 173                || index->to_be_dropped
 174                || !index->is_committed());
 175 }

Suggested fix:
Please make the cardinality of spatial index accurately according to the documentation : "An estimate of the number of unique values in the index".

Or if it is an expected behavior, please do update the documentation :
1) To mention an exception that cardinality for spatial index is based on total rows of the tables.
https://dev.mysql.com/doc/mysql-infoschema-excerpt/8.0/en/information-schema-statistics-ta...
https://dev.mysql.com/doc/refman/8.0/en/show-index.html
2) To mention that spatial index is skipped in gathering stats on table innodb_index_stats.
https://dev.mysql.com/doc/refman/8.0/en/innodb-persistent-stats.html
[21 Jun 2023 7:25] MySQL Verification Team
Hello Aristotle Po,

Thank you for the report and feedback.

regards,
Umesh