Description:
We observed that spatial index cardinality estimate is based on the total number of rows of the table.
While the documentation does not state such behavior :
https://dev.mysql.com/doc/mysql-infoschema-excerpt/8.0/en/information-schema-statistics-ta...
https://dev.mysql.com/doc/refman/8.0/en/show-index.html
> Cardinality
> An estimate of the number of unique values in the index.
It does not make an exception for spatial index.
Got similar result on testing on both latest version below :
Server version: 5.7.42 MySQL Community Server (GPL)
Server version: 8.0.33 MySQL Community Server - GPL
mysql [localhost:5742] {msandbox} (db6) > SHOW INDEXES FROM ip_ranges;
+-----------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ip_ranges | 0 | PRIMARY | 1 | i | A | 292500 | NULL | NULL | | BTREE | | |
| ip_ranges | 1 | idx_start_end | 1 | ip_start | A | 1 | NULL | NULL | | BTREE | | |
| ip_ranges | 1 | idx_start_end | 2 | ip_end | A | 1 | NULL | NULL | | BTREE | | |
| ip_ranges | 1 | index_ip_ranges_on_ip_range | 1 | ip_range | A | 292500 | 32 | NULL | | SPATIAL | | |
+-----------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)
mysql [localhost:5742] {msandbox} (db6) > SELECT COUNT(*), ip_range FROM ip_ranges GROUP BY 2;
+----------+---------------------------------------------------------------------------------------------------+
| COUNT(*) | ip_range |
+----------+---------------------------------------------------------------------------------------------------+
| 262144 | �A �?�A �A �A �?�A �? |
| 32768 | pA �? pA 00pA 00pA �? pA �? |
+----------+---------------------------------------------------------------------------------------------------+
2 rows in set (0.61 sec)
How to repeat:
############################################
# Prepare the schema
############################################
CREATE DATABASE db6;
USE db6;
CREATE TABLE ip_ranges(
i INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, ip_start int unsigned not null
, ip_end int unsigned not null
, KEY idx_start_end(ip_start,ip_end)
);
SET @new_ip_start = INET_ATON('1.1.1.1');
SET @new_ip_end = INET_ATON('1.1.3.3');
SET @new2_ip_start= INET_ATON('2.1.1.1');
SET @new2_ip_end = INET_ATON('2.1.3.3');
INSERT INTO ip_ranges(ip_start, ip_end) VALUES (@new_ip_start , @new_ip_end) ;
INSERT INTO ip_ranges(ip_start, ip_end) VALUES (@new2_ip_start, @new2_ip_end) ;
-- Below changes are from https://bignerdranch.com/blog/using-mysql-spatial-extensions-for-range-queries/
ALTER TABLE ip_ranges add ip_range POLYGON NULL;
UPDATE ip_ranges
SET ip_range=Polygon(
LineString(
Point(ip_start, 1),
Point(ip_start, 0),
Point(ip_end, 0),
Point(ip_end, 1),
Point(ip_start, 1)
)
);
ALTER TABLE ip_ranges MODIFY ip_range POLYGON NOT NULL;
CREATE SPATIAL INDEX index_ip_ranges_on_ip_range on ip_ranges (ip_range);
############################################
# Load data more data
############################################
for i in {1..15};
do
mysql -h127.0.0.1 -P5742 db6 -e"INSERT INTO ip_ranges(ip_start, ip_end, ip_range) SELECT ip_start, ip_end, ip_range FROM ip_ranges v1 WHERE EXISTS ( SELECT 1 FROM ip_ranges v2 WHERE v1.ip_start=v2.ip_start AND v1.ip_end=v2.ip_end AND v2.i=1)"
done
for i in {1..18};
do
mysql -h127.0.0.1 -P5742 db6 -e"INSERT INTO ip_ranges(ip_start, ip_end, ip_range) SELECT ip_start, ip_end, ip_range FROM ip_ranges v1 WHERE EXISTS ( SELECT 1 FROM ip_ranges v2 WHERE v1.ip_start=v2.ip_start AND v1.ip_end=v2.ip_end AND v2.i=2)"
done
############################################
# Count the data and get unique values for spatial column
############################################
mysql [localhost:5742] {msandbox} (db6) > SELECT COUNT(*), ip_range FROM ip_ranges GROUP BY 2;
+----------+---------------------------------------------------------------------------------------------------+
| COUNT(*) | ip_range |
+----------+---------------------------------------------------------------------------------------------------+
| 262144 | �A �?�A �A �A �?�A �? |
| 32768 | pA �? pA 00pA 00pA �? pA �? |
+----------+---------------------------------------------------------------------------------------------------+
2 rows in set (0.61 sec).
############################################
# Update statistics
############################################
mysql [localhost:5742] {msandbox} (db6) > ANALYZE TABLE ip_ranges;
+---------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------------+---------+----------+----------+
| db6.ip_ranges | analyze | status | OK |
+---------------+---------+----------+----------+
1 row in set (0.01 sec)
############################################
# Verify statistics :
# ISSUE : We can see that cardinality estimate of spatial index in not the unique count value but the total number of rows for the table.
############################################
mysql [localhost:5742] {msandbox} (db6) > SHOW INDEXES FROM ip_ranges;
+-----------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ip_ranges | 0 | PRIMARY | 1 | i | A | 292500 | NULL | NULL | | BTREE | | |
| ip_ranges | 1 | idx_start_end | 1 | ip_start | A | 1 | NULL | NULL | | BTREE | | |
| ip_ranges | 1 | idx_start_end | 2 | ip_end | A | 1 | NULL | NULL | | BTREE | | |
| ip_ranges | 1 | index_ip_ranges_on_ip_range | 1 | ip_range | A | 292500 | 32 | NULL | | SPATIAL | | |
+-----------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)
mysql [localhost:5742] {msandbox} (db6) > SELECT * FROM INFORMATION_SCHEMA.STATISTICS WHERE table_name = 'ip_ranges' AND table_schema = 'db6';
+---------------+--------------+------------+------------+--------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+----------+------------+---------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | NON_UNIQUE | INDEX_SCHEMA | INDEX_NAME | SEQ_IN_INDEX | COLUMN_NAME | COLLATION | CARDINALITY | SUB_PART | PACKED | NULLABLE | INDEX_TYPE | COMMENT | INDEX_COMMENT |
+---------------+--------------+------------+------------+--------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+----------+------------+---------+---------------+
| def | db6 | ip_ranges | 0 | db6 | PRIMARY | 1 | i | A | 292500 | NULL | NULL | | BTREE | | |
| def | db6 | ip_ranges | 1 | db6 | idx_start_end | 1 | ip_start | A | 1 | NULL | NULL | | BTREE | | |
| def | db6 | ip_ranges | 1 | db6 | idx_start_end | 2 | ip_end | A | 1 | NULL | NULL | | BTREE | | |
| def | db6 | ip_ranges | 1 | db6 | index_ip_ranges_on_ip_range | 1 | ip_range | A | 292500 | 32 | NULL | | SPATIAL | | |
+---------------+--------------+------------+------------+--------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+----------+------------+---------+---------------+
4 rows in set (0.00 sec)
############################################
# While checking innodb_index_stats, we cannot see the data for spatial index below.
############################################
mysql [localhost:5742] {msandbox} (db6) > SELECT * FROM mysql.innodb_index_stats WHERE database_name='db6' and table_name = 'ip_ranges';
+---------------+------------+---------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+---------------+---------------------+--------------+------------+-------------+-----------------------------------+
| db6 | ip_ranges | PRIMARY | 2023-06-21 08:57:08 | n_diff_pfx01 | 292500 | 20 | i |
| db6 | ip_ranges | PRIMARY | 2023-06-21 08:57:08 | n_leaf_pages | 2500 | NULL | Number of leaf pages in the index |
| db6 | ip_ranges | PRIMARY | 2023-06-21 08:57:08 | size | 2532 | NULL | Number of pages in the index |
| db6 | ip_ranges | idx_start_end | 2023-06-21 08:57:08 | n_diff_pfx01 | 1 | 3 | ip_start |
| db6 | ip_ranges | idx_start_end | 2023-06-21 08:57:08 | n_diff_pfx02 | 1 | 3 | ip_start,ip_end |
| db6 | ip_ranges | idx_start_end | 2023-06-21 08:57:08 | n_diff_pfx03 | 295681 | 20 | ip_start,ip_end,i |
| db6 | ip_ranges | idx_start_end | 2023-06-21 08:57:08 | n_leaf_pages | 319 | NULL | Number of leaf pages in the index |
| db6 | ip_ranges | idx_start_end | 2023-06-21 08:57:08 | size | 353 | NULL | Number of pages in the index |
+---------------+------------+---------------+---------------------+--------------+------------+-------------+-----------------------------------+
8 rows in set (0.00 sec)
############################################
# Checking the source code of mysql-5.7.42/storage/innobase/dict/dict0stats.cc :
# Based on a function name, the index is ignored if it is an FTS or spatial, if it should be dropped, if it's corrupted, and if it's not committed. So the SPATIAL KEY will be ignored in the stats
############################################
158 /*********************************************************************//**
159 Checks whether an index should be ignored in stats manipulations:
160 * stats fetch
161 * stats recalc
162 * stats save
163 @return true if exists and all tables are ok */
164 UNIV_INLINE
165 bool
166 dict_stats_should_ignore_index(
167 /*===========================*/
168 const dict_index_t* index) /*!< in: index */
169 {
170 return((index->type & DICT_FTS)
171 || dict_index_is_corrupted(index)
172 || dict_index_is_spatial(index)
173 || index->to_be_dropped
174 || !index->is_committed());
175 }
Suggested fix:
Please make the cardinality of spatial index accurately according to the documentation : "An estimate of the number of unique values in the index".
Or if it is an expected behavior, please do update the documentation :
1) To mention an exception that cardinality for spatial index is based on total rows of the tables.
https://dev.mysql.com/doc/mysql-infoschema-excerpt/8.0/en/information-schema-statistics-ta...
https://dev.mysql.com/doc/refman/8.0/en/show-index.html
2) To mention that spatial index is skipped in gathering stats on table innodb_index_stats.
https://dev.mysql.com/doc/refman/8.0/en/innodb-persistent-stats.html