Bug #108620 Wrong count when choosing json index
Submitted: 27 Sep 2022 2:39 Modified: 27 Sep 2022 6:06
Reporter: Shaohua Wang (OCA) Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0.28 OS:Any
Assigned to: CPU Architecture:Any

[27 Sep 2022 2:39] Shaohua Wang
Description:
when we do a SELECT COUNT(*) in a table with json index, we get wrong result if the json index is chosen in query plan.

How to repeat:
create table t1(
  move_time datetime(0),
  node_tree json,
  sale_count bigint,
  index idx_move_time(move_time),
  index idx_move_time_sale_count_tree(move_time, (cast(json_extract(node_tree, _utf8mb4'$.node') as char(32) array)), sale_count));

insert into t1 values('2022-08-22 00:00:00', '{"node": ["393459011", "industrial", "12900351", "12899801", "office-products", "1069242"]}', 549);

insert into t1 values('2022-08-22 00:00:00', '{"node": ["166099011", "166092011", "toys-and-games"]}', 11978);

insert into t1 values('2022-08-22 00:00:00', '{"node": ["1069462", "1069454", "1069242", "office-products", "490790011", "12899801"]}', 2972);

#we got 15 here, which is wrong.
select count(1) from t1 force index(idx_move_time_sale_count_tree) where move_time = '2022-8-22';

#we got 15 here, which is wrong.
select distinct count(1) from t1 force index(idx_move_time_sale_count_tree) where move_time = '2022-8-22';

select count(1) from t1 force index(idx_move_time) where move_time = '2022-8-22';

select count(1) from t1 where move_time = '2022-8-22';

drop table t1;

Suggested fix:
Avoid to use json index in this cases, or show distinct in count.
[27 Sep 2022 6:06] MySQL Verification Team
Hello Shaohua,

Thank you for the report and test case.
Observed that 8.0.28 build is affected but this issue is no longer seen after 8.0.29+.  Could you please confirm this at your end? Thank you.

== 8.0.28 affected
bin/mysql -uroot -S /tmp/mysql.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.28 MySQL Community Server - GPL

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database test;
Query OK, 1 row affected (0.01 sec)

mysql> use test
Database changed
mysql> create table t1(
  move_time datetime(0),
  node_tree json,
  sale_count bigint,
  index idx_move_time(move_time),
  index idx_move_time_sale_count_tree(move_time, (cast(json_extract(node_tree, _utf8mb4'$.node') as char(32) array)), sale_count));

insert into t1 values('2022-08-22 00:00:00', '{"node": ["393459011", "industrial", "12900351", "12899801", "office-products", "1069242"]}', 549);

insert into t1 values('2022-08-22 00:00:00', '{"node": ["166099011", "166092011", "toys-and-games"]}', 11978);

insert into t1 values('2022-08-22 00:00:00', '{"node": ["1069462", "1069454", "1069242", "office-products", "490790011", "12899801"]}', 2972);
    ->   move_time datetime(0),
    ->   node_tree json,
    ->   sale_count bigint,
    ->   index idx_move_time(move_time),
    ->   index idx_move_time_sale_count_tree(move_time, (cast(json_extract(node_tree, _utf8mb4'$.node') as char(32) array)), sale_count));
Query OK, 0 rows affected (0.03 sec)

mysql>
mysql> insert into t1 values('2022-08-22 00:00:00', '{"node": ["393459011", "industrial", "12900351", "12899801", "office-products", "1069242"]}', 549);
Query OK, 1 row affected (0.01 sec)

mysql>
mysql> insert into t1 values('2022-08-22 00:00:00', '{"node": ["166099011", "166092011", "toys-and-games"]}', 11978);
Query OK, 1 row affected (0.00 sec)

mysql>
mysql> insert into t1 values('2022-08-22 00:00:00', '{"node": ["1069462", "1069454", "1069242", "office-products", "490790011", "12899801"]}', 2972);
Query OK, 1 row affected (0.01 sec)

mysql> select count(1) from t1 force index(idx_move_time_sale_count_tree) where move_time = '2022-8-22';
+----------+
| count(1) |
+----------+
|       15 |
+----------+
1 row in set (0.00 sec)

mysql> select distinct count(1) from t1 force index(idx_move_time_sale_count_tree) where move_time = '2022-8-22';
+----------+
| count(1) |
+----------+
|       15 |
+----------+
1 row in set (0.00 sec)

mysql> select count(1) from t1 force index(idx_move_time) where move_time = '2022-8-22';
+----------+
| count(1) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)

mysql> select count(1) from t1 where move_time = '2022-8-22';
+----------+
| count(1) |
+----------+
|        3 |
+----------+

== 8.0.29+ not affected
 bin/mysql -uroot -S /tmp/mysql.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.29 MySQL Community Server - GPL

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database test;
Query OK, 1 row affected (0.01 sec)

mysql> use test
Database changed
mysql> create table t1(
  move_time datetime(0),
  node_tree json,
  sale_count bigint,
  index idx_move_time(move_time),
  index idx_move_time_sale_count_tree(move_time, (cast(json_extract(node_tree, _utf8mb4'$.node') as char(32) array)), sale_count));

insert into t1 values('2022-08-22 00:00:00', '{"node": ["393459011", "industrial", "12900351", "12899801", "office-products", "1069242"]}', 549);

insert into t1 values('2022-08-22 00:00:00', '{"node": ["166099011", "166092011", "toys-and-games"]}', 11978);

insert into t1 values('2022-08-22 00:00:00', '{"node": ["1069462", "1069454", "1069242", "office-products", "490790011", "12899801"]}', 2972);
    ->   move_time datetime(0),
    ->   node_tree json,
    ->   sale_count bigint,
    ->   index idx_move_time(move_time),
    ->   index idx_move_time_sale_count_tree(move_time, (cast(json_extract(node_tree, _utf8mb4'$.node') as char(32) array)), sale_count));
Query OK, 0 rows affected (0.14 sec)

mysql>
mysql> insert into t1 values('2022-08-22 00:00:00', '{"node": ["393459011", "industrial", "12900351", "12899801", "office-products", "1069242"]}', 549);
Query OK, 1 row affected (0.01 sec)

mysql>
mysql> insert into t1 values('2022-08-22 00:00:00', '{"node": ["166099011", "166092011", "toys-and-games"]}', 11978);
Query OK, 1 row affected (0.00 sec)

mysql>
mysql> insert into t1 values('2022-08-22 00:00:00', '{"node": ["1069462", "1069454", "1069242", "office-products", "490790011", "12899801"]}', 2972);
Query OK, 1 row affected (0.01 sec)

mysql> select count(1) from t1 force index(idx_move_time_sale_count_tree) where move_time = '2022-8-22';
+----------+
| count(1) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)

mysql> select distinct count(1) from t1 force index(idx_move_time_sale_count_tree) where move_time = '2022-8-22';
+----------+
| count(1) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)

mysql> select count(1) from t1 force index(idx_move_time) where move_time = '2022-8-22';
+----------+
| count(1) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)

mysql> select count(1) from t1 where move_time = '2022-8-22';
+----------+
| count(1) |
+----------+
|        3 |
+----------+

Most likely fixed in 8.0.29 after - SELECT COUNT(*) using a multi-valued index reported the wrong number of rows. (Bug #104898, Bug #33334928)

More https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-29.html

regards,
Umesh