Bug #99714  Updates to JSON column type are not
Submitted: 28 May 2020 3:30    Modified: 3 Jul 2020 17:51
Reporter: Noah Zucker          Email Updates:
Status: Not a Bug              Impact on me: None
Category: MySQL Server: InnoDB storage engine    Severity: S3 (Non-critical)
Version: 8.0.15                OS: Linux
Assigned to:                   CPU Architecture: Any

[28 May 2020 3:30] Noah Zucker
Description:
We have a simple table having the following schema:

CREATE TABLE `tasks` (
  `task_id` binary(24) NOT NULL,
  `task` json NOT NULL,
  `task_kryo` mediumblob,
  `task_type` varchar(180) COLLATE utf8_bin GENERATED ALWAYS AS (json_unquote(json_extract(`task`,_utf8mb4'$.t'))) STORED,
  `created` datetime GENERATED ALWAYS AS (cast(left(json_unquote(json_extract(`task`,_utf8mb4'$._task.timestamp')),19) as datetime)) STORED,
  `last_updated` datetime GENERATED ALWAYS AS (cast(left(json_unquote(json_extract(`task`,_utf8mb4'$._task.latestStatus.timestamp')),19) as datetime)) STORED,
  `latest_status` varchar(180) COLLATE utf8_bin GENERATED ALWAYS AS (json_unquote(json_extract(`task`,_utf8mb4'$._task.latestStatus.t'))) STORED,
  `marker` binary(24) GENERATED ALWAYS AS (json_unquote(json_extract(`task`,_utf8mb4'$.marker'))) STORED,
  PRIMARY KEY (`task_id`),
  KEY `task_type_index` (`task_type`),
  KEY `created_index` (`created`),
  KEY `last_updated_index` (`last_updated`),
  KEY `latest_status_index` (`latest_status`),
  KEY `marker_index` (`marker`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

Our application inserts a row and then performs updates on the `task` column, which is of JSON type, using in-place (partial) JSON document updates with JSON_SET as described here:

https://mysqlhighavailability.com/efficient-json-replication-in-mysql-8-0/
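
For illustration, this is the shape of update we issue (a sketch only: the JSON path and values below are made up, and partial replication of such updates also assumes binlog_row_value_options=PARTIAL_JSON is enabled on our servers):

-- Illustrative in-place (partial) JSON update: only one path is modified
-- and the new value is no larger than the old one, so the server can
-- treat it as a partial update of the stored document.
UPDATE tasks
SET task = JSON_SET(task, '$._task.latestStatus.t', 'COMPLETED')
WHERE task_id = UNHEX('0102030405060708090a0b0c0d0e0f101112131415161718');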

Over time, the size of the table grows and grows even though we purge old records nightly.

As of today, MySQL reported that the tasks table (schema above) had a size of 40 GB, and SHOW TABLE STATUS reported 3-7 million rows.

However, SELECT COUNT(*) FROM tasks reported ~500k rows, and we ran the following query:

SELECT SUM(OCTET_LENGTH(marker)) / 1024 / 1024 / 1024,
       SUM(OCTET_LENGTH(task_id)) / 1024 / 1024 / 1024,
       SUM(OCTET_LENGTH(task_kryo)) / 1024 / 1024 / 1024,
       SUM(OCTET_LENGTH(task)) / 1024 / 1024 / 1024,
       SUM(OCTET_LENGTH(latest_status)) / 1024 / 1024 / 1024
FROM tasks;

This query showed that our largest columns held less than 10 GB of data in total.

In addition, this query showed almost no "data free" space in the table:
SELECT 
	table_schema,
	SUM(data_length + index_length + data_free)/1024/1024/1024 AS total_gb,
	SUM(data_length)/1024/1024/1024 AS data_gb,
	SUM(index_length)/1024/1024/1024 AS index_gb,
	SUM(data_free)/1024/1024/1024 AS free_gb,
	COUNT(*) AS tables,
	CURDATE() AS today 
FROM 
	information_schema.tables
	GROUP BY table_schema
	ORDER BY 2 DESC
;

We ran OPTIMIZE TABLE tasks. When it completed, the size of the tasks table reported by MySQL dropped from 40 GB to 12 GB, and the row count from SHOW TABLE STATUS now much more closely reflects the value of SELECT COUNT(*) FROM tasks: roughly 100k off, versus the previous wildly inaccurate estimate of 3-7 million rows.
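
For reference, a sketch of the statements involved: OPTIMIZE TABLE is what we ran (for InnoDB it is implemented as a full table rebuild), while ANALYZE TABLE is a lighter-weight alternative that only refreshes the index statistics behind the inaccurate row-count estimate:

-- Rebuild the table and its indexes, reclaiming unused space
-- (InnoDB implements OPTIMIZE TABLE as a table copy/rebuild).
OPTIMIZE TABLE tasks;

-- Refresh index statistics so SHOW TABLE STATUS / TABLE_ROWS
-- estimates are recomputed without a full rebuild.
ANALYZE TABLE tasks;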

How to repeat:
Create a simple schema like the following:

CREATE TABLE `tasks` (
  `id` bigint NOT NULL,
  `doc` json NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

Write a simple application that inserts a row and then repeatedly updates the document in a tight loop:

INSERT INTO tasks(id, doc) values(1, '{"a" : 1}')

FOR n = 1 to 10000000
  UPDATE tasks SET doc = JSON_SET(doc, '$.a', n) WHERE id = 1

Rather than staying roughly constant in size, the table will grow continuously.
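
If it helps, the loop can also be driven entirely from within MySQL with a throwaway stored procedure (a sketch; the procedure name and the 100,000 iteration count are arbitrary):

-- Hypothetical helper procedure that performs the repeated JSON_SET updates.
DELIMITER //
CREATE PROCEDURE repro_json_growth()
BEGIN
  DECLARE n INT DEFAULT 1;
  WHILE n <= 100000 DO
    UPDATE tasks SET doc = JSON_SET(doc, '$.a', n) WHERE id = 1;
    SET n = n + 1;
  END WHILE;
END //
DELIMITER ;

CALL repro_json_growth();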
[28 May 2020 3:31] Noah Zucker
https://mysqlserverteam.com/partial-update-of-json-values/
[28 May 2020 8:24] Frederic Descamps
I've tried to reproduce this on the latest MySQL (8.0.20), and I don't see such behavior:

I used the following bash script

for n in `seq 1 100000`; do  mysqlsh root@localhost/test --sql -e "UPDATE tasks SET doc = JSON_SET(doc, '$.a', $n) WHERE id =1" >/dev/null 2>&1; done

The size didn't change:

-rw-r----- 1 mysql mysql 112K May 28 08:50 /var/lib/mysql/test/tasks.ibd

-rw-r----- 1 mysql mysql 112K May 28 10:12 /var/lib/mysql/test/tasks.ibd
[28 May 2020 9:27] MySQL Verification Team
Please see last comment and try 8.0.20. Thanks.
[28 May 2020 15:33] Noah Zucker
Thank you. I'm going to test with a local MySQL install on 8.0.15 and then on 8.0.20.
[1 Jun 2020 13:23] MySQL Verification Team
Hi Mr. Zucker,

Thank you for your bug report.

However, I do not think that this is a bug.

InnoDB does use a lot of space, and only a dump/restore or OPTIMIZE TABLE can reclaim that space. There is no other way of accomplishing that.

Also, for reporting the number of rows, the size of the tablespace, and other statistics, send us data from INFORMATION_SCHEMA. That information is much closer to the real data than what you have used.

Hence, to report a bug, please use I_S or P_S, as appropriate.
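
For example, something along these lines (a sketch, with a placeholder schema name; note that TABLE_ROWS is itself only an estimate for InnoDB):

-- Per-table figures from INFORMATION_SCHEMA (adjust schema/table names).
SELECT TABLE_ROWS,
       DATA_LENGTH  / 1024 / 1024 / 1024 AS data_gb,
       INDEX_LENGTH / 1024 / 1024 / 1024 AS index_gb,
       DATA_FREE    / 1024 / 1024 / 1024 AS free_gb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_schema' AND TABLE_NAME = 'tasks';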
[2 Jul 2020 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[3 Jul 2020 17:51] Noah Zucker
Since opening this report, I've set our DB to run OPTIMIZE TABLE on a weekly schedule.

I agree it's not a bug, but it seems to me that there is some room for improvement / optimization in how InnoDB stores data and updates for JSON column types.

I'm fine with closing this ticket.
[6 Jul 2020 12:01] MySQL Verification Team
Thank you, Mr. Zucker.