Bug #116118 Need to bound optimise-like statement in order to reclaim the disk space back
Submitted: 16 Sep 8:51 Modified: 16 Sep 13:18
Reporter: Aaditya Dubey Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:8.0.39 OS:Any
Assigned to: CPU Architecture:Any

[16 Sep 8:51] Aaditya Dubey
Description:
JSON partial updates using (JSON_SET, JSON_REPLACE, JSON_REMOVE) and other similar functions prevent the free space from being released back to disk.

Direct assignments needs to be do like below in order to release the space after optimize command.

Direct assignments needs to be do like below in order to release the space after optimize command.

How to repeat:
1. Creating a JSON table and pushing some data. The same data can be verified via the hex dump as well.

mysql> CREATE TABLE test01 (a JSON);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into test01 (a) values ( json_object('a',repeat('a',1*1024*1024)) );
Query OK, 1 row affected (0.07 sec)

mysql> insert into test01 (a) values ( json_object('a',repeat('a',1*1024*1024)) );
Query OK, 1 row affected (0.04 sec)

hexdump -C test01.ibd

0028bff0  61 61 61 61 61 61 61 61  38 1b ed 99 20 91 fa 20  |aaaaaaaa8... .. |
0028c000  98 73 6d 50 00 00 00 a3  00 00 00 00 ff ff ff ff  |.smP............|
0028c010  00 00 00 09 20 91 fa 20  00 17 00 00 00 00 00 00  |.... .. ........|
0028c020  00 00 00 00 02 30 00 00  00 10 df 00 00 00 01 0b  |.....0..........|
0028c030  0e 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61  |.aaaaaaaaaaaaaaa|
0028c040  61 61 61 61 61 61 61 61  61 61 61 61 61 61 61 61  |aaaaaaaaaaaaaaaa|
*
0028d110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0028fff0  00 00 00 00 00 00 00 00  98 73 6d 50 20 91 fa 20  |.........smP .. |
00290000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*

mysql> select json_storage_size(a), json_storage_free(a) from test01;
+----------------------+----------------------+
| json_storage_size(a) | json_storage_free(a) |
+----------------------+----------------------+
|              1048600 |                    0 |
|              1048600 |                    0 |
+----------------------+----------------------+
2 rows in set (0.01 sec)

2. Now, we will do a partial update using the JSON_REPLACE function. In the hex dump, we can see the previous details with the pattern “a“ still appearing and not cleaned yet. 

mysql> UPDATE test01 SET a = JSON_REPLACE(a, '$.a', 'b');
Query OK, 2 rows affected (0.03 sec)
Rows matched: 2  Changed: 2  Warnings: 0

mysql> select * from test01;
+------------+
| a          |
+------------+
| {"a": "b"} |
| {"a": "b"} |
+------------+
2 rows in set (0.01 sec)

hexdump -C test01.ibd

001c7ff0  61 61 61 61 61 61 61 61  25 c5 15 d3 20 85 98 62  |aaaaaaaa%... ..b|
001c8000  eb 64 b6 0b 00 00 00 72  00 00 00 00 ff ff ff ff  |.d.....r........|
001c8010  00 00 00 09 20 85 98 62  00 17 00 00 00 00 00 00  |.... ..b........|
001c8020  00 00 00 00 02 30 00 00  00 3f c7 00 00 00 01 0b  |.....0...?......|
001cbff0  61 61 61 61 61 61 61 61  eb 64 b6 0b 20 85 98 62  |aaaaaaaa.d.. ..b|
001cc010  00 00 00 09 20 85 98 62  00 17 00 00 00 00 00 00  |.... ..b........|
001cc020  00 00 00 00 02 30 00 00  00 3f c7 00 00 00 01 0b  |.....0...?......|

mysql> select json_storage_size(a), json_storage_free(a) from test01;
+----------------------+----------------------+
| json_storage_size(a) | json_storage_free(a) |
+----------------------+----------------------+
|              1048600 |              1048577 |
|              1048600 |              1048577 |
+----------------------+----------------------+
2 rows in set (0.00 sec)

3. Now, we will try to reclaim the space by running the optimize command. Ideally, after releasing the space, it would be in a few KBs, but we can still see it shows ~5 MB from ~10 MB.

ls -lh test01.ibd
-rw-r-----  1 aniljoshi  admin    10M Sep 13 16:58 test01.ibd

mysql> optimize table test01;
+-------------+----------+----------+-------------------------------------------------------------------+
| Table       | Op       | Msg_type | Msg_text                                                          |
+-------------+----------+----------+-------------------------------------------------------------------+
| test.test01 | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| test.test01 | optimize | status   | OK                                                                |
+-------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (0.08 sec)

test % ls -lh test01.ibd
-rw-r-----  1 aniljoshi  admin   5.0M Sep 13 17:01 test01.ibd

4. Now, after we run a direct update query which introduces new field, the space is released fully. It showing ~112 KB, which should be the correct behaviour. 

mysql> update test01 set a = '{"a": "b", "c":"d"}';
Query OK, 2 rows affected (0.03 sec)
Rows matched: 2  Changed: 2  Warnings: 0

mysql> select * from test01
    -> ;
+----------------------+
| a                    |
+----------------------+
| {"a": "b", "c": "d"} |
| {"a": "b", "c": "d"} |
+----------------------+
2 rows in set (0.00 sec)

ls -lh test01.ibd
-rw-r-----  1 aniljoshi  admin   112K Sep 13 17:12 test01.ibd

mysql> select json_storage_size(a), json_storage_free(a) from test01;
+----------------------+----------------------+
| json_storage_size(a) | json_storage_free(a) |
+----------------------+----------------------+
|                   25 |                    0 |
|                   25 |                    0 |
+----------------------+----------------------+
2 rows in set (0.00 sec)

Suggested fix:
The point is that it should be better to have some linkage in the optimise table or similar commands to release the space without any such dual efforts, which also seems a bit impractical and unsafe.
[16 Sep 13:18] MySQL Verification Team
Hello Aaditya,

Thank you for the report and test case.

regards,
Umesh