Bug #88918 Minimized serialized dictionary information
Submitted: 14 Dec 2017 11:54 Modified: 5 Feb 2018 13:33
Reporter: Magnus Blåudd
Status: Closed
Category: MySQL Server: Data Dictionary Severity: S3 (Non-critical)
Version: 8.0.5 OS: Any
Assigned to: CPU Architecture: Any

[14 Dec 2017 11:54] Magnus Blåudd
The new data dictionary (DD) in MySQL Server 8.0 implements a serialized dictionary information format (called SDI), which is intended to describe the metadata of DD objects in the MySQL Server.

JSON is the current SDI format. There are serialize() and deserialize() functions for packing and unpacking the SDI.

Currently the JSON is serialized in "pretty format". This makes the serialized SDI easy for humans to read, since it is nicely formatted plain text, but it also makes the SDI data unnecessarily large. Typically the whitespace and newlines make up about 50% of its size.

For example, a table with 512 attributes gives an SDI of 678883 bytes, and large DEFAULT values can make it even bigger. Removing the whitespace and newlines shrinks the SDI to 357819 bytes. This is a significant saving for data that is typically modified seldom and stored for a long time.
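The size difference can be illustrated outside the server. The following is a minimal Python sketch, not the server's C++ code: it uses Python's json module in place of rapidjson, and make_fake_sdi() builds a hypothetical, heavily simplified stand-in for a real SDI, so the byte counts it prints will not match the figures above.

```python
import json

def make_fake_sdi(num_columns):
    """Build a hypothetical, simplified stand-in for a table's SDI (not the real schema)."""
    columns = [
        {"name": f"col{i}", "type": 4, "is_nullable": True, "default_value": ""}
        for i in range(num_columns)
    ]
    return {"dd_object_type": "Table", "dd_object": {"name": "t1", "columns": columns}}

def serialize_pretty(obj):
    # Indented output, comparable in spirit to rapidjson's PrettyWriter.
    return json.dumps(obj, indent=4)

def serialize_compact(obj):
    # No whitespace between tokens, comparable in spirit to rapidjson's Writer.
    return json.dumps(obj, separators=(",", ":"))

sdi = make_fake_sdi(512)
pretty = serialize_pretty(sdi)
compact = serialize_compact(sdi)

# Report both sizes and the fraction saved by dropping whitespace.
saved = 1 - len(compact) / len(pretty)
print(f"pretty: {len(pretty)} bytes, compact: {len(compact)} bytes, saved: {saved:.0%}")
```

The saving depends on nesting depth and indentation width, so the ratio for real SDI data may differ from what this toy structure shows.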

How to repeat:
The above values were found by adding some printouts to the code in ha_ndbcluster, which uses the dd::serialize() function. The value for the compact format was obtained by changing sdi.cc to use Writer instead of PrettyWriter.

It was also checked how well the SDI can be compressed with zlib (the compression library provided with MySQL). The pretty format compressed to 6829 bytes and the compact format to 5278 bytes. This bug report is not suggesting that the SDI should be compressed, just showing that zlib does a better job with the whitespace removed.
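A comparable zlib experiment can be sketched in Python, whose zlib module wraps the same library. The table structure below is a hypothetical stand-in rather than a real SDI, so the printed sizes are only indicative and will differ from the numbers in this report.

```python
import json
import zlib

# Hypothetical stand-in for SDI content; the real SDI schema is much richer.
table = {"columns": [{"name": f"col{i}", "type": 4, "hidden": False} for i in range(512)]}

pretty = json.dumps(table, indent=4).encode("utf-8")
compact = json.dumps(table, separators=(",", ":")).encode("utf-8")

results = {}
for label, data in (("pretty", pretty), ("compact", compact)):
    packed = zlib.compress(data)  # default compression level
    results[label] = (len(data), len(packed))
    print(f"{label}: {len(data)} bytes raw, {len(packed)} bytes compressed")
```

Compression removes most of the redundancy in either form, so the gap between the two compressed sizes is much smaller than the gap between the raw sizes.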

Suggested fix:
Add support for generating SDI which is in compact JSON format. 

rapidjson supports this by simply using the Writer class instead of PrettyWriter when serializing the metadata. Serializing in a more compact format will not affect deserialize() in any way, since the output is still valid JSON. Unfortunately, the JSON will be much harder for a human to read.

Alternative solutions:
1) Change dd::serialize() to use Writer instead of PrettyWriter
2) Add new function dd::serialize_minimized()
3) Do 1 and then add dd::serialize_pretty()
4) Add parameter "pretty" to dd::serialize()
5) Use the pretty format for "small" tables, switching to the compact format when the generated JSON exceeds some threshold.

It is probably nice to keep the ability to use the "pretty format" for debugging. But we could just as well have a dd::sdi_t pretty(dd::sdi_t sdi) function which unpacks the SDI and returns it in pretty format; that is a trivial function to implement using rapidjson.
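The idea behind such a pretty() helper can be sketched as follows. This is an illustration in Python using the json module, not the proposed C++ function: the name pretty() mirrors the suggestion above, and the sample SDI string is a hypothetical, simplified one.

```python
import json

def pretty(sdi: str) -> str:
    """Parse a compact SDI string and re-serialize it with indentation."""
    return json.dumps(json.loads(sdi), indent=4)

# Hypothetical compact SDI, heavily simplified for illustration.
compact_sdi = '{"dd_object_type":"Table","dd_object":{"name":"t1","columns":[]}}'

readable = pretty(compact_sdi)
print(readable)

# Both forms parse to the same data, so deserialization is unaffected.
assert json.loads(readable) == json.loads(compact_sdi)
```

With a helper like this, only the compact form needs to be stored, and the readable form can be produced on demand when debugging.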
[5 Feb 2018 13:29] Daniel Price
Posted by developer:
Changed version fixed to 8.0.5.
[5 Feb 2018 13:33] Daniel Price
Posted by developer:
Fixed as of the upcoming 8.0.5 release, and here's the changelog entry:

To reduce its size and storage footprint, Serialized Dictionary
Information (SDI) is now generated in a compact JSON format.