| Bug #104040 | Update histogram with json data | ||
|---|---|---|---|
| Submitted: | 16 Jun 2021 13:26 | Modified: | 28 Jun 2022 10:10 |
| Reporter: | Kaiwang CHen (OCA) | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S4 (Feature request) |
| Version: | 8.0 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[17 Jun 2021 6:24]
MySQL Verification Team
Hello Kaiwang, Thank you for the feature request! regards, Umesh
[21 Sep 2021 1:33]
casa zhang
Hi, This is really an useful feature. Besides offload of histogram gathering tasks, it helps tools to migrate optimization context between servers. I have a patch for it. Note that this patch depends on that of Bug#104108 to be stable on tests, so it is also included from reference.
[21 Sep 2021 1:37]
casa zhang
ANALYZE TABLE UPDATE HISTOGRAM USING DATA (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: 0001-Bug-104040-ANALYZE-TABLE-UPDATE-HISTOGRAM-USING-DATA.patch (application/octet-stream, text), 230.51 KiB.
[28 Jun 2022 10:10]
Jon Stephens
Fixed in MySQL 8.0.31 as part of WL#15123. See same for docs info. Closed.
[20 Jul 2022 15:30]
Frederic Descamps
Thank you for your contribution ! As you could read this has been included in MySQL 8.0.31. Cheers
[12 Oct 2022 15:10]
Jon Stephens
Documented fix as follows in the MySQL 8.0.31 changelog:
It is now possible to set a column histogram to a user-specified
JSON value. This can be useful when sampling leaves out
important values. This also means that a secondary (replica)
MySQL server can assume the work of sampling the data and
building the histogram, which can then be used on the primary
(source) without impacting its performance.
This capability is provided through an extension to ANALYZE
TABLE ... UPDATE HISTOGRAM. The following statement sets the
histogram for column col_name of table tbl_name to the
histogram's JSON representation json_data:
ANALYZE TABLE tbl_name
UPDATE HISTOGRAM ON col_name
USING DATA 'json_data'
You can confirm afterwards this by checking the value of the
HISTOGRAM column of the Information Schema COLUMN_STATISTICS
table.
See Histogram Statistics Analysis, for more information and
examples.
Our thanks to Kaiwen Chen for this contribution to MySQL.
Also updated "ANALYZE TABLE Syntax" in the 8.0 Manual.
(Already closed.)

Description: Histogram was introduced in 8.0 to support estimating filter factor, however, gathering histogram is a heavy operation. A workaround may be gathering histograms in a separated standby machine then setting the constructed histogram directly in the production machine. However, MySQL currently does not support setting histograms directly. How to repeat: 1. Gather histogram in a standby machine with ANALYZE TABLE t UPDATE HISTOGRAM ON c1; -- needs write permission SELECT * FROM information_schema.column_statistics; or, SELECT histogram(c1) FROM t; -- read-only, not supported yet SELECT c1 FROM t; -- read-only, construct histogram elsewhere 2. set the histogram in a production machine (this issue). Suggested fix: Add a new syntax (set histogram): ANALYZE TABLE t UPDATE HISTOGRAM ON c1 USING DATA '...'; Existing syntax (gather histogram): ANALYZE TABLE t UPDATE HISTOGRAM ON c1,... WITH n BUCKETS; Some notes: 1. JSON in the SQL is standard json rather than binary json, so it needs compatible json_to_histogram() functions. 2. some fundamental structural checks against the input json. 3. values in the input json should conform to column definition, although it is almost impossible to check real meaning. Example: mysql> analyze table t update histogram on c1 using data '{"buckets": [[1, 0.3333333333333333], [9, 0.6666666666666666], [10, 1.0]], "data-type": "int", "null-values": 0.0, "collation-id": 8, "last-updated": "2021-06-08 14:19:53.088572", "sampling-rate": 1.0, "histogram-type": "singleton", "number-of-buckets-specified": 100}'; +--------+-----------+----------+-----------------------------------------------+ | Table | Op | Msg_type | Msg_text | +--------+-----------+----------+-----------------------------------------------+ | test.t | histogram | status | Histogram statistics created for column 'c1'. | +--------+-----------+----------+-----------------------------------------------+ 1 row in set (0.02 sec) mysql> select * from information_schema.column_statistics\G *************************** 1. row *************************** SCHEMA_NAME: test TABLE_NAME: t COLUMN_NAME: c1 HISTOGRAM: {"buckets": [[1, 0.3333333333333333], [9, 0.6666666666666666], [10, 1.0]], "data-type": "int", "null-values": 0.0, "collation-id": 8, "last-updated": "2021-06-16 13:04:58.234165", "sampling-rate": 1.0, "histogram-type": "singleton", "number-of-buckets-specified": 100} 1 row in set (0.00 sec)