Bug #104108 | cumulative frequency may grow over 1.0 due to float errors | ||
---|---|---|---|
Submitted: | 24 Jun 2021 17:06 | Modified: | 11 Oct 2021 23:52 |
Reporter: | Kaiwang CHen (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 8.0.25 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[24 Jun 2021 17:06]
Kaiwang CHen
[25 Jun 2021 6:46]
MySQL Verification Team
Hello Kaiwang, Thank you for the report and feedback. regards, Umesh
[25 Jun 2021 6:47]
MySQL Verification Team
- 8.0.25 - either use ./mtr main.histograms or run below which is extracted from the same source i.e histograms.test CREATE TABLE t2 ( pk int(11) NOT NULL AUTO_INCREMENT, col_int int(11) NOT NULL, col_int_key int(11) GENERATED ALWAYS AS ((col_int + col_int)) VIRTUAL, col_date date NOT NULL, col_date_key date GENERATED ALWAYS AS ((col_date + interval 30 day)) VIRTUAL, col_datetime datetime NOT NULL, col_time time NOT NULL, col_datetime_key datetime GENERATED ALWAYS AS (addtime(col_datetime,col_time)) VIRTUAL, col_time_key time GENERATED ALWAYS AS (addtime(col_datetime,col_time)) VIRTUAL, col_varchar varchar(1) NOT NULL, col_varchar_key varchar(2) GENERATED ALWAYS AS (concat(col_varchar,col_varchar)) VIRTUAL, PRIMARY KEY (pk), KEY col_date_key (col_date_key) ) ENGINE=InnoDB AUTO_INCREMENT=21 DEFAULT CHARSET=utf8mb4; INSERT INTO t2 (pk, col_int, col_date, col_datetime, col_time, col_varchar) VALUES (1,2,'2002-10-13','1900-01-01 00:00:00','00:00:00','s'),(2,4,'1900-01-01','2005-08-15 00:00:00','15:57:25','r'),(3,8,'0000-00-00','1900-01-01 00:00:00','07:05:51','m'),(4,4,'2006-03-09','2008-05-16 08:09:06','19:22:21','b'),(5,4,'2001-06-05','2001-01-20 12:47:23','03:53:16','x'),(6,7,'2006-05-28','2008-07-02 00:00:00','09:16:38','g'),(7,4,'2001-04-19','1900-01-01 00:00:00','15:37:26','p'),(8,1,'1900-01-01','2002-12-08 11:34:58','00:00:00','q'),(9,9,'2004-08-20','1900-01-01 00:00:00','05:03:03','w'),(10,4,'2004-10-10','1900-01-01 00:00:00','02:59:24','d'),(11,8,'2000-04-02','2002-08-25 20:35:06','00:01:58','e'),(12,4,'2006-11-02','2001-10-22 11:13:24','00:00:00','b'),(13,8,'2009-01-28','2003-03-12 02:00:34','02:20:16','y'),(14,0,'2005-04-19','2007-04-10 12:16:04','04:59:50','p'),(15,0,'2006-08-12','2009-11-07 00:00:00','21:14:04','f'),(16,0,'2005-03-12','2003-12-04 11:14:26','00:00:00','p'),(17,7,'1900-01-01','2006-09-11 18:25:21','12:59:27','d'),(18,7,'1900-01-01','1900-01-01 00:00:00','16:39:36','f'),(19,5,'0000-00-00','2001-07-25 08:40:24','00:00:00','j'),(20,3,'2007-09-09','2009-06-07 13:48:58','00:00:00','e'); analyze table t2; ANALYZE TABLE t2 UPDATE HISTOGRAM ON col_datetime WITH 100 BUCKETS; select * from information_schema.column_statistics; - # SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, HISTOGRAM 'u', 't', 'c1', '{\"buckets\": [[1.0, 3.0, 0.5, 3], [4.0, 5.0, 1.0, 2]], \"data-type\": \"double\", \"null-values\": 0.0, \"collation-id\": 8, \"last-updated\": \"2021-06-25 06:13:27.195500\", \"sampling-rate\": 1.0, \"histogram-type\": \"equi-height\", \"number-of-buckets-specified\": 2}' 'u', 't2', 'col_datetime', '{\"buckets\": [[\"1900-01-01 00:00:00.000000\", 0.3], [\"2001-01-20 12:47:23.000000\", 0.35], [\"2001-07-25 08:40:24.000000\", 0.39999999999999997], [\"2001-10-22 11:13:24.000000\", 0.44999999999999996], [\"2002-08-25 20:35:06.000000\", 0.49999999999999994], [\"2002-12-08 11:34:58.000000\", 0.5499999999999999], [\"2003-03-12 02:00:34.000000\", 0.6], [\"2003-12-04 11:14:26.000000\", 0.65], [\"2005-08-15 00:00:00.000000\", 0.7000000000000001], [\"2006-09-11 18:25:21.000000\", 0.7500000000000001], [\"2007-04-10 12:16:04.000000\", 0.8000000000000002], [\"2008-05-16 08:09:06.000000\", 0.8500000000000002], [\"2008-07-02 00:00:00.000000\", 0.9000000000000002], [\"2009-06-07 13:48:58.000000\", 0.9500000000000003], [\"2009-11-07 00:00:00.000000\", 1.0000000000000002]], \"data-type\": \"datetime\", \"null-values\": 0.0, \"collation-id\": 8, \"last-updated\": \"2021-06-25 06:45:02.674980\", \"sampling-rate\": 1.0, \"histogram-type\": \"singleton\", \"number-of-buckets-specified\": 100}'
[17 Sep 2021 6:10]
casa zhang
Hi, Looks like only Singleton has such problem because it accumulates floats, Equi-height hasn’t because it accumulates integers. I have a precise fix for this case. See enclosed.
[17 Sep 2021 6:11]
casa zhang
CUMULATIVE FREQUENCY MAY GROW OVER 1.0 DUE TO FLOAT ERRORS (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: 0001-Bug-104108-CUMULATIVE-FREQUENCY-MAY-GROW-OVER-1.0-DU.patch (application/octet-stream, text), 9.18 KiB.
[11 Oct 2021 23:52]
Jon Stephens
Documented fix as follows in the MySQL 8.0.28 changelog: When a singleton histogram is constructed, its cumulative frequency is calculated by adding frequencies of previous buckets with the current bucket; because a floating-point value was used for the accumulator, this sometimes led to accumulated float errors, with the final cumulative frequency fractionally greater than 1.0. This fix accumulates the frequency with an integer type instead, to avoid intermediate floating-point errors. Our thanks to Casa Zhang and the Tencent team for the contribution. Closed.