Bug #56927 | SUM(distinct) gives wrong result when reducing max_heap_table_size | ||
---|---|---|---|
Submitted: | 22 Sep 2010 12:01 | Modified: | 16 Jan 2016 15:45 |
Reporter: | Øystein Grøvlen | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.5.7,5.1.51 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | disabled |
[22 Sep 2010 12:01]
Øystein Grøvlen
[22 Sep 2010 12:22]
Valeriy Kravchuk
Verified with current mysql-5.5 tree on Mac OS X: macbook-pro:5.5 openxs$ bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.5.7-rc-debug Source distribution Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved. This software comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to modify and redistribute it under the GPL v2 license Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> CREATE TABLE t1 (id INTEGER); Query OK, 0 rows affected (0.19 sec) mysql> CREATE TABLE t2 (id INTEGER); Query OK, 0 rows affected (0.19 sec) mysql> INSERT INTO t1 (id) VALUES (1), (1), (1),(1); Query OK, 4 rows affected (0.00 sec) Records: 4 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 (id) SELECT id FROM t1; /* 8 */ Query OK, 4 rows affected (0.01 sec) Records: 4 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 (id) SELECT id FROM t1; /* 12 */ Query OK, 8 rows affected (0.00 sec) Records: 8 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 (id) SELECT id FROM t1; /* 16 */ Query OK, 16 rows affected (0.00 sec) Records: 16 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 (id) SELECT id FROM t1; /* 20 */ Query OK, 32 rows affected (0.00 sec) Records: 32 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 (id) SELECT id FROM t1; /* 24 */ Query OK, 64 rows affected (0.01 sec) Records: 64 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 SELECT id+1 FROM t1; Query OK, 128 rows affected (0.01 sec) Records: 128 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 SELECT id+2 FROM t1; Query OK, 256 rows affected (0.03 sec) Records: 256 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 SELECT id+4 FROM t1; Query OK, 512 rows affected (0.12 sec) Records: 512 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 SELECT id+8 FROM t1; Query OK, 1024 rows affected (0.13 sec) Records: 1024 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 SELECT id+16 FROM t1; Query OK, 2048 rows affected (0.07 sec) Records: 2048 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 SELECT id+32 FROM t1; Query OK, 4096 rows affected (0.41 sec) Records: 4096 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 SELECT id+64 FROM t1; Query OK, 8192 rows affected (0.56 sec) Records: 8192 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 SELECT id+128 FROM t1; Query OK, 16384 rows affected (0.55 sec) Records: 16384 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 SELECT id+256 FROM t1; Query OK, 32768 rows affected (0.75 sec) Records: 32768 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 SELECT id+512 FROM t1; Query OK, 65536 rows affected (1.51 sec) Records: 65536 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 SELECT id+1024 FROM t1; Query OK, 131072 rows affected (3.16 sec) Records: 131072 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 SELECT id+2048 FROM t1; Query OK, 262144 rows affected (7.08 sec) Records: 262144 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 SELECT id+4096 FROM t1; Query OK, 524288 rows affected (13.11 sec) Records: 524288 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t1 SELECT id+8192 FROM t1; Query OK, 1048576 rows affected (26.12 sec) Records: 1048576 Duplicates: 0 Warnings: 0 mysql> INSERT INTO t2 SELECT id FROM t1 ORDER BY id*rand(); Query OK, 2097152 rows affected (1 min 3.31 sec) Records: 2097152 Duplicates: 0 Warnings: 0 mysql> SELECT SUM(DISTINCT id) sm FROM t2; +-----------+ | sm | +-----------+ | 134225920 | +-----------+ 1 row in set (8.22 sec) mysql> SET max_heap_table_size=16384; Query OK, 0 rows affected (0.00 sec) mysql> SHOW variables LIKE 'max_heap%'; +---------------------+-------+ | Variable_name | Value | +---------------------+-------+ | max_heap_table_size | 16384 | +---------------------+-------+ 1 row in set (0.00 sec) mysql> SELECT SUM(DISTINCT id) sm FROM t2; +------+ | sm | +------+ | NULL | +------+ 1 row in set (8.66 sec)
[23 Sep 2010 8:00]
Øystein Grøvlen
I have checked this on head of 5.1, and this bug does not exist there. Hence this is a 5.5 regression.
[24 Sep 2010 9:16]
Øystein Grøvlen
A manual "bzrfind" reveals that this regression is introduced by: revno: 2875.8.8 revision-id: joro@sun.com-20090928072125-cbx5j0v2oe5d7av6 parent: alik@sun.com-20090925094758-j4rfs37jf3v482t9 committer: Georgi Kodinov <joro@sun.com> branch nick: mysql-wl3220-next-mr timestamp: Mon 2009-09-28 10:21:25 +0300 message: Ported WL#3220 to mysql-next-mr.
[7 Oct 2010 13:51]
Georgi Kodinov
This fails on 5.1 : +SELECT version(); +version() +5.1.51-debug-log +CREATE TABLE t1 (id INTEGER); +CREATE TABLE t2 (id INTEGER); +INSERT INTO t1 (id) VALUES (1), (1), (1),(1); +INSERT INTO t1 (id) SELECT id FROM t1; +/* 8 */ +INSERT INTO t1 (id) SELECT id FROM t1; +/* 12 */ +INSERT INTO t1 (id) SELECT id FROM t1; +/* 16 */ +INSERT INTO t1 (id) SELECT id FROM t1; +/* 20 */ +INSERT INTO t1 (id) SELECT id FROM t1; +/* 24 */ +INSERT INTO t1 SELECT id+1 FROM t1; +INSERT INTO t1 SELECT id+2 FROM t1; +INSERT INTO t1 SELECT id+4 FROM t1; +INSERT INTO t1 SELECT id+8 FROM t1; +INSERT INTO t1 SELECT id+16 FROM t1; +INSERT INTO t1 SELECT id+32 FROM t1; +INSERT INTO t1 SELECT id+64 FROM t1; +INSERT INTO t1 SELECT id+128 FROM t1; +INSERT INTO t1 SELECT id+256 FROM t1; +INSERT INTO t1 SELECT id+512 FROM t1; +INSERT INTO t1 SELECT id+1024 FROM t1; +INSERT INTO t1 SELECT id+2048 FROM t1; +INSERT INTO t1 SELECT id+4096 FROM t1; +INSERT INTO t1 SELECT id+8192 FROM t1; +INSERT INTO t1 SELECT id+16384 FROM t1; +INSERT INTO t2 SELECT id FROM t1 ORDER BY id*rand(); +SELECT SUM(DISTINCT id) sm FROM t2; +sm +536887296 +SET max_heap_table_size=16384; +SHOW variables LIKE 'max_heap_table_size'; +Variable_name Value +max_heap_table_size 16384 +SELECT SUM(DISTINCT id) sm FROM t2; +sm +0 The problem is like this : The Unique class starts collecting the distinct values in memory. If the memory is too low (as is the case here) it swaps sorted fragments to a disk file and then takes the next portion. When the incoming values are done it produces the ordered sequence by merge-sorting the sorted disk fragments. In order to do this it also requires a memory block. If this memory block that is needed by merge sort is too big to fit into the max_heap_table_size segment it will just bail out doing nothing (as demonstrated by the test case above). Now how does this relate to WL#3220 ? in this case the SUM(DISTINCT) in 5.5 uses a longlong (8 bytes) key to store the data. Thus it overflows quicker than 5.1 (that uses a 4 bytes key). This is why in the 5.1 example above I had to insert some more data to demonstrate the problem.
[14 Dec 2015 15:18]
Øystein Grøvlen
Posted by developer: In MySQL 5.7, a significant larger buffer is needed in order for the test to succeed. While max_heap_table_size=32k was sufficient in 5.6, 512k is not sufficient in 5.7. I have run git bisect, and the offending commit is: commit 341696fa105d8010457915fc1eab4d14a2ceeaef Author: Tor Didriksen <tor.didriksen@oracle.com> Date: Wed Nov 5 11:01:23 2014 +0100 Bug#19917028 TEMPLATE-BASED QUEUE/HEAP IN MERGE_WALK AND ROR_UNION Our QUEUE implementation should be replaced by a templatized C++ version. Templatized heaps are - type safe - easier to read/maintain - faster
[15 Dec 2015 10:11]
Øystein Grøvlen
Posted by developer: Correction: In 5.6, the setting for max_heap_table_size needs to be 22k for the test to succeed (21k fails). In 5.7, 512k succeeds while 511k does not. Since the table has only 16k records I suspect this means that our Unique algorithm in 5.7 no longer works when the buffer overflows. If so, for larger tables, the default setting may also give wrong results. Testing this on DBT3 scale factor 1 shows that something is fishy: mysql> select sum(distinct l_orderkey) from lineitem; +--------------------------+ | sum(distinct l_orderkey) | +--------------------------+ | 4499987250000 | +--------------------------+ 1 row in set (5,27 sec) mysql> select sum(distinct l_orderkey) from lineitem use index(); +--------------------------+ | sum(distinct l_orderkey) | +--------------------------+ | 4499993541458 | +--------------------------+ 1 row in set (5,43 sec) In MySQL 5.6 both queries give the same result.
[15 Dec 2015 10:22]
Øystein Grøvlen
Posted by developer: Another correction: Table has 2M records, but 16k unique values.
[16 Jan 2016 15:45]
Paul DuBois
Noted in 5.7.11, 5.8.0 changelogs. Queries using SUM(DISTINCT) could produce incorrect results when there were many distinct values.