MySQL Bugs: #114475: COUNT(1) vs SUM of partial counts

Bug #114475	COUNT(1) vs SUM of partial counts
Submitted:	25 Mar 2024 15:24	Modified:	9 May 2024 8:46
Reporter:	Robbert Nix	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S5 (Performance)
Version:	8.0.35	OS:	Windows (11)
Assigned to:		CPU Architecture:	x86 (32GB)

Description:
I have a table t(k1 int, k2 int, f varchar(4000) )
with 3M rows, size in disk 10GB

running
        select count(1) from t;
takes 30 minutes.

running
        select sum(a) from (select count(1) a, k1 from test.t4 group by 2) as b;
takes 1 minute.

For a table of half the size both sqls run in 10 seconds.

I can reproduce this on different windows pc's, but not on a Linux box.

How to repeat:
create schema test;
drop table if exists test.h; 
drop table if exists test.t4; 
CREATE TABLE test.t4 (k1 int NOT NULL,k2 int not null
 , f varchar(4000) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL,
  PRIMARY KEY (k1,k2)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin 
 ROW_FORMAT=COMPRESSED;

create table test.h (i int);
insert into test.h values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
insert into test.t4 (k1,k2,f) select 0 , 
 h0.i+10*h1.i+100*h2.i+1000*h3.i+10000*h4.i, SPACE(4000)
 from test.h h0,test.h h1,test.h h2,test.h h3,test.h h4;
insert into test.t4 (k1,k2,f) SELECT (SELECT 1+max(k1) from test.t4) + k1,k2,f 
from test.t4;
insert into test.t4 (k1,k2,f) SELECT (SELECT 1+max(k1) from test.t4) + k1,k2,f from test.t4;
insert into test.t4 (k1,k2,f) SELECT (SELECT 1+max(k1) from test.t4) + k1,k2,f from test.t4;
insert into test.t4 (k1,k2,f) SELECT (SELECT 1+max(k1) from test.t4) + k1,k2,f from test.t4;
select sum(a) from (select count(1) a, k1 from test.t4 group by 2) as b;
select count(1) from test.t4;
insert into test.t4 (k1,k2,f) SELECT (SELECT 1+max(k1) from test.t4) + k1,k2,f from test.t4;
select sum(a) from (select count(1) a, k1 from test.t4 group by 2) as b;
select count(1) from test.t4;
insert into test.t4 (k1,k2,f) SELECT (SELECT 1+max(k1) from test.t4) + k1,k2,f from test.t4;
select sum(a) from (select count(1) a, k1 from test.t4 group by 2) as b;
select count(1) from test.t4;
insert into test.t4 (k1,k2,f) SELECT (SELECT 1+max(k1) from test.t4) + k1,k2,f from test.t4;
select sum(a) from (select count(1) a, k1 from test.t4 group by 2) as b;
select count(1) from test.t4;

HI Mr. Nix,

Thank you for your bug report.

However, this is not our bug.

We repeated your test case, but we found out that difference between different operating systems, different settings of the operating systems is due only to the performance of the filesystem cacheing.

It is a very well known fact that Linux has by far the best filesystem cacheing of all existing operating systems. Even macOS has excellent filesystem cacheing, unlike Windows.

You should report this bug to Microsoft.

Not a bug.

Thank you for your quick reply. Is there any explanation on why this problem does not arise when you run an sql that counts subtotals? Both are supposed to do a full index scan.

Hi,

Yes, there is an explanation .....

When calculating subtotals, filesystem has to be touched, just touched, due to the aggregation .... That improves Windows filesystem cacheing ..... which is why Windows is   not as widespread for production as Linux .....

Please run EXPLAIN ANALYZE of the queries.

Thank you, Roy.

EXPLAIN analyze select sum(a) from (select count(1) a, k1 from test.t3 group by 2) as b;

137.157 sec 

-> Aggregate: sum(b.a)  (cost=4.63e+6..4.63e+6 rows=1) (actual time=137153..137153 rows=1 loops=1)
    -> Table scan on b  (cost=4.63e+6..4.63e+6 rows=1) (actual time=137152..137152 rows=128 loops=1)
        -> Materialize  (cost=4.63e+6..4.63e+6 rows=1) (actual time=137152..137152 rows=128 loops=1)
            -> Group aggregate: count(1)  (cost=4.63e+6 rows=1) (actual time=5242..137151 rows=128 loops=1)
                -> Covering index scan on t3 using PRIMARY  (cost=3.71e+6 rows=9.28e+6) (actual time=11.7..136670 rows=12.8e+6 loops=1)

EXPLAIN analyze select count(1) from test.t3

8260.390 sec

-> Count rows in t3  (actual time=8.26e+6..8.26e+6 rows=1 loops=1)

It could be that the problem is caused by paging, therefore I tried this script with the page file set to max 4 GB. The runtimes were the same as with 20GB of pagefile (selected by windows).

Hi Mr. Nix,

This turns out to be potentially a performance problems with InnoDB.

Hence, we shall consult InnoDB team on whether this could qualify as Performance bug.

Thank you for your quick update

Hello Robbert,

It looks like this problem is fixed in 8.0.37. Would you be able to confirm it, please.

Hi Mr. Nix,

This bug is definitely  fixed in 8.0.35.

We checked it 4 (four) times.

This bug is now closed.

Hallo, I upgraded to 8.0.37 today, and I can comfirm that the problem has been fixed for me.
Out of curiosity: can you tell me what the problem was?
Is there any relation to windows paging mechanism?

Many thanks, Robbert

Thank you Robbert for conforming. 
I did not investigate the problem or zeroed it down to an individual commit.
But if I were to guess it would be https://bugs.mysql.com/bug.php?id=113482 fix. 

Perhaps you would be using HDD that would have saturated the IO. 
The fix is to enable to linear read ahead during scan.

Thank you, Rahul.