Bug #97709 MySQL 8 Select Count(*) is very slow
Submitted: 20 Nov 2019 13:03 Modified: 7 Jan 2020 14:37
Reporter: Vivek Texeira Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S5 (Performance)
Version:8.0.17 OS:Ubuntu (Ubuntu 18.04.1 LTS)
Assigned to: CPU Architecture:Any

[20 Nov 2019 13:03] Vivek Texeira
Description:
I have a table with 10 columns, in which two columns are JSON field and total records in the table is 1 million. When I do select count(*) it takes more than 10 seconds to display the results, same query and same data in MySQL 5.7 gives results in 0.9 seconds.

If I remove two JSON columns I get results within 1.94 Seconds.

How to repeat:
CREATE TABLE `cases` (
  `id` INT unsigned NOT NULL AUTO_INCREMENT,
  `info` JSON NOT NULL,
  `party` JSON NOT NULL,
  PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

select count(*) from cases;
[20 Nov 2019 16:37] MySQL Verification Team
Hi Mr. Texeira,

Thank you for your bug report.

However, this is not a bug.

InnoDB SE is based on MVCC principles, which means that each transaction can see a different number of rows. Hence, since its very beginning, due to this fact, InnoDB SE had to physically count all rows sequentially. When there are no big objects, like JSON or BLOB, then it of course goes faster.

If you wish to get an approximate number of rows instantly, you have to query information_schema.

This is all described in our Reference Manual.

Not a bug.
[21 Nov 2019 8:36] Vivek Texeira
Same data set works faster on MySQL 5.7 InnoDB.
[21 Nov 2019 13:02] MySQL Verification Team
Yes, that is expected behaviour.

Organisation of pages has slightly changed, for the new features, so the scanning is slower. But, then again, data scanning is not supposed to be fast.
[22 Nov 2019 5:07] Vivek Texeira
Could you please give me the documentation URL where this is mentioned. Thank you.
[22 Nov 2019 12:45] Norvald Ryeng
Hi Vivek,

I'm sorry, I think Sinisa's explanation is incorrect. There shouldn't be any significant performance drop from 5.7 to 8.0.

However, we're not able to reproduce your results. We don't see any performance regression at all. The test case you provided doesn't contain any data, so we used arbitrary values. Can you provide us with some test data to help us reproduce the bug?

Also, please specify the exact 5.7 version you're using.

Best regards,

Norvald H. Ryeng
[22 Nov 2019 13:51] Vivek Texeira
Hello Norvald,

Following are my table structure and sample data set.

CREATE TABLE `cases` (
 `id` INT unsigned NOT NULL AUTO_INCREMENT,
  `short_description` varchar(511) NOT NULL,
  `case_details` JSON NOT NULL,
  `party_details` JSON NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

INSERT INTO cases (`short_description`,`case_details`,`party_details`) VALUES ('Family case was filed by County Of Los Angeles against John in the jurisdiction of Los Angeles County Superior Courts, Central Civil West Courthouse located in Los Angeles, California.', '{"judge": {"link": "", "name": ""}, "case_type": {"link": "", "name": "Child Support", "group": "Family", "category": "Civil"}, "courthouse": {"link": "", "name": "Central Civil West Courthouse"}, "case_status": {"name": "Other Disposed", "category": "Disposed"}, "jurisdiction": {"link": "", "name": "Los Angeles County Superior Courts"}}', '{"more": {"count": 0}, "parties": [{"Other": {"link": "", "name": "FERNANDEZ ANA JANETTA", "attorneys": [], "representation_type": "Unrepresented"}}, {"Plaintiff": {"link": "", "name": "COUNTY OF LOS ANGELES", "attorneys": [{"link": "", "name": "CHILD SUPPORT SERVICES DEPT.", "analytics_link": ""}], "representation_type": "Attorney Represented"}}, {"Defendant": {"link": "", "name": "RUIZ EDWIN SAMUEL", "attorneys": [], "representation_type": "Unrepresented"}}, {"Undetermined": {"link": "", "name": "FERNANDEZ ANA JANETTA", "attorneys": [], "representation_type": "Unrepresented"}}, {"Petitioner": {"link": "", "name": "COUNTY OF LOS ANGELES", "attorneys": [], "representation_type": "Unrepresented"}}, {"Respondent": {"link": "", "name": "RUIZ EDWIN SAMUEL", "attorneys": [], "representation_type": "Unrepresented"}}]}');

My current MySQL version is 5.7.27-0ubuntu0.18.04.1 (Ubuntu)

MySQL 8 Version is Server version: 8.0.17 MySQL Community Server - GPL

Please add at least 1 Million records to see the difference.
[22 Nov 2019 16:42] Norvald Ryeng
Hi Vivek,

We're still not able to reproduce your results. We tried with both 1M and 4M rows:

5.7.27: 1.5s and 6.2s
8.0.17: 1.4s and 4.6s

You only supplied one row, so we inserted that exact same row 1M and 4M times. No special setup, just a plain server with normal settings.

Do you run with any special config in 5.7.27 or 8.0.17?

Best regards,

Norvald
[23 Nov 2019 7:36] Vivek Texeira
My Instance

Attachment: Screenshot from 2019-11-23 13:04:47.png (image/png, text), 128.18 KiB.

[23 Nov 2019 7:38] Vivek Texeira
Hello Norvald,

I have attached a screen shot, I'm using 2GB Ram with SSD AWS (t3.small) instance type with default MySQL configuration with same data provided to you.
[26 Nov 2019 11:55] Norvald Ryeng
Hi Vivek,

I don't doubt that you experience the slowdown, but without being able to reproduce the issue, there's not much we can do.

You say you're running with 2 GB RAM. That is very low. There may be changes in the buffer sizes that affect your query negatively.

Best regards,

Norvald
[27 Dec 2019 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[15 Jan 2020 16:18] MySQL Verification Team
for what it's worth I was able to verify this,  but re-filed it internally:

Bug 30766089 - 8.0.17: SEVERE REGRESSION IN PERFORMANCE OF SELECT COUNT(*) FROM LARGE TABLE WITH LOB
[15 Jan 2020 17:18] MySQL Verification Team
The problem exists in >=8.0.17 and is even worse when innodb_flush_method=O_DIRECT
[24 Mar 2020 11:19] Adrian Jäkel
We recently switched our amazon db instance to 8.0.17 since it's available there now since march 2020 [1] and are experiencing a heavy drop in performance in our counting scripts. Are there any news on the mentioned internal bug?

[1] https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/WhatsNew.html
[24 Mar 2020 12:29] MySQL Verification Team
Yes, that bug #30766089 I filed is fixed in 8.0.20.  Please wait for the release.
[21 May 2020 13:30] Peter Frederiksen
From MySQL 8.0 Release Notes -> Changes in MySQL 8.0.20 (2020-04-27, General Availability)

InnoDB: Changes to parallel read threads functionality introduced in MySQL 8.0.17 caused a degradation in SELECT COUNT(*) performance. Pages were read from disk unnecessarily. (Bug #30766089)

I'm running version 8.0.20, still VERY VERY slow !!! Table count 5000, no lobs...
[22 May 2020 12:00] MySQL Verification Team
So, that does not seem to be the same bug, but we still need a fully repeatable test case for 8.0.20 or higher.
[22 May 2020 13:09] Peter Frederiksen
Hi, here goes...

CREATE TABLE `photos` (
  `id` int NOT NULL,
  `albumid` int DEFAULT NULL,
  `title` varchar(200) DEFAULT NULL,
  `url` varchar(200) DEFAULT NULL,
  `thumbnailurl` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Attached file, photos.csv, 5000 rows.
Mysql workbench -> select count(*) from <db/schema>.photos;
DB version -> select version() -> 8.0.20
[22 May 2020 13:11] Peter Frederiksen
photos.csv

Attachment: photos.csv (application/vnd.ms-excel, text), 626.47 KiB.

[22 May 2020 13:17] MySQL Verification Team
Hi Mr. Frederiksen,

I hope that you are aware that for :

select count(*) from talbe1;

Innodb has to read all the rows in their entire sizes.

This is described in our Reference Manual.
[22 May 2020 13:27] Peter Frederiksen
Hi
yes, i know... (or i'm missing...?)
DB ver. 5.7.30 -> Select count(*).... -> Duration = 0 sec
DB ver. 8.0.20 -> Select count(*).... -> Duration = never under a 1 sec !!!
[22 May 2020 14:08] MySQL Verification Team
This report remains in the status "Can't repeat".

My results with 8.0.20:

+----------+
| count(*) |
+----------+
|     5000 |
+----------+

real	0m0.058s
user	0m0.010s
sys	0m0.008s

+----------+
| count(*) |
+----------+
|     5000 |
+----------+

real	0m0.018s
user	0m0.009s
sys	0m0.005s

+----------+
| count(*) |
+----------+
|     5000 |
+----------+

real	0m0.065s
user	0m0.009s
sys	0m0.005s

and these are my results with 5.7.30:

+----------+
| count(*) |
+----------+
|     5000 |
+----------+

real	0m0.029s
user	0m0.007s
sys	0m0.004s

+----------+
| count(*) |
+----------+
|     5000 |
+----------+

real	0m0.031s
user	0m0.007s
sys	0m0.005s

+----------+
| count(*) |
+----------+
|     5000 |
+----------+

real	0m0.032s
user	0m0.007s
sys	0m0.005s

Can't repeat.
[22 May 2020 14:49] MySQL Verification Team
I've filed the FR for small tables the other day.

Enh 31376535 - IMPROVE PERFORMANCE OF FREQUENT SELECT COUNT(*) FROM SMALL INNODB TABLE 

Best they can do is set innodb_parallel_read_threads=1 for the session.
[23 May 2020 14:01] Peter Frederiksen
Hi.
Sorry, you can't repeat... 
Searched (google) and found 2 of the most common solutions... (pattern)

1. select count(id) from photos -> Duration = never under a 1 sec. (slow)   
2. select count(*) from photos where id > 0  ->  Duration = 0 sec.  (bingo) 

If it's a bug ?
[5 Aug 2020 16:29] Christopher Martin
I'd like to confirm that I have the same problem with MySQL 8.0.21 on my Windows developer machine (64 GB RAM, Ryzen 3900, NVE SSD) where a count of a simple table with ~ 70.000 data sets takes 22 seconds. Adding WHERE id > 0 as suggested above reduces the query time to 0.2 seconds. So there is definitely a bug in MySQL 8. Testing the same table on a much slower windows machine (Surface Pro 3) with Maria 10 DB or any online Host with MySQL 5.7 also gives instant results.
[5 Aug 2020 17:08] Christopher Martin
I just made a few tests. My table has the following fields:

id (INT primary)
year_id (INT index)
text (TEXT)
hash (CHAR(32) unique)

COUNT(*) takes 22s
COUNT(id) takes 22s
COUNT(id) ... WHERE id > 0 takes 0.2s
COUNT(text) takes 0.2s (despite having no index!)
COUNT(hash) takes 0.02s

EXPLAIN SELECT COUNT(id) shows nothing as possible_keys and "year_id" as key (why?)
whereas
EXPLAIN SELECT COUNT(id) ... WHERE id > 0 shows PRIMARY, year_id as possible_key and PRIMARY as key

To me it looks like MySQL 8 "forgets" to use the primary key if COUNT() is unspecific or using the field of the primary key and the query has no WHERE clause
[6 Aug 2020 12:22] MySQL Verification Team
Hi,

I was not able to repeat last example as well.

However, this is not important, since an internal feature request is already filed for the situations as described in this report.
[11 Dec 2020 7:01] gaijun gao
I ran into the same problem,my MySQL version is 8.0.22.I used the script to insert 199w data, count(*) took 40s but where id>0 took 1.12s.Version 5.7 takes 0.8s
[11 Dec 2020 13:30] MySQL Verification Team
Hi ,

These are expected results.

COUNT(*) can't use indices, due to MVCC, while filtering condition can. Even if filtering data can not use any index, it could benefit from the already cached data pages.