Bug #72015 InnoDB calculates estimated row count wildly incorrectly with BLOBs
Submitted: 12 Mar 2014 0:26 Modified: 6 Jun 2014 8:47
Reporter: Jeremy Cole (Basic Quality Contributor) (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.6.16, 5.6.18, 5.7.5 OS:Any
Assigned to: CPU Architecture:Any
Tags: innodb blob row estimate, regression

[12 Mar 2014 0:26] Jeremy Cole
Description:
In MySQL 5.6 the "Rows" output by SHOW TABLE STATUS and TABLE_ROWS column of information_schema.TABLES (in addition to the average rows size columns) show wildly inaccurate information if the table contains a BLOB column.

I was not able to find the cause of this as I can scarcely understand the new statistics collection code in MySQL 5.6. It is however easily repeatable, so I've produced a test case which creates a table with a BLOB, inserts a few rows, and demonstrates that the statistics are way off. In the test I have typically seen the estimated-to-actual ratio that it calculates be 12, when on MySQL 5.5 it is 1. In production one of our tables is showing a ratio of ~167, and the ratio seems to scale with the size of the table (it is not random).

How to repeat:
CREATE TABLE t (i INT NOT NULL auto_increment, b BLOB, PRIMARY KEY (i)) ENGINE=InnoDB;

INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));
INSERT INTO t (b) VALUES (REPEAT("A", 10000));

ANALYZE TABLE t;

# Calculate the estimated to actual ratio. Ideally it should be 1.
SELECT FLOOR((table_rows / (SELECT COUNT(*) FROM t)) + 0.5) AS should_be_1
FROM information_schema.tables
WHERE table_schema = "test" AND table_name = "t";

DROP TABLE t;

Suggested fix:
Calculate the row count correctly. :)
[12 Mar 2014 7:34] MySQL Verification Team
Hello Jeremy,

Thank you for the bug report and test case.
Verified as described.

Confirmed that 5.6.16,5.6.18(trunk), and 5.7.5 are affected and 5.5(checked 5.5.38) works fine.

Thanks,
Umesh
[3 Apr 2014 8:02] Vasil Dimov
Hello, a fix for this has been pushed to mysql-trunk under:
vasil.dimov@oracle.com-20140403070224-eu2mw56ut6ydp354 and
vasil.dimov@oracle.com-20140403070651-w1nefsafrqeid6ct. It will be backported to mysql-5.6 soon.

I confirm that the above two patches fix this bug:

without the patches:
--------------------
mysql> SELECT FLOOR((table_rows / (SELECT COUNT(*) FROM t)) + 0.5) AS should_be_1
    -> FROM information_schema.tables
    -> WHERE table_schema = "test" AND table_name = "t";
+-------------+
| should_be_1 |
+-------------+
|          12 |
+-------------+
1 row in set (0.00 sec)

with the patches:
-----------------
mysql> SELECT FLOOR((table_rows / (SELECT COUNT(*) FROM t)) + 0.5) AS should_be_1
    -> FROM information_schema.tables
    -> WHERE table_schema = "test" AND table_name = "t";
+-------------+
| should_be_1 |
+-------------+
|           1 |
+-------------+
1 row in set (0.01 sec)
[5 May 2014 19:34] Jeremy Cole
Will this fix appear in the next 5.6.19 release?
[6 Jun 2014 8:47] Erlend Dahl
[5 Jun 2014 7:54] Daniel T Price

Fixed as of the 5.6.19 and the upcoming 5.7.5 releases, and here's the changelog
entry:

With persistent statistics enabled, "SHOW TABLE STATUS" output and the
"TABLE_ROWS" column of "INFORMATION_SCHEMA.TABLES" could report an
incorrect number of tables rows for tables with externally stored pages.