Bug #58234 | LENGTH() / OCTET_LENGTH() on BLOB field very slow | | |
---|---|---|---|
Submitted: | 16 Nov 2010 16:26 | Modified: | 18 Dec 2011 16:32 |
Reporter: | Mike Ruskai | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S5 (Performance) |
Version: | 5.1.49-r1, 5.5.20 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[16 Nov 2010 16:26]
Mike Ruskai
[22 Dec 2010 7:39]
Thor Harald Johansen
I can confirm this bug. I am storing some 8 GB of data across ~30,000 rows in an InnoDB table and a SELECT query for SUM(OCTET_LENGTH(xxx)) on the BLOB column is taking WAY too long.
[8 Jun 2011 15:31]
Emanuel Greisen
I too can confirm this bug. I have just forked mysqlFS and switched the tables to InnoDB. As part of my changes, inserting or updating data blocks also "calculates" their length, like:

    INSERT INTO data_blocks SET data=?, data_size=LENGTH(data);
[18 Dec 2011 16:32]
Valeriy Kravchuk
Verified just as described with current 5.5.20 on Mac OS X:

    macbook-pro:5.5 openxs$ bin/mysql -uroot test
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A

    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 6
    Server version: 5.5.20-debug Source distribution

    Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    mysql> show create table tblob\G
    *************************** 1. row ***************************
           Table: tblob
    Create Table: CREATE TABLE `tblob` (
      `c1` int(11) NOT NULL AUTO_INCREMENT,
      `c2` longblob,
      PRIMARY KEY (`c1`)
    ) ENGINE=InnoDB AUTO_INCREMENT=249 DEFAULT CHARSET=latin1
    1 row in set (0.00 sec)

    mysql> show table status like 'tblob'\G
    *************************** 1. row ***************************
               Name: tblob
             Engine: InnoDB
            Version: 10
         Row_format: Compact
               Rows: 130
     Avg_row_length: 1060800
        Data_length: 137904128
    Max_data_length: 0
       Index_length: 0
          Data_free: 77594624
     Auto_increment: 249
        Create_time: 2011-12-18 18:23:03
        Update_time: NULL
         Check_time: NULL
          Collation: latin1_swedish_ci
           Checksum: NULL
     Create_options:
            Comment:
    1 row in set (0.00 sec)

    mysql> show variables like 'innodb_bu%';
    +------------------------------+-----------+
    | Variable_name                | Value     |
    +------------------------------+-----------+
    | innodb_buffer_pool_instances | 1         |
    | innodb_buffer_pool_size      | 134217728 |
    +------------------------------+-----------+
    2 rows in set (0.00 sec)

    mysql> select length(c2) from tblob;
    +------------+
    | length(c2) |
    +------------+
    |    1048576 |
    ...
    |    1048576 |
    +------------+
    128 rows in set (3.41 sec)

    mysql> select length(c2) from tblob;
    +------------+
    | length(c2) |
    +------------+
    |    1048576 |
    ...
    |    1048576 |
    +------------+
    128 rows in set (3.34 sec)

    mysql> alter table tblob engine = MyISAM;
    Query OK, 128 rows affected (4.76 sec)
    Records: 128  Duplicates: 0  Warnings: 0

    mysql> select length(c2) from tblob;
    +------------+
    | length(c2) |
    +------------+
    |    1048576 |
    ...
    |    1048576 |
    +------------+
    128 rows in set (0.24 sec)

Having a buffer pool smaller than the entire table is essential here (with 64 rows, which surely fit, the result is returned in 0.1 second). Repeated execution for a table just a bit bigger than the buffer pool should return results notably faster, but as we see above, this does not happen.
[20 Nov 2013 15:50]
Yves Martin
I faced this problem when computing SonarQube project consumption in the "measure_data" table. It took me 12 minutes to compute snapshot sizes for a database totaling 50 GB:

    select root_project_id, sum(octet_length(data))
      from measure_data, snapshots S
     where measure_data.snapshot_id = S.id
     group by root_project_id
     order by sum(octet_length(data));

A fix would be very welcome.
[26 Feb 2014 12:47]
Marko Mäkelä
Fixing this would require some changes to the storage engine API. I see some room for improving Index Condition Pushdown (ICP), which was introduced in MySQL 5.6. Even if ICP were used for this query, InnoDB would still attempt to fetch the entire BLOB in order to determine its length. This is unnecessary, because the 20-byte InnoDB BLOB pointer does contain the byte length of the BLOB. For determining the length in characters, we might still need to fetch the BLOB, in case it is stored in a variable-length character set (mbminlen < mbmaxlen).

Similarly, I believe that MySQL could attempt to fetch the BLOB in order to evaluate an IS NULL condition. That one might be worked around by defining a prefix index on the column; e.g., index(blob_field(1)) would be NULL if and only if blob_field is NULL.

A workaround for this bug would be to manually maintain a "functional index" on the BLOB column: whenever you update the BLOB column, also update a column that contains the BLOB's length.
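The manually maintained length column described above can be sketched with triggers, so that application code does not have to remember to set it. This is only an illustration: the table name data_blocks and column names data/data_len are hypothetical, borrowed from the mysqlFS comment earlier in this report, and the trigger names are made up.

    -- Hypothetical schema: keep a small data_len column in sync with the BLOB.
    ALTER TABLE data_blocks ADD COLUMN data_len BIGINT UNSIGNED;

    -- Backfill existing rows once (slow, since it must read every BLOB,
    -- but the cost is paid only a single time):
    UPDATE data_blocks SET data_len = OCTET_LENGTH(data);

    -- Keep the column in sync on every write:
    CREATE TRIGGER data_blocks_len_ins BEFORE INSERT ON data_blocks
    FOR EACH ROW SET NEW.data_len = OCTET_LENGTH(NEW.data);

    CREATE TRIGGER data_blocks_len_upd BEFORE UPDATE ON data_blocks
    FOR EACH ROW SET NEW.data_len = OCTET_LENGTH(NEW.data);

    -- Aggregates now scan the small column instead of fetching each BLOB:
    SELECT SUM(data_len) FROM data_blocks;

    -- Workaround for the IS NULL case, as suggested above: a one-byte
    -- prefix index is NULL exactly when the BLOB itself is NULL.
    ALTER TABLE data_blocks ADD INDEX (data(1));

Note that OCTET_LENGTH() (bytes) and LENGTH() are equivalent on binary BLOB data; if the column were a TEXT column in a variable-length character set, a character count would still require reading the value.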