MySQL Bugs: #33490: Falcon considerably slower than other engines in blob INSERT and LENGTH()

Bug #33490	Falcon considerably slower than other engines in blob INSERT and LENGTH()
Submitted:	23 Dec 2007 12:22	Modified:	26 May 2010 17:49
Reporter:	Philip Stoev	Email Updates:
Status:	Unsupported	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S3 (Non-critical)
Version:	6.0.4	OS:	Linux (2.6.21-1.3194.fc7 i686)
Assigned to:		CPU Architecture:	Any
Tags:	F_BLOB, performance

Description:
Hello,

I noticed that Falcon is considerably slower than other engines when inserting blobs and taking their length().

I decided to file because blob insert and length() are both dot-com-economy youtube-style operations and it would be nice to be able to execute them efficiently. LENGTH() in particular is required for generating Content-length: HTTP header for the blob you are about to serve.

How to repeat:
mysql> set max_allowed_packet = 127 * 1024 * 1024;

For myisam:

mysql> create table myisam_blob (f1 longtext) engine=myisam;
Query OK, 0 rows affected (0.02 sec)
mysql>  insert into myisam_blob values (repeat('x', 17037021));
Query OK, 1 row affected (0.79 sec)
mysql> select length(f1) from myisam_blob;
\+------------+
| length(f1) |
+------------+
|   17037021 |
+------------+
1 row in set (0.07 sec)

For Innodb:

mysql> create table innodb_blob (f1 longtext) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into innodb_blob values (repeat('x', 17037021));
Query OK, 1 row affected (2.81 sec)

mysql> select length(f1) from innodb_blob;
+------------+
| length(f1) |
+------------+
|   17037021 |
+------------+
1 row in set (0.17 sec)

For Falcon:

mysql> create table falcon_blob (f1 longtext) engine=falcon;
Query OK, 0 rows affected (0.19 sec)

mysql> insert into falcon_blob values (repeat('x', 17037021));
Query OK, 1 row affected (25.89 sec)

mysql> select length(f1) from falcon_blob;
+------------+
| length(f1) |
+------------+
|   17037021 |
+------------+
1 row in set (2.22 sec)

Suggested fix:
This is all run in VMware -- running Sysinternals Filemon on the host machine shows that the virtual machine disk is being written under Falcon in 4K segments, whereas under Innodb or MyISAM the writes 64K each.

Thank you for the bug report. I can't to repeat with team source server on
Windows:

c:\dbs>6.0t\bin\mysql -uroot db1
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 6.0.5-alpha-team-tree-nt Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> set max_allowed_packet = 127 * 1024 * 1024;
Query OK, 0 rows affected (0.05 sec)

mysql> create table myisam_blob (f1 longtext) engine=myisam;
Query OK, 0 rows affected (0.08 sec)

mysql> insert into myisam_blob values (repeat('x', 17037021));
Query OK, 1 row affected (1.39 sec)

mysql> select length(f1) from myisam_blob;
+------------+
| length(f1) |
+------------+
|   17037021 |
+------------+
1 row in set (0.05 sec)

mysql> create table innodb_blob (f1 longtext) engine=innodb;
Query OK, 0 rows affected (0.16 sec)

mysql> insert into innodb_blob values (repeat('x', 17037021));
Query OK, 1 row affected (4.23 sec)

mysql> select length(f1) from innodb_blob;
+------------+
| length(f1) |
+------------+
|   17037021 |
+------------+
1 row in set (0.58 sec)

mysql> create table falcon_blob (f1 longtext) engine=falcon;
Query OK, 0 rows affected (0.06 sec)

mysql>  insert into falcon_blob values (repeat('x', 17037021));
Query OK, 1 row affected (0.83 sec)

mysql>  select length(f1) from falcon_blob;
+------------+
| length(f1) |
+------------+
|   17037021 |
+------------+
1 row in set (0.22 sec)

mysql> show create table falcon_blob;
+-------------+-------------------------------------------------------------------------------------+
| Table       | Create Table                                                                        |
+-------------+-------------------------------------------------------------------------------------+
| falcon_blob | CREATE TABLE `falcon_blob` (
  `f1` longtext
) ENGINE=Falcon DEFAULT CHARSET=latin1 |
+-------------+-------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql>

Could you please try with the BK source tree. Thanks in advance.

I can confirm this behavoir with the latest 6.0 BK tree. Strace -f -F on the mysqld --one-thread process shows that writes are performed like this:

[pid 29263] pwrite64(17, "\t\0\0\0\301+\0\0\0\0\0\0\300+\0\0\354\17\0\0xxxxxxxxx"..., 4096, 45879296 <unfinished ...>
[pid 29259] pwrite64(17, "\t\0\0\0\302+\0\0\0\0\0\0\301+\0\0\354\17\0\0xxxxxxxxx"..., 4096, 45883392 <unfinished ...>

and 

[pid 29263] pwrite64(17, "\t\0\0\0\30-\0\0\0\0\0\0\27-\0\0\354\17\0\0xxxxxxxxxxx"..., 4096, 47284224) = 4096
[pid 29263] pwrite64(17, "\t\0\0\0\31-\0\0\0\0\0\0\30-\0\0\354\17\0\0xxxxxxxxxxx"..., 4096, 47288320) = 4096

meaning the writes are performed in 4K blocks. Either the windows version uses a different system call, or the call is very efficient on that OS and the OS is smart enough to batch the writes.

Either way, in my humble opinion this is not something that can be relied upon to happen on all operating systems and virtualized environments. I would say that all other things being equal, 64K writes will always outperform 4K ones, especially on files that are being SYNC-ed after each write.

On my particular platform, the Falcon tablespace is open with O_RDWR|O_DIRECT|O_LARGEFILE. I am running Fedora Core 7, Linux philips 2.6.21-1.3194.fc7 #1 SMP Wed May 23 22:35:01 EDT 2007 i686 i686 i386 GNU/Linux

Philip: Can you please fill in the o/s for which this is clearly verified? I do not see that in the bug information - it seems clear that the current value of "Any" is not correct.

Triage; Impact is substantial because 17 Mb blobs are not widespread in our customer base.  James suggests that 1 mb blobs are more widespread.

Vlad,  Here is the Blob performance problem .  I think the issue is that blob pages are written one at a time having to do with the way they are linked up.  Now that we use O_DIRECT, the pages are no longer pooled.  We should consider some specific pooling of blob pages as well as maybe some pre-allocation.  Kelly may also have some input.

For what it's worth, writing blobs a page a time can be fixed by
allocating them contiguously.  Reading blobs a page at a time is
harder to fix because each blob page contains the address of the 
next page.  For large blobs, we should have some kind of array
of page numbers so we can read the whole thing at once.