Bug #33490 | Falcon considerably slower than other engines in blob INSERT and LENGTH() | ||
---|---|---|---|
Submitted: | 23 Dec 2007 12:22 | Modified: | 26 May 2010 17:49 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Unsupported | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
Version: | 6.0.4 | OS: | Linux (2.6.21-1.3194.fc7 i686) |
Assigned to: | CPU Architecture: | Any | |
Tags: | F_BLOB, performance |
[23 Dec 2007 12:22]
Philip Stoev
[23 Dec 2007 22:29]
MySQL Verification Team
Thank you for the bug report. I can't to repeat with team source server on Windows: c:\dbs>6.0t\bin\mysql -uroot db1 Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 4 Server version: 6.0.5-alpha-team-tree-nt Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> set max_allowed_packet = 127 * 1024 * 1024; Query OK, 0 rows affected (0.05 sec) mysql> create table myisam_blob (f1 longtext) engine=myisam; Query OK, 0 rows affected (0.08 sec) mysql> insert into myisam_blob values (repeat('x', 17037021)); Query OK, 1 row affected (1.39 sec) mysql> select length(f1) from myisam_blob; +------------+ | length(f1) | +------------+ | 17037021 | +------------+ 1 row in set (0.05 sec) mysql> create table innodb_blob (f1 longtext) engine=innodb; Query OK, 0 rows affected (0.16 sec) mysql> insert into innodb_blob values (repeat('x', 17037021)); Query OK, 1 row affected (4.23 sec) mysql> select length(f1) from innodb_blob; +------------+ | length(f1) | +------------+ | 17037021 | +------------+ 1 row in set (0.58 sec) mysql> create table falcon_blob (f1 longtext) engine=falcon; Query OK, 0 rows affected (0.06 sec) mysql> insert into falcon_blob values (repeat('x', 17037021)); Query OK, 1 row affected (0.83 sec) mysql> select length(f1) from falcon_blob; +------------+ | length(f1) | +------------+ | 17037021 | +------------+ 1 row in set (0.22 sec) mysql> show create table falcon_blob; +-------------+-------------------------------------------------------------------------------------+ | Table | Create Table | +-------------+-------------------------------------------------------------------------------------+ | falcon_blob | CREATE TABLE `falcon_blob` ( `f1` longtext ) ENGINE=Falcon DEFAULT CHARSET=latin1 | +-------------+-------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> Could you please try with the BK source tree. Thanks in advance.
[24 Dec 2007 7:30]
Philip Stoev
I can confirm this behavoir with the latest 6.0 BK tree. Strace -f -F on the mysqld --one-thread process shows that writes are performed like this: [pid 29263] pwrite64(17, "\t\0\0\0\301+\0\0\0\0\0\0\300+\0\0\354\17\0\0xxxxxxxxx"..., 4096, 45879296 <unfinished ...> [pid 29259] pwrite64(17, "\t\0\0\0\302+\0\0\0\0\0\0\301+\0\0\354\17\0\0xxxxxxxxx"..., 4096, 45883392 <unfinished ...> and [pid 29263] pwrite64(17, "\t\0\0\0\30-\0\0\0\0\0\0\27-\0\0\354\17\0\0xxxxxxxxxxx"..., 4096, 47284224) = 4096 [pid 29263] pwrite64(17, "\t\0\0\0\31-\0\0\0\0\0\0\30-\0\0\354\17\0\0xxxxxxxxxxx"..., 4096, 47288320) = 4096 meaning the writes are performed in 4K blocks. Either the windows version uses a different system call, or the call is very efficient on that OS and the OS is smart enough to batch the writes. Either way, in my humble opinion this is not something that can be relied upon to happen on all operating systems and virtualized environments. I would say that all other things being equal, 64K writes will always outperform 4K ones, especially on files that are being SYNC-ed after each write. On my particular platform, the Falcon tablespace is open with O_RDWR|O_DIRECT|O_LARGEFILE. I am running Fedora Core 7, Linux philips 2.6.21-1.3194.fc7 #1 SMP Wed May 23 22:35:01 EDT 2007 i686 i686 i386 GNU/Linux
[26 Dec 2007 16:29]
Jeffrey Pugh
Philip: Can you please fill in the o/s for which this is clearly verified? I do not see that in the bug information - it seems clear that the current value of "Any" is not correct.
[27 Feb 2008 20:14]
Kevin Lewis
Triage; Impact is substantial because 17 Mb blobs are not widespread in our customer base. James suggests that 1 mb blobs are more widespread.
[25 Nov 2008 17:30]
Kevin Lewis
Vlad, Here is the Blob performance problem . I think the issue is that blob pages are written one at a time having to do with the way they are linked up. Now that we use O_DIRECT, the pages are no longer pooled. We should consider some specific pooling of blob pages as well as maybe some pre-allocation. Kelly may also have some input.
[25 Nov 2008 20:16]
Ann Harrison
For what it's worth, writing blobs a page a time can be fixed by allocating them contiguously. Reading blobs a page at a time is harder to fix because each blob page contains the address of the next page. For large blobs, we should have some kind of array of page numbers so we can read the whole thing at once.