Bug #33175 Falcon table hangs when inserting over 8M rows
Submitted: 12 Dec 2007 16:56 Modified: 2 May 2008 1:48
Reporter: Omer Barnir (OCA)
Status: Closed
Category: MySQL Server: Falcon storage engine
Severity: S2 (Serious)
Version: 6.0.4
OS: Any
Assigned to: Christopher Powers
CPU Architecture: Any

[12 Dec 2007 16:56] Omer Barnir
Description:
When inserting a growing number of rows into a Falcon table in a loop, the insert slows down and comes to a standstill after 8 million rows.

(The following is based on the 'alter_table.test' in the 'large_tests' suite)

create table `t1` (`c1` tinyint unsigned not null, key `c1` (`c1`))
engine=falcon default charset=utf8;
alter table `t1` disable keys;
insert into t1 values (1),(2),(3),(4);
--                                Total Rows    Insert took
insert into t1 select * from t1;  -- 8
insert into t1 select * from t1;  -- 16
insert into t1 select * from t1;  -- 32 
insert into t1 select * from t1;  -- 64
insert into t1 select * from t1;  -- 128
insert into t1 select * from t1;  -- 256
insert into t1 select * from t1;  -- 512
insert into t1 select * from t1;  -- 1024
insert into t1 select * from t1;  -- 2048
insert into t1 select * from t1;  -- 4096
insert into t1 select * from t1;  -- 8192
insert into t1 select * from t1;  -- 16384
insert into t1 select * from t1;  -- 32768
insert into t1 select * from t1;  -- 65536
insert into t1 select * from t1;  -- 131072
insert into t1 select * from t1;  -- 262144
insert into t1 select * from t1;  -- 524288
insert into t1 select * from t1;  -- 1048576
insert into t1 select * from t1;  -- 2097152 (insert of 1048576 took 18.26 sec)
insert into t1 select * from t1;  -- 4194304 (insert of 2097152 took 14.05 min)
insert into t1 select * from t1;  -- 8388608 (insert of 4194304 took 1.62 hour)
insert into t1 select * from t1;  -- At this point the query is still 'running'

'show processlist' shows the insert query still running (currently over 20 hours) and 'top' reports the mysqld process using ~36% of CPU (8-way EMT64, 2.3GHz).

The test is run with 'falcon_record_memory_max' set to 4999610368

This bug might be related to bug#31286.

How to repeat:
1) Start the server using mysql-test-run.pl  with the following options:
   mysql-test-run.pl --suite=large_tests --big-test --suite-timeout=6360 
   --testcase-timeout=795 --vardir=/data1/qa_test_data/6004_64bit-rhel5/var 
   --mysqld=--falcon-record-memory-max=4999610368
2) Using the mysql client run the attached sql script
3) run: source ./large_falcon_tbl.sql

Alternatively you can modify the 'alter_table.test' in the 'large_tests' suite to use a falcon table and run:
mysql-test-run.pl --suite=large_tests --big-test --suite-timeout=6360 --testcase-timeout=795 --vardir=/data1/qa_test_data/6004_64bit-rhel5/var --mysqld=--falcon-record-memory-max=4999610368

Suggested fix:
The insert should not hang; it should run to completion.
[12 Dec 2007 16:57] Omer Barnir
test case

Attachment: large_falcon_tbl.sql (application/octet-stream, text), 2.99 KiB.

[12 Dec 2007 22:09] Kevin Lewis
Omer, it does not say how much memory is on the system that uses this 5GB record cache, or maybe I missed it. Can you provide more memory usage information?
[13 Dec 2007 14:04] Omer Barnir
Kevin, sorry about that. The machine is running with 16GB of memory
[13 Dec 2007 16:48] Kevin Lewis
Chris, it seems that something other than the record cache is restricting these large transactions. After implementing the old-record backlog (WL#4155), please check whether this problem still exists.
[24 Dec 2007 7:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/40388

ChangeSet@1.2775, 2007-12-24 01:57:55-06:00, chris@xeno.mysql.com +1 -0
  Bug#33175, Falcon table hangs when insertng over 8M rows
  - Changed exclusive lock to shared lock for syncIndexes in SRLUpdateIndexes to reduce conflict between the client and the gopher threads.
[24 Dec 2007 16:58] Christopher Powers
There are a few issues here.

1. The Falcon parameters were not optimally configured for this test (details below).
2. There is a race condition between the client and gopher threads that causes continuous stalls on the client thread. 
3. An out-of-memory condition with the record cache may result in a long delay due to a rollback.

Details: Configuration
======================
This testcase was configured with a large record cache (~5GB) and small page cache (default 4MB). The test script issues repeated combinations of INSERT INTO/SELECT FROM followed by a SELECT *, which quickly fill the record cache. I found that a record cache of 8.5GB is required for the testcase to complete Round 23, 33 million rows. 

I configured Xeno as follows:
  Total RAM:      8GB
  record cache:  12GB (8.5GB used)
  page cache:     1GB
  serial buffers: 200
  record chill:    40
  index chill:     30
  
Note that a record cache larger than physical memory is acceptable; however, this will obviously result in page faulting. A larger number of serial log buffers is useful if large transactions are expected.
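The settings above would correspond to a my.cnf fragment along these lines (a sketch only: falcon_record_memory_max and falcon_page_cache_size appear elsewhere in this report; the serial-log and chill-threshold option names are assumptions):

```ini
[mysqld]
# Record cache may exceed physical RAM, at the cost of page faulting
falcon_record_memory_max      = 12G
# Page cache well above the 4MB default for index-heavy bulk loads
falcon_page_cache_size        = 1G
# More serial log buffers help large transactions (option name assumed)
falcon_serial_log_buffers     = 200
# Chill thresholds, percent (option names assumed)
falcon_record_chill_threshold = 40
falcon_index_chill_threshold  = 30
```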

Details: Race condition
=======================
The client thread requests an exclusive lock on SerialLog::syncIndexes, whereas the gopher threads are content with a shared lock. With multiple gopher threads, the client thread becomes starved and eventually stalls.

The client thread waits for an exclusive lock on SerialLog::syncIndexes:

Transaction::commit()
 Dbb::logIndexUpdates()
    SRLUpdateIndex::append()
       syncIndexes.lock(Exclusive) <--- wait for lock

Each gopher thread has a shared lock on the same object. Note that addIndexEntry() is where we increased the loop count from 3 to 10 (Bug#33080); however, in the several stalls I've investigated, the count never exceeded 0.

SRLUpdateIndex::commit()
 syncIndexes.lock(Shared)  <--- locked
 SRLUpdateIndex::execute()
   IndexRootPage::indexMerge()
     IndexRootPage::addIndexEntry()
        for (i = 0; i < 10; i++)
          IndexPage::addNode()
          [...] 

Changing the exclusive lock to a shared lock in SRLUpdateIndex::append() resolves the thread stalls such that the testcase ran up to Round 23 (33 million rows) in 9 minutes. Note that this required a record cache larger than physical memory, at least 8.5GB, so the longish run time is due to page faults.

Details: Out-of-memory condition
================================
If the record cache becomes full and forced scavenging fails to free enough memory, allocRecord() throws an exception to abort the current SQL operation with the error, "ERROR 1296 (HY000): Got error 305 'record memory is exhausted' from Falcon". This is normal. However, the error does not appear until after the transaction is rolled back, which can take considerable time for very large transactions.

(Note that SHOW PROCESSLIST gives no indication that a rollback is in process, so a long rollback may appear as a hung operation. However, SELECT * FROM information_schema.falcon_transactions provides the state of all Falcon transactions, i.e. committed, active, rolled back, etc.)

In this case, I found that much of the rollback processing occurs in IndexRootPage::deleteIndexEntry(). Perhaps there is an opportunity for improvement here.

Summary
=======
1. For this testcase, the page cache must be increased beyond the 4MB default, and a very large record cache is required, possibly larger than physical RAM.
2. The extreme delay is caused by client/gopher conflicts, which was resolved by using a shared lock in SRLUpdateIndex::append().
3. A record cache out-of-memory condition may appear as a hang because of a long rollback.
[28 Dec 2007 2:39] Omer Barnir
Running with test binaries that include the patch, the test starts off with the inserts running much faster at the high row counts:

insert of 1048576 took 18.26 sec  - now 7.18 seconds
insert of 2097152 took 14.05 min  - now 19.26 seconds
insert of 4194304 took 1.62 hour  - now 30.66 seconds
insert of 8388608 (previously hung) - now 61.85 seconds

However, while inserting 16,777,216 rows the server crashes with a signal 4.
Back trace of the core file returns:
Program terminated with signal 4, Illegal instruction.
#0  0x00000033c040b002 in pthread_kill () from /lib64/libpthread.so.0
(gdb) backtrace full
#0  0x00000033c040b002 in pthread_kill () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x000000000064621c in handle_segfault ()
No symbol table info available.
#2  <signal handler called>
No symbol table info available.
#3  0x00000033c040dc1d in raise () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x000000000080eb9c in Error::error ()
No symbol table info available.
#5  0x000000000088dd1e in Cache::ioThread ()
No symbol table info available.
#6  0x00000000007e0670 in Thread::thread ()
No symbol table info available.
#7  0x00000000007e0509 in Thread::thread ()
No symbol table info available.
#8  0x00000033c04061b5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#9  0x00000033bf8cd39d in clone () from /lib64/libc.so.6
No symbol table info available.
#10 0x0000000000000000 in ?? ()
No symbol table info available.

Starting a second run of the test ('How to repeat' above) to see if this is consistent.
[28 Dec 2007 7:51] Philip Stoev
The test crashed again at the same round of inserts. Core size is 5.8 gigabytes, no backtrace available.
[31 Dec 2007 10:08] Philip Stoev
A run with --mysqld=--falcon-page-cache-size=200MB --mysqld=--falcon-record-memory-max=30GB 

results in the following processlist (notice the query State is "end"):

mysql> show processlist;
+----+------+-----------------+------+---------+-------+-------+---------------------------------+
| Id | User | Host            | db   | Command | Time  | State | Info                            |
+----+------+-----------------+------+---------+-------+-------+---------------------------------+
|  1 | root | localhost       | test | Query   | 75535 | end   | insert into t1 select * from t1 |
|  4 | root | localhost:54686 | NULL | Query   |     0 | NULL  | show processlist                |
+----+------+-----------------+------+---------+-------+-------+---------------------------------+

meaning this query has been running for about 21 hours. CPU usage is at 100% (that is, 1 core is occupied all the time). virt memory 17.8g, res memory 14g (on a 16 gig ram, 16 gig swap).

Here is the query log with times, showing that the last query took disproportionately more time than all previous ones before the server was shut down:

                    1 Query     insert into t1 values (1),(2),(3),(4)
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
071230 15:53:25    1 Query      insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
071230 15:53:26    1 Query      insert into t1 select * from t1
071230 15:53:28    1 Query      insert into t1 select * from t1
071230 15:53:31    1 Query      insert into t1 select * from t1
071230 15:53:37    1 Query      insert into t1 select * from t1
071230 15:53:51    1 Query      insert into t1 select * from t1
071230 15:54:20    1 Query      insert into t1 select * from t1
071230 15:55:21    1 Query      insert into t1 select * from t1
071230 15:57:35    1 Query      insert into t1 select * from t1
071230 16:02:29    1 Query      insert into t1 select * from t1
071231 13:04:41    5 Connect    root@localhost on
                    5 Shutdown
[25 Feb 2008 19:35] Kevin Lewis
Patch is in mysql-6.0-release version 6.0.4

Philip, please reopen the bug if Chris's change is not good enough.
[12 Mar 2008 23:02] Bugs System
Pushed into 6.0.4-alpha
[2 May 2008 1:48] Paul Dubois
Noted in 6.0.4 changelog.

Inserting millions of rows into a Falcon table could cause the insert
operation to hang.