| Bug #33175 | Falcon table hangs when insertng over 8M rows | ||
|---|---|---|---|
| Submitted: | 12 Dec 2007 16:56 | Modified: | 2 May 2008 1:48 | 
| Reporter: | Omer Barnir (OCA) | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Falcon storage engine | Severity: | S2 (Serious) | 
| Version: | 6.0.4 | OS: | Any | 
| Assigned to: | Christopher Powers | CPU Architecture: | Any | 
   [12 Dec 2007 16:56]
   Omer Barnir        
  
 
   [12 Dec 2007 16:57]
   Omer Barnir        
  test case
Attachment: large_falcon_tbl.sql (application/octet-stream, text), 2.99 KiB.
   [12 Dec 2007 22:09]
   Kevin Lewis        
  Omer, it does not say how much memory is on your system which uses this 5 Gb record cache. Or maybe I missed it. Can you provide more memory use information?
   [13 Dec 2007 14:04]
   Omer Barnir        
  Kevin, sorry about that. The machine is running with 16GB of memory
   [13 Dec 2007 16:48]
   Kevin Lewis        
  Chris, It seems that there is something other than record cache restricting these large transactions. After making an old record backlog, WL#4155, Pleas check if this problem still exists.
   [24 Dec 2007 7:58]
   Bugs System        
  A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/40388 ChangeSet@1.2775, 2007-12-24 01:57:55-06:00, chris@xeno.mysql.com +1 -0 Bug#33175, Falcon table hangs when insertng over 8M rows - Changed exclusive lock to shared lock for syncIndexes in SRLUpdateIndexes to reduce conflict between the client and the gopher threads.
   [24 Dec 2007 16:58]
   Christopher Powers        
  There are a few issues here.
1. The Falcon parameters were not optimally configured for this test (details below).
2. There is a race condition between the client and gopher threads that causes continuous stalls on the client thread. 
3. An out-of-memory condition with the record cache may result in a long delay due to a rollback.
Details: Configuration
======================
This testcase was configured with a large record cache (~5GB) and small page cache (default 4MB). The test script issues repeated combinations of INSERT INTO/SELECT FROM followed by a SELECT *, which quickly fill the record cache. I found that a record cache of 8.5GB is required for the testcase to complete Round 23, 33 million rows. 
I configured Xeno as follows:
  Total RAM:      8GB
  record cache:  12GB (8.5GB used)
  page cache:     1GB
  serial buffers: 200
  record chill:    40
  index chill:     30
  
Note that a record cache > physical memory is acceptable, however, this will obviously result in page faulting. A larger number of serial log buffers is useful if large transactions are expected.
Details: Race condition
=======================
The client thread requests an exclusive lock on SerialLog::syncIndexes, whereas the gopher threads are content with a shared lock. With multiple gopher threads, the client thread becomes starved and eventually stalls.
The client thread waits for an exclusive lock on SerialLog::syncIndexes:
Transaction::commit()
 Dbb::logIndexUpdates()
    SRLUpdateIndex::append()
       syncIndexes.lock(Exclusive) <--- wait for lock
Each gopher thread has a shared lock on the same. Note that addIndexEntry() is where we increased the loop count from 3 to 10 (Bug#33080), however, in the several stalls I've investigated, the count never exceeded 0.
SRLUpdateIndex::commit()
 syncIndexes.lock(Shared)  <--- locked
 SRLUpdateIndex::execute()
   IndexRootPage::indexMerge()
     IndexRootPage::addIndexEntry()
        for (i = 0 < 10)
          IndexPage::addNode()
          [...] 
Changing the exclusive lock to a shared lock in SRLUpdateIndex::append() resolves the thread stalls such that the testcase ran up to Round 23 (33 million rows) in 9 minutes. Note that this required a record cache larger than physical memory, at least 8.5GB, so the longish run time is due to page faults.
Details: Out-of-memory condition
================================
If the record cache becomes full and forced scavenging fails to free enough memory, allocRecord() throws an exception to abort the current SQL operation with the error, "ERROR 1296 (HY000): Got error 305 'record memory is exhausted' from Falcon". This is normal. However, the error does not appear until after the transaction is rolled back, which can take considerable time for very large transactions.
(Note that SHOW PROCESSLIST gives no indication that a rollback is in process, so a long rollback may appear as a hung operation. However, SELECT * FROM information_schema.falcon_transactions provides the state of all Falcon transactions, i.e. committed, active, rolled back, etc.)
In this case, I found that much of the rollback processing occurs in IndexRootPage::deleteIndexEntry(). Perhaps there is an opportunity for improvement here.
Summary
=======
1. For this testcase, the page cache must be increased beyond the 4MB default and a very large cache size is required, possibly larger than physical RAM.
2. The extreme delay is caused by client/gopher conflicts, which was resolved by using a shared lock in SRLUpdateIndex::append().
3. A record cache out-of-memory condition may appear as a hang because of a long rollback.
 
   [28 Dec 2007 2:39]
   Omer Barnir        
  Running with test binaries including
starts of with the inserts running much faster for the high # of inserts.
insert of 1048576 took 18.26 sec  - now 7.18 seconds
insert of 2097152 took 14.05 min  - now 19.26 seconds
insert of 4194304 took 1.62 hour  - now 30.66 seconds
insert of 8388608 (previously hang) - now 61.85 seconds
However, while inserting 16,777,216 rows the server crashes with a signal 4.
Back trace of the core file returns:
Program terminated with signal 4, Illegal instruction.
#0  0x00000033c040b002 in pthread_kill () from /lib64/libpthread.so.0
(gdb) backtrace full
#0  0x00000033c040b002 in pthread_kill () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x000000000064621c in handle_segfault ()
No symbol table info available.
#2  <signal handler called>
No symbol table info available.
#3  0x00000033c040dc1d in raise () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x000000000080eb9c in Error::error ()
No symbol table info available.
#5  0x000000000088dd1e in Cache::ioThread ()
No symbol table info available.
#6  0x00000000007e0670 in Thread::thread ()
No symbol table info available.
#7  0x00000000007e0509 in Thread::thread ()
No symbol table info available.
#8  0x00000033c04061b5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#9  0x00000033bf8cd39d in clone () from /lib64/libc.so.6
No symbol table info available.
#10 0x0000000000000000 in ?? ()
No symbol table info available.
Starting a second run of the test ('How to reproduce' above), to see if consistent.
 
   [28 Dec 2007 7:51]
   Philip Stoev        
  The test crashed again at the same round of inserts. Core size is 5.8 gigabytes, no backtrace available.
   [31 Dec 2007 10:08]
   Philip Stoev        
  A run with --mysqld=--falcon-page-cache-size=200MB --mysqld=--falcon-record-memory-max=30GB 
results in the following processlist (notice query State is "end":
mysql> show processlist;
+----+------+-----------------+------+---------+-------+-------+---------------------------------+
| Id | User | Host            | db   | Command | Time  | State | Info                            |
+----+------+-----------------+------+---------+-------+-------+---------------------------------+
|  1 | root | localhost       | test | Query   | 75535 | end   | insert into t1 select * from t1 |
|  4 | root | localhost:54686 | NULL | Query   |     0 | NULL  | show processlist                |
+----+------+-----------------+------+---------+-------+-------+---------------------------------+
meaning this query has been running for about 21 hours. CPU usage is at 100% (that is, 1 core is occupied all the time). virt memory 17.8g, res memory 14g (on a 16 gig ram, 16 gig swap).
Here is the query log with times, showing that the last query took disproportionally more time than all previous ones before being shutdowned:
                    1 Query     insert into t1 values (1),(2),(3),(4)
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
071230 15:53:25    1 Query      insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
                    1 Query     insert into t1 select * from t1
071230 15:53:26    1 Query      insert into t1 select * from t1
071230 15:53:28    1 Query      insert into t1 select * from t1
071230 15:53:31    1 Query      insert into t1 select * from t1
071230 15:53:37    1 Query      insert into t1 select * from t1
071230 15:53:51    1 Query      insert into t1 select * from t1
071230 15:54:20    1 Query      insert into t1 select * from t1
071230 15:55:21    1 Query      insert into t1 select * from t1
071230 15:57:35    1 Query      insert into t1 select * from t1
071230 16:02:29    1 Query      insert into t1 select * from t1
071231 13:04:41    5 Connect    root@localhost on
                    5 Shutdown
 
   [25 Feb 2008 19:35]
   Kevin Lewis        
  Patch is in mysql-6.0-release version 6.0.4 Philip, please reopen the bug if Chrs' change is no good enough.
   [12 Mar 2008 23:02]
   Bugs System        
  Pushed into 6.0.4-alpha
   [2 May 2008 1:48]
   Paul DuBois        
  Noted in 6.0.4 changelog. Inserting millions of rows into a Falcon table could cause the insert operation to hang.

