Bug #40070 | repeating insert/delete rows NDB disk data tables cause ndbd crash | ||
---|---|---|---|
Submitted: | 16 Oct 2008 8:06 | Modified: | 29 Jan 2009 14:11 |
Reporter: | Kenji Hirohama | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Cluster: Disk Data | Severity: | S2 (Serious) |
Version: | 6.2.15 | OS: | Solaris (10) |
Assigned to: | | CPU Architecture: | Any |
[16 Oct 2008 8:06]
Kenji Hirohama
[16 Oct 2008 8:08]
Jonas Oreland
Can you try increasing the size of DiskPageBufferMemory?
[17 Oct 2008 0:32]
Kenji Hirohama
Do you have any insights from the attached stack traces?
[24 Oct 2008 3:50]
Kenji Hirohama
I tried DiskPageBufferMemory=2G and still got the same error at the same point. With 8G the result was the same. I also tried 10G, but ndbd could not start. (My box has 32GB of physical memory.) Perhaps I need more memory, but buying more is not realistic. Do you have any other ideas? Thanks,
[6 Nov 2008 5:30]
MySQL Verification Team
I confirmed that the problem happens on 6.3.18 as well.
[6 Nov 2008 8:40]
MySQL Verification Team
I repeated the problem using the following config.ini:
------------------------------------------------------
[MGM]
Id=1
DataDir=/var/lib/telco-6.3/mgm
Hostname=127.0.0.1

[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=128M
IndexMemory=16M
LockPagesInMainMemory=1
MaxNoOfTables=256
MaxNoOfOrderedIndexes=512
MaxNoOfUniqueHashIndexes=256
MaxNoOfAttributes=8192
MaxNoOfConcurrentOperations=50000
FragmentLogFileSize=128M
NoOfFragmentLogFiles=4
RedoBuffer=64M
ODirect=1
#TimeBetweenGlobalCheckpoints=10000

### Disk data related
DiskPageBufferMemory=64M
SharedGlobalMemory=128M

[NDBD]
Id=41
Hostname=127.0.0.1
DataDir=/var/lib/telco-6.3/ndbd1

[NDBD]
Id=42
Hostname=127.0.0.1
DataDir=/var/lib/telco-6.3/ndbd2

[NDBD]
Id=43
Hostname=127.0.0.1
DataDir=/var/lib/telco-6.3/ndbd3

[NDBD]
Id=44
Hostname=127.0.0.1
DataDir=/var/lib/telco-6.3/ndbd4

[MYSQLD]
Hostname=127.0.0.1

[MYSQLD]
Hostname=127.0.0.1
[6 Nov 2008 8:48]
MySQL Verification Team
I repeated the problem using the following script:
-------------------------------------------
#!/bin/bash
MYSQL="mysql -h 127.0.0.1 -P 20001 "
n=0

$MYSQL -e "DROP DATABASE disk_test"
$MYSQL -e "CREATE DATABASE disk_test"
$MYSQL disk_test -e "CREATE LOGFILE GROUP lg_1 ADD UNDOFILE 'undo_1.dat' INITIAL_SIZE 128M UNDO_BUFFER_SIZE 16M ENGINE NDB"
$MYSQL disk_test -e "CREATE TABLESPACE ts_1 ADD DATAFILE 'data_1.dat' USE LOGFILE GROUP lg_1 INITIAL_SIZE 256M ENGINE NDB"
$MYSQL disk_test -e "CREATE TABLE t1 (c1 bigint NOT NULL auto_increment, c2 tinyint(4) NOT NULL default 99, c3 char(32) NOT NULL default 'AAAAAAAAAABBBBBBBBBBCCCCCCCCCC', c4 char(32) NOT NULL default 'AAAAAAAAAABBBBBBBBBBCCCCCCCCCC', c5 char(39) NOT NULL default 'AAAAAAAAAABBBBBBBBBBCCCCCCCCCC', PRIMARY KEY (c1), KEY Index_2 (c2)) /*!50100 TABLESPACE ts_1 STORAGE DISK */ ENGINE=ndbcluster DEFAULT CHARSET=utf8"

mysql_sampledata_gen 127.0.0.1 20001 mikiya okuno disk_test t1 10000

while [ 1 ]
do
  loop=0
  while [ $loop -lt 20 ]
  do
    $MYSQL disk_test -e "insert into t1(c2,c3,c4) select c2, c3, c4 from t1 limit 10000"
    loop=`expr $loop + 1`
  done

  loop=0
  while [ $loop -lt 20 ]
  do
    $MYSQL disk_test -e "delete from t1 limit 10000"
    loop=`expr $loop + 1`
  done

  echo $n
  n=`expr $n + 1`

  if [ -f /tmp/stopit ]
  then
    exit 1
  fi
done
[6 Nov 2008 8:54]
MySQL Verification Team
The problem happens randomly, and it is always the master node that crashes. You'll see a crash within 100 loops.
[6 Nov 2008 10:19]
MySQL Verification Team
I did additional tests:
* With smaller transactions (LIMIT 1000), the error didn't happen.
* With smaller transactions (LIMIT 1000) and larger tables (varchar(512) instead of char(32)), the error happened.
* With a larger DiskPageBufferMemory (640M), the error happened.
So a possible workaround is to use smaller transactions.
[6 Nov 2008 13:59]
MySQL Verification Team
When I ran the tests with a 64MB undo buffer, the problem didn't happen.
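The undo buffer size is set when the logfile group is created, so the 64MB test corresponds to changing UNDO_BUFFER_SIZE in the CREATE LOGFILE GROUP statement from the reproduction script. A minimal sketch, reusing the names from that script (lg_1, undo_1.dat, disk_test, port 20001); the helper logfile_group_sql is hypothetical, and it assumes the data nodes have enough memory configured to hold the buffer:

```shell
#!/bin/bash
# Hypothetical helper: print a CREATE LOGFILE GROUP statement
# with the requested undo buffer size (for example 16M or 64M).
logfile_group_sql() {
  local undo_buffer_size=$1
  echo "CREATE LOGFILE GROUP lg_1 ADD UNDOFILE 'undo_1.dat' INITIAL_SIZE 128M UNDO_BUFFER_SIZE ${undo_buffer_size} ENGINE NDB"
}

# Usage (assumes the test cluster from the script above is reachable):
#   mysql -h 127.0.0.1 -P 20001 disk_test -e "$(logfile_group_sql 64M)"
```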
[7 Nov 2008 10:19]
MySQL Verification Team
Possible workarounds are:
* increase TimeBetweenEpochsTimeout
* ensure DiskPageBufferMemory and the undo buffer are large enough
* commit more often
If a large transaction exhausts the disk page buffer and the undo buffer, the commit takes a very long time, and TimeBetweenEpochsTimeout is then hit.
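The "commit more often" workaround could be sketched as follows: instead of one 10000-row DELETE, issue several smaller statements, each of which commits on its own when autocommit is enabled (the MySQL default). The helper batched_deletes and the batch size of 1000 are assumptions for illustration, with 1000 taken from the LIMIT 1000 test above; the table and connection settings come from the reproduction script.

```shell
#!/bin/bash
# Hypothetical helper: print the DELETE statements needed to remove
# `total` rows from t1 in batches of at most `batch` rows each.
batched_deletes() {
  local total=$1 batch=$2
  local remaining=$total n
  while [ "$remaining" -gt 0 ]; do
    if [ "$remaining" -lt "$batch" ]; then
      n=$remaining          # last, partial batch
    else
      n=$batch
    fi
    echo "DELETE FROM t1 LIMIT $n;"
    remaining=$((remaining - n))
  done
}

# Usage (assumes the test cluster from the script above is reachable):
#   batched_deletes 10000 1000 | mysql -h 127.0.0.1 -P 20001 disk_test
```

Each statement is then a separate, smaller transaction, so less undo and page-buffer space is held at any one time. Whether 1000 rows is small enough depends on row size, as the varchar(512) test suggests.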
[29 Jan 2009 14:11]
Jonathan Miller
http://bugs.mysql.com/bug.php?id=37227