Bug #41905 MySQLD (catastrophic failure) during on-line add column
Submitted: 6 Jan 2009 22:09 Modified: 23 Feb 2009 16:07
Reporter: Jonathan Miller Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S1 (Critical)
Version:mysql-5.1-telco-6.4 OS:Linux
Assigned to: Pekka Nousiainen CPU Architecture:Any

[6 Jan 2009 22:09] Jonathan Miller
Description:
This is the first time that I have seen this, but wanted to record it in case automation (ACRT) hits is again, or if a customer runs into it.

(NdbDictionaryColumngetAttrIdEv)

thd: 0x2aaab0070000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x409250f0 thread_stack 0x40000
/data0/cr_autotest/libexec/mysqld(my_print_stacktrace+0x39)[0x97b229]
/data0/cr_autotest/libexec/mysqld(handle_segfault+0x31d)[0x62f2ed]
/lib64/libpthread.so.0[0x3907e0de80]
/data0/cr_autotest/libexec/mysqld(_ZNK13NdbDictionary6Column9getAttrIdEv+0x0)[0x92eba0]
/data0/cr_autotest/libexec/mysqld(_ZN17NdbDictionaryImpl12createRecordEPK12NdbTableImplPKN13NdbDictionary19RecordSpecificationEjjjb+0xa6)[0x939896]
/data0/cr_autotest/libexec/mysqld(_ZN13NdbDictionary10Dictionary12createRecordEPKNS_5TableEPKNS_19RecordSpecificationEjjj+0x128)[0x9346a8]
/data0/cr_autotest/libexec/mysqld(_ZN13ha_ndbcluster20add_table_ndb_recordEPN13NdbDictionary10DictionaryE+0x107)[0x7bb367]
/data0/cr_autotest/libexec/mysqld(_ZN13ha_ndbcluster12get_metadataEP3THDPKc+0x181)[0x7cf641]
/data0/cr_autotest/libexec/mysqld(_ZN13ha_ndbcluster4openEPKcij+0x264)[0x7cfa94]
/data0/cr_autotest/libexec/mysqld(_ZN7handler7ha_openEP8st_tablePKcii+0x3e)[0x72217e]
/data0/cr_autotest/libexec/mysqld(_Z21open_table_from_shareP3THDP14st_table_sharePKcjjjP8st_table15open_table_mode+0x591)[0x68b011]
/data0/cr_autotest/libexec/mysqld[0x6834f7]
/data0/cr_autotest/libexec/mysqld(_Z12reopen_tableP8st_table+0xb8)[0x683d68]
/data0/cr_autotest/libexec/mysqld(_Z32mysql_fast_or_online_alter_tableP3THDP8st_tableS2_P24st_ha_create_informationP23st_ha_alter_informationP6BitmapILj40EE22enum_enable_or_disable+0x1af)[0x73979f]
/data0/cr_autotest/libexec/mysqld(_Z17mysql_alter_tableP3THDPcS1_P24st_ha_create_informationP10TABLE_LISTP10Alter_infojP8st_orderb+0x2599)[0x73f119]
/data0/cr_autotest/libexec/mysqld(_Z21mysql_execute_commandP3THD+0x1707)[0x63f197]
/data0/cr_autotest/libexec/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x202)[0x645b82]
/data0/cr_autotest/libexec/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x3d7)[0x645f67]
/data0/cr_autotest/libexec/mysqld(_Z10do_commandP3THD+0xe6)[0x647076]
/data0/cr_autotest/libexec/mysqld(handle_one_connection+0x64c)[0x6377ec]
/lib64/libpthread.so.0[0x3907e06307]
/lib64/libc.so.6(clone+0x6d)[0x39072d1ded]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x5747be0 = ALTER TABLE TABLE_ALTER.t1 ADD COLUMN c3 BIGINT
thd->thread_id=3
thd->killed=NOT_KILLED

How to repeat:
ACRT - 2-dn-func - configuration
ACRT - add-online-tester.sh - driver
ACRT - conf-2-dn-func.cnf

[mysqld]
skip-grant-tables
skip-innodb
ndb_use_exact_count=0
loose-join_cache_level=6
loose-ndb-cluster-connection-pool=3
loose-ndb_extra_logging=9
loose-engine_condition_pushdown=1
loose-ndb_cache_check_time=1000

[cluster_config]
MaxNoOfSavedMessages = 30
NoOfReplicas = 2
DataMemory = 4000M
IndexMemory = 400M
DiskPageBufferMemory=300MB
DiskCheckpointSpeed=16M
RedoBuffer=200M
NoOfFragmentLogFiles=10
FragmentLogFileSize=512M
InitFragmentLogFiles=FULL
SharedGlobalMemory=384
SendBufferMemory = 2M
MaxNoOfConcurrentOperations = 250000
MaxNoOfLocalOperations = 275000
MaxNoOfConcurrentIndexOperations = 20000
MaxNoOfAttributes=2048
MaxNoOfOrderedIndexes=512
MaxNoOfUniqueHashIndexes=512
LockPagesInMainMemory=1
MemReportFrequency=200
LogLevelCongestion=15
LogLevelStatistic=15
[12 Jan 2009 15:46] Jonathan Miller
Happened on a double this run:

/data0/cr_autotest/libexec/mysqld(my_print_stacktrace+0x39)[0x97b339]
/data0/cr_autotest/libexec/mysqld(handle_segfault+0x31d)[0x62f48d]
/lib64/libpthread.so.0[0x3907e0de80]
/data0/cr_autotest/libexec/mysqld(_ZNK13NdbDictionary6Column9getAttrIdEv+0x0)[0x92edb0]
/data0/cr_autotest/libexec/mysqld(_ZN17NdbDictionaryImpl12createRecordEPK12NdbTableImplPKN13NdbDictionary19RecordSpecificationEjjjb+0xa6)[0x939aa6]
/data0/cr_autotest/libexec/mysqld(_ZN13NdbDictionary10Dictionary12createRecordEPKNS_5TableEPKNS_19RecordSpecificationEjjj+0x128)[0x9348b8]
/data0/cr_autotest/libexec/mysqld(_ZN13ha_ndbcluster20add_table_ndb_recordEPN13NdbDictionary10DictionaryE+0x107)[0x7bb597]
/data0/cr_autotest/libexec/mysqld(_ZN13ha_ndbcluster12get_metadataEP3THDPKc+0x181)[0x7cf861]
/data0/cr_autotest/libexec/mysqld(_ZN13ha_ndbcluster4openEPKcij+0x264)[0x7cfcb4]
/data0/cr_autotest/libexec/mysqld(_ZN7handler7ha_openEP8st_tablePKcii+0x3e)[0x72239e]
/data0/cr_autotest/libexec/mysqld(_Z21open_table_from_shareP3THDP14st_table_sharePKcjjjP8st_table15open_table_mode+0x591)[0x68b231]
/data0/cr_autotest/libexec/mysqld[0x683717]
/data0/cr_autotest/libexec/mysqld(_Z12reopen_tableP8st_table+0xb8)[0x683f88]
/data0/cr_autotest/libexec/mysqld(_Z32mysql_fast_or_online_alter_tableP3THDP8st_tableS2_P24st_ha_create_informationP23st_ha_alter_informationP6BitmapILj40EE22enum_enable_or_disable+0x1af)[0x7399bf]
/data0/cr_autotest/libexec/mysqld(_Z17mysql_alter_tableP3THDPcS1_P24st_ha_create_informationP10TABLE_LISTP10Alter_infojP8st_orderb+0x2599)[0x73f339]
/data0/cr_autotest/libexec/mysqld(_Z21mysql_execute_commandP3THD+0x1707)[0x63f3a7]
/data0/cr_autotest/libexec/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x202)[0x645d92]
/data0/cr_autotest/libexec/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x3d7)[0x646177]
/data0/cr_autotest/libexec/mysqld(_Z10do_commandP3THD+0xe6)[0x647286]
/data0/cr_autotest/libexec/mysqld(handle_one_connection+0x81b)[0x637bdb]
/lib64/libpthread.so.0[0x3907e06307]
/lib64/libc.so.6(clone+0x6d)[0x39072d1ded]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x13eb3760 = ALTER TABLE TABLE_ALTER.t1 ADD COLUMN c3 DOUBLE
[13 Jan 2009 0:42] Jonathan Miller
#0  0x00000030bbe0b142 in pthread_kill () from /lib64/libpthread.so.0
#1  0x000000000062f571 in handle_segfault (sig=11) at mysqld.cc:2543
#2  <signal handler called>
#3  NdbDictionary::Column::getAttrId (this=0x0) at NdbDictionary.cpp:361
#4  0x000000000093c636 in NdbDictionaryImpl::createRecord (this=0xbf85b88,
    table=0x2aaaac1441c0, recSpec=0x40e4dbe0, length=3, elemSize=2, flags=2,
    defaultRecord=false) at NdbDictionaryImpl.cpp:6063
#5  0x0000000000937448 in NdbDictionary::Dictionary::createRecord (this=0xbf85b88,
    table=<value optimized out>, recSpec=0x40e4dbe0, length=3, elemSize=24, flags=2)
    at NdbDictionary.cpp:2170
#6  0x00000000007bb647 in ha_ndbcluster::add_table_ndb_record (this=0x2aaaac13ecc0,
    dict=0xbf85b88) at ha_ndbcluster.cc:1702
#7  0x00000000007cf921 in ha_ndbcluster::get_metadata (this=0x2aaaac13ecc0,
    thd=0x2aaaac1572a0, path=<value optimized out>) at ha_ndbcluster.cc:1418
#8  0x00000000007cfd74 in ha_ndbcluster::open (this=0x2aaaac13ecc0,
    name=0x2aaaac0ee618 "./TABLE_ALTER/t1", mode=<value optimized out>,
    test_if_locked=<value optimized out>) at ha_ndbcluster.cc:7949
#9  0x000000000072244e in handler::ha_open (this=0x0, table_arg=<value optimized out>,
    name=0x2aaaac0ee618 "./TABLE_ALTER/t1", mode=2, test_if_locked=2) at handler.cc:2041
#10 0x000000000068b2e1 in open_table_from_share (thd=0x2aaaac1572a0, share=0x2aaaac0ee2b0,
    alias=<value optimized out>, db_stat=0, prgflag=<value optimized out>, ha_open_flags=0,
    outparam=0x40e4ed00, open_mode=OTM_OPEN) at table.cc:1914
#11 0x00000000006837c7 in open_unireg_entry (thd=0x2aaaac1572a0, entry=0x40e4ed00,
    table_list=0x40e4f590, alias=0x2aaaac145110 "t1", cache_key=0x2aaaac1566e8 "TABLE_ALTER",
    cache_key_length=15, mem_root=0x2aaaac159b30, flags=0) at sql_base.cc:3892
#12 0x0000000000684038 in reopen_table (table=0x2aaaac1399e0) at sql_base.cc:3041
#13 0x0000000000739a6f in mysql_fast_or_online_alter_table (thd=0x2aaaac1572a0,
    table=0x2aaaac1399e0, altered_table=0x2aaaac15bae0, create_info=0x40e51a30,
    alter_info=0x40e508e0, ha_alter_flags=0x40e50910, keys_onoff=LEAVE_AS_IS)
    at sql_table.cc:5986
#14 0x000000000073f3e9 in mysql_alter_table (thd=0x2aaaac1572a0,
    new_db=0xbf7e298 "TABLE_ALTER", new_name=0xbf7e2a8 "t1", create_info=0x40e51a30,
    table_list=0xbf7e2e0, alter_info=0x40e51b10, order_num=0, order=0x0, ignore=false)
    at sql_table.cc:7048
#15 0x000000000063f457 in mysql_execute_command (thd=0x2aaaac1572a0) at sql_parse.cc:2800
---Type <return> to continue, or q <return> to quit---
#16 0x0000000000645e42 in mysql_parse (thd=0x2aaaac1572a0,
    inBuf=0xbf7e1c0 "ALTER TABLE TABLE_ALTER.t1 ADD COLUMN c3 FLOAT", length=46,
    found_semicolon=0x40e52a58) at sql_parse.cc:5790
#17 0x0000000000646227 in dispatch_command (command=COM_QUERY, thd=0x2aaaac1572a0,
    packet=<value optimized out>, packet_length=<value optimized out>) at sql_parse.cc:1200
#18 0x0000000000647336 in do_command (thd=0x2aaaac1572a0) at sql_parse.cc:857
#19 0x0000000000637c8b in handle_one_connection (arg=0x2aaaac1572a0) at sql_connect.cc:1113
#20 0x00000030bbe06307 in start_thread () from /lib64/libpthread.so.0
#21 0x00000030bb2d1ded in clone () from /lib64/libc.so.6
[13 Jan 2009 0:43] Jonathan Miller
(gdb) f 3
#3  NdbDictionary::Column::getAttrId (this=0x0) at NdbDictionary.cpp:361
361     NdbDictionary::Column::getAttrId() const {
(gdb) l
356     NdbDictionary::Column::getColumnNo() const {
357       return m_impl.m_column_no;
358     }
359
360     int
361     NdbDictionary::Column::getAttrId() const {
362       return m_impl.m_attrId;;
363     }
364
365     bool
(gdb) f 4
#4  0x000000000093c636 in NdbDictionaryImpl::createRecord (this=0xbf85b88,
    table=0x2aaaac1441c0, recSpec=0x40e4dbe0, length=3, elemSize=2, flags=2,
    defaultRecord=false) at NdbDictionaryImpl.cpp:6063
6063        Uint32 attrId = recSpec[i].column->getAttrId();
(gdb) l
6058        tableNumDistKeys= table->m_noOfKeys;
6059
6060      int max_attrId = -1;
6061      for (i = 0; i < length; i++)
6062      {
6063        Uint32 attrId = recSpec[i].column->getAttrId();
6064        if ((int)attrId > max_attrId)
6065          max_attrId = (int)attrId;
6066      }
6067      Uint32 attrId_indexes_length = (Uint32)(max_attrId + 1);
(gdb) f 5
#5  0x0000000000937448 in NdbDictionary::Dictionary::createRecord (this=0xbf85b88,
    table=<value optimized out>, recSpec=0x40e4dbe0, length=3, elemSize=24, flags=2)
    at NdbDictionary.cpp:2170
2170                                             false); // Not default NdbRecord
(gdb) l
2165      NdbRecord* result= m_impl.createRecord(globalTabImpl,
2166                                             recSpec,
2167                                             length,
2168                                             elemSize,
2169                                             flags,
2170                                             false); // Not default NdbRecord
2171
2172      if (!result)
2173      {
2174        removeTableGlobal(*globalTab, false); // Don't invalidate
(gdb) f 4
#4  0x000000000093c636 in NdbDictionaryImpl::createRecord (this=0xbf85b88,
    table=0x2aaaac1441c0, recSpec=0x40e4dbe0, length=3, elemSize=2, flags=2,
    defaultRecord=false) at NdbDictionaryImpl.cpp:6063
6063        Uint32 attrId = recSpec[i].column->getAttrId();
(gdb) l
6058        tableNumDistKeys= table->m_noOfKeys;
6059
6060      int max_attrId = -1;
6061      for (i = 0; i < length; i++)
6062      {
6063        Uint32 attrId = recSpec[i].column->getAttrId();
6064        if ((int)attrId > max_attrId)
6065          max_attrId = (int)attrId;
6066      }
6067      Uint32 attrId_indexes_length = (Uint32)(max_attrId + 1);
[13 Jan 2009 9:19] Tomas Ulin
so it is easy to reproduce with this patch

Attachment: tmp.diff (text/x-diff), 1.22 KiB.

[13 Jan 2009 9:22] Tomas Ulin
with the introduction of schema trans the invalidation of the cache happens too early... already at alterTableGlobal (see diff)

however invalidation should happen on commit (endSchemaTrans)

a list of objects to invalidate should be kept so that the invalidation can be done at endSchemaTrans instead of at alterTableGlobal...

patch removes the early invalidation and also the async validation...
[16 Jan 2009 10:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63426

2791 Martin Skold	2009-01-16
      Bug #41905 MySQLD (catastrophic failure) during on-line add column: Moved re-setting of share state to always be done in the end of alter_table_phase2
[16 Jan 2009 11:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63427

2805 Martin Skold	2009-01-16 [merge]
      Bug #41905 MySQLD (catastrophic failure) during on-line add column: Moved re-setting of share state to always be done in the end of alter_table_phase2
[16 Jan 2009 11:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63440

3218 Martin Skold	2009-01-16 [merge]
      Merge

-- 
MySQL Code Commits Mailing List
For list archives: http://lists.mysql.com/commits
To unsubscribe:    http://lists.mysql.com/commits?unsub=commits@bugs.mysql.com
[16 Jan 2009 14:06] Bugs System
Pushed into 5.1.30-ndb-6.4.1 (revid:martin.skold@mysql.com-20090116115755-9yglqdafb1q3jcz0) (version source revid:martin.skold@mysql.com-20090116115755-9yglqdafb1q3jcz0) (merge vers: 5.1.30-ndb-6.4.1) (pib:6)
[16 Jan 2009 14:23] Bugs System
Pushed into 5.1.30-ndb-6.3.21 (revid:martin.skold@mysql.com-20090116110112-geetjoarz13r7tw2) (version source revid:martin.skold@mysql.com-20090116110112-geetjoarz13r7tw2) (merge vers: 5.1.30-ndb-6.3.21) (pib:6)
[16 Jan 2009 14:23] Bugs System
Pushed into 5.1.30-ndb-6.2.17 (revid:martin.skold@mysql.com-20090116105654-xgnl7kgbj1fma133) (version source revid:martin.skold@mysql.com-20090116105654-xgnl7kgbj1fma133) (merge vers: 5.1.30-ndb-6.2.17) (pib:6)
[20 Jan 2009 11:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63599

3221 Pekka Nousiainen	2009-01-20
      bug#41905 01.diff
      Test case for similar bug.
[20 Jan 2009 11:07] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63600

3222 Pekka Nousiainen	2009-01-20
      bug#41905 02.diff
      In alter table move invalidate old version to commit.
[20 Jan 2009 11:10] Bugs System
Pushed into 5.1.31-ndb-6.4.1 (revid:pekka@mysql.com-20090120110814-5l3671fg9nxj8kce) (version source revid:pekka@mysql.com-20090120110814-5l3671fg9nxj8kce) (merge vers: 5.1.31-ndb-6.4.1) (pib:6)
[21 Jan 2009 14:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63729

3224 Pekka Nousiainen	2009-01-21
      bug#41905 03.diff
      Keep reference to old table version until commit.
[21 Jan 2009 14:37] Bugs System
Pushed into 5.1.31-ndb-6.4.1 (revid:pekka@mysql.com-20090121143735-zc0yvmoxvgy7s5s0) (version source revid:pekka@mysql.com-20090121143735-zc0yvmoxvgy7s5s0) (merge vers: 5.1.31-ndb-6.4.1) (pib:6)
[10 Feb 2009 20:13] Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090210194937-s7xshv5l3m1v7wi9) (version source revid:tomas.ulin@sun.com-20090123110300-k3vjhnm1iop8c0h7) (merge vers: 6.0.10-alpha) (pib:6)
[23 Feb 2009 13:21] Jonathan Miller
Test passes.
[23 Feb 2009 16:07] Jon Stephens
Documented bugfix in the NDB-6.2.17, 6.3.21, and 6.4.1 changelogs as follows:

        Trying to execute an ALTER ONLINE TABLE ... ADD COLUMN while
        inserting rows into the table caused mysqld to crash.