| Bug #64609 | NDB crashes and does not recover under load | ||
|---|---|---|---|
| Submitted: | 10 Mar 2012 10:17 | Modified: | 22 Aug 2016 12:58 | 
| Reporter: | Avi Bluestein | Email Updates: | |
| Status: | No Feedback | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) | 
| Version: | mysql-5.5.19 ndb-7.2.4 | OS: | Linux (CentOS 6) | 
| Assigned to: | MySQL Verification Team | CPU Architecture: | Any | 
| Tags: | crash, MySQL Cluster, ndb | ||
   [10 Mar 2012 10:17]
   Avi Bluestein        
  
 
   [13 Mar 2012 17:23]
   Jonas Oreland        
  Hi, Looking at logs...it doesn't really look load related (but I can be wrong) It is a insert on table with id = 10 that is failing. Can you do "ndb_show_tables" and give us table definition (create table statement) of table with id=10 /Jonas
   [13 Mar 2012 17:56]
   Jonas Oreland        
  first guess is that this is a charset related issue.
   [14 Mar 2012 6:22]
   Jonas Oreland        
  setting state to need feedback
   [14 Mar 2012 23:25]
   Avi Bluestein        
  Thank you. I wish it was a matter of charset - but it's not so IMHO, as this crash happens in random places and is load-dependent. We've been able to overcome the NDB crashes by increasing various parameters such as the redo log, the number of concurrent transactions and various buffers. Additionally and most important - we had no choice but to implement a capping mechanism which stalls the system every predefined number of transactions and measured TPS - to allow NDB to flush the data to disk. This seems to have dealt with the crashes - although this is not normal or reasonable and the solution is not perfect at all - given that it stalls the system for 30-60 seconds every once in a while. Unfortunately we do not have the data or original table structure as we've deleted the whole DB and reconstructed it. But now comes the 'juicy' part. When we thought it was all behind us - now MYSQLD started crashing badly, entering a wild loop of recovery-crash-recovery-crash.... This seems to somehow be load-oriented and again, it's random - but it does happens always and when it does - it never recovers. Here's the info from the log (the crash repeats itself with the same log): ----------------------------------------------------------------------------- 120312 8:45:16 InnoDB: Waiting for the background threads to start 120312 8:45:17 InnoDB: 1.1.8 started; log sequence number 2390373 120312 8:45:17 [Warning] 'user' entry 'root@db-1.localdomain' ignored in --skip-name-resolve mode. 120312 8:45:17 [Warning] 'user' entry '@db-1.localdomain' ignored in --skip-name-resolve mode. 120312 8:45:17 [Warning] 'proxies_priv' entry '@ root@db-1.localdomain' ignored in --skip-name-resolve mode. 120312 8:45:17 [Note] Event Scheduler: Loaded 0 events 120312 8:45:17 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.5.19-ndb-7.2.4-gpl' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Cluster Community Server (GPL) 120312 8:45:18 [Note] NDB Binlog: DISCOVER TABLE Event: REPL$mysql/ndb_schema 120312 8:45:18 [Note] NDB Binlog: logging ./mysql/ndb_schema (UPDATED,USE_WRITE) 120312 8:45:18 [Note] NDB Binlog: DISCOVER TABLE Event: REPL$mysql/ndb_apply_status 120312 8:45:18 [Note] NDB Binlog: logging ./mysql/ndb_apply_status (UPDATED,USE_WRITE) 2012-03-12 08:45:18 [NdbApi] INFO -- Flushing incomplete GCI:s < 14092/13 2012-03-12 08:45:18 [NdbApi] INFO -- Flushing incomplete GCI:s < 14092/13 120312 8:45:18 [Note] NDB Binlog: starting log at epoch 14092/13 120312 8:45:18 [Note] NDB Binlog: ndb tables writable 120312 8:45:19 - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=8388608 read_buffer_size=131072 max_used_connections=10 max_threads=200 thread_count=10 connection_count=10 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 445676 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x2cef180 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f2c486e9e68 thread_stack 0x40000 /usr/sbin/mysqld(my_print_stacktrace+0x39)[0x81d799] /usr/sbin/mysqld(handle_segfault+0x464)[0x583824] /lib64/libpthread.so.0[0x31b9c0f4a0] /usr/sbin/mysqld(_ZN5TABLE35mark_columns_used_by_index_no_resetEjP9st_bitmap+0x38)[0x6731b8] /usr/sbin/mysqld(_Z12mysql_deleteP3THDP10TABLE_LISTP4ItemP10SQL_I_ListI8st_orderEyy+0xd53)[0x7da143] /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x9d6)[0x5fa8a6] /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x158)[0x5fe948] /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x136f)[0x5ffcdf] /usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0xcf)[0x69841f] /usr/sbin/mysqld(handle_one_connection+0x51)[0x698521] /lib64/libpthread.so.0[0x31b9c077f1] /lib64/libc.so.6(clone+0x6d)[0x31b98e592d] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (0x7f2c00004b80): is an invalid pointer Connection ID (thread ID): 5 Status: NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. mysqld: /pb2/build/sb_0-4838533-1329327230.71/rpm/BUILD/mysql-cluster-gpl-7.2.4/mysql-cluster-gpl-7.2.4/mysys/my_new.cc:52: int __cxa_pure_virtual(): Assertion `! "Aborted: pure virtual method called."' failed. Fatal signal 6 while backtracing
   [22 Jul 2016 12:58]
   MySQL Verification Team        
  Hi, coming into this issues some years later, I don't see nearly enough data here to conclude what the problem is. I assume since you are no longer updating this bug that the issue was somehow solved but if you still have the issue with latest binaries, please let us know and upload full logs, both mysqld and cluster take care Bogdan Kecman
   [23 Aug 2016 1:00]
   Bugs System        
  No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".

