MySQL Bugs: #29018: mysqld crash while backing up ndb cluster tables

Bug #29018	mysqld crash while backing up ndb cluster tables
Submitted:	11 Jun 2007 14:43	Modified:	20 Sep 2007 5:07
Reporter:	Steven Cain
Status:	No Feedback
Category:	MySQL Server: Backup	Severity:	S2 (Serious)
Version:	5.0.45	OS:	Linux (Fedora Core 6)
Assigned to:		CPU Architecture:	Any

Description:
I have 6 databases that use the NDB Cluster engine for all of their tables. mysqld crashes about 20% of the time when I back them up with mysqldump. I use the default mysqldump options and specify the database to back up. I have an instance of mysqld, ndb_mgmd, and ndbd running on each of our servers, and the mysqld crash can occur on either server. I cannot create the numeric symbol dump needed to resolve the stack trace; when I run nm --numeric-sort /usr/sbin/mysqld I get the following error: "nm: /usr/sbin/mysqld: no symbols". I am using the following downloaded rpms of MySQL:
MySQL-bench-5.0.41-0
MySQL-client-5.0.41-0
MySQL-devel-5.0.41-0
MySQL-ndb-extra-5.0.41-0
MySQL-ndb-management-5.0.41-0
MySQL-ndb-storage-5.0.41-0
MySQL-ndb-tools-5.0.41-0
MySQL-python-1.2.1-1
MySQL-server-5.0.41-0
MySQL-shared-5.0.41-0
MySQL-shared-compat-5.0.41-0

I will attach files that contain the pertinent lines in the query log and /var/log/mysqld.log.

From the logs it looks like the database PhoneNumberCounters is causing the problem. When I run the Windows version of MySQL Administrator, I occasionally crash mysqld when I click on the PhoneNumberCounters database in the catalogs section. When mysqld crashes from the GUI, the top two lines of the stack trace are the same as in the crashes during backup.

How to repeat:
I do 2 backups a day and I get a failure every 2-3 days.

Suggested fix:
I have no idea, but the first step would be to resolve the nm problem so that I can provide better stack information.

Can you provide us with the mysqld.sym file from your installation or tell us the exact MySQL-server-5.0.41-0 version you are using (either full platform specs or just the exact download URL) so that we can resolve the stack trace?

Here's the link to the server's rpm:
http://dev.mysql.com/get/Downloads/MySQL-5.0/MySQL-server-5.0.41-0.i386.rpm/from/ftp://www...

I downloaded the rpms from the 5.0 community server "Linux x86 generic RPM (statically linked against glibc 2.2.5) downloads"

Uploaded bug-data-29018.zip to ftp.mysql.com/pub/mysql/upload/ with the mysqld.sym that was requested.

Has any progress been made?  I haven't seen any updates in 3 weeks.

I have upgraded to 5.0.45 and the problem is still happening. I have uploaded bug-data-29018-2.zip that contains the new mysqld.sym. Here is the new stack trace:
070718 21:00:02 - mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=402653184
read_buffer_size=2093056
max_used_connections=42
max_connections=1000
threads_connected=28
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 290904 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0xa1cba50
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xbf77e278, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x80a819e
0x8367d08
0x82a92d7
0x815cec1
0x817c7d2
0x817c158
0x8181afc
0x80ec4c5
0x80edb40
0x80e9484
0x80bb238
0x80c180d
0x80b9436
0x80b8cc3
0x80b8195
0x83654bc
0x838f99a
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/using-stack-trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0xa4ec5f8 = show table status like 'CounterDefinitions'
thd->thread_id=68484
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.

Number of processes running now: 0
070718 21:00:02 mysqld restarted
070718 21:00:02 [Warning] Asked for 196608 thread stack, but got 126976
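The memory estimate in the log above looks inconsistent: key_buffer_size alone is 402653184 bytes (393216 K), which is already more than the 290904 K printed. A minimal sketch of the arithmetic, assuming (1) sort_buffer_size was 2097144 bytes, a value that does not appear in the log but reproduces the logged figure exactly, and (2) an i386 build evaluates the formula in 32-bit unsigned arithmetic. The function names are hypothetical.

```cpp
#include <cstdint>

// Worst-case estimate from the crash log, evaluated in 64-bit arithmetic:
// key_buffer_size + (read_buffer_size + sort_buffer_size) * max_connections, in bytes.
std::uint64_t worstCaseBytes(std::uint64_t keyBuffer, std::uint64_t readBuffer,
                             std::uint64_t sortBuffer, std::uint64_t maxConnections) {
    return keyBuffer + (readBuffer + sortBuffer) * maxConnections;
}

// The same formula evaluated in 32-bit unsigned arithmetic (wraps modulo 2^32),
// then divided by 1024 to give the "K" figure. Assumes this mirrors how a
// 32-bit build computes the logged value.
std::uint32_t wrappedKilobytes(std::uint32_t keyBuffer, std::uint32_t readBuffer,
                               std::uint32_t sortBuffer, std::uint32_t maxConnections) {
    const std::uint32_t total = keyBuffer + (readBuffer + sortBuffer) * maxConnections;
    return total / 1024u;
}
```

With key_buffer_size=402653184, read_buffer_size=2093056, the assumed sort_buffer_size=2097144 and max_connections=1000, worstCaseBytes gives 4592853184 bytes (about 4.3 GB, more than a 32-bit process can address), while wrappedKilobytes gives 290904, the number in the log.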

Since installing the dynamic glibc version I can get the stack dump:
# resolve_stack_dump -s /usr/lib/mysql/mysqld.sym -n temp.txt
0x819aae9 handle_segfault + 521
0x83b07d3 _ZN3Ndb22readAutoIncrementValueEPKN13NdbDictionary5TableERy + 35
0x826412e _ZN13ha_ndbcluster4infoEj + 222
0x82844af _Z24get_schema_tables_recordP3THDP13st_table_listP8st_tablebPKcS6_ + 591
0x8283cc1 _Z14get_all_tablesP3THDP13st_table_listP4Item + 1585
0x82897d1 _Z24get_schema_tables_resultP4JOIN23enum_schema_table_state + 385
0x81e4fbd _ZN4JOIN4execEv + 6621
0x81e530d _Z12mysql_selectP3THDPPP4ItemP13st_table_listjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_sel + 493
0x81e07a9 _Z13handle_selectP3THDP6st_lexP13select_resultm + 377
0x81b1749 _Z21mysql_execute_commandP3THD + 713
0x81b8ae5 _Z11mysql_parseP3THDPKcjPS2_ + 261
0x81afde5 _Z16dispatch_command19enum_server_commandP3THDPcj + 1365
0x81af83e _Z10do_commandP3THD + 158
0x81aec66 handle_one_connection + 726
0x45f3db (?)
0x3a426e (?)
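The resolved trace appears to be produced by matching each raw address to the nearest symbol at or below it in the numerically sorted symbol list (mysqld.sym), printing the distance as a decimal offset; addresses below every mysqld symbol, presumably in shared libraries, come out as "(?)". A minimal sketch of that lookup, assuming `nm --numeric-sort` style input ("<hex address> <type> <name>"). The helper names are hypothetical and this is not the resolve_stack_dump source.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: builds an address -> symbol map from nm-style text.
// Lines without three fields (e.g. undefined "U name" entries) are skipped.
std::map<std::uint64_t, std::string> parseSymbols(const std::string& nmText) {
    std::map<std::uint64_t, std::string> symbols;
    std::istringstream in(nmText);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string addr, type, name;
        if (!(fields >> addr >> type >> name)) continue;
        symbols[std::stoull(addr, nullptr, 16)] = name;
    }
    return symbols;
}

// Returns "name + offset" for the closest symbol at or below addr, or "(?)"
// when addr lies below the first known symbol.
std::string resolveAddress(const std::map<std::uint64_t, std::string>& symbols,
                           std::uint64_t addr) {
    auto it = symbols.upper_bound(addr);  // first symbol strictly above addr
    if (it == symbols.begin()) return "(?)";
    --it;                                 // last symbol at or below addr
    std::ostringstream out;
    out << it->second << " + " << (addr - it->first);
    return out.str();
}
```

For example, a symbol handle_segfault at 0x819a8e0 (derived from the trace above as 0x819aae9 minus 521) resolves 0x819aae9 to "handle_segfault + 521".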

I added some debugging output, built my own mysqld, and found where the crash occurs:
ndb/src/ndbapi/Ndb.cpp

After the call to getImpl, the 'table' pointer is zero, and it is dereferenced (table->m_internalName) without any check before the result is passed to get_local_table_info.

int
Ndb::readAutoIncrementValue(const NdbDictionary::Table * aTable,
                            Uint64 & tupleId)
{
  // Debugging output added for this report.
  print_time(stderr); fprintf(stderr,"Ndb::readAutoIncrementValue 01\n");
  const NdbTableImpl* table = & NdbTableImpl::getImpl(*aTable);

  // Crash site: 'table' is not validated before it is dereferenced here.
  const BaseString& internal_tabname = table->m_internalName;
  Ndb_local_table_info *info=
    theDictionary->get_local_table_info(internal_tabname, false);
  // ... (remainder of the function not shown)
}
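A guard of the kind the excerpt lacks can be sketched with stand-in types. TableImplStub and readAutoIncrementValueChecked are hypothetical, not the NDB API, and the -1 error return is an assumption. Such a check only catches the null case; it cannot detect the non-null bad values (such as 2) that the reporter also observed, which point at the cached table object itself.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for NdbTableImpl; NOT the real NDB API type.
struct TableImplStub {
    std::string internalName;     // plays the role of m_internalName
    std::uint64_t nextValue = 1;  // hypothetical cached auto-increment value
};

// Hypothetical checked variant of the pattern in Ndb::readAutoIncrementValue():
// validate the pointer before dereferencing it. Returns 0 on success and -1
// (an assumed error code) when the table pointer is null, leaving tupleId untouched.
int readAutoIncrementValueChecked(const TableImplStub* table, std::uint64_t& tupleId) {
    if (table == nullptr) {
        return -1;  // fail the call instead of dereferencing a null pointer
    }
    // In Ndb.cpp this is where table->m_internalName is read and passed to
    // get_local_table_info(); the stub simply returns its cached value.
    tupleId = table->nextValue;
    return 0;
}
```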

The crash I am seeing in readAutoIncrementValue is not always due to a null pointer; sometimes the pointer is non-null but still causes the crash. I have seen crashes where the pointer is null, 2, or a seemingly normal value.

I built a debug version of mysqld and turned tracing on to find out where readAutoIncrementValue was being called. It is called from get_schema_tables_record, immediately after open_normal_and_derived_tables. I suspect there is something wrong with the cached tables. I now perform a FLUSH TABLES before my backup script runs, and I have gone four days without a crash.
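The workaround above (FLUSH TABLES, then the dumps) can be sketched as an ordered list of shell commands. backupCommands, runBackup and the output directory are hypothetical; connection options are omitted, and `mysql -e` and `mysqldump <db>` are used with their standard meanings.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: builds the shell commands for one backup run, issuing
// FLUSH TABLES before any mysqldump call. User, password, and host options
// are deliberately omitted.
std::vector<std::string> backupCommands(const std::vector<std::string>& databases,
                                        const std::string& outDir) {
    std::vector<std::string> cmds;
    cmds.push_back("mysql -e \"FLUSH TABLES\"");
    for (const std::string& db : databases) {
        // Default mysqldump options, one dump file per database.
        cmds.push_back("mysqldump " + db + " > " + outDir + "/" + db + ".sql");
    }
    return cmds;
}

// Runs the commands in order through the shell and stops at the first
// non-zero exit status.
bool runBackup(const std::vector<std::string>& cmds) {
    for (const std::string& cmd : cmds) {
        if (std::system(cmd.c_str()) != 0) {
            return false;
        }
    }
    return true;
}
```

Keeping command construction separate from execution makes it possible to inspect the exact sequence (flush first, then each dump) before anything touches the server.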

I think this is BUG#26793, which is fixed in the 5.0-ndb tree. Please retest with the 5.0-ndb tree.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".