MySQL Bugs: #53465: mysqld crash with a simple select statement

Bug #53465	mysqld crash with a simple select statement
Submitted:	6 May 2010 14:27	Modified:	25 Oct 2010 12:03
Reporter:	Oli Sennhauser
Status:	No Feedback
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	mysql-5.1-telco-7.0	OS:	Linux
Assigned to:		CPU Architecture:	Any
Tags:	7.0.8a, crash, MySQL, MySQL Cluster, SELECT

Description:
Three of our four mysqld instances crashed within the same minute while running the same query.

100506 12:58:50 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8384512
read_buffer_size=131072
max_used_connections=4
max_threads=151
threads_connected=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 338307 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x176b7c0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x444c5bd8 thread_stack 0x40000
/opt/db/mysql/bin/mysqld(my_print_stacktrace+0x33)[0x973633]
/opt/db/mysql/bin/mysqld(handle_segfault+0x324)[0x62e9b4]
/lib64/libpthread.so.0[0x2aceed0cfc00]
/opt/db/mysql/bin/mysqld(_ZNK13NdbDictionary6Column11getColumnNoEv+0x0)[0x953750]
/opt/db/mysql/bin/mysqld(_ZN16NdbDictInterface27create_index_obj_from_tableEPP12NdbIndexImplP12NdbTableImplPKS3_+0x203)[0x9251d3]
/opt/db/mysql/bin/mysqld(_ZNK9InitIndex4initEP17NdbDictionaryImplR12NdbTableImpl+0x2d)[0x92b3cd]
/opt/db/mysql/bin/mysqld(_ZN17NdbDictionaryImpl23fetchGlobalTableImplRefERK21GlobalCacheInitObject+0x10a)[0x926fca]
/opt/db/mysql/bin/mysqld(_ZNK13NdbDictionary10Dictionary14getIndexGlobalEPKcRKNS_5TableE+0x8e)[0x95682e]
/opt/db/mysql/bin/mysqld(_ZN13ha_ndbcluster16add_index_handleEP3THDPN13NdbDictionary10DictionaryEP6st_keyPKcj+0xde)[0x7b567e]
/opt/db/mysql/bin/mysqld(_ZN13ha_ndbcluster12open_indexesEP3THDP3NdbP8st_tableb+0x8b)[0x7b5b5b]
/opt/db/mysql/bin/mysqld(_ZN13ha_ndbcluster12get_metadataEP3THDPKc+0x223)[0x7b5e83]
/opt/db/mysql/bin/mysqld(_ZN13ha_ndbcluster4openEPKcij+0x2ec)[0x7cad6c]
/opt/db/mysql/bin/mysqld(_ZN7handler7ha_openEP8st_tablePKcii+0x3f)[0x71ea7f]
/opt/db/mysql/bin/mysqld(_Z21open_table_from_shareP3THDP14st_table_sharePKcjjjP8st_table15open_table_mode+0x2f6)[0x686e76]
/opt/db/mysql/bin/mysqld[0x6819dc]
/opt/db/mysql/bin/mysqld(_Z10open_tableP3THDP10TABLE_LISTP11st_mem_rootPbj+0x6cb)[0x683a3b]
/opt/db/mysql/bin/mysqld(_Z11open_tablesP3THDPP10TABLE_LISTPjj+0x1d9)[0x684039]
/opt/db/mysql/bin/mysqld(_Z28open_and_lock_tables_derivedP3THDP10TABLE_LISTb+0x2c)[0x68484c]
/opt/db/mysql/bin/mysqld[0x639fbf]
/opt/db/mysql/bin/mysqld(_Z21mysql_execute_commandP3THD+0x1f1e)[0x6406be]
/opt/db/mysql/bin/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x171)[0x646b81]
/opt/db/mysql/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x57b)[0x6471db]
/opt/db/mysql/bin/mysqld(_Z10do_commandP3THD+0xde)[0x64848e]
/opt/db/mysql/bin/mysqld(handle_one_connection+0x1f0)[0x638250]
/lib64/libpthread.so.0[0x2aceed0c8143]
/lib64/libc.so.6(__clone+0x6d)[0x2aceed84ebed]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x179cbf0 = SELECT ID, COMPANYID, OBJECTID, TYPE, INTERNALID, COLLABORATIONID, CONFERENCINGID, ALIVE, LASTUPDATE, VERSION FROM MAPPING WHERE ((((ID = '?') AND (COMPANYID = '?')) AND (OBJECTID = '')) AND (TYPE = 2))
thd->thread_id=22383
thd->killed=NOT_KILLED

How to repeat:
Not repeatable at will for the moment.

Suggested fix:
Remove the bug! :)

To begin with, please send the EXPLAIN output for the crashing query (with the variables replaced by real values if needed), along with the SHOW CREATE TABLE and SHOW TABLE STATUS results for the MAPPING table.

Trace after running it through c++filt:

/opt/db/mysql/bin/mysqld(my_print_stacktrace+0x33)[0x973633]
/opt/db/mysql/bin/mysqld(handle_segfault+0x324)[0x62e9b4]
/lib64/libpthread.so.0[0x2aceed0cfc00]
/opt/db/mysql/bin/mysqld(NdbDictionary::Column::getColumnNo() const+0x0)[0x953750]
/opt/db/mysql/bin/mysqld(NdbDictInterface::create_index_obj_from_table(NdbIndexImpl**, NdbTableImpl*, NdbTableImpl const*)+0x203)[0x9251d3]
/opt/db/mysql/bin/mysqld(InitIndex::init(NdbDictionaryImpl*, NdbTableImpl&) const+0x2d)[0x92b3cd]
/opt/db/mysql/bin/mysqld(NdbDictionaryImpl::fetchGlobalTableImplRef(GlobalCacheInitObject const&)+0x10a)[0x926fca]
/opt/db/mysql/bin/mysqld(NdbDictionary::Dictionary::getIndexGlobal(char const*, NdbDictionary::Table const&) const+0x8e)[0x95682e]
/opt/db/mysql/bin/mysqld(ha_ndbcluster::add_index_handle(THD*, NdbDictionary::Dictionary*, st_key*, char const*, unsigned int)+0xde)[0x7b567e]
/opt/db/mysql/bin/mysqld(ha_ndbcluster::open_indexes(THD*, Ndb*, st_table*, bool)+0x8b)[0x7b5b5b]
/opt/db/mysql/bin/mysqld(ha_ndbcluster::get_metadata(THD*, char const*)+0x223)[0x7b5e83]
/opt/db/mysql/bin/mysqld(ha_ndbcluster::open(char const*, int, unsigned int)+0x2ec)[0x7cad6c]
/opt/db/mysql/bin/mysqld(handler::ha_open(st_table*, char const*, int, int)+0x3f)[0x71ea7f]
/opt/db/mysql/bin/mysqld(open_table_from_share(THD*, st_table_share*, char const*, unsigned int, unsigned int, unsigned int, st_table*, open_table_mode)+0x2f6)[0x686e76]
/opt/db/mysql/bin/mysqld[0x6819dc]
/opt/db/mysql/bin/mysqld(open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int)+0x6cb)[0x683a3b]
/opt/db/mysql/bin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int)+0x1d9)[0x684039]
/opt/db/mysql/bin/mysqld(open_and_lock_tables_derived(THD*, TABLE_LIST*, bool)+0x2c)[0x68484c]
/opt/db/mysql/bin/mysqld[0x639fbf]
/opt/db/mysql/bin/mysqld(mysql_execute_command(THD*)+0x1f1e)[0x6406be]
/opt/db/mysql/bin/mysqld(mysql_parse(THD*, char const*, unsigned int, char const**)+0x171)[0x646b81]
/opt/db/mysql/bin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x57b)[0x6471db]
/opt/db/mysql/bin/mysqld(do_command(THD*)+0xde)[0x64848e]
/opt/db/mysql/bin/mysqld(handle_one_connection+0x1f0)[0x638250]
/lib64/libpthread.so.0[0x2aceed0c8143]
/lib64/libc.so.6(__clone+0x6d)[0x2aceed84ebed]
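
For anyone without c++filt at hand, the same demangling can be sketched in a few lines of C++ using abi::__cxa_demangle from <cxxabi.h> (available with GCC and Clang). The helper name demangle is made up for this illustration; it is not part of MySQL or of the trace above.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: demangle one Itanium-ABI symbol name.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer that the caller must free.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

Feeding it the symbol from the crashing frame, _ZNK13NdbDictionary6Column11getColumnNoEv, should give the same NdbDictionary::Column::getColumnNo() const that c++filt printed.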

So the crash comes down to:

  mysqld(my_print_stacktrace+0x33)[0x973633]
  mysqld(handle_segfault+0x324)[0x62e9b4]
  /lib64/libpthread.so.0[0x2aceed0cfc00] 
  mysqld(NdbDictionary::Column::getColumnNo() const+0x0)[0x953750]
  ...

with the last mysqld function called being:

  int 
  NdbDictionary::Column::getColumnNo() const {
    return m_impl.m_column_no; 
  }

No idea yet how that could possibly cause a segfault ...

That getColumnNo function compiles to just three instructions:

:000953750                 mov     rax, [rdi] <---- crash
:000953753                 mov     eax, [rax+30h]
:000953756                 retn

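So we can assume the Column object itself was invalid or null: the faulting instruction loads m_impl through rdi, which holds the this pointer on x86-64 Linux.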
Now there's a discrepancy in the bug report. The original crashing query refers to
a column `ID`, which doesn't exist in the given table. Was it masked or changed?
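
One way to read the three-instruction disassembly, as a rough sketch: rdi carries this (System V AMD64 calling convention), the first load fetches the m_impl reference stored inside the Column object, and the second load reads m_column_no through it. A fault on the very first load therefore points at a bad Column pointer in the caller. The types below are simplified stand-ins, not the real NDB API classes, and columnNoOrMinusOne is a hypothetical guard, not an existing function.

```cpp
#include <cassert>

// Simplified stand-in for the implementation class (assumption: the
// real class has many more members; only m_column_no matters here).
struct ColumnImpl {
    int m_column_no;
};

// Simplified stand-in for the facade class holding a reference to its impl.
class Column {
public:
    explicit Column(ColumnImpl& impl) : m_impl(impl) {}
    // Same body as the getter quoted above: read m_impl through `this`,
    // then read m_column_no through m_impl.
    int getColumnNo() const { return m_impl.m_column_no; }
private:
    ColumnImpl& m_impl;
};

// Hypothetical caller-side guard: returns -1 for a null pointer
// instead of dereferencing it.
int columnNoOrMinusOne(const Column* col) {
    if (col == nullptr) {
        return -1;
    }
    return col->getColumnNo();
}
```

A null check like this only catches a null pointer. A dangling or otherwise stale Column pointer would still crash in the same place, so the interesting question remains why create_index_obj_from_table ended up with such a pointer.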

Hmmm, interesting.

What comes to mind when I read your comments is that we usually create and modify the database through the one SQL node that did NOT crash!

So it could be that the other 3 SQL nodes did not receive some information about table structure changes that were applied on the first node.

When we run the load test, it runs on all 4 nodes.

Thank you for the feedback.

I cannot repeat the described behavior with test data. Please try to create a repeatable test case or at least provide a core file.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".