Bug #10948 | Crash with ALTER/DROP table for NDB tables (HPUX and PPC32) | ||
---|---|---|---|
Submitted: | 30 May 2005 1:16 | Modified: | 13 Jun 2005 17:46 |
Reporter: | Stewart Smith | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
Version: | 4.1, 5.0, 5.1, 5.1-2325 tree | OS: | HP/UX (HPUX 32bit, PPC 32bit) |
Assigned to: | Stewart Smith | CPU Architecture: | Any |
[30 May 2005 1:16]
Stewart Smith
[30 May 2005 1:17]
Stewart Smith
Have observed multiple times. Currently rebuilding latest pull to attempt to reproduce there
[30 May 2005 7:51]
Stewart Smith
could reproduce crash at the first ALTER statement. Appears to be the code in ha_ndbcluster::rename_table is not checking for the (currently unsupported) case of blob/no primary key tables and creating the event anyway. When we get an event, we then crash. i think. I was unable to crash mysqld with this check added. Am currently testing and will produce a sane looking patch.
[31 May 2005 3:02]
Stewart Smith
This is possibly a duplicate of 9826 (which is a problem in the 5.0 tree)
[2 Jun 2005 16:16]
Stewart Smith
Appears to be because of heap corruption rather than a race condition.
[3 Jun 2005 7:37]
Stewart Smith
This was caused by heap corruption. Thanks to a bit of time, valgrind and thinking, i've found the problem. in storage/ndb/src/ndbapi/DictCache.cpp, we have: Ndb_local_table_info * Ndb_local_table_info::create(NdbTableImpl *table_impl, Uint32 sz) { Uint32 tot_size= sizeof(NdbTableImpl *) + ((sz+7) & ~7); // round to Uint64 void *data= malloc(tot_size); if (data == 0) return 0; memset(data, 0, tot_size); new (data) Ndb_local_table_info(table_impl); return (Ndb_local_table_info *) data; } This (is meant to) allocate enough memory for the Ndb_local_table_info plus an arbitrary size for local info. In our case, we have (from ha_ndbcluster.cc): struct Ndb_local_table_statistics { int no_uncommitted_rows_count; ulong last_count; ha_rows records; }; But the tot_size calculation (in ::create) is incorrect. It asks for sizeof(NdbTableImpl *) and not sizeof(Ndb_local_table_info). This is a difference of 8bytes (on 32bit machines) due to the definition of Ndb_local_table info: class Ndb_local_table_info { public: static Ndb_local_table_info *create(NdbTableImpl *table_impl, Uint32 sz=0); static void destroy(Ndb_local_table_info *); NdbTableImpl *m_table_impl; Uint64 m_local_data[1]; private: Ndb_local_table_info(NdbTableImpl *table_impl); ~Ndb_local_table_info(); }; However, since m_local_data is going to point to the local data (that's the extra sz allocated in ::create), you would think this would be okay - as the "real" size of Ndb_local_table_info is just sizeof(NdbTableImpl*). On PPC at least, you're wrong. This simple C program: #include <stdio.h> struct a { void *a; unsigned long long b[1]; }; int main() { printf("%u\n",sizeof(struct a)); } Will output 16 on PPC and 12 on x86. It appears that on PPC we are getting things aligned to 64bits (possibly due to the 64bit sized member, i'm not sure). See http://www-128.ibm.com/developerworks/power/library/pa-dalign/ for an explanation of PPC alignment issues. so a malloc(sizeof(void*)+extra_bits) is wrong. with a malloc(sizeof(struct a)+N) you wast sizeof(long long), but it's better than crashing or, you can do malloc(sizeof(struct a)-sizeof(long long)+N) The GCC flag -Wpadded would have generated a warning for this. It is possible that it could be a good idea to enable this warning. Patch to come.
[3 Jun 2005 9:08]
Stewart Smith
Problem also exists in 4.1 tree (and 5.0, 5.1). Quite possibly has been causing problems elsewhere (HPUX is notable, see BUG#9826).
[4 Jun 2005 8:14]
Stewart Smith
Martin has informed me that Sparc is not affected. I don't think any other platforms would have this problem.
[4 Jun 2005 8:25]
Stewart Smith
Reviewed by Martin and Tomas.
[7 Jun 2005 2:50]
Stewart Smith
pushed to 4.1 and 5.0.
[13 Jun 2005 17:46]
Jon Stephens
Please do not submit the same bug more than once. An existing bug report already describes this very problem. Even if you feel that your issue is somewhat different, the resolution is likely to be the same. Because of this, we hope you add your comments to the original bug instead. Thank you for your interest in MySQL. Additional info: Duplicate of Bug#9826 (fixed/documented/closed).