Bug #19885 | memory corruption with ndb binlog in conjunction with online alter table | ||
---|---|---|---|
Submitted: | 17 May 2006 14:08 | Modified: | 19 May 2006 23:39 |
Reporter: | Kristian Nielsen | | |
Status: | Closed | | |
Category: | MySQL Cluster: Replication | Severity: | S1 (Critical) |
Version: | 5.1.11 | OS: | Linux (Linux/x86 (32 bit)) |
Assigned to: | Tomas Ulin | CPU Architecture: | Any |
[17 May 2006 14:08]
Kristian Nielsen
[17 May 2006 14:24]
Kristian Nielsen
With a --debug run it crashes earlier, when trying to dump the table:

    #0 0xb7f429d3 in pthread_kill () from /lib/tls/libpthread.so.0
    #1 0x083fbf4a in write_core (sig=11) at stacktrace.c:220
    #2 0x0824223d in handle_segfault (sig=11) at mysqld.cc:2151
    #3 <signal handler called>
    #4 0x08392a03 in dbug_print_table (info=0x8816d98 "table", table=0x8d69a98) at ha_ndbcluster_binlog.cc:169
    #5 0x08399bbe in ndb_binlog_thread_handle_data_event (ndb=0x8d04c10, pOp=0x8cc5050, row=@0xb79ce2cc, trans=@0xb79ce36c) at ha_ndbcluster_binlog.cc:2884
    #6 0x0839cd64 in ndb_binlog_thread_func (arg=0x0) at ha_ndbcluster_binlog.cc:3516
    #7 0xb7f3fced in start_thread () from /lib/tls/libpthread.so.0
    #8 0xb7e69dee in clone () from /lib/tls/libc.so.6
    (gdb) frame 5
    #5 0x08399bbe in ndb_binlog_thread_handle_data_event (ndb=0x8d04c10, pOp=0x8cc5050, row=@0xb79ce2cc, trans=@0xb79ce36c) at ha_ndbcluster_binlog.cc:2884
    (gdb) p i
    $6 = 4
    (gdb) p *f
    $7 = {_vptr.Field = 0x803ff199, ptr = 0x0, null_ptr = 0x1e80301 <Address 0x1e80301 out of bounds>,
      table = 0x10001, orig_table = 0x100, table_name = 0x100, field_name = 0x0,
      comment = { str = 0x21020200 <Address 0x21020200 out of bounds>, length = 1963003610},
      query_id = 1245845979797979175, add_index = false, key_start = { map = 17515},
      part_of_key = {map = 0}, part_of_sortkey = {map = 0}, unireg_check = NONE,
      field_length = 7, field_index = 4, flags = 128, fieldnr = 5, null_bit = 8 '\b'}

This time, it is field 4 that is (severely) corrupted.
[18 May 2006 7:24]
Kristian Nielsen
This Valgrind report may be relevant; it seems master1 is writing into free()'ed memory:

    ==15146== Invalid write of size 1
    ==15146==    at 0x866E9A4: NdbRecAttr::receive_data(unsigned const*, unsigned) (NdbRecAttr.cpp:131)
    ==15146==    by 0x8688A14: NdbEventOperationImpl::receive_data(NdbRecAttr*, unsigned const*, unsigned) (NdbEventOperationImpl.hpp:522)
    ==15146==    by 0x868298F: NdbEventOperationImpl::receive_event() (NdbEventOperationImpl.cpp:758)
    ==15146==    by 0x86839D1: NdbEventBuffer::nextEvent() (NdbEventOperationImpl.cpp:1144)
    ==15146==    by 0x8631727: Ndb::nextEvent() (Ndb.cpp:1329)
    ==15146==    by 0x839CE2E: ndb_binlog_thread_func (ha_ndbcluster_binlog.cc:3458)
    ==15146==    by 0x404BC36: pthread_start_thread (manager.c:310)
    ==15146==    by 0x41C32B9: clone (clone.S:119)
    ==15146==  Address 0x472F78A is 42 bytes inside a block of size 996 free'd
    ==15146==    at 0x401CF37: free (vg_replace_malloc.c:235)
    ==15146==    by 0x86DEE2E: _myfree (safemalloc.c:314)
    ==15146==    by 0x86DE187: free_root (my_alloc.c:347)
    ==15146==    by 0x829D070: closefrm(st_table*, bool) (table.cc:1604)
    ==15146==    by 0x839322D: ndbcluster_binlog_close_table(THD*, st_ndbcluster_share*) (ha_ndbcluster_binlog.cc:259)
    ==15146==    by 0x8395C30: ndb_handle_schema_change(THD*, Ndb*, NdbEventOperation*, st_ndbcluster_share*) (ha_ndbcluster_binlog.cc:1539)
    ==15146==    by 0x8399F57: ndb_binlog_thread_handle_non_data_event(THD*, Ndb*, NdbEventOperation*, Binlog_index_row&) (ha_ndbcluster_binlog.cc:2884)
    ==15146==    by 0x839D77B: ndb_binlog_thread_func (ha_ndbcluster_binlog.cc:3612)
    ==15146==    by 0x404BC36: pthread_start_thread (manager.c:310)
    ==15146==    by 0x41C32B9: clone (clone.S:119)
[18 May 2006 7:35]
Kristian Nielsen
The write into free()'ed memory happens when nextEvent() copies data into NdbRecAttr::theValue. Note that the memory is freed by closefrm() calling free_root(). It seems that NdbRecAttr::theValue is allocated on the wrong mem_root, so it is free()'ed while the pointer is still in use.
[18 May 2006 8:40]
Kristian Nielsen
I tried commenting out the free_root() call in closefrm(), and the crash goes away (though of course this introduces a memory leak). So the bug clearly seems to be that NdbRecAttr::theValue is allocated on the table's mem_root, and when the table is closed with closefrm(), the NdbRecAttr::theValue pointers are still in use. The fix must be either to invalidate the stale theValue pointers before closefrm(), or to allocate and free them through some mechanism other than table->mem_root.
[18 May 2006 9:18]
Jonas Oreland
testing again...
[19 May 2006 13:43]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/6634
[19 May 2006 21:31]
Tomas Ulin
reviewed by martin
[19 May 2006 21:32]
Tomas Ulin
pushed to 5.1.11
[19 May 2006 23:39]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bugfix, yourself. More information about accessing the source trees is available at http://www.mysql.com/doc/en/Installing_source_tree.html Additional info: Documented fix in 5.1.11 changelog.