Bug #65979 | All SQL Nodes SIGSEGV when writing dupe data to tables where PK spans all cols | ||
---|---|---|---|
Submitted: | 22 Jul 2012 12:12 | Modified: | 5 Sep 2014 9:55 |
Reporter: | Jay Ward | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
Version: | 5.5.25a-ndb-7.2.7-gpl-log | OS: | Linux (CentOS release 6.3 (Final) 2.6.32-279.1.1.el6.x86_64) |
Assigned to: | CPU Architecture: | Any | |
Tags: | binary log, ndb_binlog_index, next_filename, next_position, SIGSEGV |
[22 Jul 2012 12:12]
Jay Ward
[22 Jul 2012 12:22]
Jay Ward
ndb_error_reporter data uploaded to ftp.oracle.com/support/incoming as bug-data-65979.bz2
[22 Jul 2012 13:02]
Jay Ward
As a work around, you can drop the next_file and next_position fields from the mysql.ndb_binlog_index table.
[24 Jul 2012 20:42]
Sveta Smirnova
Thank you for the report. I can not repeat described behavior with current version 7.2.7. Please try this version and inform us if problem still exists.
[3 Aug 2012 15:10]
Jay Ward
Behavior persists after moving to 5.5.25a-ndb-7.2.7-gpl-log: key_buffer_size=16777216 read_buffer_size=262144 max_used_connections=201 max_threads=200 thread_count=3 connection_count=3 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 172277 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0xeac3af0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 41c3e0f8 thread_stack 0x40000 /usr/local/mysql/bin//mysqld(my_print_stacktrace+0x35)[0x81b5b5] /usr/local/mysql/bin//mysqld(handle_fatal_signal+0x403)[0x6e8413] /lib64/libpthread.so.0[0x32f5a0ebe0] /lib64/libc.so.6(strlen+0x30)[0x32f5278630] /usr/local/mysql/bin//mysqld[0x84c897] /usr/local/mysql/bin//mysqld(ndb_binlog_thread_func+0x1af4)[0x854564] /lib64/libpthread.so.0[0x32f5a0677d] /lib64/libc.so.6(clone+0x6d)[0x32f52d33ed] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (0): is an invalid pointer Connection ID (thread ID): 1 Status: NOT_KILLED Here are the binlog specific params in the my.cnf file: ndb_report_thresh_binlog_mem_usage=20 sync-binlog=1 binlog-stmt-cache-size=2G binlog-cache-size=2G binlog-row-event-max-size=5242880 default-storage-engine=ndbcluster server-id=1093 log-bin=binlog binlog_format=MIXED Can you point me in the direction of where first->next_master_log_file is assigned so I can find out what's causing it to be null at that point?
[3 Aug 2012 15:58]
Jay Ward
I created a test sql node using RPMs for RedHat EL6 rather than using the binary tarball, and was not able to reproduce. So I will reinstall my nodes using RPMs. Setting this to "Not a Bug"
[17 Oct 2012 9:27]
Hendrik Woltersdorf
If get the same problem at Centos 5.* with version 5.5.27-ndb-7.2.8-gpl-log (64 bit). I tried to solve it by installing rpms, but that did not solve the problem.
[19 Oct 2012 19:28]
Hendrik Woltersdorf
I tested this again with 7.2.8 on Centos 6.3 with generic binaries and RPMs. Both versions crash when doing "replace into" a table with only pk columns.
[22 Oct 2012 8:29]
Hendrik Woltersdorf
For more tests I got the source code and patched it, to not crash at: ->store(first->next_master_log_file, (uint)strlen(first->next_master_log_file), &my_charset_bin); by not filling the columns next_file and next_position in ndb_binlog_index if first->next_master_log_file == NULL. This works. During these tests I saw, that in the crash case nothing gets written to the binary log. That makes sense, because nothing has changed eventually. But what makes no sense to me is, that a row gets written to ndb_binlog_index when nothing gets written to the binary log. That might explain, why input data is missing.
[22 Oct 2012 17:12]
Bernd Ocklin
I re-open the bug. Did you ever experience this in 7.1 (or older)?
[23 Oct 2012 5:16]
Hendrik Woltersdorf
No, I have not seen this kind of crash in 7.1.
[7 Nov 2012 9:07]
Magnus BlÄudd
Thanks for good bug report.
[28 Jan 2013 9:28]
Hendrik Woltersdorf
In one of my comments I mentioned a patch, to avoid the crashes. We use a server of version 7.2.8 with this patch in production now, and here it is: --- ha_ndbcluster_binlog.cc.old 2012-08-22 09:04:01.000000000 -0700 +++ ha_ndbcluster_binlog.cc 2012-11-25 08:52:54.000000000 -0800 @@ -3826,12 +3826,19 @@ if (ndb_binlog_index->s->fields > NBICOL_NEXT_POS) { /* Table has next log pos fields, fill them in */ + if(first->next_master_log_file == NULL) + { + sql_print_error("NDB Binlog: NULL in next_master_log_file (%llu, %u/%u), position %llu, file %s, ins/upd/del %u %u %u, schemaops %u",epoch,uint(epoch >> 32),uint(epoch),first->start_master_log_pos,first->start_master_log_file,row->n_inserts,row->n_updates,row->n_deletes,row->n_schemaops); + } + else + { ndb_binlog_index->field[NBICOL_NEXT_POS] ->store(first->next_master_log_pos, true); ndb_binlog_index->field[NBICOL_NEXT_FILE] ->store(first->next_master_log_file, (uint)strlen(first->next_master_log_file), &my_charset_bin); + } } row= row->next; }
[5 Sep 2014 9:55]
Santo Leto
Hi, We believe this bug has been fixed (indirectly) by fixing another bug If you are using 7.0, please upgrade to 7.0.38 or higher If you are using 7.1, please upgrade to 7.1.27 or higher If you are using 7.2, please upgrade to 7.2.11 or higher Then please test again. If you still see this problem after the upgrade, please reopen this bug. Setting status to "Can't Repeat" for now Many Thanks,