Bug #19459 BINLOG RBR command does not lock tables correctly causing crash for e.g. ndb
Submitted: 1 May 2006 18:19 Modified: 26 Oct 2006 3:07
Reporter: Jonathan Miller Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S2 (Serious)
Version:5.1.10 OS:Linux (Linux 32 Bit OS)
Assigned to: Mats Kindahl CPU Architecture:Any

[1 May 2006 18:19] Jonathan Miller
Description:
./mysql-test-run.pl --do-test=rpl_row_mysqlbinlog --mysqld=--default-storage-engine=ndb --force --with-ndbcluster-all --valgrind

NOTE: This is on fresh clone.

Stack Trace:
0x86a67aa _my_thread_var + 20
0x83e2a22 _ZN13ha_ndbcluster9write_rowEPc + 476
0x832be05 _ZN7handler12ha_write_rowEPc + 29
0x82ee1f5 _Z14replace_recordP3THDP8st_table + 739
0x82ee261 _ZN20Write_rows_log_event11do_exec_rowEP8st_table + 55
0x82eb3aa _ZN14Rows_log_event10exec_eventEP17st_relay_log_info + 1376
0x83c0649 _Z29mysql_client_binlog_statementP3THD + 741
0x825dd0c _Z21mysql_execute_commandP3THD + 34494
0x825e074 _Z11mysql_parseP3THDPcj + 584
0x825ea6e _Z16dispatch_command19enum_server_commandP3THDPcj + 2148
0x825fdcd _Z10do_commandP3THD + 535
0x82601c7 handle_one_connection + 787
0xa25b80 (?)
0x97d9ce (?)

Valgrind:

060501 21:06:32 [Note] NDB Binlog: CREATE TABLE Event: REPL$test/t1
060501 21:06:32 [Note] NDB Binlog: logging ./test/t1
060501 21:06:33 [Note] NDB Binlog: CREATE TABLE Event: REPL$test/t2
060501 21:06:33 [Note] NDB Binlog: logging ./test/t2
060501 21:06:34 [Note] NDB Binlog: CREATE TABLE Event: REPL$test/t3
060501 21:06:34 [Note] NDB Binlog: logging ./test/t3
==5149==
==5149== Thread 20:
==5149== Invalid read of size 4
==5149==    at 0x859CF7E: NdbTransaction::getNdbOperation(NdbTableImpl const*, NdbOperation*) (NdbTransaction.cpp:1074)
==5149==    by 0x859D20C: NdbTransaction::getNdbOperation(NdbDictionary::Table const*) (NdbTransaction.cpp:1119)
==5149==    by 0x83E2A21: ha_ndbcluster::write_row(char*) (ha_ndbcluster.cc:2542)
==5149==    by 0x832BE04: handler::ha_write_row(char*) (handler.cc:3300)
==5149==    by 0x82EE1F4: replace_record(THD*, st_table*) (log_event.cc:6252)
==5149==    by 0x82EE260: Write_rows_log_event::do_exec_row(st_table*) (log_event.cc:6338)
==5149==    by 0x82EB3A9: Rows_log_event::exec_event(st_relay_log_info*) (log_event.cc:5455)
==5149==    by 0x83C0648: mysql_client_binlog_statement(THD*) (sql_binlog.cc:109)
==5149==    by 0x825DD0B: mysql_execute_command(THD*) (sql_parse.cc:5109)
==5149==    by 0x825E073: mysql_parse(THD*, char*, unsigned) (sql_parse.cc:5887)
==5149==    by 0x825EA6D: dispatch_command(enum_server_command, THD*, char*, unsigned) (sql_parse.cc:1762)
==5149==    by 0x825FDCC: do_command(THD*) (sql_parse.cc:1548)
==5149==    by 0x82601C6: handle_one_connection (sql_parse.cc:1190)
==5149==    by 0xA25B7F: start_thread (in /lib/libpthread-2.3.5.so)
==5149==    by 0x97D9CD: clone (in /lib/libc-2.3.5.so)
==5149==  Address 0xC8 is not stack'd, malloc'd or (recently) free'd
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

How to repeat:
see above
[1 May 2006 18:20] Jonathan Miller
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=5
max_connections=100
threads_connected=5
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 39420 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x1f792988
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x21ae50ec, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x82389ce
0xa2b3e8
==5149==
==5149== Conditional jump or move depends on uninitialised value(s)
==5149==    at 0x8EF3FA: vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8ED986: buffered_vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8EDC36: vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8F690D: fprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x838B538: print_stacktrace (stacktrace.c:158)
==5149==    by 0x82389CD: handle_segfault (mysqld.cc:2118)
==5149==    by 0xA2B3E7: (within /lib/libpthread-2.3.5.so)
==5149==    by 0x859D20C: NdbTransaction::getNdbOperation(NdbDictionary::Table const*) (NdbTransaction.cpp:1119)
==5149==    by 0x83E2A21: ha_ndbcluster::write_row(char*) (ha_ndbcluster.cc:2542)
==5149==    by 0x832BE04: handler::ha_write_row(char*) (handler.cc:3300)
==5149==    by 0x82EE1F4: replace_record(THD*, st_table*) (log_event.cc:6252)
==5149==    by 0x82EE260: Write_rows_log_event::do_exec_row(st_table*) (log_event.cc:6338)
==5149==    by 0x82EB3A9: Rows_log_event::exec_event(st_relay_log_info*) (log_event.cc:5455)
==5149==    by 0x83C0648: mysql_client_binlog_statement(THD*) (sql_binlog.cc:109)
==5149==    by 0x825DD0B: mysql_execute_command(THD*) (sql_parse.cc:5109)
==5149==    by 0x825E073: mysql_parse(THD*, char*, unsigned) (sql_parse.cc:5887)
==5149==
==5149== Use of uninitialised value of size 4
==5149==    at 0x8ECB7D: _itoa_word (in /lib/libc-2.3.5.so)
==5149==    by 0x8EF48E: vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8ED986: buffered_vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8EDC36: vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8F690D: fprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x838B538: print_stacktrace (stacktrace.c:158)
==5149==    by 0x82389CD: handle_segfault (mysqld.cc:2118)
==5149==    by 0xA2B3E7: (within /lib/libpthread-2.3.5.so)
==5149==    by 0x859D20C: NdbTransaction::getNdbOperation(NdbDictionary::Table const*) (NdbTransaction.cpp:1119)
==5149==    by 0x83E2A21: ha_ndbcluster::write_row(char*) (ha_ndbcluster.cc:2542)
==5149==    by 0x832BE04: handler::ha_write_row(char*) (handler.cc:3300)
==5149==    by 0x82EE1F4: replace_record(THD*, st_table*) (log_event.cc:6252)
==5149==    by 0x82EE260: Write_rows_log_event::do_exec_row(st_table*) (log_event.cc:6338)
==5149==    by 0x82EB3A9: Rows_log_event::exec_event(st_relay_log_info*) (log_event.cc:5455)
==5149==    by 0x83C0648: mysql_client_binlog_statement(THD*) (sql_binlog.cc:109)
==5149==    by 0x825DD0B: mysql_execute_command(THD*) (sql_parse.cc:5109)
==5149==
==5149== Use of uninitialised value of size 4
==5149==    at 0x8ECB93: _itoa_word (in /lib/libc-2.3.5.so)
==5149==    by 0x8EF48E: vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8ED986: buffered_vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8EDC36: vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8F690D: fprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x838B538: print_stacktrace (stacktrace.c:158)
==5149==    by 0x82389CD: handle_segfault (mysqld.cc:2118)
==5149==    by 0xA2B3E7: (within /lib/libpthread-2.3.5.so)
==5149==    by 0x859D20C: NdbTransaction::getNdbOperation(NdbDictionary::Table const*) (NdbTransaction.cpp:1119)
[1 May 2006 18:21] Jonathan Miller
==5149==    by 0x83E2A21: ha_ndbcluster::write_row(char*) (ha_ndbcluster.cc:2542)
==5149==    by 0x832BE04: handler::ha_write_row(char*) (handler.cc:3300)
==5149==    by 0x82EE1F4: replace_record(THD*, st_table*) (log_event.cc:6252)
==5149==    by 0x82EE260: Write_rows_log_event::do_exec_row(st_table*) (log_event.cc:6338)
==5149==    by 0x82EB3A9: Rows_log_event::exec_event(st_relay_log_info*) (log_event.cc:5455)
==5149==    by 0x83C0648: mysql_client_binlog_statement(THD*) (sql_binlog.cc:109)
==5149==    by 0x825DD0B: mysql_execute_command(THD*) (sql_parse.cc:5109)
==5149==
==5149== Conditional jump or move depends on uninitialised value(s)
==5149==    at 0x8EF4E7: vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8ED986: buffered_vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8EDC36: vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8F690D: fprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x838B538: print_stacktrace (stacktrace.c:158)
==5149==    by 0x82389CD: handle_segfault (mysqld.cc:2118)
==5149==    by 0xA2B3E7: (within /lib/libpthread-2.3.5.so)
==5149==    by 0x859D20C: NdbTransaction::getNdbOperation(NdbDictionary::Table const*) (NdbTransaction.cpp:1119)
==5149==    by 0x83E2A21: ha_ndbcluster::write_row(char*) (ha_ndbcluster.cc:2542)
==5149==    by 0x832BE04: handler::ha_write_row(char*) (handler.cc:3300)
==5149==    by 0x82EE1F4: replace_record(THD*, st_table*) (log_event.cc:6252)
==5149==    by 0x82EE260: Write_rows_log_event::do_exec_row(st_table*) (log_event.cc:6338)
==5149==    by 0x82EB3A9: Rows_log_event::exec_event(st_relay_log_info*) (log_event.cc:5455)
==5149==    by 0x83C0648: mysql_client_binlog_statement(THD*) (sql_binlog.cc:109)
==5149==    by 0x825DD0B: mysql_execute_command(THD*) (sql_parse.cc:5109)
==5149==    by 0x825E073: mysql_parse(THD*, char*, unsigned) (sql_parse.cc:5887)
==5149==
==5149== Conditional jump or move depends on uninitialised value(s)
==5149==    at 0x8EF545: vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8ED986: buffered_vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8EDC36: vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8F690D: fprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x838B538: print_stacktrace (stacktrace.c:158)
==5149==    by 0x82389CD: handle_segfault (mysqld.cc:2118)
==5149==    by 0xA2B3E7: (within /lib/libpthread-2.3.5.so)
==5149==    by 0x859D20C: NdbTransaction::getNdbOperation(NdbDictionary::Table const*) (NdbTransaction.cpp:1119)
==5149==    by 0x83E2A21: ha_ndbcluster::write_row(char*) (ha_ndbcluster.cc:2542)
==5149==    by 0x832BE04: handler::ha_write_row(char*) (handler.cc:3300)
==5149==    by 0x82EE1F4: replace_record(THD*, st_table*) (log_event.cc:6252)
==5149==    by 0x82EE260: Write_rows_log_event::do_exec_row(st_table*) (log_event.cc:6338)
==5149==    by 0x82EB3A9: Rows_log_event::exec_event(st_relay_log_info*) (log_event.cc:5455)
==5149==    by 0x83C0648: mysql_client_binlog_statement(THD*) (sql_binlog.cc:109)
==5149==    by 0x825DD0B: mysql_execute_command(THD*) (sql_parse.cc:5109)
==5149==    by 0x825E073: mysql_parse(THD*, char*, unsigned) (sql_parse.cc:5887)
==5149==
==5149== Conditional jump or move depends on uninitialised value(s)
==5149==    at 0x8EF5AA: vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8ED986: buffered_vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8EDC36: vfprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x8F690D: fprintf (in /lib/libc-2.3.5.so)
==5149==    by 0x838B538: print_stacktrace (stacktrace.c:158)
==5149==    by 0x82389CD: handle_segfault (mysqld.cc:2118)
==5149==    by 0xA2B3E7: (within /lib/libpthread-2.3.5.so)
==5149==    by 0x859D20C: NdbTransaction::getNdbOperation(NdbDictionary::Table const*) (NdbTransaction.cpp:1119)
==5149==    by 0x83E2A21: ha_ndbcluster::write_row(char*) (ha_ndbcluster.cc:2542)
==5149==    by 0x832BE04: handler::ha_write_row(char*) (handler.cc:3300)
==5149==    by 0x82EE1F4: replace_record(THD*, st_table*) (log_event.cc:6252)
==5149==    by 0x82EE260: Write_rows_log_event::do_exec_row(st_table*) (log_event.cc:6338)
==5149==    by 0x82EB3A9: Rows_log_event::exec_event(st_relay_log_info*) (log_event.cc:5455)
==5149==    by 0x83C0648: mysql_client_binlog_statement(THD*) (sql_binlog.cc:109)
==5149==    by 0x825DD0B: mysql_execute_command(THD*) (sql_parse.cc:5109)
==5149==    by 0x825E073: mysql_parse(THD*, char*, unsigned) (sql_parse.cc:5887)
[1 May 2006 18:21] Jonathan Miller
0x86a67aa
0x83e2a22
0x832be05
0x82ee1f5
0x82ee261
0x82eb3aa
0x83c0649
0x825dd0c
0x825e074
0x825ea6e
0x825fdcd
0x82601c7
0xa25b80
0x97d9ce
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x1f7bb4f8  is invalid pointer
thd->thread_id=7
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
==5149==
==5149== ERROR SUMMARY: 222 errors from 13 contexts (suppressed: 21 from 1)
==5149== malloc/free: in use at exit: 24184040 bytes in 43913 blocks.
==5149== malloc/free: 84498 allocs, 40585 frees, 1184392048 bytes allocated.
==5149== For counts of detected errors, rerun with: -v
==5149== searching for pointers to 43913 not-freed blocks.
==5149== checked 100962556 bytes.
==5149==
[5 May 2006 19:58] Tomas Ulin
shortened .sql file to reproduce the bug

Attachment: m.sql (text/x-sql), 5.06 KiB.

[5 May 2006 20:12] Tomas Ulin
e.g. run

./mysql-test-run.pl --debug --skip-master-binlog --start-and-exit alias

mysql < m.sql    (using attached file)

viewing the trace... for the second binlog row statement on table id 19

T@163851: | | | >Rows_log_event::exec_event(st_relay_log_info*)
T@163851: | | | | >mysql_init_query
T@163851: | | | | | >lex_start
T@163851: | | | | | | >alloc_root
T@163851: | | | | | | | enter: root: 0x92d6250
T@163851: | | | | | | | exit: ptr: 0x9351d50
T@163851: | | | | | | <alloc_root
T@163851: | | | | | <lex_start
T@163851: | | | | | >mysql_reset_thd_for_next_command
T@163851: | | | | | <mysql_reset_thd_for_next_command
T@163851: | | | | <mysql_init_query
T@163851: | | | | >lock_tables
T@163851: | | | | <lock_tables

I.e. tables are not locked.... seems myisam can do alright without correct locking...   but e.g. ndb cannot

also if I understand correctly the handler+table object will be returned to the cache after the previous unlock_table and close _thread table...  hence the table mapping will be pointing to a table object not owned by the thread...
[2 Oct 2006 13:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/12946

ChangeSet@1.2212, 2006-10-02 15:05:05+02:00, mats@romeo.(none) +5 -0
  BUG#19459 (BINLOG RBR command does not lock tables correctly causing
  crash for, e.g., NDB):
  Submitting patch to base64_decode() adding extra parameter.
[2 Oct 2006 13:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/12947

ChangeSet@1.2213, 2006-10-02 15:08:40+02:00, mats@romeo.(none) +2 -0
  BUG#19459 (BINLOG RBR command does not lock tables correctly causing
  crash for, e.g., NDB):
  Adding new function my_b_copy_to_file() to copy an IO_CACHE to a file.
[2 Oct 2006 13:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/12949

ChangeSet@1.2214, 2006-10-02 15:24:47+02:00, mats@romeo.(none) +4 -0
  BUG#19459 (BINLOG RBR command does not lock tables correctly causing
  crash for, e.g., NDB):
  
  Changing row-based functions to print one BINLOG statement for all
  events making up a statement.
  
  Changing BINLOG statement to be able to read the emitted format.
[6 Oct 2006 8:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13221

ChangeSet@1.2214, 2006-10-06 10:17:02+02:00, mats@romeo.(none) +5 -0
  BUG#19459 (BINLOG RBR command does not lock tables correctly causing
  crash for, e.g., NDB):
  
  Before, mysqlbinlog printed table map events as a separate statement, so
  when executing the event, the opened table was subsequently closed
  when the statement ended. Instead, the row-based events that make up
  a statement are now printed as *one* BINLOG statement, which means
  that the table maps and the following *_rows_log_event events are
  executed fully before the statement ends.
  
  Changing implementation of BINLOG statement to be able to read the 
  emitted format, which now consists of several chunks of BASE64-encoded
  data.
[11 Oct 2006 10:48] Lars Thalmann
Pushed into 5.1.12
[26 Oct 2006 3:07] Paul DuBois
Noted in 5.1.12 changelog.