Bug #36839 Version-commented disable/enable keys cripples slave in rewrite-db mode
Submitted: 21 May 2008 4:09 Modified: 11 Aug 2008 19:25
Reporter: Alex Tomic Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.0.51a, 5.1.24 OS:Linux (CentOS)
Assigned to:
Tags: ALTER TABLE, memory corruption, replication

[21 May 2008 4:09] Alex Tomic
Description:
Environment:
CentOS 5 i386
5.0.22 master
5.0.51a and 5.1.24 slaves

When a slave is configured to rewrite the DB name, and an ALTER TABLE ... DISABLE/ENABLE KEYS statement is wrapped in a version-filtering comment (i.e. /*40000 ...*/), the thread executing the statement on the slave corrupts memory and the server becomes crippled, but does not necessarily crash entirely.

Statements like these cause the problem:
/*!41000 ALTER TABLE Rotation DISABLE KEYS */;
/*!40000 ALTER TABLE Rotation DISABLE KEYS */;
/*!50000 ALTER TABLE Rotation ENABLE KEYS */;

These seem ok:

/*!40000 ALTER TABLE Rotation ADD COLUMN blah2 INTEGER */;
/*!40000 INSERT INTO Rotation VALUES (102, 1,'1') */;
ALTER TABLE Rotation DISABLE KEYS;

So it would seem to be the combination of the version comment and the disable/enable keys. I haven't found any other statements that cause the problem, but i did not test extensively.

I built a debug version of mysql 5.0.51a and when I run one of the test cases above, I get a bunch of memory overrun messages in the error log :

Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'my_alloc.c:357'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'my_open.c:152'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'my_open.c:111'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'sql_base.cc:1905'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'my_open.c:152'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'my_alloc.c:201'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'table.cc:134'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'table.cc:1046'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'table.cc:1059'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'table.cc:1079'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'table.cc:1080'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'table.cc:312'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'table.cc:339'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'table.cc:1079'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'table.cc:1080'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'my_open.c:111'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'my_alloc.c:201'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'table.cc:873'
Error: Memory allocated at log_event.cc:1532 was overrun, discovered at 'my_open.c:152'

fyi,
log_event.cc:1532: Query_log_event::Query_log_event()
  if (!(start= data_buf = (Log_event::Byte*) my_malloc(catalog_len + 1 +
                                              time_zone_len + 1 +
                                              data_len + 1 +
                                              QUERY_CACHE_FLAGS_SIZE +
                                              db_len + 1,
                                              MYF(MY_WME))))

I tried attaching a debugger, but I'm not having a lot of luck tracking down the problem, because the behaviour is not consistent. This was also the case on our front-end production boxes -- some went down, others were "responsive" (could connect but queries would lock up). Sometimes when I try to attach a debugger and step through, I end up with an underrun message instead, like the below.

Error: Memory allocated at (null):262144 was underrun, discovered at my_alloc.c:357

How to repeat:
I am also attaching the following for reference:

- my.cnf files I used for master/slave
- a mysql trace I did from one of my reproductions
- original error log entries from production server and stack trace

This specific problem is not a huge issue as there is an obvious workaround, but it would be interesting to see if the bug that causes this crash affects anything else.

Suggested fix:
Unable to find cause for the bug.
[21 May 2008 4:12] Alex Tomic
Master my.cnf

Attachment: master-my.cnf (application/octet-stream, text), 1.29 KiB.

[21 May 2008 4:13] Alex Tomic
Slave my.cnf

Attachment: slave-my.cnf (application/octet-stream, text), 1.48 KiB.

[21 May 2008 4:15] Alex Tomic
Trace

Attachment: srv3_crash_1.log.gz (application/x-gzip, text), 37.91 KiB.

[21 May 2008 4:18] Alex Tomic
Stack trace from prod server:
0x81c44ed handle_segfault + 427
0xf8c402 (?)
0x12b451 (?)
0x15f1fb (?)
0x166f5d (?)
0x16a5b0 (?)
0x8481d8c my_no_flags_free + 100
0x824ec53 Query_log_event::exec_event(st_relay_log_info*, char const*, unsigned int) + 2137
0x824e3f8 Query_log_event::exec_event(st_relay_log_info*) + 24
0x82dcabc st_relay_log_info::cached_charset_compare(char*) + 1062
0x82de242 handle_slave_sql + 1270
0xcf82db (?)
0x1ce12e (?)
[11 Jul 2008 19:25] Sveta Smirnova
Thank you for the report.

I can not repeat described behavior with current version of master and test data. Please provide output of SHOW CREATE TABLE Rotation and master binary log with problem queires.
[11 Aug 2008 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".