MySQL Bugs: #32709: Assertion failed: trx

Bug #32709	Assertion failed: trx_data->empty(), file log.cc
Submitted:	26 Nov 2007 9:34	Modified:	18 Oct 2008 12:37
Reporter:	Shane Bester (Platinum Quality Contributor)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S1 (Critical)
Version:	5.1.24-debug,6.0.5-debug	OS:	Windows
Assigned to:	Mats Kindahl	CPU Architecture:	Any

Description:
during some long running tests i found this crash:

071126 11:23:37 [ERROR] mysqld-debug: Out of memory at line 3134, '.\sql_class.cc'
071126 11:23:37 [ERROR] mysqld-debug: needed 2779103866 byte (2713969k), memory in use: 26766872 bytes (26140k)
071126 11:23:38 [ERROR] mysqld-debug: Out of memory at line 3134, '.\sql_class.cc'
071126 11:23:38 [ERROR] mysqld-debug: needed 2779103866 byte (2713969k), memory in use: 26766872 bytes (26140k)
Assertion failed: trx_data->empty(), file .\log.cc, line 1293

mysqld-debug.exe!_NMSG_WRITE
mysqld-debug.exe!abort
mysqld-debug.exe!_assert
mysqld-debug.exe!binlog_close_connection
mysqld-debug.exe!closecon_handlerton
mysqld-debug.exe!plugin_foreach_with_mask
mysqld-debug.exe!ha_close_connection
mysqld-debug.exe!THD::~THD
mysqld-debug.exe!THD::`scalar deleting destructor'
mysqld-debug.exe!unlink_thd
mysqld-debug.exe!one_thread_per_connection_end
mysqld-debug.exe!handle_one_connection
mysqld-debug.exe!pthread_start
mysqld-debug.exe!_threadstart

How to repeat:
Repeatable testcase is unknown at this point. Falcon tables were involved.
Server was started with clean datadir like:

mysqld-debug --console --skip-grant-tables --skip-name-resolve --log-bin --log --log-warn=2

Suggested fix:
I don't think the bug is the assertion.  Instead, it's the huge 2.7G memory allocation just before it that should be a cause for concern.

full stack trace and server error log fyi

Attachment: bug32709_outputs.txt (text/plain), 6.05 KiB.

the failing memory allocation is this:

bool Discrete_intervals_list::append(ulonglong start, ulonglong val,
ulonglong incr)
{
...
Discrete_interval *new_interval= new Discrete_interval(start, val, incr);
if (unlikely(new_interval == NULL)) // out of memory
   DBUG_RETURN(1);
...
}

Test case for bug 32709

Attachment: bug32709.test (application/octet-stream, text), 1.98 KiB.

I was able to reproduce the assertion with the attached test case. To run:

# Start master and slave
$ perl mysql-test-run.pl --start-and-exit --skip-ndb --mysqld=--log-bin rpl_alter

# Run test case
$  perl mysql-test-run.pl --extern --socket=var/tmp/master.sock --user=root bug32709

The test case will complete , however the server will crash. Some observations:

* --log-bin appears to be required
* Innodb is not affected
* I did not see the memory allocation error Shane observed

i encountered this assertion on 5.1.24-debug today.

Assertion failed: trx_data->empty(), file .\log.cc, line 1329
080516 15:52:18 - mysqld got exception 0x80000003 ;

mysqld-debug.exe!_NMSG_WRITE
mysqld-debug.exe!abort
mysqld-debug.exe!_wassert
mysqld-debug.exe!binlog_close_connection
mysqld-debug.exe!closecon_handlerton

The assertion indicates that there were data left in the transaction cache on server shutdown, which means that a transaction did not commit or rollback properly. In this case, the failure is probably caused by the the out-of-memory condition causing the transaction to rollback, with a rollback of a transactional engine not being handled correctly by the binary log.

An easier way to reproduce the error is probably (untested):

  CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
  INSERT INTO t1 VALUES (1),(2),(3);
  INSERT INTO t1 VALUES (1),(2),(3);  // Duplicate entry error
  INSERT INTO t1 VALUES (4);

which would cause the 4 to not be properly replicated to the slave.

See BUG#38262 for a test case that triggers this error, and which is the one that pointed me to the problem.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51701

2658 Mats Kindahl	2008-08-15
      Bug #32709: Assertion failed: trx_data->empty(), file log.cc
      
      The assertion indicates that some data was left in the transaction
      cache when the server was shut down, which means that a previous
      statement did not commit or rollback correctly.
      
      What happened was that a bug in the rollback of a transactional
      table caused the transaction cache to be emptied, but not reset.
      The error can be triggered by having a failing UPDATE or INSERT,
      on a transactional table, causing an implicit rollback.
      
      Fixed by always flushing the pending event to reset the state
      properly.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/52095

2658 Mats Kindahl	2008-08-20
      Bug #32709: Assertion failed: trx_data->empty(), file log.cc
      
      The assertion indicates that some data was left in the transaction
      cache when the server was shut down, which means that a previous
      statement did not commit or rollback correctly.
      
      What happened was that a bug in the rollback of a transactional
      table caused the transaction cache to be emptied, but not reset.
      The error can be triggered by having a failing UPDATE or INSERT,
      on a transactional table, causing an implicit rollback.
      
      Fixed by always flushing the pending event to reset the state
      properly.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53198

2679 Mats Kindahl	2008-09-03
      Bug #32709: Assertion failed: trx_data->empty(), file log.cc
      
      The assertion indicates that some data was left in the transaction
      cache when the server was shut down, which means that a previous
      statement did not commit or rollback correctly.
      
      What happened was that a bug in the rollback of a transactional
      table caused the transaction cache to be emptied, but not reset.
      The error can be triggered by having a failing UPDATE or INSERT,
      on a transactional table, causing an implicit rollback.
      
      Fixed by always flushing the pending event to reset the state
      properly.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53204

2680 Mats Kindahl	2008-09-03
      BUG#32709: Assertion failed: trx_data->empty(), file log.cc
      
      Incremental fixes: updating a comment and fixing a result file.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53228

2680 Mats Kindahl	2008-09-03
      BUG#32709: Assertion failed: trx_data->empty(), file log.cc
      
      Incremental fixes: updating a comment and fixing a result file.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/53230

2680 Mats Kindahl	2008-09-03
      BUG#32709: Assertion failed: trx_data->empty(), file log.cc
      
      Incremental fixes: updating a comment and fixing a result file.

Pushed into 5.1.29  (revid:mats@mysql.com-20080903205254-p8ij48ryqg9gzsjz) (version source revid:kgeorge@mysql.com-20080910094421-1i1kxv3n1bxskiqa) (pib:3)

Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/54543

2698 He Zhenxing	2008-09-26 [merge]
      Merge 5.1 main -> 5.1-rpl

Pushed into 5.1.29  (revid:mats@mysql.com-20080903205254-p8ij48ryqg9gzsjz) (version source revid:kgeorge@mysql.com-20080910094421-1i1kxv3n1bxskiqa) (pib:4)

Set status to NDI pending merge to 6.0.

Pushed into 6.0.8-alpha  (revid:mats@mysql.com-20080903205254-p8ij48ryqg9gzsjz) (version source revid:kpettersson@mysql.com-20080911114255-81pt7q1uvl1fkojq) (pib:5)

Documented bugfix in the 5.1.29 and 6.0.8 changelogs as follows:

        A statement did not always commit or roll back correctly when
        the server was shut down; the error could be triggered by having
        a failing UPDATE or INSERT on a transactional table, causing an implicit 
        rollback.

Closed.

Pushed into 5.1.29-ndb-6.2.17  (revid:mats@mysql.com-20080903205254-p8ij48ryqg9gzsjz) (version source revid:tomas.ulin@sun.com-20081028140209-u4emkk1xphi5tkfb) (pib:5)

Pushed into 5.1.29-ndb-6.3.19  (revid:mats@mysql.com-20080903205254-p8ij48ryqg9gzsjz) (version source revid:tomas.ulin@sun.com-20081028194045-0353yg8cvd2c7dd1) (pib:5)

Pushed into 5.1.29-ndb-6.4.0  (revid:mats@mysql.com-20080903205254-p8ij48ryqg9gzsjz) (version source revid:jonas@mysql.com-20081101082305-qx5a1bj0z7i8ueys) (pib:5)

I encountered the same error in MySQL 5.1.31-enterprise-gpl-advanced-log, bug#32709 only tells that the patch was push into 5.1.29_ndb_...,however my question is : when new patch will push into 5.1.3X GPL non-ndb version?

the error info is as follows on a slave of replication group:

Last_SQL_Error: Could not execute Delete_rows event on table 

myjob_wa.wa_sec_job_index_batch_group; Can't find record in 'wa_sec_job_index_batch_group', 

Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin-218.000036, 

end_log_pos 336238253

I am facing the same issue in Version: '5.1.34-log' as below:

 Slave SQL: Could not execute Delete_rows event on table emlplatform.Alarm; Can't find record in 'Alarm', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log emlplatform-bin.000001, end_log_pos 2198836, Error_code: 1032

It seems the fix is pushed in 5.1.29 and in 6.0 branch. Is the patch also present in 5.1.34-log?

@wu @sujoy your replication problems have nothing to do with this bug.
this bug is a crash in a debug server.  you should upgrade to 5.1.41 or newer, and if there's still a problem, then consider opening a new bug. many RBR bugs have been fixed already.

@Shane, my replication problem is the one reported as http://bugs.mysql.com/bug.php?id=37317 which got closed as duplicate of current ticket(id=32709) and got fixed in 5.1.29.

So, do you mean the patch isn't there for 5.1.34 and i need to upgrade to 5.1.41 to get this working?

I am encountering the exact same issue with 5.1.37 using row based replication on a linux system.  

Could not execute Delete_rows event on table cls.sessions; Can't find record in 'sessions', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000008, end_log_pos 827481.

If this fix was committed in 5.1.29, why am I still getting the error in 5.1.37?

We are also getting the same kind of replication error frequently.
Last_SQL_Error: Could not execute Delete_rows event on table 
myjob_wa.wa_sec_job_index_batch_group; Can't find record in
Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log

We thought the bug was solved in the recent MySQL community version 5.1.44
but it's not??

Please reopen this ticket, it's not fixed yet.

Bug 37317 relates to this one, and there are several folks who have noted their issues there.

I myself am also getting this on MySQL 5.1.41-community-log:

Could not execute Delete_rows event on table foo.bar; Can't find record in 'bar', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.004009, end_log_pos 760660166

the master of this machine has:

no triggers
READ-COMMITTED isolation level
MIXED binlog format
AUTOCOMMIT on

and the table itself is indeed InnoDB.

To people who get the following error message:

Could not execute Delete_rows event on table <DATABASE>.<TABLE>; Can't find record in '<TABLE>', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log <FILENAME>, end_log_pos <NUMBER>

When you see this error in the error log or in the output from SHOW SLAVE STATUS (the Last_SQL_Error column), there are many possible reasons. The present bug used to be one possible reason. The bug was fixed in 5.1.41, so it is not a possible reason any more.

==== What does the message mean? ====

Let's look at the message in detail. The error message means this:
- The master is using row-based replication, i.e., binlog_format=row or
binlog_format=mixed.
- The master has executed a statement that modifies or removes an existing
row from the table referred to by the error message. I.e., normally an
UPDATE or DELETE statement (or one of the following more unusual statements:
REPLACE, INSERT...ON DUPLICATE KEY UPDATE, or LOAD DATA INFILE...REPLACE).
- When the replication slave tried to re-execute the statement, it did not
find the row that it was going to modify or remove.

In other words, your slave is missing one or more rows that exists on master. So you need to figure out why the slave has gone out of sync. The possible reasons fall in these categories:
- a known bug (some possible candidates: BUG#50440, BUG#50439, BUG#49482,
BUG#49481, BUG#40045, BUG#39753, BUG#47312, BUG#51501, BUG#52868, or any
other bug where the slave goes out of sync);
- a user error (mistake in application logic or during manual intervention
with the database);
- a not yet reported bug.

==== How to act to find out the reason for the message ====

1. Figure out when and why your slave went out of sync. Try this:
1.1 Use the binlog positions in the message together with mysqlbinlog to
find out what row is missing on the slave.
1.2 Use your application-specific knowledge to figure out when this row
was last updated on master.
1.3 Use your application-specific knowledge to figure out if anything
unusual was going on when the record was last updated on the master, or
when it was replayed on the slave.

2. Now see what action to take next. There are four possibilities:
2.1 If you found out that there was a user error, remove the reason for the
error and try to repair your database.
2.2 Check if the preconditions for one or more of the bugs listed above
applies to you. For each candidate bug:
2.2.1 If the bug is closed: see if you can fix the problem by upgrading
to the version where it was fixed.
2.2.2 If the bug is open: see if there is a known workaround and wait
until we fix the bug.
2.3 If it is not an error in your application logic and you have checked
that it can not be any of the bugs above, file a new bug. As usual, be
specific about the reason and provide all details. In particular, your
table schemas and data may be useful, as well as the output from
"mysqlbinlog --hexdump" for relevant parts of the binary logs and relay
logs. If your report is not at least so specific that we can be sure
the reason is different from the bugs above, then it will be impossible
for us to fix the bug no matter how hard we try!