Bug #54901 assert during recovery when binlog enabled
Submitted: 29 Jun 2010 18:37 Modified: 26 Oct 2010 23:19
Reporter: Inaam Rana Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:mysql-trunk-innodb r3125 OS:Any
Assigned to: Sunny Bains CPU Architecture:Any
Triage: Triaged: D1 (Critical) / R2 (Low) / E3 (Medium)

[29 Jun 2010 18:37] Inaam Rana
Description:
When recovering from a kill -9 after moderate ibtest workload the following assertion is triggered:

100629 14:06:04 [ERROR] Invalid (old?) table or database name '-@'
InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins
InnoDB: Compressed tables use zlib 1.2.3
100629 14:06:04  InnoDB: Using Linux native AIO
100629 14:06:04  InnoDB: highest supported file format is Barracuda.
InnoDB: Log scan progressed past the checkpoint lsn 316499434
100629 14:06:04  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 321741824
InnoDB: Doing recovery: scanned up to log sequence number 326984704
InnoDB: Doing recovery: scanned up to log sequence number 332227584
InnoDB: Doing recovery: scanned up to log sequence number 337470464
InnoDB: Doing recovery: scanned up to log sequence number 342439082
InnoDB: Transaction 0 was in the XA prepared state.
InnoDB: Transaction 0 was in the XA prepared state.
100629 14:06:05  InnoDB: Assertion failure in thread 3074672848 in file /home/inaam/w/plugin-1.1/storage/innobase/trx/trx0trx.c line 397
InnoDB: Failing assertion: trx->id > trx2->id
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html
InnoDB: about forcing recovery.

The issue of note is that it says Transaction 0 is in XA prepared state. This trx number is definitely bogus. It subsequently chokes when we are trying to insert the trx in trx_sys->trx_list.

Note that sometimes when a trx is not caught in the XA prepared state the recovery goes fine but we see the following message:
InnoDB: Rolling back of trx id 0 completed
100629 14:04:58  InnoDB: Rollback of non-prepared transactions completed

Again trx id 0 appears bogus.

How to repeat:
enable UNIV_DEBUG
enable binlogging

run a workload (a few ibtests in parallel) for a minute or so.

kill -9

restart the server

If Michael can verify it and narrow it down to an easily repeatable test case that would be great.

Suggested fix:
From an initial code inspection it looks related to rseg/undo slot stuff:

trx0trx.c:trx_lists_init_at_db_start()

 440         while (rseg != NULL) {
 441                 undo = UT_LIST_GET_FIRST(rseg->insert_undo_list);
 442 
 443                 while (undo != NULL) {
 444 
 445                         trx = trx_create(trx_dummy_sess);
 446 
 447                         trx->is_recovered = TRUE;
 448                         trx->id = undo->trx_id;
 449                         trx->xid = undo->xid;
 450                         trx->insert_undo = undo;
 451                         trx->rseg = rseg;
 452 
 453                         if (undo->state != TRX_UNDO_ACTIVE) {
 454 
 455                                 /* Prepared transactions are left in
 456                                 the prepared state waiting for a
 457                                 commit or abort decision from MySQL */
 458 
 459                                 if (undo->state == TRX_UNDO_PREPARED) {
 460 
 461                                         fprintf(stderr,
 462                                                 "InnoDB: Transaction "
 463                                                 TRX_ID_FMT
 464                                                 " was in the"
 465                                                 " XA prepared state.\n",
 466                                                 (ullint) trx->id);

somehow rseg->trx_insert_undo_list has an undo slot with undo->trx_id = 0.

Since Sunny is our resident expert on rseg stuff I have assigned this to him. Meanwhile, before he comes back from vacation, I'll also try to look into it.
[8 Jul 2010 5:56] Sunny Bains
Inaam,

I can't reproduce this bug on my desktop. Can you post your my.cnf file here ?

Regards,
-sunny
[14 Jul 2010 1:32] Sunny Bains
I can't reproduce this bug.
[14 Jul 2010 17:19] Omer Barnir
From a discussion with Davi on #engineering - is not a compiler bug...

[09:51]	...
	<davi>	omer, have they read the description for the pure attribute? 
                that is bound to not work. it's not a gcc bug.
        ...
[09:56]	davi: thanks - not sure what they have done - will add it to the bug
[09:58]	<davi>	omer, the pure attribute should be use only when the function 
                has no effect other than the return value. if there is a 
                output value in one of the function parameters, it's the
               return value is no the only effect.
[09:59]	"it's the return value is no the only effect" -> "the return value 
        is not the only effect"
	<omer>	davi: I see - thanks for the explanation
[4 Aug 2010 9:00] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804081533-c1d3rbipo9e8rt1s) (version source revid:alik@sun.com-20100731075120-qz9z8c25zum2wgmm) (merge vers: 5.6.99-m4) (pib:20)
[4 Aug 2010 9:05] Bugs System
Pushed into mysql-next-mr (revid:alik@ibmvm-20100804081630-ntapn8bf9pko9vj3) (version source revid:alik@sun.com-20100731075120-qz9z8c25zum2wgmm) (pib:20)
[26 Oct 2010 23:19] John Russell
Was added to change log some time ago, but I forgot to close the bug:

With binary logging enabled, InnoDB could halt during crash recovery
with a message referring to a transaction ID of 0.