Bug #45672 Semisync repl: ActiveTranx:insert_tranx_node: transaction node allocation failed
Submitted: 23 Jun 13:01 Modified: 12 Nov 13:20
Reporter: Philip Stoev
Status: Closed
Category: Server: Replication Severity: S2 (Serious)
Version: 5.4 OS: Any
Assigned to: Zhenxing He Target Version: 5.4+
Triage: Triaged: D2 (Serious)

[23 Jun 13:01] Philip Stoev
Description:
When executing queries against a master that has the semisync master module installed,
queries return

ERROR 1180 (HY000): Got error 1 during COMMIT

and the error log says:

090623 13:53:46 [ERROR] ActiveTranx:insert_tranx_node: transaction node allocation failed
for: (master-bin.000001, 347083)
090623 13:53:46 [ERROR] Error writing file
'/build/bzr/azalea/mysql-test/var/log/master-bin' (errno: 0)

The problem is that the query is actually COMMITTED on the master, even though an error
message is returned. This is a major no-no.

How to repeat:
1. Run

MTR_VERSION=1 perl mysql-test-run.pl \
--start-and-exit \
--mysqld=--plugin-dir=/build/bzr/azalea/plugin/semisync/.libs \
--mysqld=--plugin-load=rpl_semi_sync_master=libsemisync_master.so \
--mysqld=--rpl_semi_sync_master_enabled=1 \
rpl_alter

This starts a master and a slave (both are loaded with the semisync master module
only). However, I do not think the slave matters in this case, since it is not
connected to the master via CHANGE MASTER.

2. Execute:

gentest-old.pl

This script is in the mysql-test/gentest directory of the mysql-test-extra-6.0 tree,
though I suspect that any other sufficiently large queries will cause the problem to
show up.

Suggested fix:
Apart from whatever the original cause of this bug is, transactions should never both
return an error *and* commit.
[28 Jun 14:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/77401

2804 He Zhenxing	2009-06-28
      BUG#45672 Semisync repl: ActiveTranx:insert_tranx_node: transaction node allocation failed
      BUG#45673 Semisynch reports correct operation even if no slave is connected
      
      When semi-sync was enabled on the master without any semi-sync slaves
      connected, the master would still consider semi-sync status to be ON and
      keep inserting tranx nodes, which eventually resulted in a tranx_node
      allocation error.
      
      This is fixed by not considering semi-sync master status as ON if
      no semi-sync slaves are connected.
     @ plugin/semisync/semisync_master.cc
        do not consider semi-sync master status as ON if no semi-sync slaves are connected
     @ plugin/semisync/semisync_slave_plugin.cc
        run slave in async mode if master disabled semi-sync
     @ sql/log.cc
        set error to 1 when flushing the binlog or running after_flush hooks fails
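
The idea described in this commit message can be sketched roughly as follows. This is a simplified illustration only, not the code from plugin/semisync/semisync_master.cc: the class SemiSyncMasterSketch and all of its members are hypothetical names, and the real plugin tracks binlog positions rather than a simple counter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of the master-side semi-sync state.
// None of these names come from the actual plugin source.
class SemiSyncMasterSketch {
 public:
  explicit SemiSyncMasterSketch(bool enabled) : enabled_(enabled) {}

  // Called when a semi-sync slave connects or disconnects.
  void add_slave() { ++slave_count_; }
  void remove_slave() {
    if (slave_count_ > 0) --slave_count_;
  }

  // Status counts as ON only if semi-sync is enabled AND at least one
  // semi-sync slave is connected.
  bool status_on() const { return enabled_ && slave_count_ > 0; }

  // Track a transaction node only while the status is ON. With no slave
  // connected nothing is recorded, so the set of pending nodes cannot
  // grow without bound.
  void on_commit() {
    if (status_on()) ++tracked_nodes_;
  }

  std::size_t tracked_nodes() const { return tracked_nodes_; }

 private:
  bool enabled_;
  std::size_t slave_count_ = 0;
  std::size_t tracked_nodes_ = 0;
};
```

In this sketch, a master with semi-sync enabled but no connected slaves records no transaction nodes at all, which corresponds to the situation in the report where nodes accumulated until allocation failed.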
[28 Jun 15:26] Philip Stoev
He Zhenxing, I am not sure that messages such as "please contact the developer" are
appropriate. I think that an ASSERT() is better in this situation.

Furthermore, can we fix the general issue where the COMMIT both returned an error and
committed the transaction so that this never happens regardless of the underlying issue
(or an ASSERT() is triggered)?
[28 Jun 15:47] Zhenxing He
Hi Philip,

I use assert() as well, but since assert() has no effect in release builds, I also
added the error message.

And the COMMIT issue is also fixed with this patch.
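
The invariant discussed above (a COMMIT must never both return an error and commit) can be illustrated with a minimal sketch. It is not the code from sql/log.cc: commit_sketch, CommitResult and the two callbacks are hypothetical names, and the only detail taken from the patch notes is that the error is set to 1 when flushing the binlog or running the after_flush hooks fails.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result type: the error code returned to the client and
// whether the transaction was actually committed.
struct CommitResult {
  int error;
  bool committed;
};

// Hypothetical commit path. Each callback returns true on success.
inline CommitResult commit_sketch(
    const std::function<bool()> &flush_binlog,
    const std::function<bool()> &after_flush_hook) {
  int error = 0;
  // Set error to 1 if the flush or the after_flush hook fails.
  if (!flush_binlog() || !after_flush_hook()) error = 1;

  // Commit only when no error was recorded.
  bool committed = (error == 0);

  // Debug builds abort if the invariant is violated; with NDEBUG defined
  // this line compiles to nothing, so the error code remains the only
  // signal in a release build.
  assert(!(error != 0 && committed));
  return {error, committed};
}
```

For example, calling commit_sketch with a hook that returns false yields error 1 and committed false, so the caller never sees an error for a transaction that was in fact committed.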
[2 Jul 12:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/77754

2804 He Zhenxing	2009-07-02
      BUG#45672 Semisync repl: ActiveTranx:insert_tranx_node: transaction node allocation failed
      BUG#45673 Semisynch reports correct operation even if no slave is connected
      
      When semi-sync was enabled on the master without any semi-sync slaves
      connected, the master would still consider semi-sync status to be ON and
      keep inserting tranx nodes, which eventually resulted in a tranx_node
      allocation error.
      
      This is fixed by not considering semi-sync master status as ON if
      no semi-sync slaves are connected.
     @ plugin/semisync/semisync_master.cc
        do not consider semi-sync master status as ON if no semi-sync slaves are connected
     @ plugin/semisync/semisync_slave_plugin.cc
        run slave in async mode if master disabled semi-sync
     @ sql/log.cc
        set error to 1 when flushing the binlog or running after_flush hooks fails
[7 Jul 4:45] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/78070

2824 He Zhenxing	2009-07-07
      BUG#45672 Semisync repl: ActiveTranx:insert_tranx_node: transaction node allocation failed
      BUG#45673 Semisynch reports correct operation even if no slave is connected
      
      When semi-sync was enabled on the master without any semi-sync slaves
      connected, the master would still consider semi-sync status to be ON and
      keep inserting tranx nodes, which eventually resulted in a tranx_node
      allocation error.
      
      This is fixed by not considering semi-sync master status as ON if
      no semi-sync slaves are connected.
     @ plugin/semisync/semisync_master.cc
        do not consider semi-sync master status as ON if no semi-sync slaves are connected
     @ plugin/semisync/semisync_slave_plugin.cc
        run slave in async mode if master disabled semi-sync
     @ sql/log.cc
        set error to 1 when flushing the binlog or running after_flush hooks fails
[23 Jul 12:24] Bugs System
Pushed into 5.4.4-alpha (revid:alik@sun.com-20090723102221-ps4uaphwbxzj8p0q) (version
source revid:zhenxing.he@sun.com-20090707024417-efaow72lsf1f4rm8) (merge vers:
5.4.4-alpha) (pib:11)
[3 Aug 20:42] Paul DuBois
No changelog entry needed. Does not appear in any released version.
[26 Sep 6:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/84703

3108 He Zhenxing	2009-09-26
      Backporting WL#4398 WL#1720
      Backporting BUG#44058 BUG#42244 BUG#45672 BUG#45673
      Backporting BUG#45819 BUG#45973 BUG#39012
[27 Oct 10:49] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091027094604-9p7kplu1vd2cvcju) (version
source revid:zhenxing.he@sun.com-20091026140226-uhnqejkyqx1aeilc) (merge vers:
6.0.14-alpha) (pib:13)
[28 Oct 0:18] Paul DuBois
Noted in 6.0.14 changelog.

With semisynchronous replication enabled, the master considered
semisynchronous status to be on even with no slaves connected.
[28 Oct 0:19] Paul DuBois
Setting report to NDI pending push into 5.5.x.
[12 Nov 9:18] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091110093229-0bh5hix780cyeicl) (version
source revid:alik@sun.com-20091027095744-rf45u3x3q5d1f5y0) (merge vers: 5.5.0-beta)
(pib:13)
[12 Nov 13:20] Jon Stephens
Already documented in the 5.5.0 changelog; re-closed.