Bug #61324 InnoDB: Failing assertion: page_get_n_recs(page) > 1 on slaves
Submitted: 27 May 2011 12:29
Modified: 23 Oct 2011 6:08
Reporter: Ger Apeldoorn
Status: No Feedback
Category: MySQL Server: InnoDB storage engine
Severity: S2 (Serious)
Version: 5.5.11
OS: Linux (RHEL 5.5)
Assigned to:
CPU Architecture: Any
Tags: crash, failing assertion, linux, page_get_n_recs, replication, slave

[27 May 2011 12:29] Ger Apeldoorn
Description:
Hi,

My client is having problems with crashing slaves. The setup is a MySQL 5.5.8 master server with 5.5.11 slaves. (The master will be upgraded to 5.5.11 soon.)

The problem has occurred on all slaves. It first occurred while the slaves were running 5.5.8, and again after they were upgraded to 5.5.11 in an attempt to resolve the issue.

One thing caught my attention; this error occurs regularly on all slaves:

110426 12:48:08 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
110426 12:48:08 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'svr06-bin.000059' at postion 1228148

When it actually crashes, this is the error:

110428 16:55:30 InnoDB: Assertion failure in thread 1216510272 in file /export/home/pb2/build/sb_0-3159149-1301581932.71/rpm/BUILD/mysql-5.5.11/mysql-5.5.11/storage/innobase/ibuf/ibuf0ibuf.c line 4130
InnoDB: Failing assertion: page_get_n_recs(page) > 1

My theory is that the 'lost connection during query' error causes some corruption and that the slave crashes when it is trying to access that record.
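One way to check this theory against the error logs is to list, for each assertion failure, the "Lost connection" errors that preceded it. The following is only a minimal sketch: the function names and the 3-day window are hypothetical choices, the timestamp pattern is inferred from the log lines quoted in this report, and a reconnect shortly before a crash shows proximity in time, not causation.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as seen in the log lines quoted in this report,
# e.g. "110426 12:48:08" or "110915  0:03:51" (hour may be space-padded).
TS_RE = re.compile(r"^(\d{6})\s+(\d{1,2}):(\d{2}):(\d{2})\s+(.*)$")


def parse_line(line):
    """Return (timestamp, message), or None if the line has no timestamp."""
    m = TS_RE.match(line)
    if not m:
        return None
    date, hour, minute, second, msg = m.groups()
    ts = datetime.strptime(
        f"{date} {int(hour):02d}:{minute}:{second}", "%y%m%d %H:%M:%S"
    )
    return ts, msg


def reconnects_before_crashes(lines, window=timedelta(days=3)):
    """For each InnoDB assertion failure, collect the timestamps of earlier
    'Lost connection' errors that fall within `window` before the failure.

    `window` is an arbitrary illustrative default, not a recommended value.
    """
    lost = []    # timestamps of "Lost connection" errors seen so far
    result = []  # list of (crash_timestamp, [preceding lost-connection timestamps])
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # continuation lines such as "InnoDB: Failing assertion: ..."
        ts, msg = parsed
        if "Lost connection to MySQL server" in msg:
            lost.append(ts)
        elif "InnoDB: Assertion failure" in msg:
            recent = [t for t in lost if timedelta(0) <= ts - t <= window]
            result.append((ts, recent))
    return result
```

To use it, pass the lines of a slave's error log, for example `reconnects_before_crashes(open(path))`, where `path` is wherever that slave writes its error log.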

In 3 of the 6 crashes I investigated (on both 5.5.8 and 5.5.11), this error was also present:

mysqld: /export/home/pb2/build/sb_0-2629600-1291401220.79/rpm/BUILD/mysql-5.5.8/mysql-5.5.8/mysys/my_new.cc:51: int __cxa_pure_virtual(): Assertion `! "Aborted: pure virtual method called."' failed.

Any help is very much appreciated!

Ger Apeldoorn

How to repeat:
AFAIK, it cannot be reproduced on demand.
[16 Jul 2011 14:40] Valeriy Kravchuk
Please check whether this problem still happens with a newer version, 5.5.14.
[21 Jul 2011 8:36] Ger Apeldoorn
Hi,

This is a production system. Is there a specific bug fix in the newer version that could have resolved this situation?

Regards,
Ger.
[15 Sep 2011 14:34] Sascha Curth
We had the same issue on a Solaris 10 10/09 "s10x_u8wos_08a X86" system with the most recent version, 5.5.15:

pkginfo -l mysql
   PKGINST:  mysql
      NAME:  MySQL Community Server (GPL)
  CATEGORY:  application
      ARCH:  i86pc
   VERSION:  5.5.15
   BASEDIR:  /opt/mysql
    VENDOR:  Sun Microsystems, Inc.
    PSTAMP:  Sun Microsystems, Inc. Build Engineers
  INSTDATE:  Sep 02 2011 09:02
     EMAIL:  build@mysql.com
    STATUS:  completely installed
     FILES:     6563 installed pathnames
                 144 directories
                  89 executables
             2668604 blocks used (approx)

110912 21:14:38 [Warning] Slave: Got error 10000 'Error on remote system: 1205: Lock wait timeout exceeded; try restarting transaction' from FEDERATED Error_code: 1296
110912 21:14:38 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.001374' position 308783542
110912 21:15:00 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.001374' at position 308783542, relay log './relay-bin.000720' position: 40348457
110915  0:03:51  InnoDB: Assertion failure in thread 4 in file ibuf0ibuf.c line 4185
InnoDB: Failing assertion: page_get_n_recs(page) > 1
InnoDB: We intentionally generate a memory trap.

We set slave_net_timeout to 30 seconds and additionally installed some triggers. So far, this bug has not occurred on machines with the same setup that have no triggers installed.

The last errors I remember were always related to:

"Slave: Got error 10000 'Error on remote system: 1205: Lock wait timeout exceeded; try restarting transaction' from FEDERATED Error_code: 1296"
[23 Sep 2011 6:08] Valeriy Kravchuk
Looks like a duplicate of bug #61104.
[23 Oct 2011 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".