
Bug #61324	InnoDB: Failing assertion: page_get_n_recs(page) > 1 on slaves
Submitted:	27 May 2011 12:29	Modified:	23 Oct 2011 6:08
Reporter:	Ger Apeldoorn
Status:	No Feedback
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	5.5.11	OS:	Linux (RHEL 5.5)
Assigned to:		CPU Architecture:	Any
Tags:	crash, failing assertion, linux, page_get_n_recs, replication, slave

Description:
Hi,

My client is having problems with crashing slaves. The setup is a MySQL 5.5.8 master with 5.5.11 slaves. (The master will be upgraded to 5.5.11 soon.)

The problem has occurred on all slaves. It first occurred while the slaves were running 5.5.8, and it occurred again after they were upgraded to 5.5.11 in an attempt to resolve the issue.

One thing caught my attention; this error occurs regularly on all slaves:

110426 12:48:08 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
110426 12:48:08 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'svr06-bin.000059' at postion 1228148

When a slave actually crashes, this is the error:

110428 16:55:30 InnoDB: Assertion failure in thread 1216510272 in file /export/home/pb2/build/sb_0-3159149-1301581932.71/rpm/BUILD/mysql-5.5.11/mysql-5.5.11/storage/innobase/ibuf/ibuf0ibuf.c line 4130
InnoDB: Failing assertion: page_get_n_recs(page) > 1

My theory is that the 'lost connection during query' error causes some corruption and that the slave crashes when it is trying to access that record.
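One way to check this theory against the error log is sketched below. It is only a rough sketch, assuming the MySQL 5.5 error-log timestamp format seen in the excerpts above (`YYMMDD H:MM:SS`); the function names and the 7-day window are made up for illustration. It counts how many 'Lost connection' errors precede each InnoDB assertion failure. A high count would be consistent with the theory but would not prove a causal link.

```python
import re
from datetime import datetime, timedelta

# Error-log timestamp as seen in the excerpts, e.g. "110426 12:48:08"
# or "110915  0:03:51" (the hour may be a single, space-padded digit).
TS = re.compile(r"^(\d{6})\s+(\d{1,2}):(\d{2}):(\d{2})\s+(.*)$")


def parse_line(line):
    """Return (timestamp, message), or None for lines without a timestamp."""
    m = TS.match(line)
    if not m:
        return None
    date, hh, mm, ss, msg = m.groups()
    ts = datetime.strptime(f"{date} {int(hh):02d}:{mm}:{ss}", "%y%m%d %H:%M:%S")
    return ts, msg


def lost_connections_before_crashes(lines, window=timedelta(days=7)):
    """For each assertion failure, count 'Lost connection' errors
    logged within `window` before it (window length is an assumption)."""
    lost, crashes = [], []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # continuation lines such as "InnoDB: Failing assertion: ..."
        ts, msg = parsed
        if "Lost connection to MySQL server during query" in msg:
            lost.append(ts)
        elif "InnoDB: Assertion failure" in msg:
            crashes.append(ts)
    return [(c, sum(1 for t in lost if c - window <= t <= c)) for c in crashes]
```

The function accepts any iterable of lines, so an open error-log file can be passed to `lost_connections_before_crashes` directly.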

In 3 of the 6 crashes I investigated (on both 5.5.8 and 5.5.11), this error was also present:

mysqld: /export/home/pb2/build/sb_0-2629600-1291401220.79/rpm/BUILD/mysql-5.5.8/mysql-5.5.8/mysys/my_new.cc:51: int __cxa_pure_virtual(): Assertion `! "Aborted: pure virtual method called."' failed.

Any help is very much appreciated!

Ger Apeldoorn

How to repeat:
AFAIK, it cannot be reproduced on demand.

Please check whether this problem still occurs with a newer version, 5.5.14.

Hi,

This is a production system. Is there a specific bug fix in that version that could have resolved this situation?

Regards,
Ger.

We had the same issue on a Solaris 10 10/09 "s10x_u8wos_08a X86" system with the most recent version, 5.5.15:

pkginfo -l mysql
   PKGINST:  mysql
      NAME:  MySQL Community Server (GPL)
  CATEGORY:  application
      ARCH:  i86pc
   VERSION:  5.5.15
   BASEDIR:  /opt/mysql
    VENDOR:  Sun Microsystems, Inc.
    PSTAMP:  Sun Microsystems, Inc. Build Engineers
  INSTDATE:  Sep 02 2011 09:02
     EMAIL:  build@mysql.com
    STATUS:  completely installed
     FILES:     6563 installed pathnames
                 144 directories
                  89 executables
             2668604 blocks used (approx)

110912 21:14:38 [Warning] Slave: Got error 10000 'Error on remote system: 1205: Lock wait timeout exceeded; try restarting transaction' from FEDERATED Error_code: 1296
110912 21:14:38 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.001374' position 308783542
110912 21:15:00 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.001374' at position 308783542, relay log './relay-bin.000720' position: 40348457
110915  0:03:51  InnoDB: Assertion failure in thread 4 in file ibuf0ibuf.c line 4185
InnoDB: Failing assertion: page_get_n_recs(page) > 1
InnoDB: We intentionally generate a memory trap.

We set slave_net_timeout to 30 seconds and additionally installed some triggers. So far, this bug has not occurred on machines with the same setup that have no triggers installed.

The last errors I remember were always related to:

"Slave: Got error 10000 'Error on remote system: 1205: Lock wait timeout exceeded; try restarting transaction' from FEDERATED Error_code: 1296"

Looks like a duplicate of bug #61104.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".