Bug #2903 Assertion failure in file ha_innodb.cc line 2020
Submitted: 20 Feb 2004 6:59 Modified: 17 Nov 2005 23:09
Reporter: Bradford Folkens Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:4.0.17-standard-pc-linux-i686 OS:Linux (Linux - RH9)
Assigned to: Assigned Account CPU Architecture:Any

[20 Feb 2004 6:59] Bradford Folkens
Description:
Mysql fails with an assertion failure from ha_innodb.cc line 2020 at different points in the iterative process described below, i.e. it doesn't fail at the same point in the number of interations (this, obviously would make debugging it a lot easier, sorry ;)

Any help you may have would be so very appreciated, I recently upgraded to 4.0.17 in hopes this would fix it, but it's still a problem.

Thank you,

Brad Folkens
bfolkens@citytechusa.com

How to repeat:
tblTmp_Sum is a temporary table with 2 columns (approx. 300,000 records):
IParent int(24), Rank double(8,6)

tblRank is an InnoDB table with 2 columns (approx. 750,000 records):
Parent int(24), Rank double(8,6)

After repeated:

SELECT DISTINCT IParent FROM tblTmp_Sum LIMIT 156000,157000;
(where limit increases a specified amount each time)

and

UPDATE tblRank a, tblTmp_Sum b  SET a.Rank=(0.15 + 0.85 * b.Rank)   WHERE a.Parent=b.IParent;

Suggested fix:
Unknown.
[20 Feb 2004 6:59] Bradford Folkens
MySQL error log

Attachment: mysql.err (text/plain), 2.07 KiB.

[20 Feb 2004 7:00] Bradford Folkens
(modified version)
[20 Feb 2004 9:19] Heikki Tuuri
Hi!

Please send to me the FULL .err log from mysqld.

Is it the assertion below which fails?

/************************************************************************
Stores a row in an InnoDB database, to the table specified in this
handle. */

int
ha_innobase::write_row(
/*===================*/
                                /* out: error code */
        mysql_byte*     record) /* in: a row in MySQL format */
{
        row_prebuilt_t* prebuilt = (row_prebuilt_t*)innobase_prebuilt;
        int             error;
        longlong        auto_inc;
        longlong        dummy;
        ibool           incremented_auto_inc_for_stat = FALSE;
        ibool           incremented_auto_inc_counter = FALSE;
        ibool           skip_auto_inc_decr;

        DBUG_ENTER("ha_innobase::write_row");

        ut_a(prebuilt->trx ==
                (trx_t*) current_thd->transaction.all.innobase_tid);

I do not understand how the assertion can come in ::write_row(), which is basically an INSERT. You only perform SELECT and a multi-table UPDATE? Nothing else?

Regards,

Heikki
[20 Feb 2004 9:45] Bradford Folkens
Well first a SQL runs:

CREATE TEMPORARY TABLE tblTmp_Outbound
 (PRIMARY KEY (Parent)) TYPE=InnoDB 
 SELECT a.Parent AS Parent, COUNT(a.Link) AS SumOutbound FROM tblLink a 
	GROUP BY a.Parent;

And after that it's runs the select/update pair at every iteration through a loop:

SELECT SUM(Rank / SumOutbound) AS RankSum FROM tblTmp_Sum WHERE IParent=<variable>

UPDATE tblTmp_Sum SET Rank=<variable> WHERE IParent=<variable>

(where <variable> is part of the current iteration set)

Once that's finished it does this:

UPDATE tblRank a, tblTmp_Sum b
  SET a.Rank=(0.15 + 0.85 * b.Rank) 
  WHERE a.Parent=b.IParent;

I realize in my earlier post I put the latest UPDATE in the iterative set, the above comment is the correct, expanded version.  In regards to your comment though, you are correct that this is a relatively simple statement and that there is no INSERT statement in the iterations.  The only INSERT that may occur (you'd be better to answer this) is in the initial CREATE TABLE.  The application is using a single thread to do these operations, so I know there aren't >1 thread trying to use the create table statement or the UPDATE / SELECT loop.
[23 Feb 2004 2:26] Heikki Tuuri
Hi!

The resolved stack dump looks sensible.

Regards,

Heikki

0x8070610 handle_segfault + 420
0x8289628 pthread_sighandler + 184
0x80ca56b write_row__11ha_innobasePc + 239
0x80a3b95 write_record__FP8st_tableP12st_copy_info + 513
0x80a5e5d send_data__13select_createRt4List1Z4Item + 73
0x809e00f end_send__FP4JOINP13st_join_tableb + 123
0x809cfa7 sub_select__FP4JOINP13st_join_tableb + 255

...........

InnoDB: Failing assertion: prebuilt->trx == (trx_t*) current_thd->transaction.all.innobase_tid
InnoDB: We intentionally generate a memory trap.
InnoDB: Send a detailed bug report to mysql@lists.mysql.com mysqld got signal 11; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.

key_buffer_size=402653184
read_buffer_size=258048
max_used_connections=105
max_connections=4000
threads_connected=25
It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 3449184 K bytes of memory Hope that's ok; if not, decrease some variables in the equation.

thd=0x79c14a60
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
Cannot determine thread, fp=0xbfbdecb8, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x8070610
0x8289628
0x80ca56b
0x80a3b95
0x80a5e5d
0x809e00f
0x809cfa7
New value of fp=0xbfb5f174 failed sanity check, terminating stack trace!
Please read http://www.mysql.com/doc/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved stack trace is much more helpful in diagnosing the problem, so please do resolve it Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at (nil)  is invalid pointer
thd->thread_id=96298
[23 Feb 2004 8:25] Heikki Tuuri
Hi!

We need a script + the tables to repeat this. I tested with very small tables, and it looks like it must be the statement

CREATE TABLE ... SELECT ...

that causes the assertion to fail, because in the multi-table UPDATE there are no INSERTs to InnoDB type tables.

The SQL statements you give below apparently have typing errors, because the column names and the table names do not match.

This might be memory corruption: maybe the trx object of the table handle is wiped over.

Regards,

Heikki

mysql> create temporary table tmpoutb(primary key (Parent)) type=innodb select o
utb.a as Parent, count(outb.b) as SumOutbound from outb group by outb.a;
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select sum(SumOutbound) from tmpoutb where Parent = 2;
+------------------+
| sum(SumOutbound) |
+------------------+
|                3 |
+------------------+
1 row in set (0.00 sec)

mysql> update tmpoutb set SumOutbound = 15 where Parent = 2;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select sum(SumOutbound) from tmpoutb where Parent = 3;
+------------------+
| sum(SumOutbound) |
+------------------+
|                1 |
+------------------+
1 row in set (0.00 sec)

mysql> update tmpoutb set SumOutbound = 15 where Parent = 3;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> update tblRank, tmpoutb set tblRank.Rank = tmpoutb.SumOutbound where tblR
ank.Parent = tmpoutb.Parent;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 3  Changed: 2  Warnings: 0

mysql>
mysql>
mysql> drop table tmpoutb;
Query OK, 0 rows affected (0.00 sec)

mysql>
mysql>
mysql>
mysql> create temporary table tmpoutb(primary key (Parent)) type=innodb select o
utb.a as Parent, count(outb.b) as SumOutbound from outb group by outb.a;
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql>

Calling ha_innobase::external_lock() for test/outb
handle 84fcc08
Calling ha_innobase::external_lock() for tmp/#sql3eb_1_3
handle 8503230
Calling ha_innobase::write_row() for tmp/#sql3eb_1_3
handle 8503230
Calling ha_innobase::write_row() for tmp/#sql3eb_1_3
handle 8503230
Calling ha_innobase::write_row() for tmp/#sql3eb_1_3
handle 8503230
Calling ha_innobase::external_lock() for tmp/#sql3eb_1_3
handle 8503230
Calling ha_innobase::external_lock() for test/outb
handle 84fcc08
Calling ha_innobase::external_lock() for tmp/#sql3eb_1_3
handle 8503230
Calling ha_innobase::external_lock() for tmp/#sql3eb_1_3
handle 8503230
Calling ha_innobase::external_lock() for tmp/#sql3eb_1_3
handle 8503230
Calling ha_innobase::external_lock() for tmp/#sql3eb_1_3
handle 8503230
Calling ha_innobase::external_lock() for tmp/#sql3eb_1_3
handle 8503230
Calling ha_innobase::external_lock() for tmp/#sql3eb_1_3
handle 8503230
Calling ha_innobase::external_lock() for tmp/#sql3eb_1_3
handle 8503230
Calling ha_innobase::external_lock() for tmp/#sql3eb_1_3
handle 8503230

Calling ha_innobase::external_lock() for test/tblRank
handle 8503970
Calling ha_innobase::external_lock() for tmp/#sql3eb_1_3
handle 8503230
Calling ha_innobase::external_lock() for test/tblRank
handle 8503970
Calling ha_innobase::external_lock() for tmp/#sql3eb_1_3
handle 8503230
Calling ha_innobase::external_lock() for test/outb
handle 84fcc08
Calling ha_innobase::external_lock() for tmp/#sql3eb_1_4
handle 8502eb0
Calling ha_innobase::write_row() for tmp/#sql3eb_1_4
handle 8502eb0
Calling ha_innobase::write_row() for tmp/#sql3eb_1_4
handle 8502eb0
Calling ha_innobase::write_row() for tmp/#sql3eb_1_4
handle 8502eb0
Calling ha_innobase::external_lock() for tmp/#sql3eb_1_4
handle 8502eb0
Calling ha_innobase::external_lock() for test/outb
handle 84fcc08
[23 Feb 2004 20:22] Bradford Folkens
Just got another signal 11 - wondered if this is related, this time I'm doing really simple queries repeatedly (no create tables like before).

You'll note though, that the error log didn't say anything about an assertion failure this time...

Queries:

SELECT tblFoodDes.NDB_No,
tblFoodDes.Shrt_Desc,
tblFoodDes.Long_Desc,
tblFoodDes.ComName,
tblFoodDes.ManufacName,
tblFoodDes.SciName
FROM tblFoodDes
WHERE tblFoodDes.FdGrp_Cd='01100'
ORDER BY tblFoodDes.Long_Desc ASC;

Followed by many

SELECT
tblNutData.Nutr_Val,
tblNutrDef.Units,
tblNutrDef.Tagname,
tblNutrDef.NutrDesc,
tblNutrDef.Num_Dec,
tblWeight.Amount,
tblWeight.Msre_Desc,
tblWeight.Gm_Wgt
FROM tblFoodDes,tblNutData,tblNutrDef,tblWeight
WHERE tblFoodDes.NDB_No='A1001'
  AND tblNutData.Nutr_No=tblNutrDef.Nutr_No
  AND tblNutData.NDB_No=tblFoodDes.NDB_No
  AND tblWeight.NDB_No=tblFoodDes.NDB_No
ORDER BY tblWeight.Seq,tblNutrDef.SR_Order ASC;
[24 Feb 2004 7:34] Heikki Tuuri
Hi!

It might be even faulty RAM, since you got another crash. Best to run a RAM test.

I added now diagnostic code to 4.0.19 to track this problem if the assertion fails again.

Regards,

Heikki
[24 Feb 2004 9:38] Bradford Folkens
Heikki,

Thank you very much for your efforts on this - Since the errors seem to be skipping around now I will definitely start pulling RAM.

Brad
[18 Mar 2004 8:08] Bradford Folkens
Close - Turned out to be a memory issue in the machine.  Thank you so much for your help though!
[15 Oct 2005 8:31] Rami Rabinovich
I have similiar problem, I do not know how to trace it - the D/b is huge, concurrent 370 connections. Computer has 2Gb of Ram and 3Gb of swap. 2 CPU with HT enabled.
Kernel: Linux gamma 2.4.26 #2 SMP Sun Aug 15 13:29:51 EDT 2004 i686 GNU/Linux
Mysql : stable debian 4.1.11-Debian_4

Oct 14 19:42:45 localhost mysqld[10003]: InnoDB: Error: the transaction object for the table handle is at
Oct 14 19:42:46 localhost mysqld[10003]: InnoDB: 0x2c92bc68, but for the current thread it is at 0x2c9ad468
Oct 14 19:42:46 localhost mysqld[10003]: InnoDB: Dump of 200 bytes around prebuilt:  len 200; hex 043ca33f00000000010000000800000000000000
00020000000000001834962c01000000b307952d6d7973716c2e63007c01000000000000000000000000000028188d2c28a49e2cc001000000000000000000000001000040
0000000000000010000000ef6fae0468f2722c68bc922c0100000001000000000000000000000000000000000000006f72756d000000006f7374696e672e7068703f6d0000
000028188d2c00000000000000000000000000000000000000000000000068bb972c68c4a02c00000000; asc  < ?                         4 ,       -mysql.c
|               (  ,(  ,                @            o  h r,h  ,                        orum    osting.php?m    (  ,
  h  ,h  ,    ;
Oct 14 19:42:46 localhost mysqld[10003]: InnoDB: Dump of 200 bytes around transaction.all:  len 200; hex 000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000068d49a2c0100000000000000000000004d4b5a0800000000000000000000000000000000010000000000000030ecf45000000
00030ecf45020000000e41f00000400000000000000109c15080000000000000000000000000000000000000000; asc
                                                               h  ,            MKZ                         0  P    0  P
                     ;
Oct 14 19:42:49 localhost mysqld[10003]: mysqld got signal 11;
Oct 14 19:42:49 localhost mysqld[10003]: This could be because you hit a bug. It is also possible that this binary
Oct 14 19:42:49 localhost mysqld[10003]: or one of the libraries it was linked against is corrupt, improperly built,
Oct 14 19:42:49 localhost mysqld[10003]: or misconfigured. This error can also be caused by malfunctioning hardware.
Oct 14 19:42:49 localhost mysqld[10003]: We will try our best to scrape up some info that will hopefully help diagnose
Oct 14 19:42:49 localhost mysqld[10003]: the problem, but since we have already crashed, something is definitely wrong
Oct 14 19:42:49 localhost mysqld[10003]: and this may fail.
Oct 14 19:42:49 localhost mysqld[10003]:
Oct 14 19:42:49 localhost mysqld[10003]: key_buffer_size=16777216
Oct 14 19:42:49 localhost mysqld[10003]: read_buffer_size=131072
Oct 14 19:42:49 localhost mysqld[10003]: max_used_connections=392
Oct 14 19:42:49 localhost mysqld[10003]: max_connections=1024
Oct 14 19:42:49 localhost mysqld[10003]: threads_connected=390
Oct 14 19:42:49 localhost mysqld[10003]: It is possible that mysqld could use up to
Oct 14 19:42:49 localhost mysqld[10003]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 2244600 K
Oct 14 19:42:49 localhost mysqld[10003]: bytes of memory
Oct 14 19:42:49 localhost mysqld[10003]: Hope that's ok; if not, decrease some variables in the equation.
Oct 14 19:42:49 localhost mysqld[10003]:
Oct 14 19:42:49 localhost mysqld[10003]: You seem to be running 32-bit Linux and have 390 concurrent connections.
Oct 14 19:42:49 localhost mysqld[10003]: If you have not changed STACK_SIZE in LinuxThreads and built the binary
Oct 14 19:42:49 localhost mysqld[10003]: yourself, LinuxThreads is quite likely to steal a part of the global heap for
Oct 14 19:42:49 localhost mysqld[10003]: the thread stack. Please read http://www.mysql.com/doc/en/Linux.html
Oct 14 19:42:49 localhost mysqld[10003]:
Oct 14 19:42:49 localhost mysqld[10003]: thd=(nil)
Oct 14 19:42:49 localhost mysqld[10003]: Attempting backtrace. You can use the following information to find out
Oct 14 19:42:49 localhost mysqld[10003]: where mysqld died. If you see no messages after this, something went
Oct 14 19:42:49 localhost mysqld[10003]: terribly wrong...
Oct 14 19:42:49 localhost mysqld[10003]: Cannot determine thread, fp=0x5f7fe0b8, backtrace may not be correct.
Oct 14 19:42:49 localhost mysqld[10003]: Stack range sanity check OK, backtrace follows:
Oct 14 19:42:49 localhost mysqld[10003]: 0x81893bf
Oct 14 19:42:49 localhost mysqld[10003]: 0x25647825
Oct 14 19:42:49 localhost mysqld[10003]: 0x84b9e9e
Oct 14 19:42:49 localhost mysqld[10003]: 0x84b9ef7
Oct 14 19:42:49 localhost mysqld[10003]: 0x822d355
Oct 14 19:42:49 localhost mysqld[10003]: 0x822e7e2
Oct 14 19:42:49 localhost mysqld[10003]: 0x819bea5
Oct 14 19:42:49 localhost mysqld[10003]: 0x819bd46
Oct 14 19:42:49 localhost mysqld[10003]: 0x819b4a8
Oct 14 19:42:49 localhost mysqld[10003]: 0x25641e51
Oct 14 19:42:49 localhost mysqld[10003]: 0x2588c92a
Oct 14 19:42:49 localhost mysqld[10003]: New value of fp=(nil) failed sanity check, terminating stack trace!
Oct 14 19:42:49 localhost mysqld[10003]: Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on h
ow to resolve the stack trace. Resolved
Oct 14 19:42:49 localhost mysqld[10003]: stack trace is much more helpful in diagnosing the problem, so please do
Oct 14 19:42:49 localhost mysqld[10003]: resolve it
Oct 14 19:42:49 localhost mysqld[10003]: The manual page at http://www.mysql.com/doc/en/Crashing.html contains
Oct 14 19:42:49 localhost mysqld[10003]: information that should help you find out what is causing the crash.
[16 Oct 2005 9:42] Heikki Tuuri
Rami,

please resolve the stack dump.

Does it crash often? Is the resolved stack dump always the same?

Regards,

Heikki
[16 Oct 2005 9:49] Heikki Tuuri
Hi!

The printout comes from this.

Quite a few bugs in MySQL locking of tables have been fixed since 4.0.17. You could test the latest 4.0 version.

Regards,

Heikki

int
ha_innobase::write_row(
/*===================*/
                                /* out: error code */
        mysql_byte*     record) /* in: a row in MySQL format */
{
        row_prebuilt_t* prebuilt = (row_prebuilt_t*)innobase_prebuilt;
        int             error;
        longlong        auto_inc;
        longlong        dummy;
        ibool           incremented_auto_inc_for_stat = FALSE;
        ibool           incremented_auto_inc_counter = FALSE;
        ibool           skip_auto_inc_decr;

        DBUG_ENTER("ha_innobase::write_row");

        if (prebuilt->trx !=
                        (trx_t*) current_thd->transaction.all.innobase_tid) {
                fprintf(stderr,
"InnoDB: Error: the transaction object for the table handle is at\n"
"InnoDB: %p, but for the current thread it is at %p\n",
                        prebuilt->trx,
                        current_thd->transaction.all.innobase_tid);
                fputs("InnoDB: Dump of 200 bytes around prebuilt: ", stderr);
                ut_print_buf(stderr, ((const byte*)prebuilt) - 100, 200);
                fputs("\n"
                        "InnoDB: Dump of 200 bytes around transaction.all: ",
                        stderr);
                ut_print_buf(stderr,
                        ((byte*)(&(current_thd->transaction.all))) - 100, 200);
                putc('\n', stderr);
                ut_error;
        }
[18 Nov 2005 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".