MySQL Bugs: #43958: Assertion (transState != NULL) failed in Transaction::waitForTransaction

Bug #43958	Assertion (transState != NULL) failed in Transaction::waitForTransaction
Submitted:	30 Mar 2009 13:07	Modified:	15 May 2009 16:10
Reporter:	John Embretsen	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S1 (Critical)
Version:	6.0.11-bzr	OS:	Solaris (Solaris 10 x64)
Assigned to:	Olav Sandstå	CPU Architecture:	Any
Tags:	F_TRANSACTION

Description:
While running the transactional mysql-stress-test "stress_tx_rr" on a 4-core Solaris 10 x64 host using 50 concurrent client threads, Falcon crashed as follows:

[Falcon] Error: assertion (transState != NULL) failed at line 1026 in file Transaction.cpp

090329  0:59:38 - mysqld got signal 6 ;
(...)

Stack trace from core file (dbx):

  [11] Error::debugBreak(), line 94 in "Error.cpp"
  [12] Error::error(string = 0x17ca222 "assertion (%s) failed at line %d in file %s\n", ...), line 71 in "Error.cpp"
  [13] Error::assertionFailed(text = 0x17cdcb2 "transState != NULL", fileName = 0x17cdcc5 "Transaction.cpp", line = 1026), line 78 in "Error.cpp"
  [14] Transaction::waitForTransaction(this = 0x2b88340, transState = (nil), transId = 3239U, deadlock = 0xfffffd7ffc85fc0b), line 1026 in
"Transaction.cpp"
  [15] Transaction::waitForTransaction(this = 0x2b88340, transId = 3239U), line 1005 in "Transaction.cpp"
  [16] Table::validateAndInsert(this = 0x2976d28, transaction = 0x2b88340, record = 0x5b25310), line 3383 in "Table.cpp"
  [17] Table::update(this = 0x2976d28, transaction = 0x2b88340, orgRecord = 0x5b24f10, stream = 0x29ad660), line 3216 in "Table.cpp"  [18] StorageDatabase::updateRow(this = 0x29eaae8, storageConnection = 0x2ba4210, table = 0x2976d28, oldRecord = 0x5b24f10, stream
= 0x29ad660),
line 731 in "StorageDatabase.cpp"
  [19] StorageTable::updateRow(this = 0x29a80b0, recordNumber = 1003), line 135 in "StorageTable.cpp"
  [20] StorageInterface::update_row(this = 0x66ff908, oldData = 0x66ffc18 "^A\xf0\xf2^C", newData = 0x66ffbe0 "^A\xf0\xf2^C"), line
1236 in
"ha_falcon.cpp"
  [21] handler::ha_update_row(this = 0x66ff908, old_data = 0x66ffc18 "^A\xf0\xf2^C", new_data = 0x66ffbe0 "^A\xf0\xf2^C"), line 5530 in
"handler.cc"
  [22] write_record(thd = 0x67618f8, table = 0x67118f0, info = 0x690dd98), line 1480 in "sql_insert.cc"
  [23] select_insert::send_data(this = 0x690dd60, values = CLASS), line 3153 in "sql_insert.cc"
  [24] end_send(join = 0x7aa7a60, join_tab = 0x6911030, end_of_records = false), line 17239 in "sql_select.cc"
  [25] evaluate_join_record(join = 0x7aa7a60, join_tab = 0x6910d90, error = 0), line 16511 in "sql_select.cc"
  [26] sub_select(join = 0x7aa7a60, join_tab = 0x6910d90, end_of_records = false), line 16248 in "sql_select.cc"
  [27] do_select(join = 0x7aa7a60, fields = 0x7aad670, table = (nil), procedure = (nil)), line 15807 in "sql_select.cc"
  [28] JOIN::exec(this = 0x7aa7a60), line 2881 in "sql_select.cc"
  [29] mysql_select(thd = 0x67618f8, rref_pointer_array = 0x6763820, tables = 0x6909c68, wild_num = 0, fields = CLASS, conds = 0x690a960, og_num
= 0, order = (nil), group = (nil), having = (nil), proc_param = (nil), select_options = 3491514880ULL, result = 0x690dd60, unit = 0x67631d0,
select_lex = 0x6763638), line 3062 in "sql_select.cc"
  [30] handle_select(thd = 0x67618f8, lex = 0x6763130, result = 0x690dd60, setup_tables_done_option = 1073741824U), line 302 in "sql_select.cc"
  [31] mysql_execute_command(thd = 0x67618f8), line 3256 in "sql_parse.cc"
  [32] mysql_parse(thd = 0x67618f8, inBuf = 0x6859908 "INSERT INTO t1 (`id`, `int1`, `int1_key`, `int1_unique`,\n`int2`, `int2_key`,
`int2_unique`,\n`for_update`, `is_uncommitted`, `is_consistent`)\nSELECT src.`id`, src.`int1`, src.`int1_key`, src.`int1_unique`,\nsrc.`int2`,
src.`int2_key`, src.`int2_unique`,\nsrc.`for_update`, src.`is_uncommitted`, src.`is_consistent`\n               FROM t1 AS src\nWHERE (src.`pk`
BETWEEN 1000 AND 1049) AND (src.`id` > 0) AND (src.`is_consistent` = 1) AND (src.`int1_unique` MOD 8 = 0)\nON DUPLICATE KEY UPDATE `int1_unique`=
src.`int1" ..., length = 604U, found_semicolon = 0xfffffd7ffc863be8), line 5843 in "sql_parse.cc"
  [33] dispatch_command(command = COM_QUERY, thd = 0x67618f8, packet = 0x65a98e9 "INSERT INTO t1 (`id`, `int1`, `int1_key`,
`int1_unique`,\n`int2`, `int2_key`, `int2_unique`,\n`for_update`, `is_uncommitted`, `is_consistent`)\nSELECT src.`id`, src.`int1`,
src.`int1_key`, src.`int1_unique`,\nsrc.`int2`, src.`int2_key`, src.`int2_unique`,\nsrc.`for_update`, src.`is_uncommitted`, src.`is_consistent`\n
FROM t1 AS src\nWHERE (src.`pk` BETWEEN 1000 AND 1049) AND (src.`id` > 0) AND (src.`is_consistent` = 1) AND (src.`int1_unique` MOD 8 = 0)\nON
DUPLICATE KEY UPDATE `int1_unique`= src.`int1" ..., packet_length = 604U), line 1057 in "sql_parse.cc"
  [34] do_command(thd = 0x67618f8), line 739 in "sql_parse.cc"
  [35] handle_one_connection(arg = 0x67618f8), line 1146 in "sql_connect.cc"
  [36] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff13fd7b
  [37] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff13ffb0

How to repeat:
Run a transactional workload with ~50 concurrent client threads.
Details may follow later.

Suggested fix:
Adjust code to the new TransactionState implementation, and/or fix the assertion.

Issue was observed against mysql-6.0-falcon-team branch revision christopher.powers@sun.com-20090328181017-hadhr0dt1a389301 (at 2009-03-29 around 01:00 CET).

I have reproduced this issue and obtained core file. Let me know if you need more information.

In the error log the assertion was preceeded by:

090330 16:25:47 [ERROR] Got error 123 when reading table './test/t1'

$SRC_BRANCH/extra/perror tells me what this error means:

MySQL error code 123: Someone has changed the row since it was read (while the table was locked to prevent it)

I think this is less a Falcon issue than a Falcon/handler issue.  We don't
do transactional table locking.

This crash is caused by an wrong assert added first in Transaction::waitForTransaction() as part of the implementation of the new Transaction State object (see bug#41357).

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/70867

3086 Olav Sandstaa	2009-03-31
      Fix for Bug #43958 Assertion (transState != NULL) failed in Transaction::waitForTransaction
      
      Remove too strict assert on the in parameter to Transaction::waitForTransaction.
     @ storage/falcon/Transaction.cpp
        Remove too strict assert on the in parameter to Transaction::waitForTransaction.

Given your explanation and from looking at the code, I say OK to push from me

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71002

3093 Kevin Lewis	2009-04-01
      Bug#43958 - An alternative solution to deleting the assert for the existence of the TransState object is to re-organize the calls to Transaction::waitForTransaction so that they ALWAYS use a TransactionState pointer.  This can be done because the only place where the transId is sent in by itself is in Table::validateAndInsert().  This function calls waitForTransaction when a new base record is found superceding the record that was known as the prior.  This new base record MUST be a RecordVersion, so it MUST have a TransactionState pointer.  So instead of sending the TransId, the patch now sends the TransactionState pointer.  This also allows us to eliminate a costly and obnoxious search for a Transaction based on a TransId.
      
      Note that Table::validateAndInsert() holds its own useCount on the TransactionState that it gets from the new pending base record, just in case that record is rolledBack.  Although, with the new CycleManager, that rolledBack record will not be deleted until validateAndInsert() is done.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

http://lists.mysql.com/commits/71138

3093 Kevin Lewis 2009-04-01
Bug#43958 - An alternative solution to deleting the assert for the existence of the TransState object is to re-organize the calls to Transaction::waitForTransaction so that they ALWAYS use a TransactionState pointer. This can be done because the only place where the transId is sent in by itself is in Table::validateAndInsert(). This function calls waitForTransaction when a new base record is found superceding the record that was known as the prior. This new base record MUST be a RecordVersion, so it MUST have a TransactionState pointer. So instead of sending the TransId, the patch now sends the TransactionState pointer. This also allows us to eliminate a costly and obnoxious search for a Transaction based on a TransId.

Note that Table::validateAndInsert() holds its own useCount on the TransactionState that it gets from the new pending base record, just in case that record is rolledBack. Although, with the new CycleManager, that rolledBack record will not be deleted until validateAndInsert() is done.

In addition to waitForTransaction, the various calls to getRelativeState are also changed to use TransactionState and not use TransId. waitForTransaction does not need to increment its own useCount on the TransactionState object sent in. It either comes from validateAndInsert or getRelativeState. Table::validateAndInsert holds a useCount itself. Most of the getRelativeState calls hold a useCount on the record associated with the TransactionState, which by extention, holds the useCount on the TransactionState. An addRef is added to getrelativeState(Record*) so that waitForTransaction does not need to.

Pushed into 6.0.11-alpha (revid:hky@sun.com-20090402144811-yc5kp8g0rjnhz7vy) (version source revid:olav@sun.com-20090331091553-21i60f3ser2slzf0) (merge vers: 6.0.11-alpha) (pib:6)

Pushed into 6.0.11-alpha (revid:hky@sun.com-20090415164923-9arx29ye5pzxd4zf) (version source revid:kevin.lewis@sun.com-20090401193845-97k8cnd4d0zlzyll) (merge vers: 6.0.11-alpha) (pib:6)

Internal/test fix. No changelog entry required.