Bug #36289 Running iuds6.tst leads to assertion in Falcon
Submitted: 23 Apr 2008 12:52 Modified: 7 May 2009 15:54
Reporter: Hakan Küçükyılmaz Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S1 (Critical)
Version:6.0-falcon bzr OS:Linux
Assigned to: Kevin Lewis CPU Architecture:Any
Tags: F_RECORD TREE

[23 Apr 2008 12:52] Hakan Küçükyılmaz
Description:
Running iuds6.tst from system-qa is crashing Falcon.

How to repeat:
./run_systest --scenario=iuds6.tst

(print_stacktrace+0x1e) [0x77aa0e]
(handle_segfault+0x33b) [0x635edb]
/lib64/libpthread.so.0 [0x2b9623e0dc10]
/lib64/libpthread.so.0(raise+0x2d) [0x2b9623e0dabd]
(Error::error(char const*, ...)+0xec) [0x81bcdc]
(Record::~Record()+0x25) [0x844275]
(RecordVersion::~RecordVersion()+0x3b) [0x844c2b]
(Table::insert(Transaction*, Stream*)+0x1af) [0x7e001f]
(StorageTable::insert()+0x39) [0x7d4929]
(StorageInterface::write_row(unsigned char*)+0x6a) [0x7cae9a]
(handler::ha_write_row(unsigned char*)+0x6f) [0x728f5f]
(ha_partition::write_row(unsigned char*)+0xb6) [0x72c566]
(handler::ha_write_row(unsigned char*)+0x6f) [0x728f5f]
(write_record(THD*, st_table*, st_copy_info*)+0x5d) [0x6bd1ad]
(select_insert::send_data(List<Item>&)+0xaa) [0x6bda4a]
(JOIN::exec()+0x8e5) [0x6b8ea5]
(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*)+0x1b0) 
[0x6baa40]
(handle_select(THD*, st_lex*, select_result*, unsigned long)+0x174) [0x6bb424]
(mysql_execute_command(THD*)+0x62a7) [0x64a977]
(mysql_parse(THD*, char const*, unsigned int, char const**)+0x2ed) [0x64be1d]
(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x8a7) [0x64c6d7]
(do_command(THD*)+0xc7) [0x64d2f7]
(handle_one_connection+0xf4) [0x63e584]

Suggested fix:
Don't crash.
[9 May 2008 11:43] Philip Stoev
I do not recall seeing that crash on my test runs.
[21 May 2008 15:17] Philip Stoev
This has not been observed again. I am setting to Can't Repeat and will open it if I observe it in the future.
[24 Nov 2008 15:23] Hakan Küçükyılmaz
I could verify this bug using latest sources from mysql-6.0-falcon.

Stack trace of assertion:

[Falcon] Error: assertion (!active) failed at line 164 in file Record.cpp

081120 22:30:12 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=54
max_threads=151
thread_count=53
connection_count=51
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 60607 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x2aaab6746840
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x4a811190 thread_stack 0x40000
(my_print_stacktrace+0x24) [0xa80b74]
(handle_segfault+0x33b) [0x64ab9b]
/lib64/libpthread.so.0 [0x2b9fce87cc10]
/lib64/libpthread.so.0(raise+0x2d) [0x2b9fce87cabd]
(Error::error(char const*, ...)+0x102) [0x8867b2]
(Record::~Record()+0x2a) [0x8b162a]
(RecordVersion::~RecordVersion()+0x3b)[0x8b206b]
(StorageDatabase::nextRow(StorageTable*, int, bool)+0x16a) [0x83979a]
(StorageInterface::rnd_next(unsigned char*)+0x79) [0x8343b9]
(ha_partition::rnd_next(unsigned char*)+0x43) [0x74e3f3]
(rr_sequential(READ_RECORD*)+0x21) [0x73d2f1]
(sub_select(JOIN*, st_join_table*, bool)+0x8c) [0x6b90ec]
 [0x6b96bd]
(JOIN::exec()+0x911) [0x6cf9e1]
(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*)+0x1b0) [0x6d15d0]
(handle_select(THD*, LEX*, select_result*,unsigned long)+0x169) [0x6d1fa9]
 [0x6560a9]
(mysql_execute_command(THD*)+0x1a21)[0x65ada1]
(mysql_parse(THD*, char const*, unsigned int, char const**)+0x22c) [0x65ef8c]
(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x908) [0x65f8a8]
(do_command(THD*)+0xc7) [0x660367]
(handle_one_connection+0xf4) [0x653574]
/lib64/libpthread.so.0 [0x2b9fce875143]
/lib64/libc.so.6(__clone+0x6d) [0x2b9fcefaa74d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x252f038 = SELECT ROUND(RAND() * MAX(i1)), MAX(i1), MIN(i1),
CONNECTION_ID()
INTO @next_val, @max_val, @min_val, @connection_id
FROM systest1.tb1_eng1
thd->thread_id=17675
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
[24 Nov 2008 21:04] Kevin Lewis
Hakan,  Can you verify if this is happening in the 6.0.8 codebase?  Just before you reported this problem on 2008-11-20, originally Bug#40902- iuds6.tst leads to assert in Falcon, I pushed a change for Bug#40893 which may have provided a mechanism for this to occur.  Please try this on a bzr tree that has that change backed out.

I have investigated this code change to see if this active flag would stay true for the lock records involved in the change, but it always seems to go false.  However, it would be nice to know if this change caused a regression, of if this assert in iuds6 has been around a while.
[25 Nov 2008 16:08] Hakan Küçükyılmaz
Kevin,

I just ran the test with latest mysql-6.0-falcon tree and your fix is _not_ in:

    if (!dup->hasRecord())
            {
            // If the record is locked or being unlocked keep looking for a dup.

            if ((dup->state == recLock) || (dup->state == recUnlocked))
                    continue;  // Next record version.
[26 Nov 2008 7:38] Hakan Küçükyılmaz
I could verify the assertion on caneland with latest mysql-6.0-falcon bzr tree.

081126  4:14:33 [ERROR] Event Scheduler: [root@localhost][systest1.ev_del_tb1_eng1] Got error 305 'record memory is exhausted' from Falcon
081126  4:14:33 [Note] Event Scheduler: [root@localhost].[systest1.ev_del_tb1_eng1] event execution failed.
(my_print_stacktrace+0x35)[0xa99915]
(handle_segfault+0x336)[0x6512f6]
/lib64/libpthread.so.0[0x2b71ab5f8c00]
/lib64/libpthread.so.0(raise+0x2d)[0x2b71ab5f8aad]
(Error::error(char const*, ...)+0xf9)[0x898e89]
(Record::~Record()+0x2b)[0x8c4b2b]
(RecordVersion::~RecordVersion()+0x3c)[0x8c55ec]
(StorageDatabase::nextRow(StorageTable*, int, bool)+0x173)[0x84a6c3]
(StorageInterface::rnd_next(unsigned char*)+0x93)[0x845013]
(ha_partition::rnd_next(unsigned char*)+0x46)[0x7598b6]
(rr_sequential(READ_RECORD*)+0x21)[0x748461]
(sub_select(JOIN*, st_join_table*, bool)+0x97)[0x6c27a7]
[0x6c2dbd]
(JOIN::exec()+0x8bb)[0x6d92fb]
(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*)+0x191)[0x6daf41]
(handle_select(THD*, LEX*, select_result*, unsigned long)+0x167)[0x6db8f7]
[0x65d5a8]
(mysql_execute_command(THD*)+0x1b98)[0x662338]
(mysql_parse(THD*, char const*, unsigned int, char const**)+0x214)[0x666884]
(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x913)[0x6671a3]
(do_command(THD*)+0xc6)[0x667c86]
(handle_one_connection+0xf4)[0x659e64]
/lib64/libpthread.so.0[0x2b71ab5f1143]
/lib64/libc.so.6(__clone+0x6d)[0x2b71abc268cd]
[Falcon] Error: assertion (!active) failed at line 164 in file Record.cpp

081126  4:14:37 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=55
max_threads=151
thread_count=52
connection_count=51
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 60613 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x2aaab12405b0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x49be1188 thread_stack 0x40000
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x2aaab663d0c8 is an invalid pointer
thd->thread_id=23374
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
[26 Nov 2008 9:19] Hakan Küçükyılmaz
I get the same assertion with latest mysql-6.0-falcon-team tree.
[26 Nov 2008 18:30] Kevin Lewis
Chris,  This is the assert that is being hit in the iuds6 test.

#ifdef CHECK_RECORD_ACTIVITY
	ASSERT(!active);
#endif

This is an extra flag that is just used to see if we know what is going on.  If we hit this, then there is a code path that has escaped the placements of 

#ifdef CHECK_RECORD_ACTIVITY
	record->active = false;
#endif

This may or may not be a problem.  It is triaged now as a BETA bug, but we could probably comment out 

#define CHECK_RECORD_ACTIVITY

and it will go away.  But it would be nice ot know if thisproblem is leading to a serious issue.  Please try to find the history of the record that hits this, if you can reproduce it.
[3 Jan 2009 11:31] Philip Stoev
To reproduce 100% immediately after takeoff with the Random Query Generator:

$ perl runall.pl \
  --mem \
  --rows=10 \
  --threads=16 \
  --mask=39 \
  --queries=1000000 \
  --duration=300 \
  --basedir=/build/bzr/6.0-falcon-team \
  --engine=Falcon \
  --grammar=conf/combinations.yy \
  --gendata=conf/combinations.zz \
  --reporter=ErrorLog,Backtrace \
  --mysqld=--loose-falcon-lock-wait-timeout=1 \
  --mysqld=--log-output=none

Note that the --mask parameter means that only parts of the SQL grammar will be used.
[7 Jan 2009 6:34] Kevin Lewis
I reproduced this problem several times today while running the RQG script combinations.yy.  Every time, the record with active still set to true was being released from a catch section in Table::insert or Table::update().  A newly created record was being cleaned up and released, but the active flag was not getting set to false.
[7 Jan 2009 6:41] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62567

2957 Kevin Lewis	2009-01-07
      Bug#36289 - The record::active flag was not getting set to false when inserts or updates failed and an eception was caught.  Before releasing a newly created record, these 5 catch{} sections should set Table::active to false.
[13 Feb 2009 7:23] Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090211182317-uagkyj01fk30p1f8) (version source revid:olav@sun.com-20090113103017-41jbad7qlvlwpwxh) (merge vers: 6.0.10-alpha) (pib:6)
[7 May 2009 15:54] MC Brown
Internal/test failure. No changelog entry required.