MySQL Bugs: #38568: Falcon assertion in Record::release

Bug #38568	Falcon assertion in Record::release - useCount > 0
Submitted:	5 Aug 2008 11:47	Modified:	15 May 2009 13:49
Reporter:	Philip Stoev	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S1 (Critical)
Version:	6.0-falcon	OS:	Any
Assigned to:	Kevin Lewis	CPU Architecture:	Any
Tags:	F_RECORD TREE, pb2, pushbuild, test failure

Description:
When executing and insert/update scenario with chill_threshold=8K, Falcon asserted as follows:

#0  0x085b2241 in Record::release (this=0xa61cfc98) at Record.cpp:525
#1  0x0853043c in Table::fetchForUpdate (this=0xb7452e98, transaction=0xb7275940, source=0xa61cfc98, usingIndex=true) at Table.cpp:3539
#2  0x0851b925 in StorageDatabase::nextIndexed (this=0xb70dc130, storageTable=0xb4ec1ce0, recordBitmap=0xb7226c60, recordNumber=1, lockForUpdate=true)
    at StorageDatabase.cpp:428
#3  0x08521e67 in StorageTable::nextIndexed (this=0xb4ec1ce0, recordNumber=0, lockForUpdate=true) at StorageTable.cpp:182
#4  0x08514fcc in StorageInterface::index_next (this=0xb81dd98, buf=0xb81df60 "") at ha_falcon.cpp:1597
#5  0x08513bf9 in StorageInterface::read_range_first (this=0xb81dd98, start_key=0x0, end_key=0xb81de3c, eq_range_arg=false, sorted=false)
    at ha_falcon.cpp:1565
#6  0x083ce5fa in handler::multi_range_read_next (this=0xb81dd98, range_info=0xa729e170) at handler.cc:4258
#7  0x083aeeef in QUICK_RANGE_SELECT::get_next (this=0xa6e9fc00) at opt_range.cc:8518
#8  0x083c8897 in rr_quick (info=0xa729e33c) at records.cc:298
#9  0x0835fc35 in mysql_update (thd=0xa7f982c0, table_list=0xb678b00, fields=@0xa7f99758, values=@0xa7f99988, conds=0xb679188, order_num=0, order=0x0,
    limit=18446744073709551615, handle_duplicates=DUP_ERROR, ignore=false) at sql_update.cc:573
#10 0x082bf817 in mysql_execute_command (thd=0xa7f982c0) at sql_parse.cc:2846
#11 0x082c63e1 in mysql_parse (thd=0xa7f982c0, inBuf=0xb6787e8 "UPDATE B AS X SET `pk` = 164 WHERE X . `int_key` < 197", length=54,
    found_semicolon=0xa729f270) at sql_parse.cc:5809
#12 0x082c6e55 in dispatch_command (command=COM_QUERY, thd=0xa7f982c0, packet=0xa7f9a619 "UPDATE B AS X SET `pk` = 164 WHERE X . `int_key` < 197",
    packet_length=54) at sql_parse.cc:1050
#13 0x082c8145 in do_command (thd=0xa7f982c0) at sql_parse.cc:723
#14 0x082b5e23 in handle_one_connection (arg=0xa7f982c0) at sql_connect.cc:1153
#15 0x0057d32f in start_thread () from /lib/libpthread.so.0
#16 0x0049a27e in clone () from /lib/libc.so.6

520             INTERLOCKED_INCREMENT (useCount);
521     }
522
523     void Record::release()
524     {
525             ASSERT (useCount > 0); <<<<<<<<<<<<<<<< HERE
526
527             if (INTERLOCKED_DECREMENT (useCount) == 0)
528                     {
529                     /*** debugging only

(gdb) print * this
Cannot access memory at address 0xa61cfc98

How to repeat:
If this is repeatable, a test case will be provided.

Suggested fix:
It appears that in StorageDatabase::nextIndexed(), a bad pointer to a Record was created. This pointer was passed down the call stack a few levels before the issue was detected. Maybe some extra assertions would have helped catch this closer to the source.

Similar failure is seen with DBT2 run on lu0009:

090220 14:19:59 [Note] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld: ready for connections.
Version: '6.0.10-alpha'  socket: '/tmp/mysql.sock'  port: 3306  'MySQL-Community-Server'
[Falcon] Error: assertion (record->useCount >= 2) failed at line 3534 in file Table.cpp

090220 14:21:55 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388608
read_buffer_size=1048576
max_used_connections=64
max_threads=1601
thread_count=64
connection_count=64
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4935933 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x7f0170027a30
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x400530e0 thread_stack 0x40000
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(my_print_stacktrace+0x2e) [0xb0c3de]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(handle_segfault+0x373) [0x685c53]
/lib/libpthread.so.0 [0x7f01783300f0]
/lib/libpthread.so.0(raise+0x2b) [0x7f017832ffab]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(Error::error(char const*, ...)+0x109) [0x8977f9]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(Table::fetchForUpdate(Transaction*, Record*, bool)+0x334) [0x8b46
f4]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(StorageDatabase::nextIndexed(StorageTable*, void*, int, bool)+0x8
9) [0x89eab9]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(StorageInterface::index_next(unsigned char*)+0x103) [0x8925d3]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(StorageInterface::read_range_first(st_key_range const*, st_key_ra
nge const*, bool, bool)+0x74) [0x891dd4]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(handler::multi_range_read_next(char**)+0x92) [0x78d132]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(QUICK_RANGE_SELECT::get_next()+0x36) [0x76eb46]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld [0x78900e]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(mysql_update(THD*, TABLE_LIST*, List<Item>&, List<Item>&, Item*,
unsigned int, st_order*, unsigned long long, enum_duplicates, bool)+0x91d) [0x728c9d]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(mysql_execute_command(THD*)+0xee4) [0x697904]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(mysql_parse(THD*, char const*, unsigned int, char const**)+0x3b9)
 [0x69c039]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+
0xb4c) [0x69cfbc]
/data0/work/mysql/mysql-6.0-falcon/sql/mysqld(handle_one_connection+0x1a1) [0x68f931]
/lib/libpthread.so.0 [0x7f01783283ea]
/lib/libc.so.6(clone+0x6d) [0x7f017729acbd]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x3a480c8 = UPDATE order_line
SET ol_delivery_d = current_timestamp
WHERE ol_o_id = 2101
  AND ol_w_id = 1
  AND ol_d_id = 1
thd->thread_id=20
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
pure virtual method called
terminate called without an active exception

Also seen when running stress_tx_rr (transactional stress test suite), using 8 concurrent threads against mysql-6.0-falcon-team branch as of 2009-03-13.

090313 16:00:26 [Note] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld: ready for connections.
Version: '6.0.11-alpha-debug-log'  socket: '/tmp/hvwABaaWDu/master.sock'  port: 9306  Source distribution
[Falcon] Error: assertion (useCount > 0) failed at line 551 in file Record.cpp

090313 16:05:45 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=10
max_threads=151
thread_count=8
connection_count=8
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 60022 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0xada36638
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0xae4f0fb0 thread_stack 0x30c00
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(my_print_stacktrace+0x32) [0x8864d95]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(handle_segfault+0x2cd) [0x82b92b1]
[0xb80cc400]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(Error::debugBreak()+0x12) [0x85d66f4]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(Error::error(char const*, ...)+0x98) [0x85d678e]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(Error::assertionFailed(char const*, char const*, int)+0x27) [0x85d6837]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(Record::release()+0x24) [0x860f2d8]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(StorageDatabase::nextRow(StorageTable*, int, bool)+0x17f) [0x8570cd3]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(StorageTable::next(int, bool)+0x3a) [0x85775a0]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(StorageInterface::rnd_next(unsigned char*)+0x92) [0x85685e4]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(rr_sequential(READ_RECORD*)+0x78) [0x83fb68e]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(sub_select(JOIN*, st_join_table*, bool)+0x212) [0x83419e1]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld [0x834e3e8]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(JOIN::exec()+0x20a9) [0x83653d3]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*)+0x30f) [0x835fee8]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(handle_select(THD*, LEX*, select_result*, unsigned long)+0x1ec) [0x83656cf]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(mysql_execute_command(THD*)+0x1cbc) [0x82cc0c4]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(mysql_parse(THD*, char const*, unsigned int, char const**)+0x22f) [0x82d3bb8]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x9f2) [0x82d473a]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(do_command(THD*)+0x244) [0x82d5cfe]
/home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(handle_one_connection+0x11d) [0x82c3074]
/lib/tls/i686/cmov/libpthread.so.0 [0xb808050f]
/lib/tls/i686/cmov/libc.so.6(clone+0x5e) [0xb7e887ee]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0xad38658 = CREATE TEMPORARY TABLE tmp2 (PRIMARY KEY (`pk`), KEY (`int1_key`), KEY (`int2_key`), UNIQUE (`int1_unique`), UNIQUE (`int2_unique`)) SELECT * from t1
thd->thread_id=2393
thd->killed=NOT_KILLED

Without line numbers, I cannot tell which call to Record::release() was the problem. This bug reflects various conditions in which the useCount seems to be unbalanced.  There are two such call stacks.

Stack trace with line numbers from crash during stress_tx_rr test.

Attachment: gdb-stacktrace_stress_tx_rr.txt (text/plain), 33.88 KiB.

Here is part of the stack trace with line numbers from the latest crash/assert. For details, see the attached gdb output.

Error::assertionFailed(char const*, char const*, int)
 storage/falcon/Error.cpp:79
Record::release()
 storage/falcon/Record.cpp:551
StorageDatabase::nextRow(StorageTable*, int, bool)
 storage/falcon/StorageDatabase.cpp:317
StorageTable::next(int, bool)
 storage/falcon/StorageTable.cpp:158
StorageInterface::rnd_next(unsigned char*)
 storage/falcon/ha_falcon.cpp:651
rr_sequential(READ_RECORD*)
 sql/records.cc:390
sub_select(JOIN*, st_join_table*, bool)
 sql/sql_select.cc:16245
do_select
 sql/sql_select.cc:15786
(...)

Now happens more than once in the new Cycle Manager - on 2 out of 57 test runs. It also has happened with DBT2 and with workloads using all-default Falcon options, with no explicit chill/thaw settings.

# 15:54:27 #7  0x00000000009831bf in Error::assertionFailed (
# 15:54:27     text=0xe7cfb5 "useCount > 0", fileName=0xe7cf80 "Record.cpp", line=542)
# 15:54:27     at Error.cpp:78
# 15:54:27 #8  0x0000000000a249a8 in Record::release (this=0x2aaab4fa2328)
# 15:54:27     at Record.cpp:542
# 15:54:27 #9  0x00000000009d5181 in CycleManager::cycleManager (this=0x2aaaaadee2a8)
# 15:54:27     at CycleManager.cpp:124
# 15:54:27 #10 0x00000000009d540b in CycleManager::cycleManager (arg=0x2aaaaadee2a8)
# 15:54:27     at CycleManager.cpp:49
# 15:54:27 #11 0x00000000009b04cf in Thread::thread (this=0x2aaaaadf8838)
# 15:54:27     at Thread.cpp:167
# 15:54:27 #12 0x00000000009b06f7 in Thread::thread (parameter=0x2aaaaadf8838)
# 15:54:27     at Thread.cpp:146
# 15:54:27 #13 0x0000003587a062f7 in start_thread () from /lib64/libpthread.so.0
# 15:54:27 #14 0x0000003586ed1b6d in clone () from /lib64/libc.so.6
# 15:54:30 Using host libthread_db library "/lib64/libthread_db.so.1".
# 15:54:30 Core was generated by `/export/home/pb2/test/sb_1-462247-1239511953.98/mysql-6.0.11-alpha-linux-x86_64'.
# 15:54:30 Program terminated with signal 6, Aborted.
# 15:54:30 #0  0x0000003587a0b122 in pthread_kill () from /lib64/libpthread.so.0

Requesting re-triage to SR60BETA, I hope Kevin would agree.

Philip,  Can you provide the RQG settings that cause this to fail 2 out of 57 times?  This is an old bug.  The code and the tests have evolved.  What is the best way to reproduce it now?

To reproduce:

perl runall.pl    --engine=Falcon   --reporters=Deadlock,ErrorLog,Backtrace,Recovery,Shutdown   --mysqld=--loose-falcon-lock-wait-timeout=1   --mysqld=--loose-innodb-lock-wait-timeout=1   --mysqld=--log-output=none   --mysqld=--skip-safemalloc   --mysqld=--transaction-isolation=SERIALIZABLE --mysqld=--falcon-page-size=4K --mem --rows=1000 --threads=4 --mysqld=--falcon-use-supernodes=0   --mask=7447   --queries=100000000   --duration=900   --basedir=/build/bzr/6.0-falcon-team  --gendata=conf/combinations.zz  --grammar=conf/combinations.yy 

Note that I have no idea why record memory gets exhausted in this test. I see no reason for that, so it may be a bug on its own.

Philip, This is the first mention of "record memory gets exhausted".  Does that have anything to do with reproducing this bug?

Kevin, at least some of the occurrences of "useCount > 0" happened without the "record memory is exhausted" message. Therefore, those may be two separate things. Let me know if you want a separate bug for the "record memory is exhausted" issue.

On April 13, when Philip last repeated this bug, there existed a memory leak due to a patch for bug#44161 on April 10.  It was fixed on April 14.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72280

3128 Kevin Lewis	2009-04-16
      Bug#38568 - This patch may help to fix this bug.  Needs testing.
      
      All changes to Record::priorVersion are now done using 
      COMPARE_EXCHANGE_POINTER instead of direct assignments so that 
      we can be assured that a pointer was not lost.  If this fails, there are several 
      new FATAL errors, which is better than continuing with a race condition.  
      The only direct assignments left are in the RecordVersion constructors.
      
      When pruning, a call is made to clearPriorVersion and the following code 
      may or may not actually prune those records.  If not, they need to be 
      re-attached with setPriorVersion.  That was a possible but very rare 
      memory leak.
      
      In clearPriorVersion, instead of crashing with FATAL if the CAS fails,
      it will just return NULL so the pruning for that record will be skipped.
      This change in behavior might prevent a race condition.

Pushed into 6.0.11-alpha (revid:alik@sun.com-20090423071213-afmyrzvolemph7mz) (version source revid:hky@sun.com-20090421195958-j33v1cuo3yer9niu) (merge vers: 6.0.11-alpha) (pib:6)

Philip Stoev reports that this bug no longer occurs, after a weekend battery of tests.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/73244

2703 Kevin Lewis	2009-05-01
      Bug#38568 continued;  This is only a cleanup patch to the previous patch which uses COMPARE_EXCHANGE_POINTER on all calls that update RecordVersion::priorVersion.  Jim Starkey originally intended the priorRecord chain to be accessed in a completely non-blocking manner.  This use of CAS should be unneccessary, in theory.  It is used in order to try and catch any conflicting code path in which two separate threads may want to update a priorVersion pointer simultaneously, or nearly so on separate CPUs.  So these CAS assignments are enclosed in a new macro called  USE_CAS_FOR_PRIOR_VERSION  in this patch.  If there is no perceptible performance cost with this, the CAS code may become permanent and this patch can be undone.
      
      Basically, this patch  allows one to revert to the old direct assignments of priorVersion in order to test the performance.  That is done by commenting out 
      #define USE_CAS_FOR_PRIOR_VERSION

Internal/test failure. No changelog entry required.