Bug #38568 | Falcon assertion in Record::release - useCount > 0 | ||
---|---|---|---|
Submitted: | 5 Aug 2008 11:47 | Modified: | 15 May 2009 13:49 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S1 (Critical) |
Version: | 6.0-falcon | OS: | Any |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
Tags: | F_RECORD TREE, pb2, pushbuild, test failure |
[5 Aug 2008 11:47]
Philip Stoev
[20 Feb 2009 15:04]
Hakan Küçükyılmaz
Similar failure is seen with DBT2 run on lu0009: 090220 14:19:59 [Note] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld: ready for connections. Version: '6.0.10-alpha' socket: '/tmp/mysql.sock' port: 3306 'MySQL-Community-Server' [Falcon] Error: assertion (record->useCount >= 2) failed at line 3534 in file Table.cpp 090220 14:21:55 - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=8388608 read_buffer_size=1048576 max_used_connections=64 max_threads=1601 thread_count=64 connection_count=64 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4935933 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x7f0170027a30 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x400530e0 thread_stack 0x40000 /data0/work/mysql/mysql-6.0-falcon/sql/mysqld(my_print_stacktrace+0x2e) [0xb0c3de] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld(handle_segfault+0x373) [0x685c53] /lib/libpthread.so.0 [0x7f01783300f0] /lib/libpthread.so.0(raise+0x2b) [0x7f017832ffab] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld(Error::error(char const*, ...)+0x109) [0x8977f9] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld(Table::fetchForUpdate(Transaction*, Record*, bool)+0x334) [0x8b46 f4] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld(StorageDatabase::nextIndexed(StorageTable*, void*, int, bool)+0x8 9) [0x89eab9] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld(StorageInterface::index_next(unsigned char*)+0x103) [0x8925d3] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld(StorageInterface::read_range_first(st_key_range const*, st_key_ra nge const*, bool, bool)+0x74) [0x891dd4] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld(handler::multi_range_read_next(char**)+0x92) [0x78d132] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld(QUICK_RANGE_SELECT::get_next()+0x36) [0x76eb46] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld [0x78900e] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld(mysql_update(THD*, TABLE_LIST*, List<Item>&, List<Item>&, Item*, unsigned int, st_order*, unsigned long long, enum_duplicates, bool)+0x91d) [0x728c9d] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld(mysql_execute_command(THD*)+0xee4) [0x697904] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld(mysql_parse(THD*, char const*, unsigned int, char const**)+0x3b9) [0x69c039] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+ 0xb4c) [0x69cfbc] /data0/work/mysql/mysql-6.0-falcon/sql/mysqld(handle_one_connection+0x1a1) [0x68f931] /lib/libpthread.so.0 [0x7f01783283ea] /lib/libc.so.6(clone+0x6d) [0x7f017729acbd] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x3a480c8 = UPDATE order_line SET ol_delivery_d = current_timestamp WHERE ol_o_id = 2101 AND ol_w_id = 1 AND ol_d_id = 1 thd->thread_id=20 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. pure virtual method called terminate called without an active exception
[13 Mar 2009 15:29]
John Embretsen
Also seen when running stress_tx_rr (transactional stress test suite), using 8 concurrent threads against mysql-6.0-falcon-team branch as of 2009-03-13. 090313 16:00:26 [Note] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld: ready for connections. Version: '6.0.11-alpha-debug-log' socket: '/tmp/hvwABaaWDu/master.sock' port: 9306 Source distribution [Falcon] Error: assertion (useCount > 0) failed at line 551 in file Record.cpp 090313 16:05:45 - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=1048576 read_buffer_size=131072 max_used_connections=10 max_threads=151 thread_count=8 connection_count=8 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 60022 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0xada36638 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0xae4f0fb0 thread_stack 0x30c00 /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(my_print_stacktrace+0x32) [0x8864d95] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(handle_segfault+0x2cd) [0x82b92b1] [0xb80cc400] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(Error::debugBreak()+0x12) [0x85d66f4] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(Error::error(char const*, ...)+0x98) [0x85d678e] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(Error::assertionFailed(char const*, char const*, int)+0x27) [0x85d6837] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(Record::release()+0x24) [0x860f2d8] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(StorageDatabase::nextRow(StorageTable*, int, bool)+0x17f) [0x8570cd3] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(StorageTable::next(int, bool)+0x3a) [0x85775a0] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(StorageInterface::rnd_next(unsigned char*)+0x92) [0x85685e4] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(rr_sequential(READ_RECORD*)+0x78) [0x83fb68e] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(sub_select(JOIN*, st_join_table*, bool)+0x212) [0x83419e1] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld [0x834e3e8] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(JOIN::exec()+0x20a9) [0x83653d3] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*)+0x30f) [0x835fee8] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(handle_select(THD*, LEX*, select_result*, unsigned long)+0x1ec) [0x83656cf] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(mysql_execute_command(THD*)+0x1cbc) [0x82cc0c4] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(mysql_parse(THD*, char const*, unsigned int, char const**)+0x22f) [0x82d3bb8] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x9f2) [0x82d473a] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(do_command(THD*)+0x244) [0x82d5cfe] /home/johnemb/dev/mysql/bzr-repos/mysql-6.0-falcon-team-jsandbox/sql/mysqld(handle_one_connection+0x11d) [0x82c3074] /lib/tls/i686/cmov/libpthread.so.0 [0xb808050f] /lib/tls/i686/cmov/libc.so.6(clone+0x5e) [0xb7e887ee] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0xad38658 = CREATE TEMPORARY TABLE tmp2 (PRIMARY KEY (`pk`), KEY (`int1_key`), KEY (`int2_key`), UNIQUE (`int1_unique`), UNIQUE (`int2_unique`)) SELECT * from t1 thd->thread_id=2393 thd->killed=NOT_KILLED
[15 Mar 2009 12:53]
Kevin Lewis
Without line numbers, I cannot tell which call to Record::release() was the problem. This bug reflects various conditions in which the useCount seems to be unbalanced. There are two such call stacks.
[16 Mar 2009 8:08]
John Embretsen
Stack trace with line numbers from crash during stress_tx_rr test.
Attachment: gdb-stacktrace_stress_tx_rr.txt (text/plain), 33.88 KiB.
[16 Mar 2009 8:11]
John Embretsen
Here is part of the stack trace with line numbers from the latest crash/assert. For details, see the attached gdb output. Error::assertionFailed(char const*, char const*, int) storage/falcon/Error.cpp:79 Record::release() storage/falcon/Record.cpp:551 StorageDatabase::nextRow(StorageTable*, int, bool) storage/falcon/StorageDatabase.cpp:317 StorageTable::next(int, bool) storage/falcon/StorageTable.cpp:158 StorageInterface::rnd_next(unsigned char*) storage/falcon/ha_falcon.cpp:651 rr_sequential(READ_RECORD*) sql/records.cc:390 sub_select(JOIN*, st_join_table*, bool) sql/sql_select.cc:16245 do_select sql/sql_select.cc:15786 (...)
[12 Apr 2009 14:57]
Philip Stoev
Now happens more than once in the new Cycle Manager - on 2 out of 57 test runs. It also has happened with DBT2 and with workloads using all-default Falcon options, with no explicit chill/thaw settings. # 15:54:27 #7 0x00000000009831bf in Error::assertionFailed ( # 15:54:27 text=0xe7cfb5 "useCount > 0", fileName=0xe7cf80 "Record.cpp", line=542) # 15:54:27 at Error.cpp:78 # 15:54:27 #8 0x0000000000a249a8 in Record::release (this=0x2aaab4fa2328) # 15:54:27 at Record.cpp:542 # 15:54:27 #9 0x00000000009d5181 in CycleManager::cycleManager (this=0x2aaaaadee2a8) # 15:54:27 at CycleManager.cpp:124 # 15:54:27 #10 0x00000000009d540b in CycleManager::cycleManager (arg=0x2aaaaadee2a8) # 15:54:27 at CycleManager.cpp:49 # 15:54:27 #11 0x00000000009b04cf in Thread::thread (this=0x2aaaaadf8838) # 15:54:27 at Thread.cpp:167 # 15:54:27 #12 0x00000000009b06f7 in Thread::thread (parameter=0x2aaaaadf8838) # 15:54:27 at Thread.cpp:146 # 15:54:27 #13 0x0000003587a062f7 in start_thread () from /lib64/libpthread.so.0 # 15:54:27 #14 0x0000003586ed1b6d in clone () from /lib64/libc.so.6 # 15:54:30 Using host libthread_db library "/lib64/libthread_db.so.1". # 15:54:30 Core was generated by `/export/home/pb2/test/sb_1-462247-1239511953.98/mysql-6.0.11-alpha-linux-x86_64'. # 15:54:30 Program terminated with signal 6, Aborted. # 15:54:30 #0 0x0000003587a0b122 in pthread_kill () from /lib64/libpthread.so.0 Requesting re-triage to SR60BETA, I hope Kevin would agree.
[13 Apr 2009 5:12]
Kevin Lewis
Philip, Can you provide the RQG settings that cause this to fail 2 out of 57 times? This is an old bug. The code and the tests have evolved. What is the best way to reproduce it now?
[13 Apr 2009 12:06]
Philip Stoev
To reproduce: perl runall.pl --engine=Falcon --reporters=Deadlock,ErrorLog,Backtrace,Recovery,Shutdown --mysqld=--loose-falcon-lock-wait-timeout=1 --mysqld=--loose-innodb-lock-wait-timeout=1 --mysqld=--log-output=none --mysqld=--skip-safemalloc --mysqld=--transaction-isolation=SERIALIZABLE --mysqld=--falcon-page-size=4K --mem --rows=1000 --threads=4 --mysqld=--falcon-use-supernodes=0 --mask=7447 --queries=100000000 --duration=900 --basedir=/build/bzr/6.0-falcon-team --gendata=conf/combinations.zz --grammar=conf/combinations.yy Note that I have no idea why record memory gets exhausted in this test. I see no reason for that, so it may be a bug on its own.
[13 Apr 2009 15:20]
Kevin Lewis
Philip, This is the first mention of "record memory gets exhausted". Does that have anything to do with reproducing this bug?
[13 Apr 2009 15:24]
Philip Stoev
Kevin, at least some of the occurrences of "useCount > 0" happened without the "record memory is exhausted" message. Therefore, those may be two separate things. Let me know if you want a separate bug for the "record memory is exhausted" issue.
[16 Apr 2009 11:55]
Kevin Lewis
On April 13, when Philip last repeated this bug, there existed a memory leak due to a patch for bug#44161 on April 10. It was fixed on April 14.
[16 Apr 2009 12:10]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/72280 3128 Kevin Lewis 2009-04-16 Bug#38568 - This patch may help to fix this bug. Needs testing. All changes to Record::priorVersion are now done using COMPARE_EXCHANGE_POINTER instead of direct assignments so that we can be assured that a pointer was not lost. If this fails, there are several new FATAL errors, which is better than continuing with a race condition. The only direct assignments left are in the RecordVersion constructors. When pruning, a call is made to clearPriorVersion and the following code may or may not actually prune those records. If not, they need to be re-attached with setPriorVersion. That was a possible but very rare memory leak. In clearPriorVersion, instead of crashing with FATAL if the CAS fails, it will just return NULL so the pruning for that record will be skipped. This change in behavior might prevent a race condition.
[23 Apr 2009 7:22]
Bugs System
Pushed into 6.0.11-alpha (revid:alik@sun.com-20090423071213-afmyrzvolemph7mz) (version source revid:hky@sun.com-20090421195958-j33v1cuo3yer9niu) (merge vers: 6.0.11-alpha) (pib:6)
[28 Apr 2009 17:14]
Kevin Lewis
Philip Stoev reports that this bug no longer occurs, after a weekend battery of tests.
[1 May 2009 18:27]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/73244 2703 Kevin Lewis 2009-05-01 Bug#38568 continued; This is only a cleanup patch to the previous patch which uses COMPARE_EXCHANGE_POINTER on all calls that update RecordVersion::priorVersion. Jim Starkey originally intended the priorRecord chain to be accessed in a completely non-blocking manner. This use of CAS should be unneccessary, in theory. It is used in order to try and catch any conflicting code path in which two separate threads may want to update a priorVersion pointer simultaneously, or nearly so on separate CPUs. So these CAS assignments are enclosed in a new macro called USE_CAS_FOR_PRIOR_VERSION in this patch. If there is no perceptible performance cost with this, the CAS code may become permanent and this patch can be undone. Basically, this patch allows one to revert to the old direct assignments of priorVersion in order to test the performance. That is done by commenting out #define USE_CAS_FOR_PRIOR_VERSION
[15 May 2009 13:49]
MC Brown
Internal/test failure. No changelog entry required.